WO2020169838A1 - Modular binding proteins - Google Patents

Modular binding proteins

Publication number
WO2020169838A1
Authority
WO
WIPO (PCT)
Prior art keywords
seq
binding protein
modular binding
peptide ligand
repeat
Prior art date
Application number
PCT/EP2020/054697
Other languages
French (fr)
Inventor
Laura ITZHAKI
Pam ROWLING
Graham LADDS
Alberto PEREZ RIBA
Beatriz GOYENECHEA CORZO
Joseph MABBITT
Marco BARDELLI
Simon GILBERT
Original Assignee
Cambridge Enterprise Limited
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from US16/282,155 (US11279925B2)
Application filed by Cambridge Enterprise Limited
Publication of WO2020169838A1

Classifications

    • C: CHEMISTRY; METALLURGY
    • C40: COMBINATORIAL TECHNOLOGY
    • C40B: COMBINATORIAL CHEMISTRY; LIBRARIES, e.g. CHEMICAL LIBRARIES
    • C40B40/00: Libraries per se, e.g. arrays, mixtures
    • C40B40/04: Libraries containing only organic compounds
    • C40B40/10: Libraries containing peptides or polypeptides, or derivatives thereof
    • C: CHEMISTRY; METALLURGY
    • C12: BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N: MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N15/00: Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
    • C12N15/09: Recombinant DNA-technology
    • C12N15/10: Processes for the isolation, preparation or purification of DNA or RNA
    • C12N15/1034: Isolating an individual clone by screening libraries
    • C12N15/1044: Preparation or screening of libraries displayed on scaffold proteins
    • A: HUMAN NECESSITIES
    • A61: MEDICAL OR VETERINARY SCIENCE; HYGIENE
    • A61K: PREPARATIONS FOR MEDICAL, DENTAL OR TOILETRY PURPOSES
    • A61K38/00: Medicinal preparations containing peptides
    • A: HUMAN NECESSITIES
    • A61: MEDICAL OR VETERINARY SCIENCE; HYGIENE
    • A61K: PREPARATIONS FOR MEDICAL, DENTAL OR TOILETRY PURPOSES
    • A61K9/00: Medicinal preparations characterised by special physical form
    • A61K9/10: Dispersions; Emulsions
    • A61K9/127: Liposomes
    • C: CHEMISTRY; METALLURGY
    • C07: ORGANIC CHEMISTRY
    • C07K: PEPTIDES
    • C07K14/00: Peptides having more than 20 amino acids; Gastrins; Somatostatins; Melanotropins; Derivatives thereof
    • C07K14/435: Peptides having more than 20 amino acids; Gastrins; Somatostatins; Melanotropins; Derivatives thereof from animals; from humans
    • C07K14/46: Peptides having more than 20 amino acids; Gastrins; Somatostatins; Melanotropins; Derivatives thereof from animals; from humans from vertebrates
    • C07K14/47: Peptides having more than 20 amino acids; Gastrins; Somatostatins; Melanotropins; Derivatives thereof from animals; from humans from vertebrates from mammals

Definitions

  • This invention relates to modular binding proteins and their production and uses.
  • A priority area in medicine, particularly cancer research, is the expansion of the ‘druggable’ proteome, which is currently limited to narrow classes of molecular targets.
  • PPIs: protein-protein interactions
  • The architecture of tandem repeat proteins has tremendous scope for rational design (Kobe & Kajava, 2000; Longo & Blaber, 2014; Rowling et al., 2015).
  • the key features of tandem repeat proteins are relatively small size, modularity and extremely high stability (and therefore recombinant production) without the need of disulphide bonds.
  • Individual consensus-designed repeats are self-compatible and can be put together in any order; function is therefore also modular, which means that multiple functions can be independently designed and incorporated in a combinatorial fashion within a single molecule (WO2017106728).
  • Novel repeat protein functions, e.g. DARPins (Tamaskovic et al., 2012), have been developed based on the natural type of PPI interface of these proteins, i.e. spanning many repeat units to create an extended, high-affinity binding interface for the target. Mutations have been introduced into the surface residues in the tetratricopeptide (TPR) repeats of the cytosolic receptor peroxin 5 (Sampathkumar et al. (2008) J. Mol. Biol., 381, 867-880). Binding of peptide ligands to peroxin 5 is shown to be mediated by residues located in several different TPR repeats.
  • TPR: tetratricopeptide repeat
  • modular proteins capable of binding to one or more target molecules can be generated by displaying peptidyl binding motifs, such as short linear motifs (SLiMs), on modular scaffolds. These modular binding proteins may be useful, for example as single- or multifunction protein therapeutics.
  • An aspect of the invention provides a modular binding protein comprising;
  • each said peptide ligand being located in an inter-repeat loop or at the N or C terminus of the modular binding protein.
  • the one or more peptide ligands comprise an amino acid sequence set out in Table 8.
  • a modular binding protein may comprise an amino acid sequence set out in Table 13 (SEQ ID NOs: 1230-1304), or a variant thereof.
  • the modular binding protein may comprise a first peptide ligand that binds a first target molecule and a second peptide ligand that binds a second target molecule.
  • One of the first or second target molecules may be an E3 ubiquitin ligase.
  • One of the first and second peptide ligands may comprise an amino acid sequence set out in Table 8 (SEQ ID NOs: 277-388) and the other of the first and second peptide ligands may comprise an amino acid sequence set out in Table 9 (SEQ ID NOs: 470-617).
  • Another aspect of the invention provides a method of producing a modular binding protein comprising; inserting a first nucleic acid encoding a peptide ligand into a second nucleic acid encoding two or more repeat domains linked by inter-repeat loops, to produce a chimeric nucleic acid encoding a modular binding protein as described herein; and
  • the peptide ligand comprises an amino acid sequence set out in Table 8 (SEQ ID NOs: 277-388) or Table 9 (SEQ ID NOs: 470-617).
  • Another aspect of the invention provides a method of producing a modular binding protein that binds to a first target molecule and a second target molecule comprising;
  • nucleic acid encoding two or more repeat domains linked by inter-repeat loops, and incorporating into said nucleic acid a first nucleotide sequence encoding a first peptide ligand that binds to a first target molecule and a second nucleotide sequence encoding a second peptide ligand that binds to a second target molecule to generate a nucleic acid encoding a modular binding protein comprising said first and second peptide ligands, wherein said peptide ligands are located in an inter-repeat loop or at the N or C terminus of the modular binding protein; and
  • the first and second peptide ligands comprise amino acid sequences set out in Table 8 (SEQ ID NOs: 277-388) or Table 9 (SEQ ID NOs: 470-617).
  • one of the first or second target molecules is an E3 ubiquitin ligase.
  • Another aspect of the invention provides a library comprising modular binding proteins, each modular binding protein in the library comprising;
  • the one or more peptide ligands comprise an amino acid sequence set out in Table 8 (SEQ ID NOs: 277-388) or Table 9 (SEQ ID NOs: 470-617).
  • Another aspect of the invention provides a library comprising a first and a second sub-library of modular binding proteins, each modular binding protein in the first and second sub-libraries comprising;
  • the peptide ligand in the modular binding proteins in the first sub-library binds to a first target molecule and is located in one of (i) an inter-repeat loop; (ii) the N terminus or (iii) the C terminus of the modular binding protein, and
  • the peptide ligand in the modular binding proteins in the second sub-library binds to a second target molecule and is located in another of (i) an inter-repeat loop; (ii) the N terminus or (iii) the C terminus of the modular binding protein.
  • Another aspect of the invention provides a method of producing a library of modular binding proteins comprising;
  • a second peptide ligand comprising at least one diverse amino acid residue, each said peptide ligand being located in an inter-repeat loop or at the N or C terminus of the modular binding protein, and
  • Another aspect of the invention provides a method of screening a library comprising;
  • each modular binding protein in the library comprising;
  • a second peptide ligand comprising at least one diverse amino acid residue, wherein the first and second peptide ligands are located in the inter-repeat loop, at the N terminus or at the C terminus of the protein,
  • FIG. 1 shows the thermostability of consensus-designed tetratricopeptide repeat (CTPR) proteins containing loop- or helix-grafted binding motifs: Thermal denaturation, monitored by circular dichroism, of 2-repeat RTPR (a CTPR in which lysine residues have been replaced with arginine residues) proteins: RTPR2 (diamonds), RTPR2 containing a loop binding-module (circles) and RTPR2 containing a helix binding-module (squares). All samples are at 20 mM in 10 mM sodium phosphate buffer pH 7.4, 150 mM NaCl.
  • CTPR: consensus-designed tetratricopeptide repeat
  • FIG. 2 shows the thermostability of CTPR proteins of increasing length containing an increasing number of binding modules (alternating with blank modules): Thermal denaturation curves, monitored by circular dichroism, of TPR proteins containing 1, 2, 3 and 4 loops comprising a tankyrase-binding sequence: 1TBP-CTPR2, 2TBP-CTPR4, 3TBP-CTPR6, 4TBP-CTPR8. All samples are at 20 mM in 10 mM sodium phosphate buffer pH 7.4, 150 mM NaCl.
  • Figure 3 shows an example of helix grafting.
  • Figure 3A (i) shows the crystal structure of SOS1 (son-of-sevenless homologue 1) bound to KRAS (Kirsten rat sarcoma) (PDB 1NVU, Margarit et al. Cell (2003) 112(5):685-95), and (ii) shows the SOS1 helix grafted onto a helix at the N-terminus of a CTPR2 protein.
  • The modelled structure of SOS-RTPR2 is shown, and the sequence of the helix is given with the key KRAS-binding residues in grey and the residues that form the interface with the CTPR helices in black; (iii) shows the modelled structure of SOS-TPR2 in complex with KRAS.
  • Figure 3B shows binding of SOS-TPR2 to KRAS measured by competitive fluorescence polarization (FP).
  • FP: fluorescence polarization
  • The complex between mant-GTP and KRAS was pre-formed, and 0.1-300 pM SOS-RTPR2 was then titrated into the complex, displacing the mant-GTP from KRAS and resulting in a decrease in FP.
  • EC50 is 3 pM.
  • Figure 4 shows another example of helix grafting.
  • Figure 4A shows the modelled structure of the Mdm2 (Mouse double minute 2 homolog) N-terminal domain in complex with the p53-TPR2 comprising the Mdm2-binding helix of p53 grafted onto a helix at the C-terminus of a CTPR2 protein.
  • Figure 4B shows an ITC analysis of the interaction between p53-TPR2 and the Mdm2 N-terminal domain. The N-terminal domain of Mdm2 was titrated into the cell containing 10 mM p53-TPR2.
  • Figure 5 shows an example of single and multivalent loop-grafted CTPRs.
  • Figure 5A shows an ITC analysis of the interaction between a series of tankyrase-binding loop-grafted CTPR2 proteins (TBP-CTPR2) and the substrate-binding ARC4 (ankyrin-repeat cluster) domain of tankyrase. There is an enhancement of both binding affinity and dissociation constant with increasing number of binding modules.
  • Figure 5B shows native gel analysis (using a native gel in Tris-Glycine buffer pH 8.0, 40 mM protein concentration) of multivalent TBP-CTPR proteins expressed as fusion constructs with the foldon trimerisation domain (Boudko et al 2002; Meier et al. 2004).
  • 1TBP-CTPR2, 2TBP-CTPR4 and 4TBP-CTPR8 (all lacking the foldon domain) were purified and run as monomeric controls. Constructs having the foldon domain run at much higher molecular weights than their monomeric counterparts.
  • Figure 6 shows an example of loop-grafted CTPRs comprising the 10-residue Skp2-binding sequence derived from p27 grafted into a loop of a CTPR protein (CTPR-p27).
  • Figure 6A shows that HA-CTPR2-p27 is able to co-IP FLAG-Skp2 from HEK293T cells.
  • Figure 6B shows that E. coli-expressed and purified TPR5-p27 inhibits p27 ubiquitination in vitro.
  • Figure 7 shows another example of loop-grafted CTPRs.
  • Figure 7A shows (left) ITC analysis of the interaction between the Keap1 (Kelch-like ECH-associated protein 1) KELCH domain and a CTPR2 protein containing a loop-grafted Keap1-binding sequence derived from the protein Nrf2 (Nuclear factor (erythroid-derived 2)-like 2) (Nrf-CTPR2). No binding is observed for the blank CTPR2 protein (right).
  • Figure 7B shows that three variants of Nrf-CTPR2 (Nrf-CTPR2 (i), Nrf-CTPR2 (ii), Nrf-CTPR2 (iii)) can co-IP Keap1 from HEK293T cells.
  • Figure 8 shows live-cell imaging of intracellular delivery of an RTPR achieved by resurfacing (by introducing Arginine residues at surface sites).
  • PC3 (left) and U2OS (right) cells were incubated with 10 mM FITC-labelled resurfaced TBP-RTPR2 for 3 hours at 37°C, 5% CO2.
  • Figure 9 shows the induced degradation of the target protein beta-catenin by designed heterobifunctional RTPRs.
  • Figure 9A shows the beta-catenin levels in cells transfected with either HA-tagged beta-catenin plasmid alone or HA-tagged beta-catenin plasmid together with one of two different hetero-bifunctional RTPR plasmids (LRH1-TPR-p27 and axin-TPR-p27, designed to bind simultaneously to beta-catenin and to E3 ligase SCF Skp2).
  • Figure 9B shows a quantitative analysis of the beta-catenin levels in the presence of different hetero-bifunctional RTPRs designed to bind simultaneously to beta-catenin and to either E3 ligase SCF Skp2 or E3 ligase Mdm2.
  • the analysis was performed using densitometry of the bands detected by Western blots corresponding to HA-tagged beta-catenin normalised to actin bands using ImageJ. Negative controls used were single-function TPRs or blank (non-functional) TPRs.
  • A modular binding protein may comprise: two repeat domains with a helical target-binding peptide and a helical E3-binding peptide at the N and C termini (Figure 10A); three repeat domains with a helical E3-binding peptide at the C terminus and a target peptide ligand in the first inter-repeat loop from the N terminus (Figure 10B); three repeat domains with a helical target-binding peptide at the N terminus and an E3 peptide ligand in the second inter-repeat loop from the N terminus (Figure 10C); or four repeat domains with a target peptide ligand and an E3 peptide ligand in the first and third inter-repeat loops from the N terminus (Figure 10D).
  • Figure 11 shows a schematic of a modular binding protein with four peptide ligands located in alternate inter-repeat loops. The binding sites are arrayed at 90° to each other.
  • Figure 12 shows a schematic of a modular binding protein engineered so that peptide ligands in alternate inter-repeat loops bind adjacent epitopes on the target.
  • Figure 13 shows the modelled structure of a hetero-bifunctional modular binding protein comprising TPR repeat domains, an LRH1-derived peptide ligand designed to bind target beta-catenin, and a p53-derived N-terminal peptide ligand designed to bind to the E3 ubiquitin ligase Mdm2.
  • Figure 14 shows a schematic of the combinatorial assembly of a module comprising a repeat domain and a terminal helical peptide ligand and a module comprising repeat domains and an inter-repeat loop peptide ligand to generate a modular binding protein.
  • Figure 15 shows examples of different modular binding protein formats: (i) shows the blank proteins; (ii) shows binding peptides inserted into one or more inter-repeat loops; (iii) shows helical binding peptides at one or both of the termini; (iv) is a combination of loop and helical binding peptides; (v) and (vi) show examples of how multivalency can be achieved.
  • Figure 16 shows a schematic of the assembly of a modular binding protein by the progressive screening of modular binding proteins comprising modules with a diverse peptide ligand in addition to modules already identified in previous rounds of screening.
  • FIG 17 shows the effect of designed multi-valent tankyrase-binding TPR proteins on Wnt signalling.
  • HEK293T cells were transfected with TPR-encoding plasmids using Lipofectamine2000.
  • the TPR proteins contained 1-4 copies of a tankyrase-binding peptide (TBP) grafted onto the inter-repeat loop(s).
  • TBP: tankyrase-binding peptide
  • 2TBP-CTPR4 is a protein comprising 4 TPR modules with one TBP grafted onto the loop between the first and second TPR and one between the third and fourth TPR. ‘Foldon’ indicates a trimeric TPR-foldon fusion protein.
  • Figure 18 shows characterisation of the size and charge of liposome-encapsulated TPR proteins.
  • Figure 19 shows the delivery of TPR proteins into cells by liposome encapsulation.
  • FITC dye-labelled liposomes stain the cell membrane upon membrane fusion (red panel), and RITC-labelled TPR protein cargo is then delivered into the cytoplasm.
  • the green panel and red-green merge show that the proteins have entered the cells and are spread diffusely in the cytoplasm.
  • Figure 20 shows that liposome-encapsulated TPR proteins are not toxic to HEK293T cells at the concentrations used.
  • FIG. 21 shows the effect of designed TPR proteins delivered by liposome encapsulation.
  • the TPR proteins contained a tankyrase-binding peptide. Cells were treated with liposomes for 2 hr.
  • FIG 22 shows the effect of designed TPR proteins delivered by liposome encapsulation.
  • the TPR proteins contained a tankyrase-binding peptide. Cells were treated with liposomes encapsulating 32 mg protein for variable times (2-8 h) indicated in the figure.
  • FIG 23 shows the effect of designed hetero-bifunctional TPR proteins on KRAS levels in HEK 293T cells.
  • the TPR proteins contained a binding sequence for KRAS (a non-helical peptide sequence, referred to as KBL, grafted onto an inter-repeat loop of the RTPR) and a degron derived from p27 grafted onto another inter-repeat loop.
  • KBL non-helical peptide sequence
  • Cells were transiently transfected with 50 ng or 500 ng of TPR-encoding plasmids, as indicated, and with KRAS plasmid or empty vector as control. 24 hours post transfection the cells were lysed, and KRAS levels were evaluated by western blot. In dark grey are cells transfected with single-function TPR plasmid (containing degron only).
  • FIG 24 shows the effect of designed hetero-bifunctional TPR proteins (delivered by liposome encapsulation) on KRAS levels.
  • The TPR proteins contained a KRAS-binding peptide and a SCF Skp2-binding peptide to direct KRAS for ubiquitination and subsequent degradation. Cells were treated with liposomes for 2 hr.
  • FIG. 25 shows the effect of designed hetero-bifunctional TPR proteins (delivered by liposome encapsulation) on KRAS levels.
  • the TPR proteins contained a KRAS-binding peptide and a SCFSkp2-binding peptide to direct KRAS for ubiquitination and subsequent degradation.
  • Cells were treated with TPR protein for variable times (2-8 h) indicated in the figure.
  • FIG 26 shows the effect of hetero-bifunctional TPR proteins targeting endogenous KRAS to the CMA (chaperone-mediated autophagy) pathway.
  • The TPR proteins contained a binding sequence for KRAS (either a grafted helix derived from son-of-sevenless-homolog 1 (SOS) or a non-helical peptide sequence (referred to as ‘KBL’) displayed in a loop of the RTPR) and were targeted for degradation using two different chaperone-mediated autophagy peptides (referred to as ‘CMA_Q’ or ‘CMA_K’) at the N- or C-terminus of the construct.
  • Figure 27 shows examples of variations in the linker sequence connecting a peptide ligand to an inter-repeat loop in order to optimise the binding affinity for the target.
  • The example shown is Nrf-TPR, a TPR protein designed to bind to Keap1 (see Fig. 7 of the original patent application).
  • Glycine residues were introduced into the linker to provide flexibility and increased spatial sampling. The introduction of this more flexible linker sequence was found to increase the binding affinity of the Nrf-TPR protein (labelled ‘Flexible’) when compared with the consensus-like linker sequence.
  • Altering the charge content of the linker sequence (labelled ‘Charged’) and altering the conformational properties of the loop (based on the predictions of the program CIDER (Holehouse et al. Biophys. J. 112, 16-21 (2017))) by changing the amino acid composition of the linker sequence (labelled ‘CIDER-optimised’) also affected the Keap1-binding affinity.
  • Figure 28 shows the results of a screen for beta-catenin degradation by bispecific CTPR constructs using the “Phospho” peptide ligand that binds to beta-catenin.
  • The structures and sequences of the CTPR constructs are shown in Tables 12 and 13.
  • Figure 29 shows the results of a screen for beta-catenin degradation by bispecific CTPR constructs using the “AXIN” peptide ligand that binds to beta-catenin.
  • The structures and sequences of the CTPR constructs are shown in Tables 12 and 13.
  • Figure 30 shows the results of a screen for beta-catenin degradation by bispecific CTPR constructs using the “BCL9” peptide ligand that binds to beta-catenin.
  • The structures and sequences of the CTPR constructs are shown in Tables 12 and 13.
  • Figure 31 shows the results of a screen for beta-catenin degradation by bispecific CTPR constructs using the “LRH1” peptide ligand that binds to beta-catenin.
  • The structures and sequences of the CTPR constructs are shown in Tables 12 and 13.
  • This invention relates to modular binding proteins that comprise multiple repeat domains. These repeat domains are linked to each other in the polypeptide chain by inter-repeat loops. One or more peptide ligands or binding domains are located in one or more of the inter-repeat loops and/or in N or C terminal helices of the modular binding protein.
  • the peptide ligands may be to the same or different target molecules and the modular binding protein may be multi-functional and/or multi-valent.
  • the geometrical display of the grafted binding sites may be precisely and predictably tuned by adjusting the positions of the binding sites and the number and shape of the repeat domains. Modular binding proteins as described herein may be useful in a range of therapeutic and diagnostic applications.
  • a repeat domain is a repetitive structural element of 30 to 100 amino acids that forms a defined secondary structure. Multiple repeat domains stack sequentially in a modular fashion to form a stable protein, which may for example have a solenoid or toroid structure. Repeat domains may be synthetic or may be naturally-occurring repeats from tandem repeat proteins, or variants thereof.
  • Examples of repeats include EGF repeats, cadherin repeats, leucine-rich repeats, HEAT repeats, ankyrin repeats, armadillo repeats, tetratricopeptide repeats etc.
  • a repeat domain may have the structure of a solenoid repeat.
  • the structures of solenoid repeats are well known in the art (see for example Kobe & Kajava Trends in Biochemical Sciences 2000;
  • A repeat domain may have an α/α or α/3₁₀ (helix-turn-helix or hth) structure, for example a tetratricopeptide repeat structure; an α/α/α (helix-turn-helix-turn-helix or hthth) structure, for example an armadillo repeat structure; a β/β/α/α structure; an α/β or 3₁₀/β structure, for example a leucine rich repeat (LRR) structure; a β/β/β structure, for example, an IGF1RL, HPR or PelC repeat structure; or a β/β structure, for example a serralysin or EGF repeat structure.
  • LRR: leucine-rich repeat
  • A “scaffold” refers to two or more repeat domains.
  • A “grafted scaffold” refers to a chimeric protein, which is a continuous polypeptide comprising a scaffold and a heterologous binding site (e.g., a peptide ligand).
  • The ankyrin repeat, one of the most widely existing protein motifs in nature, consists of 30-34 amino acid residues and exclusively functions to mediate protein-protein interactions, some of which are directly involved in the development of human cancer and other diseases.
  • Each ankyrin repeat exhibits a helix-turn-helix conformation, and strings of such tandem repeats are packed in a nearly linear array to form helix-turn-helix bundles with relatively flexible loops.
  • the global structure of an ankyrin repeat protein is mainly stabilized by intra- and inter-repeat hydrophobic and hydrogen bonding interactions.
  • The repetitive and elongated nature of ankyrin repeat proteins provides the molecular basis of their unique characteristics. Examples of ANK repeats suitable for use as described herein are shown in Table 14.
  • Consensus Sequences for ANK repeats include the following:
  • the armadillo (Arm) repeat is an approximately 40 amino acid long tandemly repeated sequence motif first identified in the Drosophila melanogaster segment polarity gene armadillo involved in signal transduction through wingless.
  • Animal Arm-repeat proteins function in various processes, including intracellular signaling and cytoskeletal regulation, and include such proteins as beta-catenin, the junctional plaque protein plakoglobin, the adenomatous polyposis coli (APC) tumour suppressor protein, and the nuclear transport factor importin-alpha, amongst others (PUBMED:9770300).
  • Consensus sequences for ARM repeats include the following:
  • Suitable repeat domains may include domains of the Ankyrin clan (Pfam: CL0465), such as ankyrin (PF00023), which may comprise a 30-34 amino-acid repeat composed of two beta strands and two alpha helices; domains of the leucine-rich repeat (LRR) clan (Pfam: CL0022), such as LRR1 (PF00560), which may comprise a 20-30 amino acid repeat composed of an α/β horseshoe fold
  • domains of the Pec Lyase-like (CL0268) clan such as pec lyase C (PF00544), which may comprise a right handed beta helix
  • domains of the beta-Roll (CL0592) clan such as Haemolysin-type calcium-binding repeat (PF00353), which may comprise short repeat units (e.g.
  • Suitable repeat domains may be identified using the PFAM database (see for example Finn et al Nucleic Acids Research (2016) Database Issue 44:D279-D285).
  • The repeat domain may have the structure of an α/α-solenoid repeat domain, such as a helix-turn-helix.
  • A helix-turn-helix domain comprises two antiparallel α-helices of 12-45 amino acids.
  • Suitable helix-turn-helix domains include tetratricopeptide-like repeat domains.
  • Tetratricopeptide-like repeats may include domains of the TPR clan (CL0020), for example Arm domains (see for example Armadillo; PF00514; Huber et al Cell 1997;90: 871-882), HEAT domains (Huntingtin, EF3, PP2A, TOR1; PF02985; see for example Groves et al. Cell. 96 (1): 99-110), PPR domains
  • A helix-turn-helix domain may be synthetic, for example DHR1 to DHR83 as disclosed in Brunette et al., Nature 2015 528 580-584.
  • The helix-turn-helix scaffold may be a tetratricopeptide repeat domain (TPR) (D’Andrea & Regan, 2003) or a variant thereof.
  • TPR repeat domains may include naturally occurring or synthetic TPR domains. Suitable TPR repeat domains are well known in the art (see for example Parmeggiani et al., J. Mol. Biol. 427 563-575) and may have the amino acid sequence:
  • AEAWYNLGNAYYKQGDYQKAIEYYQKALEL-X1X2X3X4 (SEQ ID NO: 13) wherein X1-4 are independently any amino acid, preferably X1 and X2 being D and P respectively, or may be a variant of this sequence.
  • Other TPR repeat domain sequences are shown in Tables 4-6 and 11 below.
  • TPR repeat consensus sequences include the following:
  • TPR repeat domain sequences are shown in Tables 4 (SEQ ID NOs 168-181 ), 6 (SEQ ID NOs
  • Preferred TPR domains may include CTPR, RTPRa, RTPRb and KTPRb domains, for example a domain having a sequence shown in Table 4 (SEQ ID NOs 170-183) or Table 6 (SEQ ID NOs: 223-246) or Table 11 (SEQ ID NOs: 816-1229) or a variant of a sequence shown in Table 4 (SEQ ID NOs 170-183) or Table 6 (SEQ ID NOs: 223-246) or Table 11 (SEQ ID NOs: 816-1229).
  • A TPR repeat domain may be a human TPR repeat domain, preferably a TPR repeat domain from a human protein in blood. TPR repeat domains from human blood may have reduced immunogenicity in vivo. Suitable human blood TPR repeat domains may include repeat domains from IFIT1, IFIT2 or IFIT3. Other examples of human blood repeat domains identified in the plasma proteome database are shown in Table 5 (SEQ ID NOs: 184-222).
  • Suitable human blood repeat domains may be identified from the plasma proteome database (Nanjappa et al Nucl Acids Res 2014 Jan;42(Database issue):D959-65), for example by searching for sequences with high sequence identity to the TPR repeat domain using standard sequence analysis tools (e.g. Altschul et al Nucleic Acids Res. 25:3389-3402; Altschul et al FEBS J. 272:5101-5109).
  • the two or more repeat domains of a modular binding protein described herein may comprise a small, glutamine-rich, tetratricopeptide repeat protein alpha (SGTA) domain.
  • The SGTA domain is involved in protein quality control pathways and comprises three TPR repeat domains and a C-terminal capping helix.
  • The SGTA domain binds to the Rpn13 substrate receptor of the proteasome.
  • A modular binding protein comprising the SGTA domain may bind to proteasome receptors through the SGTA domain.
  • One or more peptide ligands for a target molecule grafted into the SGTA domain of a modular binding protein as described herein may allow the target molecule to be delivered directly to the proteasome for degradation by the modular binding protein.
  • A suitable SGTA domain may have the sequence:
  • A suitable SGTA domain may be encoded by the nucleotide sequence:
  • a variant of a reference repeat domain or peptide ligand sequence set out herein may comprise an amino acid sequence having at least 50%, at least 60%, at least 70%, at least 80%, at least 90%, at least 95%, or at least 98% sequence identity to the reference sequence.
  • Particular amino acid sequence variants may differ from a repeat domain shown above by insertion, addition, substitution or deletion of 1 amino acid, 2, 3, 4, 5, 6, 7, 8, 9, or 10 or more than 10 amino acids.
  • A TPR repeat domain may comprise one or more conserved residues, for example, 1, 2, 3, 4, 5, 6 or more preferably all of Leu at position 7, Gly or Ala at position 8, Tyr at position 11, Ala at position 20, Ala at position 27, Leu or Ile at positions 28 and 30 and Pro at position 32.
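The conserved positions listed above can be checked mechanically. The following is a minimal sketch, assuming 1-based position numbering within a single repeat and using the 30-residue sequence given for SEQ ID NO: 13 followed by the preferred D and P at X1 and X2. The names `CONSERVED` and `conserved_matches` are illustrative and do not come from the application.

```python
# Conserved TPR positions (1-based) and the residues allowed at each,
# taken from the list above. Names here are hypothetical helpers.
CONSERVED = {
    7: "L",
    8: "GA",
    11: "Y",
    20: "A",
    27: "A",
    28: "LI",
    30: "LI",
    32: "P",
}


def conserved_matches(repeat: str) -> dict[int, bool]:
    """For each conserved position, report whether the repeat has an allowed residue."""
    return {
        pos: len(repeat) >= pos and repeat[pos - 1] in allowed
        for pos, allowed in CONSERVED.items()
    }


# SEQ ID NO: 13 core plus the preferred X1 = D and X2 = P (X3 and X4 omitted).
ctpr = "AEAWYNLGNAYYKQGDYQKAIEYYQKALEL" + "DP"
result = conserved_matches(ctpr)
print(sum(result.values()), "of", len(result), "conserved positions match")
```

A variant repeat could be screened the same way to see how many of the conserved positions it retains.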
  • Use of GAP may be preferred but other algorithms may be used, e.g. BLAST (which uses the method of Altschul et al. (1990) J. Mol. Biol. 215: 405-410), FASTA (which uses the method of Pearson and Lipman (1988) PNAS USA 85: 2444-2448), or the Smith-Waterman algorithm (Smith and Waterman (1981) J. Mol Biol.
  • the default parameters e.g. for gap penalty and extension penalty, are preferably used.
  • preferred examples of algorithms that are suitable for determining percent sequence identity and sequence similarity are the BLAST and BLAST 2.0 algorithms, which are described in Altschul et al., J. Mol. Biol. 215:403-410 (1990) and Altschul et al., Nuc. Acids Res. 25:3389-3402 (1997), respectively.
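The percent identity thresholds above can be illustrated once an alignment is available. The sketch below is an assumption-laden illustration rather than the GAP or BLAST procedure: it takes two sequences that have already been aligned (with "-" as the gap character) and divides the number of identical columns by the number of aligned columns. Different tools use different denominators, so the convention here is only one possible choice, and the function name is hypothetical.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent identity between two pre-aligned sequences ('-' marks a gap)."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    # Ignore columns where both sequences carry a gap.
    columns = [(a, b) for a, b in zip(aligned_a, aligned_b)
               if not (a == "-" and b == "-")]
    if not columns:
        return 0.0
    # A column counts as identical only if both residues match and neither is a gap.
    matches = sum(1 for a, b in columns if a == b and a != "-")
    return 100.0 * matches / len(columns)


# The 30-residue cores of SEQ ID NO: 20 and SEQ ID NO: 21 align without gaps.
core_20 = "AEAWYNLGNAYYRQGDYQRAIEYYQRALEL"
core_21 = "AEALNNLGNVYREQGDYQRAIEYYQRALEL"
print(round(percent_identity(core_20, core_21), 1))
```

In practice the alignment itself would come from a tool such as GAP, BLAST, FASTA or a Smith-Waterman implementation run with default parameters, as the text recommends.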
  • a repeat domain may comprise one or more point mutations to facilitate grafting of hydrophobic peptide ligands.
  • aromatic residues in the repeat domain may be substituted with polar or charged residues. Suitable substitutions may be identified in a rational manner, for example using Hidden Markov plots of repeat domain sequences to identify non-aromatic residues that are found in nature in consensus aromatic positions.
  • a suitable TPR repeat domain for grafting a hydrophobic peptide ligand may have the amino acid sequence:
  • AEAWYNLGNAYYRQGDYQRAIEYYQRALEL-X1X2X3X4 (SEQ ID NO: 20), wherein X1-4 are independently any amino acid, preferably X1 and X2 being D and P respectively.
  • lysine residues in the repeat domain may be replaced by arginine residues to prevent ubiquitination and subsequent degradation.
  • the modular binding protein comprises an E3 ubiquitin ligase-peptide ligand, for example in a proteolysis targeting chimera (PROTAC).
  • a suitable TPR repeat domain may have the amino acid sequence:
  • AEALNNLGNVYREQGDYQRAIEYYQRALEL-X1X2X3X4 (SEQ ID NO: 21), wherein X1-4 are independently any amino acid, preferably X1 and X2 being D and P respectively.
  • a modular binding protein may comprise a scaffold with the amino acid sequence of residues 1 to 171 of SEQ ID NO: 1230 (PPX172 of Table 13 without the HA Tags).
  • a target peptide ligand and the E3 ligase peptide ligand may be grafted into any two of the N terminal helical grafting site 1 (corresponding to residues 1 to 2 of SEQ ID NO: 1230; before first CTPR repeat); loop grafting site 2 (corresponding to residues 35 to 36 of SEQ ID NO: 1230; between first and second CTPR repeat); loop grafting site 3 (corresponding to residues 69 to 70 of SEQ ID NO: 1230; between second and third CTPR repeat); loop grafting site 4 (corresponding to residues 103 to 104 of SEQ ID NO: 1230; between third and fourth CTPR repeat); loop grafting site 5 (corresponding to residues 137 to 138 of SEQ ID NO: 1230; between fourth and fifth CTPR repeat) and loop grafting site 6 (corresponding to residues 167 to 168
  • the modular binding protein may comprise 2, 3, 4, 5, 6, 7, 8, 9, 10 or more than 10 repeat domains.
  • the modular binding protein comprises 2 to 5 repeat domains.
  • Modular binding proteins with fewer repeat domains may display increased cell penetration.
  • a modular binding protein with 2-3 repeat domains may be useful in binding intracellular target molecules.
  • Modular binding proteins with more repeat domains may display increased stability and functionality.
  • a modular binding protein with 4 or more repeat domains may be useful in binding extracellular target molecules.
  • a modular binding protein with 6 or more repeat domains may be useful in producing long linear molecules for targeting or assembling extracellular complexes in bi- or multivalent formats.
  • a modular binding protein may comprise:
  • the repeat domains of a modular binding protein may lack binding activity i.e. the binding activity of the modular binding protein is mediated by the peptide ligands and not by residues within the repeat domains.
  • a peptide ligand is a contiguous amino acid sequence that specifically binds to a target molecule.
  • Suitable peptide ligands that are capable of grafting onto a terminal helix or inter-repeat loop are well-known in the art and include peptide sequences selected from a library, antigen epitopes, natural protein-protein interactions (helical, extended or turn-like) and short linear motifs (SLiMs). Viral SLiMs (that hijack the host machinery) may be particularly useful because they may display high binding affinities (Davey et al (2011) Trends Biochem. Sci. 36, 159-169).
  • a suitable peptide ligand for a target molecule may be selected from a library, for example using phage or ribosome display, or identified or designed using rational approaches or computational design, for example using the crystal structure of a complex or an interaction.
  • peptide ligands may be identified in an amino acid sequence using standard sequence analysis tools (e.g. Davey et al Nucleic Acids Res. 2011 Jul 1; 39(Web Server issue): W56-W60).
  • Peptide ligands may be 5 to 25 amino acids in length, preferably 8 to 15 amino acids, although in some embodiments, longer peptide ligands may be employed.
  • the peptide ligands and the repeat domains of the modular binding protein are heterologous i.e. the peptide ligand is not associated with the repeat domain in naturally occurring proteins and the binding and repeat domains are artificially associated in the modular binding protein by recombinant means.
  • a modular binding protein described herein may comprise 1 to n+1 peptide ligands, where n is the number of repeat domains in the modular binding protein.
  • the number of peptide ligands is determined by the required functionality and valency of the modular binding protein.
  • one peptide ligand may be suitable for a mono-functional modular binding protein and two or more peptide ligands may be suitable for a bi-functional or multi-functional modular binding protein.
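One way to read the upper bound of n+1 peptide ligands is as a count of available grafting sites: one at each terminus plus one in each loop between adjacent repeats. The sketch below enumerates those sites under that assumption; the interpretation, the function name and the site labels are illustrative and are not stated in this form in the text.

```python
def grafting_sites(n_repeats):
    """List candidate grafting sites for a protein with n_repeats repeat domains.

    Assumes one site per terminus and one per inter-repeat loop,
    which gives n_repeats + 1 sites in total.
    """
    if n_repeats < 2:
        raise ValueError("a modular binding protein has two or more repeat domains")
    sites = ["N-terminus"]
    # One loop between each pair of adjacent repeats: n_repeats - 1 loops.
    sites += [f"loop {i}-{i + 1}" for i in range(1, n_repeats)]
    sites.append("C-terminus")
    return sites


print(grafting_sites(3))
```

Under this reading, a mono-functional protein fills one of these sites and a bi-functional or multi-functional protein fills two or more.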
  • Modular binding proteins may be monovalent.
  • a target molecule may be bound by a single peptide ligand in a monovalent modular binding protein.
  • Modular binding proteins may be multivalent.
  • a target molecule may be bound by two or more of the same or different peptide ligands in a multivalent modular binding protein.
  • Modular binding proteins may be monospecific.
  • the peptide ligands in a monospecific modular binding protein may all bind to the same target molecule, more preferably the same site or epitope of the target molecule.
  • Modular binding proteins may be multi-specific.
  • the peptide ligands in a multi-specific modular binding protein may bind to different target molecules.
  • a bi-specific modular binding protein may comprise one or more peptide ligands that bind to a first target molecule and one or more peptide ligands that bind to a second target molecule and a tri-specific modular binding protein may comprise one or more peptide ligands that bind to a first target molecule, one or more peptide ligands that bind to a second target molecule and one or more peptide ligands that bind to a third target molecule.
  • a bi-specific modular binding protein may bind to the two different target molecules concurrently.
  • This may be useful in bringing the first and second target molecules into close proximity.
  • concurrent binding of the target molecules to the modular binding protein may bring the cells into close proximity, for example to promote or enhance the interaction of the cells.
  • a modular binding protein which binds to a tumour specific antigen and a T cell antigen, such as CD3, may be useful in bringing T cells into proximity to tumour cells.
  • where the target molecules are from different biological pathways, this may be useful in achieving synergistic effects and also for minimising resistance.
  • a tri-specific modular binding protein may bind to three different target molecules concurrently.
  • one of the target molecules may be an E3 ubiquitin ligase.
  • a tri-specific modular binding protein may bind to a first target molecule from a first biological pathway and a second target molecule from a second biological pathway as well as an E3 ubiquitin ligase. This may be useful in achieving synergistic effects and also for minimising resistance.
  • a peptide ligand may be located in an inter-repeat loop of the modular binding protein.
  • An inter-repeat peptide ligand may comprise 5 to 25 amino acid residues, preferably 8 to 15 amino acids. However, since there is no intrinsic restriction on the size of the inter-loop peptide ligand, longer sequences of more than 25 amino acid residues may be used in some embodiments.
  • an unstructured peptide ligand may be inserted into an inter-repeat loop.
  • One or more, two or more, three or more, four or more or five or more of the inter-repeat loops in the modular binding protein may comprise peptide ligands.
  • the peptide ligands may be located on consecutive inter-repeat loops or may have a different distribution in the inter-repeat loops of the modular binding protein.
  • inter-repeat loops comprising a peptide ligand may be separated in the modular protein by one or more, two or more, three or more or four or more inter-repeat loops which lack a peptide ligand.
  • a peptide ligand may be connected to an inter-repeat loop directly or via one or more additional residues or linkers. Additional residues or linkers may be useful for example when a peptide ligand requires conformational flexibility in order to bind to a target molecule, or when the amino acid residues that are adjacent to the minimal peptide ligand favourably influence the micro-environment of the binding interface.
  • Additional residues or linkers may be positioned at the N terminus of the peptide ligand, the C terminus of the peptide ligand, or both.
  • the sequence of an inter-repeat loop containing a peptide ligand may be [X1-i]-[X1-n]-[X1-z] (SEQ ID NO: 22), where each residue denoted by X is independently any amino acid and may be the same amino acid or a different amino acid to any other residue that is also denoted by X, [X1-n] is the peptide ligand, n is 1 to 100, [X1-i] is a linker and i is independently any number between 1 to 10.
  • D may be preferred at the first position of the linker [X1-i], P may be preferred at the second position of linker [X1-i], D may be preferred at the last position of the linker [X1-z] and/or P may be preferred at the penultimate position of linker [X1-z].
  • Examples of preferred inter-repeat loop sequences may include DP-[X1-n]-PX; DPXX-[X1-n]-XXPX (SEQ ID NO: 23); DPXX-[X1-n]-XPXX (SEQ ID NO: 24); DPXX-[X1-n]-PXX (SEQ ID NO: 25); PXX-[X1-i]-[X1-n]-[X1-i]-XXPX (SEQ ID NO: 26), DPXX-[X1-i]-[X1-n]-[X1-i]-XPXX (SEQ ID NO: 27), DPXX-[X1-i]-[X1-n]-[X1-i]-PXX (SEQ ID NO: 28), DPXX-[X1-i]-[X1-n]-XPXX (SEQ ID NO: 29),
  • the residues or linkers used to connect a peptide ligand to an inter-repeat loop depend on the peptide ligand and may be readily determined for any peptide ligand of interest using standard techniques.
  • small, non-hydrophobic amino acids such as glycine
  • proline residues may be used to increase rigidity, for example, when the peptide ligands are short.
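The loop templates can be checked mechanically. The following sketch, offered only as an illustration, turns a pair of flanking templates such as DPXX and XXPX (SEQ ID NO: 23) into a regular expression in which X matches any residue, then extracts the grafted ligand from a candidate loop. The function name is hypothetical, and the linker residues in the example loop ("GS" and "N") are arbitrary choices; the ligand is the tankyrase-binding sequence REAGDGEE (SEQ ID NO: 46).

```python
import re


def loop_regex(n_flank, c_flank, max_ligand=100):
    """Compile a regex for <n_flank>-[ligand of 1..max_ligand residues]-<c_flank>.

    'X' in a flank matches any single residue; other letters match literally.
    """
    def to_re(template):
        return "".join("[A-Z]" if ch == "X" else ch for ch in template)

    return re.compile(
        f"^{to_re(n_flank)}(?P<ligand>[A-Z]{{1,{max_ligand}}}){to_re(c_flank)}$"
    )


# Template DPXX-[X1-n]-XXPX; "GS" and "N" are illustrative linker residues.
pattern = loop_regex("DPXX", "XXPX")
match = pattern.match("DPGS" + "REAGDGEE" + "GSPN")
print(match.group("ligand") if match else None)
```

Because both flanks have fixed length, the ligand boundaries are unambiguous for any loop that matches the template.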
  • an inter-repeat peptide ligand may be non-hydrophobic.
  • at least 40% of the amino acids in the peptide ligand may be charged (e.g. D, E, R or K) or polar (e.g. Q, N, H, T, Y, C or W).
  • the repeat domains may be modified to accommodate a hydrophobic peptide ligand, for example by replacing aromatic residues with charged or polar residues.
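The non-hydrophobicity criterion can be expressed as a simple fraction. This is a minimal sketch using exactly the residue classes named in the text (charged: D, E, R, K; polar: Q, N, H, T, Y, C, W); the function names are hypothetical, and the threshold is passed in so that the same check can be applied with a different cut-off.

```python
CHARGED = set("DERK")
POLAR = set("QNHTYCW")


def charged_or_polar_fraction(peptide):
    """Fraction of residues that are charged or polar, per the classes in the text."""
    if not peptide:
        raise ValueError("peptide must not be empty")
    hits = sum(1 for residue in peptide.upper() if residue in CHARGED or residue in POLAR)
    return hits / len(peptide)


def is_non_hydrophobic(peptide, threshold=0.4):
    """True if at least `threshold` of the residues are charged or polar."""
    return charged_or_polar_fraction(peptide) >= threshold


# Tankyrase-binding sequence from Axin (SEQ ID NO: 46).
print(is_non_hydrophobic("REAGDGEE"))
```

A ligand that fails the check would be a candidate for grafting into repeat domains modified to accommodate hydrophobic sequences.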
  • a peptide ligand may be located at one or both termini of the modular binding protein.
  • a peptide ligand located at the N or C terminus may comprise an α-helical structure and may comprise all or part of a half-repeat (i.e. all or part of a single α-helix) that stacks against the adjacent repeat domain.
  • the α-helix of the terminal peptide ligand makes stabilising interactions with the adjacent repeat domain and is stable and folded. Only a few of the positions that structurally define an α-helix are required for the correct interfacial interaction with the adjacent repeat domain.
  • a helical peptide ligand may be located at the N terminus of the protein.
  • the N terminal peptide ligand may be helical and may comprise all or part of the sequence Xn-(X)15-X1X2XX (SEQ ID NO: 35), preferably all or part of the sequence Xn-XYXXXIXXYXXXLXX-X1X2XX (SEQ ID NO: 36), where each residue denoted by X is independently any amino acid and may be the same amino acid or a different amino acid to any other residue in the sequence that is also denoted by X, X1 is independently any amino acid, preferably D, and X2 is independently any amino acid, preferably P, and n is 0 or any number.
  • the Y, I, and/or L residues in the N terminal peptide ligand may be substituted with an amino acid residue with similar properties (i.e. a conservative substitution).
  • a helical peptide ligand may be located at the C terminus of the protein.
  • the C terminal peptide ligand may be helical and may comprise all or part of the sequence Xn-(X)15-X1X2XX (SEQ ID NO: 35), preferably all or part of the sequence X1X2XX-XXAXXXLXX[A or V]XXXXX-Xn (SEQ ID NOs: 37 and 38), where X is independently any amino acid and may be the same amino acid or a different amino acid to any other residue in the sequence that is also denoted by X, X1 is independently any amino acid, preferably D, and X2 is independently any amino acid, preferably P, and n is 0 or any number.
  • the A, L and/or V residues in the C terminal peptide ligand may be substituted with an amino acid residue with similar properties (i.e. a conservative substitution).
  • the minimum length of the terminal peptide ligand is determined by the number of residues required to form a helix that binds to the target molecule. There is no intrinsic maximum length of the terminal peptide ligand and n may be any number.
  • a peptide ligand located at the N or C terminus may comprise a non-helical structure.
  • a peptide ligand that is an obligate N- or C- terminal domain (for example because the terminal amino or carboxylate group mediates the binding interaction) may be located at the beginning or end of the one or more repeat domains.
  • one or more positions in a peptide ligand may be diverse or randomised.
  • a modular binding protein comprising one or more diverse or randomised residues may form a library as described below.
  • the N and C terminal peptide ligands may be non-hydrophobic.
  • at least 20% of the amino acids in the peptide ligand may be charged (e.g. D, E, R or K) or polar (e.g. Q, N, H, T, Y, C or W).
  • the helix turn helix scaffold of the repeat domains may be modified, for example by replacing aromatic residues with charged or polar residues in order to accommodate a hydrophobic peptide ligand.
  • a modular binding protein as described herein may comprise peptide ligands in any arrangement or combination.
  • peptide ligands may be located at both the N and C terminus and optionally one or more inter-repeat loops of a modular binding protein; at the N terminus and optionally one or more loops of a modular binding protein; at the C terminus and optionally one or more loops of a modular binding protein; or in one or more inter-repeat loops of a modular binding protein.
  • the location of the peptide ligands within a modular binding protein may be determined by rational design, for example using modelling to identify the optimal arrangement for the presentation of two target molecules to each other (e.g. for substrate presentation to an E3 ubiquitin ligase); and/or by screening for example using populations of modular binding proteins with different arrangements of peptide ligands to identify the arrangement which confers the optimal interaction of target molecules.
  • Suitable target molecules for modular binding proteins described herein include biological macromolecules, such as proteins.
  • the target molecule may be a receptor, enzyme, antigen, oligosaccharide, oligonucleotide, integral membrane protein, transcription factor, transcriptional regulator, G protein coupled receptor (GPCR) or any other target of interest.
  • Proteins that are difficult to target with small molecules, such as PPIs, proteins that accumulate in neurodegenerative diseases and proteins overexpressed in disease conditions, such as cancer may be particularly suitable target molecules.
  • Target molecules may include α-synuclein; β-amyloid; tau; superoxide dismutase;
  • telomeres sarcoma-friend leukemia integration 1
  • TEL-AML1 T-cell acute lymphocytic leukemia protein 1
  • Sox2 ((sex determining region Y)-box 2)
  • tankyrases; phosphatases such as PP2A
  • epigenetic writers, readers and erasers such as histone deacetylases and histone methyltransferases
  • BRD4 and other bromodomain proteins; kinases, such as PLK1 (polo-like kinase 1), c-ABL (Abelson murine leukemia viral oncogene homolog 1) and BCR (breakpoint cluster region)-ABL.
  • a modular binding protein may neutralise a biological activity of the target molecule, for example by inhibiting or antagonising its activity or binding to another molecule or by tagging it for ubiquitination and proteasomal degradation or for degradation via autophagy.
  • a modular binding protein may activate a biological activity of the target molecule.
  • the target molecule may be β-catenin.
  • Suitable peptide ligands that specifically bind to β-catenin are well-known in the art and include β-catenin-peptide ligands derived from axin (e.g. GAYPEYILDIHVYRVQLEL (SEQ ID NO: 39) and variants thereof), Bcl-9 (e.g. SQEQLEHRYRSLITLYDIQLML (SEQ ID NO: 40) and variants thereof), TCF7L2, ICAT (e.g. YAYQRAIVEYMLRLMS (SEQ ID NO: 42) and variants thereof), LRH-1 (e.g. YEQAIAAYLDALMC (SEQ ID NO: 43) and variants thereof) or APC (e.g. SCSEELEALEALELDE (SEQ ID NO: 44) and variants thereof).
  • the target molecule may be KRAS.
  • Suitable peptide ligands that specifically bind to KRAS are well-known in the art and include a KRAS-peptide ligand from SOS-1 (e.g. FEGIALTNYLKALEG (SEQ ID NO: 45) and variants thereof) and KRAS-peptide ligands identified by phage display (see for example Sakamoto et al. Biochem. Biophys. Res. Comm. (2017) 484, 605-611).
  • the target molecule may be tankyrase.
  • Suitable peptide ligands that specifically bind to tankyrase are well-known in the art and include tankyrase peptide ligands from Axin (e.g. REAGDGEE (SEQ ID NO: 46) and HLQREAGDGEEFRS (SEQ ID NO: 47) or variants thereof).
  • the target molecule may be EWS-FLI1.
  • Suitable peptide ligands that specifically bind to EWS-FLI1 are well-known in the art and include the ESAP1 peptide TMRGKKKRTRAN (SEQ ID NO: 48) and variants thereof.
  • Other suitable sequences may be identified by phage display (see for example Erkizan et al. Cell Cycle (2011) 10, 3397-408).
  • the target molecule may be Aurora-A.
  • Suitable peptide ligands that specifically bind to Aurora-A are well-known in the art and include Aurora-A binding sequences from TPX2, such as SYSYDAPSDFINFSS (SEQ ID NO: 49) (Bayliss et al. Mol. Cell (2003) 12, 851-62) and Aurora-A binding sequences from N-myc, such as N-myc residues 19-47 or 61-89 (see for example Richards et al. PNAS (2016) 113, 13726-31).
  • the target molecule may be N-Myc or C-Myc.
  • Suitable peptide ligands that specifically bind to N-myc or C-myc are well-known in the art and include helical binding sequences from Aurora-A (see for example Richards et al. PNAS (2016) 113, 13726-31).
  • the target molecule may be WDR5 (WD repeat-containing protein 5).
  • Suitable peptide ligands that specifically bind to WDR5 are well-known in the art and include the WDR5-interacting motif (WIN) of MLL1 (mixed lineage leukemia protein 1) (see for example Song & Kingston J. Biol. Chem. (2008) 283, 35258-64; Patel et al. J. Biol. Chem. (2008) 283, 32158-61), e.g. EPPLNPHGSARAEVHLRKS (SEQ ID NO: 50) and variants thereof.
  • the target molecule may be BRD4 or a Bromodomain protein.
  • Suitable peptide ligands that specifically bind to BRD4 are well-known in the art and include sequences derived from histone protein ligands.
  • the target molecule may be an HDAC (histone deacetylase).
  • Suitable peptide ligands that specifically bind to HDAC are well-known in the art and include binding sequences derived from SMRT and other proteins that recruit HDACs to specific transcriptional regulatory complexes or binding sequences derived from histone proteins (see for example Watson et al. Nat. Comm. (2016) 7, 11262; Dowling et al. Biochem. (2008) 47, 13554-63).
  • the target molecule may be Notch.
  • Suitable peptide ligands that specifically bind to Notch are well-known in the art and include binding sequences from the N-terminus of MAML1 (mastermind like protein 1), e.g. SAVMERLRRRIELCRRHHST (SEQ ID NO: 51) and variants thereof (see for example Moellering et al. Nature (2009) 462, 182-8).
  • the target molecule may be a Cdk (cyclin-dependent kinase).
  • Suitable peptide ligands that specifically bind to Cdks are well-known in the art and include substrate-based peptides, for example, Cdk2 sequences derived from cyclin A, such as TYTKKQVLRMEHLVLKVLTFDL (SEQ ID NO: 52) and variants thereof (see for example Gondeau et al. J. Biol. Chem. (2005) 280, 13793-800; Mendoza et al. Cancer Res. (2003) 63, 1020-4).
  • the target molecule may be PLK1 (polo-like kinase 1).
  • Suitable peptide ligands that specifically bind to PLK1 are well-known in the art and include optimised substrate-derived sequences that bind to the substrate-binding PBD (polo-box domain), such as MAGPMQSEPLMGAKK (SEQ ID NO: 53) and variants thereof.
  • the target molecule may be Tau.
  • Suitable peptide ligands that specifically bind to Tau are well-known in the art and include tau-binding sequences derived from alpha- and beta-tubulin, such as KDYEEVGVDSVE (SEQ ID NO: 54) and YQQYQDATADEQG (SEQ ID NO: 55) and variants thereof (see for example Maccioni et al. EMBO J. (1988) 7, 1957-63; Rivas et al. PNAS (1988) 85, 6092-6).
  • the target molecule may be BCR-ABL.
  • Suitable peptide ligands that specifically bind to BCR-ABL are well-known in the art and include optimized substrate-derived sequences, such as EAIYAAPFAKKK (SEQ ID NO: 56) and variants thereof.
  • the target molecule may be PP2A (protein phosphatase 2A).
  • Suitable peptide ligands that specifically bind to PP2A are well-known in the art and include sequences that bind the B56 regulatory subunit, such as LQTIQEEE (SEQ ID NO: 57) and variants thereof (see for example Hertz et al. Mol. Cell (2016) 63, 686-95).
  • the target molecule may be EED (Embryonic ectoderm development).
  • Suitable peptide ligands that specifically bind to EED are well-known in the art and include helical binding sequences from co-factor EZH2 (enhancer of zeste homolog 2), such as FSSNRQKILERTEILNQEWKQRRIQPV (SEQ ID NO: 58) and variants thereof (see for example Kim et al. Nat. Chem. Biol. (2013) 9, 643-50).
  • the target molecule may be MCL-1 (induced myeloid leukemia cell differentiation protein).
  • Suitable peptide ligands that specifically bind to MCL-1 are well-known in the art and include sequences from BCL2, e.g. KALETLRRVGDGVQRNHETAF (SEQ ID NO: 59) and variants thereof (see for example Stewart et al. Nat. Chem. Biol. (2010) 6, 595-601).
  • the target molecule may be RAS.
  • RAS peptide ligands are well-known in the art and include RAS-binding peptides identified by phage display, such as RRRRCPLYISYDPVCRRRR (SEQ ID NO: 60) and variants thereof (see for example Sakamoto et al. BBRC (2017) 484, 605-11).
  • the target molecule may be GSK3 (glycogen synthase kinase 3).
  • GSK3 peptide ligands are well-known in the art and include substrate-competitive binding sequences such as KEAPPAPPQDP (SEQ ID NO: 61), LSRRPDYR (SEQ ID NO: 62), RREGGMSRPADVDG (SEQ ID NO: 63), and YRRAAVPPSPSLSRHSSPSQDEDEEE (SEQ ID NO: 64) and variants thereof (see for example Ilouz et al. J. Biol. Chem. 281 (2006), 30621-30630. Plotkin et al. J. Pharmacol.
  • the target molecule may be CtBP (C-terminal binding protein).
  • CtBP peptide ligands are well-known in the art and include sequences identified from a cyclic peptide library screen, such as SGWTWRMY (SEQ ID NO: 65) and variants thereof (see for example Birts et al. Chem. Sci. (2013) 4, 3046-57).
  • Suitable peptide ligands for target molecules that may be used in a modular binding protein as described herein are shown in Tables 2 and 7.
  • Preferred peptide ligands are shown in Table 8 (SEQ ID NOs: 277-388).
  • a modular binding protein as described herein may comprise a peptide ligand for an E3 ubiquitin ligase.
  • E3 ubiquitin ligases include MDM2, SCFSkp2, BTB-CUL3-RBX1, APC/C, SIAH, CHIP, Cul4-DDB1, SCF-family, β-TrCP, Fbw7 and Fbx4.
  • Suitable peptide ligands for E3 ubiquitin ligases are well known in the art and may be 5 to 20 amino acids.
  • a suitable peptide ligand for MDM2 may include a peptide ligand from p53 (e.g. FAAYWNLLSAYG (SEQ ID NO: 66)) or a variant thereof.
  • a suitable peptide ligand for SCFSkp2 may include a peptide ligand from p27 (e.g. AGSNEQEPKKRS (SEQ ID NO: 67)) and variants thereof.
  • a suitable peptide ligand for Keap1-Cul3 may include a peptide ligand from Nrf2 (e.g.
  • a suitable peptide ligand for SPOP-Cul3 may include a peptide ligand from Puc (e.g. LACDEVTSTTSSSTA (SEQ ID NO: 69)) or a variant thereof.
  • a suitable peptide ligand for APC/C may include the degrons termed ABBA (e.g. SLSSAFHVFEDGNKEN (SEQ ID NO: 70)), KEN (e.g. SEDKENVPP (SEQ ID NO: 71)) and DBOX (e.g. PRLPLGDVSNN (SEQ ID NO: 72)).
  • a suitable peptide ligand for SIAH may include a peptide ligand from PHYL (e.g.
  • a suitable peptide ligand for CHIP (carboxyl terminus of Hsc70-interacting protein) may include peptide sequences such as ASRMEEVD (SEQ ID NO: 74) (from Hsp90 C-terminus) and GPTIEEVD (SEQ ID NO: 75) (from Hsp70 C-terminus) or a variant thereof.
  • a suitable peptide ligand for beta-TrCP may include a degron sequence motif (including phosphomimetic amino acids), such as DDGYFD (SEQ ID NO: 76) or a variant thereof.
  • a suitable peptide ligand for Fbx4 may include sequences derived from TRF1, such as MPIFWKAHRMSKMGTG (SEQ ID NO: 77) or a variant thereof (see for example Lee et al.
  • a suitable peptide ligand for Fbw7 may include degron sequence motifs (including phosphomimetic amino acids), such as LPSGLLEPPQD (SEQ ID NO: 78).
  • a suitable peptide ligand for DDB1-Cul4 may include sequences derived from HBx (hepatitis B virus X protein) and similar proteins from other viruses and from DCAFs (DDB1-CUL4-associated factors) including helical motifs such as ILPKVLHKRTLGL (SEQ ID NO: 79), NFVSWHANRQLGM (SEQ ID NO: 80), NTVEYFTSQQVTG (SEQ ID NO: 81), and NITRDLIRRQIKE (SEQ ID NO: 82) (see for example Li et al. Nat. Struct. Mol. Biol. (2010) 17, 105-111).
  • Suitable peptide ligands for E3 ubiquitin ligases that may be used in a modular binding protein as described herein are shown in Table 3.
  • Preferred peptide ligands for E3 ubiquitin ligases are shown in Table 9 (SEQ ID NOs: 470-617).
  • a modular binding protein comprising a peptide ligand for an E3 ubiquitin ligase may also comprise a peptide ligand for a target molecule. Binding of the modular binding protein to both the target molecule and the E3 ubiquitin ligase may cause the target molecule to be ubiquitinated by the E3 ubiquitin ligase. Ubiquitinated target molecules are then degraded by the proteasome. This allows the specific targeting of molecules for proteolysis by the modular binding protein.
  • the ubiquitination and subsequent degradation of a target protein has been shown for hetero-bifunctional small molecules (PROTACs; proteolysis targeting chimeras) that bind the target protein and a ubiquitin ligase simultaneously (see for example Bondeson et al. Nat. Chem. Biol. 2015; Deshaies 2015; Lu et al. 2015).
  • the modular binding protein may lack lysine residues, so that it avoids ubiquitination by the E3 ubiquitin ligase.
  • Examples of modular binding proteins that bind E3 ubiquitin ligase and a target molecule are shown in Tables 1 and 8.
  • a suitable modular binding protein may comprise an N terminal peptide ligand that binds a target protein, such as β-catenin, and a C terminal peptide ligand that binds an E3 ubiquitin ligase.
  • the N terminal peptide ligand may be a β-catenin-binding sequence derived from Bcl9 and the C terminal peptide ligand may be an Mdm2-binding sequence derived from p53.
  • a modular binding protein may comprise a C terminal peptide ligand that binds a target protein, such as β-catenin, and an N terminal peptide ligand that binds an E3 ubiquitin ligase (see figure 10A).
  • Another suitable modular binding protein may comprise three repeat domains, a peptide ligand located in an inter-repeat loop that binds a target protein, such as β-catenin, and a C terminal peptide ligand that binds an E3 ubiquitin ligase.
  • the inter-repeat loop peptide ligand may be derived from the phosphorylated region of APC (adenomatous polyposis coli) and the C terminal peptide ligand may be an Mdm2-binding sequence derived from p53.
  • the modular binding protein may comprise a peptide ligand located in an inter-repeat loop that binds an E3 ubiquitin ligase, and a C terminal peptide ligand that binds a target protein, such as β-catenin (see figure 10B).
  • Another suitable modular binding protein may comprise three repeat domains, an N terminal peptide ligand that binds a target protein, such as β-catenin, and a peptide ligand located in an inter-module loop that binds an E3 ubiquitin ligase.
  • the N terminal peptide ligand may be a β-catenin-binding sequence derived from LRH1 (liver receptor homolog 1) and the inter-module loop peptide ligand may be a sequence derived from the Skp2-targeting region of p27.
  • the modular binding protein may comprise an N terminal peptide ligand that binds an E3 ubiquitin ligase and a peptide ligand located in an inter-module loop that binds a target protein, such as β-catenin (see figure 10C).
  • Another suitable modular binding protein may comprise four repeat domains, a first peptide ligand located in an inter-repeat loop that binds an E3 ubiquitin ligase and a second peptide ligand located in an inter-repeat loop that binds a target molecule.
  • the first and second inter-repeat loops may be separated by an inter-repeat loop lacking a peptide ligand.
  • the first peptide ligand may be located in the first inter-repeat loop from the N terminus and the second peptide ligand may be located in the third inter-repeat loop from the N terminus or vice versa.
  • a modular binding protein as described herein may comprise an amino acid sequence shown in Table 10 (SEQ ID NOs: 766-815) or a variant thereof.
  • a modular binding protein as described herein may comprise a peptide ligand that binds to a component of a target-selective autophagy pathway, such as chaperone-mediated autophagy (CMA).
  • the modular binding protein and target molecules bound thereto are thus recognised by the autophagy pathway and the target molecules are subsequently degraded.
  • Suitable components of the CMA pathway include heat shock cognate protein of 70 kDa (hsc70, HSPA8, Gene ID: 3312).
  • Suitable peptide ligands are well known in the art (Dice J.F. (1990). Trends Biochem. Sci.
  • a modular binding protein may comprise a scaffold with the amino acid sequence of residues 1 to 171 of SEQ ID NO: 1230 (PPX172 of Table 13 without the HA Tag).
  • a target peptide ligand and the E3 ligase peptide ligand may be grafted into any two of the N terminal helical grafting site 1 and loop grafting sites 2 to 6.
  • the target peptide ligand is located in helical grafting site 1, loop grafting site 2, loop grafting site 4 or loop grafting site 5 of the scaffold.
  • the E3 ligase peptide ligand or other degron may be located in any of loop grafting sites 2 to 6.
  • the target peptide ligand may be in site 1 and the E3 ligase peptide ligand may be in site 4; the target peptide ligand may be in site 2 and the E3 ligase peptide ligand may be in site 5, the target peptide ligand may be in site 5 and the E3 ligase peptide ligand may be in site 2; the target peptide ligand may be in site 1 and the E3 ligase peptide ligand may be in site 6, the target peptide ligand may be in site 2 and the E3 ligase peptide ligand may be in site 6, the target peptide ligand may be in site 3 and the E3 ligase peptide ligand may be in site 6, the target peptide ligand may be in site 4 and the E3 ligase peptide ligand may be in site 6; the target peptide ligand may be in the site 5 and the E3 ligase peptide lig
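The permitted placements can be enumerated programmatically. The following is a minimal sketch, assuming the narrower embodiment stated above (target peptide ligand in site 1, 2, 4 or 5; E3 ligase peptide ligand in any of loop grafting sites 2 to 6; the two ligands in different sites). The names `TARGET_SITES`, `E3_SITES` and `valid_pairings` are illustrative only and do not appear in the specification; the explicit list above also includes pairings outside this narrower embodiment (for example, the target peptide ligand in site 3).

```python
from itertools import product

# Assumed constraints, taken from the embodiment described above.
TARGET_SITES = (1, 2, 4, 5)   # helical grafting site 1 or loop grafting sites 2, 4, 5
E3_SITES = (2, 3, 4, 5, 6)    # loop grafting sites 2 to 6


def valid_pairings(target_sites=TARGET_SITES, e3_sites=E3_SITES):
    """Return (target_site, e3_site) pairs that occupy two different sites."""
    return [(t, e) for t, e in product(target_sites, e3_sites) if t != e]


pairings = valid_pairings()
for target_site, e3_site in pairings:
    print(f"target ligand in site {target_site}, E3 ligase ligand in site {e3_site}")
```

Passing different site tuples to `valid_pairings` gives the pairings for a broader or narrower embodiment.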
  • Suitable peptide ligands include beta-catenin binding ligands, for example helical beta-catenin binding sequence from the protein AXIN, such as ILXXHV and variants thereof; peptides from ARC
  • peptides from LRH1 such as YEQAIAAYLDALMC and variants thereof
  • peptides from BCL9 such as TLXXIQXXL, LXTLXXIQ, and SLXXIXXML and variants thereof
  • KRAS binding ligands, for example alpha-helical sequences from the protein SOS1 (Son of sevenless homolog 1), such as TNXXKXXE and variants thereof.
  • Suitable E3 ligase peptide ligands include SCFSkp2 binding sequences from p27, such as LDPETGEL and variants thereof, E3 Cul3-SPOP binding sequences from Puc, such as DEVTSTTSS, and variants thereof, SIAH binding sequences from PHYL, such as LRPVAMVRPWVR, and variants thereof, COP1 binding sequences from Trib, such as SDQIVPEYQE, and variants thereof, UBR5 binding sequences from PAM2, such as LSVNAPEFYP, and variants thereof, beta-TRCP binding sequences from CDC25B, such as TEEDDGFVDI, and variants thereof, and MDM2 binding sequences from p53, such as FSXXWXXL and variants thereof.
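The E3 ligase motifs listed above can be located in a candidate sequence with a simple pattern search. This is a minimal sketch, assuming that X denotes any amino acid residue. The helper names `motif_to_regex` and `find_motifs` are illustrative, and the query sequence, including its GS linker residues, is hypothetical and is not taken from the sequence listing.

```python
import re

# Motifs quoted in the text above; X is assumed to mean "any residue".
E3_MOTIFS = {
    "SCF-Skp2 (p27)": "LDPETGEL",
    "Cul3-SPOP (Puc)": "DEVTSTTSS",
    "SIAH (PHYL)": "LRPVAMVRPWVR",
    "COP1 (Trib)": "SDQIVPEYQE",
    "UBR5 (PAM2)": "LSVNAPEFYP",
    "beta-TRCP (CDC25B)": "TEEDDGFVDI",
    "MDM2 (p53)": "FSXXWXXL",
}


def motif_to_regex(motif):
    """Convert a motif with X wildcards into a compiled regular expression."""
    pattern = "".join("[A-Z]" if aa == "X" else re.escape(aa) for aa in motif)
    return re.compile(pattern)


def find_motifs(sequence, motifs):
    """Return (ligase, start_index, matched_text) for every motif occurrence."""
    hits = []
    for name, motif in motifs.items():
        for match in motif_to_regex(motif).finditer(sequence):
            hits.append((name, match.start(), match.group()))
    return hits


# Hypothetical query: a p53-like peptide flanked by GS linkers.
hits = find_motifs("GSGSFSDLWKLLGSGS", E3_MOTIFS)
print(hits)
```

The same helpers could be applied to the target-binding motifs, such as TNXXKXXE, by supplying a different dictionary.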
  • a modular binding protein as described herein may comprise an amino acid sequence shown in Table 13 (SEQ ID NOs: 1230-1304) or a variant of an amino acid sequence shown in Table 13.
  • Variants of a reference sequence are described above and may include variants in which one or both of the target peptide ligand and the E3 ligase peptide ligand in a reference amino acid sequence of Table 13 are replaced by a different peptide ligand. Suitable peptide ligands are described below.
  • Variants may also include variants in which the scaffold sequence in a reference amino acid sequence of Table 13 is replaced by a different scaffold sequence. Suitable scaffold sequences are described above.
  • a modular binding protein may further comprise one or more additional domains which confer additional functionality, such as targeting domains, intracellular transport domains, stabilising domains or oligomerisation domains. Additional domains may for example be located at the N or C terminus of the modular binding protein or in a loop between repeats.
  • a targeting domain may be useful in targeting the modular binding protein to a particular destination in vivo, such as a target tissue, cell, membrane or intracellular organelle.
  • Suitable targeting domains include chimeric antigen receptors (CARs).
  • An intracellular transport domain may facilitate the passage of the modular binding protein through the cell membrane into cells, for example to bind intracellular target molecules.
  • Suitable intracellular transport domains are well known in the art (see for example Bechara et al FEBS Letters 587 (2013) 1693-1702) and include cell-penetrating peptides (CPPs), such as Antennapedia (43-58), Tat (48-60), Cadherin (615-632) and poly-Arg.
  • a stabilising domain may increase the half-life of the modular binding protein in vivo.
  • Suitable stabilising domains are well known in the art and include Fc domains, serum albumin, unstructured peptides such as XTEN (reference 98) or PAS, and polyethylene glycol (PEG).
  • An oligomerisation domain may facilitate the formation of multi-protein complexes, for example to increase avidity against multi-valent targets.
  • Suitable oligomerisation domains include the 'foldon' domain, the natural trimerisation domain of T4 fibritin (Meier et al., J. Mol. Biol. (2004) 344(4):1051-69).
  • a modular binding protein may further comprise a cytotoxic or therapeutic agent and/or a detectable label.
  • Suitable cytotoxic agents include, for example, chemotherapeutic agents, such as methotrexate, auristatin, adriamycin, doxorubicin, melphalan, mitomycin C, ozogamicin, chlorambucil, maytansine, emtansine, daunorubicin or other intercalating agents, enzymatically active toxins of bacterial, fungal, plant, or animal origin, such as diphtheria A chain, nonbinding active fragments of diphtheria toxin, exotoxin A chain, ricin A chain, abrin A chain, modeccin A chain, α-amanitin, alpha-sarcin, Aleurites fordii proteins, tubulysins, dianthin proteins, Phytolacca americana proteins (PAP I, PAP II, and PAP-S), Momordica charantia inhibitor, curcin, crotin, Saponaria officinalis inhibitor, gel
  • Suitable cytotoxic agents may also include radioisotopes.
  • a variety of radionuclides are available for the production of radioconjugated modular binding proteins including, but not limited to, ⁹⁰Y, ¹²⁵I, ¹³¹I, ¹²³I, ¹¹¹In, ¹³¹In, ¹⁰⁵Rh, ¹⁵³Sm, ⁶⁷Cu, ⁶⁷Ga, ¹⁶⁶Ho, ¹⁷⁷Lu, ¹⁸⁶Re, ¹⁸⁸Re and ²¹²Bi.
  • Conjugates of a modular binding protein and one or more small anti-cancer molecules for example toxins, such as a calicheamicin, maytansinoids, a trichothene, and CC1065, and the derivatives of these toxins that have toxin activity, may also be used.
  • Suitable therapeutic agents may include cytokines (e.g. IL2, IL12 and TNF), chemokines, procoagulant factors (e.g. tissue factor), enzymes, liposomes, and immune response factors.
  • a detectable label may be any molecule that produces or can be induced to produce a signal, including but not limited to fluorescers, radiolabels, enzymes, chemiluminescers or photosensitizers. Thus, binding may be detected and/or measured by detecting fluorescence or luminescence, radioactivity, enzyme activity or light absorbance. Detectable labels may be attached to modular binding proteins using conventional chemistry known in the art.
  • the label can produce a signal detectable by external means, for example, by visual examination, electromagnetic radiation, heat, and chemical reagents.
  • the label can also be bound to another specific binding member that binds the modular binding protein, or to a support.
  • a modular binding protein may be configured for display on a particle or molecular complex, such as a cell, ribosome or phage, for example for screening and selection.
  • a suitable modular binding protein may further comprise a display moiety, such as phage coat protein, to facilitate display on a particle or molecular complex.
  • the phage coat protein may be fused or covalently linked to the modular binding protein.
  • Modular binding proteins as described herein may be produced by recombinant means.
  • a method of producing a modular binding protein as described herein may comprise expressing a nucleic acid encoding the modular binding protein.
  • a nucleic acid may be expressed in a host cell and the expressed modular binding protein may then be isolated and/or purified from the cell culture.
  • a method may comprise;
  • a third nucleic acid encoding a second peptide ligand is inserted into the second nucleic acid to produce a chimeric nucleic acid encoding a chimeric protein comprising a first peptide ligand located at one of the first, second, third and fourth loops of the fibronectin scaffold and a second peptide ligand located at another of the first, second, third and fourth loops of the fibronectin scaffold.
  • One of the first and second peptide ligands may comprise a sequence set out in Table 8 (SEQ ID NOs: 277-388) and the other of the first and second peptide ligands may comprise a sequence set out in Table 9 (SEQ ID NOs: 470-617).
  • Methods described herein may be useful in producing a modular binding protein that binds to a first target molecule and a second target molecule.
  • a method may comprise;
  • nucleic acid incorporating into said nucleic acid a first nucleotide sequence encoding a first peptide ligand that binds to a first target molecule and a second nucleotide sequence encoding a second peptide ligand that binds to a second target molecule to generate a nucleic acid encoding a modular binding protein comprising said first and second peptide ligands, wherein said peptide ligands are located in an inter-repeat loop or at the N or C terminus of the modular binding protein;
  • One of the first and second target molecules may be an E3 ubiquitin ligase.
  • a method may comprise;
  • nucleic acid incorporating into said nucleic acid a first nucleotide sequence encoding a first peptide ligand that binds to a target molecule and a second nucleotide sequence encoding a second peptide ligand that binds to an E3 ubiquitin ligase to generate a nucleic acid encoding a modular binding protein comprising said first and second peptide ligands, wherein said peptide ligands are located in an inter-repeat loop or at the N or C terminus of the modular binding protein; and
  • the first peptide ligand may comprise a sequence set out in Table 8 (SEQ ID NOs: 277-388) and the second peptide ligand may comprise a sequence set out in Table 9 (SEQ ID NOs: 470-617).
  • the nucleic acid may be comprised within an expression vector.
  • Suitable vectors can be chosen or constructed, containing appropriate regulatory sequences, including promoter sequences, terminator fragments, polyadenylation sequences, enhancer sequences, marker genes and other sequences as appropriate.
  • the vector contains appropriate regulatory sequences to drive the expression of the nucleic acid in a host cell.
  • Suitable regulatory sequences to drive the expression of heterologous nucleic acid coding sequences in expression systems are well-known in the art and include constitutive promoters, for example viral promoters such as CMV or SV40, and inducible promoters, such as Tet-on controlled promoters.
  • a vector may also comprise sequences, such as origins of replication and selectable markers, which allow for its selection and replication and expression in bacterial hosts such as E. coli and/or in eukaryotic cells.
  • a host cell comprising a nucleic acid encoding a modular binding protein as described herein or vector containing such a nucleic acid is also provided as an aspect of the invention.
  • Suitable host cells include bacteria, mammalian cells, plant cells, filamentous fungi, yeast and baculovirus systems and transgenic plants and animals.
  • the expression of proteins in prokaryotic cells is well established in the art.
  • a common bacterial host is E. coli.
  • a modular binding protein may also be produced by expression in eukaryotic cells in culture.
  • Mammalian cell lines available in the art for expression of a modular binding protein include Chinese hamster ovary (CHO) cells, HeLa cells, baby hamster kidney cells, NS0 mouse myeloma cells, YB2/0 rat myeloma cells, human embryonic kidney cells (e.g. HEK293 cells), human embryonic retina cells (e.g. PerC6 cells) and many others.
  • Modular binding proteins as described herein may be used to produce libraries.
  • a suitable library may be screened in order to identify and isolate modular binding proteins with specific binding activity.
  • a library may comprise modular binding proteins, each modular binding protein in the library comprising:
  • the first peptide ligand may comprise a sequence set out in Table 8 (SEQ ID NOs: 277-388) or Table 9 (SEQ ID NOs: 470-617) or a variant thereof.
  • the residues at one or more positions in the peptide ligand of the modular binding proteins in the library may be diverse or randomised i.e. the residue located at the one or more positions may be different in different molecules in a population.
  • 1 to 12 positions within a helical peptide ligand at the N or C terminus of the modular binding proteins in the library may be diverse or randomised.
  • the non-constrained X n sequence of the peptide ligand may contain additional diversity.
  • 1 to n positions within an inter-repeat peptide ligand of the modular binding proteins in the library may be diverse or randomised, where n is the number of amino acids in the peptide ligand.
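The number of randomised positions determines the theoretical size of the sequence space that a library would need to cover. As a rough illustration, the sketch below assumes full randomisation over the 20 naturally occurring amino acids at each diverse position and compares the resulting sequence space with library sizes of 10⁸ and 10¹² members, which are figures mentioned elsewhere in this description for phage libraries. The function names are illustrative, and the calculation is only an upper bound on coverage, since it ignores sampling statistics and codon bias.

```python
def theoretical_diversity(n_positions, alphabet_size=20):
    """Number of distinct peptides when n_positions are fully randomised."""
    return alphabet_size ** n_positions


def max_fully_covered_positions(library_size, alphabet_size=20):
    """Largest number of randomised positions whose sequence space
    does not exceed the number of library members."""
    n = 0
    while alphabet_size ** (n + 1) <= library_size:
        n += 1
    return n


for positions in (1, 6, 12):
    print(positions, "positions ->", theoretical_diversity(positions), "variants")

for size in (10**8, 10**12):
    print(f"library of {size:.0e} members covers at most "
          f"{max_fully_covered_positions(size)} fully randomised positions")
```

Restricting the alphabet at some positions, by passing a smaller `alphabet_size`, lets more positions be varied within the same library size.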
  • peptide ligands may be screened individually and a modular binding protein progressively assembled from repeat domains comprising peptide ligands identified in different rounds of screening.
  • a library may comprise modular binding proteins, each modular binding protein in the library comprising:
  • the constant peptide ligand may comprise a sequence set out in Table 8 (SEQ ID NOs: 277-388) or Table 9 (SEQ ID NOs: 470-617) or a variant thereof.
  • At least one amino acid residue in the diverse peptide ligands in said library may be diverse.
  • a library may be produced by a method comprising:
  • the population of nucleic acids may be provided by a method comprising inserting a first population of nucleic acids encoding a diverse peptide ligand into a second population of nucleic acids encoding the two or more repeat domains linked by inter-repeat loops, optionally wherein the first and second nucleic acids are linked with a third population of nucleic acids encoding linkers of up to 10 amino acids.
  • the nucleic acids may be contained in vectors, for example expression vectors.
  • Suitable vectors include phage-based or phagemid-based phage display vectors.
  • the nucleic acids may be recombinantly expressed in a cell or in solution using a cell-free in vitro translation system such as a ribosome, to generate the library.
  • the library is expressed in a system in which the function of the modular binding protein enables isolation of its encoding nucleic acid.
  • the modular binding protein may be displayed on a particle or molecular complex to enable selection and/or screening.
  • the library of modular binding proteins may be displayed on beads, cell-free ribosomes, bacteriophage, prokaryotic cells or eukaryotic cells.
  • the encoded modular binding protein may be presented within an emulsion where activity of the modular binding protein causes an identifiable change.
  • the encoded modular binding protein may be expressed within or in proximity of a cell where activity of the modular binding protein causes a phenotypic change or changes in the expression of a reporter gene.
  • the nucleic acids are expressed in a prokaryotic cell, such as E. coli.
  • the nucleic acids may be expressed in a prokaryotic cell to generate a library of recombinant binding proteins that is displayed on the surface of bacteriophage.
  • Suitable prokaryotic phage display systems are well known in the art, and are described for example in Kontermann, R & Dübel, S, Antibody Engineering, Springer-Verlag New York, LLC; 2001, ISBN: 3540413545, WO92/01047, US5969108, US5565332, US5733743, US5858657, US5871907, US5872215, US5885793,
  • Phage display systems allow the production of large libraries, for example libraries with 10⁸ or more,
  • the cell may be a eukaryotic cell, such as a yeast, insect, plant or mammalian cell.
  • a diverse sequence as described herein is a sequence which varies between the members of a population i.e. the sequence is different in different members of the population.
  • a diverse sequence may be random i.e. the identity of the amino acid or nucleotide at each position in the diverse sequence may be randomly selected from the complete set of naturally occurring amino acids or nucleotides or a sub-set thereof.
  • Diversity may be introduced into the peptide ligand using approaches known to those skilled in the art, such as oligonucleotide-directed mutagenesis (reference 22; see also Molecular Cloning: a Laboratory Manual: 3rd edition, Russell et al., 2001, Cold Spring Harbor Laboratory Press, and references therein).
  • Diverse sequences may be contiguous or may be distributed within the peptide ligand.
  • Suitable methods for introducing diverse sequences into peptide ligands are well-described in the art and include oligonucleotide-directed mutagenesis (see Molecular Cloning: a Laboratory Manual: 3rd edition, Russell et al., 2001, Cold Spring Harbor Laboratory Press, and references therein).
  • diversification may be generated using oligonucleotide mixes created using partial or complete randomisation of nucleotides or created using codon mixtures, for example using trinucleotides.
  • a population of diverse oligonucleotides may be synthesised using high throughput gene synthesis methods and combined to create a precisely defined and controlled population of peptide ligands.
  • “doping” techniques in which the original nucleotide predominates with alternative nucleotide(s) present at lower frequency may be used.
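Partial randomisation of nucleotides can be illustrated with a degenerate codon scheme. The sketch below uses NNK (N = A, C, G or T; K = G or T), a commonly used scheme that is chosen here only as an example and is not prescribed by this description. It expands the scheme with the standard genetic code and reports which amino acids and stop codons it encodes. All names in the code are illustrative.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G at each position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Degenerate base symbols used by the assumed NNK scheme.
DEGENERATE = {"N": "ACGT", "K": "GT"}


def expand(scheme):
    """List every concrete codon represented by a degenerate scheme."""
    choices = (DEGENERATE.get(symbol, symbol) for symbol in scheme)
    return ["".join(codon) for codon in product(*choices)]


nnk_codons = expand("NNK")
encoded = [CODON_TABLE[codon] for codon in nnk_codons]
amino_acids = sorted(set(encoded) - {"*"})
stop_count = encoded.count("*")

print(len(nnk_codons), "codons;", len(amino_acids), "amino acids;",
      stop_count, "stop codon(s)")
```

Trinucleotide (codon) mixtures, as mentioned above, avoid stop codons and codon bias altogether because each amino acid can be represented by exactly one chosen codon.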
  • the library is a display library.
  • the modular binding proteins in the library may be displayed on the surface of particles, or molecular complexes such as beads, for example, plastic or resin beads, ribosomes, cells or viruses, including replicable genetic packages, such as yeast, bacteria or bacteriophage (e.g. Fd, M13 or T7) particles, viruses, cells, including mammalian cells, or covalent, ribosomal or other in vitro display systems.
  • the modular binding proteins in the library are displayed on the surface of a viral particle such as a bacteriophage.
  • Each modular binding protein in the library may further comprise a phage coat protein to facilitate display.
  • Each viral particle may comprise nucleic acid encoding the modular binding protein displayed on the particle.
  • Suitable viral particles include bacteriophage, for example filamentous bacteriophage such as M13 and Fd.
  • Phage display is described for example in WO92/01047 and US patents US5969108, US5565332, US5733743, US5858657, US5871907, US5872215, US5885793, US5962255, US6140471,
  • Libraries as described herein may be screened for modular binding proteins which display binding activity, for example binding to a target molecule. Binding may be measured directly or may be measured indirectly through agonistic or antagonistic effects resulting from binding.
  • a method of screening may comprise;
  • each modular binding protein in the library comprising;
  • first peptide ligand comprises a sequence set out in Table 8 (SEQ ID NOs: 277-388) or Table 9 (SEQ ID NOs: 470-617) and one or more residues of the second peptide ligand are diverse in said library,
  • the modular binding proteins in the library may comprise one peptide ligand with at least one diverse amino acid residue.
  • the modular binding proteins in the library comprise two repeat domains.
  • the library may be screened for peptide ligands that bind to a target molecule. Peptide ligands identified in this fashion can be assembled in a modular fashion to generate a modular binding protein as described herein that is multi-specific.
  • a first library may be screened for a first peptide ligand that binds to a first target molecule and a second library may be screened for a second peptide ligand that binds to a second target molecule.
  • the first and second peptide ligands are in different locations in the modular binding protein i.e. they are not both N terminal peptide ligands, C terminal peptide ligands or inter-repeat peptide ligands.
  • First and second peptide ligands that bind to the first and second target molecules, respectively, are identified from the first and second libraries. The identified first and second peptide ligands may then be incorporated into a modular binding protein that binds to the first and second target molecules.
  • a first library may comprise modular binding proteins in the library with a first diverse peptide ligand having at least one diverse amino acid residue.
  • a first peptide ligand that binds to a target molecule may be identified from the first library.
  • Modular binding proteins comprising the first peptide ligand may be used to generate a second library comprising a second diverse peptide ligand having at least one diverse amino acid residue.
  • the modular binding protein from the first library may be modified by addition of a second diverse peptide ligand at the N or C terminal or by the addition of additional repeat domains comprising the second diverse peptide ligand in an inter-repeat loop.
  • a second peptide ligand that binds to the same or a different target molecule may be identified from the second library.
  • Modular binding proteins comprising the first and second peptide ligands may be used to generate a third library comprising a third diverse peptide ligand having at least one diverse amino acid residue.
  • the modular binding protein from the second library may be modified by addition of a third diverse peptide ligand at the N or C terminal or by the addition of additional repeat domains comprising the third diverse peptide ligand in an inter-repeat loop.
  • a third peptide ligand that binds to the same target molecule as the first and/or second peptide ligands or a different target molecule may be identified from the third library. In this way, a modular binding protein containing multiple peptide ligands may be sequentially assembled (see Figure 16).
  • a phage library of 10⁸-10¹² first peptide ligand variants may be combined with a phage library of 10⁸-10¹² second peptide ligand variants and a phage library of 10⁸-10¹² third peptide ligand variants.
  • a phage library of 10⁸-10¹² N terminal peptide ligand variants may be combined with a phage library of 10⁸-10¹² C terminal peptide ligand variants to generate a modular binding protein with N and C terminal peptide ligands.
  • Screening a library for binding activity may comprise providing a target molecule and identifying or selecting members of the library that bind to the target, or expressing the library in a population of cells and identifying or selecting members of the library that elicit a cell phenotype.
  • the one or more identified or selected modular binding proteins may be recovered and subjected to further selection and/or screening.
  • Binding may be determined by any suitable technique.
  • the library may be contacted with the target molecule under binding conditions for a time period sufficient for the target molecule to interact with the library and form a binding reaction complex with at least one member thereof.
  • Binding conditions are those conditions compatible with the known natural binding function of the target molecule. Those compatible conditions are buffer, pH and temperature conditions that maintain the biological activity of the target molecule, thereby maintaining the ability of the molecule to participate in its preselected binding interaction. Typically, those conditions include an aqueous, physiologic solution of pH and ionic strength normally associated with the target molecule of interest.
  • the library may be contacted with the target molecule in the form of a heterogeneous or
  • the members of the library can be in the solid phase with the target molecule present in the liquid phase.
  • the target molecule can be in the solid phase with the members of the library present in the liquid phase.
  • both the library members and the target molecule can be in the liquid phase.
  • Suitable methods for determining binding of a modular binding protein to a target molecule include ELISA, bead-based binding assays (e.g. using streptavidin-coated beads in conjunction with biotinylated target molecules), surface plasmon resonance, flow cytometry,
  • biochemical or cell-based assays such as fluorescence-based or luminescence-based reporter assays may be employed.
  • Multiple rounds of panning may be performed in order to identify modular binding proteins which display the binding activity.
  • a population of modular binding proteins enriched for the binding activity may be recovered or isolated from the library and subjected to one or more further rounds of screening for the binding activity to produce one or more further enriched populations.
  • Modular binding proteins which display binding activity may be identified from the one or more further enriched populations and recovered, isolated and/or further investigated.
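The effect of repeated rounds of panning can be illustrated with a simple enrichment model. The sketch below assumes, purely for illustration, that each round multiplies the ratio of binders to non-binders by a constant enrichment factor. The starting fraction, the enrichment factor and the function name are hypothetical and are not taken from this description, and real selections rarely show constant enrichment.

```python
def binder_fraction(initial_fraction, enrichment_per_round, rounds):
    """Fraction of binders in the population after a number of rounds.

    Works on odds (binders : non-binders), which are multiplied by the
    assumed enrichment factor in every round.
    """
    odds = initial_fraction / (1.0 - initial_fraction)
    odds *= enrichment_per_round ** rounds
    return odds / (1.0 + odds)


# Hypothetical values: one binder per million clones, 100-fold enrichment per round.
fractions = [binder_fraction(1e-6, 100.0, n) for n in range(5)]
for round_number, fraction in enumerate(fractions):
    print(f"round {round_number}: binder fraction = {fraction:.6f}")
```

Under these assumed values binders make up about half of the population after three rounds, which illustrates why several rounds are typically performed before individual clones are characterised.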
  • binding may be determined by detecting agonism or antagonism resulting from the binding of a modular binding protein to a target molecule, such as a ligand, receptor or enzyme.
  • the library may be screened by expressing the library in reporter cells and identifying one or more reporter cells with altered gene expression or phenotype. Suitable functional screening techniques for screening recombinant populations of modular binding proteins are well-known in the art.
  • Modular binding proteins which display the binding activity may be further engineered to improve an activity or property or introduce a new activity or property, for example a binding property such as affinity and/or specificity, an in vivo property such as solubility, plasma stability, or cell penetration, or an activity such as increased neutralization of the target molecule and/or modulation of a specific activity of the target molecule or an analytical property.
  • Modular binding proteins may also be engineered to improve stability, solubility or expression level.
  • Further rounds of screening may be employed to identify modular binding proteins which display the improved property or activity.
  • a population of modular binding proteins enriched for binding to the target molecule may be recovered or isolated from the library and subjected to one or more further rounds of screening for the improved or new property or activity to produce one or more further enriched populations.
  • this may be repeated one or more times.
  • Modular binding proteins which display the improved property or activity may be identified from the one or more further enriched populations and recovered, isolated and/or further investigated.
  • a modular binding protein as described herein may be encapsulated in a liposome, for example for delivery into a cell.
  • Preferred liposomes include fusogenic liposomes.
  • Suitable fusogenic liposomes may comprise a cationic lipid, such as 1,2-dioleoyl-3-trimethylammoniumpropane (DOTAP), and a neutral lipid, such as dioleoylphosphatidylethanolamine (DOPE), for example in a 1:1 (w/w) ratio.
  • a liposome may further comprise an aromatic lipid, such as DiO (3,3'-dioctadecyloxacarbocyanine perchlorate), DiR (1,1'-dioctadecyl-3,3,3',3'-tetramethylindotricarbocyanine iodide), N-(4,4-difluoro-5,7-dimethyl-4-bora-3a,4a-diaza-s-indacene-3-propionyl)-1,2-dihexadecanoyl-sn-glycero-3-phosphoethanolamine (triethylammonium salt) (BODIPY FL-DHPE), and 2-(4,4-difluoro-5-methyl-4-bora-3a,4a-diaza-s-indacene-3-dodecanoyl)-1-hexadecanoyl-sn-glycero-3-phosphocholine (BOD
  • a method described herein may comprise admixing a modular binding protein or encoding nucleic acid as described herein with a solution of lipids, for example in an organic solvent, such as chloroform, and evaporating the solvent to produce liposomes encapsulating the modular binding protein.
  • Liposome encapsulations comprising a modular binding protein as described herein are provided as an aspect of the invention.
  • a modular binding protein or encoding nucleic acid as described herein may be admixed with a pharmaceutically acceptable excipient.
  • a pharmaceutical composition comprising a modular binding protein or nucleic acid as described herein and a pharmaceutically acceptable excipient is provided as an aspect of the invention.
  • The term "pharmaceutically acceptable" refers to compounds, materials, compositions, and/or dosage forms which are, within the scope of sound medical judgement, suitable for use in contact with the tissues of a subject (e.g., human) without excessive toxicity, irritation, allergic response, or other problem or complication, commensurate with a reasonable benefit/risk ratio.
  • Each carrier, excipient, etc. must also be "acceptable" in the sense of being compatible with the other ingredients of the formulation. Suitable carriers, excipients, etc. can be found in standard pharmaceutical texts, for example, Remington's Pharmaceutical Sciences, 18th edition, Mack Publishing Company, Easton, Pa., 1990.
  • the pharmaceutical composition may conveniently be presented in unit dosage form and may be prepared by any methods well-known in the art of pharmacy. Such methods include the step of bringing the modular binding protein into association with a carrier which may constitute one or more accessory ingredients.
  • pharmaceutical compositions are prepared by uniformly and intimately bringing into association the active compound with liquid carriers or finely divided solid carriers or both, and then if necessary shaping the product.
  • compositions may be in the form of liquids, solutions, suspensions, emulsions, elixirs, syrups, tablets, lozenges, granules, powders, capsules, cachets, pills, ampoules, suppositories, pessaries, ointments, gels, pastes, creams, sprays, mists, foams, lotions, oils, boluses, electuaries, or aerosols.
  • a modular binding protein, encoding nucleic acid or pharmaceutical composition comprising the modular binding protein or encoding nucleic acid may be administered to a subject by any convenient route of administration, whether systemically/peripherally or at the site of desired action, including but not limited to, oral (e.g. by ingestion); topical (including e.g. transdermal, intranasal, ocular, buccal, and sublingual); pulmonary (e.g. by inhalation or insufflation therapy using, e.g. an aerosol, e.g.
  • vaginal for example, by injection, including subcutaneous, intradermal, intramuscular, intravenous, intraarterial, intracardiac, intrathecal, intraspinal,
  • compositions suitable for oral administration may be presented as discrete units such as capsules, cachets or tablets, each containing a predetermined amount of the active compound; as a powder or granules; as a solution or suspension in an aqueous or non-aqueous liquid; or as an oil-in-water liquid emulsion or a water-in-oil liquid emulsion; as a bolus; as an electuary; or as a paste.
  • compositions suitable for parenteral administration include aqueous and non-aqueous isotonic, pyrogen-free, sterile injection solutions which may contain anti-oxidants, buffers, preservatives, stabilisers, bacteriostats, and solutes which render the formulation isotonic with the blood of the intended recipient; and aqueous and non-aqueous sterile suspensions which may include suspending agents and thickening agents, and liposomes or other microparticulate systems which are designed to target the compound to cells, tissue or organs.
  • Suitable isotonic vehicles for use in such formulations include Sodium Chloride Injection, Ringer’s Solution, or Lactated Ringer’s Injection.
  • concentration of the active compound in the solution is from about 1 ng/ml to about 10 mg/ml, for example, from about 10 ng/ml to about 1 mg/ml.
  • the formulations may be presented in unit-dose or multi-dose sealed containers, for example, ampoules and vials, and may be stored in a freeze-dried (lyophilised) condition requiring only the addition of the sterile liquid carrier, for example water for injections, immediately prior to use.
  • appropriate dosages of the modular binding protein can vary from patient to patient. Determining the optimal dosage will generally involve the balancing of the level of diagnostic benefit against any risk or deleterious side effects of the administration.
  • the selected dosage level will depend on a variety of factors including, but not limited to, the route of administration, the time of administration, the rate of excretion of the imaging agent, the amount of contrast required, other drugs, compounds, and/or materials used in combination, and the age, sex, weight, condition, general health, and prior medical history of the patient.
  • the amount of imaging agent and route of administration will ultimately be at the discretion of the physician, although generally the dosage will be to achieve concentrations of the imaging agent at a site, such as a tumour, a tissue of interest or the whole body, which allow for imaging without causing substantial harmful or deleterious side-effects.
  • Administration in vivo can be effected in one dose, continuously or intermittently (e.g., in divided doses at appropriate intervals). Methods of determining the most effective means and dosage of administration are well known to those of skill in the art and will vary with the formulation used for therapy, the purpose of the therapy, the target cell being treated, and the subject being treated.
  • Single or multiple administrations can be carried out with the dose level and pattern being selected by the physician.
  • Modular binding proteins described herein may be used in methods of diagnosis or treatment in human or animal subjects, e.g. human. Modular binding proteins for a target molecule may be used to treat disorders associated with the target molecule.
  • the pRSET B (His-tag) constructs were transformed into chemically competent E. coli C41 cells by heat shock and plated on LB-Amp plates. Colonies were grown in 2TY media containing ampicillin (50 micrograms/mL) at 37 °C, 220 rpm until the optical density (O.D.) at 600 nm reached 0.6. Cultures were then induced with IPTG (0.5 mM) for 16-20 h at 20 °C or 4 h at 37 °C.
  • Cells were pelleted by centrifugation at 3000 g (4 °C, 10 min) and resuspended in lysis buffer (10 mM sodium phosphate pH 7.4, 150 mM NaCl, 1 tablet of SIGMAFAST protease inhibitor cocktail (EDTA-free) per 100 mL of solution), then lysed on an Emulsiflex C5 homogenizer at 15000 psi. Cell debris was pelleted by centrifugation at 15,000 g at 4 °C for 45 min.
  • lysis buffer 10 mM sodium phosphate pH 7.4, 150 mM NaCl, 1 tablet of SIGMAFAST protease inhibitor cocktail (EDTA-free) per 100 mL of solution
  • Ni-NTA beads 50% bed volume (GE Healthcare) (5 mL) were washed once with phosphate buffer (10 mM sodium phosphate pH 7.4, 150 mM NaCl) before the supernatant of the cell lysate was bound to them for 1 hr at 4 °C in batch.
  • the loaded beads were washed three times with phosphate buffer (40 mL) containing 30 mM of imidazole to prevent nonspecific interaction of lysate proteins with the beads.
  • All modular binding proteins described herein are thermally very stable, with melting temperatures above 80°C. This means that the modular binding proteins could be separated from E. coli proteins by incubating the cell lysates at 65 °C for 20 min. Very few of the E. coli proteins survive such temperatures, and therefore, they will unfold and aggregate. Aggregated proteins were removed by centrifugation, leaving 80-90% pure sample of the desired protein. All our constructs folded reversibly, and therefore could be further purified by methods such as acetone or salt precipitation to remove DNA and other contaminants.
  • This approach allowed the production of large amounts of functional proteins without expensive affinity purification methods such as antibodies or His tags and is scalable to industrial production and bioreactors.
  • Plasmids were transformed into E. coli C41 cells and plated overnight. 15 ml of 2TY medium (Roche) containing 50 micrograms/ml ampicillin was placed in multiple 50 ml tubes. Several colonies were picked and resuspended in each 15 ml culture. For sufficient aeration it is important to only loosely tighten the lids of the 50 ml tubes. Cells were grown at 37 °C until OD600 of 0.6 and then induced with 0.5 mM IPTG overnight. Cells were pelleted at 3000 g (Eppendorf Centrifuge 5804) and then resuspended in 1 ml of BugBuster® cell lysis reagent. Alternatively, sonication in combination with lysozyme and DNAse I treatment was used. The lysate was spun at 12000 g for 1 minute to pellet any insoluble protein and cell debris.
  • the supernatant was added to 100 µl bed volume of pre-washed Ni-NTA agarose beads.
  • the subsequent affinity purification was performed in batch, by washing the beads 4 times with 1 ml of buffer each time (alternatively, Qiagen Ni-NTA Spin Columns can be used).
  • the first wash contained 10% BugBuster® solution and 30 mM imidazole in the chosen buffer. Here we used 50 mM sodium phosphate buffer pH 6.8, 150 mM NaCl. The three successive washes had 30 mM of imidazole in the chosen buffer. Beads were washed thoroughly to remove the detergent present in the BugBuster® solution.
  • Protein was eluted from the beads in a single step using 1 ml of chosen buffer containing 300 mM imidazole. The combination of BugBuster® and imidazole and the repeat washes in small bead volumes yielded >95% pure protein. Imidazole was removed using a NAP-5 disposable gel-filtration column (GE Healthcare).
  • ITC was performed at 25°C using a VP-ITC (Microcal).
  • 1TBP-CTPR2, 2TBP-CTPR4, 3TBP-CTPR6 and TNKS2 ARC4 were dialysed into 10 mM sodium phosphate buffer pH 7.4, 150 mM NaCl, 0.5 mM TCEP.
  • Dialysed TNKS2 ARC4 (200 µM) was titrated into the sample cell containing 1TBP-CTPR2 at 20 µM. Similar experiments were performed for 2TBP-CTPR4 and 3TBP-CTPR6. Injections of TNKS2 ARC4 into the cell were initiated with a 5 µL injection, followed by 29 injections of 10 µL.
  • the reference power was set at 15 µCal/s with an initial delay of 1000 s and a stirring speed of 485 rpm. Data were fitted to a one-site binding model using the instrument software.
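The injection schedule above (one initial injection followed by 29 equal injections) can be tallied as a quick sanity check. The following is a minimal sketch, assuming the volumes are in microlitres and the concentrations in micromolar; the function names and the sample-cell volume passed in are illustrative assumptions, not values taken from the source, and the molar-ratio estimate ignores the displacement of cell contents during the titration.

```python
def injection_volumes(first_ul, n_following, following_ul):
    """Return the list of per-injection volumes (in µL) for a titration."""
    return [first_ul] + [following_ul] * n_following


def final_molar_ratio(titrant_um, cell_um, injected_ul, cell_volume_ul):
    """Approximate titrant:cell molar ratio after all injections.

    Simplification (assumption): dilution and overflow of the cell
    contents are ignored.
    """
    return (titrant_um * injected_ul) / (cell_um * cell_volume_ul)


# Schedule described in the text: one 5 µL injection, then 29 x 10 µL.
volumes = injection_volumes(5.0, 29, 10.0)
total_ul = sum(volumes)

# The cell volume below is a hypothetical input chosen for illustration only.
ratio = final_molar_ratio(titrant_um=200.0, cell_um=20.0,
                          injected_ul=total_ul, cell_volume_ul=1400.0)
print(len(volumes), total_ul, ratio)
```

A final ratio comfortably above 1 suggests the titration runs past the equivalence point for a one-site model, which is what a complete binding isotherm needs.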
  • HEK293T cells were cultured in Dulbecco’s Modified Eagle’s Medium (Sigma Aldrich) supplemented with 10% fetal bovine serum and penicillin/streptomycin (LifeTech) at 37°C with 5% CO2 air supply.
  • HEK293T were seeded in 6-well tissue culture plates (500,000 cells per well) and transfected the next day using the Lipofectamine2000 transfection reagent (Invitrogen) according to the manufacturer’s protocol.
  • HA-β-catenin (1 µg) alone and with various PROTACs (1 µg) was transfected in HEK293T cells in 6-well plates using Lipofectamine2000. After 48 hours of transfection, the cells were lysed in 200 µL of Laemmli buffer. After the sample was boiled at 95°C for 20 min, proteins were resolved by SDS-PAGE and transferred to a PVDF membrane, and immunoblotting was performed using anti-HA (C29F4, Cell Signaling Technologies) and anti-actin (A2066, Sigma-Aldrich) antibodies. Changes in β-catenin levels were evaluated by densitometry of the bands corresponding to HA-β-catenin normalised to actin levels using ImageJ.
  • DOTAP cationic
  • DOPE neutral
  • DiR aromatic
  • lipid cake was hydrated with 10 mM HEPES pH 7.4, containing 27 µM protein, so that the total lipid concentration is 4 mg/ml. This mixture was vortexed for 2 minutes and then sonicated for 20 minutes at room temperature. Liposomes encapsulating proteins were stored at 4°C until further use.
  • EL empty liposomes without proteins
  • lipid cake was hydrated with 10 mM HEPES pH 7.4 without proteins.
  • An ATP assay was used to investigate whether there is any cytotoxicity associated with EL and LFP.
  • the Wnt pathway was activated by treating HEK293T cells with Wnt-conditioned media obtained from L-cells expressing Wnt3A for 8 days.
  • 10^5 HEK293T cells/well were seeded on a 24-well Nunclon Delta Surface plate (NUNC) and incubated overnight at 37°C, 5% CO2. The following day, cells were transfected with 100 ng of TOPflash TCF7L2-firefly luciferase plasmid, 10 ng of CMV-Renilla plasmid (as internal control) and 100 ng of the corresponding TPR construct.
  • NUNC Nunclon Delta Surface plate
  • Plasmids were mixed with 0.5 µL of Lipofectamine 2000 transfection reagent according to the manufacturer’s protocol (Invitrogen). Transfected cells were allowed to recover for 8 h, then they were treated with Wnt-conditioned media (1:2 final concentration) for a further 16 h.
  • the TOPflash assay was performed using the Dual-Luciferase Reporter Assay System (Promega) (Korinek et al., 1997 Science 275(5307):1784-7) following the manufacturer’s instructions.
  • the activities of firefly and Renilla luciferases were measured sequentially from a single sample, using the CLARIOstar plate reader. Relative luciferase values were obtained from triplicate samples dividing the firefly luminescence activity by the CMV-induced Renilla activity, and standard deviation was calculated.
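The normalisation described above (firefly signal divided by the Renilla internal control, summarised over triplicates) can be sketched as follows. This is an illustrative sketch only: the readings are made-up numbers, the function name is hypothetical, and the use of the sample standard deviation is an assumption, since the source does not say which estimator was used.

```python
from statistics import mean, stdev


def relative_luciferase(firefly, renilla):
    """Return (mean, standard deviation) of per-well firefly/Renilla ratios."""
    if len(firefly) != len(renilla):
        raise ValueError("firefly and Renilla readings must be paired per well")
    ratios = [f / r for f, r in zip(firefly, renilla)]
    # stdev() is the sample standard deviation (n - 1 denominator); an assumption.
    return mean(ratios), stdev(ratios)


# Hypothetical triplicate readings (arbitrary luminescence units).
firefly_readings = [1200.0, 1500.0, 1800.0]
renilla_readings = [100.0, 100.0, 100.0]

rel_mean, rel_sd = relative_luciferase(firefly_readings, renilla_readings)
print(rel_mean, rel_sd)
```

Dividing well by well, before averaging, keeps each firefly reading paired with the transfection efficiency of its own well.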
  • HEK 293T cells 10^5 HEK 293T cells in 500 µl of Dulbecco’s Modified Eagle’s Medium (DMEM) supplemented with 10% fetal bovine serum were grown overnight in each well of a 24-well cell culture plate.
  • DMEM Dulbecco’s Modified Eagle’s Medium
  • For TOPFLASH reporter assays 100 ng/well of TOPFLASH plasmid and 10 ng/well of CMV-Renilla plasmid (as internal control) were used to transfect cells in 24-well plates. Cells were transfected with the Lipofectamine 2000 transfection reagent according to the manufacturer’s protocol (Invitrogen). Transfected cells were allowed to recover for 8 hours, and Wnt signalling was activated by addition of Wnt3A-conditioned media obtained from L-cells.
  • Relative luciferase values were obtained from triplicate samples (from two independent experiments) by dividing the firefly luciferase values (from TOPFLASH) by the Renilla luciferase values (from CMV renilla), and standard deviations were calculated.
  • Nrf-TPR proteins were titrated into a solution containing a mixture of FITC-labelled Nrf2 peptide and Keap1 protein. The prepared plates were incubated for 30 minutes at room temperature before readings were taken.
  • Beta Catenin in MIA PaCa-2 cells was tagged with the HiBiT small peptide tag by CRISPR editing and homology directed repair (HDR). Ribonucleoprotein complex of Cas9 enzyme and gRNA
  • Stable pools were aliquoted and frozen after 3 weeks. Aliquots were used on assays until passage 10.
  • MIA PaCa-2 cells in which endogenous Beta Catenin has been tagged with the small peptide HiBiT were cultured in T175 flasks in DMEM 10% FCS at 37°C, 5% CO2 and split 1 in 10 on reaching 90% confluency. Flasks were grown to approx. 80% confluency and cells were then split by gentle washing with 20 ml of PBS and incubation with 4 ml of cell dissociation buffer for 5 minutes. Cells were counted and seeded at 5000 cells per well into replicate white solid bottom and black clear bottom plates in 100 µl per well of DMEM 10% FCS. Cells were incubated overnight at 37°C, 5% CO2.
  • Cells were transfected with 100 ng per well of pcDNA 3.1 vector containing CPTR constructs using Lipofectamine 3000 using the manufacturer’s recommended protocol (Thermofisher Scientific) or with 25 nM siRNA (Beta Catenin targeted or Scramble control) using TransIT X2 using the manufacturer’s recommended protocol (Mirus Bio Inc). 24 hours after transfection the NanoGlo lytic detection assay was used to determine Beta Catenin-HiBiT levels. Plates were equilibrated to room temperature and 80 µl of NanoGlo Lytic reagent was added to each well. After 10 minutes of shaking, luminescence was measured on a GloMax Discover plate reader with an integration time of 2 seconds.
  • MIA PaCa-2 cells in which endogenous Beta Catenin has been tagged with the small peptide HiBiT were cultured in T175 flasks in DMEM 10% FCS at 37°C, 5% CO2 and split 1 in 10 on reaching 90% confluency. Flasks were grown to approx. 80% confluency and cells were then split by gentle washing with 20 ml of PBS and incubation with 4 ml of cell dissociation buffer for 5 minutes. Cells were counted and seeded at 5000 cells per well into replicate white solid bottom and black clear bottom plates in 100 µl per well of DMEM 10% FCS. Cells were incubated overnight at 37°C, 5% CO2.
  • permeabilization buffer (1× PBS, 5% FCS, 0.2% Saponin, 0.2 µm filtered).
  • Cells were stained with anti-HA primary antibody (HA-Tag (C29F4) Rabbit mAb), followed by Goat anti-Rabbit Alexa Fluor® 555 secondary antibody and Hoechst 33342 nuclear counterstain.
  • Wells were imaged immediately on an EVOS M5000 Fluorescence Microscope.
  • The tetratricopeptide repeat (TPR) is a 34-residue motif that can be repeated in tandem to generate modular proteins. TPRs are used here as an example of helix-turn-helix tandem-repeat arrays, but any tandem repeat array may be used.
  • RTPR proteins comprising TPRs were derived from the consensus TPR sequence (CTPR). Two repeats were found to be sufficient to generate a highly stable mini-protein of 68 amino acids
  • RTPR2 The biophysical properties of two types of engineering strategy, loop insertions and terminal helix grafting, were assessed.
  • RTPR2 with a 20-residue unstructured loop between the two repeats showed a small shift to a lower melting temperature
  • the stabilities of proteins comprising TBP-CTPR2 (a two-repeat CTPR with a loop insertion that binds to the protein tankyrase (Guettler et al. 2011)) repeated in tandem were measured.
  • the TBP-CTPR2-containing proteins had two, four, six, and eight repeats, and they displayed one, two, three and four binding loops, respectively.
  • the helical content of the proteins monitored by molar ellipticity at 222 nm, was found to increase in proportion to the number of repeats, as did the stability, indicating that they were behaving like classic helical repeat proteins (Figure 2).
  • CTPR-mediated “stapling” (constraining) of binding helices therefore occurred through residues Tyr (i) - Ile (i+4) - Tyr (i+7) - Leu (i+11), fully stapling a 15-residue helix.
  • p53 binds to the Mdm2 E3 through an alpha helix (Figure 4A).
  • Stapled versions of the p53 helix, as well as circular peptides and grafted coiled coils, have been developed by many groups, and the sequences have been optimised to give nanomolar affinities in some cases (see for example, Ji et al. 2013; Lee et al. 2014; Kritzer et al. 2006).
  • the p53 helix has a favourable geometry to be grafted onto the C-terminal solvating helix of the CTPR scaffold, and moreover the two helices have 30% sequence identity.
  • TPB2-TPR a loop module designed to bind to oncoprotein tankyrase
  • SLiM “3BP2” a sequence that binds to the substrate-binding ankyrin-repeat clusters (ARC) of the protein tankyrase, a multi-domain poly ADP-ribose polymerase that is upregulated in many cancers (Guettler et al. 2011) onto the CTPR scaffold.
  • ARC substrate-binding ankyrin-repeat clusters
  • SLiMs in folded domains led to an increase of proteolysis resistance; showing the potential to expand the interaction surface through further rational engineering, in silico methods and/or directed evolution; controlled geometric arrangement; and bi- or multivalency of interactions.
  • Multivalency in this system was increased further via oligomerisation of the binding modules by fusing them to the foldon domain of T4 fibritin (Fig. 5B).
  • This trimerisation domain comprises a C-terminal helix, such as that of p53-CTPR, ending with the foldon domain, a short β-sheet peptide capable of homo-trimerising.
  • the foldon domain has been shown to be highly stable and independently folded (Boudko et al 2002; Meier et al. 2004). In this way, multiple binding modules can be arranged with specified geometries to inhibit complex multivalent molecules that cannot be targeted with monovalent interactions due to their natural tendency to interact with other multivalent networks with high avidity.
  • Skp2 is the substrate recognition subunit of the SCF Skp2 ubiquitin ligase.
  • the Skp2-binding sequence that we inserted into the RTPR loop was based on the previously published degron peptide sequence derived from the substrate p27 that binds to Skp2 in complex with Cks1 (an accessory protein) (Hao et al. 2005). We used only 10 residues of this peptide. Although ideally the Skp2-binding sequence would include a phospho-threonine (as this residue makes some key contacts with Skp2 and Cks1), we instead explored whether we could replace the phospho-threonine with a phosphomimetic (glutamate) without affecting binding affinity.
  • Nrf-TPR a loop module designed to bind to E3 ubiquitin ligase Keap1-Cul3
  • Keap1 is the substrate recognition subunit of the Keap1-Cul3 ubiquitin ligase.
  • a Keap1-binding sequence that we inserted into the CTPR loop was based on the previously published degron peptide sequence derived from the Keap1 substrate Nrf2.
  • β-catenin The Wnt/β-catenin signalling pathway is deregulated in many cancers and in neurodegenerative diseases, and therefore β-catenin is an important drug target.
  • binding sequences both helical and non-helical
  • Mdm2 and SCF Skp2 We selected Mdm2 and SCF Skp2 to test as E3 ubiquitin ligases, as we had successfully generated single-function TPRs to bind to them (Figs. 4 and 6).
  • a range of different factors contribute to efficient ubiquitination and target degradation by these hetero-bifunctional molecules, hence the power of screening different combinations of single-function modules and potentially also different lengths of intervening blank modules.
  • TPR proteins targeting tankyrase ( Figures 21 and 22) were delivered into cells using liposome encapsulation, and the effect on Wnt signalling was assayed using a TOPFLASH assay. The results show that the designed TPR proteins are able to inhibit Wnt signalling.
  • KRAS we transfected KRAS plasmid alone or KRAS plasmid together with one of the TPR plasmids in HEK293T cells using Lipofectamine2000. 24 hours post transfection the cells were lysed, and KRAS levels were evaluated by western blot. The results show that the designed heterobifunctional TPR is capable of reducing KRAS levels (Figure 23).
  • TPR proteins targeting KRAS were delivered into cells using liposome encapsulation, and the effect on HiBiT-tagged KRAS levels was evaluated using the HiBiT luminescence assay. The results show that the designed hetero-bifunctional TPR proteins are able to reduce KRAS levels (Figures 24 and 25).
  • Hetero-bifunctional TPR proteins were designed to target endogenous KRAS for degradation via CMA ( Figure 26).
  • TPR constructs or empty vector (light grey) were transiently transfected into either HEK293T or DLD1 (colorectal cancer cell line) using Lipofectamine2000. 24 hours post transfection the cells were lysed, and KRAS levels were evaluated by western blot. The designed hetero-bifunctional TPRs that resulted in reduction of KRAS levels compared to the empty vector control are shown in white.
  • the linker sequence connecting a peptide ligand to an inter-repeat loop was varied in order to optimise the binding affinity of Nrf-TPR, a TPR protein designed to bind to the protein Keap1 (see Fig. 7), for its target. Glycine residues were introduced into the linker to provide flexibility and increased spatial sampling. The introduction of this more flexible linker sequence was found to increase the binding affinity of the Nrf-TPR protein (labelled ‘Flexible’) when compared with the consensus-like linker sequence. Altering the charge content of the linker sequence (labelled ‘Charged’) and altering the conformational properties (based on the predictions of the program CIDER (Holehouse et al.
  • Beta Catenin abundance was quantified using the lytic HiBiT assay. Mean and SEM of at least 4 replicates are shown in Figure 28 (Untreated: Untreated cells, Scr: Scrambled siRNA, Lipofectamine: Lipofectamine only treated cells, siRNA: Beta catenin targeted siRNA).
  • the modular design of the constructs is shown in Table 12 and the sequences are shown in Table 13 (SEQ ID NOs: 1230-1304).
  • the bispecific CPTR-based PPX229 caused a 36% reduction in signal relative to lipofectamine only demonstrating its ability to degrade Beta Catenin more effectively than both its single function controls (PPX239 3%, PPX244 14%).
  • Positive control Beta Catenin targeted siRNA caused a 52% reduction in signal relative to scrambled RNA.
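The percentage reductions quoted for each construct are expressed relative to a control signal (lipofectamine only for the constructs, scrambled siRNA for the targeted siRNA). A minimal sketch of that calculation follows; the luminescence values are hypothetical placeholders, not data from the source, and the function name is illustrative.

```python
def percent_reduction(treated_signal, control_signal):
    """Percentage drop of a treated signal relative to its control."""
    if control_signal <= 0:
        raise ValueError("control signal must be positive")
    return (1.0 - treated_signal / control_signal) * 100.0


# Hypothetical mean luminescence values (arbitrary units), for illustration only.
lipofectamine_only = 1000.0
bispecific_construct = 640.0

reduction = percent_reduction(bispecific_construct, lipofectamine_only)
print(round(reduction))
```

Comparing a bispecific construct with its single-function controls then amounts to calling the same function with each control's signal and the same lipofectamine-only baseline.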
  • MIA PaCa-2 cells in which endogenous Beta Catenin has been tagged with the small peptide HiBiT were transfected with Scrambled or Beta Catenin targeted siRNA, or with CPTR scaffold based constructs utilising the AXIN targeting sequence together with single function controls. After 24 hours Beta Catenin abundance was quantified using the lytic HiBiT assay. Mean and SEM of at least 4 replicates are shown in Figure 29 (Untreated: Untreated cells, Scr: Scrambled siRNA, Lipofectamine: Lipofectamine only treated cells, siRNA: Beta catenin targeted siRNA).
  • the modular design of the constructs is shown in Table 12 and the sequences are shown in Table 13 (SEQ ID NOs: 1230-1304).
  • the bispecific CPTR-based PPX197 caused a 37% reduction in signal relative to lipofectamine only demonstrating its ability to degrade Beta Catenin more effectively than both its single function controls (PPX235 11%, PPX240 6%).
  • the bispecific CPTR-based PPX202 caused a 36% reduction in signal relative to lipofectamine only demonstrating its ability to degrade Beta Catenin more effectively than both its single function controls (PPX235 11%, PPX241 8%).
  • the bispecific CPTR-based PPX225 caused a 33% reduction in signal relative to lipofectamine only demonstrating its ability to degrade Beta Catenin more effectively than both its single function controls (PPX235 11%, PPX244 14%).
  • Beta Catenin abundance was quantified using the lytic HiBiT assay. Mean and SEM of at least 4 replicates are shown in Figure 30 (Untreated: Untreated cells, Scr: Scrambled siRNA, Lipofectamine: Lipofectamine only treated cells, siRNA: Beta catenin targeted siRNA).
  • the modular design of the constructs is shown in Table 12 and the sequences are shown in Table 13 (SEQ ID NOs: 1230-1304).
  • the bispecific CPTR-based PPX226 caused a 36% reduction in signal relative to lipofectamine only demonstrating its ability to degrade Beta Catenin more effectively than both its single function controls (PPX237 9%, PPX244 14%).
  • Positive control Beta Catenin targeted siRNA caused a 52% reduction in signal relative to scrambled RNA.
  • Beta Catenin abundance was quantified using the lytic HiBiT assay. Mean and SEM of at least 4 replicates are shown in Figure 31 (Untreated: Untreated cells, Scr: Scrambled siRNA, Lipofectamine: Lipofectamine only treated cells, siRNA: Beta catenin targeted siRNA).
  • the bispecific CPTR-based PPX203 caused a 36% reduction in signal relative to lipofectamine only (LIPO) demonstrating its ability to degrade Beta Catenin more effectively than both its single function controls (PPX241 8%, PPX236 12%).
  • Positive control Beta Catenin targeted siRNA caused a 52% reduction in signal relative to scrambled RNA.
  • the bispecific CPTR-based PPX227 caused a 30% reduction in signal relative to lipofectamine only (LIPO) demonstrating its ability to degrade Beta Catenin more effectively than both its single function controls (PPX241 3%, PPX244 14%).
  • Positive control Beta Catenin targeted siRNA caused a 52% reduction in signal relative to scrambled RNA.
  • MIA PaCa-2 cells were transfected with CPTR scaffold based constructs PPX197, PPX202, PPX203, PPX225, PPX226, PPX227 and PPX229 which contain HA tags. After 24 hours cells were fixed, permeabilised and stained for HA tag. Hoechst was used as a nuclear counterstain to obtain images of HA and Hoechst staining for transfected cells (not shown). HA was detected in cells transfected with PPX197, PPX202, PPX203, PPX225, PPX226, PPX227 and PPX229 indicating that CPTR-based molecules are expressed in cells.
  • RTPRa-ii-H ATTGAATATTATCAGCGGGCTCTGGAACTGGATCCTNNNNNN (SEQ ID NO: 171)
  • RTPRa-i-E ATTGAATATTATCAACGTGCACTGGAACTGGACCCGNNNNNN (SEQ ID NO: 172)
  • RTPRa-iii-E ATCGAATATTATCAACGTGCACTGGAACTGGACCCGNNNNNN (SEQ ID NO: 173)
  • CTPRa-E ATCGAGTATTATCAAAAAGCACTGGAACTGGACCCGNNNNNN (SEQ ID NO: 174)
  • CTPRb-E ATCGAATATTATCAAAAAGCGCTGGAACTGGACCCGNNNNNN (SEQ ID NO: 175)
  • RTPRb-E ATTGAATATTATCAACGTGCGCTGGAATTAGATCCGNNNNNN (SEQ ID NO: 176)
  • RTPRa-ii-E ATCGAATATTACCAGCGTGCGTTAGAATTAGATCCGNNNNNN (SEQ ID NO: 177)
  • RTPRa-iv-E ATTGAGTACTACCAACGTGCCCTGGAACTGGACCCTNNNNNN (SEQ ID NO: 181)
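The nucleotide sequences listed above end in a run of degenerate positions (N). As a way of inspecting such a sequence, the sketch below translates it with the standard genetic code and marks any codon containing N as X. It is illustrative only: reading from the first base in frame 1 is an assumption, and the helper name is hypothetical.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna):
    """Translate DNA in frame 1; codons with ambiguous bases become 'X'."""
    dna = dna.upper()
    if len(dna) % 3 != 0:
        raise ValueError("sequence length must be a multiple of 3")
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna), 3)
    )


# First sequence in the list above (SEQ ID NO 171), frame 1 assumed.
peptide = translate("ATTGAATATTATCAGCGGGCTCTGGAACTGGATCCTNNNNNN")
print(peptide)
```

The two trailing X residues correspond to the six degenerate bases, that is, two fully randomised codons under the assumed frame.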
  • NMGNIYLKQRNYSKAIKFYRMALD 270 NP_783195.2 (SEQ ID NO: 189) 495 ALTNKGNTVFANGDYEKAAEFYKEAL 520
  • SPAC6B12_12-516 SEVYNYFGEI LLDQQKFDDA VKNFDHAIEL EKRE (SEQ ID NO: 823)
  • NUC2_SC PESWCILANC FSLQREHSQA LKCINRAIQL DPTF (SEQ ID NO: 824)
  • HSU4203 EKGLYRRGEA QLLMNEFESA KGDFEKVLEV NPQN (SEQ ID NO: 833)
  • CELC55B ADILCERAEA HILDEDYDSA IEDYQKATEV NPDH (SEQ ID NO: 838)
  • PSEPILF SRVFENLGLV SLQMKKPAQA KEYFEKSLRL NRNQ (SEQ ID NO: 843)
  • JT06030 PETHYNRGLA WERLGNVDQA IADYGRSIAL DRYY (SEQ ID NO: 845)
  • CELF38B1 AHCLFNLGVL YQRTNRDEMA MSAWKNATRI DPSH (SEQ ID NO: 849)
  • CELF38B4 VLFHANLGIL YQRMSRHKEA ESQYRIVLAL DSKN (SEQ ID NO: 855)
  • CELC56C VEALRQKGNE LFVQKDYKEA IDAYRDALTR LDTL (SEQ ID NO: 860)
  • CELT09B0 AEQHNTNGKK CYMNKRYDDA VDHYSKAIKV NPLP (SEQ ID NO: 878)
  • CELT09B1 VKPLYFLGNV FLQSKKYSEA ISCLSKALYH NAVI (SEQ ID NO: 879)
  • STI 1_YE0 ARGYSNRAAA LAKLMSFPEA IADCNKAIEK DPNF (SEQ ID NO: 923)
  • LACALS ISIYQRIGDS YAQLGNFENA ISFLEKSLEF DEKP (SEQ ID NO: 927)
  • CELR05F SKAWGRMGLA YSCQNRYEHA AEAYKKALEL EPNQ (SEQ ID NO: 928)
  • HSAB2370_1-631 FNCWESLGEA YLSRGGYTTA LKSFTKASEL NPES (SEQ ID NO: 945)
  • S75648 ADAYNLRGVA YMVIEQYTDA LADFDQAIAL NPKD (SEQ ID NO: 946)
  • CELT19A0 PLFHYVQGRM KLLLHDVDKA IQHLKDAMDK DPNN (SEQ ID NO: 947)
  • CELT25F0 FEGNYNLGLV SFTQGKYHEC RELIEKALAA FPEH (SEQ ID NO: 948)
  • CELT25F1 VTMLTGMARV QEALGEYDES VKLYKRVLDA ESNN (SEQ ID NO: 950)
  • HSU46570 AKLYCNRGTV NSKLRKLDDA IEDCTNAVKL DDTY (SEQ ID NO: 957)
  • ACU89981 SRAFHRKGNA YMKMEKYAEA IDSYNRALTE HRNP (SEQ ID NO: 978)
  • HSU20362 GQSWYFLGRC YSCIGKVQDA FVSYRQSIDK SEAS (SEQ ID NO: 981)
  • ACU89983 ALEEKNKGNA AMSAGDFKAA VEHYTNAIQH DPQN (SEQ ID NO: 982)
  • ACU89984 SKGYSRKGAA LCYLGRYADA KAAYAAGLEV EPTN (SEQ ID NO: 985)
  • CELF38B AKIHYNLGKV LGDNGLTKDA EKNYWNAIKL DPSY (SEQ ID NO: 1066)
  • HSU2036 TEALYNIGLT YEKLNRLDEA LDCFLKLHAI LRNS (SEQ ID NO: 1080)
  • CELK04G1 AVAWSNLGCV FNSQGEIWLA IHHFEKAVTL DPNF (SEQ ID NO: 1081)

Abstract

This invention relates to modular proteins that are capable of binding to one or more target molecules. The modular binding proteins comprise two or more repeat domains, such as tetratricopeptide repeat domains; inter-repeat loops linking the repeat domains; and one or more heterologous peptide ligands that bind to a target molecule. Each peptide ligand is located in an inter-repeat loop or at the N or C terminus of the modular binding protein and one or more of the heterologous peptide ligands comprises at least one amino acid sequence set out in Table 8 or Table 9 or a variant thereof. Modular binding proteins with various configurations and methods for their production and use are provided.

Description

Modular Binding Proteins
Field
This invention relates to modular binding proteins and their production and uses.
Background
A priority area in medicine, particularly cancer research, is the expansion of the ‘druggable’ proteome, which is currently limited to narrow classes of molecular targets. For example, protein-protein interactions (PPIs) are fundamental to all biological processes and represent a large proportion of potential drug targets, but they are not readily amenable to conventional small molecule inhibition. The architecture of tandem repeat proteins has tremendous scope for rational design (Kobe & Kajava 2000, Longo & Blaber, 2014, Rowling et al., 2015). The key features of tandem repeat proteins are relatively small size, modularity and extremely high stability (and therefore ease of recombinant production) without the need for disulphide bonds. Individual consensus-designed repeats are self-compatible and can be put together in any order; function is therefore also modular, which means that multiple functions can be independently designed and incorporated in a combinatorial fashion within a single molecule (WO2017106728).
Novel repeat protein functions, e.g. DARPins (Tamaskovic et al., 2012), have been developed based on the natural type of PPI interface of these proteins, i.e. spanning many repeat units to create an extended, high-affinity binding interface for the target. Mutations have been introduced into the surface residues in the tetratricopeptide (TPR) repeats of the cytosolic receptor peroxin 5 (Sampathkumar et al. (2008) J. Mol. Biol., 381, 867-880). Binding of peptide ligands to peroxin 5 is shown to be mediated by residues located in several different TPR repeats. The interactions of the TPR-containing protein kinesin-1 with different cargo proteins have also been reported (Zhu et al. (2012) PLoS One 7(3), e33943). The specificity and stability of ankyrin repeat proteins have been modified through the introduction of mutations into ankyrin repeat sequences (Li et al. (2006) Biochemistry 45, 15168-15178).
Summary
The present inventors have found that modular proteins capable of binding to one or more target molecules can be generated by displaying peptidyl binding motifs, such as short linear motifs (SLiMs), on modular scaffolds. These modular binding proteins may be useful, for example as single- or multifunction protein therapeutics.
An aspect of the invention provides a modular binding protein comprising;
(i) two or more repeat domains,
(ii) inter-repeat loops linking said repeat domains; and
(iii) one or more peptide ligands, each said peptide ligand being located in an inter-repeat loop or at the N or C terminus of the modular binding protein.
Preferably, the one or more peptide ligands comprise an amino acid sequence set out in Table 8 (SEQ ID NOs: 277-388) or Table 9 (SEQ ID NOs: 470-617). In other embodiments, a modular binding protein may comprise an amino acid sequence set out in Table 13 (SEQ ID NOs: 1230-1304), or a variant thereof.
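The arrangement set out in (i) to (iii) above, repeat domains joined by inter-repeat loops with peptide ligands placed in a loop or at either terminus, can be pictured as a simple data structure. The following is a hypothetical sketch for illustration only: the class name, field names and placeholder strings are assumptions, no actual sequence from the tables is used, and appending a ligand directly after the loop residues is a simplification of how a ligand may sit within a loop.

```python
from dataclasses import dataclass, field


@dataclass
class ModularBindingProtein:
    """Toy model: repeats, inter-repeat loops and optional peptide ligands."""
    repeats: list[str]
    loops: list[str]                      # loops[i] links repeats[i] and repeats[i + 1]
    loop_ligands: dict[int, str] = field(default_factory=dict)
    n_term_ligand: str = ""
    c_term_ligand: str = ""

    def __post_init__(self):
        if len(self.repeats) < 2:
            raise ValueError("two or more repeat domains are required")
        if len(self.loops) != len(self.repeats) - 1:
            raise ValueError("one loop is needed between each pair of repeats")
        if any(not 0 <= i < len(self.loops) for i in self.loop_ligands):
            raise ValueError("ligand placed in a loop that does not exist")
        if not (self.loop_ligands or self.n_term_ligand or self.c_term_ligand):
            raise ValueError("at least one peptide ligand is required")

    def sequence(self):
        """Concatenate the parts from N to C terminus."""
        parts = [self.n_term_ligand]
        for i, repeat in enumerate(self.repeats):
            parts.append(repeat)
            if i < len(self.loops):
                # Simplification: the ligand follows the loop residues.
                parts.append(self.loops[i] + self.loop_ligands.get(i, ""))
        parts.append(self.c_term_ligand)
        return "".join(parts)


# Placeholder tokens stand in for real sequences.
protein = ModularBindingProtein(
    repeats=["R1", "R2"], loops=["-L-"], loop_ligands={0: "[lig]"}
)
print(protein.sequence())
```

A bifunctional arrangement, for example one ligand in a loop and a second at the C terminus, is expressed by filling in both `loop_ligands` and `c_term_ligand`.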
In some preferred embodiments, the modular binding protein may comprise a first peptide ligand that binds a first target molecule and a second peptide ligand that binds a second target molecule. One of the first or second target molecules may be an E3 ubiquitin ligase. One of the first and second peptide ligands may comprise an amino acid sequence set out in Table 8 (SEQ ID NOs: 277-388) and the other of the first and second peptide ligands may comprise an amino acid sequence set out in Table 9 (SEQ ID NOs: 470-617).
Another aspect of the invention provides a method of producing a modular binding protein comprising; inserting a first nucleic acid encoding a peptide ligand into a second nucleic acid encoding two or more repeat domains linked by inter-repeat loops, to produce a chimeric nucleic acid encoding a modular binding protein as described herein; and
expressing said chimeric nucleic acid to produce the modular binding protein.
Preferably the peptide ligand comprises an amino acid sequence set out in Table 8 (SEQ ID NOs: 277-388) or Table 9 (SEQ ID NOs: 470-617).
Another aspect of the invention provides a method of producing a modular binding protein that binds to a first target molecule and a second target molecule comprising;
providing a nucleic acid encoding two or more repeat domains linked by inter-repeat loops, and incorporating into said nucleic acid a first nucleotide sequence encoding a first peptide ligand that binds to a first target molecule and a second nucleotide sequence encoding a second peptide ligand that binds to a second target molecule to generate a nucleic acid encoding a modular binding protein comprising said first and second peptide ligands, wherein said peptide ligands are located in an inter-repeat loop or at the N or C terminus of the modular binding protein; and
expressing the nucleic acid to produce said protein.
Preferably the first and second peptide ligands comprise amino acid sequences set out in Table 8 (SEQ ID NOs: 277-388) or Table 9 (SEQ ID NOs: 470-617).
In some preferred embodiments, one of the first or second target molecules is an E3 ubiquitin ligase.
Another aspect of the invention provides a library comprising modular binding proteins, each modular binding protein in the library comprising;
(i) two or more repeat domains,
(ii) inter-repeat loops linking said repeat domains; and
(iii) one or more peptide ligands, each said peptide ligand being located in an inter-repeat loop or at the N or C terminus of the modular binding protein,
wherein at least one amino acid residue in the peptide ligands in said library is diverse.
Preferably the one or more peptide ligands comprise an amino acid sequence set out in Table 8 (SEQ ID NOs: 277-388) or Table 9 (SEQ ID NOs: 470-617).
Another aspect of the invention provides a library comprising a first and a second sub-library of modular binding proteins, each modular binding protein in the first and second sub-libraries comprising;
(i) two or more repeat domains,
(ii) inter-repeat loops linking said repeat domains;
(iii) a first peptide ligand comprising an amino acid sequence set out in Table 8 (SEQ ID NOs: 277-388) or Table 9 (SEQ ID NOs: 470-617) and
(iv) a second peptide ligand comprising at least one diverse amino acid residue,
wherein the peptide ligand in the modular binding proteins in the first sub-library binds to a first target molecule and is located in one of (i) an inter-repeat loop; (ii) the N terminus or (iii) the C terminus of the modular binding protein, and
the peptide ligand in the modular binding proteins in the second sub-library binds to a second target molecule and is located in another of (i) an inter-repeat loop; (ii) the N terminus or (iii) the C terminus of the modular binding protein.
Another aspect of the invention provides a method of producing a library of modular binding proteins comprising;
(a) providing a population of nucleic acids encoding a diverse population of modular binding proteins comprising
(i) two or more repeat domains,
(ii) inter-repeat loops linking said repeat domains; and
(iii) a first peptide ligand comprising an amino acid sequence set out in Table 8 (SEQ ID NOs: 277-388) or Table 9 (SEQ ID NOs: 470-617) and
(iv) a second peptide ligand comprising at least one diverse amino acid residue, each said peptide ligand being located in an inter-repeat loop or at the N or C terminus of the modular binding protein, and
(b) expressing said population of nucleic acids to produce the diverse population,
thereby producing a library of modular binding proteins.
Another aspect of the invention provides a method of screening a library comprising;
(a) providing a library of modular binding proteins, each modular binding protein in the library comprising;
(i) two or more repeat domains,
(ii) inter-repeat loops linking said repeat domains;
(iii) a first peptide ligand comprising an amino acid sequence set out in Table 8 (SEQ ID NOs: 277-388) or Table 9 (SEQ ID NOs: 470-617) and
(iv) a second peptide ligand comprising at least one diverse amino acid residue, wherein the first and second peptide ligands are located in the inter-repeat loop, at the N terminus or at the C terminus of the protein,
(b) screening the library for modular binding proteins which display a binding activity, and
(c) identifying one or more modular binding proteins in the library which display the binding activity.
Other aspects and embodiments of the invention are described in more detail below.
Brief Description of Figures
Figure 1 shows the thermostability of consensus-designed tetratricopeptide repeat (CTPR) proteins containing loop- or helix-grafted binding motifs: Thermal denaturation, monitored by circular dichroism, of 2-repeat RTPR (a CTPR in which lysine residues have been replaced with arginine residues) proteins: RTPR2 (diamonds), RTPR2 containing a loop binding-module (circles) and RTPR2 containing a helix binding-module (squares). All samples are at 20 mM in 10 mM sodium phosphate buffer pH 7.4, 150 mM NaCl.
Figure 2 shows the thermostability of CTPR proteins of increasing length containing an increasing number of binding modules (alternating with blank modules): Thermal denaturation curves, monitored by circular dichroism, of TPR proteins containing 1, 2, 3 and 4 loops comprising a tankyrase-binding sequence: 1TBP-CTPR2, 2TBP-CTPR4, 3TBP-CTPR6, 4TBP-CTPR8. All samples are at 20 mM in 10 mM sodium phosphate buffer pH 7.4, 150 mM NaCl.
Figure 3 shows an example of helix grafting. Figure 3A (i) shows the crystal structures of SOS1 (son-of-sevenless homologue 1) bound to KRAS (Kirsten rat sarcoma) (PDB 1NVU, Margarit et al. Cell (2003) 112(5):685-95), and (ii) shows the SOS1 helix grafted onto a helix at the N-terminus of a CTPR2 protein. The modelled structure of SOS-RTPR2 is shown, and the sequence of the helix is given with the key KRAS-binding residues in grey and the residues that form the interface with the CTPR helices in black; (iii) shows the modelled structure of SOS-TPR2 in complex with KRAS. Figure 3B shows binding of SOS-TPR2 to KRAS measured by competitive fluorescence polarization (FP). The complex between mant-GTP and KRAS was pre-formed, and 0.1-300 pM SOS-RTPR2 was then titrated into the complex, displacing the mant-GTP from KRAS and resulting in a decrease in FP. EC50 is 3 pM.
Figure 4 shows another example of helix grafting. Figure 4A shows the modelled structure of the Mdm2 (Mouse double minute 2 homolog) N-terminal domain in complex with the p53-TPR2 comprising the Mdm2-binding helix of p53 grafted onto a helix at the C-terminus of a CTPR2 protein. Figure 4B shows an ITC analysis of the interaction between p53-TPR2 and Mdm2 N-terminal domain. The N-terminal domain of Mdm2 was titrated into the cell containing 10 mM p53-TPR2.
Figure 5 shows an example of single and multivalent loop-grafted CTPRs. Figure 5A shows an ITC analysis of the interaction between a series of tankyrase-binding loop-grafted CTPR2 proteins (TBP-CTPR2) and the substrate-binding ARC4 (ankyrin-repeat cluster) domain of tankyrase. There is an enhancement of both binding affinity and dissociation constant with increasing number of binding modules. Figure 5B shows native gel analysis (using a native gel in Tris-Glycine buffer pH 8.0, 40 mM protein concentration) of multivalent TBP-CTPR proteins expressed as fusion constructs with the foldon trimerisation domain (Boudko et al 2002; Meier et al. 2004). 1TBP-CTPR2, 2TBP-CTPR4 and 4TBP-CTPR8 (all lacking the foldon domain) were purified and run as monomeric controls. Constructs having the foldon domain run at much higher molecular weights than their monomeric counterparts.
Figure 6 shows an example of loop-grafted CTPRs comprising the 10-residue Skp2-binding sequence derived from p27 grafted into a loop of a CTPR protein (CTPR-p27).
Figure 6A shows that HA-CTPR2-p27 is able to co-IP FLAG-Skp2 from HEK293T cells.
Figure 6B shows that E. coli-expressed and purified TPR5-p27 inhibits p27 ubiquitination in vitro.
Figure 7 shows another example of loop-grafted CTPRs. Figure 7A shows (left) ITC analysis of the interaction between the Keap1 (Kelch-like ECH-associated protein 1) KELCH domain and a CTPR2 protein containing a loop-grafted Keap1-binding sequence derived from the protein Nrf2 (Nuclear factor (erythroid-derived 2)-like 2) (Nrf-CTPR2). No binding is observed for the blank CTPR2 protein (right). Figure 7B shows that three variants of Nrf-CTPR2 (Nrf-CTPR2 (i), Nrf-CTPR2 (ii), Nrf-CTPR2 (iii)) can co-IP Keap1 from HEK293T cells.
Figure 8 shows live-cell imaging of intracellular delivery of an RTPR achieved by resurfacing (by introducing arginine residues at surface sites). PC3 (left) and U2OS (right) cells incubated with 10 mM FITC-labelled resurfaced TBP-RTPR2 for 3 hours at 37°C, 5% CO2. Overlay of DIC (differential interference contrast) and confocal image. Intracellular fluorescence was also observed at lower concentrations of protein.
Figure 9 shows the induced degradation of the target protein beta-catenin by designed hetero-bifunctional RTPRs. Figure 9A shows the beta-catenin levels in cells transfected with either HA-tagged beta-catenin plasmid alone or HA-tagged beta-catenin plasmid together with one of two different hetero-bifunctional RTPR plasmids (LRH1-TPR-p27 and axin-TPR-p27, designed to bind simultaneously to beta-catenin and to E3 ligase SCFSkp2). Figure 9B shows a quantitative analysis of the beta-catenin levels in the presence of different hetero-bifunctional RTPRs designed to bind simultaneously to beta-catenin and to either E3 ligase SCFSkp2 or E3 ligase Mdm2. The analysis was performed using densitometry of the bands detected by Western blots corresponding to HA-tagged beta-catenin normalised to actin bands using ImageJ. Negative controls used were single-function TPRs or blank (non-functional) TPRs.
Figure 10 shows examples of different modular binding protein formats. A modular binding protein may comprise: two repeat domains with a helical target-binding peptide and a helical E3-binding peptide at the N and C termini (Figure 10A); three repeat domains with a helical E3-binding peptide at the C terminus and a target peptide ligand in the first inter-repeat loop from the N terminus (Figure 10B); three repeat domains with a helical target-binding peptide at the N terminus and an E3 peptide ligand in the second inter-repeat loop from the N terminus (Figure 10C), four repeat domains with a target-peptide ligand and an E3 peptide ligand in the first and third inter-repeat loop from the N terminus (Figure 10D).
Figure 11 shows a schematic of a modular binding protein with four peptide ligands located in alternate inter-repeat loops. The binding sites are arrayed at 90° to each other.
Figure 12 shows a schematic of a modular binding protein engineered so that peptide ligands in alternate inter-repeat loops bind adjacent epitopes on the target.
Figure 13 shows the modelled structure of a hetero-bifunctional modular binding protein comprising TPR repeat domains, an LRH1-derived peptide ligand designed to bind target beta-catenin, and a p53-derived N-terminal peptide ligand designed to bind to the E3 ubiquitin ligase Mdm2.
Figure 14 shows a schematic of the combinatorial assembly of a module comprising a repeat domain and a terminal helical peptide ligand and a module comprising repeat domains and an inter-repeat loop peptide ligand to generate a modular binding protein.
Figure 15 shows examples of different modular binding protein formats: (i) shows the blank proteins; (ii) shows binding peptides inserted into one or more inter-repeat loops; (iii) shows helical binding peptides at one or both of the termini; (iv) is a combination of loop and helical binding peptides; (v) and (vi) show examples of how multivalency can be achieved.
Figure 16 shows a schematic of the assembly of a modular binding protein by the progressive screening of modular binding proteins comprising modules with a diverse peptide ligand in addition to modules already identified in previous rounds of screening.
Figure 17 shows the effect of designed multi-valent tankyrase-binding TPR proteins on Wnt signalling. HEK293T cells were transfected with TPR-encoding plasmids using Lipofectamine2000. The TPR proteins contained 1-4 copies of a tankyrase-binding peptide (TBP) grafted onto the inter-repeat loop(s). For example, 2TBP-CTPR4 is a protein comprising 4 TPR modules with one TBP grafted onto the loop between the first and second TPR and one between the third and fourth TPR. ‘Foldon’ indicates a trimeric TPR-foldon fusion protein.
Figure 18 shows characterisation of the size and charge of liposome-encapsulated TPR proteins.
Figure 19 shows the delivery of TPR proteins into cells by liposome encapsulation. FITC dye-labelled liposomes stain the cell membrane upon membrane fusion (red panel), and RITC-labelled TPR protein cargo is then delivered into the cytoplasm. The green panel and red-green merge show that the proteins have entered the cells and are spread diffusely in the cytoplasm.
Figure 20 shows that liposome-encapsulated TPR proteins are not toxic to HEK293T cells at the concentrations used.
Figure 21 shows the effect of designed TPR proteins delivered by liposome encapsulation. The TPR proteins contained a tankyrase-binding peptide. Cells were treated with liposomes for 2 hr.
Figure 22 shows the effect of designed TPR proteins delivered by liposome encapsulation. The TPR proteins contained a tankyrase-binding peptide. Cells were treated with liposomes encapsulating 32 mg protein for variable times (2-8 h) indicated in the figure.
Figure 23 shows the effect of designed hetero-bifunctional TPR proteins on KRAS levels in HEK 293T cells. The TPR proteins contained a binding sequence for KRAS (a non-helical peptide sequence, referred to as KBL, grafted onto an inter-repeat loop of the RTPR) and a degron derived from p27 grafted onto another inter-repeat loop. Cells were transiently transfected with 50 ng or 500 ng of TPR-encoding plasmids, as indicated, and with KRAS plasmid or empty vector as control. 24 hours post transfection the cells were lysed, and KRAS levels were evaluated by western blot. In dark grey are cells transfected with single-function TPR plasmid (containing degron only).
Figure 24 shows the effect of designed hetero-bifunctional TPR proteins (delivered by liposome encapsulation) on KRAS levels. The TPR proteins contained a KRAS-binding peptide and a SCFSkp2-binding peptide to direct KRAS for ubiquitination and subsequent degradation. Cells were treated with liposomes for 2 hr.
Figure 25 shows the effect of designed hetero-bifunctional TPR proteins (delivered by liposome encapsulation) on KRAS levels. The TPR proteins contained a KRAS-binding peptide and a SCFSkp2-binding peptide to direct KRAS for ubiquitination and subsequent degradation. Cells were treated with TPR protein for variable times (2-8 h) indicated in the figure.
Figure 26 shows the effect of hetero-bifunctional TPR proteins targeting endogenous KRAS to the CMA (chaperone-mediated autophagy) pathway. The TPR proteins contained a binding sequence for KRAS (either a grafted helix derived from son-of-sevenless-homolog 1 (SOS) or a non-helical peptide sequence (referred to as ‘KBL’) displayed in a loop of the RTPR) and targeted for degradation using two different chaperone-mediated autophagy peptides (referred to as ‘CMA_Q’ or ‘CMA_K’) at the N- or C-terminus of the construct. Constructs or empty vector (light grey) were transiently transfected into either HEK293T or DLD1 (colorectal cancer cell line). 24 hours post transfection the cells were lysed, and KRAS levels were evaluated by western blot. Those constructs that resulted in significant reduction in KRAS compared to the empty vector control are shown in white.
Figure 27 shows examples of variations in the linker sequence connecting a peptide ligand to an inter-repeat loop in order to optimise the binding affinity for the target. The example shown is Nrf-TPR, a TPR protein designed to bind to Keap1 (see Fig. 7 of the original patent application). Glycine residues were introduced into the linker to provide flexibility and increased spatial sampling. The introduction of this more flexible linker sequence was found to increase the binding affinity of the Nrf-TPR protein (labelled ‘Flexible’) when compared with the consensus-like linker sequence. Altering the charge content of the linker sequence (labelled ‘Charged’) and altering the conformational properties (based on the predictions of the program CIDER (Holehouse et al. Biophys. J. 112, 16-21 (2017))) of the loop by changing the amino acid composition of the linker sequence (labelled ‘CIDER-optimised’) also affected the Keap1-binding affinity.
Figure 28 shows the results of a screen for Beta Catenin degradation by bispecific CTPR constructs using the “Phospho” peptide ligand that binds to Beta Catenin. The structures and sequences of the CTPR constructs are shown in Tables 12 and 13.
Figure 29 shows the results of a screen for Beta Catenin degradation by bispecific CTPR constructs using the “AXIN” peptide ligand that binds to Beta Catenin. The structures and sequences of the CTPR constructs are shown in Tables 12 and 13.
Figure 30 shows the results of a screen for Beta Catenin degradation by bispecific CTPR constructs using the “BCL9” peptide ligand that binds to Beta Catenin. The structures and sequences of the CTPR constructs are shown in Tables 12 and 13.
Figure 31 shows the results of a screen for Beta Catenin degradation by bispecific CTPR constructs using the “LRH1” peptide ligand that binds to Beta Catenin. The structures and sequences of the CTPR constructs are shown in Tables 12 and 13.
Detailed Description
This invention relates to modular binding proteins that comprise multiple repeat domains. These repeat domains are linked to each other in the polypeptide chain by inter-repeat loops. One or more peptide ligands, or binding domains, are located in one or more of the inter-repeat loops and/or in N or C terminal helices of the modular binding protein. The peptide ligands may bind to the same or different target molecules and the modular binding protein may be multi-functional and/or multi-valent. The geometrical display of the grafted binding sites may be precisely and predictably tuned by adjusting the positions of the binding sites and the number and shape of the repeat domains. Modular binding proteins as described herein may be useful in a range of therapeutic and diagnostic applications.
A repeat domain is a repetitive structural element of 30 to 100 amino acids that forms a defined secondary structure. Multiple repeat domains stack sequentially in a modular fashion to form a stable protein, which may for example have a solenoid or toroid structure. Repeat domains may be synthetic or may be naturally-occurring repeats from tandem repeat proteins, or variants thereof.
Due to the identical form of their building blocks, solenoid domains can only assume a limited number of shapes. Two main topologies are possible: linear (or open, generally with some degree of helical curvature) and circular (or closed). Patthy, Laszlo (2007). Protein Evolution. Wiley-Blackwell. ISBN 978-1-4051-5166-5.
If the two terminal repeats in a solenoid do not physically interact, it leads to an open or linear structure. Members of this group are frequently rod- or crescent-shaped. The number of individual repeats can range from 2 to over 50. A clear advantage of this topology is that both the N- and C-terminal ends are free to add new repeats and folds, or even remove existing ones during evolution without any gross impact on the structural stability of the entire domain. Kinch LN, Grishin NV (June 2002). Curr. Opin. Struct. Biol. 12 (3): 400-8. doi:10.1016/s0959-440x(02)00338-x. PMID 12127461. This type of domain is extremely common among extracellular segments of receptors or cell adhesion molecules. A non-exhaustive list of examples includes: EGF repeats, cadherin repeats, leucine-rich repeats, HEAT repeats, ankyrin repeats, armadillo repeats, tetratricopeptide repeats, etc. Whenever a linear solenoid domain structure participates in protein-protein interactions, frequently at least 3 or more repetitive subunits form the ligand-binding sites. Thus - while individual repeats might have a (limited) ability to fold on their own - they usually cannot perform the functions of the entire domain alone.
In the case when the N- and C-terminal repeats lie in close physical contact in a solenoid domain, the result is a topologically compact, closed structure. Such domains typically display a high rotational symmetry (unlike open solenoids that only have translational symmetries), and assume a wheel-like shape. Because of the limitations of this structure, the number of individual repeats is not arbitrary. In the case of WD40 repeats (perhaps the largest family of closed solenoids) the number of repeats can range from 4 to 10 (more usually between 5 and 7). (Vogel C, Berzuini C, Bashton M, Gough J, Teichmann SA (February 2004). J. Mol. Biol. 336 (3): 809-23). Kelch repeats, beta-barrels and beta-trefoil repeats are further examples of this architecture.
A repeat domain may have the structure of a solenoid repeat. The structures of solenoid repeats are well known in the art (see for example Kobe & Kajava Trends in Biochemical Sciences 2000; 25(10):509-15). For example, a repeat domain may have an α/α or α/3₁₀ (helix-turn-helix or hth) structure, for example a tetratricopeptide repeat structure; an α/α/α (helix-turn-helix-turn-helix or hthth) structure, for example an armadillo repeat structure; a β/β/α/α structure; an α/β or 3₁₀/β structure, for example a leucine rich repeat (LRR) structure; a β/β/β structure, for example, an IGF1RL, HPR or PelC repeat structure; or a β/β structure, for example a serralysin or EGF repeat structure.
A“scaffold” refers to two or more repeat domains, and a“grafted scaffold” refers to a chimeric protein, which is a continuous polypeptide comprising a scaffold and a heterologous binding site (e.g., a peptide ligand).
The ankyrin repeat, one of the most widespread protein motifs in nature, consists of 30-34 amino acid residues and functions exclusively to mediate protein-protein interactions, some of which are directly involved in the development of human cancer and other diseases. Each ankyrin repeat exhibits a helix-turn-helix conformation, and strings of such tandem repeats are packed in a nearly linear array to form helix-turn-helix bundles with relatively flexible loops. The global structure of an ankyrin repeat protein is mainly stabilized by intra- and inter-repeat hydrophobic and hydrogen bonding interactions. The repetitive and elongated nature of ankyrin repeat proteins provides the molecular basis of their unique characteristics. Examples of ANK repeats suitable for use as described herein are shown in Table 14.
Consensus Sequences for ANK repeats (SMART database, see Table 14) include the following:
004242/1 -30 NGHTALHIAASK - GDEQCVKLLLEHGA - DPNA (SEQ ID NO: 1)
CONSENSUS/80% . t . sslhhsh . t. tp . phhphllp . t. pht (SEQ ID NO: 2)
CONSENSUS/65% pstosLphAstp. sphphlphLlptss. shsh (SEQ ID NO: 3)
CONSENSUS/50% sGpTsLHhAsps. sshdlchLlspus. slst (SEQ ID NO: 4)
The armadillo (Arm) repeat is an approximately 40 amino acid long tandemly repeated sequence motif first identified in the Drosophila melanogaster segment polarity gene armadillo involved in signal transduction through wingless. Animal Arm-repeat proteins function in various processes, including intracellular signaling and cytoskeletal regulation, and include such proteins as beta-catenin, the junctional plaque protein plakoglobin, the adenomatous polyposis coli (APC) tumour suppressor protein, and the nuclear transport factor importin-alpha, amongst others [(PUBMED:9770300)].
Examples of Arm repeats suitable for use as described herein are shown in Table 15.
Consensus sequences for ARM repeats (SMART database, see Table 15) include the following:
IM02HUMANb PND-KIQAVIDAG-VCRRLVELLM - HNDYKWSPALRA (SEQ ID NO: 5)
CONSENSUS/80% pt ... h .. hhp . t .. hi .. lhphlt. p . pi . t . shhs (SEQ ID NO: 6)
CONSENSUS/65% ssp . ptphlhpts .. slshLlpLLp. pts . plhptsshs (SEQ ID NO: 7)
CONSENSUS/50% ssc . sppsllcsG . slstLlpLLs. sscsclppsAstA (SEQ ID NO: 8)
IM02HUMANb VGNIVT (SEQ ID NO: 9)
CONSENSUS/80% ltpls. (SEQ ID NO: 10)
CONSENSUS/65% LpNlst (SEQ ID NO: 11)
CONSENSUS/50% LsNlus (SEQ ID NO: 12)
Suitable repeat domains may include domains of the Ankyrin clan (Pfam: CL0465), such as ankyrin (PF00023), which may comprise a 30-34 amino-acid repeat composed of two beta strands and two alpha helices; domains of the leucine-rich repeat (LRR) clan (Pfam: CL0022), such as LRR1 (PF00560), which may comprise a 20-30 amino acid repeat composed of an α/β horseshoe fold; domains of the Pec Lyase-like (CL0268) clan, such as pec lyase C (PF00544), which may comprise a right handed beta helix; domains of the beta-Roll (CL0592) clan such as Haemolysin-type calcium-binding repeat (PF00353), which may comprise short repeat units (e.g. 9-mers) that form a beta-roll made up of a super-helix of beta-strand-turns of two short strands each, stabilised by Ca2+ ions; domains of the PSI clan (CL0630), such as trefoil (PF00088); and domains of the tetratricopeptide clan (CL0020), such as TPR-1 (PF00515), which may comprise a 24 to 90 amino acid repeat composed of a helix-turn-helix.
Suitable repeat domains may be identified using the PFAM database (see for example Finn et al Nucleic Acids Research (2016) Database Issue 44:D279-D285).
In some preferred embodiments, the repeat domain may have the structure of an α/α-solenoid repeat domain, such as a helix-turn-helix. A helix-turn-helix domain comprises two antiparallel α-helices of 12-45 amino acids.
Suitable helix-turn-helix domains include tetratricopeptide-like repeat domains. Tetratricopeptide-like repeats may include domains of the TPR clan (CL0020), for example Arm domains (see for example Armadillo; PF00514; Huber et al Cell 1997;90: 871-882), HEAT domains (Huntingtin, EF3, PP2A, TOR1; PF02985; see for example Groves et al. Cell. 96 (1): 99-110), PPR domains (pentatricopeptide repeat PF01535; see for example Small (2000) Trends Biochem. Sci. 25 (2): 46-7), TALE domains (TAL (transcription activator-like) effector; PF03377; see for example Zhang et al Nature Biotechnology. 29 (2): 149-53) and TPR1 domains (tetratricopeptide repeat-1; PF00515; see for example Blatch et al BioEssays. 21 (11): 932-9).
Other suitable helix-turn-helix domains may be synthetic, for example DHR1 to DHR83 as disclosed in Brunette et al., Nature 2015 528 580-584. In some preferred embodiments, the helix-turn-helix scaffold may be a tetratricopeptide repeat domain (TPR) (D’Andrea & Regan, 2003) or a variant thereof. TPR repeat domains may include naturally occurring or synthetic TPR domains. Suitable TPR repeat domains are well known in the art (see for example Parmeggiani et al., J. Mol. Biol. 427 563-575) and may have the amino acid sequence:
AEAWYNLGNAYYKQGDYQKAIEYYQKALEL-X1X2X3X4 (SEQ ID NO: 13) wherein X1-4 are independently any amino acid, preferably X1 and X2 being D and P respectively, or may be a variant of this sequence. Other TPR repeat domain sequences are shown in Tables 4-6 and 11 below.
Additional TPR repeat consensus sequences (SMART database, see Table 11 (SEQ ID NOs: 816-1229)) include the following:
S75991 ALTLNNIGTI YYAREDYDQA LNYYEQALSL SRAV (SEQ ID NO: 14)
CONSENSUS/80% XXhhXthuXh hXXXtphppA htxhppsltht XpX (SEQ ID NO: 15)
CONSENSUS/65% spshhphGth hhphsphppA lphappAlpl pspX (SEQ ID NO: 16)
CONSENSUS/50% spsatslGps atptucaccA lcsap+ALcl sPss (SEQ ID NO: 17)
Other TPR repeat domain sequences are shown in Tables 4 (SEQ ID NOs 168-181), 6 (SEQ ID NOs 221-243) and 11 (SEQ ID NOs 811-1233) below.
The grouping of amino acids to classes and class abbreviation (the key) used within consensus sequences are shown below
[Figure imgf000013_0001: key to amino acid classes and class abbreviations; image not reproduced]
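As an illustration of how such a class key is applied, the following minimal sketch compares a repeat sequence with a consensus string position by position. The class table is an assumption: it follows the key commonly published alongside SMART alignments and is not taken from the figure of this document, whose own key governs. The function names are hypothetical. Upper-case consensus symbols are read as literal residues and "." as any residue; the symbol X used in some consensus lines is not interpreted here.

```python
# Assumed SMART-style class key (NOT taken from this document's figure).
CLASSES = {
    "o": "ST",               # alcohol
    "l": "ILV",              # aliphatic
    "a": "FHWY",             # aromatic
    "c": "DEHKR",            # charged
    "h": "ACFGHIKLMRTVWY",   # hydrophobic
    "-": "DE",               # negative
    "p": "CDEHKNQRST",       # polar
    "+": "HKR",              # positive
    "s": "ACDGNPSTV",        # small
    "u": "AGS",              # tiny
    "t": "ACDEGHKNQRST",     # turn-like
}


def position_matches(residue: str, symbol: str) -> bool:
    """Return True if one residue satisfies one consensus symbol."""
    if symbol == ".":
        return True                      # any residue
    if symbol.isupper():
        return residue == symbol         # literal, fully conserved residue
    return residue in CLASSES.get(symbol, "")


def consensus_matches(sequence: str, consensus: str) -> list:
    """Per-position match flags; spaces in either string are ignored."""
    seq = sequence.replace(" ", "")
    con = consensus.replace(" ", "")
    if len(seq) != len(con):
        raise ValueError("sequence and consensus must have equal length")
    return [position_matches(r, s) for r, s in zip(seq, con)]


# SEQ ID NO: 14 against the 50% consensus of SEQ ID NO: 17.
flags = consensus_matches(
    "ALTLNNIGTI YYAREDYDQA LNYYEQALSL SRAV",
    "spsatslGps atptucaccA lcsap+ALcl sPss",
)
mismatches = [i + 1 for i, ok in enumerate(flags) if not ok]
print(mismatches)  # 1-based positions that fall outside the consensus class
```

A 50% consensus is not expected to match every position of an individual repeat, so the list of mismatching positions is generally non-empty.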
Preferred TPR domains may include CTPR, RTPRa, RTPRb and KTPRb domains, for example a domain having a sequence shown in Table 4 (SEQ ID NOs 170-183) or Table 6 (SEQ ID NOs: 223-246) or Table 11 (SEQ ID NOs: 816-1229) or a variant of a sequence shown in Table 4 (SEQ ID NOs 170-183) or Table 6 (SEQ ID NOs: 223-246) or Table 11 (SEQ ID NOs: 816-1229).
In some embodiments, a TPR repeat domain may be a human TPR repeat domain, preferably a TPR repeat domain from a human protein in blood. TPR repeat domains from human blood may have reduced immunogenicity in vivo. Suitable human blood TPR repeat domains may include repeat domains from IFIT1, IFIT2 or IFIT3. Other examples of human blood repeat domains identified in the plasma proteome database are shown in Table 5 (SEQ ID NOs: 184-222).
Suitable human blood repeat domains may be identified from the plasma proteome database (Nanjappa et al Nucl Acids Res 2014 Jan;42(Database issue):D959-65) for example by searching for sequences with high sequence identity to the TPR repeat domain using standard sequence analysis tools (e.g. Altschul et al Nucleic Acids Res. 25:3389-3402; Altschul et al FEBS J. 272:5101-5109).
In some embodiments, the two or more repeat domains of a modular binding protein described herein may comprise a small, glutamine-rich, tetratricopeptide repeat protein alpha (SGTA) domain. The SGTA domain is involved in protein quality control pathways and comprises three TPR repeat domains and a C-terminal capping helix. The SGTA domain binds to the Rpn13 substrate receptor of the proteasome. A modular binding protein comprising the SGTA domain may bind to proteasome receptors through the SGTA domain. One or more peptide ligands for a target molecule grafted into the SGTA domain of a modular binding protein as described herein may allow the target molecule to be delivered directly to the proteasome for degradation by the modular binding protein.
A suitable SGTA domain may have the sequence:
EDSAEAERLKTEGNEQMKVENFEAAVHFYGKAIELNPANAVYFCNRAAAYSKLGNYAGAVQDCERAICI
DPAYSKAYGRMGLALSSLNKHVEAVAYYKKALELDPDNETYKSNLKIAELKLREAP
(SEQ ID NO: 18) (domains TPR1-A, TPR1-B, TPR2-A, TPR2-B, TPR3-A, TPR3-B (sequentially) solid underlined, C terminal helix dotted underlined)
or may be a variant of this sequence.
A suitable SGTA domain may be encoded by the nucleotide sequence:
gaagatagcgcggaagcggaacgcctgaaaaccgaaggcaacgaacagatgaaagtggaaaactttgaa gcggcggtgcatttttatggcaaagcgattgaactgaacccggcgaacgcggtgtatttttgcaaccgc gcggcggcgtatagcaaactgggcaactatgcgggcgcggtgcaggattgcgaacgcgcgatttgcatt gatccggcgtatagcaaagcgtatggccgcatgggcctggcgctgagcagcctgaacaaacatgtggaa gcggtggcgtattataaaaaagcgctggaactggatccggataacgaaacctataaaagcaacctgaaa attgcggaactgaaactgcgcgaagcgccg (SEQ ID NO: 19)
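As a consistency check between the nucleotide and amino acid listings, the coding sequence of SEQ ID NO: 19 can be translated with the standard genetic code. The following is a minimal sketch; the function and variable names are hypothetical, and the codon table is built from the conventional TCAG ordering of the standard code rather than taken from this document.

```python
from itertools import product

# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str) -> str:
    """Translate an in-frame DNA coding sequence; whitespace is ignored."""
    dna = "".join(dna.split()).upper()
    if len(dna) % 3 != 0:
        raise ValueError("sequence length is not a multiple of three")
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))


# SEQ ID NO: 19 as listed above (spaces are layout only).
SEQ_ID_NO_19 = """
gaagatagcgcggaagcggaacgcctgaaaaccgaaggcaacgaacagatgaaagtggaaaactttgaa
gcggcggtgcatttttatggcaaagcgattgaactgaacccggcgaacgcggtgtatttttgcaaccgc
gcggcggcgtatagcaaactgggcaactatgcgggcgcggtgcaggattgcgaacgcgcgatttgcatt
gatccggcgtatagcaaagcgtatggccgcatgggcctggcgctgagcagcctgaacaaacatgtggaa
gcggtggcgtattataaaaaagcgctggaactggatccggataacgaaacctataaaagcaacctgaaa
attgcggaactgaaactgcgcgaagcgccg
"""

protein = translate(SEQ_ID_NO_19)
print(len(protein))  # 125
print(protein)
# EDSAEAERLKTEGNEQMKVENFEAAVHFYGKAIELNPANAVYFCNRAAAYSKLGNYAGAVQDCERAICI
# DPAYSKAYGRMGLALSSLNKHVEAVAYYKKALELDPDNETYKSNLKIAELKLREAP  (printed as one line)
```

The translation contains no stop codon and has 125 residues, the length expected for the 375-nucleotide coding sequence.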
A variant of a reference repeat domain or peptide ligand sequence set out herein may comprise an amino acid sequence having at least 50%, at least 60%, at least 70%, at least 80%, at least 90%, at least 95%, or at least 98% sequence identity to the reference sequence. Particular amino acid sequence variants may differ from a repeat domain shown above by insertion, addition, substitution or deletion of 1 amino acid, 2, 3, 4, 5, 6, 7, 8, 9, or 10 or more than 10 amino acids. Preferred variants of a TPR repeat domain may comprise one or more conserved residues, for example, 1, 2, 3, 4, 5, 6 or more preferably all of Leu at position 7, Gly or Ala at position 8, Tyr at position 11, Ala at position 20, Ala at position 27, Leu or Ile at positions 28 and 30 and Pro at position 32.
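The conserved-residue criterion above can be checked mechanically. The sketch below is illustrative only: the names are hypothetical, positions are 1-based as in the text, and the example repeat is SEQ ID NO: 13 with X1X2 set to DP (the stated preference) and X3X4 set arbitrarily to NN.

```python
# Conserved positions (1-based) and the residues allowed at each,
# as listed in the paragraph above.
CONSERVED = {
    7: "L",
    8: "GA",
    11: "Y",
    20: "A",
    27: "A",
    28: "LI",
    30: "LI",
    32: "P",
}


def conserved_positions_present(repeat: str) -> list:
    """Return the conserved positions whose residue is one of those allowed."""
    return [
        pos
        for pos, allowed in CONSERVED.items()
        if len(repeat) >= pos and repeat[pos - 1] in allowed
    ]


# SEQ ID NO: 13 with X1X2 = DP; X3X4 = NN is an arbitrary choice for illustration.
repeat = "AEAWYNLGNAYYKQGDYQKAIEYYQKALEL" + "DPNN"
print(conserved_positions_present(repeat))  # [7, 8, 11, 20, 27, 28, 30, 32]
```

A candidate variant can then be accepted or rejected according to how many of the eight positions are retained.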
Sequence similarity and identity are commonly defined with reference to the algorithm GAP (Wisconsin Package, Accelrys, San Diego USA). GAP uses the Needleman and Wunsch algorithm to align two complete sequences in a way that maximizes the number of matches and minimizes the number of gaps. Generally, default parameters are used, with a gap creation penalty = 12 and gap extension penalty = 4. Use of GAP may be preferred but other algorithms may be used, e.g. BLAST (which uses the method of Altschul et al. (1990) J. Mol. Biol. 215: 405-410), FASTA (which uses the method of Pearson and Lipman (1988) PNAS USA 85: 2444-2448), or the Smith-Waterman algorithm (Smith and Waterman (1981) J. Mol Biol. 147: 195-197), or the TBLASTN program, of Altschul et al. (1990) supra, generally employing default parameters. In particular, the psi-Blast algorithm (Nucl. Acids Res. (1997) 25 3389-3402) may be used. Computerized implementations of these algorithms (GAP, BESTFIT, FASTA, and TFASTA in the Wisconsin Genetics Software Package, Genetics Computer Group, 575 Science Dr., Madison, WI) are available, and publicly available computer software may be used such as ClustalOmega (Soding, J. 2005, Bioinformatics 21, 951-960), T-coffee (Notredame et al. 2000, J. Mol. Biol. (2000) 302, 205-217), Kalign (Lassmann and Sonnhammer 2005, BMC Bioinformatics, 6(298)), Genomequest™ software (Gene-IT, Worcester MA USA) and MAFFT (Katoh and Standley 2013, Molecular Biology and Evolution, 30(4) 772-780) software. When using such software, the default parameters, e.g. for gap penalty and extension penalty, are preferably used. Preferred examples of algorithms suitable for determining percent sequence identity and sequence similarity are the BLAST and BLAST 2.0 algorithms, which are described in Altschul et al., Nuc. Acids Res. 25:3389-3402 (1997) and Altschul et al., J. Mol. Biol. 215:403-410 (1990), respectively.
Sequence comparison may be made over the full-length of the relevant sequence described herein.
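By way of illustration only, a full-length (global) comparison of two sequences can be sketched with a simplified Needleman and Wunsch alignment. This sketch is not the GAP program: it assumes a plain match/mismatch score and a single linear gap penalty rather than the separate gap creation and gap extension penalties mentioned above, and it assumes percent identity is computed as identical columns divided by alignment length (other conventions exist). The function names and scoring values are hypothetical.

```python
def needleman_wunsch(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -2):
    """Globally align a and b with a linear gap penalty; return the aligned strings."""
    n, m = len(a), len(b)
    # score[i][j] is the best score for aligning a[:i] with b[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(
                score[i - 1][j - 1] + sub,  # align the two residues
                score[i - 1][j] + gap,      # gap in b
                score[i][j - 1] + gap,      # gap in a
            )
    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            sub = match if a[i - 1] == b[j - 1] else mismatch
            if score[i][j] == score[i - 1][j - 1] + sub:
                out_a.append(a[i - 1])
                out_b.append(b[j - 1])
                i -= 1
                j -= 1
                continue
        if i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b))


def percent_identity(a: str, b: str) -> float:
    """Identical aligned columns as a percentage of the alignment length."""
    x, y = needleman_wunsch(a, b)
    if not x:
        return 0.0
    matches = sum(1 for p, q in zip(x, y) if p == q and p != "-")
    return 100.0 * matches / len(x)


# The 30-residue cores of SEQ ID NO: 20 and SEQ ID NO: 21.
seq20 = "AEAWYNLGNAYYRQGDYQRAIEYYQRALEL"
seq21 = "AEALNNLGNVYREQGDYQRAIEYYQRALEL"
print(round(percent_identity(seq20, seq21), 1))
```

For values comparable to those produced by GAP or BLAST, the named software with its default parameters would be used instead of this sketch.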
For example, a repeat domain may comprise one or more point mutations to facilitate grafting of hydrophobic peptide ligands. For example, aromatic residues in the repeat domain may be replaced with polar or charged residues. Suitable substitutions may be identified in a rational manner, for example using Hidden Markov plots of repeat domain sequences to identify non-aromatic residues that are found in nature in consensus aromatic positions. A suitable TPR repeat domain for grafting a hydrophobic peptide ligand may have the amino acid sequence:
AEAWYNLGNAYYRQGDYQRAIEYYQRALEL-X1X2X3X4 (SEQ ID NO: 20) wherein X1-4 are independently any amino acid, preferably X1 and X2 being D and P respectively.
In some embodiments, lysine residues in the repeat domain may be replaced by arginine residues to prevent ubiquitination and subsequent degradation. This may be particularly useful when the modular binding protein comprises an E3 ubiquitin ligase-peptide ligand, for example in a proteolysis targeting chimera (PROTAC). For example, a suitable TPR repeat domain may have the amino acid sequence:
AEALNNLGNVYREQGDYQRAIEYYQRALEL-X1X2X3X4 (SEQ ID NO: 21) wherein X1-4 are independently any amino acid, preferably X1 and X2 being D and P respectively.
In some embodiments, a modular binding protein may comprise a scaffold with the amino acid sequence of residues 1 to 171 of SEQ ID NO: 1230 (PPX172 of Table 13 without the HA Tags). A target peptide ligand and the E3 ligase peptide ligand may be grafted into any two of the following: the N terminal helical grafting site 1 (corresponding to residues 1 to 2 of SEQ ID NO: 1230; before the first CTPR repeat); loop grafting site 2 (corresponding to residues 35 to 36 of SEQ ID NO: 1230; between the first and second CTPR repeats); loop grafting site 3 (corresponding to residues 69 to 70 of SEQ ID NO: 1230; between the second and third CTPR repeats); loop grafting site 4 (corresponding to residues 103 to 104 of SEQ ID NO: 1230; between the third and fourth CTPR repeats); loop grafting site 5 (corresponding to residues 137 to 138 of SEQ ID NO: 1230; between the fourth and fifth CTPR repeats); and loop grafting site 6 (corresponding to residues 167 to 168 of SEQ ID NO: 1230; after the last CTPR repeat). DPNN peptides may be placed before and after a peptide ligand to force a turn in the structure.
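The six grafting sites can be represented as a simple lookup table. The sketch below is illustrative only: it assumes that grafting means inserting the peptide ligand between the two residues named for each site, it uses a placeholder string of 171 glycines in place of the actual scaffold sequence (which is not reproduced here), and the names `GRAFT_SITES` and `graft` are hypothetical.

```python
# Grafting sites as (left, right) 1-based residue pairs of the 171-residue scaffold,
# taken from the residue numbers given in the text.
GRAFT_SITES = {
    1: (1, 2),      # N terminal helical site, before the first repeat
    2: (35, 36),    # loop between first and second repeats
    3: (69, 70),    # loop between second and third repeats
    4: (103, 104),  # loop between third and fourth repeats
    5: (137, 138),  # loop between fourth and fifth repeats
    6: (167, 168),  # after the last repeat
}


def graft(scaffold: str, inserts: dict, flank: str = "") -> str:
    """Insert peptide ligands between the residue pairs of the chosen sites.

    inserts maps a site number to a peptide sequence; flank is an optional
    sequence (e.g. DPNN) placed before and after every ligand.
    """
    result = scaffold
    # Work from the C terminal end so earlier insertions do not shift later sites.
    for site in sorted(inserts, reverse=True):
        left, _right = GRAFT_SITES[site]
        result = result[:left] + flank + inserts[site] + flank + result[left:]
    return result


# Placeholder scaffold (an assumption): NOT the sequence of SEQ ID NO: 1230.
scaffold = "G" * 171
construct = graft(
    scaffold,
    {2: "YEQAIAAYLDALMC", 4: "FAAYWNLLSAYG"},  # SEQ ID NOs: 43 and 66
    flank="DPNN",
)
print(len(construct))
```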
In preferred embodiments, the modular binding protein may comprise 2, 3, 4, 5, 6, 7, 8, 9, 10 or more than 10 repeat domains. Preferably, the modular binding protein comprises 2 to 5 repeat domains. Modular binding proteins with fewer repeat domains may display increased cell penetration. For example, a modular binding protein with 2-3 repeat domains may be useful in binding intracellular target molecules. Modular binding proteins with more repeat domains may display increased stability and functionality. For example, a modular binding protein with 4 or more repeat domains may be useful in binding extracellular target molecules. A modular binding protein with 6 or more repeat domains may be useful in producing long linear molecules for targeting or assembling extracellular complexes in bi- or multivalent formats.
In other embodiments, sufficient stability and functionality may be conferred by a single repeat domain with N and C terminal peptide ligands. For example, a modular binding protein may comprise:
(i) a repeat domain, and
(ii) peptide ligands at the N and C terminal of the repeat domain.
The repeat domains of a modular binding protein may lack binding activity i.e. the binding activity of the modular binding protein is mediated by the peptide ligands and not by residues within the repeat domains.
A peptide ligand is a contiguous amino acid sequence that specifically binds to a target molecule. Suitable peptide ligands that are capable of grafting onto a terminal helix or inter-repeat loop are well-known in the art and include peptide sequences selected from a library, antigen epitopes, natural protein-protein interactions (helical, extended or turn-like) and short linear motifs (SLiMs). Viral SLiMs (that hijack the host machinery) may be particularly useful because they may display high binding affinities (Davey et al (2011) Trends Biochem. Sci. 36, 159-169).
A suitable peptide ligand for a target molecule may be selected from a library, for example using phage or ribosome display, or identified or designed using rational approaches or computational design, for example using the crystal structure of a complex or an interaction. In some embodiments, peptide ligands may be identified in an amino acid sequence using standard sequence analysis tools (e.g. Davey et al Nucleic Acids Res. 2011 Jul 1; 39(Web Server issue): W56-W60).
Peptide ligands may be 5 to 25 amino acids in length, preferably 8 to 15 amino acids, although in some embodiments, longer peptide ligands may be employed.
The peptide ligands and the repeat domains of the modular binding protein are heterologous i.e. the peptide ligand is not associated with the repeat domain in naturally occurring proteins and the binding and repeat domains are artificially associated in the modular binding protein by recombinant means.
A modular binding protein described herein may comprise 1 to n+1 peptide ligands, where n is the number of repeat domains in the modular binding protein. The number of peptide ligands is determined by the required functionality and valency of the modular binding protein. For example, one peptide ligand may be suitable for a mono-functional modular binding protein and two or more peptide ligands may be suitable for a bi-functional or multi-functional modular binding protein.
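One way to read the upper limit of n+1 is that a protein with n repeat domains offers n-1 inter-repeat loops plus the two termini. The sketch below enumerates those positions under that assumption; the function name `ligand_sites` and the site labels are hypothetical.

```python
def ligand_sites(n_repeats: int) -> list[str]:
    """List the candidate ligand positions for a protein with n_repeats repeat domains."""
    if n_repeats < 1:
        raise ValueError("at least one repeat domain is required")
    sites = ["N-terminus"]
    # One inter-repeat loop between each pair of consecutive repeat domains.
    sites += [f"loop {k}" for k in range(1, n_repeats)]
    sites.append("C-terminus")
    return sites


for n in (1, 3, 5):
    print(n, "repeat domains:", len(ligand_sites(n)), "candidate sites")
```

For a scaffold of five repeats this gives six positions, which matches the six grafting sites described for the scaffold of SEQ ID NO: 1230.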
Modular binding proteins may be monovalent. A target molecule may be bound by a single peptide ligand in a monovalent modular binding protein. Modular binding proteins may be multivalent. A target molecule may be bound by two or more of the same or different peptide ligands in a multivalent modular binding protein.
Modular binding proteins may be monospecific. The peptide ligands in a monospecific modular binding protein may all bind to the same target molecule, more preferably the same site or epitope of the target molecule.
Modular binding proteins may be multi-specific. The peptide ligands in a multi-specific modular binding protein may bind to different target molecules. For example, a bi-specific modular binding protein may comprise one or more peptide ligands that bind to a first target molecule and one or more peptide ligands that bind to a second target molecule, and a tri-specific modular binding protein may comprise one or more peptide ligands that bind to a first target molecule, one or more peptide ligands that bind to a second target molecule and one or more peptide ligands that bind to a third target molecule. A bi-specific modular binding protein may bind to the two different target molecules concurrently. This may be useful in bringing the first and second target molecules into close proximity. When the target molecules are located on different cells, concurrent binding of the target molecules to the modular binding protein may bring the cells into close proximity, for example to promote or enhance the interaction of the cells. For example, a modular binding protein which binds to a tumour specific antigen and a T cell antigen, such as CD3, may be useful in bringing T cells into proximity to tumour cells. When the target molecules are from different biological pathways, this may be useful in achieving synergistic effects and also for minimising resistance.
A tri-specific modular binding protein may bind to three different target molecules concurrently. In some embodiments, one of the target molecules may be an E3 ubiquitin ligase. For example, a tri-specific modular binding protein may bind to a first target molecule from a first biological pathway and a second target molecule from a second biological pathway as well as an E3 ubiquitin ligase. This may be useful in achieving synergistic effects and also for minimising resistance.
A peptide ligand may be located in an inter-repeat loop of the modular binding protein.
An inter-repeat peptide ligand may comprise 5 to 25 amino acid residues, preferably 8 to 15 amino acids. However, since there is no intrinsic restriction on the size of the inter-repeat peptide ligand, longer sequences of more than 25 amino acid residues may be used in some embodiments.
In some embodiments, an unstructured peptide ligand may be inserted into an inter-repeat loop.
One or more, two or more, three or more, four or more or five or more of the inter-repeat loops in the modular binding protein may comprise peptide ligands. The peptide ligands may be located on consecutive inter-repeat loops or may have a different distribution in the inter-repeat loops of the modular binding protein. For example, inter-repeat loops comprising a peptide ligand may be separated in the modular protein by one or more, two or more, three or more or four or more inter-repeat loops which lack a peptide ligand.
A peptide ligand may be connected to an inter-repeat loop directly or via one or more additional residues or linkers. Additional residues or linkers may be useful for example when a peptide ligand requires conformational flexibility in order to bind to a target molecule, or when the amino acid residues that are adjacent to the minimal peptide ligand favourably influence the micro-environment of the binding interface.
Additional residues or linkers may be positioned at the N terminus of the peptide ligand, the C terminus of the peptide ligand, or both. For example, the sequence of an inter-repeat loop containing a peptide ligand may be [X1-i]-[X1-n]-[X1-z] (SEQ ID NO: 22), where each residue denoted by X is independently any amino acid and may be the same amino acid or a different amino acid to any other residue that is also denoted by X, [X1-n] is the peptide ligand, n is 1 to 100, [X1-i] is a linker and i is independently any number between 1 and 10. In some embodiments, D may be preferred at the first position of the linker [X1-i], P may be preferred at the second position of linker [X1-i], D may be preferred at the last position of the linker [X1-z] and/or P may be preferred at the penultimate position of linker [X1-z]. Examples of preferred inter-repeat loop sequences may include DP-[X1-n]-PX; DPXX-[X1-n]-XXPX (SEQ ID NO: 23); DPXX-[X1-n]-XPXX (SEQ ID NO: 24); DPXX-[X1-n]-PXXX (SEQ ID NO: 25); PXX-[X1-i]-[X1-n]-[X1-i]-XXPX (SEQ ID NO: 26), DPXX-[X1-i]-[X1-n]-[X1-i]-XPXX (SEQ ID NO: 27), DPXX-[X1-i]-[X1-n]-[X1-i]-PXXX (SEQ ID NO: 28), DPXX-[X1-i]-[X1-n]-XPXX (SEQ ID NO: 29), DPXX-[X1-i]-[X1-n]-XPXX (SEQ ID NO: 30), DPXX-[X1-i]-[X1-n]-XPXX (SEQ ID NO: 31), DPXX-[X1-n]-[X1-i]-XXPX (SEQ ID NO: 32), DPXX-[X1-n]-[X1-i]-XPXX (SEQ ID NO: 33) and DPXX-[X1-n]-[X1-i]-PXXX (SEQ ID NO: 34).
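Loop patterns of this kind map naturally onto regular expressions. The following sketch, offered as an illustration only, compiles a pattern of the form prefix-[linker]-ligand-[linker]-suffix in which X stands for any of the twenty standard amino acids. It assumes that each linker may be absent or up to 10 residues long; the function name `loop_regex`, its parameters, and the example loop sequences are hypothetical.

```python
import re

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def loop_regex(prefix: str, ligand: str, suffix: str, max_linker: int = 10) -> re.Pattern:
    """Compile a regex for prefix-[linker]-ligand-[linker]-suffix, where X is any residue."""
    any_aa = f"[{AMINO_ACIDS}]"

    def expand(motif: str) -> str:
        # Replace each X by the any-residue class and keep other letters literal.
        return "".join(any_aa if ch == "X" else re.escape(ch) for ch in motif)

    linker = f"{any_aa}{{0,{max_linker}}}"
    return re.compile(
        f"^{expand(prefix)}{linker}{re.escape(ligand)}{linker}{expand(suffix)}$"
    )


# Pattern of the DPXX-[linker]-[ligand]-[linker]-XPXX type, using the
# Nrf2-derived ligand DPETGEL (SEQ ID NO: 68) as the ligand.
pattern = loop_regex("DPXX", "DPETGEL", "XPXX")

print(bool(pattern.match("DPNNGSDPETGELGSNPNN")))  # hypothetical loop with GS linkers
print(bool(pattern.match("APNNDPETGELNPNN")))      # hypothetical loop lacking the leading D
```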
The precise sequence of the residues or linkers used to connect a peptide ligand to an inter-repeat loop depends on the peptide ligand and may be readily determined for any peptide ligand of interest using standard techniques. For example, small, non-hydrophobic amino acids, such as glycine, may be used to provide flexibility and increased spatial sampling, for example when a peptide ligand needs to adopt a specific conformation, or proline residues may be used to increase rigidity, for example, when the peptide ligands are short.
In some preferred embodiments, an inter-repeat peptide ligand may be non-hydrophobic. For example, at least 40% of the amino acids in the peptide ligand may be charged (e.g. D, E, R or K) or polar (e.g. Q, N, H, T, Y, C or W). Alternatively, the repeat domains may be modified to accommodate a hydrophobic peptide ligand, for example by replacing aromatic residues with charged or polar residues.
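The proportion of charged or polar residues in a candidate ligand can be computed directly. This is a minimal sketch that assumes the residue lists given above (D, E, R, K and Q, N, H, T, Y, C, W) are treated as exhaustive, although the text presents them only as examples; the names `nonhydrophobic_fraction` and `is_nonhydrophobic` are hypothetical.

```python
CHARGED = set("DERK")
POLAR = set("QNHTYCW")


def nonhydrophobic_fraction(peptide: str) -> float:
    """Fraction of residues that are charged or polar under the lists above."""
    if not peptide:
        raise ValueError("peptide must not be empty")
    return sum(aa in CHARGED | POLAR for aa in peptide) / len(peptide)


def is_nonhydrophobic(peptide: str, threshold: float = 0.40) -> bool:
    """True if at least `threshold` of the residues are charged or polar."""
    return nonhydrophobic_fraction(peptide) >= threshold


for name, seq in [("SEQ ID NO: 68", "DPETGEL"), ("SEQ ID NO: 66", "FAAYWNLLSAYG")]:
    print(name, round(nonhydrophobic_fraction(seq), 2), is_nonhydrophobic(seq))
```

The threshold is a parameter so that the same check can be applied with a different cut-off where one is specified, for example for terminal peptide ligands.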
A peptide ligand may be located at one or both termini of the modular binding protein.
In some embodiments, a peptide ligand located at the N or C terminus may comprise an α-helical structure and may comprise all or part of a half-repeat (i.e. all or part of a single α-helix) that stacks against the adjacent repeat domain. The α-helix of the terminal peptide ligand makes stabilising interactions with the adjacent repeat domain and is stable and folded. Only a few of the positions that structurally define an α-helix are required for the correct interfacial interaction with the adjacent repeat domain. The residues in some of these positions are defined (Tyr (i) - Ile (i+4) - Tyr (i+7) - Leu (i+11) for the N-terminal α-helix and Ala (i) - Leu (i+4) - Ala/Val (i+7) for the C-terminal helix), but the remaining positions of the α-helix may be modified to form a helical peptide ligand.
A helical peptide ligand may be located at the N terminus of the protein. The N terminal peptide ligand may be helical and may comprise all or part of the sequence Xn-(X)15-X1X2XX (SEQ ID NO: 35), preferably all or part of the sequence Xn-XYXXXIXXYXXXLXX-X1X2XX (SEQ ID NO: 36), where each residue denoted by X is independently any amino acid and may be the same amino acid or a different amino acid to any other residue in the sequence that is also denoted by X, X1 is independently any amino acid, preferably D, and X2 is independently any amino acid, preferably P, and n is 0 or any number. In some embodiments, the Y, I, and/or L residues in the N terminal peptide ligand may be substituted for an amino acid residue with similar properties (i.e. a conservative substitution).
A helical peptide ligand may be located at the C terminus of the protein. The C terminal peptide ligand may be helical and may comprise all or part of the sequence Xn-(X)15-X1X2XX (SEQ ID NO: 35), preferably all or part of the sequence X1X2XX-XXAXXXLXX[A or V]XXXXX-Xn (SEQ ID NO: 37 or SEQ ID NO: 38), where X is independently any amino acid and may be the same amino acid or a different amino acid to any other residue in the sequence that is also denoted by X, X1 is independently any amino acid, preferably D, and X2 is independently any amino acid, preferably P, and n is 0 or any number. In some embodiments, the A, L and/or V residues in the C terminal peptide ligand may be substituted for an amino acid residue with similar properties (i.e. a conservative substitution).
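The defined helical positions can be expressed as spacing patterns and checked with regular expressions. The sketch below is illustrative only: the spacings are read from the helix sequences XYXXXIXXYXXXLXX and XXAXXXLXX[A or V]XXXXX given above, conservative substitutions are not considered, and the names `N_HELIX`, `C_HELIX` and `helix_contacts` are hypothetical.

```python
import re

# N-terminal helix: Tyr, Ile four residues later, Tyr three after that, Leu four after that.
N_HELIX = re.compile(r"Y.{3}I.{2}Y.{3}L")
# C-terminal helix: Ala, Leu four residues later, then Ala or Val three after that.
C_HELIX = re.compile(r"A.{3}L.{2}[AV]")


def helix_contacts(seq: str, terminus: str):
    """Return the 0-based start of the first matching contact pattern, or None."""
    pattern = N_HELIX if terminus == "N" else C_HELIX
    hit = pattern.search(seq)
    return hit.start() if hit else None


# Helix templates with every X set to alanine (an arbitrary choice for illustration).
n_example = "AYAAAIAAYAAALAA"
c_example = "AAAAAALAAVAAAAA"
print(helix_contacts(n_example, "N"), helix_contacts(c_example, "C"))
```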
The minimum length of the terminal peptide ligand is determined by the number of residues required to form a helix that binds to the target molecule. There is no intrinsic maximum length of the terminal peptide ligand and n may be any number.
In other embodiments, a peptide ligand located at the N or C terminus may comprise a non-helical structure. For example, a peptide ligand that is an obligate N- or C-terminal domain (for example because the terminal amino or carboxylate group mediates the binding interaction) may be located at the beginning or end of the one or more repeat domains.
In some embodiments, one or more positions in a peptide ligand may be diverse or randomised. A modular binding protein comprising one or more diverse or randomised residues may form a library as described below.
In some embodiments, the N and C terminal peptide ligands may be non-hydrophobic. For example, at least 20% of the amino acids in the peptide ligand may be charged (e.g. D, E, R or K) or polar (e.g. Q, N, H, T, Y, C or W). Alternatively, the helix turn helix scaffold of the repeat domains may be modified, for example by replacing aromatic residues with charged or polar residues in order to accommodate a hydrophobic peptide ligand.
A modular binding protein as described herein may comprise peptide ligands in any arrangement or combination. For example, peptide ligands may be located at both the N and C terminus and optionally one or more inter-repeat loops of a modular binding protein; at the N terminus and optionally one or more loops of a modular binding protein; at the C terminus and optionally one or more loops of a modular binding protein; or in one or more inter-repeat loops of a modular binding protein. The location of the peptide ligands within a modular binding protein may be determined by rational design, for example using modelling to identify the optimal arrangement for the presentation of two target molecules to each other (e.g. for substrate presentation to an E3 ubiquitin ligase); and/or by screening, for example using populations of modular binding proteins with different arrangements of peptide ligands to identify the arrangement which confers the optimal interaction of target molecules.
Suitable target molecules for modular binding proteins described herein include biological macromolecules, such as proteins. The target molecule may be a receptor, enzyme, antigen, oligosaccharide, oligonucleotide, integral membrane protein, transcription factor, transcriptional regulator, G protein coupled receptor (GPCR) or any other target of interest. Proteins that are difficult to target with small molecules, such as PPIs, proteins that accumulate in neurodegenerative diseases and proteins overexpressed in disease conditions, such as cancer, may be particularly suitable target molecules. Target molecules may include α-synuclein; β-amyloid; tau; superoxide dismutase; huntingtin; β-catenin; KRAS; components of superenhancers and other types of transcriptional regulators, such as N-Myc, C-Myc, Notch, aurora A, EWS-FLI1 (Ewing’s sarcoma-friend leukemia integration 1), TEL-AML1, TAL1 (T-cell acute lymphocytic leukemia protein 1) and Sox2 ((sex determining region Y)-box 2); tankyrases; phosphatases such as PP2A; epigenetic writers, readers and erasers, such as histone deacetylases and histone methyltransferases; BRD4 and other bromodomain proteins; and kinases, such as PLK1 (polo-like kinase 1), c-ABL (Abelson murine leukemia viral oncogene homolog 1) and BCR (breakpoint cluster region)-ABL.
In some embodiments, a modular binding protein may neutralise a biological activity of the target molecule, for example by inhibiting or antagonising its activity or binding to another molecule or by tagging it for ubiquitination and proteasomal degradation or for degradation via autophagy. In other embodiments, a modular binding protein may activate a biological activity of the target molecule.
In some embodiments, the target molecule may be β-catenin. Suitable peptide ligands that specifically bind to β-catenin are well-known in the art and include β-catenin-peptide ligands derived from axin (e.g. GAYPEYILDIHVYRVQLEL (SEQ ID NO: 39) and variants thereof), Bcl-9 (e.g. SQEQLEHRYRSLITLYDIQLML (SEQ ID NO: 40) and variants thereof), TCF7L2 (e.g. QELGDNDELMHFSYESTQD (SEQ ID NO: 41) and variants thereof), ICAT (e.g. YAYQRAIVEYMLRLMS (SEQ ID NO: 42) and variants thereof), LRH-1 (e.g. YEQAIAAYLDALMC (SEQ ID NO: 43) and variants thereof) or APC (e.g. SCSEELEALEALELDE (SEQ ID NO: 44) and variants thereof).
In some embodiments, the target molecule may be KRAS. Suitable peptide ligands that specifically bind to KRAS are well-known in the art and include a KRAS-peptide ligand from SOS-1 (e.g. FEGIALTNYLKALEG (SEQ ID NO: 45) and variants thereof) and KRAS-peptide ligands identified by phage display (see for example Sakamoto et al. Biochem. Biophys. Res. Comm. (2017) 484, 605-611).
In some embodiments, the target molecule may be tankyrase. Suitable peptide ligands that specifically bind to tankyrase are well-known in the art and include tankyrase peptide ligands from Axin (e.g. REAGDGEE (SEQ ID NO: 46) and HLQREAGDGEEFRS (SEQ ID NO: 47) or variants thereof).
In some embodiments, the target molecule may be EWS-FLI1. Suitable peptide ligands that specifically bind to EWS-FLI1 are well-known in the art and include the ESAP1 peptide TMRGKKKRTRAN (SEQ ID NO: 48) and variants thereof. Other suitable sequences may be identified by phage display (see for example Erkizan et al. Cell Cycle (2011) 10, 3397-408).
In some embodiments, the target molecule may be Aurora-A. Suitable peptide ligands that specifically bind to Aurora-A are well-known in the art and include Aurora-A binding sequences from TPX2, such as SYSYDAPSDFINFSS (SEQ ID NO: 49) (Bayliss et al. Mol. Cell (2003) 12, 851-62) and Aurora-A binding sequences from N-myc, such as N-myc residues 19-47 or 61-89 (see for example Richards et al. PNAS (2016) 113, 13726-31).
In some embodiments, the target molecule may be N-Myc or C-Myc. Suitable peptide ligands that specifically bind to N-myc or C-myc are well-known in the art and include helical binding sequences from Aurora-A (see for example Richards et al. PNAS (2016) 113, 13726-31).
In some embodiments, the target molecule may be WDR5 (WD repeat-containing protein 5). Suitable peptide ligands that specifically bind to WDR5 are well-known in the art and include the WDR5-interacting motif (WIN) of MLL1 (mixed lineage leukemia protein 1) (see for example Song & Kingston J. Biol. Chem. (2008) 283, 35258-64; Patel et al. J. Biol. Chem. (2008) 283, 32158-61), e.g. EPPLNPHGSARAEVHLRKS (SEQ ID NO: 50) and variants thereof.
In some embodiments, the target molecule may be BRD4 or a bromodomain protein. Suitable peptide ligands that specifically bind to BRD4 are well-known in the art and include sequences derived from histone protein ligands.
In some embodiments, the target molecule may be an HDAC (histone deacetylase). Suitable peptide ligands that specifically bind to HDAC are well-known in the art and include binding sequences derived from SMRT and other proteins that recruit HDACs to specific transcriptional regulatory complexes or binding sequences derived from histone proteins (see for example Watson et al. Nat. Comm. (2016) 7, 11262; Dowling et al. Biochem. (2008) 47, 13554-63).
In some embodiments, the target molecule may be Notch. Suitable peptide ligands that specifically bind to Notch are well-known in the art and include binding sequences from the N-terminus of MAML1 (mastermind like protein 1), e.g. SAVMERLRRRIELCRRHHST (SEQ ID NO: 51) and variants thereof (see for example Moellering et al. Nature (2009) 462, 182-8).
In some embodiments, the target molecule may be a Cdk (cyclin-dependent kinase). Suitable peptide ligands that specifically bind to Cdks are well-known in the art and include substrate-based peptides, for example, Cdk2 sequences derived from cyclin A, such as TYTKKQVLRMEHLVLKVLTFDL (SEQ ID NO: 52) and variants thereof (see for example Gondeau et al. J. Biol. Chem. (2005) 280, 13793-800; Mendoza et al. Cancer Res. (2003) 63, 1020-4).
In some embodiments, the target molecule may be PLK1 (polo-like kinase 1). Suitable peptide ligands that specifically bind to PLK1 are well-known in the art and include optimised substrate-derived sequences that bind to the substrate-binding PBD (polo-box domain), such as MAGPMQSEPLMGAKK (SEQ ID NO: 53) and variants thereof.
In some embodiments, the target molecule may be Tau. Suitable peptide ligands that specifically bind to Tau are well-known in the art and include tau-binding sequences derived from alpha- and beta-tubulin, such as KDYEEVGVDSVE (SEQ ID NO: 54) and YQQYQDATADEQG (SEQ ID NO: 55) and variants thereof (see for example Maccioni et al. EMBO J. (1988) 7, 1957-63; Rivas et al. PNAS (1988) 85, 6092-6).
In some embodiments, the target molecule may be BCR-ABL. Suitable peptide ligands that specifically bind to BCR-ABL are well-known in the art and include optimized substrate-derived sequences, such as EAIYAAPFAKKK (SEQ ID NO: 56) and variants thereof.
In some embodiments, the target molecule may be PP2A (protein phosphatase 2A). Suitable peptide ligands that specifically bind to PP2A are well-known in the art and include sequences that bind the B56 regulatory subunit, such as LQTIQEEE (SEQ ID NO: 57) and variants thereof (see for example Hetz et al. Mol. Cell (2016) 63, 686-95).
In some embodiments, the target molecule may be EED (Embryonic ectoderm development). Suitable peptide ligands that specifically bind to EED are well-known in the art and include helical binding sequences from co-factor EZH2 (enhancer of zeste homolog 2), such as FSSNRQKILERTEILNQEWKQRRIQPV (SEQ ID NO: 58) and variants thereof (see for example Kim et al. Nat. Chem. Biol. (2013) 9, 643-50).
In some embodiments, the target molecule may be MCL-1 (induced myeloid leukemia cell differentiation protein). Suitable peptide ligands that specifically bind to MCL-1 are well-known in the art and include sequences from BCL2, e.g. KALETLRRVGDGVQRNHETAF (SEQ ID NO: 59) and variants thereof (see for example Stewart et al. Nat. Chem. Biol. (2010) 6, 595-601).
In some embodiments, the target molecule may be RAS. Suitable RAS peptide ligands are well-known in the art and include RAS-binding peptides identified by phage display, such as RRRRCPLYISYDPVCRRRR (SEQ ID NO: 60) and variants thereof (see for example Sakamoto et al. BBRC (2017) 484, 605-11).
In some embodiments, the target molecule may be GSK3 (glycogen synthase kinase 3). Suitable GSK3 peptide ligands are well-known in the art and include substrate-competitive binding sequences such as KEAPPAPPQDP (SEQ ID NO: 61), LSRRPDYR (SEQ ID NO: 62), RREGGMSRPADVDG (SEQ ID NO: 63), and YRRAAVPPSPSLSRHSSPSQDEDEEE (SEQ ID NO: 64) and variants thereof (see for example Ilouz et al. J. Biol. Chem. (2006) 281, 30621-30630; Plotkin et al. J. Pharmacol. Exp. Ther. (2003) 305, 974-980).
In some embodiments, the target molecule may be CtBP (C-terminal binding protein). Suitable CtBP peptide ligands are well-known in the art and include sequences identified from a cyclic peptide library screen, such as SGWTWRMY (SEQ ID NO: 65) and variants thereof (see for example Birts et al. Chem. Sci. (2013) 4, 3046-57).
Examples of suitable peptide ligands for target molecules that may be used in a modular binding protein as described herein are shown in Tables 2 and 7. Preferred peptide ligands are shown in Table 8 (SEQ ID NOs: 277-388).
In some preferred embodiments, a modular binding protein as described herein may comprise a peptide ligand for an E3 ubiquitin ligase. Examples of suitable E3 ubiquitin ligases include MDM2, SCFSkp2, BTB-CUL3-RBX1, APC/C, SIAH, CHIP, Cul4-DDB1, SCF-family, β-TrCP, Fbw7 and Fbx4.
Suitable peptide ligands for E3 ubiquitin ligases (degrons) are well known in the art and may be 5 to 20 amino acids in length. For example, a suitable peptide ligand for MDM2 may include a peptide ligand from p53 (e.g. FAAYWNLLSAYG (SEQ ID NO: 66)) or a variant thereof. A suitable peptide ligand for SCFSkp2 may include a peptide ligand from p27 (e.g. AGSNEQEPKKRS (SEQ ID NO: 67)) or a variant thereof. A suitable peptide ligand for Keap1-Cul3 may include a peptide ligand from Nrf2 (e.g. DPETGEL (SEQ ID NO: 68)) or a variant thereof. A suitable peptide ligand for SPOP-Cul3 may include a peptide ligand from Puc (e.g. LACDEVTSTTSSSTA (SEQ ID NO: 69)) or a variant thereof. A suitable peptide ligand for APC/C may include the degrons termed ABBA (e.g. SLSSAFHVFEDGNKEN (SEQ ID NO: 70)), KEN (e.g. SEDKENVPP (SEQ ID NO: 71)), or DBOX (e.g. PRLPLGDVSNN (SEQ ID NO: 72)) or a variant thereof. In some instances, a combination of these degrons may be used (mimicking the bipartite or tripartite degrons found in some natural substrates). A suitable peptide ligand for SIAH may include a peptide ligand from PHYL (e.g. LRPVAMVRPTV (SEQ ID NO: 73)) or a variant thereof. A suitable peptide ligand for CHIP (carboxyl terminus of Hsc70-interacting protein) may include peptide sequences such as ASRMEEVD (SEQ ID NO: 74) (from Hsp90 C-terminus) and GPTIEEVD (SEQ ID NO: 75) (from Hsp70 C-terminus) or a variant thereof. A suitable peptide ligand for β-TrCP may include a degron sequence motif (including phosphomimetic amino acids), such as DDGYFD (SEQ ID NO: 76) or a variant thereof. A suitable peptide ligand for Fbx4 may include sequences derived from TRF1, such as MPIFWKAHRMSKMGTG (SEQ ID NO: 77) or a variant thereof (see for example Lee et al. Chembiochem (2013) 14, 445-451). A suitable peptide ligand for Fbw7 may include degron sequence motifs (including phosphomimetic amino acids), such as LPSGLLEPPQD (SEQ ID NO: 78). A suitable peptide ligand for DDB1-Cul4 may include sequences derived from HBx (hepatitis B virus X protein) and similar proteins from other viruses and from DCAFs (DDB1-CUL4-associated factors) including helical motifs such as ILPKVLHKRTLGL (SEQ ID NO: 79), NFVSWHANRQLGM (SEQ ID NO: 80), NTVEYFTSQQVTG (SEQ ID NO: 81), and NITRDLIRRQIKE (SEQ ID NO: 82) (see for example Li et al. Nat. Struct. Mol. Biol. (2010) 17, 105-111).
Examples of suitable peptide ligands for E3 ubiquitin ligases that may be used in a modular binding protein as described herein are shown in Table 3. Preferred peptide ligands for E3 ubiquitin ligases are shown in Table 9 (SEQ ID NOs: 470-617).
A modular binding protein comprising a peptide ligand for an E3 ubiquitin ligase may also comprise a peptide ligand for a target molecule. Binding of the modular binding protein to both the target molecule and the E3 ubiquitin ligase may cause the target molecule to be ubiquitinated by the E3 ubiquitin ligase. Ubiquitinylated target molecules are then degraded by the proteasome. This allows the specific targeting of molecules for proteolysis by the modular binding protein. The ubiquitination and subsequent degradation of a target protein has been shown for hetero-bifunctional small molecules (PROTACs; proteolysis targeting chimeras) that bind the target protein and a ubiquitin ligase simultaneously (see for example Bondeson et al. Nat. Chem. Biol. 2015; Deshaies 2015; Lu et al. 2015).
In some embodiments, the modular binding protein may lack lysine residues, so that it avoids ubiquitination by the E3 ubiquitin ligase. Examples of modular binding proteins that bind E3 ubiquitin ligase and a target molecule are shown in Tables 1 and 8.
A suitable modular binding protein may comprise an N terminal peptide ligand that binds a target protein, such as b catenin, and a C terminal peptide ligand that binds an E3 ubiquitin ligase. For example, the N terminal peptide ligand may be a b catenin-binding sequence derived from Bcl9 and the C terminal peptide ligand may be an Mdm2-binding sequence derived from p53. Alternatively, a modular binding protein may comprise a C terminal peptide ligand that binds a target protein, such as b catenin, and an N terminal peptide ligand that binds an E3 ubiquitin ligase (see figure 10A).
Another suitable modular binding protein may comprise three repeat domains, a peptide ligand located in an inter-repeat loop that binds a target protein, such as b catenin, and a C terminal peptide ligand that binds an E3 ubiquitin ligase. For example, the inter-repeat loop peptide ligand may be derived from the phosphorylated region of APC (adenomentous polyposis coli) and the C terminal peptide ligand may be an Mdm2-binding sequence derived from p53. Alternatively, the modular binding protein may comprise a peptide ligand located in an inter- re peat loop that binds an E3 ubiquitin ligase, and a C terminal peptide ligand that binds a target protein, such as b catenin (See figure 10B).
Another suitable modular binding protein may comprise three repeat domains, an N terminal peptide ligand that binds a target protein, such as b catenin, and a peptide ligand located in an inter-module loop that binds an E3 ubiquitin ligase. For example, the N terminal peptide ligand may be a b catenin- binding sequence derived from LRH1 (liver receptor homolog 1) and the inter-module loop peptide ligand may be a sequence derived from the Skp2-targeting region of p27. Alternatively, the modular binding protein may comprise an N terminal peptide ligand that binds an E3 ubiquitin ligase and a peptide ligand located in an inter-module loop that binds a target protein, such as b catenin (see figure 10C).
Another suitable modular binding protein may comprise four repeat domains, a first peptide ligand located in an inter-repeat loop that binds an E3 ubiquitin ligase and a second peptide ligand located in an inter-repeat loop that binds a target molecule. The first and second inter-repeat loops may be separated by an inter-repeat loop lacking a peptide ligand. For example, the first peptide ligand may be located in the first inter-repeat loop from the N terminus and the second peptide ligand may be located in the third inter-repeat loop from the N terminus or vice versa.
In some preferred embodiments, a modular binding protein as described herein may comprise an amino acid sequence shown in Table 10 (SEQ ID NOs: 766-815) or a variant thereof.
In other preferred embodiments, a modular binding protein as described herein may comprise a peptide ligand that binds to a component of a target-selective autophagy pathway, such as chaperone-mediated autophagy (CMA). The modular binding protein and target molecules bound thereto are thus recognised by the autophagy pathway and the target molecules are subsequently degraded. Suitable components of the CMA pathway include heat shock cognate protein of 70 kDa (hsc70, HSPA8, Gene ID: 3312). Suitable peptide ligands are well known in the art (Dice J.F. (1990). Trends Biochem. Sci. 15, 305-309) and include Lys-Phe-Glu-Arg-Gln (KFERQ) and variants thereof, such as CMA_Q and CMA_K, as described herein. These domains have been demonstrated to be capable of targeting heterologous proteins to the autophagy pathway (Fan, X. et al. (2014) Nature Neuroscience 17, 471-480).
In some embodiments, a modular binding protein may comprise a scaffold with the amino acid sequence of residues 1 to 171 of SEQ ID NO: 1230 (PPX172 of Table 13 without the HA Tag). A target peptide ligand and the E3 ligase peptide ligand may be grafted into any two of the N terminal helical grafting site 1 and loop grafting sites 2 to 6. Preferably, the target peptide ligand is located in helical grafting site 1, loop grafting site 2, loop grafting site 4 or loop grafting site 5 of the scaffold. Preferably, the E3 ligase peptide ligand or other degron may be located in any of loop grafting sites 2 to 6.
For example, the target peptide ligand may be in site 1 and the E3 ligase peptide ligand may be in site 4; the target peptide ligand may be in site 2 and the E3 ligase peptide ligand may be in site 5; the target peptide ligand may be in site 5 and the E3 ligase peptide ligand may be in site 2; the target peptide ligand may be in site 1 and the E3 ligase peptide ligand may be in site 6; the target peptide ligand may be in site 2 and the E3 ligase peptide ligand may be in site 6; the target peptide ligand may be in site 3 and the E3 ligase peptide ligand may be in site 6; the target peptide ligand may be in site 4 and the E3 ligase peptide ligand may be in site 6; the target peptide ligand may be in site 5 and the E3 ligase peptide ligand may be in site 6; the target peptide ligand may be in site 1 and the E3 ligase peptide ligand may be in site 3; or the target peptide ligand may be in site 2 and the E3 ligase peptide ligand may be in site 4.
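By way of illustration only, the site pairings listed above may be tabulated and compared with the preferred sites stated for each ligand. The following Python sketch is not part of the described methods; the function and variable names are hypothetical, and the only data it uses are the site numbers given in the text (site 1 being the N terminal helical grafting site and sites 2 to 6 being loop grafting sites).

```python
# (target ligand site, E3 ligase ligand site), in the order listed in the text.
LISTED_PAIRS = [
    (1, 4), (2, 5), (5, 2), (1, 6), (2, 6),
    (3, 6), (4, 6), (5, 6), (1, 3), (2, 4),
]

# Preferred sites as stated in the text for each type of ligand.
PREFERRED_TARGET_SITES = {1, 2, 4, 5}
PREFERRED_E3_SITES = {2, 3, 4, 5, 6}


def is_valid(pair):
    """True if both ligands occupy distinct grafting sites numbered 1 to 6."""
    target_site, e3_site = pair
    return target_site != e3_site and 1 <= target_site <= 6 and 1 <= e3_site <= 6


def is_preferred(pair):
    """True if the pair is valid and both ligands sit in a preferred site."""
    target_site, e3_site = pair
    return (
        is_valid(pair)
        and target_site in PREFERRED_TARGET_SITES
        and e3_site in PREFERRED_E3_SITES
    )


# Every pairing that satisfies both stated preferences.
ALL_PREFERRED_PAIRS = [
    (t, e)
    for t in sorted(PREFERRED_TARGET_SITES)
    for e in sorted(PREFERRED_E3_SITES)
    if t != e
]

# Listed pairings that fall outside the stated preferences.
NON_PREFERRED_LISTED = [p for p in LISTED_PAIRS if not is_preferred(p)]
```

In this tabulation, the listed pairing with the target peptide ligand in site 3 is a permitted arrangement that lies outside the preferred target sites; all other listed pairings satisfy both preferences.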
Suitable peptide ligands include beta-catenin binding ligands, for example helical beta-catenin binding sequences from the protein AXIN, such as ILXXHV and variants thereof; peptides from APC (Adenomatous polyposis coli; "phospho"), such as SEELEALEALELDE or GSEELEALEALELDEA and variants thereof; peptides from LRH1, such as YEQAIAAYLDALMC and variants thereof; peptides from BCL9, such as TLXXIQXXL, LXTLXXIQ, and SLXXIXXML and variants thereof; and KRAS binding ligands, for example alpha-helical sequences from the protein SOS1 (Son of sevenless homolog 1), such as TNXXKXXE and variants thereof.
Suitable E3 ligase peptide ligands include SCFSkp2 binding sequences from p27, such as AGSNEQEPNR and variants thereof, Cul3-KEAP1 binding sequences from NRF2, such as LDPETGEL and variants thereof, E3 Cul3-SPOP binding sequences from Puc, such as DEVTSTTSS, and variants thereof, SIAH binding sequences from PHYL, such as LRPVAMVRPWVR, and variants thereof, COP1 binding sequences from Trib, such as SDQIVPEYQE, and variants thereof, UBR5 binding sequences from PAM2, such as LSVNAPEFYP, and variants thereof, beta-TRCP binding sequences from CDC25B, such as TEEDDGFVDI, and variants thereof, and MDM2 binding sequences from p53, such as FSXXWXXL and variants thereof.
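By way of illustration only, motifs written with X at variable positions, such as FSXXWXXL or TLXXIQXXL, may be located in a candidate amino acid sequence with a simple pattern search. The Python sketch below is a hypothetical helper and not part of the described methods; it assumes that X denotes any of the twenty standard amino acids, and the test sequence is a short p53-like string chosen only for the example.

```python
import re

STANDARD_AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def motif_to_pattern(motif):
    """Convert a motif such as 'FSXXWXXL' to a regex string.

    Assumption: 'X' stands for any one of the twenty standard amino acids;
    every other character must match exactly.
    """
    any_residue = "[" + STANDARD_AMINO_ACIDS + "]"
    return "".join(any_residue if c == "X" else re.escape(c) for c in motif)


def find_motif(motif, sequence):
    """Return (0-based start, matched text) for every occurrence.

    A lookahead is used so that overlapping occurrences are also reported.
    """
    pattern = re.compile("(?=(" + motif_to_pattern(motif) + "))")
    return [(m.start(), m.group(1)) for m in pattern.finditer(sequence)]


# Illustrative input only (not taken from the tables referred to in the text).
EXAMPLE_SEQUENCE = "ETFSDLWKLLPE"
MDM2_HITS = find_motif("FSXXWXXL", EXAMPLE_SEQUENCE)
```

A search of this kind only confirms that the fixed positions of a motif are present; it says nothing about whether the resulting peptide ligand binds its partner, which would be determined experimentally.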
Examples of modular binding proteins as described herein are shown in Tables 12 and 13. For example, a modular binding protein as described herein may comprise an amino acid sequence shown in Table 13 (SEQ ID NOs: 1230-1304) or a variant of an amino acid sequence shown in Table 13. Variants of a reference sequence are described above and may include variants in which one or both of the target peptide ligand and the E3 ligase peptide ligand in a reference amino acid sequence of Table 13 are replaced by a different peptide ligand. Suitable peptide ligands are described below. Variants may also include variants in which the scaffold sequence in a reference amino acid sequence of Table 13 is replaced by a different scaffold sequence. Suitable scaffold sequences are described above.
In addition to repeat domains and peptide ligands, a modular binding protein may further comprise one or more additional domains which confer additional functionality, such as targeting domains, intracellular transport domains, stabilising domains or oligomerisation domains. Additional domains may for example be located at the N or C terminus of the modular binding protein or in a loop between repeats.
A targeting domain may be useful in targeting the modular binding protein to a particular destination in vivo, such as a target tissue, cell, membrane or intracellular organelle. Suitable targeting domains include chimeric antigen receptors (CARs).
An intracellular transport domain may facilitate the passage of the modular binding protein through the cell membrane into cells, for example to bind intracellular target molecules. Suitable intracellular transport domains are well known in the art (see for example Bechara et al FEBS Letters 587 12 (2013) 1693-1702) and include cell-penetrating peptides (CPPs), such as Antennapedia (43-58), Tat (48-60), Cadherin (615-632) and poly-Arg.
A stabilising domain may increase the half-life of the modular binding protein in vivo. Suitable stabilising domains are well known in the art and include Fc domains, serum albumin, unstructured peptides such as XTEN98 or PAS, and polyethylene glycol (PEG).
An oligomerisation domain may facilitate the formation of multi-protein complexes, for example to increase avidity against multi-valent targets. Suitable oligomerisation domains include the 'foldon' domain, the natural trimerisation domain of T4 fibritin (Meier et al., J. Mol. Biol. (2004) 344(4):1051-69).
In addition to repeat domains, peptide ligands and optionally one or more additional domains, a modular binding protein may further comprise a cytotoxic or therapeutic agent and/or a detectable label.
Suitable cytotoxic agents include, for example, chemotherapeutic agents, such as methotrexate, auristatin, adriamycin, doxorubicin, melphalan, mitomycin C, ozogamicin, chlorambucil, maytansine, emtansine, daunorubicin or other intercalating agents, enzymatically active toxins of bacterial, fungal, plant, or animal origin, such as diphtheria A chain, nonbinding active fragments of diphtheria toxin, exotoxin A chain, ricin A chain, abrin A chain, modeccin A chain, α-amanitin, alpha-sarcin, Aleurites fordii proteins, tubulysins, dianthin proteins, Phytolacca americana proteins (PAP I, PAP II, and PAP-S), Momordica charantia inhibitor, curcin, crotin, Saponaria officinalis inhibitor, gelonin, mitogellin, restrictocin, phenomycin, enomycin, pyrrolobenzodiazepines, and the trichothecenes and fragments of any of these. Suitable cytotoxic agents may also include radioisotopes. A variety of radionuclides are available for the production of radioconjugated modular binding proteins including, but not limited to, 90Y, 125I, 131I, 123I, 111In, 131In, 105Rh, 153Sm, 67Cu, 67Ga, 166Ho, 177Lu, 186Re, 188Re and 212Bi. Conjugates of a modular binding protein and one or more small anti-cancer molecules, for example toxins, such as a calicheamicin, maytansinoids, a trichothecene, and CC1065, and the derivatives of these toxins that have toxin activity, may also be used.
Suitable therapeutic agents may include cytokines (e.g. IL2, IL12 and TNF), chemokines, procoagulant factors (e.g. tissue factor), enzymes, liposomes, and immune response factors.
A detectable label may be any molecule that produces or can be induced to produce a signal, including but not limited to fluorescers, radiolabels, enzymes, chemiluminescers or photosensitizers. Thus, binding may be detected and/or measured by detecting fluorescence or luminescence, radioactivity, enzyme activity or light absorbance. Detectable labels may be attached to modular binding proteins using conventional chemistry known in the art.
There are numerous methods by which the label can produce a signal detectable by external means, for example, by visual examination, electromagnetic radiation, heat, and chemical reagents. The label can also be bound to another specific binding member that binds the modular binding protein, or to a support.
In some embodiments, a modular binding protein may be configured for display on a particle or molecular complex, such as a cell, ribosome or phage, for example for screening and selection. A suitable modular binding protein may further comprise a display moiety, such as a phage coat protein, to facilitate display on a particle or molecular complex. The phage coat protein may be fused or covalently linked to the modular binding protein.
Modular binding proteins as described herein may be produced by recombinant means. For example, a method of producing a modular binding protein as described herein may comprise expressing a nucleic acid encoding the modular binding protein. A nucleic acid may be expressed in a host cell and the expressed modular binding protein may then be isolated and/or purified from the cell culture.
In some embodiments, a method may comprise;
inserting a first nucleic acid encoding a peptide ligand into a second nucleic acid encoding two or more repeat domains to produce a chimeric nucleic acid encoding a modular binding protein comprising a peptide ligand located in an inter-repeat loop or at the N or C terminus of the modular binding protein; and,
expressing said chimeric nucleic acid to produce the modular binding protein.
Preferably, a third nucleic acid encoding a second peptide ligand is inserted into the second nucleic acid to produce a chimeric nucleic acid encoding a chimeric protein comprising a first peptide ligand located at one of the first, second, third and fourth loops of the fibronectin scaffold and a second peptide ligand located at another of the first, second, third and fourth loops of the fibronectin scaffold. One of the first and second peptide ligands may comprise a sequence set out in Table 8 (SEQ ID NOs: 277-388) and the other of the first and second peptide ligands may comprise a sequence set out in Table 9 (SEQ ID NOs: 470-617).
Methods described herein may be useful in producing a modular binding protein that binds to a first target molecule and a second target molecule. For example, a method may comprise;
providing a nucleic acid encoding two or more repeat domains linked by inter-repeat loops, each repeat domain; and
incorporating into said nucleic acid a first nucleotide sequence encoding a first peptide ligand that binds to a first target molecule and a second nucleotide sequence encoding a second peptide ligand that binds to a second target molecule to generate a nucleic acid encoding a modular binding protein comprising said first and second peptide ligands, wherein said peptide ligands are located in an inter-repeat loop or at the N or C terminus of the modular binding protein; and
expressing the nucleic acid to produce said protein.
One of the first and second target molecules may be an E3 ubiquitin ligase. For example, a method may comprise;
providing a nucleic acid encoding two or more repeat domains linked by inter-repeat loops, each repeat domain; and
incorporating into said nucleic acid a first nucleotide sequence encoding a first peptide ligand that binds to a target molecule and a second nucleotide sequence encoding a second peptide ligand that binds to an E3 ubiquitin ligase to generate a nucleic acid encoding a modular binding protein comprising said first and second peptide ligands, wherein said peptide ligands are located in an inter-repeat loop or at the N or C terminus of the modular binding protein; and
expressing the nucleic acid to produce said protein.
The first peptide ligand may comprise a sequence set out in Table 8 (SEQ ID NOs: 277-388) and the second peptide ligand may comprise a sequence set out in Table 9 (SEQ ID NOs: 470-617).
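By way of illustration only, the arrangement produced by the methods above (repeat domains linked by inter-repeat loops, with peptide ligands in a loop or at a terminus) may be modelled at the protein level as follows. This Python sketch is a simplified, hypothetical data structure and not the described cloning method: the repeat and loop sequences are placeholders that do not correspond to any scaffold disclosed herein, and a loop ligand is simply appended to the native loop residues, which is an assumption made for brevity. The two ligand sequences are taken from the examples given earlier in the text.

```python
from dataclasses import dataclass, field


@dataclass
class ModularBindingProtein:
    """Toy protein-level model; all names here are hypothetical."""

    repeats: list                      # repeat domain sequences, N to C terminus
    loops: list                        # inter-repeat loop sequences
    n_term_ligand: str = ""
    c_term_ligand: str = ""
    loop_ligands: dict = field(default_factory=dict)  # loop index -> ligand

    def __post_init__(self):
        if len(self.repeats) < 2:
            raise ValueError("two or more repeat domains are required")
        if len(self.loops) != len(self.repeats) - 1:
            raise ValueError("expected one loop between each pair of repeats")
        if any(i not in range(len(self.loops)) for i in self.loop_ligands):
            raise ValueError("loop ligand index does not match a loop")

    def sequence(self):
        """Concatenate ligand, repeat and loop sequences from N to C terminus."""
        parts = [self.n_term_ligand]
        for i, repeat in enumerate(self.repeats):
            parts.append(repeat)
            if i < len(self.loops):
                # Simplification: the ligand follows the native loop residues.
                parts.append(self.loops[i] + self.loop_ligands.get(i, ""))
        parts.append(self.c_term_ligand)
        return "".join(parts)

    def lacks_lysine(self):
        """True if the assembled sequence contains no lysine (K) residues."""
        return "K" not in self.sequence()


# Placeholder repeats and loops (hypothetical); ligands quoted from the text.
protein = ModularBindingProtein(
    repeats=["AAAA", "GGGG", "SSSS"],
    loops=["DP", "NP"],
    loop_ligands={0: "SEELEALEALELDE"},   # APC-derived target ligand
    c_term_ligand="LDPETGEL",             # NRF2-derived KEAP1-binding ligand
)
```

The `lacks_lysine` check reflects the embodiment described above in which the modular binding protein lacks lysine residues so that it is not itself ubiquitinated.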
An isolated nucleic acid encoding a modular binding protein as described herein is provided as an aspect of the invention. The nucleic acid may be comprised within an expression vector. Suitable vectors can be chosen or constructed, containing appropriate regulatory sequences, including promoter sequences, terminator fragments, polyadenylation sequences, enhancer sequences, marker genes and other sequences as appropriate. Preferably, the vector contains appropriate regulatory sequences to drive the expression of the nucleic acid in a host cell. Suitable regulatory sequences to drive the expression of heterologous nucleic acid coding sequences in expression systems are well-known in the art and include constitutive promoters, for example viral promoters such as CMV or SV40, and inducible promoters, such as Tet-on controlled promoters. A vector may also comprise sequences, such as origins of replication and selectable markers, which allow for its selection and replication and expression in bacterial hosts such as E. coli and/or in eukaryotic cells.
Many techniques and protocols that are suitable for the expression of recombinant modular binding proteins in cell culture and their subsequent isolation and purification are known in the art (see for example Protocols in Molecular Biology, Second Edition, Ausubel et al. eds. John Wiley & Sons, 1992; Recombinant Gene Expression Protocols Ed RS Tuan (Mar 1997) Humana Press Inc).
A host cell comprising a nucleic acid encoding a modular binding protein as described herein or vector containing such a nucleic acid is also provided as an aspect of the invention. Suitable host cells include bacteria, mammalian cells, plant cells, filamentous fungi, yeast and baculovirus systems and transgenic plants and animals. The expression of proteins in prokaryotic cells is well established in the art. A common bacterial host is E. coli. A modular binding protein may also be produced by expression in eukaryotic cells in culture. Mammalian cell lines available in the art for expression of a modular binding protein include Chinese hamster ovary (CHO) cells, HeLa cells, baby hamster kidney cells, NS0 mouse myeloma cells, YB2/0 rat myeloma cells, human embryonic kidney cells (e.g. HEK293 cells), human embryonic retina cells (e.g. PerC6 cells) and many others.
Modular binding proteins as described herein may be used to produce libraries. A suitable library may be screened in order to identify and isolate modular binding proteins with specific binding activity. A library may comprise modular binding proteins, each modular binding protein in the library comprising:
(i) two or more repeat domains,
(ii) inter-repeat loops linking said repeat domains; and
(iii) a first and a second peptide ligand, each said peptide ligand being located in an inter-repeat loop or at the N or C terminus of the modular binding protein,
wherein at least one amino acid residue in the second peptide ligand in said library is diverse.
The first peptide ligand may comprise a sequence set out in Table 8 (SEQ ID NOs: 277-388) or Table 9 (SEQ ID NOs: 470-617) or a variant thereof.
The residues at one or more positions in the peptide ligand of the modular binding proteins in the library may be diverse or randomised i.e. the residue located at the one or more positions may be different in different molecules in a population.
For example, 1 to 12 positions within a helical peptide ligand at the N or C terminus of the modular binding proteins in the library may be diverse or randomised. In addition, the non-constrained Xn sequence of the peptide ligand may contain additional diversity. Alternatively or additionally, 1 to n positions within an inter-repeat peptide ligand of the modular binding proteins in the library may be diverse or randomised, where n is the number of amino acids in the peptide ligand.
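By way of illustration only, the theoretical size of a library in which a given number of positions is fully randomised may be estimated, and the variants enumerated for small cases. The Python sketch below is hypothetical and assumes that each diverse position may take any of the twenty standard amino acids; restricted sub-sets of residues would give smaller numbers.

```python
from itertools import product

STANDARD_AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def theoretical_diversity(num_positions, alphabet_size=len(STANDARD_AMINO_ACIDS)):
    """Number of distinct peptides when num_positions are fully randomised."""
    return alphabet_size ** num_positions


def enumerate_variants(ligand, positions, alphabet=STANDARD_AMINO_ACIDS):
    """Yield every variant of ligand with the given 0-based positions diversified.

    Only practical for a small number of positions, since the number of
    variants grows as len(alphabet) ** len(positions).
    """
    for combination in product(alphabet, repeat=len(positions)):
        residues = list(ligand)
        for position, amino_acid in zip(positions, combination):
            residues[position] = amino_acid
        yield "".join(residues)


# Example: diversify two positions of a ligand quoted earlier in the text.
variants = list(enumerate_variants("LDPETGEL", positions=[3, 4]))
```

Under this assumption, two diverse positions give 400 variants, whereas twelve fully randomised positions give 20^12, or about 4.1 x 10^15, possible sequences, so a physical library would generally sample only part of that space.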
In some embodiments, peptide ligands may be screened individually and a modular binding protein progressively assembled from repeat domains comprising peptide ligands identified in different rounds of screening. For example, a library may comprise modular binding proteins, each modular binding protein in the library comprising:
(i) two or more repeat domains,
(ii) inter-repeat loops linking said repeat domains; and
(iii) one or more constant peptide ligands having the same amino acid sequence in each modular binding protein in the library and one or more diverse peptide ligands, preferably one diverse peptide ligand, having a different amino acid sequence in each modular binding protein in the library, said peptide ligands being located in an inter-repeat loop or at the N or C terminus of the modular binding protein.
The constant peptide ligand may comprise a sequence set out in Table 8 (SEQ ID NOs: 277-388) or Table 9 (SEQ ID NOs: 470-617) or a variant thereof.
At least one amino acid residue in the diverse peptide ligands in said library may be diverse.
A library may be produced by a method comprising:
(a) providing a population of nucleic acids encoding a diverse population of modular binding proteins comprising
(i) two or more repeat domains,
(ii) inter-repeat loops linking the two or more repeat domains; and
(iii) one or more peptide ligands, each said peptide ligand being located in an inter-repeat loop or at the N or C terminus of the modular binding protein,
wherein one or more residues of a peptide ligand in each modular binding protein is diverse in said library, and
(b) expressing said population of nucleic acids to produce the diverse population, thereby producing a library of modular binding proteins.
The population of nucleic acids may be provided by a method comprising inserting a first population of nucleic acids encoding a diverse peptide ligand into a second population of nucleic acids encoding the two or more repeat domains linked by inter-repeat loops, optionally wherein the first and second nucleic acids are linked with a third population of nucleic acids encoding linkers of up to 10 amino acids.
The nucleic acids may be contained in vectors, for example expression vectors. Suitable vectors include phage-based or phagemid-based phage display vectors.
The nucleic acids may be recombinantly expressed in a cell or in solution using a cell-free in vitro translation system such as a ribosome, to generate the library. In some preferred embodiments, the library is expressed in a system in which the function of the modular binding protein enables isolation of its encoding nucleic acid. For example, the modular binding protein may be displayed on a particle or molecular complex to enable selection and/or screening. In some embodiments, the library of modular binding proteins may be displayed on beads, cell-free ribosomes, bacteriophage, prokaryotic cells or eukaryotic cells. Alternatively, the encoded modular binding protein may be presented within an emulsion where activity of the modular binding protein causes an identifiable change. Alternatively, the encoded modular binding protein may be expressed within or in proximity of a cell where activity of the modular binding protein causes a phenotypic change or changes in the expression of a reporter gene.
Preferably, the nucleic acids are expressed in a prokaryotic cell, such as E. coli. For example, the nucleic acids may be expressed in a prokaryotic cell to generate a library of recombinant binding proteins that is displayed on the surface of bacteriophage. Suitable prokaryotic phage display systems are well known in the art, and are described for example in Kontermann, R & Dubel, S, Antibody Engineering, Springer-Verlag New York, LLC; 2001, ISBN: 3540413545, WO92/01047, US5969108, US5565332, US5733743, US5858657, US5871907, US5872215, US5885793, US5962255, US6140471, US6172197, US6225447, US6291650, US6492160 and US6521404.
Phage display systems allow the production of large libraries, for example libraries with 10^8 or more, 10^9 or more, or 10^10 or more members.
In other embodiments, the cell may be a eukaryotic cell, such as a yeast, insect, plant or mammalian cell.
A diverse sequence as described herein is a sequence which varies between the members of a population i.e. the sequence is different in different members of the population. A diverse sequence may be random i.e. the identity of the amino acid or nucleotide at each position in the diverse sequence may be randomly selected from the complete set of naturally occurring amino acids or nucleotides or a sub-set thereof. Diversity may be introduced into the peptide ligand using approaches known to those skilled in the art, such as oligonucleotide-directed mutagenesis22 (Molecular Cloning: a Laboratory Manual: 3rd edition, Russell et al., 2001, Cold Spring Harbor Laboratory Press, and references therein).
Diverse sequences may be contiguous or may be distributed within the peptide ligand. Suitable methods for introducing diverse sequences into a peptide ligand are well-described in the art and include oligonucleotide-directed mutagenesis (see Molecular Cloning: a Laboratory Manual: 3rd edition, Russell et al., 2001, Cold Spring Harbor Laboratory Press, and references therein). For example, diversification may be generated using oligonucleotide mixes created using partial or complete randomisation of nucleotides or created using codon mixtures, for example using trinucleotides. Alternatively, a population of diverse oligonucleotides may be synthesised using high throughput gene synthesis methods and combined to create a precisely defined and controlled population of peptide ligands. Alternatively, "doping" techniques in which the original nucleotide predominates with alternative nucleotide(s) present at lower frequency may be used.
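By way of illustration only, a "doping" scheme of the kind mentioned above may be simulated in silico. The Python sketch below is a toy model, not a synthesis protocol: the function names and the 90% wild-type fraction are hypothetical, the three alternative nucleotides are assumed to be equally likely at each position, and the template is one possible coding sequence for the peptide LDPETGEL chosen for the example.

```python
import random

NUCLEOTIDES = "ACGT"


def doped_oligo(template, wild_type_fraction, rng):
    """Return one oligonucleotide sampled from a doped mixture.

    At each position the original nucleotide is kept with probability
    wild_type_fraction; otherwise one of the other three nucleotides is
    chosen uniformly (an assumption of this sketch).
    """
    bases = []
    for base in template:
        if rng.random() < wild_type_fraction:
            bases.append(base)
        else:
            bases.append(rng.choice([n for n in NUCLEOTIDES if n != base]))
    return "".join(bases)


def expected_substitutions(template, wild_type_fraction):
    """Mean number of substituted positions per sampled oligonucleotide."""
    return len(template) * (1.0 - wild_type_fraction)


# One possible coding sequence for LDPETGEL (codon choice is illustrative).
TEMPLATE = "CTGGATCCGGAAACCGGCGAACTG"
rng = random.Random(0)  # fixed seed so that the sketch is reproducible
sampled = [doped_oligo(TEMPLATE, 0.9, rng) for _ in range(5)]
```

Raising the wild-type fraction keeps sampled sequences closer to the original peptide ligand, while lowering it moves the mixture towards full randomisation.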
Preferably, the library is a display library. The modular binding proteins in the library may be displayed on the surface of particles, or molecular complexes such as beads, for example, plastic or resin beads, ribosomes, cells or viruses, including replicable genetic packages, such as yeast, bacteria or bacteriophage (e.g. Fd, M13 or T7) particles, viruses, cells, including mammalian cells, or covalent, ribosomal or other in vitro display systems. Techniques for the production of display libraries, such as phage display libraries are well known in the art. Each particle or molecular complex may comprise nucleic acid that encodes the modular binding protein that is displayed by the particle.
In some preferred embodiments, the modular binding proteins in the library are displayed on the surface of a viral particle such as a bacteriophage. Each modular binding protein in the library may further comprise a phage coat protein to facilitate display. Each viral particle may comprise nucleic acid encoding the modular binding protein displayed on the particle. Suitable viral particles include bacteriophage, for example filamentous bacteriophage such as M13 and Fd.
Suitable methods for the generation and screening of phage display libraries are well known in the art. Phage display is described for example in WO92/01047 and US patents US5969108, US5565332, US5733743, US5858657, US5871907, US5872215, US5885793, US5962255, US6140471, US6172197, US6225447, US6291650, US6492160 and US6521404.
Libraries as described herein may be screened for modular binding proteins which display binding activity, for example binding to a target molecule. Binding may be measured directly or may be measured indirectly through agonistic or antagonistic effects resulting from binding. A method of screening may comprise;
(a) providing a library of modular binding proteins, each modular binding protein in the library comprising;
(i) two or more repeat domains,
(ii) inter-repeat loops linking said repeat domains; and
(iii) a first and a second peptide ligand, each said peptide ligand being located in an inter-repeat loop or at the N or C terminus of the modular binding protein,
wherein the first peptide ligand comprises a sequence set out in Table 8 (SEQ ID NOs: 277-388) or Table 9 (SEQ ID NOs: 470-617) and one or more residues of the second peptide ligand are diverse in said library,
(b) screening the library for modular binding proteins which display a binding activity, and
(c) identifying one or more modular binding proteins in the library which display the binding activity.
In some embodiments, the modular binding proteins in the library may comprise one peptide ligand with at least one diverse amino acid residue. Conveniently the modular binding proteins in the library comprise two repeat domains. The library may be screened for peptide ligands that bind to a target molecule. Peptide ligands identified in this fashion can be assembled in a modular fashion to generate a modular binding protein as described herein that is multi-specific.
For example, a first library may be screened for a first peptide ligand that binds to a first target molecule and a second library may be screened for a second peptide ligand that binds to a second target molecule. The first and second peptide ligands are in different locations in the modular binding protein i.e. they are not both N terminal peptide ligands, C terminal peptide ligands or inter-repeat peptide ligands. First and second peptide ligands that bind to the first and second target molecules, respectively, are identified from the first and second libraries. The identified first and second peptide ligands may then be incorporated into a modular binding protein that binds to the first and second target molecules.
A first library may comprise modular binding proteins in the library with a first diverse peptide ligand having at least one diverse amino acid residue. A first peptide ligand that binds to a target molecule may be identified from the first library. Modular binding proteins comprising the first peptide ligand may be used to generate a second library comprising a second diverse peptide ligand having at least one diverse amino acid residue. For example, the modular binding protein from the first library may be modified by addition of a second diverse peptide ligand at the N or C terminal or by the addition of additional repeat domains comprising the second diverse peptide ligand in an inter-repeat loop. A second peptide ligand that binds to the same or a different target molecule may be identified from the second library. Modular binding proteins comprising the first and second peptide ligands may be used to generate a third library comprising a third diverse peptide ligand having at least one diverse amino acid residue. For example, the modular binding protein from the second library may be modified by addition of a third diverse peptide ligand at the N or C terminal or by the addition of additional repeat domains comprising the third diverse peptide ligand in an inter-repeat loop. A third peptide ligand that binds to the same target molecule as the first and/or second peptide ligands or a different target molecule may be identified from the third library. In this way, a modular binding protein containing multiple peptide ligands may be sequentially assembled (see Figure 16).
The use of separate libraries for each peptide ligand allows large numbers of different variants of each peptide ligand to be screened independently and then combined. For example, a phage library of 10^8 to 10^12 first peptide ligand variants may be combined with a phage library of 10^8 to 10^12 second peptide ligand variants and a phage library of 10^8 to 10^12 third peptide ligand variants. In some embodiments, a phage library of 10^8 to 10^12 N terminal peptide ligand variants may be combined with a phage library of 10^8 to 10^12 C terminal peptide ligand variants to generate a modular binding protein with N and C terminal peptide ligands.
Screening a library for binding activity may comprise providing a target molecule and identifying or selecting members of the library that bind to the target, or expressing the library in a population of cells and identifying or selecting members of the library that elicit a cell phenotype. The one or more identified or selected modular binding proteins may be recovered and subjected to further selection and/or screening.
Binding may be determined by any suitable technique. For example, the library may be contacted with the target molecule under binding conditions for a time period sufficient for the target molecule to interact with the library and form a binding reaction complex with at least one member thereof. Binding conditions are those conditions compatible with the known natural binding function of the target molecule. Those compatible conditions are buffer, pH and temperature conditions that maintain the biological activity of the target molecule, thereby maintaining the ability of the molecule to participate in its preselected binding interaction. Typically, those conditions include an aqueous, physiologic solution of pH and ionic strength normally associated with the target molecule of interest.
The library may be contacted with the target molecule in the form of a heterogeneous or homogeneous admixture. Thus, the members of the library can be in the solid phase with the target molecule present in the liquid phase. Alternatively, the target molecule can be in the solid phase with the members of the library present in the liquid phase. Still further, both the library members and the target molecule can be in the liquid phase.
Suitable methods for determining binding of a modular binding protein to a target molecule are well known in the art and include ELISA, bead-based binding assays (e.g. using streptavidin-coated beads in conjunction with biotinylated target molecules), surface plasmon resonance, flow cytometry, Western blotting, immunocytochemistry, immunoprecipitation, and affinity chromatography. Alternatively, biochemical or cell-based assays, such as fluorescence-based or luminescence-based reporter assays may be employed.
Multiple rounds of panning may be performed in order to identify modular binding proteins which display the binding activity. For example, a population of modular binding proteins enriched for the binding activity may be recovered or isolated from the library and subjected to one or more further rounds of screening for the binding activity to produce one or more further enriched populations. Modular binding proteins which display binding activity may be identified from the one or more further enriched populations and recovered, isolated and/or further investigated.
In some embodiments, binding may be determined by detecting agonism or antagonism resulting from the binding of a modular binding protein to a target molecule, such as a ligand, receptor or enzyme. For example, the library may be screened by expressing the library in reporter cells and identifying one or more reporter cells with altered gene expression or phenotype. Suitable functional screening techniques for screening recombinant populations of modular binding proteins are well-known in the art.
Modular binding proteins which display the binding activity may be further engineered to improve an activity or property or introduce a new activity or property, for example a binding property such as affinity and/or specificity, an in vivo property such as solubility, plasma stability, or cell penetration, or an activity such as increased neutralization of the target molecule and/or modulation of a specific activity of the target molecule or an analytical property. Modular binding proteins may also be engineered to improve stability, solubility or expression level.
Further rounds of screening may be employed to identify modular binding proteins which display the improved property or activity. For example, a population of modular binding proteins enriched for binding to the target molecule may be recovered or isolated from the library and subjected to one or more further rounds of screening for the improved or new property or activity to produce one or more further enriched populations. Optionally, this may be repeated one or more times. Modular binding proteins which display the improved property or activity may be identified from the one or more further enriched populations and recovered, isolated and/or further investigated.
A modular binding protein as described herein may be encapsulated in a liposome, for example for delivery into a cell. Preferred liposomes include fusogenic liposomes. Suitable fusogenic liposomes may comprise a cationic lipid, such as 1,2-dioleoyl-3-trimethylammoniumpropane (DOTAP), and a neutral lipid, such as dioleoylphosphatidylethanolamine (DOPE), for example in a 1:1 (w/w) ratio. Optionally, a liposome may further comprise an aromatic lipid, such as DiO (3,3'-dioctadecyloxacarbocyanine perchlorate), DiR (1,1'-dioctadecyl-3,3,3',3'-tetramethylindotricarbocyanine iodide), N-(4,4-difluoro-5,7-dimethyl-4-bora-3a,4a-diaza-s-indacene-3-propionyl)-1,2-dihexadecanoyl-sn-glycero-3-phosphoethanolamine (triethylammonium salt) (BODIPY FL-DHPE), and 2-(4,4-difluoro-5-methyl-4-bora-3a,4a-diaza-s-indacene-3-dodecanoyl)-1-hexadecanoyl-sn-glycero-3-phosphocholine (BODIPY-C12HPC), for example in a 0.1:1:1 (w/w) ratio relative to the neutral and cationic lipid. Suitable techniques for the encapsulation of proteins in liposomes and their delivery into cells are established in the art (see for example, Kube et al Langmuir (2017) 33 1051-1059; Kolasinac et al (2018) Int. J. Mol. Sci. 19 346).
A method described herein may comprise admixing a modular binding protein or encoding nucleic acid as described herein with a solution of lipids, for example in an organic solvent, such as chloroform, and evaporating the solvent to produce liposomes encapsulating the modular binding protein. Liposome encapsulations comprising a modular binding protein as described herein are provided as an aspect of the invention.
A modular binding protein or encoding nucleic acid as described herein may be admixed with a pharmaceutically acceptable excipient. A pharmaceutical composition comprising a modular binding protein or nucleic acid as described herein and a pharmaceutically acceptable excipient is provided as an aspect of the invention.
The term “pharmaceutically acceptable” as used herein pertains to compounds, materials, compositions, and/or dosage forms which are, within the scope of sound medical judgement, suitable for use in contact with the tissues of a subject (e.g., human) without excessive toxicity, irritation, allergic response, or other problem or complication, commensurate with a reasonable benefit/risk ratio. Each carrier, excipient, etc. must also be “acceptable” in the sense of being compatible with the other ingredients of the formulation. Suitable carriers, excipients, etc. can be found in standard pharmaceutical texts, for example, Remington’s Pharmaceutical Sciences, 18th edition, Mack Publishing Company, Easton, Pa., 1990.
The pharmaceutical composition may conveniently be presented in unit dosage form and may be prepared by any methods well-known in the art of pharmacy. Such methods include the step of bringing the modular binding protein into association with a carrier which may constitute one or more accessory ingredients. In general, pharmaceutical compositions are prepared by uniformly and intimately bringing into association the active compound with liquid carriers or finely divided solid carriers or both, and then if necessary shaping the product.
Pharmaceutical compositions may be in the form of liquids, solutions, suspensions, emulsions, elixirs, syrups, tablets, lozenges, granules, powders, capsules, cachets, pills, ampoules, suppositories, pessaries, ointments, gels, pastes, creams, sprays, mists, foams, lotions, oils, boluses, electuaries, or aerosols.
A modular binding protein, encoding nucleic acid or pharmaceutical composition comprising the modular binding protein or encoding nucleic acid may be administered to a subject by any convenient route of administration, whether systemically/peripherally or at the site of desired action, including but not limited to, oral (e.g. by ingestion); topical (including e.g. transdermal, intranasal, ocular, buccal, and sublingual); pulmonary (e.g. by inhalation or insufflation therapy using, e.g. an aerosol, e.g. through mouth or nose); rectal; vaginal; parenteral, for example, by injection, including subcutaneous, intradermal, intramuscular, intravenous, intraarterial, intracardiac, intrathecal, intraspinal, intracapsular, subcapsular, intraorbital, intraperitoneal, intratracheal, subcuticular, intraarticular, subarachnoid, and intrasternal; by implant of a depot, for example, subcutaneously or intramuscularly.
Pharmaceutical compositions suitable for oral administration (e.g., by ingestion) may be presented as discrete units such as capsules, cachets or tablets, each containing a predetermined amount of the active compound; as a powder or granules; as a solution or suspension in an aqueous or non-aqueous liquid; or as an oil-in-water liquid emulsion or a water-in-oil liquid emulsion; as a bolus; as an electuary; or as a paste.
Pharmaceutical compositions suitable for parenteral administration (e.g. by injection, including cutaneous, subcutaneous, intramuscular, intravenous and intradermal), include aqueous and non- aqueous isotonic, pyrogen-free, sterile injection solutions which may contain anti-oxidants, buffers, preservatives, stabilisers, bacteriostats, and solutes which render the formulation isotonic with the blood of the intended recipient; and aqueous and non-aqueous sterile suspensions which may include suspending agents and thickening agents, and liposomes or other microparticulate systems which are designed to target the compound to cells, tissue or organs. Examples of suitable isotonic vehicles for use in such formulations include Sodium Chloride Injection, Ringer’s Solution, or Lactated Ringer’s Injection. Typically, the concentration of the active compound in the solution is from about 1 ng/ml to about 10 mg/ml, for example, from about 10 ng/ml to about 1 mg/ml. The formulations may be presented in unit-dose or multi-dose sealed containers, for example, ampoules and vials, and may be stored in a freeze-dried (lyophilised) condition requiring only the addition of the sterile liquid carrier, for example water for injections, immediately prior to use.
It will be appreciated that appropriate dosages of the modular binding protein, can vary from patient to patient. Determining the optimal dosage will generally involve the balancing of the level of diagnostic benefit against any risk or deleterious side effects of the administration. The selected dosage level will depend on a variety of factors including, but not limited to, the route of administration, the time of administration, the rate of excretion of the imaging agent, the amount of contrast required, other drugs, compounds, and/or materials used in combination, and the age, sex, weight, condition, general health, and prior medical history of the patient. The amount of imaging agent and route of administration will ultimately be at the discretion of the physician, although generally the dosage will be to achieve concentrations of the imaging agent at a site, such as a tumour, a tissue of interest or the whole body, which allow for imaging without causing substantial harmful or deleterious side- effects.
Administration in vivo can be effected in one dose, continuously or intermittently (e.g., in divided doses at appropriate intervals). Methods of determining the most effective means and dosage of administration are well known to those of skill in the art and will vary with the formulation used for therapy, the purpose of the therapy, the target cell being treated, and the subject being treated.
Single or multiple administrations can be carried out with the dose level and pattern being selected by the physician.
Modular binding proteins described herein may be used in methods of diagnosis or treatment in human or animal subjects, e.g. human. Modular binding proteins for a target molecule may be used to treat disorders associated with the target molecule.
Other aspects and embodiments of the invention provide the aspects and embodiments described above with the term “comprising” replaced by the term “consisting of” and the aspects and embodiments described above with the term “comprising” replaced by the term “consisting essentially of”.
It is to be understood that the application discloses all combinations of any of the above aspects and embodiments described above with each other, unless the context demands otherwise. Similarly, the application discloses all combinations of the preferred and/or optional features either singly or together with any of the other aspects, unless the context demands otherwise.
Modifications of the above embodiments, further embodiments and modifications thereof will be apparent to the skilled person on reading this disclosure, and as such, these are within the scope of the present invention.
All documents and sequence database entries mentioned in this specification are incorporated herein by reference in their entirety for all purposes.
“and/or” where used herein is to be taken as specific disclosure of each of the two specified features or components with or without the other. For example “A and/or B” is to be taken as specific disclosure of each of (i) A, (ii) B and (iii) A and B, just as if each is set out individually herein.
Certain aspects and embodiments of the invention will now be illustrated by way of example and with reference to the figures described above.
Experiments
1. Methods
1.1 Large-scale protein purification (His-tagged) from E. coli
The pRSET B (His-tag) constructs were transformed into chemically competent E. coli C41 cells by heat shock and plated on LB-Amp plates. Colonies were grown in 2TY media containing ampicillin (50 micrograms/mL) at 37 °C, 220 rpm until the optical density (O.D.) at 600 nm reached 0.6. Cultures were then induced with IPTG (0.5 mM) for 16-20 h at 20 °C or 4 h at 37 °C. Cells were pelleted by centrifugation at 3000 g (4 °C, 10 min) and resuspended in lysis buffer (10 mM sodium phosphate pH 7.4, 150 mM NaCl, 1 tablet of SIGMAFAST protease inhibitor cocktail (EDTA-free) per 100 mL of solution), then lysed on an Emulsiflex C5 homogenizer at 15000 psi. Cell debris was pelleted by centrifugation at 15,000 g at 4 °C for 45 min. Ni-NTA beads 50% bed volume (GE Healthcare) (5 mL) were washed once with phosphate buffer (10 mM sodium phosphate pH 7.4, 150 mM NaCl) before the supernatant of the cell lysate was bound to them for 1 hr at 4 °C in batch. The loaded beads were washed three times with phosphate buffer (40 mL) containing 30 mM of imidazole to prevent nonspecific interaction of lysate proteins with the beads. Samples were eluted using phosphate buffer with 300 mM imidazole, and purified by size-exclusion chromatography using a HiLoad 16/60 Superdex G75 column (GE Life-Science) pre-equilibrated in phosphate buffer (10 mM sodium phosphate, pH 7.4, 150 mM NaCl), with proteins separated in isocratic conditions. Purity was checked on NuPage protein gel (Invitrogen), and fractions found to be over 95% pure were pooled. Purified protein was flash-frozen and stored at -80 °C until further use. Concentrations were determined by measuring absorbance at 280 nm and using a calculated extinction coefficient from ExPASy ProtParam (Gasteiger et al. 2005) for each variant. Molecular weight and purity were confirmed using mass spectrometry (MALDI).
1.2 Large-scale protein purification (heat treatment) from E. coli
All modular binding proteins described herein are thermally very stable, with melting temperatures above 80 °C. This means that the modular binding proteins could be separated from E. coli proteins by incubating the cell lysates at 65 °C for 20 min. Very few of the E. coli proteins survive such temperatures, and therefore, they will unfold and aggregate. Aggregated proteins were removed by centrifugation, leaving an 80-90% pure sample of the desired protein. All our constructs folded reversibly, and therefore could be further purified by methods such as acetone or salt precipitation to remove DNA and other contaminants.
This approach allowed the production of large amounts of functional proteins without expensive affinity purification methods such as antibodies or His tags and is scalable to industrial production and bioreactors.
1.3 Small-scale purification of His-tagged proteins for higher-throughput testing
Plasmids were transformed into E. coli C41 cells and plated overnight. 15 ml of 2TY medium (Roche) containing 50 micrograms/ml ampicillin was placed in multiple 50 ml tubes. Several colonies were picked and resuspended in each 15 ml culture. For sufficient aeration it is important to only loosely tighten the lids of the 50 ml tubes. Cells were grown at 37 °C until OD600 of 0.6 and then induced with 0.5 mM IPTG overnight. Cells were pelleted at 3000 g (Eppendorf Centrifuge 5804) and then resuspended in 1 ml of BugBuster® cell lysis reagent. Alternatively, sonication in combination with lysozyme and DNAse I treatment was used. The lysate was spun at 12000 g for 1 minute to pellet any insoluble protein and cell debris.
The supernatant was added to 100 µl bed volume of pre-washed Ni-NTA agarose beads. The subsequent affinity purification was performed in batch, by washing the beads 4 times with 1 ml of buffer each time (alternatively, Qiagen Ni-NTA Spin Columns can be used). The first wash contained 10% BugBuster® solution and 30 mM imidazole in the chosen buffer. Here we used 50 mM sodium phosphate buffer pH 6.8, 150 mM NaCl. The three successive washes had 30 mM of imidazole in the chosen buffer. Beads were washed thoroughly to remove the detergent present in the BugBuster® solution. Protein was eluted from the beads in a single step using 1 ml of chosen buffer containing 300 mM imidazole. The combination of BugBuster® and imidazole and the repeat washes in small bead volumes yielded >95% pure protein. Imidazole was removed using a NAP-5 disposable gel-filtration column (GE Healthcare).
1.4 Competition Fluorescence Polarization (FP)
To assay the binding of the designed SOS-TPR protein to KRAS, Competition FP was performed using purified KRAS Q61H mutant and 2'-(or-3')-O-(N-methylanthraniloyl) guanosine 5'-triphosphate, a fluorescent version of GTP, also known as mant-GTP. SOS-TPR was titrated using a 2-fold serial dilution against a 1:1 complex of KRAS Q61H and mant-GTP (1 µM) in a black 96-well plate (CLS3993 SIGMA). Plates were prepared under reduced light conditions and incubated at room temperature. Readings were taken on the CLARIOstar microplate reader, using an excitation filter at 360 nm and emission filter at 440 nm.
1.5 Isothermal Titration Calorimetry (ITC)
ITC was performed at 25 °C using a VP-ITC (Microcal). 1TBP-CTPR2, 2TBP-CTPR4, 3TBP-CTPR6 and TNKS2 ARC4 were dialysed into 10 mM sodium phosphate buffer pH 7.4, 150 mM NaCl, 0.5 mM TCEP. Dialysed TNKS2 ARC4 (200 µM) was titrated into the sample cell containing 1TBP-CTPR2 at 20 µM. Similar experiments were performed for 2TBP-CTPR4 and 3TBP-CTPR6. Injections of TNKS2 ARC4 into the cell were initiated with a 5 µL injection, followed by 29 injections of 10 µL. The reference power was set at 15 µcal/s with an initial delay of 1000 s and a stirring speed of 485 rpm. Data were fitted using the instrument software with a one-site binding model.
1.6 Cell culture
HEK293T cells were cultured in Dulbecco’s Modified Eagle’s Medium (Sigma Aldrich) supplemented with 10% fetal bovine serum and penicillin/streptomycin (LifeTech) at 37 °C with 5% CO2 air supply.
1.7 Cell transfection
HEK293T cells were seeded in 6-well tissue culture plates (500,000 cells per well) and transfected the next day using the Lipofectamine 2000 transfection reagent (Invitrogen) according to the manufacturer’s protocol.
1.8 β-catenin levels western blot assay
HA-β-catenin (1 µg) alone and with various PROTACs (1 µg) was transfected in HEK293T cells in 6-well plates using Lipofectamine 2000. After 48 hours of transfection, the cells were lysed in 200 µL of Laemmli buffer. After the samples were boiled at 95 °C for 20 min, proteins were resolved by SDS-PAGE and transferred to a PVDF membrane, and immunoblotting was performed using anti-HA (C29F4, Cell Signaling Technologies) and anti-actin (A2066, Sigma-Aldrich) antibodies. Changes in β-catenin levels were evaluated by densitometry of the bands corresponding to HA-β-catenin, normalised to actin levels, using ImageJ.
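The actin normalisation used in the densitometry analysis can be sketched as follows. This is a minimal illustration under stated assumptions, not the analysis actually performed: the function name `relative_level` is hypothetical, the band intensities are placeholder numbers rather than measured data, and expressing the result relative to a control lane is an assumed convention.

```python
def relative_level(target_band, actin_band, control_target, control_actin):
    """Return the actin-normalised target level relative to a control lane.

    All arguments are integrated band intensities in arbitrary units,
    for example as exported from an image-analysis program.
    """
    if min(target_band, actin_band, control_target, control_actin) <= 0:
        raise ValueError("band intensities must be positive")
    normalised = target_band / actin_band          # sample lane, loading-corrected
    control = control_target / control_actin       # control lane, loading-corrected
    return normalised / control


# Placeholder intensities (arbitrary units), for illustration only.
control_lane = {"ha": 1200.0, "actin": 1000.0}
treated_lane = {"ha": 450.0, "actin": 900.0}

level = relative_level(
    treated_lane["ha"], treated_lane["actin"],
    control_lane["ha"], control_lane["actin"],
)
print(f"HA-tagged protein relative to control: {level:.2f}")
```

A value below 1 would indicate a lower loading-corrected level of the tagged protein in the treated lane than in the control lane.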
1.9 Liposomal formulation and cytotoxicity assay
To make liposomal formulations of proteins (LFP), lipids (DOTAP (cationic): DOPE (neutral): DiR (aromatic) = 1:1:0.1 w/w) were dissolved in chloroform, and solvent was evaporated under vacuum overnight. The resulting mixed lipid cake was hydrated with 10 mM HEPES pH 7.4, containing 27 µM protein, so that the total lipid concentration was 4 mg/ml. This mixture was vortexed for 2 minutes and then sonicated for 20 minutes at room temperature. Liposomes encapsulating proteins were stored at 4 °C until further use. To make empty liposomes (EL, liposomes without proteins), lipid cake was hydrated with 10 mM HEPES pH 7.4 without proteins. An ATP assay was used to investigate whether there is any cytotoxicity associated with EL and LFP.
In a typical procedure, 2 × 10⁵ HEK 293T cells/well in 500 µl of Dulbecco’s Modified Eagle’s Medium (DMEM) supplemented with 10% fetal bovine serum were grown for 24 hours in a 24-well cell culture plate. Cells were incubated with liposome (EL/LFP)-media (DMEM without FBS) mix, having different volumes (0-60 µl) of EL and LFP, for 15 minutes at 37 °C. After washing twice with 1x PBS, 500 µl of CellTiter-Glo® Reagent (Promega) was added and luminescence was measured using a microplate reader as per the manufacturer’s protocol. Untreated cells were used as control. Data were obtained from triplicate samples, and the standard deviations were calculated from two independent experiments.
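The lipid composition of the formulation in this section can be worked through with a short calculation. The sketch below is illustrative only: it assumes that the stated total lipid concentration of 4 mg/ml includes all three lipids, it uses a hypothetical hydration volume of 1 ml, and the function name `lipid_masses` is not taken from the source.

```python
def lipid_masses(total_mg_per_ml, volume_ml, ratio):
    """Split a total lipid mass between components according to a w/w ratio.

    ratio maps each lipid name to its parts by weight.
    Returns a dict mapping each lipid name to its mass in mg.
    """
    total_parts = sum(ratio.values())
    if total_parts <= 0:
        raise ValueError("ratio must contain positive parts")
    total_mg = total_mg_per_ml * volume_ml
    return {name: total_mg * parts / total_parts for name, parts in ratio.items()}


# DOTAP : DOPE : DiR = 1 : 1 : 0.1 (w/w), as stated for the formulation.
ratio = {"DOTAP": 1.0, "DOPE": 1.0, "DiR": 0.1}

# Assumed: 1 ml of hydration buffer, with all lipids counted in the 4 mg/ml total.
masses = lipid_masses(total_mg_per_ml=4.0, volume_ml=1.0, ratio=ratio)
for name, mg in masses.items():
    print(f"{name}: {mg:.3f} mg")
```

If the 4 mg/ml figure were instead meant to cover only the cationic and neutral lipids, the DiR entry would be left out of the ratio and its mass calculated separately.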
1.10 TOPFLASH assay
The Wnt pathway was activated by treating HEK293T cells with Wnt-conditioned media obtained from L-cells expressing Wnt3A for 8 days. To perform the assay, 10⁵ HEK293T cells/well were seeded on a 24-well Nunclon Delta Surface plate (NUNC) and incubated overnight at 37 °C, 5% CO2. The following day, cells were transfected with 100 ng of TOPflash TCF7L2-firefly luciferase plasmid, 10 ng of CMV-Renilla plasmid (as internal control) and 100 ng of the corresponding TPR construct.
Plasmids were mixed with 0.5 µL of Lipofectamine 2000 transfection reagent according to the manufacturer’s protocol (Invitrogen). Transfected cells were allowed to recover for 8 h, then they were treated with Wnt-conditioned media (1:2 final concentration) for a further 16 h. The TOPflash assay was performed using the Dual-Luciferase Reporter Assay System (Promega) (Korinek et al., 1997 Science 275(5307):1784-7) following the manufacturer’s instructions. The activities of firefly and Renilla luciferases were measured sequentially from a single sample, using the CLARIOstar plate reader. Relative luciferase values were obtained from triplicate samples by dividing the firefly luminescence activity by the CMV-induced Renilla activity, and the standard deviation was calculated.
1.11 TOPFLASH assay using liposome encapsulation to deliver designed TPR proteins into the cell
10⁵ HEK 293T cells in 500 µl of Dulbecco’s Modified Eagle’s Medium (DMEM) supplemented with 10% fetal bovine serum were grown overnight in each well of a 24-well cell culture plate. For TOPFLASH reporter assays, 100 ng/well of TOPFLASH plasmid and 10 ng/well of CMV-Renilla plasmid (as internal control) were used to transfect cells in 24-well plates. Cells were transfected with the Lipofectamine 2000 transfection reagent according to the manufacturer’s protocol (Invitrogen). Transfected cells were allowed to recover for 8 hours, and Wnt signalling was activated by addition of Wnt3A-conditioned media obtained from L-cells. 16 hours post Wnt pathway activation, proteins were delivered into the cells by liposomal treatment. Cells were incubated with liposome (LFP)-media (DMEM without FBS) mix for 15 minutes at 37 °C followed by one PBS wash. Wnt3A-conditioned media was replaced and cells were incubated for variable time durations (2-8 hours). Following incubation, TOPFLASH assays were performed using the Dual-Luciferase Reporter Assay System (Promega) (Korinek et al., 1997) following the manufacturer’s instructions. Relative luciferase values were obtained from triplicate samples (from two independent experiments) by dividing the firefly luciferase values (from TOPFLASH) by the Renilla luciferase values (from CMV-Renilla), and standard deviations were calculated.
1.12 Competition fluorescence polarisation (FP) assay to measure the binding of designed Nrf-TPR proteins to Keap1
To measure the binding of the designed Nrf-TPR proteins to Keap1, Competition FP was performed using 384-well black opaque optiplate microplates and a CLARIOstar microplate reader. Nrf-TPR proteins were titrated into a solution containing a mixture of FITC-labelled Nrf2 peptide and Keap1 protein. The prepared plates were incubated for 30 minutes at room temperature before readings were taken.
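The calculation of relative luciferase values from triplicate wells can be sketched as below. This is a minimal illustration and not the analysis script used for the reported data: the function name `relative_luciferase` is hypothetical, the readings are placeholder numbers, and the use of the sample standard deviation (n - 1 denominator) is an assumption, since the text does not state which form was used.

```python
from statistics import mean, stdev


def relative_luciferase(firefly, renilla):
    """Return (mean, sample SD) of per-well firefly/Renilla ratios."""
    if len(firefly) != len(renilla):
        raise ValueError("firefly and Renilla readings must be paired per well")
    if len(firefly) < 2:
        raise ValueError("at least two wells are needed for a standard deviation")
    if any(r <= 0 for r in renilla):
        raise ValueError("Renilla readings must be positive")
    ratios = [f / r for f, r in zip(firefly, renilla)]
    return mean(ratios), stdev(ratios)


# Placeholder luminescence readings for one condition (three wells).
firefly_counts = [1000.0, 1200.0, 1100.0]
renilla_counts = [100.0, 100.0, 110.0]

rel_mean, rel_sd = relative_luciferase(firefly_counts, renilla_counts)
print(f"relative luciferase: {rel_mean:.2f} +/- {rel_sd:.2f}")
```

Dividing within each well before averaging keeps the internal control paired with its own firefly reading, which is the purpose of co-transfecting the Renilla plasmid.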
1.13 Generation of HiBiT tagged Beta Catenin MIA PaCa-2 cell line
Beta Catenin in MIA PaCa-2 cells was tagged with the HiBiT small peptide tag by CRISPR editing and homology directed repair (HDR). A ribonucleoprotein complex of Cas9 enzyme and gRNA (mU*mG*mA* rCrCrU rGrUrA rArArU rCrArU rCrCrU rUrUrG rUrUrU rUrArG rArGrC rUrArG rArArA rUrArG rCrArA rGrUrU rArArA rArUrA rArGrG rCrUrA rGrUrC rCrGrU rUrArU rCrArA rCrUrU rGrArA rArArA rGrUrG rGrCrA rCrCrG rArGrU rCrGrG rUrGrC mU*mU*mU* rU (SEQ ID NO: 83)) was prepared and electroporated into MIA PaCa-2 cells with a single stranded DNA HDR template (CTTTCTGAAGTTCTGTAGGCAGAGTAAAAGTATTTTACCCAAACTGGCTTTTTAAAACTTCTTACC TAAAGGATGATTTAGCTAATCTTCTTGAACAGCCGCCAGCCGCTCACCAGGTCAGTATCAAACCA GGCCAGCTGATTGCTGTCACCTGGAGGCAGCCCATCCATGAGGTCCTGGGCATGCCCCAGATC TGGCA (SEQ ID NO: 84)) using the Lonza 4D Nucleofector system.
Stable pools were aliquoted and frozen after 3 weeks. Aliquots were used in assays until passage 10.
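The modified-oligonucleotide notation used for the gRNA can be converted to a plain base sequence with a small parser. The sketch below is illustrative only and makes assumptions that the text does not state: it reads the prefix "r" as a ribonucleotide, the prefix "m" as a 2'-O-methyl ribonucleotide and a trailing "*" as a phosphorothioate linkage, following common oligonucleotide vendor conventions. The function name `parse_modified_rna` is hypothetical, and only the first nine residues of the gRNA are used as the example.

```python
import re

# One token: optional "m" or "r" prefix, one RNA base, optional "*" suffix.
TOKEN = re.compile(r"(m|r)?([ACGU])(\*)?")


def parse_modified_rna(notation):
    """Parse notation such as 'mU*mG*mA* rCrCrU' into a plain sequence.

    Returns (sequence, methylated_positions, phosphorothioate_positions),
    with zero-based positions. Raises ValueError on an unrecognised token,
    which also flags characters damaged during text extraction.
    """
    compact = "".join(notation.split())  # drop the grouping spaces
    bases, methylated, thioated = [], [], []
    pos = 0
    while pos < len(compact):
        match = TOKEN.match(compact, pos)
        if match is None:
            raise ValueError(f"unrecognised token at offset {pos}: {compact[pos:pos + 4]!r}")
        prefix, base, star = match.groups()
        index = len(bases)
        bases.append(base)
        if prefix == "m":
            methylated.append(index)
        if star:
            thioated.append(index)
        pos = match.end()
    return "".join(bases), methylated, thioated


example = "mU*mG*mA* rCrCrU rGrUrA"
sequence, methyl_positions, ps_positions = parse_modified_rna(example)
print(sequence, methyl_positions, ps_positions)
```

Because the parser rejects anything that is not a recognised token, it can also serve as a quick integrity check on a sequence copied from a document.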
1.14 HiBiT Assay for Endogenous Beta Catenin Degradation
MIA PaCa-2 cells in which endogenous Beta Catenin has been tagged with the small peptide HiBiT (MIA PaCa-2 Beta Catenin HiBiT) were cultured in T175 flasks in DMEM 10% FCS at 37 °C, 5% CO2 and split 1 in 10 on reaching 90% confluency. Flasks were grown to approx. 80% confluency and cells were then split by gentle washing with 20 ml of PBS and incubation with 4 ml of cell dissociation buffer for 5 minutes. Cells were counted and seeded at 5000 cells per well into replicate white solid bottom and black clear bottom plates in 100 µl per well of DMEM 10% FCS. Cells were incubated overnight at 37 °C, 5% CO2.
Cells were transfected with 100 ng per well of pcDNA 3.1 vector containing CTPR constructs using Lipofectamine 3000 using the manufacturer’s recommended protocol (Thermofisher Scientific) or with 25 nM siRNA (Beta Catenin targeted or Scramble control) using TransIT X2 using the manufacturer’s recommended protocol (Mirus Bio Inc). 24 hours after transfection the NanoGlo lytic detection assay was used to determine Beta Catenin-HiBiT levels. Plates were equilibrated to room temperature and 80 µl of NanoGlo Lytic reagent was added to each well. After 10 minutes of shaking, luminescence was measured on a GloMax Discover plate reader with an integration time of 2 seconds.
1.15 Immunofluorescent Assay for Expression
MIA PaCa-2 cells in which endogenous Beta Catenin has been tagged with the small peptide HiBiT (MIA PaCa-2 Beta Catenin HiBiT) were cultured in T175 flasks in DMEM 10% FCS at 37 °C, 5% CO2 and split 1 in 10 on reaching 90% confluency. Flasks were grown to approx. 80% confluency and cells were then split by gentle washing with 20 ml of PBS and incubation with 4 ml of cell dissociation buffer for 5 minutes. Cells were counted and seeded at 5000 cells per well into replicate white solid bottom and black clear bottom plates in 100 µl per well of DMEM 10% FCS. Cells were incubated overnight at 37 °C, 5% CO2.
Cells were transfected with 100 ng per well of pcDNA 3.1 vector containing CTPR constructs using Lipofectamine 3000 using the manufacturer’s recommended protocol (Thermofisher Scientific). 24 hours after transfection, media was aspirated from the black clear bottom plates and cells were fixed with 4% PFA, quenched with 0.1 M glycine in PBS, and permeabilized with IF blocking and permeabilization buffer (1× PBS, 5% FCS, 0.2% Saponin, 0.2 µm filtered). Cells were stained with anti-HA primary antibody (HA-Tag (C29F4) Rabbit mAb), followed by Goat anti-Rabbit Alexa Fluor® 555 secondary antibody and Hoechst 33342 nuclear counterstain. Wells were imaged immediately on an EVOS M5000 Fluorescence Microscope.
2. Results
The tetratricopeptide repeat (TPR) is a 34-residue motif that can be repeated in tandem to generate modular proteins. TPRs are used here as an example of helix-turn-helix tandem-repeat arrays, but any tandem repeat array may be used.
RTPR proteins comprising TPRs were derived from the consensus TPR sequence (CTPR). Two repeats were found to be sufficient to generate a highly stable mini-protein of 68 amino acids (RTPR2). The biophysical properties of two types of engineering strategy, loop insertion and terminal helix grafting, were assessed. The molar ellipticity at 222 nm (a measure of helical secondary structure content) of three different RTPR modules was monitored as a function of increasing temperature. A decrease in the absolute molar ellipticity with increasing temperature indicates a loss of structure and the unfolding of the protein. Even at the highest temperature recorded (85 °C), the RTPR2 protein without insertion was not fully denatured (Figure 1). RTPR2 with a 20-residue unstructured loop between the two repeats showed a small shift to a lower melting temperature (Figure 1), but the protein remained fully folded up to 55 °C. This is well above physiologically relevant temperatures. RTPR2 with an additional N-terminal helix showed an increase in absolute molar ellipticity, indicating that the additional helical domain is folded. Moreover, unlike the loop insertion, the helix domain was capable of stabilising the RTPR2 module, shifting the transition midpoint to above 90 °C (Figure 1). These results showed that the two engineering strategies generated folded and stable modular mini-proteins capable of withstanding high temperatures.
A key feature of the TPR scaffold was its modular nature. This modularity allowed us to display any number of binding modules in tandem to obtain bi- and multi-valent and multi-functional molecules against one, two or more targets. The stability of these proteins was shown to be modular. The stabilities of proteins comprising TBP-CTPR2 (a two-repeat CTPR with a loop insertion that binds to the protein tankyrase (Guettler et al. 2011)) repeated in tandem were measured. The TBP-CTPR2-containing proteins had two, four, six, and eight repeats, and they displayed one, two, three and four binding loops, respectively. The helical content of the proteins, monitored by molar ellipticity at 222 nm, was found to increase in proportion to the number of repeats, as did the stability, indicating that they were behaving like classic helical repeat proteins (Figure 2). These results demonstrate that bi- or multi-functional modular binding proteins have a high thermostability.
2.1. Demonstration of proteins with a single binding function grafted onto an alpha-helix
2.1.1 SOS1-TPR, a helix-grafted binding module designed to bind to oncoprotein KRAS
First, we mapped the helix of SOS1 that interacts with KRAS (Margarit et al. 2003 Cell 112(5) 685-695) onto the heptad distribution. We matched the heptad positions with the stapled SOS1 helical peptide produced by Leshchiner et al. (PNAS 2015 112(6) 1761-1766) and set the stapled side of the peptide to form the hydrophobic interface with the rest of the TPR protein (Fig. 3A). The length of the helix is important. An N-terminal solvating CTPR helix ends in the sequence DPNN, which forms a short loop that leads into the next repeat. CTPR-mediated “stapling” (constraining) of binding helices therefore occurred through residues Tyr (i) - Ile (i+4) - Tyr (i+7) - Leu (i+11), fully stapling a 15-residue helix.
We created a hydrophobic interface between the grafted helix and the adjacent repeat and allowed the formation of the DPNN loop at the C-terminal end of the grafted helix. We then grafted the final sequence onto the crystal structure of a CTPR B helix for further validation of the interaction. Our designed KRAS-binding protein, SOS1-TPR, was docked against KRAS using the Haddock software (de Vries & Bonvin 2011; de Vries et al. 2010). Haddock is a data-driven docking algorithm that uses known information about the interaction for its calculations. The crystal structure of SOS1-KRAS (PDB: 1NVU) (Margarit et al. 2003) was originally used to design the stapled peptide. The active (primary interaction residues) and the passive (5 Å proximity to active) residues were extracted and inputted into the calculations.
Since the initial work, we have found that docking is not necessary to validate new helical modules. The design strategy has a solid theoretical basis in the geometry of α-helices, and a design will be successful as long as the key binding residues have been grafted. These TPR repeats were thus found to be exceptional scaffolds to display binding helices, as they grow linearly in the opposite direction of the helix, thereby avoiding any steric clashes with the target protein.
Next, we monitored KRAS binding using the change in fluorescence polarisation of mant-GTP (2'-/3'-O-(N'-methylanthraniloyl) guanosine-5'-O-triphosphate), a fluorescent analog of GTP (Fig. 3B). The fluorescence of mant-GTP is dependent on the hydrophobicity of its environment (excitation at 360 nm, emission at 440 nm). An increase in fluorescence intensity and fluorescence polarization was observed previously upon binding to KRAS (Leshchiner et al. 2015). SOS-TPR2 was then titrated into the preformed mant-GTP-KRAS complex. There was a clear decrease in polarisation with increasing concentrations of SOS-TPR2, indicating displacement of mant-GTP upon binding of SOS-TPR2 to KRAS (Figure 3B). Fitting the data gave an EC50 of 3.4 µM. In contrast, a blank protein, CTPR3, had no effect on the fluorescence polarisation.
2.1.2 p53-TPR, a helix-grafted binding module designed to bind to Mdm2
Many degrons (regions within the substrate that are recognized by the E3 ubiquitin ligase) are unstructured. However, p53 binds to the Mdm2 E3 through an alpha helix (Figure 4A). Stapled versions of the p53 helix, as well as circular peptides and grafted coiled coils, have been developed by many groups, and the sequences have been optimised to give nanomolar affinities in some cases (see for example, Ji et al. 2013; Lee et al. 2014; Kritzer et al. 2006). The p53 helix has a favourable geometry to be grafted onto the C-terminal solvating helix of the CTPR scaffold, and moreover the two helices have 30% sequence identity.
Proof of binding of p53-CTPR2 to Mdm2 (N-terminal domain) was obtained using isothermal titration calorimetry (ITC). Mdm2 was titrated into a solution containing 10 µM of p53-TPR2. ITC measures the heat released upon binding. A high-affinity interaction was observed with a dissociation constant of approximately 50 nM (Figure 4B).
2.2. Demonstration of proteins with a single binding function grafted onto an inter-repeat loop
2.2.1 TPB2-TPR, a loop module designed to bind to oncoprotein tankyrase
First, we grafted the SLiM “3BP2” onto the CTPR scaffold. This sequence binds to the substrate-binding ankyrin-repeat clusters (ARC) of the protein tankyrase, a multi-domain poly ADP-ribose polymerase that is upregulated in many cancers (Guettler et al. 2011). Grafting SLiMs into folded domains led to increased resistance to proteolysis, and shows the potential for expanding the interaction surface through further rational engineering, in silico methods and/or directed evolution; for controlled geometric arrangement; and for bi- or multivalency of interactions.
We tested the binding of 1TBP-CTPR2, 2TBP-CTPR4 and 3TBP-CTPR6 to the ARC4 domain of tankyrase using ITC (Figure 5A). This technique is particularly useful for these interactions, as it can measure the stoichiometry (n) of the interaction. We showed that n increased with the number of binding loops, meaning that there were as many tankyrase molecules bound to one TBP-CTPR as loops in the protein. Thus, all loops are accessible to the binding partner. Moreover, the binding affinity increases and the off rate decreases with the number of repeats, indicative of an avidity effect. This type of multivalent molecule would be particularly useful for full-length tankyrase, as it has four ARC domains capable of binding the 3BP2 peptide.
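The modular make-up of these constructs can be illustrated by assembling the sequences from their parts. The sketch below is only one reading of the 1TBP-CTPR2 and 2TBP-CTPR4 entries in the sequence listing (SEQ ID NOs: 801 and 802): the split into an N-terminal tag, a repeat body, a loop carrying the tankyrase-binding peptide and a blank loop, and the names used for these parts, are assumptions made for illustration.

```python
# Module boundaries below are assumed from inspection of SEQ ID NO: 801.
N_TERM = "MGS"                                   # N-terminal residues
CTPR_REPEAT = "AEAWYNLGNAYYKQGDYQKAIEYYQKALEL"   # CTPRa repeat body (SEQ ID NO: 224 without DPXX)
TBP_LOOP = "DPNNREAGDGEEDPRS"                    # inter-repeat loop carrying the grafted peptide
BLANK_LOOP = "DPRS"                              # loop without a grafted peptide

# 1TBP-CTPR2 as given in the sequence listing (SEQ ID NO: 801).
SEQ_ID_801 = (
    "MGSAEAWYNLGNAYYKQGDYQKAIEYYQKALELDPNNREAGDGEEDPRSAEAWYNLGNAYYKQGDYQKAIEYYQK"
    "ALELDPRS"
)


def build_tbp_ctpr(n_loops):
    """Return an nTBP-CTPR(2n) sequence: n binding loops spread over 2n repeats."""
    unit = CTPR_REPEAT + TBP_LOOP + CTPR_REPEAT + BLANK_LOOP
    return N_TERM + unit * n_loops


one_loop = build_tbp_ctpr(1)
two_loops = build_tbp_ctpr(2)
matches_801 = one_loop == SEQ_ID_801            # the assembly reproduces SEQ ID NO: 801
binding_loops = two_loops.count(TBP_LOOP)       # number of binding loops in 2TBP-CTPR4
```

Under this reading, each additional pair of repeats adds one binding loop, which is consistent with the stoichiometry increasing with the number of loops.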
Multivalency in this system was increased further via oligomerisation of the binding modules by fusing them to the foldon domain of T4 fibritin (Fig. 5B). This trimerisation domain comprises a C-terminal helix, such as that of p53-CTPR, ending with the foldon domain, a short β-sheet peptide capable of homo-trimerising. The foldon domain has been shown to be highly stable and independently folded (Boudko et al. 2002; Meier et al. 2004). In this way, multiple binding modules can be arranged with specified geometries to inhibit complex multivalent molecules that cannot be targeted with monovalent interactions due to their natural tendency to interact with other multivalent networks with high avidity.
2.2.2 Effect of introducing multivalency into a single binding function TPR
We tested the function of multivalent CTPR proteins containing variable numbers of the “3BP2” motif that binds to the protein tankyrase (1TBP-CTPR2, 2TBP-CTPR4, 3TBP-CTPR6, etc.). Multivalency was increased further via oligomerisation of the TPRs by fusing them to the foldon domain of T4 fibritin (1TBP-CTPR2-Foldon, 2TBP-CTPR4-Foldon, etc.). Tankyrase is upregulated in many cancers and exerts its effect by upregulating beta-catenin. Therefore, the inhibitory effect of the TBP-grafted TPRs was assayed using a beta-catenin reporter gene assay (TOPFLASH assay). Increasing the number of functional units increased the inhibitory effect of the proteins, as measured using a Wnt signalling assay (Figure 17).
2.2.3 Skp2-RTPR, a loop module designed to bind to E3 ubiquitin ligase SCFSkp2
Skp2 is the substrate recognition subunit of the SCFSkp2 ubiquitin ligase. The Skp2-binding sequence that we inserted into the RTPR loop was based on the previously published degron peptide sequence derived from the substrate p27 that binds to Skp2 in complex with Cks1 (an accessory protein) (Hao et al. 2005). We used only 10 residues of this peptide. Although ideally the Skp2-binding sequence would include a phospho-threonine (as this residue makes some key contacts with Skp2 and Cks1), we instead explored whether we could replace the phospho-threonine with a phosphomimetic (glutamate) without affecting binding affinity. We found using co-immunoprecipitation that the resulting p27-TPR protein was able to bind to Skp2 (Fig. 6A) and that it was able to inhibit the ubiquitination of p27 in vitro with a high efficiency, indicating a dissociation constant of the order of 30 nM (Fig. 6B). As the peptide adopts a turn-like conformation in its Skp2/Cks1-bound state, constraining it within the RTPR scaffold leads to a large enhancement in binding affinity that outweighs any loss in affinity arising from replacing the phosphothreonine with a phosphomimetic.
2.2.4 Nrf-TPR, a loop module designed to bind to E3 ubiquitin ligase Keap1-Cul3
Keap1 is the substrate recognition subunit of the Keap1-Cul3 ubiquitin ligase. A Keap1-binding sequence that we inserted into the CTPR loop was based on the previously published degron peptide sequence derived from the Keap1 substrate Nrf2. We found using co-immunoprecipitation that the resulting Nrf-TPR protein was able to bind to Keap1 (Fig. 7A) and that the interaction had a high affinity in the low nanomolar range as measured by ITC analysis (Fig. 7B).
2.3. Engineering the RTPR scaffold for delivery into the cell
Combining our RTPR sequences with an alternative consensus TPR sequence (Parmeggiani et al. 2015), we included additional solvent-exposed arginine residues, as such ‘resurfacing’ or ‘supercharging’ has been shown previously to facilitate the entry of proteins into cells (Chapman & McNaughton 2016; Thompson et al. 2012). Figure 8 shows that this approach was successful in delivering a fluorescent-labelled resurfaced TBP-RTPR2 protein into two different cell lines.
2.4. Design of hetero-bifunctional TPRs to direct proteins for ubiquitination and subsequent degradation
The Wnt/β-catenin signalling pathway is deregulated in many cancers and in neurodegenerative diseases, and therefore β-catenin is an important drug target. There are a large number of known binding sequences (both helical and non-helical) for β-catenin that appear suitable for grafting onto the TPR scaffold, and therefore we chose it as the first target for our design of hetero-bifunctional TPRs to induce protein degradation. We selected Mdm2 and SCFSkp2 to test as E3 ubiquitin ligases, as we had successfully generated single-function TPRs to bind to them (Figs. 4 and 6). We generated structural models of some of the hetero-bifunctional molecules and used these as a crude assessment of whether the resulting presentation of β-catenin to the E3 looked appropriate. We then generated a small library of plasmids encoding proteins comprising three or four TPRs functionalized with different combinations of the β-catenin-binding module and the two E3 ligase-binding modules.
We transfected HA-tagged β-catenin plasmid alone or HA-tagged β-catenin plasmid together with one of the various hetero-bifunctional TPR plasmids in HEK293T cells using Lipofectamine2000. After 48 hours of transfection, the cells were lysed, the sample was boiled, proteins were resolved by SDS-PAGE and immunoblotting was performed using anti-HA and anti-actin antibodies. Changes in β-catenin levels were evaluated by densitometry of the bands corresponding to HA-β-catenin normalised to actin levels (Fig. 9). The results show that a number of the hetero-bifunctional molecules are capable of reducing β-catenin levels by up to 70%. In contrast, neither a blank TPR nor single-function TPRs have any effect on β-catenin levels.
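The normalisation described above can be written out as a short calculation. This is a sketch only: the function name and the band intensities are hypothetical and are not values from Figure 9; they are chosen so that the arithmetic is easy to follow.

```python
def relative_level(target_band, actin_band, control_target, control_actin):
    """Actin-normalised target intensity, expressed as a fraction of the control lane."""
    return (target_band / actin_band) / (control_target / control_actin)


# Hypothetical densitometry readings (arbitrary units), for illustration only.
control = {"ha": 1000.0, "actin": 500.0}   # HA-tagged target transfected alone
treated = {"ha": 270.0, "actin": 450.0}    # co-transfected with a hetero-bifunctional TPR

rel = relative_level(treated["ha"], treated["actin"], control["ha"], control["actin"])
reduction_pct = (1.0 - rel) * 100.0        # percentage reduction relative to the control lane
```

Dividing by the actin band first corrects for differences in loading between lanes, so that the comparison with the control lane reflects the target level alone.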
A range of different factors contribute to efficient ubiquitination and target degradation by these hetero-bifunctional molecules, hence the power of screening different combinations of single-function modules and potentially also different lengths of intervening blank modules.
2.5 Using a delivery vehicle to introduce the modular TPR proteins into cells
We encapsulated the designed TPR proteins within fusogenic liposomes made from cationic, neutral, and aromatic lipids, and we showed that they were thereby delivered into cells (Figures 18 and 19).
Empty liposomes and liposomes encapsulating TPR proteins are not toxic to the cell (Figure 20).
2.6 Further examples of designed single-function and hetero-bifunctional TPRs
TPR proteins targeting tankyrase (Figures 21 and 22) were delivered into cells using liposome encapsulation, and the effect on Wnt signalling was assayed using a TOPFLASH assay. The results show that the designed TPR proteins are able to inhibit Wnt signalling. For KRAS, we transfected KRAS plasmid alone or KRAS plasmid together with one of the TPR plasmids in HEK293T cells using Lipofectamine2000. 24 hours post transfection the cells were lysed, and KRAS levels were evaluated by western blot. The results show that the designed heterobifunctional TPR is capable of reducing KRAS levels (Figure 23).
In addition, TPR proteins targeting KRAS were delivered into cells using liposome encapsulation, and the effect on HiBiT-tagged KRAS levels was evaluated using the HiBiT luminescence assay. The results show that the designed hetero-bifunctional TPR proteins are able to reduce KRAS levels (Figures 24 and 25).
2.7 Hetero-bifunctional TPRs to direct KRAS for degradation via chaperone-mediated autophagy (CMA)
Hetero-bifunctional TPR proteins were designed to target endogenous KRAS for degradation via CMA (Figure 26). TPR constructs or empty vector (light grey) were transiently transfected into either HEK293T or DLD1 (colorectal cancer cell line) using Lipofectamine2000. 24 hours post transfection the cells were lysed, and KRAS levels were evaluated by western blot. The designed hetero-bifunctional TPRs that resulted in reduction of KRAS levels compared to the empty vector control are shown in white.
2.8 Variations in the linker sequence connecting a peptide ligand to an inter-repeat loop
The linker sequence connecting a peptide ligand to an inter-repeat loop was varied in order to optimise the binding affinity for the target for Nrf-TPR, a TPR protein designed to bind to the protein Keap1 (see Fig. 7). Glycine residues were introduced into the linker to provide flexibility and increased spatial sampling. The introduction of this more flexible linker sequence was found to increase the binding affinity of the Nrf-TPR protein (labelled ‘Flexible’) when compared with the consensus-like linker sequence. Altering the charge content of the linker sequence (labelled ‘Charged’) and altering the conformational properties of the loop by changing the amino acid composition of the linker sequence, based on the predictions of the program CIDER (Holehouse et al. Biophys. J. 112, 16-21 (2017)) (labelled ‘CIDER-optimised’), also affected the Keap1-binding affinity (Figure 27).
2.9 Screen for Beta Catenin degradation by bispecific CPTR construct using “Phospho” targeting sequence
MIA PaCa-2 cells in which endogenous Beta Catenin has been tagged with the small peptide HiBiT were transfected with Scrambled or Beta Catenin targeted siRNA, or with CPTR scaffold based constructs utilising the Phospho targeting sequence together with single function controls. After 24 hours Beta Catenin abundance was quantified using the lytic HiBiT assay. Mean and SEM of at least 4 replicates are shown in Figure 28 (Untreated: Untreated cells, Scr: Scrambled siRNA, Lipofectamine: Lipofectamine only treated cells, siRNA: Beta catenin targeted siRNA). The modular design of the constructs is shown in Table 12 and the sequences are shown in Table 13 (SEQ ID NOs: 1230-1304). The bispecific CPTR-based PPX229 caused a 36% reduction in signal relative to lipofectamine only, demonstrating its ability to degrade Beta Catenin more effectively than both its single function controls (PPX239 3%, PPX244 14%). Positive control Beta Catenin targeted siRNA caused a 52% reduction in signal relative to scrambled RNA.
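The summary statistics used in these screens (mean, SEM and percentage reduction relative to the lipofectamine-only wells) can be computed as sketched below. The luminescence readings are hypothetical and serve only to illustrate the arithmetic; they are not data from Figure 28, and the function names are illustrative.

```python
from math import sqrt
from statistics import mean, stdev


def mean_sem(values):
    """Return the mean and the standard error of the mean (sample SD / sqrt(n))."""
    return mean(values), stdev(values) / sqrt(len(values))


def percent_reduction(sample_mean, reference_mean):
    """Reduction in signal relative to a reference condition, in percent."""
    return 100.0 * (1.0 - sample_mean / reference_mean)


# Hypothetical HiBiT luminescence readings (arbitrary units), four replicates each.
lipofectamine_only = [100.0, 104.0, 96.0, 100.0]
bispecific_construct = [64.0, 66.0, 62.0, 64.0]

ref_mean, ref_sem = mean_sem(lipofectamine_only)
con_mean, con_sem = mean_sem(bispecific_construct)
reduction = percent_reduction(con_mean, ref_mean)
```

The same two functions apply to the single function controls and to the siRNA wells, with the scrambled siRNA wells serving as the reference for the latter.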
2.10 Screen for Beta Catenin degradation by bispecific CPTR construct using “AXIN” targeting sequence
MIA PaCa-2 cells in which endogenous Beta Catenin has been tagged with the small peptide HiBiT were transfected with Scrambled or Beta Catenin targeted siRNA, or with CPTR scaffold based constructs utilising the AXIN targeting sequence together with single function controls. After 24 hours Beta Catenin abundance was quantified using the lytic HiBiT assay. Mean and SEM of at least 4 replicates are shown in Figure 29 (Untreated: Untreated cells, Scr: Scrambled siRNA, Lipofectamine: Lipofectamine only treated cells, siRNA: Beta catenin targeted siRNA). The modular design of the constructs is shown in Table 12 and the sequences are shown in Table 13 (SEQ ID NOs: 1230-1304). The bispecific CPTR-based PPX197 caused a 37% reduction in signal relative to lipofectamine only, demonstrating its ability to degrade Beta Catenin more effectively than both its single function controls (PPX235 11%, PPX240 6%). The bispecific CPTR-based PPX202 caused a 36% reduction in signal relative to lipofectamine only, demonstrating its ability to degrade Beta Catenin more effectively than both its single function controls (PPX235 11%, PPX241 8%). The bispecific CPTR-based PPX225 caused a 33% reduction in signal relative to lipofectamine only, demonstrating its ability to degrade Beta Catenin more effectively than both its single function controls (PPX235 11%, PPX244 14%).
2.11 Screen for Beta Catenin degradation by bispecific CPTR constructs using BCL9 targeting sequence
MIA PaCa-2 cells in which endogenous Beta Catenin has been tagged with the small peptide HiBiT were transfected with Scrambled or Beta Catenin targeted siRNA, or with CPTR scaffold based constructs utilising the BCL9 targeting sequence together with single function controls. After 24 hours Beta Catenin abundance was quantified using the lytic HiBiT assay. Mean and SEM of at least 4 replicates are shown in Figure 30 (Untreated: Untreated cells, Scr: Scrambled siRNA, Lipofectamine: Lipofectamine only treated cells, siRNA: Beta catenin targeted siRNA). The modular design of the constructs is shown in Table 12 and the sequences are shown in Table 13 (SEQ ID NOs: 1230-1304). The bispecific CPTR-based PPX226 caused a 36% reduction in signal relative to lipofectamine only, demonstrating its ability to degrade Beta Catenin more effectively than both its single function controls (PPX237 9%, PPX244 14%). Positive control Beta Catenin targeted siRNA caused a 52% reduction in signal relative to scrambled RNA.
2.12 Screen for Beta Catenin degradation by bispecific CPTR constructs using “LRH1” targeting sequence
MIA PaCa-2 cells in which endogenous Beta Catenin has been tagged with the small peptide HiBiT were transfected with Scrambled or Beta Catenin targeted siRNA, or with CPTR scaffold based constructs utilising the LRH1 targeting sequence together with single function controls. After 24 hours Beta Catenin abundance was quantified using the lytic HiBiT assay. Mean and SEM of at least 4 replicates are shown in Figure 31 (Untreated: Untreated cells, Scr: Scrambled siRNA, Lipofectamine: Lipofectamine only treated cells, siRNA: Beta catenin targeted siRNA). The bispecific CPTR-based PPX203 caused a 36% reduction in signal relative to lipofectamine only (LIPO), demonstrating its ability to degrade Beta Catenin more effectively than both its single function controls (PPX241 8%, PPX236 12%). Positive control Beta Catenin targeted siRNA caused a 52% reduction in signal relative to scrambled RNA.
The bispecific CPTR-based PPX227 caused a 30% reduction in signal relative to lipofectamine only (LIPO) demonstrating its ability to degrade Beta Catenin more effectively than both its single function controls (PPX241 3%, PPX244 14%). Positive control Beta Catenin targeted siRNA caused a 52% reduction in signal relative to scrambled RNA.
2.13 Expression of CPTR constructs in MIA PaCa-2 24 hours after transfection.
MIA PaCa-2 cells were transfected with CPTR scaffold based constructs PPX197, PPX202, PPX203, PPX225, PPX226, PPX227 and PPX229, which contain HA tags. After 24 hours cells were fixed, permeabilised and stained for HA tag. Hoechst was used as a nuclear counterstain to obtain images of HA and Hoechst staining for transfected cells (not shown). HA was detected in cells transfected with PPX197, PPX202, PPX203, PPX225, PPX226, PPX227 and PPX229, indicating that CPTR-based molecules are expressed in cells.
Figure imgf000053_0001
Table 1
Figure imgf000054_0001
Table 2
Figure imgf000055_0001
CLUSTAL multiple sequence alignment by MUSCLE (3.8)
RTPRc GCAGAAGCACTGCGTAATCTGGGTCGTGTTTATCGTCGTCAGGGTCGTTATCAGCGTGCA
RTPRa-ii-H GCCGAAGCTTGGTATAATCTGGGGAATGCCTATTACAGACAGGGGGATTATCAGCGCGCC
RTPRa-i-E GCAGAAGCATGGTATAATCTGGGTAATGCATATTATCGCCAGGGTGATTATCAGCGTGCC
RTPRa-iii-E GCAGAAGCATGGTATAATCTGGGCAATGCATATTATCGTCAGGGTGATTATCAGCGTGCC
CTPRa-E GCAGAAGCATGGTATAATCTGGGTAATGCATATTACAAACAGGGCGATTATCAGAAAGCC
CTPRb-E GCAGAAGCACTGAATAATCTGGGTAATGTTTATCGTGAACAGGGCGATTATCAGAAAGCC
RTPRb-E GCAGAAGCACTGAATAATCTGGGTAATGTTTATCGTGAACAGGGCGATTATCAGCGTGCC
RTPRa-ii-E GCCGAGGCCTGGTATAACCTTGGCAACGCCTATTATCGTCAAGGCGACTACCAGAGAGCA
RTPRc-H GCCGAGGCTCTGAGAAATCTGGGCAGAGTGTACAGACGGCAGGGCAGATACCAGCGGGCC
CTPRb-H GCCGAGGCTCTGAACAACCTGGGCAACGTGTACAGAGAGCAGGGCGACTACCAGAAGGCC
RTPRb-H GCCGAGGCTCTGAACAACCTGGGCAACGTGTACAGAGAGCAGGGCGACTACCAGCGGGCC
RTPRa-iv-E GCCGAGGCCTGGTACAACCTGGGTAACGCCTATTATCGCCAAGGCGACTACCAGCGTGCA
CTPRa-H GCCGAGGCCTGGTACAATCTGGGCAACGCCTACTACAAGCAGGGCGACTACCAGAAGGCC
RTPRa-i-H GCCGAGGCCTGGTACAACCTGGGCAACGCCTACTACCGGCAGGGCGACTACCAGCGGGCC
RTPRc ATTGAATATTATCGTCGCGCACTGGAATTAGATCCGNNNNNN ( SEQ ID NO 170)
RTPRa-ii-H ATTGAATATTATCAGCGGGCTCTGGAACTGGATCCTNNNNNN ( SEQ ID NO 171)
RTPRa-i-E ATTGAATATTATCAACGTGCACTGGAACTGGACCCGNNNNNN ( SEQ ID NO 172)
RTPRa-iii-E ATCGAATATTATCAACGTGCACTGGAACTGGACCCGNNNNNN ( SEQ ID NO 173)
CTPRa-E ATCGAGTATTATCAAAAAGCACTGGAACTGGACCCGNNNNNN ( SEQ ID NO 174)
CTPRb-E ATCGAATATTATCAAAAAGCGCTGGAACTGGACCCGNNNNNN ( SEQ ID NO 175)
RTPRb-E ATTGAATATTATCAACGTGCGCTGGAATTAGATCCGNNNNNN ( SEQ ID NO 176)
RTPRa-ii-E ATCGAATATTACCAGCGTGCGTTAGAATTAGATCCGNNNNNN ( SEQ ID NO 177)
RTPRc-H ATCGAGTATTACCGCAGAGCCCTGGAACTGGACCCCNNNNNN ( SEQ ID NO 178)
CTPRb-H ATCGAGTATTATCAGAAGGCCCTGGAACTGGACCCCNNNNNN ( SEQ ID NO 179)
RTPRb-H ATCGAGTATTATCAGAGAGCCCTGGAACTGGACCCCNNNNNN ( SEQ ID NO 180)
RTPRa-iv-E ATTGAGTACTACCAACGTGCCCTGGAACTGGACCCTNNNNNN ( SEQ ID NO 181)
CTPRa-H ATCGAGTATTATCAGAAGGCCCTGGAACTGGACCCCNNNNNN ( SEQ ID NO 182)
RTPRa-i-H ATCGAGTACTACCAGAGAGCCCTGGAACTGGACCCTNNNNNN ( SEQ ID NO 183)
Table 4
Multiple Alignment of DNA sequences of all CTPR and RTPR used in hetero-bifunctional CTPRs and RTPRs to date
089 AEAYSNLGNVYKERGQLQEAIEHYRHALRL 118 NP_858058.1 (SEQ ID NO: 184)
191 AVAWSNLGCVFNAQGEIWLAIHHFEKAVTL 220
327 ADSLNNLANIKREQGNIEEAVRLYRKALEV 356
264 NLACVYYEQGLIDLAIDTYRRAIEL 288
079 AEAYSNLGNVYKERGQLQEAIEHYRHALRL 108 NP_858059.1 (SEQ ID NO: 185)
181 AVAWSNLGCVFNAQGEIWLAIHHFEKAVTL 210
317 ADSLNNLANIKREQGNIEEAVRLYRKALEV 346
254 NLACVYYEQGLIDLAIDTYRRAIEL 278
079 AEAYSNLGNVYKERGQLQEAIEHYRHALRL 108 NP_003596.2 (SEQ ID NO: 186)
812 ESFYNLGRGLHQLGLIHLAIHYYQKALEL 840 NP_036218.1 (SEQ ID NO: 187)
637 AWYGLGMIYYKQEKFSLAEMHFQKALDI 664 NP_001247.2 (SEQ ID NO: 188)
568 EAWCAAGNCFSLQREHDIAIKFFQRAIQV 596
275 AQSCYSLGNTYTLLQDYEKAIDYHLKHLAI 304
058 YSQLGNAYFYLHDYAKALEYHHHDLTL 084
315 GRACWSLGNAYTALGNHDQAMHFAEKHLEI 344
247 NMGNIYLKQRNYSKAIKFYRMALD 270 NP_783195.2 (SEQ ID NO: 189)
495 ALTNKGNTVFANGDYEKAAEFYKEAL 520
238 NMGNIYLKQRNYSKAIKFYRMALD 261 NP_006522.2 (SEQ ID NO: 190)
486 ALTNKGNTVFANGDYEKAAEFYKEAL 511
715 AQAWMNMGGIQHIKGKYVSARAYYERALQL 744 NP_787057.2 (SEQ ID NO: 191)
575 AEILSPLGALYYNTGRYEEALQIYQEAAAL 604
114 AQAAKNKGNKYFKAGKYEQAIQCYTEAISL 143 NP_055635.3 (SEQ ID NO: 192)
610 YNLGKLYHEQGHYEEALSVYKEAIQ 634 NP_689801.1 (SEQ ID NO: 193)
586 AQAWMNMGGIQHIKGKYVSARAYYERALQL 615 NP_114126.2 (SEQ ID NO: 194)
446 AEILSPLGALYYNTGRYEEALQIYQEA 472
018 AETFKEQGNAYYAKKDYNEAYNYYTKAIDM 47 NP_003306.1 (SEQ ID NO: 195)
300 AKAYARIGNSYFKEEKYKDAIHFYNKSL 327 NP_006810.1 (SEQ ID NO: 196)
231 LGNDAYKKKDFDTALKHYDKAKEL 254
365 NKGNECFQKGDYPQAMKHYTEAI 387
028 AETFKEQGNAYYAKKDYNEAYNYYTKAIDM 057 AAH11837.2 (SEQ ID NO: 197)
228 AYSNLGNAHVFLGRFDVAAEYYKKTLQL 255 NP_056412.2 (SEQ ID NO: 198)
266 AQACYSLGNTYTLLQDYERAAEYHLRHL 293
28 AEELKTQANDYFKAKDYENAIKFYSQAIEL 57 NP_006238.1 (SEQ ID NO: 199)
318 DAYKSLGQAYRELGNFEAATESFQKALLL 346 NP_078801.2 (SEQ ID NO: 200)
1262 ETLKNLAVLSYEGGDFEKAAELYKRAMEI 1290 NP_694972.3 (SEQ ID NO: 201)
140 GNKYFKQGKYDEAIDCYTKGMD 161 NP_078880.1 (SEQ ID NO: 202)
289 GNGFFKEGKYERAIECYTRGI 309
600 CWESLGEAYLSRGGYTTALKSFTKASEL 627 NP_055454.1 (SEQ ID NO: 203)
172 KATYRAGIAFYHLGDYARALRYLQEA 197 NP_689692.2 (SEQ ID NO: 204)
174 LGKIHLLEGDLDKAIEVYKKAVE 196 NP_149017.2 (SEQ ID NO: 205)
158 LGDLFSKAGDFPRAAEAYQKQLRF 181 NP_038460.3 (SEQ ID NO: 206)
384 AYFNAGNIYFHHRQFSQASDYFSKALKF 411 NP_001007796.1 (SEQ ID NO: 207)
104 EAWNQLGEVYWKKGDVAAAHTCFSGAL 130 NP_612385.1 (SEQ ID NO: 208)
814 EAWQGLGEVLQAQGQNEAAVDCFLTALEL 842 NP_065191.2 (SEQ ID NO: 209)
446 AKLWNNVGHALENEKNFERALKYFLQA 472 NP_861448.1 (SEQ ID NO: 210)
597 ADLWYNLAIVHIELKEPNEALKNFNRALEL 626
251 YRRKGDLDKAIELFQRVLE 269 NP_001540.2 (SEQ ID NO: 211)
251 YRRKGDLDKAIELFQRVLE 269 NP_001026853.1 (SEQ ID NO: 212)
079 AKTYKDEGNDYFKEKDYKKAVISYTEGL 106 NP_004614.2 (SEQ ID NO: 213)
501 AKVHYNIGKNLADKGNQTAAIRYYREAVRL 530 NP_116202.2 (SEQ ID NO: 214)
482 AKVHYNIGKNLADKGNQTAAIRYYREAVRL 511 NP_001073137.1 (SEQ ID NO: 215)
200 GNELVKKGNHKKAIEKYSESL 220 NP_006800.2 (SEQ ID NO: 216)
123 GNEQFKKGDYIEAESSYSRALEM 145 NP_003305.1 (SEQ ID NO: 217)
438 ESLSLLGFVYKLEGNMNEALEYYERALRL 466 NP_001001887.1 (SEQ ID NO: 218)
438 ESLSLLGFVYKLEGNMNEALEYYERALRL 466 NP_001539.3 (SEQ ID NO: 219)
375 AKTKNNLASAYLKQNKYQQAEELYKEIL 402 NP_803136.2 (SEQ ID NO: 220)
564 WFSLGCAYLALEDYQGSAKAFQRCVTL 590 NP_060205.3 (SEQ ID NO: 221)
306 AESCYQLARSFHVQEDYDQAFQYYYQATQF 335 NP_055448.1 (SEQ ID NO: 222)
Table 5
CTPRa E. coli expression codon optimised
GCAGAAGCATGGTATAATCTGGGTAATGCATATTACAAACAGGGCGATTATCAGAAAGCCATCGAGTATTATCAAAAAGCACTGGAACTGGACCCGNNNNNN (SEQ ID NO: 223)
AEAWYNLGNAYYKQGDYQKAIEYYQKALELDPXX (SEQ ID NO: 224)
CTPRa H. sapiens expression codon optimised
GCCGAGGCCTGGTACAATCTGGGCAACGCCTACTACAAGCAGGGCGACTACCAGAAGGCCATCGAGTATTATCAGAAGGCCCTGGAACTGGACCCCNNNNNN (SEQ ID NO: 225)
AEAWYNLGNAYYKQGDYQKAIEYYQKALELDPXX (SEQ ID NO: 224)
RTPRa-i H. sapiens expression codon optimised
GCCGAGGCCTGGTACAACCTGGGCAACGCCTACTACCGGCAGGGCGACTACCAGCGGGCCATCGAGTACTACCAGAGAGCCCTGGAACTGGACCCTNNNNNN (SEQ ID NO: 226)
AEAWYNLGNAYYRQGDYQRAIEYYQRALELDPXX (SEQ ID NO: 227)
RTPRa-ii H. sapiens expression codon optimised
GCCGAAGCTTGGTATAATCTGGGGAATGCCTATTACAGACAGGGGGATTATCAGCGCGCCATTGAATATTATCAGCGGGCTCTGGAACTGGATCCTNNNNNN (SEQ ID NO: 228)
AEAWYNLGNAYYRQGDYQRAIEYYQRALELDPXX (SEQ ID NO: 227)
RTPRa-i E. coli expression codon optimised
GCAGAAGCATGGTATAATCTGGGTAATGCATATTATCGCCAGGGTGATTATCAGCGTGCCATTGAATATTATCAACGTGCACTGGAACTGGACCCGNNNNNN (SEQ ID NO: 229)
AEAWYNLGNAYYRQGDYQRAIEYYQRALELDPXX (SEQ ID NO: 227)
RTPRa-ii E. coli expression codon optimised
GCCGAGGCCTGGTATAACCTTGGCAACGCCTATTATCGTCAAGGCGACTACCAGAGAGCAATCGAATATTACCAGCGTGCGTTAGAATTAGATCCGNNNNNN (SEQ ID NO: 230)
AEAWYNLGNAYYRQGDYQRAIEYYQRALELDPXX (SEQ ID NO: 227)
RTPRa-iii E. coli expression codon optimised
GCAGAAGCATGGTATAATCTGGGCAATGCATATTATCGTCAGGGTGATTATCAGCGTGCCATCGAATATTATCAACGTGCACTGGAACTGGACCCGNNNNNN (SEQ ID NO: 231)
AEAWYNLGNAYYRQGDYQRAIEYYQRALELDPXX (SEQ ID NO: 227)
RTPRa-iv E. coli expression codon optimised
GCCGAGGCCTGGTACAACCTGGGTAACGCCTATTATCGCCAAGGCGACTACCAGCGTGCAATTGAGTACTACCAACGTGCCCTGGAACTGGACCCTNNNNNN (SEQ ID NO: 232)
AEAWYNLGNAYYRQGDYQRAIEYYQRALELDPXX (SEQ ID NO: 227)
CTPRb E. coli expression codon optimised
GCAGAAGCACTGAATAATCTGGGTAATGTTTATCGTGAACAGGGCGATTATCAGAAAGCCATCGAATATTATCAAAAAGCGCTGGAACTGGACCCGNNNNNN (SEQ ID NO: 233)
AEALNNLGNVYREQGDYQKAIEYYQKALEL-DPXX (SEQ ID NO: 234)
Figure imgf000060_0001
GCCGAGGCTCTGAACAACCTGGGCAACGTGTACAGAGAGCAGGGCGACTACCAGAAGGCCATCGAGTATTATCAGAAGGCCCTGGAACTGGACCCCNNNNNN (SEQ ID NO: 235)
AEALNNLGNVYREQGDYQKAIEYYQKALEL-DPXX (SEQ ID NO: 234)
RTPRb E. coli expression codon optimised
GCAGAAGCACTGAATAATCTGGGTAATGTTTATCGTGAACAGGGCGATTATCAGCGTGCCATTGAATATTATCAACGTGCGCTGGAATTAGATCCGNNNNNN (SEQ ID NO: 236)
AEALNNLGNVYREQGDYQRAIEYYQRALEL-DPXX (SEQ ID NO: 237)
Figure imgf000060_0002
GCCGAGGCTCTGAACAACCTGGGCAACGTGTACAGAGAGCAGGGCGACTACCAGCGGGCCATCGAGTATTATCAGAGAGCCCTGGAACTGGACCCCNNNNNN (SEQ ID NO: 238)
AEALNNLGNVYREQGDYQRAIEYYQRALELDPXX (SEQ ID NO: 239)
RTPRc E. coli expression codon optimised
GCAGAAGCACTGCGTAATCTGGGTCGTGTTTATCGTCGTCAGGGTCGTTATCAGCGTGCAATTGAATATTATCGTCGCGCACTGGAATTAGATCCGNNNNNN (SEQ ID NO: 240)
AEALRNLGRVYRRQGRYQRAIEYYRRALELDPXX (SEQ ID NO: 241)
RTPRc H. sapiens expression codon optimised
GCCGAGGCTCTGAGAAATCTGGGCAGAGTGTACAGACGGCAGGGCAGATACCAGCGGGCCATCGAGTATTACCGCAGAGCCCTGGAACTGGACCCCNNNNNN (SEQ ID NO: 242)
AEALRNLGRVYRRQGRYQRAIEYYRRALELDPXX (SEQ ID NO: 241)
Additional TPR repeat consensus sequences:
S75991 ALTLNNIGTI YYAREDYDQA LNYYEQALSL SRAV (SEQ ID NO: 243)
CONSENSUS/80% XXhhXthuXh hXXXtphppA htXhppsltht XpX (SEQ ID NO: 244)
CONSENSUS/65% spshhphGth hhphsphppA lphappAlpl pspX (SEQ ID NO: 245)
CONSENSUS/50% spsatslGps atptucaccA lcsap+ALcl sPss (SEQ ID NO: 246)
Figure imgf000061_0001
Class        Key  Residues
alcohol      o    S,T
aliphatic    l    I,L,V
any               A,C,D,E,F,G,H,I,K,L,M,N,P,Q,R,S,T,V,W,Y
aromatic     a    F,H,W,Y
charged      c    D,E,H,K,R
hydrophobic  h    A,C,F,G,H,I,K,L,M,R,T,V,W,Y
negative          D,E
polar        p    C,D,E,H,K,N,Q,R,S,T
positive     +    H,K,R
small        s    A,C,D,G,N,P,S,T,V
tiny         u    A,G,S
turnlike     t    A,C,D,E,G,H,K,N,Q,R,S,T
Table 6 RTPR and CTPR sequences
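The codon-optimised entries above differ at the DNA level but are stated to encode the same repeat. A minimal sketch of how this can be checked is given below for the two CTPRa entries (SEQ ID NOs: 223 and 225) against the protein sequence of SEQ ID NO: 224. The function name and the convention of translating any codon containing N to X are assumptions made for illustration; the standard genetic code is used.

```python
from itertools import product

# Standard genetic code, written in the conventional TCAG ordering of the codon table.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna):
    """Translate in frame 1; codons not in the table (e.g. NNN) become X."""
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, usable, 3))


# CTPRa, E. coli codon optimised (SEQ ID NO: 223).
CTPRA_ECOLI = (
    "GCAGAAGCATGGTATAATCTGGGTAATGCATATTACAAACAGGGCGATTATCAGAAAGCCATCGAGTATTATCAA"
    "AAAGCACTGGAACTGGACCCGNNNNNN"
)
# CTPRa, H. sapiens codon optimised (SEQ ID NO: 225).
CTPRA_HUMAN = (
    "GCCGAGGCCTGGTACAATCTGGGCAACGCCTACTACAAGCAGGGCGACTACCAGAAGGCCATCGAGTATTATCAG"
    "AAGGCCCTGGAACTGGACCCCNNNNNN"
)
# CTPRa protein sequence (SEQ ID NO: 224).
CTPRA_PROTEIN = "AEAWYNLGNAYYKQGDYQKAIEYYQKALELDPXX"

ecoli_protein = translate(CTPRA_ECOLI)
human_protein = translate(CTPRA_HUMAN)
same_protein = ecoli_protein == human_protein == CTPRA_PROTEIN
```

The same check can be applied to the RTPR entries by substituting the corresponding DNA and protein sequences.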
Figure imgf000062_0001
SEQ ID NOs 247 to 276
Table 7
Figure imgf000063_0001
Figure imgf000064_0001
Figure imgf000065_0001
Figure imgf000066_0001
Figure imgf000067_0001
Figure imgf000068_0001
Figure imgf000069_0001
SEQ ID NOs 277 to 388 SEQ ID NOs 389 to 469
Table 8 (SEQ ID NOs: 277 to 469)
Figure imgf000070_0001
Figure imgf000071_0001
Figure imgf000072_0001
Figure imgf000073_0001
Figure imgf000074_0001
Figure imgf000075_0001
Figure imgf000075_0002
1. Axin—RTPR—ABBA
MGSGAYPEYILDIHVYRVQLELDPNNAEAWYNLGNAYYRQGDYQRAIEYYQRALELDPNNAEAWYNLGNAYYRQG DYQRAIEYYQRALELDPGGSLSSAFHVFEDGNKENGGPNAEAWYNLGNAYYRQGDYQRAIEYYQRALELDPNN
(SEQ ID NO: 766)
2. Axin—RTPR—DBOX
MGSGAYPEYILDIHVYRVQLELDPNNAEAWYNLGNAYYRQGDYQRAIEYYQRALELDPNNAEAWYNLGNAYYRQG DYQRAIEYYQRALELDPGGPRLPLGDVSNNGGPNAEAWYNLGNAYYRQGDYQRAIEYYQRALELDPNN ( SEQ ID NO: 767)
3. Axin—RTPR—KEN
MGSGAYPEYILDIHVYRVQLELDPNNAEAWYNLGNAYYRQGDYQRAIEYYQRALELDPNNAEAWYNLGNAYYRQG DYQRAIEYYQRALELDPGGPRLPLGDVSNNGGPNAEAWYNLGNAYYRQGDYQRAIEYYQRALELDPNN ( SEQ ID NO: 768)
4. Axin—RTPR—Nrf2
MGSGAYPEYILDIHVYRVQLELDPNNAEAWYNLGNAYYRQGDYQRAIEYYQRALELDPNNAEAWYNLGNAYYRQG DYQRAIEYYQRALELDPGGPRLPLGDVSNNGGPNAEAWYNLGNAYYRQGDYQRAIEYYQRALELDPNN ( SEQ ID NO : 769)
5. Axin—RTPR—SIAH
MGSGAYPEYILDIHVYRVQLELDPNNAEAWYNLGNAYYRQGDYQRAIEYYQRALELDPNNAEAWYNLGNAYYRQG DYQRAIEYYQRALELDPGGLRPVAMVRPTVGGPNAEAWYNLGNAYYRQGDYQRAIEYYQRALELDPNN ( SEQ ID NO : 770)
6. Axin—RTPR—SPOP
MGSGAYPEYILDIHVYRVQLELDPNNAEAWYNLGNAYYRQGDYQRAIEYYQRALELDPGGLACDEVTSTTSSSTA GGPNAEAWYNLGNAYYRQGDYQRAIEYYQRALELDPNN ( SEQ ID NO: 771)
7. Axin—RTPR—p27
MGSGAYPEYILDIHVYRVQLELDPNNAEAWYNLGNAYYRQGDYQRAIEYYQRALELDPNNAEAWYNLGNAYYRQG DYQRAIEYYQRALELDPNNAGSNEQEPKKRSPDAEAWYNLGNAYYRQGDYQRAIEYYQRALELDPNN ( SEQ ID NO: 772)
8. Axin—RTPR—p53
MGSGAYPEYILDIHVYRVQLELDPNNAEAWYNLGNAYYRQGDYQRAIEYYQRALELDPNNAEAWYNLGNAYYRQG DYQRAIEYYQRALELDPNNFAAYWNLLSAYG (SEQ ID NO: 773)
10. Bcl9—RTPR—ABBA
MGSSQEQLEHRYRSLITLYDIQLMLDPNNAEAWYNLGNAYYRQGDYQRAIEYYQRALELDPNNAEAWYNLGNAYY RQGDYQRAIEYYQRALELDPGGDPETGELGGPNAEAWYNLGNAYYRQGDYQRAIEYYQRALELDPNN ( SEQ ID NO : 774)
11. Bcl9—RTPR—DBOX—v1
MGSSQEQLEHRYRSLITLYDIQLMLDPNNAEAWYNLGNAYYRQGDYQRAIEYYQRALELDPNNAEAWYNLGNAYY RQGDYQRAIEYYQRALELDPGGPRLPLGDVSNNGGPNAEAWYNLGNAYYRQGDYQRAIEYYQRALELDPNN (SEQ ID NO: 775)
12. Bcl9—RTPR—DBOX—v2
MGSSQEQLEHRYRSLITLYDIQLMLDPNNAEAWYNLGNAYYRQGDYQRAIEYYQRALELDPGGPRLPLGDVSNNG GPNAEAWYNLGNAYYRQGDYQRAIEYYQRALELDPNN (SEQ ID NO: 776)
13. Bcl9—RTPR—KEN
MGSGAYPEYILDIHVYRVQLELDPNNAEAWYNLGNAYYRQGDYQRAIEYYQRALELDPNNAEAWYNLGNAYYRQG DYQRAIEYYQRALELDPGGSLSSAFHVFEDGNKENGGPNAEAWYNLGNAYYRQGDYQRAIEYYQRALELDPNN (SEQ ID NO: 777)
14. Bcl9—RTPR—Nrf2
MGSSQEQLEHRYRSLITLYDIQLMLDPNNAEAWYNLGNAYYRQGDYQRAIEYYQRALELDPNNAEAWYNLGNAYY RQGDYQRAIEYYQRALELDPGGDPETGELGGPNAEAWYNLGNAYYRQGDYQRAIEYYQRALELDPNN (SEQ ID NO: 778)
15. Bcl9—RTPR—p27
MGSSQEQLEHRYRSLITLYDIQLMLDPNNAEAWYNLGNAYYRQGDYQRAIEYYQRALELDPNNAEAWYNLGNAYY RQGDYQRAIEYYQRALELDPNNAGSNEQEPKKRSPDAEAWYNLGNAYYRQGDYQRAIEYYQRALELDPNN (SEQ ID NO: 779)
16. Bcl9—RTPR—p53
MGSSQEQLEHRYRSLITLYDIQLMLDPNNAEAWYNLGNAYYRQGDYQRAIEYYQRALELDPNNAEAWYNLGNAYY RQGDYQRAIEYYQRALELDPNNFAAYWNLLSAYG (SEQ ID NO: 780)
17. Bcl9—RTPR—SIAH
MGSSQEQLEHRYRSLITLYDIQLMLDPNNAEAWYNLGNAYYRQGDYQRAIEYYQRALELDPNNAEAWYNLGNAYY RQGDYQRAIEYYQRALELDPGGLRPVAMVRPTVGGPNAEAWYNLGNAYYRQGDYQRAIEYYQRALELDPNN (SEQ ID NO: 781)
18. Bcl9—RTPR—SPOP
MGSSQEQLEHRYRSLITLYDIQLMLDPNNAEAWYNLGNAYYRQGDYQRAIEYYQRALELDPGGLACDEVTSTTSS STAGGPNAEAWYNLGNAYYRQGDYQRAIEYYQRALELDPNN (SEQ ID NO: 782)
19. TCF7L2—RTPR—Nrf2
MGSAEAWYNLGNAYYRQGDYQRAIEYYQRALELDPGGQELGDNDELMHFSYESTQDGGPNAEAWYNLGNAYYRQG DYQRAIEYYQRALELDPNNAEAWYNLGNAYYRQGDYQRAIEYYQRALELDPGGDPETGELGGPNAEAWYNLGNAY YRQGDYQRAIEYYQRALELDPNN (SEQ ID NO: 783)
20. TCF7L2—RTPR—p27
MGSAEAWYNLGNAYYRQGDYQRAIEYYQRALELDPGGQELGDNDELMHFSYESTQDGGPNAEAWYNLGNAYYRQG DYQRAIEYYQRALELDPNNAEAWYNLGNAYYRQGDYQRAIEYYQRALELDPNNAGSNEQEPKKRSPDAEAWYNLG NAYYRQGDYQRAIEYYQRALELDPNN (SEQ ID NO: 784)
21. p27—RTPR—TCF7L2
MRGSHHHHHHGLVPRGSAEAWYNLGNAYYRQGDYQRAIEYYQRALELDPNNAGSNEQEPKKRSPDAEAWYNLGNA YYRQGDYQRAIEYYQRALELDPRSAEAWYNLGNAYYRQGDYQRAIEYYQRALELDPGGQELGDNDELMHFSYEST QDGGPNAEAWYNLGNAYYRQGDYQRAIEYYQRALELDPRS ( SEQ ID NO: 785)
22. TCF7L2—RTPR—p53
MGSAEAWYNLGNAYYRQGDYQRAIEYYQRALELDPGGQELGDNDELMHFSYESTQDGGPNAEAWYNLGNAYYRQG DYQRAIEYYQRALELDPNNAEAWYNLGNAYYRQGDYQRAIEYYQRALELDPNNFAAYWNLLSAYG ( SEQ ID NO : 786)
23. ICAT—RTPR—p27
MGSYAYQRAIVEYMLRLMSDPNNAEAWYNLGNAYYRQGDYQRAIEYYQRALELDPNNAEAWYNLGNAYYRQGDYQ RAIEYYQRALELDPNNAGSNEQEPKKRSPDAEAWYNLGNAYYRQGDYQRAIEYYQRALELDPNN (SEQ ID NO: 787)
24. ICAT—RTPR—p53
MGSYAYQRAIVEYMLRLMSDPNNAEAWYNLGNAYYRQGDYQRAIEYYQRALELDPNNAEAWYNLGNAYYRQGDYQ RAIEYYQRALELDPNNFAAYWNLLSAYG (SEQ ID NO: 788)
25. LRH1—RTPR—ABBA
MGSYEQAIAAYLDALMCDPNNAEAWYNLGNAYYRQGDYQRAIEYYQRALELDPNNAEAWYNLGNAYYRQGDYQRA IEYYQRALELDPGGSEDKENVPPGGPNAEAWYNLGNAYYRQGDYQRAIEYYQRALELDPNN (SEQ ID NO: 789)
26. LRH1—RTPR—DBOX
MGSYEQAIAAYLDALMCDPNNAEAWYNLGNAYYRQGDYQRAIEYYQRALELDPNNAEAWYNLGNAYYRQGDYQRA IEYYQRALELDPGGPRLPLGDVSNNGGPNAEAWYNLGNAYYRQGDYQRAIEYYQRALELDPNN (SEQ ID NO: 790)
27. LRH1—RTPR—KEN
MGSYEQAIAAYLDALMCDPNNAEAWYNLGNAYYRQGDYQRAIEYYQRALELDPNNAEAWYNLGNAYYRQGDYQRA IEYYQRALELDPGGSEDKENVPPGGPNAEAWYNLGNAYYRQGDYQRAIEYYQRALELDPNN (SEQ ID NO: 791)
28. LRH1—RTPR—Nrf2
MGSYEQAIAAYLDALMCDPNNAEAWYNLGNAYYRQGDYQRAIEYYQRALELDPNNAEAWYNLGNAYYRQGDYQRA IEYYQRALELDPGGDPETGELGGPNAEAWYNLGNAYYRQGDYQRAIEYYQRALELDPNN ( SEQ ID NO: 792)
29. LRH1—RTPR—p27
MGSYEQAIAAYLDALMCDPNNAEAWYNLGNAYYRQGDYQRAIEYYQRALELDPNNAEAWYNLGNAYYRQGDYQRA IEYYQRALELDPNNAGSNEQEPKKRSPDAEAWYNLGNAYYRQGDYQRAIEYYQRALELDPNN (SEQ ID NO: 793)
30. LRH1—RTPR—p53
MGSYEQAIAAYLDALMCDPNNAEAWYNLGNAYYRQGDYQRAIEYYQRALELDPNNAEAWYNLGNAYYRQGDYQRA IEYYQRALELDPNNFAAYWNLLSAYG (SEQ ID NO: 794)
31. LRH1-RTPR-SIAH
MGSYEQAIAAYLDALMCDPNNAEAWYNLGNAYYRQGDYQRAIEYYQRALELDPNNAEAWYNLGNAYYRQGDYQRA IEYYQRALELDPGGLRPVAMVRPTVGGPNAEAWYNLGNAYYRQGDYQRAIEYYQRALELDPNN (SEQ ID NO: 795)
32. LRH1—RTPR—SPOP
MGSYEQAIAAYLDALMCDPNNAEAWYNLGNAYYRQGDYQRAIEYYQRALELDPNNAEAWYNLGNAYYRQGDYQRA IEYYQRALELDPGGLACDEVTSTTSSSTAGGPNAEAWYNLGNAYYRQGDYQRAIEYYQRALELDPNN ( SEQ ID NO : 796)
33. APC—RTPR—Nrf2
MGSAEAWYNLGNAYYRQGDYQRAIEYYQRALELDPGGSCSEELEALEALELDEGGPNAEAWYNLGNAYYRQGDYQ RAIEYYQRALELDPNNAEAWYNLGNAYYRQGDYQRAIEYYQRALELDPGGDPETGELGGPNAEAWYNLGNAYYRQ GDYQRAIEYYQRALELDPNN (SEQ ID NO: 797)
34. APC—RTPR—p27
MGSAEAWYNLGNAYYRQGDYQRAIEYYQRALELDPGGSCSEELEALEALELDEGGPNAEAWYNLGNAYYRQGDYQ RAIEYYQRALELDPNN (SEQ ID NO: 798)
35. p27—RTPR—APC
MRGSHHHHHHGLVPRGSAEAWYNLGNAYYRQGDYQRAIEYYQRALELDPNNAGSNEQEPKKRSPDAEAWYNLGNA YYRQGDYQRAIEYYQRALELDPRSAEAWYNLGNAYYRQGDYQRAIEYYQRALELDPGGSCSEELEALEALELDEG GPNAEAWYNLGNAYYRQGDYQRAIEYYQRALELDPRS ( SEQ ID NO: 799)
36. APC—RTPR—p53
MGSAEAWYNLGNAYYRQGDYQRAIEYYQRALELDPGGQELGDNDELMHFSYESTQDGGPNAEAWYNLGNAYYRQG DYQRAIEYYQRALELDPNNAEAWYNLGNAYYRQGDYQRAIEYYQRALELDPNNFAAYWNLLSAYG ( SEQ ID NO : 800)
37. 1TBP-CTPR2
MGSAEAWYNLGNAYYKQGDYQKAIEYYQKALELDPNNREAGDGEEDPRSAEAWYNLGNAYYKQGDYQKAIEYYQK ALELDPRS (SEQ ID NO: 801)
38. 2TBP—CTPR4
MGSAEAWYNLGNAYYKQGDYQKAIEYYQKALELDPNNREAGDGEEDPRSAEAWYNLGNAYYKQGDYQKAIEYYQK ALELDPRSAEAWYNLGNAYYKQGDYQKAIEYYQKALELDPNNREAGDGEEDPRSAEAWYNLGNAYYKQGDYQKAI EYYQKALELDPRS (SEQ ID NO: 802)
39. 3TBP—CTPR6
MGSAEAWYNLGNAYYKQGDYQKAIEYYQKALELDPNNREAGDGEEDPRSAEAWYNLGNAYYKQGDYQKAIEYYQK ALELDPRSAEAWYNLGNAYYKQGDYQKAIEYYQKALELDPNNREAGDGEEDPRSAEAWYNLGNAYYKQGDYQKAI EYYQKALELDPRSAEAWYNLGNAYYKQGDYQKAIEYYQKALELDPNNREAGDGEEDPRSAEAWYNLGNAYYKQGD YQKAIEYYQKALELDPRS (SEQ ID NO: 803)
40. 4TBP—CTPR8
MGSAEAWYNLGNAYYKQGDYQKAIEYYQKALELDPNNREAGDGEEDPRSAEAWYNLGNAYYKQGDYQKAIEYYQK ALELDPRSAEAWYNLGNAYYKQGDYQKAIEYYQKALELDPNNREAGDGEEDPRSAEAWYNLGNAYYKQGDYQKAI EYYQKALELDPRSAEAWYNLGNAYYKQGDYQKAIEYYQKALELDPNNREAGDGEEDPRSAEAWYNLGNAYYKQGD YQKAIEYYQKALELDPRSAEAWYNLGNAYYKQGDYQKAIEYYQKALELDPNNREAGDGEEDPRSAEAWYNLGNAY YKQGDYQKAIEYYQKALELDPRS (SEQ ID NO: 804)
41. 1TBP—CTPR2—Foldon (Foldon sequence in bold)
MGSAEAWYNLGNAYYKQGDYQKAIEYYQKALELDPNNREAGDGEEDPRSAEAWYNLGNAYYKQGDYQKAIEYYQK
ALELDPRSAKASLNLANADIKTIQEAGYIPEAPRDGQAYVRKDGEWVLLSTFLRS ( SEQ ID NO: 805)
42. 2TBP—CTPR4—Foldon (Foldon sequence in bold)
MGSAEAWYNLGNAYYKQGDYQKAIEYYQKALELDPNNREAGDGEEDPRSAEAWYNLGNAYYKQGDYQKAIEYYQK
ALELDPRSAEAWYNLGNAYYKQGDYQKAIEYYQKALELDPNNREAGDGEEDPRSAEAWYNLGNAYYKQGDYQKAI
EYYQKALELDPRSAKASLNLANADIKTIQEAGYIPEAPRDGQAYVRKDGEWVLLSTFLRS (SEQ ID NO: 806)
43. 3TBP—CTPR6—Foldon (Foldon sequence in bold)
MGSAEAWYNLGNAYYKQGDYQKAIEYYQKALELDPNNREAGDGEEDPRSAEAWYNLGNAYYKQGDYQKAIEYYQK
ALELDPRSAEAWYNLGNAYYKQGDYQKAIEYYQKALELDPNNREAGDGEEDPRSAEAWYNLGNAYYKQGDYQKAI
EYYQKALELDPRSAEAWYNLGNAYYKQGDYQKAIEYYQKALELDPNNREAGDGEEDPRSAEAWYNLGNAYYKQGD
YQKAIEYYQKALELDPRSAKASLNLANADIKTIQEAGYIPEAPRDGQAYVRKDGEWVLLSTFLRS (SEQ ID NO: 807)
44. 4TBP—CTPR8—Foldon (Foldon sequence in bold)
MGSAEAWYNLGNAYYKQGDYQKAIEYYQKALELDPNNREAGDGEEDPRSAEAWYNLGNAYYKQGDYQKAIEYYQK
ALELDPRSAEAWYNLGNAYYKQGDYQKAIEYYQKALELDPNNREAGDGEEDPRSAEAWYNLGNAYYKQGDYQKAI
EYYQKALELDPRSAEAWYNLGNAYYKQGDYQKAIEYYQKALELDPNNREAGDGEEDPRSAEAWYNLGNAYYKQGD
YQKAIEYYQKALELDPRSAEAWYNLGNAYYKQGDYQKAIEYYQKALELDPNNREAGDGEEDPRSAEAWYNLGNAY
YKQGDYQKAIEYYQKALELDPRSAKASLNLANADIKTIQEAGYIPEAPRDGQAYVRKDGEWVLLSTFLRS ( SEQ ID NO: 808)
45. KBL—RTPR—CMA_Q
MGSIPNPLLGLDGSAEAWYNLGNAYYRQGDYQRAIEYYQRALELDPNNPLYISYDPAEAWYNLGNAYYRQGDYQR AIEYYQRALELDPRSAEAWYNLGNAYYRQGDYQRAIEYYQRALELDPNNQRFFE (SEQ ID NO: 809)
46. CMA_Q—KBL—RTPR
MGSQRFFEGSAEAWYNLGNAYYRQGDYQRAIEYYQRALELDPNNPLYISYDPAEAWYNLGNAYYRQGDYQRAIEY YQRALELDPRSAEAWYNLGNAYYRQGDYQRAIEYYQRALELDPNN ( SEQ ID NO: 810)
47. CMA_K—KBL—RTPR
MGSKFERQGSAEAWYNLGNAYYRQGDYQRAIEYYQRALELDPNNPLYISYDPAEAWYNLGNAYYRQGDYQRAIEY YQRALELDPRSAEAWYNLGNAYYRQGDYQRAIEYYQRALELDPNN ( SEQ ID NO: 811)
48. SOS—RTPR—CMA_K
MGSFEGIALTNYLKALEGDPNNAEAWYNLGNAYYRQGDYQRAIEYYQRALELDPNNAEAWYNLGNAYYRQGDYQR AIEYYQRALELDPRSKFERQ (SEQ ID NO: 812)
49. SOS—RTPR—CMA_Q
MGSFEGIALTNYLKALEGDPNNAEAWYNLGNAYYRQGDYQRAIEYYQRALELDPRSIPNPLLGLDKFERQ (SEQ ID NO: 813)
50. SOS—RTPR—p27
MGSFEGIALTNYLKALEGDPNNAEAWYNLGNAYYRQGDYQRAIEYYQRALELDPNNAEAWYNLGNAYYRQGDYQR AIEYYQRALELDPRSAEAWYNLGNAYYRQGDYQRAIEYYQRALELDPNNAGSNEQEPKKRSPDAEAWYNLGNAYY RQGDYQRAIEYYQRALELDPRS (SEQ ID NO: 814)
51. KBL—RTPR—p27
MGSAEAWYNLGNAYYRQGDYQRAIEYYQRALELDPNNPLYISYDPAEAWYNLGNAYYRQGDYQRAIEYYQRALEL DPRSAEAWYNLGNAYYRQGDYQRAIEYYQRALELDPNNAGSNEQEPKKRSAEAWYNLGNAYYRQGDYQRAIEYYQ RALELDPNN (SEQ ID NO: 815)
Table 10
S75991 ALTLNNIGTI YYAREDYDQA LNYYEQALSL SRAV ( SEQ ID NO 816)
2407639-54 AEELKEQANE YFRVKDYDHA VQYYTQAIDL SPDT(SEQ ID NO 817)
YDAB_MY WRAWFRLGLA YDGAGDRRRA REAIRRAITL EKNP ( SEQ ID NO 818)
GSIA_BA SSALYNLGNC YEKMGELQKA AEYFGKSVSI CKSE(SEQ ID NO 819)
E64417 VPMWVKKAEI LRKLGRYEDA LLCLNRALEL KPHD ( SEQ ID NO 820)
A56519 EPLLNNLGHV CRKLKKYAEA LDYHRQALVL IPQN(SEQ ID NO 821)
DMUNKNOWN_l-96 EKVYFNLGML AMDESSFDEA EQFFKRAIHL KADF ( SEQ ID NO 822)
SPAC6B12_12-516 SEVYNYFGEI LLDQQKFDDA VKNFDHAIEL EKRE ( SEQ ID NO 823)
NUC2_SC PESWCI LANC FSLQREHSQA LKCINRAIQL DPTF ( SEQ ID NO 824)
CELZK320 ARICNLQAEL LYRRREFSQA VDICKQALAY HETD ( SEQ ID NO 825)
NFRA_EC ANAYVARATI YRQRHNVPAA VSDLRAALEL EPNN ( SEQ ID NO 826)
CCU2886 GLAWHILAIA REKTGDFASS LRAYEAALAL LPDH ( SEQ ID NO 827)
S766850 FELHYHLANA YRRQQKWELA RKHYQKALDE EILL ( SEQ ID NO 828)
CELT19A VAIHFTLGNI YASIGDYQRA LNFYYSTLSL QANF ( SEQ ID NO 829)
S766851 AEAYQNLAVT SFKAGLIQES VDAFQQAIAL YEQR ( SEQ ID NO 830)
S 75601 YKGWYQQGNT LWQQGKYNDA LLSYERALEY YPGD ( SEQ ID NO 831)
YHBM_EC PEVFNYLGIY LTQAGNFDAA YEAFDSVLEL DPTY ( SEQ ID NO 832)
HSU4203 EKGLYRRGEA QLLMNEFESA KGDFEKVLEV NPQN ( SEQ ID NO 833)
YKD1_CA ADLRVGIGHC FAKMGMMDKA KTAFERAMEI EPYN ( SEQ ID NO 834)
CELZK32 PGSYSLLGDA FMKVQEPEDA INFYEQALKM QSKD ( SEQ ID NO 835)
S56658 SKGYTRKGAV QFSMKEYDKA LETYREGLKH DPNN ( SEQ ID NO 836)
G02540 AQSCYSLGNT YTLLQDYEKA IDYHLKHLAI AQEL ( SEQ ID NO 837)
CELC55B ADILCERAEA HILDEDYDSA IEDYQKATEV NPDH ( SEQ ID NO 838)
CELF30H PVLYRNRAMA RLKRDDFEGA QSDCTKALEF DGAD ( SEQ ID NO 839)
B55508 VESTSLLGLV YKLKGQEKNA LFYYEKALRL TGEM ( SEQ ID NO 840)
SMU5458 TDCLYHLAVG FLKLNDYHNA TIYCQCLLTI EPDN ( SEQ ID NO 841)
BSTHRZ1 GRTLYNIGLC KNSQSQYEDA IPYFKRAIAV FEES (SEQ ID NO 842)
PSEPILF SRVFENLGLV SLQMKKPAQA KEYFEKSLRL NRNQ ( SEQ ID NO 843)
CELF38B0 AVAWMNLGIS QMNLKKYYEA EKSLKNSLLI RPNS(SEQ ID NO 844)
JT06030 PETHYNRGLA WERLGNVDQA IADYGRSIAL DRYY ( SEQ ID NO 845)
JT06031 YKAYYNRANS YFQLGQYAQA IADYNRVLVL RPDY ( SEQ ID NO 846)
S76422 AEAYYQRGLS HYRLEDWAEA VRDCTEAIRL RGDL ( SEQ ID NO 847)
CC23_YE0 RRIWQVLGEC YSKTGNKVEA IKCYKRSIKA SQTV(SEQ ID NO 848 )
CELF38B1 AHCLFNLGVL YQRTNRDEMA MSAWKNATRI DPSH(SEQ ID NO 849)
CC23_YE1 PETCCI IANY YSARQEHEKS IMYFRRALTL DKKT ( SEQ ID NO 850)
CELF38B2 EQALNNLGNL LEKSGDSKTA ESLLARAVTL RPSF(SEQ ID NO 851)
MEU8731 FEAWAALGKL YLAQDDKGRA LDAFRRAEAL YPQW(SEQ ID NO 852)
CELF38B3 IQVLLGVAKC WMIQKELDKA LESIEKAQEL AEGL ( SEQ ID NO 853)
S75615 AEAYKARGKI YWQLGEQAQA IADLEAALAL FSDQ ( SEQ ID NO 854)
CELF38B4 VLFHANLGIL YQRMSRHKEA ESQYRIVLAL DSKN ( SEQ ID NO 855)
CC27_YE0 YNAYYGLGTS AMKLGQYEEA LLYFEKARSI NPVN(SEQ ID NO 856)
S74806 AEWLSRGLL LEAMERKEEA IPSYEKALTL EPTL (SEQ ID NO 857)
CELF38B5 SRVHMQIGSC HAKHSNFTAA ENHIKSAIDL NPTS (SEQ ID NO 858)
CC27_YE1 AYAYTLQGHE HSSNDSSDSA KTCYRKALAC DPQH ( SEQ ID NO 859)
CELC56C VEALRQKGNE LFVQKDYKEA IDAYRDALTR LDTL ( SEQ ID NO 860)
F643990 IDLILKVAFT YFKLKKYKHA LKYFEKALKL NPNV(SEQ ID NO 861)
F643991 WKLWKNLGDK AYLWKAYYEA LFCYNKALEL NQNT ( SEQ ID NO 862)
MTCl_CO AEFYFRLGES YFAHHQYTFA VSYLEQAIDL FENN ( SEQ ID NO 863)
MM284231_l-255 TEAFYKISTL YYQLGDHELS LSEVRECLKL DQDH ( SEQ ID NO 864)
F643992 TELLCKKGYA LLKLYKRDLA IKYFEKASEK DRNN ( SEQ ID NO 865)
KNLC_CA AKQLNNLALL CQNQGKYEEV EKYYKRALEI YESK ( SEQ ID NO 866)
ATAF0 AQCFMHRASA YRSAGRIAES IADCNKTLAL DPSC(SEQ ID NO 867)
S585440 VKAFYRRALA HKGLKNYQKS LIDLNKVILL DPSI(SEQ ID NO 868)
S585441 LSIIFNRAAC YLKEGNCSGC IQDCNRALEL HPFS(SEQ ID NO 869)
A645120 AVILSNIAKS LYSRGLTNKV IEVYNKAIKI AEES ( SEQ ID NO 870)
A645121 CEAFLNLAKA YERLADFTKA LQYGKASLEH PSMD(SEQ ID NO 871)
ABO 01 AMSHMNIGIC YDELKEYKKA SQHLILALEI FEKS ( SEQ ID NO 872)
149564 PSALTNKGNT VFANGDYEKA AEFYKEALRN DSSC(SEQ ID NO 873)
OM70_NE PDIYYHRAQL HFIKGEFAEA AKDYQKSIDL DSDF ( SEQ ID NO 874)
ATU6213 AEAYCNMGVI YKNRGDLEMA ITCYERCLAV SPNF(SEQ ID NO 875)
S 75709 PAVWSNRGNS RVSQNKLDEA IADFNQAI EL APEQ ( SEQ ID NO 876)
CELF32A GRCYGSLGNT YYCLGDYDQS IHFHKLRLEL SQQY ( SEQ ID NO 877)
CELT09B0 AEQHNTNGKK CYMNKRYDDA VDHYSKAIKV NPLP(SEQ ID NO 878 )
CELT09B1 VKPLYFLGNV FLQSKKYSEA ISCLSKALYH NAVI (SEQ ID NO 879)
YCF3_OD SYTLYNIGLI YGNNGNYSQA LEYYHQALEL NSNL ( SEQ ID NO 880)
YHR7_YE PPTYYHRGQM YFILQDYKNA KEDFQKAQSL NPEN ( SEQ ID NO 881)
S SN6_YE PIFWCSIGVL YYQISQYRDA LDAYTRAIRL NPYI(SEQ ID NO 882)
FLBA_CAUCR-226 AVLWNTLGTV LCNIGDAAGS IVFFDESLRL APDF ( SEQ ID NO 883)
BIMA_EM YNAWYGLGTV YDKMGKLDFA EQHFRNAAKI NPSN(SEQ ID NO 884 )
CUT9_SC0 ASMCYLRGQV YTNLSNFDRA KECYKEALMV DAKC ( SEQ ID NO 885)
CC27_HU YNAWYGLGMI YYKQEKFSLA EMHFQKALDI NPQS(SEQ ID NO 886)
MMUTY0 AAFLYGLGLV YFYYNAFQWA IRAFQEVLYV DPNF ( SEQ ID NO 887 )
KNLC_ST AKTKNNLAAA YLKQGKYKAA ETLYKQVLTR AHER ( SEQ ID NO 888 )
S757090 AFAYNNRGNA EGGLGNWTSA LEDFQQATAI APNF ( SEQ ID NO 889)
C48583 VKAYLRRGIA YEGMEKWKLA LEDYTKAQSI SPGV(SEQ ID NO 890)
S757091 TDPYLNRGTA LEAKGEFKAA IADYNRVLAV NPED ( SEQ ID NO 891)
S759910 GTSLSNIGLA YNNLGEREKA LEYYQEALTM SRSV(SEQ ID NO 892)
S566581 ISYLTNRAAV YLEMGKFEDC IKDCEKAVER GKEL ( SEQ ID NO 893)
S759911 GAILSNIGYV YDGLGELNQA MDYYQQALVL RREI ( SEQ ID NO 894)
S566582 ADEAREKGNE LFKQQKYPEA TKHYTEAIKR NPKD ( SEQ ID NO 895)
LMU73840 VKGYARKGHA YFWTKQYNRA LQAYDEGLKV DPSN(SEQ ID NO 896)
S759912 VFALNRLGTI YSDFGEKSQA LAYYEQAI PL AQQL ( SEQ ID NO 897)
S566583 AQKEKEAGNA AYKKKDFETA IGHYSKALEL DDED ( SEQ ID NO 898 )
LMU73841 HTSYSNRAAA YIKLGAFNDA LKDAEKCIEL KPDF ( SEQ ID NO 899)
S759913 GIRLNNIALL YDSVGEKDEA LTYYKEALKI SQET ( SEQ ID NO 900)
LMU73842 ATELKNKGNE EFSAGRYVEA VNYFSKAIQL DEQN ( SEQ ID NO 901)
S759914 MTTINNLGVA YDNLGESEKA LQYYAEALAL GKSL (SEQ ID NO 902)
TPRD_HU GELMKMKGNE EFSKERFDIA IIYYTRAIEY RPEN (SEQ ID NO: 903)
LMU73843 ALALKEEGNK LYLSKKFEEA LTKYQEAQVK DPNN ( SEQ ID NO: 904)
S759915 GSVI SNIALV LDGLGQKAQA LEYYQQALEL ARLV ( SEQ ID NO: 905)
S60905 IEYLYEKAQL YRRLKDKENE LKYYNLALSI DENK ( SEQ ID NO: 906)
YHBM_HA ASVYNYLGLY LLLEEDYDGA LDAFNTVFEL DSGY ( SEQ ID NO: 907)
S759916 GATLNNIGSI YNALADRKKA IDFYQQALVL IRQA ( SEQ ID NO: 908)
LMU73844 AKQKKDEGNQ YFKEDKFPEA VAAYTEAIKR NPAE ( SEQ ID NO: 909)
S75633 ASSLNNLGYL AFLQGDLPTA LDYYQRALIL RQTT ( SEQ ID NO: 910)
CET12D8 AMLHAKRANV LLKLKRPVAA IADCDKAISI NPDS(SEQ ID NO: 911)
S58544 CAIYTNRALC YLKLCQFEEA KQDCDQALQL ADGN ( SEQ ID NO: 912)
C485830 AVYYTNRAAC HQQTHMYSLM VDDCNAAIEI DPAN ( SEQ ID NO: 913)
BSZ9404 WSAYNNLALA YFYSGNWKA KQTAYEVLSH NEGN ( SEQ ID NO: 914)
Y366_HA AKARVELALS YLQQNNPQLA KINLDKALQH DKNY ( SEQ ID NO: 915)
C485831 PEEAKQLGNS FFKDGKYDQA AEFYTRAIEL QTEP ( SEQ ID NO: 916)
NPRA_BA0 AECHILLGIC YRRYGEVDQA IECYSLAHKI AQII(SEQ ID NO: 917)
YC37_PO AKLYNMLGFI YYKADQKKLA KNFYERAIEV DGNY ( SEQ ID NO: 918)
N358_HU PKAHRFLGLL YELEENTDKA VECYRRSVEL NPTQ (SEQ ID NO 919)
S42210 TDVLRSAARF YYKTHDKDRA IQLLSQALEL LPNN ( SEQ ID NO: 920)
CC27_YE PETWCCIGNL LSLQKDHDAA IKAFEKATQL DPNF ( SEQ ID NO: 921)
HIBN_XE0 AQAHQKLGEV CIESENYSQA VEDFLACLNI QKEH ( SEQ ID NO: 922)
STI 1_YE0 ARGYSNRAAA LAKLMSFPEA IADCNKAIEK DPNF (SEQ ID NO: 923)
STI 1_YE1 ADEYKQQGNA AFTAKDYDKA IELFTKAIEV SETP(SEQ ID NO: 924)
CELF32A0 AQMAYSLANA LYIGKEVQKA ITYFQRHLKI ARSL ( SEQ ID NO: 925)
STI 1_YE2 SKSFARIGNA YHKLGDLKKT IEYYQKSLTE HRTA ( SEQ ID NO: 926)
LACALS ISIYQRIGDS YAQLGNFENA ISFLEKSLEF DEKP ( SEQ ID NO: 927)
CELR05F SKAWGRMGLA YSCQNRYEHA AEAYKKALEL EPNQ ( SEQ ID NO: 928)
G025400 AKASGNLGNT LKVLGNFDEA IVCCQRHLDI SREL ( SEQ ID NO: 929)
G025401 RRAYSNLGNA YIFLGEFETA SEYYKKTLLL ARQL ( SEQ ID NO: 930)
H64467 PLLYLYKGII LNKLGKYNEA IKYFDKVLEI NPNI(SEQ ID NO: 931)
G025402 SAIYSQLGNA YFYLHDYAKA LEYHHHDLTL ARTI ( SEQ ID NO: 932)
G025403 GRACWSLGNA YTALGNHDQA MHFAEKHLEI SREV(SEQ ID NO: 933)
PTSR_HU0 YLLWNKLGAT LANGNQSEEA VAAYRRALEL QPGY ( SEQ ID NO: 934)
G025404 GRAFGNLGNT HYLLGNFRDA VIAHEQRLLI AKEF ( SEQ ID NO: 935)
PTSR_HU1 IRSRYNLGIS CINLGAHREA VEHFLEALNM QRKS ( SEQ ID NO: 936)
G025405 TKQQIEKGLQ LYQANETGKA LEIWQQWER STEL ( SEQ ID NO: 937)
YCA1_PL0 PKGYIRKGCA EHGLRQLSNA EKTYLEGLKI DPNN (SEQ ID NO: 938)
PQ01800 PETWLNRGLA YYRQGQSQAA IADFDQLLQQ SPTD(SEQ ID NO: 939)
CEC34C61 VRARYNLGIS CMQLSSYDEA LKHFLSALEL QKGG ( SEQ ID NO: 940)
CEC34C62 PDLQNALGVL YNLNRNFARA VDSLKLAI SK NPTD ( SEQ ID NO: 941)
SSD900_122-109 ADAYYNRGYA KHVLGQYQAA ITDYNQAI SL NPEF ( SEQ ID NO: 942)
STI 1_YE SKGYNRLGAA HLGLGDLDEA ESNYKKALEL DASN ( SEQ ID NO: 943)
YCV0_YE PVGYSNKAMA LIKLGEYTQA IQMCQQGLRY TSTA ( SEQ ID NO: 944)
HSAB2370_1-631 FNCWESLGEA YLSRGGYTTA LKSFTKASEL NPES (SEQ ID NO: 945)
S75648 ADAYNLRGVA YMVIEQYTDA LADFDQAIAL NPKD (SEQ ID NO: 946)
CELT19A0 PLFHYVQGRM KLLLHDVDKA IQHLKDAMDK DPNN (SEQ ID NO: 947)
CELT25F0 FEGNYNLGLV SFTQGKYHEC RELIEKALAA FPEH (SEQ ID NO: 948)
CEF10B5 HPMLNNIGHI ARRQGRLNEA IMFYQKAIRM EPKF ( SEQ ID NO: 949)
CELT25F1 VTMLTGMARV QEALGEYDES VKLYKRVLDA ESNN ( SEQ ID NO: 950)
BSAB617_17-292 VQAVFSLTHI YCKEGKYDKA VEAYDRGIKS AAEW ( SEQ ID NO: 951)
CEM720 ATMLNVLAIV YRNQENFKDA AIYLEKALSI RVQC ( SEQ ID NO: 952)
YCV0_YE0 FEKQKEQGNS LFKQGLYREA VHCYDQLITA QPQN ( SEQ ID NO: 953)
YCV0_YE1 AMSQYHMAVA YRLLGRLGSA MECCEESMKI ALQH ( SEQ ID NO: 954)
CELC55B0 YQAIYRRATT YLAMGRGKAA IVDLERVLEL KPDF ( SEQ ID NO: 955)
CELC55B1 YGARIQRGNI LLKQGELEAA EADFNIVLNH DSSN(SEQ ID NO: 956)
HSU46570 AKLYCNRGTV NSKLRKLDDA IEDCTNAVKL DDTY ( SEQ ID NO: 957)
HSU46571 SILFSNRAAA RMKQDKKEMA INDCSKAIQL NPSY(SEQ ID NO: 958)
BSU5504 TKGLLNLGNC FNKQDDVAKA TVYYQRALEL GEKI ( SEQ ID NO: 959)
HSU46572 ADALYVRGLC LYYEDCIEKA VQFFVQALRM APDH ( SEQ ID NO: 960)
HSU46573 IRAILRRAEL YEKTDKLDEA LEDYKSILEK DPSI(SEQ ID NO: 961)
HSU46574 IKAYLRRAQC YMDTEQYEEA VRDYEKVYQT EKTK ( SEQ ID NO: 962)
LJGLN12 ASACNNLAEF YRIRKEFDKA EPLYLEAINI LEES (SEQ ID NO: 963)
HSU46575 LKAKKEDGNK AFKEGNYKLA YELYTEALGI DPNN ( SEQ ID NO: 964)
D504537 GRAYYNLGLC YYNQDLLDPA IDYFEKAVST FESS(SEQ ID NO: 965)
MMUTY ADTWCSIGVL YQQQNQPMDA LQAYICAVQL DHGH ( SEQ ID NO: 966)
CELD2020 SAAWTDLGEL YEKNAQYQDA LECFKNAMLN NPVA(SEQ ID NO: 967)
HSU46576 STRLKEEGNE QFKKGDYIEA ESSYSRALEM CPSC(SEQ ID NO: 968)
YB05_YE ESLYANRAAC ELELKNYRRC IEDCSKALTI NPKN ( SEQ ID NO: 969)
HUMFKBP0 VKCLNNLAAS QLKLDHYRAA LRSCSLVLEH QPDN ( SEQ ID NO: 970)
HSU46577 ASYYGNRAAT LMMLGRFREA LGDAQQSVRL DDSF ( SEQ ID NO: 971)
CEM72 AKQLTNLGIV TQQLEKYEET ENYFKQALSI YNRA ( SEQ ID NO: 972)
S75578 SYILYNMALI HASNGDHEKA LGLYQEAIEL NPKM ( SEQ ID NO: 973)
PSEPILF0 RDAYIQLGLG YLQRGNTEQA KVPLRKALEI DPSS(SEQ ID NO: 974)
S450640 AKSCANLGNI FKMKGAYNDA LTFTFKQLDF AEEL ( SEQ ID NO: 975)
BBCDG5 ARFFNLIGLE FFKLGQYGPA IEYFAKNLEI NPNN ( SEQ ID NO: 976)
140554 AKALLNLGDS YKNMGDQDRA LYFLIKAVDQ AKES ( SEQ ID NO: 977)
ACU89981 SRAFHRKGNA YMKMEKYAEA IDSYNRALTE HRNP ( SEQ ID NO: 978)
HSU20361 IKIMQNIGVT FIQAGQYSDA INSYEHIMSM APNL ( SEQ ID NO: 979)
S668420 EDVQLNLAHC YLEMREYGKA IENYELVLKK FDNE ( SEQ ID NO: 980)
HSU20362 GQSWYFLGRC YSCIGKVQDA FVSYRQSIDK SEAS (SEQ ID NO: 981)
ACU89983 ALEEKNKGNA AMSAGDFKAA VEHYTNAIQH DPQN ( SEQ ID NO: 982)
S748530 WPALNNIGLI EYEQGKIDTA LKRWQEVIKI DSEQ ( SEQ ID NO: 983)
HSU20363 CRVCCSLGSF YAQVKDYEKA LFFPCKAAEL VNNY ( SEQ ID NO: 984)
ACU89984 SKGYSRKGAA LCYLGRYADA KAAYAAGLEV EPTN ( SEQ ID NO: 985)
S748531 AGIKFTLGNA YFQKGQYDQA VEVLLAGLAQ RPDT ( SEQ ID NO: 986)
ACU89985 SQQEKEKGND CFRNAQYPDA IKHYTEAIRR NPTD ( SEQ ID NO: 987)
D63875 AEVRLGMGHC FVKLNKLEKA RLAFSRALEL NSKC(SEQ ID NO: 988)
ACU89986 MTYLTNLAAV YMEQKNYEEC VNTCTEAIEV GRRV ( SEQ ID NO: 989)
YCOA_SY TVALTALGLA ARAIGNYPEA IAAYQQALQL DPND ( SEQ ID NO: 990)
D644170 PVAYALLGQL YELLGNFDNA LECYEKSLGI EEKF ( SEQ ID NO: 991)
YCIM_HA VRASLLLANL AMLDGQYQQA VKILENVLEQ NPDY (SEQ ID NO: 992)
D644171 IPAYIIKANM LRKLGRYEEA LACVNKVLEL KEND (SEQ ID NO 993)
YLU28151 AEAWGRLGAC QAQNEKEDPA IRALERCIKL EPGN ( SEQ ID NO 994)
S74853 PAALFDLGNA YLKLVKYPDA VTAYQKALKA EEQF ( SEQ ID NO 995)
HGV2_HA AETYYNLGLA YSFEKRYDNA LEHYQSALDV LEAR (SEQ ID NO 996)
CELR05F0 PVYFCNRAAA YCRLEQYDLA IQDCRTALAL DPSY(SEQ ID NO 997)
YC37_CY ALIYNSLGYI CSAQEQYELA IEYYKKALFY I PDF (SEQ ID NO 998)
NCF2_HU AVAYFQRGML YYQTEKYDLA IKDLKEALIQ LRGN ( SEQ ID NO 999)
HUMFKBP IKALFRKGKV LAQQGEYSEA IPILRAALKL EPSN(SEQ ID NO 1000)
JC47510 VAVRLNLGLT YLKLCKPDKC IEFCRKVLDV FGNN ( SEQ ID NO 1001)
140567 AKALLNMGNS YNQMGSLFSA VPYYHKAIKA AKIS(SEQ ID NO 1002)
CEZK856 AIAYQQIATV YEQLGDHQKA LQFGLLASHL DPKT ( SEQ ID NO 1003)
S756480 PAIYFNRANV HGVLNNYQGA IDDCSQGILL DPQD ( SEQ ID NO 1004)
S756481 EEAHYFRGLA YAMVNNYERA LADLNRTIRL NPYN ( SEQ ID NO 1005)
LEPLIPL WGAKANLATY YFSAGDFEKS IKLYEEAMKL KDAD ( SEQ ID NO 1006)
S756482 VDLLICRGQA QLGLEQPRQA IPDFDRAIEL DPRS(SEQ ID NO 1007)
CUT 9_SC AATWANLGHA YRKLKMYDAA IDALNQGLLL STND ( SEQ ID NO 1008)
PS43_TOO AHVLLNIAKC WMTEKKLDKT LGWQKAEEL ADAV ( SEQ ID NO 1009)
RNU76551 ADSLNNLANI KREQGNIEEA VRLYRKALEV FPEF ( SEQ ID NO 1010)
S 66842 PDPRIGIGLC FWQLKDSKMA IKSWQRALQL NPKN ( SEQ ID NO 1011)
HSU58970 SVLYSNRAAC HWKNGNCRDC IKDCTSALAL VPFS(SEQ ID NO 1012)
RNU76552 PDAYCNLANA LKEKGSVAEA EDCYNTALRL CPTH(SEQ ID NO 1013)
JC4751 EKALFRMGQA HLLRNDHDEA LVYFKKIVAK NPNN ( SEQ ID NO 1014)
HSU58971 IKPLLRRASA YEALEKYPMA YVDYKTVLQI DDNV ( SEQ ID NO 1015)
RNU76553 YCVRSDLGNL LKALGRLEEA KACYLKAIET QPNF ( SEQ ID NO 1016)
HSU58972 VKAFYRRAQA HKALKDYKSS FADI SNLLQI EPRN ( SEQ ID NO 1017)
RNU76554 LDAYINLGNV LKEARIFDRA VAAYLRALSL SPNH(SEQ ID NO 1018)
YQGP_BA HASYYNLALL YAEKNELAQA EKAIQTAVKL KPKE ( SEQ ID NO 1019)
D907664 ARVSIMMGRV FMAKGEYAKA VESLQRVISQ DREL ( SEQ ID NO 1020)
ABO 010 GKYYKEIAEL YELEQNFEQA IIYFEKAADI YQSE ( SEQ ID NO 1021)
KNLC_HU AATLNNLAVL YGKRGKYKEA EPLCKRALEI REKV ( SEQ ID NO 1022)
S756010 FWSWYRQADC FRRWGHYAEA LQCYKKSLEI RPND ( SEQ ID NO 1023)
G02058 VQSLSALGFV YKLEGEKRQA AEYYEKAQKI DPEN ( SEQ ID NO 1024)
YQCH_BA0 PQAYHDLALI YFKQGKKGQA MDCFRKGIRS AVDF ( SEQ ID NO 1025)
S756011 YWAWYKRGIT LEHLGRYRDA AESYDNACQI QPEN ( SEQ ID NO 1026)
S756012 YWATYRLGES YRLSGQYEAA IASFRQTLTI RQDD ( SEQ ID NO 1027)
S760710 VIALNNLANV YEKKQMVNKA LETYQETLAI EPNN ( SEQ ID NO 1028)
CC23_YE TNAWTLMGHE FVELSNSHAA IECYRRAVDI CPRD(SEQ ID NO 1029)
S756013 YWAWFRQGEL WQQIYRYGRA IACYHRALDV EVDD ( SEQ ID NO 1030)
S760711 AQEYYQLGSI YLDKKLYSQS INLFQKALKM AEQV ( SEQ ID NO 1031)
CEC34C6 ARAWCKLGLA HAENEKDQLA MQAFQKCLQI DAGN ( SEQ ID NO 1032)
PTSR_PI 0 VDAWLKLGEV QTQNEKESDG IAALEKCLEL DPTN ( SEQ ID NO 1033)
PTSR_PI 1 VRARYNLGVS FINMGRYKEA VEHLLTGI SL HEVE ( SEQ ID NO 1034)
INI 6_HU LESLSLLGFV YKLEGNMNEA LEYYERALRL AADF (SEQ ID NO: 1035)
S75202 AELFGSMGYL YARQGQFAEA SRSFQQALRV NPNN (SEQ ID NO: 1036)
Y091_CA ARAHRSLGHL LLMDKKFEEA YKHLRRSLEL QPIQ (SEQ ID NO: 1037)
CC16_YE0 ISIQLNLGHT YRKLNENEIA IKCFRCVLEK NDKN (SEQ ID NO 1038)
CC16_YE1 PLVLNEMGVM YFKKNEFVKA KKYLKKALEV VKDL ( SEQ ID NO 1039)
CC16_YE2 SSLCFLRGKI YFAQNNFNKA RDAFREAI LV DIKN ( SEQ ID NO 1040)
CC16_YE3 AITWFSVATY YMSLDRI SEA QKYYSKSSIL DPSF(SEQ ID NO 1041)
ACSC_AC ADSLGGMGLV SMRQGDTAEA RRYFEEAMAA DPKT ( SEQ ID NO 1042)
HSU5251 TDVLRSAAKF YRRKGDLDKA IELFQRVLES TPNN ( SEQ ID NO 1043)
CC16_YE4 TEGYLNLARS NEKLCEFQKT ISYCKTCLNM QGTT ( SEQ ID NO 1044)
NCU89985_1-41 AIAFKNEGNK AFAAHDWPKA IEFYDKAIEL NDKE ( SEQ ID NO 1045)
CC16_YE5 AAAWLGFAHT YALEGEQDQA LTAYSTASRF FPGM ( SEQ ID NO 1046)
CYP6_YE AKALYRRGLA YYHVNDTDMA LNDLEMATTF QPND ( SEQ ID NO 1047)
D638750 VDCYLRLGAM ARDKGNFYEA SDWFKEALQI NQDH ( SEQ ID NO 1048)
S75685 AQAHCLLGMA YLKNNMMTMA KVHIGSAVKL DPND ( SEQ ID NO 1049)
S76576 PQLFYGWGLL ETSKGNYQSA LNHYDQALAL DPEF ( SEQ ID NO 1050)
D638751 VLPFFGLGQM YIYRGDKENA SQCFEKVLKA YPNN ( SEQ ID NO 1051)
D638752 IPALLGKACI SFNKKDYRGA LAYYKKALRT NPGC(SEQ ID NO 1052)
H644670 AIAWAEKGEI LYREGKLKKS LECFDNALKI NPKD ( SEQ ID NO 1053)
RLU3940 GRAYRELGFV ALYRRRFDES LEYFQQAQDL NPND ( SEQ ID NO 1054)
H644671 PDVYVRKARI LRTLGENDKA LEYFDKALKL KPKY ( SEQ ID NO 1055)
MMU1695 LAAFLNLAMC YLKLREYNKA VECCDKALGL DSAN ( SEQ ID NO 1056)
D638753 AESCYQLARS FHVQEDYDQA FQYYYQATQF ASSS(SEQ ID NO 1057)
1495640 GRLKVNMGNI YLKQRNYSKA IKFYRMALDQ IPSV(SEQ ID NO 1058)
OM70_NE0 ALAYNLRGTF HCLMGKHEEA LADLSKSIEL DPAM ( SEQ ID NO 1059)
H644672 CQSLLYKGEI LFKLGRYGEA LKCLKKVFER NNKD ( SEQ ID NO 1060)
D638754 SDVWLNLAHI YVEQKQYISA VQMYENCLRK FYKH ( SEQ ID NO 1061)
1495641 AQVLCQIANI YELMEDPNQA IEWLMQLISV VPTD(SEQ ID NO 1062)
H644673 IRALMYIIQI LIYLGRLNQA LEYTKKALKL NPDD ( SEQ ID NO 1063)
OM70_NE1 TQSYIKRASM NLELGHPDKA EEDFNKAIEQ NAED ( SEQ ID NO 1064)
D638755 VTTSYNLARL YEAMCEFHEA EKLYKNILRE HPNY ( SEQ ID NO 1065)
CELF38B AKIHYNLGKV LGDNGLTKDA EKNYWNAIKL DPSY(SEQ ID NO 1066)
JT0603 IPPYINRGNL YSQQQDHHTA IQDFTQAITY DPNR ( SEQ ID NO 1067)
YHBM_EC0 AQLLYERGVL YDSLGLRALA RNDFSQALAI RPDM ( SEQ ID NO 1068)
OM70_NE2 AAKLKELGNK AYGSKDFNKA IDLYSKAIIC KPDP(SEQ ID NO 1069)
1495642 SQALSKLGEL YDSEGDKSQA FQYYYESYRY FPSN(SEQ ID NO 1070)
OM70_NE3 PVYYSNRAAC HNALAQWEQV VADTTAALKL DPHY ( SEQ ID NO 1071)
PQ0180 YRAYYNRGLA YLDLAQPEQA IADFQQALER LPAT ( SEQ ID NO 1072)
2408032-184 ARAFGRLGRA KLSLGDAAAA ADAYKKGLDF DPNN ( SEQ ID NO 1073)
HSU5897 SATYSNRALC YLVLKQYTEA VKDCTEALKL DGKN ( SEQ ID NO 1074)
TAFKBP7 ISIKLNNAAC KLKLKDYKEA EKICSKVLEL ESTN ( SEQ ID NO 1075)
YC37_CY0 ILARINLARI FELKNLKTEA INMYQEVLIF DPKN ( SEQ ID NO 1076)
YC37_POO IVALNNLAKI YEDTKNILKA EALYDKVLNI AKSN ( SEQ ID NO 1077)
D643710 RDAWFNKALA LRILGRYEEA RECFFRGLAV EKHL ( SEQ ID NO 1078)
ATAC01 SSVHASLGKI YNQLKQYDKA VLHFGIALDL SPSP(SEQ ID NO 1079)
HSU2036 TEALYNIGLT YEKLNRLDEA LDCFLKLHAI LRNS ( SEQ ID NO 1080)
CELK04G1 AVAWSNLGCV FNSQGEIWLA IHHFEKAVTL DPNF ( SEQ ID NO 1081)
ACU8998 HVLYSNRAAC YMKLGRVPMA VKDCDKAIEL SPTF (SEQ ID NO 1082)
ATAC02 TYAHTLCGHE FAALEEFEDA ERCYRKALGI DTRH (SEQ ID NO 1083)
YHE3_PS PAILDSMGWI NYRQGKLADA ERYLRQALQR YPDH ( SEQ ID NO 1084)
CELK04G2 AWHGNLACV YYEQGLIDLA IDTYKKAIDL QPHF ( SEQ ID NO 1085)
SC72_YE PDVFVRKADC LLKLRQWEEA RATCERGLAL APED (SEQ ID NO 1086)
CELK04G3 AAAHSNLASI LQQQGKLNDA ILHYKEAIRI APTF ( SEQ ID NO 1087)
CELK04G4 ADAHSNLASI HKDAGNMAEA IQSYSTALKL KPDF ( SEQ ID NO 1088)
CELK04G6 IDAYINLAAA LVSGGDLEQA VTAYFNALQI NPDL ( SEQ ID NO 1089)
JC4775 VNALKDRAEA YLIEEMYDEA IQDYETAQEH NEND ( SEQ ID NO 1090)
ATU62130 AEACNNLGVL YKDRDNLDKA VECYQMALSI KPNF ( SEQ ID NO 1091)
ATU62131 AEAFNNLGVL YRDAGNITMA IDAYEECLKI DPDS(SEQ ID NO 1092)
ATU62132 APAYYNLGW YSEMMQYDNA LSCYEKAALE RPMY ( SEQ ID NO 1093)
ATU62133 VEAHIGKGIC LQTQNKGNLA FDCFSEAIRL DPHN ( SEQ ID NO 1094)
YCA1_PL AQRLKELGNK CFQEGKYEEA VKYFSDAITN DPLD ( SEQ ID NO 1095)
ATU62134 AQSLNNLGW YTVQGKMDAA ASMIEKAILA NPTY ( SEQ ID NO 1096)
ATU62135 ADAMYNLGVA YGEMLKFDMA IVFYELAFHF NPHC(SEQ ID NO 1097)
RNU7655 AEAYSNLGNV YKERGQLQEA IEHYRHALRL KPDF (SEQ ID NO 1098)
ATU62136 AIVLTDLGTS LKLAGNTQEG IQKYYEALKI DPHY ( SEQ ID NO 1099)
PS43_TO TEAYLNLARG HEKLCEFSEA VAYCRTCLGA EGGP ( SEQ ID NO 1100)
D82942 FNVRFRLGVA LDNLGRFDEA IDSFKIALAC VPMK ( SEQ ID NO 1101)
YPU2283 SRGLASAARA YRNEKRWDQA LALWQSSLKK DPTN ( SEQ ID NO 1102)
CEUC55B6_3-59 VAKHLELGSQ FLARAQFADA LTQYHAAIEL DPKS(SEQ ID NO 1103)
S755780 AFVYYRDGMS AQADGEYAEA LDNYEEALRL EENP ( SEQ ID NO 1104)
D86980 FKALYRSGVA FYHLGDYDKA LYYLKEARTQ QPTD ( SEQ ID NO 1105)
D86981 AKHYGNLGRL YQSMRKFKEA EEMHIKAIQI KEQL ( SEQ ID NO 1106)
H643320 TSILRQKASI LEILGKLDEA LDCVNKILSI KKDD ( SEQ ID NO 1107)
S55383 VKALYRRAQA YTQLADLELA EVDIKKALEI DPEN ( SEQ ID NO 1108)
YAD5_CL GISYENLAWF YYLTGKYDKA IENFEKAI SM GSTN ( SEQ ID NO 1109)
PPT1_YE SIYFSNRAFA HFKVDNFQSA LNDCDEAIKL DPKN ( SEQ ID NO 1110)
CELC33H AQRLKEQGNE AFKKKKYHKA MTIYSKSLEH WPDP(SEQ ID NO 1111)
CELC17G AVLYFNRAAA QKHLGNLRSA IKDCSMGRKF DPTH ( SEQ ID NO 1112)
TPRD_HU0 CLAYCGIGKV YLKKNRFLEA LNHFEKARTL IYRL ( SEQ ID NO 1113)
D64417 ASLWYFKGKL YEKQNKFEEA LKYYNKAIQL MPHH ( SEQ ID NO 1114)
DMU18291 VYSYTLLGHE LVLTEEFDKA MDYFRAAWR DPRH ( SEQ ID NO 1115)
MXU7705 FIAQGNLGWA YYKKGEPDRA VESIKAAVTT NPNF ( SEQ ID NO 1116)
RSMGPGNO GYAHLTIALA HLGMSQFQQC LESFESAMNV ANET ( SEQ ID NO 1117)
CEF52H3 PSAYNNRAQA YRLQNKPEKA LDDLNEALSL AGPK ( SEQ ID NO 1118)
CYP4_BO TKALYRRAQG WQGLKEYDQA LADLKKAQEI APED (SEQ ID NO 1119)
I EFS_HU0 AKAYARIGNS YFKEEKYKDA IHFYNKSLAE HRTP ( SEQ ID NO 1120)
S76202 TSLQLLLGEI YLSQNRPDQA IAIYEAASKV NGND ( SEQ ID NO 1121)
I EFS_HU2 IKGYTRKAAA LEAMKDYTKA MDVYQKALDL DSSC(SEQ ID NO 1122)
F64399 YKALFGLGKS YYLMSDNKNS IKYFEKVLEL NPND ( SEQ ID NO 1123)
IEFS_HU3 MTYITNQAAV YFEKGDYNKC RELCEKAIEV GREN ( SEQ ID NO 1124)
SSN6_YE0 AKALTSLAHL YRSRDMFQRA AELYERALLV NPEL ( SEQ ID NO 1125)
YHR7_YE0 AVQLKNRGNH FFTAKNFNEA IKYYQYAIEL DPNE ( SEQ ID NO 1126)
IEFS_HU4 GKGYSRKAAA LEFLNRFEEA KRTYEEGLKH EANN (SEQ ID NO 1127)
SSN6_YE1 SDVWATLGHC YLMLDDLQRA YNAYQQALYH LSNP (SEQ ID NO 1128)
S SN6_YE2 WDIWFQLGSV LESMGEWQGA KEAYEHVLAQ NQHH ( SEQ ID NO 1129)
S SN6_YE3 PKLWHGIGIL YDRYGSLDYA EEAFAKVLEL DPHF ( SEQ ID NO 1130)
HEMY_EC PLLWSTLGQS LMKHGEWQEA SLAFRAALKQ RPDA ( SEQ ID NO 1131)
S SN6_YE4 ATTWYHLGRV HMIRTDYTAA YDAFQQAVNR DSRN ( SEQ ID NO 1132)
S SN6_YE5 NEIYFRLGII YKHQGKWSQA LECFRYILPQ PPAP(SEQ ID NO 1133)
RSMGPGN FGALAGLGTM LEQMDRPKPA LEAYRAALAI HPHL ( SEQ ID NO 1134)
CYP7_YE AKAYYRRGNS YLKKKRLDEA LQDYIFCKEK NPDD ( SEQ ID NO 1135)
S76685 AIAFFNLAMI YKAQGNLLEA IKGYQEAITL QPDY ( SEQ ID NO 1136)
A55346 AEELKTQAND YFKAKDYENA IKFYSKAIEL NPSN(SEQ ID NO 1137)
CELC18H FRALGCLITA HSEMGKYEDM LRFAVAQSEA ARQM ( SEQ ID NO 1138)
S761560 VWAYINRGVA YYDLGCHQKS LDDYNQALVI DSKC(SEQ ID NO 1139)
S761561 VSSYILRAEI HNLLGDFDAA IEDCFTSLRL NPMH ( SEQ ID NO 1140)
NAS P_HU AETHYQLGLA YGYNSQYDEA VAQFSKSIEV IENR ( SEQ ID NO 1141)
S761562 HEHLFKLGVE QYILGNISDA IDSFTKALEF NDRF ( SEQ ID NO 1142)
CELF10C0 AALWVLIGHE FMEMKNNAAA CVSYRRAI El DPAD ( SEQ ID NO 1143)
S761563 DNTLYILGSS FLSLEKYQSA IDYLDQSLAL NGNL ( SEQ ID NO 1144)
C64478 PIDWFNLAYA LYHLEKYDSA LEAINEALKI SPSN(SEQ ID NO 1145)
CEF10B50 VDAIASTALC YAVLGNI DKA TEFFNKALAI DPFN ( SEQ ID NO 1146)
ATAC98_24-445 VDALYNLGGL YMDLGRFQRA SEMYTRVLTV WPNH(SEQ ID NO 1147)
A565190 ASTYSAIGYI HSLMGNFENA VDYFHTALGL RRDD ( SEQ ID NO 1148)
E644170 VALWYFKXEL YERLGKLDEA LKCYEKVIEL QPHY ( SEQ ID NO 1149)
E644171 PITWVFVGQL YGMSGNCDEA LKCYNKALGI ENRF ( SEQ ID NO 1150)
E644172 IKALLSKARI YERXGNIEAA IEYYNKAVEN IHKD ( SEQ ID NO 1151)
CELF10C HRGWYGLGQM YDIMKMPAYA LFYYQEAQKC KPHD ( SEQ ID NO 1152)
E644173 YLALFLKGLA LSAKGEIKEA ITTFEELLSY ESKN ( SEQ ID NO 1153)
CYP4_BOO LSCVLNIGAC KLKMSDWQGA VDSCLEALEI DPSN(SEQ ID NO 1154)
CYP4_B01 SEDLKNIGNT FFKSQNWEMA IKKYTKVLRY VEGS ( SEQ ID NO 1155)
MMU2783 VNELKEKGNK ALSAGNIDDA LQCYSEAIKL DPQN ( SEQ ID NO 1156)
S752020 FNELLRQGKA YVDNGNFPQA IAIYQQAAML DGEN ( SEQ ID NO 1157)
ENAC0 LETMNNMALA YEGLGKYKEA EEIWAQLLEL ANEC ( SEQ ID NO 1158)
PFCYT12 AEGLYFLGRA YMAQDRSADA AKVFERAVAL AGRQ ( SEQ ID NO 1159)
CC16_YE SEIHCSLGYL YLKTKKLQKA IDHLHKSLYL KPNN ( SEQ ID NO 1160)
BSATPC1 GSALYNIGNC YDDKGELDQA AEYFEKALPV FEDY ( SEQ ID NO 1161)
ECAE00 ARGYAAVAVA YRNLQQWQNS LTLWQKALSL EPQN ( SEQ ID NO 1162)
ECAE01 AGGYYGLARL AARNGNFDAG LDFCQQSLRA CPTN(SEQ ID NO 1163)
ECAE02 WFARHLLACF YYNKRSYNKA IAFWQRCVEM SPEF(SEQ ID NO 1164)
YAD5_CL0 NSVYRSLGIT YAKIGDYKKS EEYLKKALDA EPEK ( SEQ ID NO 1165)
PRO 6_YEAST-701 HKFFLQLGQI YHSMGNIEMS RETYLSGTRL VPNC(SEQ ID NO 1166)
YAD5_CL1 INDVYSFALS YHILGEPERA LKYFLRAVEL QPNV(SEQ ID NO 1167)
YPIA_BA VNVHQRLAES LSASGEFEDA IPWYEKAVDE NPDP(SEQ ID NO 1168)
NCF2_HU0 SRICFNIGCM YTILKNMTEA EKAFTRSINR DKHL ( SEQ ID NO 1169)
TPU93844_2-211 SVAYFYLGEI FRLTKRFRKA DIAYSTAVNF EPSN(SEQ ID NO 1170)
SKI3_YE VESWVGLGQA YHACGRIEAS IKVFDKAIQL RPSH (SEQ ID NO 1171)
YHJL_EC SEALGALGQA YSQKGDRANA VANLEKALAL DPHS (SEQ ID NO 1172)
ECAE0 ASLYYYRALV HFHNEKIEEA RICIDKSLQL EPRR (SEQ ID NO 1173)
C644780 SRWWYVKGYI YYKLGNYKDA YESFMNALRV NPKD ( SEQ ID NO 1174)
OM70_YE ALALKDKGNQ FFRNKKYDDA IKYYNWALEL KEDP ( SEQ ID NO 1175)
C644781 SLYYAKKGDI LYKLGDEEGA IEAYNKAIKL NSQN ( SEQ ID NO 1176)
C644782 YEDWVTEANY YLDEGIYDKA VECYLKALEK KNTN ( SEQ ID NO 1177)
C644783 LNALFKAGKI YLLFGDIDKA YDAFNEILQQ NPSH(SEQ ID NO 1178)
A64512 AIALSEIAKA QYNIGMHDEA LKNYDKAI FI TEGV ( SEQ ID NO 1179)
LMU7384 AKGYVRRGAA LHGMRRYDDA IAAYEKGLKV DPSN(SEQ ID NO 1180)
C644784 ISTLKSLAIV LEKSGKIDEA ITTYTKILKI VNSL ( SEQ ID NO 1181)
PTSR_HU PDVQCGLGVL FNLSGEYDKA VDCFTAALSV RPND ( SEQ ID NO 1182)
HSU4657 AETFKEQGNA YYAKKDYNEA YNYYTKAIDM CPKN(SEQ ID NO 1183)
CELK04G ADAYSNMGNT LKEMGDSSAA IACYNRAIQI NPAF ( SEQ ID NO 1184)
YKD1_CA0 PEMLNNVGAL YMSMKQYEKA EHHFKRAKER LEEQ ( SEQ ID NO 1185)
YKD1_CA1 TLAHYGLGQM YIHRNEIEEA IKCFDTVHKR LPNN ( SEQ ID NO 1186)
S619910 FRGYSRLGFA KYAQGKPEEA LEAYKKVLDI EGDN ( SEQ ID NO 1187)
S619911 AEDLKMQGNK AMANKDYELA INKYTEAIKV LPTN ( SEQ ID NO 1188)
YKD1_CA2 AEAFYQMGRC RHAQGQFDGA YKYYYQARQA NNGE ( SEQ ID NO 1189)
SR72_CA AIIHGQMAYI LQLQGRTEEA LQLYNQIIKL KPTD ( SEQ ID NO 1190)
CELT09B PKYYQNRAMC YFQLNNLKMT EEDCKRALEL SPNE(SEQ ID NO 1191)
MXU77058_2-100 PDANNAMGIL LHLAFRRPDE AVKHYTKALE VRPD(SEQ ID NO 1192)
NRFG_EC SEQWALLGEY YLWQNDYSNS LLAYRQALQL RGEN ( SEQ ID NO 1193)
NRFF_HA AETWLQLGEA YVQNNEFDSA LVAYSNAEKL SGSK(SEQ ID NO 1194)
CELT25F ADWYNIGQI LVDIGDLVSA ARSFRIALSH DPDH ( SEQ ID NO 1195)
S45064 GQVSLSMGNA FLGLSVFQKA LESFEKALRY AHNN ( SEQ ID NO 1196)
S76315 YLRYRALATT YRDLEKYDLA MANIDRALAI TPDN ( SEQ ID NO 1197)
CEAF16427_7-686 VTVLQNIALA EFHMQNYNRS LLFYRKALHL DPTH ( SEQ ID NO 1198)
BIMA_EM0 ANVHYLLGKL YKMLRDKGNA IKHFTTALNL DPKA ( SEQ ID NO 1199)
BIMA_EM1 AAVLCLQGKL WQAHKEHNKA VECYAAALKL NPFM ( SEQ ID NO 1200)
S76156 FNGFYNRGLA YYKMGQIKKA IKDFSDALIV KPTF ( SEQ ID NO 1201)
BIMA_EM2 AVLICCIGLV LEKMNNPKSA LIQYNRACTL APHS(SEQ ID NO 1202)
BIMA_EM3 AYGFTLQGHE YVANEEYDKA LDAYRSGINA DSRH ( SEQ ID NO 1203)
NUC2_SC1 SVLITCIGMI YERCKDYKKA LDFYDRACKL DEKS ( SEQ ID NO 1204)
JC4348 GRVYNLLGNI SFDQTNLKKA LKYYQIAYQL FDGN ( SEQ ID NO 1205)
LBAPREL AMTLNNMGEL YHYKNEHDKS KSYFLQSFQL QQTV ( SEQ ID NO 1206)
Y091_CA0 LGTWFNAGYC AWKLENFKES TQCYHRCVSL QPDH ( SEQ ID NO 1207)
PTSR_YE PEIQLCLGLL FYTKDDFDKT IDCFESALRV NPND ( SEQ ID NO 1208)
A60088 ALCLLCFADI HRHRSDIGKA LPRYESSLNI MTEI ( SEQ ID NO 1209)
BSU55040 VQACYDLAII FFKQGRKEEA LDYCVKGKSY AEKY ( SEQ ID NO 1210)
ATAC0 PESWCAVGNC YSLRKDHDTA LKMFQRAIQL NERF ( SEQ ID NO 1211)
D64371 AEYYYKKGVE VGNKGDVEKA LEYFNKAIEL NPFY ( SEQ ID NO 1212)
H64332 AHAWYLKGRI LKKLGNIKEA LDALKMAINL NENL ( SEQ ID NO 1213)
S748060 TSAWQNRGSA LGVMGKLEEA LANFDEALAQ NPDD ( SEQ ID NO 1214)
S748061 AEDWLNLGIQ QAQRGELETA IASWGEAISL NPQM ( SEQ ID NO 1215)
YREC_SY0 LNALLEQGNE QLTNRNFAQA VQHYRQALTL EANN (SEQ ID NO 1216)
YREC_SY1 LAYSLGLATV QFRAGDYDQA LVAYRKVLAK DSNN (SEQ ID NO 1217)
YREC_SY2 AEFFNALGFN LAQSGDNRSA INAYQRATQL QPNN (SEQ ID NO 1218)
NPRA_BA YYYYKFLGLL YYCKEKYEDA LEYYKKAEQR FRSQ (SEQ ID NO 1219)
JC47750 TAARLQRGHL LLKQGKLDEA EDDFKKVLKS NPSE (SEQ ID NO 1220)
G020580 AFAYTDLANM YAEGGQYSNA EDIFRKALRL ENIT (SEQ ID NO 1221)
PPP5_RA IKGYYRRAAS NMALGKFRAA LRDYETWKV KPND (SEQ ID NO 1222)
YQCH_BA ATAFFNLGNC YHKMDNLNKA ARYI EQALVQ YRKI (SEQ ID NO 1223)
CC27_HU2 SVLLCHIGW QHALKKSEKA LDTLNKAIVI DPKN (SEQ ID NO 1224)
S61991 AIYYANRAAA HSSLKEYDQA VKDAESAISI DPSY (SEQ ID NO 1225)
YREC_SY ARIHGALGYA LSQLGNYSEA VTAYRRATEL EDDN (SEQ ID NO 1226)
CC27_HU3 CFTLSLLGHV YCKTDRLAKG SECYQKSLSL NPFL (SEQ ID NO 1227)
CC27_HU4 SLVYFLIGKV YKKLGQTHLA LMNFSWAMDL DPKG (SEQ ID NO 1228)
Consensus/60% spsbhpbG.h abpbscbppA lphapcAlpl sspp (SEQ ID NO 1229)
Table 11 Alignment of TPR Repeat Sequences
Table 12
Schematic representation of components used to build the CTPR/RTPR constructs.
CTPR: Consensus-designed tetratricopeptide repeat sequence.
RTPR: As above but with the lysine residues replaced with arginine residues.
LRH1: a beta-catenin binding sequence from the protein LRH1.
Phospho: a beta-catenin binding sequence from the protein APC (Adenomatous polyposis coli).
AXIN: an alpha-helical beta-catenin binding sequence from the protein AXIN.
BCL9: an alpha-helical beta-catenin binding sequence from the protein BCL9.
SOS: an alpha-helical KRAS-binding sequence from the protein SOS1 (Son of sevenless homolog 1).
P27v1: a degron sequence from the protein p27 that binds the E3 SCFSkp2.
P27v2: a degron sequence from the protein p27 that binds the E3 SCFSkp2.
Puc: a degron sequence from the protein Puc that binds the E3 Cul3-SPOP.
NRF2v1: a degron sequence from the protein NRF2 that binds the E3 Cul3-KEAP1.
NRF2v2: a degron sequence from the protein NRF2 that binds the E3 Cul3-KEAP1.
PHYL: a degron sequence from the protein PHYL that binds the E3 SIAH.
Trib: a degron sequence from the protein Trib that binds the E3 COP1.
PAM2: a degron sequence from the protein PAM2 that binds the E3 UBR5.
CDC25B: a degron sequence from the protein CDC25B that binds the E3 beta-TRCP.
P53: an alpha-helical degron sequence from the protein p53 that binds the E3 MDM2.
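The relationship between the CTPR and RTPR modules in the legend above (RTPR being CTPR with the lysine residues replaced with arginine residues) can be illustrated with a short sketch. It assumes the 34-residue CTPR repeat as it appears in the 1TBP-CTPR2 sequence (SEQ ID NO: 801); the names `CTPR_REPEAT`, `RTPR_REPEAT` and `ctpr_to_rtpr` are hypothetical and are not part of the disclosure.

```python
# 34-residue consensus repeat as it appears in the CTPR constructs listed
# in this document (e.g. SEQ ID NO: 801). Taken from the listing, not invented.
CTPR_REPEAT = "AEAWYNLGNAYYKQGDYQKAIEYYQKALELDPNN"


def ctpr_to_rtpr(repeat: str) -> str:
    """Return the RTPR variant: every lysine (K) replaced with arginine (R).

    Hypothetical helper for illustration only.
    """
    return repeat.replace("K", "R")


RTPR_REPEAT = ctpr_to_rtpr(CTPR_REPEAT)

if __name__ == "__main__":
    print(len(CTPR_REPEAT))  # length of one repeat module
    print(RTPR_REPEAT)       # lysine-free variant of the same module
```

The resulting string can be compared against the repeat modules in the RTPR constructs listed earlier (for example SEQ ID NO: 793), which carry arginine at the corresponding positions.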
Table 13 - Amino acid sequences for the CTPR/RTPR constructs
All constructs were cloned into pcDNA3.1.
Bold: Target-binding peptide
Bold underlined dashed: Degron (E3 ligase engager)
Underlined solid: HA Tag
Grafting site 1 - res 1 to 2 (before first CTPR repeat) SEQ ID No 1230
LOOP Grafting site 2 - res 35 to 36 SEQ ID No 1230 (Between first and second CTPR modules - DPNN sequence is placed before and after the targeting/degron peptides to facilitate the loop structure)
LOOP Grafting site 3 - res 69 to 70 SEQ ID No 1230 (Between second and third CTPR modules - DPNN sequence is placed before and after the targeting/degron peptides to facilitate the loop structure)
LOOP Grafting site 4 - res 103 to 104 SEQ ID No 1230 (Between third and fourth CTPR modules - DPNN sequence is placed before and after the targeting/degron peptides to facilitate the loop structure)
Loop Grafting site 5 - res 137 to 138 SEQ ID No 1230 (Between fourth and fifth CTPR modules - DPNN sequence is placed before and after the targeting/degron peptides to facilitate the loop structure)
Loop Grafting site 6 - res 177 to 178 SEQ ID No 1230 (after the last CTPR module)
004242/1-30 NGHTALHIAA SK-GDE QCVKLLLEHG A-DPNA (SEQ ID NO 1305)
093130/1-30 DGYTPLHYAI KY-NTI RIAKYLLDNG A-DMTL (SEQ ID NO 1306)
VB18_VARV/ 1-32 TGYTALHCYL YNN-YFTN DVLKVLLNHG V-DVTI (SEQ ID NO 1307)
Q25328/1-30 DKDTALHLAV YY-KNL QMIKLLIKYG I -DVTI (SEQ ID NO 1308)
Q14349_1/ 1-30 GGWTPIIWAA EH-KHI EVIRMLLTRG A-DVTL (SEQ ID NO 1309)
BCL3_HUMAN/ 1-30 SGRS PLIHAV EN-NSL SMVQLLLQHG A-NVNA (SEQ ID NO 1310)
Q24241_4/ 1-31 DGLTPLHCAS RS-GHV EVIKHLLQQN AP- ILTK (SEQ ID NO 1311)
043150/1-30 KGSTALHYCC LT-DNA ECLKLLLRGK A- SIEI (SEQ ID NO 1312)
G3790744/1-30 NKQTSLHYAC SK-NHV EIVKLLIEAD P-NUN (SEQ ID NO 1313)
Q25328_3/ 1-30 EKYTPLHLAA MS-KYP ELIQILLDQG S-NFEA (SEQ ID NO 1314)
GLP1_CAEEL/ 1-30 KGRTALHYAA MH-DNE EMVIMLVRRS S-NKDK (SEQ ID NO 1315)
YA2A_SCHPO/ 1-31 DKMTPLHWSI VG-GNL KCMKLILKEG GI- PCTA (SEQ ID NO 1316)
VBO 4_VACCC/ 1-30 NRYTPLHYVS CR-NKY DFVKLLISKG A-NVNA (SEQ ID NO 1317)
HT16_HYDAT/ 1-33 GLDTRLHLAC EE-KNP NTVKELLQDS VIK-ENVNA (SEQ ID NO 1318)
004703_1/ 1-30 HRSTAIHVAI EE-NHL EMVKLLLLNG D-EIDD (SEQ ID NO 1319)
G3790744_l/l-30 EGNTALHLAC DE-NRG DVAILLVNRG A-DMKM (SEQ ID NO 1320)
Q25338_l/ 1-30 GNLTVLHLAV ST-GQI NIIKELLKRG S-NIEE (SEQ ID NO 1321)
Q21920/ 1-30 TLETPLTIAC AN-GHK DIVELLLKEG A-NIEH (SEQ ID NO 1322)
016568/1-30 DGKTCLHVAI EN-LHK ETIELMIQSG A-DKYA (SEQ ID NO 1323)
Q21920_1/ 1-30 TGDTPLSLAA RN-GYI AIMKMLIEKG G-DLTA (SEQ ID NO 1324)
082630/1-30 DGRSPLHLAA SR-GYE DITLYLIQES V-DVNI (SEQ ID NO 1325)
000542/1-31 QSRTLLHHAV ST-GSK DWRYLLDHA PP-EILD (SEQ ID NO 1326)
Q25338_2/ 1-30 DGSTALHLAV SG-RKM KTVETLLNKG A-NLKE (SEQ ID NO 1327)
Q86916/ 1-32 NGKNLLHMYM CNF-NVRI NVIKLLIDSG V-NFLQ (SEQ ID NO 1328)
Q23595/1-30 IGQNALHMAA RV-GNL NILKYLLDRL P-ELRD (SEQ ID NO 1329)
Q21587/ 1-30 IGCSALHLCA EH-GHY RMIKLLLQYM K-WEQ (SEQ ID NO 1330)
P87621/1-34 DGNTPLHIVC SKT-VKNV DIIDLLLPST DV—NKQNK (SEQ ID NO 1331)
YD57_SCHPO/ 1-33 AGWTPLMI SI NNR-SVPD NVIEELINRS DV-DPTI (SEQ ID NO 1332)
004097_1/ 1-30 HGRTPLHHCI SS-GNH KFAKILLRRG A-RPSI (SEQ ID NO 1333) P87621_2/ 1-30 YDSTDFKMAV EV-GSI RCVKYLLDND I - ICED (SEQ ID NO 1334) Q84644_l/ 1-30 YGNTPLAIAI ST-GQC DMVKLLMNYV T-NIFD (SEQ ID NO 1335) 017055_1/ 1-31 YGDTALHLSC YS-GRL DIVKSILDSS PT-NIVN (SEQ ID NO 1336) VB18_VARV_1 / 1-3 CGNTPFHLYL SIE—MCNNI HMTKMLLTFN P-NFEI (SEQ ID NO 1337) HRGjCOWPX/1-32 NYCTALQYYI KSS-HIDI DIVKLLMKGI D-NTAY (SEQ ID NO 1338) 055222_1/ 1-30 HGFSPLHWAC RE-GRS AWEMLIMRG A-RINV (SEQ ID NO 1339) Q28282/ 1-30 EGRSAFHWA SK-GNL ECLNAILIHG V-DITT (SEQ ID NO 1340) G3929219_3/l-30 DFMTPLHVAA ER-AHN DVMEVLHKHG A-KMNA (SEQ ID NO 1341) BCL3_HUMAN_1/1- DGDTPLHIAV VQGN-LPAVH RLVNLFQQGG R-ELDI (SEQ ID NO 1342) BCL3_HUMAN_2 / 1- HGQTAAHLAC EH-RS P TCLRALLDSA AP—GTLDL (SEQ ID NO 1343) 073630/1-30 HGLSPVHWSV KM-KNE KCLVLLVKAG A-NVNS (SEQ ID NO 1344) Q18297_2/ 1-30 DQNTPMHIVA SN-GYL EMMQLLQKHG A- SITQ (SEQ ID NO 1345) Q94527/ 1-31 DGLTPLHMAI RQ-NKY DVAKKLISYD RT- SISV (SEQ ID NO 1346) 090757/1-30 KGYTALYYTI CN-NNY DMVCFLLEKN A-DISI (SEQ ID NO 1347) HT16 HYDAT 2/1- NGRTPVQWC FY-NHA STLHLLISKG SA-DFLK (SEQ ID NO 1348) P87609_1/ 1-31 NGYTCIAIAI NE-SRNI ELLKMLLCHK P-TLDC (SEQ ID NO: 1349) O90757_l/ 1-30 GGRTSLHLAI KE-RNY EAAFVLINNG A-NVDS (SEQ ID NO: 1350) Q20109_l/ 1-31 MGWSPLMWAV YK-NHL DWDLLVNAK VN-ACDK (SEQ ID NO: 1351) SWI6_YEAST_1/1- HGNTPLHWLT SI-ANL ELVKHLVKHG S-NRLY (SEQ ID NO: 1352) P93755/ 1-31 DYRTPLMVAA TY-GSI DVIKLIVSLT DA-DVNR (SEQ ID NO: 1353) Q19995/ 1-31 AGRTPLHYAA AD-PNGE HMIKVLQKSG G-DAFI (SEQ ID NO: 1354) 048738/1-30 DGFTPIHMAA KE-GHV RIIKEFLKHC P-DSRE (SEQ ID NO: 1355) Q89202/ 1-30 ILLSCIHITI KN-GHV DMMILLLDYM T- STNT (SEQ ID NO: 1356) G3927831_l/l-30 HKQTPLMLAA MY-GRI SCVKKLAEVG A-NILM (SEQ ID NO: 1357) FEM1_CAEEL_3/1- HGHTCLMIAS YR-NKV GIVEELLKTG I -DVNK (SEQ ID NO: 1358) Q18297_3/ 1-30 EGKTAFDIAC EN-DHK DVARAFLETD Q-WKNL (SEQ ID NO: 1359) Q38898_1/ 1-30 HGVTALQVAM AE-DQM DMVNLLATNG A-DWC (SEQ ID NO: 1360) HRG_COWPX_2/l-3 NDFNIFTYMK SK-NVDI DLIKVLVEHG F-DFSV (SEQ ID NO: 1361) 
043988/1-30 EGYTPIHYAV RE- SRI ETVKFLIKFN S-KLNI (SEQ ID NO: 1362) 043988_1/ 1-30 QGYTPLYLAA KA-GKT NFVKYLLSKG R- SKKI (SEQ ID NO: 1363) AKR1_YEAST/ 1-30 EGFTPLHWGT VK-GQP HVLKYLIQDG A-DFFQ (SEQ ID NO: 1364) Q21920_5/ 1-30 TKDTALTI SA EK-GHE KFVRMLLNGD A-AVDV (SEQ ID NO: 1365) P87621_3/ 1-32 QHKTPLYYLS GTD-DEVI ERINLLVRYG A-KINN (SEQ ID NO: 1366) 018270_1/ 1- 30 EGRSCLMWAV CA-GNI EVINYLIQRE D-APKR (SEQ ID NO: 1367) VC17_VACCC/ 1-33 RGNNALHCYV SNK—CDTDI KIVRLLLSRG V-ERLC (SEQ ID NO: 1368) VCO 9_VACCC/ 1-30 HGCSILYHCI KS-HSV SLVEWLIDNG A-DINI (SEQ ID NO: 1369) YIA1_YEAST/ 1-30 FDNSPLFLAS LC-GHE AWKLLLQRG A-VCDR (SEQ ID NO: 1370) Q89540_1/ 1-32 CGRTPLHAYV QYD-GVRP EWALMLEAG A-DWC (SEQ ID NO: 1371) Q25338_5/ 1-31 NNWTPLHFAI YF-KKE DAAKELLKQD DI-NLTI (SEQ ID NO: 1372) 017055_3/ 1-30 HLLPALHLAA MI-GDS EMLTILLNSG A-NIHV (SEQ ID NO: 1373) 044872/1-30 LGRNALLIAI EN-ENI EMIELLLDHN I -ETGD (SEQ ID NO: 1374) 1790447/1-30 ARDTGLFMAM QR-GHM NVINTIFNAL P-TLFN (SEQ ID NO: 1375) HRG_COWPX_3/l-3 YI I SHLYSES DSR— SCVNP EWKCLINHG I -NPSS (SEQ ID NO: 1376) Q21587_2/ 1-30 EKPSPIDIAV LK-DDP HLLKIVLDAG A-NPNA (SEQ ID NO: 1377) VB18_VARV_2 / 1-3 EGKTLLHIAC EY-NNT HVIDYLIRIN G-DINA (SEQ ID NO: 1378) VB04_VACCC_1/1- YGCTLLHRCI YH ( 5 ) SESYN ELIKILLNNG S-DVDK (SEQ ID NO: 1379) 075407/1-30 LNSTPLHWAT RQ-GHL SMWQLMKYG A-DPSL (SEQ ID NO: 1380) P87621_4/ 1-31 FGDSPLTLLI KT-LSPA HLINKLLSTS N-VITD (SEQ ID NO: 1381) 054807_1/ 1-31 EGRTPVHVAI SN-QHS VIIQLLISHP NI-ELSV (SEQ ID NO: 1382) FEM1_CAEEL_4 / 1- QGVDPLMGAA LS-GFL DVLNVLADQM PS-GIHK (SEQ ID NO: 1383) Q84566/ 1-33 GYRSLLQYAL SWG—RHGNS KVIELLLKHG A-NVDY (SEQ ID NO: 1384) Q27105_l/ 1-28 HGRTSLYWAV -KNK EIVELLQSHG A-TI— (SEQ ID NO: 1385) 013075/1-30 NGDNVLHLSI IH-LHR ELVKNLLEVM P-DMNY (SEQ ID NO: 1386) G3930527_4/l-30 QRSTPLIIAA RN-GHA KWRLLLEHY R-VQTQ (SEQ ID NO: 1387) G3925387_5/l-30 DGHSLLHWAA LG-GNA DVCQILIENK I -NPNV (SEQ ID NO: 1388) 088849_6/ 1-30 EGRTSFMWAA GK-GND DVLRTMLSLK S-DI DI (SEQ ID NO: 1389) 088849_7 / 1-30 QGATPLHYAA QS-NFA 
ETVKVFLQHP S-VKDD (SEQ ID NO: 1390) 017055_4/ 1-30 EGNQALHYAA KS-GSL VILNMLIKQV R-GTND (SEQ ID NO: 1391) Q18297_5/ 1-30 NKMTPLLLAC VH-GSQ El IQELIKAN S-NVTK (SEQ ID NO: 1392) FEM1 CAEEL 5/1- NNDTPMHILL RAR-EFRK SLVRALLVRG T-WLFA (SEQ ID NO: 1393) 075762_4/ 1-30 DGCTPLHYAC RQ-GGP GSVNNLLGFN V- SIHS (SEQ ID NO 1394) 014586_1/ 1-30 SWATGLHLSV LF-GHV ECLLVLLDHN A-TINC (SEQ ID NO 1395) P90784/ 1-30 NGNTALHLCV IH-DKM DMLDAVLEAG G-NIRL (SEQ ID NO 1396) Q94527_l/ 1-30 DGDSALHVAC QQ-DRA HYIRPLLGMG C-NPNL (SEQ ID NO 1397) 054807_2/ 1-31 YLQTPLHMAI AY-NHP DWSVILEQK AN-ALHA (SEQ ID NO 1398) 075762_5/ 1-30 NGWTALHHAS MG-GYT QTMKVILDTN L-KCTD (SEQ ID NO 1399) 157038_1/ 1-31 DGDTPLHLAC IS-GSV DWAALIRMA PH- POLL (SEQ ID NO 1400) Q02989_4/ 1-30 NGNTLLHLFS ST-GEL EWQFLMQNG A-NFRL (SEQ ID NO 1401) O60736_l/ 1-32 QGWTVLHEAV ST-GDP EMVYTVLQHR DY—HNTSM (SEQ ID NO 1402) ANKl_MOUSE_l 8/1 DGFTPLAVAL QQ-GHE NWAHLINYG -TKGK (SEQ ID NO 1403) 061222_1/ 1-33 YGRTPLMYAI MT-NNR SWDAIVGDG KLA-WLHK (SEQ ID NO 1404) Q62422/ 1-31 AGSTALYWAC HG-GHK DIVEVLFTQP NV-ELNQ (SEQ ID NO 1405) Q24241_15/l-30 NGLSALHMAA QG-EHD EAAHLLLDNK A- PVDE (SEQ ID NO 1406) YD57_SCHPO_2 / 1- DKRTPLHWAC SV-GKV NTIYFLLKQP NI-KPDE (SEQ ID NO 1407) G3927831_2/ 1-32 NRRTCLHYAA YY-GHA NCVQAILSAA QS—SPVAV (SEQ ID NO 1408) P70770_1/ 1-30 DNNNALWFAC FG-NHY DLIHLLLAAK I -NIDN (SEQ ID NO 1409) Q94527_2/ 1-30 AGNTPLHVAV KE-EHL SCVESFLNGV P-TVQL (SEQ ID NO 1410) P72763_4/ 1-30 QGTTALEWAI KQ-QNL SMVRKLIEHH A- PLDL (SEQ ID NO 1411) 060733_2 / 1- 30 EGCTPLHLAC RK-GDG EILVELVQYC H-TQMD (SEQ ID NO 1412) KBFl_MOUSE/ 1-30 KSYPQVKICN YV-GPA KVIVQLVTNG K-NIHL (SEQ ID NO 1413) 088849_9/1-31 ERYTPLDYAL LG-ERH EVIQFMLEHG AL- SIAA (SEQ ID NO 1414) O82630_l/ 1-30 LGSTPLLEAI KN-GND RVAALLVKEG A-TLNI (SEQ ID NO 1415) YAHD_ECOLI / 1-30 QGKTAITLAS LY-QQY ACVQALIDAG A-DINK (SEQ ID NO 1416) Q84644_3/ 1-30 REEPILHVIT SI-GNV NMMKTMLDEG A-DINV (SEQ ID NO 1417) 157038_2/ 1-30 SGRTPLHIAI EG-CNE DLANFLLDEC E-KLNL (SEQ ID NO 1418) 1790447_1/ 1-34 YGCPGLYLAM 
QN-GHS DIVKVILEAL PSLAQEINI (SEQ ID NO 1419) 075762_6/ 1-31 NMMAPLHIAV QG-MNN EVMKVLLEHR TI-DVNL (SEQ ID NO 1420) LI12_CAEEL_2/1- DENTPLMLAV LA-RRR RLVAYLMKAG A-DPTI (SEQ ID NO 1421) G3970962_l/ 1-30 HGYTALMFAA LS-GNK DITWVMLEAG A-ETDV (SEQ ID NO 1422) VC09_VACCC_2 / 1- QDLLLEYVSY HT-VYI NVIKCMIDEG A-TLYR (SEQ ID NO 1423) O83807_2/ 1-31 SGKTPWHWAV RA-LDR DLIKHLVTLG PP-TQER (SEQ ID NO 1424) 023296/1-30 DGNTALYYAI EG-RYL EMATCLVNAD K-DAPF (SEQ ID NO 1425) G3927831_3/l-30 DRHSVLHVAA AN-GQI EILSLLLERF T-NPDL (SEQ ID NO 1426) YB07_FOWPM_3/1- NGYSPIKMAV RL-RDV EMIKLLMSYN TY—PDYNY (SEQ ID NO 1427) RN5A_MOUSE_2 / 1- NGFTAFMEAA ER-GNA EALRFLFAKG A-NVNL (SEQ ID NO 1428) 054910_1/ 1-30 HGDTALHVAC RR-QNL ACACCLLEEQ P-EPGR (SEQ ID NO 1429) 044997_4/ 1-32 GGYEPMHTCY DHF-VGNA DCIHLILYRT S-DPTE (SEQ ID NO 1430) 000306/1-30 REQTPLFLAA RE-GAV EVAQLLLGLG A-AREL (SEQ ID NO 1431) 061222_2/ 1-30 SGNTAAHYAA AY-GFL DCLKLLASID D-NILS (SEQ ID NO 1432) 074205/1-30 DQFSPLMTAI SL-GRL DIARILLQSG A- PLDI (SEQ ID NO 1433) VB18_VARV_3 / 1-3 EYIKSRYMLL KEE-DIDE NIVSTLLDKG I -DPNF (SEQ ID NO 1434) P87603_1/ 1-33 HLITPLHSYL RRD—ESISA SVLKKVIELG A-DRNL (SEQ ID NO 1435) O82630_2/ 1-30 DHRTPLHVAA SE-GFY VLAIQLVEAS A-NVLA (SEQ ID NO 1436) Q25328_4/ 1-30 KTDTPLCYAS EN-GHF TWQYLVSNG A-KVNH (SEQ ID NO 1437) JC4356/ 1-30 AGQTAAMYAA LF-KRT EVLKELTDKG A-DLSI (SEQ ID NO 1438) 050999/1-30 NGNPIFTYAI NV-KAK SI INYLITKE F-NINL (SEQ ID NO: 1439) Q20109_3/ 1-30 YSSTALMLAT RG-NFI QWELLLTRE P-NVNV (SEQ ID NO: 1440) GLP1_CAEEL_2/1- SERSALHEAV VN-KDL RILRHLLTDK RLL-KEIDE (SEQ ID NO: 1441) Q86916_l/ 1-33 NNLSALAHYL SFN—KNVEP EIVKILIDSG S- SITE (SEQ ID NO: 1442) Q92527_l/ 1-30 RYNTVLHYAV CG-QSL SLVEKLLEYE A-DLEA (SEQ ID NO: 1443) O90757_4/ 1-30 NDYRLLHSAI TH-ENK KMIELLCLHG I -NINV (SEQ ID NO: 1444) I50404_2/ 1-32 YGNSLLHLAL QA-ADE EMLRMLLAHL GS—ATPYL (SEQ ID NO: 1445) Q94400_l/ 1-31 GGATALQYAV IN-GNE YLVEILTSHA SI-DVNI (SEQ ID NO: 1446) Q83730/ 1-32 DGYGPVQTYI HSK-NWL DTLRELVRAG A-TVHD (SEQ ID NO: 1447) 044997_5/ 1-30 
NGDTPLHVAC RF-AQH TVAGYVANEK I -DVDS (SEQ ID NO: 1448) Q25338_6/ 1-30 NKYLPIHKAI IN-DDL DMVRLFLEKD P- SLKD (SEQ ID NO: 1449) 090760_3/ 1-30 TGLTPLDIAM SC-NNY El I SLLVSHV I -RLDY (SEQ ID NO: 1450) 391941_1/ 1-30 NGHSPLFYAL WE- SHV DVLNALLQRP L-NLPS (SEQ ID NO: 1451) 1790447_2/l-34 TSSHVLYHVM AN-GDA DMLKIVLNAL PLLIRTCHL (SEQ ID NO: 1452) Q02979_2/ 1-30 YKRTPLHYSC QY-GLS EVTKLI IKLM K-EWNI (SEQ ID NO: 1453) O74205_l/ 1-30 EGNSLLHIAV KT-NSL SITRLLLERY K- SCRE (SEQ ID NO: 1454) Q86916_2/ 1-32 LYETPLFSCV EKD-KVNV EILNFLIENG S-DINF (SEQ ID NO: 1455) Q94527_3/ 1-31 DGNNALHMAV LE-QSV ELLVLILDAQ NE-NLTD (SEQ ID NO: 1456) 015084_16/ 1-30 NWQTPLHIAA AN-KAV KCAEALVPLL S-NVNV (SEQ ID NO: 1457) YG4X_YEAST_1/1- DGRI PLHWSV SF-QAH EITSFLLSKM E-NVNL (SEQ ID NO: 1458) 018270_5/ 1-30 NGYTALHLAA MV-GHE KVCKILTNQP T- SI FP (SEQ ID NO: 1459) Q14349_4/ 1-30 QQRTPLMEAV VN-NHL EVARYMVQRG G-CVYS (SEQ ID NO: 1460) JQ1744_2/ 1-34 KFHTPLHTYV DKNS-KGVAP SI IQYMIDRG V- STAR (SEQ ID NO: 1461) PLU_DROME/ 1-32 FGQNALTLAT YA-GHL TLVKELLRRR SY—KDFNL (SEQ ID NO: 1462) 054807_4/ 1-31 NGNNALHLAV MH-GRL NNIRALLTEC TV-DAEA (SEQ ID NO: 1463) 015084_17/l-30 ATISPLHLAA YH-GHH QALEVLVQSL L-DLDV (SEQ ID NO: 1464) Q93203_l/ 1-30 PAIYPFHLAA KY-NHV ECMQALREHG F- SINY (SEQ ID NO: 1465) 088849_11/1-31 EGRTPLHFAV AD-GNL TWDVLTSYE SC-NITS (SEQ ID NO: 1466) Q17643/ 1-30 AMMNTLHMAV AH-KQR DIVELLLKNG V-DPNT (SEQ ID NO: 1467) 048738_2/ 1-30 DEASPLYMAV EA-GYH ELVLKMLESS S- SPSI (SEQ ID NO: 1468) 1790447_3/ 1-30 QKYSAFELAF EF-GHR VIAELILNTL N-KMAE (SEQ ID NO: 1469) Q02989_9/1-31 SECNPLHEAA AY-AHL DLVKYFVQER GI-NPAE (SEQ ID NO: 1470) Q19995_l/ 1-31 FGCTPLHSAV VH-EHT EIVRYIAGHY NS-VLNA (SEQ ID NO: 1471) 090760_4/ 1-30 DRNSLLLVAT KR-NYI DWRYLVKKG V-DINF (SEQ ID NO: 1472) Q89340_5/ 1-31 SGCTPLHRAV FN-GHD ACASMLVNKI VS-ERPL (SEQ ID NO: 1473) O90757_5/ 1-31 NKYSMLHFLS TS-NKYH NVMAVLLDKG I -DVNI (SEQ ID NO: 1474) Q20313_l/ 1-30 QDSDIFLSAC MS-GDE EEVEELLNKG A-NINT (SEQ ID NO: 1475) Q89202_2/ 1-29 KGNTALYYAV DS-GNM QTVKLFVKKN -WRLM 
(SEQ ID NO: 1476) YMV8_YEAST_2 / 1- KGNTCVHLAL MK-GHE QTLHLLLQQF P-RFIN (SEQ ID NO: 1477) 157038_3/ 1-30 VAQTPLHLAA LT-AQP NIMRILLLAG A-EPTV (SEQ ID NO: 1478) AKR1_YEAST_2 / 1- QGFNLLHLSV NSSN-IMLVL YVLFNWSKG LL-DI DC (SEQ ID NO: 1479) Q18297_6/ 1-30 REKELIHFAA EK-GFL EVLKALVEAG G-NKNE (SEQ ID NO: 1480) 068219/1-30 DGRTIIHYAA KD-GNL EILQQALGRK S- SYSK (SEQ ID NO: 1481) Q20109_5/ 1-30 FGNWILTSAV RS-GNA AIVRMILDKF A-DINC (SEQ ID NO: 1482) Q25328 7/1-30 DGMNFFYYAV QN-GHL NIVKYAMSEK D-KFEW (SEQ ID NO: 1483) 061222_4/ 1-30 EDNTPLHYAL TN-GNL MLFNLMLDKV A-NKRN (SEQ ID NO 1484) Q21587_5/ 1-30 LLKSALHFAL LS-GNH DLVKFMIANG S-NVNM (SEQ ID NO 1485) 061222_5/ 1-30 KKRTPLIHAM LN-GQI HTAAFLLAKG A- SLTL (SEQ ID NO 1486) Q21920_13/l-30 RKISPMMAAF RK-GHV EIVKYMVNSA K-QFPN (SEQ ID NO 1487) Q83730_l/ 1-30 LGNSCLDLAV LN-GNK YMVHRLLRKT I -TPDA (SEQ ID NO 1488) Q18297_7/ 1-30 DTKTVLHTAA FY-GNE SIVRYFIAEG V-TIDR (SEQ ID NO 1489) Q18663_l/ 1-31 EEFTPLHNAV KM-GNT VMAKNFIDSK SV-WIDE (SEQ ID NO 1490)
Q25328_8 / 1-31 GLLSALHYAI LY-KHD DVASFLMRSS NV-NVNL (SEQ ID NO 1491)
018152_4/ 1-30 YGQHPMMIAA AN-GNL KVLEYLMGIS D- IINQ (SEQ ID NO 1492) 054807_6/ 1-30 HRQTALHLAA QQ-DLP TICSVLLENG V-DFAA (SEQ ID NO 1493) O73630_3/ 1-30 NGDTPLHLAV IH-GQS SVIEQLVQII L- SI PN (SEQ ID NO 1494) 35040_3/ 1-30 RGHTPLDLTC ST-LVK TLLLNAAQNT M-EPPL (SEQ ID NO 1495) Q19995_2/ 1-33 DGRTPLHHAY MK-RRN ELIEYLLYIC PDSA-NIKD (SEQ ID NO 1496) DAPK_HUMAN_3/ 1- DADIRLWVNG CKL—ANRGA ELLVLLVNHG Q-GIEV (SEQ ID NO 1497) Q25328_9/ 1-30 KGLAPLLAFS KK-GNL DMVKYLFDKN A-NVYI (SEQ ID NO 1498) O43150_l/ 1-33 PDETALHLAV RSV—DRTSL HIVDFLVQNS G-NLDK (SEQ ID NO 1499) YAR1_YEAST_1/1- SDSTALHMAA AN-GHI ETVRYILETV S-RANS (SEQ ID NO 1500) Q84566_2/ 1-31 NGHTLLLWSA EK-SKNM ELVKYLIHAG V-El SQ (SEQ ID NO 1501) S57237/1-30 YGKS PINDAA EN-QQV ECLNVLVQHG T- SVDY (SEQ ID NO 1502) 35040_4 /1-33 NGDTPLHLAI IHG—QTSVI EQIVYVIHHA Q-DLGV (SEQ ID NO 1503) Q18587_l/ 1-30 NGRTAIQIAA DY-GQT SIIAYLISIG A-NIQD (SEQ ID NO 1504) Q02989_10/l-30 QENTPITVAI FA-NKV SILNYLVGIG A-DPNQ (SEQ ID NO 1505) Q09493_l/ 1-32 IGLTPLYYNM LTA-DSND QVAEILLREA A-DIGV (SEQ ID NO 1506) 015084_18/l-30 RGRTPIHLSA AC-GHI GVLGALLQSA A- SMDA (SEQ ID NO 1507) Q21920_15/l-33 KKNTPLMEAC AGD—QGDQA GWKLLLSKH A-EVDV (SEQ ID NO 1508) O83807_6/ 1-30 DGSTPLHVMA KR-GYT GFVQFLVDRK V-NLNA (SEQ ID NO 1509) D1037943_4/l-30 YDATAMHRAA AK-GNL KMVHILLFYK A- STNI (SEQ ID NO 1510) YB07_FOWPM_4 / 1- DIESELHEAV EE-GDV VKVEELLDSG K- FIND (SEQ ID NO 1511) 045398_1/ 1-30 DDRTALDIAV EK-RDL KSARICIKGG A-DVNA (SEQ ID NO 1512) O49409_l/ 1-30 DGGVRLMYLA NE-GDI EGIKELIDSG I -DANY (SEQ ID NO 1513) O83807_8/ 1-30 KGETPLVLTI DR-DHR DLTAYFVSLG A-DIHA (SEQ ID NO 1514) S57237_l/ 1-31 NDVTPVYLAA QE-GHL EVLKFLVLEA GG- SLYV (SEQ ID NO 1515) JQ1744_3/ 1-30 YGFTPLLSAV YA-VNL NFFNHFVKLG A-DINV (SEQ ID NO 1516) E1350345_l/l-30 NGRSCLFYAR SN-GFR EVFDMLVTAG L- SPDY (SEQ ID NO 1517) P87611_1/ 1-35 IGKTALYYYI ITRSRDKLSL DVINCLISYE K-EILY (SEQ ID NO 1518) Q23595_2/ 1-30 RGQSLLHIAC LC-GHE HIVRWI LNRS G- SDAI (SEQ ID NO 1519) P90902_2/ 1-30 DFWQPVHAAA CW-AQP DLIELLCQYG G-DIHA (SEQ 
ID NO 1520) VC17_VACCC_1/1- MCKNSLHYYI SS (4) QSLSK DVIKCLINNN V- SIHG (SEQ ID NO 1521) JQ1744_4/ 1-32 HGNSSLHYYM RRY-MLCK SVLDILLSTG V-DING (SEQ ID NO 1522) Q93203_2/ 1-30 KNILPIHLAA WN-GNL EIVKMLLLQT M-EKPA (SEQ ID NO 1523) 045398_2/ 1-30 EERTPLICAV MA-NNH IMCETLIRSG V-NCDV (SEQ ID NO 1524) 000522_1/ 1-31 DHWAPIHYAC WY-GKV EATRILLEKG KC-NPNL (SEQ ID NO 1525) Q02989_ll/l-30 SGFTPLHLSI SS-TSE TAAILIRNTN A-VINI (SEQ ID NO 1526) YMV8_YEAST_3/1- NGWSSLHYAS YH-GRY LICVYLIQLG H-DKHE (SEQ ID NO 1527) Q07045 4/1-35 NSNILQHILI EYMTFDDIDI HLVECMLAYG A-WNK (SEQ ID NO 1528) 015084_20/ 1-30 DGKTPLHMTA LH-GRF SRSQTI IQSG A-VI DC (SEQ ID NO: 1529) Q18297_8/ 1-30 YQLTPLHYAA MK- SNF SALHALIKLK A-DVDA (SEQ ID NO: 1530) 024538_2/ 1-32 NGNTALWYAI AS-KHY SI FRILYQLS AL—SDPYT (SEQ ID NO: 1531) O55014_3/ 1-30 LLSTALHVAV RT-GHY ECAEHLIACE A-DLNA (SEQ ID NO: 1532) 045398_3/ 1-33 QGASALEIAL CSE—HEKSR QIAAQLVEKG A-DVNV (SEQ ID NO: 1533) 017055_5/ 1-30 VHFTALHCAT YF-GQE NAVRTLISAS A-NLNL (SEQ ID NO: 1534) 015084_21/ 1-30 LKRTPIHAAA TN-GHS ECLRLLIGNA E- PQNA (SEQ ID NO: 1535) 088849_14/l-33 EGKI PLHWAA NHK—DPSAV HTVRCILDAA P-TESL (SEQ ID NO: 1536) Q02989_13/l-30 RNEYPFYLAV EK-RYK DIFDYFVSKD A-NVNE (SEQ ID NO: 1537) 023295_1/ 1-30 YGVS SLFVAI NT-GDV SLVKAI LKI I G-NKDL (SEQ ID NO: 1538) Q14349_5/ 1-30 EENICLHWAS FT-GSA AIAEVLLNAR C-DLHA (SEQ ID NO: 1539) Q12013_1/ 1-30 RGFNALHCAL VG-GDQ RVICDLILSG A-NFYE (SEQ ID NO: 1540) Q40785_2/ 1-30 DDESIVHHTA SV-GDA EGLKKALDGG A-DKDE (SEQ ID NO: 1541) DAPK_HUMAN_5/ 1- GGSNAVYWAA RH-GHV DTLKFLSENK C- PLDV (SEQ ID NO: 1542) 018270_6/ 1-34 RDRSPLHYAA CK-VNL EALRILLFVD PNGGPDFGF (SEQ ID NO: 1543) Q86916_3/ 1-30 RYCNPLFYYV EQ-DDI NGVKKWIKFV N-DCND (SEQ ID NO: 1544) AKR_ARATH_2 / 1-3 GGLTALHRAI IG-KKQ AITNYLLRES A-NPFV (SEQ ID NO: 1545) 016004_2/ 1-31 MGETPLHAAV RA-DAI EVFRLLIQNR ST-QIDA (SEQ ID NO: 1546) 016004_3/ 1- 41 DGLTPLMLAV MR (10) ETHA HFIEDLITRG G-DINH (SEQ ID NO: 1547) YA2A_SCHPO_4 / 1- QGFNCLHLAV HA-AS P LLWYLLHLD I - SVDL (SEQ ID NO: 1548) 
Q18297_9/ 1-30 EKKTPLRMAV EG-NHP ETLKKILQME K-KNSC (SEQ ID NO: 1549) 035433_1/ 1-31 KGLTPLALAA SS-GKI GVLAYILQRE IH-EPEC (SEQ ID NO: 1550) 054807_8/ 1-30 KGNPPLWLAL AS-NLE DIASTLVRHG C-DATC (SEQ ID NO: 1551) P87603_3/ 1-40 TYLSRLCIVN KN ( 8) NNINI EWKYLIRSG S-DISL (SEQ ID NO: 1552) 073579_3/ 1-30 SLPIQYYWSC ST- IDI EIVKLLIKDV D-TCRV (SEQ ID NO: 1553) Q21587_6/ 1-33 RLKSPFVEYF RSN—DSIDA KWKLFITHG A-KWM (SEQ ID NO: 1554) A53950_3/ 1-30 FDKSAFDIAM EK-NNT EILVMLQEAM Q-NQVN (SEQ ID NO: 1555) Q18663_3/ 1-31 SGRTPFDLAC EY-GQE KMLERLFSCG LR-KINF (SEQ ID NO: 1556) Q25328_ll/1-31 NNITALHYAA IL-GYL ETTKQLINLK El-NANV (SEQ ID NO: 1557) O72760_2/ 1-36 EYKNILIHYI YS (4) KPINY YVLYKLLRKG A-DPNY (SEQ ID NO: 1558) Q89342_2/ 1-30 DGSTPLHFIA RW-GRK ICARELITAG V-EINT (SEQ ID NO: 1559) O83807_9/ 1-30 MGSSALVLAI KK-DRS DLCHELLALG A-DLFI (SEQ ID NO: 1560) Q83730_2/ 1-33 DGRYPLLCLL END—RINTA RFVRYMIDRG T- SVYV (SEQ ID NO: 1561) O04242_3/ 1-32 RLDLPVTLCF AVN-KGDD FMLHQLLKRG L-DPNE (SEQ ID NO: 1562) O82490_l/ 1-30 YGLSPLHLAI EE-GQT RLVLSLLKVD S-DLVR (SEQ ID NO: 1563) 045398_4/ 1-30 DQFTPAHMAV SW-AQN DVLRALRDHS A-NLCD (SEQ ID NO: 1564) P87600_4/ 1-35 NGVDPLMLTM ENNMLSGHQW YLVKNILDKR P-NVNM (SEQ ID NO: 1565) O73560_2/ 1-48 DGLTSLHYYC KH ( 17 ) RAEK RFIYTI IDHG A-NINA (SEQ ID NO: 1566) P87603_5/ 1-33 NIPEILYLYI QMA—QYVDL DVIKHFLNNS V- ILDY (SEQ ID NO: 1567) Q14678_3/ 1-29 DGSTALSIAL EA-GHK DIAVLLYAHV -NFAK (SEQ ID NO: 1568) Q93318_l/ 1-38 HGDTAVDLLC S S ( 6 ) NNVRE NLYIRLLTSG A-VPNK (SEQ ID NO: 1569) G3930525_6/l-30 YGMTPLLAAS VT-GHT NIVEYLIQEQ P-GHEQ (SEQ ID NO: 1570) YB07_FOWPM_5/1- SKRPCVTAMC YAI—QNNKI DMVSMFLKRG A-DSNI (SEQ ID NO: 1571) JQ1744_5/ 1-31 FGDTCVTLAV AT-NHV YFVTAVLQHK PS-EQTI (SEQ ID NO: 1572) VB04 VACCC 2/1- YGENILHIYS MD-DANT NIIIFFLDRV L-NINK (SEQ ID NO: 1573) 041154_3/ 1-35 YGKTCIEMAA YY-GHQ DILKMMNGKL GI (4) I IND (SEQ ID NO: 1574) P87600_5/ 1-36 GDMDILYTYF NSPRTRCIKL DLIKYMVDVG IV-NLNY (SEQ ID NO: 1575) Q21920_17/l-31 DGVNCFMEVA RH-GSI DLMSLLVEFT KG-NMPM (SEQ ID NO: 1576) 
044997_6/ 1-30 NGATAMHCAA KY-GHA EVFNYFHMKG G-NICA (SEQ ID NO: 1577) P72763_6/ 1-30 EGKTPLMLAA MA-NLA AVITTIANYS V-QVNR (SEQ ID NO: 1578) Q09493_2/ 1-31 QGETPLTLAA GI-PNNR AVIVSLIGGG A-HVDF (SEQ ID NO: 1579) Q24145_l/ 1-30 FGCQPLHYAA RS-KTA SFIRTLISAQ A-NVQG (SEQ ID NO: 1580) Q17583_2/ 1-31 NGVSPLMYAT TV-RSI NSAKHLVLHR DS-DVNM (SEQ ID NO: 1581) O90757_6/ 1-30 HVKAPIHVAV ER-NNI YGTMLLINRN A-DVNI (SEQ ID NO: 1582) MBP1_YEAST_1/1- ELHTAFHWAC SM-GNL PIAEALYEAG T- SIRS (SEQ ID NO: 1583) 083807_10/l-30 GGNTPLHLAC EW-KLT QAINGILRKG A-EIEA (SEQ ID NO: 1584) G4103857_2/ 1-32 GLLTPLHLAA GN-RDSR DTLELLLMNR YI-KPEL (SEQ ID NO: 1585) 018270_7 / 1-30 ESVTPLHIAA TK-QDT KILKIFVEIL K-TPKV (SEQ ID NO: 1586) 068219_2/ 1-31 GKKTTLTAEA LT-SGKY GWKALIKNS A-DVNA (SEQ ID NO: 1587) Q19995_4/ 1-30 TGSVALILAV LN-GHK SIVEAILGYS L-NTLN (SEQ ID NO: 1588) 045398_5/ 1-29 DGRTPAHIGV RE-QNV DGVKILLDAE -NVEF (SEQ ID NO: 1589) Q12013_3/ 1-30 NNRTPLLWAA YQ-GDF LTVELLLKFG S-TVAW (SEQ ID NO: 1590) 018152_5/ 1-31 MGQTPLMLAA SQ-GQL EAVQFLTKTA KA-DVDA (SEQ ID NO: 1591) Q89540_4 / 1- 32 YGRTTLHHLA RTA-RISE GMVRTLTGLG V-DPAA (SEQ ID NO: 1592) Q17643_1/ 1-30 VDQTPLGVAV RS-QSS ELIALLIAYG G-DVNL (SEQ ID NO: 1593) Q63618_3/ 1-31 DGMTPLHAAA QM-GHN PVLVWLVS FA DV- SFEQ (SEQ ID NO: 1594) 014586_2/ 1-30 DFKSPLHKAA WN-CDH VLMHMMLEAG A-EANL (SEQ ID NO: 1595) Q17583_4/ 1-31 GNLTALSMAI VC-TPGV YMVRTLIDLG A- SVNK (SEQ ID NO: 1596) 068219_3/ 1-32 TGTPALHLAT AA-GNH RTAMLLLDKG AP—ATQRD (SEQ ID NO: 1597) Q93318_2/ 1-31 DGDTPLHIVA AH-NDL GKIYALCETL RK-TMNE (SEQ ID NO: 1598) 054807_9/ 1-30 DGETALQLAI KH-QLP LWDAICTRG A-DMSV (SEQ ID NO: 1599) Q01317_4/ 1-30 DNFTPLVHSI VR-NHL ECVGRLLERS A-RIDP (SEQ ID NO: 1600) O82490_2/ 1-30 FINTPLHIAS AS-GNL SFAMELMNLK P- SFAR (SEQ ID NO: 1601) YAHD_ECOLI_2 / 1- TCLNPFLI SC LN-DDL TLLRIILPAK P-DLNC (SEQ ID NO: 1602) Q63618_4/ 1-30 LDALPVHHAA RS-GKL HCLRYLVEEV A-LPAV (SEQ ID NO: 1603) Q89440_2/ 1-44 MEMYPRHRYS KH ( 13 ) DLDM NWKELLSNG A- SLTI (SEQ ID NO: 1604) P87601_1/ 1-38 DGYTCLDVAV DR (6) 
REAHL KILEILLREP L- SIDC (SEQ ID NO: 1605) 018152_8/ 1-31 HGRTCVHWA ST-NDV AAAKMLKRIG GV-NFEA (SEQ ID NO: 1606) 1AP7_1/ 1-30 TGSLPIHLAI RE-GHS SWSFLAPES D-LHHR (SEQ ID NO: 1607) 017055_6/ 1-32 FSETPLHAAC TG-GKS I ELVSFLMKYP GV-DPNY (SEQ ID NO: 1608) Q18297_10/l-32 RLNTVFHIVA LR-GEP EYLEMMMDHD PV—EAIKA (SEQ ID NO: 1609) 157038_4/ 1-32 HGNTALHLSC IA-GEK QCVRALTEKF GA—TEIHE (SEQ ID NO: 1610) 075762_9/ 1-30 DKKSPLHFAA SY-GRI NTCQRLLQDI S-DTRL (SEQ ID NO: 1611) 023296_4/ 1-31 TGDSILHIAA KW-GHL ELVKEIIFEC PC-LLFE (SEQ ID NO: 1612) TRI 9_HUMAN_3/ 1- DGDTALHLAV IH-QHE PFLDFLLGFS A-GTEY (SEQ ID NO: 1613) Q23595_3/ 1-32 NGETAAHI SA AH-GDM RALEILLGGS LK—STMNC (SEQ ID NO: 1614) VC17_VACCC_2 / 1- GGSLPIQYYW SFS-TIDI EIVKLLLIKD V-DTCR (SEQ ID NO: 1615) Q18297_ll/l-30 YQKTPLQVAV DS-GKL ETCQRLVAKG A-QIES (SEQ ID NO: 1616) NTC4_MOUSE_4/l- AGRTPLHTAV AA-DARE VCQLLLASRQ T- SVDA (SEQ ID NO: 1617) 054807 10/1-30 KGRNFLHVAV QN- SDI ESVLFLI SVQ A-NVNS (SEQ ID NO: 1618) 068219_4/ 1-31 MGDTALHEAL YSD-NVTE KCFLKMLKES R-KHL (SEQ ID NO: 1619) 018270_8/ 1-32 VGRTAVFWAC MG-GQA HTLHCMIKEL GF—EWRTS (SEQ ID NO: 1620) 018152_9/ 1-31 DGAVALHYAV TH-DHV ELVKHLTNQH TV-EAKD (SEQ ID NO: 1621) Q18297_12/l-30 DERTPVFIGA KF-NAL SSVEYILDHL R-KKNK (SEQ ID NO: 1622) VCO 9_VACCC_4 / 1- DGETPLKAYV TKKN-NNIKN DWILLLSSV D-YKNI (SEQ ID NO: 1623) 1AP7_2/ 1-29 FGKTALQVMM F- GSP AVALELLKQG A- SPNV (SEQ ID NO: 1624) P87600_6/ 1-35 LGYTPLTSYI CTAQ-NYMYY DIIDCLISNK VL-NMVK (SEQ ID NO: 1625) O75762_10/ 1-30 CHETMLHRAS LF-DHH ELADYLISVG A-DINK (SEQ ID NO: 1626) P87603_6/ 1-35 YSSNIFSVYF KAHNTIGIDI KLVKWMITKG V-DINY (SEQ ID NO: 1627) 048738_5/ 1-30 MGETTLHVAA RA-GSL NIVEILVRFI T-ESSS (SEQ ID NO: 1628) P87611_2/ 1-30 DFDIFEYIES DN- IDV ELLRLLIAKG L-EINS (SEQ ID NO: 1629) Q25328_13/l-31 RGTTPFLTAV AE-NAL HIAEYLIREK RQ-DINI (SEQ ID NO: 1630) Q02979_3/ 1-31 KSSSLLHLAT EW-NNY PLLHVLLSSK RF-DINY (SEQ ID NO: 1631) Q01317_5/ 1-30 KDLPAMYYAA WE-GHL ECMKLLTPAK K-EKAA (SEQ ID NO: 1632) Q17343_21/ 1-30 EGDEPEHHYT FT-TTT TVTKEVIDDS 
Q-EMGD (SEQ ID NO: 1633) Q19995_6/ 1-31 DKIDGIHKSI KE-GNL DKVKELMKTK KL-AIAR (SEQ ID NO: 1634) P90902_3/ 1-30 DGSTYLHVAS AN-GYY DVAAFLLTCS V- SPLI (SEQ ID NO: 1635) 068219_5/ 1-34 YEPHPQHVAV EAV—RTGAV GVLEHLITTE VI- SVNE (SEQ ID NO: 1636) Q83730_3/ 1- 30 LDFTPINYCV IH-NDR RTFDYLLERG A-DPNV (SEQ ID NO: 1637) E1344043_2/ 1-30 FGYNALVCAV KA-RAM DCIIAIRDAG G- FI DA (SEQ ID NO: 1638) 045398_8/ 1-31 YGENAMLCAV RS-GSL DCIRAVADSS RT-NRYA (SEQ ID NO: 1639) Q12013_4/ 1-31 YWDIFIEAA KD-GDL KWKDWESG AV-DINN (SEQ ID NO: 1640) 024382_4/ 1-31 GDVGHFACVA VE-QNNL SLLKEIVRYG G-DVTL (SEQ ID NO: 1641) 4151809_3/ 1-30 DGMTALHKAA CA-RHC LALTALLDLG G- SPNY (SEQ ID NO: 1642) 073579_4/ 1-36 DESIVQAMLI NY ( 4 ) DMVS I PIVRCMLDNG A-TMDK (SEQ ID NO: 1643) 015084_23/ 1-32 NAFSPLHCAV IN-DNE GAAEMLIDTL GA—SIVNA (SEQ ID NO: 1644) Q20109_8/ 1-31 NGESLLTVAV RS-GNT AVAKQLAQLD PD-AIDE (SEQ ID NO: 1645) 004703_2/ 1-33 IRTS SLIEAM KSR—QEDNV TMMKNFLQHH K-KLKD (SEQ ID NO: 1646) Q18970_1/ 1-31 SGMTPAMMCA KR-SFTM FPLRLIVRAG A-DLSL (SEQ ID NO: 1647) Q18663_4/ 1-30 DGI SACHIAC KD-GMI DHFNLLIYYH A-DICL (SEQ ID NO: 1648) 072755_3/ 1-40 DGMIPWYCI HS ( 7 ) NITNI KIIRKLLNLS RH-APHN (SEQ ID NO: 1649) Q25338_10/l-30 NDRSAMHAVA YR-GNN KIALRFLLKN Q- SIDI (SEQ ID NO: 1650) 054807_12/l-30 EGNTVLLLAY MK-GNA NLCRAIVRSG V-RLGV (SEQ ID NO: 1651) 045398_10/l-30 RGQTACHVAA EA-GAP EILSKIVEAR R-GLKW (SEQ ID NO: 1652) Q21920_18/l-31 NCNTALIYAA ST-DGRD WREILMTEG P-KKPD (SEQ ID NO: 1653) Q21587_7/ 1-29 QLRNVLKLAA LQ-NQP EVLELLLGLG -EQYD (SEQ ID NO: 1654) VCO 9_VACCC_5/ 1- NGINIVEKYA TTS-NPNV DVFKLLLDKG IP-TCSN (SEQ ID NO: 1655) VC17_VACCC_3/1- SNINDFDLSS DN- IDL RLLKYLIVDK RI ( 4 ) NTNY (SEQ ID NO: 1656) VB04_VACCC_3/1- SGCTCI SEAV AN-NNK IIMEVLLSKR P- SLKI (SEQ ID NO: 1657) Q18970_2/ 1-33 QGYTPIHLAI QG-NHV PLVAYFLLKF EYA-KDI SD (SEQ ID NO: 1658) Q23859_3/ 1-30 NGFSALHQAV SS-DDR ILMRGLQYEN I -NVDV (SEQ ID NO: 1659) Q25328_14/l-30 YDKTALDIAI DA-KFS NIVEYLKTKS G-KFRR (SEQ ID NO: 1660) Q19995_7/ 1-30 YGMTPIHKAL LH-GQT NTVRFLLGRF P- 
SCVN (SEQ ID NO: 1661) VBO 4_VACCC_4 / 1- YNETSIYDAV SY-NAY NTLVYLLNRN G-DFET (SEQ ID NO: 1662) 088202 1/1-31 VLLPGLALAA AH-AGDL DTLQAFVELG R-DLNL (SEQ ID NO: 1663) E1350208_l/l-30 ELPNPLHEAA RR-GNM DMLAECLRER V- SVNS (SEQ ID NO 1664) 054807_13/l-32 ALYSPKKYSA DVM-SEMA QIAEALLQAG A-NPNM (SEQ ID NO 1665) Q23859_4/ 1-34 QQQHQLFSAV EQ-NDV TKVKKLSSKK KISKSNLTS (SEQ ID NO 1666) O04242_4/ 1-31 -GDVALYSCV AVE-ENDP ELLENI IRYG G-NVNS (SEQ ID NO 1667) Q93203_3/ 1-30 SDFTLLHHAV LE-NQN QISKLLMDYE P-LLLL (SEQ ID NO 1668) Q17583_5/ 1-30 LHITSLQIAS LL-RVD DIIPLLLQRR A-DVTV (SEQ ID NO 1669) YIA1_YEAST_1/1- KNFEELCYSC RT-GDM DNVDRLISTG V-NVNS (SEQ ID NO 1670) 035433_2/ 1-30 FGELPLSLAA CT-NQL AIVKFLLQNS W-QPAD (SEQ ID NO 1671) 083515_2/ 1-30 CFEENFIATV MD-GNI DIVNLFLDAG F- SAAL (SEQ ID NO 1672) 023295_4/ 1-30 NGWTCLSLAA HI-GYY EGVCNLLERS T-KAAE (SEQ ID NO 1673) AKR_ARATH_3 / 1-3 KKWLPLHTLA AC-GEF YLVDSLLKHN L-DINA (SEQ ID NO 1674) Q83730_4/ 1-35 SRTS PLCTVL SNKDLGNEAE ALAKQLIDAG A-DVNA (SEQ ID NO 1675) O72760_5/ 1-43 RGNTFLHYFC IY ( 12 ) HREK KFIKELVKYG A-DINK (SEQ ID NO 1676) Q63618_7/ 1-32 NGATPAHDAA AT-GYL SCLQWLLTQG GC—RVQEK (SEQ ID NO 1677) YG4X_YEAST_4 / 1- FNQI PLHRAA SV-GSL KLIELLCGLG KS-AVNW (SEQ ID NO 1678) G3786431_4/ 1-34 EGSTCLHWAA RC-GSS ECVSTI LNFP FPSEFIIEI (SEQ ID NO 1679) 073579_5/ 1-32 EGLTPLGAYS KHR-HVKY QIVHLLI S SY S-NSSN (SEQ ID NO 1680) Q01317_6/ 1-32 ERVTRTFLAS IN-EGSL EALKVLLDTG LV-DIQS (SEQ ID NO 1681) 073579_6/ 1- 33 DNNYPLHDYF VNN—NLVDV NWRFIVENN G-HMAV (SEQ ID NO 1682) 083807_13/1-31 NQETPLFSAV KS-DAA EVISILLHPQ AG-NPAL (SEQ ID NO 1683) LI12_CAEEL_4/1- SERSALHQAA AN-RDF GMMVYMLNST KLK-GDIEE (SEQ ID NO 1684) Q89202_3/ 1-30 LFIPDNKLAI DN-KDI EMLQALFKYD I -NIYS (SEQ ID NO 1685) Q24241_22 / 1-34 DGNTALHIAS NLG—YVTVM ESLKIVTSTS VI-NSNI (SEQ ID NO 1686) Q93318_4/ 1-31 VGDSPLHFAT AR-GMN NMVEALLSKR El-RVNE (SEQ ID NO 1687) Q25328_17/l-32 HGRTVFHAAA KS-GND KIMFGLTFLA KST—ELNQ (SEQ ID NO 1688) P70770_2/ 1-30 MAIVALMKAT RE-GVD AWKELIDLG V-DQAR (SEQ ID NO 1689) 
VB04_VACCC_5/1- HGISLIKLYL ES (4) EIDNE HIVRHLIIFD AV—ESLDY (SEQ ID NO 1690) VB04_VACCC_6/ 1- YGNTPFILLC KHD-INNV ELFEICLENA NI—DSVDF (SEQ ID NO 1691) 004704_2/ 1-34 YGDPNMYVNL LTVA-STGNA TFLEELLKAK L-DPDI (SEQ ID NO 1692) 045398_12/1-31 DSKTPFFNEI VS-RKDV ASVEALLAAN V-DCHV (SEQ ID NO 1693) TRPL_DROME_2 / 1- KSYTRFWGLL MF ( 4 ) VINVI VLLNLLIAMM S-NSYA (SEQ ID NO 1694) JQ1744_6/ 1-32 YKRQWEDYL RKR-YVKA DVLKTILDSG I -RLSK (SEQ ID NO 1695) Q84566_5/ 1-31 IRHNILFYHV IR-DNY DWKFIIDNK LV-DINE (SEQ ID NO 1696) O49409_2/ 1-30 WGSTPFADAI FY-KNI DVIKILEIHG A-KHPM (SEQ ID NO 1697) JQ1744_7/ 1-35 SGHSAMMSYL INNPDATVHP EYVSYMVDCG A-NINQ (SEQ ID NO 1698) Q02979_4/ 1-30 GGWTPMEHAV LR-GHL HIADMVQIRD E-LVTH (SEQ ID NO 1699) 024538_3/ 1-27 -NLLCLAA KR-NDL TVMNELLKQG L-NIDS (SEQ ID NO 1700) Q89202_4/ 1-30 GWKTSFYHAV ML-NDV SIVSYFLSEI P- STFD (SEQ ID NO 1701) 083807_14/l-30 SGSTPLMVAV TE-NCF ESVRMLIARD A- SLFS (SEQ ID NO 1702) G3786431_5/ 1-31 ECRTAMYLAV AE-GHL EWKAMTDFK CT- SIDG (SEQ ID NO 1703) 062398_2/ 1-27 TGNTVIHCAI - NK KCLILLMEKF R-DQTD (SEQ ID NO 1704) Q25338_ll/l-30 NHMAPIHFAA SM-GSI KMLRYLISIK D-KVSI (SEQ ID NO 1705) Q23595_4/ 1-31 CKMLPVHVAA AQ-GNI EFLRAAIKFD NQ-MVNA (SEQ ID NO 1706) Q25328_18/l-30 NGQMPIHGAA MT-GLL DVAQAI I S ID A-TWD (SEQ ID NO 1707) Q28282 5/1-31 KYDDRLMKAA ER-GDVE KVSSILAKKG I -NPGK (SEQ ID NO 1708) Q94527_4/ 1-29 NRDTLLHEVI SH- KK DKLKLAIQTI Q-VMNY (SEQ ID NO 1709) AKR1_YEAST_5/1- PLLTRYHTAC QR-GDL ATVKEMIHGK LL-EVNN (SEQ ID NO 1710) 391941_4/ 1-31 SSDLDLHLTL DD- IDE KIIHQRLQEN KA-AFFF (SEQ ID NO 1711) Q23595_5/ 1-30 DRANAIHCAA YS-GSV PVLSHLLNAF S-KKKR (SEQ ID NO 1712) P87603_8/1-29 SGMTAFHVAV C- LRK DAMKYLLSIG Y- SIDK (SEQ ID NO 1713) 072755_4/ 1-42 QMRTPLHKYL CK(ll) YYNE KIIDAFIELG A-DLTI (SEQ ID NO 1714) 018270_9/ 1-31 IGATPAHYAA Q- FSV ECLKILFAES KI—TEVND (SEQ ID NO 1715) Q90623_5/ 1-31 EGDTPLDIAE EE-AME ELLQNEVNRQ GV-DIEA (SEQ ID NO 1716) 061222_7/ 1-30 QTETPLHVAA RA-GRAV NCTFLMKEML -DLEK (SEQ ID NO 1717) JQ1744_9/ 1-33 RGHTPLLVYI TMG—LFIES 
DAITCMIDHG A-DPLV (SEQ ID NO 1718) P87621_5/ 1-32 NNYHILHAYC GIK-GLDE RFVEELLHIG Y- SPNE (SEQ ID NO 1719) 016229_1/ 1-32 DGKNAILDAV RE-RNKP NLLKALQNGA FV-DVYN (SEQ ID NO 1720) VB18_VARV_5 / 1-3 NNKHAIQLII DNKENSQYTI DCLLYILRYI V-DKNV (SEQ ID NO 1721) O90757_7/ 1-30 KHFNMLRKAV LN-HDH NLVNIFIDKN F-NINI (SEQ ID NO 1722) Q93318_5/ 1-32 TGKTIVHHAV DKM—DVELL DFLKTWNED -TFTE (SEQ ID NO 1723) Q21920_19/l-31 ERDSALTLSA QK-GHI KIVTAIMDYY EK-NPPQ (SEQ ID NO 1724) Q02989_18/l-31 ERKTFFDLAI EN-GRL NIVAFAVEKN KV-NLQA (SEQ ID NO 1725) A55839_4/ 1-31 GGRTPLGSAL LR- PNP ILARLLRAHG AP-EPED (SEQ ID NO 1726) 060733_5/ 1- 31 KGETVFHYAV QG-DNS QVLQLLGRNA VA-GLNQ (SEQ ID NO 1727) 083807_16/1-31 AGESILHYAA KV-ADE KTLQGLLAMN RF-GKFL (SEQ ID NO 1728) 075762_11/ 1-30 QQAS FLHLAL HN-KRK EWLTIIRSK R-WDEC (SEQ ID NO 1729) 391941_5/ 1-30 VQFDPLNVAC KF-NNH DAAKLLLEIR S-KQNA (SEQ ID NO 1730) Q94447_2/ 1-30 KSFTRFWALL MF-GSY SVINI IVLLN M-LIAM (SEQ ID NO 1731) Q18104_2/ 1-33 TIFEVLAWCN ETV—KQNDF DVMETALDSL N-NLNL (SEQ ID NO 1732) 048738_6/ 1-31 SGKSVIHAAM KA-NRR DILGIVLRQD PG-LIEL (SEQ ID NO 1733) 061222_8/ 1-46 FNRTALHYAF GN ( 13 ) SDPI AWSLLSSLI RP—EQIEI (SEQ ID NO 1734) 073579_8/1-30 RFNNCGYHCY ET- ILI DVFDILSKYM D-DIDM (SEQ ID NO 1735) 023296_6/ 1-31 QGNKHLAHVA LK-AKSI GVLDVILDEY P- SLMD (SEQ ID NO 1736) Q09493_4/ 1-30 HGNHEIHQAC KN-GLT KHVEHLLYFG G-QIDA (SEQ ID NO 1737) Q83730_5/ 1-34 YGFNVLQCYM IAHV-RSSNV QILRFLLRHG V-DS SR (SEQ ID NO 1738) Q91974_4/ 1-34 DGDTFLHLAI IHEE-KALSL EVIRQAAGDA A- FLNF (SEQ ID NO 1739) 061240_6/ 1-32 MDRLPRDIAN ER-LHT DIVQLLDEYN LV—RSPNL (SEQ ID NO 1740) Consensus/ 60% psp* sLabAs pp . spb chlcbLlpps s .... shsh (SEQ ID NO 1741)
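For readers who want to work with the alignment rows above computationally, the following Python sketch shows one possible way to split a row into its identifier, residue range, ungapped residues and SEQ ID number. The names `ROW`, `AlignmentRow` and `parse_row` are hypothetical and are not part of the specification. The sketch assumes that each row has been placed on its own line and that every non-letter character in the sequence columns is a gap or spacing character.

```python
import re
from typing import NamedTuple, Optional

# Hypothetical pattern for one alignment row, e.g.
#   "Q25328/1-30 DKDTALHLAV YY-KNL QMIKLLIKYG I -DVTI (SEQ ID NO 1308)"
# Stray spaces around "/" and "-" in the range are tolerated because the
# extracted text is inconsistent about them.
ROW = re.compile(
    r"^\s*(?P<ident>\S+?)\s*/\s*(?P<start>\d+)\s*-\s*(?P<end>\d+)\s+"
    r"(?P<body>.+?)\s*\(SEQ ID NO:?\s*(?P<seq_id>\d+)\)\s*$"
)


class AlignmentRow(NamedTuple):
    ident: str      # source identifier as printed in the row
    start: int      # first residue number of the printed range
    end: int        # last residue number of the printed range
    residues: str   # sequence with gaps and spacing removed
    seq_id: int     # SEQ ID NO of the row


def parse_row(line: str) -> Optional[AlignmentRow]:
    """Return the parsed row, or None if the line is not an alignment row."""
    m = ROW.match(line)
    if m is None:
        return None
    # Assumption: anything that is not a letter is a gap or a space.
    residues = re.sub(r"[^A-Za-z]", "", m["body"])
    return AlignmentRow(
        m["ident"], int(m["start"]), int(m["end"]), residues, int(m["seq_id"])
    )
```

Rows that abbreviate an insertion with a bracketed count, such as "(4)", lose that count under this assumption, so their residue string can be shorter than the printed range.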
Table 14
IM02HUMANb PND-KIQAVI DAG—VCRRL VELLM -HNDYKWSP ALRAVGNIVT (SEQ ID NO: 1742)
MMU34228c PND-KIQAVI DAG—VCRRL VELLM -HNDYKWSP ALRAVGNIVT (SEQ ID NO: 1742)
cATAF00130 SND-KIQAVI EAG—WPRL IQLLG -HSSPSVLIP ALRTIGNIVT (SEQ ID NO: 1743)
cATKAPAPRO SND-KIQAVI EAG—WPRL IQLLG -HSSPSVLIP ALRTIGNIVT (SEQ ID NO: 1743)
ATU69533d TND-KIQTVI QAG—WPKL VELLL -HHSPSVLIP ALRTVGNIVT (SEQ ID NO: 1744)
AB002533c GNE-QIQMVI DSG—IVPHL VPLLS -HQEVKVQTA ALRAVGNIVT (SEQ ID NO: 1745)
HSSRPlla GNE-QIQMVI DSG—IVPHL VPLLS -HQEVKVQTA ALRAVGNIVT (SEQ ID NO: 1745)
HSSRPlBc GNE-QIQMVI DSG—WPFL VPLLS -HQEVKVQTA ALRAVGNIVT (SEQ ID NO: 1746)
CELF32El0c GNE-HIQMVI EAQ—WTHL VPLLG -HVDVKVQTA ALRAVGNIVT (SEQ ID NO: 1747)
SRPlYEASTd PQE-AIQAVI DVR—IPKRL VELLS -HESTLVQTP ALRAVGNIVT (SEQ ID NO: 1748)
IMOlHUMANc PNE-RIGMW KTG—WPQL VKLLG -ASELPIVTP ALRAIGNIVT (SEQ ID NO: 1749)
bATAF00130 TSE-NTNVII ESG—AVPIF IQLLS -SASEDVREQ AVWALGNVAG (SEQ ID NO: 1750)
bATKAPAPRO TSE-NTNVII ESG—AVPIF IQLLS -SASEDVREQ AVWALGNVAG (SEQ ID NO: 1750)
SLU96718b TSD-HTAWI EQG—AVPIF VQLLS -SPSDDVREQ AVWALGNVAG (SEQ ID NO: 1751)
ATU69533b TSD-HTKWI DHN—AVPIF VQLLA -SPSDDVREQ AVWALGNVAG (SEQ ID NO: 1752)
IM02HUMANC NSL-QTRIVI QAR—AVPIF IELLS -SEFEDVQEQ AVWALGNIAG (SEQ ID NO: 1753)
MMU34228a NSL-QTRNVI QAG—AVPIF IELLS -SEFEDVQEQ AVWALGNIAG (SEQ ID NO: 1754)
AB002533b TSE-QTQAW QSN—AVPLF LRLLH -SPHQNVCEQ AVWALGNIIG (SEQ ID NO: 1755)
HSSRPlBb TSA-QTQAW QSN—AVPLF LRLLR -SPHQNVCEQ AVWALGNIIG (SEQ ID NO: 1756)
CELF32El0b TSE-QTQAW NAG—AVPLF LQLLS -CGNLNVCEQ SVWALGNIIG (SEQ ID NO: 1757)
IMOlHUMANa TSE-QTKAW DGG—AIPAF ISLLA -SPHAHISEQ AVWALGNIAG (SEQ ID NO: 1758)
SRPlYEASTb TSA-QTKVW DAD—AVPLF IQLLY -TGSVEVKEQ AIWALGNVAG (SEQ ID NO: 1759)
CEF53B24 SQE-VTHLFV NSN—CLDIL IKLVR SQ-NARLSSQ TVWAIANIAA (SEQ ID NO: 1760)
CELF26Bl3b STE-QTIAAV EAG—VTIPL IHLSV HQ-SAQISEQ ALWAVANIAG (SEQ ID NO: 1761)
AB002533a RNP-PIDDLI KSG—ILPIL VHCLE RDDNPSLQFE AAWALTNIAS (SEQ ID NO: 1762)
HSSRPlBa RNP-PIDDLI KSG—ILPIL VKCLE RDDNPSLQFE AAWALTNIAS (SEQ ID NO: 1763)
CELF32El0a RNP-PIDDLI GSG—ILPVL VQCLS -STDPNLQFE AAWALTNIAS (SEQ ID NO: 1764)
CELF26Bl3a KNP-PIDEVI HCG—LLQAL VQALS -VENERVQYE AAWALTNIVS (SEQ ID NO: 1765)
aATAF00130 QNP-PINEW QSG—WPRV VKFLS RDDFPKLQFE AAWALTNIAS (SEQ ID NO: 1766)
aATKAPAPRO QNP-PINEW QSG—WPRV VKFLS RDDFPKLQFE AAWALTNIAS (SEQ ID NO: 1766)
ATU69533a RSP-PIEEVI SAG—WPRF VEFLK KEDYPAIQFE AAWALTNIAS (SEQ ID NO: 1767)
SLU96718a RSP-PIEEVI AAG—WPRL
Figure imgf000114_0001
RTDYPQLQFE AAWALTNIAS (SEQ ID NO: 1768)
IM02HUMANa PNP-PIDEVI STPG-WARF VEFLK RKENCSLQFE SAWVLTNIAS (SEQ ID NO: 1769)
SRPlYEASTa HRP-PIDWI QAG—WPRL VEFMR ENQPEMLQLE AAWALTNIAS (SEQ ID NO: 1770)
Figure imgf000114_0002
MMU34228d DDI-QTQVIL NCS—ALQSL LHLLS- -SPKESIKKE ACWTISNITA (SEQ ID NO 1784)
dATAF00130 DDL-QTQMVL DQQ—ALPCL LNLLKNN- —YKKSIKKE ACWTISNITA (SEQ ID NO 1785)
ATU69533e DDI-QTQCVI NSG—ALPCL ANLLTQN- —HKKSIKKE ACWTISNITA (SEQ ID NO 1786)
SRPlYEASTe NDL-QTQWI NAG—VLPAL RLLLS- -SPKENIKKE ACWTISNITA (SEQ ID NO 1787)
CELF26Bl3d NDS-LTQAVI DLG—SLDEI LPLME- KTRSSSIVKE CCWLVSNIIA (SEQ ID NO 1788)
CRU40057b SPE-LAQSVI DSG—ALDSL VTCL- EEFDPGVKEA SAWTLGYIAG (SEQ ID NO 1789)
CRU40057e SET-LALSVI AEK—ALPPL VSALN- EEPEDHLKSA TAWTLGQIGR (SEQ ID NO 1790)
CRU40057d TPE-LAQAW DAG—AVAYL APLVI- -NQDAKLKRQ VCCALSQIAK (SEQ ID NO 1791)
CRU40057c NAD-VAQQW DAG—AVPLL VLCVQ- -EPELSLKRI AASALSDISK (SEQ ID NO 1792)
CTNBMOUSEk DVH-NRIVIR GLN—TIPLF VQLLY- -SPIENIQRV AAGVLCELAQ (SEQ ID NO 1793)
JC4835h EAL-NRSIIR DLN—CIPTF VQLLY- -SEVENIVRV AAGVLCELAQ (SEQ ID NO 1794)
CTNBMOUSEj CPA-NHAPLR EQG—AIPRL VQLLVR (20) GVRMEEIVEG CTGALHILAR (SEQ ID NO 1795)
HSPLGLNj CPA-NHAPLQ EAA—VIPRL VQLLVK (19) GVRMEEIVEG CTGALHILAR (SEQ ID NO 1796)
JC4835g CPS-NHTPIR DQG—GLPKL VQLLMK ( 17 ) GVRMEEIVEG TVGALHILAR (SEQ ID NO 1797)
HSP0071a DNK-VKMEVC RLG—GIKHL VDLLD- -HRVLEVQKN ACGALRNLVF (SEQ ID NO 1798)
HSU96136a DNK-IKAEIR RQG—GIQLL VDLLD- -HRMTEVHRS ACGALRNLVY (SEQ ID NO 1799)
HSU51269a NEG-VKRRVR QLR—GLPLL VALLD- -HPRAEVRRR ACGALRNLSY (SEQ ID NO 1800)
P12 OMOUSEa NDK-VKTDVA KLK—GIPIL VGLLD- -HPKKEVHLG ACGALKNISF (SEQ ID NO 1801)
S60712a DES-AKQQVY QLG—GICKL VDLLR- -SPNQNVQQA AAGALRNLVF (SEQ ID NO 1802)
HSIRNAUa KSE-ARKR QLR—GILKL LQLLK- -VQNEDVQRA VCGALRNLVF (SEQ ID NO 1803)
CTNBMOUSEh NYK-NKMMVC QVG—GIEAL VRTVLRA- -GDREDITEP AICALRHLTS (SEQ ID NO 1804)
JC4835e NPR-NKQWF QVG—GIEAL VRTIINA- -GDREEITEP AVCALRHLTS (SEQ ID NO 1805)
YEB3YEASTb NNE-NKLLIV EMG—GLEPL INQMM- -GDNVEVQCN AVGCITNLAT (SEQ ID NO 1806)
YEB3YEASTf HPL-NEGLIV DAG—FLKPL VRLLD- YKDSEEIQCH AVSTLRNLAA (SEQ ID NO 1807)
P12 OMOUSEb DQD-NKIAIK NCD—GVPAL VRLLRK- -ARDMDLTEV ITGTLWNLSS (SEQ ID NO 1808)
HSU51269b DTD-NKAAIR DCG—GVPAL VRLLRA- -ARDNEVREL VTGTLWNLSS (SEQ ID NO 1809)
HSP0071b TDE-NKIAMK NVG—GIPAL LRLLRK- -SIDAEVREL VTGVLWNLSS (SEQ ID NO 1810)
HSU96136b NDD-NKIALK NCG—GIPAL VRLLRK- -TTDLEIREL VTGVLWNLSS (SEQ ID NO 1811)
S60712b STT-NKLETR RQN—GIREA VSLLR- RTGNAEIQKQ LTGLLWNLSS (SEQ ID NO 1812)
HSIRNAUb DND-NKLEVA ELN—GVPRL LQVLK- QTRDLETKKQ ITGLLWNLSS (SEQ ID NO 1813)
HSU96136c SVY-IRAAVR KEK—GRPIL VELLR- -IDNDRVACA VATALRNMAL (SEQ ID NO 1814)
HSP0071c AAY-IRGGRP KRK—GLPIL VELLR- -MDNDRWSS GATALRNMAL (SEQ ID NO 1815)
HSU51269c ATY-IRATVR KER—GLPVL VELLQ- -SETDKWRA VAIALRNLSL (SEQ ID NO 1816)
P12 OMOUSEc GRY-IRSALR QEK—ALSAR AELLT- -SEHERWKA ASGALRNLAV (SEQ ID NO 1817)
CTNBMOUSEi AEM-AQNAVR LHY—GLPW VKLLH- PPSHWPLIKA TVGLIRNLAL (SEQ ID NO 1818)
JC4835f AEH-AENGVR LHY—GIPIL VKLLN- PPSRWPLIKA WGLIRNLGL (SEQ ID NO 1819)
CTNBMOUSEc HRE-GLLAIF KSG—GIPAL VKMLG- -SPVDSVLFY AITTLHNLLL (SEQ ID NO 1820)
JC4835a HRQ-GLMAIF KCS—GIPAL VKLLG- -HRIEAWFY AITTLHNLLL (SEQ ID NO 1821)
JCGlGlb DDS-CAALLA KSG—11PAL IELLN-A QQEDDEFVCQ IIYVFYQMVF (SEQ ID NO 1822)
HSU59919b DDS-CAALLA KSG—11PAL IELLN-A QQEDDEFVCQ IIYVFYQMVF (SEQ ID NO 1822)
SPU38G55b DDS-CAAMLA KSG—IIQSL IELLN-A KQEDDEIVCQ IVYVFYQMVF (SEQ ID NO 1823)
CTNBMOUSEf CSS-NKPAIV EAG—GMQAL GLHLT- -DPSQRLVQN CLWTLRNLSD (SEQ ID NO 1824)
JC4835d CSS-NKPAIV EAG—GMQAL AHYLS- -HQSTRLVQN CLWTLRNLSD (SEQ ID NO 1825)
CELC54D15 SNF-DAPNLV AFG—GRQIL ANLLS- -HGSPRLVQS TLETLRNISD (SEQ ID NO 1826)
APCHUMAc DVA-NKATLC SMKG-CMRAL VAQL- KSESEDLQQV IASVLRNLSW (SEQ ID NO 1827)
XLUG4442c DVA-NKATLC SMKS-CMRAL VAQL- KSESEDLQQV IASVLRNLSW (SEQ ID NO 1828)
DMU77947c DEN-NKALLC GQKQ-FMEAL VAQL- DSAPDDLLQV TASVLRNLSW (SEQ ID NO 1829)
JCGIGla FME-NKNDMV EMD—IVEKL VKMIP- -CEHEDLLNI TLRLLLNLSF (SEQ ID NO 1830)
HSU59919a FME-NKNDMV EMD—IVEKL VKMIP- -CEHEDLLNI TLRLLLNLSF (SEQ ID NO 1830)
SPU38G55a YVE-NKNEMA EQQ—IIERL AKLVP- -CDHEDLLNI TLRLLLNLSF (SEQ ID NO 1831)
CTNBMOUSEd QEG-AKMAVR LAG—GLQKM VALLN- -KTNVKFLAI TTDCLQILAY (SEQ ID NO 1832)
JC4835b QEG-AKMAVR LAL—GLQKM VSLLQ- RPKVKFLAI VTDCLQILAY (SEQ ID NO: 1833)
CTNBMOUSEe NQE-SKLIIL ASG—GPQAL VNIMR- TYTYEKLLWT TSRVLKVLSV (SEQ ID NO: 1834)
JC4835c NQE-SKLIIL SSG—GPAEL VRIMR- SYTYEKLLYT TCRVLKVLSV (SEQ ID NO: 1835)
HSP0071d NME-NAKALA DSG—GIEKL VNITKGRG— DRSSLKWKA AAQVLNTLWQ (SEQ ID NO: 1836)
HSU96136d NME-NAKALR DAG—GIEKL VGISKSKGD- -KHSPKWKA ASQVLNSMWQ (SEQ ID NO: 1837)
IMOlHUMANb NPA-PPIDAV EQ-ILPTL VRLLH- -HDDPEVLAD TCWAISYLTD (SEQ ID NO: 1838)
ATU69533c KPQ-PHFDQV KP-ALPAL ERLIH- -SDAEEALTD ACWALSYLSD (SEQ ID NO: 1839)
CELF32E10- DPA-PSPAW RT-ILPAL SLLIHHQ- -DTNILID TWALSYLTD (SEQ ID NO: 1840)
AB002533z DPP-PPMETI QE-ILPAL CVLIHHT- -DVNILVD TWALSYLTD (SEQ ID NO: 1841)
IM02HUMAN- DST-MCRDYV LDCN-ILPPL LQLFSK- -QNRLTMTRN AVWALSNLCR (SEQ ID NO: 1842)
MMU34228b DST-MCRDYV LNCN-ILPPL LQLFSK- -QNRLTMTRN AWALSNLCR (SEQ ID NO: 1843)
SRPlYEASTc DST-DYRDYV LQCN-AMEPI LGLFN- -SNKPSLIRT ATWTLSNLCR (SEQ ID NO: 1844)
IMOB_RAT EPN-QLKPLV IQ-AMPTL IELMKDP- -SVWRDT TAWTVGRICE (SEQ ID NO: 1845)
HUMNTF9 EPS-QLKPLV IQ-AMPTL IELMKDP- -SVWRDT AAWTVGRICE (SEQ ID NO: 1846)
APCHUMANe NED-HRQILR ENN—CLQTL LQHLK- -SHSLTIVSN ACGTLWNLSA (SEQ ID NO: 1847)
XLU64442d NED-HRQILR ENN—CLQTL LQHLK- -SHSLTIVSN ACGTLWNLSA (SEQ ID NO: 1847)
DMU77947d CEP-YRQILR QHN—CLAIL LQQLK- -SESLTWSN SCGTLWNLSA (SEQ ID NO: 1848)
APCHUMANb DEE-HRHAMN ELG—GLQAI AELLQV ( 8 ) N DHYSITLRRY AGMALTNLTF (SEQ ID NO: 1849)
XLU64442b DEE-HRHAMN ELG—GLQAI AELLQV ( 8 ) N DHYSVTLRRY AGMALTNLTF (SEQ ID NO: 1850)
DMU77947b DEE-HRHAMC ELG—ALHAI PNLVHL (10) DQCCNSLRRY ALMALTNLTF (SEQ ID NO: 1851)
fATAF00130 THD-QIKFMV SQG—CIKPL CDLLT- -CPDLKWTV CLEALENILV (SEQ ID NO: 1852)
dATKAPAPRO THD-QIKFMV SQG—CIKPL CDLLT- -CPDLKWTV CLEALENILV (SEQ ID NO: 1852)
ATU69533g SHD-QIKYLV EQG—CIKPL CDLLV- -CPDPRIITV CLEGLENILK (SEQ ID NO: 1853)
IM02HUMANg SAE-QIKYLV ELG—CIKPL CDLLT- -VMDSKIVQV ALNGLENILR (SEQ ID NO: 1854)
MMU34228 f SAE-QIKYLV ELG—CIKPL CDLLT- -VMDAKIVQV ALNGLENILR (SEQ ID NO: 1855)
SRPlYEASTg RPD-IIRYLV SQG—CIKPL CDLLEIA- -DNRIIEV TLDALENILK (SEQ ID NO: 1856)
IMOlHUMANf TVE-QIVYLV HCG—IIEPL MNLLT- -AKDTKIILV ILDAISNIFQ (SEQ ID NO: 1857)
AB002533f RKD-QVAYLI QQN—VIPPF CNLLT- -VKDAQWQV VLDGLSNILK (SEQ ID NO: 1858)
HSSRPlld RKD-QVAYLI QQN—VIPPF CNLLT- -VKDAQWQV VLDGLSNILK (SEQ ID NO: 1858)
CELF32El0g RPN-QVEQMV KLG—VLRPF CAMLSCT- -DSQIIQV VLDGINNILK (SEQ ID NO: 1859)
APCHUMANa SQD-SCISMR QSG—CLPLL IQLLHG ( 10 ) SRGSKEARAR ASAALHNIIH (SEQ ID NO: 1860)
XLU64442a SQD-SCIAMR QSG—CLPLL IQLLHG ( 10 ) SRGSKEARAS GSAALDNIIH (SEQ ID NO: 1861)
DMU77947a NAQ-SCATLR RSG—CMPLL VQMMH-A PDNDQEVRKC AEQALHNWH (SEQ ID NO: 1862)
CTNBMOUSE1 DKE-AAEAIE AEG—ATAPL TELLH- -SRNEGVATY AAAVLFRMSE (SEQ ID NO: 1863)
JC4835i DKE-GADAIE REG—ATTIL TELLH- -SRNDGIAAY ARAVLFRMSE (SEQ ID NO: 1864)
CELF08F8 FDE-NKIVME QNG—TIEKL LKLFP- -IQDPELRKA VIMLLFNFSF (SEQ ID NO: 1865)
GDSlHUMANc RNDANCIHMV DNG—IVEKL MDLLDRHVE- -DGNVTVQHA ALSALRNLAI (SEQ ID NO: 1866)
AT81KBGEN4 NDR-NKKIIV DEG—GVSPL LRLLK- ESSSAEGQIA AATALGLLAC (SEQ ID NO: 1867)
S51350 PDKVQRTYYV HQ-ALPSI LNLMNDQ- -SLQVKET TAWCIGRIAD (SEQ ID NO: 1868)
CELF26Bl3c PDE-QIELAR ESG—VLPHV VAFF- -KEAENLVAP ALRTLGNVAT (SEQ ID NO: 1869)
HSU51269d DSLDNARSLL QAR—GVPAL VALVA- SSQSVREAKA ASHVLQTWS (SEQ ID NO: 1870)
CELF26B13- SSQLRDYVIK CHG-VEAL MHLMEKV- DQLGDSHVRT IAWAFSNMCR (SEQ ID NO: 1871)
YEB3YEASTd SEE-NRKELV NAG—AVPVL VSLLS- -STDPDVQYY CTTALSNIAV (SEQ ID NO: 1872)
GDSlHUMANa KNEFMRIPCV DAG—LISPL VQLLN- -SKDQEVLLQ TGRALGNICY (SEQ ID NO: 1873)
YEB3YEASTe SDTSYQLEIV RAG—GLPHL VKLIQ- -SDSIPLVLA SVACIRNISI (SEQ ID NO: 1874)
CTNBMOUSEa NYQDDAELAT R-AIPEL TKLLN- -DEDQVWNK AAVMVHQLSK (SEQ ID NO: 1875)
ADBHUMAN LHDINAQMVE DQG—FLDSL RDLIA- -DSNPMWAN AVAALSEISE (SEQ ID NO: 1876)
S60712c SGMSQLIGLK EK-GLPQI ARLLQ- -SGNSDWRS GASLLSNMSR (SEQ ID NO: 1877)
HSIRNAUd PTSVAQTWQ KES—GLQHT RKMLH- -VGDPSVKKT AISLLRNLSR (SEQ ID NO: 1878)
YEB3YEASTa FAEITEKYVR QVSREVLEPI LILLQ- -SQDPQIQVA ACAALGNLAV (SEQ ID NO: 1879)
HSIRNAUc GADGRKAMRR CDG—LIDSL VHYVR- ( 4 ) D YQPDDKATEN CVCILHNLSY (SEQ ID NO: 1880)
CTNBMOUSEb KEASRHAIMR SPQ—MVSAI VRTMQ- NTNDVETARC TAGTLHNLSH (SEQ ID NO: 1881)
CET19B106 ANQKHLEYW ELG—MLSAF TDLLTCM- -DVSLVSY ILDAIYLLLQ (SEQ ID NO: 1882)
YSPPAA1B AAG-GDSTVY EQQ—IIPLL EQLTK- -DNDPDVQYF ATQALEQTND (SEQ ID NO: 1883)
CRU40057a NPQ-NIEALQ QAG—AMALL RPLLL- -DNVPSIQQS AALALGRLAN (SEQ ID NO: 1884)
D87671 YSDSLTAAMI DA-VLDEL PPLISES- -DMHVSQM AISFLTTLAK (SEQ ID NO: 1885)
CELM01E114 SQPDLYGSFV EVN—GVEIL LQLLG- -HENTDIVCA TLSLLRELTD (SEQ ID NO: 1886)
HSZYGHOMO ETPDNCEMFL NFN—GMKLF LDCLK-E FPEKQELIRN MLGLLGNVAE (SEQ ID NO: 1887)
YLK3CAEELa CEEQAREQIR IYD—GVPTL LGLLS- -IKNSRLQWH VAWTLAQLAE (SEQ ID NO: 1888)
GDSIHUMANb DSHSLQAQLI NMG—VIPTL VKLLG-1 HCQNAALTEM CLVAFGNLAE (SEQ ID NO: 1889)
COPBYEASTa RQDANRTPAL KAQ—YIELL MELLS- TTTSDEVIFE TALALTVLSA (SEQ ID NO: 1890)
CEC48D1 KSSKACTNMI QAG—IISIS LSCMEK- -VESNDITTK CLELLTNLSS (SEQ ID NO: 1891)
YEB3YEASTc RDD-NKHKIA TSG—ALIPL TKLAKSK- -HIRVQRN ATGALLNMTH (SEQ ID NO: 1892)
GDSIBOVINc KSKDVIKTIV QSG—GIKHL VTMATSE- -HVIMQNE ALVALALIAA (SEQ ID NO: 1893)
APCHUMANd ADVNSKKTLR EVG—SVKAL MECALEVKK- -ESTLKSVLS ALWNLSARNP (SEQ ID NO: 1894)
P115_BOVIN SKK-YRLEVG IQ-AMEHL IHVLQTD- -RSDSEIIGY ALDTLYNIIS (SEQ ID NO: 1895)
YD7 l_SCHPO PLE-MIQIVT KMN—IPDHL TATLPRS- -SDDIQND CLFALRQVSK (SEQ ID NO: 1896)
CELB033611 SDL-PAIQEQ DMKE-SIHCI VQLIG- -CSDVTIVEL ATGTLRNIGL (SEQ ID NO: 1897)
Consensus/ 60% ssp.pbphlb pss..slshL lpLLp . p . s . plbp . tshsLpNls . (SEQ ID NO: 1898)
Table 15
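Several rows of the table above share one sequence identifier (for example, IM02HUMANb and MMU34228c are both listed under SEQ ID NO: 1742). The following sketch shows one way such rows could be read into records and grouped by identifier. It assumes each row has the form "name, whitespace-separated aligned blocks, (SEQ ID NO: n)"; the regular expression and the function names `parse_row` and `group_by_seq_id` are hypothetical, and a row whose name contains a space because of extraction damage would be split incorrectly.

```python
import re
from collections import defaultdict

# name, aligned blocks, then "(SEQ ID NO: n)" with or without the colon
ROW = re.compile(r"^(\S+)\s+(.*?)\s*\(SEQ ID NO:?\s*(\d+)\s*\)\s*$")


def parse_row(line):
    """Return (name, aligned_sequence, seq_id), or None if the line is not a row."""
    m = ROW.match(line.strip())
    if m is None:
        return None
    name, blocks, seq_id = m.groups()
    aligned = "".join(blocks.split())  # drop the spaces between column blocks
    return name, aligned, int(seq_id)


def group_by_seq_id(lines):
    """Map each SEQ ID NO to the list of row names that carry it."""
    groups = defaultdict(list)
    for line in lines:
        parsed = parse_row(line)
        if parsed is not None:
            name, _, seq_id = parsed
            groups[seq_id].append(name)
    return dict(groups)


rows = [
    "IM02HUMANb PND-KIQAVI DAG—VCRRL VELLM -HNDYKWSP ALRAVGNIVT (SEQ ID NO: 1742)",
    "MMU34228c PND-KIQAVI DAG—VCRRL VELLM -HNDYKWSP ALRAVGNIVT (SEQ ID NO: 1742)",
    "ATU69533d TND-KIQTVI QAG—WPKL VELLL -HHSPSVLIP ALRTVGNIVT (SEQ ID NO: 1744)",
]
print(group_by_seq_id(rows))
```

Grouping by identifier in this way makes it easy to confirm that rows listed under one SEQ ID NO carry the same aligned sequence.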
References
Bondeson, D.P., Mares, A., Smith, I.E.D., Ko, E., Campos, S., Miah, A.H., Mulholland, K.E., Routly, N., Buckley, D.L., Gustafson, J.L., et al. (2015). Catalytic in vivo protein knockdown by small-molecule PROTACs. Nat. Chem. Biol. 11, 611-617.
Boudko, S.P., Londer, Y.Y., Letarov, A.V., Sernova, N.V., Engel, J., and Mesyanzhinov, V.V. (2002). Domain organization, folding and stability of bacteriophage T4 fibritin, a segmented coiled-coil protein. Eur. J. Biochem. 269, 833-841.
Brunette, T.J., Parmeggiani, F., Huang, P.-S., Bhabha, G., Ekiert, D.C., Tsutakawa, S.E., Hura, G.L., Tainer, J.A., Baker, D. (2015) Exploring the repeat protein universe through computational protein design. Nature 528, 580-584.
Chapman & McNaughton, B.R. (2016). Scratching the surface: Resurfacing proteins to endow new properties and function. Cell Chem. Biol. 23, 543-553.
D’Andrea, L.D., and Regan, L. (2003). TPR proteins: the versatile helix. Trends Biochem. Sci. 28, 655-662.
Deshaies, R.J. (2015). Protein degradation: Prime time for PROTACs. Nat. Chem. Biol. 11, 634-635.
de Vries, S.J., and Bonvin, A.M.J.J. (2011). CPORT: a consensus interface predictor and its performance in prediction-driven docking with HADDOCK. PLoS One 6, e17695.
de Vries, S.J., van Dijk, M., and Bonvin, A.M.J.J. (2010). The HADDOCK web server for data-driven biomolecular docking. Nat. Protoc. 5, 883-897.
Guettler, S., LaRose, J., Petsalaki, E., Gish, G., Scotter, A., Pawson, T., Rottapel, R., and Sicheri, F. (2011). Structural basis and sequence rules for substrate recognition by Tankyrase explain the basis for cherubism disease. Cell 147, 1340-1354.
Güthe, S., Kapinos, L., Möglich, A., Meier, S., Grzesiek, S., and Kiefhaber, T. (2004). Very Fast Folding and Association of a Trimerization Domain from Bacteriophage T4 Fibritin. J. Mol. Biol. 337, 905-915.
Hao, B., Zheng, N., Schulman, B.A., Wu, G., Miller, J.J., Pagano, M., Pavletich, N.P. (2005). Structural basis of the Cks1-dependent recognition of p27(Kip1) by the SCF(Skp2) ubiquitin ligase. Mol. Cell 20, 9-19.
Kobe, B. & Kajava, A.V. (2000). When protein folding is simplified to protein coiling: the continuum of solenoid protein structures. Trends in Biochem. Sci. 25, 509-515.
Lee, J.-H., Kang, E., Lee, J., Kim, J., Lee, K.H., Han, J., Kang, H.Y., Ahn, S., Oh, Y., Shin, D., et al. (2014). Protein grafting of p53TAD onto a leucine zipper scaffold generates a potent HDM dual inhibitor. Nat. Commun. 5, 3814.
Leshchiner, E.S., Parkhitko, A., Bird, G.H., Luccarelli, J., Bellairs, J.A., Escudero, S., Opoku-Nsiah, K., Godes, M., Perrimon, N., and Walensky, L.D. (2015). Direct inhibition of oncogenic KRAS by hydrocarbon-stapled SOS1 helices. Proc. Natl. Acad. Sci. U. S. A. 112, 1761-1766.
Longo, L.M. & Blaber, M. (2014). Symmetric protein architecture in protein design: top-down symmetric deconstruction. Methods Mol. Biol. 1216, 161-82.
Lu, J., Qian, Y., Altieri, M., Dong, H., Wang, J., Raina, K., Hines, J., Winkler, J.D., Crew, A.P., Coleman, K., et al. (2015). Hijacking the E3 Ubiquitin Ligase Cereblon to Efficiently Target BRD4. Chem. Biol. 22, 755-763.
Margarit, S.M., Sondermann, H., Hall, B.E., Nagar, B., Hoelz, A., Pirruccello, M., Bar-Sagi, D., and Kuriyan, J. (2003). Structural evidence for feedback activation by Ras.GTP of the Ras-specific nucleotide exchange factor SOS. Cell 112, 685-695.
Meier, S., Güthe, S., Kiefhaber, T. and Grzesiek, S. (2004). Foldon, the natural trimerization domain of T4 fibritin, dissociates into a monomeric A-state form containing a stable beta-hairpin: atomic details of trimer dissociation and local beta-hairpin stability from residual dipolar couplings. J. Mol. Biol. 344, 1051-1069.
Parmeggiani, F., Huang, P.-S., Vorobiev, S., Xiao, R., Park, K., Caprari, S., Su, M., Seetharaman, J., Mao, L., Janjua, H., Montelione, G.T., Hunt, J., Baker, D. (2015) A general computational approach for repeat protein design. J. Mol. Biol. 427, 563-575.
Rowling, P.J., Sivertsson, E.M., Perez-Riba, A., Main, E.R., Itzhaki, L.S. (2015) Biochem. Soc. Trans. 43, 881-888.
Tamaskovic, R., Simon, M., Stefan, N., Schwill, M., Plückthun, A. (2012). Designed ankyrin repeat proteins (DARPins): From research to therapy. Methods Enzymol. 503, 101-134.
Thompson, D.B., Cronican, J.J., Liu, D.R. (2012). Engineering and identifying supercharged proteins for macromolecule delivery into mammalian cells. Methods Enzymol. 503, 293-319.

Claims

1. A modular binding protein comprising;
(i) two or more repeat domains,
(ii) inter-repeat loops linking said repeat domains; and
(iii) one or more heterologous peptide ligands that bind to a target molecule, each said peptide ligand being located in an inter-repeat loop or at the N or C terminus of the modular binding protein,
wherein said one or more heterologous peptide ligands comprise at least one amino acid sequence set out in Table 8 (SEQ ID NOs: 277-388) or Table 9 (SEQ ID NOs: 470-617) or a variant thereof.
2. A modular binding protein according to claim 1 comprising a first peptide ligand that binds a first target molecule and a second peptide ligand that binds a second target molecule.
3. A modular binding protein according to claim 2 wherein the first target molecule is β-catenin, KRAS, c-Myc, BRD4, Aurora-A, CK2alpha, Notch, Cdk2, PLK1, BCR-ABL, PP1c, MCL-1, Bcl-2, BCL-XL, Jun, BFL1, BAX, eIF4E, Fos, HDAC4, BCL6, Tau, PD-L1, KDM4A, EGFR, RAB25, or a transmembrane protein or GPCR.
4. A modular binding protein according to claim 2 or claim 3 comprising a first peptide ligand having an amino acid sequence set out in Table 8 (SEQ ID NOs: 277-388) or a variant thereof.
5. A modular binding protein according to claim 4 wherein the first target molecule is β-catenin and the peptide ligand has the amino acid sequence;
RSKKAHVLAASVEQATQNFLEKGEQIAKESQ (SEQ ID NO: 277),
RTLTVERLLEPLVTQVTTLV (SEQ ID NO: 278),
REQLEAQEARAREAHAREAHAREAYTREAYGREAYAREAHTWEAHGREARTREAQA (SEQ ID NO: 279),
DX1X2EFDQYL (SEQ ID NO: 280), where X1 and X2 are independently any amino acid,
QALLDKAKINQ (SEQ ID NO: 281), or
GWLDSSRSLMEQDKENEALLRF (SEQ ID NO: 282).
6. A modular binding protein according to claim 4 wherein the first target molecule is KRAS and the peptide ligand has the amino acid sequence;
PLYISY (SEQ ID NO: 283)
SIEDLHEYWARLWNYLYVA (SEQ ID NO: 284)
QASLEELHEYWARLWNYRVA (SEQ ID NO: 285)
NASIKQLHAYWQRLYAYLAAVA (SEQ ID NO: 286)
CMWWREICPVWW (SEQ ID NO: 287)
FARKTFLKLAF (SEQ ID NO: 288)
ARRFFLDIAD (SEQ ID NO: 289)
FRWPX1X2RLX3X4 (SEQ ID NO: 290), where X1 to X4 are independently any amino acid
tX2VFXhX1 p (SEQ ID NO: 291), where X1 to X2 are independently any amino acid
YGHGQVYYY (SEQ ID NO: 292)
ENPKQN (SEQ ID NO: 293)
DAYECLDASRPW (SEQ ID NO: 294) or
KSRDFYH (SEQ ID NO: 295).
7. A modular binding protein according to claim 4 wherein the first target molecule is myc and the peptide ligand has the amino acid sequence;
AGVEHQLRREVEIQSH (SEQ ID NO: 296)
WSVHAPSSRRTTpLAGTLDYLPPEMI (SEQ ID NO: 297)
TYQETY (SEQ ID NO: 298)
QAEEQKLSEEDLLRKRREQLKHKLEQLRNSCA (SEQ ID NO: 299)
NELKRSFAALRDQI (SEQ ID NO: 300)
NELKRAFAALRDQI (SEQ ID NO: 301)
IREKNHYHRQEVDDLRRQNALLEQQVRAL (SEQ ID NO: 302)
FNHITNASQWE (SEQ ID NO: 303)
GDLGAFSRGQM (SEQ ID NO: 304)
RSEFYYYGNTYYYSAMD (SEQ ID NO: 305)
QHDYTATDE (SEQ ID NO: 306)
QNPEEQDEGW (SEQ ID NO: 307) or
EKCRGVFPENF (SEQ ID NO: 308).
8. A modular binding protein according to any one of claims 2 to 7 wherein the second target molecule is an E3 ubiquitin ligase selected from MDM2, SCF(Skp2), Cul3-Keap1, Cul3-SPOP, Cul3-KELCH, KELCH actinfilin, APC/C, TRP1, SCFFbw7, SCF_TIR1, Cul4-DDB1-Cdt2_1, SCF_TRCP1_1, β-TRCP, SCFFbw2, SCFFbw3, OPDH VHL 1, SCF coil, SOCS box-Cul5-SPSB4, UBR5, CRL2(KLHDC2), GID4, TRIM21, Nedd4, ElonginC, CBL, CRL4_CDT2_1, CBL(PTK), CBL(Met), CRL4(COP1/DET), SH3RF1, COP-1, SIAH, ERAD-C, and UBR.
9. A modular binding protein according to any one of claims 2 to 7 wherein the second target molecule is selected from ALIX, AP-1, AP-2, AP, Hsc70, LC3, FIP200, LC3 and GABARAP.
10. A modular binding protein according to claim 8 or claim 9 comprising a second peptide ligand having an amino acid sequence set out in Table 9 (SEQ ID NOs: 470-617) or a variant thereof.
11. A modular binding protein according to any one of the preceding claims wherein one or more of the peptide ligands are located in inter-repeat loops.
12. A modular binding protein according to claim 11 wherein the one or more peptide ligands are connected to the inter-repeat loops by one or more additional residues.
13. A modular binding protein according to any one of the preceding claims wherein a peptide ligand is located at the N terminus.
14. A modular binding protein according to claim 13 wherein the N terminal peptide ligand comprises an α helix.
15. A modular binding protein according to claim 13 or 14 wherein the N terminal peptide ligand comprises the sequence Xn-XYXXXIXXYXXXLXX-X1X2XX, where residues denoted by X are independently any amino acid, X1 and X2 are independently any amino acid and n is 0 or any number.
16. A modular binding protein according to claim 15 wherein X1 is D.
17. A modular binding protein according to claim 15 or claim 16 wherein X2 is P.
18. A modular binding protein according to any one of the preceding claims wherein a peptide ligand is located at the C terminus.
19. A modular binding protein according to claim 18 wherein the C terminal peptide ligand comprises an α helix.
20. A modular binding protein according to claim 19 wherein the C terminal peptide ligand comprises the sequence X1X2XX-XXAXXXLXX[AV]XXXXX-Xn, where residues denoted by X are independently any amino acid, X1 and X2 are independently any amino acid and n is 0 or any number.
21. A modular binding protein according to claim 20 wherein X1 is D.
22. A modular binding protein according to claim 20 or claim 21 wherein X2 is P.
23. A modular binding protein according to any of claims 1 to 10 comprising an N terminal peptide ligand of Table 8 (SEQ ID NOs: 277-388) or a variant thereof and a C terminal peptide ligand of Table 9 (SEQ ID NOs: 470-617) or a variant thereof.
24. A modular binding protein according to any of claims 1 to 10 comprising an inter-repeat peptide ligand of Table 8 (SEQ ID NOs: 277-388) or a variant thereof and a C terminal peptide ligand of Table 9 (SEQ ID NOs: 470-617) or a variant thereof.
25. A modular binding protein according to any of claims 1 to 10 comprising an inter-repeat peptide ligand of Table 8 (SEQ ID NOs: 277-388) or a variant thereof and an N terminal peptide ligand of Table 9 (SEQ ID NOs: 470-617) or a variant thereof.
26. A modular binding protein according to any of claims 1 to 10 comprising a C terminal peptide ligand of Table 8 (SEQ ID NOs: 277-388) or a variant thereof and an N terminal peptide ligand of Table 9 (SEQ ID NOs: 470-617) or a variant thereof.
27. A modular binding protein according to any of claims 1 to 10 comprising an inter-repeat peptide ligand of Table 9 (SEQ ID NOs: 470-617) or a variant thereof and an N terminal peptide ligand of Table 8 (SEQ ID NOs: 277-388) or a variant thereof.
29. A modular binding protein according to any of claims 1 to 10 comprising an inter-repeat peptide ligand of Table 9 (SEQ ID NOs: 470-617) or a variant thereof and a C terminal peptide ligand of Table 8 (SEQ ID NOs: 277-388) or a variant thereof.
30. A modular binding protein according to any of the preceding claims wherein the repeat domains are helix-turn-helix repeat domains.
31. A modular binding protein according to claim 30 wherein the repeat domains are tetratricopeptide (TPR) repeat domains.
32. A modular binding protein according to claim 31 wherein the repeat domains have the amino acid sequence Y-X1X2X3X4;
wherein Y is an amino acid sequence shown in any of Tables 4 to 6 or a variant thereof and X1, X2, X3, X4 are independently any amino acid.
33. A modular binding protein according to claim 31 or claim 32 wherein the repeat domains have the amino acid sequence;
AEAWYNLGNAYYKQGDYQKAIEYYQKALEL-X1X2X3X4 (SEQ ID NO: 13); or
AEALNNLGNVYREQGDYQKAIEYYQKALEL-X1X2X3X4 (SEQ ID NO: 1899); or
AEAWYNLGNAYYRQGDYQRAIEYYQRALEL-X1X2X3X4 (SEQ ID NO: 20); or
AEALNNLGNVYREQGDYQRAIEYYQRALEL-X1X2X3X4 (SEQ ID NO: 21); or
AEALRNLGRVYRRQGRYQRAIEYYRRALEL-X1X2X3X4 (SEQ ID NO: 1900)
wherein X1, X2, X3, X4 are independently any amino acid, or a variant thereof.
34. A modular binding protein according to claim 32 or claim 33 wherein X1 is D.
35. A modular binding protein according to any one of claims 32 to 34 wherein X2 is P.
36. A modular binding protein according to claim 31 wherein the repeat domains have the amino acid sequence:
EDSAEAERLKTEGNEQMKVENFEAAVHFYGKAIELNPANAVYFCNRAAAYSKLGNYAGAVQDCERAICIDPAYSKAYGRMGLALSSLNKHVEAVAYYKKALELDPDNETYKSNLKIAELKLREAP (SEQ ID NO: 18)
or a variant thereof.
37. A modular binding protein according to any one of the preceding claims comprising 2-5 repeat domains.
38. A modular binding protein according to any one of the preceding claims further comprising a targeting domain, intracellular transfer domain, stabilising domain, oligomerisation domain, cytotoxic agent, therapeutic agent and/or a detectable label.
39. A modular binding protein comprising an amino acid sequence set out in Table 13 (SEQ ID NOs: 1230 to 1304).
40. A nucleic acid encoding a modular binding protein according to any one of claims 1 to 39.
41. An expression vector comprising a nucleic acid according to claim 40.
42. A host cell comprising an expression vector according to claim 41.
43. A method of producing a modular binding protein comprising expressing a nucleic acid according to claim 40 to produce the modular binding protein.
44. A method of producing a modular binding protein comprising;
inserting a first nucleic acid encoding a peptide ligand of Table 8 (SEQ ID NOs: 277-388) or Table 9 (SEQ ID NOs: 470-617) into a second nucleic acid encoding two or more repeat domains linked by inter-repeat loops to produce a chimeric nucleic acid encoding a modular binding protein according to any one of claims 1 to 39; and
expressing said chimeric nucleic acid to produce the modular binding protein.
45. A method of producing a modular binding protein that binds to a first target molecule and a second target molecule comprising;
providing a nucleic acid encoding two or more repeat domains linked by inter-repeat loops; and
incorporating into said nucleic acid a first nucleotide sequence encoding a first peptide ligand that binds to a first target molecule and a second nucleotide sequence encoding a second peptide ligand that binds to a second target molecule to generate a nucleic acid encoding a modular binding protein comprising said first and second peptide ligands, wherein said peptide ligands are located in an inter-repeat loop or at the N or C terminus of the modular binding protein; wherein one or both of the first and second peptide ligands are set out in Table 8 (SEQ ID NOs: 277-388) or Table 9 (SEQ ID NOs: 470-617); and
expressing the nucleic acid to produce said protein.
46. A method according to claim 45 wherein one of the first or second target molecules is an E3 ubiquitin ligase.
47. A library comprising modular binding proteins, each modular binding protein in the library comprising;
(i) two or more repeat domains,
(ii) inter-repeat loops linking said repeat domains; and
(iii) a first peptide ligand of Table 8 (SEQ ID NOs: 277-388) or Table 9 (SEQ ID NOs: 470-617) and a second peptide ligand, each said peptide ligand being located in an inter-repeat loop or at the N or C terminus of the modular binding protein,
wherein at least one amino acid residue in the second peptide ligand in said library is diverse.
48. A library according to claim 47 wherein each modular binding protein in the library comprises one peptide ligand.
49. A library comprising a first and a second sub-library of modular binding proteins, each modular binding protein in the first and second sub-libraries comprising;
(i) two or more repeat domains,
(ii) inter-repeat loops linking said repeat domains; and
(iii) a peptide ligand of Table 8 (SEQ ID NOs: 277-388) or Table 9 (SEQ ID NOs: 470-617) and a peptide ligand comprising at least one diverse amino acid residue,
wherein the peptide ligand in the modular binding proteins in the first sub-library binds to a first target molecule and is located in one of (i) an inter-repeat loop; (ii) the N terminus or (iii) the C terminus of the modular binding protein, and
the peptide ligand in the modular binding proteins in the second sub-library binds to a second target molecule and is located in another of (i) an inter-repeat loop; (ii) the N terminus or (iii) the C terminus of the modular binding protein.
50. A library according to any one of claims 47-49 wherein the modular binding proteins are according to any one of claims 1 to 39.
51. A library according to any one of claims 47-50 wherein the library is displayed on the surface of particles.
52. A method of screening a library comprising;
(a) providing a library of modular binding proteins, each modular binding protein in the library comprising;
(i) two or more repeat domains
(ii) inter-repeat loops linking said repeat domains; and
(iii) a first peptide ligand located in an inter-repeat loop, at the N terminus or at the C terminus of the protein, and
(iv) a second peptide ligand located in an inter-repeat loop, at the N terminus or at the C terminus of the protein
wherein the first peptide ligand has a sequence set out in Table 8 (SEQ ID NOs: 277-388) or Table 9 (SEQ ID NOs: 470-617) and the second peptide ligand has at least one diverse amino acid residue,
(b) screening the library for modular binding proteins which display a binding activity, and
(c) identifying one or more modular binding proteins in the library which display the binding activity.
53. A population of nucleic acids encoding a library according to any one of claims 47-51.
54. A method of producing a library comprising expressing a population of nucleic acids according to claim 53.
55. A method of producing a library of modular binding proteins comprising;
(a) providing a population of nucleic acids encoding a diverse population of modular binding proteins comprising
(i) two or more repeat domains
(ii) inter-repeat loops linking said repeat domains;
(iii) a first peptide ligand located in an inter-repeat loop, at the N terminus or at the C terminus of the protein, and
(iv) a second peptide ligand located in an inter-repeat loop, at the N terminus or at the C terminus of the protein,
wherein the first peptide ligand has a sequence set out in Table 8 (SEQ ID NOs: 277-388) or Table 9 (SEQ ID NOs: 470-617) and the second peptide ligand has at least one diverse amino acid residue, and
(b) expressing said population of nucleic acids to produce the diverse population, thereby producing a library of modular binding proteins.
56. A pharmaceutical composition comprising a modular binding protein according to any one of claims 1 to 39, a nucleic acid according to claim 40 or a vector according to claim 41 and a pharmaceutically acceptable excipient.
57. A method of producing a pharmaceutical composition comprising formulating a modular binding protein according to any one of claims 1 to 39, a nucleic acid according to claim 40 or a vector according to claim 41 with a pharmaceutically acceptable excipient.
58. A population of liposomes comprising a modular binding protein according to any one of claims 1 to 39, a nucleic acid according to claim 40 or a vector according to claim 41.
59. A method of producing a population of liposomes comprising admixing a modular binding protein according to any one of claims 1 to 39, a nucleic acid according to claim 40 or a vector according to claim 41 with a lipid solution and evaporating said solution to produce liposomes encapsulating said modular binding protein, nucleic acid or vector.
60. A modular binding protein according to any one of claims 1 to 39, a nucleic acid according to claim 40 or a vector according to claim 41 for use in a method of diagnosis or treatment in a human or animal subject.
61. A modular binding protein according to any one of claims 1 to 39 that binds to a target molecule, a nucleic acid according to claim 40 that encodes a modular binding protein that binds to a target molecule or a vector according to claim 41 for use in the treatment of a disorder associated with the target molecule.
62. Use of a modular binding protein according to any one of claims 1 to 39 that binds to a target molecule, a nucleic acid according to claim 40 that encodes a modular binding protein that binds to a target molecule or a vector according to claim 41 in the manufacture of a medicament for use in the treatment of a disorder associated with the target molecule.
63. A method of treatment of a disorder associated with the target molecule comprising;
administering a modular binding protein according to any one of claims 1 to 39 that binds to a target molecule, a nucleic acid according to claim 40 that encodes a modular binding protein that binds to a target molecule or a vector according to claim 41 comprising said nucleic acid to an individual in need thereof.
64. A modular binding protein, nucleic acid or vector for use according to claim 61, use according to claim 62 or method according to claim 63 wherein the disorder is an inflammatory disorder, a neurodegenerative disease, an angiogenic disorder, a bone loss disorder or cancer.
65. A modular binding protein, nucleic acid or vector for use, use or method according to claim 64 wherein the cancer is breast, ovarian, colorectal, gastrointestinal, pancreatic, prostate, thyroid, lung, hepatocellular carcinoma, oesophageal, multiple myeloma, leukemia, T-cell lymphoma, neuroblastoma, glioblastoma multiforme, pleural mesothelioma, myelogenous leukemia (CML), acute lymphoblastic leukemia (ALL) or acute myelogenous leukemia (AML).
66. A modular binding protein, nucleic acid or vector for use, use or method according to claim 64 wherein the neurodegenerative disease is Alzheimer’s disease, Huntington’s disease, Parkinson’s disease, multiple sclerosis or amyotrophic lateral sclerosis.
67. A modular binding protein, nucleic acid or vector for use, use or method according to claim 64 wherein the inflammatory disorder is an autoimmune disease.
68. A modular binding protein, nucleic acid or vector for use, use or method according to any one of claims 60 to 67 wherein the modular binding protein, nucleic acid or vector is encapsulated in a liposome.
PCT/EP2020/054697 2019-02-21 2020-02-21 Modular binding proteins WO2020169838A1 (en)

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
US16/282,155 US11279925B2 (en) 2017-08-18 2019-02-21 Chimeric proteins
GBGB1902392.8A GB201902392D0 (en) 2019-02-21 2019-02-21 Modular binding proteins
US16/282,155 2019-02-21
GB1902392.8 2019-02-21

Publications (1)

Publication Number Publication Date
WO2020169838A1 (en) 2020-08-27

Family

ID=65999040

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/EP2020/054697 WO2020169838A1 (en) 2019-02-21 2020-02-21 Modular binding proteins

Country Status (2)

Country Link
GB (1) GB201902392D0 (en)
WO (1) WO2020169838A1 (en)

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114796525A (en) * 2022-04-25 2022-07-29 南方科技大学 Application of cell cycle regulatory protein inhibitor in tumor treatment
US11453683B1 (en) 2019-08-29 2022-09-27 Mirati Therapeutics, Inc. KRas G12D inhibitors
US11548888B2 (en) 2019-01-10 2023-01-10 Mirati Therapeutics, Inc. KRas G12C inhibitors
US11702418B2 (en) 2019-12-20 2023-07-18 Mirati Therapeutics, Inc. SOS1 inhibitors
US11890285B2 (en) 2019-09-24 2024-02-06 Mirati Therapeutics, Inc. Combination therapies
US11932633B2 (en) 2018-05-07 2024-03-19 Mirati Therapeutics, Inc. KRas G12C inhibitors
US11964989B2 (en) 2022-07-20 2024-04-23 Mirati Therapeutics, Inc. KRas G12D inhibitors

Citations (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO1992001047A1 (en) 1990-07-10 1992-01-23 Cambridge Antibody Technology Limited Methods for producing members of specific binding pairs
US5565332A (en) 1991-09-23 1996-10-15 Medical Research Council Production of chimeric antibodies - a combinatorial approach
US5733743A (en) 1992-03-24 1998-03-31 Cambridge Antibody Technology Limited Methods for producing members of specific binding pairs
US5858657A (en) 1992-05-15 1999-01-12 Medical Research Council Methods for producing members of specific binding pairs
US5871907A (en) 1991-05-15 1999-02-16 Medical Research Council Methods for producing members of specific binding pairs
US5872215A (en) 1991-12-02 1999-02-16 Medical Research Council Specific binding members, materials and methods
US5885793A (en) 1991-12-02 1999-03-23 Medical Research Council Production of anti-self antibodies from antibody segment repertoires and displayed on phage
US5962255A (en) 1992-03-24 1999-10-05 Cambridge Antibody Technology Limited Methods for producing recombinant vectors
US6140471A (en) 1992-03-24 2000-10-31 Cambridge Antibody Technology, Ltd. Methods for producing members of specific binding pairs
US6172197B1 (en) 1991-07-10 2001-01-09 Medical Research Council Methods for producing members of specific binding pairs
US6225447B1 (en) 1991-05-15 2001-05-01 Cambridge Antibody Technology Ltd. Methods for producing members of specific binding pairs
US6492160B1 (en) 1991-05-15 2002-12-10 Cambridge Antibody Technology Limited Methods for producing members of specific binding pairs
EP3004152A1 (en) * 2013-05-31 2016-04-13 Molecular Partners AG Designed ankyrin repeat proteins binding to hepatocyte growth factor
WO2017106728A2 (en) 2015-12-16 2017-06-22 University Of Washington Repeat protein architectures
WO2019034332A1 (en) * 2017-08-18 2019-02-21 Cambridge Enterprise Limited Modular binding proteins

Patent Citations (18)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO1992001047A1 (en) 1990-07-10 1992-01-23 Cambridge Antibody Technology Limited Methods for producing members of specific binding pairs
US5969108A (en) 1990-07-10 1999-10-19 Medical Research Council Methods for producing members of specific binding pairs
US5871907A (en) 1991-05-15 1999-02-16 Medical Research Council Methods for producing members of specific binding pairs
US6492160B1 (en) 1991-05-15 2002-12-10 Cambridge Antibody Technology Limited Methods for producing members of specific binding pairs
US6291650B1 (en) 1991-05-15 2001-09-18 Cambridge Antibody Technology, Ltd. Methods for producing members of specific binding pairs
US6225447B1 (en) 1991-05-15 2001-05-01 Cambridge Antibody Technology Ltd. Methods for producing members of specific binding pairs
US6172197B1 (en) 1991-07-10 2001-01-09 Medical Research Council Methods for producing members of specific binding pairs
US5565332A (en) 1991-09-23 1996-10-15 Medical Research Council Production of chimeric antibodies - a combinatorial approach
US5885793A (en) 1991-12-02 1999-03-23 Medical Research Council Production of anti-self antibodies from antibody segment repertoires and displayed on phage
US5872215A (en) 1991-12-02 1999-02-16 Medical Research Council Specific binding members, materials and methods
US6521404B1 (en) 1991-12-02 2003-02-18 Medical Research Council Production of anti-self antibodies from antibody segment repertoires and displayed on phage
US6140471A (en) 1992-03-24 2000-10-31 Cambridge Antibody Technology, Ltd. Methods for producing members of specific binding pairs
US5962255A (en) 1992-03-24 1999-10-05 Cambridge Antibody Technology Limited Methods for producing recombinant vectors
US5733743A (en) 1992-03-24 1998-03-31 Cambridge Antibody Technology Limited Methods for producing members of specific binding pairs
US5858657A (en) 1992-05-15 1999-01-12 Medical Research Council Methods for producing members of specific binding pairs
EP3004152A1 (en) * 2013-05-31 2016-04-13 Molecular Partners AG Designed ankyrin repeat proteins binding to hepatocyte growth factor
WO2017106728A2 (en) 2015-12-16 2017-06-22 University Of Washington Repeat protein architectures
WO2019034332A1 (en) * 2017-08-18 2019-02-21 Cambridge Enterprise Limited Modular binding proteins

Non-Patent Citations (86)

* Cited by examiner, † Cited by third party
Title
"Protocols in Molecular Biology", 1992, JOHN WILEY & SONS
"Recombinant Gene Expression Protocols", March 1997, HUMANA PRESS INC
"Remington's Pharmaceutical Sciences", 1990, MACK PUBLISHING COMPANY
ALTSCHUL ET AL., FEBS J., vol. 272, pages 5101 - 5109
ALTSCHUL ET AL., J. MOL. BIOL., vol. 215, 1990, pages 403 - 410
ALTSCHUL ET AL., NUC. ACIDS RES., vol. 25, 1977, pages 3389 - 3402
ALTSCHUL ET AL., NUCLEIC ACIDS RES., vol. 39, 1 July 2011 (2011-07-01), pages 3389 - 34021
ANDREAS PLÜCKTHUN: "Designed Ankyrin Repeat Proteins (DARPins): Binding Proteins for Research, Diagnostics, and Therapy", ANNUAL REVIEW OF PHARMACOLOGY AND TOXICOLOGY, vol. 55, no. 1, 6 January 2015 (2015-01-06), pages 489 - 511, XP055217451, ISSN: 0362-1642, DOI: 10.1146/annurev-pharmtox-010611-134654 *
BAYLISS ET AL., MOL. CELL, vol. 12, 2003, pages 851 - 62
BECHARA ET AL., FEBS LETTERS, vol. 587, no. 1, 2013, pages 1693 - 1702
BINZ H ET AL: "Designing Repeat Proteins: Well-expressed, Soluble and Stable Proteins from Combinatorial Libraries of Consensus Ankyrin Repeat Proteins", JOURNAL OF MOLECULAR BIOLOGY, ACADEMIC PRESS, UNITED KINGDOM, vol. 332, no. 2, 12 September 2003 (2003-09-12), pages 489 - 503, XP027101280, ISSN: 0022-2836, [retrieved on 20030912], DOI: 10.1016/S0022-2836(03)00896-9 *
BINZ H KASPAR ET AL: "High-affinity binders selected from designed ankyrin repeat protein libraries", NATURE BIOTECHNOLOGY, GALE GROUP INC., NEW YORK, US, vol. 22, no. 5, 1 May 2004 (2004-05-01), pages 575 - 582, XP002343919, ISSN: 1087-0156, DOI: 10.1038/NBT962 *
BIRTS ET AL., CHEM. SCI., vol. 4, 2013, pages 3046 - 57
BLATCH ET AL., BIOESSAYS., vol. 21, no. 11, pages 932 - 9
BONDESON, D.P.MARES, A.SMITH, I.E.D.KO, E.CAMPOS, S.MIAH, A.H.MULHOLLAND, K.E.ROUTLY, N.BUCKLEY, D.L.GUSTAFSON, J.L. ET AL.: "Catalytic in vivo protein knockdown by small-molecule PROTACs", NAT. CHEM. BIOL., vol. 11, 2015, pages 611 - 617, XP055279063, DOI: 10.1038/nchembio.1858
BOUDKO, S.P.LONDER, Y.Y.LETAROV, A. VSERNOVA, N. VENGEL, J.MESYANZHINOV, V. V: "Domain organization, folding and stability of bacteriophage T4 fibritin, a segmented coiled-coil protein", EUR. J. BIOCHEM., vol. 269, 2002, pages 833 - 841
BRUNETTE, T.J.PARMEGGIANI, F.HUANG, P.-S.BHABHA, G.EKIERT, D.C.TSUTAKAWA, S.E.HURA, G.L.TAINER, J.A.BAKER, D.: "Exploring the repeat protein universe through computational protein design", NATURE, vol. 528, 2015, pages 580 - 584, XP055664964, DOI: 10.1038/nature16162
CHAPMANMCNAUGHTON, B.R.: "Scratching the surface: Resurfacing proteins to endow new properties and function", CELL CHEM. BIOL., vol. 23, 2016, pages 543 - 553, XP029552462, DOI: 10.1016/j.chembiol.2016.04.010
D'ANDREA, L.D.REGAN, L.: "TPR proteins: the versatile helix", TRENDS BIOCHEM. SCI., vol. 28, 2003, pages 655 - 662, XP004476604, DOI: 10.1016/j.tibs.2003.10.007
DAVEY ET AL., TRENDS BIOCHEM. SCI., vol. 36, 2011, pages 159 - 169
DE VRIES, S.J.BONVIN, A.M.J.J.: "CPORT: a consensus interface predictor and its performance in prediction-driven docking with HADDOCK", PLOS ONE, vol. 6, 2011, pages e17695
DE VRIES, S.J.VAN DIJK, M.BONVIN, A.M.J.J.: "The HADDOCK web server for data-driven biomolecular docking", NAT. PROTOC., vol. 5, 2010, pages 883 - 897
DESHAIES, R.J.: "Protein degradation: Prime time for PROTACs", NAT. CHEM. BIOL., vol. 11, 2015, pages 634 - 635, XP055414762, DOI: 10.1038/nchembio.1887
DICE J.F., TRENDS BIOCHEM. SCI., vol. 15, 1990, pages 305 - 309
DOWLING ET AL., BIOCHEM., vol. 47, 2008, pages 13554 - 63
ERKIZAN ET AL., CELL CYCLE, vol. 10, 2011, pages 3397 - 408
FAN, X.ET, NATURE NEUROSCIENCE, vol. 17, pages 471 - 480
FINN ET AL., NUCLEIC ACIDS RESEARCH, vol. 44, 2016, pages D279 - D285
GONDEAU ET AL., J. BIOL. CHEM., vol. 280, 2005, pages 13793 - 800
GUETTLER, S.LAROSE, J.PETSALAKI, E.GISH, G.SCOTTER, A.PAWSON, T.ROTTAPEL, R.SICHERI, F.: "Structural basis and sequence rules for substrate recognition by Tankyrase explain the basis for cherubism disease", CELL, vol. 147, no. 1, 2011, pages 1340 - 1354, XP028392398, DOI: 10.1016/j.cell.2011.10.046
GUTHE, S.KAPINOS, L.MOGLICH, A.MEIER, S.GRZESIEK, S.KIEFHABER, T.: "Very Fast Folding and Association of a Trimerization Domain from Bacteriophage T4 Fibritin", J. MOL. BIOL., vol. 337, no. 4, 2004, pages 1051 - 915, XP004496019, DOI: 10.1016/j.jmb.2004.02.020
HAO, B.ZHENG, N.SCHULMAN, B.A.WU, G.MILLER, J.J.PAGANO, M.PAVLETICH, N.P.: "Structural basis of the Cks1-dependent recognition of p27(Kip1) by the SCF(Skp2) ubiquitin ligase", MOL. CELL, vol. 20, 2005, pages 9 - 19
HETZ ET AL., MOL. CELL, vol. 63, 2016, pages 686 - 95
HOLEHOUSE ET AL., BIOPHYS. J., vol. 112, 2017, pages 16 - 21
HUBER ET AL., CELL, vol. 90, 1997, pages 871 - 882
JOHANNES SCHILLING ET AL: "From DARPins to LoopDARPins: Novel LoopDARPin Design Allows the Selection of Low Picomolar Binders in a Single Round of Ribosome Display", JOURNAL OF MOLECULAR BIOLOGY, vol. 426, no. 3, 1 February 2014 (2014-02-01), United Kingdom, pages 691 - 721, XP055689492, ISSN: 0022-2836, DOI: 10.1016/j.jmb.2013.10.026 *
KATOHSTANDLEY, MOLECULAR BIOLOGY AND EVOLUTION, vol. 30, no. 4, 2013, pages 772 - 780
KIM ET AL., NAT. CHEM. BIOL., vol. 9, 2013, pages 643 - 50
KINCH LNGRISHIN NV, CURR. OPIN. STRUCT. BIOL., vol. 12, no. 3, June 2002 (2002-06-01), pages 400 - 8
KOBE, B.KAJAVA, A.V.: "When protein folding is simplified to protein coiling: the continuum of solenoid protein structures", TRENDS IN BIOCHEM. SCI., vol. 25, 2000, pages 509 - 515, XP004224292, DOI: 10.1016/S0968-0004(00)01667-4
KOBEKAJAVA, TRENDS IN BIOCHEMICAL SCIENCES, vol. 25, no. 10, 2000, pages 509 - 15
KOLAŠINAC ET AL., INT. J. MOL. SCI., vol. 19, 2018, pages 346
KONTERMANN, RDUBEL, S: "Molecular Cloning: a Laboratory Manual", 2001, COLD SPRING HARBOR LABORATORY PRESS
KORINEK ET AL., SCIENCE, vol. 275, no. 5307, 1997, pages 1784 - 7
KUBE ET AL., LANGMUIR, vol. 33, 2017, pages 1051 - 1059
LASSMANNSONNHAMMER, BMC BIOINFORMATICS, vol. 6, no. 298, 2005
LEE ET AL., CHEMBIOCHEM, vol. 14, 2013, pages 445 - 451
LEE, J.-H.KANG, E.LEE, J.KIM, J.LEE, K.H.HAN, J.KANG, H.Y.AHN, S.OH, Y.SHIN, D. ET AL.: "Protein grafting of p53TAD onto a leucine zipper scaffold generates a potent HDM dual inhibitor", NAT. COMMUN., vol. 5, 2014, pages 3814, XP055522736, DOI: 10.1038/ncomms4814
LESHCHINER ET AL., PNAS, vol. 112, no. 6, 2015, pages 1761 - 1766
LESHCHINER, E.S.PARKHITKO, A.BIRD, G.H.LUCCARELLI, J.BELLAIRS, J.A.ESCUDERO, S.OPOKU-NSIAH, K.GODES, M.PERRIMON, N.WALENSKY, L.D.: "Direct inhibition of oncogenic KRAS by hydrocarbon-stapled SOS1 helices", PROC. NATL. ACAD. SCI. U. S. A., vol. 112, 2015, pages 1761 - 1766
LI ET AL., BIOCHEMISTRY, vol. 45, 2006, pages 15168 - 15178
LI ET AL., NAT. STRUCT. MOL. BIOL., vol. 17, 2010, pages 105 - 111
LLOUZ ET AL., J. BIOL. CHEM., vol. 281, 2006, pages 30621 - 30630
LONGO, L.M.BLABER, M.: "Symmetric protein architecture in protein design: top-down symmetric deconstruction", METHODS MOL. BIOL., vol. 1216, 2014, pages 161 - 82
LU, J.QIAN, Y.ALTIERI, M.DONG, H.WANG, J.RAINA, K.HINES, J.WINKLER, J.D.CREW, A.P.COLEMAN, K. ET AL.: "Hijacking the E3 Ubiquitin Ligase Cereblon to Efficiently Target BRD4", CHEM. BIOL., vol. 22, 2015, pages 755 - 763
MACCIONI ET AL., EMBO J., vol. 7, 1988, pages 1957 - 63
MARGARIT, S.M.SONDERMANN, H.HALL, B.E.NAGAR, B.HOELZ, A., PIRRUCCELLO, M.BAR- SAGI, D.KURIYAN, J.: "Structural evidence for feedback activation by Ras.GTP of the Ras-specific nucleotide exchange factor SOS", CELL, vol. 112, no. 5, 2003, pages 685 - 695
MEIER, S.GUTHE, S.KIEFHABER, T.GRZESIEK, S.: "Foldon, the natural trimerization domain of T4 fibritin, dissociates into a monomeric A-state form containing a stable beta-hairpin: atomic details of trimer dissociation and local beta-hairpin stability from residual dipolar couplings", J. MOL. BIOL, vol. 344, 2004, pages 1051 - 1069
MENDOZA ET AL., CANCER RES., vol. 63, 2003, pages 1020 - 4
MOELLERING ET AL., NATURE, vol. 462, 2009, pages 182 - 8
NANJAPPA ET AL., NUCL ACIDS RES, vol. 42, January 2014 (2014-01-01), pages D959 - 65
NUCL. ACIDS RES., vol. 25, 1997, pages 3389 - 3402
PARMEGGIANI ET AL., J. MOL. BIOL., vol. 302, 2000, pages 205 - 217
PARMEGGIANI, F.HUANG, P.-S.VOROBIEV, S.XIAO, R.PARK, K.CAPRARI, S.SU, M.SEETHARAMAN, J.MAO, L.JANJUA, H.: "A general computational approach for repeat protein design", J. MOL. BIOL., vol. 427, 2015, pages 563 - 575, XP029189207, DOI: 10.1016/j.jmb.2014.11.005
PATEL ET AL., J. BIOL. CHEM., vol. 283, 2008, pages 32158 - 61
PATTHYLÁSZLÓ: "Protein Evolution", 2007, WILEY-BLACKWELL
PEARSONLIPMAN, PNAS USA, vol. 85, 1988, pages 2444 - 2448
PLOTKIN ET AL., J. PHARMACOL. EXP. THER., vol. 305, 2003, pages 974 - 980
RICHARDS ET AL., PNAS, vol. 113, 2016, pages 13726 - 31
RIVAS ET AL., PNAS, vol. 85, 1988, pages 6092 - 6
ROWLING, P.J.SIVERTSSSON, E.M.PEREZ-RIBA, A.MAIN, E. R.ITZHAKI, L.S., BIOCHEM. SOC. TRANS., vol. 43, 2015, pages 881 - 888
SAKAMOTO ET AL., BBRC, vol. 484, 2017, pages 605 - 11
SAKAMOTO ET AL., BIOCHEM. BIOPHYS. RES. COMM., vol. 484, 2017, pages 605 - 611
SAMPATHKUMAR ET AL., J. MOL. BIOL., vol. 381, 2008, pages 867 - 880
SMALL, TRENDS BIOCHEM. SCI., vol. 25, no. 2, 2000, pages 46 - 7
SMITHWATERMAN, J. MOL BIOL., vol. 147, 1981, pages 195 - 197
SODING, J., BIOINFORMATICS, vol. 21, 2005, pages 951 - 960
STEWART ET AL., NAT. CHEM. BIOL., vol. 6, 2010, pages 595 - 601
STUMPP M T ET AL: "DARPins: A new generation of protein therapeutics", DRUG DISCOVERY TODAY, ELSEVIER, AMSTERDAM, NL, vol. 13, no. 15-16, 1 August 2008 (2008-08-01), pages 695 - 701, XP023440383, ISSN: 1359-6446, [retrieved on 20080711], DOI: 10.1016/J.DRUDIS.2008.04.013 *
TAMASKOVIC, R.SIMON, M.STEFAN, N.SCHWILL, M.PLUCKTHUN, A.: "Designed ankyrin repeat proteins (DARPins): From research to therapy", METHODS IN ENZYM., vol. 503, 2012, pages 101 - 134
THOMPSON, D.B.CRONICAN, J.J.LIU, D.R.: "Engineering and identifying supercharged proteins for macromolecule delivery into mammalian cells", METHODS ENZYMOL., vol. 503, 2012, pages 293 - 319, XP055044980, DOI: 10.1016/B978-0-12-396962-0.00012-4
VOGEL CBERZUINI CBASHTON MGOUGH JTEICHMANN SA, J. MOL. BIOL., vol. 336, no. 3, February 2004 (2004-02-01), pages 809 - 23
WATSON ET AL., NAT. COMM., vol. 7, 2016, pages 11262
YKELIEN L BOERSMA ET AL: "DARPins and other repeat protein scaffolds: advances in engineering and applications", CURRENT OPINION IN BIOTECHNOLOGY, vol. 22, no. 6, 1 January 2011 (2011-01-01), pages 849 - 857, XP028397473, ISSN: 0958-1669, [retrieved on 20110608], DOI: 10.1016/J.COPBIO.2011.06.004 *
ZHANG ET AL., NATURE BIOTECHNOLOGY, vol. 29, no. 2, pages 149 - 53
ZHU ET AL., PLOS ONE, vol. 7, no. 3, 2012, pages e33943

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11932633B2 (en) 2018-05-07 2024-03-19 Mirati Therapeutics, Inc. KRas G12C inhibitors
US11548888B2 (en) 2019-01-10 2023-01-10 Mirati Therapeutics, Inc. KRas G12C inhibitors
US11453683B1 (en) 2019-08-29 2022-09-27 Mirati Therapeutics, Inc. KRas G12D inhibitors
US11890285B2 (en) 2019-09-24 2024-02-06 Mirati Therapeutics, Inc. Combination therapies
US11702418B2 (en) 2019-12-20 2023-07-18 Mirati Therapeutics, Inc. SOS1 inhibitors
CN114796525A (en) * 2022-04-25 2022-07-29 南方科技大学 Application of cell cycle regulatory protein inhibitor in tumor treatment
CN114796525B (en) * 2022-04-25 2024-04-02 南方科技大学 Use of inhibitors of cyclin-mediated proteins in tumor therapy
US11964989B2 (en) 2022-07-20 2024-04-23 Mirati Therapeutics, Inc. KRas G12D inhibitors

Also Published As

Publication number Publication date
GB201902392D0 (en) 2019-04-10

Similar Documents

Publication Publication Date Title
WO2020169838A1 (en) Modular binding proteins
US20220090054A1 (en) Chimeric proteins
CN103459415B (en) Designed repeat proteins that bind to serum albumin
JP6437468B2 (en) Polypeptide modification
AU2021200216A1 (en) Ras inhibitory peptides and uses thereof
WO2011071280A9 (en) Intracelluar targeting bipodal peptide binder
WO2020169840A1 (en) Bispecific proteins with a chimeric scaffold
Chen et al. Expression, purification, and micelle reconstitution of antimicrobial piscidin 1 and piscidin 3 for NMR studies
Raddum et al. The native structure of annexin A2 peptides in hydrophilic environment determines their anti-angiogenic effects
Collins et al. Structural dynamics of the membrane translocation domain of colicin E9 and its interaction with TolB
Mann et al. Enhancement of muramyl dipeptide‐dependent NOD2 activity by a self‐derived peptide
US20130345115A1 (en) Nuclear penetrating h4 tail peptides for the treatment of diseases mediated by impaired or loss of p53 function
US20240101604A1 (en) Selective mena binding peptides
Madden Inhibiting Protein-Protein Interactions using Rationally-designed Repeat Proteins
Diamante Engineering mono-and multi-valent inhibitors on a modular repeat-protein scaffold to target oncogenic Tankyrase
Sang Protein interactions of membrane protein U24 from Roseolovirus and implications for its function
Yong Structure and Functional Characterization of ERG and SPOP
WO2011132941A2 (en) Tf-bpb specifically binding to transcription factor
WO2023182945A2 (en) Engineering peptides using peptide epitope linker evolution
Ahsan UNDERSTANDING THE ACTIVATION OF BACTERIAL PROTEASE CLPP BY ACYLDEPSIPEPTIDE ANTIBIOTIC
Waschuk et al. 19 Lego-Like Strategy for Assembling Multivalent Peptide-Based Vehicles Able to Deliver Molecular Cargoes into Cells
WO2011132940A2 (en) Rtk-bpb specifically binding to rtk
WO2011132938A2 (en) Gpcr-bpb specifically binding to gpcr
Hajduczki Engineering Soluble Membrane Proteins and Improved-Affinity Ligands by Phage Display
Uchime Dissecting the mechanisms of pro-apoptotic BAX modulation

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20707057

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20707057

Country of ref document: EP

Kind code of ref document: A1