WO2018136939A1 - Evolved proteases and uses thereof - Google Patents

Info

Publication number
WO2018136939A1
Authority
WO
WIPO (PCT)
Prior art keywords
protein
protease
seq
amino acid
tev
Application number
PCT/US2018/014867
Other languages
French (fr)
Inventor
David R. Liu
Michael S. PACKER
Original Assignee
President And Fellows Of Harvard College
Application filed by President And Fellows Of Harvard College

Classifications

    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N9/00 Enzymes; Proenzymes; Compositions thereof; Processes for preparing, activating, inhibiting, separating or purifying enzymes
    • C12N9/14 Hydrolases (3)
    • C12N9/48 Hydrolases (3) acting on peptide bonds (3.4)
    • C12N9/50 Proteinases, e.g. Endopeptidases (3.4.21-3.4.25)
    • C12N9/503 Proteinases, e.g. Endopeptidases (3.4.21-3.4.25) derived from viruses
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Y ENZYMES
    • C12Y304/00 Hydrolases acting on peptide bonds, i.e. peptidases (3.4)
    • C12Y304/22 Cysteine endopeptidases (3.4.22)
    • C12Y304/22044 Nuclear-inclusion-a endopeptidase (3.4.22.44)
    • A HUMAN NECESSITIES
    • A61 MEDICAL OR VETERINARY SCIENCE; HYGIENE
    • A61K PREPARATIONS FOR MEDICAL, DENTAL OR TOILETRY PURPOSES
    • A61K38/00 Medicinal preparations containing peptides

Definitions

  • EB022376 (formerly R01 GM065400), GM118062, and GM008313 awarded by the National Institutes of Health (NIH). The government has certain rights in the invention.
  • Proteases are ubiquitous enzymes that play important roles in many aspects of cell and tissue biology. Proteases can also be harnessed for biotechnological and biomedical applications. Among the more than 600 naturally occurring proteases that have been described are enzymes that have proven to be important catalysts of industrial processes, essential tools for proteome analysis, and life-saving pharmaceuticals. Recombinant human proteases including thrombin, factor VIIa, and tissue plasminogen activator are widely used drugs for the treatment of blood clotting diseases. In addition, the potential of protease-based therapeutics to address disease in a manner analogous to that of antibody drugs, but with catalytic turnover, has been recognized for several decades.
  • proteases have the potential to generate proteases with therapeutically relevant specificities, for example novel proteases that cleave
  • interleukin-23 (IL-23)
  • IL-23 is a pro-inflammatory cytokine that enhances expansion of T helper type 17 (Th17) cells and upregulates inflammatory autoimmune responses. It has been demonstrated that IL-23 plays an important role in several autoimmune diseases, such as psoriasis, inflammatory bowel disease, rheumatoid arthritis, asthma, and multiple sclerosis.
  • cleavage of IL-23 by a protein (e.g., an evolved protease)
  • proteins described herein are useful for the treatment of diseases associated with IL-23.
  • the disclosure provides a protein (e.g., an evolved protease) that cleaves IL-23.
  • the protein is evolved from a TEV protease.
  • the protein is not evolved from a protein that naturally cleaves IL-23.
  • the disclosure provides a protein (e.g., an evolved protease) comprising an amino acid sequence that is at least 90% identical to a Tobacco etch virus (TEV) protease, for example as represented by SEQ ID NO: 1.
  • TEV Tobacco etch virus
  • the disclosure provides a protein comprising an amino acid sequence that is at least 90% identical to SEQ ID NO: 1, wherein the protein comprises at least 14 amino acid sequence mutations set forth in Table 1.
  • the amino acid sequence is not more than 94% (e.g., not more than 93.9%, 93.5%, 93%, 92.5%, 92%, 91.5%, 90% etc.) identical to SEQ ID NO: 1.
  • the protein comprises at least 14, at least 15, at least 16, at least 17, at least 18, at least 19, or at least 20 amino acid mutations as set forth in Table 1.
  • At least one of the amino acid sequence mutations is introduced at an amino acid position selected from the group consisting of T17, H28, T30, N68, E107, F132, S153, and S170. In some embodiments, at least one of the amino acid sequence mutations is selected from the group consisting of T17S, H28L, T30A, N68D, E107D, F132L, S153N, and S170A.
  • the protein further comprises at least one amino acid mutation at an amino acid position selected from the group consisting of D127, S135, T146, D148, F162, N171, N176, N177, V209, W211, M218, and K229.
  • at least one of the amino acid mutations is selected from the group consisting of D127A, S135F, T146S, D148P, F162S, N171D, N176T, N177M, V209M, W211I, M218F, and K229E.
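The substitution labels above follow the usual convention of wild-type residue, 1-based position, then replacement residue (T17S means threonine 17 replaced by serine). The following is a minimal sketch, not taken from the disclosure, of how such labels could be applied to a reference sequence and how a percent-identity figure could then be computed. The function names and the 20-residue toy sequence are hypothetical; the toy sequence is not SEQ ID NO: 1, and the identity calculation assumes substitutions only (no insertions or deletions).

```python
import re

# Substitution written as <wild-type residue><1-based position><new residue>, e.g. "T17S".
MUTATION_RE = re.compile(r"^([A-Z])(\d+)([A-Z])$")


def parse_mutation(label):
    """Split a label such as 'T17S' into (wild_type, position, replacement)."""
    match = MUTATION_RE.match(label.replace(" ", ""))  # tolerate spacing such as 'S 153N'
    if match is None:
        raise ValueError(f"unrecognized mutation label: {label!r}")
    wild_type, position, replacement = match.groups()
    return wild_type, int(position), replacement


def apply_mutations(sequence, labels):
    """Return a copy of sequence with every listed substitution applied."""
    residues = list(sequence)
    for label in labels:
        wild_type, position, replacement = parse_mutation(label)
        if residues[position - 1] != wild_type:
            raise ValueError(f"{label}: expected {wild_type} at position {position}")
        residues[position - 1] = replacement
    return "".join(residues)


def percent_identity(reference, variant):
    """Percent of matching positions between two equal-length sequences."""
    if len(reference) != len(variant):
        raise ValueError("this sketch only handles substitutions (equal lengths)")
    matches = sum(a == b for a, b in zip(reference, variant))
    return 100.0 * matches / len(reference)


# Hypothetical 20-residue reference used only for illustration (NOT SEQ ID NO: 1).
toy_reference = "MTESHKLNAGTDVSFWYQRC"
toy_variant = apply_mutations(toy_reference, ["T2S", "H5L"])
toy_identity = percent_identity(toy_reference, toy_variant)
```

With two substitutions in a 20-residue toy sequence, 18 of 20 positions still match. Checking the wild-type residue before substituting catches labels that are numbered against a different reference.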
  • the protein comprises or consists of the amino acid sequence as set forth in any one of SEQ ID NOs: 11-153.
  • the protein cleaves a target sequence that is present in an exposed loop of an IL-23 protein. In some embodiments, the protein cleaves a target sequence as set forth as HPLVGHM (SEQ ID NO: 3). In some embodiments, the protein cleaves the canonical target sequence of a TEV protease, for example a target sequence set forth as ENLYFQS (SEQ ID NO: 2).
  • the disclosure provides a pharmaceutical composition comprising a protein as described herein and a pharmaceutically acceptable excipient.
  • the disclosure provides an isolated nucleic acid encoding a protein comprising an amino acid sequence as set forth in any one of SEQ ID NOs: 11-153. In some aspects, the disclosure provides a host cell comprising said isolated nucleic acid.
  • the disclosure provides methods of reducing IL-23 activity
  • the method comprising administering to the extracellular environment (e.g., administering to the subject) an effective amount of a protein (e.g., a TEV variant protease) as described herein.
  • a protein (e.g., a TEV variant protease)
  • the extracellular environment is in vitro. In some embodiments, the extracellular environment is in vivo, for example an extracellular environment located within a subject. In some embodiments, the extracellular environment is characterized by increased IL-23 activity relative to a normal, healthy extracellular environment.
  • this disclosure relates to the surprising discovery that cleaving IL-23 using the evolved proteases described herein results in attenuated IL-17 secretion.
  • cleavage of the HPLVGHM (SEQ ID NO: 3) target site (e.g., by a protease described herein)
  • IL-17 secretion from a cell or cells located in the extracellular environment (e.g., a cell or cells of the subject) is reduced.
  • the cell is a mammalian cell, optionally a human cell or a mouse cell.
  • the cell is an immune cell, such as a macrophage, dendritic cell, or activated phagocytic cell.
  • the disclosure relates to methods of producing an evolved
  • PACE phage-assisted continuous evolution
  • proteases described herein (e.g., proteases evolved using PACE technology described herein)
  • FIG. 1 Overview of PACE of a protease.
  • a culture of host E. coli continuously dilutes a fixed-volume vessel containing an evolving population of selection phage (SP) in which the essential phage gene gIII has been replaced by a protease gene.
  • SP selection phage
  • These host cells contain an arabinose-inducible mutagenesis plasmid (MP) and an accessory plasmid (AP) that supplies gIII.
  • MP arabinose-inducible mutagenesis plasmid
  • AP accessory plasmid
  • gIII is made protease-dependent through the use of a protease-activated RNA polymerase (PA-RNAP) consisting of T7 RNA polymerase fused through a cleavable substrate linker to T7 lysozyme, a natural inhibitor of T7 RNAP transcription.
  • PA-RNAP protease-activated RNA polymerase
  • If an SP encodes a protease capable of cleaving the substrate linker, then the resulting liberation of T7 RNAP leads to the production of pIII and infectious progeny phage encoding active proteases.
  • SP encoding proteases that cannot cleave the PA-RNAP yield non-infectious progeny phage.
  • FIG. 2 Evolutionary trajectories and representative evolved TEV protease genotypes. Each arrow represents a PACE experiment with the corresponding substrate peptide (sequences are given by SEQ ID NOs: 3-8, 173, and 184) and selection stringency parameters listed beneath the arrow. Increased selection stringency annotations are: Q649S (a T7 RNAP mutant with decreased transcriptional activity), proA (lower expression of substrate PA-RNAP), and IL-23 (38-66) (native IL-23 sequence in place of GGS linker). Numbers above the arrows denote TEV protease residues that were targeted in site-saturation mutagenesis libraries used to initiate that PACE experiment.
  • FIG. 3A Overview of phage substrate display. M13 bacteriophage libraries contain pIII fused to a FLAG-tag through a randomized protease substrate linker.
  • substrate phage are bound to anti-FLAG magnetic beads and treated with a protease to release phage that encode substrates that can be cleaved by the protease.
  • the remaining intact substrate phage are eluted with excess FLAG peptide.
  • the abundance of all substrate sequences within the cleaved and eluted samples is measured by high-throughput sequencing. (Figures 3B to 3E)
  • phage substrate display was separately performed on seven libraries, each with a different single randomized position within the ENLYFQS (SEQ ID NO: 2) motif. The resulting enrichment values are displayed as sequence logos, with enrichment values above zero indicating protease acceptance, and values below zero indicating rejection.
  • FIG. 3B Wild-type TEV protease exhibits strong enrichment for the consensus motif EXLYFQS (SEQ ID NO: 168).
  • Figure 3C Evolved TEV L2F (SEQ ID NO: 137) has broadened specificity at P6 and shifted specificity at P3, P1, and P1' in accordance with the HPLVGHM (SEQ ID NO: 3) target substrate.
  • Figure 3D Mutations I138T, N171D, and N176T are sufficient to broaden P6 specificity.
  • Figure 3E Mutations T146S, D148P, S153N, S170A, and N177M shift specificity at both P1 and P3.
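The phage substrate display readout described for Figures 3A to 3E compares how often each substrate sequence appears in the cleaved sample versus the eluted (uncleaved) sample. The text only states that values above zero indicate acceptance and values below zero indicate rejection; the exact formula is not given here. The sketch below assumes a log2 ratio of per-sample frequencies with a pseudocount, which is one common way to obtain a score with that sign convention. The function name, the pseudocount, and the read counts are all made up for illustration.

```python
import math
from collections import Counter


def enrichment_scores(cleaved_reads, eluted_reads, pseudocount=1.0):
    """log2 ratio of each substrate's frequency in the cleaved vs. eluted sample.

    Assumed scoring scheme: positive means the substrate is over-represented
    among cleaved phage, negative means it is over-represented among phage
    that stayed intact.
    """
    cleaved = Counter(cleaved_reads)
    eluted = Counter(eluted_reads)
    substrates = set(cleaved) | set(eluted)
    # The pseudocount keeps sequences seen in only one sample from dividing by zero.
    cleaved_total = sum(cleaved.values()) + pseudocount * len(substrates)
    eluted_total = sum(eluted.values()) + pseudocount * len(substrates)
    scores = {}
    for substrate in substrates:
        cleaved_freq = (cleaved[substrate] + pseudocount) / cleaved_total
        eluted_freq = (eluted[substrate] + pseudocount) / eluted_total
        scores[substrate] = math.log2(cleaved_freq / eluted_freq)
    return scores


# Made-up reads for a library randomized at the first (P6) position of ENLYFQS.
cleaved_reads = ["ENLYFQS"] * 7 + ["KNLYFQS"] * 1
eluted_reads = ["ENLYFQS"] * 1 + ["KNLYFQS"] * 7
scores = enrichment_scores(cleaved_reads, eluted_reads)
```

In this toy data set the wild-type sequence scores above zero and the P6 lysine variant scores below zero, mirroring the acceptance and rejection reading of the sequence logos.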
  • FIG. 4 Protease-mediated attenuation of IL-17 secretion in mouse splenocytes.
  • the activity of IL-23 in vivo is mediated by stabilization of a T-helper cell lineage (Th17) that secretes IL-17, leading to downstream pro-inflammatory signals.
  • This pathway can be assayed within a culture of mouse mononuclear splenocytes, by measuring the amount of IL-17 secretion into the cell culture media using an ELISA.
  • anti-IL-23 antibodies in a super-stoichiometric ratio prevent the secretion of IL-17.
  • FIG. 5 Selection Phage Plasmid Map.
  • the M13 bacteriophage gene gIII has been replaced with the gene of interest to be evolved, maltose-binding protein (MBP) fused to TEV through a GGS-linker.
  • MBP maltose-binding protein
  • FIG. 6 Accessory Plasmid Map.
  • a single accessory plasmid is used to supply the PA-RNAP construct under constitutive expression as well as supply gIII under control of the T7 promoter.
  • a lysozyme-dependent terminator is placed downstream of the T7 promoter to lower transcription of gIII in the absence of active protease.
  • TEV protease clones (corresponding genotypes can be found in Table 3) after evolution on the first stepping-stone substrate show apparent proteolytic activity on both the wild-type substrate ENLYFQS (SEQ ID NO: 2) and the single mutant substrate HNLYFQS (SEQ ID NO: 4). Error bars represent the standard deviation of three technical replicates.
  • TEV protease clones from trajectories 1 and 2 (corresponding genotypes can be found in Table 4) after evolution on the second stepping-stone substrate, ENLYGQS (SEQ ID NO: 5), show activity on the wild-type substrate ENLYFQS (SEQ ID NO: 2) and both single mutant substrates (HNLYFQS (SEQ ID NO: 4), ENLYGQS (SEQ ID NO: 5)). Error bars represent the standard deviation of three technical replicates.
  • FIG. 9 Luciferase Activity Assay after PACE 3 of Trajectory 3.
  • TEV protease clones from trajectory 3 (corresponding genotypes can be found in Table 5) after evolution on the second stepping-stone substrate, HNLYFHS (SEQ ID NO: 6), show apparent activity on the wild-type substrate (SEQ ID NO: 2) and the double mutant substrate, HNLYFHS (SEQ ID NO: 6). Error bars represent the standard deviation of three technical replicates.
  • FIG. 10 Luciferase Activity Assay of Clones after PACE 5.
  • PACE evolved TEV SP clones (corresponding genotypes can be found in Table 7) from stage four of the evolutionary trajectories show proteolysis of HPLVGHM (SEQ ID NO: 3) and ENLYFQS (SEQ ID NO: 2) substrates within a protease-activated RNA polymerase as measured by downstream luciferase signal. These data indicate that the evolved enzymes were acquiring the desired phenotype. Error bars represent the standard deviation of three technical replicates.
  • TEV variant prior to stringency modulation in PACE. Protease-induced luminescence assays were performed on a number of accessory plasmids (APs) that were expected to exert higher selection stringency.
  • APs accessory plasmids
  • the HPLVGHM (SEQ ID NO: 3) proB AP exhibits robust protease-induced luminescence and fold activation of 4.7.
  • When the flexible GGS-linkers in the PA-RNAP of the standard AP are replaced with the native sequence of IL-23 (amino acids 38-66), protease-induced luminescence is diminished (fold-activation 2.8).
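The fold-activation values quoted for the luminescence assays (4.7 and 2.8) are not defined in this excerpt. A minimal sketch, assuming fold activation means the mean luminescence with protease divided by the mean luminescence without protease, and using the sample standard deviation of three technical replicates as in the figure legends. The function name and all readings are hypothetical.

```python
from statistics import mean, stdev


def fold_activation(with_protease, without_protease):
    """Assumed definition: mean signal with protease over mean signal without it."""
    return mean(with_protease) / mean(without_protease)


# Made-up luminescence readings (arbitrary units), three technical replicates each.
with_protease = [3900.0, 4000.0, 4100.0]
without_protease = [950.0, 1000.0, 1050.0]

ratio = fold_activation(with_protease, without_protease)
spread = stdev(with_protease)  # sample standard deviation across the replicates
```

Under this definition a more stringent accessory plasmid would show a lower ratio for the same protease, which is how the drop from 4.7 to 2.8 reads in the text.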
  • FIG. 12 Luciferase Activity Assay of Clones after PACE 9. After multiple PACE experiments with increasing levels of positive selection stringency, many TEV protease variants (corresponding genotypes can be found in Table 11) exhibit markedly stronger apparent activity on the HPLVGHM (SEQ ID NO: 3) substrate when compared with clones from previous PACE experiments such as those seen in Figure 8. Error bars represent the standard deviation of three technical replicates.
  • Figure 14 In Vitro Proteolysis Assays to Select Highest Activity Clone.
  • TEV protease variants from the final PACE time point were overexpressed and purified. Approximately 1 μg of protease was incubated with 5 μg of a fusion protein construct in which MBP is linked to GST through a cleavable substrate linker (in this case the substrate was HPLVGHM (SEQ ID NO: 3)). Here it was observed that TEV L2F (SEQ ID NO: 137) exhibits the highest catalytic activity. Note that TEV protease variants L1F and L5B encode premature stop codons leading to products with approximately the same molecular weight as GST. Consequently, the intensity of the MBP product band best reflects reaction efficiency.
  • FIGS. 15A to 15D HPLC Assay of TEV Protease Kinetics.
  • TEV protease substrate peptides and the corresponding product peptides in a 1:1 mixture are separable by reverse-phase liquid chromatography.
  • Figure 15B WT TEV protease (0.1 μM) was incubated for 10 minutes at 30 °C with ENLYFQS (SEQ ID NO: 2) substrate concentration ranging from 50 to 800 μM.
  • Figure 15C TEV L2F (SEQ ID NO: 137) protease (0.1 μM) was incubated for 10 minutes at 30 °C with HPLVGHM (SEQ ID NO: 3) substrate concentration ranging from 50 to 2000 μM.
  • TEV proteases were assayed on a panel of substrate sequences. WT TEV efficiently cleaves wild-type substrate, and to a much lesser degree processes single mutant substrates (HNLYFQS (SEQ ID NO: 4), ENLYFHS (SEQ ID NO: 7), ENLYGQS (SEQ ID NO: 5)). Evolved TEV protease clone L2F yields a visible product band for the target substrate HPLVGHM (SEQ ID NO: 3). However, this evolved protease has also maintained activity on wild-type, single, double and triple mutant substrates that were used as evolutionary stepping-stones in PACE. Sequences are given by SEQ ID NOs.: 2-7 and 173.
  • Randomized Substrate Amino Acids The logos depicted were generated using phage substrate libraries containing windows of three randomized amino acids within either the ENLYFQS (SEQ ID NO: 2) or the HPLVGHM (SEQ ID NO: 3) substrate (corresponding enrichment values in Table 19).
  • the nature of the library (sequences are given by SEQ ID NOs.: 174-183) and the protease (sequences are given by SEQ ID NOs: 1 and 173) that was used in the selection is specified in the title above each sequence logo.
  • TEV protease variants were engineered to contain groups of mutations taken from the L2F variant (SEQ ID NO: 137). These enzymes were purified and assayed in vitro on test substrate, MBP-GST, containing the wild-type substrate motif ENLYFQS (SEQ ID NO: 2). All assayed variants retained proteolytic activity despite the naive genetic dissection of mutations.
  • the logos above were generated using phage substrate libraries each containing a single randomized amino acid within the ENLYFQS (SEQ ID NO: 2) substrate (corresponding enrichment values in Table 18).
  • the genotype of the protease that was used in the selection is specified in the title above each sequence logo.
  • the above specificity profiles all exhibit the ENLYFQS (SEQ ID NO: 2) consensus motif that is characteristic of wild-type TEV protease specificity.
  • Figures 20A to 20B Identification of IL-23 Cleavage Sites by Western
  • IL-23 heterodimer IL-23
  • IL-23 monomer IL-23p19
  • Reaction mixtures were subject to LC-MS and visualized by Western blot with anti-IL-23p19 monoclonal antibody (Figure 20A).
  • Bands 1 and 3 correspond to intact IL-23p19; differences in size are due to carboxy-terminal affinity purification tags.
  • Cleavage product bands 2 and 4 correspond to IL-23 fragments with new masses that are 3,598 Da less than the corresponding starting materials. This mass difference corresponds to the fragment liberated by cleavage at the target site (HPLVGH//M; SEQ ID NO: 8).
  • Cleavage of the monomer also results in a second product (band 5) with a mass that matches IL-23 cleaved at both the target site (HPLVGH//M; SEQ ID NO: 8) and an off-target site (ARVFAH//G; SEQ ID NO: 9).
  • the IL-23p19 amino acid sequence (SEQ ID NO: 195) is shown with the target cleavage site in bold and the off-target site in italics.
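The "//" in HPLVGH//M and ARVFAH//G marks the scissile bond, so cleavage leaves the residues before "//" on the upstream fragment. The sketch below is an illustration only: it locates such sites in a sequence and returns the resulting fragments. The function name is hypothetical, and the substrate string is a made-up construct (the two motifs joined by GGS spacers), not the IL-23p19 sequence of SEQ ID NO: 195.

```python
def cleave(sequence, sites):
    """Split sequence at every occurrence of each site written as 'UPSTREAM//DOWNSTREAM'."""
    cut_points = set()
    for site in sites:
        upstream, downstream = site.split("//")
        motif = upstream + downstream
        start = sequence.find(motif)
        while start != -1:
            # The cut falls immediately after the last upstream residue.
            cut_points.add(start + len(upstream))
            start = sequence.find(motif, start + 1)
    fragments = []
    previous = 0
    for cut in sorted(cut_points):
        fragments.append(sequence[previous:cut])
        previous = cut
    fragments.append(sequence[previous:])
    return fragments


# Made-up substrate: both motifs separated by GGS spacers (not a real IL-23 sequence).
toy_substrate = "GGSARVFAHGGGSHPLVGHMGGS"
target_only = cleave(toy_substrate, ["HPLVGH//M"])
both_sites = cleave(toy_substrate, ["HPLVGH//M", "ARVFAH//G"])
```

Cutting at the target site alone gives two fragments, and cutting at both sites gives three, which corresponds to the singly and doubly cleaved products described for the monomer.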
  • IL-23 was procured in its native state as a
  • the reaction mixture (Figures 21C and 21D) contains a 27,768 Da mass that matches TEV L2F (SEQ ID NO: 137) as well as a 15,875 Da mass, which matches the expected cleavage product plus an unspecified 751 Da C-terminal tag.
  • IL-23p19 was procured in its monomeric form expressed and purified from cultured HEK293T cells using a C-terminal Myc/DDK tag (TP309680, Origene). This protein was incubated under reducing conditions either in the presence or absence of TEV L2F (SEQ ID NO: 137). These samples were then analyzed by LC-MS to yield total ion current (Figures 22A and 22C) and the corresponding deconvoluted mass spectra (Figures 22B and 22D). The unreacted sample (Figures 22A and 22B) contains a mass of 22,324 Da which corresponds to the IL-23p19 sequence and Myc tag in the product data.
  • the reaction mixture (Figures 22C and 22D) contains three additional masses: TEV L2F (SEQ ID NO: 137) (27,768 Da), substrate cleaved only at the HPLVGHM (SEQ ID NO: 3) target site (18,727 Da), and substrate cleaved at both the target site and an off-target site ARVFAHG (SEQ ID NO: 10) (14,526 Da).
  • IL-23 and TEV proteases were incubated for 16 hours at 4 °C in the presence of BSA as a stabilizing carrier protein.
  • Samples were prepared at 300x concentration used in splenocyte cultures to enable detection of IL-23p19 and IL-12p40 by Western blot. Neither component is proteolyzed by wild-type TEV protease; IL-12p40 is also unaffected by TEV L2F (SEQ ID NO: 137).
  • TEV L2F SEQ ID NO: 137
  • IL-23 and TEV proteases were incubated for 16 hours at 4 °C in the presence of BSA as a stabilizing carrier protein.
  • Samples were prepared at 300x concentration used in splenocyte cultures to enable detection of IL-23p19 and IL-12p40 by Western blot.
  • HPLVGHM SEQ ID NO: 3
  • IL-17 is secreted by cultured mouse mononuclear splenocytes in response to human IL-23 in the media.
  • the secretion of IL-17 can be prevented by pretreatment of IL-23 with TEV L2F (SEQ ID NO: 137) at a dose that is less than half the molar equivalent of IL-23.
  • Inhibition began at a dose corresponding to 0.7 nM TEV L2F (SEQ ID NO: 137) (compared with 1.9 nM IL-23), confirming that IL-23 is deactivated with catalytic turnover by TEV L2F.
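The sub-stoichiometric claim can be checked directly from the two concentrations quoted above (0.7 nM protease, 1.9 nM IL-23). The short sketch below only restates that arithmetic; the function name is hypothetical, and the turnover figure is an illustrative bound that assumes every IL-23 molecule ends up cleaved, which the text does not state.

```python
def protease_to_substrate_ratio(protease_nm, substrate_nm):
    """Molar ratio of protease to substrate at the given concentrations."""
    return protease_nm / substrate_nm


# Concentrations quoted in the text: 0.7 nM TEV L2F and 1.9 nM IL-23.
ratio = protease_to_substrate_ratio(0.7, 1.9)

# If all IL-23 were cleaved (an assumption), each protease molecule would
# have to process this many substrate molecules on average.
substrates_per_protease = 1 / ratio
```

A ratio below 0.5 is consistent with the "less than half the molar equivalent" statement, and a substrates-per-protease value above 1 is what distinguishes catalytic turnover from one-to-one binding.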
  • FIG. 26 IL-23 induced IL-17 Secretion in Mouse Splenocytes.
  • IL-17 is secreted by cultured mouse mononuclear splenocytes in response to human IL-23 in the media. This response can be prevented by addition of antibodies that neutralize IL-23 directly to cell culture media.
  • a dose-dependent response is observed in which the antibody neutralizes IL-23 through a stoichiometric binding mechanism. Inhibition began at approximately 1.3 nM antibody (compared with 1.9 nM IL-23).
  • Evolved TEV L2F (SEQ ID NO: 137), when added directly to cell culture media, is unable to prevent IL-23 from stimulating IL-17 secretion.
  • FIG. 27 IL-23 In Vitro Cleavage Assay. IL-23 and TEV proteases were incubated for 16 hours at 4 °C in the presence of BSA as a stabilizing carrier protein. The addition of 10% Fetal Bovine Serum (FBS) to the assay buffer had no effect on the efficiency of cleavage by TEV L2F (SEQ ID NO: 137). The same percentage of FBS was used to supplement cell culture media, suggesting that components within serum are not responsible for loss of TEV L2F activity when added directly to splenocyte cell cultures.
  • FBS Fetal Bovine Serum
  • Figure 28 In Vitro Cleavage Assay. After positive selection for TEV protease variants that cleave the substrate ENLYAQS, a mixture of genotypes was enriched. Variants containing the mutation V216F cleaved only the ENLYAQS substrate but not the wild-type substrate ENLYFQS (SEQ ID NO: 2).
  • Figure 29 In Vitro Cleavage Assay. After simultaneous positive and negative selection for variants that cleaved the mutant substrate ENLYAQS but not the wild-type substrate ENLYFQS, all selected variants contained the V216F mutation.
  • The term "protease" refers to an enzyme that catalyzes the hydrolysis of a peptide (amide) bond linking amino acid residues together within a protein.
  • the term embraces both naturally occurring and engineered proteases. Many proteases are known in the art.
  • protease classes include, without limitation, serine proteases (serine alcohol), threonine proteases (threonine secondary alcohol), cysteine proteases (cysteine thiol), aspartate proteases (aspartate carboxylic acid), glutamic acid proteases (glutamate carboxylic acid), and metalloproteases (metal ion, e.g., zinc).
  • Some proteases are highly specific and only cleave substrates with a specific sequence.
  • Some blood clotting proteases such as, for example, thrombin, and some viral proteases such as, for example, HCV or TEV protease, are highly specific proteases.
  • Proteases that cleave in a very specific manner typically bind to multiple amino acid residues of their substrate.
  • proteases and protease cleavage sites, also sometimes referred to as "protease substrates," will be apparent to those of skill in the art and include, without limitation, proteases listed in the MEROPS database, accessible at merops.sanger.ac.uk and described in Rawlings et al., (2014) MEROPS: the database of proteolytic enzymes, their substrates and inhibitors. Nucleic Acids Res 42, D503-D509, the entire contents of each of which are incorporated herein by reference. The disclosure is not limited in this respect.
  • protein refers to a polymer of amino acid residues linked together by peptide bonds.
  • the term, as used herein, refers to proteins, polypeptides, and peptides of any size, structure, or function. Typically, a protein will be at least three amino acids long.
  • a protein may refer to an individual protein or a collection of proteins.
  • Inventive proteins preferably contain only natural amino acids, although non-natural amino acids (i.e., compounds that do not occur in nature but that can be incorporated into a polypeptide chain; see, for example, cco.caltech.edu/~dadgrp/Unnatstruct.gif, which displays structures of non-natural amino acids that have been successfully incorporated into functional ion channels) and/or amino acid analogs as are known in the art may alternatively be employed.
  • amino acids in an inventive protein may be modified, for example, by the addition of a chemical entity such as a carbohydrate group, a hydroxyl group, a phosphate group, a farnesyl group, an isofarnesyl group, a fatty acid group, a linker for conjugation, functionalization, or other modification, etc.
  • a protein may also be a single molecule or may be a multi-molecular complex.
  • a protein may be just a fragment of a naturally occurring protein or peptide.
  • a protein may be naturally occurring, recombinant, or synthetic, or any combination of these.
  • TEV tobacco Etch Virus
  • a wild-type TEV protease refers to the amino acid sequence of a TEV protease as it naturally occurs in a Tobacco Etch Virus genome.
  • the mutant protease with the single amino acid substitution S219V is referred to as wild-type; this variant is unable to cleave itself, thus preventing auto-inactivation.
  • An example of a wild-type S219V TEV protease is represented by the amino acid sequence set forth in SEQ ID NO: 1.
  • a wild-type TEV protease cleaves the canonical target peptide (e.g., substrate) ENLYFQS (SEQ ID NO: 2).
  • Genetically modified cells that heterologously express one or more TEV protease(s) are known in the art, for example, as described by Tropea et al., "Expression and purification of soluble His6-tagged TEV protease." High Throughput Protein Expression and Purification: Methods and Protocols (2009): 297-307.
  • TEV protease variant refers to a protein (e.g., a TEV protease) having one or more amino acid variations introduced into the amino acid sequence, e.g., as a result of application of the PACE method, as compared to the amino acid sequence of a naturally-occurring or wild-type TEV protein (e.g., SEQ ID NO: 1).
  • Amino acid sequence variations may include one or more mutated residues within the amino acid sequence of the protease, e.g., as a result of a change in the nucleotide sequence encoding the protease that results in a change in the codon at any particular position in the coding sequence, the deletion of one or more amino acids (e.g., a truncated protein), the insertion of one or more amino acids, or any combination of the foregoing.
  • a TEV protease variant cleaves a different target peptide (e.g., has broadened or different substrate specificity) relative to a wild-type TEV protease.
  • the term "continuous evolution,” as used herein, refers to an evolution procedure, in which a population of nucleic acids is subjected to multiple rounds of (a) replication, (b) mutation (or modification of the primary sequence of nucleotides of the nucleic acids in the population), and (c) selection to produce a desired evolved product, for example, a novel nucleic acid encoding a novel protein with a desired activity, wherein the multiple rounds of replication, mutation, and selection can be performed without investigator interaction, and wherein the processes (a)-(c) can be carried out simultaneously.
  • the evolution procedure is carried out in vitro, for example, using cells in culture as host cells.
  • a continuous evolution process relies on a system in which a gene of interest is provided in a nucleic acid vector that undergoes a life-cycle including replication in a host cell and transfer to another host cell, wherein a critical component of the life-cycle is deactivated and reactivation of the component is dependent upon a desired variation in an amino acid sequence of a protein encoded by the gene of interest.
  • the gene of interest is transferred from cell to cell in a manner dependent on the activity of the gene of interest.
  • the transfer vector is a virus infecting cells, for example, a bacteriophage, or a retroviral vector.
  • the viral vector is a phage vector infecting bacterial host cells.
  • the transfer vector is a conjugative plasmid transferred from a donor bacterial cell to a recipient bacterial cell.
  • the nucleic acid vector comprising the gene of interest is a phage, a viral vector, or naked DNA (e.g., a mobilization plasmid).
  • naked DNA e.g., a mobilization plasmid
  • transfer of the gene of interest from cell to cell is via infection, transfection, transduction, conjugation, or uptake of naked DNA, and efficiency of cell-to-cell transfer (e.g., transfer rate) is dependent on an activity of a product encoded by the gene of interest.
  • the nucleic acid vector is a phage harboring the gene of interest and the efficiency of phage transfer (via infection) is dependent on an activity of the gene of interest in that a protein required for the generation of phage particles (e.g., pIII for M13 phage) is expressed in the host cells only in the presence of the desired activity of the gene of interest.
  • some embodiments provide a continuous evolution system, in which a population of viral vectors comprising a gene of interest to be evolved replicates in a flow of host cells, e.g., a flow through a lagoon, wherein the viral vectors are deficient in a gene encoding a protein that is essential for the generation of infectious viral particles, and wherein that gene is in the host cell under the control of a conditional promoter that can be activated by a gene product encoded by the gene of interest, or a mutated version thereof.
  • the activity of the conditional promoter depends on a desired function of a gene product encoded by the gene of interest.
  • Viral vectors in which the gene of interest has not acquired a desired function as a result of a variation of amino acids introduced into the gene product protein sequence will not activate the conditional promoter, or may only achieve minimal activation, while any mutations introduced into the gene of interest that confer the desired function will result in activation of the conditional promoter. Since the conditional promoter controls an essential protein for the viral life cycle, e.g., pIII, activation of this promoter directly corresponds to an advantage in viral spread and replication for those vectors that have acquired an advantageous mutation.
  • a host cell flow refers to a stream of host cells, wherein fresh host cells are being introduced into a host cell population, for example, a host cell population in a lagoon, remain within the population for a limited time, and are then removed from the host cell population.
  • a host cell flow may be a flow through a tube, or a channel, for example, at a controlled rate.
  • a flow of host cells is directed through a lagoon that holds a volume of cell culture media and comprises an inflow and an outflow.
  • the introduction of fresh host cells may be continuous or intermittent and removal may be passive, e.g., by overflow, or active, e.g., by active siphoning or pumping. Removal further may be random, for example, if a stirred suspension culture of host cells is provided, removed liquid culture media will contain freshly introduced host cells as well as cells that have been a member of the host cell population within the lagoon for some time. Even though, in theory, a cell could escape removal from the lagoon indefinitely, the average host cell will remain only for a limited period of time within the lagoon, which is determined mainly by the flow rate of the culture media (and suspended cells) through the lagoon.
  • because the viral vectors replicate in a flow of host cells, in which fresh, uninfected host cells are provided while infected cells are removed, multiple consecutive viral life cycles can occur without investigator interaction, which allows for the accumulation of multiple advantageous mutations in a single evolution experiment.
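The dependence of host cell residence time on flow rate can be illustrated with a minimal sketch. It assumes an idealized, well-mixed lagoon of constant volume and ignores cell growth inside the lagoon; the function names and the example volume and flow rate are hypothetical and are not taken from this disclosure.

```python
import math


def mean_residence_time(lagoon_volume_ml: float, flow_rate_ml_per_h: float) -> float:
    """Mean time (hours) a host cell spends in an ideal well-mixed lagoon."""
    if lagoon_volume_ml <= 0 or flow_rate_ml_per_h <= 0:
        raise ValueError("volume and flow rate must be positive")
    return lagoon_volume_ml / flow_rate_ml_per_h


def fraction_remaining(t_h: float, lagoon_volume_ml: float, flow_rate_ml_per_h: float) -> float:
    """Fraction of the cells present at t = 0 that are still in the lagoon after t_h hours.

    Removal is random in a stirred culture, so washout is exponential:
    a cell can in principle stay indefinitely, but the fraction decays quickly.
    """
    return math.exp(-t_h * flow_rate_ml_per_h / lagoon_volume_ml)


# Hypothetical example: a 40 mL lagoon fed at 40 mL/h.
tau = mean_residence_time(40.0, 40.0)
left_after_3h = fraction_remaining(3.0, 40.0, 40.0)
```

Under these assumptions, doubling the flow rate halves the mean residence time, which is one way to picture why the flow rate is the main determinant of how long an average host cell stays in the lagoon.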
  • phage-assisted continuous evolution refers to continuous evolution that employs phage as viral vectors.
  • viral vector refers to a nucleic acid comprising a viral genome that, when introduced into a suitable host cell, can be replicated and packaged into viral particles able to transfer the viral genome into another host cell.
  • the term viral vector extends to vectors comprising truncated or partial viral genomes.
  • a viral vector is provided that lacks a gene encoding a protein essential for the generation of infectious viral particles.
  • in suitable host cells, for example, host cells comprising the lacking gene under the control of a conditional promoter, however, such truncated viral vectors can replicate and generate viral particles able to transfer the truncated viral genome into another host cell.
  • the viral vector is a phage, for example, a filamentous phage (e.g., an M13 phage).
  • a viral vector, for example, a phage vector, is provided that comprises a gene of interest to be evolved.
  • nucleic acid refers to a polymer of nucleotides.
  • the polymer may include natural nucleosides (i.e., adenosine, thymidine, guanosine, cytidine, uridine, deoxyadenosine, deoxythymidine, deoxyguanosine, and deoxycytidine), nucleoside analogs (e.g., 2-aminoadenosine, 2-thiothymidine, inosine, pyrrolo-pyrimidine, 3-methyl adenosine, 5-methylcytidine, C5-bromouridine, C5-fluorouridine, C5-iodouridine, C5-propynyl-uridine, C5-propynyl-cytidine, C5-methylcytidine, 7-deazaadenosine,
  • gene of interest or “gene encoding a protease of interest,” as used herein, refers to a nucleic acid construct comprising a nucleotide sequence encoding a gene product, e.g., a protease, of interest to be evolved in a continuous evolution process as described herein.
  • the term includes any variations of a gene of interest that are the result of a continuous evolution process according to methods described herein.
  • a gene of interest is a nucleic acid construct comprising a nucleotide sequence encoding a protease to be evolved, cloned into a viral vector, for example, a phage genome, so that the expression of the encoding sequence is under the control of one or more promoters in the viral genome.
  • a gene of interest is a nucleic acid construct comprising a nucleotide sequence encoding a protease to be evolved and a promoter operably linked to the encoding sequence.
  • the expression of the encoding sequence of such genes of interest is under the control of the heterologous promoter and, in some embodiments, may also be influenced by one or more promoters in the viral genome.
  • function of a gene of interest refers to a function or activity of a gene product, for example, a nucleic acid or a protein, encoded by the gene of interest.
  • a function of a gene of interest may be an enzymatic activity (e.g., an enzymatic activity resulting in the generation of a reaction product, phosphorylation activity, phosphatase activity, etc.), an ability to activate transcription (e.g., transcriptional activation activity targeted to a specific promoter sequence), a bond-forming activity (e.g., an enzymatic activity resulting in the formation of a covalent bond), or a binding activity (e.g., a protein, DNA, or RNA binding activity).
  • promoter refers to a nucleic acid molecule with a sequence recognized by the cellular transcription machinery and able to initiate transcription of a downstream gene.
  • a promoter can be constitutively active, meaning that the promoter is always active in a given cellular context, or conditionally active, meaning that the promoter is only active under specific conditions.
  • a conditional promoter may only be active in the presence of a specific protein that connects a protein associated with a regulatory element in the promoter to the basic transcriptional machinery, or only in the absence of an inhibitory molecule.
  • a subclass of conditionally active promoters are inducible promoters that require the presence of a small molecule "inducer" for activity.
  • inducible promoters include, but are not limited to, arabinose-inducible promoters, Tet-on promoters, and tamoxifen-inducible promoters.
  • constitutive, conditional, and inducible promoters are well known to the skilled artisan, and the skilled artisan will be able to ascertain a variety of such promoters useful in carrying out the instant invention, which is not limited in this respect.
  • viral particle refers to a viral genome, for example, a DNA or RNA genome, that is associated with a coat of a viral protein or proteins, and, in some cases, with an envelope of lipids.
  • a phage particle comprises a phage genome packaged into a protein encoded by the wild type phage genome.
  • infectious viral particle refers to a viral particle able to transport the viral genome it comprises into a suitable host cell. Not all viral particles are able to transfer the viral genome to a suitable host cell. Particles unable to accomplish this are referred to as non-infectious viral particles.
  • a viral particle comprises a plurality of different coat proteins, wherein one or some of the coat proteins can be omitted without compromising the structure of the viral particle.
  • a viral particle is provided in which at least one coat protein cannot be omitted without the loss of infectivity. If a viral particle lacks a protein that confers infectivity, the viral particle is not infectious.
  • an M13 phage particle that comprises a phage genome packaged in a coat of phage proteins (e.g., pVIII) but lacks pIII (protein III) is a non-infectious M13 phage particle because pIII is essential for the infectious properties of M13 phage particles.
  • viral life cycle refers to the viral reproduction cycle comprising insertion of the viral genome into a host cell, replication of the viral genome in the host cell, and packaging of a replication product of the viral genome into a viral particle by the host cell.
  • the viral vector provided is a phage.
  • phage refers to a virus that infects bacterial cells.
  • phages consist of an outer protein capsid enclosing genetic material.
  • the genetic material can be ssRNA, dsRNA, ssDNA, or dsDNA, in either linear or circular form.
  • Phages and phage vectors are well known to those of skill in the art and non-limiting examples of phages that are useful for carrying out the methods provided herein are λ (Lysogen), T2, T4, T7, T12, R17, M13, MS2, G4, P1, P2, P4, Phi X174, N4, Φ6, and Φ29.
  • the phage utilized in the present invention is M13. Additional suitable phages and host cells will be apparent to those of skill in the art, and the invention is not limited in this aspect. For an exemplary description of additional suitable phages and host cells, see Elizabeth Kutter and Alexander Sulakvelidze: Bacteriophages: Biology and Applications. CRC Press; 1st edition (December 2004), ISBN: 0849313368; Martha R. J. Clokie and Andrew M. Kropinski: Bacteriophages: Methods and Protocols, Volume 1:
  • the phage is a filamentous phage.
  • the phage is an M13 phage.
  • M13 phages are well known to those in the art and the biology of M13 phages has been studied extensively. Wild-type M13 phage particles comprise a circular, single-stranded genome of approximately 6.4 kb.
  • the wild-type genome of an M13 phage includes eleven genes, gI-gXI, which, in turn, encode the eleven M13 proteins, pI-pXI, respectively.
  • gVIII encodes pVIII, also often referred to as the major structural protein of the phage particles, while gIII encodes pIII, also referred to as the minor coat protein, which is required for infectivity of M13 phage particles.
  • the M13 life cycle includes attachment of the phage to the sex pilus of a suitable bacterial host cell via the pIII protein and insertion of the phage genome into the host cell.
  • the circular, single-stranded phage genome is then converted to a circular, double-stranded DNA, also termed the replicative form (RF), from which phage gene transcription is initiated.
  • the wild type M13 genome comprises nine promoters and two transcriptional terminators as well as an origin of replication. This series of promoters provides a gradient of transcription such that the genes nearest the two transcriptional terminators (gVIII and IV) are transcribed at the highest levels. In wild-type M13 phage, transcription of all 11 genes proceeds in the same direction.
  • One of the phage-encoded proteins, pII, initiates the generation of linear, single-stranded phage genomes in the host cells, which are subsequently circularized, and bound and stabilized by pV.
  • the circularized, single-stranded M13 genomes are then bound by pVIII, while pV is stripped off the genome, which initiates the packaging process.
  • a number of copies of pIII are attached to wild-type M13 particles, thus generating infectious phage ready to infect another host cell and concluding the life cycle.
  • the M13 phage genome can be manipulated, for example, by deleting one or more of the wild type genes, and/or inserting a heterologous nucleic acid construct into the genome.
  • M13 does not have stringent genome size restrictions, and insertions of up to 42 kb have been reported. This allows M13 phage vectors to be used in continuous evolution experiments to evolve genes of interest without imposing a limitation on the length of the gene to be evolved.
  • selection phage, as used herein interchangeably with the term "selection plasmid," refers to a modified phage that comprises a gene of interest to be evolved and lacks a full-length gene encoding a protein required for the generation of infectious phage particles.
  • some M13 selection phages provided herein comprise a nucleic acid sequence encoding a protease to be evolved, e.g., under the control of an M13 promoter, and lack all or part of a phage gene encoding a protein required for the generation of infectious phage particles, e.g., gI, gII, gIII, gIV, gV, gVI, gVII, gVIII, gIX, or gX, or any combination thereof.
  • some M13 selection phages provided herein comprise a nucleic acid sequence encoding a protease to be evolved, e.g., under the control of an M13 promoter, and lack all or part of a gene encoding a protein required for the generation of infective phage particles, e.g., the gIII gene encoding the pIII protein.
  • helper phage, as used herein interchangeably with the terms "helper phagemid" and "helper plasmid," refers to a nucleic acid construct comprising a phage gene required for the phage life cycle, or a plurality of such genes, but lacking a structural element required for genome packaging into a phage particle.
  • a helper phage may provide a wild-type phage genome lacking a phage origin of replication.
  • a helper phage is provided that comprises a gene required for the generation of phage particles, but lacks a gene required for the generation of infectious particles, for example, a full-length pIII gene.
  • the helper phage provides only some, but not all, genes required for the generation of phage particles.
  • Helper phages are useful to allow modified phages that lack a gene required for the generation of phage particles to complete the phage life cycle in a host cell.
  • a helper phage will comprise the genes required for the generation of phage particles that are lacking in the phage genome, thus complementing the phage genome.
  • the helper phage typically complements the selection phage, but both lack a phage gene required for the production of infectious phage particles.
  • replication product refers to a nucleic acid that is the result of viral genome replication by a host cell. This includes any viral genomes synthesized by the host cell from a viral genome inserted into the host cell. The term includes non-mutated as well as mutated replication products.
  • the term "accessory plasmid,” as used herein, refers to a plasmid comprising a gene required for the generation of infectious viral particles under the control of a conditional promoter.
  • the conditional promoter of the accessory plasmid is typically activated by a function of the gene of interest to be evolved. Accordingly, the accessory plasmid serves the function of conveying a competitive advantage to those viral vectors in a given population of viral vectors that carry a gene of interest able to activate the conditional promoter.
  • the conditional promoter of the accessory plasmid is a promoter the transcriptional activity of which can be regulated over a wide range, for example, over 2, 3, 4, 5, 6, 7, 8, 9, or 10 orders of magnitude by the activating function (for example, function of a protein encoded by the gene of interest).
  • the level of transcriptional activity of the conditional promoter depends directly on the desired function of the gene of interest. This allows for starting a continuous evolution process with a viral vector population comprising versions of the gene of interest that only show minimal activation of the conditional promoter.
  • any mutation in the gene of interest that increases activity of the conditional promoter directly translates into higher expression levels of the gene required for the generation of infectious viral particles, and, thus, into a competitive advantage over other viral vectors carrying minimally active or loss-of-function versions of the gene of interest.
  • the stringency of selective pressure imposed by the accessory plasmid in a continuous evolution procedure as provided herein can be modulated.
  • the use of low copy number accessory plasmids results in an elevated stringency of selection for versions of the gene of interest that activate the conditional promoter on the accessory plasmid, while the use of high copy number accessory plasmids results in a lower stringency of selection.
  • the terms "high copy number plasmid" and "low copy number plasmid" are art-recognized and those of skill in the art will be able to ascertain whether a given plasmid is a high or low copy number plasmid.
  • a low copy number accessory plasmid is a plasmid exhibiting an average copy number of plasmid per host cell in a host cell population of about 5 to about 100.
  • a very low copy number accessory plasmid is a plasmid exhibiting an average copy number of plasmid per host cell in a host cell population of about 1 to about 10.
  • a very low copy number accessory plasmid is a single-copy per cell plasmid.
  • a high copy number accessory plasmid is a plasmid exhibiting an average copy number of plasmid per host cell in a host cell population of about 100 to about 5000. The copy number of an accessory plasmid will depend to a large part on the origin of replication employed. Those of skill in the art will be able to determine suitable origins of replication in order to achieve a desired copy number.
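The copy number ranges given above can be captured in a small lookup. This is only a sketch: the ranges are stated as approximate ("about") and the very low and low ranges overlap, so the hypothetical helper below returns every class whose range contains the measured average rather than forcing a single label.

```python
from typing import Dict, List, Tuple

# Approximate average copies per host cell, as stated in the text above.
COPY_NUMBER_CLASSES: Dict[str, Tuple[float, float]] = {
    "very low": (1, 10),
    "low": (5, 100),
    "high": (100, 5000),
}


def copy_number_classes(avg_copies_per_cell: float) -> List[str]:
    """Return every copy number class whose (approximate) range contains the value."""
    return [
        name
        for name, (lower, upper) in COPY_NUMBER_CLASSES.items()
        if lower <= avg_copies_per_cell <= upper
    ]


# Hypothetical measurements of average accessory plasmid copies per cell.
print(copy_number_classes(7))     # falls in both the very low and low ranges
print(copy_number_classes(1000))  # high copy number
```

Because lower copy number corresponds to higher selection stringency in the text, a caller could map the returned class to a qualitative stringency level; that mapping is left out here because the disclosure does not quantify it.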
  • the stringency of selective pressure imposed by the accessory plasmid can also be modulated through the use of mutant or alternative conditional transcription factors with higher or lower transcriptional output (e.g., a T7 RNA polymerase comprising a Q649S mutation).
  • the use of lower transcriptional output results in an elevated stringency of selection for versions of the gene of interest that activate the conditional promoter on the accessory plasmid, while the use of higher transcriptional output machinery results in a lower stringency of selection.
  • the function of the accessory plasmid namely to provide a gene required for the generation of viral particles under the control of a conditional promoter the activity of which depends on a function of the gene of interest, can be conferred to a host cell in alternative ways.
  • Such alternatives include, but are not limited to, permanent insertion of a gene construct comprising the conditional promoter and the respective gene into the genome of the host cell, or introducing it into the host cell using a different vector, for example, a phagemid, a cosmid, a phage, a virus, or an artificial chromosome. Additional ways to confer accessory plasmid function to host cells will be evident to those of skill in the art, and the invention is not limited in this respect.
  • mutagen refers to an agent that induces mutations or increases the rate of mutation in a given biological system, for example, a host cell, to a level above the naturally-occurring level of mutation in that system.
  • Some exemplary mutagens useful for continuous evolution procedures are provided elsewhere herein and other useful mutagens will be evident to those of skill in the art.
  • Useful mutagens include, but are not limited to, ionizing radiation, ultraviolet radiation, base analogs, deaminating agents (e.g., nitrous acid), intercalating agents (e.g.
  • alkylating agents (e.g., ethylnitrosourea)
  • transposons, bromine, azide salts, psoralen, benzene, 3-chloro-4-(dichloromethyl)-5-hydroxy-2(5H)-furanone (MX) (CAS no. 77439-76-0), O,O-dimethyl-S-(phthalimidomethyl)phosphorodithioate (phos-met) (CAS no. 732-11-6), formaldehyde (CAS no. 50-00-0), 2-(2-furyl)-3-(5-nitro-2-furyl)acrylamide (AF-2) (CAS no. 3688-53-7), glyoxal (CAS no.
  • 6-mercaptopurine (CAS no. 50-44-2), N-(trichloromethylthio)-4-cyclohexane-1,2-dicarboximide (captan) (CAS no. 133-06-2), 2-aminopurine (CAS no. 452-06-2), methyl methane sulfonate (MMS) (CAS no. 66-27-3), 4-nitroquinoline 1-oxide (4-NQO) (CAS no. 56-57-5), N4-aminocytidine (CAS no. 57294-74-3), sodium azide (CAS no. 26628-22-8), N-ethyl-N-nitrosourea (ENU) (CAS no.
  • N-methyl-N-nitrosourea (MNU) (CAS no. 820-60-0), 5-azacytidine (CAS no. 320-67-2), cumene hydroperoxide (CHP) (CAS no. 80-15-9), ethyl methanesulfonate (EMS) (CAS no. 62-50-0), N-ethyl-N'-nitro-N-nitrosoguanidine (ENNG) (CAS no. 4245-77-6), N-methyl-N'-nitro-N-nitrosoguanidine (MNNG) (CAS no. 70-25-7), 5-diazouracil (CAS no. 2435-76-9) and t-butyl hydroperoxide (BHP) (CAS no. 75-91-2). Additional mutagens can be used in continuous evolution procedures as provided herein, and the invention is not limited in this respect.
  • a mutagen is used at a concentration or level of exposure that induces a desired mutation rate in a given host cell or viral vector population, but is not significantly toxic to the host cells used within the average time frame a host cell is exposed to the mutagen or the time a host cell is present in the host cell flow before being replaced by a fresh host cell.
  • mutagenesis plasmid refers to a plasmid comprising a gene encoding a gene product that acts as a mutagen.
  • the gene encodes a DNA polymerase lacking a proofreading capability.
  • the gene is a gene involved in the bacterial SOS stress response, for example, a UmuC, UmuD', or RecA gene.
  • the gene is a GATC methylase gene, for example, a deoxyadenosine methylase (dam methylase) gene.
  • the gene is involved in binding of hemimethylated GATC sequences, for example a seqA gene.
  • the gene is involved with repression of mutagenic nucleobase export, for example emrR.
  • the gene is involved with inhibition of uracil DNA-glycosylase, for example a Uracil Glycosylase Inhibitor (ugi) gene.
  • the gene is involved with deamination of cytidine (e.g., a cytidine deaminase from Petromyzon marinus), for example, cytidine deaminase 1 (CDA1).
  • the term "host cell,” as used herein, refers to a cell that can host a viral vector useful for a continuous evolution process as provided herein.
  • a cell can host a viral vector if it supports expression of genes of the viral vector, replication of the viral genome, and/or the generation of viral particles.
  • One criterion to determine whether a cell is a suitable host cell for a given viral vector is to determine whether the cell can support the viral life cycle of a wild-type viral genome that the viral vector is derived from. For example, if the viral vector is a modified M13 phage genome, as provided in some embodiments described herein, then a suitable host cell would be any cell that can support the wild-type M13 phage life cycle.
  • Suitable host cells for viral vectors useful in continuous evolution processes are well known to those of skill in the art, and the invention is not limited in this respect.
  • modified viral vectors are used in continuous evolution processes as provided herein.
  • such modified viral vectors lack a gene required for the generation of infectious viral particles.
  • a suitable host cell is a cell comprising the gene required for the generation of infectious viral particles, for example, under the control of a constitutive or a conditional promoter (e.g., in the form of an accessory plasmid, as described herein).
  • the viral vector used lacks a plurality of viral genes.
  • a suitable host cell is a cell that comprises a helper construct providing the viral genes required for the generation of viral particles. A cell is not required to actually support the life cycle of a viral vector used in the methods provided herein.
  • a cell comprising a gene required for the generation of infectious viral particles under the control of a conditional promoter may not support the life cycle of a viral vector that does not comprise a gene of interest able to activate the promoter, but it is still a suitable host cell for such a viral vector.
  • the viral vector is a phage
  • the host cell is a bacterial cell.
  • the host cell is an E. coli cell.
  • Suitable E. coli host strains will be apparent to those of skill in the art, and include, but are not limited to, New England Biolabs (NEB) Turbo, Top10F', DH12S, ER2738, ER2267, XL1-Blue MRF', and DH10B. These strain names are art recognized, and the genotype of these strains has been well characterized. It should be understood that the above strains are exemplary only and that the invention is not limited in this respect.
  • fresh, in the context of host cells, refers to a host cell that has not been infected by a viral vector comprising a gene of interest as used in a continuous evolution process provided herein.
  • a fresh host cell can, however, have been infected by a viral vector unrelated to the vector to be evolved or by a vector of the same or a similar type but not carrying the gene of interest.
  • the host cell is a prokaryotic cell, for example, a bacterial cell, such as an E. coli cell.
  • the host cell is an E. coli cell.
  • the host cells are E. coli cells expressing the Fertility factor, also commonly referred to as the F factor, sex factor, or F-plasmid.
  • the F-factor is a bacterial DNA sequence that allows a bacterium to produce a sex pilus necessary for conjugation and is essential for the infection of E. coli cells with certain phage, for example, with M13 phage.
  • the host cells for M13-PACE are of the genotype F' proA+B+ Δ(lacIZY) zzf::Tn10(TetR)/ endA1 recA1 galE15 galK16 nupG rpsL ΔlacIZYA araD139 Δ(ara,leu)7697 mcrA Δ(mrr-hsdRMS-mcrBC) proBA::pir116 λ⁻.
  • the host cells for M13-PACE are of the genotype F' proA+B+ Δ(lacIZY) zzf::Tn10(TetR) lacIQ1 PN25-tetR luxCDE/endA1 recA1 galE15 galK16 nupG rpsL(StrR) ΔlacIZYA araD139 Δ(ara,leu)7697 mcrA Δ(mrr-hsdRMS-mcrBC) proBA::pir116 araE201 ΔrpoZ Δflu ΔcsgABCDEFG ΔpgaC λ⁻, for example S1030 cells as described in Carlson, J. C, et al.
  • the host cells for M13-PACE are of the genotype F' proA+B+ Δ(lacIZY) zzf::Tn10 lacIQ1 PN25-tetR luxCDE Ppsp(AR2) lacZ luxR Plux groESL / endA1 recA1 galE15 galK16 nupG rpsL ΔlacIZYA araD139 Δ(ara,leu)7697 mcrA Δ(mrr-hsdRMS-mcrBC) proBA::pir116 araE201 ΔrpoZ Δflu ΔcsgABCDEFG ΔpgaC λ⁻, for example S2060 cells as described in Hubbard, B. P. et al. Continuous directed evolution of DNA-binding proteins to improve TALEN specificity. Nature Methods 12, 939-942 (2015).
  • the term "subject,” as used herein, refers to an individual organism, for example, an individual mammal.
  • the subject is a human.
  • the subject is a non-human mammal.
  • the subject is a non-human primate.
  • the subject is a rodent.
  • the subject is a sheep, a goat, a cow, a cat, or a dog.
  • the subject is a vertebrate, an amphibian, a reptile, a fish, an insect, a fly, or a nematode.
  • the subject is a research animal.
  • the subject is genetically engineered, e.g., a genetically engineered non-human subject.
  • the subject may be of either sex and at any stage of development.
  • the subject has a disease characterized by increased IL-23 expression.
  • the disease characterized by increased IL-23 activity is an inflammatory disease (e.g., plaque psoriasis, multiple sclerosis, inflammatory bowel disease, ulcerative colitis, Crohn's disease, rheumatoid arthritis, spondyloarthritis, systemic lupus erythematosus (SLE), etc.).
  • cell refers to a cell derived from an individual organism, for example, from a mammal.
  • a cell may be a prokaryotic cell or a eukaryotic cell.
  • the cell is a eukaryotic cell, for example, a human cell, a mouse cell, a pig cell, a hamster cell, a monkey cell, etc.
  • a cell is obtained from a subject having or suspected of having a disease characterized by increased IL-23 levels/expression, for example, inflammatory diseases, autoimmune diseases, etc.
  • extracellular environment refers to the aqueous biological fluids and tissues forming the microenvironment surrounding a cell or cells.
  • an extracellular environment may include blood, serum, cytokines, neurotransmitters, tissue, etc., surrounding a cell or group of cells.
  • an extracellular environment is the cell culture growth media surrounding a cell or cells in an in vitro culture vessel, such as a cell culture plate or flask.
  • the term "increased expression," as used herein, refers to an increase in expression (e.g., elevated expression) of a particular molecule in one cell or subject relative to a normal cell or subject that is not characterized by increased expression of that molecule (e.g., a "normal" or "control" cell or subject).
  • a cell characterized by increased IL-23 expression expresses more IL-23 than a control cell expressing a normal (e.g., healthy) amount of IL-23.
  • a cell characterized by increased IL-17 expression expresses more IL-17 than a control cell expressing a normal (e.g., healthy) amount of IL-17.
  • expression levels of biomolecules (e.g., cytokines, proteins, nucleic acids, etc.) can be determined by, for example, quantitative real-time PCR (q-RT-PCR), western blot, or protein quantification assays (e.g., BCA assay).
  • Proteases are ubiquitous regulators of protein function in all domains of life and represent approximately one percent of known protein sequences.
  • Substrate-specific proteases have proven useful as research tools and as therapeutics that supplement a natural protease deficiency to treat diseases, such as hemophilia, or that simply perform their native functions such as the case of botulinum toxin, which catalyzes the cleavage of SNARE proteins.
  • researchers have engineered or evolved industrial proteases with enhanced thermostability and solvent tolerance. Similarly, a handful of therapeutic proteases have been engineered with improved kinetics and prolonged activity. The potential of proteases to serve as a broadly useful platform for degrading proteins implicated in disease, however, is greatly limited by the native substrate scope of known proteases. In contrast to the highly successful generation of therapeutic monoclonal antibodies with tailor-made binding specificities, the generation of proteases with novel protein cleavage specificities has proven to be a major challenge.
  • the evolution of a protease that can degrade a target protein of interest often necessitates changing substrate sequence specificity at more than one position, and thus may require many generations of evolution.
  • Continuous evolution strategies, which require little or no researcher intervention between generations, therefore may be well-suited to evolve proteases capable of cleaving a target protein that differs substantially in sequence from the preferred substrate of a wild-type protease.
  • in phage-assisted continuous evolution (PACE), a population of evolving selection phage (SP) is continuously diluted in a fixed-volume vessel by an incoming culture of host cells, e.g., E. coli.
  • the SP is a modified phage genome in which the evolving gene of interest has replaced gene III, a gene essential for phage infectivity. If the evolving gene of interest possesses the desired activity, it will trigger expression of gene III from an accessory plasmid (AP) in the host cell, thus producing infectious progeny encoding active variants of the evolving gene.
  • the mutation rate of the SP is controlled using an inducible mutagenesis plasmid (MP) such as MP6, which upon induction increases the mutation rate of the SP by > 300,000-fold. Because the rate of continuous dilution is slower than phage replication but faster than E. coli replication, mutations only accumulate in the SP.
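The timing constraint in the preceding paragraph (dilution slower than phage replication but faster than E. coli replication) can be written as a simple inequality. The sketch below treats all three processes as first-order rates per hour, which is a simplification; the function names and the numeric rates are hypothetical and are not taken from this disclosure.

```python
def net_rate(growth_per_h: float, dilution_per_h: float) -> float:
    """Net first-order rate of a population in the lagoon (positive = persists, negative = washes out)."""
    return growth_per_h - dilution_per_h


def dilution_in_pace_window(
    dilution_per_h: float, host_growth_per_h: float, phage_growth_per_h: float
) -> bool:
    """True when dilution outpaces host replication but not phage replication.

    In that window host cells are replaced before they divide appreciably
    (so host mutations do not accumulate), while replicating phage persist.
    """
    return host_growth_per_h < dilution_per_h < phage_growth_per_h


# Hypothetical rates, in units of 1/h.
host_growth, phage_growth = 0.7, 5.0
for dilution in (0.5, 1.0, 6.0):
    ok = dilution_in_pace_window(dilution, host_growth, phage_growth)
    print(dilution, ok, net_rate(host_growth, dilution), net_rate(phage_growth, dilution))
```

A dilution rate below the host growth rate would let host cells persist and accumulate mutations, and a rate above the phage growth rate would wash out even active selection phage; only the middle setting satisfies the stated condition.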
  • PACE can be employed for the directed evolution of proteases, in particular the evolution of proteases that cleave IL-23.
  • Proteases may require many successive mutations to remodel complex networks of contacts with polypeptide substrates, and are thus not readily manipulated by conventional, iterative evolution methods.
  • the ability of PACE to perform the equivalent of hundreds of rounds of iterative evolution methods within days enables complex protease evolution experiments that are impractical with conventional methods.
  • This disclosure provides data illustrating the feasibility of PACE-mediated evolution of the TEV protease to cleave IL-23.
  • TEV (Tobacco Etch Virus)
  • HPLVGHM (SEQ ID NO: 3)
  • TEV variant proteases contain up to 20 amino acid substitutions relative to wild-type TEV protease (e.g., SEQ ID NO: 1), cleave human IL-23 at the intended target peptide bond, and block the ability of IL-23 to stimulate IL-17 production in a murine splenocyte assay.
  • variant TEV proteases that are derived from a wild-type TEV protease (e.g., SEQ ID NO: 1) and have at least 14 variations in the amino acid sequence of the protein as compared to the amino acid sequence present within a cognate wild-type TEV protease.
  • the variation in amino acid sequence generally results from a mutation, insertion, or deletion in a DNA coding sequence.
  • Mutation of a DNA sequence can result in a nonsense mutation (e.g., a translation termination codon (TAA, TAG, or TGA) that produces a truncated protein), a frameshift mutation (e.g., an insertion or deletion mutation that shifts the reading frame of the coding sequence), or a silent mutation (e.g., a change in the coding sequence that results in a codon that codes for the same amino acid normally present in the cognate protein, also referred to sometimes as a synonymous mutation).
  • mutation of a DNA sequence results in a non-synonymous (i.e., conservative, semi-conservative, or radical) amino acid substitution.
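The mutation categories above can be made concrete for the simplest case, a single in-frame codon substitution. The following is a minimal sketch using the standard genetic code; the function name and category labels are illustrative choices, and insertions or deletions that shift the reading frame are deliberately not handled.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (TTT, TTC, TTA, ...).
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_codon_change(ref_codon, alt_codon):
    """Classify an in-frame codon substitution.

    Returns 'silent', 'nonsense', 'stop-lost', or 'non-synonymous'.
    """
    ref_aa = CODON_TABLE[ref_codon.upper()]
    alt_aa = CODON_TABLE[alt_codon.upper()]
    if ref_aa == alt_aa:
        return "silent"
    if alt_aa == "*":
        return "nonsense"
    if ref_aa == "*":
        return "stop-lost"
    return "non-synonymous"


if __name__ == "__main__":
    for ref, alt in [("TAT", "TAA"), ("CTG", "CTA"), ("ACT", "TCT")]:
        print(ref, "->", alt, ":", classify_codon_change(ref, alt))
```

A lookup table built from the compact 64-letter string avoids any external dependency; a library codon table would serve equally well.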
  • wild-type TEV protease is encoded by a gene of the Tobacco Etch Virus.
  • the amount or level of variation between a wild-type TEV protease and a variant TEV protease provided herein can be expressed as the percent identity of the nucleic acid sequences or amino acid sequences between the two genes or proteins.
  • the amount of variation is expressed as the percent identity at the amino acid sequence level.
  • a variant TEV protease and a wild-type TEV protease are from about 50% to about 99.9% identical, about 55% to about 95% identical, about 60% to about 90% identical, about 65% to about 85% identical, or about 70% to about 80% identical at the amino acid sequence level.
  • a variant TEV protease comprises an amino acid sequence that is at least 50%, at least 60%, at least 70%, at least 80%, at least 90%, at least 95%, at least 99%, or at least 99.9% identical to the amino acid sequence of a wild-type TEV protease.
  • a variant TEV protease is about 70%, about 71%, about
  • variant TEV proteases having between about 90% and about 94% (e.g., about 90%, about 90.5%, about 91%, about 91.5%, about 92%, about 92.5%, about 93%, about 93.5%, or about 94%) identity to a wild-type TEV protease as set forth in SEQ ID NO: 1.
  • the variant TEV protease is no more than 94% identical to a wild-type TEV protease.
  • the variant TEV protease comprises at least 14 amino acid variations selected from the variations (e.g., amino acid substitutions) provided in Table 1.
  • the amount or level of variation between a wild-type TEV protease and a variant TEV protease can also be expressed as the number of mutations present in the amino acid sequence encoding the variant TEV protease relative to the amino acid sequence encoding the wild-type TEV protease.
  • an amino acid sequence encoding a variant TEV protease comprises between about 1 mutation and about 100 mutations, about 10 mutations and about 90 mutations, about 20 mutations and about 80 mutations, about 30 mutations and about 70 mutations, or about 40 and about 60 mutations relative to an amino acid sequence encoding a wild-type TEV protease.
  • an amino acid sequence encoding a variant TEV protease comprises more than 100 mutations relative to an amino acid sequence encoding a wild-type TEV protease.
  • an amino acid sequence encoding a variant TEV protease comprises 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, or 20 mutations relative to an amino acid sequence encoding a wild-type TEV protease. Examples of mutations that occur in an amino acid sequence encoding a variant TEV protease are included in Table 1. Table 1: Amino acid mutations in variant TEV proteases relative to SEQ ID NO: 1
  • variant TEV protease genotype may comprise the mutations T17S, H28L, T30A, N68D, E107D, F132L, S153N, and S170A, relative to a wild-type TEV protease (e.g., SEQ ID NO: 1). Further examples of variant TEV protease genotypes are shown in Tables 2-11.
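Genotypes written in this notation (wild-type residue, position, new residue) can be parsed mechanically, and a substitution count can be converted to percent identity. This is a minimal sketch: the helper names are hypothetical, and the sequence length used below is an arbitrary illustrative value, not the length of SEQ ID NO: 1.

```python
import re

# One substitution label: wild-type residue, 1-based position, new residue.
MUTATION_RE = re.compile(r"^([A-Z])(\d+)([A-Z])$")


def parse_mutation(label):
    """Parse a label such as 'T17S' into ('T', 17, 'S').

    Internal spaces (e.g. 'S 153N' from text extraction) are tolerated.
    """
    match = MUTATION_RE.match(label.replace(" ", ""))
    if match is None:
        raise ValueError(f"not a substitution label: {label!r}")
    wild_type, position, variant = match.groups()
    return wild_type, int(position), variant


def parse_genotype(text):
    """Parse a comma-separated genotype string into a list of substitutions."""
    return [parse_mutation(token) for token in text.split(",") if token.strip()]


def percent_identity(n_substitutions, length):
    """Percent identity of two equal-length sequences differing at n positions."""
    return 100 * (length - n_substitutions) / length


if __name__ == "__main__":
    genotype = parse_genotype("T17S, H28L, T30A, N68D, E107D, F132L, S153N, S170A")
    ASSUMED_LENGTH = 200  # hypothetical length, for illustration only
    print(len(genotype), "substitutions")
    print(percent_identity(len(genotype), ASSUMED_LENGTH), "% identity")
```

The identity formula only covers substitutions; insertions or deletions would require an alignment.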
  • Table 2 Non-limiting Examples of variant TEV protease genotypes relative to SEQ ID NO: 1.
  • Table 3 Non-limiting examples of variant TEV protease genotypes relative to SEQ ID NO: 1 from 84 h PACE 1.
  • Each row corresponds to a single clone of evolved TEV protease from the selection phage (SP). These genotypes are from the end of PACE 1 in Figure 2.
  • Table 4 Non-limiting examples of variant TEV protease genotypes relative to SEQ ID NO: 1 from 168h of PACE 2.
  • Genotypes after 168 cumulative hours of PACE are from the end of PACE 2 of trajectories 1 and 2 in Figure 2.
  • Table 5 Non-limiting examples of variant TEV protease genotypes relative to SEQ ID NO: 1 after 84h of PACE 3.
  • Genotypes after 168 cumulative hours of PACE are from the end of PACE 3 for trajectory 3 in Figure 2.
  • Table 6 Non-limiting examples of variant TEV protease genotypes relative to SEQ ID NO: 1 after 96h of PACE 4.
  • Genotypes after 264 cumulative hours of PACE are from the end of PACE 4 in Figure 2.
  • Table 7 Non-limiting examples of variant TEV protease genotypes relative to SEQ ID NO: 1 after 72h of PACE 5.
  • Genotypes after 336 cumulative hours of PACE are from the end of PACE 5 in Figure 2.
  • Table 8 Non-limiting Examples of variant TEV protease genotypes relative to SEQ ID NO: 1 after 120h of PACE 6.
  • Genotypes after 456 cumulative hours of PACE are from the end of PACE 6 in Figure 2.
  • Genotypes after 528 cumulative hours of PACE are from the end of PACE 7 in Figure 2.
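The cumulative hours quoted in these captions can be checked by summing the segment durations along one trajectory. In this minimal sketch the PACE 1, 4, 5, and 6 durations are the ones stated in the captions; the PACE 2 duration (84 h) and the PACE 7 duration are inferred from the cumulative totals and are assumptions, as are the variable names.

```python
from itertools import accumulate

# (segment, duration in hours) along a single trajectory.
# PACE 2 is taken as 84 h, inferred from 168 cumulative hours minus 84 h of PACE 1.
SEGMENTS = [
    ("PACE 1", 84),
    ("PACE 2", 84),
    ("PACE 4", 96),
    ("PACE 5", 72),
    ("PACE 6", 120),
]


def cumulative_hours(segments):
    """Map each segment name to the cumulative hours at its end."""
    names = [name for name, _ in segments]
    durations = [hours for _, hours in segments]
    return dict(zip(names, accumulate(durations)))


TOTALS = cumulative_hours(SEGMENTS)

# The PACE 7 duration is not stated; infer it from the 528 h cumulative total.
PACE7_INFERRED_HOURS = 528 - TOTALS["PACE 6"]

if __name__ == "__main__":
    for name, total in TOTALS.items():
        print(f"{name}: {total} h cumulative")
    print("PACE 7 (inferred):", PACE7_INFERRED_HOURS, "h")
```

The running totals reproduce the 168, 264, 336, and 456 cumulative hours given in the captions.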
  • a variant TEV protease comprises at least
  • a variant TEV protease as described herein comprises or consists of a sequence selected from SEQ ID NOs: 11-153 given in Table 12. The lowercase amino acid residues indicate the amino acid substitutions.
  • This disclosure relates, in part, to the discovery that continuous evolution methods (e.g., PACE) are useful for producing variant TEV proteases that have altered peptide cleaving activities (altered peptide cleaving functions).
  • a variant TEV protease as described by the disclosure cleaves an IL-23 protein or peptide.
  • a variant TEV protease as described by the disclosure cleaves the target sequence HPLVGHM (SEQ ID NO: 3).
  • a variant TEV protease as described herein cleaves both the canonical TEV protease peptide target sequence ENLYFQS (SEQ ID NO: 2) and an IL-23 peptide target sequence, for example, HPLVGHM (SEQ ID NO: 3).
  • a variant TEV protease cleaves an IL-23 target peptide with higher affinity than the cognate TEV protease.
  • a variant TEV protease that cleaves a target peptide with higher affinity can have an increase in catalytic efficiency ranging from about 1.1-fold, about 1.5-fold, 2-fold to about 100-fold, about 5-fold to about 50-fold, or about 10-fold to about 40-fold, relative to the catalytic efficiency of the wild-type TEV protease from which the variant TEV protease was derived.
  • a variant TEV protease described herein cleaves IL-23 with about 1% to about 100% (e.g.
  • Catalytic efficiency can be measured or determined using any suitable method known in the art, for example using the methods described in the Examples below.
  • Some aspects of this disclosure provide methods for using a protease provided herein.
  • such methods include contacting a protein comprising a protease target cleavage sequence with the protease.
  • the protein contacted with the protease is a therapeutic target.
  • the therapeutic target is interleukin-23 (IL-23).
  • IL-23 is a heterodimeric cytokine that comprises an IL-12p40 subunit and an IL-23p19 subunit, and binds to its cognate receptor, IL-23R.
  • IL-23 functions as a mediator of inflammation, for example by inducing secretion of the pro-inflammatory cytokine interleukin-17 (IL-17).
  • the disclosure provides methods of decreasing IL-23 expression or activity in a cell, the method comprising contacting the cell, or the extracellular environment (e.g., cell culture media surrounding the cells) with a variant TEV protease as described herein.
  • the disclosure provides methods of decreasing IL-17 expression or activity in a cell, the method comprising contacting the cell, or the extracellular environment, with a variant TEV protease as described herein.
  • the cell, or extracellular environment is in a subject, for example a mammal.
  • the cell, or extracellular environment is in vitro.
  • the cell is characterized by increased expression of IL-23 relative to a normal cell or extracellular environment (e.g., a healthy cell, or extracellular environment, not characterized by increased expression of IL-23).
  • increased expression of IL-23 occurs when, in a cell, the expression of IL-23 is about 2-fold, 3-fold, 4-fold, 5-fold, 10-fold, 25-fold, 50-fold, 100-fold, 500-fold, or 1000-fold over expression of IL-23 in a normal healthy cell, or extracellular environment.
  • a cell characterized by increased expression of IL-23 is derived from a subject (e.g.
  • a mammalian subject such as a human or mouse
  • a disease associated with increased IL-23 expression for example, an inflammatory disease or an autoimmune disease.
  • inflammatory diseases include, but are not limited to, plaque psoriasis, multiple sclerosis, inflammatory bowel disease, ulcerative colitis, Crohn's disease, rheumatoid arthritis, systemic lupus erythematosus (SLE), and spondylarthritis.
  • the methods provided herein comprise contacting the target protein (e.g., IL-23, or a protein comprising a peptide comprising the amino acid sequence HPLVGHM (SEQ ID NO: 3)) with the protease in vitro. In some embodiments, the methods provided herein comprise contacting the target protein (e.g., IL-23, or a protein comprising a peptide comprising the amino acid sequence HPLVGHM (SEQ ID NO: 3)) with the protease in vivo.
  • the methods provided herein comprise contacting the target protein (e.g., IL-23, or a protein comprising a peptide comprising the amino acid sequence HPLVGHM (SEQ ID NO: 3)) with the protease in an extracellular environment.
  • the methods provided herein comprise contacting the target protein (e.g., IL-23, or a protein comprising a peptide comprising the amino acid sequence HPLVGHM (SEQ ID NO: 3)) with the protease in a subject, e.g., by administering the protease to the subject, either locally or systemically.
  • the protease is administered to the subject in an amount effective to result in a measurable decrease in the level of full-length (or functional) target protein (e.g., IL-23) in the subject, or in a measurable increase in the level of a cleavage product generated by the protease upon cleavage of the target protein.
  • IL-23-cleaving TEV protease variants described herein are useful, in some embodiments, for treating diseases associated with increased IL-17 expression or activity, such as autoimmune diseases (e.g., plaque psoriasis, multiple sclerosis, inflammatory bowel disease, ulcerative colitis, Crohn's disease, rheumatoid arthritis, systemic lupus erythematosus (SLE), and spondylarthritis).
  • Some aspects of this disclosure provide methods for evolution of a protease.
  • a method of evolution of a protease comprises (a) contacting a population of host cells with a population of vectors comprising a gene encoding a protease.
  • the vectors are typically deficient in at least one gene required for the transfer of the phage vector from one cell to another, e.g., a gene required for the generation of infectious phage particles.
  • (1) the host cells are amenable to transfer of the vector; (2) the vector allows for expression of the protease in the host cell, can be replicated by the host cell, and the replicated vector can transfer into a second host cell; and (3) the host cell expresses a gene product encoded by the at least one gene for the generation of infectious phage particles of (a) in response to the activity of the protease, and the level of gene product expression depends on the activity of the protease.
  • the methods of protease evolution provided herein typically comprise (b) incubating the population of host cells under conditions allowing for mutation of the gene encoding the protease, and the transfer of the vector comprising the gene encoding the protease of interest from host cell to host cell.
  • the host cells are removed from the host cell population at a certain rate, e.g., at a rate that results in an average time a host cell remains in the cell population that is shorter than the average time a host cell requires to divide, but long enough for the completion of a life cycle (uptake, replication, and transfer to another host cell) of the vector.
  • the population of host cells is replenished with fresh host cells that do not harbor the vector.
  • the rate of replenishment with fresh cells substantially matches the rate of removal of cells from the cell population, resulting in a substantially constant cell number or cell density within the cell population.
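The removal-rate constraint above bounds the mean residence time of a host cell between the vector life cycle and the host division time, which translates directly into a permissible flow-rate range for a vessel of fixed volume. This is a minimal sketch assuming a well-mixed vessel (mean residence time = volume / flow rate); the function names and all numeric values are hypothetical.

```python
def residence_time_min(volume_ml, flow_ml_per_min):
    """Mean residence time (minutes) in a well-mixed, fixed-volume vessel."""
    return volume_ml / flow_ml_per_min


def flow_rate_window(volume_ml, life_cycle_min, division_min):
    """Flow rates (mL/min) keeping residence time between the two bounds.

    Residence time must exceed the vector life cycle (upper flow limit)
    and stay below the host division time (lower flow limit).
    """
    if life_cycle_min >= division_min:
        raise ValueError("vector life cycle must be shorter than host division time")
    return volume_ml / division_min, volume_ml / life_cycle_min


def in_window(volume_ml, flow_ml_per_min, life_cycle_min, division_min):
    """True if the flow rate satisfies both residence-time constraints."""
    t = residence_time_min(volume_ml, flow_ml_per_min)
    return life_cycle_min < t < division_min


if __name__ == "__main__":
    # Hypothetical values: 60 mL vessel, 15 min life cycle, 30 min division time.
    low, high = flow_rate_window(60, 15, 30)
    print(f"flow rate between {low} and {high} mL/min")
    print("3 mL/min acceptable:", in_window(60, 3, 15, 30))
```

Matching the inflow of fresh cells to this outflow keeps the cell density substantially constant, as described above.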
  • the methods of protease evolution provided herein typically also comprise (c) isolating a replicated vector from the host cell population in (b), wherein the replicated vector comprises a mutated version of the gene encoding the protease.
  • Some embodiments provide a continuous evolution system, in which a population of viral vectors, e.g., M13 phage vectors, comprising a gene encoding a protease of interest to be evolved replicates in a flow of host cells, e.g., a flow through a lagoon, wherein the viral vectors are deficient in a gene encoding a protein that is essential for the generation of infectious viral particles, and wherein that gene is in the host cell under the control of a conditional promoter the activity of which depends on the activity of the protease of interest.
  • transcription from the conditional promoter may be activated by cleavage of a fusion protein comprising a transcription factor and an inhibitory protein fused to the transcriptional activator via a linker comprising a target site of the protease.
  • Some embodiments of the protease PACE technology described herein utilize a "selection phage," a modified phage that comprises a gene of interest to be evolved and lacks a full-length gene encoding a protein required for the generation of infectious phage particles.
  • the selection phage serves as the vector that replicates and evolves in the flow of host cells.
  • some M13 selection phages provided herein comprise a nucleic acid sequence encoding a protease to be evolved, e.g.
  • M13 selection phages comprise a nucleic acid sequence encoding a protease to be evolved, e.g., under the control of an M13 promoter, and lack all or part of a gene encoding a protein required for the generation of infective phage particles, e.g., the gIII gene encoding the pIII protein.
  • the transcriptional activator directly drives
  • the transcriptional activator may be an RNA polymerase.
  • Suitable RNA polymerases and promoter sequences targeted by such RNA polymerases are well known to those of skill in the art.
  • exemplary suitable RNA polymerases include, but are not limited to, T7 polymerases (targeting T7 promoter sequences) and T3 RNA polymerases (targeting T3 promoter sequences). Additional suitable RNA polymerases will be apparent to those of skill in the art based on the instant disclosure, which is not limited in this respect.
  • the transcriptional activator does not directly drive transcription, but recruits the transcription machinery of the host cell to a specific target promoter.
  • Suitable transcriptional activators such as, for example, Gal4 or fusions of the transactivation domain of the VP16 transactivator with DNA-binding domains, will be apparent to those of skill in the art based on the instant disclosure, and the disclosure is not limited in this respect.
  • the at least one gene for the generation of infectious phage particles is expressed in the host cells under the control of a promoter activated by the transcriptional activator, for example, under the control of a T7 promoter if the transcriptional activator is T7 RNA polymerase, and under the control of a T3 promoter if the transcriptional activator is T3 polymerase, and so on.
  • the transcriptional activator is fused to an inhibitor that either directly inhibits or otherwise hinders the transcriptional activity of the transcriptional activator, for example, by directly interfering with DNA binding or transcription, by targeting the transcriptional activator for degradation through the host cell's protein degradation machinery, or by directing export from the host cell or localization of the transcriptional activator into a compartment of the host cell in which it cannot activate transcription from its target promoter.
  • the inhibitor is fused to the transcriptional activator's N-terminus. In other embodiments, it is fused to the activator's C-terminus.
  • the protease evolution methods provided herein comprise an initial or intermittent phase of diversifying the population of vectors by mutagenesis, in which the cells are incubated under conditions suitable for mutagenesis of the gene encoding the protease in the absence of stringent selection or in the absence of any selection for evolved protease variants that have acquired a desired activity.
  • Such low-stringency selection or no selection periods may be achieved by supporting expression of the gene for the generation of infectious phage particles in the absence of desired protease activity, for example, by providing an inducible expression construct comprising a gene encoding the respective packaging protein under the control of an inducible promoter and incubating under conditions that induce expression of the promoter, e.g.
  • inducible promoters and inducible expression systems are described herein and in International PCT Application, PCT/US2011/066747, filed December 22, 2011, published as WO 2012/088381 on June 28, 2012; and U.S. Application, U.S.S.N. 13/922,812, filed June 20, 2013; International PCT Application, PCT/US2015/057012, filed on October 22, 2015, published as WO2016/077052; and, PCT/US2016/027795, filed on April 15, 2016, published as WO2016/168631, the entire contents of each of which are incorporated herein by reference. Additional suitable promoters and inducible gene expression systems will be apparent to those of skill in the art based on the instant disclosure.
  • the method comprises a phase of stringent selection for a mutated protease version. If an inducible expression system is used to relieve selective pressure, the stringency of selection can be increased by removing the inducing agent from the population of cells in the lagoon, thus turning expression from the inducible promoter off, so that any expression of the gene required for the generation of infectious phage particles must come from the protease activity-dependent expression system.
  • One aspect of the PACE protease evolution methods provided herein is the mutation of the initially provided vectors encoding a protease of interest.
  • the host cells within the flow of cells in which the vector replicates are incubated under conditions that increase the natural mutation rate. This may be achieved by exposing the host cells to a mutagen, such as certain types of radiation or a mutagenic compound, or by expressing genes known to increase the cellular mutation rate in the cells. Additional suitable mutagens will be known to those of skill in the art, and include, without limitation, those described in International PCT Application, PCT/US2011/066747, filed December 22, 2011, published as WO 2012/088381 on June 28, 2012; and U.S. Application, U.S.S.N. 13/922,812, filed June 20, 2013; International PCT Application,
  • the host cells comprise the accessory plasmid encoding the at least one gene for the generation of infectious phage particles, e.g., of the M13 phage, encoding the protease to be evolved and a helper phage, and together, the helper phage and the accessory plasmid comprise all genes required for the generation of infectious phage particles. Accordingly, in some such embodiments, variants of the vector that do not encode a protease variant that can untether the inhibitor from the transcriptional activator will not efficiently be packaged, since they cannot effect an increase in expression of the gene required for the generation of infectious phage particles from the accessory plasmid.
  • variants of the vector that encode a protease variant that can efficiently cleave the inhibitor from the transcriptional activator will effect increased transcription of the at least one gene required for the generation of infectious phage particles from the accessory plasmid and thus be efficiently packaged into infectious phage particles.
  • the protease PACE methods provided herein further comprise a negative selection for undesired protease activity in addition to the positive selection for a desired protease activity.
  • Such negative selection methods are useful, for example, in order to maintain protease specificity when increasing the cleavage efficiency of a protease directed towards a specific target site. This can avoid, for example, the evolution of proteases that show a generally increased protease activity, including an increased protease activity towards off-target sites, which is generally undesired in the context of therapeutic proteases.
  • negative selection is applied during a continuous evolution process as described herein, by penalizing the undesired activities of evolved proteases. This is useful, for example, if the desired evolved protease is an enzyme with high specificity for a target site, for example, a protease with altered, but not broadened, specificity.
  • negative selection of an undesired activity, e.g., off-target protease activity, is achieved by causing the undesired activity to interfere with pIII production, thus inhibiting the propagation of phage genomes encoding gene products with an undesired activity.
  • expression of a dominant-negative version of pIII or expression of an antisense RNA complementary to the gIII RBS and/or gIII start codon is linked to the presence of an undesired protease activity.
  • Suitable negative selection strategies and reagents useful for negative selection, such as dominant-negative versions of M13 pIII, are described herein and in International PCT Application, PCT/US2011/066747, filed December 22, 2011, published as WO 2012/088381 on June 28, 2012; and U.S.
  • counter-selection against activity on non-target substrates is achieved by linking undesired evolved protease activities to the inhibition of phage propagation.
  • a dual selection strategy is applied during a continuous evolution experiment, in which both positive selection and negative selection constructs are present in the host cells.
  • the positive and negative selection constructs are situated on the same plasmid, also referred to as a dual selection accessory plasmid.
  • One advantage of using a simultaneous dual selection strategy is that the selection stringency can be fine-tuned based on the activity or expression level of the negative selection construct as compared to the positive selection construct.
  • Another advantage of a dual selection strategy is that the selection is not dependent on the presence or the absence of a desired or an undesired activity, but on the ratio of desired and undesired activities, and, thus, the resulting ratio of pIII and pIII-neg that is incorporated into the respective phage particle.
  • the host cells comprise an expression construct encoding a dominant-negative form of the at least one gene for the generation of infectious phage particles, e.g., a dominant-negative form of the pIII protein (pIII-neg), under the control of an inducible promoter that is activated by a transcriptional activator other than the transcriptional activator driving the positive selection system.
  • a dominant-negative form of the gene diminishes or completely negates any selective advantage an evolved phage may exhibit and thus dilutes or eradicates any variants exhibiting undesired activity from the lagoon.
  • the positive selection system comprises a T7 promoter driving the expression of the at least one gene for the generation of infectious phage particles, and a T7 RNA polymerase fused to a T7-RNA polymerase inhibitor via a linker comprising a protease target site that is cleaved by a desired protease activity
  • the negative selection system should be a non-T7 based system.
  • the negative selection system could be based on T3 polymerase activity, e.g., in that it comprises a T3 promoter driving the expression of a dominant-negative form of the at least one gene for the generation of infectious phage particles, and a T3 RNA polymerase fused to a T3-RNA polymerase inhibitor via a linker comprising a protease target site that is cleaved by an undesired protease activity.
  • the negative selection polymerase is a T7 RNA polymerase gene comprising one or more mutations that render the T7 polymerase able to transcribe from the T3 promoter but not the T7 promoter, for example: N67S, R96L, K98R, H176P, E207K, E222K, T375A, M401I, G675R, N748D, P759L, A798S, A819T, etc.
  • the negative selection polymerase may be fused to a T7-RNA polymerase inhibitor via a linker comprising a protease target site that is cleaved by an undesired protease activity.
  • the undesired function is cleavage of an off-target protease cleavage site. In some embodiments, the undesired function is cleavage of the linker sequence of the fusion protein outside of the protease cleavage site.
  • Some aspects of this invention provide or utilize a dominant negative variant of pIII (pIII-neg). These aspects are based on the recognition that a pIII variant that comprises the two N-terminal domains of pIII and a truncated, termination-incompetent C-terminal domain is not only inactive but is a dominant-negative variant of pIII.
  • a pIII variant comprising the two N-terminal domains of pIII and a truncated, termination-incompetent C-terminal domain was described in Bennett, N. J.; Rakonjac, J., Unlocking of the filamentous bacteriophage virion during infection is mediated by the C domain of pIII. Journal of
  • the pIII-neg variant as provided in some embodiments herein is efficiently incorporated into phage particles, but it does not catalyze the unlocking of the particle for entry during infection, rendering the respective phage noninfectious even if wild type pIII is present in the same phage particle. Accordingly, such pIII-neg variants are useful for devising a negative selection strategy in the context of PACE, for example, by providing an expression construct comprising a nucleic acid sequence encoding a pIII-neg variant under the control of a promoter comprising a recognition motif, the recognition of which is undesired.
  • pIII-neg is used in a positive selection strategy, for example, by providing an expression construct in which a pIII-neg encoding sequence is controlled by a promoter comprising a nuclease target site or a repressor recognition site, the recognition of either one is desired.
  • a protease PACE experiment according to methods provided herein is run for a time sufficient for at least 10, at least 20, at least 30, at least 40, at least 50, at least 100, at least 200, at least 300, at least 400, at least 500, at least 600, at least 700, at least 800, at least 900, at least 1000, at least 1250, at least 1500, at least 1750, at least 2000, at least 2500, at least 3000, at least 4000, at least 5000, at least 7500, at least 10000, or more consecutive viral life cycles.
  • the viral vector is an M13 phage, and the length of a single viral life cycle is about 10-20 minutes.
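The relationship between run time and the number of consecutive viral life cycles follows from the stated 10-20 minute M13 life cycle. This is a minimal arithmetic sketch; the function names are hypothetical, and the 528-hour example reuses the cumulative PACE time given with the genotype tables.

```python
def life_cycles(run_hours, cycle_minutes):
    """Number of consecutive life cycles that fit into a run of `run_hours`."""
    return run_hours * 60 / cycle_minutes


def hours_needed(n_cycles, cycle_minutes):
    """Run time in hours needed to complete `n_cycles` life cycles."""
    return n_cycles * cycle_minutes / 60


if __name__ == "__main__":
    # Bounds of the stated life-cycle range, applied to 528 cumulative hours.
    slow = life_cycles(528, 20)
    fast = life_cycles(528, 10)
    print(f"528 h corresponds to between {slow:.0f} and {fast:.0f} life cycles")
    print("hours for 100 cycles at 15 min each:", hours_needed(100, 15))
```

This estimate treats life cycles as strictly sequential and of fixed length, which is a simplification.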
  • the host cells are contacted with the vector and/or incubated in suspension culture.
  • bacterial cells are incubated in suspension culture in liquid culture media.
  • suitable culture media for bacterial suspension culture will be apparent to those of skill in the art, and the invention is not limited in this regard. See, for example, Molecular Cloning: A Laboratory Manual, 2nd Ed., ed. by Sambrook, Fritsch, and Maniatis (Cold Spring Harbor Laboratory Press: 1989); Elizabeth Kutter and Alexander Sulakvelidze: Bacteriophages: Biology and Applications. CRC Press; 1st edition (December 2004), ISBN: 0849313368; Martha R. J. Clokie and Andrew M.
  • the protease PACE methods provided herein are typically carried out in a lagoon. Suitable lagoons and other laboratory equipment for carrying out protease PACE methods as provided herein have been described in detail elsewhere. See, for example, International PCT Application, PCT/US2011/066747, published as WO2012/088381 on June 28, 2012, the entire contents of which are incorporated herein by reference.
  • the lagoon comprises a cell culture vessel comprising an actively replicating population of vectors, for example, phage vectors comprising a gene encoding the protease of interest, and a population of host cells, for example, bacterial host cells.
  • the lagoon comprises an inflow for the introduction of fresh host cells into the lagoon and an outflow for the removal of host cells from the lagoon.
  • the inflow is connected to a turbidostat comprising a culture of fresh host cells.
  • the outflow is connected to a waste vessel, or a sink.
  • the lagoon further comprises an inflow for the introduction of a mutagen into the lagoon. In some embodiments that inflow is connected to a vessel holding a solution of the mutagen.
  • the lagoon comprises an inflow for the introduction of an inducer of gene expression into the lagoon, for example, of an inducer activating an inducible promoter within the host cells that drives expression of a gene promoting mutagenesis (e.g., as part of a mutagenesis plasmid), as described in more detail elsewhere herein.
  • that inflow is connected to a vessel comprising a solution of the inducer, for example, a solution of arabinose.
  • a PACE method as provided herein is performed in a suitable apparatus as described herein.
  • the apparatus comprises a lagoon that is connected to a turbidostat comprising a host cell as described herein.
  • the host cell is an E. coli host cell.
  • the host cell comprises an accessory plasmid as described herein, a helper plasmid as described herein, a mutagenesis plasmid as described herein, and/or an expression construct encoding a fusion protein as described herein, or any combination thereof.
  • the lagoon further comprises a selection phage as described herein, for example, a selection phage encoding a protease of interest.
  • the lagoon is connected to a vessel comprising an inducer for a mutagenesis plasmid, for example, arabinose.
  • the host cells are E.
  • a host cell for continuous evolution processes as described herein.
  • a host cell comprises at least one viral gene encoding a protein required for the generation of infectious viral particles under the control of a conditional promoter, and a fusion protein comprising a transcriptional activator targeting the conditional promoter and fused to an inhibitor via a linker comprising a protease cleavage site.
  • some embodiments provide host cells for phage-assisted continuous evolution processes, wherein the host cell comprises an accessory plasmid comprising a gene required for the generation of infectious phage particles, for example, M13 gIII, under the control of a conditional promoter, as described herein.
  • the host cell comprises an expression construct encoding a fusion protein as described herein, e.g., on the same accessory plasmid or on a separate vector.
  • the host cell further provides any phage functions that are not contained in the selection phage, e.g., in the form of a helper phage.
  • the host cell provided further comprises an expression construct comprising a gene encoding a mutagenesis-inducing protein, for example, a mutagenesis plasmid as provided herein.
  • modified viral vectors are used in continuous evolution processes as provided herein.
  • such modified viral vectors lack a gene required for the generation of infectious viral particles.
  • a suitable host cell is a cell comprising the gene required for the generation of infectious viral particles, for example, under the control of a constitutive or a conditional promoter (e.g., in the form of an accessory plasmid, as described herein).
  • the viral vector used lacks a plurality of viral genes.
  • a suitable host cell is a cell that comprises a helper construct providing the viral genes required for the generation of infectious viral particles.
  • a cell is not required to actually support the life cycle of a viral vector used in the methods provided herein.
  • a cell comprising a gene required for the generation of infectious viral particles under the control of a conditional promoter may not support the life cycle of a viral vector that does not comprise a gene of interest able to activate the promoter, but it is still a suitable host cell for such a viral vector.
  • the host cell is a prokaryotic cell, for example, a bacterial cell.
  • the host cell is an E. coli cell.
  • the host cell is a eukaryotic cell, for example, a yeast cell, an insect cell, or a mammalian cell.
  • the type of host cell will, of course, depend on the viral vector employed, and suitable host cell/viral vector combinations will be readily apparent to those of skill in the art.
  • the viral vector is a phage and the host cell is a bacterial cell.
  • the host cell is an E. coli cell.
  • Suitable E. coli host strains will be apparent to those of skill in the art, and include, but are not limited to, New England Biolabs (NEB) Turbo, Top10F', DH12S, ER2738, ER2267, and XL1-Blue MRF'. These strain names are art recognized and the genotype of these strains has been well characterized. It should be understood that the above strains are exemplary only and that the invention is not limited in this respect.
  • the host cells are E. coli cells expressing the Fertility factor, also commonly referred to as the F factor, sex factor, or F-plasmid.
  • the F-factor is a bacterial DNA sequence that allows a bacterium to produce a sex pilus necessary for conjugation and is essential for the infection of E. coli cells with certain phage, for example, with M13 phage.
  • the host cells for M13-PACE are of the genotype F′ proA+B+ Δ(lacIZY) zzf::Tn10(TetR)/ endA1 recA1 galE15 galK16 nupG rpsL ΔlacIZYA araD139 Δ(ara,leu)7697 mcrA Δ(mrr-hsdRMS-mcrBC) proBA::pir116 λ−.
  • a subjective rating matrix was created based upon knowledge of TEV protease substrate specificity and of the evolution of proteases that accept substrate changes. Key features include high marks for consensus residues ENLYFQS (SEQ ID NO: 2) as well as for substitutions with known evolutionary solutions: P6 His, P1 His, and P1 Glu. Penalties were also introduced for cysteine residues, due to disulfide formation in mammalian target proteins, and for proline, due to its unique structural properties. Table 16: Refined List of Protease Target Substrates.
  • Target substrates were identified from the human extracellular proteome based upon ratings calculated using the above scoring matrix. These four substrates were manually curated based upon the disease relevance of the target protein and the solvent-accessibility of target peptide.
  • IL-23 is a pro-inflammatory cytokine secreted by macrophages and dendritic cells in response to pathogens and tissue damage, ultimately promoting an innate immune response at the site of injury or infection. This immune response is mediated by IL-23-dependent stabilization of Th17 cells, a class of T helper cells that produce pro-inflammatory cytokines IL-17, IL-6, and TNFα. Hyperactivity of this pathway can lead to a variety of autoimmune disorders including psoriasis and rheumatoid arthritis.
  • the target sequence HPLVGHM differs from the TEV consensus substrate sequence, ENLYFQS (SEQ ID NO: 2), at six of seven positions. Two of these substitutions are predicted to not substantially impact TEV protease activity due to its low specificity at positions P5 and P1', while the other four substitutions occur at positions that are known to be crucial specificity determinants of wild-type TEV protease (P6 Glu, P3 Tyr, P2 Phe, and P1 Gln). Substitution of TEV substrate P2 Phe or P1 Gln with the corresponding IL-23 substrate residue (P2 Gly or P1 His) has been shown to reduce TEV protease activity by more than an order of magnitude in each case. However, TEV mutants that accept P1 His instead of P1 Gln have been identified, demonstrating the evolvability of P1 recognition.
  • PACE requires linking the activity of interest to expression of an essential phage gene (such as gene III) and thus phage survival. Such a linkage was previously established for a range of activities including polymerase activity, DNA binding, protein binding, and protein cleavage.
  • PA-RNAP protease-activated RNA polymerase
  • T7 RNAP T7 RNA polymerase
  • Figure 1 a natural inhibitor of T7 RNAP
  • HNLYFQS HNLYFQS
  • the population that emerged from PACE on the first stepping-stone substrate (HNLYFQS; SEQ ID NO: 4) was diversified with NNK codons at TEV protease residues 209, 211, 216, and 218, which line the hydrophobic pocket that is occupied by the P2 Phe, and PACE was performed using host cells expressing ENLYGQS (SEQ ID NO: 5).
  • the resulting population of TEV mutants is typified by the mutations N176I, V209M, W211I, M218F (Table 4), which confer cleavage activity on both HNLYFQS (SEQ ID NO: 4) and ENLYGQS (SEQ ID NO: 5) substrates (Figure 8).
  • in trajectory 3, a mixing strategy was used to access TEV proteases that could cleave the HNLYFHS (SEQ ID NO: 6) stepping-stone double mutant substrate.
  • a mixing strategy relies on a transitional period of phage propagation on a mixture of two different host cell populations, one expressing an accepted substrate (HNLYFQS; SEQ ID NO: 4) and the other expressing the next stepping-stone substrate (HNLYFHS; SEQ ID NO: 6). Following this transitional period, the SP is propagated exclusively on hosts expressing the next stepping-stone substrate (HNLYFHS; SEQ ID NO: 6).
  • the variants that emerged from this stage of trajectory 3 showed weak apparent activity on the double mutant substrate HNLYFHS (SEQ ID NO: 6) (Figure 9), and only a single additional enriched mutation D148A (Table 5).
  • the primers used to randomize TEV protease residues 167 and 177 must also encode the identity of intervening amino acids N171 and N176. Although the population appeared to converge on N176I (Table 4), one library was constructed with primers encoding N176I (trajectory 1) and another with N171D + N176T (trajectory 2) to preserve genetic diversity at N176. Libraries constructed for all three trajectories were then subjected to PACE on host cells expressing the triple mutant substrate HNLYGHS (SEQ ID NO: 173). The variants emerging at this stage of trajectory 1 and 2 were enriched for mutations at residues 146, 148, and 177, consistent with acceptance of the newly introduced P1 substitution.
  • clones from trajectory 3 exhibit mutations at residues 209, 211, and 218 that may promote acceptance of the newly added P2 Gly substitution. Regardless of trajectory, all clones emerging at this stage exhibit at least one mutation from each of three targeted mutagenesis libraries (Table 6), suggesting that they have evolved activity on the triple mutant substrate.
  • the lowered substrate concentration strategy was applied using a mixing experiment to transition from proB to proA expression of the PA-RNAP; this experiment yielded modest changes in genotypes.
  • the other two strategies were implemented simultaneously on all three trajectories.
  • the resulting six populations (trajectories 1a, 1b, 2a, 2b, 3a, and 3b; see Figure 2) were carried forward into PACE on hosts expressing a PA-RNAP with both the IL-23 (38-66) linker and the attenuated T7 RNAP mutant Q649S.
  • L2F maintains the ability to detectably cleave starting and intermediate substrates while acquiring activity on the final IL-23 target (Figure 16).
  • A previously reported phage substrate display method (Figure 3A) was applied to obtain an unbiased protease specificity profile. M13 bacteriophage encoding pIII fused to a FLAG-tag through a library of substrate linkers were immobilized on anti-FLAG magnetic beads. When incubated with a protease of interest, phage encoding cleaved substrates are liberated from the solid support, while phage encoding the intact substrates remain immobilized and are eluted with excess FLAG peptide.
  • Table 18 Phage Display Enrichment Values From Selections on Libraries with Single Residue Randomization.
  • Each sub-table within the larger table represents the amino acid enrichment values generated for the given genotype of TEV protease.
  • Each row within a sub-table contains enrichment values from a selection performed on the library in which the
  • HPLVGHM SEQ ID NO: 3
  • TEV L2F SEQ ID NO: 137
  • the P1 specificity of TEV L2F is more pronounced for His, the target residue, in the HPLVGHM (SEQ ID NO: 3) libraries (Figure 17 and corresponding enrichment values in Table 19).
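
By way of non-limiting illustration, the composition of an NNK site-saturation library such as the one described above for TEV protease residues 209, 211, 216, and 218 can be enumerated with the following Python sketch. The codon table is the standard genetic code; the variable names are illustrative only, and the sketch makes no statement about how the libraries of the disclosure were actually designed or cloned.

```python
from itertools import product

# Standard genetic code, written in the conventional T, C, A, G ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# NNK: N is any base, K is G or T.
nnk_codons = [a + b + c for a in "ACGT" for b in "ACGT" for c in "GT"]

encoded = {CODON_TABLE[codon] for codon in nnk_codons}
encoded_amino_acids = "".join(sorted(encoded - {"*"}))
stop_codons = [codon for codon in nnk_codons if CODON_TABLE[codon] == "*"]

# Four randomized residues, as in the library described above.
n_sites = 4
dna_diversity = len(nnk_codons) ** n_sites
protein_diversity = len(encoded_amino_acids) ** n_sites

print(len(nnk_codons), "NNK codons encode", len(encoded_amino_acids), "amino acids")
print("stop codons within NNK:", stop_codons)
print("DNA-level diversity for", n_sites, "sites:", dna_diversity)
print("protein-level diversity for", n_sites, "sites:", protein_diversity)
```

The sketch shows why NNK randomization is a common choice: it covers all twenty amino acids while admitting only a single stop codon.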

Abstract

The disclosure provides amino acid sequence variants of Tobacco Etch Virus (TEV) proteases that cleave IL-23 and methods of producing the same. In some embodiments, proteases described by the disclosure are useful for treating diseases associated with increased IL-23 or IL-17 expression or activity, for example inflammatory autoimmune diseases. Some aspects of this disclosure provide methods for generating TEV protease variants by continuous directed evolution.

Description

EVOLVED PROTEASES AND USES THEREOF
RELATED APPLICATIONS
This application claims priority under 35 U.S.C. § 119(e) to U.S. provisional patent application, U.S.S.N. 62/449,588, filed January 23, 2017, entitled "EVOLVED PROTEASES AND USES THEREOF", which is incorporated herein by reference.
FEDERALLY SPONSORED RESEARCH
[0001] This invention was made with government support under grant numbers EB022376 (formerly R01 GM065400), GM118062, and GM008313 awarded by the National Institutes of Health (NIH). The government has certain rights in the invention.
BACKGROUND
[0002] Proteases are ubiquitous enzymes that play important roles in many aspects of cell and tissue biology. Proteases can also be harnessed for biotechnological and biomedical applications. Among the more than 600 naturally occurring proteases that have been described are enzymes that have proven to be important catalysts of industrial processes, essential tools for proteome analysis, and life-saving pharmaceuticals. Recombinant human proteases including thrombin, factor VIIa, and tissue plasminogen activator are widely used drugs for the treatment of blood clotting diseases. In addition, the potential of protease-based therapeutics to address disease in a manner analogous to that of antibody drugs, but with catalytic turnover, has been recognized for several decades.
[0003] Natural proteases, however, typically target only a narrowly defined set of substrates, limiting their therapeutic potential. The directed evolution of proteases in principle could generate enzymes with tailor-made specificities, but laboratory-evolved proteases are frequently non-specific, weakly active, or only modestly altered in their substrate specificity, limiting their utility. Accordingly there is an unmet need in the art to develop novel evolved proteases and methods of producing such novel proteases.
SUMMARY
[0004] The laboratory evolution of proteases has the potential to generate proteases with therapeutically relevant specificities, for example novel proteases that cleave interleukin-23 (IL-23). IL-23 is a pro-inflammatory cytokine that enhances expansion of T helper type 17 (Th17) cells and upregulates inflammatory autoimmune responses. It has been demonstrated that IL-23 plays an important role in several autoimmune diseases, such as psoriasis, inflammatory bowel disease, rheumatoid arthritis, asthma, and multiple sclerosis. Without wishing to be bound by any particular theory, cleavage of IL-23 by a protein (e.g., an evolved protease) as described herein, in some embodiments, inhibits pro-inflammatory signaling mediated by IL-23. Thus, in some embodiments, proteins described herein are useful for the treatment of diseases associated with IL-23.
[0005] In some aspects, the disclosure provides a protein (e.g., an evolved protease) that cleaves IL-23. In some embodiments, the protein is evolved from a TEV protease. In some embodiments, the protein is not evolved from a protein that naturally cleaves IL-23. In some aspects, the disclosure provides a protein (e.g., an evolved protease) comprising an amino acid sequence that is at least 90% identical to a Tobacco etch virus (TEV) protease, for example as represented by SEQ ID NO: 1. In some aspects, the disclosure provides a protein comprising an amino acid sequence that is at least 90% identical to SEQ ID NO: 1, wherein the protein comprises at least 14 amino acid sequence mutations set forth in Table 1.
[0006] In some embodiments, the amino acid sequence is not more than 94% (e.g., not more than 93.9%, 93.5%, 93%, 92.5%, 92%, 91.5%, 90% etc.) identical to SEQ ID NO: 1. In some embodiments, the protein comprises at least 14, at least 15, at least 16, at least 17, at least 18, at least 19, or at least 20 amino acid mutations as set forth in Table 1.
[0007] In some embodiments, at least one of the amino acid sequence mutations is introduced at an amino acid position selected from the group consisting of T17, H28, T30, N68, E107, F132, S153, and S170. In some embodiments, at least one of the amino acid sequence mutations is selected from the group consisting of T17S, H28L, T30A, N68D, E107D, F132L, S153N, and S170A.
[0008] In some embodiments, the protein further comprises at least one amino acid mutation at an amino acid position selected from the group consisting of D127, S135, T146, D148, F162, N171, N176, N177, V209, W211, M218, and K229. In some embodiments, at least one of the amino acid mutations is selected from the group consisting of D127A, S135F, T146S, D148P, F162S, N171D, N176T, N177M, V209M, W211I, M218F, and K229E.
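
By way of non-limiting illustration, the substitution notation used above (wild-type residue, 1-based position, new residue, e.g., T17S) and the notion of percent identity to a parent sequence can be sketched in Python as follows. The parent sequence in the sketch is a short hypothetical string, not SEQ ID NO: 1, and the function names are illustrative assumptions; percent identity is computed here only for ungapped sequences of equal length.

```python
import re

# One substitution: wild-type residue, 1-based position, replacement residue.
MUTATION_PATTERN = re.compile(r"^([A-Z])(\d+)([A-Z])$")


def apply_mutations(parent, mutations):
    """Return the parent sequence with the listed substitutions applied."""
    residues = list(parent)
    for mutation in mutations:
        match = MUTATION_PATTERN.match(mutation)
        if match is None:
            raise ValueError(f"unrecognized mutation: {mutation}")
        wild_type, position, replacement = match.group(1), int(match.group(2)), match.group(3)
        if position < 1 or position > len(residues):
            raise ValueError(f"{mutation}: position outside the sequence")
        if residues[position - 1] != wild_type:
            raise ValueError(f"{mutation}: parent has {residues[position - 1]} at {position}")
        residues[position - 1] = replacement
    return "".join(residues)


def percent_identity(seq_a, seq_b):
    """Percent of identical positions between two ungapped, equal-length sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must have equal length")
    matches = sum(a == b for a, b in zip(seq_a, seq_b))
    return 100.0 * matches / len(seq_a)


# Hypothetical 20-residue parent, used only to exercise the functions.
toy_parent = "MKTAYIAKQRQISFVKSHFS"
toy_variant = apply_mutations(toy_parent, ["T3S", "K8E"])
print(toy_variant, percent_identity(toy_parent, toy_variant))
```

Checking the wild-type residue before substituting guards against off-by-one numbering errors, which are easy to make when a construct carries an N-terminal fusion or tag.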
[0009] In some embodiments, the protein comprises or consists of the amino acid sequence as set forth in any one of SEQ ID NOs: 11-153.
[0010] In some embodiments, the protein cleaves a target sequence that is present in an exposed loop of an IL-23 protein. In some embodiments, the protein cleaves a target sequence as set forth as HPLVGHM (SEQ ID NO: 3). In some embodiments, the protein cleaves the canonical target sequence of a TEV protease, for example a target sequence set forth as ENLYFQS (SEQ ID NO: 2).
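
By way of non-limiting illustration, the relationship between the two target sequences named above can be made explicit by aligning them position by position. The following Python sketch assumes that each heptapeptide spans substrate positions P6 through P1', with cleavage between P1 and P1'; the function and variable names are illustrative only.

```python
# Substrate positions, N-terminal to C-terminal; cleavage occurs between P1 and P1'.
POSITIONS = ["P6", "P5", "P4", "P3", "P2", "P1", "P1'"]


def compare_substrates(consensus, target):
    """List (position, consensus residue, target residue) wherever the two differ."""
    if len(consensus) != len(POSITIONS) or len(target) != len(POSITIONS):
        raise ValueError("both substrates must be seven residues long")
    return [
        (position, c, t)
        for position, c, t in zip(POSITIONS, consensus, target)
        if c != t
    ]


differences = compare_substrates("ENLYFQS", "HPLVGHM")
for position, consensus_residue, target_residue in differences:
    print(f"{position}: {consensus_residue} -> {target_residue}")
print(len(differences), "of", len(POSITIONS), "positions differ")
```

Only P4 (Leu) is shared between the two sequences, so a protease evolved from TEV protease to cleave HPLVGHM must accommodate changes at the remaining six positions.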
[0011] In some aspects, the disclosure provides a pharmaceutical composition comprising a protein as described herein and a pharmaceutically acceptable excipient.
[0012] In some aspects, the disclosure provides an isolated nucleic acid encoding a protein comprising an amino acid sequence as set forth in any one of SEQ ID NOs: 11-153. In some aspects, the disclosure provides a host cell comprising said isolated nucleic acid.
[0013] In some aspects, the disclosure provides methods of reducing IL-23 activity (e.g., in the extracellular environment of a subject or in vitro), the method comprising administering to the extracellular environment (e.g., administering to the subject) an effective amount of a protein (e.g., a TEV variant protease) as described herein. In some embodiments, the extracellular environment is in vitro. In some embodiments, the extracellular environment is in vivo, for example an extracellular environment located within a subject. In some embodiments, the extracellular environment is characterized by increased IL-23 activity relative to a normal, healthy extracellular environment.
[0014] In some aspects, this disclosure relates to the surprising discovery that cleaving IL-23 using the evolved proteases described herein results in attenuated IL-17 secretion. In some embodiments, cleavage of the HPLVGHM (SEQ ID NO: 3) target site (e.g., by a protease described herein) reduces IL-17 secretion and is sufficient to block immune signaling. In some embodiments, IL-17 secretion from a cell or cells located in the extracellular environment (e.g., a cell or cells of the subject) is reduced. In some embodiments, the cell is a mammalian cell, optionally a human cell or a mouse cell. In some embodiments, the cell is an immune cell, such as a macrophage, dendritic cell, or activated phagocytic cell.
[0015] In some aspects, the disclosure relates to methods of producing an evolved, IL-23-cleaving protease using phage-assisted continuous evolution (PACE). The general concept of PACE technology has been described, for example, in International PCT Application, PCT/US2009/056194, filed September 8, 2009, published as WO 2010/028347 on March 11, 2010; International PCT Application, PCT/US2011/066747, filed December 22, 2011, published as WO 2012/088381 on June 28, 2012; U.S. Application, U.S.S.N. 13/922,812, filed June 20, 2013; International PCT Application, PCT/US2015/057012, filed on October 22, 2015, published as WO2016/077052; and PCT/US2016/027795, filed on April 15, 2016, published as WO2016/168631, the entire contents of each of which are incorporated herein by reference. One advantage of the PACE technology described herein is that both the time and human effort required to evolve a protease are reduced. In some embodiments, evolved proteases described herein (e.g., proteases evolved using PACE technology described herein) cleave IL-23 with higher efficiency or specificity relative to previously described IL-23 cleaving proteases.
[0016] The summary above is meant to illustrate and outline, in a non-limiting manner, some of the embodiments, advantages, features, and uses of the technology disclosed herein. The disclosure is, however, not limited to the embodiments described in the summary above. Other embodiments, advantages, features, and uses of the technology disclosed herein will be apparent from the Detailed Description, the Drawings, the Examples, and the Claims.
BRIEF DESCRIPTION OF THE DRAWINGS
[0017] Figure 1. Overview of PACE of a protease. A culture of host E. coli continuously dilutes a fixed-volume vessel containing an evolving population of selection phage (SP) in which the essential phage gene gIII has been replaced by a protease gene. These host cells contain an arabinose-inducible mutagenesis plasmid (MP) and an accessory plasmid (AP) that supplies gIII. The expression of gIII is made protease-dependent through the use of a protease-activated RNA polymerase (PA-RNAP) consisting of T7 RNA polymerase fused through a cleavable substrate linker to T7 lysozyme, a natural inhibitor of T7 RNAP transcription. If an SP encodes a protease capable of cleaving the substrate linker, then the resulting liberation of T7 RNAP leads to the production of pIII and infectious progeny phage encoding active proteases. Conversely, SP encoding proteases that cannot cleave the PA-RNAP yield non-infectious progeny phage.
[0018] Figure 2. Evolutionary trajectories and representative evolved TEV protease genotypes. Each arrow represents a PACE experiment with the corresponding substrate peptide (sequences are given by SEQ ID NOs: 3-8, 173, and 184) and selection stringency parameters listed beneath the arrow. Increased selection stringency annotations are: Q649S (a T7 RNAP mutant with decreased transcriptional activity), proA (lower expression of substrate PA-RNAP), and IL-23 (38-66) (native IL-23 sequence in place of GGS linker). Numbers above the arrows denote TEV protease residues that were targeted in site-saturation mutagenesis libraries used to initiate that PACE experiment. In the first PACE experiment, wild-type TEV protease was mutagenized at the positions shown. All other libraries were generated using the protease genes emerging from the previous PACE segment as the PCR template. For PACE segments with no targeted mutagenesis, lagoons were inoculated with an aliquot of the phage population from the preceding experiment.
[0019] Figures 3A to 3E. Protease specificity profiling. (Figure 3A) Overview of phage substrate display. M13 bacteriophage libraries contain pIII fused to a FLAG-tag through a randomized protease substrate linker. These substrate phage are bound to anti-FLAG magnetic beads and treated with a protease to release phage that encode substrates that can be cleaved by the protease. The remaining intact substrate phage are eluted with excess FLAG peptide. The abundance of all substrate sequences within the cleaved and eluted samples is measured by high-throughput sequencing. (Figures 3B to 3E) For all assayed proteases, phage substrate display was separately performed on seven libraries, each with a different single randomized position within the ENLYFQS (SEQ ID NO: 2) motif. The resulting enrichment values are displayed as sequence logos, with enrichment values above zero indicating protease acceptance, and values below zero indicating rejection. (Figure 3B) Wild-type TEV protease exhibits strong enrichment for the consensus motif EXLYFQS (SEQ ID NO: 168). (Figure 3C) Evolved TEV L2F (SEQ ID NO: 137) has broadened specificity at P6 and shifted specificity at P3, P1, and P1' in accordance with the HPLVGHM (SEQ ID NO: 3) target substrate. (Figure 3D) Mutations I138T, N171D, and N176T are sufficient to broaden P6 specificity. (Figure 3E) Mutations T146S, D148P, S153N, S170A, and N177M shift specificity at both P1 and P3.
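
By way of non-limiting illustration, one way to obtain a signed enrichment value of the kind described above is to take, for each amino acid at the randomized position, the base-2 logarithm of its frequency among cleaved phage divided by its frequency among eluted (intact) phage. The exact formula used to generate the figures is not stated here, so the log-ratio, the pseudocount, the function name, and the toy reads in the following Python sketch are all assumptions.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def position_enrichment(cleaved_reads, eluted_reads, position, pseudocount=1.0):
    """Log2 ratio of per-residue frequencies (cleaved / eluted) at one substrate position.

    `position` is a 0-based index into each substrate read. A pseudocount is
    added to every residue so that unobserved residues do not divide by zero.
    """
    cleaved = Counter(read[position] for read in cleaved_reads)
    eluted = Counter(read[position] for read in eluted_reads)
    cleaved_total = len(cleaved_reads) + pseudocount * len(AMINO_ACIDS)
    eluted_total = len(eluted_reads) + pseudocount * len(AMINO_ACIDS)
    return {
        aa: math.log2(
            ((cleaved[aa] + pseudocount) / cleaved_total)
            / ((eluted[aa] + pseudocount) / eluted_total)
        )
        for aa in AMINO_ACIDS
    }


# Hypothetical reads from a library randomized at P1 (index 5 of the heptapeptide).
toy_cleaved = ["ENLYFQS"] * 8 + ["ENLYFHS"] * 2
toy_eluted = ["ENLYFQS"] * 2 + ["ENLYFHS"] * 8
scores = position_enrichment(toy_cleaved, toy_eluted, position=5)
print({aa: round(value, 2) for aa, value in scores.items() if abs(value) > 1e-9})
```

With this convention a residue that is over-represented among cleaved phage scores above zero (accepted) and one that is over-represented among eluted phage scores below zero (rejected), matching the sign convention of the sequence logos.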
[0020] Figure 4. Protease-mediated attenuation of IL-17 secretion in mouse splenocytes. The activity of IL-23 in vivo is mediated by stabilization of a T-helper cell lineage (Th17) that secretes IL-17, leading to downstream pro-inflammatory signals. This pathway can be assayed within a culture of mouse mononuclear splenocytes, by measuring the amount of IL-17 secretion into the cell culture media using an ELISA. As a positive control, anti-IL-23 antibodies in a super-stoichiometric ratio prevent the secretion of IL-17. When pre-incubated with IL-23, evolved TEV L2F (SEQ ID NO: 137) also attenuates IL-17 secretion, demonstrating that cleavage of the HPLVGHM (SEQ ID NO: 3) target site is sufficient to block immune signaling. Error bars represent the standard deviation of three technical replicates.
[0021] Figure 5. Selection Phage Plasmid Map. The M13 bacteriophage gene gIII has been replaced with the gene of interest to be evolved, maltose-binding protein (MBP) fused to TEV through a GGS-linker. The MBP fusion enhances soluble expression of active TEV protease.
[0022] Figure 6. Accessory Plasmid Map. A single accessory plasmid is used to supply the PA-RNAP construct under constitutive expression as well as supply gIII under control of the T7 promoter. A lysozyme-dependent terminator is placed downstream of the T7 promoter to lower transcription of gIII in the absence of active protease.
[0023] Figure 7. Luciferase Activity Assay after PACE 1 of Trajectories 1, 2, and 3. TEV protease clones (corresponding genotypes can be found in Table 3) after evolution on the first stepping-stone substrate show apparent proteolytic activity on both the wild-type substrate ENLYFQS (SEQ ID NO: 2) and the single mutant substrate HNLYFQS (SEQ ID NO: 4). Error bars represent the standard deviation of three technical replicates.
[0024] Figure 8. Luciferase Activity Assay after PACE 2 of Trajectories 1 and 2. TEV protease clones from trajectories 1 and 2 (corresponding genotypes can be found in Table 4) after evolution on the second stepping-stone substrate, ENLYGQS (SEQ ID NO: 5), show activity on the wild-type substrate ENLYFQS (SEQ ID NO: 2) and both single mutant substrates (HNLYFQS (SEQ ID NO: 4), ENLYGQS (SEQ ID NO: 5)). Error bars represent the standard deviation of three technical replicates.
[0025] Figure 9. Luciferase Activity Assay after PACE 3 of Trajectory 3. TEV protease clones from trajectory 3 (corresponding genotypes can be found in Table 5) after evolution on the second stepping-stone substrate, HNLYFHS (SEQ ID NO: 6), show apparent activity on the wild-type substrate (SEQ ID NO: 2) and the double mutant substrate, HNLYFHS (SEQ ID NO: 6). Error bars represent the standard deviation of three technical replicates.
[0026] Figure 10. Luciferase Activity Assay of Clones after PACE 5. PACE evolved TEV SP clones (corresponding genotypes can be found in Table 7) from stage four of the evolutionary trajectories show proteolysis of HPLVGHM (SEQ ID NO: 3) and ENLYFQS (SEQ ID NO: 2) substrates within a protease-activated RNA polymerase as measured by downstream luciferase signal. These data indicate that the evolved enzymes were acquiring the desired phenotype. Error bars represent the standard deviation of three technical replicates.
[0027] Figure 11. Stringency Modulation Validation. Using the highest activity TEV variant prior to stringency modulation in PACE, protease-induced luminescence assays were performed on a number of accessory plasmids (APs) that were expected to exert higher selection stringency. Prior to stringency modulation, the HPLVGHM (SEQ ID NO: 3) proB AP exhibits robust protease-induced luminescence and fold activation of 4.7. When the flexible GGS-linkers in the PA-RNAP of the standard AP are replaced with the native sequence of IL-23 (amino acids 38-66), protease-induced luminescence is diminished (fold activation 2.8). When expression levels of the HPLVGHM (SEQ ID NO: 3) PA-RNAP are lower due to a weaker constitutive promoter (proA instead of proB), there is much lower background and protease-induced luminescence as well as a lower fold activation of 3.3. The introduction of deactivating mutation Q649S to the T7 RNAP portion of the PA-RNAP also causes a decrease in background and protease-induced luciferase signal (fold activation 2.7). When all three strategies are combined in a single AP, an even greater decrease in luciferase signal is observed (fold activation 2.1); a lower fold activation generally corresponds with higher selection stringency. The sequences correspond to SEQ ID NOs.: 2 and 3. Error bars represent the standard deviation of three technical replicates.
[0028] Figure 12. Luciferase Activity Assay of Clones after PACE 9. After multiple PACE experiments with increasing levels of positive selection stringency, many TEV protease variants (corresponding genotypes can be found in Table 11) exhibit markedly stronger apparent activity on the HPLVGHM (SEQ ID NO: 3) substrate when compared with clones from previous PACE experiments such as those seen in Figure 8. Error bars represent the standard deviation of three technical replicates.
[0029] Figure 13. Epistatic Interactions with TEV Protease Residue N177. PACE evolved clones L1F and L2F (SEQ ID NO: 137) from trajectories 1 and 2 respectively exhibit robust apparent activity on the HPLVGHM (SEQ ID NO: 3) substrate. When the identity of residue N177 is swapped between these clones there is a significant decrease in apparent activity, suggesting that the optimal substitution for N177 depends upon the identities of other mutations within TEV protease. Error bars represent the standard deviation of three technical replicates.
[0030] Figure 14. In Vitro Proteolysis Assays to Select Highest Activity Clone. TEV protease variants from the final PACE time point were overexpressed and purified. Approximately 1 μg of protease was incubated with 5 μg of a fusion protein construct in which MBP is linked to GST through a cleavable substrate linker (in this case the substrate was HPLVGHM (SEQ ID NO: 3)). Here it was observed that TEV L2F (SEQ ID NO: 137) exhibits the highest catalytic activity. Note that TEV protease variants L1F and L5B encode premature stop codons leading to products with approximately the same molecular weight as GST. Consequently, the intensity of the MBP product band best reflects reaction efficiency.
[0031] Figures 15A to 15D. HPLC Assay of TEV Protease Kinetics. (Figure 15A) Synthetic Peptide Standards. TEV protease substrate peptides and the corresponding product peptides in a 1:1 mixture are separable by reverse-phase liquid chromatography. (Figure 15B) WT TEV protease (0.1 μM) was incubated for 10 minutes at 30 °C with ENLYFQS (SEQ ID NO: 2) substrate at concentrations ranging from 50 to 800 μM. (Figure 15C) TEV L2F (SEQ ID NO: 137) protease (0.1 μM) was incubated for 10 minutes at 30 °C with HPLVGHM (SEQ ID NO: 3) substrate at concentrations ranging from 50 to 2000 μM. (Figure 15D) TEV L2F (SEQ ID NO: 137) protease (0.05 μM) was incubated for 10 minutes at 30 °C with ENLYFQS (SEQ ID NO: 2) substrate at concentrations ranging from 50 to 500 μM. Data were fit to a Michaelis-Menten enzyme kinetics model, with error bars representing the standard deviation of three technical replicates.
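
By way of non-limiting illustration, the Michaelis-Menten model referred to above relates the initial rate v to the substrate concentration [S] as v = Vmax·[S]/(Km + [S]). The fitting procedure used for the figures is not specified here; the following Python sketch substitutes a Hanes-Woolf linearization ([S]/v plotted against [S]) solved by ordinary least squares, chosen only because it needs no external libraries. The rate values are synthetic and hypothetical, not data from the disclosure.

```python
def michaelis_menten(substrate, vmax, km):
    """Initial rate predicted by the Michaelis-Menten model."""
    return vmax * substrate / (km + substrate)


def hanes_woolf_fit(substrate_concs, rates):
    """Estimate (Vmax, Km) from the line [S]/v = [S]/Vmax + Km/Vmax."""
    xs = list(substrate_concs)
    ys = [s / v for s, v in zip(substrate_concs, rates)]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx                      # equals 1 / Vmax
    intercept = mean_y - slope * mean_x    # equals Km / Vmax
    return 1.0 / slope, intercept / slope


# Hypothetical, noise-free rates generated from assumed parameters.
concentrations_um = [50.0, 100.0, 200.0, 400.0, 800.0]
assumed_vmax, assumed_km = 2.0, 150.0
synthetic_rates = [michaelis_menten(s, assumed_vmax, assumed_km) for s in concentrations_um]

fitted_vmax, fitted_km = hanes_woolf_fit(concentrations_um, synthetic_rates)
print(f"Vmax = {fitted_vmax:.3f}, Km = {fitted_km:.1f} uM")
```

With real, noisy measurements a direct nonlinear least-squares fit of the untransformed model is generally preferred, because linearizations distort the error structure; dividing the fitted Vmax by the enzyme concentration then gives kcat.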
[0032] Figure 16. Evolved TEV Protease Cleaves Wild-type and Stepping-Stone Substrates in Addition to the HPLVGHM (SEQ ID NO: 3) Substrate. In a manner entirely analogous to that described above in Figure 11, TEV proteases were assayed on a panel of substrate sequences. WT TEV efficiently cleaves wild-type substrate, and to a much lesser degree processes single mutant substrates (HNLYFQS (SEQ ID NO: 4), ENLYFHS (SEQ ID NO: 7), ENLYGQS (SEQ ID NO: 5)). Evolved TEV protease clone L2F yields a visible product band for the target substrate HPLVGHM (SEQ ID NO: 3). However, this evolved protease has also maintained activity on wild-type, single, double and triple mutant substrates that were used as evolutionary stepping-stones in PACE. Sequences are given by SEQ ID NOs.: 2-7 and 173.
[0033] Figure 17. Specificity Profiles Generated from Libraries with Three Randomized Substrate Amino Acids. The logos depicted were generated using phage substrate libraries containing windows of three randomized amino acids within either the ENLYFQS (SEQ ID NO: 2) or the HPLVGHM (SEQ ID NO: 3) substrate (corresponding enrichment values in Table 19). The nature of the library (sequences are given by SEQ ID NOs.: 174-183) and the protease (sequences are given by SEQ ID NOs: 1 and 173) that was used in the selection is specified in the title above each sequence logo. The specificity profiles of wild-type and evolved protease using three-residue ENLYFQS (SEQ ID NO: 2) libraries are largely identical to those seen in Figures 3A to 3E using single-site randomized substrate libraries. However, in the context of the HPLVGHM (SEQ ID NO: 3) libraries, it is observed that TEV L2F (SEQ ID NO: 137) indeed retains some specificity for glutamate at P6 and exhibits a greater specificity for histidine at P1.
[0034] Figure 18. TEV Protease Variants Containing Subsets of TEV L2F Mutations Are All Active In Vitro. TEV protease variants were engineered to contain groups of mutations taken from the L2F variant (SEQ ID NO: 137). These enzymes were purified and assayed in vitro on a test substrate, MBP-GST, containing the wild-type substrate motif ENLYFQS (SEQ ID NO: 2). All assayed variants retained proteolytic activity despite the naive genetic dissection of mutations.
[0035] Figure 19. Specificity Profiles of TEV Variants Similar to Wild-type TEV. The logos above were generated using phage substrate libraries each containing a single randomized amino acid within the ENLYFQS (SEQ ID NO: 2) substrate (corresponding enrichment values in Table 18). The genotype of the protease that was used in the selection is specified in the title above each sequence logo. The above specificity profiles all exhibit the ENLYFQS (SEQ ID NO: 2) consensus motif that is characteristic of wild-type TEV protease specificity.
[0036] Figures 20A to 20B. Identification of IL-23 Cleavage Sites by Western Blot and LC-MS. IL-23 heterodimer (IL-23) and IL-23 monomer (IL-23p19) were incubated with and without TEV L2F (SEQ ID NO: 137). Reaction mixtures were subjected to LC-MS and visualized by Western blot with anti-IL-23p19 monoclonal antibody (Figure 20A). Bands 1 and 3 correspond to intact IL-23p19; differences in size are due to carboxy-terminal affinity purification tags. Cleavage product bands 2 and 4 correspond to IL-23 fragments with new masses that are 3,598 Da less than the corresponding starting materials. This mass difference corresponds to the fragment liberated by cleavage at the target site (HPLVGH//M; SEQ ID NO: 8). Cleavage of the monomer also results in a second product (band 5) with a mass that matches IL-23 cleaved at both the target site (HPLVGH//M; SEQ ID NO: 8) and an off-target site (ARVFAH//G; SEQ ID NO: 9). The IL-23p19 amino acid sequence (SEQ ID NO: 195) is shown with the target cleavage site in bold and the off-target site in italics (Figure 20B).
[0037] Figures 21A to 21D. Identification of Cleavage Site within IL-23 Heterodimer by Mass Spectrometry. IL-23 was procured in its native state as a heterodimeric protein expressed and purified from cultured mammalian cells (PHC9321, ThermoFisher). This protein was incubated under reducing conditions either in the presence or absence of TEV L2F (SEQ ID NO: 137). These samples were then analyzed by LC-MS to yield total ion current (Figures 21A and 21C) and the corresponding deconvoluted mass spectra (Figures 21B and 21D). Both samples exhibit a cluster of masses around 36,000 Da corresponding to the multiple glycoforms of the p40 subunit. The unreacted sample (Figures 21A and 21B) contains a mass of 19,472 Da, which is 751 Da greater than the expected mass of IL-23. The reaction mixture (Figures 21C and 21D) contains a 27,768 Da mass matching TEV L2F (SEQ ID NO: 137) as well as a 15,875 Da mass, which matches the expected cleavage product plus an unspecified 751 Da C-terminal tag.
[0038] Figures 22A to 22D. Identification of Multiple Cleavage Sites within IL-23 Monomer by Mass Spectrometry. IL-23p19 was procured in its monomeric form expressed and purified from cultured HEK293T cells using a C-terminal Myc/DDK tag (TP309680, Origene). This protein was incubated under reducing conditions either in the presence or absence of TEV L2F (SEQ ID NO: 137). These samples were then analyzed by LC-MS to yield total ion current (Figures 22A and 22C) and the corresponding deconvoluted mass spectra (Figures 22B and 22D). The unreacted sample (Figures 22A and 22B) contains a mass of 22,324 Da, which corresponds to the IL-23p19 sequence and Myc tag in the product data. The reaction mixture (Figures 22C and 22D) contains three additional masses: TEV L2F (SEQ ID NO: 137) (27,768 Da), substrate cleaved only at the HPLVGHM (SEQ ID NO: 3) target site (18,727 Da), and substrate cleaved at both the target site and an off-target site ARVFAHG (SEQ ID NO: 10) (14,526 Da).
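As a simple cross-check of the masses reported for Figures 21 and 22, the mass lost on cleavage can be computed by subtraction. The sketch below uses only the values stated in the figure descriptions; the variable names are illustrative. Both starting materials lose 3,597 Da on target-site cleavage, which agrees to within 1 Da with the 3,598 Da difference stated for Figure 20 (attributing the 1 Da discrepancy to rounding of the deconvoluted masses is an assumption).

```python
# Masses in Da, taken from the descriptions of Figures 21 and 22.
heterodimer_sample_intact = 19_472   # unreacted sample, Figures 21A and 21B
heterodimer_sample_cleaved = 15_875  # reaction mixture, Figures 21C and 21D
monomer_intact = 22_324              # unreacted sample, Figures 22A and 22B
monomer_target_cleaved = 18_727      # cleaved at the target site only
monomer_double_cleaved = 14_526      # cleaved at target and off-target sites

# Mass liberated by cleavage at the target site in each starting material.
loss_heterodimer = heterodimer_sample_intact - heterodimer_sample_cleaved
loss_monomer = monomer_intact - monomer_target_cleaved

# Additional mass liberated by the second, off-target cleavage of the monomer.
loss_off_target = monomer_target_cleaved - monomer_double_cleaved

print(loss_heterodimer, loss_monomer, loss_off_target)  # prints: 3597 3597 4201
```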
[0039] Figure 23. Western Blot Analysis of Premixed Additives to Splenocyte Cell Culture. IL-23 and TEV proteases were incubated for 16 hours at 4 °C in the presence of BSA as a stabilizing carrier protein. Samples were prepared at 300x concentration used in splenocyte cultures to enable detection of IL-23p19 and IL-12p40 by Western blot. Neither component is proteolyzed by wild-type TEV protease; IL-12p40 is also unaffected by TEV L2F (SEQ ID NO: 137). TEV L2F (SEQ ID NO: 137) cleaves IL-23p19 at the HPLVGHM (SEQ ID NO: 3) site in a dose-dependent manner. At the highest doses, off-target cleavage products are also observed. An aliquot of these samples was directly used in the cell culture experiments in Figure 4 to confirm that on-target proteolysis causes IL-23 loss of function.
[0040] Figure 24. Western Blot Analysis of Premixed Additives to Splenocyte Cell Culture. IL-23 and TEV proteases were incubated for 16 hours at 4 °C in the presence of BSA as a stabilizing carrier protein. Samples were prepared at 300x concentration used in splenocyte cultures to enable detection of IL-23p19 and IL-12p40 by Western blot. At approximately 0.36 molar equivalents of TEV L2F (SEQ ID NO: 137), greater than 50% of IL-23p19 is cleaved at the HPLVGHM (SEQ ID NO: 3) site, demonstrating that TEV L2F (SEQ ID NO: 137) processes substrate with catalytic turnover. At the highest doses, off-target cleavage products are also observed. An aliquot of these samples was directly used in the cell culture experiments in Figure 25 to confirm that on-target proteolysis causes IL-23 loss of function.
[0041] Figure 25. TEV L2F Catalytically Deactivates IL-23 and Prevents IL-17 Secretion in Mouse Splenocytes. IL-17 is secreted by cultured mouse mononuclear splenocytes in response to human IL-23 in the media. The secretion of IL-17 can be prevented by pretreatment of IL-23 with TEV L2F (SEQ ID NO: 137) at a dose that is less than half the molar equivalent of IL-23. Inhibition began at a dose corresponding to 0.7 nM TEV L2F (SEQ ID NO: 137) (compared with 1.9 nM IL-23), confirming that IL-23 is deactivated with catalytic turnover by TEV L2F.
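The turnover argument in the descriptions of Figures 24 and 25 can be made explicit with a short calculation. The sketch below is illustrative only and uses the stated values (0.7 nM protease against 1.9 nM IL-23, and more than 50% cleavage at approximately 0.36 molar equivalents); the function and variable names are hypothetical. If each protease molecule could act only once, the fraction of substrate cleaved could not exceed the molar ratio of protease to substrate, so cleaving more than that fraction implies that each enzyme molecule processed more than one substrate molecule on average.

```python
def molar_equivalents(enzyme_nm, substrate_nm):
    """Molar ratio of enzyme to substrate (both in nM)."""
    return enzyme_nm / substrate_nm


def min_average_turnovers(fraction_cleaved, equivalents):
    """Lower bound on substrate molecules cleaved per enzyme molecule."""
    return fraction_cleaved / equivalents


# Figure 25: 0.7 nM TEV L2F compared with 1.9 nM IL-23.
ratio_fig25 = molar_equivalents(0.7, 1.9)          # about 0.37

# Figure 24: more than 50% cleaved at about 0.36 molar equivalents,
# so the average number of turnovers exceeds this bound.
turnover_bound_fig24 = min_average_turnovers(0.5, 0.36)  # about 1.39

print(round(ratio_fig25, 2), round(turnover_bound_fig24, 2))
```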
[0042] Figure 26. IL-23 induced IL-17 Secretion in Mouse Splenocytes. IL-17 is secreted by cultured mouse mononuclear splenocytes in response to human IL-23 in the media. This response can be prevented by addition of antibodies that neutralize IL-23 directly to cell culture media. A dose-dependent response is observed in which the antibody neutralizes IL-23 through a stoichiometric binding mechanism. Inhibition began at approximately 1.3 nM antibody (compared with 1.9 nM IL-23). Evolved TEV L2F (SEQ ID NO: 137), when added directly to cell culture media, is unable to prevent IL-23 from stimulating IL-17 secretion.
[0043] Figure 27. IL-23 In Vitro Cleavage Assay. IL-23 and TEV proteases were incubated for 16 hours at 4 °C in the presence of BSA as a stabilizing carrier protein. The addition of 10% Fetal Bovine Serum (FBS) to the assay buffer had no effect on the efficiency of cleavage by TEV L2F (SEQ ID NO: 137). The same percentage of FBS was used to supplement cell culture media, suggesting that components within serum are not responsible for loss of TEV L2F activity when added directly to splenocyte cell cultures.
[0044] Figure 28. In Vitro Cleavage Assay. After positive selection for TEV protease variants that cleave the substrate ENLYAQS, a mixture of genotypes was enriched. Variants containing the mutation V216F cleaved the ENLYAQS substrate but not the wild-type substrate ENLYFQS (SEQ ID NO: 2).
[0045] Figure 29. In Vitro Cleavage Assay. After simultaneous positive and negative selection for variants that cleaved the mutant substrate ENLYAQS but not the wild-type substrate ENLYFQS, all selected variants contained the V216F mutation.
DEFINITIONS
[0046] The term "protease," as used herein, refers to an enzyme that catalyzes the hydrolysis of a peptide (amide) bond linking amino acid residues together within a protein. The term embraces both naturally occurring and engineered proteases. Many proteases are known in the art. Proteases can be classified by their catalytic residue, and protease classes include, without limitation, serine proteases (serine alcohol), threonine proteases (threonine secondary alcohol), cysteine proteases (cysteine thiol), aspartate proteases (aspartate carboxylic acid), glutamic acid proteases (glutamate carboxylic acid), and metalloproteases (metal ion, e.g., zinc). The structures in parentheses correlate to the respective catalytic moiety of the proteases of each class. Some proteases are highly promiscuous and cleave a wide range of protein substrates, e.g., trypsin or pepsin. Other proteases are highly specific and only cleave substrates with a specific sequence. Some blood clotting proteases such as, for example, thrombin, and some viral proteases such as, for example, HCV or TEV protease, are highly specific proteases. Proteases that cleave in a very specific manner typically bind to multiple amino acid residues of their substrate. Suitable proteases and protease cleavage sites, also sometimes referred to as "protease substrates," will be apparent to those of skill in the art and include, without limitation, proteases listed in the MEROPS database, accessible at merops.sanger.ac.uk and described in Rawlings et al., (2014) MEROPS: the database of proteolytic enzymes, their substrates and inhibitors. Nucleic Acids Res 42, D503-D509, the entire contents of each of which are incorporated herein by reference. The disclosure is not limited in this respect.
[0047] The term "protein," as used herein, refers to a polymer of amino acid residues linked together by peptide bonds. The term, as used herein, refers to proteins, polypeptides, and peptides of any size, structure, or function. Typically, a protein will be at least three amino acids long. A protein may refer to an individual protein or a collection of proteins. Inventive proteins preferably contain only natural amino acids, although non-natural amino acids (i.e., compounds that do not occur in nature but that can be incorporated into a polypeptide chain; see, for example, cco.caltech.edu/~dadgrp/Unnatstruct.gif, which displays structures of non-natural amino acids that have been successfully incorporated into functional ion channels) and/or amino acid analogs as are known in the art may alternatively be employed. Also, one or more of the amino acids in an inventive protein may be modified, for example, by the addition of a chemical entity such as a carbohydrate group, a hydroxyl group, a phosphate group, a farnesyl group, an isofarnesyl group, a fatty acid group, a linker for conjugation, functionalization, or other modification, etc. A protein may also be a single molecule or may be a multi-molecular complex. A protein may be just a fragment of a naturally occurring protein or peptide. A protein may be naturally occurring, recombinant, or synthetic, or any combination of these.
[0048] The term "Tobacco Etch Virus (TEV) protease," as used herein, refers to a protease derived from, or having at least 70% sequence homology to (or at least 70% identity to), a Tobacco Etch Virus cysteine protease. A wild-type TEV protease refers to the amino acid sequence of a TEV protease as it naturally occurs in a Tobacco Etch Virus genome. In some embodiments, the mutant protease with the single amino acid substitution S219V is referred to as wild-type; this variant is unable to cleave itself, thus preventing auto-inactivation. An example of a wild-type S219V TEV protease is represented by the amino acid sequence set forth in SEQ ID NO: 1.
Tobacco Etch Virus protease S219V (237 AA; GenBank Accession No. APA32020.1) GESLFKGPRDYNPISSTICHLTNESDGHTTSLYGIGFGPFIITNKHLFRRNNGTLLVQSLHGVFKVKNTTTLQQHLIDGRDMIIIRMPKDFPPFPQKLKFREPQREERICLVTTNFQTKSMSSMVSDTSCTFPSSDGIFWKHWIQTKDGQCGSPLVSTRDGFIVGIHSASNFTNTNNYFTSVPKNFMELLTNQEAQQWVSGWRLNADSVLWGGHKVFMVKPEEPFQPVKEATQLMN (SEQ ID NO: 1)
Generally, a wild-type TEV protease cleaves the canonical target peptide (e.g., substrate) ENLYFQS (SEQ ID NO: 2). Genetically modified cells that heterologously express one or more TEV protease(s) are known in the art, for example, as described by Tropea et al., "Expression and purification of soluble His6-tagged TEV protease." High Throughput Protein Expression and Purification: Methods and Protocols (2009): 297-307.
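By way of illustration, recognition and cleavage of a seven-residue target peptide can be sketched as a string search followed by a split between the sixth and seventh residues of the site, consistent with the HPLVGH//M notation used herein for the evolved target site and with the established cleavage of ENLYFQS between Q and S. The function name, the `cut_after` parameter, and the example substrate sequence below are hypothetical and are not taken from the sequence listing; the sketch considers only the first exact match and ignores the graded specificity described elsewhere herein.

```python
def cleave_at_site(sequence, site, cut_after=6):
    """Split `sequence` at the first exact occurrence of `site`.

    The cut is placed `cut_after` residues into the site, i.e. between
    P1 and P1' for a seven-residue site such as ENLYFQS (ENLYFQ//S).
    Returns the uncut sequence in a one-element list if the site is absent.
    """
    start = sequence.find(site)
    if start == -1:
        return [sequence]
    cut = start + cut_after
    return [sequence[:cut], sequence[cut:]]


# Hypothetical substrate containing the canonical wild-type site.
substrate = "MKTAENLYFQSGGHHHHHH"

fragments = cleave_at_site(substrate, "ENLYFQS")
print(fragments)  # prints: ['MKTAENLYFQ', 'SGGHHHHHH']

# The same hypothetical substrate lacks the evolved target site HPLVGHM.
uncut = cleave_at_site(substrate, "HPLVGHM")
```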
[0049] The term "TEV protease variant," as used herein, refers to a protein (e.g., a TEV protease) having one or more amino acid variations introduced into the amino acid sequence, e.g., as a result of application of the PACE method, as compared to the amino acid sequence of a naturally-occurring or wild-type TEV protein (e.g., SEQ ID NO: 1). Amino acid sequence variations may include one or more mutated residues within the amino acid sequence of the protease, e.g., as a result of a change in the nucleotide sequence encoding the protease that results in a change in the codon at any particular position in the coding sequence, the deletion of one or more amino acids (e.g., a truncated protein), the insertion of one or more amino acids, or any combination of the foregoing. In certain embodiments, a TEV protease variant cleaves a different target peptide (e.g., has broadened or different substrate specificity) relative to a wild-type TEV protease.
[0050] The term "continuous evolution," as used herein, refers to an evolution procedure, in which a population of nucleic acids is subjected to multiple rounds of (a) replication, (b) mutation (or modification of the primary sequence of nucleotides of the nucleic acids in the population), and (c) selection to produce a desired evolved product, for example, a novel nucleic acid encoding a novel protein with a desired activity, wherein the multiple rounds of replication, mutation, and selection can be performed without investigator interaction, and wherein the processes (a)-(c) can be carried out simultaneously. Typically, the evolution procedure is carried out in vitro, for example, using cells in culture as host cells. In general, a continuous evolution process provided herein relies on a system in which a gene of interest is provided in a nucleic acid vector that undergoes a life-cycle including replication in a host cell and transfer to another host cell, wherein a critical component of the life-cycle is deactivated and reactivation of the component is dependent upon a desired variation in an amino acid sequence of a protein encoded by the gene of interest.
[0051] In some embodiments, the gene of interest is transferred from cell to cell in a manner dependent on the activity of the gene of interest. In some embodiments, the transfer vector is a virus infecting cells, for example, a bacteriophage, or a retroviral vector. In some embodiments, the viral vector is a phage vector infecting bacterial host cells. In some embodiments, the transfer vector is a conjugative plasmid transferred from a donor bacterial cell to a recipient bacterial cell.
[0052] In some embodiments, the nucleic acid vector comprising the gene of interest is a phage, a viral vector, or naked DNA (e.g., a mobilization plasmid). In some embodiments, transfer of the gene of interest from cell to cell is via infection, transfection, transduction, conjugation, or uptake of naked DNA, and efficiency of cell-to-cell transfer (e.g., transfer rate) is dependent on an activity of a product encoded by the gene of interest. For example, in some embodiments, the nucleic acid vector is a phage harboring the gene of interest and the efficiency of phage transfer (via infection) is dependent on an activity of the gene of interest in that a protein required for the generation of phage particles (e.g., pIII for M13 phage) is expressed in the host cells only in the presence of the desired activity of the gene of interest.
[0053] For example, some embodiments provide a continuous evolution system, in which a population of viral vectors comprising a gene of interest to be evolved replicates in a flow of host cells, e.g., a flow through a lagoon, wherein the viral vectors are deficient in a gene encoding a protein that is essential for the generation of infectious viral particles, and wherein that gene is in the host cell under the control of a conditional promoter that can be activated by a gene product encoded by the gene of interest, or a mutated version thereof. In some embodiments, the activity of the conditional promoter depends on a desired function of a gene product encoded by the gene of interest. Viral vectors, in which the gene of interest has not acquired a desired function as a result of a variation of amino acids introduced into the gene product protein sequence, will not activate the conditional promoter, or may only achieve minimal activation, while any mutations introduced into the gene of interest that confer the desired function will result in activation of the conditional promoter. Since the conditional promoter controls an essential protein for the viral life cycle, e.g., pIII, activation of this promoter directly corresponds to an advantage in viral spread and replication for those vectors that have acquired an advantageous mutation.
[0054] The term "flow," as used herein in the context of host cells, refers to a stream of host cells, wherein fresh host cells are being introduced into a host cell population, for example, a host cell population in a lagoon, remain within the population for a limited time, and are then removed from the host cell population. In a simple form, a host cell flow may be a flow through a tube, or a channel, for example, at a controlled rate. In other embodiments, a flow of host cells is directed through a lagoon that holds a volume of cell culture media and comprises an inflow and an outflow. The introduction of fresh host cells may be continuous or intermittent and removal may be passive, e.g., by overflow, or active, e.g., by active siphoning or pumping. Removal further may be random, for example, if a stirred suspension culture of host cells is provided, removed liquid culture media will contain freshly introduced host cells as well as cells that have been a member of the host cell population within the lagoon for some time. Even though, in theory, a cell could escape removal from the lagoon indefinitely, the average host cell will remain only for a limited period of time within the lagoon, which is determined mainly by the flow rate of the culture media (and suspended cells) through the lagoon.
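The dependence of residence time on flow rate can be illustrated with a minimal sketch, assuming an ideally mixed lagoon of constant volume with equal inflow and outflow (a standard well-stirred vessel model, which is an assumption and not a requirement of the disclosure). Under that assumption, the mean residence time of a host cell is the lagoon volume divided by the flow rate, and the fraction of cells still present after a time t decays exponentially. The volume and flow rate below are hypothetical values chosen only for the example.

```python
import math


def mean_residence_time(volume_ml, flow_ml_per_h):
    """Mean time (hours) a cell spends in an ideally mixed lagoon."""
    return volume_ml / flow_ml_per_h


def fraction_remaining(t_h, volume_ml, flow_ml_per_h):
    """Fraction of cells present at time 0 that are still in the lagoon at t_h."""
    return math.exp(-t_h * flow_ml_per_h / volume_ml)


# Hypothetical lagoon parameters (assumptions for illustration).
LAGOON_VOLUME_ML = 40.0
FLOW_ML_PER_H = 80.0

tau_h = mean_residence_time(LAGOON_VOLUME_ML, FLOW_ML_PER_H)      # 0.5 h
left_after_1h = fraction_remaining(1.0, LAGOON_VOLUME_ML, FLOW_ML_PER_H)

print(tau_h, round(left_after_1h, 3))
```

Doubling the flow rate in this model halves the mean residence time, which is consistent with the statement above that the residence time is determined mainly by the flow rate.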
[0055] Since the viral vectors replicate in a flow of host cells, in which fresh, uninfected host cells are provided while infected cells are removed, multiple consecutive viral life cycles can occur without investigator interaction, which allows for the accumulation of multiple advantageous mutations in a single evolution experiment.
[0056] The term "phage-assisted continuous evolution (PACE)," as used herein, refers to continuous evolution that employs phage as viral vectors.
[0057] The term "viral vector," as used herein, refers to a nucleic acid comprising a viral genome that, when introduced into a suitable host cell, can be replicated and packaged into viral particles able to transfer the viral genome into another host cell. The term viral vector extends to vectors comprising truncated or partial viral genomes. For example, in some embodiments, a viral vector is provided that lacks a gene encoding a protein essential for the generation of infectious viral particles. In suitable host cells, for example, host cells comprising the lacking gene under the control of a conditional promoter, however, such truncated viral vectors can replicate and generate viral particles able to transfer the truncated viral genome into another host cell. In some embodiments, the viral vector is a phage, for example, a filamentous phage (e.g., an M13 phage). In some embodiments, a viral vector, for example, a phage vector, is provided that comprises a gene of interest to be evolved.
[0058] The term "nucleic acid," as used herein, refers to a polymer of nucleotides. The polymer may include natural nucleosides (i.e., adenosine, thymidine, guanosine, cytidine, uridine, deoxyadenosine, deoxythymidine, deoxyguanosine, and deoxycytidine), nucleoside analogs (e.g., 2-aminoadenosine, 2-thiothymidine, inosine, pyrrolo-pyrimidine, 3-methyl adenosine, 5-methylcytidine, C5-bromouridine, C5-fluorouridine, C5-iodouridine, C5-propynyl-uridine, C5-propynyl-cytidine, C5-methylcytidine, 7-deazaadenosine, 7-deazaguanosine, 8-oxoadenosine, 8-oxoguanosine, O(6)-methylguanine, 4-acetylcytidine, 5-(carboxyhydroxymethyl)uridine, dihydrouridine, methylpseudouridine, 1-methyl adenosine, 1-methyl guanosine, N6-methyl adenosine, and 2-thiocytidine), chemically modified bases, biologically modified bases (e.g., methylated bases), intercalated bases, modified sugars (e.g., 2'-fluororibose, ribose, 2'-deoxyribose, 2'-O-methylcytidine, arabinose, and hexose), or modified phosphate groups (e.g., phosphorothioates and 5'-N-phosphoramidite linkages).
[0059] The term "gene of interest" or "gene encoding a protease of interest," as used herein, refers to a nucleic acid construct comprising a nucleotide sequence encoding a gene product, e.g., a protease, of interest to be evolved in a continuous evolution process as described herein. The term includes any variations of a gene of interest that are the result of a continuous evolution process according to methods described herein. For example, in some embodiments, a gene of interest is a nucleic acid construct comprising a nucleotide sequence encoding a protease to be evolved, cloned into a viral vector, for example, a phage genome, so that the expression of the encoding sequence is under the control of one or more promoters in the viral genome. In other embodiments, a gene of interest is a nucleic acid construct comprising a nucleotide sequence encoding a protease to be evolved and a promoter operably linked to the encoding sequence. When cloned into a viral vector, for example, a phage genome, the expression of the encoding sequence of such genes of interest is under the control of the heterologous promoter and, in some embodiments, may also be influenced by one or more promoters in the viral genome.
[0060] The term "function of a gene of interest," as interchangeably used with the term "activity of a gene of interest," refers to a function or activity of a gene product, for example, a nucleic acid or a protein, encoded by the gene of interest. For example, a function of a gene of interest may be an enzymatic activity (e.g., an enzymatic activity resulting in the generation of a reaction product, phosphorylation activity, phosphatase activity, etc.), an ability to activate transcription (e.g., transcriptional activation activity targeted to a specific promoter sequence), a bond-forming activity (e.g., an enzymatic activity resulting in the formation of a covalent bond), or a binding activity (e.g., a protein, DNA, or RNA binding activity).
[0061] The term "promoter" refers to a nucleic acid molecule with a sequence recognized by the cellular transcription machinery and able to initiate transcription of a downstream gene. A promoter can be constitutively active, meaning that the promoter is always active in a given cellular context, or conditionally active, meaning that the promoter is only active under specific conditions. For example, a conditional promoter may only be active in the presence of a specific protein that connects a protein associated with a regulatory element in the promoter to the basic transcriptional machinery, or only in the absence of an inhibitory molecule. A subclass of conditionally active promoters are inducible promoters that require the presence of a small molecule "inducer" for activity. Examples of inducible promoters include, but are not limited to, arabinose-inducible promoters, Tet-on promoters, and tamoxifen-inducible promoters. A variety of constitutive, conditional, and inducible promoters are well known to the skilled artisan, and the skilled artisan will be able to ascertain a variety of such promoters useful in carrying out the instant invention, which is not limited in this respect.
[0062] The term "viral particle," as used herein, refers to a viral genome, for example, a DNA or RNA genome, that is associated with a coat of a viral protein or proteins, and, in some cases, with an envelope of lipids. For example, a phage particle comprises a phage genome packaged into a protein encoded by the wild type phage genome.
[0063] The term "infectious viral particle," as used herein, refers to a viral particle able to transport the viral genome it comprises into a suitable host cell. Not all viral particles are able to transfer the viral genome to a suitable host cell. Particles unable to accomplish this are referred to as non-infectious viral particles. In some embodiments, a viral particle comprises a plurality of different coat proteins, wherein one or some of the coat proteins can be omitted without compromising the structure of the viral particle. In some embodiments, a viral particle is provided in which at least one coat protein cannot be omitted without the loss of infectivity. If a viral particle lacks a protein that confers infectivity, the viral particle is not infectious. For example, an M13 phage particle that comprises a phage genome packaged in a coat of phage proteins (e.g., pVIII) but lacks pIII (protein III) is a non-infectious M13 phage particle because pIII is essential for the infectious properties of M13 phage particles.
[0064] The term "viral life cycle," as used herein, refers to the viral reproduction cycle comprising insertion of the viral genome into a host cell, replication of the viral genome in the host cell, and packaging of a replication product of the viral genome into a viral particle by the host cell.
[0065] In some embodiments, the viral vector provided is a phage. The term "phage," as used herein interchangeably with the term "bacteriophage," refers to a virus that infects bacterial cells. Typically, phages consist of an outer protein capsid enclosing genetic material. The genetic material can be ssRNA, dsRNA, ssDNA, or dsDNA, in either linear or circular form. Phages and phage vectors are well known to those of skill in the art and non-limiting examples of phages that are useful for carrying out the methods provided herein are λ (Lysogen), T2, T4, T7, T12, R17, M13, MS2, G4, P1, P2, P4, Phi X174, N4, Φ6, and Φ29. In certain embodiments, the phage utilized in the present invention is M13. Additional suitable phages and host cells will be apparent to those of skill in the art, and the invention is not limited in this aspect. For an exemplary description of additional suitable phages and host cells, see Elizabeth Kutter and Alexander Sulakvelidze: Bacteriophages: Biology and Applications. CRC Press; 1st edition (December 2004), ISBN: 0849313368; Martha R. J. Clokie and Andrew M. Kropinski: Bacteriophages: Methods and Protocols, Volume 1: Isolation, Characterization, and Interactions (Methods in Molecular Biology) Humana Press; 1st edition (December 2008), ISBN: 1588296822; Martha R. J. Clokie and Andrew M. Kropinski: Bacteriophages: Methods and Protocols, Volume 2: Molecular and Applied Aspects (Methods in Molecular Biology) Humana Press; 1st edition (December 2008), ISBN: 1603275649; all of which are incorporated herein in their entirety by reference for disclosure of suitable phages and host cells as well as methods and protocols for isolation, culture, and manipulation of such phages.
[0066] In some embodiments, the phage is a filamentous phage. In some embodiments, the phage is an M13 phage. M13 phages are well known to those in the art and the biology of M13 phages has extensively been studied. Wild type M13 phage particles comprise a circular, single-stranded genome of approximately 6.4 kb. In certain embodiments, the wild-type genome of an M13 phage includes eleven genes, gI-gXI, which, in turn, encode the eleven M13 proteins, pI-pXI, respectively. gVIII encodes pVIII, also often referred to as the major structural protein of the phage particles, while gIII encodes pIII, also referred to as the minor coat protein, which is required for infectivity of M13 phage particles.
[0067] The M13 life cycle includes attachment of the phage to the sex pilus of a suitable bacterial host cell via the pIII protein and insertion of the phage genome into the host cell. The circular, single-stranded phage genome is then converted to a circular, double-stranded DNA, also termed the replicative form (RF), from which phage gene transcription is initiated. The wild type M13 genome comprises nine promoters and two transcriptional terminators as well as an origin of replication. This series of promoters provides a gradient of transcription such that the genes nearest the two transcriptional terminators (gVIII and IV) are transcribed at the highest levels. In wild-type M13 phage, transcription of all 11 genes proceeds in the same direction. One of the phage-encoded proteins, pII, initiates the generation of linear, single-stranded phage genomes in the host cells, which are subsequently circularized, and bound and stabilized by pV. The circularized, single-stranded M13 genomes are then bound by pVIII, while pV is stripped off the genome, which initiates the packaging process. At the end of the packaging process, multiple copies of pIII are attached to wild-type M13 particles, thus generating infectious phage ready to infect another host cell and concluding the life cycle.
[0068] The M13 phage genome can be manipulated, for example, by deleting one or more of the wild type genes, and/or inserting a heterologous nucleic acid construct into the genome. M13 does not have stringent genome size restrictions, and insertions of up to 42 kb have been reported. This allows M13 phage vectors to be used in continuous evolution experiments to evolve genes of interest without imposing a limitation on the length of the gene to be evolved.
[0069] The term "selection phage," as used herein interchangeably with the term "selection plasmid," refers to a modified phage that comprises a gene of interest to be evolved and lacks a full-length gene encoding a protein required for the generation of infectious phage particles. For example, some M13 selection phages provided herein comprise a nucleic acid sequence encoding a protease to be evolved, e.g., under the control of an M13 promoter, and lack all or part of a phage gene encoding a protein required for the generation of infectious phage particles, e.g., gI, gII, gIII, gIV, gV, gVI, gVII, gVIII, gIX, or gX, or any combination thereof. For example, some M13 selection phages provided herein comprise a nucleic acid sequence encoding a protease to be evolved, e.g., under the control of an M13 promoter, and lack all or part of a gene encoding a protein required for the generation of infective phage particles, e.g., the gIII gene encoding the pIII protein.
[0070] The term "helper phage," as used herein interchangeably with the terms "helper phagemid" and "helper plasmid," refers to a nucleic acid construct comprising a phage gene required for the phage life cycle, or a plurality of such genes, but lacking a structural element required for genome packaging into a phage particle. For example, a helper phage may provide a wild-type phage genome lacking a phage origin of replication. In some embodiments, a helper phage is provided that comprises a gene required for the generation of phage particles, but lacks a gene required for the generation of infectious particles, for example, a full-length pIII gene. In some embodiments, the helper phage provides only some, but not all, genes required for the generation of phage particles. Helper phages are useful to allow modified phages that lack a gene required for the generation of phage particles to complete the phage life cycle in a host cell. Typically, a helper phage will comprise the genes required for the generation of phage particles that are lacking in the phage genome, thus complementing the phage genome. In the continuous evolution context, the helper phage typically complements the selection phage, but both lack a phage gene required for the production of infectious phage particles.
[0071] The term "replication product," as used herein, refers to a nucleic acid that is the result of viral genome replication by a host cell. This includes any viral genomes synthesized by the host cell from a viral genome inserted into the host cell. The term includes non-mutated as well as mutated replication products.
[0072] The term "accessory plasmid," as used herein, refers to a plasmid comprising a gene required for the generation of infectious viral particles under the control of a conditional promoter. In the context of continuous evolution described herein, the conditional promoter of the accessory plasmid is typically activated by a function of the gene of interest to be evolved. Accordingly, the accessory plasmid serves the function of conveying a competitive advantage to those viral vectors in a given population of viral vectors that carry a gene of interest able to activate the conditional promoter. Only viral vectors carrying an "activating" gene of interest will be able to induce expression of the gene required to generate infectious viral particles in the host cell, and, thus, allow for packaging and propagation of the viral genome in the flow of host cells. Vectors carrying non-activating versions of the gene of interest, on the other hand, will not induce expression of the gene required to generate infectious viral vectors, and, thus, will not be packaged into viral particles that can infect fresh host cells.
[0073] In some embodiments, the conditional promoter of the accessory plasmid is a promoter the transcriptional activity of which can be regulated over a wide range, for example, over 2, 3, 4, 5, 6, 7, 8, 9, or 10 orders of magnitude by the activating function (for example, function of a protein encoded by the gene of interest). In some embodiments, the level of transcriptional activity of the conditional promoter depends directly on the desired function of the gene of interest. This allows for starting a continuous evolution process with a viral vector population comprising versions of the gene of interest that only show minimal activation of the conditional promoter. In the process of continuous evolution, any mutation in the gene of interest that increases activity of the conditional promoter directly translates into higher expression levels of the gene required for the generation of infectious viral particles, and, thus, into a competitive advantage over other viral vectors carrying minimally active or loss-of-function versions of the gene of interest.
[0074] The stringency of selective pressure imposed by the accessory plasmid in a continuous evolution procedure as provided herein can be modulated. In some embodiments, the use of low copy number accessory plasmids results in an elevated stringency of selection for versions of the gene of interest that activate the conditional promoter on the accessory plasmid, while the use of high copy number accessory plasmids results in a lower stringency of selection. The terms "high copy number plasmid" and "low copy number plasmid" are art-recognized and those of skill in the art will be able to ascertain whether a given plasmid is a high or low copy number plasmid. In some embodiments, a low copy number accessory plasmid is a plasmid exhibiting an average copy number of plasmid per host cell in a host cell population of about 5 to about 100. In some embodiments, a very low copy number accessory plasmid is a plasmid exhibiting an average copy number of plasmid per host cell in a host cell population of about 1 to about 10. In some embodiments, a very low copy number accessory plasmid is a single-copy per cell plasmid. In some embodiments, a high copy number accessory plasmid is a plasmid exhibiting an average copy number of plasmid per host cell in a host cell population of about 100 to about 5000. The copy number of an accessory plasmid will depend to a large part on the origin of replication employed. Those of skill in the art will be able to determine suitable origins of replication in order to achieve a desired copy number.
[0075] In some embodiments, the stringency of selective pressure imposed by the accessory plasmid can also be modulated through the use of mutant or alternative conditional transcription factors with higher or lower transcriptional output (e.g., a T7 RNA polymerase comprising a Q649S mutation). In some embodiments, the use of lower transcriptional output machinery results in an elevated stringency of selection for versions of the gene of interest that activate the conditional promoter on the accessory plasmid, while the use of higher transcriptional output machinery results in a lower stringency of selection.
[0076] It should be understood that the function of the accessory plasmid, namely to provide a gene required for the generation of viral particles under the control of a conditional promoter the activity of which depends on a function of the gene of interest, can be conferred to a host cell in alternative ways. Such alternatives include, but are not limited to, permanent insertion of a gene construct comprising the conditional promoter and the respective gene into the genome of the host cell, or introducing it into the host cell using a different vector, for example, a phagemid, a cosmid, a phage, a virus, or an artificial chromosome. Additional ways to confer accessory plasmid function to host cells will be evident to those of skill in the art, and the invention is not limited in this respect.
[0077] The term "mutagen," as used herein, refers to an agent that induces mutations or increases the rate of mutation in a given biological system, for example, a host cell, to a level above the naturally-occurring level of mutation in that system. Some exemplary mutagens useful for continuous evolution procedures are provided elsewhere herein and other useful mutagens will be evident to those of skill in the art. Useful mutagens include, but are not limited to, ionizing radiation, ultraviolet radiation, base analogs, deaminating agents (e.g., nitrous acid), intercalating agents (e.g., ethidium bromide), alkylating agents (e.g., ethylnitrosourea), transposons, bromine, azide salts, psoralen, benzene, 3-chloro-4-(dichloromethyl)-5-hydroxy-2(5H)-furanone (MX) (CAS no. 77439-76-0), O,O-dimethyl-S-(phthalimidomethyl)phosphorodithioate (phos-met) (CAS no. 732-11-6), formaldehyde (CAS no. 50-00-0), 2-(2-furyl)-3-(5-nitro-2-furyl)acrylamide (AF-2) (CAS no. 3688-53-7), glyoxal (CAS no. 107-22-2), 6-mercaptopurine (CAS no. 50-44-2), N-(trichloromethylthio)-4-cyclohexane-1,2-dicarboximide (captan) (CAS no. 133-06-2), 2-aminopurine (CAS no. 452-06-2), methyl methane sulfonate (MMS) (CAS no. 66-27-3), 4-nitroquinoline 1-oxide (4-NQO) (CAS no. 56-57-5), N4-aminocytidine (CAS no. 57294-74-3), sodium azide (CAS no. 26628-22-8), N-ethyl-N-nitrosourea (ENU) (CAS no. 759-73-9), N-methyl-N-nitrosourea (MNU) (CAS no. 820-60-0), 5-azacytidine (CAS no. 320-67-2), cumene hydroperoxide (CHP) (CAS no. 80-15-9), ethyl methanesulfonate (EMS) (CAS no. 62-50-0), N-ethyl-N'-nitro-N-nitrosoguanidine (ENNG) (CAS no. 4245-77-6), N-methyl-N'-nitro-N-nitrosoguanidine (MNNG) (CAS no. 70-25-7), 5-diazouracil (CAS no. 2435-76-9) and t-butyl hydroperoxide (BHP) (CAS no. 75-91-2). Additional mutagens can be used in continuous evolution procedures as provided herein, and the invention is not limited in this respect.
[0078] Ideally, a mutagen is used at a concentration or level of exposure that induces a desired mutation rate in a given host cell or viral vector population, but is not significantly toxic to the host cells used within the average time frame a host cell is exposed to the mutagen or the time a host cell is present in the host cell flow before being replaced by a fresh host cell.
[0079] The term "mutagenesis plasmid," as used herein, refers to a plasmid comprising a gene encoding a gene product that acts as a mutagen. In some embodiments, the gene encodes a DNA polymerase lacking a proofreading capability. In some embodiments, the gene is a gene involved in the bacterial SOS stress response, for example, a UmuC, UmuD', or RecA gene. In some embodiments, the gene is a GATC methylase gene, for example, a deoxyadenosine methylase (dam methylase) gene. In some embodiments, the gene is involved in binding of hemimethylated GATC sequences, for example a seqA gene. In some embodiments, the gene is involved with repression of mutagenic nucleobase export, for example emrR. In some embodiments, the gene is involved with inhibition of uracil DNA-glycosylase, for example a Uracil Glycosylase Inhibitor (ugi) gene. In some embodiments, the gene is involved with deamination of cytidine (e.g., a cytidine deaminase from Petromyzon marinus), for example, cytidine deaminase 1 (CDA1).
[0080] The term "host cell," as used herein, refers to a cell that can host a viral vector useful for a continuous evolution process as provided herein. A cell can host a viral vector if it supports expression of genes of the viral vector, replication of the viral genome, and/or the generation of viral particles. One criterion to determine whether a cell is a suitable host cell for a given viral vector is to determine whether the cell can support the viral life cycle of a wild-type viral genome that the viral vector is derived from. For example, if the viral vector is a modified M13 phage genome, as provided in some embodiments described herein, then a suitable host cell would be any cell that can support the wild-type M13 phage life cycle. Suitable host cells for viral vectors useful in continuous evolution processes are well known to those of skill in the art, and the invention is not limited in this respect.
[0081] In some embodiments, modified viral vectors are used in continuous evolution processes as provided herein. In some embodiments, such modified viral vectors lack a gene required for the generation of infectious viral particles. In some such embodiments, a suitable host cell is a cell comprising the gene required for the generation of infectious viral particles, for example, under the control of a constitutive or a conditional promoter (e.g., in the form of an accessory plasmid, as described herein). In some embodiments, the viral vector used lacks a plurality of viral genes. In some such embodiments, a suitable host cell is a cell that comprises a helper construct providing the viral genes required for the generation of viral particles. A cell is not required to actually support the life cycle of a viral vector used in the methods provided herein. For example, a cell comprising a gene required for the generation of infectious viral particles under the control of a conditional promoter may not support the life cycle of a viral vector that does not comprise a gene of interest able to activate the promoter, but it is still a suitable host cell for such a viral vector. In some embodiments, the viral vector is a phage, and the host cell is a bacterial cell. In some embodiments, the host cell is an E. coli cell. Suitable E. coli host strains will be apparent to those of skill in the art, and include, but are not limited to, New England Biolabs (NEB) Turbo, Top10F', DH12S, ER2738, ER2267, XL1-Blue MRF', and DH10B. These strain names are art recognized, and the genotype of these strains has been well characterized. It should be understood that the above strains are exemplary only and that the invention is not limited in this respect.
[0082] The term "fresh," as used herein interchangeably with the terms "non-infected" or "uninfected" in the context of host cells, refers to a host cell that has not been infected by a viral vector comprising a gene of interest as used in a continuous evolution process provided herein. A fresh host cell can, however, have been infected by a viral vector unrelated to the vector to be evolved or by a vector of the same or a similar type but not carrying the gene of interest. In some embodiments, the host cell is a prokaryotic cell, for example, a bacterial cell, such as an E. coli cell.
[0083] In some embodiments, the host cell is an E. coli cell. In some PACE embodiments, for example, in embodiments employing an M13 selection phage, the host cells are E. coli cells expressing the Fertility factor, also commonly referred to as the F factor, sex factor, or F-plasmid. The F-factor is a bacterial DNA sequence that allows a bacterium to produce a sex pilus necessary for conjugation and is essential for the infection of E. coli cells with certain phage, for example, with M13 phage. For example, in some embodiments, the host cells for M13-PACE are of the genotype F' proA+B+ Δ(lacIZY) zzf::Tn10(TetR)/ endA1 recA1 galE15 galK16 nupG rpsL ΔlacIZYA araD139 Δ(ara,leu)7697 mcrA Δ(mrr-hsdRMS-mcrBC) proBA::pir116 λ-. In some embodiments, the host cells for M13-PACE are of the genotype F' proA+B+ Δ(lacIZY) zzf::Tn10(TetR) lacIQ1 PN25-tetR luxCDE/endA1 recA1 galE15 galK16 nupG rpsL(StrR) ΔlacIZYA araD139 Δ(ara,leu)7697 mcrA Δ(mrr-hsdRMS-mcrBC) proBA::pir116 araE201 ΔrpoZ Δflu ΔcsgABCDEFG ΔpgaC λ-, for example S1030 cells as described in Carlson, J. C., et al. Negative selection and stringency modulation in phage-assisted continuous evolution. Nat. Chem. Biol. 10, 216-222 (2014). In some embodiments, the host cells for M13-PACE are of the genotype F' proA+B+ Δ(lacIZY) zzf::Tn10 lacIQ1 PN25-tetR luxCDE Ppsp(AR2) lacZ luxR Plux groESL / endA1 recA1 galE15 galK16 nupG rpsL ΔlacIZYA araD139 Δ(ara,leu)7697 mcrA Δ(mrr-hsdRMS-mcrBC) proBA::pir116 araE201 ΔrpoZ Δflu ΔcsgABCDEFG ΔpgaC λ-, for example S2060 cells as described in Hubbard, B. P. et al. Continuous directed evolution of DNA-binding proteins to improve TALEN specificity. Nature Methods 12, 939-942 (2015).
[0084] The term "subject," as used herein, refers to an individual organism, for example, an individual mammal. In some embodiments, the subject is a human. In some embodiments, the subject is a non-human mammal. In some embodiments, the subject is a non-human primate. In some embodiments, the subject is a rodent. In some embodiments, the subject is a sheep, a goat, a cattle, a cat, or a dog. In some embodiments, the subject is a vertebrate, an amphibian, a reptile, a fish, an insect, a fly, or a nematode. In some embodiments, the subject is a research animal. In some embodiments, the subject is genetically engineered, e.g., a genetically engineered non-human subject. The subject may be of either sex and at any stage of development. In some embodiments, the subject has a disease characterized by increased IL-23 expression. In some embodiments, the disease characterized by increased IL-23 activity is an inflammatory disease (e.g., plaque psoriasis, multiple sclerosis, inflammatory bowel disease, ulcerative colitis, Crohn's disease, rheumatoid arthritis, spondyloarthritis, systemic lupus erythematosus (SLE), etc.).
[0085] The term "cell," as used herein, refers to a cell derived from an individual organism, for example, from a mammal. A cell may be a prokaryotic cell or a eukaryotic cell. In some embodiments, the cell is a eukaryotic cell, for example, a human cell, a mouse cell, a pig cell, a hamster cell, a monkey cell, etc. In some embodiments, a cell is characterized by increased IL-23 expression, such as an immune cell. Non-limiting examples of cells characterized by increased IL-23 expression include macrophages, dendritic cells and activated phagocytic cells. In some embodiments, a cell is obtained from a subject having or suspected of having a disease characterized by increased IL-23 levels/expression, for example, inflammatory diseases, autoimmune diseases, etc.
[0086] The term "extracellular environment," as used herein, refers to the aqueous biological fluids and tissues forming the microenvironment surrounding a cell or cells. For example, in a subject, an extracellular environment may include blood, serum, cytokines, neurotransmitters, tissue, etc., surrounding a cell or group of cells. In another example, an extracellular environment is the cell culture growth media surrounding a cell or cells in an in vitro culture vessel, such as a cell culture plate or flask.
[0087] The term "increased expression," as used herein, refers to an increase in expression (e.g., elevated expression) of a particular molecule in one cell or subject relative to a normal cell or subject that is not characterized by increased expression of that molecule (e.g., a "normal" or "control" cell or subject). For example, a cell characterized by increased IL-23 expression expresses more IL-23 than a control cell expressing a normal (e.g., healthy) amount of IL-23. In another example, a cell characterized by increased IL-17 expression expresses more IL-17 than a control cell expressing a normal (e.g., healthy) amount of IL-17. Methods of determining relative expression levels of biomolecules (e.g., cytokines, proteins, nucleic acids, etc.) are known to the skilled artisan and include quantitative real-time PCR (q-RT-PCR), western blot, protein quantification assays (e.g., BCA assay), flow cytometry, etc.
DETAILED DESCRIPTION OF INVENTION
Introduction
[0088] Proteases are ubiquitous regulators of protein function in all domains of life and represent approximately one percent of known protein sequences. Substrate-specific proteases have proven useful as research tools and as therapeutics that supplement a natural protease deficiency to treat diseases, such as hemophilia, or that simply perform their native functions such as the case of botulinum toxin, which catalyzes the cleavage of SNARE proteins.
[0089] Researchers have engineered or evolved industrial proteases with enhanced thermostability and solvent tolerance. Similarly, a handful of therapeutic proteases have been engineered with improved kinetics and prolonged activity. The potential of proteases to serve as a broadly useful platform for degrading proteins implicated in disease, however, is greatly limited by the native substrate scope of known proteases. In contrast to the highly successful generation of therapeutic monoclonal antibodies with tailor-made binding specificities, the generation of proteases with novel protein cleavage specificities has proven to be a major challenge. For example, efforts to engineer trypsin variants that exhibit the substrate specificity of the closely related protease chymotrypsin were unsuccessful until researchers grafted the entire substrate-binding pocket, multiple surface loops, and additional residues from chymotrypsin. This approach of replacing protease residues with amino acids from related proteases to impart specificity features from the latter cannot provide proteases with specificities not already known among natural proteases, prompting researchers to instead turn to laboratory evolution to generate proteases with novel specificities. Despite several decades of effort, to the best of the inventors' knowledge, no evolved proteases have yet been reported with more than one position of changed substrate specificity.
[0090] The evolution of a protease that can degrade a target protein of interest often necessitates changing substrate sequence specificity at more than one position, and thus may require many generations of evolution. Continuous evolution strategies, which require little or no researcher intervention between generations, therefore may be well-suited to evolve proteases capable of cleaving a target protein that differs substantially in sequence from the preferred substrate of a wild-type protease. In phage-assisted continuous evolution (PACE), a population of evolving selection phage (SP) is continuously diluted in a fixed-volume vessel by an incoming culture of host cells, e.g., E. coli. The SP is a modified phage genome in which the evolving gene of interest has replaced gene III, a gene essential for phage infectivity. If the evolving gene of interest possesses the desired activity, it will trigger expression of gene III from an accessory plasmid (AP) in the host cell, thus producing infectious progeny encoding active variants of the evolving gene. The mutation rate of the SP is controlled using an inducible mutagenesis plasmid (MP) such as MP6, which upon induction increases the mutation rate of the SP by >300,000-fold. Because the rate of continuous dilution is slower than phage replication but faster than E. coli replication, mutations only accumulate in the SP.
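The dilution argument in the preceding paragraph can be illustrated with a minimal sketch. It assumes simple exponential replication and continuous dilution in a fixed-volume vessel; the function names and all rate values are hypothetical placeholders chosen only to satisfy the stated ordering (phage replication faster than dilution, dilution faster than host cell replication), and none of them is taken from this disclosure.

```python
import math


def fold_change(replication_rate, dilution_rate, hours):
    """Fold change of a population in a fixed-volume vessel, assuming
    exponential replication and continuous dilution (rates per hour)."""
    return math.exp((replication_rate - dilution_rate) * hours)


def persists(replication_rate, dilution_rate):
    """A population is retained only if it replicates faster than it is diluted."""
    return replication_rate > dilution_rate


# Hypothetical rates (per hour), for illustration only.
DILUTION = 2.0        # vessel volumes exchanged per hour
PHAGE_ACTIVE = 4.0    # selection phage carrying an active variant
PHAGE_INACTIVE = 0.0  # phage that cannot trigger gene III expression
HOST = 1.0            # E. coli host cells

for name, rate in [("active phage", PHAGE_ACTIVE),
                   ("inactive phage", PHAGE_INACTIVE),
                   ("host cells", HOST)]:
    print(name, persists(rate, DILUTION), fold_change(rate, DILUTION, 1.0))
```

Under these assumed rates, only the phage carrying an active variant is retained; host cells and non-infectious phage are washed out before they can accumulate mutations.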
[0091] Some aspects of this disclosure are based on the recognition that PACE can be employed for the directed evolution of proteases, in particular the evolution of proteases that cleave IL-23. Proteases may require many successive mutations to remodel complex networks of contacts with polypeptide substrates, and are thus not readily manipulated by conventional, iterative evolution methods. The ability of PACE to perform the equivalent of hundreds of rounds of iterative evolution within days enables complex protease evolution experiments that are impractical with conventional methods.
[0092] This disclosure provides data illustrating the feasibility of PACE-mediated evolution of the TEV protease to cleave IL-23. As described in the Examples, Tobacco Etch Virus (TEV) protease, which natively cleaves the consensus substrate sequence ENLYFQS (SEQ ID NO: 2), was evolved by PACE to cleave a target sequence, HPLVGHM (SEQ ID NO: 3), that differs at six of seven positions from the consensus substrate and is present in an exposed loop of the pro-inflammatory cytokine IL-23. It was observed that after constructing a pathway of evolutionary stepping-stones and performing ~2,500 generations of evolution using PACE, the resulting TEV variant proteases contain up to 20 amino acid substitutions relative to wild-type TEV protease (e.g., SEQ ID NO: 1), cleave human IL-23 at the intended target peptide bond, and block the ability of IL-23 to stimulate IL-17 production in a murine splenocyte assay. Together, the proof-of-concept findings described herein establish a platform for generating proteases (e.g., TEV protease variants) with substrate specificities changed at several positions and the ability to cleave proteins implicated in human disease.
[0093] PACE technology has been described previously, for example, in International PCT Application, PCT/US2009/056194, filed September 8, 2009, published as WO 2010/028347 on March 11, 2010; International PCT Application, PCT/US2011/066747, filed December 22, 2011, published as WO 2012/088381 on June 28, 2012; and U.S. Application, U.S.S.N. 13/922,812, filed June 20, 2013; International PCT Application, PCT/US2015/057012, filed on October 22, 2015, published as WO2016/077052; and, PCT/US2016/027795, filed on April 15, 2016, published as WO2016/168631, each of which is incorporated herein by reference. Those of skill in the art will understand that the PACE technology, strategies, methods, and reagents provided herein can be used in combination with many aspects of the PACE technology described in those applications, for example, with the apparatuses, lagoons, host cell types, cell flow parameters, negative selection strategies, etc., disclosed in those applications.
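The substrate comparison stated in paragraph [0092] can be checked with a short sketch. The two heptapeptides are taken from the text (SEQ ID NO: 2 and SEQ ID NO: 3); the function name and the 1-based, left-to-right position numbering are illustrative assumptions and do not correspond to protease subsite nomenclature.

```python
def differing_positions(a, b):
    """Return the 1-based positions at which two equal-length peptides differ."""
    if len(a) != len(b):
        raise ValueError("peptides must have the same length")
    return [i for i, (x, y) in enumerate(zip(a, b), start=1) if x != y]


NATIVE_SUBSTRATE = "ENLYFQS"  # consensus TEV substrate, SEQ ID NO: 2
IL23_TARGET = "HPLVGHM"       # IL-23 target sequence, SEQ ID NO: 3

diffs = differing_positions(NATIVE_SUBSTRATE, IL23_TARGET)
print(diffs)       # [1, 2, 4, 5, 6, 7]
print(len(diffs))  # 6
```

Only the leucine at the third position is shared, which agrees with the statement that the target differs from the consensus substrate at six of seven positions.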
Variant TEV Proteases and Uses Thereof
[0094] Some aspects of this disclosure provide variant TEV proteases that are derived from a wild-type TEV protease (e.g., SEQ ID NO: 1) and have at least 14 variations in the amino acid sequence of the protein as compared to the amino acid sequence present within a cognate wild-type TEV protease. The variation in amino acid sequence generally results from a mutation, insertion, or deletion in a DNA coding sequence. Mutation of a DNA sequence can result in a nonsense mutation (e.g., a translation termination codon (TAA, TAG, or TGA) that produces a truncated protein), a frameshift mutation (e.g., an insertion or deletion mutation that shifts the reading frame of the coding sequence), or a silent mutation (e.g., a change in the coding sequence that results in a codon that codes for the same amino acid normally present in the cognate protein, also referred to sometimes as a synonymous mutation). In some embodiments, mutation of a DNA sequence results in a non-synonymous (i.e., conservative, semi-conservative, or radical) amino acid substitution.
[0095] Generally, wild-type TEV protease is encoded by a gene of the Tobacco Etch Virus. The amount or level of variation between a wild-type TEV protease and a variant TEV protease provided herein can be expressed as the percent identity of the nucleic acid sequences or amino acid sequences between the two genes or proteins. In some embodiments, the amount of variation is expressed as the percent identity at the amino acid sequence level. In some embodiments, a variant TEV protease and a wild-type TEV protease are from about 50% to about 99.9% identical, about 55% to about 95% identical, about 60% to about 90% identical, about 65% to about 85% identical, or about 70% to about 80% identical at the amino acid sequence level. In some embodiments, a variant TEV protease comprises an amino acid sequence that is at least 50%, at least 60%, at least 70%, at least 80%, at least 90%, at least 95%, at least 99%, or at least 99.9% identical to the amino acid sequence of a wild-type TEV protease.
[0096] In some embodiments, a variant TEV protease is about 70%, about 71%, about 72%, about 73%, about 74%, about 75%, about 76%, about 77%, about 78%, about 79%, about 80%, about 81%, about 82%, about 83%, about 84%, about 85%, about 86%, about 87%, about 88%, about 89%, about 90%, about 91%, about 92%, about 93%, about 94%, about 95%, about 96%, about 97%, about 98%, about 99%, or about 99.9% identical to a wild-type TEV protease.
[0097] Some aspects of the disclosure provide variant TEV proteases having between about 90% and about 94% (e.g., about 90%, about 90.5%, about 91%, about 91.5%, about 92%, about 92.5%, about 93%, about 93.5%, or about 94%) identity to a wild-type TEV protease as set forth in SEQ ID NO: 1. In some embodiments, the variant TEV protease is no more than 94% identical to a wild-type TEV protease. In some embodiments, the variant TEV protease comprises at least 14 amino acid variations selected from the variations (e.g., amino acid substitutions) provided in Table 1.
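The relationship between the number of substitutions and percent identity can be sketched as simple arithmetic. This example assumes substitution-only variants (no insertions or deletions) and a protein length of 236 residues; that length is an assumption obtained by counting the residues of the variant sequences listed in Table 12, not a value stated for SEQ ID NO: 1 in this section. The function name is illustrative.

```python
# Assumption: length counted from the Table 12 variant sequences (236 residues).
ASSUMED_LENGTH = 236


def percent_identity(n_substitutions, length=ASSUMED_LENGTH):
    """Percent identity of a substitution-only variant to its parent sequence."""
    if not 0 <= n_substitutions <= length:
        raise ValueError("substitution count must be between 0 and the length")
    return 100.0 * (length - n_substitutions) / length


for n in (14, 17, 20):
    print(n, round(percent_identity(n), 2))
```

Under these assumptions, 14 to 20 substitutions correspond to roughly 94% down to roughly 91.5% identity, which falls within the "about 90% to about 94%" range recited above.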
[0098] The amount or level of variation between a wild-type TEV protease and a variant TEV protease can also be expressed as the number of mutations present in the amino acid sequence encoding the variant TEV protease relative to the amino acid sequence encoding the wild-type TEV protease. In some embodiments, an amino acid sequence encoding a variant TEV protease comprises between about 1 mutation and about 100 mutations, about 10 mutations and about 90 mutations, about 20 mutations and about 80 mutations, about 30 mutations and about 70 mutations, or about 40 and about 60 mutations relative to an amino acid sequence encoding a wild-type TEV protease. In some embodiments, an amino acid sequence encoding a variant TEV protease comprises more than 100 mutations relative to an amino acid sequence encoding a wild-type TEV protease. In some embodiments, an amino acid sequence encoding a variant TEV protease comprises 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, or 20 mutations relative to an amino acid sequence encoding a wild-type TEV protease. Examples of mutations that occur in an amino acid sequence encoding a variant TEV protease are included in Table 1.
Table 1: Amino acid mutations in variant TEV proteases relative to SEQ ID NO: 1
E2G, E2K, S3R, F5L, K6E, P8L, R9C, D10G, D10N, Y11C, N12T, N12H, P13S, T17A, T17S, I18V, E24K, E24V, D26Y, H28L, H28Y, T30A, G32R, V36I, P39S, F40L, R50G, R50K, H61Q, H61R, V63I, K65R, V66G, K67N, K67E, N68D, T69P, T70P, T71P, Q73H, H75L, R80G, K89E, K97N, K99Q, Q104R, E106G, E106D, E107D, I109V, L111F, T114A, T118A, T118S, S120N, M121T, M124I, M124T, V125G, D127P, D127V, D127A, T128P, T128A, C130R, F132I, F132L, F132S, F132V, S135F, I138T, Q145P, T146S, T146R, T146C, T146P, T146A, D148C, D148P, D148A, Q150P, S152N, S153N, R159I, R159K, F162A, F162S, H167P, S170A, N171K, N171D, N176I, N176T, N177W, N177S, N177R, N177G, N177F, N177M, V182G, K184T, K184E, N185D, Q193P, S200G, R203Q, V209M, V209F, V209E, V209S, W211C, W211V, W211L, W211I, K215E, V216I, V216V, M218W, M218L, M218F, K220Q, P221T, E222., E223G, E223., F225L, F225S, Q226P, Q226., Q226S, P227S, P227A, P227., V228S, K229E, K229., E230A, A231E, T232A, Q233., L234R
[0099] Particular combinations of mutations present in an amino acid sequence encoding a variant TEV protease can be referred to as the "genotype" of the variant TEV protease. For example, a variant TEV protease genotype may comprise the mutations T17S, H28L, T30A, N68D, E107D, F132L, S153N, and S170A, relative to a wild-type TEV protease (e.g., SEQ ID NO: 1). Further examples of variant TEV protease genotypes are shown in Tables 2-11.
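The mutation notation used for genotypes (wild-type residue, position, new residue) can be illustrated with a minimal sketch that applies part of the example genotype above to a sequence. The 30-residue fragment is an assumption: it is the N-terminal stretch shared by the Table 12 sequences where no substitution is marked, taken here to match residues 1-30 of SEQ ID NO: 1, which is not reproduced in this section. The function and variable names are illustrative.

```python
import re

# Assumption: residues 1-30 as they appear unmarked in the Table 12 sequences.
WT_FRAGMENT = "GESLFKGPRDYNPISSTICHLTNESDGHTT"

# Substitution notation such as "T17S": wild-type residue, position, new residue.
MUTATION = re.compile(r"^([A-Z])(\d+)([A-Z])$")


def apply_genotype(wild_type, mutations):
    """Apply substitutions to a sequence, checking each wild-type residue."""
    residues = list(wild_type)
    for name in mutations:
        match = MUTATION.match(name)
        if match is None:
            raise ValueError(f"unrecognized mutation: {name}")
        old, pos, new = match.group(1), int(match.group(2)), match.group(3)
        if not 1 <= pos <= len(residues):
            raise ValueError(f"position out of range: {name}")
        if residues[pos - 1] != old:
            raise ValueError(f"wild-type residue mismatch: {name}")
        residues[pos - 1] = new
    return "".join(residues)


# Only the mutations of the example genotype that fall within the fragment.
variant = apply_genotype(WT_FRAGMENT, ["T17S", "H28L", "T30A"])
print(variant)  # GESLFKGPRDYNPISSSICHLTNESDGLTA
```

Checking the stated wild-type residue before substituting catches numbering mismatches, for example when a genotype is applied to a sequence with a different start position.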
Table 2: Non-limiting Examples of variant TEV protease genotypes relative to SEQ ID NO: 1.
Figure imgf000032_0001
[00100] For each trajectory, a representative clone was selected at the end of each of the eight successive PACE experiments.
Table 3: Non-limiting examples of variant TEV protease genotypes relative to SEQ ID NO: 1 from 84 h PACE 1.
Figure imgf000033_0001
[00101] Each row corresponds to a single clone of evolved TEV protease from the selection plasmid (SP). These genotypes are from the end of PACE 1 in Figure 2.
Table 4: Non-limiting examples of variant TEV protease genotypes relative to SEQ ID NO: 1 from 168h of PACE 2.
Figure imgf000034_0001
[00102] Genotypes after 168 cumulative hours of PACE are from the end of PACE 2 of trajectories 1 and 2 in Figure 2.
Table 5: Non-limiting examples of variant TEV protease genotypes relative to SEQ ID NO: 1 after 84h of PACE 3.
Figure imgf000035_0001
[00103] Genotypes after 168 cumulative hours of PACE are from the end of PACE 3 for trajectory 3 in Figure 2.
Table 6: Non-limiting examples of variant TEV protease genotypes relative to SEQ ID NO: 1 after 96h of PACE 4.
Figure imgf000036_0001
[00104] Genotypes after 264 cumulative hours of PACE are from the end of PACE 4 in Figure 2.
Table 7: Non-limiting examples of variant TEV protease genotypes relative to SEQ ID NO: 1 after 72h of PACE 5.
Figure imgf000037_0001
Figure imgf000038_0001
[00105] Genotypes after 336 cumulative hours of PACE are from the end of PACE 5 in Figure 2.
Table 8: Non-limiting Examples of variant TEV protease genotypes relative to SEQ ID NO: 1 after 120h of PACE 6.
Figure imgf000039_0001
[00106] Genotypes after 456 cumulative hours of PACE are from the end of PACE 6 in Figure 2.
Figure imgf000040_0001
Figure imgf000041_0001
[00107] Genotypes after 528 cumulative hours of PACE are from the end of PACE 7 in Figure 2.
Figure imgf000042_0001
Figure imgf000043_0001
[00110] Accordingly, in some embodiments, a variant TEV protease comprises at least 14, at least 15, at least 16, at least 17, at least 18, at least 19, or at least 20 mutations provided in Table 1. In some embodiments, the at least one mutation is selected from the group consisting of: D127A, S135F, T146S, D148P, F162S, N171D, N176T, N177M, V209M, W211I, M218F, and K229E. In some embodiments, a variant TEV protease as described herein comprises or consists of a sequence selected from SEQ ID NOs: 11-153 given in Table 12. The lowercase amino acid residues indicate the amino acid substitutions.
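The lowercase convention described above can be read back into mutation names with a short sketch. The variant fragment is the first 30 residues of SEQ ID NO: 16 as listed in Table 12; the wild-type fragment is an assumption (the same stretch as it appears with no marked substitution in other Table 12 sequences, taken to match SEQ ID NO: 1). The function name is illustrative.

```python
def substitutions_from_marked(wild_type, marked):
    """List substitutions (e.g. 'N12T') from a sequence in which substituted
    residues are written in lowercase, using 1-based positions."""
    if len(wild_type) != len(marked):
        raise ValueError("sequences must have the same length")
    found = []
    for pos, (wt, res) in enumerate(zip(wild_type, marked), start=1):
        if res.islower():
            found.append(f"{wt}{pos}{res.upper()}")
    return found


# Assumption: wild-type residues 1-30, inferred from unmarked Table 12 sequences.
WT_FRAGMENT = "GESLFKGPRDYNPISSTICHLTNESDGHTT"
# First 30 residues of SEQ ID NO: 16 as listed in Table 12.
SEQ16_FRAGMENT = "GESLFKGPRDYtPISSTICHLTNESDGHTT"

print(substitutions_from_marked(WT_FRAGMENT, SEQ16_FRAGMENT))  # ['N12T']
```

The recovered name, N12T, is one of the mutations listed in Table 1.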
Table 12: Variant Sequences
PACE Sequence SEQ ID
NO:
1 GESLFKGPRDYNPISSTICHLTNESDGHTTSLYGIGFGPFIITNKHLFRRNNGTLLVQ 11
SLHGVFKVKNTTTLQQHLIDGRDMIIIRMPKDFPPFPQKLKFREPQREERICLVTTN
FQTKSMSSMVSDTSCTFPSSDGIFWKHWIQTKDGQCGSPLVSTiDGFIVGIHSASdF
TNTtNYFTSVPKNFMELLTNQEAQQWVSGWRLNADSVLWGGHKVFMVKPEEPF
QPVKEATQLMN
1 GESLFKGPRDYNPISSTICHLTNESDGHTTSLYGIGFGPFIITNKHLFRRNNGTLLVQ 12
SLHGVFKVKNTTTLQQHLIDGRDMIIIRMPKDFPPFPQKLKFREPQREERICLVTTN
FQTKnMSSMVSDTSCTFPSSDGIFWKHWIQTKDGQCGSPLVSTRDGFIVGIHSASN
FTNTiNYFTSVPKNFMELLTNQEAQQWVSGWRLNADSVLWGGHKVFMVKPEEP
FQPVKaATQLMN
1 GESLFKGPRDYNPISSTICHLTNESDGHTTSLYGIGFGPFIITNKHLFRRNNGTLLVQ 13
SLHGVFKVKNTTTLQQHLIDGRDMIIIRMPKDFPPFPQKLKFREPQREERICLVTTN
FQTKSMSSMVSaTSCTFPSfDGIFWKHWIQTKDGQCGSPLVSTRDGFIVGIHSASNF
TNTiNYFTSVPKNFMELLTNQEAQQWVSGWRLNADSVLWGGHKVFMVKPEEPF
QPVKEATQLMN
1 GESLFKGPRDYNPISSTICHLTNESDGHTTSLYGIGFGPFIITNKHLFRRNNGTLLVQ 14
SLHGVFKVKNTTTLQQHLIDGRDMIIIRMPKDFPPFPQKLKFREPQREERICLVTTN
FQTKSMSSMVSaTSCTiPSfDGIFWKHWIQTKDGQCGSPLVSTRDGFIVGIHSASNF
TNTiNYFTSVPKNFMELLTNQEAQQWVSGWRLNADSVLWGGHKVFMVKPEEPF
QPVKEATQLMN
1 GErLFKGPRDYNPISSTICHLTNESDGHTTSLYGIGFGPFIITNKHLFRRNNGTLLVQ 15
SLHGVFKVKNTTTLQQHLIDGRDMIIIRMPKDFPPFPQKLKFREPQREERICLVTTN
FQTKSMSSMVSDTSCTFPSSDGtFWKHWIQTKDGQCGSPLVSTRDGFIVGIHSASd
FTNTtNYFTSVPKNFMELLTNQEAQQWVSGWRLNADSVLWGGHKVFMVKPEEP
FQPVKEATQLMN
1 GESLFKGPRDYtPISSTICHLTNESDGHTTSLYGIGFGPFIITNKHLFRRNNGTLLVQ 16
SLHGVFKVKNTTTLQQHLIDGRDMIIIRMPKDFPPFPQKLKFREPQREERICLVTTN
FQTKSMSSMVSDTSCTFPSSDGtFWKHWIQTKDGQCGSPLVSTRDGFIVGIHSASd
FTNTtNYFTSVPKNFMELLTNQEAQQWVSGWRLNADSVLWGGHKVFMVKPEEP
FQPVKEATQLMN
1 GESLFKGPRDYNPISSTICHLTNESDGHTTSLYGIGFGPFIITNKHLFRRNNGTLLVQ 17
SLHGVFKVKNTTTLQQHLIDGRDMIIIRMPKDFPPFPQKLKFREPQREERICLVTTN
FQTKSMSSMVSDTSCTFPSSDGtFWKHWIQTKDGQCGSPLVSTRDGFIVGIHSASd
FTNTtNYFTSVPKNFMELLTNQEAQQWVSGWRLNADSVLWGGHKVFMVKPEEP
FQPVKEATQLMN
1 GESLFKGPRDYNPISSTICHLTNESDGHTTSLYGIiFGPFIITNKHLFRRNNGTLLVQ 18
SLHGVFKVKNTTTLQQHLIDGRDMIIIRMPKDFPPFPQKLKFREPQREERICLVTTN
FQTKSMSSMVSDTSCTFPSSDGtFWKHWIQTKDGQCGSPLVSTRDGFIVGIHSASd
FTNTtNYFTSVPKNFMELLTNQEAQQWVSGWRLNADSVLWGGHKVFMVKPEEP
FQPVKEATQLMN
GESLFKGPRDYNPISSTICHLTNESDGHTTSLYGIGFGPFIITNKHLFRRNNGTLLVQ 19
SLHGVFKVKNTTTLQQHLIDGRDMIIIRMPKDFPPFPQKLKFREPQREERICLVTTN
FQTKSMSSMVSDTSCTFPSSDGIFWKHWIQTKDGQCGSPLVSTRDGFIVGIHSASN
FTNTiNYFTSVPKNFMELLTNQEAQQWVSGWRLNADSmLvGGHKvFwVKPEEPF
QPVKEATQLMN
GESLFKGPRDYNPISSTICHLTNESDGHTTSLYGIGFGPFIITNKHLFRRNNGTLLVQ 20
SLHGVFKVKNTTTLQQHLIDGRDMIIIRMPKDFPPFPQKLKFREPQREERICLVTTN
FQTKSMSSMVSaTSCTFPSfDGIFWKHWIQTKDGQCGSPLVSTRDGFIVGIHSASNF
TNTiNYFTSVPKNFMELLTNQEAQQWVSGWRLNADSmLiGGHKvFfVKPEEPFQP
VKEATQLMN
GESLFKGPRDYNPISSTICHLTNESDGHTTSLYGIGFGPFIITNKHLFRRNNGTLLVQ 21
SLHGVFrVKNTTTLQQHLIDGRDMIIIRMPKDFPPFPQKLKFREPQREERICLVTTN
FQTKSMSSMVSaTSCTFPSfDGIFWKHWIQTKDGQCGSPLVSTRDGFIVGIHSASNF
TNTiNYFTSVPKNFMELLTNQEAQQWVSGWRLNADSeLlGGHKiFwVKPEEPFQPV
KEATQLMN
GESLFKGPRDcNPISSTICHLTNESDGHTTSLYGIGFGPFIITNKHLFRRNNGTLLVQ 22 SLHGVFKVKNTTTLQQHLIDGRDMIIIRMPKDFPPFPQKLKFREPQREERICLVTTN FQTKSMSSMVSaTSCTFPSfDGIFWKHWIQTKDGQCGSPLVSTRDGFIVGIHSASNF
TNTiNYFTSVPKNFMELLTNQEAQQWVSGWRLNADSmLiGGHKvFwVKPEEPFQP
VKEATQLMN
GESLFKGPRDYNPISSTICHLTNESDGHTTSLYGIGFGPFIITNKHLFRRNNGTLLVQ 23 SLHGVFKVKNTTTLQQHLIDGRDMIIIRMPKDFPPFPQKLKFREPQREERICLVTTN FQTKSMSSMVSvTSCTFPSfDGIFWKHWIQTKDGQCGSPLVSTRDGFIVGIHSASNF
TNTiNYFTSVPKNFMELLTNQEAQQWVSGWRLNADSeLlGGHKiFwVKPEEPFQPV
KEATQLMN
GESLFKGPRDYNPISSTICHLTNESDGHTTSLYGIGFGPFIITNKHLFRRNNGTLLVQ 24 SLHGVFKVKNTTTLQQHLIDGRDMIIIRMPKDFPPFPQKLKFREPQREERICLVTTN FQTKSMSSMVSaTSCTFPSfDGIFWKHWIQTKDGQCGSPLVSTRDGFIVGIHSASNF
TNTiNYFTSVPKNFMELLTNQEAQQWVSGWRLNADSmLiGGHKvFwVKPEEPFQP
VKEATQLMN
GESLFKGPRDYNPISSTICHLTNESDGHTTSLYGIGFGPFIITNKHLFRRNNGTLLVQ 1 SLHGVFKVKNTTTLQQHLIDGRDMIIIRMPKDFPPFPQKLKFREPQREERICLVTTN FQTKSMSSMVSDTSCTFPSSDGIFWKHWIQTKDGQCGSPLVSTRDGFIVGIHSASN FTNTNNYFTSVPKNFMELLTNQEAQQWVSGWRLNADSVLWGGHKVFMVKPEEP FQPVKEATQLMN
GESLFKGPRDYNPISSTICHLTNESDGHTTSLYGIGFGPFIITNKHLFRRNNGTLLVQ 13
SLHGVFKVKNTTTLQQHLIDGRDMIIIRMPKDFPPFPQKLKFREPQREERICLVTTN
FQTKSMSSMVSaTSCTFPSfDGIFWKHWIQTKDGQCGSPLVSTRDGFIVGIHSASNF
TNTiNYFTSVPKNFMELLTNQEAQQWVSGWRLNADSVLWGGHKVFMVKPEEPF
QPVKEATQLMN
GESLFKGPRDYNPISSTICHLTNESDGHTTSLYGIGFGPFIITNKHLFRRNNGTLLVQ 25
SLHGVFKVKNTTTLQQHLIDGRDMIIIRMPKDFPPFPQKLKFREPQREERICLVTTN
FQTKSMSSMVSDTSCTFPSfDGIFWKHWIQTKDGQCGSPLVSTRDGFIVGIHSASN
FTNTiNYFTSVPKNFMELLTNQEAQQWVSGWRLNADSVLWGGHKVFMVKPEEP
FQPVKEATQLMN
GESLFKGPRDYNPISSTICHLTNESDGHTTSLYGIGFGPFIITNKHLFRRNNGTLLVQ 26
SLHGVFKVKNTTTLQQHLIDGRDMIIIRMPKDFPPFPQKLKFREPQREERICLVTTN
FQTKnMSSMVSDTSCTFPSfDGIFWKHWIQTKaGQCGSPLVSTRDGFIVGIHSASNF
TNTiNYFTSVPKNFMELLTNQEAQQWVSGWRLNADSVLWGGHKVFMVKPEEPF
QPVKEATQLMN
GESLFKGPRDYNPISSTICHLTNESDGHTTSLYGIGFGPFIITNKHLFRRNNGTLLVQ 27
SLHGVFKVKNTTTLQQHLIDGRDMIIIRMPKDFPPFPQKLKFREPQREERICLVTTN
FQTKSMSSMVSaTSCTFPSfDGIFWKHWIQTKaGQCGSPLVSTRDGFIVGIHSASNF
TNTiNYFTSVPKNFMELLTNQEAQQWVSGWqLNADSVLWGGHKVFMVKPEEPF
QPVKEATQLMN
GESLFKGPRDYNPISSTICHLTNESDGHTTSLYGIGFGPFIITNKHLFRRNNGTLLVQ 28
SLHGVFKVKNTTTLQQHLIDGRDMIIIRMPKDFPPFPQKLKFREPQREERICLVTTN
FQTKSMSSMVSDTSCTFPSSDGtFWKHWIQTKDGQCGSPLVSTRDGFIVGIHSASN
FTNTtNYFTSVPKNFMELLTNQEAQQWVSGWRLNADSVLWGGHKVFMVKPEEP
FQPVKEATQLMN
GESLFKGPRDYNPISSTICHLTNESDGHTTSLYGIGFGPFIITNKHLFRRNNGTLLVQ 29
SLHGVFKVKNTTTLQQHLIDGRDMIIIRMPKDFPPFPQKLKFREPQREERICLVTTN
FQTKSMSSMVSDTSCTFPSSDGtFWKHWIQTKaGQCGSPLVSTRDGFIVGIHSASNF
TNTtNYFTSVPKNFMELLTNQEAQQWVSGWRLNADSVLWGGHKVFMVKPEEPF
QPVKEATQLMN
GESLFKGPRDYNPISSTICHLTNESDGHTTSLYGIGFGPFIITNKHLFRRNNGTLLVQ 30
SLHGVFKVKNTTTLQQHLIDGRDMIIIRMPKDFPPFPQKLKFREPQRgERICLVTTN
FQTKSMSSMVSaTSCTFPSfDGIFWKHWIQaKpGQCGSPLVSTRDGFIVGIHSASNF
TNTirYFTSVPKNFMELLTNQEAQQWVSGWRLNADSmLiGGHKVFfVKPEEPFQPV
KEATQLMN
GESLFKGPRDYNPISSTICHLTNESDGHTTSLYGIGFGPFIITNKHLFRRNNGTLLVQ 31
SLHGVFKVKNTTTLQQHLIDGRDMIIIRMPKDFPPFPQKLKFREPQREERICLVTTN
FQTKSMSSMVSaTSCTFPSfDGIFWKHWIQaKpGQCGSPLVSTRDGFIVGIHSASNF
TNTirYFTSVPKNFMELLTNQEAQQWVSGWRLNADSmLiGGHKVFfVKPEEPFQPV
KEATQLMN
GESLFKGPRDYNPISSTICHLTNESDGHTTSLYGIGFGPFIITNKHLFRRNNGTLLVQ 32
SLHGVFKVKNTTTLQQHLIDGRDMIIIRMPKDFPPFPQKLKFREPQREERICLVTTN
FQTKSMSSMVSaTSCTFPSfDGIFWKHWIQrKcGQCGSPLVSTRDGFIVGIpSASNFT
NTigYFTSVPKNFMELLTNQEAQQWVSGWRLNADSmLiGGHKVFfVKPEEPFQPV
KEATQLMN
GESLFKGPRDYNPISSTICHLTNESDGHTTSLYGIGFGPFIITNKHLFRRNNGTLLVQ 33
SLHGVFKVKNTTTLQQHLIDGRDMIIIRMPKDFPPFPQKLKFREPQREERICLVTTN
FQTKSMSSMVSaTSCTFPSfDGIFWKHWIQaKpGQCGSPLVSTRDGFIVGIHSASNF
TNTiwYFTSVPKNFMELLTNQEAQQWVSGWRLNADSmLiGGHKVFfVKPEEPFQP
VKEATQLMN
GESLFKGPRDYNPISSTICHLTNESDGHTTSLYGIGFGPFIITNKHLFRRNNGTLLVQ 34
SLHGVFKVKNTTTLQQHLIDGRDMIIIRMPKDFPPFPQKLKFREPQREERICLVTTN
FQTKSMSSMVSaTSCTFPSfDGIFWKHWIQcKpGQCGSPLVSTRDGFIVGIHSASNF
TNTimYFTSVPKNFMELLTNQEAQQWVSGWRLNADSmLiGGHKVFfVKPEEPFQP
VKEATQLMN
GESLFKGPRDYNPISSTICHLTNESDGHTTSLYGIGFGPFIITNKHLFRRNNGTLLVQ 35
SLHGVFKVKNTTTLQQHLIDGRDMIIIRMPKDFPPFPQKLKFREPQREERICLVTTN
FQTKSMSSMVSaTSCTFPSfDGIFWKHWIQcKpGQCGSPLVSTRDGFIVGIHSASNF
TNTNmYFTSVPKNFMELLTNQEAQQWVgGWRLNADSmLiGGHKVFfVKPEEPFQ
PVKEATQLMN
GESLFKGPRDYNPISSTICHLTNESDGHTTSLYGIGFGPFIITNKHLFRRNNGTLLVQ 36 SLHGVFKVKNTTTLQQHLIDGRDMIIIRMPKDFPPFPQKLKFREPQREERICLVTTN FQTKSMSSMVSaTSCTFPSfDGIFWKHWIQsKpGQCGSPLVSTRDGFIVGIHSASdFT
NTtmYFTSVPKNFMELLTNQEAQQWVSGWRLNADSmLiGGHKVFfVKPEEPFQPV eEATQLMN
GESLFKGPRDYNPISSTICHLTNESDGHTTSLYGIGFGPFIITNKHLFRRNNGTLLVQ 37 SLHGVFKVKNTTTLQQHLIDGRDMIIIRMPKDFPPFPQKLKFREPQREERICLVTTN FQTKSMSSMVSaTSCTFPSfDGIFWKHWIQcKpGQCGSPLVSTRDGFIVGIHSASdFT
NTtmYFTSVPKNFMELLTNQEAQQWVSGWRLNADSeLlGGHKiFwVKPE
GESLFKGPRDYNPISSTICHLTNESDGHTTSLYGIGFGPFIITNKHLFRRNNGTLLVQ 38
SLHGVFKVKNTTTLQQHLIDGRDMIIIRMPKDFPPFPQKLKFREPQREERICLVTTN
FQTKSMSSMVSaTSCTFPSfDGIFWKHWIQcKpGQCGSPLVSTRDGFIVGIHSASdFT
NTtmYFTSVPKNFMELLTNQEAQQWVSGWRLNADSeLlGGHKiFwVKPEEPFQPV
KEATQLMN
GESLFKGPRDYNPISSTICHLTNESDGHTTSLYGIGFGPFIITNKHLFRRNNGTLLVQ 39
SLHGiFKVKNTTTLQQHLIDGRDMIIIRMPKDFPPFPQKLKFREPQREERICLVTTNF
QTKSMSSMVSaTSCTsPSfDGIFWKHWIQsKpGQCGSPLVSTRDGFIVGIHSASdFTN
TtmYFTSVPKNFMELLTNQEAQQWVSGWRLNADSmLiGGHKVFfVKPEEPFQPVe
EATQLMN
GESLFKGPRDYNPISSTICHLTNESDGHTTSLYGIGFGPFIITNKHLFRRNNGTLLVQ 40 SLHGVFKVKNTTTLQQHLIDGRDMIIIRMPKDFPPFPQKLKFREPQREERICLVTTN FQTKSMSSMVSaTSCTFPSfDGIFWKHWIQcKpGQCGSPLVSTRDGFIVGIHSASdFT
NTtmYFTSVPKNFMELLTNQEAQQWVSGWRLNADSmLiGGHKVFfVKPEEPFQPV eEATQLMN
GESLFKGPRDYNPISSTICHLTNESDGHTTSLYGIGFGPFIITNKHLFRRNNGTLLVQ 41 SLHGVFKVKNTTTLQQHLIDGRDMIIIRMPKDFPPFPQKLKFREPQREdRICLVTTN
FQTKSMSSMVSaTSCTFPSfDGIFWKHWIQaKaGQCGSPLVSTRDGFIVGIHSASNFT
NTiNYFTSVPKNFMELLTNQEAQQWVSGWRLNADSsLiGGHKVFwVKPEEPFQPV
KEATQLMN
GESLFKGPRDYNPISSTICHLTNESDGHTTSLYGIGFGPFIITNKHLFRRNNGTLLVQ 42 SLHGVFKVKNTTTLQQHLIDGRDMIIIRMPKDFPPFPQKLKFREPQREdRICLVTTN
FQTKSMSSMVSaTSCTFPSfDGIFWKHWIQTKaGQCGSPLVSTRDGFIVGIHSASNF
TNTiNYFTSVPKNFMELLTNQEAQQWVSGWRLNADSeLlGGHKiFwVKPEEPF
GESLFKGPRDYNPISSTICHLTNESDGHTTSLYGIGFGPFIITNKHLFRRNNGTLLVQ 43
SLHGVFKVKNTTTLQQHLIDGRDMIIIRMPKDFPPFPQKLKFREPQREdRICLVTTN
FQTKSMSSMVSaTSCTFPSfDGIFWKHWIQTKaGQCGSPLVSTRDGFIVGIHSASNF
TNTiNYFTSVPKNFMELLTNQEAQQWVSGWRLNADSsLiGGHKVFwVKPEEPFQP
VKEATQLMN
GESLFKGPRDYNPISSTICHLTNESDGHTTSLYGIGFGPFIITNKHLFRRNNGTLLVQ 44
SLHGVFKVKNTTTLQQHLIDGRDMIIIRMPKDFPPFPQKLKFREPQREdRICLVTTN
FQTKSMSSMVSaTSCTFPSfDGIFWKHWIQaKaGQCGSPLVSTRDGFIVGIHSASNFT
NTiNYFTSVPKNFMELLTNQEAQQWVSGWRLNADSeLlGGHKiFwVKPEEPFQPV
KEATQLMN
GESLFKGPRDYNPISSTICHLTNESDGHTTSrYGIGFGPFIITNKHLFRRNNGTLLVQ 45 SLHGVFKVKNTTTLQQHLIDGRDMIIIRMPKDFPPFPQKLKFREPQREdRICLVTTN
FQTKSMSSMVSaTSCTFPSfDGIFWKHWIQaKaGQCGSPLVSTRDGFIVGIHSASNFT
NTiNYFTSVPKNFMELLTNQEAQQWVSGWRLNADSeLlGGHKiFwVKPEEPF
GESLFKGPRDYNPISSTICHLTNESDGHTTSLYGIGFGPFIITNKHLFRRNNGTLLVQ 46
SLHGVFKVKNTTTLQQHLIDGRDMIIIRMPKDFPPFPQKLKFREPQREdRICLVTTN
FQTKSMSSMVSaTSCTFPSfDGIFWKHWIQTKaGQCGSPLVSTRDGFIVGIHSASNF
TNTiNYFTSVPKNFMELLTNQEAQQWVSGWRLNADSfLcGGHKVFlVKPEEPFQP
VKEATQLMN
GESLFKGPRDYNPISSTICHLTNESDGHTTSLYGIGFGPFIITNKHLFRRNNGTLLVQ 47
SLHGVFKVKNTTTLQQHLIDGRDMIIIRMPKDFPPFPQKLKFREPQREERICLVTTN
FQTKSMSSMVSaTSCTFPSfDGIFWKHWIQaKpGQCGSPLVSTRDGFIVGIHSASNF
TNTiwYFTSVPKNFMELLTNQEAQQWVSGWRLNADSmLiGGHKVFfVKPEEPFQP
VeEATQLMN
GESLFKGlRDYNPISSTICHLTNESDGHTTSLYGIGFGPFIITNKHLFRRNNGTLLVQ 48
SLHGVFKVKNTTTLQQHLIDGRDMIIIRMPKDFPPFPQKLKFREPQREERICLVTTN
FQTKSMSSMVSaTSCTFPSfDGIFWKHWIQaKpGQCGSPLVSTRDGFIVGIHSASNF
TNTiwYFTSVPKNFMELLTNQEAQQWVSGWRLNADSmLiGGHKVFfVKPEEPFQP
VeEATQLMN
GESLFKGPRDYNPISSTICHLTNESDGHTTSLYGIGFGPFIITNKHLFRRNNGTLLVQ 49
SLHGVFKVKNTTTLQQHLIDGRDMIIIRMPKDFPPFPQKLKFREPQREERICLVTTN
FQTKSMSSMVSaTSCTFPSfDGIFWKHWIQaKpGQCGSPLVSTRDGFIVGIHSASNF
TNTiwYFTSVPKNFMELLTNQEAQQWVSGWRLNADSmLiGGHKVFfVKPEEPFQP
VeEAT
GESLFKGPRDYNPISSTICHLTNESDGHTTSLYGIGFGPFIITNKHLFRRNNGTLLVQ 50 SLHGVFKVKNTTTLQQHLIDGRDMIIIRMPKDFPPFPQKLKFREPQREERICLVTTN FQTKSMSSMVSaTSCTFPSfDGIFWKHWIQaKpGQCGSPLVSTkDGFIVGIHSASNFT
NTirYFTSVPKNFMELLTNQEAQQWVSGWRLNADSmLiGGHKVFfVKPEEPFQPVe
EATQLMN
GESLFKGPRDYNPISSTICHLTNESDGHTTSLYGIGFGPFIITNKHLFRRNNGTLLVQ 51
SLHGVFKVKNTTTLQQHLIDGRDMIIIRMPKDFPPFPQKLKFREPQREERICLVTTN
FQTKSMSSMVSaTSCTFPSfDGIFWKHWIQaKpGQCGSPLVSTRDGFIVGIHSASNF
TNTirYFTSVPKNFMELLTNQEAQQWVSGWRLNADSmLiGGHKVFfVKPEEPFQPV eEATQLMN
GESLFKGPRDYNPISSTICHLTNESDGHTTSLYGIGFGPFIITNKHLFRRNNGTLLVQ 52
SLHGVFKVKNTTTLQQHLIDGRDMIIIRMPKDFPPFPQKLKFREPQREERICLVTTN
FQTKSMSSMVSaTSCTFPSfDGIFWKHWIQsKpGQCGSPLVSTRDGsIVGIHSASdFT
NTtmYFTSVPKNFMELLTNQEAQQWVSGWRLNADSmLiGGHKVFfVKPEEPFQPV eEATQLMN
GESLFKGPRDYNPISSTICHLTNESDGHTTSLYGIGFGPFIITNKHLFRRNNGTLLVQ 36 SLHGVFKVKNTTTLQQHLIDGRDMIIIRMPKDFPPFPQKLKFREPQREERICLVTTN FQTKSMSSMVSaTSCTFPSfDGIFWKHWIQsKpGQCGSPLVSTRDGFIVGIHSASdFT
NTtmYFTSVPKNFMELLTNQEAQQWVSGWRLNADSmLiGGHKVFfVKPEEPFQPV eEATQLMN
GESLFKGPRDYNPISSTICHLTNESDGHTTSLYGIGFGPFIITNKHLFRRNNGTLLVQ 53 SLHGVFKVKNTTTLQQHLIDGRDMIIIRMPKDFPPFPQKLKFREPQREERICLVTTN FQTKSMSSMVSaTSCTFPSfDGIFWKHWIQsKpGQCGSPLVSTRDGsIVGIHSASdFT
NTtmYFTSVPKNFMELLTNQEAQQWVSGWRLNADSmLiGGHKVFfVKPEEPF
GESLFKGPRDYhPISSTICHLTNESDGHTTSLYGIGFGPFIITNKHLFRRNNGTLLVQ 54 SLHGVFKVKNTTTLQQHLIDGRDMIIIRMPKDFPPFPQKLKFREPQREdRICLVTTN
FQTKSMSSMVSaTSCTFPSfDGIFWKHWIQaKaGQCGSPLVSTRDGFIVGIHSASNFT
NTiNYFTSVPKNFMELLTNQEAQQWVSGWRLNADSfLcGGHKVFlVKPEEPF
GESLFKGPRDYNPISSTICHLTNESDGHTTSLYGIGFGPFIITNKHLFRRNNGTLLVQ 55 SLHGVFKVKNTTTLQQHLIDGRDMIIIRMPKDFPPFPQKLKFREPQREdRICLVTTN
FQTKSMSSMVSaTSCTFPSfDGIFWKHWIQaKaGQCGSPLVSTRDGFIVGIHSASNFT
NTiNYFTSVPKNFMELLTNQEAQQWVSGWRLNADSfLcGGHKVFlVKPEEPFQsVK
EATQLMN
GESLFKGPRDYNPISSTICHLTNESDGHTTSLYGIGFGPFIITNKHLFRRNNGTLLVQ 56
SLHGVFKVKNTTTLQQHLIDGRDMIIIRMPKDFPPFPQKLKFREPQREdRICLVTTN
FQTKSMSSMVSaTSCTFPSfDGIFWKHWIQaKaGQCGSPLVSTRDGFIVGIHSASNFT
NTiNYFTSVPKNFMELLTNQEAQQWVSGWRLNADSfLcGGHKVFlVKPEEPFQPV
KEATQLMN
GESLFKGPRDYNPISSTICHLTNESDGHTTSLYGIGFGPFIITNKHLFRRNNGTLLVQ 57
SLHGVFKVnNTTTLQQHLIDGRDMIIIRMPKDFPPFPQKLKFREPQREdRICLVTTN
FQTKSMSSMVSaTSCTFPSfDGIFWKHWIQaKaGQCnSPLVSTRDGFIVGIHSASNFT
NTiNYFTSVPKNFMELLTNQEAQQWVSGWRLNADSfLcGGHKVFlVKPEEPFQPV
KEATQLMN
GESLFKGPRDYNPISSTICHLTNESDGHTTSLYGIGFGPFIITNKHLFRRNNGTLLVQ 58 SLHGVFKVKNTTTLQQHLIDGRDMIIIRMPKDFPPFPQKLKFREPQRgERICLVTTN
FQTKSMSSMVSaTSCTFPSfDGIFWKHWIQsKpGQCGSPLVSTRDGFIVGIHSASNFT
NTifYFTSVPKNFMELLTNQEAQQWVSGWRLNADSmLiGGHKVFfVKPEEPFQPVe
EATQLMN
GESLFKGPRDYNPISSTICHLTNESDGHTTSLYGIGFGPFIITNKHLFRRNNGTLLVQ 59 SLHGVFKVKNTTTLQQHLIDGRDMIIIRMPKDFPPFPQKLKFREPQREERICLVTTN FQsKSMSSMVSaTSCTFPSfDGIFWKHWIQaKpGQCGSPLVSTRDGFIVGIHSASNFT
NTirYFTSVPKNFMELLTNQEAQQWVSGWRLNADSmLiGGHKVFfVKPEEPFQPVe
EATQLMN
GESLFKGPRDYtPISSTICHLTNESDGHTTSLYGIGFGPFIITNKHLFRRNNGTLLVQ 60
SLHGVFKVKNTTTLQQHLIDGRDMIIIRMPKDFPPFPQKLKFREPQREERICLVTTN
FQTKSMSSMVSaTSCTFPSfDGIFWKHWIQaKpGQCGSPLVSTRDGFIVGIHSASNF
TNTirYFTSVPKNFMELLTNQEAQQWVSGWRLNADSmLiGGHKVFfVKPEEPFQPV eEATQLMN
GESLFKGPRDYNPISSTICHLTNESDGHTTSLYGIGFGPFIITNKHLFRRNNGTLLVQ 61
SLHGVFKVKNTTTLQQHLIDGgDMIIIRMPKDFPPFPQKLKFREPQREERICLVTTN
FQTKSMSSMVSaTSCTFPSfDGIFWKHWIQaKpGQCGSPLVSTRDGFIVGIHSASNF
TNTirYFTSVPKNFMELLTNQEAQQWVSGWRLNADSmLiGGHKVFfVKPEEPFQPV eEATQLMN
GESLFeGPRDYNPISSTICHLTNESDGHTTSLYGIGFGPFIITNKHLFRRNNGTLLVQ 62 SLHGVFKVKNTTTLQQHLIDGRDMIIIRMPKDFPPFPQKLKFREPQREERICLVTTN FQTKSMSSMVSaTSCTFPSfDGIFWKHWIQsKpGQCGSPLVSTRDGFIVGIHSASdFT
NTtmYFTSVPKNFMELLTNQEAQQWVSGWRLNADSmLiGGHKVFfVKPEEPFQPV eEATQLMN
GESLFKGPRDYNPISSalCHLTNESDGHTTSLYGIGFGPFIITNKHLFRRNNGTLLVQ 63 SLHGVFKVKNTTTLQQHLIDGRDMIIIRMPKDFPPFPQKLKFREPQREERICLVTTN FQTKSMSSMVSaTSCTFPSfDGIFWKHWIQsKpGQCGSPLVSTRDGFIVGIHSASdFT
NTtmYFTSVPKNFMELLTNQEAQQWVSGWRLNADSmLiGGHKVFfVKPEEPFQPV eEATQLMN
GESLFKGPRDYNPISSTICHLTNESDGHTTSLYGIGFGPFIITNKHLFRRNNGTLLVQ 64 SLHGVFKVKdTTTLQQHLIDGRDMIIIRMPKDFPPFPQKLKFREPQREERICLVTTN
FQTKSMSSMVSaTSCTFPSfDGIFWKHWIQsKpGQCGSPLVSTRDGFIVGIHSASdFT
NTtmYFTSVPKNFMELLTNQEAQQWVSGWRLNADSmLiGGHKVFfVKPEEPFQPV eEATQLMN
GESLFKGPRDYNPISSTICHLTNESDGHTTSLYGIGFGPFIITNKHLFRRNNGTLLVQ 65
SLHGVFKVKNTTTLQQHLIDGRDMIIIRMPKDFPPFPQKLKFREPQRgdRICLVTTN
FQTKSMSSMVSaTSCTFPSfDGIFWKHWIQaKaGQCGSPLVSTRDGFIVGIHSASNFT
NTiNYFTSVPKNFMELLTNQEAQQWVSGWRLNADSfLcGGHKVFlVKPEEPFQPV
KEATQLMN
GESLFKGPRDYNPISSTICHLTNESDGHTTSLYGIGFGPFIITNKHLFRRNNGTLLVQ 66 SLHGVFKVKNTTTLQQHLIDGRDMIIIRMPKDFPPFPQKLKFREPQREdRICLVTTN
FQTKSMSSMVSaTSCTFPSfDGIFWKHWIQaKaGQCGSPLVSTRDGFIVGIHSASNFT
NTiNYFTSVPKNFMELLTNQEAQQWVSGWRLNADSVLWGGHKVFfVKPEEPF
GESLFKGPRDYNPISSTICHLTNESDGHTaSLYGIGFGPFIITNKHLFRkNNGTLLVQ 67
SLHGVFKVKNTTTLQQHLIDGRDMIIIRMPKDFPPFPQKLKFREPQREdRICLVTTN
FQTKSMSSMVSaTSCTFPSfDGIFWKHWIQaKaGQCGSPLVSTRDGFIVGIHSASNFT
NTiNYFTSVPKNFMELLTNQEAQQWVSGWRLNADSfLcGGHKVFlVKPEEPFQPV
KEATQLMN
GESLFKGPcDYNPISSTICHLTNESDGHTTSLYGIGFGPFIITNKHLFRRNNGTLLVQ 68
SLHGVFKVKNTTTLQQHLIDGRDMIIIRMPKDFPPFPQKLKFREPQREdRICLVTTN
FQTKSMSSMVSaTSCTFPSfDGIFWKHWIQaKaGQCGSPLVSTRDGFIVGIHSASNFT
NTiNYFTSVPKNFMELLTNQEAQQWVSGWRLNADSfLcGGHKVFlVKPEEPFQPV
KEATQLMN
GESLFKGPRDYNPISSTICHLTNESDGHTTSLYGIGFGPFIITNKHLFRRNNGTLLVQ 69
SLHGVFKVKNTTTLQQHLIDGRDMIIIRMPKDFPPFPQKLKFREPQRgdRICLVTTN
FQTKSMSSMVSaTSCTFPSfDGIFWKHWIQaKaGQCGSPLVSTRDGFIVGIHSASNFT
NTiNYFTSVPKNFMELLTNQEAQQWVSGWRLNADSVLWGGHKVFMVKPEEPFQ
PVKEATQLMN
GESLFKGPRDYNPISSTICHLTNESDGHTTSLYGIGFGPFIITNKHLFRRNNGTLLVQ 70 SLHGVFKVKNTTTLQQHLIDGRDMIIIRMPKDFPPFPQKLKFREPQRgERICLVTTN
FQTKSMSSMVSaTSCTFPSfDGIFWKHWIQaKpGQCGnPLVSTRDGFIVGIHSAaNFT
NTirYFTSVPtNFMELLTNQEAQQWVSGWRLNADSmLiGGHKVFfVKPEEPFQPVeE
ATQLMN
GkSLFKGPRDYNPISSTICHLTNESDGHTTSLYGIGFGPFIITNKHLFRRNNGTLLVQ 71 SLHGVFKVKNTTTLQQHLIDGRDMIIIRMPKDFPPFPQKLKFREPQREERICLVTTN FQTKSMSSMVSaTSCTFPSfDGIFWKHWIQcKpGQCGSPLVSTRDGFIVGIHSAaNFT
NTisYFTSVPKNFMELLTNQEAQQWVSGWqLNADSmLiGGHKVFfVKPEEPF
GESLFKGPRDYNPISSTICHLTNESDGHTTSLYGIGFGPFIITNKHLFRRNNGTLLVQ 72 SLHGVFKVKNTTTLQQHLIDGRDMIIIRMPKDFPPFPQKLKFREPQREERICLVTTN FQTKSMSSMVSaTSCTFPSfDGIFWKHWIQaKpGQCGnPLVSTRDGFIVGIHSAaNFT
NTirYFTSVPKNFMELLTNQEAQQWVSGWRLNADSmLiGGHKVFfVKPEEPFQPVe
EATQLMN
GESLFKGPRDYNPISSTICHLTNESDGHTTSLYGIGFGPFIITNKHLFRRNNGTLLVQ 52
SLHGVFKVKNTTTLQQHLIDGRDMIIIRMPKDFPPFPQKLKFREPQREERICLVTTN
FQTKSMSSMVSaTSCTFPSfDGIFWKHWIQsKpGQCGSPLVSTRDGsIVGIHSASdFT
NTtmYFTSVPKNFMELLTNQEAQQWVSGWRLNADSmLiGGHKVFfVKPEEPFQPV eEATQLMN
GESLFKGPRDYNPISSTICHLTNESDGyTTSLYGIGFGPFIITNKHLFRRNNGTLLVQ 73
SLHGVFKVKNTTTLQQHLIDGRDMIIIRMPKDFPPFPQKLKFREPQREERICLVTTN
FQTKSMSSMVSaTSCTFPSfDGIFWKHWIQsKpGQCGSPLVSTRDGsIVGIHSASdFT
NTtmYFTSVPKNFMELLTNQEAQQWVSGWRLNADSmLiGGHKVFfVKPEEPFQPV eEATQLMN
GESLFKGPRDYNPISSTICHLTNESDGlTTSLYGIGFGPFIITNKHLFRRNNGTLLVQ 74
SLHGVFKVKNTTTLQQHLIDGRDMIIIRMPKDFPPFPQKLKFREPQREERICLVTTN
FQTKSMSSMVSaTSCTFPSfDGIFWKHWIQsKpGQCGSPLVSTRDGsIVGIHSASdFT
NTtmYFTSVPKNFMELLTNQEAQQWVSGWRLNADSmLiGGHKVFfVKPEEPFQPV eEATQLMN
GESLFeGPRDYNPISSTICHLTNESDGHTTSLYGIGFGPFIITNKHLFRRNNGTLLVQ 75
SLHGVFKVKNTTTLQQHLIDGRDMIIIRMPKDFPPFPQKLKFREPQREERICLVTTN
FQTKSMSSMVSaTSCTFPSfDGIFWKHWIQsKpGQCGSPLVSTRDGsIVGIHSASdFT
NTtmYFTSVPKNFMELLTNQEAQQWVSGWRLNADSmLiGGHKVFfVKPEEPFQPV eEATQLMN
GESLFKGPRDYNPISSTICHLTNESDGHTTSLYGIGFGPFIITNKHLFRRNNGTLLVQ 76
SLHGVFKVKNTTTLQQHLIDGRDMIIIRMPKDFPPFPQKLKFREPQREERICLVTTN
FQTKSMSSMVSaTSCTFPSfDGIFWKHWIQsKpGQCGSPLVSTRDGsIVGIHSAadFT
NTtmYFTSVPKNFMELLTNQEAQQWVSGWRLNADSmLiGGHKVFfVKPEEPFQPV eEATQLMN
GESLFKGPRDYNPISSTICHLTNESDGHTTSLYGIGFGPFIITNKHLFRRNNGTLLVQ 77 SLHGVFKVKNTTTLQQHLIDGRDMIIIRMPKDFPPFPQKLKFREPQREdRICLVTTN
FQTKSMSSMVSaTSCTFPSfDGIFWKHWIQaKaGQCGSPLVSTRDGFIVGIHSASNFT
NTiNYFTSVPKNFMELLTNQEAQQWVSGWRLNADSfLcGGHKVFlVKPEEPFsas
GESLFKGPRDYNPISSTICHLTNESDGyTTSLYGIGFGPFIITNKHLFRRNNGTLLVQ 78 SLHGVFKVKNTTTLQQHLIDGRDMIIIRMPKDFPPFPQKLKFREPQREdRICLVTTN
FQTKSMSSMVSaTSCTFPSfDGIFWKHWIQaKaGQCGnPLVSTRDGFIVGIHSASNFT
NTiNYFTSVPKNFMELLTNQEAQQWVSGWRLNADSfLcGGHKVFlVKPEEPFsas
GESLFKGPRDYNPISSTICHLTNESDGyTTSLYGIGFGPFIITNKHLFRRNNGTLLVQ 79 SLHGVFKVKNTTTLQQHLIDGRDMIIIRMPKDFPPFPQKLKFREPQREdRICLVTTN
FQTKSMSSMVSaTSCTFPSfDGIFWKHWIQaKaGQCGSPLVSTRDGFIVGIHSASNFT
NTiNYFTSVPKNFMELLTNQEAQQWVSGWRLNADSfLcGGHKVFlVKPEEPFsas
GkSLFKGPRDYNPISSTICHLTNESDGHTTSLYGIGFGsFIITNKHLFRRNNGTLLVQ 80 SLHGVFKVKNTTTLQQHLIDGRDMIIIRMPKDFPPFPQKLKFREPQREERICLVTTN FQTKSMSSMVSaTSCTFPSfDGIFWKHWIQcKpGQCGSPLVSTRDGsIVGIHSAaNFT
NTisYFTSVPKNFMELLTNQEAQQWVSGWqLNADSmLiGGHKVFfVKPEEPF
GkSLFKGPRDYNPISSTICHLTNESDGHTTSLYGIGFGPFIITNKHLFRRNNGTLLVQ 81 SLHGVFKVKNTTTLQQHLIDGRDMIIIRMPKDFPPFPQKLKFREPQREdRICLVTTN
FQTKSMSSMVSaTSCTFPSfDGIFWKHWIQcKpGQCGSPLVSTRDGFIVGIHSAaNFT
NTisYFTSVPKNFMELLTNQEAQQWVSGWqLNADSmLiGGHKVFfVKPEEPF
GkSLFKGPRDYNPISSTICHLTNESDGHTTSLYGIGFGPFIITNKHLFRRNNGTLLVQ 71 SLHGVFKVKNTTTLQQHLIDGRDMIIIRMPKDFPPFPQKLKFREPQREERICLVTTN FQTKSMSSMVSaTSCTFPSfDGIFWKHWIQcKpGQCGSPLVSTRDGFIVGIHSAaNFT
NTisYFTSVPKNFMELLTNQEAQQWVSGWqLNADSmLiGGHKVFfVKPEEPF
GkSLFKGPRDYNPISSTICHLTNESDGyTTSLYGIGFGPFIITNKHLFRRNNGTLLVQ 82 SLHGVFKVKdTTTLQQHLIDGRDMIIIRMPKDFPPFPQKLKFREPQREERICLVTTN
FQTKSMSSMVSaTSCTFPSfDGIFWKHWIQcKpGQCGSPLVSTRDGFIVGIHSAaNFT
NTisYFTSVPKNFMELLTNQEAQQWVSGWqLNADSmLiGGHKVFfVKPEEPF
GkSLFKGPRDYNPISSTICHLTNESDGHTTSLYGIGFGPFIITNKHLFRRNNGTLLVQ 83 SLHGVFKVKNTTTLQQHLIDGRDMIIIRMPKDFPPFPQKLKFREPQREERICLVTTN FQTKSMSSMVSaTSCTFPSfDGIFWKHWIQcKpGQCGSPLVSTRDGFIVGIHSAaNFT
NTisYFTSVPKNFMELLTNQEAQQWVSGWqLNADSmLiGGHKVFfVKPEgPF
GESLFKGPRDYNPISSTICHLTNESDGHTTSLYGIGFGPFIITNKHLFRRNNGTLLVQ 84
SLHGVFKVKNTTTLQQHLIDGRDMIIIRMPKDFPPFPQKLKFREPQREdRICLVTaN
FQTKSMSSMVSaTSCTFPSfDGIFWKHWIQsKpGQCGSPLVSTRDGsIVGIHSAadFT
NTtmYFTSVPKNFMELLTNQEAQQWVSGWRLNADSmLiGGHKVFfVKPEEPFQPV eEATQLMN
GESLFKGPRDYNPISSTICHLTNESDGlTaSLYGIGFGPFIITNKHLFRRNNGTLLVQS 85 LHGVFKVKNTTTLQQHLIDGRDMIIIRMPKDFPPFPQKLKFREPQRdERICLVTTNF
QTKSMSSMVSaTSCTFPSfDGIFWKHWIQsKpGQCGSPLVSTRDGsIVGIHSAadFTN
TtmYFTSVPKNFMELLTNQEAQQWVSGWRLNADSmLiGGHKVFfVKPEEPFQPVe
EATQLMN
GESLFKGPRDYNPISSTICHLTNESDGHTTSLYGIGFGPFIITNKHLFRRNNGTLLVQ 1 SLHGVFKVKNTTTLQQHLIDGRDMIIIRMPKDFPPFPQKLKFREPQREERICLVTTN FQTKSMSSMVSDTSCTFPSSDGIFWKHWIQTKDGQCGSPLVSTRDGFIVGIHSASN FTNTNNYFTSVPKNFMELLTNQEAQQWVSGWRLNADSVLWGGHKVFMVKPEEP FQPVKEATQLMN
GESLFeGPRDYNPISSTICHLTNESDGlTTSLYGIGFGPFIITNKHLFRRNNGTLLVQS 86 LHGVFKVKNTTTLQQHLIDGRDMIIIRMPKDFPPFPQKLKFREPQREERICLVTTNF QTKSMSSMVSaTSCTFPSfDGIFWKHWIQsKpGQCGSPLVSTRDGsIVGIHSASdFTN
TtmYFTSVPKNFMELLTNQEAQQWVSGWRLNADSmLiGGHKVFfVKPE
GESLFeGPRDYNPISSTICHLTNESDGHTTSLYGIGFGPFIITNKHLFRRNNGTLLVQ 75
SLHGVFKVKNTTTLQQHLIDGRDMIIIRMPKDFPPFPQKLKFREPQREERICLVTTN
FQTKSMSSMVSaTSCTFPSfDGIFWKHWIQsKpGQCGSPLVSTRDGsIVGIHSASdFT
NTtmYFTSVPKNFMELLTNQEAQQWVSGWRLNADSmLiGGHKVFfVKPEEPFQPV eEATQLMN
GESLFeGPRDYNPISSTICHLTNESDGHTTSLYGIGFGPFIITNKHLFRRNNGTLLVQ 87
SLHGVFKVKNTTTLQQHLIDGRDMIIIRMPKDFPPFPQKLKFREPQREERICLVTTN
FQTKSMSSMVSaTSCTFPSfDGIFWKHWIQsKpGQCGSPLVSTRDGsIVGIHSAadFT
NTtmYFTSVPKNFMELLTNQEAQQWVSGWRLNADSmLiGGHKVFfVKPEEPFQPV eEATQLMN
GESLFeGPRDYNPISSTICHLTNESDGHTaSLYGIGFGPFIITNKHLFRRNNGTLLVQ 88
SLHGVFKVKNTTTLQQHLIDGRDMIIIRMPKDFPPFPQKLKFREPQREERICLVTTN
FQTKSMSSMVSaTSCTFPSfDGIFWKHWIQsKpGQCGSPLVSTRDGsIVGIHSAadFT
NTtmYFTSVPKNFMELLTNQEAQQWVSGWRLNADSmLiGGHKVFfVKPEEPFQPV eEATQLMN
GESLFKGPRDYNPISSTICHLTNESDGyTTSLYGIGFGPFIITNKHLFRRNNGTLLVQ 78 SLHGVFKVKNTTTLQQHLIDGRDMIIIRMPKDFPPFPQKLKFREPQREdRICLVTTN
FQTKSMSSMVSaTSCTFPSfDGIFWKHWIQaKaGQCGnPLVSTRDGFIVGIHSASNFT
NTiNYFTSVPKNFMELLTNQEAQQWVSGWRLNADSfLcGGHKVFlVKPEEPFsas
GESLFKGPRDYNPISSTICHLTNESDGyTTSLYGIGFGPFIITNKHLFRRNNGTLLVQ 89 SLHGVFKVKNTTTLQQHLIDGRDMIIIRMPeDFPPFPQKLKFREPQREdRICLVTTN
FQTKStSSMVSaTSCTFPSfDGIFWKHWIQaKaGQCGnPLVSTRDGFIVGIHSASNFT
NTiNYFTSVPKNFMELLTNQEAQQWVSGWRLNADSfLcGGHKVFlVKPEEPFsas
GESLFKGPRDYNPISSTICHLTNESDGyTTSLYGIGFGPlIITNKHLFRRNNGTLLVQS 90 LHGVFKVKNTTTLQQHLIDGRDMIIIRMPKDFPPFPQKLKFREPQREdRICLVTTNF
QTKSMSSMVSaTSCTFPSfDGIFWKHWIQaKaGQCGnPLVSTRDGFIVGIHSASNFT
NTiNYFTSVPKNFMELLTNQEAQQWVSGWRLNADSfLcGGHKVFlVKPEEPFsas
GESLFKGPRDYNPISSTICHLTNESDGyTTSLYGIGFGPFIITNKHLFRRNNGTLLVQ 91 SLHGVFKVKNTTTLQQHLIDGRDMIIIRMPKDFPPFPQKLKFREPQREdRvCLVTTN
FQTKSMSSMVSaTSCTFPSfDGIFWKHWIQaKaGQCGnPLVSTRDGFIVGIHSASkFT
NTiNYFTSVPKNFMELLTNQEAQQWVSGWRLNADSfLcGGHKVFlVKPEEPFsas
GkSLFKGPRnYNPISSTICHLTNESDGHTTSLYGIGFGPFIITNKHLFRRNNGTLLVQ 92 SLHGVFKVKNTTTLQQHLIDGRDMIIIRMPKDFPPFPQKLKFREPQREERICLVTTN FQTKSMSSMVSaTSCTFPSfDGIFWKHWIQcKpGQCGSPLVSTRDGFIVGIHSAaNFT
NTisYFTSVPKNFMELLTNQEAQQWVSGWqLNADSmLiGGHKVFfVKPEEPF
GESLFKGPRDYNPISSTICHLTNESDGHTTSLYGIGFGPFIITNKHLFRRNNGTLLVQ 93
SLHGVFKVKNTTTLQQHLIDGRDMIIIRMPKDFPPFPQKLqFREPQRgERICLVTTN
FQTKSMSSiVSaTSCTFPSfDGIFWKHWIQaKpGQCGnPLVSTRDGFIVGIHSAaNFTN
TirYFTSVPKNFMELLTNQEAQQWVSGWRLNADSmLiGGHKVFfVKPEEPFQPVeE
ATQLMN
GkSLFKGPRDYNPISSTICHLTNESDGHTTSLYGIGFGPFIITNKHLFRRNNGTLLVQ 94 SLrGVFKVKNTTTLQQHLIDGRDMIIIRMPKDFPPFPQKLKFREPQREERICLVTTN
FQTKSMSSMVSaTSCTFPSfDGIFWKHWIQcKpGQCGSPLVSTRDGFIVGIHSAaNFT
NTisYFTSVPKNFMELLTNQEAQQWVSGWqLNADSmLiGGHKVFfVKPEEPF
GESLFKGPRDYNPISSTICHLTNESDGlTTSLYGIGFGPFIITNKHLFRRNNGTLLVQ 95
SLHGVFKVKNTTTLQQHLIDGRDMIIIRMPKDFPPFPQKLKFREPQREERICLVTaN
FQTKSMSSMVSaTSCTFPSfDGIFWKHWIQsKpGQCGnPLVSTRDGsIVGIHSAadFT
NTtmYFTSVPKNFMELLTNQEAQQWVSGWRLNADSmLiGGHKVFfVKPEEPFQPV eEATQLMN
GESLlKGPRDYNPISSalCHLTNESDGHTTSLYGIGFGPFIITNKHLFRgNNGTLLVQS 96 LHGVFKVKNTTTLQQHLIDGRDMIIIRMPKDFPPFPQKLKFREPQREdRICLVTTNF
QTKSMSSMVSaTSCTFPSfDGIFWKHWIQsKpGQCGSPLVSTRDGsIVGIHSAadFTN
TtmYFTSVPKNFMELLTNQEAQQWVSGWRLNADSmLiGGHKVFfVKPEEPFQPVe
EATQLMN
GESLFKGPRDYNPISSTICHLTNESDGHTTSLYGIGFGPFIITNKHLFRgNNGTLLVQ 97 SLHGVFKVKNTTTLQQHLIDGRDMIIIRMPKDFPPFPQKLKFREPQREERICLVTTN FQTKSMSSMVSaTSrTFPSfDGIFWKHWIQsKpGQCGSPLVSTRDGsIVGIHSAadFTN
TtmYFTSVPKNFMELLTNQEAQQWVSGWRLNADSmLiGGHKVFfVqPEEPFQPVeE
ATQLMN
GESLFKGPRDYNPISSTICHLTNESDGHTTSLYGIGFGPFIITNKHLFRgNNGTLLVQ 98 SLHGVFKVKNTTTLQQHLIDGRDMIIIRMPKDFPPFPQKLKFREPQREERvCLVTT
NFQTKSMSSMVSaTSCTFPSfDGIFWKHWIQsKpGQCGSPLVSTRDGsIVGIHSAadF
TNTtmYFTSVPKNFMELLTNQEAQQWVSGWRLNADSmLiGGHKVFfVKPE
GESLFKGPRDYNsISSTICHLTNESDGHTTSLYGIGFGPFIITNKHLFRgNNGTLLVQ 99
SLHGVFKVKNTTTLQQHLIDGRDMIIIRMPKDFPPFPQKLKFREPQREERICLVTTN
FQTKSMSSMVSaTSCTvPSfDGIFWKHWIQsKpGQCGSPLVSTRDGsIVGIHSAadFT
NTtmYFTSVPKNFMELLTNQEAQQWVSGWRLNADSmLiGGHKVFfVKPEEPFQPV eEATQrMN
GESLFKGPRDYNPISSTICHLTNESDGHTTSLYGIGFGPFIITNKHLFRgNNGTLLVQ 100
SLHGVFKVKNTTTLQQHLIDGRDMIIIRMPKDFPPFPQKLKFREPQREERvCLVTT
NFQTKSMSSMVSaTSCTFPSfDGIFWKHWIQsKpGQCGSPLVSTRDGsIVGIHSAadF
TNTtmYFTSVPKNFMELLTNQEAQQWVSGWRLNADSmLiGGHKVFfVKPEEPFQP
VeEATQLMN
GESLFKGPRDYNPISSTICHLTNESDGHTTSLYGIGFGPFIITNKHLFRRNNGTLLVQ 101
SLHGVFKVKNTTTLQQHLIDGRDMIIIRMPKDFPPFPQKLKFREPQREERICLVTTN
FQTKSMSSMVSaTSCTFPSfDGIFWKHWIQsKpGQCGSPLVSTRDGsIVGIHSAadFT
NTtmYFTSVPKNFMELLTNQEAQQWVSGWRLNADSmLiGGHKVFfVKPEEPFQPV eEeTQLMN
GESLFKGPRDYNPISSTICHLTNESDGyTTSLYGIGFGPFIITNKHLFRRNNGTLLVQ 102 SLHGVFKVKdTTTLQQHLIDGRDMIIIRMPKDFPPFPQKLKFREPQREdRICLVTTN
FQTKSMSSMVSaTSCTFPSfDGIFWKHWIQaKaGQCGnPLVSTRDGFIVGIHSASNFT
NTiNYFTSVPKNFMELLTNQEAQQWVSGWRLNADSfLcGGHKVFlVKPEEPFsas
GESLFKGPRDYNPISSTICHLTNESDGyTTSLYGIGFGPFIITNKHLFRRNNGTLLVQ 103 SLHGVFKVKNTTTLQQHLIDGRDMIIIRMPKDFPPFPQKLKFREPQREdRICLVTTN
FQTKStSSMVSaTSCTFPSfDGIFWKHWIQaKaGQCGnPLVSTRDGFIVGIHSASNFT
NTiNYFTSVPKNFMELLTNQEAQQWVSGWRLNADSfLcGGHKVFlVKPEEPFsas
GESLFKGPRDYNPISSalCHLTNESDGyTTSLYGIGFGPFIITNKHLFRRNNGTLLVQ 104 SLHGVFKVKNTTTLQQHLIDGRDMIIIRMPKDFPPFPQKLKFREPQREdRICLVTTN
FQTKSMSSMVSaTSCTFPSfDGIFWKHWIQaKaGQCGnPLVSTRDGFIVGIHSASNFT
NTiNYFTSVPKNFMELLTNQEAQQWVSGWRLNADSfLcGGHKVFlVKPEEPFsas
GESLFKGPRDYNPISSTICHLTNESDGyTTSLYGIGFGPFIITNKHLFRRNNGTLLVQ 105 SLHGVFKVKNTTTLQQHLIDGRDMIIIRMPKDFPPFPQKLKFREPQREdRvCLVTTN
FQTKSMSSMVSaTSCTvPSfDGIFWKHWIQaKaGQCGnPLVSTRDGFIVGIHSASNFT
NTiNYFTSVPKNFMELLTNQEAQQWVSGWRLNADSfLcGGHKVFlVKPEEPFsas
GkSLFKGPRDYNPISSTICHLTNESDGHTTSLYGIGFGPFIITNKHLFRRNNGTLLVQ 81 SLHGVFKVKNTTTLQQHLIDGRDMIIIRMPKDFPPFPQKLKFREPQREdRICLVTTN
FQTKSMSSMVSaTSCTFPSfDGIFWKHWIQcKpGQCGSPLVSTRDGFIVGIHSAaNFT
NTisYFTSVPKNFMELLTNQEAQQWVSGWqLNADSmLiGGHKVFfVKPEEPF
GESLFKGPRDYNPISSalCHLTNkSDGlTTSLYGIGFGPFIITNKHLFRRNNGTLLVQS 106
LHGVFKVKNTTTLQQHLIDGRDMIIIRMPKDFPPFPQnLKFREPQREERICLVTTNF
QTKSMSSMVSpTSCTFPSfDGIFWKHWIQaKpGQCGnPLVSTRDGFIVGIHSAaNFT
NTirYFTSVPKNFMELLTNQEAQQWVSGWRLNADSmLiGGHKVFfVKPEEPFQPVe
EATQLMN
GkSLFKGPRDYNPISSTICHLTNESDGHTTSLYGIGFGPFIITNKHLFRRNNGTLLVQ 71 SLHGVFKVKNTTTLQQHLIDGRDMIIIRMPKDFPPFPQKLKFREPQREERICLVTTN FQTKSMSSMVSaTSCTFPSfDGIFWKHWIQcKpGQCGSPLVSTRDGFIVGIHSAaNFT
NTisYFTSVPKNFMELLTNQEAQQWVSGWqLNADSmLiGGHKVFfVKPEEPF
GESLFKGPRDYNPISSalCHLTNESDGlTTSLYGIGFGPFIITNKHLFRRNNGTLLVQS 107
LHGVFKVKNTTTLQQHLIDGRDMIIIRMPKDFPPFPQnLKFREPQREERICLVTTNF
QTKSMSSMVSpTSCTFPSfDGIFWKHWIQaKpGQCGnPLVSTRDGFIVGIHSAaNFT
NTirYFTSVPKNFMELLTNQEAQQWVSGWRLNADSmLiGGHKVFfVKPEEPFQPVe
EATQLMN
GESLFKGPRDYNPISSTICHLTNESDGHTaSLYGIGFGPFIITNKHLFRRNNGTLLVQ 108
SLHGVFKVKNTTTLQQHLIDGRDMIIIRMPKDFPPFPQKLKFREPQREERICLVTTN
FQTKSMSSMVSaTSCTFPSfDGIFWKHWIQsKpGQCGSPLVSTRDGalVGIHSAadFT
NTtmYFTSVPKNFMELLTNQEAQQWVSGWRLNADSmLvGGHeVFfVKPEEPFQPV eEATQLMN
GESLFKGPRDYNPISSTICHLTNESDGHTaSLYGIGFGPFIITNKHLFRRNNGTLLVQ 109
SLHGVFKVKNTTTLQQHLIDGRDMIIIRMPKDFPPFPQKLKFREPQREERICLVTTN
FQTKSMSSMVSaTSCTFPSfDGIFWKHWIQsKpGQCGSPLVSTRDGalVGIHSAadFT
NTtmYFTSVPKNFMELLTNQEAQQWVSGWRLNADSmLvGGHeVFfVKPEEPlQPVe
EATQLMN
GESLFKGPRDYNPISSTICHLTNESDGHTaSLYGIGFGPFIITNKHLFRRNNGTLLVQ 110 SLHGVFKVKNTTpLQQHLIDGRDMIIIRMPeDFPPFPQKLKFREPQREERICLVTTN
FQTKSMSSMVSaTSCTFPSfDGIFWKHWIQsKpGQCGSPLVSTRDGalVGIHSAadFT
NTtmYFTSVPKNFMELLTNQEAQQWVSGWRLNADSmLvGGHeVFfVKPEEPFQPV eEATQLMN
GESLFKGPRDYNPISSTICHLTNESDGHTaSLYGIGFGPFIITNKHLFRRNNGTLLVQ 111 SLHGVFKVKNpTTLQQHLIDGRDMIIIRMPKDFPPFPQKLKFREPQREERICLVTTN
FQTKSMSStVSaTSCTFPSfDGIFWKHWIQsKpGQCGSPLVSTRDGalVGIHSAadFTN
TtmYFTSVPKNFMELLTNQEAQQWVSGWRLNADSmLiGGHKVFfVKPEEPFQPVe
EATQLMN
GESLFKGPRDYNPISSTICHLTNESDGHTaSLYGIGFGPFIITNKHLFRRNNGTLLVQ 112
SLHGVFKVKNTTTLQQHLIDGRDMIIIRMPKDFPPFPQKLKFREPQREERICLVTTN
FQTKSMSSMVSaTSCTFPSfDGIFWKHWIQpKpGpCGSPLVSTRDGalVGIHSAadFT
NTtmYFTSVPKNFMELLTNQEAQQWVSGWRLNADSmLvGGHeVFfVKPEEPFQPV eEATQLMN
GESLFKGPRDYNPISSTICHLTNESDGyTTSLYGIGFGPFIITNKHLFRRNNGTLLVQ 113
SLHGVFKVKNTTTLQQHLIDGRDMIIIRMPKDFPPFPQKLKFREPQREdRICLVTTN
FQTKSMSSMVSaTSCTFPSfDGIFWKHWIQaKaGQCGnPLVSTRDGFIVGIHSASNFT
NTiNYFTSVPKNFMELLTNQEAQQWVSGWRLNADSfLcGGHKVFlVKPEEPFQPV
KEATQLMN
GESLFKGPRDYNPISSTICHLTNESDGyTaSLYGIGFGPFIITNKHLFRRNNGTLLVQ 114 SLHGVFKVKNTTTLQQHLIDGRDMIIIRMPKDFPPFPQKLKFREPQREdRICLVTTN
FQTKSMSSMVSaTSCTFPSfDGIFWKHWIQaKaGQCGnPLVSTRDGFIVGIHSASNFT
NTiNYFTSVPKNFMELLTNQEAQQWVSGWRLNADSfLcGGHKVFlVKPEEPFpas
GESLFKGPRDYNPISSTICHLTNESDGyTTSLYGIGFGPFIITNKHLFRRNNGTLLVQ 115 SLHGVFKVKNTTTLQQHLIDGRDMIIIRMPKDFPPFPQKLKFREPQREdRICLVTTN
FQTKnMSSMVSaTSCTFPSfDGIFWKHWIQaKaGQCGnPLVSTRDGFIVGIHSASNFT
NTiNYFTSVPKNFMELLTNQEAQQWVSGWRLNADSfLcGGHKVFlVKPEEPFsas
GESLFKGPRDYNPISSTICHLTNESDGyTaSLYGIGFGPFIITNKHLFRRNNGTLLVQ 116
SLHGVFKVKNTTTLQQHLIDGRDMIIIRMPKDFPPFPQKLKFREPQREdRICLVTTN
FQTKSMSSMVSaTSCTFPSfDGIFWKHWIQaKaGQCGnPLVSTRDGFIVGIHSASNFT
NTiNYFTSVPKNFMELLTNQEAQQWVSGWRLNADSfLcGGHKVFlVKPEEPFQPV
KEATQLMN
GESLFKGPRDYNPISSTICHLTNESDGyTTSLYGIGFGPFIITNKHLFRRNNGTLLVQ 78 SLHGVFKVKNTTTLQQHLIDGRDMIIIRMPKDFPPFPQKLKFREPQREdRICLVTTN
FQTKSMSSMVSaTSCTFPSfDGIFWKHWIQaKaGQCGnPLVSTRDGFIVGIHSASNFT
NTiNYFTSVPKNFMELLTNQEAQQWVSGWRLNADSfLcGGHKVFlVKPEEPFsas
GESLFKGPRDYNPISSTvCHLTNESDGyTTSLYGIGFGPFIITNKHLFRRNNGTLLVQ 117 SLHGVFKVKNTTTLQQHLIDGRDMIIIRMPKDFPPFPQKLKFREPQREdRICLVTTN
FQTKSMSSMVSaTSCTFPSfDGIFWKHWIQaKaGQCGSPLVSTRDGFIVGIHSASNFT
NTiNYFTSVPKNFMELLTNQEAQQWVSGWRLNADSfLcGGHKVFlVKPEEPFpas
GkSLFKGPRDYNPISSTICHLTNESDGHTTSLYGIGFGPFIITNKHLFRRNNGTLLVQ 118 SLHGVFKVKNTTTLQQHLIDGRDMIIIRMPKDFPPFPQKLKFREPQREERvCLVTT
NFQTKSMSSMVSaTSCTFPSfDGIFWKHWIQcKpGQCGnPLVSTRDGFIVGIHSAaNF
TNTisYFTSVPKNFMELLTNQEAQQWVSGWqLNADSmLiGGHKVFfVKPEEPF
GkSLFKGPRDYNPISSTICHLTNESDGHTTSLYGIGFGPFIITNKHLFRRNNGTLLVQ 119 SLHGVFKVKNTTTLQQHLIDGRDMIIIRMPKDFPPFPQKLKFREPQRdERICLVTTN
FQTKSMSSMVSaTSCTFPSfDGIFWKHWIQcKpGQCGnPLVSTRDGFIVGIHSAaNFT
NTisYFTSVPeNFMELLTNQEAQQWVSGWqLNADSmLiGGHKVFfVKPEEPF
GkSLFKGPRDYNPISSTICHLTNESDGHTTSLYGIGFGPFIITNKHLFRRNNGTLLVQ 120 SLHGVFKVKNTpTLQQHLIDGRDMIIIRMPKDFPPFPQKLKFREPQREdRICLVTTN
FQTKSMSSMVSaTSCTFPSfDGIFWKHWIQcKpGQCGnPLVSTRDGFIVGIHSAaNFT
NTisYFTSVPKNFMELLTNQEAQQWVSGWqLNADSmLiGGHKVFfVKPEEPF
GkSLFKGPRDYNPISSTICHLTNESDGHTaSLYGIGFGPFIITNKHLFRRNNGTLLVQ 121
SLHGVFKVKNTTTLQQHLIDGRDMIIIRMPKDFPPFPQKLKFREPQRgERICLVTTN
FQTKSMSSMVSaTSCTFPSfDGIFWKHWIQaKpGQCGnPLVSTRDGFIVGIHSAaNFT
NTirYFTSVPKNFMELLTNQEAQQWVSGWRLNADSmLvGGHKVFfVKPEEPFQPV eEAaQLMN
GkSLFKGPRDYNPISSTICHLTNESDGHTTSLYGIGFGPFIITNKHLFRRNNGTLLVQ 122 SLHGVFKVKNTTTLQQHLIDGRDMIIIRMPKDFPPFPQKLKFREPQREdRICfVTTN
FQTKSMSSiVSaTSCTFPSfDGIFWKHWIQcKpGQCGnPLVSTRDGFIVGIHSAaNFTN
TisYFTSVPKNFMELLTNQEAQQWVSGWqLNADSmLiGGHKVFfVKPEEPF
GkSLFKGPRDYNPISSTICHLTNESDGHTTSLYGIGFGPFIITNKHLFRRNNGTLLVQ 123 SLHGVFKVKNTTTLQQHLIDGRDMIIIRMPKDFPPFPQKLKFREPQREdRICLVTTN
FQTKSMSSMVSaTSCTFPSfDGIFWKHWIQcKpGQCGnPLVSTRDGFIVGIHSAaNFT
NTisYFTSVPKNFMELLTNQEAQQWVSGWqLNADSmLiGGHKVFfVKPEEPF
GkSLFKGPRDYNPISSTICHLTNESDGHTTSLYGIGFGPFIITNKHLFRRNNGTLLVQ 124 SLHGVFKVKNTTTLQQHLIDGRDMIIIRMPKDFPPFPQKLKFREPQREdRICLVTaN
FQTKSMSSMVSaTSCTFPSfDGIFWKHWIQcKpGQCGnPLVSTRDGFIVGIHSAaNFT
NTisYFTSVPKNFMELLTNQEAQQWVSGWqLNADSmLiGGHKVFfVKPEEPF
GESLFKGPRDYNPISSalCHLTNESDGlTTSLYGIGFGPFIITNKHLFRRNNGTLLVQS 125
LHGVFKVKNTTTLQQHLIDGRDMIIIRMPKDFPPFPQKLKFREPQREERICLVTTNF
QTKSMSSMVSaTSCTFPSfDGIFWKHWIQsKpGQCGnPLVSTRDGsIVGIHSAadFTN
TtmYFTSVPKNFMELLTNQEAQQWVSGWRLNADSmLiGGHKVFfVKPEEPsQPVeE
ATQLMN
GESLFKGPRDYNPISSTICHLTNESDGlTTSLYGIGFGPFIITNKHLFRRNNGTLLVQ 126 SLqGVFKVKNTTTLQQHLIDGRDMIIIRMPKDFPPFPQKLKFREPQREERICLVTTN
FQTKSMSSMVSaTSCTFPSfDGIFWKHWIQsKpGQCGnPLVSTRDGsIVGIHSAadFT
NTtmYFTSVPKNFMELLTNQEAQQWVSGWRLNADSmLiGGHKVFfVKPEEPFsas
GESLFKGPRDYNPISSalCHLTNESDGlTTSLYGIGFGPFIITNKHLFRRNNGTLLVQS 127 LHGVFKVKNTTTLQQHLIDGRDMIIIRMPKDFPPFPQKLKFREPQREERICLVTTNF QTKSMSSMVSaTSCTFPSfDGIFWKHWIQsKpGQCGnPLVSTRDGsIVGIHSAadFTN
TtmYFTSVPKNFMELLTNQEAQQWVSGWRLNADSmLiGGHKVFfVKPEEPFsas
GESLFKGPRDYNPISSalCHLTNESDGlTTSLYGIGFGPFIITNKHLFRRNNGTLLVQS 128 LHGVFKVKdTTTLQQHLIDGRDMIIIRMPKDFPPFPQKLKFREPQREERICLVTTNF
QTKSMSSMVSaTSCTFPSfDGIFWKHWIQsKpGQCGnPLVSTRDGsIVGIHSAadFTN
TtmYFTSVPKNFMELLTNQEAQQWVSGWRLNADSmLiGGHKVFfVKPEEPFsas
GkSLFKGPRDYNPISSalCHLTNESDGlTTSLYGIGFGPFIITNKHLFRRNNGTLLVQS 129 LHGVFKVKNTTTLQQHLIDGRDMIIIRMPKDFPPFPQKLKFREPQREERICLVTTNF QTKSMSSMVSaTSCTFPSfDGIFWKHWIQsKpGQCGnPLVSTRDGsIVGIHSAadFTN
TtmYFTSVPKNFMELLTNQEAQQWVSGWRLNADSmLiGGHKVFfVKPEEPFsas
GESLFKGPRDYNPISSTICHLTNESDGlTTSLYGIGFGPFIITNKHLFRRNNGTLLVQ 130 SLHGVFKVKdTTTLQQHLIDGRDMIIIRMPKDFPPFPQKLKFREPQREERICLVTTN
FQTKSMSSMVSaTSCTFPSfDGIFWKHWIQsKpGQCGnPLVSTRDGsIVGIHSAadFT
NTtmYFTSVPKNFMELLTNQEAQQWVSGWRLNADSmLiGGHKVFfVKPEEPFs
GESLFKGPRDYNPISSTICHLTNESDGyTTSLYGIGFGPFIITNKHLFRRNNGTLLVQ 131 SLHGVFKgKNTTTLQQILIDGRDMIIIRMPKDFPPFPQKLKFREPQREdRICLVTTNF
QTKSMSSMVSaTSCTFPSfDGIFWKHWIQaKaGQCGSPLVSTRDGFIVGIHSASNFT
NTiNYFTSVPKNFMELLTNQEAQQWVSGWRLNADSfLcGGHKVFlVKPEEPFsas
GESLFKGPRDYNPISSTICHLTNESDGyTTSLYGIGFGPFIITNKHLFRRNNGTLLVQ 132 SLHGVFKVKNTTTLQQHLIDGRDMIIIRMPKDFPPFPQKLKFREPQREdRICLVTTN
FQTKStSSMVSaTSCTFPSfDGIFWKHWIQaKaGQCGSPLVSTRDGFIVGIHSASNFT
NTiNYFTSVPKNFMELLTNQEAQQWVSGWRLNADSfLcGGHKVFlVKPEEPFsas
GkSLFKGPRDYNPISSTICHLTNESDGHTTSLYGIGFGPFIITNKHLFRRNNGTLLVQ 133 SLHGVFKVKNTTTLQQHLIDGRDMIIIRMPKDFPPFPQKLKFREPQREdRICLVTTN
FQTKSMSSMVSaTSCTFPSfDGIFWKHWIQcKpGQCGSPLVSTRDGFIVGIHSAaNFT
NTisYFTSVPKNFMELLTNQEAQQWVSGWqLNADSmLiGGHeVFfVKPEEPF
GkSLFKGPRDYNPISSTICHLTNESDGHTTSLYGIGFGPFIITNKHLFRRNNGTLLVQ 134 SLHGVFKVKNTTTLQQHLIDGRDMIIIRMPKDFPPFPQKLKFREPQREdRICLVTTN
FQTKSMSSMVSaTSCTFPSfDGIFWKHWIQcKpGQCGnPLVSTRDGFIVGIHSAaNFT
NTisYFTSVPKNFMELLTNQEAQQWVSGWqLNADSmLiGGHeVFfVKPEEPF
GESLFKGPRDYNPISSTICHLTNESDG1TTSLYGIGFGPFIITNKHLFRRNNGTLLVQ 135
SLHGVFKVKNTTTLQQHLIDGRDMIIIRMPKDFPPFPQKLKFREPQREdRICLVTTN
FQTKSMSSMVSapSCTFPSfDGIFWKHWIQsKpGQCGnPLVSTRDGsIVGIHSAadFT
NTtmYFTSVPKNFMELLTNQEAQQWVSGWRLNADSmLiGGHKVFfVKPEEPFQPV eEATQLMN
GESLFKGPcDYNPISSTICHLTNESDGlTaSLYGIGFGPFIITNKHLFRRNNGTLLVQS 136
LHGVFKVKdTTTLQQHLIDGRDMIIIRMPKDFPPFPQKLKFREPQREdRICLVTTNF
QTKSMSSMVSaTSCTlPSfDGIFWKHWIQsKpGQCGnPLVSTRDGsIVGIHSAadFTNT tmYFTSgPKNFMELLTNQEAQQWVSGWRLNADSmLiGGHKVFfVKPEEPFQPVeE
ATQLMN
GESLFKGPRDYNPISSsICHLTNESDGlTaSLYGIGFGPFIITNKHLFRRNNGTLLVQS 137
LHGVFKVKdTTTLQQHLIDGRDMIIIRMPKDFPPFPQKLKFREPQREdRICLVTTNF
QTKSMSSMVSaTSCTlPSfDGIFWKHWIQsKpGQCGnPLVSTRDGsIVGIHSAadFTNT tmYFTSVPKNFMELLTNQEAQQWVSGWRLNADSmLiGGHKVFfVKPEEPFQPVeE
ATQLMN
GESLFKGPRDYNPISSTICHLTNESDGlTaSLYGIGFGPFIITNKHLFRRNNGTLLVQS 138
LHGVFKVKdTTTLQQHLIDGRDMIIIRMPKDFPPFPQKLKFREPQREdRICLVTTNF
QTKSMSSMVSaTSCTlPSfDGIFWKHWIQsKpGQCGnPLVSTRDGsIVGIHSAadFTNT tmYFTSVPKNFMELLTNQEAQQWVSGWRLNADSmLiGGHKVFfVKPEEPFQPVeE
ATQLMN
GESLFKGPRDYNPISSTICHLTNESDGyTTSLYGIGFGPFIITNKHLFRRNNGTLLVQ 139 SLHGVFKVKNTTTLQQHLIDGRDMIIIRMPKDFPPFPQKLKFREPrREdRICLVTTNF
QTKSMSSMVSaTSCTsPSfDGIFWKHWIQaKaGQCGnPLVSTRDGFIVGIHSASNFTN
TiNYFTSVPKNFMELLTNQEAQQWVSGWRLNADSfLcGGHKVFlVKPEEPFsas
GESLFKGPRDYNPISSTICHLTNESDGyTTSLYGIGFGPFIITNKHLFRRNNGTLLVQ 78 SLHGVFKVKNTTTLQQHLIDGRDMIIIRMPKDFPPFPQKLKFREPQREdRICLVTTN
FQTKSMSSMVSaTSCTFPSfDGIFWKHWIQaKaGQCGnPLVSTRDGFIVGIHSASNFT
NTiNYFTSVPKNFMELLTNQEAQQWVSGWRLNADSfLcGGHKVFlVKPEEPFsas
GESLFKGPRDYNPISSTICHLTNvSDGyTTSLYGIGFGPFIITNKHLFRRNNGTLLVQ 140 SLHGVFKVKNTTTLQQHLIDGRDMIIIRMPKDFPPFPQKLKFREPQREdRICLVTTN
FQTKSMSSMVSaTSCTFPSfDGIFWKHWIQaKaGQCGnPLVSTRDGFIVGIHSASNFT
NTiNYFTSVPKNFMELLTNQEAQQWVSGWRLNADSfLcGGHKVFlVKPEEPFsas
GgSLFKGPRDYNPISSTICHLTNESDGyTTSLYGIGFGPFIITNKHLFRRNNGTLLVQ 141 SLHGVFKVKNTTTLQQHLIDGRDMIIIRMPKDFPPFPQKLKFREPQREdRICLVTTN
FQTKSMSSMVSaTSCTFPSfDGIFWKHWIQaKaGQCGSPLVSTRDGFIVGIHSASNFT
NTiNYFTSVPKdFMELLTNQEAQQWVSGWRLNADSfLcGGHKVFlVKPEEPFsas
GESLFKGPRDYNPISSTICHLTNESDGyTTSLYGIGFGPFIITNKHLFRRNNGTLLVQ 142 SLHGVFKVKdTTTLQQHLIDGRDMIIIRMPKDFPPFPQKLKFREPQREdRICLVTTN
FQTKSMSSMVSaTSCTFPSfDGIFWKHWIQaKaGQCGSPLVSTRDGFIVGIHSASNFT
NTiNYFTSVPKNFMELLTNQEAQQWVSGWRLNADSfLcGGHKVFlVKPEEPFsas
GESLFKGPRDYNPISSTICHLTNkSDGyTTSLYGIGFGPFIITNKHLFRRNNGTLLVQ 143 SLHGVFKVKdTTTLQQHLIDGRDMIIIRMPKDFPPFPQKLKFREPQREdRICLVTTN
FQTKSMSSMVSaaSCTFPSfDGIFWKHWIQaKaGQCGSPLVSTRDGFIVGIHSASNFT
NTiNYFTSVPKNFMELLTNQEAQQWVSGWRLNADSfLcGGHKVFlVKPEEPFsas
GkSLFKGPRDYNPISSTICHLTNESyGHTTSLYGIGFGPFIITNKHLFRRNNGTLLVQ 144 SLHGVFKVKNTTTLQQHLIDGRDMIIIRMPKDFPPFPQKLKFREPQREdRICLVTTN
FQTKSMSSMVSaTSCTFPSfDGIFWKHWIQcKpGQCGSPLVSTRDGFIVGIHSAaNFT
NTisYFTSVPKNFMELLTNQEAQQWVSGWqLNADSmLiGGHKVFlVKPEEPF
GkSLFKGPRDYNPISSTICHLTNESDGHTTSLYGIGFGPFIITNKHLFRRNNGTLLVQ 81 SLHGVFKVKNTTTLQQHLIDGRDMIIIRMPKDFPPFPQKLKFREPQREdRICLVTTN
FQTKSMSSMVSaTSCTFPSfDGIFWKHWIQcKpGQCGSPLVSTRDGFIVGIHSAaNFT
NTisYFTSVPKNFMELLTNQEAQQWVSGWqLNADSmLiGGHKVFfVKPEEPF
GkSLFKGPRDYNPISSTICHLTNESDGHTTSLYGIGFGPFIITNKHLFRRNNGTLLVQ 145 SLHGVFKVKNTTTLQQHLIDGgDMIIIRMPKDFPPFPQKLKFREPQREERICLVTTN
FQTKSMSSMVSaTSCTFPSfDGIFWKHWIQcKpGQCGSPLVSTRDGFIVGIHSAaNFT
NTisYFTSVPKNFMELLTNQEAQQWVSGWqLNADSmLiGGHeVFfVKPEEPF
GkSLFKGPRDYNPISSalCHLTNESDGHTTSLYGIGFGPFIITNKHLFRRNNGTLLVQ 146 SLHGVFKVKNTTTLQQHLIDGRDMIIIRMPKDFPPFPQKLKFREPQREdRICLVTTN
FQTKSMSSMVSaTSCTFPSfDGIFWKHWIQcKpGQCGSPLVSTRDGFIVGIHSAaNFT
NTisYFTSVPKNFMELLTNQEAQQWVSGWqLNADSmLiGGHKVFfVKPEEPF
9 GESLlKGPRDYNPISSTICHLTNESDGHTaSLYGIGFGPFIITNKHLFRRNNGTLLVQ 147 SLHGVFKVKNTTTLhQHLIDGRDMIIIRMPKDFPPFPQKLKFREPQREERICLVTTN
FQaKSMSSMVSaTSrTFPSfDGIFWKHWIQsKpGQCGSPLVSTRDGalVGIHSAadFTN
TtmYFTSVPKNFMELLTNQEAQQWVSGWRLNADSmLvGGHeVFfVKt
9 GESLFKGPRDYNPISSTICHLTNESDGHTaSLYGIGFGPFIITNKHLFRRNNGTLLVQ 148 SLHGVFKVKNTTTLQQHLIDGRDMIIIRMPKDFPPFPQKLKFREPQREERICLVTTN FQTKSMSSMVSaTSCTFPSfDGIFWKHWIQsKpGQCGSPLVSTRDGalVGIHSAadFT
NTtmYFTSVPKNFMELLTNQEAQQWVSGWRLNADSmLvGGHeVFfVKt
9 GESLFKGPRDYNPISSTICHLTNESDGHTaSLYGIGFGPFIITNKHLFRRNNGTLLVQ 149 SLHGVFKVKNTTTLQQHLIDGRDMIIIRMPKDFPPFPQKLKFREPQREERICLVTTN FQTKSMSSMgSaTSCTFPSfDGIFWKHWIQsKpGQCGSPLVSTRDGalVGIHSAadFT
NTtmYFTSVPKNFMELLTNQEAQQWVSGWRLNADSmLvGGHeVFfVKt
9 GESLFKGPRDYNPISSTICHLTNESDGHTaSLYGIGFGPFIITNKHLFRRNNGTLLVQ 150
SLHGVFKVKNTTTLQQHLIDGRDMIIIRMPKDFPPFPQKLKFREPQREERICLVTTN
FQTKSMSSMVSaTSCTFPSfDGIFWKHWIQsKpGQCGSPLVSTRDGalVGIHSAadFT
NTtmYFTSVPKNFMELLTNQEAQQWVSGWRLNADSmLvGGHeVFfVKPEEPFQPV eEATQLMN
9 GESLFKGPRDYNPISSTICHLTNESDGHTaSLYGIGFGPFIITNKHLFRRNNGTLLVQ 151 SLHGVFKVKNTTTLhQHLIDGRDMIIIRMPKDFPPFPQKLKFREPQREERICLVTTN
FQaKSMSSMVSaTSrTFPSfDGIFWKHWIpsKpGQCGSPLVSTRDGalVGIHSAadFTN
TtmYFTSVPKNFMELLTNQEAQQWVSGWRLNADSmLvGGHeVFfVKt
9 GESLFKGPRDYtPISSTICHLTNESDGyTTSLYGIGFGPFIITNKHLFRRNNGTLLVQS 152 LHGVFKVKNTTTLQQHLIDGRDMIIIRMPKDFPPFPQKLKFREPQREdRICLVTTNF
QTKSMSSMVSaTSCTFPSfDGIFWKHWIQaKaGQCGnPLVSTRDGFIVGIHSASNFT
NTiNYFTSVPKNFMELLTNQEAQQWVSGWRLNADSfLcGGHKVFlVKPEEPFsas
9 GESLFKGPRgYNPISSTICHLTNESDGyTTSLYGIGFGPFIITNKHLFRRNNGTLLVQ 153 SLHGVFKVeNTTTLQQHLIDGRDMIIIRMPKDFPPFPQKLKFREPQREdRICLVTTN
FQTKSMSSMVSaTSCTFPSfDGIFWKHWIQaKaGQCGnPLVSTRDGFIVGIHSASNFT
NTiNYFTSVPKNFMELLTNpEAQQWVSGWRLNADSfLcGGHKVFlVKPEEPFsas
[00111] This disclosure relates, in part, to the discovery that continuous evolution methods (e.g., PACE) are useful for producing variant TEV proteases that have altered peptide cleaving activities (altered peptide cleaving functions). For example, in some embodiments, a variant TEV protease as described by the disclosure cleaves an IL-23 protein or peptide. In some embodiments, a variant TEV protease as described by the disclosure cleaves the target sequence HPLVGHM (SEQ ID NO: 3). In some embodiments, a variant TEV protease as described herein cleaves both the canonical TEV protease peptide target sequence ENLYFQS (SEQ ID NO: 2) and an IL-23 peptide target sequence, for example, HPLVGHM (SEQ ID NO: 3). In some embodiments, a variant TEV protease cleaves an IL-23 target peptide with higher affinity than the cognate TEV protease. A variant TEV protease that cleaves a target peptide with higher affinity can have an increase in catalytic efficiency ranging from about 1.1-fold, about 1.5-fold, or about 2-fold to about 100-fold, about 5-fold to about 50-fold, or about 10-fold to about 40-fold, relative to the catalytic efficiency of the wild-type TEV protease from which the variant TEV protease was derived. In some embodiments, a variant TEV protease described herein cleaves IL-23 with about 1% to about 100% (e.g., about 1%, 2%, 5%, 10%, 20%, 50%, 80%, 90%, or 100%) of the catalytic efficiency with which wild-type TEV cleaves its native substrate (e.g., ENLYFQS, SEQ ID NO: 2). Catalytic efficiency can be measured or determined using any suitable method known in the art, for example using the methods described in the Examples below.
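The fold-increase and percentage comparisons in the preceding paragraph can be illustrated with a short calculation. The following is a minimal sketch that assumes the conventional definition of catalytic efficiency as kcat/KM; the function names and all kinetic values are hypothetical and are not taken from the Examples.

```python
def catalytic_efficiency(kcat_per_s: float, km_molar: float) -> float:
    """Return kcat/KM in M^-1 s^-1 (conventional definition, assumed here)."""
    if km_molar <= 0:
        raise ValueError("KM must be positive")
    return kcat_per_s / km_molar


def fold_change(variant_eff: float, reference_eff: float) -> float:
    """Ratio of the variant's efficiency to the reference efficiency."""
    if reference_eff <= 0:
        raise ValueError("reference efficiency must be positive")
    return variant_eff / reference_eff


def percent_of_reference(variant_eff: float, reference_eff: float) -> float:
    """Variant efficiency expressed as a percentage of the reference."""
    return 100.0 * fold_change(variant_eff, reference_eff)


# Hypothetical kinetic parameters, for illustration only.
wild_type_on_native = catalytic_efficiency(kcat_per_s=0.2, km_molar=1e-4)
variant_on_il23 = catalytic_efficiency(kcat_per_s=0.05, km_molar=2.5e-4)

percent = percent_of_reference(variant_on_il23, wild_type_on_native)
print(f"variant on IL-23 site: {percent:.1f}% of wild-type on native site")
```

With real data, the two efficiencies would come from the kinetic measurements referred to above rather than from the placeholder values used here.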
[00112] Some aspects of this disclosure provide methods for using a protease provided herein. In some embodiments, such methods include contacting a protein comprising a protease target cleavage sequence with the protease. In some embodiments, the protein contacted with the protease is a therapeutic target. In some embodiments, the therapeutic target is interleukin-23 (IL-23). Generally, IL-23 is a heterodimeric cytokine that comprises an IL-12p40 subunit and an IL-23p19 subunit, and binds to its cognate receptor, IL-23R. Without wishing to be bound by any particular theory, IL-23 functions as a mediator of inflammation, for example by inducing secretion of the pro-inflammatory cytokine interleukin-17 (IL-17). Accordingly, in some aspects, the disclosure provides methods of decreasing IL-23 expression or activity in a cell, the method comprising contacting the cell, or the extracellular environment (e.g., cell culture media surrounding the cells), with a variant TEV protease as described herein. In some embodiments, the disclosure provides methods of decreasing IL-17 expression or activity in a cell, the method comprising contacting the cell, or the extracellular environment, with a variant TEV protease as described herein. In some embodiments, the cell, or extracellular environment, is in a subject, for example a mammal. In some embodiments, the cell, or extracellular environment, is in vitro.
[00113] In some embodiments, the cell (or extracellular environment) is characterized by increased expression of IL-23 relative to a normal cell or extracellular environment (e.g., a healthy cell, or extracellular environment, not characterized by increased expression of IL-23). In some embodiments, increased expression of IL-23 occurs when, in a cell, the expression of IL-23 is about 2-fold, 3-fold, 4-fold, 5-fold, 10-fold, 25-fold, 50-fold, 100-fold, 500-fold, or 1000-fold over expression of IL-23 in a normal healthy cell, or extracellular environment. In some embodiments, a cell characterized by increased expression of IL-23 is derived from a subject (e.g., a mammalian subject, such as a human or mouse) that has or is suspected of having a disease associated with increased IL-23 expression, for example, an inflammatory disease or an autoimmune disease. Non-limiting examples of inflammatory diseases include plaque psoriasis, multiple sclerosis, inflammatory bowel disease, ulcerative colitis, Crohn's disease, rheumatoid arthritis, systemic lupus erythematosus (SLE), and spondylarthritis.
[00114] In some embodiments, the methods provided herein comprise contacting the target protein (e.g., IL-23, or a protein comprising a peptide comprising the amino acid sequence HPLVGHM (SEQ ID NO: 3)) with the protease in vitro. In some embodiments, the methods provided herein comprise contacting the target protein (e.g., IL-23, or a protein comprising a peptide comprising the amino acid sequence HPLVGHM (SEQ ID NO: 3)) with the protease in vivo. In some embodiments, the methods provided herein comprise contacting the target protein (e.g., IL-23, or a protein comprising a peptide comprising the amino acid sequence HPLVGHM (SEQ ID NO: 3)) with the protease in an extracellular environment. In some embodiments, the methods provided herein comprise contacting the target protein (e.g., IL-23, or a protein comprising a peptide comprising the amino acid sequence HPLVGHM (SEQ ID NO: 3)) with the protease in a subject, e.g., by administering the protease to the subject, either locally or systemically. In some such embodiments, the protease is administered to the subject in an amount effective to result in a measurable decrease in the level of full-length (or functional) target protein (e.g., IL-23) in the subject, or in a measurable increase in the level of a cleavage product generated by the protease upon cleavage of the target protein.
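Whether a given protein comprises a peptide having one of the target sequences named above can be checked with a simple string search. The sketch below is illustrative only: the helper name `find_target_sites` and the toy substrate are hypothetical, and the toy substrate is not a real IL-23 sequence.

```python
def find_target_sites(protein: str, target: str) -> list[int]:
    """Return 0-based start indices of every (possibly overlapping) match."""
    if not target:
        raise ValueError("target sequence must not be empty")
    protein, target = protein.upper(), target.upper()
    hits = []
    start = protein.find(target)
    while start != -1:
        hits.append(start)
        # Step forward by one residue so overlapping matches are also found.
        start = protein.find(target, start + 1)
    return hits


IL23_TARGET = "HPLVGHM"    # SEQ ID NO: 3
CANONICAL_TEV = "ENLYFQS"  # SEQ ID NO: 2

# Hypothetical toy substrate containing both sites, for illustration only.
toy_substrate = "MAAHPLVGHMGGSENLYFQSKK"

print(find_target_sites(toy_substrate, IL23_TARGET))
print(find_target_sites(toy_substrate, CANONICAL_TEV))
```

A sequence match of this kind only shows that the motif is present; whether the site is accessible to, and cleaved by, a given protease variant is an experimental question.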
[00115] Aspects of this disclosure relate to the discovery that cleaving IL-23 using the evolved proteases described herein results in attenuated IL-17 secretion. Accordingly, IL-23-cleaving TEV protease variants described herein are useful, in some embodiments, for treating diseases associated with increased IL-17 expression or activity, such as autoimmune diseases (e.g., plaque psoriasis, multiple sclerosis, inflammatory bowel disease, ulcerative colitis, Crohn's disease, rheumatoid arthritis, systemic lupus erythematosus (SLE), and spondylarthritis). In some embodiments, inhibition of IL-17 secretion by a protease described herein results in the amelioration of at least one symptom of an autoimmune disease or disorder.
Production of Variant TEV Proteases using PACE
[00116] Some aspects of this disclosure provide methods for evolution of a protease. In some embodiments, a method of evolution of a protease is provided that comprises (a) contacting a population of host cells with a population of vectors comprising a gene encoding a protease. The vectors are typically deficient in at least one gene required for the transfer of the phage vector from one cell to another, e.g., a gene required for the generation of infectious phage particles. In some embodiments of the provided methods, (1) the host cells are amenable to transfer of the vector; (2) the vector allows for expression of the protease in the host cell, can be replicated by the host cell, and the replicated vector can transfer into a second host cell; and (3) the host cell expresses a gene product encoded by the at least one gene for the generation of infectious phage particles of (a) in response to the activity of the protease, and the level of gene product expression depends on the activity of the protease. The methods of protease evolution provided herein typically comprise (b) incubating the population of host cells under conditions allowing for mutation of the gene encoding the protease, and the transfer of the vector comprising the gene encoding the protease of interest from host cell to host cell. The host cells are removed from the host cell population at a certain rate, e.g., at a rate that results in an average time a host cell remains in the cell population that is shorter than the average time a host cell requires to divide, but long enough for the completion of a life cycle (uptake, replication, and transfer to another host cell) of the vector. The population of host cells is replenished with fresh host cells that do not harbor the vector. In some embodiments, the rate of replenishment with fresh cells substantially matches the rate of removal of cells from the cell population, resulting in a substantially constant cell number or cell density within the cell population. The methods of protease evolution provided herein typically also comprise (c) isolating a replicated vector from the host cell population in (b), wherein the replicated vector comprises a mutated version of the gene encoding the protease.
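The removal-rate condition in step (b) can be expressed as a simple check on the average residence time of a host cell. The sketch below assumes a well-mixed vessel in which inflow equals outflow, so that the mean residence time is the culture volume divided by the flow rate; the function names and all numeric values are hypothetical.

```python
def mean_residence_time_min(volume_ml: float, flow_ml_per_min: float) -> float:
    """Mean residence time in a well-mixed vessel: volume / flow rate."""
    if volume_ml <= 0 or flow_ml_per_min <= 0:
        raise ValueError("volume and flow rate must be positive")
    return volume_ml / flow_ml_per_min


def flow_supports_selection(
    residence_min: float,
    host_division_min: float,
    vector_life_cycle_min: float,
) -> bool:
    """True if host cells leave before dividing, on average, but stay long
    enough for the vector to complete one life cycle."""
    return vector_life_cycle_min <= residence_min < host_division_min


# Hypothetical operating point, for illustration only.
residence = mean_residence_time_min(volume_ml=40.0, flow_ml_per_min=2.0)
print(residence)
print(flow_supports_selection(residence, host_division_min=30.0,
                              vector_life_cycle_min=15.0))
```

Under these assumptions, raising the flow rate shortens the residence time, and a flow rate high enough to push it below the vector life cycle would wash the vector out of the population.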
[00117] Some embodiments provide a continuous evolution system, in which a population of viral vectors, e.g., M13 phage vectors, comprising a gene encoding a protease of interest to be evolved replicates in a flow of host cells, e.g., a flow through a lagoon, wherein the viral vectors are deficient in a gene encoding a protein that is essential for the generation of infectious viral particles, and wherein that gene is in the host cell under the control of a conditional promoter the activity of which depends on the activity of the protease of interest. In some embodiments, transcription from the conditional promoter may be activated by cleavage of a fusion protein comprising a transcriptional activator and an inhibitory protein fused to the transcriptional activator via a linker comprising a target site of the protease.
[00118] Some embodiments of the protease PACE technology described herein utilize a "selection phage," a modified phage that comprises a gene of interest to be evolved and lacks a full-length gene encoding a protein required for the generation of infectious phage particles. In some such embodiments, the selection phage serves as the vector that replicates and evolves in the flow of host cells. For example, some M13 selection phages provided herein comprise a nucleic acid sequence encoding a protease to be evolved, e.g., under the control of an M13 promoter, and lack all or part of a phage gene encoding a protein required for the generation of infectious phage particles, e.g., gI, gII, gIII, gIV, gV, gVI, gVII, gVIII, gIX, or gX, or any combination thereof. For example, some M13 selection phages provided herein comprise a nucleic acid sequence encoding a protease to be evolved, e.g., under the control of an M13 promoter, and lack all or part of a gene encoding a protein required for the generation of infective phage particles, e.g., the gIII gene encoding the pIII protein.
[00119] One prerequisite for evolving proteases with a desired activity is to provide a selection system that confers a selective advantage to mutated protease variants exhibiting such an activity. The expression systems and fusion proteins comprising transcriptional activators in an inactive form that are activated by protease activity thus constitute an important feature of some embodiments of the protease PACE technology provided herein.
[00120] In some embodiments, the transcriptional activator directly drives transcription from a target promoter. For example, in some such embodiments, the transcriptional activator may be an RNA polymerase. Suitable RNA polymerases and promoter sequences targeted by such RNA polymerases are well known to those of skill in the art. Exemplary suitable RNA polymerases include, but are not limited to, T7 RNA polymerases (targeting T7 promoter sequences) and T3 RNA polymerases (targeting T3 promoter sequences). Additional suitable RNA polymerases will be apparent to those of skill in the art based on the instant disclosure, which is not limited in this respect.
[00121] In some embodiments, the transcriptional activator does not directly drive transcription, but recruits the transcription machinery of the host cell to a specific target promoter. Suitable transcriptional activators, such as, for example, Gal4 or fusions of the transactivation domain of the VP16 transactivator with DNA-binding domains, will be apparent to those of skill in the art based on the instant disclosure, and the disclosure is not limited in this respect.
[00122] In some embodiments, it is advantageous to link protease activity to enhanced phage packaging via a transcriptional activator that is not endogenously expressed in the host cells in order to minimize leakiness of the expression of the gene required for the generation of infectious phage particles through the host cell basal transcription machinery. For example, in some embodiments, it is desirable to drive expression of the gene required for the generation of infectious phage particles from a promoter that is not or is only minimally active in host cells in the absence of an exogenous transcriptional activator, and to provide the exogenous transcriptional activator, such as, for example, T7 RNA polymerase, as part of the expression system linking protease activity to phage packaging efficiency. In some embodiments, the at least one gene for the generation of infectious phage particles is expressed in the host cells under the control of a promoter activated by the transcriptional activator, for example, under the control of a T7 promoter if the transcriptional activator is T7 RNA polymerase, and under the control of a T3 promoter if the transcriptional activator is T3 RNA polymerase, and so on.
[00123] In some embodiments, the transcriptional activator is fused to an inhibitor that either directly inhibits or otherwise hinders the transcriptional activity of the transcriptional activator, for example, by directly interfering with DNA binding or transcription, by targeting the transcriptional activator for degradation through the host cell's protein degradation machinery, or by directing export from the host cell or localization of the transcriptional activator into a compartment of the host cell in which it cannot activate transcription from its target promoter. In some embodiments, the inhibitor is fused to the transcriptional activator's N-terminus. In other embodiments, it is fused to the activator's C-terminus.
[00124] In some embodiments, the protease evolution methods provided herein comprise an initial or intermittent phase of diversifying the population of vectors by mutagenesis, in which the cells are incubated under conditions suitable for mutagenesis of the gene encoding the protease in the absence of stringent selection or in the absence of any selection for evolved protease variants that have acquired a desired activity. Such low-stringency selection or no selection periods may be achieved by supporting expression of the gene for the generation of infectious phage particles in the absence of desired protease activity, for example, by providing an inducible expression construct comprising a gene encoding the respective packaging protein under the control of an inducible promoter and incubating under conditions that induce expression of the promoter, e.g., in the presence of the inducing agent. Suitable inducible promoters and inducible expression systems are described herein and in International PCT Application, PCT/US2011/066747, filed December 22, 2011, published as WO 2012/088381 on June 28, 2012; and U.S. Application, U.S.S.N. 13/922,812, filed June 20, 2013; International PCT Application, PCT/US2015/057012, filed on October 22, 2015, published as WO2016/077052; and, PCT/US2016/027795, filed on April 15, 2016, published as WO2016/168631, the entire contents of each of which are incorporated herein by reference. Additional suitable promoters and inducible gene expression systems will be apparent to those of skill in the art based on the instant disclosure. In some embodiments, the method comprises a phase of stringent selection for a mutated protease version. If an inducible expression system is used to relieve selective pressure, the stringency of selection can be increased by removing the inducing agent from the population of cells in the lagoon, thus turning expression from the inducible promoter off, so that any expression of the gene required for the generation of infectious phage particles must come from the protease activity-dependent expression system.
[00125] One aspect of the PACE protease evolution methods provided herein is the mutation of the initially provided vectors encoding a protease of interest. In some embodiments, the host cells within the flow of cells in which the vector replicates are incubated under conditions that increase the natural mutation rate. This may be achieved by exposing the host cells to a mutagen, such as certain types of radiation or a mutagenic compound, or by expressing genes known to increase the cellular mutation rate in the cells. Additional suitable mutagens will be known to those of skill in the art, and include, without limitation, those described in International PCT Application, PCT/US2011/066747, filed December 22, 2011, published as WO 2012/088381 on June 28, 2012; and U.S. Application, U.S.S.N. 13/922,812, filed June 20, 2013; International PCT Application, PCT/US2015/057012, filed on October 22, 2015, published as WO2016/077052; and, PCT/US2016/027795, filed on April 15, 2016, published as WO2016/168631, the entire contents of each of which are incorporated herein by reference; the disclosure is not limited in this respect.
[00126] In some embodiments, the host cells comprise the accessory plasmid encoding the at least one gene for the generation of infectious phage particles, e.g., of the M13 phage, encoding the protease to be evolved and a helper phage, and together, the helper phage and the accessory plasmid comprise all genes required for the generation of infectious phage particles. Accordingly, in some such embodiments, variants of the vector that do not encode a protease variant that can untether the inhibitor from the transcriptional activator will not efficiently be packaged, since they cannot effect an increase in expression of the gene required for the generation of infectious phage particles from the accessory plasmid. On the other hand, variants of the vector that encode a protease variant that can efficiently cleave the inhibitor from the transcriptional activator will effect increased transcription of the at least one gene required for the generation of infectious phage particles from the accessory plasmid and thus be efficiently packaged into infectious phage particles.
[00127] In some embodiments, the protease PACE methods provided herein further comprise a negative selection for undesired protease activity in addition to the positive selection for a desired protease activity. Such negative selection methods are useful, for example, in order to maintain protease specificity when increasing the cleavage efficiency of a protease directed towards a specific target site. This can avoid, for example, the evolution of proteases that show a generally increased protease activity, including an increased protease activity towards off-target sites, which is generally undesired in the context of therapeutic proteases.
[00128] In some embodiments, negative selection is applied during a continuous evolution process as described herein, by penalizing the undesired activities of evolved proteases. This is useful, for example, if the desired evolved protease is an enzyme with high specificity for a target site, for example, a protease with altered, but not broadened, specificity. In some embodiments, negative selection of an undesired activity, e.g., off-target protease activity, is achieved by causing the undesired activity to interfere with pIII production, thus inhibiting the propagation of phage genomes encoding gene products with an undesired activity. In some embodiments, expression of a dominant-negative version of pIII or expression of an antisense RNA complementary to the gIII RBS and/or gIII start codon is linked to the presence of an undesired protease activity. Suitable negative selection strategies and reagents useful for negative selection, such as dominant-negative versions of M13 pIII, are described herein and in International PCT Application, PCT/US2011/066747, filed December 22, 2011, published as WO 2012/088381 on June 28, 2012; and U.S. Application, U.S.S.N. 13/922,812, filed June 20, 2013; International PCT Application, PCT/US2015/057012, filed on October 22, 2015, published as WO2016/077052; and, PCT/US2016/027795, filed on April 15, 2016, published as WO2016/168631, the entire contents of each of which are incorporated herein by reference.
[00129] In some embodiments, counter-selection against activity on non-target substrates is achieved by linking undesired evolved protease activities to the inhibition of phage propagation. In some embodiments, a dual selection strategy is applied during a continuous evolution experiment, in which both positive selection and negative selection constructs are present in the host cells. In some such embodiments, the positive and negative selection constructs are situated on the same plasmid, also referred to as a dual selection accessory plasmid.
[00130] One advantage of using a simultaneous dual selection strategy is that the selection stringency can be fine-tuned based on the activity or expression level of the negative selection construct as compared to the positive selection construct. Another advantage of a dual selection strategy is that the selection is not dependent on the presence or the absence of a desired or an undesired activity, but on the ratio of desired and undesired activities, and, thus, the resulting ratio of pIII and pIII-neg that is incorporated into the respective phage particle.
[00131] For example, in some embodiments, the host cells comprise an expression construct encoding a dominant-negative form of the at least one gene for the generation of infectious phage particles, e.g., a dominant-negative form of the pIII protein (pIII-neg), under the control of an inducible promoter that is activated by a transcriptional activator other than the transcriptional activator driving the positive selection system. Expression of the dominant-negative form of the gene diminishes or completely negates any selective advantage an evolved phage may exhibit and thus dilutes or eradicates any variants exhibiting undesired activity from the lagoon.
[00132] For example, if the positive selection system comprises a T7 promoter driving the expression of the at least one gene for the generation of infectious phage particles, and a T7 RNA polymerase fused to a T7-RNA polymerase inhibitor via a linker comprising a protease target site that is cleaved by a desired protease activity, the negative selection system should be a non-T7 based system. For example, in some such embodiments, the negative selection system could be based on T3 polymerase activity, e.g., in that it comprises a T3 promoter driving the expression of a dominant-negative form of the at least one gene for the generation of infectious phage particles, and a T3 RNA polymerase fused to a T3-RNA polymerase inhibitor via a linker comprising a protease target site that is cleaved by an undesired protease activity. In some embodiments, the negative selection polymerase is a T7 RNA polymerase gene comprising one or more mutations that render the T7 polymerase able to transcribe from the T3 promoter but not the T7 promoter, for example: N67S, R96L, K98R, H176P, E207K, E222K, T375A, M401I, G675R, N748D, P759L, A798S, A819T, etc. In some embodiments the negative selection polymerase may be fused to a T7-RNA polymerase inhibitor via a linker comprising a protease target site that is cleaved by an undesired protease activity. When used together, such positive-negative PACE selection results in the evolution of proteases that exhibit the desired activity but not the undesired activity. In some embodiments, the undesired function is cleavage of an off-target protease cleavage site. In some embodiments, the undesired function is cleavage of the linker sequence of the fusion protein outside of the protease cleavage site.
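The point substitutions listed above use the common notation of wild-type residue, 1-based position, and replacement residue (e.g., N67S). The following minimal sketch shows how such labels might be parsed and applied to a sequence; the helper names are hypothetical, and the short sequence used in the demonstration is a toy string rather than the T7 RNA polymerase sequence, which is not reproduced here.

```python
import re

# One uppercase residue, a 1-based position, one uppercase residue (e.g. "N67S").
_SUBSTITUTION = re.compile(r"^([A-Z])(\d+)([A-Z])$")


def parse_substitution(label: str) -> tuple[str, int, str]:
    """Split a label such as 'N67S' into ('N', 67, 'S')."""
    match = _SUBSTITUTION.match(label.strip())
    if match is None:
        raise ValueError(f"not a point substitution: {label!r}")
    wild_type, position, mutant = match.groups()
    return wild_type, int(position), mutant


def apply_substitutions(sequence: str, labels: list[str]) -> str:
    """Apply each substitution, checking the expected wild-type residue."""
    residues = list(sequence)
    for label in labels:
        wild_type, position, mutant = parse_substitution(label)
        if not 1 <= position <= len(residues):
            raise ValueError(f"{label}: position outside the sequence")
        if residues[position - 1] != wild_type:
            raise ValueError(f"{label}: found {residues[position - 1]} instead")
        residues[position - 1] = mutant
    return "".join(residues)


# Substitutions named in the text for the negative selection polymerase.
T3_SPECIFICITY_MUTATIONS = [
    "N67S", "R96L", "K98R", "H176P", "E207K", "E222K", "T375A",
    "M401I", "G675R", "N748D", "P759L", "A798S", "A819T",
]
parsed = [parse_substitution(label) for label in T3_SPECIFICITY_MUTATIONS]
print(parsed[0])

# Toy four-residue sequence, for illustration only.
print(apply_substitutions("MNTK", ["N2S", "K4R"]))
```

Checking the wild-type residue before substituting guards against numbering mismatches, for example when a sequence includes or omits an initiator methionine.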
[00133] Some aspects of this invention provide or utilize a dominant negative variant of pIII (pIII-neg). These aspects are based on the recognition that a pIII variant that comprises the two N-terminal domains of pIII and a truncated, termination-incompetent C-terminal domain is not only inactive but is a dominant-negative variant of pIII. A pIII variant comprising the two N-terminal domains of pIII and a truncated, termination-incompetent C-terminal domain was described in Bennett, N. J.; Rakonjac, J., Unlocking of the filamentous bacteriophage virion during infection is mediated by the C domain of pIII. Journal of Molecular Biology 2006, 356 (2), 266-73; the entire contents of which are incorporated herein by reference. The dominant negative property of such pIII variants has been described in more detail in PCT Application PCT/US2011/066747, published as WO2012/088381 on June 28, 2012, the entire contents of which are incorporated herein by reference.
[00134] The pIII-neg variant as provided in some embodiments herein is efficiently incorporated into phage particles, but it does not catalyze the unlocking of the particle for entry during infection, rendering the respective phage noninfectious even if wild type pIII is present in the same phage particle. Accordingly, such pIII-neg variants are useful for devising a negative selection strategy in the context of PACE, for example, by providing an expression construct comprising a nucleic acid sequence encoding a pIII-neg variant under the control of a promoter comprising a recognition motif, the recognition of which is undesired. In other embodiments, pIII-neg is used in a positive selection strategy, for example, by providing an expression construct in which a pIII-neg encoding sequence is controlled by a promoter comprising a nuclease target site or a repressor recognition site, the recognition of either of which is desired.
[00135] In some embodiments, a protease PACE experiment according to methods provided herein is run for a time sufficient for at least 10, at least 20, at least 30, at least 40, at least 50, at least 100, at least 200, at least 300, at least 400, at least 500, at least 600, at least 700, at least 800, at least 900, at least 1000, at least 1250, at least 1500, at least 1750, at least 2000, at least 2500, at least 3000, at least 4000, at least 5000, at least 7500, at least 10000, or more consecutive viral life cycles. In certain embodiments, the viral vector is an M13 phage, and the length of a single viral life cycle is about 10-20 minutes.
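The life-cycle counts above translate into run times once a cycle length is fixed. The sketch below simply multiplies the number of consecutive life cycles by the approximately 10-20 minute M13 life cycle stated in the text; the function names are hypothetical, and the calculation ignores any start-up or lag periods.

```python
def run_time_hours(life_cycles: int, minutes_per_cycle: float) -> float:
    """Total run time in hours for a number of consecutive life cycles."""
    if life_cycles < 0 or minutes_per_cycle <= 0:
        raise ValueError("cycle count must be >= 0 and cycle length > 0")
    return life_cycles * minutes_per_cycle / 60.0


def run_time_range_hours(
    life_cycles: int,
    fastest_min: float = 10.0,  # lower bound of the ~10-20 min M13 cycle
    slowest_min: float = 20.0,  # upper bound of the ~10-20 min M13 cycle
) -> tuple[float, float]:
    """Shortest and longest run time implied by the cycle-length range."""
    return (
        run_time_hours(life_cycles, fastest_min),
        run_time_hours(life_cycles, slowest_min),
    )


for cycles in (10, 100, 1000):
    low, high = run_time_range_hours(cycles)
    print(f"{cycles} life cycles: about {low:.1f} to {high:.1f} hours")
```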
[00136] In some embodiments, the host cells are contacted with the vector and/or incubated in suspension culture. For example, in some embodiments, bacterial cells are incubated in suspension culture in liquid culture media. Suitable culture media for bacterial suspension culture will be apparent to those of skill in the art, and the invention is not limited in this regard. See, for example, Molecular Cloning: A Laboratory Manual, 2nd Ed., ed. by Sambrook, Fritsch, and Maniatis (Cold Spring Harbor Laboratory Press: 1989); Elizabeth Kutter and Alexander Sulakvelidze: Bacteriophages: Biology and Applications. CRC Press; 1st edition (December 2004), ISBN: 0849313368; Martha R. J. Clokie and Andrew M. Kropinski: Bacteriophages: Methods and Protocols, Volume 1: Isolation, Characterization, and Interactions (Methods in Molecular Biology) Humana Press; 1st edition (December 2008), ISBN: 1588296822; Martha R. J. Clokie and Andrew M. Kropinski: Bacteriophages: Methods and Protocols, Volume 2: Molecular and Applied Aspects (Methods in Molecular Biology) Humana Press; 1st edition (December 2008), ISBN: 1603275649; all of which are incorporated herein in their entirety by reference for disclosure of suitable culture media for bacterial host cell culture.
[00137] The protease PACE methods provided herein are typically carried out in a lagoon. Suitable lagoons and other laboratory equipment for carrying out protease PACE methods as provided herein have been described in detail elsewhere. See, for example, International PCT Application, PCT/US2011/066747, published as WO2012/088381 on June 28, 2012, the entire contents of which are incorporated herein by reference. In some embodiments, the lagoon comprises a cell culture vessel comprising an actively replicating population of vectors, for example, phage vectors comprising a gene encoding the protease of interest, and a population of host cells, for example, bacterial host cells. In some embodiments, the lagoon comprises an inflow for the introduction of fresh host cells into the lagoon and an outflow for the removal of host cells from the lagoon. In some embodiments, the inflow is connected to a turbidostat comprising a culture of fresh host cells. In some embodiments, the outflow is connected to a waste vessel, or a sink. In some embodiments, the lagoon further comprises an inflow for the introduction of a mutagen into the lagoon. In some embodiments, that inflow is connected to a vessel holding a solution of the mutagen. In some embodiments, the lagoon comprises an inflow for the introduction of an inducer of gene expression into the lagoon, for example, of an inducer activating an inducible promoter within the host cells that drives expression of a gene promoting mutagenesis (e.g., as part of a mutagenesis plasmid), as described in more detail elsewhere herein. In some embodiments, that inflow is connected to a vessel comprising a solution of the inducer, for example, a solution of arabinose.
[00138] In some embodiments, a PACE method as provided herein is performed in a suitable apparatus as described herein. For example, in some embodiments, the apparatus comprises a lagoon that is connected to a turbidostat comprising a host cell as described herein. In some embodiments, the host cell is an E. coli host cell. In some embodiments, the host cell comprises an accessory plasmid as described herein, a helper plasmid as described herein, a mutagenesis plasmid as described herein, and/or an expression construct encoding a fusion protein as described herein, or any combination thereof. In some embodiments, the lagoon further comprises a selection phage as described herein, for example, a selection phage encoding a protease of interest. In some embodiments, the lagoon is connected to a vessel comprising an inducer for a mutagenesis plasmid, for example, arabinose. In some embodiments, the host cells are E. coli cells comprising the F' plasmid, for example, cells of the genotype F' proA+B+ Δ(lacIZY) zzf::Tn10(TetR)/ endA1 recA1 galE15 galK16 nupG rpsL ΔlacIZYA araD139 Δ(ara,leu)7697 mcrA Δ(mrr-hsdRMS-mcrBC) proBA::pir116 λ−.
[00139] Some aspects of this invention relate to host cells for continuous evolution processes as described herein. In some embodiments, a host cell is provided that comprises at least one viral gene encoding a protein required for the generation of infectious viral particles under the control of a conditional promoter, and a fusion protein comprising a transcriptional activator targeting the conditional promoter and fused to an inhibitor via a linker comprising a protease cleavage site. For example, some embodiments provide host cells for phage-assisted continuous evolution processes, wherein the host cell comprises an accessory plasmid comprising a gene required for the generation of infectious phage particles, for example, M13 gIII, under the control of a conditional promoter, as described herein. In some embodiments, the host cell comprises an expression construct encoding a fusion protein as described herein, e.g., on the same accessory plasmid or on a separate vector. In some embodiments, the host cell further provides any phage functions that are not contained in the selection phage, e.g., in the form of a helper phage. In some embodiments, the host cell provided further comprises an expression construct comprising a gene encoding a mutagenesis-inducing protein, for example, a mutagenesis plasmid as provided herein.
[00140] In some embodiments, modified viral vectors are used in continuous evolution processes as provided herein. In some embodiments, such modified viral vectors lack a gene required for the generation of infectious viral particles. In some such embodiments, a suitable host cell is a cell comprising the gene required for the generation of infectious viral particles, for example, under the control of a constitutive or a conditional promoter (e.g., in the form of an accessory plasmid, as described herein). In some embodiments, the viral vector used lacks a plurality of viral genes. In some such embodiments, a suitable host cell is a cell that comprises a helper construct providing the viral genes required for the generation of infectious viral particles. A cell is not required to actually support the life cycle of a viral vector used in the methods provided herein. For example, a cell comprising a gene required for the generation of infectious viral particles under the control of a conditional promoter may not support the life cycle of a viral vector that does not comprise a gene of interest able to activate the promoter, but it is still a suitable host cell for such a viral vector.
[00141] In some embodiments, the host cell is a prokaryotic cell, for example, a bacterial cell. In some embodiments, the host cell is an E. coli cell. In some embodiments, the host cell is a eukaryotic cell, for example, a yeast cell, an insect cell, or a mammalian cell. The type of host cell will, of course, depend on the viral vector employed, and suitable host cell/viral vector combinations will be readily apparent to those of skill in the art.
[00142] In some embodiments, the viral vector is a phage and the host cell is a bacterial cell. In some embodiments, the host cell is an E. coli cell. Suitable E. coli host strains will be apparent to those of skill in the art, and include, but are not limited to, New England Biolabs (NEB) Turbo, Top10F', DH12S, ER2738, ER2267, and XL1-Blue MRF'. These strain names are art recognized and the genotype of these strains has been well characterized. It should be understood that the above strains are exemplary only and that the invention is not limited in this respect.
[00143] In some PACE embodiments, for example, in embodiments employing an M13 selection phage, the host cells are E. coli cells expressing the Fertility factor, also commonly referred to as the F factor, sex factor, or F-plasmid. The F-factor is a bacterial DNA sequence that allows a bacterium to produce a sex pilus necessary for conjugation and is essential for the infection of E. coli cells with certain phage, for example, with M13 phage. For example, in some embodiments, the host cells for M13-PACE are of the genotype F' proA+B+ Δ(lacIZY) zzf::Tn10(TetR)/ endA1 recA1 galE15 galK16 nupG rpsL ΔlacIZYA araD139 Δ(ara,leu)7697 mcrA Δ(mrr-hsdRMS-mcrBC) proBA::pir116 λ−.
[00144] Some of the embodiments, advantages, features, and uses of the technology disclosed herein will be more fully understood from the Examples below. The Examples are intended to illustrate some of the benefits of the present disclosure and to describe particular embodiments, but are not intended to exemplify the full scope of the disclosure and, accordingly, do not limit the scope of the disclosure.
EXAMPLES
Example 1
Choice of Target Substrate and Protease
[00145] The biochemistry, substrate specificity, and molecular structure of TEV protease were studied. An "evolvability" scoring matrix (Table 15) was created that assigns a difficulty score to each of the 20 possible amino acids at each of the seven positions recognized by TEV protease. This matrix was used to rank all possible heptapeptides within all human extracellular proteins. From this ranking, a handful of disease-associated proteins were manually curated in which the target peptide sequences were predicted to be solvent exposed based on their crystal structures (Table 16).
Table 15: Target Peptide Scoring Matrix.
Figure imgf000070_0001
[00146] A subjective rating matrix was created based upon the knowledge of TEV protease substrate specificity and evolution of proteases that accept substrate changes. Key features include high marks for consensus residues ENLYFQS (SEQ ID NO: 2) as well as for substitutions with known evolutionary solutions: P6 His, P1 His, and P1 Glu. Penalties were also introduced for cysteine residues, due to disulfide formation in mammalian target proteins, and for proline, due to its unique structural properties.
Table 16: Refined List of Protease Target Substrates.
Figure imgf000071_0001
[00147] Target substrates were identified from the human extracellular proteome based upon ratings calculated using the above scoring matrix. These four substrates were manually curated based upon the disease relevance of the target protein and the solvent-accessibility of target peptide.
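The ranking procedure described in this Example can be sketched as a sliding-window scan: every heptapeptide in a protein sequence is scored by summing position-specific values from a 7 x 20 matrix. The actual values of Table 15 are not reproduced here; the matrix built by `make_toy_matrix` is a hypothetical placeholder (high marks for consensus residues, penalties for cysteine and proline), and all function names are assumptions made for illustration only.

```python
TEV_CONSENSUS = "ENLYFQS"
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def make_toy_matrix(consensus=TEV_CONSENSUS):
    """Hypothetical stand-in for Table 15 (NOT the patent's values).

    Each of the seven positions gets a dict mapping amino acid -> score:
    10 for the consensus residue, -5 for Cys or Pro, 0 otherwise.
    """
    matrix = []
    for consensus_residue in consensus:
        column = {aa: 0 for aa in AMINO_ACIDS}
        column["C"] = -5
        column["P"] = -5
        column[consensus_residue] = 10
        matrix.append(column)
    return matrix


def score_heptapeptide(peptide, matrix):
    """Sum the position-specific scores over the seven residues."""
    return sum(column[aa] for aa, column in zip(peptide, matrix))


def rank_heptapeptides(protein, matrix, top=5):
    """Score every 7-residue window; return (score, start_index, peptide) tuples."""
    scored = []
    for start in range(len(protein) - 6):
        peptide = protein[start:start + 7]
        scored.append((score_heptapeptide(peptide, matrix), start, peptide))
    # Highest score first; ties broken by position in the protein.
    return sorted(scored, key=lambda item: (-item[0], item[1]))[:top]


if __name__ == "__main__":
    matrix = make_toy_matrix()
    for score, start, peptide in rank_heptapeptides("AAENLYFQSAA", matrix, top=3):
        print(start, peptide, score)
```

In this sketch a higher score means a candidate that is presumed easier to reach by evolution; the solvent-accessibility and disease-relevance filters described above were applied manually and are not modeled.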
[00148] The resulting candidate target sequences included HPLVGHM (SEQ ID NO: 3), a sequence found in human IL-23. IL-23 is a pro-inflammatory cytokine secreted by macrophages and dendritic cells in response to pathogens and tissue damage, ultimately promoting an innate immune response at the site of injury or infection. This immune response is mediated by IL-23-dependent stabilization of Th17 cells, a class of T helper cells that produce pro-inflammatory cytokines IL-17, IL-6, and TNFα. Hyperactivity of this pathway can lead to a variety of autoimmune disorders including psoriasis and rheumatoid arthritis.
[00149] The target sequence HPLVGHM (SEQ ID NO: 3) differs from the TEV consensus substrate sequence, ENLYFQS (SEQ ID NO: 2), at six of seven positions. Two of these substitutions are predicted to not substantially impact TEV protease activity due to its low specificity at positions P5 and P1', while the other four substitutions occur at positions that are known to be crucial specificity determinants of wild-type TEV protease (P6 Glu, P3 Tyr, P2 Phe, and P1 Gln). Substitution of TEV substrate P2 Phe or P1 Gln with the corresponding IL-23 substrate residue (P2 Gly or P1 His) has been shown to reduce TEV protease activity by more than an order of magnitude in each case. However, TEV mutants that accept P1 His instead of P1 Gln have been identified, demonstrating the evolvability of P1 recognition.
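The position-by-position comparison stated above can be checked with a short sketch. It assumes the standard protease nomenclature in which the seven substrate residues are labeled P6 through P1 followed by P1'; the function name is hypothetical.

```python
POSITIONS = ["P6", "P5", "P4", "P3", "P2", "P1", "P1'"]


def substrate_differences(consensus, target):
    """Return (position, consensus_residue, target_residue) for each mismatch."""
    if len(consensus) != len(POSITIONS) or len(target) != len(POSITIONS):
        raise ValueError("both substrates must be heptapeptides")
    return [
        (position, c, t)
        for position, c, t in zip(POSITIONS, consensus, target)
        if c != t
    ]


if __name__ == "__main__":
    for position, c, t in substrate_differences("ENLYFQS", "HPLVGHM"):
        print(f"{position}: {c} -> {t}")
```

For ENLYFQS versus HPLVGHM this reports six mismatches (P6, P5, P3, P2, P1, and P1'), with only P4 Leu shared between the two sequences.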
Continuous Evolution of TEV variants that cleave the IL-23 target peptide
[00150] PACE requires linking the activity of interest to expression of an essential phage gene (such as gene III) and thus to phage survival. Such a linkage was previously established for a range of activities including polymerase activity, DNA binding, protein binding, and protein cleavage. To link proteolysis to gene expression, a protease-activated RNA polymerase (PA-RNAP) was used, consisting of T7 RNA polymerase (T7 RNAP) fused through a protease-cleavable linker to T7 lysozyme, a natural inhibitor of T7 RNAP (Figure 1). In this study, the PA-RNAP is expressed from the same host-cell AP that places gene III expression under control of the T7 promoter, while the evolving protease is expressed from the SP (Figures 5 and 6).
[00151] It was verified that the SP expressing wild-type TEV protease propagates robustly on host cells expressing a PA-RNAP containing the TEV consensus substrate, ENLYFQS (SEQ ID NO: 2), in the linker connecting T7 RNAP and T7 lysozyme. In contrast, replacing the PA-RNAP linker with the target peptide HPLVGHM (SEQ ID NO: 3) results in a failure of phage to propagate and rapid phage washout, consistent with the inability of wild-type TEV protease, or TEV variants containing a handful of immediately accessible mutations, to cleave the target IL-23 peptide.
[00152] It was investigated whether successful evolution of TEV protease variants that cleave the IL-23 target would involve multiple evolutionary stepping-stones to guide evolving gene populations through points in the fitness landscape that bring them successively closer to activity on the final target substrate. Three evolutionary trajectories were designed such that substrate changes known to strongly disrupt the activity of wild-type TEV protease, including P6 His, P2 Gly, and P1 His, were introduced in the earliest stepping-stones (Figure 2). These substitutions were first confronted while the evolving protease populations had access to variants with wild-type-like levels of activity, as it was thought that the likelihood of success was higher while proteases had sufficient activity to exchange for altered specificity without falling below a minimum activity threshold needed to survive selection. These substrate changes were introduced one stepping-stone at a time to minimize the risk of washout and to illuminate at each stage how mutations within TEV protease altered substrate specificity.
[00153] All three evolutionary trajectories (Figure 2) began by introducing the P6 His substitution (HNLYFQS; SEQ ID NO: 4) into the PA-RNAP and expressing a site-saturation mutagenesis library of TEV protease from the SP. Using NNK codons, TEV protease residues N171, N176, and Y178, all of which are proximal to the P6 substrate residue, were randomized. This first PACE yielded variants with enhanced apparent activity on the HNLYFQS (SEQ ID NO: 4) substrate (Figure 7), and genotypes were highly enriched for D127A + S135F + N176I, or I138T + N171D + N176T (Table 3). Mutations N171D and N176T have been previously characterized as allowing P6 tolerance for uncharged residues such as threonine and proline.
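To make the size of such a site-saturation library concrete, the following sketch enumerates the NNK codon set (N = A, C, G, or T; K = G or T), translates it with the standard genetic code, and computes the theoretical diversity when several residues are randomized at once. This is a generic illustration of NNK degeneracy, not a description of the primers actually used; the function names are hypothetical.

```python
from itertools import product

# Standard genetic code in the conventional TCAG ordering.
BASES = "TCAG"
TRANSLATION = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    a + b + c: aa
    for (a, b, c), aa in zip(product(BASES, repeat=3), TRANSLATION)
}


def nnk_codons():
    """All codons matching NNK: any base, any base, then G or T."""
    return [a + b + c for a in "ACGT" for b in "ACGT" for c in "GT"]


def nnk_summary():
    """Return (codon count, sorted amino acids encoded, stop codons included)."""
    codons = nnk_codons()
    amino_acids = sorted({CODON_TABLE[c] for c in codons} - {"*"})
    stops = [c for c in codons if CODON_TABLE[c] == "*"]
    return len(codons), amino_acids, stops


def library_size(n_sites):
    """Theoretical (DNA-level, protein-level) diversity for n NNK sites."""
    n_codons, amino_acids, _ = nnk_summary()
    return n_codons ** n_sites, len(amino_acids) ** n_sites


if __name__ == "__main__":
    n_codons, amino_acids, stops = nnk_summary()
    print(n_codons, "codons,", len(amino_acids), "amino acids, stops:", stops)
    print("3 sites:", library_size(3))
    print("4 sites:", library_size(4))
```

NNK covers all 20 amino acids with 32 codons and a single stop codon, so randomizing three residues (as for N171, N176, and Y178) gives 32^3 = 32,768 codon combinations encoding 20^3 = 8,000 full-length protein variants.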
[00154] Next, two parallel lines of PACE were pursued using either ENLYGQS (SEQ ID NO: 5) (trajectories 1 and 2) or HNLYFHS (SEQ ID NO: 6) (trajectory 3) as the second stepping-stone substrate. For trajectories 1 and 2, the population that emerged from PACE on the first stepping-stone (HNLYFQS; SEQ ID NO: 4) was diversified with NNK codons at TEV protease residues 209, 211, 216, and 218, which line the hydrophobic pocket that is occupied by the P2 Phe, and PACE was performed using host cells expressing ENLYGQS (SEQ ID NO: 5). The resulting population of TEV mutants is typified by the mutations N176I, V209M, W211I, and M218F (Table 4), which confer cleavage activity on both HNLYFQS (SEQ ID NO: 4) and ENLYGQS (SEQ ID NO: 5) substrates (Figure 8).
[00155] In trajectory 3, a mixing strategy was used to access TEV proteases that could cleave the HNLYFHS (SEQ ID NO: 6) stepping-stone double mutant substrate. Unlike PACE experiments initiated from a site-saturation mutagenesis library, a mixing strategy relies on a transitional period of phage propagation on a mixture of two different host cell populations, one expressing an accepted substrate (HNLYFQS; SEQ ID NO: 4) and the other expressing the next stepping-stone substrate (HNLYFHS; SEQ ID NO: 6). Following this transitional period, the SP is propagated exclusively on hosts expressing the next stepping-stone substrate (HNLYFHS; SEQ ID NO: 6). The variants that emerged from this stage of trajectory 3 showed weak apparent activity on the double mutant substrate HNLYFHS (SEQ ID NO: 6) (Figure 9), and only a single additional enriched mutation, D148A (Table 5). Mutation at residue D148 has previously been reported to enable activity on ENLYFHS (SEQ ID NO: 7).
[00156] Due to the low activity of proteases emerging from this mixing experiment, the site-saturation mutagenesis strategy was relied on to evolve activity on the third stepping-stone, HNLYGHS (SEQ ID NO: 173). The TEV protease populations from trajectory 3 were randomized at sites implicated in P2 recognition (209, 211, 216, and 218), while for trajectories 1 and 2 the TEV protease population was randomized at sites 146, 148, 167, and 177 as previously described for the reprogramming of TEV specificity at the P1 position (Figure 2). The primers used to randomize TEV protease residues 167 and 177 must also encode the identity of intervening amino acids N171 and N176. Although the population appeared to converge on N176I (Table 4), one library was constructed with primers encoding N176I (trajectory 1) and another with N171D + N176T (trajectory 2) to preserve genetic diversity at N176. Libraries constructed for all three trajectories were then subjected to PACE on host cells expressing the triple mutant substrate HNLYGHS (SEQ ID NO: 173). The variants emerging at this stage of trajectories 1 and 2 were enriched for mutations at residues 146, 148, and 177, consistent with acceptance of the newly introduced P1 substitution. Similarly, clones from trajectory 3 exhibit mutations at residues 209, 211, and 218 that may promote acceptance of the newly added P2 Gly substitution. Regardless of trajectory, all clones emerging at this stage exhibit at least one mutation from each of three targeted mutagenesis libraries (Table 6), suggesting that they have evolved activity on the triple mutant substrate.
[00157] Given the known tolerance of TEV protease for amino acids at positions P5 and P1', it was speculated that proteases evolved to recognize the triple mutant substrate HNLYGHS (SEQ ID NO: 173) might already exhibit activity on the final target substrate (HPLVGHM; SEQ ID NO: 3). Indeed, the phage arising from evolution on the triple mutant substrate successfully propagate in PACE on hosts expressing the HPLVGHM (SEQ ID NO: 3) substrate, and the resulting variants display weak apparent activity on the final target substrate (Figure 10). In order to evolve high levels of activity on the final target substrate from these weakly active mutants, three strategies were applied to increase selection stringency on all three trajectories: (1) express a lower concentration of the PA-RNAP substrate by using a weaker constitutive promoter (proA instead of proB); (2) substitute the flexible GGS linker that flanks the substrate with the native sequence from IL-23 (human IL-23 residues 38-66); and (3) introduce a mutation in the T7 RNA polymerase portion of the PA-RNAP that decreases transcriptional activity (Q649S). It was confirmed that all three strategies indeed increased selection stringency (Figure 11).
[00158] First, the lowered substrate concentration strategy was applied using a mixing experiment to transition from proB to proA expression of the PA-RNAP; this experiment yielded modest changes in genotypes. Exploiting the ease of performing PACE on multiple lagoons in parallel, the other two strategies were implemented simultaneously on all three trajectories. The resulting six populations (trajectories 1a, 1b, 2a, 2b, 3a, and 3b; see Figure 2) were carried forward into PACE on hosts expressing a PA-RNAP with both the IL-23 (38-66) linker and the attenuated T7 RNAP mutant Q649S. In the final stage of PACE for all six populations, a proA promoter was used to generate less of the PA-RNAP containing the IL-23 (38-66) linker and the Q649S mutation. This series of stringency modulation experiments produced variants with higher levels of apparent activity on the final HPLVGHM (SEQ ID NO: 3) substrate (Figure 12).
Characterization of Evolved TEV Protease Variants
[00159] Mutations that arise early in long evolutionary trajectories can create a cascade of contingencies because subsequent mutations must be compatible with the preexisting genetic context, a phenomenon known as epistasis. Genotypes suggest that epistasis strongly shaped the outcomes of trajectories 1 and 2, which were dominated by N176I vs. N171D + N176T, respectively, prior to the third stage of PACE. During subsequent evolution, the amino acid identity at amino acid 176 appears to have dictated the optimal identity of residue 177, such that the combinations N176I + N177S, or N176T + N177M, predominate trajectories 1 and 2, respectively. Swapping the identity of N177 between clones from trajectories 1 and 2 results in a substantial loss of activity (Figure 13), further consistent with epistasis at this position. It is likely that these genetic differences between trajectory 1 and 2 also later led to the enrichment of distinct mutations outside of the substrate-binding site (Tables 3 to 11). For example, mutations at position 203 persisted only in trajectory 1, while mutations at positions 28, 30, 68, 132, and 162 were only abundant in trajectory 2.
[00160] Unsurprisingly, a dramatically different outcome was observed in the third trajectory, which not only experienced a different schedule of stepping-stone substrates but was also subjected to a mixing experiment instead of NNK mutagenesis at residues 146, 148, 167 and 177. The data are consistent with a model in which a lack of diversification at these critical residues traps trajectory 3 in a local fitness maximum, evidenced by weak apparent activity on the final target substrate (Figures 10 and 12) and few genotypic changes after the fifth stage of trajectory 3 (Table 2). Consequently, while all six populations yielded TEV variants with apparent activity on the final target, the TEV protease variants that exhibited the highest apparent activity on the final target were all derived from trajectories 1 and 2.
[00161] Three representative proteases from the end of trajectories 1 and 2 were purified and assayed in vitro for their ability to cleave a model protein substrate in which MBP and GST were fused through a linker containing the final HPLVGHM (SEQ ID NO: 3) substrate sequence (Figure 14). All three evolved proteases cleaved the model substrate. The most active clone (TEV L2F (SEQ ID NO: 137) from trajectory 2, containing 20 non-silent mutations, Table 11) was selected for detailed characterization. The kinetic parameters of this mutant enzyme on wild-type (ENLYFQS; SEQ ID NO: 2) and target (HPLVGHM; SEQ ID NO: 3) substrate peptides were assayed using a previously described HPLC method. Unlike the wild-type enzyme, which exhibits no detectable activity on the HPLVGHM (SEQ ID NO: 3) peptide, the L2F (SEQ ID NO: 137) variant processes this substrate with approximately 15% of the catalytic efficiency (kcat/KM) with which TEV protease cleaves its native substrate (Table 17 and Figures 15A to 15D). Compared to wild-type TEV, evolved TEV L2F appears to maintain nearly identical kinetics on the canonical ENLYFQS (SEQ ID NO: 2) substrate, while experiencing only a modest 5-fold increased KM on the target substrate HPLVGHM (SEQ ID NO: 3). These results collectively indicate that PACE generated a highly evolved mutant protease that cleaves a target substrate containing mutations at six positions with only modestly lower efficiency than wild-type TEV protease cleaves its consensus substrate.
Table 17. Kinetic parameters of wild-type and evolved TEV proteases.
Figure imgf000076_0001
Substrate Specificity Profiling of an Evolved TEV Protease
[00162] Proteolysis assays on individual substrates reveal that evolved TEV protease L2F (SEQ ID NO: 137) maintains the ability to detectably cleave starting and intermediate substrates while acquiring activity on the final IL-23 target (Figure 16). A previously reported phage substrate display method (Figure 3A) was applied to obtain an unbiased protease specificity profile. M13 bacteriophage encoding pIII fused to a FLAG-tag through a library of substrate linkers were immobilized on anti-FLAG magnetic beads. When incubated with a protease of interest, phage encoding cleaved substrates are liberated from the solid support, while phage encoding the intact substrates remain immobilized and are eluted with excess FLAG peptide. The abundance of each substrate in the cleaved and eluted populations was measured by high-throughput DNA sequencing, yielding enrichment values (Table 18) and sequence logos (Figures 3B to 3E) that convey protease substrate specificity across all possible amino acids.
Table 18: Phage Display Enrichment Values From Selections on Libraries with Single Residue Randomization.
Wild-type TEV
A C D E F G H I K L M N P Q R S T V W Y
P6 -0.19 0.08 1.03 5.69 0.99 -0.16 0.03 0.41 -0.20 0.42 0.31 -0.10 0.04 0.23 -0.14 -0.05 -0.01 0.26 0.37 -0.01
P5 0.30 -0.25 0.45 -0.07 0.25 -0.46 0.15 -0.19 -0.06 0.21 0.10 -0.51 -0.24 0.02 0.02 0.10 -0.04 0.20 0.14 0.24
P4 -0.34 -0.29 -0.31 -0.15 -0.33 -0.31 -0.40 5.16 -0.29 3.16 0.68 -0.32 -0.37 -0.39 -0.33 -0.35 -0.31 0.39 -0.21 -0.39
P3 -0.50 -0.52 -0.09 -0.54 0.84 0.30 -0.48 -0.11 -0.50 -0.37 -0.54 -0.45 -0.39 -0.52 -0.29 -0.56 -0.58 0.01 -0.26 4.08
P2 -0.19 -0.74 -0.48 -0.66 3.81 -0.39 -0.70 3.79 -0.69 0.48 0.85 -0.59 -0.03 -0.72 -0.62 -0.38 0.19 4.36 0.92 -0.72
P1 -0.22 -0.06 -0.14 -0.12 -0.16 -0.15 -0.08 -0.24 -0.24 -0.18 1.29 -0.02 -0.22 5.49 -0.27 -0.24 -0.17 -0.28 -0.16 -0.26
P1' 2.05 -0.50 -0.03 -0.48 1.60 1.96 1.22 -0.39 -0.58 -0.44 1.38 0.07 -0.68 -0.33 -0.62 2.77 -0.33 -0.60 2.24 2.60
TEV L2F (SEQ ID NO: 137)
T17S, H28L, T30A, N68D, E107D, D127A, F132L, S135F, T146S, D148P, S153N, F162S, S170A, N171D, N176T, N177M, V209M, W211I, M218F, K229E
A C D E F G H I K L M N P Q R S T V W Y
P6 0.50 -0.59 -0.04 0.02 -0.08 0.01 -0.01 0.11 0.02 0.02 0.07 -0.03 -0.03 -0.02 0.01 -0.07 -0.01 0.02 0.15 -0.72
P5 0.01 -0.57 -0.08 0.03 -0.05 0.00 0.09 -0.05 0.01 0.01 0.06 -0.23 0.07 0.03 0.09 -0.01 -0.03 0.00 -0.02 0.06
P4 0.03 -0.40 -0.21 -0.12 -0.11 -0.32 -0.41 3.80 -0.25 4.24 1.30 -0.16 -0.31 -0.38 -0.35 -0.32 -0.33 1.23 -0.20 -0.37
P3 -0.66 -0.86 -0.58 -0.79 1.42 -0.51 -0.13 1.82 -0.65 1.12 1.30 -0.78 -0.88 0.02 0.00 -0.85 -0.36 1.73 1.43 1.36
P2 0.50 -0.85 -0.83 -0.79 3.04 -0.64 -0.81 1.76 -0.82 0.53 -0.18 -0.70 -0.56 -0.80 -0.73 -0.28 0.02 1.94 2.67 2.22
P1 0.45 -0.66 0.11 1.74 1.97 -0.08 2.35 -0.70 -0.62 1.89 2.60 0.68 -0.54 3.04 -0.54 0.12 1.33 -0.76 -0.15 2.71
P1' 3.22 -0.51 0.98 1.36 2.07 2.13 1.99 3.16 -0.77 2.06 3.01 -0.80 -0.80 1.50 -0.74 2.45 2.26 2.69 2.52 2.87
TEV I138T, N171D, N176T
A C D E F G H I K L M N P Q R S T V W Y
P6 -0.81 -0.50 0.59 0.78 1.41 -0.51 -0.12 1.53 1.45 1.60 1.70 -0.42 1.50 1.06 -0.08 0.28 0.48 1.46 0.42 0.19
P5 0.63 -0.19 1.66 1.25 2.49 -0.76 0.85 2.32 -0.57 2.11 2.16 -0.06 2.39 0.93 -0.48 0.07 0.72 1.97 3.04 2.81
P4 -0.20 -0.22 -0.41 -0.25 -0.12 -0.16 -0.24 1.69 -0.13 4.15 -0.01 -0.13 -0.04 -0.14 -0.18 -0.21 -0.28 -0.25 -0.09 -0.27
P3 -0.26 -0.35 0.84 0.26 -0.08 1.62 -0.12 -0.33 -0.19 -0.25 -0.39 0.00 -0.19 -0.19 -0.12 -0.23 -0.30 -0.26 -0.26 3.35
P2 -0.06 -0.40 -0.10 -0.30 5.21 -0.18 -0.34 0.05 -0.29 -0.21 -0.38 -0.33 -0.77 -0.31 -0.18 -0.21 -0.26 1.74 -0.27 -0.48
P1 -0.10 -0.12 -0.47 -0.12 -0.13 -0.01 -0.20 -0.25 -0.06 -0.17 -0.11 -0.18 0.08 4.05 -0.08 -0.05 -0.07 -0.26 -0.11 -0.24
P1' 1.24 -0.31 -0.44 -0.29 0.20 0.74 0.05 -0.25 -0.33 -0.27 -0.04 -0.37 -0.36 -0.43 -0.40 2.35 -0.41 -0.41 1.24 -0.06
TEV T146S, D148P, S153N, S170A, N177M
A C D E F G H I K L M N P Q R S T V W Y
P6 0.48 0.35 1.31 2.68 0.52 -0.03 0.09 0.31 -0.16 0.26 0.36 0.18 0.15 0.34 -0.14 -0.01 0.10 0.21 0.18 0.19
P5 0.30 -0.48 0.20 0.30 0.28 -0.55 -0.06 0.42 -0.25 0.27 0.40 -0.23 0.30 0.30 -0.04 -0.08 0.17 0.32 0.30 0.17
P4 -0.34 -0.34 -0.39 -0.08 -0.35 -0.27 -0.37 4.37 -0.22 4.49 0.91 -0.20 -0.28 -0.24 -0.29 -0.33 -0.32 0.50 -0.31 -0.47
P3 -0.67 -0.73 -0.48 -0.59 0.83 0.02 -0.49 0.86 -0.72 -0.15 -0.38 -0.56 -0.74 -0.56 -0.45 -0.74 -0.69 0.81 0.66 2.27
P2 -0.36 -0.64 -0.58 -0.49 3.55 -0.47 -0.51 1.16 -0.65 0.14 -0.01 -0.42 -0.27 -0.56 -0.49 -0.37 -0.23 0.39 1.18 -0.43
P1 0.15 -0.43 0.14 1.21 1.55 -0.14 2.02 -0.78 -0.59 1.57 2.23 0.14 -0.61 2.16 -0.56 0.12 0.86 -0.78 0.08 1.99
P1' 1.00 -0.32 0.49 0.38 1.11 0.81 0.54 0.90 -0.60 0.33 0.73 -0.76 -0.81 0.13 -0.56 1.92 0.32 -0.08 1.00 2.71
TEV V209M, W211I, M218F
A C D E F G H I K L M N P Q R S T V W Y
P6 0.91 0.31 0.39 2.61 0.19 -0.01 0.01 0.22 0.31 0.29 0.28 0.09 0.11 0.17 -0.12 0.05 0.10 0.18 0.05 -0.46
P5 1.06 -0.37 1.42 1.05 1.65 -0.78 1.14 2.04 -0.55 1.47 1.45 0.36 1.88 0.60 -0.45 0.19 0.84 1.90 1.78 1.94
P4 -0.31 -0.22 -0.07 -0.13 -0.23 -0.19 -0.30 2.68 -0.13 4.45 0.19 -0.11 -0.17 -0.18 -0.23 -0.27 -0.27 0.26 -0.19 -0.26
P3 -0.39 -0.46 -0.03 -0.30 0.32 0.72 -0.34 -0.10 -0.34 -0.17 -0.44 -0.11 -0.23 -0.32 -0.03 -0.45 -0.44 -0.09 -0.16 3.38
P2 0.06 -0.50 -0.42 -0.49 3.77 -0.28 -0.50 -0.26 -0.51 -0.37 -0.27 -0.27 0.41 -0.56 -0.38 -0.29 -0.21 0.05 1.41 0.49
P1 -0.03 -0.07 0.06 0.20 -0.22 0.07 0.04 -0.30 -0.08 -0.25 0.51 -0.01 -0.07 3.92 -0.12 -0.07 -0.02 -0.29 -0.23 -0.29
P1' 0.79 -0.13 -0.07 -0.16 0.39 1.08 0.56 -0.33 -0.39 -0.36 -0.08 -0.50 -0.46 -0.30 -0.44 2.44 -0.24 -0.46 0.88 0.55
TEV H28L, T30A
A C D E F G H I K L M N P Q R S T V W Y
P6 0.56 0.18 0.24 0.33 0.58 0.04 -0.03 0.00 0.17 0.07 0.04 0.13 0.00 -0.05 -0.04 0.11 -0.01 -0.10 0.25 0.09
P5 0.11 -0.05 0.46 0.43 0.32 -0.39 -0.14 0.22 -0.45 0.32 0.66 -0.16 1.07 0.00 -0.38 -0.28 0.08 0.34 1.05 0.29
P4 -0.05 -0.03 -0.12 0.04 -0.11 0.03 -0.07 0.26 0.08 0.60 -0.12 0.04 0.06 -0.05 0.01 -0.01 -0.14 -0.10 -0.09 -0.20
P3 0.08 -0.07 -0.02 0.29 0.00 0.45 0.11 -0.16 0.23 -0.09 -0.20 0.22 0.11 0.03 0.04 0.03 -0.04 -0.20 -0.10 0.51
P2 0.07 -0.14 -0.13 -0.02 0.74 -0.08 -0.10 0.54 0.02 -0.09 -0.18 0.03 -0.35 -0.10 0.08 0.00 0.04 0.15 -0.18 -0.26
P1 -0.10 -0.09 0.16 0.01 -0.09 0.18 -0.07 -0.12 0.08 -0.16 -0.05 0.01 0.08 0.51 0.05 -0.02 -0.03 -0.15 -0.07 -0.14
P1' 0.02 -0.20 -0.21 -0.08 0.21 0.16 -0.04 -0.02 -0.18 -0.11 0.24 -0.21 -0.19 -0.22 -0.26 0.33 -0.07 -0.11 0.97 0.06
TEV T17S, N68D, E107D, D127A, F132L, S135F, F162S, K229E
A C D E F G H I K L M N P Q R S T V W Y
P6 -0.85 -0.16 0.45 2.26 0.47 -0.06 -0.12 0.23 3.24 0.43 0.48 -0.09 -0.05 0.53 -0.08 -0.05 -0.02 0.16 -0.08 -0.05
P5 0.40 -0.42 0.10 0.90 0.92 -0.65 -0.31 1.30 -0.49 1.11 1.07 -0.44 1.43 0.59 -0.15 -0.30 0.53 1.13 1.47 0.92
P4 -0.21 -0.16 1.06 -0.06 -0.36 -0.06 -0.18 2.69 -0.12 2.92 0.00 -0.04 -0.10 -0.14 -0.12 -0.19 -0.21 0.00 -0.20 -0.36
P3 -0.27 -0.37 -0.34 -0.17 0.20 0.74 -0.23 -0.25 0.25 -0.26 -0.37 -0.08 0.06 -0.25 -0.09 -0.27 -0.35 -0.16 -0.15 2.34
P2 -0.25 -0.37 -0.24 -0.31 2.20 -0.33 -0.40 1.36 -0.39 -0.11 -0.17 -0.23 -0.31 -0.36 -0.21 -0.26 -0.16 0.80 0.19 -0.43
P1 -0.04 0.08 -0.45 0.08 -0.15 -0.12 0.00 -0.33 -0.12 -0.18 0.56 0.00 -0.02 2.36 -0.07 0.00 0.04 -0.36 -0.07 -0.27
P1' 0.91 -0.35 -0.10 -0.38 1.30 0.49 0.64 -0.05 -0.46 -0.26 0.58 -0.56 -0.41 -0.40 -0.50 1.69 -0.30 -0.35 2.37 1.85
TEV E107D, D127A, S135F, R203Q, K215E
A C D E F G H I K L M N P Q R S T V W Y
P6 -0.94 0.10 0.10 0.62 0.53 -0.01 -0.13 0.14 2.29 0.38 0.41 -0.11 -0.02 0.39 -0.04 -0.04 -0.08 0.14 0.00 0.44
P5 0.16 -0.39 -0.55 -0.44 0.12 -0.52 -0.44 0.51 0.36 0.46 0.68 -0.58 1.43 0.49 1.08 -0.37 0.19 0.33 0.24 -0.06
P4 -0.12 -0.07 0.56 -0.02 -0.20 -0.01 -0.18 1.05 0.02 0.94 0.03 0.06 -0.07 -0.06 -0.04 -0.13 -0.11 -0.06 0.00 -0.21
P3 0.00 -0.17 1.01 0.07 0.14 0.05 -0.10 -0.21 0.14 -0.02 -0.14 0.15 -0.12 -0.05 0.05 -0.09 -0.21 -0.18 0.14 0.35
P2 -0.12 -0.14 -0.10 -0.15 1.07 -0.24 -0.24 0.58 -0.11 -0.08 -0.18 -0.10 0.98 -0.17 0.01 -0.11 -0.13 0.27 0.07 -0.31
P1 -0.14 0.21 -0.49 -0.18 0.29 0.02 -0.18 -0.18 -0.07 0.12 0.13 -0.18 -0.03 0.76 -0.06 -0.11 -0.10 -0.24 0.60 -0.08
P1' 0.12 0.09 -0.27 -0.25 1.25 0.06 0.14 -0.21 -0.27 -0.15 0.14 -0.31 -0.27 -0.35 -0.30 0.72 -0.29 -0.35 1.50 1.24
TEV E107D, D127A, S135Fa
A C D E F G H I K L M N P Q R S T V W Y
P6 -0.81 0.25 0.57 2.22 1.16 -0.13 -0.08 0.90 2.59 1.14 0.94 -0.15 0.00 0.71 -0.14 -0.01 0.02 0.67 0.64 1.44
P5 0.15 -0.15 -0.08 0.20 0.42 -0.53 0.07 0.12 -0.18 0.27 0.47 -0.52 0.24 0.25 0.04 0.08 0.15 0.10 0.36 0.44
P4 -0.26 -0.14 0.80 -0.05 -0.20 -0.15 -0.27 3.37 -0.16 3.03 0.46 -0.02 -0.18 -0.29 -0.20 -0.25 -0.24 0.24 -0.03 -0.30
P3 -0.34 -0.40 -0.28 -0.27 0.73 0.75 -0.28 -0.15 -0.31 -0.25 -0.44 -0.16 -0.45 -0.32 -0.02 -0.41 -0.44 0.03 -0.12 2.57
P2 -0.37 -0.51 -0.54 -0.52 3.67 -0.55 -0.53 2.99 -0.56 0.09 0.03 -0.35 -0.18 -0.59 -0.41 -0.40 -0.22 2.52 1.22 -0.50
P1 -0.15 0.28 -0.47 -0.02 -0.11 -0.07 -0.02 -0.29 -0.09 -0.13 0.71 0.06 -0.07 2.88 -0.12 -0.03 -0.04 -0.36 0.05 -0.23
P1' 1.45 0.19 -0.04 -0.39 1.59 1.17 0.89 -0.27 -0.46 -0.35 1.25 -0.48 -0.56 -0.28 -0.46 2.27 -0.34 -0.49 1.89 2.40
[00163] Each sub-table within the larger table represents the amino acid enrichment values generated for the given genotype of TEV protease. Each row within a sub-table contains enrichment values from a selection performed on the library in which the corresponding position within the ENLYFQS (SEQ ID NO: 2) motif was randomized. The enrichment value for each amino acid identity at a given position was calculated as frequency_cleaved/frequency_elution - 1.
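The enrichment calculation described above can be illustrated with a minimal Python sketch. The function names and the handling of amino acids that never appear in the elution pool (reported as None) are assumptions made for illustration and are not taken from the original analysis script.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def position_frequencies(peptides, position):
    """Fraction of peptides carrying each amino acid at a 0-based position."""
    counts = Counter(p[position] for p in peptides)
    total = sum(counts.values())
    return {aa: counts.get(aa, 0) / total for aa in AMINO_ACIDS}


def enrichment(cleaved, eluted, position):
    """Return frequency_cleaved / frequency_elution - 1 for each amino acid.

    Amino acids absent from the elution pool are reported as None because
    the ratio is undefined there (an assumption of this sketch).
    """
    f_cleaved = position_frequencies(cleaved, position)
    f_eluted = position_frequencies(eluted, position)
    return {
        aa: (f_cleaved[aa] / f_eluted[aa] - 1) if f_eluted[aa] > 0 else None
        for aa in AMINO_ACIDS
    }


if __name__ == "__main__":
    # Toy heptapeptide pools (hypothetical data); index 5 is the P1 position.
    cleaved_pool = ["ENLYFQS"] * 3 + ["ENLYFHS"]
    eluted_pool = ["ENLYFQS", "ENLYFHS", "ENLYFHS", "ENLYFAS"]
    print(enrichment(cleaved_pool, eluted_pool, 5))
```

Under this definition, a positive value means the residue is over-represented among cleaved substrates relative to uncleaved (eluted) substrates, and a value of -1 means the residue was never observed in the cleaved pool.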
[00164] This specificity profiling technique was applied to seven separate libraries, each containing a single randomized position within the canonical ENLYFQS (SEQ ID NO: 2) substrate. These libraries are inherently biased by the identities of the residues that are held constant, but because of their small theoretical diversity, they are easy to construct and a single round of selection yielded robust enrichment values. This method was validated by enrichment of the consensus motif EXLYFQS (SEQ ID NO: 168) (where X = any amino acid) for wild-type TEV protease (Figure 3B). This substrate specificity profiling method was also applied to more complex libraries containing sets of three consecutive randomized amino acids within either the ENLYFQS (SEQ ID NO: 2) or HPLVGHM (SEQ ID NO: 3) substrate. The resulting specificity profiles from these larger libraries (Figure 17) did not substantially differ from the results of the single-site libraries. In addition, the identity of the constant residues (ENLYFQS; SEQ ID NO: 2 vs. HPLVGHM; SEQ ID NO: 3) had only modest impact on the resulting specificity profiles of TEV L2F (SEQ ID NO: 137), although the P1 specificity of TEV L2F is more pronounced for His, the target residue, in the HPLVGHM (SEQ ID NO: 3) libraries (Figure 17 and corresponding enrichment values in Table 19).
Table 19: Phage Display Enrichment Values From Selections on Libraries with Three Randomized Residues.
Genotype | Library | Randomized positions
WT (SEQ ID NO: 1) | XXXYFQS (SEQ ID NO: 174) | P6, P5, P4
WT (SEQ ID NO: 1) | EXXXFQS (SEQ ID NO: 175) | P5, P4, P3
WT (SEQ ID NO: 1) | ENXXXQS (SEQ ID NO: 176) | P4, P3, P2
WT (SEQ ID NO: 1) | ENLXXXS (SEQ ID NO: 177) | P3, P2, P1
WT (SEQ ID NO: 1) | ENLYXXX (SEQ ID NO: 178) | P2, P1, P1'
L2F (SEQ ID NO: 137) | XXXYFQS (SEQ ID NO: 174) | P6, P5, P4
L2F (SEQ ID NO: 137) | EXXXFQS (SEQ ID NO: 175) | P5, P4, P3
L2F (SEQ ID NO: 137) | ENXXXQS (SEQ ID NO: 176) | P4, P3, P2
L2F (SEQ ID NO: 137) | ENLXXXS (SEQ ID NO: 177) | P3, P2, P1
L2F (SEQ ID NO: 137) | ENLYXXX (SEQ ID NO: 178) | P2, P1, P1'
L2F (SEQ ID NO: 137) | XXXVGHM (SEQ ID NO: 179) | P6, P5, P4
L2F (SEQ ID NO: 137) | HXXXGHM (SEQ ID NO: 180) | P5, P4, P3
L2F (SEQ ID NO: 137) | HPXXXHM (SEQ ID NO: 181) | P4, P3, P2
L2F (SEQ ID NO: 137) | HPLXXXM (SEQ ID NO: 182) | P3, P2, P1
L2F (SEQ ID NO: 137) | HPLVXXX (SEQ ID NO: 183) | P2, P1, P1'
[Enrichment values for these sub-tables are not reproduced in the text.]
[00165] Each sub-table within the larger table represents the amino acid enrichment values generated for the given genotype of TEV protease on the specified library. Each set of three rows contains enrichment values after two rounds of selection performed on the library in which the corresponding three positions within either the ENLYFQS (SEQ ID NO: 2) or HPLVGHM (SEQ ID NO: 3) motif were randomized. The enrichment value for each amino acid identity at a given position was calculated as frequency_cleaved/frequency_control_selection - 1.
[00166] When the specificity profile of evolved TEV L2F (SEQ ID NO: 137) (Figure 3C) is compared to that of wild-type TEV, a number of differences are apparent: TEV L2F shows a broadening of specificity at P6, a shifting of P3 specificity towards aliphatic residues Ile and Val, a shifting of P1 specificity to include His, and a shifting of P1' specificity towards aliphatic amino acids Ala, Ile, and Met. These changes are largely consistent with evolutionary pressure to cleave the target substrate HPLVGHM (SEQ ID NO: 3). A notable absence of altered specificity at the P2 Gly position indicates that affinity for this substitution may offer the largest remaining potential gains in target substrate cleavage efficiency.
Specificity Profiling Reveals Functionally Independent TEV Mutation Groups
[00167] To illuminate the molecular basis of the evolved changes in substrate specificity, TEV mutants containing small subsets of mutations were generated, and their substrate specificities were profiled using substrate libraries in which a single residue of the ENLYFQS (SEQ ID NO: 2) substrate was randomized. A number of mutations were predicted to influence solubility and stability based on previous reports or their distance from the substrate in the crystal structure. Various combinations of the predicted solubility mutations (T17S, N68D, E107D, D127A, F132L, S135F, F162S, R203Q, K215E, K229E) as well as mutants that putatively influence specificity at P1 (T146S, D148P, S153N, S170A, N177M), P6 (I138T, N171D, N176T), and P2 (V209M, W211I, M218F) were constructed based on the emergence of these mutations during PACE.
[00168] All of the tested combinations of mutations resulted in proteases that retained activity to varying degrees (Figure 18), despite being taken out of their PACE-evolved contexts. The solubility-enhancing mutants exhibited no significant change in specificity (Figure 19). The P2 variant also did not display any substantial specificity changes, consistent with the lack of a strong change in P2 specificity in the TEV L2F specificity profile (Figure 19).
[00169] Mutations in the P6 variant (I138T, N171D, N176T) are sufficient to confer loss of glutamate specificity at P6 with no other obvious changes to substrate preferences (Figure 3D), indicating some degree of modularity in protease-substrate interactions.
Conversely, the P1 variant (T146S, D148P, S153N, S170A, N177M) not only exhibits broadened specificity at the P1 site, but also shows a concurrent increased affinity for P3 aliphatic side chains (Figure 3E). The mutations within these two variants appear to be responsible for the three largest differences in substrate specificity between wild-type TEV and TEV L2F.
Evolved TEV L2F Cleaves Human IL-23 with Minimal Off-Target Activity
[00170] Next, the ability of the evolved TEV L2F (SEQ ID NO: 137) protease to cleave full-length human IL-23 protein was tested. In its active form, IL-23 is a heterodimer of the IL-12p40 subunit and the IL-23p19 subunit. TEV L2F (SEQ ID NO: 137) was incubated with IL-23 in either its heterodimeric or monomeric p19 state. Western blot showed the formation of a single cleavage product for the full IL-23 heterodimer and, in the presence of excess protease, two cleavage products for the monomeric IL-23p19 substrate (Figures 20A to 20B).
[00171] IL-23 digestion reactions were subjected to LC-MS to identify the cleavage products. The heterodimer cleavage reaction generated a new protein of mass 3,598 Da less than the starting material, matching the fragment liberated by cleavage of the target peptide bond at the HPLVGH//M (SEQ ID NO: 8) sequence (Figures 21A to 21D). Data from the monomer cleavage reaction in the presence of a 10-fold excess L2F (SEQ ID NO: 137) protease revealed two new masses. The major proteolytic product corresponds to a single cleavage at the on-target site (HPLVGH//M; SEQ ID NO: 8) (Figures 22A to 22D). The less abundant ion was a match for proteolysis at both the on-target site (HPLVGH//M; SEQ ID NO: 8) and an additional off-target site (ARVFAH//G; SEQ ID NO: 9) that is consistent with the L2F (SEQ ID NO: 137) specificity profile shown in Figure 3C. The absence of an ion corresponding to IL-23 cleaved at only the off-target site suggests that the on-target site is kinetically favored by TEV L2F. This off-target site was only cleaved in the monomeric substrate and not the heterodimer presumably because it is occluded by the IL-12p40 subunit in the heterodimer structure.
Evolved TEV Protease Deactivates IL-23 and Prevents IL-17 Secretion
[00172] Finally, the ability of the evolved TEV L2F (SEQ ID NO: 137) protease to abrogate the biological activity of IL-23 was tested. A previously described IL-23 activity assay with primary isolates of mouse mononuclear splenocytes was used. When cultured in the presence of IL-2 and IL-23, Th17 cells are stabilized and secrete IL-17 into the media supernatant, which is quantified by ELISA. A dose-dependent attenuation of IL-17 production was observed when IL-23 was pre-incubated with TEV L2F (SEQ ID NO: 137) (Figure 4). These pre-incubated samples were also visualized by Western blot, demonstrating that the p40 subunit is unaffected by incubation with protease, and that inhibition of IL-17 production is causally linked to IL-23p19 cleavage (Figure 23). Even a sub-stoichiometric dose of L2F (SEQ ID NO: 137) protease (0.36 equivalents) resulted in conversion of greater than 50% of IL-23 into cleaved product (Figure 24) and the loss of nearly all IL-23-induced IL-17 secretion, consistent with the action of TEV L2F (SEQ ID NO: 137) in a catalytic manner to degrade IL-23 (Figure 25). In contrast, addition of neutralizing IL-23 antibody elicits a dose-dependent attenuation of IL-17 production in which the minimum effective dose is stoichiometric with IL-23 concentration (Figure 26). Direct addition of TEV L2F (SEQ ID NO: 137) to splenocyte cultures in serum-containing media supplemented with IL-23 did not attenuate IL-17 secretion (Figure 26). While the presence of an equivalent concentration of serum did not inhibit cleavage in vitro (Figure 27), it is possible that the protease is sequestered by other secreted factors or cell-surface proteins within the complex culture media, or that IL-23 binding to IL-23R may occur faster than IL-23 proteolysis.
Materials and Methods
Ranking of Target Sites within Extracellular Proteins
[00173] A list of human extracellular and transmembrane proteins with their corresponding amino acid sequences was tabulated using the ProteinData functionality in Mathematica 10. These data were transferred into MATLAB for further processing by a customizable script that performed the operations described below (MATLAB Script for Extracellular Target Substrate Search). A rating matrix that is 7 wide (one column for each of the seven sites within the TEV protease recognition motif) by 20 long (one row for each amino acid) was manually populated with subjective integer evolvability ratings. Each protein was converted into a binary sparse matrix with as many rows as the length of the protein sequence and 20 columns, one for each amino acid. For each protein matrix, 7 rows at a time were multiplied by the "specificity" matrix, with the trace of the resulting 7x7 product matrix providing a score for the heptapeptide. For each extracellular protein, the best score and the corresponding peptide/starting-residue index were saved. Once all protein sequences had been processed, the protein names were sorted along with their best-match substrates by score.
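The window-scoring procedure above can be sketched in plain Python. The trace of the product of a 7x20 one-hot window and a 20x7 rating matrix equals the sum of the ratings of the observed residue at each of the seven sites, so this sketch sums those ratings directly. The function names, the list-of-dictionaries rating layout, and the toy ratings in the usage example are hypothetical; the actual ratings were assigned manually and are not given in the text. Start indices here are 0-based, unlike MATLAB.

```python
CANONICAL = set("ACDEFGHIKLMNPQRSTVWY")


def best_heptapeptide(protein, rating):
    """Return (score, start_index, peptide) for the best 7-residue window.

    rating is a list of 7 dicts (sites P6 through P1'), each mapping an
    amino acid letter to an integer rating; missing letters score 0.
    Returns None for sequences that are too short or non-canonical.
    """
    if len(protein) < 7 or not set(protein) <= CANONICAL:
        return None
    best = None
    for start in range(len(protein) - 6):
        window = protein[start:start + 7]
        # Equivalent to trace(one_hot(window) @ rating_matrix).
        score = sum(rating[k].get(aa, 0) for k, aa in enumerate(window))
        if best is None or score > best[0]:
            best = (score, start, window)
    return best


def rank_proteins(proteins, rating):
    """Sort proteins by their best window score, highest first."""
    hits = []
    for name, sequence in proteins.items():
        result = best_heptapeptide(sequence, rating)
        if result is not None:
            score, start, peptide = result
            hits.append((score, name, peptide, start))
    return sorted(hits, key=lambda hit: hit[0], reverse=True)


if __name__ == "__main__":
    # Toy rating (assumption): one point per residue matching HPLVGHM.
    toy_rating = [{aa: 1} for aa in "HPLVGHM"]
    toy_proteins = {"a": "AAHPLVGHMAA", "b": "GGGGGGGG"}
    for hit in rank_proteins(toy_proteins, toy_rating):
        print(hit)
```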
Cloning of Accessory Plasmids, Expression Vectors and Phage Libraries
[00174] All primers were designed to perform USER cloning and ordered from Integrated DNA Technologies (IDT). For the cloning of phage libraries, NNK codons were generated using hand-mixed nucleotide ratios to provide even incorporation rates. All PCR reactions were performed using Phusion U Hot Start polymerase (Thermo-Fisher).
[00175] For the assembly of APs and expression vectors, PCR products were purified using EconoSpin columns (Epoch Life Sciences) and assembled with DpnI and USER Enzyme in CutSmart Buffer (New England BioLabs). Following assembly, plasmids were transformed into NEB Turbo Competent E. coli high-efficiency (New England BioLabs).
[00176] For the assembly of phage libraries, PCR products were gel extracted using the MinElute kit (Qiagen). Following an assembly reaction identical to that of the AP, the USER reaction was desalted using the MinElute PCR purification kit (Qiagen) prior to electroporation into competent E. coli S1059 (for SP libraries) or S1030 (for substrate display phage libraries).
PACE Experiments
[00177] PACE experiments were performed as previously described. E. coli strain S1030 was co-transformed by electroporation with a mutagenesis plasmid (MP6) and an accessory plasmid (plasmids are described in Table 20 and detailed in Figures 5 and 6). Chemostats containing 80ml of Davis Rich Media + 22.5 μg/ml carbenicillin and 15 μg/ml chloramphenicol were inoculated with overnight starter cultures and grown at 37 °C while mixing at 250 rpm via a magnetic stir bar. Once the chemostat grew to approximately OD600=1.0, dilution with fresh media began at a rate of 80-100 ml/hour, with the waste needle set at a height of 80 ml. At the same time, the flow of chemostat culture began at approximately 10-20 ml/hour into a lagoon with a waste needle set at a height of 15 mL. This flow rate was selected based upon the difficulty of a given experiment, with slower dilution being more appropriate for challenging evolutions. For the full duration of the experiment, 10% w/v arabinose solution was syringe-pumped into the lagoons at a rate of 0.5-1.0 ml/hr.
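As a worked illustration of the flow rates above, the following minimal Python sketch converts a flow rate and vessel volume into a dilution rate (vessel volumes exchanged per hour) and a mean residence time. The function names are hypothetical, and the calculation assumes a well-mixed vessel held at constant volume by the waste needle.

```python
def dilution_rate(flow_ml_per_h, volume_ml):
    """Vessel volumes exchanged per hour for a constant-volume vessel."""
    return flow_ml_per_h / volume_ml


def residence_time_h(flow_ml_per_h, volume_ml):
    """Mean residence time in hours (reciprocal of the dilution rate)."""
    return volume_ml / flow_ml_per_h


if __name__ == "__main__":
    # Chemostat: 80 ml volume, 80-100 ml/hour of fresh media.
    print(dilution_rate(80, 80), dilution_rate(100, 80))
    # Lagoon: 15 ml volume, 10-20 ml/hour of chemostat culture.
    print(dilution_rate(10, 15), dilution_rate(20, 15))
    print(residence_time_h(10, 15), residence_time_h(20, 15))
```

With the stated settings, the chemostat exchanges 1.0 to 1.25 volumes per hour, and the lagoon exchanges roughly 0.67 to 1.33 volumes per hour, which corresponds to a mean residence time of 0.75 to 1.5 hours.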
Table 20: Plasmids Used in PACE and Protein Expression.
Shorthand Name | Type | Glycerol Stock | Origin of Replication | Resistance Marker | Description/Features | Protease cleavage site SEQ ID NO:
MP6 | MP | MSP513 | cloDF | Chloramphenicol | pBad dnaQ926, dam, seqA, emrR, ugi, cda1 | -
122-1182-proB | AP | MSP955 | pSC101 | Carbenicillin | proB-lysozyme-ggs-ENLYFQS-ggs-T7RNAP//T7pro-gIII-lux | 2
122-432-proB | AP | MSP565 | pSC101 | Carbenicillin | proB-lysozyme-ggs-HNLYFQS-ggs-T7RNAP//T7pro-gIII-lux | 4
[The remaining rows of Table 20 appear only as images (imgf000086_0001, imgf000087_0001, imgf000088_0001) in the original publication.]
[00178] The plasmids are listed with important features including origin of replication, resistance marker, and encoded proteins.
[00179] Experiments starting with an NNK-mutagenized SP library were initiated with a lagoon inoculum of 1-2mL of phage library containing 10^8-10^10 pfu/ml. For all other experiments, lagoons were inoculated with 50-100μl of filtered phage population from the last time point of the previous PACE experiment. In certain cases, for a period of 24 to 48 hours, lagoons received an influx of cell culture from two separate chemostats containing hosts bearing two different APs (combined rate of 10-20mL/hour). This mixing strategy was used to transition between stepping-stone substrates or from low to high selection stringency APs.
[00180] Phage samples were aspirated via the waste needle of the lagoon at 24 hour intervals and passed through a 0.22μm sterile filter. The titers of phage samples were evaluated by plaque assay using strain S1059 as hosts. Briefly, phage were prepared in four 50-fold serial dilutions of volume 50μl; to this was added 100μl of fresh host culture at approximately
Figure imgf000088_0002
followed by addition of 900μl of top agar (2xYT, 6g/L). The mixture was pipetted up and down once to mix and transferred to a quarter-plate prepared with a thin layer of bottom agar (2xYT, 16g/L). The plaque assays were incubated overnight at 37 °C.
[00181] At the end of each PACE experiment, individual plaques were picked with a pipet tip in order to provide template material (SP-infected E. coli) for rolling circle amplification (TempliPhi, GE Healthcare). The same pipet tip was subsequently transferred to a 96-deep well culture plate containing 2xYT media for overnight growth with shaking at 37°C. Sanger sequencing was performed using primer BCD 1136 (Table 21), and results were aligned and tabulated using SeqMan (DNAStar).
Table 21: Sanger and Illumina Sequencing Primers.
[Table 21 appears only as an image (imgf000089_0001) in the original publication.]
Luminescence Assays of PACE Evolved Clones
[00182] After analyzing sequencing data, clones were chosen for characterization and clonal phage stocks were sterile-filtered from the corresponding position within the 96-well culture plate. Saturated overnight cultures of S1030 containing the same AP from PACE or the wild-type TEV AP were used to initiate luciferase assays in 96-well culture plates. Approximate volumes were 500μl 2xYT, 50μl overnight starter culture, and 10μl filtered phage samples. All assays included a negative control (no phage), a positive control (SP encoding T7 RNAP), and a wild-type TEV SP control. Experimental and control conditions were performed in triplicate. After 3 to 5 hours of growth in a 37 °C shaker, 100μl was transferred to a clear-bottom assay plate to measure OD600 and luminescence on a Tecan Infinite Pro Plate Reader. Measurements were analyzed as OD-normalized values, with particular attention to the fold-change over background signal (no phage condition).
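The OD normalization and fold-change calculation described above can be sketched as follows. This is a minimal illustration that assumes each well is recorded as a (luminescence, OD600) pair and that triplicates are averaged after normalization; the function names and the readings in the usage example are hypothetical.

```python
from statistics import mean


def od_normalized(luminescence, od600):
    """Luminescence divided by the culture density of the same well."""
    return luminescence / od600


def fold_over_background(sample_wells, background_wells):
    """Mean OD-normalized signal of a sample relative to the no-phage control.

    Each argument is a list of (luminescence, od600) tuples, one per replicate.
    """
    sample = mean(od_normalized(lum, od) for lum, od in sample_wells)
    background = mean(od_normalized(lum, od) for lum, od in background_wells)
    return sample / background


if __name__ == "__main__":
    # Hypothetical triplicate readings for one clone and the no-phage control.
    clone = [(1000.0, 0.5), (1200.0, 0.6), (800.0, 0.4)]
    no_phage = [(100.0, 0.5), (120.0, 0.6), (80.0, 0.4)]
    print(fold_over_background(clone, no_phage))
```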
Purification of TEV Proteases and Fusion Protein Substrates
[00183] TEV protease was purified as previously described, but with minor modifications. OneShot BL21 Star (DE3) chemically competent cells (Invitrogen) were transformed with expression vectors encoding MBP fused through a TEV cleavage site to a 6xHis-tagged TEV protease. Five milliliters of saturated overnight starter culture was added to 1-2 L of LB + kanamycin (40μg/ml), and grown at 37 °C until OD600~0.7. Expression was induced with 1mM IPTG for 4 hours at 30°C; cells were harvested by centrifugation at 6000g for 5 minutes. The pellet was resuspended in 15-25 ml binding buffer (10% glycerol, 50 mM Tris pH 8.0, 1.0 M NaCl, 1mM DTT, and 20mM imidazole) with a Roche Complete EDTA-free protease inhibitor tablet (TEV protease is unaffected by conventional inhibitors). Cells were lysed by sonication for 4 minutes with a 1 second on, 1 second off cycle at medium power. Lysate was clarified by centrifugation at 18,000g for 20 minutes. Clarified lysate was incubated with 1-2ml TALON metal affinity resin (Clontech) for 1 hour mixing end-over-end at 4°C. Resin was pelleted at 700g for 5 minutes, and resuspended with 10 ml of binding buffer to load onto a gravity flow column. Resin was washed with 10 column volumes of binding buffer, followed by 2 column volumes of binding buffer with imidazole supplemented to 50mM. TEV protease was eluted with 4 column volumes of elution buffer (10% glycerol, 50mM Tris pH 8.0, 0.1 M NaCl, 1mM DTT, and 250mM imidazole). The purity of fractions was assessed by SDS-PAGE using precast Bolt 4-12% Bis-Tris gels (Thermo Fisher), and TEV-containing fractions were pooled and concentrated to <250μl using an Amicon Ultra centrifugal filter with a 10kDa molecular weight cut-off (EMD-Millipore). The concentrated sample was further purified to >95% using a Superdex 200 Increase 10/300 column (GE Healthcare) running with storage buffer (20% glycerol, 50mM Tris pH 8.0, 0.1 M NaCl, 1mM DTT).
Proteases used in mammalian cell culture were further subjected to endotoxin removal resin (Pierce), followed by assaying with an LAL endotoxin quantification kit (Pierce). Protein concentrations were determined by Bradford Assay (ThermoFisher) and aliquots were frozen in liquid nitrogen for storage at -80°C.
[00184] The purification protocol for MBP-GST test substrates was adapted from a published report. OneShot BL21 Star (DE3) chemically competent cells (Invitrogen) were transformed with expression vectors encoding MBP fused to GST through a linker containing one of a number of TEV cleavage site variants. Five milliliters of saturated overnight starter culture was added to 500ml of LB + kanamycin (40μg/ml), and grown at 37°C until OD600~0.7. Expression was induced with 1mM IPTG for 16 hours at 20°C; cells were harvested by centrifugation at 6000g for 5 minutes. The pellet was resuspended in 15-25ml binding buffer (50mM Tris pH 8.0, 0.5M NaCl) with a Roche Complete EDTA-free protease inhibitor tablet. Cells were lysed by sonication for 4 minutes with a 1 second on, 1 second off cycle at medium power. Lysate was clarified by centrifugation at 18,000g for 20 minutes. Clarified lysate was incubated with 1ml glutathione sepharose (Clontech) for 1 hour mixing end-over-end at 4°C. Resin was pelleted at 700g for 5 minutes, and resuspended with 10ml of binding buffer to load onto a gravity flow column. Resin was washed with 40 column volumes of binding buffer, followed by elution with 4 column volumes of elution buffer (50mM Tris pH 8.0, 100mM NaCl, 10mM glutathione). Samples were >95% pure as assessed by SDS-PAGE and were dialyzed against storage buffer (20% glycerol, 50mM Tris pH 8.0, 0.1 M NaCl, 1mM DTT) using Slide-A-Lyzer cassettes with a 10kDa molecular weight cut-off (Thermo Fisher). Protein concentrations were determined by Bradford Assay (ThermoFisher) and aliquots were frozen in liquid nitrogen for storage at -80 °C.
Assaying Proteolysis of Fusion Protein Substrates
[00185] Protease assays consisted of 1.5μg of MBP-GST substrate and 0.25μg of TEV protease incubated for 3 hours at 30°C in storage buffer (20% glycerol, 50mM Tris pH 8.0, 0.1M NaCl, 1mM DTT) supplemented with additional freshly prepared DTT to a final concentration of 2mM. Reactions were run on SDS-PAGE and visualized with Coomassie stain to roughly gauge the activity level on a panel of different substrates.
HPLC Kinetics Assay
[00186] TEV protease kinetics were assessed as previously described, but with minor protocol adjustments. Synthetic peptide substrates (THPLVGHMGTRRW (SEQ ID NO: 190)-dinitrophenol-lysine and TENLYFQSGTRRW (SEQ ID NO: 191)-dinitrophenol-lysine) and synthetic standards for cleaved products (MGTRRW (SEQ ID NO: 192)-dinitrophenol-lysine and SGTRRW (SEQ ID NO: 193)-dinitrophenol-lysine) were ordered from Genscript. Dinitrophenol moieties provided high-intensity signal in the 355nm range for more accurate quantification of low concentrations. Reactions and standards were run on a C18 reverse-phase column (Kinetex 5 μm C18 100 Å, Phenomenex) using an acetonitrile gradient from 5-50%. Standard curves were constructed for both products (MGTRRW (SEQ ID NO: 192)-dinitrophenol-lysine and SGTRRW (SEQ ID NO: 193)-dinitrophenol-lysine) to enable quantification of reaction progress.
[00187] Reactions were carried out at 0.05-0.1μM protease and 50μM-2mM substrate. Proteases (in storage buffer plus 1mM DTT freshly added) and substrates (in sterile water) were prepared at 2x concentration in 50μl each and combined to yield a total reaction volume of 100μl. Reactions were incubated at 30°C for 10 minutes and quenched with 25μl of 5% TFA. After quenching, protease was eliminated from samples using an Amicon Ultra Centrifugal Filter with a 10kDa cut-off (EMD-Millipore). Prior to conducting reactions in triplicate, all conditions were prepared in singlet and monitored at 5, 10, and 30 minutes to ensure that 10 minutes was within the linear range. Peak integrations were tabulated, converted into product concentrations using the standard curves, and fit to the Michaelis-Menten kinetics model in GraphPad Prism.
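The data reduction described above (peak area to product concentration to initial rate to kinetic parameters) can be sketched in Python. The source reports a fit to the Michaelis-Menten model in GraphPad Prism; this sketch instead uses a Hanes-Woolf linearization, [S]/v = [S]/Vmax + Km/Vmax, solved by ordinary least squares, as a simple standard-library stand-in. It is not the fitting method used in the source. All function names are hypothetical, and the sketch assumes a linear standard curve and samples and standards that were handled identically.

```python
def linear_fit(xs, ys):
    """Ordinary least-squares line; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def initial_rates(peak_areas, std_slope, std_intercept, minutes=10.0):
    """Convert peak areas to product concentration per minute.

    std_slope and std_intercept describe the standard curve
    (area = std_slope * concentration + std_intercept).
    """
    return [((area - std_intercept) / std_slope) / minutes for area in peak_areas]


def hanes_woolf(substrate_concs, rates):
    """Estimate (Vmax, Km) from the line [S]/v = [S]/Vmax + Km/Vmax."""
    ratios = [s / v for s, v in zip(substrate_concs, rates)]
    slope, intercept = linear_fit(substrate_concs, ratios)
    vmax = 1.0 / slope
    return vmax, intercept * vmax


if __name__ == "__main__":
    # Noise-free synthetic rates (hypothetical values, arbitrary units).
    concs = [50.0, 100.0, 500.0, 1000.0, 2000.0]
    rates = [2.0 * s / (500.0 + s) for s in concs]
    vmax, km = hanes_woolf(concs, rates)
    enzyme = 0.1  # protease concentration in the same units as the product
    print(vmax, km, vmax / enzyme)  # the last value is kcat
```

Linearized fits weight measurement error differently from a direct nonlinear fit, so parameter estimates from real data may differ from those obtained by nonlinear regression.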
Phage Substrate Display Selection
[00188] For each combination of library and protease, 60μl of a 50% suspension of Anti-FLAG M2 Magnetic beads (Sigma) was transferred into a 1.5ml Eppendorf tube. For all subsequent manipulations, a magnetic plate was used to separate beads and aspirate the supernatant. After washing with 1ml of TBS (20mM Tris pH 7.0, 150mM NaCl), beads were incubated with 30-100μl of substrate phage libraries (titers ranged from 10^8-10^10 pfu/mL) in 1ml of TBS at room temperature for 2 hours rotating end-over-end. After initial binding, the supernatant was discarded and beads were washed with 1ml of TBS. Beads with bound substrate phage were incubated in 0.5ml TBS containing 0.5μM protease for 2 hours. Supernatant containing cleaved substrate phage was recovered, and the beads were again washed with 1mL of TBS. The remaining bound uncleaved substrate phage was eluted in 0.1ml of TBS containing 0.1mg/ml FLAG peptide (Sigma).
[00189] For substrate libraries containing a single randomized amino acid position, a single round of selection was sufficient. For substrate libraries containing windows of three randomized amino acids, a second round of selection was necessary to detect enrichment. To run a second round of phage display, the round 1 cleaved substrate phage had to be regenerated. Titers were expanded in overnight cultures consisting of 100μl cleaved substrate phage and 900μl S1030 culture (diluted to OD600~0.1). Following outgrowth, cultures were centrifuged to pellet E. coli prior to aspirating the supernatant phage to be used in a second round of phage substrate display as previously described. Due to expansion biases during the outgrowth, these specificity profiles were only interpretable once normalized to the second round elution of the no-protease control experiments.
High-Throughput Sequencing and Data Analysis Generating Specificity Profiles
[00190] Samples were PCR amplified using Q5 Hot Start 2x Master Mix (NEB) with 1μl of template phage sample and primers (MSP819 or MSP820 and MSP824; Table 21). Illumina barcodes were added in a second PCR reaction using 1μl of the first round PCR material as template. Samples were pooled and gel extracted using a MinElute Gel Extraction kit (Qiagen). The concentration of the pooled library was first assessed by Quant-iT PicoGreen dsDNA Assay (ThermoFisher) and diluted to approximately 4nM. This concentration was adjusted based upon qPCR (Kapa Biosystems). Samples were loaded onto an Illumina MiSeq using a v2 50 cycle kit set up to run a single-direction read of 50 nucleotides.
[00191] Data were automatically demultiplexed by MiSeq Reporter software and the resulting fastq files were processed by a custom Python script (Python Script for Processing High-Throughput Sequencing Data from Phage Substrate Display Experiments). This script searches each sequencing read for a perfect match to sequences flanking both sides of the proteolysis site. If the proteolysis site in between these matching flanks is exactly 21 nucleotides, then the proteolysis site is translated to a seven amino acid sequence. Sequences were disregarded in subsequent analysis if they contained a stop codon, were template material used in library cloning (HNLYGHS; SEQ ID NO: 173), or were the FLAG tag sequence (YKDDDDK; SEQ ID NO: 194) due to spontaneous genetic deletion of the proteolysis site. The list of proteolysis sites was tabulated into a position-specific amino acid frequency table. For each library-protease combination, enrichment values were calculated as freq_cleaved/freq_elution - 1. For each protease, specificity data from the randomized position within single-site libraries could be combined into a single table and turned into a sequence logo using the Seq2Logo webserver. For libraries containing windows of three randomized positions, sequence logos for each protease-library combination were separately generated.
Western Blot Visualization of Recombinant IL-23 Cleavage
[00192] IL-23 was purchased as a Myc-tagged monomer (TP309680 Origene) and as a heterodimer IL-23p19+IL-12p40 (PHC9321 ThermoFisher). Five micrograms of heterodimer or 0.44μg of monomer was incubated with 5μg of TEV L2F (SEQ ID NO: 137) for 3 hours at 30°C in storage buffer (20% glycerol, 50mM Tris pH 8.0, 0.1 M NaCl, 1mM DTT) supplemented with additional freshly prepared DTT to a final concentration of 2mM. Samples were run on precast Bolt 4-12% Bis-Tris gels (ThermoFisher), and transferred to a PVDF membrane using the iBlot 2 Dry Blotting System (ThermoFisher). Membranes were incubated at room temperature for 30 minutes in Odyssey Blocking Buffer (LiCor). Primary antibody (IL-23 Antibody (26H20L23), ABfinity™ Rabbit Monoclonal, ThermoFisher) was added to blocking buffer at a 1:1000 dilution, and the membrane was incubated on a rocker at 4°C overnight. After 3 washes with TBST (20mM Tris pH 7.0, 150mM NaCl, 0.1% Tween 20), the membrane was incubated for 1 hour at room temperature in blocking buffer containing a 1:1000 dilution of secondary antibody (IRDye 800CW Donkey anti-Rabbit IgG, LiCor). After 3 more washes with TBST, the membrane was scanned using an Odyssey Imaging System.
LC-MS Identification of Cleavage Sites
[00193] IL-23p19 (Origene) and IL-23 heterodimer (ThermoFisher) were reduced with 10mM DTT to identify intact masses of unreacted IL-23p19 subunits. IL-23 substrates were incubated in a manner similar to Western blots using ^g of substrate and 5μg of TEV L2F (SEQ ID NO: 137). All samples were analyzed using an Agilent LC-MS 6220 (ESI-TOF) equipped with an Agilent PLRP-S column. A standard protein LC method was used containing a 15-minute reverse-phase gradient (0.1% formic acid in water, MeCN 0.1% formic acid).
IL-23-induced IL-17 Production in Mouse Mononuclear Splenocytes
[00194] The following protocol was adapted from those previously published. Two male mice (C57BL/6J) were euthanized and dissected to isolate spleens. Spleens were pulverized into 10mL of cell culture media (DMEM, Glutamax, high-glucose, penicillin, streptomycin, 10% FBS; ThermoFisher) through a 100μm nylon mesh Falcon Cell Strainer (Corning). Single cell suspensions were then centrifuged for 3 minutes at 700g, and the supernatant was discarded. The pellet was resuspended in 1ml ACK lysis buffer (Gibco/Thermo Fisher). After 5 minutes, lysis was stopped with the addition of 9ml of DMEM, and cells were pelleted by centrifugation at 700g for 3 minutes. If the pellet was red due to remaining red blood cells, ACK lysis was repeated. Otherwise, if the pellet was white, lysis was complete and cells were resuspended in 4ml of DMEM. Cell density was quantified using a Scepter 2.0 Handheld Automated Cell Counter (Millipore). Cultures were diluted to 2x10^6 cells/ml in cell culture media supplemented with recombinant human IL-2 at 100 units/ml (Roche). An outer perimeter of wells within a 96-well round bottom culture plate was filled with 100μl of cell-free media to prevent evaporative loss in central wells containing cell culture. These central wells were prepared in triplicate, filled first with 125μl of culture followed by 25μl of additives. Cell culture supernatant was sampled after two days of growth, and 10μl was used to perform a Mouse IL-17 ELISA (R&D Systems).
[00195] Additives containing IL-23 and varying doses of protease or neutralizing antibody (MAB 1510, R&D Systems) were prepared in cell culture media immediately prior to mixing with splenocytes. Additives containing doses of proteases were also prepared as pre-incubated samples at 300x final concentration. Incubation was performed at 4°C for 16 hours in storage buffer (20% glycerol, 50mM Tris pH 8.0, 0.1 M NaCl, 1mM DTT) supplemented with 2.5mg/ml of BSA carrier protein to enhance stability during incubation. These pre-incubated samples were prepared at high concentration to confirm cleavage efficiency by Western blot as described in the previous method section. However, this Western blot was conducted using two primary antibodies, Anti-IL-12p40 (ab62822 Abcam)/Anti-IL-23p19 (sc271279 Santa Cruz), and two secondary antibodies, IRDye 800CW Donkey anti-Mouse/IRDye 680RD Donkey anti-Goat (LiCor).
Matlab Scripts
MATLAB Script for Extracellular Target Substrate Search
%We iterated through each entry in ProteinList,
%which is the list of extracellular protein amino acid sequences
%(corresponding gene names are stored in ExtracellularNames)
for i=1:length(ProteinList)
    aa=ProteinList{i};
    %convert amino acid letters to a sequence of integers
    protein=aa2int(aa);
    %check that there are only the 20 canonical amino acids
    if sum(protein>20)==0 && sum(protein<=0)==0
        %initialize an empty score output
        SpecificityScore=[];
        %convert integer protein code to a sparse matrix
        proteinmat=sparse([1:length(protein)], ...
            double(protein),ones(1,length(protein)),length(protein),20);
        %for every window of seven amino acids we calculate a match score
        %(SpecificityScore) by multiplying the sparse protein sequence matrix
        %by the scoring matrix (TEVspecificity) and sum the diagonal by taking the trace of the product
        for j=1:length(protein)-6
            SpecificityScore=[SpecificityScore; trace(proteinmat(j:j+6,:)*TEVspecificity)];
        end
        %find the max score and the index to locate the substrate sequence
        [C,I]=max(SpecificityScore);
        score(i,1)=C;
        %store the corresponding best match peptide in aligns
        aligns(i,1:7)=aa(I:I+6);
        starts(i)=I;
    end
end
%sort the best matches for each protein by score
[score,I]=sort(score,'descend');
aligns=aligns(I,1:7);
hits=ExtracellularNames(I);
starts=starts(I);
Python Script for Processing High-Throughput Sequencing Data from Phage Substrate Display Experiments
In [1]:
%matplotlib inline
import numpy as np
import scipy as sp
import matplotlib as mpl
import matplotlib.cm as cm
import matplotlib.pyplot as plt
import pandas as pd
pd.set_option('display.width', 500)
pd.set_option('display.max_columns', 100)
pd.set_option('display.notebook_repr_html', True)
import seaborn as sns
sns.set_style("whitegrid")
sns.set_context("poster")
import requests
import time
from bs4 import BeautifulSoup
import regex
import re
import os
from Bio import SeqIO
import Bio
from Bio import motifs
from Bio.Alphabet import IUPAC, Gapped
alpha = Gapped(IUPAC.protein)
In [2]:
#specify input and output directories; iterate through fastq files
indir='/Users/michaelpacker/Desktop/Liu_Lab/MiSeqData/021816Miseq/fastq/'
outdir='/Users/michaelpacker/Desktop/Liu_Lab/MiSeqData/021816Miseq/'
filenames=os.listdir(indir)
for i in range(len(filenames)):
    seqs={}
    #check that the file is fastq
    if filenames[i][-5:]=='fastq':
        #read fastq
        for record in SeqIO.parse(indir+filenames[i], "fastq"):
            #trimming to protease substrate
            #split on sequence immediately before protease substrate; check that there are two entries in the split string
            if len(record.seq.tostring().split('AGAACCACC'))>=2:
                #take the second string from the first split, and split again on the sequence immediately after the protease substrate
                sequence=record.seq.tostring().split('AGAACCACC')[1]
                #check substrate length
                if len(sequence)>=21:
                    seqs[record.id]=sequence[0:21]
        #only save substrates free of stop codons and ambiguous bases, filter out library cloning template sequence HNLYGHS (SEQ ID NO: 173) and display truncation sequence YKDDDDK (SEQ ID NO: 194)
        substrates=[Bio.Seq.translate(Bio.Seq.reverse_complement(x)) for x in seqs.values() if '*' not in Bio.Seq.translate(Bio.Seq.reverse_complement(x)) if Bio.Seq.translate(Bio.Seq.reverse_complement(x))!='HNLYGHS' if Bio.Seq.translate(Bio.Seq.reverse_complement(x))!='YKDDDDK' if 'X' not in Bio.Seq.translate(Bio.Seq.reverse_complement(x))]
        #use motifs to calculate normalized amino acid frequencies
        M=motifs.create(substrates, alphabet=alpha)
        pd.DataFrame(substrates).to_csv(outdir+filenames[i]+'substrates.csv')
        pd.DataFrame(M.counts).to_csv(outdir+filenames[i]+'counts.csv')
        pd.DataFrame(M.counts.normalize()).to_csv(outdir+filenames[i]+'normalizedcounts.csv')
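The trimming, translation, and filtering steps of the script above can also be sketched in plain Python without external libraries. This is a minimal illustration under stated assumptions, not the authors' implementation: the helper names (`reverse_complement`, `translate`, `extract_substrate`, `position_frequencies`) and the example read are hypothetical, and the codon table is the standard genetic code.

```python
from collections import Counter
from itertools import product

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code, built by pairing codons in TCAG order with AMINO.
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")

def reverse_complement(dna):
    return dna.translate(COMPLEMENT)[::-1]

def translate(dna):
    """Translate in frame; codons containing ambiguous bases become 'X'."""
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X")
                   for i in range(0, len(dna) - len(dna) % 3, 3))

def extract_substrate(read, upstream="AGAACCACC", length=21,
                      excluded=("HNLYGHS", "YKDDDDK")):
    """Return the 7-residue substrate encoded after `upstream`, or None."""
    parts = read.split(upstream)
    if len(parts) < 2 or len(parts[1]) < length:
        return None
    # The read is the reverse strand, so reverse-complement before translating.
    peptide = translate(reverse_complement(parts[1][:length]))
    if "*" in peptide or "X" in peptide or peptide in excluded:
        return None
    return peptide

def position_frequencies(peptides):
    """Per-position amino acid frequencies across equal-length peptides."""
    n = len(peptides)
    return [{aa: count / n for aa, count in Counter(col).items()}
            for col in zip(*peptides)]

# Hypothetical read carrying the reverse complement of a sequence encoding ENLYFQS.
coding = "GAAAACCTGTACTTCCAGAGC"
read = "ACGT" + "AGAACCACC" + reverse_complement(coding) + "TTTT"
peptide = extract_substrate(read)
```

Collecting `extract_substrate` results over all reads in a file and passing the non-`None` peptides to `position_frequencies` gives a table comparable to the normalized counts written by the script.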
Example 2: Additional Protease Variants
[00196] After positive selection for TEV protease variants that cleave the substrate ENLYAQS, a mixture of genotypes was enriched. Examples of mutations present in the protease variants are shown in Table 22.
Table 22
Figure imgf000098_0001
[00197] Some variants retained activity on the wild-type substrate ENLYFQS, while the variants containing the mutation V216F cleaved only the ENLYAQS substrate (Figure 28).
[00198] After positive selection, some clones that cleave both substrates (like wild-type TEV) and some that cleave only the ENLYAQS substrate were observed.
Table 23
Figure imgf000099_0001
[00199] After simultaneous positive and negative selection for variants that cleaved the mutant substrate ENLYAQS but not the wild-type substrate ENLYFQS, all variants contained the V216F mutation that appears to be responsible for the specificity towards the ENLYAQS substrate (Table 23, Figure 29).
[00200] All variants derived from simultaneous positive and negative selection show apparent cleavage of the mutant substrate but not the wild-type substrate.
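As a small illustration of how the two substrates discussed in this example differ, the sketch below compares them position by position. It assumes the conventional protease site labelling in which TEV protease cleaves between Q (P1) and S (P1'); the function name `substrate_differences` is hypothetical and not part of the disclosure.

```python
def substrate_differences(reference, variant):
    """List (position_label, ref_residue, var_residue) where two 7-mers differ."""
    # Labels assume cleavage between the sixth and seventh residues.
    labels = ["P6", "P5", "P4", "P3", "P2", "P1", "P1'"]
    if len(reference) != len(labels) or len(variant) != len(labels):
        raise ValueError("expected seven-residue substrates")
    return [(lab, r, v)
            for lab, r, v in zip(labels, reference, variant) if r != v]

diffs = substrate_differences("ENLYFQS", "ENLYAQS")
```

Under that labelling, the wild-type and mutant substrates differ at a single position, where phenylalanine is replaced by alanine.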
EQUIVALENTS AND SCOPE
[00201] Those skilled in the art will recognize, or be able to ascertain using no more than routine experimentation, many equivalents of the embodiments described herein. The scope of the present disclosure is not intended to be limited to the above description, but rather is as set forth in the appended claims.
[00202] Articles such as "a," "an," and "the" may mean one or more than one unless indicated to the contrary or otherwise evident from the context. Claims or descriptions that include "or" between two or more members of a group are considered satisfied if one, more than one, or all of the group members are present, unless indicated to the contrary or otherwise evident from the context. The disclosure of a group that includes "or" between two or more group members provides embodiments in which exactly one member of the group is present, embodiments in which more than one member of the group is present, and embodiments in which all of the group members are present. For purposes of brevity those embodiments have not been individually spelled out herein, but it will be understood that each of these embodiments is provided herein and may be specifically claimed or disclaimed.
[00203] It is to be understood that the invention encompasses all variations, combinations, and permutations in which one or more limitation, element, clause, or descriptive term, from one or more of the claims or from one or more relevant portion of the description, is introduced into another claim. For example, a claim that is dependent on another claim can be modified to include one or more of the limitations found in any other claim that is dependent on the same base claim. Furthermore, where the claims recite a composition, it is to be understood that methods of making or using the composition according to any of the methods of making or using disclosed herein or according to methods known in the art, if any, are included, unless otherwise indicated or unless it would be evident to one of ordinary skill in the art that a contradiction or inconsistency would arise.
[00204] Where elements are presented as lists, e.g., in Markush group format, it is to be understood that every possible subgroup of the elements is also disclosed, and that any element or subgroup of elements can be removed from the group. It is also noted that the term "comprising" is intended to be open and permits the inclusion of additional elements or steps. It should be understood that, in general, where an embodiment, product, or method is referred to as comprising particular elements, features, or steps, embodiments, products, or methods that consist, or consist essentially of, such elements, features, or steps, are provided as well. For purposes of brevity those embodiments have not been individually spelled out herein, but it will be understood that each of these embodiments is provided herein and may be specifically claimed or disclaimed.
[00205] Where ranges are given, endpoints are included. Furthermore, it is to be understood that unless otherwise indicated or otherwise evident from the context and/or the understanding of one of ordinary skill in the art, values that are expressed as ranges can assume any specific value within the stated ranges in some embodiments, to the tenth of the unit of the lower limit of the range, unless the context clearly dictates otherwise. For purposes of brevity, the values in each range have not been individually spelled out herein, but it will be understood that each of these values is provided herein and may be specifically claimed or disclaimed. It is also to be understood that unless otherwise indicated or otherwise evident from the context and/or the understanding of one of ordinary skill in the art, values expressed as ranges can assume any subrange within the given range, wherein the endpoints of the subrange are expressed to the same degree of accuracy as the tenth of the unit of the lower limit of the range.
[00206] In addition, it is to be understood that any particular embodiment of the present invention may be explicitly excluded from any one or more of the claims. Where ranges are given, any value within the range may explicitly be excluded from any one or more of the claims. Any embodiment, element, feature, application, or aspect of the compositions and/or methods of the invention, can be excluded from any one or more claims. For purposes of brevity, all of the embodiments in which one or more elements, features, purposes, or aspects is excluded are not set forth explicitly herein.

Claims

CLAIMS
What is claimed is:
1. A protein comprising an amino acid sequence that is at least 90% identical to SEQ ID NO: 1, wherein the protein comprises at least 14 amino acid sequence mutations set forth in Table 1, and wherein the protein cleaves an IL-23 protein.
2. The protein of claim 1, wherein the amino acid sequence is not more than 94% identical to SEQ ID NO: 1.
3. The protein of claim 1 or claim 2, wherein the protein comprises at least 14, at least 15, at least 16, at least 17, at least 18, at least 19, or at least 20 amino acid mutations as set forth in Table 1.
4. The protein of any one of claims 1 to 3, wherein at least one of the amino acid sequence mutations is introduced at an amino acid position selected from the group consisting of T17, H28, T30, N68, E107, F132, S153, and S170.
5. The protein of any one of claims 1-4, wherein at least one of the amino acid sequence mutations is selected from the group consisting of T17S, H28L, T30A, N68D, E107D, F132L, S153N, and S170A.
6. The protein of claim 5, wherein at least two, at least three, at least four, at least five, at least six, at least seven, or at least eight of the amino acid sequence mutations is selected from the group consisting of T17S, H28L, T30A, N68D, E107D, F132L, S153N, and S170A.
7. The protein of any one of claims 4-6 further comprising at least one amino acid mutation at an amino acid position selected from the group consisting of D127, S135, T146, D148, F162, N171, N176, N177, V209, W211, M218, and K229.
8. The protein of any one of claims 4-7, wherein at least one of the amino acid sequence variants is selected from the group consisting of D127A, S135F, T146S, D148P, F162S, N171D, N176T, N177M, V209M, W211I, M218F, and K229E.
9. The protein of any one of claims 1-8, wherein the protein comprises the amino acid sequence as set forth in any one of SEQ ID NOs.: 11-153.
10. The protein of any one of claims 1-8, wherein the protein consists of the amino acid sequence as set forth in any one of SEQ ID NOs.: 11-153.
11. The protein of any one of claims 1-10, wherein the protein comprises an amino acid sequence as set forth in SEQ ID NO: 137.
12. The protein of any one of claims 1-11, wherein the protein cleaves a target sequence having a sequence as set forth in SEQ ID NO: 3.
13. The protein of claim 12, wherein the protein also cleaves a target sequence having a sequence set forth in SEQ ID NO: 2.
14. The protein of any one of claims 1-13, wherein the protein cleaves IL-23 with between about 5% and about 50% of the catalytic efficiency (kcat/KM) with which TEV protease cleaves its native substrate.
15. The protein of any one of claims 1-14, wherein the protein cleaves the target sequence ENLYFQS with between about 50% and about 100% of the catalytic efficiency (kcat/KM) with which TEV protease cleaves the target sequence ENLYFQS (SEQ ID NO: 2).
16. A method for producing the protein of any one of claims 1-10, the method comprising (a) contacting a population of host cells with a population of phage vectors comprising a gene encoding a protease and deficient in at least one gene for the generation of infectious phage particles, wherein
(1) the host cells are amenable to transfer of the vector;
(2) the vector allows for expression of the protease in the host cell, can be replicated by the host cell, and the replicated vector can transfer into a second host cell;
(3) the host cell expresses a gene product encoded by the at least one gene for the generation of infectious phage particles of (a) in response to the activity of the protease, and the level of gene product expression depends on the activity of the protease;
(b) incubating the population of host cells under conditions allowing for mutation of the gene encoding the protease and the transfer of the vector comprising the gene encoding the protease of interest from host cell to host cell, wherein host cells are removed from the host cell population, and the population of host cells is replenished with fresh host cells that do not harbor the vector;
(c) isolating a replicated vector from the host cell population in (b), wherein the replicated vector comprises a mutated version of the gene encoding the protease.
17. The method of claim 16, wherein the host cell expresses a fusion molecule comprising
(i) a transcriptional activator; and
(ii) an inhibitor of the transcriptional activator of (i), wherein the inhibitor is fused to the transcriptional activator of (i) via a linker comprising a protease cleavage site that is cleaved by the protease of (a).
18. A pharmaceutical composition comprising the protein of any one of claims 1-15 and a pharmaceutically acceptable excipient.
19. An isolated nucleic acid encoding a protein comprising an amino acid sequence as set forth in any one of SEQ ID NOs.: 11-153.
20. A host cell comprising the isolated nucleic acid of claim 19.
21. A method of cleaving IL-23, the method comprising contacting an IL-23 protein with the protein of any one of claims 1-15.
22. The method of claim 21, wherein the IL-23 protein is contacted in an extracellular environment.
23. The method of claim 22, wherein the extracellular environment is in vitro.
24. The method of claim 22, wherein the extracellular environment is in a subject, optionally a mammalian subject, such as a human or mouse.
25. A method of reducing IL-23 activity in a subject, the method comprising administering to the subject an effective amount of the protein of any one of claims 1-15.
26. A method of reducing IL-17 activity in a subject, the method comprising administering to the subject an effective amount of the protein of any one of claims 1-15.
27. The method of claim 25 or 26, wherein the administration results in reduction of IL-17 secretion by the cells of the subject.
28. The method of any one of claims 25-27, wherein the cell is a mammalian cell, optionally a human cell.
29. The method of any one of claims 25-27, wherein the subject is a mouse cell.
30. The method of any one of claims 25-29, wherein the subject has or is suspected of having a disease characterized by increased IL-23 activity.
31. The method of claim 30, wherein the disease characterized by increased IL-23 activity is an inflammatory disease.
32. The method of claim 31, wherein the inflammatory disease is an autoimmune disease selected from the group consisting of psoriasis, inflammatory bowel disease, rheumatoid arthritis, asthma, and multiple sclerosis.
33. A method of treating an inflammatory disease, the method comprising administering to a subject having or suspected of having an inflammatory disease an effective amount of the protein of any one of claims 1-15.
34. The method of claim 33, wherein the inflammatory disease is an autoimmune disease.
35. The method of claim 34, wherein the autoimmune disease is selected from the group consisting of psoriasis, inflammatory bowel disease, rheumatoid arthritis, asthma, and multiple sclerosis.
PCT/US2018/014867 2017-01-23 2018-01-23 Evolved proteases and uses thereof WO2018136939A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US201762449588P 2017-01-23 2017-01-23
US62/449,588 2017-01-23

Publications (1)

Publication Number Publication Date
WO2018136939A1 true WO2018136939A1 (en) 2018-07-26

Family

ID=62908393

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2018/014867 WO2018136939A1 (en) 2017-01-23 2018-01-23 Evolved proteases and uses thereof

Country Status (1)

Country Link
WO (1) WO2018136939A1 (en)

Cited By (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111019927A (en) * 2019-12-30 2020-04-17 重庆艾力彼生物科技有限公司 Recombinant plasmid and recombinant engineering bacterium for expressing TEV protein, and method for preparing and purifying TEV protein
US11078469B2 (en) 2015-07-30 2021-08-03 President And Fellows Of Harvard College Evolution of TALENs
US11104967B2 (en) 2015-07-22 2021-08-31 President And Fellows Of Harvard College Evolution of site-specific recombinases
WO2021202651A1 (en) 2020-04-01 2021-10-07 Voyager Therapeutics, Inc. Redirection of tropism of aav capsids
US11214792B2 (en) 2010-12-22 2022-01-04 President And Fellows Of Harvard College Continuous directed evolution
US11299729B2 (en) 2015-04-17 2022-04-12 President And Fellows Of Harvard College Vector-based mutagenesis system
EP3851525A4 (en) * 2018-10-10 2022-06-22 Shangrao Concord Pharmaceutical Co., Ltd. Tev protease variant, fusion protein thereof, preparation method therefor and use thereof
US11447809B2 (en) 2017-07-06 2022-09-20 President And Fellows Of Harvard College Evolution of tRNA synthetases
US11524983B2 (en) 2015-07-23 2022-12-13 President And Fellows Of Harvard College Evolution of Bt toxins
US11624130B2 (en) 2017-09-18 2023-04-11 President And Fellows Of Harvard College Continuous evolution for stabilized proteins
US11760986B2 (en) 2014-10-22 2023-09-19 President And Fellows Of Harvard College Evolution of proteases
EP4321618A1 (en) * 2022-08-09 2024-02-14 NUMAFERM GmbH Variants of tev protease and uses thereof
US11913044B2 (en) 2018-06-14 2024-02-27 President And Fellows Of Harvard College Evolution of cytidine deaminases

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20120231498A1 (en) * 2006-10-13 2012-09-13 Novo Nordisk Health Care Ag Processing enzymes fused to basic protein tags

Cited By (18)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11214792B2 (en) 2010-12-22 2022-01-04 President And Fellows Of Harvard College Continuous directed evolution
US11760986B2 (en) 2014-10-22 2023-09-19 President And Fellows Of Harvard College Evolution of proteases
US11299729B2 (en) 2015-04-17 2022-04-12 President And Fellows Of Harvard College Vector-based mutagenesis system
US11905623B2 (en) 2015-07-22 2024-02-20 President And Fellows Of Harvard College Evolution of site-specific recombinases
US11104967B2 (en) 2015-07-22 2021-08-31 President And Fellows Of Harvard College Evolution of site-specific recombinases
US11524983B2 (en) 2015-07-23 2022-12-13 President And Fellows Of Harvard College Evolution of Bt toxins
US11078469B2 (en) 2015-07-30 2021-08-03 President And Fellows Of Harvard College Evolution of TALENs
US11913040B2 (en) 2015-07-30 2024-02-27 President And Fellows Of Harvard College Evolution of TALENs
US11447809B2 (en) 2017-07-06 2022-09-20 President And Fellows Of Harvard College Evolution of tRNA synthetases
US11624130B2 (en) 2017-09-18 2023-04-11 President And Fellows Of Harvard College Continuous evolution for stabilized proteins
US11913044B2 (en) 2018-06-14 2024-02-27 President And Fellows Of Harvard College Evolution of cytidine deaminases
EP3851525A4 (en) * 2018-10-10 2022-06-22 Shangrao Concord Pharmaceutical Co., Ltd. Tev protease variant, fusion protein thereof, preparation method therefor and use thereof
US11919936B2 (en) 2018-10-10 2024-03-05 Shangrao Concord Pharmaceutical Co., Ltd. TEV protease variant, fusion protein thereof, preparation method therefor and use thereof
CN111019927A (en) * 2019-12-30 2020-04-17 重庆艾力彼生物科技有限公司 Recombinant plasmid and recombinant engineering bacterium for expressing TEV protein, and method for preparing and purifying TEV protein
CN111019927B (en) * 2019-12-30 2023-10-13 重庆艾力彼生物科技有限公司 Recombinant plasmid for expressing TEV protein, recombinant engineering bacterium and method for preparing and purifying TEV protein
WO2021202651A1 (en) 2020-04-01 2021-10-07 Voyager Therapeutics, Inc. Redirection of tropism of aav capsids
EP4321618A1 (en) * 2022-08-09 2024-02-14 NUMAFERM GmbH Variants of tev protease and uses thereof
WO2024033427A1 (en) * 2022-08-09 2024-02-15 Numaferm Gmbh Variants of tev protease and uses thereof

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 18741782

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 18741782

Country of ref document: EP

Kind code of ref document: A1