WO2019005973A1

WO2019005973A1 - Synthetase variants for incorporation of biphenylalanine into a peptide

Info

Publication number: WO2019005973A1
Application number: PCT/US2018/039764
Authority: WO
Inventors: Aditya Mohan KUNJAPUR; George M. Church; Devon Alexander Olson STORK; Erkin KURU
Original assignee: President And Fellows Of Harvard College
Priority date: 2017-06-30
Filing date: 2018-06-27
Publication date: 2019-01-03

Abstract

This disclosure provides variants of the biphenylalanine (BipA) orthogonal translation system used for incorporation of BipA into proteins. Specifically, engineered BipA aminoacyl-tRNA synthetase (BipARS) variants and tRNA variants that improve selectivity towards BipA are described. Furthermore, this disclosure provides methods used to generate these variants.

Description

Synthetase Variants for Incorporation of Biphenylalanine into a Peptide

RELATED APPLICATION DATA

This application claims priority to U.S. Provisional Application No. 62/527,115, filed on June 30, 2017, which is hereby incorporated herein by reference in its entirety for all purposes.

STATEMENT OF GOVERNMENT INTERESTS

This invention was made with government support under DE-FG02-02ER63445 awarded by Department of Energy. The government has certain rights in the invention.

SEQUENCE LISTING

The instant application contains a Sequence Listing which has been submitted electronically in ASCII format and is hereby incorporated by reference in its entirety. Said ASCII copy, created on June 26, 2018, is named 010498_01 110_WO_SL.txt and is 46,661 bytes in size.

FIELD

The present invention relates in general to synthetase and transfer RNA variants for incorporation of biphenylalanine into a polypeptide and methods of making same.

BACKGROUND

Advancements to genetic code expansion require accurate, selective, and high-throughput determination of non-standard amino acid (NSAA) incorporation into proteins. The fidelity of translation relies on the selectivity of amino acyl transfer RNA (tRNA) synthetases (AARSs), which catalyze esterification of tRNAs to their corresponding amino acids (See, M. Ibba, D. Söll, Aminoacyl-tRNAs: setting the limits of the genetic code. Genes Dev. 18, 731-8 (2004)). Orthogonal AARS/tRNA pairs, together known as OTSs, enable site-specific NSAA incorporation into proteins, most often by suppressing amber (UAG) stop codons in targeted sequences (See, C. Noren, S. Anthony-Cahill, M. Griffith, P. Schultz, A general method for site-specific incorporation of unnatural amino acids into proteins. Science (80-. ). 244 (1989) (available at world wide website science.sciencemag.org/content/244/4901/182); L. Wang, A. Brock, B. Herberich, P. G. Schultz, Expanding the genetic code of Escherichia coli. Science (80-. ). 292, 498-500 (2001)). Four primary site-specific OTS families have been developed for NSAA incorporation: Methanococcus jannaschii tyrosyl-tRNA synthetase (MjTyrRS)/tRNACUA; various Methanosarcina pyrrolysyl-tRNA synthetase (PylRS)/tRNACUA; Escherichia coli tyrosyl-tRNA synthetase (EcTyrRS)/tRNACUA; and E. coli leucyl-tRNA synthetase (EcLeuRS)/tRNACUA (See, J. W. Chin, Expanding and Reprogramming the Genetic Code of Cells and Animals. Annu. Rev. Biochem. 83, 379-408 (2014); A. Dumas, L. Lercher, C. D. Spicer, B. G. Davis, Designing logical codon reassignment - Expanding the chemistry in biology. Chem. Sci. 6, 50-69 (2015)). Another commonly used OTS is the Saccharomyces cerevisiae tryptophanyl-tRNA synthetase (ScTrpRS)/tRNACUA pair (See, R. A. Hughes, A. D. Ellington, Rational design of an orthogonal tryptophanyl nonsense suppressor tRNA. Nucleic Acids Res. 38, 6813-6830 (2010); A. Chatterjee, H. Xiao, P.-Y. Yang, G. Soundararajan, P. G. Schultz, Genetic Codon Expansion: A Tryptophanyl-tRNA Synthetase/tRNA Pair for Unnatural Amino Acid Mutagenesis in E. coli, doi: 10.1002/anie.201301094; J. W. Ellefson et al., Directed evolution of genetic parts and circuits by compartmentalized partnered replication. Nat. Biotechnol. 32, 97-101 (2014)).

However, engineered OTS promiscuity for standard amino acids (SAAs) and for undesired NSAAs is a major barrier to expansion of the genetic code. The low fidelity of several OTSs is documented, revealing that even after multiple rounds of negative selection they misacylate tRNA with SAAs that their ancestral variants acted upon, such as tyrosine (Y) and tryptophan (W) (See, K. Oki, K. Sakamoto, T. Kobayashi, H. M. Sasaki, S. Yokoyama, Transplantation of a tyrosine editing domain into a tyrosyl-tRNA synthetase variant enhances its specificity for a tyrosine analog. Proc. Natl. Acad. Sci. U. S. A. 105, 13298-303 (2008); T. S. Young, I. Ahmad, J. A. Yin, P. G. Schultz, An Enhanced System for Unnatural Amino Acid Mutagenesis in E. coli. J. Mol. Biol. 395, 361-374 (2010); A. K. Antonczak et al., Importance of single molecular determinants in the fidelity of expanded genetic codes. Proc. Natl. Acad. Sci. U. S. A. 108, 1320-5 (2011); S. Nehring, N. Budisa, B. Wiltschi, M. Oliveberg, N. Budisa, Performance Analysis of Orthogonal Pairs Designed for an Expanded Eukaryotic Genetic Code. PLoS One. 7, e31992 (2012); J. W. Monk et al., Rapid and Inexpensive Evaluation of Nonstandard Amino Acid Incorporation in Escherichia coli. ACS Synth. Biol. (2016), doi: 10.1021/acssynbio.6b00192). The problem of OTS cross-talk with SAAs is exemplified in the case of biocontainment, which was previously demonstrated based on the NSAA biphenylalanine (BipA) and its corresponding OTS (See, J. Xie, W. Liu, P. G. Schultz, A Genetically Encoded Bidentate, Metal-Binding Amino Acid. Angew. Chemie. 119, 9399-9402 (2007)). Protease mutations were found in sequenced escapees that emerged in the absence of BipA, suggesting that redesigned enzymes intended to be destabilized by SAA misincorporation may transiently remain functional prior to degradation (See, D. J. Mandell et al., Biocontainment of genetically modified organisms by synthetic protein design. Nature. 518, 55-60 (2015)). Furthermore, genomic integration of the BipA OTS, which likely decreased misincorporation, reduced escape frequency. Given that OTS evolution efforts have not selected against activity upon undesired NSAAs, greater promiscuity is expected in the presence of multiple NSAAs. OTS promiscuity is of particular concern when using members of TyrRS/TrpRS/PylRS families together given demonstrated overlap of substrate ranges (See, C. Fan, J. M. L. Ho, N. Chirathivat, D. Söll, Y.-S. Wang, Exploring the Substrate Range of Wild-Type Aminoacyl-tRNA Synthetases. ChemBioChem. 15, 1805-1809 (2014); L.-T. Guo et al., Polyspecific pyrrolysyl-tRNA synthetases from directed evolution. Proc Natl Acad Sci U S A. 111, 16724-16729 (2014); Y.-S. Wang et al., The de novo engineering of pyrrolysyl-tRNA synthetase for genetic incorporation of L-phenylalanine and its derivatives. Mol. Biosyst. 7, 714-717 (2011)). Together, these concerns converge as progress was made towards constructing a 57-codon E. coli strain anticipated to exhibit multi-virus resistance, to require biocontainment, and to serve as a platform for producing proteins containing multiple different NSAAs (See, N. Ostrov et al., Design, synthesis, and testing toward a 57-codon genome. Science (80-. ). 353, 819-822 (2016)). Many other applications utilizing NSAAs, such as protein double labelling, FRET, and antibody conjugation, also require high fidelity incorporation to avoid heterogeneous protein production.

There is a continuing need in the art to develop BipA OTS variants with improved efficiency and accuracy for incorporating desired NSAAs into peptides.

SUMMARY

The present disclosure provides a method of screening for an amino acyl tRNA synthetase variant having preferential selectivity for a desired non-standard amino acid (NSAA) over its standard amino acid (SAA) counterpart or an undesired non-standard amino acid for incorporation into a target polypeptide in a cell. According to one aspect, the method includes providing to the cell an amino acyl tRNA synthetase variant and its cognate transfer RNA corresponding to the desired NSAA, wherein the cell is genetically engineered to express the target polypeptide including an amino acid target location for incorporation of the desired NSAA by the amino acyl tRNA synthetase variant and the transfer RNA, and wherein the cell expresses the target polynucleotide and either a desired NSAA, an SAA or an undesired NSAA is incorporated at the amino acid target location depending on the preferential selectivity of the amino acyl tRNA synthetase variant and the transfer RNA for the corresponding desired NSAA, wherein a removable protecting group is attached to the target polypeptide adjacent to the amino acid target location, such that when the removable protecting group is removed, an N-end amino acid is exposed at the amino acid target location, and wherein a detectable moiety is attached to the C-end of the target polypeptide, wherein the cell expresses an enzyme that cleaves the removable protecting group to generate an N-end amino acid, and wherein the cell further expresses an adaptor protein for a protease, wherein the protease degrades the target polypeptide when the N-end amino acid is an SAA or an undesired NSAA, detecting the detectable moiety as a measure of the amount of target polypeptide including the desired NSAA within the cell, and repeatedly testing an amino acyl tRNA synthetase variant for improved production of the target polypeptide including the desired NSAA.

In one embodiment, the removable protecting group is ubiquitin that is cleavable by Ubp1. In another embodiment, the detectable moiety is a fluorescent moiety or a reporter protein. In certain embodiments, the cell expresses the enzyme for cleaving the removable protecting group constitutively or inducibly. In other embodiments, the adaptor protein and the protease are a ClpS-ClpAP protease system, wherein the ClpS-ClpAP protease system degrades the target polypeptide when the N-end amino acid is an SAA or an undesired NSAA to thereby enrich the target polypeptide including the desired NSAA within the cell. In some embodiments, the adaptor protein comprises a ClpS protein, its natural homolog, or a ClpS_V65I, ClpS_V43I or ClpS_L32F mutant. In some embodiments, the cell is a prokaryotic cell or a eukaryotic cell. In one embodiment, the cell is a bacterium. In another embodiment, the cell is a genetically modified E. coli. In one embodiment, the desired NSAA is biphenylalanine (BipA). In certain embodiments, the amino acyl tRNA synthetase variant is a biphenylalanine amino acyl tRNA synthetase (BipARS) variant. In one embodiment, the amino acyl tRNA synthetase variant is generated by introducing mutations throughout the wild type amino acyl tRNA synthetase gene. In an exemplary embodiment, error-prone PCR is used to introduce mutations throughout the wild type amino acyl tRNA synthetase gene. In one embodiment, the amino acyl tRNA synthetase variant is provided to the cell by a nucleic acid encoding the amino acyl tRNA synthetase variant. In another embodiment, the transfer RNA is provided to the cell by a nucleic acid encoding the transfer RNA.

According to one aspect, the present disclosure provides an amino acyl tRNA synthetase variant comprising variant 1 to variant 11. According to another aspect, the present disclosure provides a nucleic acid encoding the amino acyl tRNA synthetase variants 1 to 11.

According to one aspect, the present disclosure provides a transfer RNA variant comprising variant 4 tRNA, variant 9 tRNA, and variant 10 tRNA. According to another aspect, the present disclosure provides a nucleic acid encoding the transfer RNA variants of variant 4 tRNA, variant 9 tRNA, and variant 10 tRNA.

According to another aspect, the present disclosure provides a biphenylalanine amino acyl tRNA synthetase variant wherein the variant comprises one or more amino acid substitutions to a parental biphenylalanine amino acyl tRNA synthetase having the sequence of

NIDEFEMIKRNTSEIISEEELREVLKKDEKSAHIGFEPSGKIHLGHYL-QIKKMIDLQNAG FDIIMLADLHAYLNQKGELDEIRKIGDYNKK EAMGLKAKYVYGSEWMLDKDYT

LNVYRLALKTTLKRARR

EQRKIHMLARELLPKK CfflNP VLTGLDGEGKMS S SKGNFIAVDD SPEEIRAKD KA YCPAGVVEGOTIMEIAKWLEYPLTK

KNAVAEELIKILEPIRKRL (SEQ ID NO: 1). In some embodiments, the variant includes one or more amino acid substitutions selected from the group consisting of N157K and I255F; R257G; R181C and E259V; I153V and A214T; P37A; K76R; I49F, A130V and A233V; L55M and G158S; D61V and H70Q; and N117D, D200Y, G210S, E237V and D286Y to the parental biphenylalanine amino acyl tRNA synthetase, or an amino acid sequence having at least 90% sequence identity thereof. In an exemplary embodiment, the variant includes amino acid substitutions D61V and H70Q to the parental biphenylalanine amino acyl tRNA synthetase, or an amino acid sequence having at least 90% sequence identity thereof. In one embodiment, an isolated polynucleotide encoding the synthetase variants described herein is provided. In another embodiment, a host cell comprising an expression vector is provided. In one embodiment, the expression vector comprises the polynucleotide encoding the synthetase variants described herein.

According to one aspect, the present disclosure provides a transfer RNA (tRNA) variant wherein the variant comprises one or more nucleotide substitutions to a parental tRNA having the sequence of ccggcggtagttcagcagggcagaacggcggactctaaatccgcatggcaggggttcaaatcccctccgccggacca (SEQ ID NO: 2). In some embodiments, the tRNA variant includes a nucleotide substitution selected from the group consisting of A22G, C67A, C26T, C29A, G51T and G23A to the parental tRNA, or a nucleotide sequence having at least 90% sequence identity thereof. In one embodiment, an isolated polynucleotide encoding each of the tRNA variants described herein is provided. In another embodiment, a host cell comprising an expression vector is provided. In one embodiment, the expression vector comprises the polynucleotide encoding the tRNA variants described herein.
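The substitution notation used above (for example A22G: parental base, position, replacement base) can be applied mechanically to SEQ ID NO: 2. The following Python sketch is illustrative only and is not part of the disclosure. The function names apply_substitutions and percent_identity are hypothetical, positions are assumed to be counted from 1 at the first base of the sequence as written, and identity is computed by simple position-wise comparison of equal-length sequences rather than by alignment.

```python
import re

# Parental tRNA (SEQ ID NO: 2), copied from the text above.
PARENTAL_TRNA = (
    "ccggcggtagttcagcagggcagaacggcggactctaaatccgcatggcaggggttcaaatcccctccgccggacca"
)

# Pattern for notation such as "A22G": old residue, position, new residue.
_SUBSTITUTION = re.compile(r"([A-Za-z])(\d+)([A-Za-z])")


def apply_substitutions(parent, substitutions):
    """Apply substitutions such as "A22G" to a parent sequence.

    Assumption: positions are 1-indexed from the start of the sequence.
    The stated parental residue is checked before it is replaced.
    """
    seq = list(parent.upper())
    for sub in substitutions:
        match = _SUBSTITUTION.fullmatch(sub)
        if match is None:
            raise ValueError(f"unrecognized substitution: {sub!r}")
        old = match.group(1).upper()
        pos = int(match.group(2))
        new = match.group(3).upper()
        if not 1 <= pos <= len(seq):
            raise ValueError(f"position out of range in {sub!r}")
        if seq[pos - 1] != old:
            raise ValueError(
                f"{sub!r} expects {old} at position {pos}, found {seq[pos - 1]}"
            )
        seq[pos - 1] = new
    return "".join(seq)


def percent_identity(a, b):
    """Position-wise percent identity of two equal-length sequences (no alignment)."""
    if not a or len(a) != len(b):
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(x == y for x, y in zip(a.upper(), b.upper()))
    return 100.0 * matches / len(a)


# Example: the G51T substitution applied to the parental tRNA.
g51t_trna = apply_substitutions(PARENTAL_TRNA, ["G51T"])
print(g51t_trna)
print(percent_identity(PARENTAL_TRNA, g51t_trna) >= 90.0)
```

Under this numbering, each of the six listed substitutions finds the stated parental base at the stated position, which is a convenient consistency check on the transcription of the sequence.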

According to another aspect, the present disclosure provides a biphenylalanine amino acyl tRNA synthetase and tRNA pair wherein the pair is selected from the group consisting of i) a biphenylalanine amino acyl tRNA synthetase variant comprising amino acid substitutions N157K and I255F to the parental biphenylalanine amino acyl tRNA synthetase and the parental tRNA; ii) a biphenylalanine amino acyl tRNA synthetase variant comprising an amino acid substitution R257G to the parental biphenylalanine amino acyl tRNA synthetase and the parental tRNA; iii) a biphenylalanine amino acyl tRNA synthetase variant comprising amino acid substitutions R181C and E259V to the parental biphenylalanine amino acyl tRNA synthetase and a tRNA variant comprising a nucleotide substitution A22G to the parental tRNA; iv) a biphenylalanine amino acyl tRNA synthetase variant comprising amino acid substitutions I153V and A214T to the parental biphenylalanine amino acyl tRNA synthetase and a tRNA variant comprising a nucleotide substitution C67A to the parental tRNA; v) a biphenylalanine amino acyl tRNA synthetase variant comprising an amino acid substitution P37A to the parental biphenylalanine amino acyl tRNA synthetase and the parental tRNA; vi) a biphenylalanine amino acyl tRNA synthetase variant comprising an amino acid substitution K76R to the parental biphenylalanine amino acyl tRNA synthetase and the parental tRNA; vii) the parental biphenylalanine amino acyl tRNA synthetase and a tRNA variant comprising a nucleotide substitution A22G to the parental tRNA; viii) a biphenylalanine amino acyl tRNA synthetase variant comprising amino acid substitutions I49F, A130V and A233V to the parental biphenylalanine amino acyl tRNA synthetase and a tRNA variant comprising a nucleotide substitution C26T to the parental tRNA; ix) a biphenylalanine amino acyl tRNA synthetase variant comprising amino acid substitutions L55M and G158S to the parental biphenylalanine amino acyl tRNA synthetase and a tRNA variant comprising a nucleotide substitution C29A to the parental tRNA; x) a biphenylalanine amino acyl tRNA synthetase variant comprising amino acid substitutions D61V and H70Q to the parental biphenylalanine amino acyl tRNA synthetase and a tRNA variant comprising a nucleotide substitution G51T to the parental tRNA; and xi) a biphenylalanine amino acyl tRNA synthetase variant comprising amino acid substitutions N117D, D200Y, G210S, E237V and D286Y to the parental biphenylalanine amino acyl tRNA synthetase and a tRNA variant comprising a nucleotide substitution G23A to the parental tRNA.
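For readers who want the eleven synthetase/tRNA pairs in machine-readable form, the listing above can be captured as a small lookup table. This Python sketch is a convenience transcription and not part of the disclosure. The names OtsPair and BIPA_OTS_PAIRS are hypothetical, the keys are the item numbers i) through xi) written as integers, the fourth item is read as iv), and the first substitution of item xi) is read as N117D; all of these are assumptions about the text as transcribed. An empty tuple denotes the parental sequence.

```python
from typing import NamedTuple, Tuple


class OtsPair(NamedTuple):
    """One synthetase/tRNA pair; empty tuples mean the parental sequence."""
    synthetase_substitutions: Tuple[str, ...]
    trna_substitutions: Tuple[str, ...]


# Keys 1-11 correspond to items i) through xi) in the paragraph above.
BIPA_OTS_PAIRS = {
    1: OtsPair(("N157K", "I255F"), ()),
    2: OtsPair(("R257G",), ()),
    3: OtsPair(("R181C", "E259V"), ("A22G",)),
    4: OtsPair(("I153V", "A214T"), ("C67A",)),
    5: OtsPair(("P37A",), ()),
    6: OtsPair(("K76R",), ()),
    7: OtsPair((), ("A22G",)),
    8: OtsPair(("I49F", "A130V", "A233V"), ("C26T",)),
    9: OtsPair(("L55M", "G158S"), ("C29A",)),
    10: OtsPair(("D61V", "H70Q"), ("G51T",)),
    11: OtsPair(("N117D", "D200Y", "G210S", "E237V", "D286Y"), ("G23A",)),
}


def pairs_with_parental_trna(pairs):
    """Return the item numbers whose tRNA is the unmodified parental tRNA."""
    return sorted(n for n, pair in pairs.items() if not pair.trna_substitutions)


print(pairs_with_parental_trna(BIPA_OTS_PAIRS))
print(BIPA_OTS_PAIRS[10])
```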

BRIEF DESCRIPTION OF THE DRAWINGS

The patent or application file contains at least one drawing executed in color. Copies of this patent or patent application publication with color drawing(s) will be provided by the Office upon request and payment of the necessary fee. The foregoing and other features and advantages of the present invention will be more fully understood from the following detailed description of illustrative embodiments taken in conjunction with the accompanying drawings in which:

Figs. 1A-1G illustrate the use of post-translational proofreading (PTP) for selective BipA OTS evolution. Fig. 1A shows the FACS evolution scheme with EP-PCR AARS libraries transformed into hosts with PTP (using ClpS_V65I) genomically integrated before 3 sorting rounds. Fig. 1B shows evaluation of the most enriched evolved BipARS variants on a panel of NSAAs ([BipA] = 100 μM, [rest] = 1 mM, which are their standard concentrations). Fig. 1C shows the in vitro amino acid substrate specificity profile of BipA OTS variants. Fig. 1D shows escape frequencies over time for adk.d6 strains transformed with the constructs indicated in the legend. Navy circles represent previously published data. Gray circles for adk.d6 represent a repeat of previously published data. Green and yellow circles are the most relevant constructs to compare for this study. KA: Kanamycin+Arabinose. SCA: SDS+Chloramphenicol+Arabinose. Error bars in Figs. 1D-1F represent SEM, N=3. Fig. 1E shows escape frequencies over time for tyrS.d8 strains. Lines represent the assay detection limit in cases where no colonies were observed. Fig. 1F shows escape frequencies over time for adk.d6/tyrS.d8 strains. Fig. 1G shows doubling time for biocontained strains with WT or Variant 10 OTS. Error bars = SD, N=3.

Fig. 2 shows FACS data from BipARS EP-PCR library exposed to negative screens of differing stringency.

Figs. 3A-3D show confirmation of BipA incorporation by mass spectrometry (MS). Fig. 3A shows SDS-PAGE gel of Ni-NTA purified Ub-X-GFP reporter proteins. Fig. 3B shows MS trace indicating incorporation of tyrosine in position X in peptide GGXLFVQELASK (SEQ ID NO: 3) (positions 75-86 of Ub-X-GFP) using WT BipA OTS and no addition of BipA. Fig. 3C shows MS trace indicating incorporation of BipA in position X of the same peptide using WT BipA OTS in the presence of BipA. Fig. 3D shows MS trace indicating incorporation of BipA in position X of the same peptide using BipA 10 OTS in the presence of BipA.

Figs. 4A-4B show sample images of plates depicting biocontainment escape frequency estimation. Fig. 4A shows total CFU estimation on permissive media. Fig. 4B shows escapee estimation on non-permissive media.

Figs. 5A-5D show spontaneous tRNA mutations observed in sorted variants and effect on selectivity. Fig. 5A shows positions of observed tRNA mutations on the predicted MjTyrRS tRNAopt structure. Note that the position of the BipA OTS Variant 10 tRNA is the most influential for interaction with elongation factor Tu (EF-Tu). Figure 5A discloses SEQ ID NO: 153. Fig. 5B shows FL/OD measurements after cloning each combination of BipARS and tRNA variant. Each of the 3 variant tRNAs confers selectivity against standard amino acids (represented by the "No NSAA" case) regardless of the BipARS pairing. Variant 10 BipARS with Variant 10 tRNA is the most selective for BipA compared to the other NSAAs shown above. Fig. 5C shows in vitro amino acid substrate specificity of Variant 9 BipARS with WT tRNA or Variant 9 tRNA. Fig. 5D shows in vitro amino acid substrate specificity of Variant 10 BipARS with WT tRNA or Variant 10 tRNA.

Fig. 6 shows a single-UAG suppression sensitivity assay with and without PTP (using ClpS_V65I, which does not degrade pAcF or pAzF), revealing that AARSs evolved using a strategy geared towards multi-UAG suppression (See, M. Amiram et al., Evolution of translation machinery in recoded bacteria enables multi-site incorporation of nonstandard amino acids. Nat Biotech. 33, 1272-1279 (2015)) display very low fidelity for single UAG sites. Progenitor: pAcFRS_D286R. Evolved strains: pAzFRS.1.t1 and pAcFRS.2.t1.
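The escape-frequency estimation depicted in Figs. 4A-4B, and the detection limit mentioned for Fig. 1E, can be illustrated with a short calculation. The following Python sketch is an assumption-laden illustration and is not taken from the disclosure: it assumes escape frequency is computed as escapee CFU on non-permissive media divided by total CFU on permissive media, and that when no escapee colonies are observed the value reported is the frequency corresponding to a single colony. The function names, the default plated volume, and the example counts are hypothetical.

```python
def cfu_per_ml(colonies, dilution_factor, volume_plated_ml):
    """Colony-forming units per mL of the undiluted culture."""
    if dilution_factor <= 0 or volume_plated_ml <= 0:
        raise ValueError("dilution factor and plated volume must be positive")
    return colonies * dilution_factor / volume_plated_ml


def escape_frequency(escapee_colonies, escapee_dilution,
                     total_colonies, total_dilution,
                     volume_plated_ml=0.1):
    """Return (frequency, at_detection_limit).

    Assumed definition: escapee CFU/mL on non-permissive media divided by
    total CFU/mL on permissive media. If no escapee colonies are observed,
    the frequency for one colony is returned and flagged as a detection limit.
    """
    total = cfu_per_ml(total_colonies, total_dilution, volume_plated_ml)
    if total <= 0:
        raise ValueError("total CFU must be positive")
    if escapee_colonies == 0:
        limit = cfu_per_ml(1, escapee_dilution, volume_plated_ml) / total
        return limit, True
    escapees = cfu_per_ml(escapee_colonies, escapee_dilution, volume_plated_ml)
    return escapees / total, False


# Hypothetical counts: 200 colonies at a 1e6-fold dilution on permissive media,
# 2 colonies from undiluted culture on non-permissive media.
freq, at_limit = escape_frequency(2, 1, 200, 1e6)
print(freq, at_limit)
```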

DETAILED DESCRIPTION

The present disclosure provides a method of screening for an amino acyl tRNA synthetase variant having preferential selectivity for a desired non-standard amino acid (NSAA) over its standard amino acid (SAA) counterpart or an undesired non-standard amino acid for incorporation into a target polypeptide in a cell. As used herein, the terms "polypeptide" and "protein" include compounds that include amino acids joined together by peptide bonds between the carboxyl and amino groups of adjacent amino acid residues. Exemplary cells include prokaryotic cells and eukaryotic cells. Exemplary prokaryotic cells include bacteria, such as E. coli, such as genetically modified E. coli. In exemplary embodiments, the cell is genetically modified to express the target polypeptide including an amino acid target location for incorporation of a desired non-standard amino acid substitution by an engineered amino-acyl tRNA synthetase variant and transfer RNA pair corresponding to the non-standard amino acid. A removable protecting group is attached to the target polypeptide adjacent to the amino acid target location, such that when the removable protecting group is removed, an N-end amino acid is exposed at the amino acid target location. According to one aspect, the removable protecting group is orthogonal within the cell in which it is being used.

According to certain aspects, the cell includes a protease system for degrading the target polypeptide when the N-end amino acid is a standard amino acid. According to certain aspects, the cell includes a protease system for degrading the target polypeptide when the N-end amino acid is an undesired NSAA. According to certain aspects, the protease system includes an adapter protein and a corresponding protease. The adapter protein coordinates with the protease for degrading the target polypeptide when the N-end amino acid is a standard amino acid. According to one aspect, the protease system is endogenous. According to one aspect, the protease and adaptor can be expressed constitutively. According to one aspect, the protease system is exogenous. According to one aspect, the protease system is under influence of a promoter. According to one aspect, the adapter protein of the protease system is under influence of an inducible promoter. According to one aspect, the adapter protein is upregulated. According to one aspect, overexpression of adaptor to produce adaptor levels in excess of that found normally within a cell improves degradation of polypeptides having an undesired amino acid at the amino acid target location. According to one aspect, an adaptor protein is provided that facilitates N-end rule classification of an NSAA (See, D. B. F. Johnson et al., RF1 knockout allows ribosomal incorporation of unnatural amino acids at multiple sites. Nat Chem Biol. 7, 779-786 (2011); P. O'Donoghue et al., Near-cognate suppression of amber, opal and quadruplet codons competes with aminoacyl-tRNAPyl for genetic code expansion. FEBS Lett. 586, 3931-3937 (2012); A. Bachmair, D. Finley, A. Varshavsky, In vivo half-life of a protein is a function of its amino-terminal residue. Science (80-. ). 234, 179-186 (1986); J. W. Tobias, T. E. Shrader, G. Rocap, A. Varshavsky, The N-end rule in bacteria. Science (80-. ). 254, 1374-1377 (1991); T. Tasaki, S. M. Sriram, K. S. Park, Y. T. Kwon, The N-End Rule Pathway. Annu. Rev. Biochem. 81, 261-289 (2012)).

Because the N-end rule pathway of protein degradation is conserved across prokaryotes and eukaryotes, methods described herein are useful in prokaryotes and eukaryotes. The removable protecting group should be orthogonal in the cell within which it is being used. Ubiquitin is a suitable protecting group in prokaryotic cells because it is orthogonal, but it is not a suitable protecting group in eukaryotic cells because it is not orthogonal. In eukaryotic cells, ubiquitin is N-terminally added to proteins, often to initiate the process of protein degradation in the proteasome. In addition, the adaptor proteins in eukaryotic cells are homologs of ClpS known as ubiquitin E3 ligases. According to the present disclosure, a ubiquitin E3 ligase domain is altered in order to change the N-end rule classification of an NSAA.

According to one aspect, the removable protecting group is removed to generate an N-end amino acid, and the protease degrades the target polypeptide when the N-end amino acid is a standard amino acid or an undesired NSAA. In this manner, the target polypeptide including a desired non-standard amino acid substitution, i.e. which is resistant to degradation, is enriched within the cell. According to one aspect, embodiments of the disclosure are directed to methods that allow selective degradation of proteins having a standard amino acid or undesired NSAA instead of a desired nonstandard amino acid at their N-termini in a cell. The methods can be used for producing proteins with desired nonstandard amino acids at their N-termini with no detectable impurities.

According to one aspect, a method of identifying the presence of a target polypeptide including a desired non-standard amino acid, i.e. one which is resistant to degradation, is provided. According to this aspect, the target polypeptide includes a detectable moiety attached to the C-end of the target polypeptide. In this manner, if the target polypeptide (and detectable moiety) that is made by the cell is not subject to degradation as described above, then the detectable moiety is detected as a measure of the amount of target polypeptide generated by the cell. Accordingly, a method is provided where a detectable moiety is present at the C-end of the target polypeptide, the removable protecting group is removed to generate an N-end amino acid, the protease (whether accompanied by an adapter protein or not depending upon the protease system being used) degrades the target polypeptide when the N-end amino acid is a standard amino acid or an undesired NSAA, for example, to thereby enrich the target polypeptide including a desired non-standard amino acid substitution, and the detectable moiety is detected as a measure of the amount of the target polypeptide including a desired non-standard amino acid substitution.

According to one aspect, a method is provided for screening for amino acyl tRNA synthetase variants that are more selective for incorporating non-standard amino acids versus standard amino acids at a selected site in a protein. Since all or substantially all of the proteins bearing a standard amino acid or an undesired NSAA at their N-terminus are degraded, leaving only proteins with a desired nonstandard amino acid at their N-terminus, no or substantially no background signal due to standard amino acid or undesired NSAA incorporation results from the method. Synthetases can be evolved and their variants screened in a high-throughput fashion for their function of producing a protein incorporating a nonstandard amino acid, such as a desired NSAA. In this manner, those synthetases with improved function can be identified and modified further to improve efficiency and selectivity.
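The logic of the screen described above, in which reporter molecules bearing an SAA or undesired NSAA at the exposed N-end are degraded and only the remainder contribute signal, can be illustrated with a toy calculation. This Python sketch is a simplified illustration under stated assumptions and not the method of the disclosure: degradation is treated as all-or-nothing, the set of degraded N-end residues is a user-supplied parameter, and the function name reporter_signal and the example fractions are hypothetical.

```python
def reporter_signal(n_end_fractions, degraded_residues, proofreading=True):
    """Fraction of reporter molecules that remain detectable (toy model).

    n_end_fractions maps the residue incorporated at the target location to
    the fraction of reporter molecules carrying it. With proofreading on,
    molecules whose N-end residue is in degraded_residues contribute nothing.
    """
    if not proofreading:
        return sum(n_end_fractions.values())
    return sum(
        fraction
        for residue, fraction in n_end_fractions.items()
        if residue not in degraded_residues
    )


# Assumed example: tyrosine (Y) and tryptophan (W) are treated as degraded.
DEGRADED = {"Y", "W"}

with_bipa = {"BipA": 0.8, "Y": 0.2}   # hypothetical incorporation fractions
without_bipa = {"Y": 1.0}             # misincorporation only

print(reporter_signal(with_bipa, DEGRADED))
print(reporter_signal(without_bipa, DEGRADED))
print(reporter_signal(without_bipa, DEGRADED, proofreading=False))
```

In this toy model the signal in the absence of the desired NSAA drops to zero when proofreading is active, which mirrors the statement that substantially no background signal results from SAA incorporation.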

I. Recoding Cells That Incorporate a Desired NSAA in a Target Polypeptide

In general, a cell can be genetically modified to include a nucleic acid sequence which encodes for the target polypeptide that incorporates one or more non-standard amino acids within its amino acid sequence. The cell can be genomically recoded ("a genomically recoded organism") to the extent that one or more codons have been reassigned to encode for a nonstandard amino acid. For each different non-standard amino acid, an amino-acyl tRNA synthetase/tRNA pair is engineered and the cell is capable of using the amino-acyl tRNA synthetase/tRNA pair to add the corresponding non-standard amino acid (when present in the cell) to a growing peptide sequence. Materials, conditions, and reagents for genetically modifying a cell to make a target protein having one or more amino acid sequences are described in the following references, each of which is hereby incorporated by reference in its entirety.

Approaches to genomically recode organisms include multiplex automatable genome engineering (MAGE) (for example, as described in Wang, Harris H., et al. "Programming cells by multiplex genome engineering and accelerated evolution." Nature 460.7257 (2009): 894-898 hereby incorporated by reference in its entirety) and hierarchical conjugative assembly genome engineering (CAGE) (for example, as described in Isaacs, Farren J., et al. "Precise manipulation of chromosomes in vivo enables genome-wide codon replacement." Science 333.6040 (2011): 348-353 hereby incorporated by reference in its entirety). In addition, portions of recoded genomes can be synthesized and subsequently assembled, as described recently in an effort to construct a 57-codon organism (for example, as described in Ostrov, Nili, et al. "Design, synthesis, and testing toward a 57-codon genome." Science 353.6301 (2016): 819-822 hereby incorporated by reference in its entirety). The modification of an organism, whether recoded or not recoded, in order to express a polypeptide containing a site-specific non-standard amino acid has been described extensively in the literature (for example, as described in Wang, Lei, et al. "Expanding the genetic code of Escherichia coli." Science 292.5516 (2001): 498-500; Chin, Jason W., et al. "An expanded eukaryotic genetic code." Science 301.5635 (2003): 964-967; Wang, Lei, and Peter G. Schultz. "Expanding the genetic code." Angewandte chemie international edition 44.1 (2005): 34-66; Liu, Chang C., and Peter G. Schultz. "Adding new chemistries to the genetic code." Annual review of biochemistry 79 (2010): 413-444; Chin, Jason W. "Expanding and reprogramming the genetic code of cells and animals." Annual review of biochemistry 83 (2014): 379-408 each of which is hereby incorporated by reference in its entirety).
In brief, foreign nucleic acid sequences containing a gene encoding an orthogonal amino-acyl tRNA synthetase and an associated tRNA are introduced into an organism, typically in an expression vector. In addition, a desired non-standard amino acid is added to the cell culture medium. A nucleic acid sequence corresponding to a target protein is modified so that a free codon, such as the UAG codon, is formed at the target site of the gene encoding the target protein. In the presence of these four components - aminoacyl tRNA synthetase protein, tRNA, NSAA, and target protein mRNA - the target protein containing the NSAA is made.
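The step of forming a UAG codon at the target site of the gene can be illustrated at the sequence level. The following Python sketch is a minimal illustration and not a protocol from the disclosure: the function name introduce_amber_codon is hypothetical, codons are assumed to be numbered from 1 starting at the first base of an in-frame coding sequence, and the short example sequence is invented for demonstration. The DNA-level codon TAG corresponds to UAG in the mRNA.

```python
def introduce_amber_codon(coding_sequence, codon_number):
    """Replace one codon of an in-frame DNA coding sequence with TAG.

    Assumption: codon_number is 1-indexed and the sequence length is a
    multiple of three. TAG in the DNA is transcribed to the UAG codon.
    """
    cds = coding_sequence.upper()
    if len(cds) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of 3")
    n_codons = len(cds) // 3
    if not 1 <= codon_number <= n_codons:
        raise ValueError("codon number out of range")
    start = 3 * (codon_number - 1)
    return cds[:start] + "TAG" + cds[start + 3:]


# Invented example: ATG GCT TAC AAA TAA, with the third codon (TAC) replaced.
example = introduce_amber_codon("ATGGCTTACAAATAA", 3)
print(example)
```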

Basic to the present disclosure is the use of an amino-acyl tRNA synthetase/tRNA pair cognate to a nonstandard amino acid. Exemplary amino-acyl tRNA synthetase/tRNA pairs cognate to a nonstandard amino acid are known to those of skill in the art or may be designed for particular non-standard amino acids, as is known in the art or as described in Wang, Lei, and Peter G. Schultz. "Expanding the genetic code." Angewandte chemie international edition 44.1 (2005): 34-66; Liu, Chang C., and Peter G. Schultz. "Adding new chemistries to the genetic code." Annual review of biochemistry 79 (2010): 413-444; and Chin, Jason W. "Expanding and reprogramming the genetic code of cells and animals." Annual review of biochemistry 83 (2014): 379-408 each of which is hereby incorporated by reference in its entirety.

According to one aspect, the amino-acyl tRNA synthetase/tRNA pair cognate to a nonstandard amino acid is orthogonal to the cellular components of the cell in which it is used. The orthogonality (and therefore the suitability) of exogenous amino-acyl tRNA synthetase/tRNA pairs is dependent on the type of host organism. Four main orthogonal aminoacyl-tRNA synthetase/tRNA pairs have been developed for genetic code expansion: the Methanococcus jannaschii tyrosyl-tRNA synthetase (MjTyrRS)/tRNACUA pair, the Escherichia coli tyrosyl-tRNA synthetase (EcTyrRS)/tRNACUA pair, the E. coli leucyl-tRNA synthetase (EcLeuRS)/tRNACUA pair, and pyrrolysyl-tRNA synthetase (PylRS)/tRNACUA pairs from certain Methanosarcina. The MjTyrRS/tRNACUA pair is orthogonal in E. coli but not in eukaryotic cells. The EcTyrRS/tRNACUA pair and the EcLeuRS/tRNACUA pair are orthogonal in eukaryotic cells but not in E. coli, whereas the PylRS/tRNACUA pair is orthogonal in bacteria, eukaryotic cells, and animals (see Chin, Jason W. "Expanding and reprogramming the genetic code of cells and animals." Annual review of biochemistry 83 (2014): 379-408, hereby incorporated by reference in its entirety). To maintain orthogonality, the exogenous amino acyl tRNA synthetase should not recognize any native amino acids or native tRNA. To maintain orthogonality, the tRNA should not be recognized by any native amino-acyl tRNA synthetases. To maintain orthogonality, the non-standard amino acid should not be recognized by any native amino acyl tRNA synthetases. "Orthogonal" pairs meet one or more of the above conditions. It is to be understood that "orthogonal" pairs may lead to some mischarging, such as insubstantial mischarging of orthogonal tRNA with native amino acids, so long as sufficient efficiency of charging with the designed NSAA occurs.

Exemplary families of synthetases for bacteria in addition to those described above and incorporated by reference include the PylRS/tRNACUA pair and the Saccharomyces cerevisiae tryptophanyl-tRNA synthetase (ScWRS)/tRNACUA pair. These exemplary synthetase families have natural analogs (lysine and tryptophan) that are N-end destabilizing amino acids. The following references describe useful synthetase families and their associated NSAAs: Blight, Sherry K., et al. "Direct charging of tRNACUA with pyrrolysine in vitro and in vivo." Nature 431.7006 (2004): 333-335; Namy, Olivier, et al. "Adding pyrrolysine to the Escherichia coli genetic code." FEBS letters 581.27 (2007): 5282-5288; Hughes, Randall A., and Andrew D. Ellington. "Rational design of an orthogonal tryptophanyl nonsense suppressor tRNA." Nucleic acids research 38.19 (2010): 6813-6830; Ellefson, Jared W., et al. "Directed evolution of genetic parts and circuits by compartmentalized partnered replication." Nature Biotechnology 32.1 (2014): 97-101; and Chatterjee, Abhishek, et al. "A Tryptophanyl-tRNA Synthetase/tRNA Pair for Unnatural Amino Acid Mutagenesis in E. coli." Angewandte Chemie International Edition 52.19 (2013): 5106-5109, each of which is hereby incorporated by reference in its entirety. As is known in the art, the synthetase catalyzes a reaction that attaches the nonstandard amino acid to the correct tRNA. The amino-acyl tRNA then migrates to the ribosome. The ribosome adds the nonstandard amino acid where the tRNA anticodon corresponds to the reverse complement of the codon on the mRNA of the target protein to be translated.
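
The codon/anticodon relationship described above can be checked with a short Python sketch. The helper name `anticodon_for` is hypothetical; the sketch simply computes the reverse complement of an RNA codon, which for the amber codon UAG gives the CUA anticodon that the tRNACUA notation refers to.

```python
# Watson-Crick complement table for RNA bases.
_COMPLEMENT = str.maketrans("ACGU", "UGCA")


def anticodon_for(codon: str) -> str:
    """Return the anticodon (written 5'->3') that pairs with an mRNA codon."""
    codon = codon.upper().replace("T", "U")  # accept DNA-style input as well
    if len(codon) != 3 or set(codon) - set("ACGU"):
        raise ValueError("codon must be three bases drawn from A, C, G, U/T")
    # Complement each base, then reverse to read the anticodon 5'->3'.
    return codon.translate(_COMPLEMENT)[::-1]


print(anticodon_for("UAG"))  # CUA
```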

II. Orthogonal Translation Systems and Variants Thereof

According to one aspect, a method is provided for screening for amino acyl tRNA synthetase variants and their cognate transfer RNA variants having improved selectivity for incorporating a desired non-standard amino acid versus a standard amino acid or an undesired non-standard amino acid at a selected site in a protein or a polypeptide. According to one aspect, the screening is based on using prokaryotic or eukaryotic cells containing a ClpS-ClpAP protease system. In certain exemplary embodiments, the protease system includes the adaptor protein ClpS or homologs or mutants thereof, such as ClpS_V65I, ClpS_V43I or ClpS_L32F. In certain embodiments, adaptor protein ClpS variants including ClpS_V65I, ClpS_V43I or ClpS_L32F are used since they exhibit improved selectivity for certain amino acids, such as between standard amino acids and non-standard amino acids or between a desired NSAA and an undesired NSAA.

In exemplary embodiments, biphenylalanine (BipA) aminoacyl-tRNA synthetase (BipARS) variants are generated by making one or more amino acid substitutions of a parental biphenylalanine amino acyl tRNA synthetase having the amino acid sequence of MDEFEMI RNT SEESEEELRE VLKKDEK S AHIGFEP SGKIHLGHYLQIKKMIDLQNAG FDEIFILADLHAYLNQKGELDEIRKIGDYNKK EAMGLKAKYVYGSEWMLDKDYT LNVYRLAIJ TTLKRARRSMEU^

EQRKIHMLARELLPKKVVCIHNPVLTGLDGEGKMSSS GNFIAVDDSPEEERAKI KA

YCI^AGVVEGNPIMEIAKYFLEYPL^'Il M⁵E FGGDLT\^rNSYEELESLFKNKELHl¾lDL KNAVAEELIKILEPIRKRL (SEQ ID NO: 1). In this manner, synthetases can be evolved and their variants screened in a high-throughput fashion for their function of producing a protein or polypeptide incorporating a biphenylalanine at a desired position in the protein or polypeptide. In this manner, those synthetases with improved function can be identified and modified further to improve efficiency and selectivity. In some embodiments, the synthetase variant includes at least one, two, three, four, five, six, seven, eight, nine or ten amino acid substitutions of the parental synthetase. In some embodiments, the synthetase variant includes from about ten to about twenty, or from about twenty to about fifty, amino acid substitutions of the parental synthetase. In certain embodiments, the synthetase variant includes one or more amino acid substitutions selected from the group consisting of N157K and I255F; R257G; R181C and E259V; I153V and A214T; P37A; K76R; I49F, A130V and A233V; L55M and G158S; D61V and H70Q; and N117D, D200Y, G210S, E237V and D286Y to the parental biphenylalanine amino acyl tRNA synthetase, or an amino acid sequence having at least 50%, at least 60%, at least 70%, at least 80%, e.g., at least 81%, at least 82%, at least 83%, at least 84%, at least 85%, at least 86%, at least 87%, at least 88%, at least 89%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or 100% sequence identity thereto.
In an exemplary embodiment, the variant includes amino acid substitutions D61V and H70Q to the parental biphenylalanine amino acyl tRNA synthetase, or an amino acid sequence having at least 50%, at least 60%, at least 70%, at least 80%, e.g., at least 81%, at least 82%, at least 83%, at least 84%, at least 85%, at least 86%, at least 87%, at least 88%, at least 89%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or 100% sequence identity thereto.

According to one aspect, the present disclosure provides a transfer RNA (tRNA) variant wherein the variant comprises one or more nucleotide substitutions to a parental tRNA having the sequence of ccggcggtagttcagcagggcagaacggcggactctaaatccgcatggcaggggttcaaatcccctccgccggacca (SEQ ID NO: 2). In some embodiments, the tRNA variant includes at least one, two, three, four, five, six, seven, eight, nine or ten nucleotide substitutions of the parental tRNA. In some embodiments, the tRNA variant includes from about ten to about twenty, or from about twenty to about fifty, nucleotide substitutions of the parental tRNA. In certain embodiments, the tRNA variant includes a nucleotide substitution selected from the group consisting of A22G, C67A, C26T, C29A, G51T and G23A to the parental tRNA, or a nucleotide sequence having at least 50%, at least 60%, at least 70%, at least 80%, e.g., at least 81%, at least 82%, at least 83%, at least 84%, at least 85%, at least 86%, at least 87%, at least 88%, at least 89%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or 100% sequence identity thereto.
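
As an illustration of the substitution notation and the percent-identity language above, the following Python sketch applies a named substitution to the parental tRNA sequence and computes identity to the parent. It assumes that positions are 1-based against SEQ ID NO: 2 as printed (under that assumption each of A22G, C67A, C26T, C29A, G51T and G23A matches the parental base), and it computes identity position by position without alignment, which is a simplification that is only meaningful for equal-length sequences. The function names are hypothetical.

```python
import re

# Parental tRNA, SEQ ID NO: 2 as printed above (DNA alphabet, 77 nt).
PARENTAL_TRNA = (
    "ccggcggtagttcagcagggcagaacggcggactctaaatccgcatggcaggggttcaaatcccctccgccggacca"
)

_SUB = re.compile(r"^([A-Za-z])(\d+)([A-Za-z])$")  # e.g. "A22G"


def apply_substitutions(parent: str, substitutions: list[str]) -> str:
    """Apply substitutions such as 'A22G' (old base, 1-based position, new base)."""
    seq = list(parent)
    for sub in substitutions:
        match = _SUB.match(sub)
        if match is None:
            raise ValueError(f"unrecognized substitution: {sub!r}")
        old, pos, new = match.group(1), int(match.group(2)), match.group(3)
        if not 1 <= pos <= len(seq):
            raise ValueError(f"{sub}: position outside the sequence")
        if seq[pos - 1].upper() != old.upper():
            raise ValueError(f"{sub}: parent has {seq[pos - 1]!r} at position {pos}")
        # Preserve the letter case used by the parental sequence.
        seq[pos - 1] = new.lower() if seq[pos - 1].islower() else new.upper()
    return "".join(seq)


def percent_identity(a: str, b: str) -> float:
    """Ungapped percent identity between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("this simplified measure requires equal lengths")
    matches = sum(x.upper() == y.upper() for x, y in zip(a, b))
    return 100.0 * matches / len(a)


variant = apply_substitutions(PARENTAL_TRNA, ["A22G"])
print(round(percent_identity(PARENTAL_TRNA, variant), 1))  # 98.7
```

A single substitution in the 77-nucleotide parent leaves 76 of 77 positions unchanged, so such a variant falls within the "at least 98%" identity tier recited above.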

According to another aspect, the present disclosure provides a biphenylalanine amino acyl tRNA synthetase and tRNA pair. In certain embodiments, the pair includes either or both of a biphenylalanine amino acyl tRNA synthetase variant and a tRNA variant. In some embodiments, the pair includes i) a biphenylalanine amino acyl tRNA synthetase variant comprising amino acid substitutions N157K and I255F to the parental biphenylalanine amino acyl tRNA synthetase and the parental tRNA; ii) a biphenylalanine amino acyl tRNA synthetase variant comprising an amino acid substitution R257G to the parental biphenylalanine amino acyl tRNA synthetase and the parental tRNA; iii) a biphenylalanine amino acyl tRNA synthetase variant comprising amino acid substitutions R181C and E259V to the parental biphenylalanine amino acyl tRNA synthetase and a tRNA variant comprising a nucleotide substitution A22G to the parental tRNA; iv) a biphenylalanine amino acyl tRNA synthetase variant comprising amino acid substitutions I153V and A214T to the parental biphenylalanine amino acyl tRNA synthetase and a tRNA variant comprising a nucleotide substitution C67A to the parental tRNA; v) a biphenylalanine amino acyl tRNA synthetase variant comprising an amino acid substitution P37A to the parental biphenylalanine amino acyl tRNA synthetase and the parental tRNA; vi) a biphenylalanine amino acyl tRNA synthetase variant comprising an amino acid substitution K76R to the parental biphenylalanine amino acyl tRNA synthetase and the parental tRNA; vii) the parental biphenylalanine amino acyl tRNA synthetase and a tRNA variant comprising a nucleotide substitution A22G to the parental tRNA; viii) a biphenylalanine amino acyl tRNA synthetase variant comprising amino acid substitutions I49F, A130V and A233V to the parental biphenylalanine amino acyl tRNA synthetase and a tRNA variant comprising a nucleotide substitution C26T to the parental tRNA; ix) a biphenylalanine amino acyl tRNA synthetase variant comprising amino acid substitutions L55M and G158S to the parental biphenylalanine amino acyl tRNA synthetase and a tRNA variant comprising a nucleotide substitution C29A to the parental tRNA; x) a biphenylalanine amino acyl tRNA synthetase variant comprising amino acid substitutions D61V and H70Q to the parental biphenylalanine amino acyl tRNA synthetase and a tRNA variant comprising a nucleotide substitution G51T to the parental tRNA; or xi) a biphenylalanine amino acyl tRNA synthetase variant comprising amino acid substitutions N117D, D200Y, G210S, E237V and D286Y to the parental biphenylalanine amino acyl tRNA synthetase and a tRNA variant comprising a nucleotide substitution G23A to the parental tRNA.
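
For readers who want to work with the eleven synthetase/tRNA pairs programmatically, the following Python sketch records them in the order listed above. The container type `OtsPair`, the integer keys and the `describe` helper are hypothetical conveniences; an empty tuple stands for the unmodified parental synthetase or parental tRNA.

```python
from typing import NamedTuple


class OtsPair(NamedTuple):
    synthetase: tuple  # amino acid substitutions to the parental BipARS
    trna: tuple        # nucleotide substitutions to the parental tRNA


# Keys 1-11 follow the order in which the pairs are listed in the text.
PAIRS = {
    1: OtsPair(("N157K", "I255F"), ()),
    2: OtsPair(("R257G",), ()),
    3: OtsPair(("R181C", "E259V"), ("A22G",)),
    4: OtsPair(("I153V", "A214T"), ("C67A",)),
    5: OtsPair(("P37A",), ()),
    6: OtsPair(("K76R",), ()),
    7: OtsPair((), ("A22G",)),
    8: OtsPair(("I49F", "A130V", "A233V"), ("C26T",)),
    9: OtsPair(("L55M", "G158S"), ("C29A",)),
    10: OtsPair(("D61V", "H70Q"), ("G51T",)),
    11: OtsPair(("N117D", "D200Y", "G210S", "E237V", "D286Y"), ("G23A",)),
}


def describe(pair: OtsPair) -> str:
    """Render a pair as a compact label; 'parental' marks an unmodified partner."""
    synthetase = "+".join(pair.synthetase) or "parental"
    trna = "+".join(pair.trna) or "parental"
    return f"BipARS[{synthetase}] / tRNA[{trna}]"


for number, pair in PAIRS.items():
    print(number, describe(pair))
```

Keeping the synthetase and tRNA substitutions together in one record reflects that each listed embodiment is a pair, not an independent choice of synthetase and tRNA.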

III. Removable Protecting Groups

According to one aspect, the target polypeptide includes a removable protecting group adjacent to the amino acid target location such that when the removable protecting group is removed, the amino acid target location is an N-end amino acid. Exemplary removable protecting groups are known to those of skill in the art and can be readily identified in the literature based on the present disclosure. According to one aspect, the removable protecting group is a peptide sequence produced by the cell when making the target polypeptide, such that the removable protecting group and the target polypeptide are a fusion. According to this aspect, the cell is genetically modified to include a foreign nucleic acid sequence encoding the target polypeptide including a non-standard amino acid substitution at an amino acid target location and a removable protecting group attached to the target polypeptide adjacent to the amino acid target location. According to one aspect, the removable protecting group is foreign to the cell, i.e. it is not endogenous to the cell. In this manner, the removable protecting group is orthogonal to endogenous enzymes or other conditions within the cell. An exemplary removable protecting group includes a cleavable protecting group, such as an enzyme cleavable protecting group. According to one aspect, the cell produces an enzyme that cleaves the removable protecting group to generate an N-end amino acid. An exemplary removable protecting group is a protein that is cleavable by a corresponding enzyme. According to one aspect, the enzyme that cleaves the removable protecting group is foreign to the cell and is not endogenous. According to one aspect, an exemplary removable protecting group for prokaryotic cells is ubiquitin that is cleavable by Ubp1.
According to another aspect, an exemplary removable protecting group for eukaryotic cells is the sequence MENLYFQ/* (SEQ ID NO: 4), where "*" is the target position for the NSAA (known in the field as the P1' position), where "/" represents the cut site, and where "ENLYFQ/*" (SEQ ID NO: 5) is the sequence that is cleavable by certain variants of TEV protease. Ordinarily, TEV protease cleavage efficiency is influenced by the choice of the amino acid at the P1' position. However, mutants of TEV protease have been engineered which have increased or altered substrate tolerance at the P1' position (see Renicke, Christian, Roberta Spadaccini, and Christof Taxis. "A Tobacco Etch Virus Protease with Increased Substrate Tolerance at the P1' position." PloS one 8.6 (2013): e67915, hereby incorporated by reference in its entirety). The use of TEV protease in vivo in mammalian cells has been demonstrated and is described in Oberst, Andrew, et al. "Inducible dimerization and inducible cleavage reveal a requirement for both processes in caspase-8 activation." Journal of Biological Chemistry 285.22 (2010): 16632-16642, hereby incorporated by reference in its entirety. One of skill will readily understand based on the present disclosure that the methods described herein are useful in prokaryotic cells and eukaryotic cells.
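
The MENLYFQ/* arrangement can be sketched in Python as follows. The sketch assumes a protease variant that tolerates the residue at the position marked "*", uses the letter "X" as a hypothetical placeholder for the NSAA, and uses an arbitrary downstream sequence; none of these choices come from the disclosure. It shows only that cutting after the Q of ENLYFQ leaves the "*" residue as the new N-end amino acid.

```python
TEV_RECOGNITION = "ENLYFQ"  # cleavage occurs immediately after the Q


def cleave_after_tev_site(protein: str) -> tuple[str, str]:
    """Split a protein at the first ENLYFQ site; return (leader, product)."""
    index = protein.find(TEV_RECOGNITION)
    if index == -1:
        raise ValueError("no ENLYFQ site found")
    cut = index + len(TEV_RECOGNITION)
    if cut >= len(protein):
        raise ValueError("no residue follows the cut site")
    return protein[:cut], protein[cut:]


# Hypothetical fusion: protecting sequence + NSAA placeholder 'X' + toy downstream residues.
fusion = "MENLYFQ" + "X" + "SKGEEL"
leader, product = cleave_after_tev_site(fusion)
print(leader)      # MENLYFQ
print(product[0])  # X
```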

According to the present disclosure, the N-end target residue is exposed using materials and methods that are or will become apparent to one of skill based on the present disclosure. An exemplary removable protecting protein domain includes a self-splicing domain, such as an intein, or other cleavable domains such as small ubiquitin-like modifiers (SUMO proteins). An exemplary removable protecting group may be a protein cleavage sequence along with its cognate partner, such as the TEV cleavage site and TEV protease. In general, any of the strategies used to remove N-terminal affinity tags in protein purification can serve as alternative ways to expose the N-end target residue. An exemplary system to expose the N-end target residue includes a class of enzymes known as methionine aminopeptidases, which can remove the first N-terminal residue, such as when the second residue is the amino acid target location which is the desired site of addition of an NSAA. According to one aspect, the amino acid target location may be the N-terminal location or it may be any location between the N-terminal location and the C-terminal location. Accordingly, methods are provided for removing a protecting group and/or all amino acids up to the amino acid target location, thereby rendering the amino acid target location the N-terminal amino acid.

IV. Detectable Moiety

According to one aspect, the target polypeptide includes a detectable moiety attached to the C-end of the target polypeptide. Exemplary detectable moieties are known to those of skill in the art and can be readily identified in the literature based on the present disclosure. According to one aspect, the detectable moiety is a peptide sequence produced by the cell when making the target polypeptide, such that the detectable moiety and the target polypeptide are a fusion. According to this aspect, the cell is genetically modified to include a foreign nucleic acid sequence encoding the target polypeptide including a non-standard amino acid substitution at an amino acid target location and a detectable moiety attached to the target polypeptide, for example, at the C-end of the target polypeptide. According to one aspect, the detectable moiety is foreign to the cell, i.e. it is not endogenous to the cell.

An exemplary detectable moiety is a fluorescent moiety, such as GFP, that can be detected by fluorimetry, for example. An exemplary detectable moiety is a reporter protein. An exemplary detectable moiety includes a protein that confers antibiotic resistance which can be detected in the presence of an antibiotic. An exemplary detectable moiety includes an enzyme that performs a function (such as beta-galactosidase) that can lead to easy colorimetric output.

Aspects of the methods described herein may make use of epitope tags and reporter gene sequences as detectable moieties. Non-limiting examples of epitope tags include histidine (His) tags, V5 tags, FLAG tags, influenza hemagglutinin (HA) tags, Myc tags, VSV-G tags, and thioredoxin (Trx) tags. Examples of reporter genes include, but are not limited to, glutathione-S-transferase (GST), horseradish peroxidase (HRP), chloramphenicol acetyltransferase (CAT), beta-galactosidase, beta-glucuronidase, luciferase, green fluorescent protein (GFP), HcRed, DsRed, cyan fluorescent protein (CFP), yellow fluorescent protein (YFP), and autofluorescent proteins including blue fluorescent protein (BFP).

V. Genetic Modifications

Aspects of the present disclosure include the genetic modification of a cell to include foreign genetic material which can then be expressed by the cell. The cell may be modified to include any other genetic material or elements useful in the expression of a nucleic acid sequence. Foreign genetic elements may be introduced or provided to a cell using methods known to those of skill in the art. For example, the cell may be genetically modified to include a foreign nucleic acid sequence encoding the target polypeptide including a non-standard amino acid substitution at an amino acid target location, a removable protecting group attached to the target polypeptide adjacent to the amino acid target location and a detectable moiety attached to the C-end of the target polypeptide. The nonstandard amino acid may be encoded by a corresponding nonsense or sense codon. The cell may be genomically recoded to recognize an engineered amino-acyl tRNA synthetase corresponding or cognate to a nonstandard amino acid. The cell may be genetically modified to include a foreign nucleic acid sequence encoding an amino-acyl tRNA synthetase and/or a transfer RNA corresponding or cognate to the nonstandard amino acid, wherein the nonstandard amino acid is provided to the cell and the cell expresses the synthetase and the transfer RNA to include the nonstandard amino acid at the amino acid target location. The cell may be genetically modified to include a foreign nucleic acid sequence encoding an enzyme for cleaving the removable protecting group under influence of an inducible promoter. The cell may be genetically modified to include an inducible promoter influencing the production of an enzyme system for removal of the removable protecting group. The enzyme system or component thereof may be under influence of the inducible promoter. For example, the adapter which helps associate the cleavage enzyme with the removable protecting group may be under influence of an inducible promoter.

In general, nucleic acids may be introduced into a cell using any method known to those skilled in the art for such introduction. Such methods include transfection, transduction, viral transduction, microinjection, lipofection, nucleofection, nanoparticle bombardment, transformation, conjugation and the like. One of skill in the art will readily understand and adapt such methods using readily identifiable literature sources.

Aspects of the methods described herein may make use of vectors. The term "vector" includes a nucleic acid molecule capable of transporting another nucleic acid to which it has been linked. Vectors used to deliver the nucleic acids to cells as described herein include vectors known to those of skill in the art and used for such purposes. Certain exemplary vectors may be plasmids, lentiviruses or adeno-associated viruses known to those of skill in the art. Vectors include, but are not limited to, nucleic acid molecules that are single-stranded, double-stranded, or partially double-stranded; nucleic acid molecules that comprise one or more free ends, no free ends (e.g. circular); nucleic acid molecules that comprise DNA, RNA, or both; and other varieties of polynucleotides known in the art. One type of vector is a "plasmid," which refers to a circular double stranded DNA loop into which additional DNA segments can be inserted, such as by standard molecular cloning techniques. Another type of vector is a viral vector, wherein virally-derived DNA or RNA sequences are present in the vector for packaging into a virus (e.g. retroviruses, lentiviruses, replication defective retroviruses, adenoviruses, replication defective adenoviruses, and adeno-associated viruses). Viral vectors also include polynucleotides carried by a virus for transfection into a host cell. Certain vectors are capable of autonomous replication in a host cell into which they are introduced (e.g. bacterial vectors having a bacterial origin of replication and episomal mammalian vectors). Other vectors (e.g., non-episomal mammalian vectors) are integrated into the genome of a host cell upon introduction into the host cell, and thereby are replicated along with the host genome. Moreover, certain vectors are capable of directing the expression of genes to which they are operatively linked. Such vectors are referred to herein as "expression vectors."
Common expression vectors of utility in recombinant DNA techniques are often in the form of plasmids. Recombinant expression vectors can comprise a nucleic acid of the invention in a form suitable for expression of the nucleic acid in a host cell, which means that the recombinant expression vectors include one or more regulatory elements, which may be selected on the basis of the host cells to be used for expression, that are operably linked to the nucleic acid sequence to be expressed. Within a recombinant expression vector, "operably linked" is intended to mean that the nucleotide sequence of interest is linked to the regulatory element(s) in a manner that allows for expression of the nucleotide sequence (e.g. in an in vitro transcription/translation system or in a host cell when the vector is introduced into the host cell).

Aspects of the methods described herein may make use of regulatory elements. The term "regulatory element" is intended to include promoters, enhancers, internal ribosomal entry sites (IRES), and other expression control elements (e.g. transcription termination signals, such as polyadenylation signals and poly-U sequences). Such regulatory elements are described, for example, in Goeddel, GENE EXPRESSION TECHNOLOGY: METHODS IN ENZYMOLOGY 185, Academic Press, San Diego, Calif. (1990). Regulatory elements include those that direct constitutive expression of a nucleotide sequence in many types of host cell and those that direct expression of the nucleotide sequence only in certain host cells (e.g., tissue-specific regulatory sequences). Regulatory elements useful in eukaryotic cells include a tissue-specific promoter that may direct expression primarily in a desired tissue of interest, such as muscle, neuron, bone, skin, blood, specific organs (e.g. liver, pancreas), or particular cell types (e.g. lymphocytes). Regulatory elements may also direct expression in a temporal-dependent manner, such as in a cell-cycle dependent or developmental stage-dependent manner, which may or may not also be tissue or cell-type specific. In some embodiments, a vector may comprise one or more pol III promoters (e.g. 1, 2, 3, 4, 5, or more pol III promoters), one or more pol II promoters (e.g. 1, 2, 3, 4, 5, or more pol II promoters), one or more pol I promoters (e.g. 1, 2, 3, 4, 5, or more pol I promoters), or combinations thereof. Examples of pol III promoters include, but are not limited to, U6 and H1 promoters.
Examples of pol II promoters include, but are not limited to, the retroviral Rous sarcoma virus (RSV) LTR promoter (optionally with the RSV enhancer), the cytomegalovirus (CMV) promoter (optionally with the CMV enhancer) [see, e.g., Boshart et al, Cell, 41:521-530 (1985)], the SV40 promoter, the dihydrofolate reductase promoter, the β-actin promoter, the phosphoglycerol kinase (PGK) promoter, and the EF1α promoter and Pol II promoters described herein. Also encompassed by the term "regulatory element" are enhancer elements, such as WPRE; CMV enhancers; the R-U5' segment in LTR of HTLV-I (Mol. Cell. Biol., Vol. 8(1), p. 466-472, 1988); SV40 enhancer; and the intron sequence between exons 2 and 3 of rabbit β-globin (Proc. Natl. Acad. Sci. USA., Vol. 78(3), p. 1527-31, 1981). It will be appreciated by those skilled in the art that the design of the expression vector can depend on such factors as the choice of the host cell to be transformed, the level of expression desired, etc. A vector can be introduced into host cells to thereby produce transcripts, proteins, or peptides, including fusion proteins or peptides, encoded by nucleic acids as described herein (e.g., clustered regularly interspersed short palindromic repeats (CRISPR) transcripts, proteins, enzymes, mutant forms thereof, fusion proteins thereof, etc.). Common prokaryotic promoters include IPTG (isopropyl β-D-1-thiogalactopyranoside) inducible, anhydrotetracycline inducible, or arabinose inducible promoters. Such promoters drive gene expression only in the presence of the corresponding inducer (IPTG, anhydrotetracycline, or arabinose) in the medium. An exemplary promoter for use in bacteria such as E. coli to express aminoacyl tRNA synthetase is an arabinose inducible promoter. An exemplary promoter for use in bacteria such as E. coli to express a reporter protein is an anhydrotetracycline inducible promoter.

Aspects of the methods described herein may make use of terminator sequences. A terminator sequence includes a section of nucleic acid sequence that marks the end of a gene or operon in genomic DNA during transcription. This sequence mediates transcriptional termination by providing signals in the newly synthesized mRNA that trigger processes which release the mRNA from the transcriptional complex. These processes include the direct interaction of the mRNA secondary structure with the complex and/or the indirect activities of recruited termination factors. Release of the transcriptional complex frees RNA polymerase and related transcriptional machinery to begin transcription of new mRNAs. Terminator sequences include those known in the art and identified and described herein.

VI. Adapter Protein Protease Systems

According to one aspect, the cell includes a protease system for degrading the target polypeptide when the N-end amino acid is a standard amino acid. The protease system may be endogenous or exogenous. The cell may include an adapter or discriminator protein that coordinates with a protease for degrading the target polypeptide when the N-end amino acid is a standard amino acid. The adapter protein may be under influence of an inducible promoter. According to one aspect, the adapter protein is ClpS or a variant or mutant thereof. According to one aspect, adapter proteins may have different levels of selectivity for certain amino acids. According to certain aspects, adapter proteins, such as ClpS, may be altered to improve selectivity, such as between standard amino acids and non-standard amino acids or between a desired NSAA and an undesired NSAA. According to one aspect, the protease system is a ClpS-ClpAP protease system.
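
The discrimination logic described above can be sketched in Python under stated assumptions. The set of residues recognized by the adapter is an assumption made for illustration (leucine, phenylalanine, tryptophan and tyrosine are commonly cited as primary destabilizing residues of the E. coli N-end rule); it is not taken from this disclosure, and altered adapter variants would be modeled by passing a different set. The function name is hypothetical.

```python
# Assumption for illustration only: N-end residues the adapter hands to the protease.
DEFAULT_RECOGNIZED = frozenset({"L", "F", "W", "Y"})


def reporter_survives(n_end_residue: str, recognized=DEFAULT_RECOGNIZED) -> bool:
    """Return True if the target polypeptide is expected to escape degradation.

    `n_end_residue` is the one-letter code of a standard amino acid, or any
    other label (for example "BipA") for a non-standard amino acid.
    """
    return n_end_residue not in recognized


# A synthetase that misincorporates tyrosine yields a degraded reporter,
# whereas correct BipA incorporation leaves the reporter intact.
for residue in ("Y", "F", "BipA"):
    outcome = "retained" if reporter_survives(residue) else "degraded"
    print(residue, outcome)
```

Coupling survival of a C-end detectable moiety to the identity of the N-end residue in this way is what lets reporter signal stand in for synthetase selectivity during screening.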

According to one aspect, protease systems include ClpS or homologs or mutants thereof, such as ClpS_V65I, ClpS_V43I or ClpS_L32F. The N-end rule is mediated by homologs of ClpS/ClpAP in bacteria. In eukaryotes, the N-end rule involves more distant homologs of ClpS (UBR1, ubiquitin E3 ligases) and degradation by the proteasome. Accordingly, the present disclosure contemplates use of many of the bacterial ClpS homologs to perform similar functions with slightly different amino acid recognition specificity. The present disclosure also contemplates use of eukaryotic protease systems, such as UBR1 and related variants, to mediate N-end rule recognition with different amino acid recognition specificity in eukaryotes.

VII. Cells

According to certain aspects, cells according to the present disclosure include prokaryotic cells and eukaryotic cells. Exemplary prokaryotic cells include bacteria. Microorganisms which may serve as host cells and which may be genetically modified to produce recombinant microorganisms as described herein may include one or more members of the genera Clostridium, Escherichia, Rhodococcus, Pseudomonas, Bacillus, Lactobacillus, Saccharomyces, and Enterococcus. Particularly suitable microorganisms include bacteria and archaea. Exemplary microorganisms include Escherichia coli, Bacillus subtilis, and Saccharomyces cerevisiae. Exemplary eukaryotic cells include animal cells, such as human cells, plant cells, fungal cells and the like.

In addition to E. coli, other useful bacteria include but are not limited to Bacillus subtilis, Bacillus megaterium, Bifidobacterium bifidum, Caulobacter crescentus, Clostridium difficile, Chlamydia trachomatis, Corynebacterium glutamicum, Lactobacillus acidophilus, Lactococcus lactis, Mycoplasma genitalium, Neisseria gonorrhoeae, Prochlorococcus marinus, Pseudomonas aeruginosa, Pseudomonas putida, Treponema pallidum, Streptomyces coelicolor, Synechococcus elongatus, Vibrio natriegens, and Zymomonas mobilis.

Exemplary genera and species of bacterial cells include Acetobacter aurantius, Acinetobacter bitumen, Actinomyces israelii, Agrobacterium radiobacter, Agrobacterium tumefaciens, Anaplasma, Anaplasma phagocytophilum, Azorhizobium caulinodans, Azotobacter vinelandii, viridans streptococci, Bacillus anthracis, Bacillus brevis, Bacillus cereus, Bacillus fusiformis, Bacillus licheniformis, Bacillus megaterium, Bacillus mycoides, Bacillus stearothermophilus, Bacillus subtilis, Bacteroides, Bacteroides fragilis, Bacteroides gingivalis, Bacteroides melaninogenicus (also referred to as Prevotella melaninogenica), Bartonella, Bartonella henselae, Bartonella quintana, Bordetella, Bordetella bronchiseptica, Bordetella pertussis, Borrelia burgdorferi, Brucella abortus, Brucella melitensis, Brucella suis, Burkholderia, Burkholderia mallei, Burkholderia pseudomallei, Burkholderia cepacia, Calymmatobacterium granulomatis, Campylobacter, Campylobacter coli, Campylobacter fetus, Campylobacter jejuni, Campylobacter pylori, Chlamydia, Chlamydia trachomatis, Chlamydophila, Chlamydophila pneumoniae (also known as Chlamydia pneumoniae), Chlamydophila psittaci (also known as Chlamydia psittaci), Clostridium, Clostridium botulinum, Clostridium difficile, Clostridium perfringens (also known as Clostridium welchii), Clostridium tetani, Corynebacterium, Corynebacterium diphtheriae, Corynebacterium fusiforme, Coxiella burnetii, Ehrlichia chaffeensis, Enterobacter cloacae, Enterococcus, Enterococcus avium, Enterococcus durans, Enterococcus faecalis, Enterococcus faecium, Enterococcus gallinarum, Enterococcus maloratus, Escherichia coli, Francisella tularensis, Fusobacterium nucleatum, Gardnerella vaginalis, Haemophilus, Haemophilus ducreyi, Haemophilus influenzae, Haemophilus parainfluenzae, Haemophilus pertussis, Haemophilus vaginalis, Helicobacter pylori, Klebsiella pneumoniae, Lactobacillus, Lactobacillus acidophilus, Lactobacillus bulgaricus, Lactobacillus casei, Lactococcus lactis,
Legionella pneumophila, Listeria monocytogenes, Methanobacterium extroquens, Microbacterium multiforme, Micrococcus luteus, Moraxella catarrhalis, Mycobacterium, Mycobacterium avium, Mycobacterium bovis, Mycobacterium diphtheriae, Mycobacterium intraceliulare, Mycobacterium leprae, Mycobacterium iepraemurium, Mycobacterium phiei, Mycobacterium smegmatis, Mycobacterium tuberculosis, Mycoplasma, Mycoplasma term en tans, Mycoplasma genitalium, Mycoplasma hominis, Mycoplasma penetrans, Mycoplasma pneumoniae, Neisseria, Neisseria gonorrhoeae, Neisseria meningitidis, Pasteurelia, Pasteurelia multocida, Pasteurella tuiarensis, Peptostreptococcus, Poiphyromonas gingivalis, Prevotella melaninogenica (also known as Bacteroides melaninogenicus), Pseudomonas aeruginosa, Rhizobium radiobacter, Rickettsia, Rickettsia prowazekii, Rickettsia psittaci, Rickettsia quintana, Rickettsia rickettsii, Rickettsia trachomae, Rochalimaea, Rochalimaea henselae, Rochalimaea quintana, Rothia dentocariosa, Salmonella, Salmonella enteritidis, Salmonella typhi. Salmonella typhimurium, Serratia marcescens, Shigella dysenteriae, Staphylococcus, Staphylococcus aureus, Staphylococcus epidermidis, Stenotrophomonas maltophiiia, Streptococcus Streptococcus agalactiae, Streptococcus avium, Streptococcus bovis, Streptococcus cricetus, Streptococcus faceium, Streptococcus faecalis, Streptococcus ferus, Streptococcus gallinarum, Streptococcus lactis, Streptococcus mitior, Streptococcus mitis, Streptococcus mutans, Streptococcus oralis, Streptococcus pneumoniae, Streptococcus pyogenes, Streptococcus rattus, Streptococcus saiivarius, Streptococcus sanguis, Streptococcus sobrinus, Treponema, Treponema pallidum, Treponema denticola, Vibrio, Vibrio cholerae, Vibrio comma, Vibrio parahaemolvticus, Vibrio vulnificus, Wolbachia, Yersinia, Yersinia enterocolitica, Yersinia pestis, and Yersinia pseudotuberculosis, and other genus and species known to those of skill in the art.

Exemplary genus and species of yeast cells include Saccharomyces, Saccharomyces cerevisiae, Torula, Saccharomyces boulardii, Schizosaccharomyces, Schizosaccharomyces pombe, Candida, Candida glabrata, Candida tropicalis, Yarrowia, Candida parapsilosis, Candida krusei, Saccharomyces pastorianus, Brettanomyces, Brettanomyces bruxellensis, Pichia, Pichia guilliermondii, Cryptococcus, Cryptococcus gattii, Torulaspora, Torulaspora delbrueckii, Zygosaccharomyces, Zygosaccharomyces bailii, Candida lusitaniae, Candida stellata, Geotrichum, Geotrichum candidum, Pichia pastoris, Kluyveromyces, Kluyveromyces marxianus, Candida dubliniensis, Kluyveromyces, Kluyveromyces lactis, Trichosporon, Trichosporon uvarum, Eremothecium, Eremothecium gossypii, Pichia stipitis, Candida milleri, Ogataea, Ogataea polymorpha, Candida oleophila, Zygosaccharomyces rouxii, Candida albicans, Leucosporidium, Leucosporidium frigidum, Candida viswanathii, Candida blankii, Saccharomyces telluris, Saccharomyces florentinus, Sporidiobolus, Sporidiobolus salmonicolor, Dekkera, Dekkera anomala, Lachancea, Lachancea kluyveri, Trichosporon, Trichosporon mycotoxinivorans, Rhodotorula, Rhodotorula rubra, Saccharomyces exiguus, Sporobolomyces koalae, and Trichosporon cutaneum, and other genus and species known to those of skill in the art.

Exemplary genus and species of fungal cells include Sac fungi, Basidiomycota, Zygomycota, Chytridiomycota, Basidiomycetes, Hyphomycetes, Glomeromycota, Microsporidia, Blastocladiomycota, and Neocallimastigomycota, and other genus and species known to those of skill in the art.

Exemplary eukaryotic cells include mammalian cells, plant cells, yeast cells and fungal cells.

VIII. Standard Amino Acid

As used herein, the term "SAA" (standard amino acid) refers to one of the L-amino acids that typically occur naturally in proteins on Earth and includes alanine, arginine, asparagine, aspartic acid, cysteine, glutamic acid, glutamine, glycine, histidine, isoleucine, leucine, lysine, methionine, phenylalanine, serine, threonine, tyrosine, tryptophan, proline and valine. The standard amino acids that are naturally N-end destabilizing in most bacteria include tyrosine, phenylalanine, tryptophan, leucine, lysine, and arginine. According to one aspect, the amino acid at the amino acid target location is an NSAA that is stabilizing. When the natural analog of the NSAA is destabilizing and is present at the amino acid target location, degradation of the polypeptide occurs. Standard amino acids that are not naturally destabilizing via the N-end rule using natural ClpS can be destabilizing when the ClpS is engineered to recognize such standard amino acids.

The N-end rule in bacteria may also be engineered to recognize isoleucine, valine, aspartate, glutamate, asparagine, and glutamine as destabilizing using methods known to those of skill in the art, which is useful when the desired NSAA is an analog of these amino acids. For example, isoleucine and valine can be converted into N-end destabilizing residues by introducing a ClpS variant (M40A) that recognizes these amino acids as N-terminal destabilizing residues (see Roman-Hernandez G, Grant RA, Sauer RT, & Baker TA (2009) Molecular basis of substrate selection by the N-end rule adaptor protein ClpS. Proceedings of the National Academy of Sciences 106(22):8888-8893 hereby incorporated by reference in its entirety). Aspartate and glutamate may be converted into N-end destabilizing residues by introducing a bacterial aminoacyl-transferase from Vibrio vulnificus (Bpt) that is a homolog of eukaryotic transferases and N-terminally appends a leucine (L) to peptides containing N-terminally exposed aspartate or glutamate (see Graciet E, et al. (2006) Aminoacyl-transferases and the N-end rule pathway of prokaryotic/eukaryotic specificity in a human pathogen. Proceedings of the National Academy of Sciences of the United States of America 103(9):3078-3083 hereby incorporated by reference in its entirety). The ability of Bpt to catalyze this reaction has been demonstrated in E. coli and shows that components of the N-end rule, which includes many more conditionally destabilizing residues in eukaryotes, can be transferred across kingdoms. Asparagine and glutamine can be converted into N-end destabilizing residues by using an N-terminal amidase from S. cerevisiae (NTA1), which converts N-terminal asparagine into aspartate or N-terminal glutamine into glutamate, respectively (see Tasaki T, Sriram SM, Park KS, & Kwon YT (2012) The N-End Rule Pathway. Annual Review of Biochemistry 81(1):261-289 hereby incorporated by reference in its entirety). 
Indeed, in many eukaryotic cells these amino acids and more are naturally conditionally N-end destabilizing. One of skill will understand that an N-end rule destabilizing pathway may be provided for all 20 standard amino acids as a basis for a system where a desired amino acid from among the 20 standard amino acids is N-end destabilizing in at least one context (see Chen, Shun-Jia, et al. "An N-end rule pathway that recognizes proline and destroys gluconeogenic enzymes." Science 355.6323 (2017): eaal3655 hereby incorporated by reference in its entirety). One of skill in the art can identify the eukaryotic proteins required for conferring expanded N-end destabilization and transfer them to prokaryotes as needed. Similarly, in eukaryotic cells one can constitutively express components required for conferring expanded N-end destabilization such that degradation of proteins containing N-end standard amino acids no longer remains conditional. One of skill will recognize that some amino acids rendered destabilizing may have adverse consequences for cell physiology. For example, most native proteins begin with methionine and if methionine is made N-end destabilizing then most proteins would degrade. Aspects of converting an N-end stabilizing amino acid to an N-end destabilizing amino acid can be tested in a particular organism.

IX. Non-Standard Amino Acid

As used herein, the term "NSAA" refers to an unmodified amino acid that is not one of the 20 naturally occurring standard L-amino acids. NSAAs also include synthetic amino acids which have been designed to include a non-standard functional group not present in the standard amino acids or are naturally occurring amino acids bearing functional groups not present in the set of standard amino acids. Accordingly, a non-standard amino acid may include the structure of a standard amino acid and which includes a non-standard functional group. A non-standard amino acid may include the basic amino acid portion of a standard amino acid and include a non-standard functional group.

NSAAs also refer to natural amino acids that are not used by all organisms (e.g. L-pyrrolysine (B. Hao et al., A new UAG-encoded residue in the structure of a methanogen methyltransferase. Science. 296:1462) and L-selenocysteine (S. Osawa et al., Recent evidence for evolution of the genetic code. Microbiol. Mol. Biol. Rev. 56:229)). NSAAs are also known in the art as unnatural amino acids (UAAs) and non-canonical amino acids (NCAAs).

NSAAs include, but are not limited to, p-Acetylphenylalanine, m-Acetylphenylalanine, O-allyltyrosine, Phenylselenocysteine, p-Propargyloxyphenylalanine, p-Azidophenylalanine, p-Boronophenylalanine, O-methyltyrosine, p-Aminophenylalanine, p-Cyanophenylalanine, m-Cyanophenylalanine, p-Fluorophenylalanine, p-Iodophenylalanine, p-Bromophenylalanine, p-Nitrophenylalanine, L-DOPA, 3-Aminotyrosine, 3-Iodotyrosine, p-Isopropylphenylalanine, 3-(2-Naphthyl)alanine, biphenylalanine, homoglutamine, D-tyrosine, p-Hydroxyphenyllactic acid, 2-Aminocaprylic acid, bipyridylalanine, HQ-alanine, p-Benzoylphenylalanine, o-Nitrobenzylcysteine, o-Nitrobenzylserine, 4,5-Dimethoxy-2-Nitrobenzylserine, o-Nitrobenzyllysine, o-Nitrobenzyltyrosine, 2-Nitrophenylalanine, dansylalanine, p-Carboxymethylphenylalanine, 3-Nitrotyrosine, sulfotyrosine, acetyllysine, methylhistidine, 2-Aminononanoic acid, 2-Aminodecanoic acid, pyrrolysine, Cbz-lysine, Boc-lysine, allyloxycarbonyllysine, arginosuccinic acid, citrulline, cysteine sulfinic acid, 3,4-dihydroxyphenylalanine, homocysteine, homoserine, ornithine, 3-monoiodotyrosine, 3,5-diiodotyrosine, 3,5,5'-triiodothyronine, and 3,3',5,5'-tetraiodothyronine. Modified or unusual amino acids include D-amino acids, hydroxylysine, 4-hydroxyproline, N-Cbz-protected amino acids, 2,4-diaminobutyric acid, homoarginine, norleucine, N-methylaminobutyric acid, naphthylalanine, phenylglycine, -phenylproline, tert-leucine, -aminocyclohexylalanine, N-methyl-norleucine, 3,4-dehydroproline, N,N-dimethylaminoglycine, N-methylaminoglycine, 4-aminopiperidine-4-carboxylic acid, 6-aminocaproic acid, trans-4-(aminomethyl)-cyclohexanecarboxylic acid, 2-, 3-, and 4-(aminomethyl)-benzoic acid, 1-aminocyclopentanecarboxylic acid, 1-aminocyclopropanecarboxylic acid, and 2-benzyl-5-aminopentanoic acid, and the like. 
NSAAs also include amino acids that are functionalized, e.g., alkyne-functionalized, azide-functionalized, ketone-functionalized, aminooxy-functionalized and the like. For reviews of NSAAs and lists of NSAAs suitable for use in certain embodiments of the subject invention, see Liu and Schultz (2010) Ann. Rev. Biochem. 79:413, and Kim et al. (2013) Curr. Opin. Chem. Biol. 17:412, each of which is incorporated herein by reference in its entirety for all purposes.

In certain aspects, an NSAA of the subject invention has a corresponding aminoacyl tRNA synthetase (aaRS)/tRNA pair. In certain aspects, the aminoacyl tRNA synthetase/tRNA pair is orthogonal to those in a genetically modified organism such as, e.g., a prokaryotic cell, a bacterium (e.g., E. coli), a eukaryotic cell, a yeast, a plant cell, an insect cell, a mammalian cell, a virus, etc. In certain aspects, an NSAA of the subject invention is non-toxic when expressed in a genetically modified organism such as, e.g., a prokaryotic cell, a bacterium (e.g., E. coli), a eukaryotic cell, a yeast, a plant cell, an insect cell, a mammalian cell, a virus, etc. In certain aspects, an NSAA of the subject invention is not or does not resemble a natural product present in a cell or organism. In certain aspects, an NSAA of the subject invention is hydrophobic, hydrophilic, polar, positively charged, or negatively charged. In other aspects, an NSAA of the subject invention is commercially available (such as, e.g., L-4,4-biphenylalanine (bipA) and L-2-Naphthylalanine (napA)) or synthesized according to published protocols.

EXAMPLE I

Post-translational proofreading (PTP) for selective BipA OTS evolution

A cell is genetically modified for the screening. The cell is provided with a nucleic acid sequence encoding a ubiquitin fused to the N-terminus of the protein wherein the N-terminus of the protein is an amino acid target location intended to have a nonstandard amino acid. The nonstandard amino acid may be encoded by a nonsense or sense codon. The cell is provided with a ubiquitin cleavase. The cell may include an endogenous protease system, such as a ClpS-ClpAP system. The cell is provided with a non-standard amino acid. The cell expresses the fusion protein having either a standard or a non-standard amino acid incorporated at the amino acid target location. The ubiquitin cleavase cleaves the ubiquitin to produce a protein having either the standard or non-standard intervening amino acid at its N-terminus. If a standard amino acid is present at the N-terminus, the ClpS recognizes the standard amino acid at the N-terminus and targets the protein having the standard amino acid at its N-terminus to ClpP for degradation. If a nonstandard amino acid is present at the N-terminus, the ClpS does not recognize the nonstandard amino acid and the protein is not targeted for degradation. A residue is destabilizing if it is recognized by the ClpS adaptor protein, which is the discriminator of the N-end rule in E. coli such as is described in Erbse A, et al. (2006) ClpS is an essential component of the N-end rule pathway in Escherichia coli. Nature 439(7077):753-756; Wang KH, Oakes ESC, Sauer RT, & Baker TA (2008) Tuning the Strength of a Bacterial N-end Rule Degradation Signal. Journal of Biological Chemistry 283(36):24600-24607; Schmidt R, Zahn R, Bukau B, & Mogk A (2009) ClpS is the recognition component for Escherichia coli substrates of the N-end rule degradation pathway. Molecular Microbiology 72(2):506-517; Roman-Hernandez G, Grant RA, Sauer RT, & Baker TA (2009) Molecular basis of substrate selection by the N-end rule adaptor protein ClpS. Proceedings of the National Academy of Sciences 106(22):8888-8893; Schuenemann VJ, et al. (2009) Structural basis of N-end rule substrate recognition in Escherichia coli by the ClpAP adaptor protein ClpS. EMBO reports 10(5):508-514; Roman-Hernandez G, Hou Jennifer Y, Grant Robert A, Sauer Robert T, & Baker Tania A (2011) The ClpS Adaptor Mediates Staged Delivery of N-End Rule Substrates to the AAA+ ClpAP Protease. Molecular Cell 43(2):217-228; and Hou JY, Sauer RT, & Baker TA (2008) Distinct structural elements of the adaptor ClpS are required for regulating degradation by ClpAP. Nat Struct Mol Biol 15(3):288-294, each of which is hereby incorporated by reference in its entirety.

The disclosure provides a method of screening for an amino acyl tRNA synthetase variant that preferentially selects a non-standard amino acid against its standard amino acid counterpart or an undesired non-standard amino acid for incorporation into a polypeptide in a cell. In one embodiment, the cell is provided with an amino acyl tRNA synthetase variant. In another embodiment, the cell is provided with a nucleic acid sequence encoding a ubiquitin fused to the N-terminus of the polypeptide wherein the N-terminus of the polypeptide is an amino acid target location intended to have a nonstandard amino acid, and wherein GFP is fused to the C-end of the polypeptide (Ub-UAG-sfGFP). The nonstandard amino acid may be encoded by a nonsense or sense codon. The cell is provided with a ubiquitin cleavase, such as Ubp1. The cell may include an endogenous protease system, such as a ClpS-ClpAP system. In certain embodiments, the Ub-UAG-sfGFP construct is integrated into the cell's genome (C321.ΔClpS.Ub-UAG-sfGFP). In an exemplary embodiment, the UBP1-clpS V65I expression cassette is integrated into C321.ΔClpS.Ub-UAG-sfGFP (resulting in strain C321.Nend). The cell is provided with a non-standard amino acid. The cell expresses the fusion protein having either a standard or a non-standard amino acid incorporated at the amino acid target location. The ubiquitin cleavase cleaves the ubiquitin to produce a protein having either the standard or non-standard intervening amino acid at its N-terminus. If a standard amino acid is present at the N-terminus, the ClpS recognizes the standard amino acid at the N-terminus and targets the protein having the standard amino acid at its N-terminus to ClpP for degradation, including the GFP portion. If a nonstandard amino acid is present at the N-terminus, the ClpS does not recognize the nonstandard amino acid and the protein is not targeted for degradation. 
The GFP is detected and is indicative of the presence of a synthetase variant that preferentially selects the non-standard amino acid against its standard amino acid counterpart for incorporation into the protein.

According to another aspect, the strength of the signal detected from the GFP is indicative of the amount of protein produced that included the nonstandard amino acid. In this manner, methods are provided for screening and evolving an amino acyl tRNA synthetase variant that preferentially selects a non-standard amino acid against its standard amino acid counterpart for incorporation into a protein in a cell.
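
The screening logic described above can be summarized in a short Python sketch. It is a simplified illustration under stated assumptions, not the disclosed implementation: the function name `reporter_fate` is hypothetical, the set of destabilizing residues is the native bacterial set listed in Section VIII, and "BipA" stands in for any NSAA that ClpS does not recognize.

```python
# Native N-end destabilizing residues in most bacteria (one-letter codes),
# as listed in Section VIII: Tyr, Phe, Trp, Leu, Lys, Arg.
NATIVE_DESTABILIZING = {"Y", "F", "W", "L", "K", "R"}


def reporter_fate(n_terminal_residue, destabilizing=NATIVE_DESTABILIZING):
    """Return the expected fate of the Ub-X-sfGFP reporter after ubiquitin cleavage.

    n_terminal_residue is the residue exposed at position X, given as a
    one-letter SAA code or an NSAA label such as "BipA" (labels are illustrative).
    """
    if n_terminal_residue in destabilizing:
        # ClpS recognizes the residue and the protein is delivered for degradation.
        return "degraded (GFP-)"
    # ClpS does not recognize the residue, so the GFP fusion persists.
    return "retained (GFP+)"


for residue in ["Y", "F", "BipA"]:
    print(residue, "->", reporter_fate(residue))
```

Passing a different `destabilizing` set is one way to model an engineered ClpS with altered recognition.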

The ability of PTP to discriminate incorporation of intended NSAA from related SAAs is especially useful for high-throughput screening of OTS libraries. To demonstrate this for proof-of-concept, the UBP1-clpS V65I expression cassette was genomically integrated into C321.ΔClpS.Ub-UAG-sfGFP (resulting in strain C321.Nend). This strain was then used to improve the selectivity of the "wild-type" (WT) BipA OTS. Previous efforts to engineer MjTyrRS variants like BipARS focused on site-directed mutagenesis on positions near the amino acid binding pocket (See, L. Wang, A. Brock, B. Herberich, P. G. Schultz, Expanding the genetic code of Escherichia coli. Science (80-. ). 292, 498-500 (2001); T. S. Young, I. Ahmad, J. A. Yin, P. G. Schultz, An Enhanced System for Unnatural Amino Acid Mutagenesis in E. coli. J. Mol. Biol. 395, 361-374 (2010)). To generate a novel BipARS library, error-prone PCR was used to introduce 2-4 mutations throughout the bipARS gene. After assembly, these libraries were transformed into C321.Nend and screened with three rounds of FACS sorting: (i) positive sort for GFP+ cells in BipA+; (ii) negative sort for GFP- cells in BipA-; (iii) final positive sort for GFP+ cells in BipA+ (Fig. 1A). To decrease promiscuity against other NSAAs, negative screening stringency was altered by varying addition of undesired NSAAs (as many as pAcF, pAzF, tBtylY, NapA, and pBnzylF), which changed the profile of isolated variants (Fig. 2). Upon characterizing the 11 most enriched variants after miniprep and transformation into C321.Ub-UAG-sfGFP (no PTP), it was observed that variants isolated from lower stringency negative sorts exhibited greater activity on BipA and lower activity on SAAs compared to the WT OTS, as well as varying degrees of activity on undesired NSAAs (Fig. 1B and Table 1, Variants 1-6). 
Supplementation with undesired NSAAs enriched for mutants with even greater selectivity against SAAs and undesired NSAAs (Variants 4, 9-11) but also gave rise to cheaters (Variant 8), suggesting that these conditions may be nearly too harsh. One mutant, Variant 10, exhibited high activity on BipA and no observable activity on any other NSAAs except tBtylY, which contains the inert tert-Butyl protecting group (Fig. 1B). SDS-PAGE of reporter protein resulting from the Variant 10 OTS after expression and affinity purification showed no observable BipA- protein production (Fig. 3A). Furthermore, mass spectrometry confirmed site-specific BipA+ BipA incorporation (Figs. 3B-3D).
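
The three-round sort can be pictured as a filter over per-variant fluorescence measurements. The sketch below is illustrative only: the variant names, fluorescence values, and gate threshold are made-up placeholders, and real FACS gating operates on single-cell distributions rather than one number per variant.

```python
def three_round_sort(library, gate=1000.0):
    """Mimic the positive / negative / positive sorting scheme.

    library maps a variant name to hypothetical mean GFP fluorescence
    measured with BipA ("plus") and without BipA ("minus").
    """
    # Round (i): keep variants that are GFP+ when BipA is supplied.
    survivors = {n: v for n, v in library.items() if v["plus"] >= gate}
    # Round (ii): keep variants that are GFP- when BipA is withheld
    # (undesired NSAAs may be added at this step to raise stringency).
    survivors = {n: v for n, v in survivors.items() if v["minus"] < gate}
    # Round (iii): confirm the survivors are still GFP+ with BipA.
    # In this deterministic toy model the round removes nothing further.
    survivors = {n: v for n, v in survivors.items() if v["plus"] >= gate}
    return sorted(survivors)


# Placeholder values chosen only to exercise each branch.
example_library = {
    "selective": {"plus": 5000.0, "minus": 50.0},
    "promiscuous": {"plus": 6000.0, "minus": 4000.0},
    "inactive": {"plus": 80.0, "minus": 40.0},
}
print(three_round_sort(example_library))
```

In this toy model, only the variant that fluoresces with BipA and stays dark without it survives all three rounds.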

During characterization of BipA OTS variants we discovered spontaneous tRNA mutations present in our most apparently selective variants, such as 4, 9, and 10. Mutations were present at C29A for Variant 4, C67A for Variant 9, and G51U for Variant 10 (Fig. 5A). Notably, when these tRNA mutations were reverted, each corresponding BipA OTS became more promiscuous (Fig. 5B), suggesting that observed tRNA mutations increase selectivity. The G51 position (G50 in E. coli nomenclature) mutated in tRNA Variant 10 is the most significant base pair in determining acylated tRNA binding affinity to Elongation Factor Tu (EF-Tu), which influences incorporation selectivity downstream of the AARS (See, F. J. LaRiviere, A. D. Wolfson, O. C. Uhlenbeck, Uniform Binding of Aminoacyl-tRNAs to Elongation Factor Tu by Thermodynamic Compensation. Science (80-. ). 294 (2001); J. M. Schrader, S. J. Chapman, O. C. Uhlenbeck, Understanding the Sequence Specificity of tRNA Binding to Elongation Factor Tu using tRNA Mutagenesis. J. Mol. Biol. 386, 1255-1264 (2009)). Large hydrophobic NSAAs may have stronger interactions than Y with EF-Tu (See, Taraka Dale, Lee E. Sanderson, and O. C. Uhlenbeck, The Affinity of Elongation Factor Tu for an Aminoacyl-tRNA Is Modulated by the Esterified Amino Acid (2004), doi:10.1021/BI036290O), and the C-U mismatch at this position in the tRNA may weaken the EF-Tu interaction. Improved selectivity of OTS Variant 10 may therefore result from a weaker tRNA-EF-Tu interaction that compensates for a stronger amino acid interaction.
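
Point-mutation labels such as G51U encode the original base, a position, and the new base. As a minimal sketch, a small helper can apply such a label and check that the stated original base is actually present. The sequence below is an arbitrary placeholder rather than the tRNA of SEQ ID NO: 2, the function name is hypothetical, and positions are treated as simple 1-based string indices, which need not coincide with conventional tRNA numbering.

```python
import re


def apply_point_mutation(sequence, label):
    """Apply a label like 'G51U' (original base, 1-based position, new base)."""
    match = re.fullmatch(r"([ACGU])(\d+)([ACGU])", label)
    if match is None:
        raise ValueError(f"unrecognized mutation label: {label}")
    original, position, replacement = match.group(1), int(match.group(2)), match.group(3)
    if not 1 <= position <= len(sequence):
        raise ValueError(f"position {position} is outside the sequence")
    if sequence[position - 1] != original:
        raise ValueError(f"expected {original} at position {position}")
    return sequence[:position - 1] + replacement + sequence[position:]


# Placeholder RNA, chosen only so that position 5 is a G.
toy_rna = "ACCUGGA"
print(apply_point_mutation(toy_rna, "G5U"))
```

Reverting a mutation, as was done for the tRNA variants, corresponds to applying the inverse label (for example U5G on the mutated placeholder).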

To more rigorously assess OTS selectivity, we purified AARS and tRNA for the WT, Variant 9, and Variant 10 OTSs. The observed in vitro substrate specificity as determined by tRNA aminoacylation is in excellent agreement with our in vivo assays (Fig. 1C, Figs. 5C-D).

To demonstrate the utility of a more selective OTS, we substituted the WT BipA OTS construct previously used in three biocontained strains that exhibit observable escape frequencies with new plasmids containing either WT or Variant 10 OTS. These three biocontained strains harbor a computationally predicted redesign to the following essential genes to make their stability dependent on biphenylalanine: adk (adk.d6), tyrS (tyrS.d8), or adk and tyrS (adk.d6/tyrS.d8) (See, D. J. Mandell et al., Biocontainment of genetically modified organisms by synthetic protein design. Nature. 518, 55-60 (2015)). In all three strains, we observed lower escape frequencies with the Variant 10 OTS on non-permissive media at all three measured days (Figs. 1D-F, Figs. 4A-B). The difference in escape frequency was most apparent for the adk.d6/tyrS.d8 strain, which exhibited a 7-day escape frequency of 7.36 × 10^-9, which is by more than two orders of magnitude the lowest ever observed for a C321.ΔA-derived strain containing only two UAG codons in essential genes and no other biocontainment-related genomic alterations outside of the two essential genes. Furthermore, the fitness of all three strains improved with the Variant 10 OTS, with doubling time decreasing by a factor of as much as 0.54 (Fig. 1G). Finally, Variant 10 also delayed onset of growth of adk.d6/tyrS.d8 on non-cognate NSAAs (Table 2). We expect these benefits to carry over to all strains which employ Variant 10 over WT OTS.
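
For orientation, escape frequency is commonly computed as the number of colonies that arise on non-permissive medium divided by the number of viable cells plated. That definition is an assumption here, since this section reports only the resulting values. In the sketch below the colony counts are hypothetical placeholders; only the value 7.36 × 10^-9 comes from the text.

```python
import math


def escape_frequency(escapee_colonies, cells_plated):
    """Escapees per viable cell plated (assumed definition)."""
    if cells_plated <= 0:
        raise ValueError("cells_plated must be positive")
    return escapee_colonies / cells_plated


def orders_of_magnitude(reference, value):
    """log10 fold difference between a reference frequency and a lower value."""
    return math.log10(reference / value)


# Hypothetical counts: 3 escapee colonies from 4e8 plated cells.
print(escape_frequency(3, 4e8))
# A frequency 100-fold above the reported 7.36e-9 differs by two orders of magnitude.
print(orders_of_magnitude(7.36e-7, 7.36e-9))
```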

In addition to providing a new paradigm for OTS evaluation and evolution, PTP can be transformative for applications in which amino acid positions are in competitive states, such as screening of natural synthetases for NSAA acceptance, sense codon reassignment, and post-translational modifications. PTP may also find use in translational regulation and as an orthogonal biocontainment strategy. Given that all 20 SAAs are known to be N-end destabilizing under certain conditions (See, S.-J. Chen, X. Wu, B. Wadas, J.-H. Oh, A. Varshavsky, An N-end rule pathway that recognizes proline and destroys gluconeogenic enzymes. Science (80-. ). 355 (2017)), conditionally expressed components could be transferred across organisms to dramatically alter the set of N-end destabilizing SAAs for a particular application.

Evolution methods for improving selectivity

Improving selectivity and improving activity need not be conflicting goals, but different reporters and screening schemes may be better suited for one or the other. For example, a recent evolution method (See, J. W. Tobias, T. E. Shrader, G. Rocap, A. Varshavsky, The N-end rule in bacteria. Science (80-. ). 254, 1374-1377 (1991)) used a reporter containing many UAG sites and a genome-integrated OTS system. The resulting higher ratio of UAG sites to OTS expression compared to this study can produce OTSs capable of expressing protein containing as many as 30 UAGs. However, we tested this evolved OTS (pAcFRS.2.t1) on our genome-integrated Ub-X-GFP reporter and observed remarkably high promiscuity (Fig. 6). In fact, the promiscuity increased substantially from the parental construct, which offers insight into the potential tradeoff between selectivity and activity and the importance of methods capable of achieving either aim. Except for Variant 8, the variants described in this disclosure exhibit greater than parental selectivity for BipA compared to other structurally similar NSAAs or SAAs. Thus, we envision that these BipARS/tRNA variants will be useful for applications where selectivity is important. Biocontainment is an exceptionally relevant use case given that promiscuous activity on amino acid substrates besides BipA can lead to growth in contexts that are intended to be "non-permissive" (i.e., environments where BipA is not present). Examples where biocontainment is important include safe expression of toxic biological agents, safeguards for accidental release of multi-virus-resistant organisms, controlled environmental remediation, and in engineered probiotics to prevent undesired proliferation in the gut or in the environment upon excretion. 
In addition, BipARS/tRNA variants with greater selectivity can be used for tight translational control of protein expression, and they can also be more effectively used in conjunction with other NSAAs for applications that would benefit from use of multiple NSAAs simultaneously.

EXAMPLE II

Materials and Methods

Strains and strain engineering

E. coli strain C321.ΔA (CP006698.1), which was previously engineered to be devoid of UAG codons and RF1, was the starting strain used for this study (See, K. H. Wang, R. T. Sauer, T. A. Baker, ClpS modulates but is not essential for bacterial N-end rule degradation. Genes Dev. 21, 403-8 (2007); K. H. Wang, E. S. C. Oakes, R. T. Sauer, T. A. Baker, Tuning the Strength of a Bacterial N-end Rule Degradation Signal. J. Biol. Chem. 283, 24600-24607 (2008); M. J. Lajoie et al., Genomically Recoded Organisms Expand Biological Functions. Science (80-. ). 342, 357-360 (2013); M. Amiram et al., Evolution of translation machinery in recoded bacteria enables multi-site incorporation of nonstandard amino acids. Nat Biotech. 33, 1272-1279 (2015); S. Million-Weaver, D. L. Alexander, J. M. Allen, M. Camps, in Methods in molecular biology (Clifton, N.J.) (2012; http://www.ncbi.nlm.nih.gov/pubmed/22144351), vol. 834, pp. 33-48; J. W. Tobias, A. Varshavsky, Cloning and functional analysis of the ubiquitin-specific protease gene UBP1 of Saccharomyces cerevisiae. J. Biol. Chem. 266, 12021-8 (1991); A. Wojtowicz et al., Expression of yeast deubiquitination enzyme UBP1 analogues in E. coli. Microb. Cell Fact. 4, 1-12 (2005); G. Roman-Hernandez, J. Y. Hou, R. A. Grant, R. T. Sauer, T. A. Baker, The ClpS Adaptor Mediates Staged Delivery of N-End Rule Substrates to the AAA+ ClpAP Protease. Mol Cell. 43, 217-228 (2011)). The TET promoter and Ub-UAG-sfGFP expression cassette was genomically integrated using λ Red recombineering (See, K. A. Datsenko, B. L. Wanner, One-step inactivation of chromosomal genes in Escherichia coli K-12 using PCR products. Proc Natl Acad Sci USA. 97, 6640-6645 (2000); D. Yu et al., An efficient recombination system for chromosome engineering in Escherichia coli. Proc. Natl. Acad. Sci. U. S. A. 97, 5978-83 (2000)) and tolC negative selection using Colicin E1 (See, J. A. DeVito, Recombineering with tolC as a Selectable/Counter-selectable Marker: remodeling the rRNA Operons of Escherichia coli. Nucleic Acids Res. 36, e4 (2008); C. J. Gregg et al., Rational optimization of tolC as a powerful dual selectable marker for genome engineering. Nucleic Acids Res. 42, 4779-4790 (2014)). This resulted in strain C321.Ub-UAG-sfGFP. Please see Table 3 for sequences of key constructs such as the reporter construct. Multiplex automatable genome engineering (MAGE) (See, H. H. Wang et al., Programming cells by multiplex genome engineering and accelerated evolution. Nature. 460, 894-898 (2009)) was used to inactivate the endogenous mutS and clpS genes when needed and to add or remove UAG codons in the integrated reporter. For MAGE, saturated overnight cultures were diluted 100-fold into 3 mL LB^L containing appropriate antibiotics and grown at 34 °C until mid-log. The integrated Lambda Red cassette in C321.ΔA-derived strains was induced in a shaking water bath (42 °C, 300 rpm, 15 minutes), followed by cooling culture tubes on ice for at least two minutes. These cells were made electrocompetent at 4 °C by pelleting 1 mL of culture (16,000 rcf, 20 seconds) and washing twice with 1 mL ice cold deionized water (dH2O). Electrocompetent pellets were resuspended in 50 µL of dH2O containing the desired DNA. For MAGE oligonucleotides, 5 µM of each oligonucleotide was used. Please see Table 4 for a list of all oligonucleotides used in this study. For integration of dsDNA cassettes, 50 ng was used. Allele-specific colony PCR (ASC-PCR) was used to identify desired colonies resulting from MAGE as previously described (See, F. J. Isaacs et al., Precise manipulation of chromosomes in vivo enables genome-wide codon replacement. Science (80-. ). 333, 348-353 (2011)). Colony PCR was performed using Kapa 2G Fast HotStart ReadyMix according to manufacturer protocols and Sanger sequencing was performed by Genewiz to verify strain engineering. 
The strains C321.Ub-UAG-sfGFP, C321.Ub-UAG-sfGFP UAG151, and C321.ΔClpS.Ub-UAG-sfGFP are available from Addgene. Ub-X-GFP reporters containing codons encoding SAAs in place of UAG were generated from Ub-UAG-GFP by PCR and Gibson assembly, and they were subsequently cloned into the pOSIP-TT vector for Clonetegration (one-step cloning and chromosomal integration) into NEB5α strains (See, F. St-Pierre et al., One-step cloning and chromosomal integration of DNA. ACS Synth. Biol. 2, 537-541 (2013)). The UBP1/clpS V65I operon was also placed under weak constitutive expression and integrated into C321.ΔClpS.Ub-UAG-sfGFP using Clonetegration. This strain (C321.Nend) was used as the host for FACS experiments.
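
The position of the amber codon in the reporter can be checked by translating the junction in frame. The sketch below is offered as an illustration rather than as part of the disclosed methods. It translates a 30-nucleotide stretch copied from the Ubiquitin-*-LFVQEL-sfGFP-His6x construct in Table 3, spanning the end of ubiquitin and the start of the LFVQEL segment. The helper name `translate` is hypothetical, and the reading frame is assumed to begin at the first base of the fragment.

```python
from itertools import product

# Standard genetic code, with codons ordered T, C, A, G at each position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(dna):
    """Translate a DNA string codon by codon; stop codons are shown as '*'."""
    dna = dna.upper()
    usable = len(dna) - len(dna) % 3  # ignore any trailing partial codon
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


# Junction fragment copied from the reporter sequence in Table 3.
junction = "CGTGGAGGATAGTTGTTTGTGCAGGAGCTT"
print(translate(junction))  # RGG*LFVQEL: the amber (TAG) codon precedes LFVQEL
```

The `*` in the output marks the TAG codon at the amino acid target location, immediately after the C-terminal Gly-Gly of ubiquitin.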

Table 1. Sequences of evolved BipA OTS variants.

Parental ("WT") sequences shown below

The amino acid sequence of the WT BipARS:

MDEFEMIKR TSEnSEEELREVLKKDEKSAHIGFEPSGKIHLGHYLQIKKMIDLQNAG

FDIIfflLADLHAYLNQKGELDEIRXIGDYN K EAilGLKAKY\^rYGSEWMLDKDYT LNVYRLALKTTLKRARRSMELIAREDENPKVAEVIYPIMQVNGIHYKGVDVAVGGM EQRKfflMLAREIJ.PKKVVT.fflNPW GLDGEG MSSSKGNFIA\T)DSPEEIRA IK A YCPAGV ΈG PIMEIA YFLEYPLTIKRPEKFGGDLTVNSYΈELESLFKNKELHPMDL

K^'NAVAEELIKILEPIRKRL (SEQ ID NO: 1)

The nucleotide sequence encoding the parental BipARS:

alggacgaatlcgaaatgatcaaacg aaca ae^

ctacctgcagaicaaaaaaaigatcgaectgcagaacgcgggtticgacatcai^

acaaaaaagttttcgaagcgatgggtctgaaagcgaaatacgtttacggtra

atggaactgategcgcgtgaagacgaaaacccgaaagttgcggaagttatctac^

igctggcgcgtgaactgctgccgaaaaaagttgUtgcatccacaa^

gtgcgaaaatcaaaaaagcgiactgcccggcgggtgttgttgaaggtaa∞

ttaactcttacgaagaactggaatctctgttcaaaaacaaagaactgcacccgatggacctgaaaaacgcggttgcggaagaactgatcaaaatcctggaacc (SEQ ID NO: 6)

The nucleotide sequence of the parental tRNA^Tyr_CUA opt:

ccggcggtagttcagcagggcagaacggcggactctaaatccgcatggcaggggttcaaatcccctccgccggacca (SEQ ID NO: 2)

Table 2. Growth of biocontained adk.d6/tyrS.d8 strain on 100 μΜ non-cognate NSAAs as

DNO: Did not observe within a 48 hour incubation period

Table 3. Sequences of key constructs

Construct Name Sequence

Ubiquitin-*- ATGCAGATTTTTGTGAAGACTTTAACAGGTAAGACGATTACCCT LFVQEL-sfGFP- GGAGGTGGAGTCCTCGGACACCATCGATAATGTAAAATCAAAA His6x ("LFVQEL" ATCCAAGATAAGGAAGGAATCCCTCCAGACCAGCAACGTCTGA and "Hisox" TTTTCGCAGGTAAACAACTGGAGGATGGTCGCACGCTTTCGGAC disclosed as SEQ TACAACATCCAGAAAGAATCTACCCTTCATTTGGTTCTGCGTCTG

CGTGGAGGATAGTTGTTTGTGCAGGAGCTTgcatccaagggcgaggagctct ID NOS 7 and 8, ttactggcgtagtaccaattctcgtagagctcgatggcgatgtaaatggccataagttttccgtacgcggcga respectively) gggcgagggcgatgcaactaacggcaagctcactctcaagtttatttgtactactggcaagctcccagtac catggccaactctcgtaactactctgacctatggcgtacaatgtttttcccgctatccagatcacatgaagcaa catgatttttttaagtccgcaatgccagagggctatgtacaagagcgcactattagctttaaggatgatggca cctataagactcgcgcagaggtaaagtttgagggcgatactctcgtaaatcgcattgagctcaagggcattg attttaaggaggatggcaatattctcggccataagctggagtataatttcaattcccataatgtatacattaccg cagataagcaaaagaatggcattaaggcgaattttaagattcgccataatgtggaggatggctccgtacaa ctcgcagatcattatcaacaaaatactccaattggcgatggcccagtactcctcccagataatcattatctctc cactcaatccgtgctctccaaagatccaaatgagaagcgcgatcacatggtactcctggagtttgtaactgc agcaggcattactcatggcatggatgagctctataagctcgagcaccaccaccaccaccactaa (SEQ

ID NO: 9)

ClpS2_At gBlock ATGTCTGATAGTCCTGTTGACTTAAAACCCAAGCCTAAAGTCAA

GCCCAAATTAGAACGCCCAAAACTTTACAAAGTCATGTTATTGA ATGATGATTATACACCACGCGAATTTGTGACGGTAGTCCTTAAA GCGGTGTTTCGTATGTCAGAGGACACTGGTCGCCGTGTAATGAT GACAGCACATCGTTTTGGTTCGGCGGTGGTGGTCGTTTGTGAAC

GTGACATTGCAGAGACGAAAGCCAAGGAGGCGACCGACTTGGG GAAGGAAGCAGGTTTTCCTTTGATGTTCACGACTGAGCCCGAGG AGTAA (SEQ ID NO: 10)

pAzFRSJ .il GTTATGcactacGATggtgttgacgttTACgttggtggtatggaacagcgtaaaatccacatgct gBlock ggcgcgtgaactgctgccgaaaaaagttgtttgcatccacaacccggttctgaccggtctggacggtgaag gtaaaatgtcttcttctaaaggtaacttcatcgcggttgacgactctccggaagaaatccgtgcgaaaatcaa aaaagcgtactgcccggcgggtgttgttgaaggtaacccgatcatggaaatcgcgaaatacttcctggaat acccgctgaccatcaaaGGT (SEQ ID NO: 1 1) ScUBPl^tmnc, or ATGGGGAGTGGGTCTTTCATTGCTGGGCTTGTCAACGATGGTAA

UBP 1 TACGTGTTTTATGAACTCGGTTCTTCAGTCCCTTGCTAGTAGCCG

TGAACTTATGGAGTTTTTGGATAATAATGTAATCCGTACATATG

AAGAAATTGAACAGAACGAGCACAATGAGGAAGGTAATGGCCA

AGAGAGCGCACAAGATGAGGCAACTCAC AAAAAAAACACTCGC

AAGGGAGGTAAGGTCTATGGGAAGCATAAAAAGAAATTAAACC

GCAAATCTTCTAGCAAGGAAGACGAAGAAAAGTCGCAAGAACC

AGACATTACGTTTTCGGTGGCGTTGCGTGATCTGCTGAGCGCAT

TAAATGCTAAGTATTATCGCGACAAACCCTACTTTAAGACTAAC

TCTTTATTAAAAGCGATGAGCAAGTCCCCGCGCAAAAATATCTT

GCTTGGGTACGATCAAGAAGACGCTCAGGAATTTTTTCAAAACA

TTCTTGCGGAGTTAGAATCTAATGTCAAGTCGTTAAACACAGAA

AAGCTTGATACTACACCX JTAGCCAAGTCCGAACTTCCAGACGA

TGCTCTGGTTGGCCAATTAAACCTTGGTGAGGTAGGCACCGTGT

ACATTCCCACAGAACAAATTGACCCCAATTCGATTTTACATGAC

AAATCGATTCAAAACTTTACCCCCTTTAAACTGATGACCCCGTT

GGATGGGATCACGGCTGAGCGCATCGGCTGCCTGCAATGCGGA

GAGAACGGGGGAATTCGCTACAGTGTTTTCAGCGGATTAAGTTT

GAACCTGCCGAATGAAAATATTGGAAGCACTCTTAAACTGTCCC

AGTTACTGTCCGATTGGTCGAAACCCGAGATTATCGAGGGTGTT

GAATGCAACCGTTGCGCTTTAACAGCTGCGCACTCACACTTGTT

TGGCCAATTAAAGGAGTTTGAGAAGAAACCTGAAGGCTCGATTC

CCGAAAAACTTATTAATGCCGTAAAGGACCGCGTGCACCAGATC

GAAGAGGTCTTGGCAAAGCCGGTTATCGACGATGAAGATTATA

AAAAATTGCATACTGCGAATATGGTCCGCAAGTGTTCAAAAAGT AAACAAATTCTTATCTCTCGTCCACCACCTTTGTTGTCTATTCAT

ATCAACCGCTCTGTTTTCGACCCGCGCACCTACATGATTCGCAA

GAACAACTCCAAGGTTTTGTTCAAGTCACGCTTGAACCTGGCAC

CCTGGTGCTGTGATATCAACGAAATCAATCTTGACGCACGCCTT

CCGATGTCGAAGAAGGAAAAAGCAGCTCAACAAGATTCTTCTG

AAGACGAGAACATTGGCGGAGAGTACTATACTAAATTGCATGA

ACGTTTTGAGCAGGAGTTTGAAGATTCTGAAGAAGAGAAGGAA

TACGATGATGCAGAGGGTAATTATGCATCGCATTATAACCATAC

CAAGGACATCTCCAACTACGATCCATTGAATGGAGAAGTCGACG

GTGTGACTTCCGATGATGAGGATGAATACATTGAAGAGACAGA

CGCGTTGGGGAATACCATCAAAAAACGTATTATTGAACACTCCG

ACGTGGAGAACGAAAACGTGAAGGATAATGAAGAACTTCAGGA

GATCGATAACGTTAGCTTGGATGAGCCAAAAATTAATGTCGAGG

ACCAGCTTGAAACGAGTTCTGATGAGGAAGACGTTATTCCTGCT

CCACCCATCAACTACGCTCGCAGCTTTAGTACGGTCCCAGCGAC

CCCTTTAACTTACTCTTTGCGCAGCGTCATCGTGCACTATGGGAC

TCACAACTACGGACATTATATTGCATTTCGCAAGTATCGTGGAT

GTTGGTGGCGCATCTCCGATGAGACGGTCTATGTGGTAGATGAG

GCCGAAGTACTGTCAACACCGGGGGTATTTATGCTTTTCTACGA

GTATGATTTCGACGAGGAGACCGGAAAAATGAAAGACGACTTA

GAAGCTATCCAGAGCAATAATGAGGAAGATGACGAGAAAGAAC

AGGAACAGAAGGGTGTCCAGGAGCCAAAAGAATCCCAGGAGCA

AGGCGAAGGCGAAGAACAAGAAGAAGGGCAAGAGCAAATGAA

ATTTGAGCGTACGGAGGATCATCGCGACATTTCAGGGAAGGATG

TGAATTAA (SEQ ID NO: 12)

Table 4. Oligonucleotides used

Oligo Name Sequence SEQ ID NO

pZE21-seq-F CCATTATTATCATGACATTAACC 13

pZE21-seq-R GGATTTGTCCTACTCAGGAG 14

AARS-seq-F CTTTTTATCGCAACTCTC 15

Ubiquitin+N- TTAAAGAGGAGAAATTAACTATGCAGATTTTTGTGAA 16 degron-F GACT

Ubiquitin+N- AGCTCCTCGCCCTTGGATGCAAGCTCCTGCACAAACAA 17 degron-R GT

pEVOLbbone_Ubpl-F CAGGGAAGGATGTGAATTAATAAGTCGACCATCATCATCA 18

pEVOLbbone_Ubpl-R ATGAAAGACCCACTCCCCATAGATCTAATTCCTCCTGTTAGC 19

Ubpl -Pl-F TAACAGGAGGAATTAGATCTATGGGGAGTGGGTCTTT 20

CAT

Ubpl -Pl-R TCAAGCGTGACTTGAACAAAACCTTGGAGTTGTTCTTG 21

CG

Upbl -P2-F CGCAAGAACAACTCCAAGGTTTTGTTCAAGTCACGCTT 22

GA

Upbl -P2-R TGATGATGATGGTCGACTTATTAATTCACATCCTTCCC 23

TGA

pUbi-*-Ndeg- TGCGTCTGCGTGGAGGATAGTTGTTTGTGCAGGAGCTT 24

GFP-F GC

pUbi-*-Ndeg- AAGCTCCTGCACAAACAACTATCCTCCACGCAGACGC 25

GFP-R Ubpl int-seq~F GCTTGGGTACGATCAAGAAG 26

Ubpl int-seq- CCTTGGTATGGTTATAATGCG 27 R

pZE21bbone4 CAGGGAAGGATGTGAATTAAAAGCTTGATGGGGGATC 28 Ubpl -F CCA

pZE21bbone4 ATGAAAGACCCACTCCCCATGGTACCTTTCTCCTCTTT 29 Ubpl -R AATGAAT

Ubpl-ins-F TTAAAGAGGAGAAAGGTACCATGGGGAGTGGGTCTTT 30

CAT

Ubpl-ins-R TGGGATCCCCCATCAAGCTTTTAATTCACATCCTTCCC 31

TGA

UbiGFPins-F TAAAGAGGAGAAAGGTACCATGCAGATTTTTGTGAAG 32

ACTTTAAC

UbiGFPins-R TGGGATCCCCCATCAAGCTTTTAGTGGTGGTGGTGGTG 33

GT

pZEbbone4Ubi ACCACCACCACCACCACTAAAAGCTTGATGGGGGATC 34

GFP-F CCA

pZEbbone4Ubi GTCTTCACAAAAATCTGCATGGTACCTTTCTCCTCTTTA 35

GFP-R ATGAAT

reporter to ge TTACGGGCTAATTACAGGCAGAAATGCGTGATGTGTG 36 nome-F CCACACTTGTTGATCCCTATCAGTGATAGAGATTGAC reporter to ge CCAGCGGGCTAACTTTCCTCGCCGGAAGAGTGGTTAA 37 nome-R CAAAATAGTAACGTCACCGACAAACAACAGATAAAAC

SIR-seq-F CCAAAGTGAGTTGAGTATAAC 38

SIR-seq-R TTTCTCCTTATTATCAATGC 39 r2g-extend-F GCCGCAGCAAGCCAAAGTGAGTTGAGTATAACGCAAA 40 TTTGCTACTGGTCCGATGGGTGCAATGGTCTGAATTAC GGGCTAATTACAGGC

r2g-extend- AACGCAATCGCAACCGCTAAACCACTGGCCATGTGCA 41

CGAGTTTCATTCATTTCTCCTTATTATCAATGCACCAGC

GGGCTAACTTTC

MAGE_*toS t*a*aagagctcctcgcccttggatgcAAGCTCCTGCACAAACAACgA 42

TCCTCCACGCAGACGCAGAACCAAATGAAGGGTAGAT

TCTTTCT

asPCR-S-F CGTCTGCGTGGAGGATC 43 asPCR-*-F CGTCTGCGTGGAGGATA 44 pZE- TTCTGACCCATCGTAATTAAaagcttgatgggggatccca 45

Ubplbbone4Cl

pP-F

pZE- tGGTATATCTCCTTTTATTATTAATTCACATCCTTCCCTG 46

Ubplbbone4Cl AAAT

pP-R

clpPins-F GTGAATTAATAATAAAAGGAGATATACCatgTCATACA 47

GCGGCGA

clpPins-R tgggatcccccatcaagcttTTAATTACGATGGGTCAGAATCG 48 pEVOLtR A- ctgccaacttactgatttagtgtatgatggtgtttttgagg 49 pl-F

pEVOLtRNA- gccgcttagttagccgtgcaaacttatatcgtatggggctg 50 pl-R agccccatacgatataagtttgcacggctaactaagcggc 51 p2-F

ctcaaaaacaccatcatacactaaatcagtaagttggcagcatca 52 p2-R

pZE- TGTGTACGCTAGAAAAAGCCTAAaagcttgatgggggatc 53

Ubplbbone4Cl

pS-F

pZE- GTTCGTTTTACCcatGGTATATCTCCTTTTATTATTAATT 54

Ubplbbone4Cl CACAT

pS-R

ClpSins-F ATAATAAAAGGAGATATACCatgGGTAAAACGAACGAC 55

TG

ClpSins-R gatcccccatcaagcttTTAGGCTTTTTCTAGCGTACACA 56

AARSlibraryin tactgtttctccatacccgtttttttgggctaacaggaggaattagatct 57 s-F

pEVOLbbone4 agatctaattcctcctgttagcc 58 lib-R

mutS null mut A*C*CCCATGAGTGCAATAGAAAATTTCGACGCCCATA 59

-2* CGCCCATGATGCAGCAGTGATAGTCGCTGAAAGCCCA

GCATCCCGAGATCCTGC

mutS_null_rev A*C*CCCATGAGTGCAATAGAAAATTTCGACGCCCATA 60 ert-2* CGCCCATGATGCAGCAGTATCTCAGGCTGAAAGCCCA

GCATCCCGAGATCCTGC mutS- CCATGATGCAGCAGTATCTCAG 61

2_ascPCR_wt-

F

mutS- CCATGATGCAGCAGTGATAGTC 62

2_ascPCR_mut

-F

mutS- AGGTTGTCCTGACGCTCCTG 63

2 ascPCR-R

ASPCR- GTATAATTTCAATTCCCATAATGTATAG 64 151UAG-F

ASPCR- GTATAATTTCAATTCCCATAATGTATAC 65 151UAC-F

ASPCR-151-R ctcgagcttatagagctcatc 66

RemovelS l UA c*t*taaaattcgccttaatgccattcttttgcttatctgcggtaatgtatacattatgggaattg 67 G- aaattatactccagcttatggccgag

MAGE correct

ed

ClpS.inact- C*T*TTTTCTTCCGCCAGTTGATCAAAGTCCAGCCAGTC 68

MAGE GTTCtaTTatCaCATTGTCAGTTATCATCTTCGGTTACGGT

TATCGGCAGAAC

ASPCR- CCGATAACCGTAACCGAAGATGATAACTGACAATGG 69 ClpS_WT-F

ASPCR- CCGATAACCGTAACCGAAGATGATAACTGACAATGT 70 ClpS.inact-F ASPCR-ClpS- CGTACTTGTTCACCATCGCCACTTTGGT 71 R

pZE-U- CGACTGAGCCCGAGGAGTAAaagcttgatgggggatccca 72 bbone4ClpS2_

At-F

pZE-U- TCAACAGGACTATCAGACATGGTATATCTCCTTTTATT 73 bbone4ClpS2_ ATTAATTCACATCC

At-R

ClpS2_At-ins- ATAATAAAAGGAGATATACCATGTCTGATAGTCCTGTT 74

F GACTT

ClpS2_At-ins- tgggatcccccatcaagcttTTACTCCTCGGGCTCAGTCG 75

R

CipS Μ40Α-Ι· ATGATGATTACACTCCGGCGGAGTTTGTTATTGACGTG 76

T

ClpS_M40A-R CGTCAATAACAAACTCCGCCGGAGTGTAATCATCATTG 77

AC

pOSIPbbone-F taacctaaactgacaggcat 78 pOSIPbbone-R ttccgatccccaattcct 79 pEVOL-araC- GGATCATTTTGCGCTTCAG 80 seq-1

pEVOL-araC- GAATATAACCTTTCATTCCC 81 seq-2

PylRSmiddle- GTGTTTCGACTAGCATTTC 82 seq

PylRSend-seq GGTCAAACATGATTTCAAAAAC 83 pEVOLCmR- caacagtactgcgatgag 84 seq-R

upstreamClpS- GCAAATAAGCTCTTGTCAGC 85

C ipS I 32- CATCTATGTATAAAGTGATANTCGTCAATGATGATTAC 86 X ! { -!· ACTCCG

ClpS_32-R TATCACTTTATACATAGATG 87

ClpS-V43- ATTACACTCCGATGGAGTTTNTTATTGACGTGTTACAA 88

NTT-F AAATTC

ClpS_43-R AAACTCCATCGGAGTGTAAT 89

ClpS_V65- CAACGCAATTGATGCTCGCTNTTCACTACCAGGGGAA 90

NTT-F GG

ClpS_65-R AGCGAGCATCAATTGCGTTG 91

ClpS_L99- CGAGGGAGAATGAGCATCCANTCCTGTGTACGCTAGA 92 NTC-F AAAAGC

ClpS__99-R TGGATGCTCATTCTCCCTCG 93

Alt ClpS- gcggatttgtcctactcag 94 R_forL99

AARS- gctaacaggaggaattagatct 95 inducible-only-

F

AARS- ttgataatctaacaaggattatggg 96 inducible-only-

R pEVOLbbone- cccataatccttgttagattatcaaaggcattttgctattaaggg 97 Ind-only-F

pEVOL-bbone- agatctaattcctcctgttagc 98 ind-only-R

protosens- TAACTCGAGGCTGTTTTGG 99 bbone-F

protosens- CATATGTATATCTCCTTGTGCATC 100 bbone-R

Ubpl ClpS4prot GATGCACAAGGAGATATACATATGGGGAGTGGGTCTT 101 osens-F TCAT

Ubpl ClpS4prot CCAAAACAGCCTCGAGTTAGGCTTTTTCTAGCGTACA 102 osens-R

acccgatcatgcaggttaacGTTATGcactacGATggtgt 103 ins-F

tcaccaccgaatttttccggACCtttgatggtcagcg 104 ins-R

bbone4pAzFR ccggaaaaattcggtggtga 105 S. l .tl-F

bbone4pAzFR gttaacctgcatgatcgggt 106 S. l .tl-R

pZEbbone4tetR acgctctcctgagtaggac 107 -F

pZEbbone4tetR tcaccgacaaacaacagataaaac 108 -R

TetR-ins-F tatctgttgtttgtcggtgaacgtctcattttcgccagat 109 TetR-ins-R gtcctactcaggagagcgtagtgtcaactttatggctagc 110 pDULE-ABK- cgacctgaatggaagcc 111 bbone-F

pDULE-ABK- catacacggtgcctgac 112 bbone-R

CmRins4pDUL aacgcagtcaggcaccgtgtatggagaaaaaaatcactggatatac 113 E-F

CmR4pDULE- gccggcttccattcaggtcgaaaaaattacgccccgc 114 R

pC FRS-65- CAAAATGCTGGATTTGATATAATTATA NKTTG NKG 115 67-70-NNK-F ATTTANNKGCCTATTTAAACCAGAAAGGAGAG

pCNFRS-65-R TATAATTATATCAAATCCAGCATTTTGTAAATC 116 pCNFRS-108- GGCAAAATATGTTTATGGAAGTGAANNKNNKCTTGAT 1 17 109-114-N K- AAGGATN KACACTGAATGTCTATAGATTGGC

F

pCNFRS-108- TTCACTTCCATAAACATATTTTGCC 118 R

pC FRS-155~ GAAGTTATCTATCCAATAATG NKGTTN KGGTGCTC 119 157-161 -NNK- ATNNKCTTGGCGTTGATGTTGCAG

CATTATTGGATAGATAACTTCAGCAAC 120

R

library INS-seq- CGCATCAGGCAATTTAGC 121 R BipARS PI 44 cgcgcgtgaagacgaaaaccagaaagttgcggaagttatctac 122 Q-F

BipARS PI 44 ggttttcgtcttcacgcg 123 Q-R

BipARS XI 7 tacccgatcatgcaggttaaaggtatccactacaaaggtgttg 124

K-F

BipARS XI 57 ttaacctgcatgatcgggta 125

K-R

BipARS_R181 gtaaaatccacatgctggcgtgtgaactgctgccgaaa 126 C-F

BipARS_R181 cgccagcatgtggatttta 127 C-R

BipARS I255F tcctggaatacccgctgaccttcaaacgtccggaaaaattc 128

BipARS !255i ggtcagcgggtattccag 129 -R

BipARS_E259 gctgaccatcaaacgtccggtaaaattcggtggtgacctg 130 V-F

BipARS_E259 ccggacgtttgatggtc 131 V-R

BipARS P284 tcaaaaacaaagaactgcactcgatgcgtctgaaaaacg 132 S-F

BipA S P284 gtgcagttctttgtttttgaac 133 S-R pEVOLbbone4 ctgcagtttcaaacgctaaattg 134 iibv2-F

AARSlibraryin taggcctgataagcgtagcgcatcaggcaatttagcgtttgaaactgcag 135 sv2-R

Bi pA R S G257 aatacccgctgaccatcaaacgtccggaaaaattcggtg 136 R-F

Bi pA R S G257 accaccgaatttttccggacgtttgatggtcagcgggtat 137 R-R

BipARS- gcgaaatacgtttacggttc 138 100AA-F

BipARS- gaaccgtaaacgtatttcgc 139 100AA-R

BipARS- ggacggtgaaggtaaaatgtc 140 200AA-F

BipARS- gacattttaccttcaccgtcc 141 200AA-R

pZErepbbone4 cggcgccagggttgtttttcacgctctcctgagtaggaca 142 pylT-F

pZErepbbone4 ttccattcaggtcgaaaaaaagtgtcaactttatggctagc 143 pylT-R

pylTpDULE-F ttttttcgacctgaatggaagc 144 pylTpDULE-R gaaaaacaaccctggcgc 145 pZEbbone4pyl cggcgccagggttgtttttcacgctctcctgagtaggaca 142

Tonly-F pZEbbone4pyl ttccattcaggtcgaaaaaactcgaggtgaagacgaaagg 146 Tonly-R

ClpS-Lib-F ACATTTCAGGGAAGGATGTGAATTAATAATAAAAGGA 147

GATATACC

ClpS-Lib-R gcgtaccatgggatcccccatcaagcttTTA 148 pZEbbone4Clp TAAaagcttgatgggggatc 149 Slib-F

pZEbbone4Clp GGTATATCTCCTTTTATTATTAATTCACATCC 150 Slib-R

ClpS-Lib-Seq GGATCATCGCGACATTTC 151

Plasmids and plasmid construction

Two copies of orthogonal MjTyrRS-derived AARSs and tRNA^Tyr_CUA opt were kindly provided in pEVOL plasmids by Dr. Peter Schultz (Scripps Institute) (See, M. Ibba, D. Söll, Aminoacyl-tRNAs: setting the limits of the genetic code. Genes Dev. 18, 731-8 (2004)). AARSs used in this study were the following: BipARS (See, J. Xie, W. Liu, P. G. Schultz, A Genetically Encoded Bidentate, Metal-Binding Amino Acid. Angew. Chemie. 119, 9399-9402 (2007)), BipyARS (See, J. Xie, W. Liu, P. G. Schultz, A Genetically Encoded Bidentate, Metal-Binding Amino Acid. Angew. Chemie. 119, 9399-9402 (2007)), pAcFRS (See, L. Wang, Z. Zhang, A. Brock, P. G. Schultz, Addition of the keto functional group to the genetic code of Escherichia coli. Proc. Natl. Acad. Sci. U. S. A. 100, 56-61 (2003)), pAzFRS (See, J. W. Chin et al., Addition of p-Azido-L-phenylalanine to the Genetic Code of Escherichia coli. J. Am. Chem. Soc. 124, 9026-9027 (2002)), and NapARS (See, L. Wang, A. Brock, P. G. Schultz, Adding L-3-(2-Naphthyl)alanine to the Genetic Code of E. coli (2002), doi: 10.1021/JA012307J). The pEVOL plasmids were maintained using chloramphenicol. Original plasmids harboring two AARS copies were used for synthetase promiscuity comparison experiments (Figures 2 and 3A-D). For generation and characterization of synthetase variants, plasmids harboring only one AARS copy under inducible expression were constructed using Gibson assembly (See, D. G. Gibson et al., Enzymatic assembly of DNA molecules up to several hundred kilobases. Nat. Methods. 6, 343-345 (2009)). The ScWRS-R3-13 AARS was synthesized as codon-optimized for expression in E. coli and cloned into the pEVOL plasmid along with its associated tRNA (See, R. A. Hughes, A. D. Ellington, Rational design of an orthogonal tryptophanyl nonsense suppressor tRNA. Nucleic Acids Res. 38, 6813-6830 (2010); J. W. Ellefson et al., Directed evolution of genetic parts and circuits by compartmentalized partnered replication. Nat. Biotechnol. 32, 97-101 (2014)). In all cases, tRNA is constitutively expressed and AARS expression is either arabinose inducible or constitutive.

An N-terminally truncated form of the UBP1 gene from Saccharomyces cerevisiae (See, J. W. Tobias, A. Varshavsky, Cloning and functional analysis of the ubiquitin-specific protease gene UBP1 of Saccharomyces cerevisiae. J. Biol. Chem. 266, 12021-8 (1991); A. Wojtowicz et al., Expression of yeast deubiquitination enzyme UBP1 analogues in E. coli. Microb. Cell Fact. 4, 1-12 (2005)) (ScUBP1^trunc or simply UBP1) was synthesized as codon-optimized for expression in E. coli and cloned into the pZE21 vector (kanamycin resistance, ColE1 origin, TET promoter) (Expressys). The E. coli genes clpS and clpP were PCR amplified from E. coli MG1655 and cloned into artificial operons downstream of the UBP1 gene in the pZE21 vector using Gibson assembly. Artificial operons were created by inserting the following RBS sequence between the UBP1 and clp genes: TAATAAAAGGAGATATACC (SEQ ID NO: 152). This RBS was originally designed using the RBS calculator (See, H. M. Salis, E. A. Mirsky, C. A. Voigt, Automated design of synthetic ribosome binding sites to control protein expression. Nat. Biotech. 27, 946-950 (2009)) and previously validated in the context of another artificial operon (See, A. M. Kunjapur, Y. Tarasova, K. L. J. Prather, Synthesis and Accumulation of Aromatic Aldehydes in an Engineered Strain of Escherichia coli. J. Am. Chem. Soc. 136, 11644-11654 (2014)). Rational engineering of ClpS variants was performed by dividing the clpS gene into two amplicons, where the second amplicon contained a degenerate NTC or NTT sequence in the oligo corresponding to each codon of interest. The four initial positions of interest in the clpS gene correspond to amino acids 32, 43, 65, and 99. In each case, Gibson assembly was used to join both amplicons and the backbone plasmid. The pZE/UBP1/ClpS and pZE/UBP1/ClpS_V65I plasmids are available from Addgene.
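As a minimal illustration of what the degenerate NTC and NTT codons described above can encode, the following Python sketch expands each degenerate codon under the standard genetic code. The function name `expand_degenerate_codon` and the abbreviated codon table are illustrative assumptions and are not part of the original methods.

```python
from itertools import product

# IUPAC nucleotide codes needed for this example (N = any base).
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "N": "ACGT"}

# Standard-genetic-code entries for the eight codons covered by NTT and NTC.
CODON_TABLE = {
    "ATT": "I", "CTT": "L", "GTT": "V", "TTT": "F",
    "ATC": "I", "CTC": "L", "GTC": "V", "TTC": "F",
}


def expand_degenerate_codon(codon):
    """Return {concrete codon: amino acid} for a degenerate codon such as NTT."""
    pools = [IUPAC[base] for base in codon.upper()]
    return {"".join(c): CODON_TABLE["".join(c)] for c in product(*pools)}


ntt = expand_degenerate_codon("NTT")
ntc = expand_degenerate_codon("NTC")
print("NTT encodes:", sorted(set(ntt.values())))
print("NTC encodes:", sorted(set(ntc.values())))
```

Under the standard genetic code, both degenerate codons cover the same four hydrophobic residues (Phe, Ile, Leu, Val), so each targeted position is sampled with only four codons.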

Three reporter constructs were initially cloned into pZE21 vectors before use as templates for PCR amplification and genomic integration. The first of these consists of a Ubiquitin-*-LFVQEL-sfGFP-His6x fusion ("LFVQEL" and "His6x" disclosed as SEQ ID NOS 7 and 8, respectively) ("Ub-UAG-sfGFP") downstream of the TET promoter. The second has an additional UAG codon internal to the sfGFP at position Y151* ("Ub-UAG-sfGFP_151UAG"). The third has an ATG codon (encoding methionine) in place of the first UAG ("Ub-M-sfGFP_151UAG").
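The relationship between the reporter variants can be sketched in a few lines of Python: locate the in-frame amber (TAG) codon and substitute it with ATG, as in the "Ub-M" variant. The excerpt below is the in-frame ubiquitin/LFVQEL junction taken from the reporter sequence in Table 3; the helper names are hypothetical and the sketch is not the cloning procedure itself.

```python
def split_codons(orf):
    """Split an in-frame DNA string into codons, ignoring any trailing partial codon."""
    orf = orf.upper()
    usable = len(orf) - len(orf) % 3
    return [orf[i:i + 3] for i in range(0, usable, 3)]


def amber_positions(orf):
    """Return zero-based codon indices of in-frame TAG (amber) codons."""
    return [i for i, codon in enumerate(split_codons(orf)) if codon == "TAG"]


def replace_codon(orf, index, new_codon):
    """Return the sequence with the codon at `index` replaced by `new_codon`."""
    codons = split_codons(orf)
    codons[index] = new_codon.upper()
    return "".join(codons)


# In-frame junction: ...L R L R G G * L F V Q E L (the asterisk is the amber codon).
junction = "CTGCGTCTGCGTGGAGGATAGTTGTTTGTGCAGGAGCTT"

positions = amber_positions(junction)
ub_m_junction = replace_codon(junction, positions[0], "ATG")
print(positions)
print(ub_m_junction)
```

The same scan applied to a full reporter open reading frame would also report the internal amber codon at sfGFP position 151 and the terminal stop codon if that happens to be TAG, so indices should be checked against the intended positions.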

Culture Conditions

Cultures for general culturing used herein were grown in LB-Lennox medium (LB^L: 10 g/L bacto tryptone, 5 g/L sodium chloride, 5 g/L yeast extract). Cultures for experiments in Figs. 3A-3D were grown in 2X YT medium (2XYT: 16 g/L bacto tryptone, 10 g/L bacto yeast extract, 5 g/L sodium chloride) given improved observed final culture densities compared to LB^L upon expression of ClpS variants. Unless otherwise indicated, all cultures were grown in biological triplicate in 96-well deep-well plates in 300 μL culture volumes at 34 °C and 400 rpm.

Minimal Media SAA Spiking Experiments

Minimal-media-adapted C321.ΔA strains harboring either (i) pZE21/Ub-M-sfGFP_151UAG only, (ii) pZE21/Ub-M-sfGFP_151UAG and pEVOL/MjtRNA^Tyr_opt, (iii) pZE21/Ub-M-sfGFP_151UAG and pEVOL/bipARS_WT-tRNA_WT, or (iv) pZE21/Ub-M-sfGFP_151UAG and pEVOL/bipARS_10-tRNA_10 were inoculated from frozen stocks in at least experimental duplicates. A 1X M9 salt medium containing 6.78 g/L Na₂HPO₄·7H₂O, 3 g/L KH₂PO₄, 1 g/L NH₄Cl, and 0.5 g/L NaCl, supplemented with 2 mM MgSO₄, 0.1 mM CaCl₂, 1% glycerol, trace elements, 0.25 μg/L D-biotin, and carbenicillin was used as the culture medium. The trace element solution (100X) used contained 5 g/L EDTA, 0.83 g/L FeCl₃·6H₂O, 84 mg/L ZnCl₂, 10 mg/L CoCl₂·6H₂O, 13 mg/L CuCl₂·2H₂O, 1.6 mg/L MnCl₂·2H₂O, and 10 mg/L H₃BO₃ dissolved in water (See, A. M. Kunjapur, J. C. Hyun, K. L. J. Prather, Deregulation of S-adenosylmethionine biosynthesis and regeneration improves methylation in the E. coli de novo vanillin biosynthesis pathway. Microb. Cell Fact. 15, 1 (2016)). Inocula were grown to confluence overnight in deep 96-well plates in medium supplemented with 0.2% arabinose and chloramphenicol and/or kanamycin. Experimental cultures were inoculated at 1:7 dilution in the same media supplemented with each of the 20 standard amino acids or BipA to 1 mM or 100 μM, respectively. Cultures were incubated at 34 °C to an OD₆₀₀ of 0.5-0.8 in a shaking plate incubator at 1050 rpm (~4-5 h). GFP expression was induced by addition of anhydrotetracycline, and cells were incubated at 34 °C for an additional 16-20 h before measurement. All assays were performed in 96-well plate format. Cells were centrifuged at 5,000 g for 5 min, washed with 1x PBS, and resuspended in 1x PBS after a second spin. GFP fluorescence was measured on a Biotek spectrophotometric plate reader using excitation and emission wavelengths of 485 and 525 nm. Fluorescence was then normalized by the OD₆₀₀ reading to obtain FL/OD. The average normalized FL/OD values from 3 independent experiments were plotted.

NSAA Incorporation Assays

Strains harboring integrated GFP reporters and AARS/tRNA plasmids were inoculated in biological triplicate from frozen stocks and grown to confluence overnight in deep well plates. Experimental cultures were inoculated at 1:100 dilution in either LB^L or 2XYT media supplemented with chloramphenicol, arabinose, and the appropriate NSAA. Cultures were incubated at 34 °C to an OD₆₀₀ of 0.5-0.8 in a shaking plate incubator at 400 rpm (~4-5 h). GFP expression was induced by addition of anhydrotetracycline, and cells were incubated at 34 °C for an additional 16-20 h.

All assays were performed in 96-well plate format. Cells were centrifuged at 5,000 g for 3 min, washed with PBS, and resuspended in PBS after a second spin. GFP fluorescence was measured on a Biotek spectrophotometric plate reader using excitation and emission wavelengths of 485 and 525 nm. Fluorescence signals were corrected for autofluorescence as a linear function of OD₆₀₀ using the parent C321.ΔA strain that does not contain a reporter. Fluorescence was then normalized by the OD₆₀₀ reading to obtain FL/OD.
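The autofluorescence correction and normalization described above can be sketched as follows. This is only one plausible reading of the procedure: it assumes an ordinary least-squares line is fit to fluorescence versus OD₆₀₀ for the reporter-free parent strain, and that the predicted autofluorescence is subtracted before dividing by OD₆₀₀. The function names and all numerical values are hypothetical.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def normalized_fl_od(fluorescence, od600, slope, intercept):
    """Subtract OD-dependent autofluorescence, then normalize by OD600."""
    autofluorescence = slope * od600 + intercept
    return (fluorescence - autofluorescence) / od600


# Hypothetical readings from the reporter-free parent strain.
parent_od = [0.2, 0.4, 0.6, 0.8]
parent_fl = [120.0, 220.0, 320.0, 420.0]
slope, intercept = fit_line(parent_od, parent_fl)

# Hypothetical reading from a reporter strain grown with NSAA.
fl_od = normalized_fl_od(fluorescence=5320.0, od600=0.6,
                         slope=slope, intercept=intercept)
print(f"slope={slope:.1f}, intercept={intercept:.1f}, FL/OD={fl_od:.1f}")
```

Fitting the correction on the parent strain keeps the estimate of autofluorescence independent of reporter expression, which is the reason for using a reporter-free control.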

Chemicals

NSAAs used in this study were purchased from PepTech Corporation, Sigma Aldrich, Santa Cruz Biotechnology, and Toronto Research Chemicals. The following NSAAs were purchased: L-4,4-Biphenylalanine (BipA), L-4-Benzoylphenylalanine (pBenzoylF), O-tert-Butyl-L-tyrosine (tButylY), L-2-Naphthylalanine (NapA), L-4-Acetylphenylalanine (pAcF), L-4-Iodophenylalanine (pIF), L-4-Bromophenylalanine (pBromoF), L-4-Chlorophenylalanine (pChloroF), L-4-Fluorophenylalanine (pFluoroF), L-4-Azidophenylalanine (pAzF), L-4-Nitrophenylalanine, L-4-Cyanophenylalanine, L-3-Iodophenylalanine, L-phenylalanine, L-tyrosine, L-tryptophan, D-phenylalanine, D-tyrosine, and 5-Hydroxytryptophan. Solutions of NSAAs (50 or 100 mM) were made in 10-50 mM NaOH.

Library Generation

Error-prone PCR (EP-PCR) is the method of choice for introducing random mutations into a defined segment of DNA that is too long to be chemically synthesized as a degenerate sequence. EP-PCR was performed using the GeneMorph II Random Mutagenesis Kit (Stratagene Catalog #200550), following manufacturer instructions to obtain an average of approximately 2-4 DNA mutations per library member. To generate libraries of MjTyrRS-derived AARSs, roughly 175 ng of PCR template was used in each 25 μL of PCR mix containing primers that have roughly 40 base pairs of homology flanking the AARS coding region. The reaction mixture was subject to 30 cycles with Tm of 63 °C and extension time of 1 min. Four separate 25 μL EP-PCR reactions were performed per AARS and then pooled. Plasmid backbone PCRs were performed using KOD Xtreme Hot Start Polymerase (Millipore Catalog #71795). Both PCR products were isolated by 1.5% agarose gel electrophoresis and Gibson assembled in 8 parallel 20 μL volumes per library. Assemblies were pooled, washed by ethanol precipitation, and resuspended in 50 μL of dH₂O, which was drop dialyzed (EMD Millipore, Billerica, MA) and electroporated into E. cloni supreme cells (Lucigen, Middleton, WI). Libraries were expanded in culture and miniprepped (Qiagen, Valencia, CA) to roughly 100 ng/μL aliquots. 1 μg of library was drop dialyzed and electroporated into C321.ΔA.Nendint for subsequent FACS experiments. Colony counts on appropriate antibiotic containing plates within one doubling time after transformation revealed library sizes of roughly 1 x 10° for AARS libraries in E. cloni hosts and J x 10⁷ in C321.ΔA.Nendint hosts.

Flow Cytometry and Cell Sorting

AARS libraries were subject to three rounds of fluorescence activated sorting in a Beckman Coulter MoFlo Astrios. Prior to each round, the usual NSAA incorporation assay procedure was followed such that cells would express GFP reporter proportional to the activity of the AARS library member. One notable deviation from that procedure was the use of a higher and variable inoculum volume to screen the full library at each stage. Cells displaying the top 0.5% of fluorescence activation (50k cells) were collected after Round 1, expanded overnight, and used to inoculate experimental cultures for the next round. Because the next round was a negative screening round, the desired NSAA was not added into culture medium. The rest of the NSAA incorporation assay procedure was followed in order to eliminate cells that remained fluorescent due to promiscuous AARS activity on standard amino acids. In the second sort, cells displaying the lowest 10% of visible fluorescence (500k cells) were collected. Cells passing the second round were expanded overnight and used to inoculate the third and final round of sorting. The experimental cultures for the third round were treated as in the first round and were sorted for the upper 0.05% of fluorescence activation (1k cells). The final cells collected were expanded overnight and plated for sequencing and downstream testing. Libraries were frozen at each stage before and after sorting. FlowJo X software was used to analyze the flow cytometry data. Constructs of interest were grown overnight, miniprepped, and transformed into C321.ΔA.Ubiq-UAG-sfGFP for further analysis in plate reader assays.

Reporter Purification

Strains harboring integrated GFP reporters and AARS/tRNA plasmids were inoculated from frozen stocks and grown to confluence overnight in 5 mL 2XYT containing chloramphenicol. Saturated cultures were used to inoculate 500 mL experimental cultures of 2XYT supplemented with chloramphenicol, arabinose, and appropriate NSAAs. Cultures were incubated at 34 °C to an OD₆₀₀ of 0.5-0.8 in a shaking incubator at 250 rpm. GFP expression was induced by addition of anhydrotetracycline, and cells were incubated at 34 °C for an additional 24 h before measurement. Cells were centrifuged in a Sorvall RC 5C Plus at 10,000 g for 20 minutes. Pellets were frozen at -20 °C before lysis and purification. Lysis of resuspended pellets was performed under denaturing conditions in 10 mL of 7 M urea, 0.1 M NaH₂PO₄, 0.01 M Tris-Cl, pH 8.0 buffer with 450 units of Benzonase (Novagen, cat. no. 70664-3) using 15 minutes of sonication on ice with a QSonica Q125 sonicator. Lysate was distributed into microcentrifuge tubes and centrifuged for 20 minutes at 20,000 g at room temperature, and then protein-containing supernatant was removed. 2 mL of supernatant with 7.5 μM imidazole was added to 250 μL Ni-NTA resin (Qiagen Cat. no. 30210) and equilibrated at 4 °C overnight. Columns were washed with seven 1 mL washes using 8 M urea, 0.1 M NaH₂PO₄, 0.01 M Tris-Cl. Washes 1 and 2 were adjusted to pH 6.3 and contained no imidazole. Washes 3-7 were adjusted to pH 6.1 and contained imidazole at concentrations of 10 mM, 25 mM, 40 mM, 60 mM, and 80 mM, respectively. Protein was eluted with two 150 μL elutions using elution buffer (8 M urea, 0.1 M NaH₂PO₄, 0.01 M Tris-Cl, pH 4.5, 300 mM imidazole). Gels demonstrated that wash 5 eluted the protein, and for several samples the wash 5 fraction was concentrated ~20X using Amicon Ultra 0.5 mL 10K spin concentrators. Protein gels were loaded with 30 μL wash or elution volumes along with 10 μL Nu-PAGE loading dye in Nu-PAGE 10% Bis-Tris Gels (ThermoFisher Cat. no. NP0301). Protein gels were run at 180 V for 1 h, washed 3x with DI water, and stained with coomassie (Invitrogen Cat. no. LC6060) for one hour. Gels were destained overnight in water on a shaker at room temperature, and images were taken with a BioRad ChemiDoc MP imaging system.

Mass spectrometry

Samples were submitted for single LC-MS/MS experiments that were performed on an LTQ Orbitrap Elite (Thermo Fisher) equipped with a Waters (Milford, MA) NanoAcquity UPLC pump. Trypsin-digested peptides were separated on a 100 μm inner diameter microcapillary trapping column packed first with approximately 5 cm of C18 Reprosil resin (5 μm, 100 Å, Dr. Maisch GmbH, Germany) followed by an analytical column of ~20 cm of Reprosil resin (1.8 μm, 200 Å, Dr. Maisch GmbH, Germany). Separation was achieved by applying a gradient from 5-27% ACN in 0.1% formic acid over 90 min at 200 nL min⁻¹. Electrospray ionization was enabled by applying a voltage of 2.0 kV using a home-made electrode junction at the end of the microcapillary column and sprayed from fused silica pico tips (New Objective, MA). The LTQ Orbitrap Elite was operated in the data-dependent mode for the mass spectrometry methods. The mass spectrometry survey scan was performed in the Orbitrap in the range of 395-1,800 m/z at a resolution of 6 x 10⁴, followed by the selection of the twenty most intense ions (TOP20) for CID-MS2 fragmentation in the ion trap using a precursor isolation width window of 2 m/z, AGC setting of 10,000, and a maximum ion accumulation of 200 ms. Singly charged ion species were not subjected to CID fragmentation. Normalized collision energy was set to 35 V with an activation time of 10 ms; AGC was set to 50,000, and the maximum ion time was 200 ms. Ions in a 10 ppm m/z window around ions selected for MS2 were excluded from further selection for fragmentation for 60 s.

Mass Spectrometry Analysis

Raw data were submitted for analysis in Proteome Discoverer 2.1.0.81 (Thermo Scientific) software. Assignment of MS/MS spectra was performed using the Sequest HT algorithm by searching the data against a user provided protein sequence database as well as all entries from the E. coli Uniprot database and other known contaminants such as human keratins and common lab contaminants. Sequest HT searches were performed using a 20 ppm precursor ion tolerance and requiring each peptide's N- and C-termini to adhere to trypsin protease specificity while allowing up to two missed cleavages. Cysteine carbamidomethyl (+57.021 Da) was set as a static modification, while methionine oxidation (+15.99492 Da) was set as a variable modification. An MS2 spectra assignment false discovery rate (FDR) of 1% on protein level was achieved by applying the target-decoy database search. Filtering was performed using Percolator (64bit version, reference 6). For quantification, a 0.02 m/z window was centered on the theoretical m/z value of each of the six reporter ions, and the intensity of the signal closest to the theoretical m/z value was recorded. Reporter ion intensities were exported in the result file of the Proteome Discoverer 2.1 search engine as Excel tables. All fold changes were analyzed after normalization between samples based on total unique peptide ion signal.
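To make the 20 ppm precursor ion tolerance concrete, the following Python sketch converts a ppm tolerance into an absolute m/z window and checks whether an observed precursor falls inside it. This only illustrates the arithmetic and is not how Sequest HT is implemented; the function names and the example m/z values are hypothetical.

```python
def ppm_error(observed_mz, theoretical_mz):
    """Signed mass error in parts per million."""
    return (observed_mz - theoretical_mz) / theoretical_mz * 1e6


def mz_window(theoretical_mz, tol_ppm=20.0):
    """Absolute m/z bounds corresponding to a symmetric ppm tolerance."""
    delta = theoretical_mz * tol_ppm / 1e6
    return theoretical_mz - delta, theoretical_mz + delta


def within_tolerance(observed_mz, theoretical_mz, tol_ppm=20.0):
    """True if the observed m/z lies within the ppm tolerance."""
    return abs(ppm_error(observed_mz, theoretical_mz)) <= tol_ppm


theoretical = 1000.0  # hypothetical precursor m/z
low, high = mz_window(theoretical)
print(f"20 ppm window at m/z {theoretical}: {low:.3f} to {high:.3f}")
print(within_tolerance(1000.015, theoretical))
print(within_tolerance(1000.025, theoretical))
```

Because the tolerance is relative, the absolute window scales with m/z: at m/z 1000 a 20 ppm tolerance corresponds to ±0.02 m/z, while at m/z 500 it corresponds to ±0.01 m/z.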

In vitro Aminoacylation Assays

Wild-type BipARS, BipARS9, and BipARS10 DNA templates were amplified from the pEVOL.BipARS plasmid and cloned into pET20b using Gibson assembly (New England Biolabs) with primers pET20.F2 and pET20.R for linearization of pET20b and BipRS.F and BipRS.R2 for amplification of BipARS. The BipARS.pET20b plasmids were transformed into BL21(DE3) cells. A 25-mL overnight culture was used to inoculate 500 mL of fresh LB media containing ampicillin. Cells were grown at 37 °C to an OD₆₀₀ of approximately 0.6, and protein overexpression was induced with 1 mM IPTG for 4 h. Cells were harvested by centrifugation at 4 °C for 20 minutes at 6000 rpm. Cells were lysed using 50 mM Tris (pH 7.5), 300 mM NaCl, 3 mM 2-mercaptoethanol, and 5 mM imidazole followed by sonication. Lysed cells were centrifuged at 18,000 x g for 1 h at 4 °C. The supernatant was run through TALON resin and BipARS was eluted using an imidazole concentration gradient. The proteins were stored in 50 mM HEPES (pH 7.3), 50 mM KCl, and 1 mM dithiothreitol (DTT). Protein concentration was calculated using the Bradford assay (BioRad).

The tRNA genes were cloned into pUC18 using Gibson Assembly. pUC18 was linearized using primers pUCbip.F and pUCbip.R. The tRNA gene fragment was prepared by annealing 2 μM of primers tBip F and tBip R for WT tRNA, tBip9 F and tBip9 R for tRNA variant 9, and tBip10 F and tBip10 R for tRNA variant 10. tRNAs were obtained by in vitro transcription using T7 RNA polymerase. ~100 μg of the resulting plasmid was digested with BstNI overnight at 55 °C, and the digestion reaction was used to start in vitro transcription by adding transcription buffer (40 mM Tris-HCl, pH 8, 6 mM MgCl₂, 1 mM spermidine, 0.01% Triton, 0.005 mg/mL BSA, and 5 mM dithiothreitol), 4 mM NTPs (ATP, GTP, UTP, and CTP), 20 mM MgCl₂, 5 mM DTT, 2 units/mg of pyrophosphatase (Roche), and 0.75 mg/mL T7 RNA polymerase. The reaction was incubated for 6-7 h at 37 °C. The tRNA was purified using an 8 M urea/12% acrylamide gel and extracted from the gel using a solution containing 0.5 M sodium acetate and 1 mM EDTA (pH 8) overnight at 30 °C followed by ethanol precipitation.

For aminoacylation reactions, tRNAs were radiolabeled at the 3'-end using CCA-adding enzyme as previously described (See, A. M. Kunjapur, J. C. Hyun, K. L. J. Prather, Deregulation of S-adenosylmethionine biosynthesis and regeneration improves methylation in the E. coli de novo vanillin biosynthesis pathway. Microb. Cell Fact. 15, 1 (2016)). Reactions were carried out with 5 μM tRNA (with a trace amount of ³²P-labeled tRNA), 2.5 mM amino acid, and 5 μM BipARS in buffer containing 50 mM HEPES (pH 7.3), 4 mM ATP, 20 mM MgCl₂, 0.1 mg/mL BSA, and 1 mM DTT. Reactions were incubated for 30 minutes at 37 °C. 2 μL of reaction mixture were quenched in 5 μL of 0.1 U/μL P1 nuclease (Sigma) in 200 mM sodium acetate (pH 5) immediately after enzyme addition and after 30 min. The quenched time points were incubated at room temperature for 1 h. 1 μL of the solution was run on PEI-cellulose thin-layer chromatography sheets. The fraction of aminoacylated tRNA was determined as described previously (See, M. Ibba, D. Söll, Aminoacyl-tRNAs: setting the limits of the genetic code. Genes Dev. 18, 731-8 (2004)). All assays were repeated three times. Figures were generated using Prism 7 (GraphPad Software).
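The quantification step of the aminoacylation assay can be sketched as follows. In P1 nuclease assays of this type, the labeled 3'-terminal nucleotide is released as aminoacyl-AMP from charged tRNA and as AMP from uncharged tRNA, so the charged fraction is taken as the aminoacyl-AMP spot intensity divided by the sum of both spots. The function name, the background-subtraction argument, and all intensity values below are hypothetical illustrations and are not taken from this disclosure.

```python
from statistics import mean, stdev


def charged_fraction(aa_amp, amp, background=0.0):
    """Fraction of tRNA aminoacylated, from two TLC spot intensities.

    aa_amp: intensity of the aminoacyl-AMP spot (charged tRNA)
    amp: intensity of the AMP spot (uncharged tRNA)
    background: intensity subtracted from each spot (assumed uniform)
    """
    charged = max(aa_amp - background, 0.0)
    uncharged = max(amp - background, 0.0)
    total = charged + uncharged
    if total <= 0:
        raise ValueError("no signal above background")
    return charged / total


# Hypothetical spot intensities (arbitrary units) for three replicate
# reactions at the 30 min time point: (aminoacyl-AMP, AMP).
replicates = [(300.0, 700.0), (250.0, 750.0), (350.0, 650.0)]

fractions = [charged_fraction(aa, free) for aa, free in replicates]
mean_fraction = mean(fractions)
sd_fraction = stdev(fractions)

print(f"charged fraction: {mean_fraction:.3f} +/- {sd_fraction:.3f} (SD, n=3)")
```

The same function can be applied to the time point quenched right after enzyme addition to obtain a baseline for each tRNA and amino acid combination.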

Biocontainment escape frequency assays

Escape assays were performed essentially as previously described (See, K. H. Wang, G. Roman-Hernandez, R. A. Grant, R. T. Sauer, T. A. Baker, The Molecular Basis of N-End Rule Recognition. Mol. Cell. 32, 406-414 (2008)). All strains were grown in permissive conditions and harvested in late exponential phase. Cells were washed twice in LB and resuspended in LB. Viable CFU were calculated from the mean and standard error of the mean (SEM) of three technical replicates of tenfold serial dilutions on permissive media. Three technical replicates were plated on non-permissive media and monitored for 7 days. Synthetic auxotrophs were plated on two different non-permissive media conditions: SCA - LB with SDS, chloramphenicol, and arabinose - for previously published strains; and KA - LB with kanamycin and arabinose - for strains generated in this study. The latter strains were isolated by transformation with pEVOL vectors harboring kanamycin resistance markers instead of chloramphenicol resistance markers. Passaging and replica plating were used to ensure that isolated strains had lost chloramphenicol resistance and thus the original OTS construct used in the previous study. If synthetic auxotrophs exhibited escape frequencies above the detection limit (lawns) on non-permissive media at days 2, 5, or 7, escape frequencies for those days were calculated from additional platings at lower density. The SEM across technical replicates of the cumulative escape frequency was calculated as previously indicated.
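The arithmetic behind an escape frequency can be sketched as follows: viable CFU are estimated from colony counts on permissive plates, and the escape frequency is the number of colonies on non-permissive plates divided by the number of viable cells plated. This is a minimal sketch under stated assumptions. The colony counts, dilution, and plated volumes are hypothetical, and the SEM here is taken across the per-replicate frequencies only, which may differ from the error treatment referred to above.

```python
from math import sqrt
from statistics import mean, stdev


def sem(values):
    """Standard error of the mean of a list of replicate values."""
    return stdev(values) / sqrt(len(values))


def cfu_per_ml(colonies, dilution, plated_ml):
    """Viable CFU per mL of the undiluted suspension."""
    return colonies / (dilution * plated_ml)


# Hypothetical counts on permissive media: three technical replicates,
# each 0.1 mL of a 1e-6 dilution.
permissive_counts = [52, 48, 50]
viable = [cfu_per_ml(c, 1e-6, 0.1) for c in permissive_counts]
viable_mean = mean(viable)
viable_sem = sem(viable)

# Hypothetical escapee colonies on non-permissive media: three technical
# replicates, each 1 mL of the undiluted suspension.
escapee_counts = [2, 1, 3]
cells_plated = viable_mean * 1.0

frequencies = [n / cells_plated for n in escapee_counts]
escape_mean = mean(frequencies)
escape_sem = sem(frequencies)

print(f"viable CFU/mL: {viable_mean:.2e} +/- {viable_sem:.1e}")
print(f"escape frequency: {escape_mean:.1e} +/- {escape_sem:.1e}")
```

When no colonies appear on any non-permissive plate, the result is better reported as below the detection limit, that is, less than one escapee per number of cells plated, rather than as zero.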

Biocontained strain doubling time measurement

Doubling times for biocontained strains were measured in triplicate by plate reader as indicated earlier for growth assays. Doubling time assays for biocontained strains in the presence of only non-cognate NSAAs were performed as follows: cells grown to mid-log in permissive media were washed twice in LB and diluted to OD ~0.1 before 300-fold dilution into three 150 μL volumes of LB+NSAA for each NSAA. These cultures were incubated in the Eon plate reader at conditions described earlier.

OTHER EMBODIMENTS

Other embodiments will be evident to those of skill in the art. It should be understood that the foregoing description is provided for clarity only and is merely exemplary. The spirit and scope of the present invention are not limited to the above examples, but are encompassed by the following claims. All publications and patent applications cited above are incorporated by reference in their entirety for all purposes to the same extent as if each individual publication or patent application were specifically and individually indicated to be so incorporated by reference.

Claims

1. A biphenylalanine amino acyl tRNA synthetase variant wherein the variant comprises one or more amino acid substitutions to a parental biphenylalanine amino acyl tRNA synthetase having the sequence of

MDEFEMIKRNTSEIISEEELREVL KDEKSAHIGFEPSGKIHLGHYLQIKKJvlIDLQNAG FDIIfflLADLHAYLNQKGELDEIR IGDYNKK EAIvlGLKAKYVYGSEWMLDKDYT LNVYRXALXTTUCR

EQRKIHMLARELLPKKVVC1HNPVLTGLDGEG MSSSKGNFIAVDDSPEEIRA IK A

YCPAGV ΈG PIMEIA YFLEYPLTIKRPEKFGGDLTVNSYΈELESLFKNKELHPMDL KNAVAEELD ILEPIRKRL (SEQ ID NO: 1).

2. The variant of claim 1 wherein the variant comprises one or more amino acid substitutions selected from the group consisting of N157K and I255F, R257G, R181C and E259V, I153V and A214T, P37A, K76R, I49F, A130V and A233V, L55M and G158S, D61V and H70Q and N117D, D200Y, G210S, E237V and D286Y to the parental biphenylalanine amino acyl tRNA synthetase, or an amino acid sequence having at least 90% sequence identity thereof.

3. The variant of claim 1 wherein the variant comprises amino acid substitutions D61V and H70Q to the parental biphenylalanine amino acyl tRNA synthetase, or an amino acid sequence having at least 90% sequence identity thereof.

4. An isolated polynucleotide encoding the variant of claim 1.

5. A host cell comprising an expression vector, wherein the expression vector comprises the polynucleotide of claim 4.

6. A transfer RNA (tRNA) variant wherein the variant comprises one or more nucleotide substitutions to a parental tRNA having the sequence of ccggcggtagttcagcagggcagaacggcggactctaaatccgcatggcaggggttcaaatcccctccgccggacca (SEQ ID NO: 2).

7. The tRNA variant of claim 6 wherein the tRNA variant comprises a nucleotide substitution selected from the group consisting of A22G, C67A, C26T, C29A, G51T and G23A to the parental tRNA, or a nucleotide sequence having at least 90% sequence identity thereof.

8. An isolated polynucleotide encoding the variant of claim 6.

9. A host cell comprising an expression vector, wherein the expression vector comprises the polynucleotide of claim 8.

10. A biphenylalanine amino acyl tRNA synthetase and tRNA pair wherein the pair is selected from the group consisting of

i) a biphenylalanine amino acyl tRNA synthetase variant comprising amino acid substitutions N157K and I255F to the parental biphenylalanine amino acyl tRNA synthetase of claim 1 and the parental tRNA of claim 6;

ii) a biphenylalanine amino acyl tRNA synthetase variant comprising an amino acid substitution R257G to the parental biphenylalanine amino acyl tRNA synthetase of claim 1 and the parental tRNA of claim 6;

iii) a biphenylalanine amino acyl tRNA synthetase variant comprising amino acid substitutions R181C and E259V to the parental biphenylalanine amino acyl tRNA synthetase of claim 1 and a tRNA variant comprising a nucleotide substitution A22G to the parental tRNA of claim 6;

iv) a biphenylalanine amino acyl tRNA synthetase variant comprising amino acid substitutions I153V and A214T to the parental biphenylalanine amino acyl tRNA synthetase of claim 1 and a tRNA variant comprising a nucleotide substitution C67A to the parental tRNA of claim 6;

v) a biphenylalanine amino acyl tRNA synthetase variant comprising an amino acid substitution P37A to the parental biphenylalanine amino acyl tRNA synthetase of claim 1 and the parental tRNA of claim 6;

vi) a biphenylalanine amino acyl tRNA synthetase variant comprising an amino acid substitution K76R to the parental biphenylalanine amino acyl tRNA synthetase of claim 1 and the parental tRNA of claim 6;

vii) the parental biphenylalanine amino acyl tRNA synthetase of claim 1 and a tRNA variant comprising a nucleotide substitution A22G to the parental tRNA of claim 6;

viii) a biphenylalanine amino acyl tRNA synthetase variant comprising amino acid substitutions I49F, A130V and A233V to the parental biphenylalanine amino acyl tRNA synthetase of claim 1 and a tRNA variant comprising a nucleotide substitution C26T to the parental tRNA of claim 6;

ix) a biphenylalanine amino acyl tRNA synthetase variant comprising amino acid substitutions L55M and G158S to the parental biphenylalanine amino acyl tRNA synthetase of claim 1 and a tRNA variant comprising a nucleotide substitution C29A to the parental tRNA of claim 6;

x) a biphenylalanine amino acyl tRNA synthetase variant comprising amino acid substitutions D61V and H70Q to the parental biphenylalanine amino acyl tRNA synthetase of claim 1 and a tRNA variant comprising a nucleotide substitution G51T to the parental tRNA of claim 6; and

xi) a biphenylalanine amino acyl tRNA synthetase variant comprising amino acid substitutions N117D, D200Y, G210S, E237V and D286Y to the parental biphenylalanine amino acyl tRNA synthetase of claim 1 and a tRNA variant comprising a nucleotide substitution G23A to the parental tRNA of claim 6.

11. A method of screening for an amino acyl tRNA synthetase variant having preferential selectivity for a desired non-standard amino acid (NSAA) over its standard amino acid (SAA) counterpart or an undesired non-standard amino acid for incorporation into a target polypeptide in a cell comprising

providing to the cell an amino acyl tRNA synthetase variant and its cognate transfer RNA corresponding to the desired NSAA, wherein the cell is genetically engineered to express the target polypeptide including an amino acid target location for incorporation of the desired NSAA by the amino acyl tRNA synthetase variant and the transfer RNA, and wherein the cell expresses the target polynucleotide and either a desired NSAA, an SAA or an undesired NSAA is incorporated at the amino acid target location depending on the preferential selectivity of the amino acyl tRNA synthetase variant and the transfer RNA for the corresponding desired NSAA,

wherein a removable protecting group is attached to the target polypeptide adjacent to the amino acid target location, such that when the removable protecting group is removed, an N-end amino acid is exposed at the amino acid target location, and wherein a detectable moiety is attached to the C-end of the target polypeptide,

wherein the cell expresses an enzyme that cleaves the removable protecting group to generate an N-end amino acid, and wherein the cell further expresses an adaptor protein for a protease, wherein the protease degrades the target polypeptide when the N-end amino acid is an SAA or an undesired NSAA,

detecting the detectable moiety as a measure of the amount of target polypeptide including the desired NSAA within the cell, and

repeatedly testing an amino acyl tRNA synthetase variant for improved production of the target polypeptide including the desired NSAA.

12. The method of claim 11 wherein the removable protecting group is ubiquitin that is cleavable by Ubp1.

13. The method of claim 11 wherein the detectable moiety is a fluorescent moiety or a reporter protein.

14. The method of claim 11 wherein the cell expresses the enzyme for cleaving the removable protecting group constitutively or inducibly.

15. The method of claim 11 wherein the adaptor protein and the protease is a ClpS-ClpAP protease system wherein the ClpS-ClpAP protease system degrades the target polypeptide when the N-end amino acid is an SAA or an undesired NSAA to thereby enrich the target polypeptide including the desired NSAA within the cell.

16. The method of claim 11 wherein the adaptor protein comprises a ClpS protein, its natural homolog, ClpS_V65I, ClpS_V43I or ClpS_L32F mutants.

17. The method of claim 11 wherein the cell is a prokaryotic cell or a eukaryotic cell.

18. The method of claim 11 wherein the cell is a bacterium.

19. The method of claim 11 wherein the cell is a genetically modified E. coli.

20. The method of claim 11 wherein the desired NSAA is biphenylalanine (BipA).

21. The method of claim 11 wherein the amino acyl tRNA synthetase variant is a biphenylalanine amino acyl tRNA synthetase (BipARS) variant.

22. The method of claim 11 wherein the amino acyl tRNA synthetase variant is generated by introducing mutations throughout a parental amino acyl tRNA synthetase gene.

23. The method of claim 11 wherein error-prone PCR is used to introduce mutations throughout the wild type amino acyl tRNA synthetase gene.

24. The method of claim 11 wherein the amino acyl tRNA synthetase variant is provided to the cell by a nucleic acid encoding the amino acyl tRNA synthetase variant.

25. The method of claim 11 wherein the transfer RNA is provided to the cell by a nucleic acid encoding the transfer RNA.
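The substitution notation used in the claims (for example A22G: original residue, 1-based position, replacement residue) can be illustrated with a short sketch that applies each claimed tRNA substitution to the parental sequence of SEQ ID NO: 2 and checks that the stated original nucleotide is actually present at that position. The function name and error handling are hypothetical and are not part of the disclosure. The same notation applies to the amino acid substitutions recited for the synthetase, although only the tRNA sequence is used here.

```python
import re

# Parental tRNA, SEQ ID NO: 2, as recited in claim 6.
PARENTAL_TRNA = (
    "ccggcggtagttcagcagggcagaacggcggactctaaatccgcatggcaggggttcaaatcccctccgccggacca"
)


def apply_substitutions(parent, substitutions):
    """Return the parent sequence (uppercased) with each substitution applied.

    Each substitution is written as <old><1-based position><new>, e.g. "A22G".
    A ValueError is raised if the notation is malformed, the position is out
    of range, or the residue at that position does not match <old>.
    """
    seq = list(parent.upper())
    for sub in substitutions:
        match = re.fullmatch(r"([A-Z])(\d+)([A-Z])", sub.upper())
        if match is None:
            raise ValueError(f"malformed substitution: {sub!r}")
        old, pos, new = match.group(1), int(match.group(2)), match.group(3)
        if not 1 <= pos <= len(seq):
            raise ValueError(f"position {pos} outside sequence of length {len(seq)}")
        if seq[pos - 1] != old:
            raise ValueError(f"{sub}: found {seq[pos - 1]} at position {pos}, expected {old}")
        seq[pos - 1] = new
    return "".join(seq)


# The six single-nucleotide tRNA substitutions recited in claim 7.
TRNA_SUBSTITUTIONS = ["A22G", "C67A", "C26T", "C29A", "G51T", "G23A"]

variants = {name: apply_substitutions(PARENTAL_TRNA, [name]) for name in TRNA_SUBSTITUTIONS}

for name, sequence in variants.items():
    print(name, sequence)
```

Because the function validates the original residue before replacing it, a position or residue that had been mistranscribed would raise an error instead of silently producing a wrong variant sequence.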