US20210072252A1 - Molecules and methods for iterative polypeptide analysis and processing - Google Patents
- Publication number
- US20210072252A1 (application number US17/088,898)
- Authority
- US
- United States
- Prior art keywords
- seq
- amino acid
- residue
- naab
- position corresponding
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Abandoned
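Concepts (machine-extracted)
Each entry: concept ID, name, domain, structure (SMILES and InChIKey, where available), query match, sections in which the concept occurs, occurrence count.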
- 108090000765 processed proteins & peptides Proteins 0.000 title claims abstract description 125
- 102000004196 processed proteins & peptides Human genes 0.000 title claims abstract description 102
- 229920001184 polypeptide Polymers 0.000 title claims abstract description 72
- 238000000034 method Methods 0.000 title claims abstract description 65
- 238000004458 analytical method Methods 0.000 title description 12
- 150000001413 amino acids Chemical class 0.000 claims abstract description 111
- 108090000623 proteins and genes Proteins 0.000 claims abstract description 86
- 102000004169 proteins and genes Human genes 0.000 claims abstract description 80
- 125000001429 N-terminal alpha-amino-acid group Chemical group 0.000 claims abstract description 75
- 238000006731 degradation reaction Methods 0.000 claims abstract description 48
- 102000004190 Enzymes Human genes 0.000 claims abstract description 45
- 108090000790 Enzymes Proteins 0.000 claims abstract description 45
- 230000027455 binding Effects 0.000 claims description 101
- 235000001014 amino acid Nutrition 0.000 claims description 97
- 229940024606 amino acid Drugs 0.000 claims description 94
- 102000052866 Amino Acyl-tRNA Synthetases Human genes 0.000 claims description 78
- 108700028939 Amino Acyl-tRNA Synthetases Proteins 0.000 claims description 78
- 235000018102 proteins Nutrition 0.000 claims description 78
- 230000035772 mutation Effects 0.000 claims description 61
- 241000588724 Escherichia coli Species 0.000 claims description 45
- 230000015556 catabolic process Effects 0.000 claims description 44
- ROHFNLRQFUQHCH-UHFFFAOYSA-N Leucine Natural products CC(C)CC(N)C(O)=O ROHFNLRQFUQHCH-UHFFFAOYSA-N 0.000 claims description 26
- 125000003630 glycyl group Chemical group [H]N([H])C([H])([H])C(*)=O 0.000 claims description 26
- 235000005772 leucine Nutrition 0.000 claims description 26
- COLNVLDHVKWLRT-QMMMGPOBSA-N L-phenylalanine Chemical compound OC(=O)[C@@H](N)CC1=CC=CC=C1 COLNVLDHVKWLRT-QMMMGPOBSA-N 0.000 claims description 24
- 239000012634 fragment Substances 0.000 claims description 24
- KDXKERNSBIXSRK-UHFFFAOYSA-N Lysine Natural products NCCCCC(N)C(O)=O KDXKERNSBIXSRK-UHFFFAOYSA-N 0.000 claims description 23
- 125000001909 leucine group Chemical group [H]N(*)C(C(*)=O)C([H])([H])C(C([H])([H])[H])C([H])([H])[H] 0.000 claims description 23
- 238000012163 sequencing technique Methods 0.000 claims description 23
- DHMQDGOQFOQNFH-UHFFFAOYSA-N Glycine Chemical compound NCC(O)=O DHMQDGOQFOQNFH-UHFFFAOYSA-N 0.000 claims description 22
- HNDVDQJCIGZPNO-UHFFFAOYSA-N histidine Natural products OC(=O)C(N)CC1=CN=CN1 HNDVDQJCIGZPNO-UHFFFAOYSA-N 0.000 claims description 21
- 125000000729 N-terminal amino-acid group Chemical group 0.000 claims description 20
- 235000006109 methionine Nutrition 0.000 claims description 20
- CKLJMWTZIZZHCS-REOHCLBHSA-N L-aspartic acid Chemical compound OC(=O)[C@@H](N)CC(O)=O CKLJMWTZIZZHCS-REOHCLBHSA-N 0.000 claims description 19
- 239000004472 Lysine Substances 0.000 claims description 19
- KZSNJWFQEVHDMF-UHFFFAOYSA-N Valine Natural products CC(C)C(N)C(O)=O KZSNJWFQEVHDMF-UHFFFAOYSA-N 0.000 claims description 19
- 125000003295 alanine group Chemical group N[C@@H](C)C(=O)* 0.000 claims description 19
- 229930182817 methionine Natural products 0.000 claims description 19
- DCXYFEDJOCDNAF-REOHCLBHSA-N L-asparagine Chemical compound OC(=O)[C@@H](N)CC(N)=O DCXYFEDJOCDNAF-REOHCLBHSA-N 0.000 claims description 18
- KDXKERNSBIXSRK-YFKPBYRVSA-N L-lysine Chemical compound NCCCC[C@H](N)C(O)=O KDXKERNSBIXSRK-YFKPBYRVSA-N 0.000 claims description 18
- MTCFGRXMJLQNBG-UHFFFAOYSA-N Serine Natural products OCC(N)C(O)=O MTCFGRXMJLQNBG-UHFFFAOYSA-N 0.000 claims description 17
- 239000004474 valine Substances 0.000 claims description 17
- DCXYFEDJOCDNAF-UHFFFAOYSA-N Asparagine Natural products OC(=O)C(N)CC(N)=O DCXYFEDJOCDNAF-UHFFFAOYSA-N 0.000 claims description 16
- 235000009582 asparagine Nutrition 0.000 claims description 16
- 229960001230 asparagine Drugs 0.000 claims description 16
- COLNVLDHVKWLRT-UHFFFAOYSA-N phenylalanine Natural products OC(=O)C(N)CC1=CC=CC=C1 COLNVLDHVKWLRT-UHFFFAOYSA-N 0.000 claims description 15
- 229960005190 phenylalanine Drugs 0.000 claims description 15
- 125000003607 serino group Chemical group [H]N([H])[C@]([H])(C(=O)[*])C(O[H])([H])[H] 0.000 claims description 15
- OUYCCCASQSFEME-UHFFFAOYSA-N tyrosine Natural products OC(=O)C(N)CC1=CC=C(O)C=C1 OUYCCCASQSFEME-UHFFFAOYSA-N 0.000 claims description 15
- MTCFGRXMJLQNBG-REOHCLBHSA-N (2S)-2-Amino-3-hydroxypropanoic acid Chemical compound OC[C@H](N)C(O)=O MTCFGRXMJLQNBG-REOHCLBHSA-N 0.000 claims description 14
- 239000004471 Glycine Substances 0.000 claims description 14
- ROHFNLRQFUQHCH-YFKPBYRVSA-N L-leucine Chemical compound CC(C)C[C@H](N)C(O)=O ROHFNLRQFUQHCH-YFKPBYRVSA-N 0.000 claims description 14
- 229960000310 isoleucine Drugs 0.000 claims description 14
- AGPKZVBTJJNPAG-UHFFFAOYSA-N isoleucine Natural products CCC(C)C(N)C(O)=O AGPKZVBTJJNPAG-UHFFFAOYSA-N 0.000 claims description 14
- 230000009871 nonspecific binding Effects 0.000 claims description 14
- FFEARJCKVFRZRR-BYPYZUCNSA-N L-methionine Chemical compound CSCC[C@H](N)C(O)=O FFEARJCKVFRZRR-BYPYZUCNSA-N 0.000 claims description 13
- AYFVYJQAPQTCCC-GBXIJSLDSA-N L-threonine Chemical compound C[C@@H](O)[C@H](N)C(O)=O AYFVYJQAPQTCCC-GBXIJSLDSA-N 0.000 claims description 13
- AYFVYJQAPQTCCC-UHFFFAOYSA-N Threonine Natural products CC(O)C(N)C(O)=O AYFVYJQAPQTCCC-UHFFFAOYSA-N 0.000 claims description 13
- 239000004473 Threonine Substances 0.000 claims description 13
- 238000002887 multiple sequence alignment Methods 0.000 claims description 13
- 125000002987 valine group Chemical group [H]N([H])C([H])(C(*)=O)C([H])(C([H])([H])[H])C([H])([H])[H] 0.000 claims description 13
- AGPKZVBTJJNPAG-WHFBIAKZSA-N L-isoleucine Chemical compound CC[C@H](C)[C@H](N)C(O)=O AGPKZVBTJJNPAG-WHFBIAKZSA-N 0.000 claims description 12
- OUYCCCASQSFEME-QMMMGPOBSA-N L-tyrosine Chemical compound OC(=O)[C@@H](N)CC1=CC=C(O)C=C1 OUYCCCASQSFEME-QMMMGPOBSA-N 0.000 claims description 12
- 235000004279 alanine Nutrition 0.000 claims description 12
- CKLJMWTZIZZHCS-REOHCLBHSA-L aspartate group Chemical group N[C@@H](CC(=O)[O-])C(=O)[O-] CKLJMWTZIZZHCS-REOHCLBHSA-L 0.000 claims description 12
- XUJNEKJLAYXESH-UHFFFAOYSA-N cysteine Natural products SCC(N)C(O)=O XUJNEKJLAYXESH-UHFFFAOYSA-N 0.000 claims description 12
- 235000018417 cysteine Nutrition 0.000 claims description 12
- XUJNEKJLAYXESH-REOHCLBHSA-N L-Cysteine Chemical compound SC[C@H](N)C(O)=O XUJNEKJLAYXESH-REOHCLBHSA-N 0.000 claims description 11
- KZSNJWFQEVHDMF-BYPYZUCNSA-N L-valine Chemical compound CC(C)[C@H](N)C(O)=O KZSNJWFQEVHDMF-BYPYZUCNSA-N 0.000 claims description 11
- QIVBCDIJIAJPQS-UHFFFAOYSA-N Tryptophan Natural products C1=CC=C2C(CC(N)C(O)=O)=CNC2=C1 QIVBCDIJIAJPQS-UHFFFAOYSA-N 0.000 claims description 11
- 125000000341 threoninyl group Chemical group [H]OC([H])(C([H])([H])[H])C([H])(N([H])[H])C(*)=O 0.000 claims description 11
- 239000004475 Arginine Substances 0.000 claims description 10
- WHUUTDBJXJRKMK-VKHMYHEASA-N L-glutamic acid Chemical compound OC(=O)[C@@H](N)CCC(O)=O WHUUTDBJXJRKMK-VKHMYHEASA-N 0.000 claims description 10
- ZDXPYRJPNDTMRX-VKHMYHEASA-N L-glutamine Chemical compound OC(=O)[C@@H](N)CCC(N)=O ZDXPYRJPNDTMRX-VKHMYHEASA-N 0.000 claims description 10
- ODKSFYDXXFIFQN-UHFFFAOYSA-N arginine Natural products OC(=O)C(N)CCCNC(N)=N ODKSFYDXXFIFQN-UHFFFAOYSA-N 0.000 claims description 10
- 235000009697 arginine Nutrition 0.000 claims description 10
- 125000000613 asparagine group Chemical group N[C@@H](CC(N)=O)C(=O)* 0.000 claims description 10
- QNAYBMKLOCPYGJ-REOHCLBHSA-N L-alanine Chemical compound C[C@H](N)C(O)=O QNAYBMKLOCPYGJ-REOHCLBHSA-N 0.000 claims description 9
- QIVBCDIJIAJPQS-VIFPVBQESA-N L-tryptophane Chemical compound C1=CC=C2C(C[C@H](N)C(O)=O)=CNC2=C1 QIVBCDIJIAJPQS-VIFPVBQESA-N 0.000 claims description 8
- ONIBWKKTOPOVIA-UHFFFAOYSA-N Proline Natural products OC(=O)C1CCCN1 ONIBWKKTOPOVIA-UHFFFAOYSA-N 0.000 claims description 8
- 108010029287 Threonine-tRNA ligase Proteins 0.000 claims description 8
- 102100028196 Threonine-tRNA ligase 2, cytoplasmic Human genes 0.000 claims description 8
- ZDXPYRJPNDTMRX-UHFFFAOYSA-N glutamine Natural products OC(=O)C(N)CCC(N)=O ZDXPYRJPNDTMRX-UHFFFAOYSA-N 0.000 claims description 8
- 235000004554 glutamine Nutrition 0.000 claims description 8
- 125000003588 lysine group Chemical group [H]N([H])C([H])([H])C([H])([H])C([H])([H])C([H])([H])C([H])(N([H])[H])C(*)=O 0.000 claims description 8
- 235000013930 proline Nutrition 0.000 claims description 8
- 125000000151 cysteine group Chemical group N[C@@H](CS)C(=O)* 0.000 claims description 7
- 125000001493 tyrosinyl group Chemical group [H]OC1=C([H])C([H])=C(C([H])=C1[H])C([H])([H])C([H])(N([H])[H])C(*)=O 0.000 claims description 7
- WHUUTDBJXJRKMK-UHFFFAOYSA-N Glutamic acid Natural products OC(=O)C(N)CCC(O)=O WHUUTDBJXJRKMK-UHFFFAOYSA-N 0.000 claims description 6
- ONIBWKKTOPOVIA-BYPYZUCNSA-N L-Proline Chemical compound OC(=O)[C@@H]1CCCN1 ONIBWKKTOPOVIA-BYPYZUCNSA-N 0.000 claims description 6
- HNDVDQJCIGZPNO-YFKPBYRVSA-N L-histidine Chemical compound OC(=O)[C@@H](N)CC1=CN=CN1 HNDVDQJCIGZPNO-YFKPBYRVSA-N 0.000 claims description 6
- 102000021052 amino acid binding proteins Human genes 0.000 claims description 6
- 235000003704 aspartic acid Nutrition 0.000 claims description 6
- OQFSQFPPLPISGP-UHFFFAOYSA-N beta-carboxyaspartic acid Natural products OC(=O)C(N)C(C(O)=O)C(O)=O OQFSQFPPLPISGP-UHFFFAOYSA-N 0.000 claims description 6
- 229930195712 glutamate Natural products 0.000 claims description 6
- 125000000430 tryptophan group Chemical group [H]N([H])C(C(=O)O*)C([H])([H])C1=C([H])N([H])C2=C([H])C([H])=C([H])C([H])=C12 0.000 claims description 6
- FWMNVWWHGCHHJJ-SKKKGAJSSA-N 4-amino-1-[(2r)-6-amino-2-[[(2r)-2-[[(2r)-2-[[(2r)-2-amino-3-phenylpropanoyl]amino]-3-phenylpropanoyl]amino]-4-methylpentanoyl]amino]hexanoyl]piperidine-4-carboxylic acid Chemical compound C([C@H](C(=O)N[C@H](CC(C)C)C(=O)N[C@H](CCCCN)C(=O)N1CCC(N)(CC1)C(O)=O)NC(=O)[C@H](N)CC=1C=CC=CC=1)C1=CC=CC=C1 FWMNVWWHGCHHJJ-SKKKGAJSSA-N 0.000 claims description 5
- 101710178135 O-phosphoserine-tRNA(Cys) ligase Proteins 0.000 claims description 5
- 108091011209 amino acid binding proteins Proteins 0.000 claims description 5
- WHUUTDBJXJRKMK-VKHMYHEASA-L glutamate group Chemical group N[C@@H](CCC(=O)[O-])C(=O)[O-] WHUUTDBJXJRKMK-VKHMYHEASA-L 0.000 claims description 5
- 230000004481 post-translational protein modification Effects 0.000 claims description 5
- 108010058060 Alanine-tRNA ligase Proteins 0.000 claims description 4
- 102100037399 Alanine-tRNA ligase, cytoplasmic Human genes 0.000 claims description 4
- 241000203069 Archaea Species 0.000 claims description 4
- 101710132120 Asparagine-tRNA ligase Proteins 0.000 claims description 4
- 102100023245 Asparagine-tRNA ligase, cytoplasmic Human genes 0.000 claims description 4
- 101710160288 Asparagine-tRNA ligase, cytoplasmic Proteins 0.000 claims description 4
- 101710090387 Asparagine-tRNA ligase, mitochondrial Proteins 0.000 claims description 4
- 108010065272 Aspartate-tRNA ligase Proteins 0.000 claims description 4
- 108010051724 Glycine-tRNA Ligase Proteins 0.000 claims description 4
- 102100036589 Glycine-tRNA ligase Human genes 0.000 claims description 4
- ODKSFYDXXFIFQN-BYPYZUCNSA-P L-argininium(2+) Chemical compound NC(=[NH2+])NCCC[C@H]([NH3+])C(O)=O ODKSFYDXXFIFQN-BYPYZUCNSA-P 0.000 claims description 4
- 108010003060 Methionine-tRNA ligase Proteins 0.000 claims description 4
- 102100037206 Methionine-tRNA ligase, cytoplasmic Human genes 0.000 claims description 4
- 101710086402 Probable asparagine-tRNA ligase, cytoplasmic Proteins 0.000 claims description 4
- 101710108281 Probable asparagine-tRNA ligase, mitochondrial Proteins 0.000 claims description 4
- 102100028531 Probable proline-tRNA ligase, mitochondrial Human genes 0.000 claims description 4
- 101710164123 Probable proline-tRNA ligase, mitochondrial Proteins 0.000 claims description 4
- 101710115782 Proline-tRNA ligase Proteins 0.000 claims description 4
- 101710140381 Proline-tRNA ligase, cytoplasmic Proteins 0.000 claims description 4
- 108010030161 Serine-tRNA ligase Proteins 0.000 claims description 4
- 102100040516 Serine-tRNA ligase, cytoplasmic Human genes 0.000 claims description 4
- 235000013922 glutamic acid Nutrition 0.000 claims description 4
- 239000004220 glutamic acid Substances 0.000 claims description 4
- 230000009870 specific binding Effects 0.000 claims description 4
- 101710146427 Probable tyrosine-tRNA ligase, cytoplasmic Proteins 0.000 claims description 3
- 102100025336 Tyrosine-tRNA ligase, mitochondrial Human genes 0.000 claims description 3
- 101710107268 Tyrosine-tRNA ligase, mitochondrial Proteins 0.000 claims description 3
- 230000000696 methanogenic effect Effects 0.000 claims description 3
- 101710177011 Histidine-tRNA ligase, cytoplasmic Proteins 0.000 claims description 2
- 101710096715 Probable histidine-tRNA ligase, cytoplasmic Proteins 0.000 claims description 2
- 239000007850 fluorescent dye Substances 0.000 claims 2
- 101710201712 Amino acid binding protein Proteins 0.000 claims 1
- 241000205042 Archaeoglobus fulgidus Species 0.000 claims 1
- 102100026198 Aspartate-tRNA ligase, mitochondrial Human genes 0.000 claims 1
- 102100029015 Histidine-tRNA ligase, mitochondrial Human genes 0.000 claims 1
- 108010092041 Lysine-tRNA Ligase Proteins 0.000 claims 1
- 102100035529 Lysine-tRNA ligase Human genes 0.000 claims 1
- 241000203353 Methanococcus Species 0.000 claims 1
- 239000000205 acacia gum Substances 0.000 claims 1
- 239000000665 guar gum Substances 0.000 claims 1
- DCWXELXMIBXGTH-QMMMGPOBSA-N phosphonotyrosine Chemical group OC(=O)[C@@H](N)CC1=CC=C(OP(O)(O)=O)C=C1 DCWXELXMIBXGTH-QMMMGPOBSA-N 0.000 claims 1
- 239000003153 chemical reaction reagent Substances 0.000 abstract description 30
- 238000003776 cleavage reaction Methods 0.000 abstract description 19
- 230000007017 scission Effects 0.000 abstract description 19
- 238000010252 digital analysis Methods 0.000 abstract description 6
- 125000003275 alpha amino acid group Chemical group 0.000 description 72
- QKFJKGMPGYROCL-UHFFFAOYSA-N phenyl isothiocyanate Chemical compound S=C=NC1=CC=CC=C1 QKFJKGMPGYROCL-UHFFFAOYSA-N 0.000 description 46
- 239000000758 substrate Substances 0.000 description 28
- 125000000487 histidyl group Chemical group [H]N([H])C(C(=O)O*)C([H])([H])C1=C([H])N([H])C([H])=N1 0.000 description 25
- 229940117953 phenylisothiocyanate Drugs 0.000 description 23
- 125000001360 methionine group Chemical group N[C@@H](CCSC)C(=O)* 0.000 description 16
- 238000006467 substitution reaction Methods 0.000 description 16
- 239000013078 crystal Substances 0.000 description 15
- 210000004027 cell Anatomy 0.000 description 14
- 238000013461 design Methods 0.000 description 12
- 229940009098 aspartate Drugs 0.000 description 8
- 108090000711 cruzipain Proteins 0.000 description 8
- LOKCTEFSRHRXRJ-UHFFFAOYSA-I dipotassium trisodium dihydrogen phosphate hydrogen phosphate dichloride Chemical compound P(=O)(O)(O)[O-].[K+].P(=O)(O)([O-])[O-].[Na+].[Na+].[Cl-].[K+].[Cl-].[Na+] LOKCTEFSRHRXRJ-UHFFFAOYSA-I 0.000 description 8
- 239000002953 phosphate buffered saline Substances 0.000 description 8
- 239000000523 sample Substances 0.000 description 7
- PEDCQBHIVMGVHV-UHFFFAOYSA-N Glycerine Chemical compound OCC(O)CO PEDCQBHIVMGVHV-UHFFFAOYSA-N 0.000 description 6
- ZFOMKMMPBOQKMC-KXUCPTDWSA-N L-pyrrolysine Chemical compound C[C@@H]1CC=N[C@H]1C(=O)NCCCC[C@H]([NH3+])C([O-])=O ZFOMKMMPBOQKMC-KXUCPTDWSA-N 0.000 description 6
- 102000003960 Ligases Human genes 0.000 description 6
- 108090000364 Ligases Proteins 0.000 description 6
- -1 LysRS Proteins 0.000 description 6
- BZQFBWGGLXLEPQ-UHFFFAOYSA-N O-phosphoryl-L-serine Natural products OC(=O)C(N)COP(O)(O)=O BZQFBWGGLXLEPQ-UHFFFAOYSA-N 0.000 description 6
- 101150096038 PTH1R gene Proteins 0.000 description 6
- 241000223109 Trypanosoma cruzi Species 0.000 description 6
- 230000003197 catalytic effect Effects 0.000 description 6
- 229950006137 dexfosfoserine Drugs 0.000 description 6
- 230000002255 enzymatic effect Effects 0.000 description 6
- RAXXELZNTBOGNW-UHFFFAOYSA-N imidazole Natural products C1=CNC=N1 RAXXELZNTBOGNW-UHFFFAOYSA-N 0.000 description 6
- 230000003993 interaction Effects 0.000 description 6
- 238000004949 mass spectrometry Methods 0.000 description 6
- BZQFBWGGLXLEPQ-REOHCLBHSA-N phosphoserine Chemical compound OC(=O)[C@@H](N)COP(O)(O)=O BZQFBWGGLXLEPQ-REOHCLBHSA-N 0.000 description 6
- DCWXELXMIBXGTH-UHFFFAOYSA-N phosphotyrosine Chemical compound OC(=O)C(N)CC1=CC=C(OP(O)(O)=O)C=C1 DCWXELXMIBXGTH-UHFFFAOYSA-N 0.000 description 6
- 125000000174 L-prolyl group Chemical group [H]N1C([H])([H])C([H])([H])C([H])([H])[C@@]1([H])C(*)=O 0.000 description 5
- 101710181812 Methionine aminopeptidase Proteins 0.000 description 5
- 230000002378 acidificating effect Effects 0.000 description 5
- 239000012062 aqueous buffer Substances 0.000 description 5
- 238000012575 bio-layer interferometry Methods 0.000 description 5
- 238000012217 deletion Methods 0.000 description 5
- 230000037430 deletion Effects 0.000 description 5
- 238000001514 detection method Methods 0.000 description 5
- 238000002474 experimental method Methods 0.000 description 5
- 238000003780 insertion Methods 0.000 description 5
- 230000037431 insertion Effects 0.000 description 5
- 125000000741 isoleucyl group Chemical group [H]N([H])C(C(C([H])([H])[H])C([H])([H])C([H])([H])[H])C(=O)O* 0.000 description 5
- 239000000203 mixture Substances 0.000 description 5
- 230000007935 neutral effect Effects 0.000 description 5
- 125000001997 phenyl group Chemical group [H]C1=C([H])C([H])=C(*)C([H])=C1[H] 0.000 description 5
- AUUIARVPJHGTSA-UHFFFAOYSA-N 3-(aminomethyl)chromen-2-one Chemical compound C1=CC=C2OC(=O)C(CN)=CC2=C1 AUUIARVPJHGTSA-UHFFFAOYSA-N 0.000 description 4
- 125000004042 4-aminobutyl group Chemical group [H]C([*])([H])C([H])([H])C([H])([H])C([H])([H])N([H])[H] 0.000 description 4
- 108020004414 DNA Proteins 0.000 description 4
- CIQHWLTYGMYQQR-QMMMGPOBSA-N O(4')-sulfo-L-tyrosine Chemical compound OC(=O)[C@@H](N)CC1=CC=C(OS(O)(=O)=O)C=C1 CIQHWLTYGMYQQR-QMMMGPOBSA-N 0.000 description 4
- FAPWRFPIFSIZLT-UHFFFAOYSA-M Sodium chloride Chemical compound [Na+].[Cl-] FAPWRFPIFSIZLT-UHFFFAOYSA-M 0.000 description 4
- 125000003277 amino group Chemical group 0.000 description 4
- 238000013459 approach Methods 0.000 description 4
- 238000012512 characterization method Methods 0.000 description 4
- 238000010494 dissociation reaction Methods 0.000 description 4
- 230000005593 dissociations Effects 0.000 description 4
- 230000000694 effects Effects 0.000 description 4
- 238000005516 engineering process Methods 0.000 description 4
- 239000013604 expression vector Substances 0.000 description 4
- PJJJBBJSCAKJQF-UHFFFAOYSA-N guanidinium chloride Chemical compound [Cl-].NC(N)=[NH2+] PJJJBBJSCAKJQF-UHFFFAOYSA-N 0.000 description 4
- 239000011159 matrix material Substances 0.000 description 4
- USRGIUJOYOXOQJ-GBXIJSLDSA-N phosphothreonine Chemical compound OP(=O)(O)O[C@H](C)[C@H](N)C(O)=O USRGIUJOYOXOQJ-GBXIJSLDSA-N 0.000 description 4
- 238000002331 protein detection Methods 0.000 description 4
- 238000004557 single molecule detection Methods 0.000 description 4
- 238000002415 sodium dodecyl sulfate polyacrylamide gel electrophoresis Methods 0.000 description 4
- UMGDCJDMYOKAJW-UHFFFAOYSA-N thiourea Chemical compound NC(N)=S UMGDCJDMYOKAJW-UHFFFAOYSA-N 0.000 description 4
- HJKIPLVIICSANU-UHFFFAOYSA-N 1-(2-anilino-5-methyl-1,3-thiazol-4-yl)ethanone Chemical compound S1C(C)=C(C(=O)C)N=C1NC1=CC=CC=C1 HJKIPLVIICSANU-UHFFFAOYSA-N 0.000 description 3
- QKNYBSVHEMOAJP-UHFFFAOYSA-N 2-amino-2-(hydroxymethyl)propane-1,3-diol;hydron;chloride Chemical compound Cl.OCC(N)(CO)CO QKNYBSVHEMOAJP-UHFFFAOYSA-N 0.000 description 3
- WEVYAHXRMPXWCK-UHFFFAOYSA-N Acetonitrile Chemical compound CC#N WEVYAHXRMPXWCK-UHFFFAOYSA-N 0.000 description 3
- 102100028820 Aspartate-tRNA ligase, cytoplasmic Human genes 0.000 description 3
- 229910019142 PO4 Inorganic materials 0.000 description 3
- 102220467128 Runt-related transcription factor 1_L13Y_mutation Human genes 0.000 description 3
- 241000589499 Thermus thermophilus Species 0.000 description 3
- 239000002253 acid Substances 0.000 description 3
- 125000000539 amino acid group Chemical group 0.000 description 3
- 239000011230 binding agent Substances 0.000 description 3
- 238000006243 chemical reaction Methods 0.000 description 3
- 150000001875 compounds Chemical class 0.000 description 3
- 238000012165 high-throughput sequencing Methods 0.000 description 3
- 239000003112 inhibitor Substances 0.000 description 3
- 239000006166 lysate Substances 0.000 description 3
- 238000002703 mutagenesis Methods 0.000 description 3
- 231100000350 mutagenesis Toxicity 0.000 description 3
- 239000010452 phosphate Substances 0.000 description 3
- 230000026731 phosphorylation Effects 0.000 description 3
- 238000006366 phosphorylation reaction Methods 0.000 description 3
- 230000008569 process Effects 0.000 description 3
- 239000000047 product Substances 0.000 description 3
- 125000001500 prolyl group Chemical group [H]N1C([H])(C(=O)[*])C([H])([H])C([H])([H])C1([H])[H] 0.000 description 3
- 238000000159 protein binding assay Methods 0.000 description 3
- 238000000746 purification Methods 0.000 description 3
- 230000001105 regulatory effect Effects 0.000 description 3
- 239000007787 solid Substances 0.000 description 3
- 239000000243 solution Substances 0.000 description 3
- 239000000126 substance Substances 0.000 description 3
- 238000005406 washing Methods 0.000 description 3
- 108091032973 (ribonucleotides)n+m Proteins 0.000 description 2
- PQMRRAQXKWFYQN-UHFFFAOYSA-N 1-phenyl-2-sulfanylideneimidazolidin-4-one Chemical compound S=C1NC(=O)CN1C1=CC=CC=C1 PQMRRAQXKWFYQN-UHFFFAOYSA-N 0.000 description 2
- 108090000915 Aminopeptidases Proteins 0.000 description 2
- 102000004400 Aminopeptidases Human genes 0.000 description 2
- 108091003079 Bovine Serum Albumin Proteins 0.000 description 2
- 108010005843 Cysteine Proteases Proteins 0.000 description 2
- 102000005927 Cysteine Proteases Human genes 0.000 description 2
- 108010016626 Dipeptides Proteins 0.000 description 2
- 102100020751 Dipeptidyl peptidase 2 Human genes 0.000 description 2
- 101710176147 Isoleucine-tRNA ligase, cytoplasmic Proteins 0.000 description 2
- 102100035997 Isoleucine-tRNA ligase, mitochondrial Human genes 0.000 description 2
- 101710149031 Probable isoleucine-tRNA ligase, cytoplasmic Proteins 0.000 description 2
- 108010026552 Proteome Proteins 0.000 description 2
- JUJWROOIHBZHMG-UHFFFAOYSA-N Pyridine Chemical compound C1=CC=NC=C1 JUJWROOIHBZHMG-UHFFFAOYSA-N 0.000 description 2
- VYPSYNLAJGMNEJ-UHFFFAOYSA-N Silicium dioxide Chemical compound O=[Si]=O VYPSYNLAJGMNEJ-UHFFFAOYSA-N 0.000 description 2
- 108700005078 Synthetic Genes Proteins 0.000 description 2
- DTQVDTLACAAQTR-UHFFFAOYSA-N Trifluoroacetic acid Chemical compound OC(=O)C(F)(F)F DTQVDTLACAAQTR-UHFFFAOYSA-N 0.000 description 2
- 102220613740 Uncharacterized protein C19orf84_L160Y_mutation Human genes 0.000 description 2
- XSQUKJJJFZCRTK-UHFFFAOYSA-N Urea Natural products NC(N)=O XSQUKJJJFZCRTK-UHFFFAOYSA-N 0.000 description 2
- 238000002835 absorbance Methods 0.000 description 2
- 239000012491 analyte Substances 0.000 description 2
- 125000000637 arginyl group Chemical group N[C@@H](CCCNC(N)=N)C(=O)* 0.000 description 2
- 238000003556 assay Methods 0.000 description 2
- 125000004429 atom Chemical group 0.000 description 2
- 230000001580 bacterial effect Effects 0.000 description 2
- 239000012148 binding buffer Substances 0.000 description 2
- 230000008033 biological extinction Effects 0.000 description 2
- 229940098773 bovine serum albumin Drugs 0.000 description 2
- 239000000872 buffer Substances 0.000 description 2
- 238000004113 cell culture Methods 0.000 description 2
- 239000013592 cell lysate Substances 0.000 description 2
- 229920002301 cellulose acetate Polymers 0.000 description 2
- 238000005119 centrifugation Methods 0.000 description 2
- 230000008859 change Effects 0.000 description 2
- 230000000295 complement effect Effects 0.000 description 2
- 230000008878 coupling Effects 0.000 description 2
- 238000010168 coupling process Methods 0.000 description 2
- 238000005859 coupling reaction Methods 0.000 description 2
- 238000002189 fluorescence spectrum Methods 0.000 description 2
- 230000006870 function Effects 0.000 description 2
- 230000004927 fusion Effects 0.000 description 2
- 239000000499 gel Substances 0.000 description 2
- 238000004128 high performance liquid chromatography Methods 0.000 description 2
- 230000002209 hydrophobic effect Effects 0.000 description 2
- 230000005764 inhibitory process Effects 0.000 description 2
- 239000003446 ligand Substances 0.000 description 2
- 230000000670 limiting effect Effects 0.000 description 2
- 238000011068 loading method Methods 0.000 description 2
- 238000004519 manufacturing process Methods 0.000 description 2
- 238000005259 measurement Methods 0.000 description 2
- 125000002496 methyl group Chemical group [H]C([H])([H])* 0.000 description 2
- 238000000386 microscopy Methods 0.000 description 2
- 238000002156 mixing Methods 0.000 description 2
- 230000004048 modification Effects 0.000 description 2
- 238000012986 modification Methods 0.000 description 2
- 239000000178 monomer Substances 0.000 description 2
- 238000012856 packing Methods 0.000 description 2
- NBIIXXVUZAFLBC-UHFFFAOYSA-K phosphate Chemical compound [O-]P([O-])([O-])=O NBIIXXVUZAFLBC-UHFFFAOYSA-K 0.000 description 2
- 108010066823 proline dipeptidase Proteins 0.000 description 2
- 238000000734 protein sequencing Methods 0.000 description 2
- 238000011002 quantification Methods 0.000 description 2
- 230000000717 retained effect Effects 0.000 description 2
- 102220013614 rs397516692 Human genes 0.000 description 2
- 102220197216 rs748117555 Human genes 0.000 description 2
- 102220127170 rs886044488 Human genes 0.000 description 2
- 230000019491 signal transduction Effects 0.000 description 2
- 239000011780 sodium chloride Substances 0.000 description 2
- 238000000527 sonication Methods 0.000 description 2
- 238000010186 staining Methods 0.000 description 2
- 239000006228 supernatant Substances 0.000 description 2
- 150000003573 thiols Chemical class 0.000 description 2
- 239000011534 wash buffer Substances 0.000 description 2
- LJHYWUVYIKCPGU-VIFPVBQESA-N (2s)-2-amino-3-[4-(carboxymethyl)phenyl]propanoic acid Chemical compound OC(=O)[C@@H](N)CC1=CC=C(CC(O)=O)C=C1 LJHYWUVYIKCPGU-VIFPVBQESA-N 0.000 description 1
- ZSQPDAOJXSYJNP-LBPRGKRZSA-N (2s)-2-amino-5-(diaminomethylideneamino)-n-(4-methyl-2-oxochromen-7-yl)pentanamide Chemical compound C1=C(NC(=O)[C@@H](N)CCCN=C(N)N)C=CC2=C1OC(=O)C=C2C ZSQPDAOJXSYJNP-LBPRGKRZSA-N 0.000 description 1
- ZXSBHXZKWRIEIA-JTQLQIEISA-N (2s)-3-(4-acetylphenyl)-2-azaniumylpropanoate Chemical compound CC(=O)C1=CC=C(C[C@H](N)C(O)=O)C=C1 ZXSBHXZKWRIEIA-JTQLQIEISA-N 0.000 description 1
- 108010088751 Albumins Proteins 0.000 description 1
- 102000009027 Albumins Human genes 0.000 description 1
- 206010001935 American trypanosomiasis Diseases 0.000 description 1
- 101000787278 Arabidopsis thaliana Valine-tRNA ligase, chloroplastic/mitochondrial 2 Proteins 0.000 description 1
- 101000787296 Arabidopsis thaliana Valine-tRNA ligase, mitochondrial 1 Proteins 0.000 description 1
- 101710167800 Capsid assembly scaffolding protein Proteins 0.000 description 1
- 101710152440 Cysteine-tRNA ligase Proteins 0.000 description 1
- 102100030115 Cysteine-tRNA ligase, cytoplasmic Human genes 0.000 description 1
- 101710185308 Cysteine-tRNA ligase, cytoplasmic Proteins 0.000 description 1
- 101000787280 Dictyostelium discoideum Probable valine-tRNA ligase, mitochondrial Proteins 0.000 description 1
- 241001198387 Escherichia coli BL21(DE3) Species 0.000 description 1
- 108010024636 Glutathione Proteins 0.000 description 1
- 102000029746 Histidine-tRNA Ligase Human genes 0.000 description 1
- 108090001005 Interleukin-6 Proteins 0.000 description 1
- 102000004889 Interleukin-6 Human genes 0.000 description 1
- ODKSFYDXXFIFQN-BYPYZUCNSA-N L-arginine Chemical compound OC(=O)[C@@H](N)CCCN=C(N)N ODKSFYDXXFIFQN-BYPYZUCNSA-N 0.000 description 1
- 241000203407 Methanocaldococcus jannaschii Species 0.000 description 1
- 241000205274 Methanosarcina mazei Species 0.000 description 1
- FULZLIGZKMKICU-UHFFFAOYSA-N N-phenylthiourea Chemical class NC(=S)NC1=CC=CC=C1 FULZLIGZKMKICU-UHFFFAOYSA-N 0.000 description 1
- QSHJQJJEBPXTJC-XDWRKJDYSA-N N[C@@H](CCCCCC(=O)OC1CCCC1)C(=O)O.N[C@@H](CCCCCC(=S)CC1=CC=CC=C1)C(=O)O.[H][C@@]1(C(=O)CCCCC[C@H](N)C(=O)O)N=CC[C@H]1C Chemical compound N[C@@H](CCCCCC(=O)OC1CCCC1)C(=O)O.N[C@@H](CCCCCC(=S)CC1=CC=CC=C1)C(=O)O.[H][C@@]1(C(=O)CCCCC[C@H](N)C(=O)O)N=CC[C@H]1C QSHJQJJEBPXTJC-XDWRKJDYSA-N 0.000 description 1
- 102000035195 Peptidases Human genes 0.000 description 1
- 108091005804 Peptidases Proteins 0.000 description 1
- 102000006335 Phosphate-Binding Proteins Human genes 0.000 description 1
- 108010058514 Phosphate-Binding Proteins Proteins 0.000 description 1
- 108010001441 Phosphopeptides Proteins 0.000 description 1
- 102000004160 Phosphoric Monoester Hydrolases Human genes 0.000 description 1
- 108090000608 Phosphoric Monoester Hydrolases Proteins 0.000 description 1
- 101710130420 Probable capsid assembly scaffolding protein Proteins 0.000 description 1
- 101710121315 Probable cysteine-tRNA ligase, mitochondrial Proteins 0.000 description 1
- 239000004365 Protease Substances 0.000 description 1
- 108020004422 Riboswitch Proteins 0.000 description 1
- 101710204410 Scaffold protein Proteins 0.000 description 1
- QAOWNCQODCNURD-UHFFFAOYSA-L Sulfate Chemical compound [O-]S([O-])(=O)=O QAOWNCQODCNURD-UHFFFAOYSA-L 0.000 description 1
- 101000601553 Thermus thermophilus Phenylalanine-tRNA ligase alpha subunit Proteins 0.000 description 1
- 101000601492 Thermus thermophilus Phenylalanine-tRNA ligase beta subunit Proteins 0.000 description 1
- 102100025607 Valine-tRNA ligase Human genes 0.000 description 1
- 108030004686 Xaa-Pro aminopeptidases Proteins 0.000 description 1
- XFKQXANXMAWSKE-UHFFFAOYSA-N [S].NC(N)=S Chemical group [S].NC(N)=S XFKQXANXMAWSKE-UHFFFAOYSA-N 0.000 description 1
- 230000003213 activating effect Effects 0.000 description 1
- 230000004913 activation Effects 0.000 description 1
- 239000000654 additive Substances 0.000 description 1
- 230000006154 adenylylation Effects 0.000 description 1
- 238000001042 affinity chromatography Methods 0.000 description 1
- 238000001261 affinity purification Methods 0.000 description 1
- 150000001408 amides Chemical class 0.000 description 1
- 230000006229 amino acid addition Effects 0.000 description 1
- 230000008901 benefit Effects 0.000 description 1
- 239000012472 biological sample Substances 0.000 description 1
- 230000015572 biosynthetic process Effects 0.000 description 1
- 239000008280 blood Substances 0.000 description 1
- 210000004369 blood Anatomy 0.000 description 1
- 210000004899 c-terminal region Anatomy 0.000 description 1
- 238000004364 calculation method Methods 0.000 description 1
- 125000001314 canonical amino-acid group Chemical group 0.000 description 1
- 125000002915 carbonyl group Chemical group [*:2]C([*:1])=O 0.000 description 1
- 230000001413 cellular effect Effects 0.000 description 1
- 238000004587 chromatography analysis Methods 0.000 description 1
- 238000010367 cloning Methods 0.000 description 1
- 238000004590 computer program Methods 0.000 description 1
- 238000001816 cooling Methods 0.000 description 1
- 230000001351 cycling effect Effects 0.000 description 1
- 230000007423 decrease Effects 0.000 description 1
- 238000011161 development Methods 0.000 description 1
- 230000018109 developmental process Effects 0.000 description 1
- 201000010099 disease Diseases 0.000 description 1
- 208000037265 diseases, disorders, signs and symptoms Diseases 0.000 description 1
- 239000000835 fiber Substances 0.000 description 1
- 238000011049 filling Methods 0.000 description 1
- 239000012530 fluid Substances 0.000 description 1
- 230000002068 genetic effect Effects 0.000 description 1
- RWSXRVCMGQZWBV-WDSKDSINSA-N glutathione Chemical compound OC(=O)[C@@H](N)CCC(=O)N[C@@H](CS)C(=O)NCC(O)=O RWSXRVCMGQZWBV-WDSKDSINSA-N 0.000 description 1
- 230000036541 health Effects 0.000 description 1
- 238000010438 heat treatment Methods 0.000 description 1
- 238000012203 high throughput assay Methods 0.000 description 1
- 108010047800 histidine-binding protein Proteins 0.000 description 1
- 125000001165 hydrophobic group Chemical group 0.000 description 1
- 230000003100 immobilizing effect Effects 0.000 description 1
- 230000006872 improvement Effects 0.000 description 1
- 238000000126 in silico method Methods 0.000 description 1
- 238000011534 incubation Methods 0.000 description 1
- 229940100601 interleukin-6 Drugs 0.000 description 1
- 238000000752 ionisation method Methods 0.000 description 1
- 238000002955 isolation Methods 0.000 description 1
- 230000000155 isotopic effect Effects 0.000 description 1
- 238000002372 labelling Methods 0.000 description 1
- 238000004811 liquid chromatography Methods 0.000 description 1
- 150000002668 lysine derivatives Chemical class 0.000 description 1
- 230000014759 maintenance of location Effects 0.000 description 1
- 238000013507 mapping Methods 0.000 description 1
- 238000002844 melting Methods 0.000 description 1
- 230000008018 melting Effects 0.000 description 1
- 108020004999 messenger RNA Proteins 0.000 description 1
- 230000002503 metabolic effect Effects 0.000 description 1
- 238000012544 monitoring process Methods 0.000 description 1
- 231100000219 mutagenic Toxicity 0.000 description 1
- 230000003505 mutagenic effect Effects 0.000 description 1
- 108020004707 nucleic acids Proteins 0.000 description 1
- 102000039446 nucleic acids Human genes 0.000 description 1
- 150000007523 nucleic acids Chemical class 0.000 description 1
- 239000012038 nucleophile Substances 0.000 description 1
- 230000000269 nucleophilic effect Effects 0.000 description 1
- 239000003960 organic solvent Substances 0.000 description 1
- 125000002467 phosphate group Chemical group [H]OP(=O)(O[H])O[*] 0.000 description 1
- 125000002924 primary amino group Chemical group [H]N([H])* 0.000 description 1
- 235000019419 proteases Nutrition 0.000 description 1
- 238000000575 proteomic method Methods 0.000 description 1
- UMJSCPRVCHMLSP-UHFFFAOYSA-N pyridine Natural products COC1=CC=CN=C1 UMJSCPRVCHMLSP-UHFFFAOYSA-N 0.000 description 1
- 125000000168 pyrrolyl group Chemical group 0.000 description 1
- 108040001032 pyrrolysyl-tRNA synthetase activity proteins Proteins 0.000 description 1
- 239000011541 reaction mixture Substances 0.000 description 1
- 230000002829 reductive effect Effects 0.000 description 1
- 230000002441 reversible effect Effects 0.000 description 1
- 238000002390 rotary evaporation Methods 0.000 description 1
- 238000002864 sequence alignment Methods 0.000 description 1
- 230000011664 signaling Effects 0.000 description 1
- 239000000377 silicon dioxide Substances 0.000 description 1
- 238000002741 site-directed mutagenesis Methods 0.000 description 1
- 150000003384 small molecules Chemical group 0.000 description 1
- 238000001179 sorption measurement Methods 0.000 description 1
- 230000000087 stabilizing effect Effects 0.000 description 1
- 125000000472 sulfonyl group Chemical group *S(*)(=O)=O 0.000 description 1
- 229910052717 sulfur Inorganic materials 0.000 description 1
- 125000004434 sulfur atom Chemical group 0.000 description 1
- 150000003467 sulfuric acid derivatives Chemical class 0.000 description 1
- 150000003549 thiazolines Chemical class 0.000 description 1
- XLYOFNOQVPJJNP-UHFFFAOYSA-N water Substances O XLYOFNOQVPJJNP-UHFFFAOYSA-N 0.000 description 1
Classifications
-
- G—PHYSICS
- G01—MEASURING; TESTING
- G01N—INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
- G01N33/00—Investigating or analysing materials by specific methods not covered by groups G01N1/00 - G01N31/00
- G01N33/48—Biological material, e.g. blood, urine; Haemocytometers
- G01N33/50—Chemical analysis of biological material, e.g. blood, urine; Testing involving biospecific ligand binding methods; Immunological testing
- G01N33/68—Chemical analysis of biological material, e.g. blood, urine; Testing involving biospecific ligand binding methods; Immunological testing involving proteins, peptides or amino acids
- G01N33/6803—General methods of protein analysis not limited to specific proteins or families of proteins
- G01N33/6818—Sequencing of polypeptides
- G01N33/6824—Sequencing of polypeptides involving N-terminal degradation, e.g. Edman degradation
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12N—MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
- C12N9/00—Enzymes; Proenzymes; Compositions thereof; Processes for preparing, activating, inhibiting, separating or purifying enzymes
- C12N9/14—Hydrolases (3)
- C12N9/48—Hydrolases (3) acting on peptide bonds (3.4)
- C12N9/485—Exopeptidases (3.4.11-3.4.19)
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12N—MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
- C12N9/00—Enzymes; Proenzymes; Compositions thereof; Processes for preparing, activating, inhibiting, separating or purifying enzymes
- C12N9/14—Hydrolases (3)
- C12N9/48—Hydrolases (3) acting on peptide bonds (3.4)
- C12N9/50—Proteinases, e.g. Endopeptidases (3.4.21-3.4.25)
- C12N9/52—Proteinases, e.g. Endopeptidases (3.4.21-3.4.25) derived from bacteria or Archaea
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12N—MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
- C12N9/00—Enzymes; Proenzymes; Compositions thereof; Processes for preparing, activating, inhibiting, separating or purifying enzymes
- C12N9/14—Hydrolases (3)
- C12N9/48—Hydrolases (3) acting on peptide bonds (3.4)
- C12N9/50—Proteinases, e.g. Endopeptidases (3.4.21-3.4.25)
- C12N9/64—Proteinases, e.g. Endopeptidases (3.4.21-3.4.25) derived from animal tissue
- C12N9/6402—Proteinases, e.g. Endopeptidases (3.4.21-3.4.25) derived from animal tissue from non-mammals
- C12N9/6405—Proteinases, e.g. Endopeptidases (3.4.21-3.4.25) derived from animal tissue from non-mammals not being snakes
- C12N9/641—Cysteine endopeptidases (3.4.22)
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12N—MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
- C12N9/00—Enzymes; Proenzymes; Compositions thereof; Processes for preparing, activating, inhibiting, separating or purifying enzymes
- C12N9/93—Ligases (6)
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12Y—ENZYMES
- C12Y304/00—Hydrolases acting on peptide bonds, i.e. peptidases (3.4)
- C12Y304/11—Aminopeptidases (3.4.11)
- C12Y304/11018—Methionyl aminopeptidase (3.4.11.18)
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12Y—ENZYMES
- C12Y601/00—Ligases forming carbon-oxygen bonds (6.1)
- C12Y601/01—Ligases forming aminoacyl-tRNA and related compounds (6.1.1)
- C12Y601/0101—Methionine-tRNA ligase (6.1.1.10)
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12Y—ENZYMES
- C12Y601/00—Ligases forming carbon-oxygen bonds (6.1)
- C12Y601/01—Ligases forming aminoacyl-tRNA and related compounds (6.1.1)
- C12Y601/0102—Phenylalanine-tRNA ligase (6.1.1.20)
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12Y—ENZYMES
- C12Y601/00—Ligases forming carbon-oxygen bonds (6.1)
- C12Y601/01—Ligases forming aminoacyl-tRNA and related compounds (6.1.1)
- C12Y601/01021—Histidine-tRNA ligase (6.1.1.21)
Definitions
- the present invention generally relates to reagents and methods for the digital analysis of proteins or peptides.
- one aspect of the invention is directed to proteins for identifying the N-terminal amino acid or N-terminal phosphorylated amino acid of a polypeptide.
- Another aspect of the invention is an enzyme for use in the cleavage step of the Edman degradation reaction and a method for using this enzyme.
- Proteins carry out the majority of signaling, metabolic, and regulatory tasks necessary for life. As a result, a quantitative description of the proteomic state of cells, tissues, and fluids is crucial for assessing the functionally relevant differences between diseased and unaffected tissues, between cells of different lineages or developmental states, and between cells executing different regulatory programs. Although powerful high-throughput techniques are available for determining the RNA content of a biological sample, the correlation between mRNA and protein levels is low (1).
- the preferred method for proteomic characterization is currently mass spectrometry.
- however, mass spectrometry has limitations.
- One limitation is quantification. Because different proteins ionize with different efficiencies, it is difficult to compare relative amounts between two samples without isotopic labeling (2). In ‘shotgun’ strategies for analyzing complex samples, the uncertainties of peptide assignment further complicate quantification, especially for low abundance proteins (3).
- a second limitation of mass spectrometry is its dynamic range. For unbiased samples that have not undergone prefractionation or affinity purification, the dynamic range in analyte concentration is roughly 10^2 to 10^3, depending upon the instrument (4).
- DAPES Digital Analysis of Proteins by End Sequencing
- the identity of the N-terminal amino acid derivative is determined by performing, for example, 20 rounds of antibody binding with antibodies specific for each PITC-derivatized N-terminal amino acid, detection, and stripping.
- the N-terminal amino acid is removed by raising the temperature or lowering pH, and the cycle is repeated to sequence 12-20 amino acids from each peptide on the slide.
- the absolute concentration of every protein in the original sample can then be calculated based on the number of different peptide sequences observed.
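The "digital" aspect of this approach is that each sequenced peptide is a discrete count. The Python sketch below illustrates only the counting and assignment step, under stated assumptions: reads are exact N-terminal sequences, and a hypothetical reference table maps each expected peptide N-terminus to its protein. All names and data are invented for illustration, and turning counts into absolute concentrations would additionally require calibration (for example, capture and read efficiency), which is not modeled here.

```python
from collections import Counter
from typing import Dict, Iterable

def count_reads_per_protein(
    reads: Iterable[str], reference: Dict[str, str]
) -> Counter:
    """Tally single-peptide reads per protein.

    reads: N-terminal sequences read from individual peptides.
    reference: hypothetical lookup table mapping each expected peptide
        N-terminal sequence to the protein it comes from.
    Reads that match no reference entry are tallied under the key None.
    """
    protein_counts: Counter = Counter()
    for seq, n in Counter(reads).items():
        protein_counts[reference.get(seq)] += n
    return protein_counts

def assigned_fractions(protein_counts: Counter) -> Dict[str, float]:
    """Fraction of assigned reads per protein (unassigned reads excluded)."""
    assigned = {p: n for p, n in protein_counts.items() if p is not None}
    total = sum(assigned.values())
    return {p: n / total for p, n in assigned.items()} if total else {}

# Hypothetical reads and reference table (not data from the patent).
reference = {"MLKF": "protein_A", "GHIV": "protein_A", "SPDE": "protein_B"}
reads = ["MLKF", "MLKF", "GHIV", "SPDE", "WWWW"]
counts = count_reads_per_protein(reads, reference)
fractions = assigned_fractions(counts)
```

Here protein_A receives three reads from two different peptides, protein_B one, and the unmatched read is kept separate rather than silently dropped.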
- the phenyl isothiocyanate chemistry used in DAPES is the same used in Edman degradation and is efficient and robust (>99% efficiency).
- the cleavage of single amino acids requires strong anhydrous acid or alternatively, an aqueous buffer at elevated temperatures. Cycling between either of these harsh conditions is undesirable for multiple rounds of analysis on sensitive substrates used for single molecule protein detection (SMD).
- SMD single molecule protein detection
- the method for sequencing a polypeptide comprises (a) contacting the polypeptide with one or more fluorescently labeled N-terminal amino acid binding proteins (NAABs); (b) detecting fluorescence of a NAAB bound to an N-terminal amino acid of the polypeptide; (c) identifying the N-terminal amino acid of the polypeptide based on the fluorescence detected; (d) removing the NAAB from the polypeptide; (e) optionally repeating steps (a) through (d); (f) cleaving the N-terminal amino acid of the polypeptide via Edman degradation; and (g) repeating steps (a) through (f) one or more times.
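The cycle in steps (a) through (g) can be sketched as a loop over a single immobilized peptide. This is an idealized illustration only: it assumes perfectly specific NAABs, complete NAAB removal, and 100% efficient Edman cleavage, and it omits the optional repeat in step (e). The function name and the fluorophore labels are hypothetical.

```python
from typing import Dict, List, Optional

def sequence_peptide(
    peptide: str, naab_targets: Dict[str, str], cycles: int
) -> List[Optional[str]]:
    """Idealized NAAB sequencing cycle on a single immobilized peptide.

    naab_targets maps a hypothetical NAAB label (e.g. its fluorophore) to
    the one N-terminal amino acid that NAAB binds. Binding, NAAB removal,
    and Edman cleavage are all assumed to be 100% efficient and specific.
    Returns one call per cycle; None means no NAAB in the set bound.
    """
    calls: List[Optional[str]] = []
    remaining = peptide
    for _ in range(cycles):
        if not remaining:
            break  # peptide fully degraded
        # (a)-(b): contact with all NAABs; collect labels whose target matches.
        bound = [label for label, aa in naab_targets.items() if aa == remaining[0]]
        # (c): identify the N-terminal residue from the detected label.
        calls.append(naab_targets[bound[0]] if bound else None)
        # (d): NAABs are stripped; nothing to model in this idealized sketch.
        # (f): Edman degradation removes the N-terminal residue.
        remaining = remaining[1:]
    return calls  # (g): repetition is the loop itself

naabs = {"dye_1": "L", "dye_2": "M", "dye_3": "F"}  # hypothetical labels
calls = sequence_peptide("MLFHK", naabs, cycles=4)
```

With only leucine, methionine, and phenylalanine binders in the set, the fourth residue (histidine) is reported as None, which is why a full set of NAABs covering all amino acids is needed.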
- NAABs N-terminal amino acid binding proteins
- the present invention also generally relates to reagents for the digital analysis of proteins or peptides.
- the enzymatic Edman degradation method comprises reacting the N-terminal amino acid of the polypeptide with phenyl isothiocyanate (PITC) to form a PITC-derivatized N-terminal amino acid and cleaving the PITC-derivatized N-terminal amino acid using an Edman degradation enzyme.
- PITC phenyl isothiocyanate
- FIG. 1 depicts the Digital Analysis of Proteins by End Sequencing Protocol (DAPES) utilizing N-terminal amino acid binding proteins in the identification step and a synthetic enzyme in the cleavage step.
- DAPES Digital Analysis of Proteins by End Sequencing Protocol
- FIGS. 2A-2B show the binding specificity of wild-type E. coli methionine aminopeptidase (eMAP) and an engineered leucine-specific aminopeptidase (eLAP) of the present invention in a single-molecule detection experiment.
- eMAP E. coli methionine aminopeptidase
- eLAP engineered leucine-specific aminopeptidase
- FIG. 3 shows the binding specificity of an engineered mutant of methionine tRNA synthetase (MetRS) of the present invention that exhibits binding specificity for surface-immobilized peptides with N-terminal methionines.
- MetRS methionine tRNA synthetase
- FIGS. 4A-4B depict three mutations (indicated by the arrows) introduced into a model of cruzain (PDB code: 1U9Q (27)) to accommodate the phenyl moiety of the Edman reagent phenyl isothiocyanate.
- FIG. 5A depicts a model of a cleavage intermediate for Edman degradation, generated using experimental structures of similar small-molecule compounds and geometrically optimized using quantum chemistry calculations.
- FIG. 5B shows the model for the intermediate fitted into the active site cleft of the enzyme cruzain.
- the wild-type catalytic cysteine was removed.
- the activating residues (the other two components of the ‘catalytic triad’) were retained. These are a histidine and an asparagine that are intended to activate the sulfur atom of the Edman reagent for nucleophilic attack on the peptide bond.
- FIG. 6 is a graphical representation of kinetic data from cleavage experiments using an Edman degradation enzyme of the present invention and the substrate Ed-Asp-AMC.
- FIG. 7 is a trace plot of biolayer interferometry kinetics data showing the binding affinity of two proteins for peptides with N-terminal histidine residues: (1) engineered His NAAB (open circles); (2) native wild-type protein (solid circles).
- FIG. 8 is a full binding matrix showing the binding affinity of each NAAB (rows) for each N-terminal amino acid (columns), as measured by biolayer interferometry.
- the present invention is directed to a method and reagents for sequencing a polypeptide.
- the present invention provides methods and reagents for the single-molecule, high-throughput sequencing of polypeptides.
- SMD single-molecule protein detection
- reagents capable of specifically binding to N-terminal amino acids for an identification step are provided.
- the present invention also includes methods and reagents for identifying phosphorylated N-terminal amino acids. Quantitatively interrogating peptide sequences in neutral aqueous environments allows for the possibility of proteomic analyses complementary to those afforded by mass spectrometry.
- the N-terminal amino acid binders specific for phosphorylated forms of amino acids allow for quantitative comparison of proteomic inventories and signal transduction cascades in different samples.
- the present invention is directed to a method and reagents for enzymatic Edman degradation (i.e., for enzymatically cleaving the N-terminal amino acid of a polypeptide).
- a synthetic enzyme is provided that catalyzes the cleavage step of the Edman degradation reaction in an aqueous buffer and at neutral pH, thereby providing an alternative to the harsh chemical conditions typically employed in Edman degradation.
- Yet another aspect of the present invention is directed to an integrated high-throughput method for sequencing of polypeptides that includes use of reagents capable of specifically binding to N-terminal amino acids for an identification step and use of an enzymatic Edman degradation to remove N-terminal amino acids.
- NAABs N-terminal amino acid binders
- N-terminal amino acid binders each selectively bind to a particular amino acid, for example one of the twenty standard naturally occurring amino acids.
- the standard, naturally-occurring amino acids include Alanine (A or Ala), Cysteine (C or Cys), Aspartic Acid (D or Asp), Glutamic Acid (E or Glu), Phenylalanine (F or Phe), Glycine (G or Gly), Histidine (H or His), Isoleucine (I or Ile), Lysine (K or Lys), Leucine (L or Leu), Methionine (M or Met), Asparagine (N or Asn), Proline (P or Pro), Glutamine (Q or Gln), Arginine (R or Arg), Serine (S or Ser), Threonine (T or Thr), Valine (V or Val), Tryptophan (W or Trp), and Tyrosine (Y or Tyr).
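The twenty standard amino acids listed above map naturally onto a lookup table when sequences are handled programmatically. The Python sketch below is an illustrative convenience rather than anything described in the patent; the names `AMINO_ACIDS`, `THREE_TO_ONE`, and `to_three_letter` are hypothetical.

```python
# One-letter to three-letter codes for the 20 standard amino acids listed above.
AMINO_ACIDS = {
    "A": "Ala", "C": "Cys", "D": "Asp", "E": "Glu", "F": "Phe",
    "G": "Gly", "H": "His", "I": "Ile", "K": "Lys", "L": "Leu",
    "M": "Met", "N": "Asn", "P": "Pro", "Q": "Gln", "R": "Arg",
    "S": "Ser", "T": "Thr", "V": "Val", "W": "Trp", "Y": "Tyr",
}

# Reverse lookup: three-letter code to one-letter code.
THREE_TO_ONE = {three: one for one, three in AMINO_ACIDS.items()}

def to_three_letter(seq: str) -> str:
    """Convert a one-letter sequence such as 'MLF' to 'Met-Leu-Phe'.

    Raises KeyError for characters that are not standard one-letter codes.
    """
    return "-".join(AMINO_ACIDS[aa] for aa in seq.upper())
```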
- the NAABs of the present invention can be made by modifying various naturally occurring proteins to introduce one or more mutations in the amino acid sequence to produce engineered proteins that bind to particular N-terminal amino acids.
- aminopeptidases or tRNA synthetases can be modified to create NAABs that selectively bind to particular N-terminal amino acids.
- NAAB that binds specifically to N-terminal leucine residues
- This NAAB (eLAP) has 19 amino acid substitutions as compared to wild-type eMAP.
- eLAP has substitutions at the amino acid positions corresponding to positions 42, 46, 56-60, 62, 63, 65-70, 81, 101, 177, and 221 of wild-type eMAP.
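The position list above mixes single residues and inclusive ranges, so the substitution count is easy to check mechanically. A minimal sketch, assuming ranges written as "a-b" are inclusive (as the count of 19 implies); `expand_positions` is a hypothetical helper name.

```python
from typing import List

def expand_positions(spec: str) -> List[int]:
    """Expand a position list such as '42, 46, 56-60, and 221' into
    individual residue numbers; 'a-b' ranges are treated as inclusive."""
    positions: List[int] = []
    for part in spec.replace(" and ", " ").split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            start, end = (int(x) for x in part.split("-"))
            positions.extend(range(start, end + 1))
        else:
            positions.append(int(part))
    return positions

ELAP_POSITIONS = "42, 46, 56-60, 62, 63, 65-70, 81, 101, 177, and 221"
PHE_NAAB_POSITIONS = ("100, 142, 143, 152-154, 165, 205, 212, 228-232, 234, "
                      "257, 287, 289, 303, 336, 338, 340")
```

Expanding the eLAP list gives 19 distinct positions, matching the stated number of substitutions; the same check on the PheRS-derived NAAB position list given later in this section gives 22, matching its stated count.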
- the aspartate at position 42 of eMAP is replaced with a glutamate
- the asparagine at position 46 of eMAP is replaced with a tryptophan
- the valine at position 56 of eMAP is replaced with a threonine
- the serine at position 57 of eMAP is replaced with an aspartate
- the alanine at position 58 of eMAP is replaced with a serine
- the cysteine at position 59 of eMAP is replaced with a leucine
- the leucine at position 60 of eMAP is replaced with a threonine
- the tyrosine at position 62 of eMAP is replaced with a histidine
- the histidine at position 63 of eMAP is replaced with an asparagine
- the tyrosine at position 65 of eMAP is replaced with an isoleucine
- the proline at position 66 of eMAP is replaced with an aspartate
- valine at 56 could be replaced instead by serine
- leucine at 60 could be replaced instead by serine
- tyrosine at 65 could be replaced instead by valine
- cysteine at 70 could be replaced instead by threonine
- tryptophan at 221 could be replaced instead by threonine.
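Substitutions like those listed above are often written in compact notation (wild-type residue, position, new residue). The sketch below applies such substitutions to a one-letter sequence and checks that each stated wild-type residue matches, which catches numbering errors. The notation, the function name, and the six-residue toy sequence are assumptions for illustration; the actual wild-type eMAP sequence (SEQ ID NO: 1) is not reproduced here.

```python
import re
from typing import List

# Substitution notation assumed here: wild-type residue, 1-based position,
# replacement residue, e.g. "D42E" for aspartate 42 -> glutamate.
SUBSTITUTION = re.compile(r"^([A-Z])(\d+)([A-Z])$")

def apply_substitutions(wild_type: str, substitutions: List[str]) -> str:
    """Return wild_type with each substitution applied.

    Raises ValueError if the notation is malformed, the position is out of
    range, or the stated wild-type residue does not match the sequence.
    """
    seq = list(wild_type)
    for sub in substitutions:
        match = SUBSTITUTION.match(sub)
        if match is None:
            raise ValueError(f"unrecognized substitution: {sub}")
        wt, pos, new = match.group(1), int(match.group(2)), match.group(3)
        if not 1 <= pos <= len(seq):
            raise ValueError(f"{sub}: position {pos} is out of range")
        if seq[pos - 1] != wt:
            raise ValueError(f"{sub}: expected {wt} at {pos}, found {seq[pos - 1]}")
        seq[pos - 1] = new
    return "".join(seq)

# Toy six-residue sequence (hypothetical, not SEQ ID NO: 1).
mutant = apply_substitutions("MAKDVN", ["D4E", "N6W"])
```

In this notation, the eLAP substitutions spelled out above read D42E, N46W, V56T, S57D, A58S, C59L, L60T, Y62H, H63N, Y65I, and P66D; applying them would require the full wild-type eMAP sequence.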
- one reagent in accordance with the present invention comprises an isolated, synthetic, or recombinant NAAB comprising an amino acid sequence having a glutamate residue at a position corresponding to position 42 of wild-type E. coli methionine aminopeptidase (eMAP) (SEQ ID NO: 1), a tryptophan residue at a position corresponding to position 46 of wild-type eMAP, a threonine or serine residue at a position corresponding to position 56 of wild-type eMAP, an aspartate residue at a position corresponding to position 57 of wild-type eMAP, a serine residue at a position corresponding to position 58 of wild-type eMAP, a leucine residue at a position corresponding to position 59 of wild-type eMAP, a threonine or serine residue at a position corresponding to position 60 of wild-type eMAP, a histidine residue at a position corresponding to position 62 of wild-type eMAP, an as
- the remaining amino acid sequence of the NAAB comprises a sequence similar to that of wild-type eMAP, but which may contain additional amino acid mutations (including deletions, insertions, and/or substitutions), so long as such mutations do not significantly impair the ability of the NAAB to selectively bind to N-terminal leucine residues.
- the remaining amino acid sequence can comprise an amino acid sequence having at least about 80% sequence identity to the amino acid sequence of wild-type eMAP (SEQ ID NO: 1), or at least 85%, at least 90%, at least 93%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or 100% sequence identity to the amino acid sequence of SEQ ID NO: 1.
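The patent does not state how percent sequence identity is calculated. One common convention, assumed in the sketch below, divides the number of identical aligned positions by the alignment length. The function expects two sequences already aligned to equal length (gaps written as '-') by an external alignment tool; `percent_identity` is a hypothetical name, and other normalizations (for example, by the shorter ungapped length) give different numbers.

```python
def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percent identity of two pre-aligned, equal-length sequences.

    Identity = identical non-gap aligned positions / alignment length * 100.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to equal length")
    if not seq_a:
        raise ValueError("empty alignment")
    matches = sum(1 for a, b in zip(seq_a, seq_b) if a == b and a != "-")
    return 100.0 * matches / len(seq_a)
```

For example, two aligned 10-residue sequences differing at one position have 90% identity under this convention, so a variant of a several-hundred-residue scaffold can carry many substitutions and still exceed the 80% floor.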
- the NAAB comprises the amino acid sequence of SEQ ID NO: 2.
- the NAAB can consist of the amino acid sequence of SEQ ID NO: 2.
- the NAAB preferably selectively binds to N-terminal leucine residues with at least about a 1.5:1 ratio of specific to non-specific binding, more preferably about a 2:1 ratio of specific to non-specific binding.
- Non-specific binding refers to background binding, and is the amount of signal that is produced when the amino acid target of the NAAB is not present at the N-terminus of an immobilized peptide.
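The specificity ratio defined above is a quotient of two signals measured in the same units. A minimal sketch, assuming the signals are, for example, counts of NAAB-bound single molecules on peptides with and without the target N-terminal residue. The function names and example counts are hypothetical, and the 1.5 default is taken from the preferred minimum stated above.

```python
def specificity_ratio(specific_signal: float, nonspecific_signal: float) -> float:
    """Signal with the target amino acid at the N-terminus divided by
    background signal without it (both in the same arbitrary units)."""
    if nonspecific_signal <= 0:
        raise ValueError("non-specific (background) signal must be positive")
    return specific_signal / nonspecific_signal

def meets_threshold(
    specific_signal: float, nonspecific_signal: float, minimum: float = 1.5
) -> bool:
    """True if the specific:non-specific ratio is at least `minimum`."""
    return specificity_ratio(specific_signal, nonspecific_signal) >= minimum
```

With hypothetical counts of 240 bound molecules on leucine-terminated peptides against 120 on background peptides, the ratio is 2.0, meeting both the 1.5:1 minimum and the preferred 2:1 ratio; 150 against 120 (1.25) would not.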
- NAABs can also be made by introducing mutations into class I and class II tRNA synthetases (RSs).
- RSs tRNA synthetases
- NAABs for use in the polypeptide sequencing processes described herein should possess high affinity and specificity for amino acids at the N-terminus of peptides. Because tRNA synthetases have intrinsic specificity for free amino acids, they are useful scaffolds for developing NAABs for use in protein sequencing. The inherent specificity of these scaffold proteins is retained, while broadening the binding capabilities of these proteins from free monomers to peptides, and removing unnecessary domains or functions.
- the Protein Data Bank contains multiple crystal structures for RSs specific for all twenty canonical amino acids.
- RSs do not envelop the entire amino acid, as the C-terminus must be available for adenylation.
- the binding pocket in these molecules can be modified to permit the entry of peptides presenting the specifically bound amino acid. This results in a complete set of engineered RS fragments that can bind to their cognate amino acids at the N-termini of peptides.
- the class I RS proteins form a distinct structural family that is identified by sequence homology and has been extensively characterized both biochemically and biophysically. RS proteins possess a modular architecture, and the domains conferring specificity for a particular amino acid are readily identified (18).
- Several types of mutations to improve the performance of the amino acid binding domain of an RS as a NAAB can be introduced. First, one or more mutations can be introduced into the binding domain to lock the domain into the bound conformation, eliminating the energetic cost of any induced conformational change (16). Second, one or more mutations can be introduced to widen the binding pocket for the amino acid, making room for entry of a peptide. This approach can be used for each of the RS proteins.
- mutations can be introduced into methionyl-tRNA synthetase (MetRS) from E. coli to create a NAAB that binds specifically to N-terminal methionine residues.
- This NAAB comprises a truncated version of wild-type E. coli MetRS (residues 4-547; SEQ ID NO: 3) having four substitution mutations as compared to the wild-type sequence (SEQ ID NO: 5).
- the sequence of this N-terminal methionine-specific NAAB is provided by SEQ ID NO: 4.
- the leucine at position 13 of wild-type E. coli MetRS is replaced with a serine (L13S)
- the tyrosine at position 260 is replaced with a leucine (Y260L)
- the aspartic acid at position 296 is replaced with a glycine (D296G)
- the histidine at position 301 is replaced with a leucine (H301L).
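Because the NAABs are truncated constructs but their mutations are numbered by the "position corresponding to" the full-length wild-type protein, it helps to convert between the two numberings. A minimal sketch, assuming the construct is numbered from 1 at its first retained residue with no added tag or initiator methionine (any such addition would shift the offset); `construct_position` is a hypothetical name.

```python
def construct_position(
    full_length_position: int, construct_start: int, construct_end: int
) -> int:
    """Map a full-length residue number to its 1-based index in a truncated
    construct spanning construct_start..construct_end (inclusive, in
    full-length numbering)."""
    if not construct_start <= full_length_position <= construct_end:
        raise ValueError("position lies outside the truncated construct")
    return full_length_position - construct_start + 1
```

For the MetRS construct (residues 4-547), L13S falls at construct index 10 and Y260L at index 257; for the PheRS construct (residues 86-350), position 100 falls at index 15; and for the HisRS construct (residues 3-180), position 121 falls at index 119.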
- one reagent in accordance with the present invention comprises an isolated, synthetic, or recombinant NAAB comprising an amino acid sequence having a serine residue at a position corresponding to position 13 of wild-type E. coli methionyl-tRNA synthetase (MetRS); a leucine residue at a position corresponding to position 260 of wild-type E. coli MetRS; a glycine residue at a position corresponding to position 296 of wild-type E. coli MetRS; and a leucine residue at a position corresponding to position 301 of wild-type E. coli MetRS.
- MetRS E. coli methionyl-tRNA synthetase
- the remaining amino acid sequence of the NAAB comprises a sequence similar to that of amino acids 4-547 of wild-type MetRS, but may contain additional amino acid mutations (including deletions, insertions, and/or substitutions), so long as such mutations do not significantly impair the ability of the NAAB to selectively bind to N-terminal methionine residues.
- the remaining amino acid sequence can comprise an amino acid sequence having at least about 80% sequence identity to the amino acid sequence of SEQ ID NO: 3, or at least 85%, at least 90%, at least 93%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or 100% sequence identity to the amino acid sequence of SEQ ID NO: 3.
- the NAAB comprises the amino acid sequence of SEQ ID NO: 4.
- the NAAB can consist of the amino acid sequence of SEQ ID NO: 4.
- the NAAB preferably selectively binds to N-terminal methionine residues with at least about a 2:1 ratio of specific to non-specific binding, more preferably at least about a 7:1 ratio, at least about a 10:1 ratio, or about a 13:1 ratio of specific to non-specific binding.
- the starting point for the phenylalanine NAAB was the phenylalanine-tRNA synthetase (PheRS) from Thermus thermophilus, for which a crystal structure is available.
- PheRS phenylalanine-tRNA synthetase
- the operational unit is a tetramer with two copies each of two separate proteins. Only one of the proteins has the amino acid binding specificity, so a model was made of one copy of the protein in isolation.
- the N-terminus of the protein was truncated, which exposed a significant amount of surface area that was previously buried in contacts with other proteins. This surface was hydrophobic, and mutations were made on the surface to make the protein stable and soluble as a monomer. Tighter binding of the mutant to peptides was observed when compared to the wild-type protein.
- mutations can be introduced into PheRS from Thermus thermophilus to create a NAAB that binds specifically to N-terminal phenylalanine residues.
- This NAAB comprises a truncated version of wild-type Thermus thermophilus PheRS (residues 86-350; SEQ ID NO: 6) having 22 substitution mutations as compared to the wild-type sequence.
- the sequence of this N-terminal phenylalanine-specific NAAB is provided by SEQ ID NO: 7.
- PheNAAB has substitutions at the amino acid positions corresponding to positions 100, 142, 143, 152-154, 165, 205, 212, 228-232, 234, 257, 287, 289, 303, 336, 338, and 340 of wild-type PheRS.
- the leucine at position 100 of PheRS is replaced with an aspartate
- the histidine at position 142 of PheRS is replaced with an asparagine
- the histidine at position 143 of PheRS is replaced with a glycine
- the phenylalanine at position 152 of PheRS is replaced with a valine
- the tryptophan at position 153 of PheRS is replaced with a glycine
- the leucine at position 154 of PheRS is replaced with a lysine
- the leucine at position 165 of PheRS is replaced with an aspartate
- the phenylalanine at position 205 of PheRS is replaced with an alanine
- the histidine at position 212 of PheRS is replaced with an alanine
- the isoleucine at position 228 of PheRS is replaced with a valine
- the alanine at position 229 of PheRS is replaced with an asparagine
- one reagent in accordance with the present invention comprises an isolated, synthetic, or recombinant NAAB comprising an amino acid sequence having an aspartate residue at a position corresponding to position 100 of wild-type PheRS from Thermus thermophilus (SEQ ID NO: 8), an asparagine residue at a position corresponding to position 142 of wild-type PheRS, a glycine residue at a position corresponding to position 143 of wild-type PheRS, a valine residue at a position corresponding to position 152 of wild-type PheRS, a glycine residue at a position corresponding to position 153 of wild-type PheRS, a lysine residue at a position corresponding to position 154 of wild-type PheRS, an aspartate residue at a position corresponding to position 165 of wild-type PheRS, an alanine residue at a position corresponding to position 205 of wild-type PheRS, an alanine residue at a position corresponding to
- the remaining amino acid sequence of the NAAB comprises a sequence similar to that of wild-type PheRS, but which may contain additional amino acid mutations (including deletions, insertions, and/or substitutions), so long as such mutations do not significantly impair the ability of the NAAB to selectively bind to N-terminal phenylalanine residues.
- the remaining amino acid sequence can comprise an amino acid sequence having at least about 80% sequence identity to the amino acid sequence of truncated wild-type PheRS (SEQ ID NO: 6), or at least 85%, at least 90%, at least 93%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or 100% sequence identity to the amino acid sequence of SEQ ID NO:6.
- the NAAB comprises the amino acid sequence of SEQ ID NO: 7.
- the NAAB can consist of the amino acid sequence of SEQ ID NO: 7.
- the NAAB preferably selectively binds to N-terminal phenylalanine residues with at least about a 1.5:1 ratio of specific to non-specific binding, more preferably about a 2:1 ratio of specific to non-specific binding.
- His NAAB N-terminal histidine-specific NAAB (derived from histidine-tRNA synthetase)
- HisRS histidine-tRNA synthetase
- this NAAB comprises a truncated version of wild-type E. coli HisRS (residues 3-180; SEQ ID NO: 9) having two substitution mutations as compared to the wild-type sequence.
- the sequence of this N-terminal histidine-specific NAAB is provided by SEQ ID NO: 10.
- one reagent in accordance with the present invention comprises an isolated, synthetic, or recombinant NAAB comprising an amino acid sequence having an asparagine residue at a position corresponding to position 121 of wild-type HisRS from E. coli (SEQ ID NO: 9) and an alanine residue at a position corresponding to position 122 of wild-type HisRS.
- the remaining amino acid sequence of the NAAB comprises a sequence similar to that of wild-type HisRS, but which may contain additional amino acid mutations (including deletions, insertions, and/or substitutions), so long as such mutations do not significantly impair the ability of the NAAB to selectively bind to N-terminal histidine residues.
- the remaining amino acid sequence can comprise an amino acid sequence having at least about 80% sequence identity to the amino acid sequence of wild-type HisRS (SEQ ID NO: 9), or at least 85%, at least 90%, at least 93%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or 100% sequence identity to the amino acid sequence of SEQ ID NO: 9.
- the NAAB comprises the amino acid sequence of SEQ ID NO: 10.
- the NAAB can consist of the amino acid sequence of SEQ ID NO: 10.
- the NAAB preferably selectively binds to N-terminal histidine residues with at least about a 1.5:1 ratio of specific to non-specific binding, more preferably about a 2:1 ratio of specific to non-specific binding.
- the NAAB comprises an amino acid sequence selected from the group consisting of SEQ ID NO: 11; SEQ ID NO: 12; SEQ ID NO: 13; SEQ ID NO: 14; SEQ ID NO: 15; SEQ ID NO: 16; SEQ ID NO: 17; SEQ ID NO: 18; SEQ ID NO: 19; SEQ ID NO: 20; SEQ ID NO: 21; SEQ ID NO: 22; SEQ ID NO: 23; SEQ ID NO: 24; SEQ ID NO: 25; SEQ ID NO: 26; SEQ ID NO: 27; and SEQ ID NO: 28.
- a set of NAABs comprises at least 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15 or more of the amino acid sequences of SEQ ID NO: 2; SEQ ID NO: 4; SEQ ID NO: 7; SEQ ID NO: 10; SEQ ID NO: 11; SEQ ID NO: 12; SEQ ID NO: 13; SEQ ID NO: 14; SEQ ID NO: 15; SEQ ID NO: 16; SEQ ID NO: 17; SEQ ID NO: 18; SEQ ID NO: 19; SEQ ID NO: 20; SEQ ID NO: 21; SEQ ID NO: 22; SEQ ID NO: 23; SEQ ID NO: 24; SEQ ID NO: 25; SEQ ID NO: 26; SEQ ID NO: 27; and SEQ ID NO: 28.
- a set of NAABs comprises the amino acid sequences of SEQ ID NO: 2; SEQ ID NO: 4; SEQ ID NO: 7; SEQ ID NO: 10; SEQ ID NO: 11; SEQ ID NO: 12; SEQ ID NO: 13; SEQ ID NO: 14; SEQ ID NO: 15; SEQ ID NO: 16; SEQ ID NO: 17; SEQ ID NO: 18; SEQ ID NO: 19; SEQ ID NO: 20; SEQ ID NO: 21; SEQ ID NO: 22; SEQ ID NO: 23; SEQ ID NO: 24; SEQ ID NO: 25; SEQ ID NO: 26; SEQ ID NO: 27; and SEQ ID NO: 28.
- PITC: phenyl isothiocyanate
- LysRS: lysyl-tRNA synthetase
- a NAAB that is specific for PITC-derivatized lysine is therefore required.
- the class II RS for pyrrolysine (PylRS) served as a starting point for development. Pyrrolysine is a lysine derivative that possesses a pyrroline ring attached to the Nε atom by an amide linkage (Structure A).
- Crystal structures have been determined for PylRS bound to several ligands (23), one of which is one bond longer than pyrrolysine (Structure B), and possesses steric similarity to a model of PITC-derivatized lysine (Structure C).
- Genomic DNA from the archaeon Methanosarcina mazei will be obtained from the American Type Culture Collection (ATCC).
- the PylRS gene will be cloned and expressed.
- the relevant substrate for assessing compatibility with the DAPES strategy is a peptide with an N-terminal lysine that has been modified with PITC on its side chain, but not its amino terminus. It is expected that the side chain will be derivatized during previous cycles, but that the N-terminus will be regenerated by the cleavage step of the preceding cycle.
- a peptide with the sequence DKGMMGSSC will be obtained.
- the peptide will be derivatized with PITC, modifying both the N-terminus and the side chain of the lysine at the second position.
- the modified N-terminal aspartate residue will be cleaved with the designed enzyme, which has excellent activity against PITC-modified aspartate.
- the resulting peptide, with an N-terminal lysine modified only on its side chain, will be purified from the reaction mixture by HPLC.
- the peptide will then be immobilized on the nanogel surface via its C-terminal cysteine.
- the liberated PylRS domain will be fluorescently labeled with Cy5 and assayed for binding to the immobilized peptide.
- the NAABs may also include reagents capable of specifically binding to phosphorylated N-terminal amino acids (e.g., phosphotyrosine, phosphoserine, and phosphothreonine).
- the proteome is elaborated by post-translational modifications. These marks are reversible and provide a snapshot of the current state of a cell with respect to signaling pathways and other regulatory control.
- Side chain phosphorylation, which primarily occurs on tyrosine, serine, and threonine residues, is a well-known post-translational modification.
- characterization of phosphorylated amino acids by mass spectrometry is difficult. Phosphate groups can be altered or lost during the ionization process, and sample enrichment is typically required to cope with issues of dynamic range (2). Identification of phosphorylated amino acids using digital protocols (e.g., DAPES) is improved because of the improved dynamic range and mild buffer conditions afforded by the present invention.
- the ability to distinguish between phosphorylated and unphosphorylated amino acids could have a major impact on characterizing cellular and disease states.
- NAABs that specifically bind to either phosphoserine, phosphotyrosine, or phosphothreonine can be made by modifying certain tRNA synthetases to include one or more mutations.
- methanogenic archaea possess an RS for phosphoserine.
- methanogenic archaea lack a CysRS.
- phosphoserine (Sep) is first ligated to the tRNA for cysteine, and then converted to Cys-tRNA in a subsequent step.
- a crystal structure of SepRS, a class II synthetase, in complex with Sep is available from the PDB (pdb code: 2DU3 (36)).
- RSs for several chemically similar analogs have been obtained via directed evolution (37-39).
- the class I TyrRS from Methanococcus jannaschii is the parental protein for these mutants, and a crystal structure is available for engineering (pdb codes: 1U7D (apo), 1J1U (holo)).
- the NAAB for pThr may also bind to N-terminal pSer. If so, this NAAB can be used for both pThr and pSer, and the specific amino acid can then be inferred by evaluating the surrounding sequence to map the peptide onto a reference proteome library. Alternatively, if de novo, phosphorylation-sensitive sequencing is required, the efficacy of applying a pSer NAAB, detecting binding, and then applying a pThr NAAB without an intervening wash step will be assessed. Bound pSer termini will be blocked by the pSer NAAB, so only the additional fluorescent spots will be identified as pThr residues.
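The first strategy above (accept an ambiguous pSer/pThr call and resolve it against a reference proteome) can be sketched as set-based pattern matching. In this minimal sketch, which is an assumption and not the document's software, a read is a list of sets of residue letters, one set per sequenced position; an ambiguous position holds `{"S", "T"}`, and the phosphorylation state is assumed to be tracked separately from the residue identity. The proteome and protein names are made up for illustration, and a real mapper would also account for how the analyte proteins were fragmented.

```python
from typing import Dict, List, Set, Tuple

Read = List[Set[str]]  # one set of possible residues per sequenced position


def map_read(read: Read, proteome: Dict[str, str]) -> List[Tuple[str, int]]:
    """Return (protein, 1-based start) for every window consistent with the read."""
    hits = []
    k = len(read)
    for name, seq in proteome.items():
        for start in range(len(seq) - k + 1):
            if all(seq[start + i] in allowed for i, allowed in enumerate(read)):
                hits.append((name, start + 1))
    return hits


# Position 2 bound the pThr NAAB, which (by assumption) also binds pSer.
read = [{"K"}, {"S", "T"}, {"P"}, {"E"}]
toy_proteome = {"protA": "MKSPEDLR", "protB": "MKTAEDLR"}
print(map_read(read, toy_proteome))  # [('protA', 2)]
```

Here the neighbouring residues (P, E) rule out protB, so the ambiguous position resolves to serine. If several windows matched, the call would remain ambiguous, which is where the sequential pSer-then-pThr strategy would apply.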
- the NAABs are fluorescently labeled such that when a NAAB binds to an amino acid, fluorescence can be detected.
- Fluorophores useful for fluorescently labeling the NAABs include, but are not limited to, Cy3 and Cy5.
- the fluorophores are usually coupled non-specifically to free amine groups (e.g., lysine side chains) of the NAABs.
- the present invention also relates to a method for making a NAAB by introducing mutations into the amino acid sequence of a tRNA synthetase (RS) to produce a NAAB that selectively binds to a particular N-terminal amino acid.
- such methods can involve introducing one or more mutations into a naturally occurring RS (e.g., into a wild-type E. coli RS).
- Such methods can also involve introducing one or more additional mutations into an RS that already includes one or more amino acid mutations in its sequence as compared to the sequence of a corresponding wild-type RS.
- the methods for making NAABs comprise identifying the amino acid binding domain of a tRNA synthetase, introducing one or more mutations into the amino acid binding domain to create a NAAB, and assaying the NAAB for specific binding to an N-terminal amino acid of a polypeptide.
- identification of the amino acid binding domain can be accomplished, for example, by constructing a sequence alignment that aligns pairwise the amino acid sequences of two or more class I tRNA synthetases with one another, wherein one of the class I tRNA synthetases has a previously defined amino acid binding domain. This allows for identification of corresponding sequence positions between proteins in order to share useful mutations between NAABs.
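Identifying "corresponding sequence positions" from an alignment amounts to walking the two aligned rows in parallel and counting residues in each. The minimal sketch below assumes the alignment has already been computed and uses `-` for gaps. `position_map` is a hypothetical helper, and the toy alignment is not taken from any tRNA synthetase.

```python
from typing import Dict


def position_map(aligned_a: str, aligned_b: str) -> Dict[int, int]:
    """Map 1-based residue numbers in sequence A to corresponding residues in B.

    Both inputs are rows of the same alignment ('-' = gap). Positions of A that
    are aligned to a gap in B have no corresponding position and are omitted.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    mapping: Dict[int, int] = {}
    pos_a = pos_b = 0
    for a, b in zip(aligned_a, aligned_b):
        if a != "-":
            pos_a += 1
        if b != "-":
            pos_b += 1
        if a != "-" and b != "-":
            mapping[pos_a] = pos_b
    return mapping


# Toy alignment: residue 3 of A corresponds to residue 2 of B.
m = position_map("MSL-KE", "M-LAKE")
print(m)  # {1: 1, 3: 2, 4: 4, 5: 5}
```

With A as the reference synthetase (for example, MetRS), looking up a mutated position such as 13 in the resulting map gives the homologous position in B. Positions missing from the map fall in gaps and have no direct counterpart.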
- the tRNA synthetase is a first class I tRNA synthetase and the identifying step comprises aligning an amino acid sequence of the first class I tRNA synthetase with an amino acid sequence of a second class I tRNA synthetase having a previously defined amino acid binding domain.
- the amino acid binding domain of E. coli MetRS is known to be encompassed within amino acids 4 to 547 of the protein.
- the amino acid sequence of the second class I tRNA synthetase can comprise the amino acid sequence of full-length E. coli MetRS (SEQ ID NO: 5) or a fragment thereof which includes the amino acid binding domain.
- the amino acid sequence of the second class I tRNA synthetase can comprise a wild-type sequence or can comprise a sequence containing one or more mutations, so long as the presence of the mutations does not significantly impair the ability of the sequence to align with other class I tRNA synthetases.
- the amino acid sequence of the second class I tRNA synthetase can comprise the amino acid sequence of the engineered MetRS fragment described above (of SEQ ID NO: 4), which contains four amino acid substitutions as compared to the corresponding fragment of wild-type E. coli MetRS.
- the identifying step can comprise aligning the amino acid sequence of full-length E. coli MetRS (SEQ ID NO: 5), or a fragment thereof which includes the amino acid binding domain, with the amino acid sequence of a class I tRNA synthetase selected from the group consisting of the tRNA synthetases for arginine, cysteine, glutamate, glutamine, isoleucine, leucine, lysine, methionine, tyrosine, tryptophan, and valine.
- the method can also involve constructing a multiple sequence alignment that aligns the amino acid sequences of the first class I tRNA synthetase, the second class I tRNA synthetase, and at least one additional class I tRNA synthetase.
- the multiple sequence alignment can align the sequences of at least five, at least seven, or at least nine class I tRNA synthetases.
- the multiple sequence alignment can align the amino acid sequence of full-length E. coli MetRS (SEQ ID NO: 5), or a fragment thereof which includes the amino acid binding domain, with the amino acid sequences of at least two other class I tRNA synthetases selected from the group consisting of the tRNA synthetases for arginine, cysteine, glutamate, glutamine, isoleucine, leucine, lysine, methionine, tyrosine, tryptophan, and valine.
- the boundaries of the amino acid binding domain of the first class I tRNA synthetase can be identified using the known boundaries of the amino acid binding domain in the second class I tRNA synthetase as a guide.
- mutations homologous to the four substitution mutations present in the engineered MetRS fragment described above are introduced into the amino acid binding domain of the class I tRNA synthetase.
- the leucine at position 13 of wild-type E. coli MetRS is replaced with a serine (L13S)
- the tyrosine at position 260 is replaced with a leucine (Y260L)
- the aspartic acid at position 296 is replaced with a glycine (D296G)
- the histidine at position 301 is replaced with a leucine (H301L).
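Substitutions written in the notation above (wild-type residue, position, new residue) can be applied programmatically. A useful safeguard is to check that the named wild-type residue is actually present before substituting. The sketch below is a hypothetical helper, not the document's method. It takes a `first_residue_number` offset because truncated fragments keep the full-length numbering. The E. coli MetRS sequence (SEQ ID NO: 5) is not reproduced here, so the example uses a toy sequence.

```python
import re
from typing import List

# Point-substitution notation: wild-type residue, position, new residue (e.g. "L13S").
MUTATION = re.compile(r"^([A-Z])(\d+)([A-Z])$")


def apply_mutations(seq: str, mutations: List[str], first_residue_number: int = 1) -> str:
    """Apply substitutions, checking that each named wild-type residue is present.

    first_residue_number: the residue number of seq[0] in the numbering used by
    the mutation names (useful for truncated fragments).
    """
    residues = list(seq)
    for name in mutations:
        match = MUTATION.match(name)
        if match is None:
            raise ValueError(f"unrecognized mutation: {name}")
        wt, pos, new = match.group(1), int(match.group(2)), match.group(3)
        idx = pos - first_residue_number
        if not 0 <= idx < len(residues):
            raise IndexError(f"{name}: position outside the sequence")
        if residues[idx] != wt:
            raise ValueError(f"{name}: expected {wt} at {pos}, found {residues[idx]}")
        residues[idx] = new
    return "".join(residues)


# Toy six-residue sequence, not MetRS.
print(apply_mutations("MLKYDH", ["L2S", "H6L"]))  # MSKYDL
```

The wild-type check raises an error if a mutation's named residue does not match the sequence. This catches numbering offsets and notation errors before they propagate into a construct.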
- the binding affinity of each NAAB containing these mutations against a panel of N-terminal amino acids can be predicted in silico using a computer modeling program (e.g., the Rosetta modeling program).
- Any NAAB with significant predicted cross-binding with undesired target peptides can be subjected to computational redesign for specificity using a multi-state strategy (11).
- the computational redesign may identify one or more additional mutations likely to improve the binding specificity of the NAAB for a particular N-terminal amino acid.
- structural models of the NAAB in complex with both the desired and undesired amino acids are constructed in silico.
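Multi-state design compares the desired complex against the undesired ones rather than optimizing the desired complex alone. The toy sketch below illustrates only the selection criterion, and it is an assumption for illustration. It does not implement the cited multi-state strategy (11) or call Rosetta. Given hypothetical predicted binding energies for each candidate design (lower is tighter), it ranks designs by the gap between the best off-target energy and the target energy.

```python
from typing import Dict, List

# Hypothetical predicted binding energies (lower = tighter), keyed by N-terminal residue.
Energies = Dict[str, float]


def specificity_gap(energies: Energies, target: str) -> float:
    """Best off-target energy minus target energy; larger means more specific."""
    off_target_best = min(e for aa, e in energies.items() if aa != target)
    return off_target_best - energies[target]


def rank_designs(designs: Dict[str, Energies], target: str) -> List[str]:
    """Order candidate designs from most to least specific for the target."""
    return sorted(designs, key=lambda name: specificity_gap(designs[name], target), reverse=True)


designs = {
    "design_A": {"L": -12.0, "I": -11.5, "M": -8.0},  # tight, but Ile nearly as tight
    "design_B": {"L": -10.0, "I": -6.0, "M": -7.0},   # weaker, but well separated
}
print(rank_designs(designs, "L"))  # ['design_B', 'design_A']
```

Ranking by the gap favors design_B even though design_A binds the target more tightly. This is the trade-off the redesign step is meant to resolve for NAABs that show predicted cross-binding.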
- Similar methods can be used to identify the amino acid binding domains of the class II RSs and introduce mutations into those domains to produce NAABs that selectively bind to N-terminal amino acids that are activated by class II RSs (Ala, Pro, Ser, Thr, His, Asp, Asn, Lys, Gly, and Phe).
- the catalytic domain of class II RS proteins contains the amino acid specificity for the enzyme, and these domains can be used as a starting point for developing additional NAABs.
- class II RSs function as multimers
- the catalytic domain of the HisRS from E. coli can be made monomeric by liberating it from its activation domain (20).
- the crystal structure of the enzyme in complex with histidyl-adenylate is available (pdb code 1KMM (21)), and can serve as a basis for computational structure-based design. At least one RS crystal structure is available for each of the amino acids activated by class II RSs (Ala, Pro, Ser, Thr, His, Asp, Asn, Lys, Gly, and Phe).
- the amino acid binding domains for each of the class II RSs can be identified using the monomeric fragment of E. coli HisRS (SEQ ID NO: 9) as a guide to identify corresponding domains in other class II RSs.
- Structural alignments between the monomeric fragment of E. coli HisRS and the catalytic domains of other class II RSs can be obtained from the Dali web server (22).
- Multiple sequence alignments for the conserved class II catalytic domain can be obtained from the Pfam database (19). Using these alignments, boundaries for the amino acid binding domains for class II RSs can be identified.
- the tRNA synthetase is a first class II tRNA synthetase and the step of identifying the amino acid binding domain comprises aligning an amino acid sequence of the first class II tRNA synthetase with an amino acid sequence of a second class II tRNA synthetase having a previously defined amino acid binding domain.
- the amino acid sequence of the second class II tRNA synthetase can comprise the amino acid sequence of a monomeric fragment of E. coli HisRS that contains the amino acid binding domain (e.g., SEQ ID NO: 9).
- the amino acid sequence of the second class II tRNA synthetase can comprise a wild-type sequence or can comprise a sequence containing one or more mutations, so long as the presence of the mutations does not significantly impair the ability of the sequence to align with other class II tRNA synthetases.
- the identifying step can comprise aligning the amino acid sequence of the monomeric fragment of E. coli HisRS with a corresponding domain of a class II tRNA synthetase selected from the group consisting of AlaRS, ProRS, SerRS, ThrRS, AspRS, AsnRS, LysRS, GlyRS, and PheRS.
- the identifying step can also comprise constructing a multiple sequence alignment that aligns the amino acid sequences of the first class II tRNA synthetase, the second class II tRNA synthetase, and at least one additional class II tRNA synthetase.
- the multiple sequence alignment can align the sequences of at least five, at least seven, or at least nine class II tRNA synthetases.
- the multiple sequence alignment can align the amino acid sequence of a monomeric fragment of E. coli HisRS that contains the amino acid binding domain with corresponding domains of AlaRS, ProRS, SerRS, ThrRS, AspRS, AsnRS, LysRS, GlyRS, and PheRS.
- mutations are introduced into the amino acid binding domain in order to increase the binding affinity of the domain for a particular N-terminal amino acid.
- the methods involving class II tRNA synthetases can also further comprise using a computer modeling program to predict the binding affinity of the NAAB against a panel of N-terminal amino acids.
- the NAAB can be subjected to computational redesign to identify one or more additional mutations to improve the binding specificity of the NAAB for a particular N-terminal amino acid. Any additional mutations identified using computational redesign can then be introduced into the NAAB.
- the NAABs designed and made using any of the above methods can be cloned into an expression vector, expressed in a host cell (e.g., in an E. coli host cell), purified, and assayed for specific binding to an N-terminal amino acid of a polypeptide.
- the binding activity for each NAAB can be assayed against a standard set of polypeptides having different N-terminal residues (e.g., custom synthesized peptides of the form XGMMGSSC, where X is a variable position occupied by each of the twenty amino acids).
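The standard assay panel described above is one peptide per standard amino acid, substituted at the variable position of the XGMMGSSC template. The C-terminal cysteine is the residue used elsewhere in this document for thiol immobilization. A minimal sketch of enumerating that panel (the helper name is hypothetical):

```python
from typing import Dict

STANDARD_AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def n_terminal_panel(template: str = "XGMMGSSC") -> Dict[str, str]:
    """Return one peptide per standard amino acid, substituted at the 'X' position."""
    if template.count("X") != 1:
        raise ValueError("template must contain exactly one 'X'")
    return {aa: template.replace("X", aa) for aa in STANDARD_AMINO_ACIDS}


panel = n_terminal_panel()
print(len(panel), panel["L"])  # 20 LGMMGSSC
```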
- for NAABs derived from class II tRNA synthetases, if any of the E. coli protein fragments prove to be insoluble or perform poorly as NAABs, protein design can be used to redesign hydrophobic residues that become exposed upon monomerization. If a crystal structure is unavailable for the E. coli protein, a synthetic gene for an RS with an experimentally determined structure can be obtained. The availability of structures for these proteins allows application of protein surface redesign if the domain truncation results in loss of solubility, binding pocket redesign for enhanced affinity if binding is weak, or multi-state design for enhanced specificity if promiscuous binding is observed (11).
- the tRNA synthetase amino acid sequences can be E. coli tRNA synthetase amino acid sequences.
- sequences can be aligned pairwise by various methods known in the art, for example, using the hidden Markov models available in the Pfam database (19), dynamic programming, and heuristic methods like BLAST.
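Of the alignment approaches listed above, dynamic programming is the simplest to show directly. The sketch below is a textbook Needleman-Wunsch global alignment with deliberately simple scores (match +1, mismatch -1, linear gap -1). These scores are an assumption for illustration. Protein alignments in practice normally use a substitution matrix such as BLOSUM62 with affine gap penalties, or a library implementation rather than hand-written code.

```python
from typing import Tuple


def needleman_wunsch(a: str, b: str, match: int = 1, mismatch: int = -1,
                     gap: int = -1) -> Tuple[str, str, int]:
    """Global alignment of a and b; returns (aligned_a, aligned_b, score)."""
    n, m = len(a), len(b)

    def sub(i: int, j: int) -> int:
        return match if a[i - 1] == b[j - 1] else mismatch

    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(score[i - 1][j - 1] + sub(i, j),
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)

    # Trace back from the bottom-right cell, preferring diagonal moves.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(i, j):
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b)), score[n][m]


print(needleman_wunsch("ACGT", "AGT"))  # ('ACGT', 'A-GT', 2)
```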
- mutations that favor desired binding and disfavor undesired binding can be introduced into any of the wild-type proteins described above by various methods, for example, using mutagenic primers to introduce mutations via site-directed mutagenesis, PCR-based mutagenesis and Kunkel mutagenesis.
- QUIKCHANGE site-directed mutagenesis (Agilent Technologies)
- the NAABs discussed above are used as reagents in a method of polypeptide sequencing.
- the method of sequencing a polypeptide comprises the steps of:
- the polypeptide is contacted with one or more NAABs.
- the polypeptide is contacted with a single type of NAAB that selectively binds to a single type of N-terminal amino acid residue (e.g., a NAAB that selectively binds to N-terminal alanine residues or a NAAB that selectively binds to N-terminal methionine residues).
- the polypeptide is contacted with a mixture of two or more types of NAABs that each selectively bind to a different N-terminal amino acid residue.
- the mixture may comprise two NAABs such as a NAAB that selectively binds to N-terminal alanine residues and a NAAB that selectively binds to N-terminal cysteine residues.
- a mixture comprising two or more NAABs that selectively bind to different amino acid residues is especially useful when sequencing several polypeptides simultaneously. Introducing multiple different NAABs also reduces sequencing time because multiple N-terminal amino acid residues can be identified during a single iteration of steps (a) through (d).
- the method comprises sequencing a plurality of polypeptides. These embodiments are especially suited for high throughput sequencing methods.
- the polypeptide may be immobilized on a substrate prior to contact with the one or more NAABs.
- the peptide may be immobilized on any suitable substrate.
- nanogel substrates have been developed with low non-specific adsorption of proteins and the ability to visualize single attached molecules on this surface (8, 9).
- a plurality of polypeptides may be immobilized on the substrate for sequencing. Immobilizing a plurality of polypeptides is especially suited for high throughput sequencing methods.
- the NAABs of the present invention are fluorescently labeled with a fluorophore such that when a NAAB binds to an N-terminal amino acid, fluorescence emitted by the fluorophore can be detected by an appropriate detector.
- Suitable fluorophores include, but are not limited to, Cy3 and Cy5. Fluorescence can suitably be detected by detectors known in the art. Based on the fluorescence detected, the N-terminal amino acid of the polypeptide can be identified.
- each type of NAAB is suitably labeled with different fluorophores having different fluorescence emission spectra.
- the contacting step can comprise contacting the polypeptide with a first type of NAAB and a second type of NAAB, wherein the first type of NAAB selectively binds to a first type of N-terminal amino acid residue and the second type of NAAB selectively binds to a second type of N-terminal amino acid residue different from the first type of N-terminal amino acid residue.
- the first type of NAAB is suitably coupled to a first fluorophore and the second type of NAAB is suitably coupled to a second fluorophore, wherein the first and second fluorophores have different fluorescence emission spectra.
- in step (d), the one or more NAABs are removed from the polypeptide(s).
- Removing the one or more NAABs includes removing any excess NAABs present in solution and/or removing any NAABs that are bound to N-terminal amino acids of the polypeptides. Removal of the NAABs is suitably accomplished by washing the polypeptide with a suitable wash buffer in order to cause dissociation of any bound NAABs.
- the reagent may be removed by contacting the substrate with a suitable wash buffer.
- Steps (a)-(d) may be repeated any number of times until the N-terminal amino acid of the polypeptide has been identified.
- steps (a)-(d) may be repeated any number of times until all of the N-terminal amino acids of the polypeptide(s) have been identified.
- a different NAAB or a set of NAABs may be used in step (a) to probe the N-terminal amino acid of the polypeptide(s).
- where step (a) comprises contacting the polypeptide with a single type of NAAB that selectively binds to a single type of N-terminal amino acid residue, it may be necessary to repeat steps (a) through (d) up to 24 or more times in order to probe the polypeptide with a NAAB specific for each of the twenty standard amino acids, for PITC-derivatized lysine, and for each of the three common phosphorylated amino acids.
- where step (a) comprises contacting the polypeptide with two or more different types of NAABs simultaneously, fewer repetitions of steps (a) through (d) will be necessary to identify the N-terminal amino acid of the polypeptide.
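The repetition count is simple arithmetic. There are 20 standard residues, 1 PITC-derivatized lysine, and 3 phosphorylated residues, giving 24 targets. If k NAABs with spectrally distinct fluorophores can be applied together in one round (an assumption about the imaging setup), then in the worst case, where the residue is identified only in the last round, probing one position takes ceil(24 / k) rounds. The helper name below is illustrative.

```python
import math

# Targets named in the text: 20 standard residues, PITC-derivatized lysine,
# and phosphoserine, phosphothreonine, phosphotyrosine.
N_TARGETS = 20 + 1 + 3


def max_rounds_per_position(n_targets: int, naabs_per_round: int) -> int:
    """Worst-case repetitions of steps (a)-(d) needed to probe every target once."""
    if naabs_per_round < 1:
        raise ValueError("need at least one NAAB per round")
    return math.ceil(n_targets / naabs_per_round)


for k in (1, 2, 3, 4):
    print(k, max_rounds_per_position(N_TARGETS, k))
# 1 24
# 2 12
# 3 8
# 4 6
```

In practice, the number of distinguishable fluorophores limits k, and probing can stop early once the residue at a position has bound a NAAB.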
- the N-terminal amino acid(s) may be cleaved from the polypeptide(s) via Edman degradation.
- the Edman degradation comprises reacting the N-terminal amino acid of the polypeptide with phenyl isothiocyanate (PITC) to form a PITC-derivatized N-terminal amino acid, and cleaving the PITC-derivatized N-terminal amino acid.
- the modified N-terminal amino acid may be cleaved using an Edman degradation enzyme as described in further detail below.
- the modified N-terminal amino acid may also be cleaved by methods known in the art, including contact with acid or exposure to high temperature.
- any substrate comprising the immobilized polypeptide(s) should be compatible with the acidic conditions or high temperatures.
- FIG. 1 provides a diagrammatic representation of the steps of a method of polypeptide sequencing according to the present invention.
- in step 1 of FIG. 1, multiple polypeptide molecules are immobilized on a substrate.
- the individual peptide molecules are suitably spatially segregated on the substrate.
- Analyte proteins may be fragmented into two or more polypeptides prior to immobilization on the substrate.
- in step 2 of FIG. 1, the immobilized polypeptides are contacted with a fluorescently labeled NAAB, and fluorescence of the NAAB bound to the N-terminal amino acid of any of the peptides is detected.
- An image of the substrate is suitably captured at this stage.
- the NAAB is washed off the substrate. This cycle of binding, detection, and removal of the NAAB is repeated until the N-terminal amino acids of all of the immobilized polypeptides have been identified (step 3).
- in step 4, the N-termini of the polypeptides are reacted with phenyl isothiocyanate (PITC) (black ovals in FIG. 1).
- in step 5, an Edman degradation enzyme (“Edmanase”) catalyzes the removal of the PITC-derivatized N-terminal amino acid under mild conditions. In each complete cycle, one amino acid is sequenced from each peptide and a new N-terminus is generated for identification in subsequent cycles (step 6).
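The FIG. 1 cycle can be sketched as a toy simulation under idealized assumptions. The sketch assumes perfectly specific NAABs, complete PITC coupling and cleavage in every cycle, and no peptide loss. Each cycle records the N-terminal residue of every immobilized peptide if some available NAAB targets it, and records `X` otherwise. It then removes one residue per peptide. The function and peptide names are hypothetical.

```python
from typing import Dict, Iterable


def sequence_by_cycles(peptides: Dict[str, str], naab_targets: Iterable[str],
                       n_cycles: int) -> Dict[str, str]:
    """Toy model of the FIG. 1 cycle for ideal reagents.

    Each cycle: probe the N-terminus with every available NAAB (steps 2-3),
    record the residue or 'X' if no NAAB binds, then couple PITC and cleave
    one residue per peptide (steps 4-5).
    """
    targets = set(naab_targets)
    reads = {pid: "" for pid in peptides}
    remaining = dict(peptides)
    for _ in range(n_cycles):
        for pid, seq in remaining.items():
            if seq:  # a fully degraded peptide contributes nothing further
                reads[pid] += seq[0] if seq[0] in targets else "X"
        remaining = {pid: seq[1:] for pid, seq in remaining.items()}
    return reads


panel_without_cys = "ADEFGHIKLMNPQRSTVWY"
print(sequence_by_cycles({"p1": "MLKA", "p2": "LMC"}, panel_without_cys, 3))
# {'p1': 'MLK', 'p2': 'LMX'}
```

Dropping Cys from the panel shows how a missing NAAB produces a gap (`X`) in the read, while sequencing of the next position continues normally.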
- NAABs may bind smaller, sterically similar off-target amino acids.
- the isoleucine-specific NAAB derived from IleRS and the threonine-specific NAAB derived from ThrRS may bind N-terminal valine and serine residues, respectively, in addition to their desired targets.
- this does not hinder the effectiveness of this protein sequencing technique.
- although various aspects of the present invention relate to a reagent comprising NAABs for all twenty amino acids, the optimal set size for actual sequencing may be less than twenty. Reducing the number of NAABs involves trading off absolute specificity for fewer binding molecules by using a reduced alphabet for protein sequences.
- valine-specific NAAB derived from ValRS can be added before the isoleucine-specific NAAB derived from IleRS, with the intention of identifying and capping N-terminal valine residues before molecules intended to target isoleucine residues that can bind to them.
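The capping strategy above depends only on application order. If bound NAABs are not washed off between applications, the first NAAB that binds a terminus blocks it, and later, more promiscuous NAABs cannot reach it. The sketch below models that assumption. The cross-reactivity sets are hypothetical and follow the example in the text, where the IleRS-derived NAAB is assumed to also bind valine.

```python
from typing import List, Set, Tuple

# (label, residues the NAAB binds)
Naab = Tuple[str, Set[str]]


def call_with_capping(n_terminus: str, application_order: List[Naab]) -> str:
    """Label of the first NAAB, in application order, that binds the terminus."""
    for label, binds in application_order:
        if n_terminus in binds:
            return label  # this NAAB now caps the terminus
    return "unassigned"


val_naab = ("Val", {"V"})
ile_naab = ("Ile", {"I", "V"})  # assumed to also bind the smaller valine

print(call_with_capping("V", [val_naab, ile_naab]))  # Val
print(call_with_capping("I", [val_naab, ile_naab]))  # Ile
print(call_with_capping("V", [ile_naab, val_naab]))  # Ile (miscalled)
```

The same ordering logic applies to the pSer-then-pThr scheme described earlier for phosphorylated residues.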
- Methods of the present invention possess attractive features relative to mass spectrometry. Because detection operates at the single molecule level, this method will have excellent dynamic range and will be appropriate for extremely small amounts of sample. Furthermore, the digital nature of the detection produces inherently quantitative data. Finally, because all steps can be carried out in neutral aqueous buffer, post-translational modifications (e.g., phosphorylations) remain stable and available for analysis.
- the present invention is directed to a method and reagents for enzymatic Edman degradation (i.e., cleaving the N-terminal amino acid of a polypeptide).
- one or more enzymes are provided that catalyze the cleavage step of the Edman degradation in aqueous buffer and at neutral pH, thereby providing an alternative to the harsh chemical conditions typically employed in conventional Edman degradation.
- the Edman degradation enzyme is a modified cruzain enzyme.
- Cruzain is a cysteine protease from the protozoan Trypanosoma cruzi and was found to possess many of the characteristics desired for creating an Edman degradation enzyme.
- polypeptides are sequenced by degradation from their N-terminus using the Edman reagent, phenyl isothiocyanate (PITC).
- the process requires two steps: coupling and cleavage.
- in the first step (coupling), the N-terminal amino group of a peptide reacts with phenyl isothiocyanate to form a thiourea.
- in the second step (cleavage), treatment with an anhydrous acid (e.g., trifluoroacetic acid) releases the N-terminal amino acid as a thiazolinone derivative.
- the thiazolinone derivative may be extracted into an organic solvent, dried, and converted to the more stable phenylthiohydantoin (PTH) derivative for analysis.
- the most convenient method for identifying the PTH-amino acids generated during each sequencing cycle is HPLC with UV absorbance detection. Each amino acid is detected by its UV absorbance at 269 nm and is identified by its characteristic retention time.
- in the present method, however, the N-terminal amino acid has already been identified by NAAB binding before it is cleaved. Therefore, there is no need to generate or detect a phenylthiohydantoin derivative of the terminal amino acid.
- the strongly acidic conditions typically used in the cleavage step of conventional Edman degradation protocols are incompatible with the substrate surface upon which the polypeptides are immobilized for single molecule protein detection (SMD) (e.g., a nanogel surface).
- One modification of the conventional Edman degradation dispenses with the acidic conditions and instead promotes cleavage with elevated temperature (e.g., 70-75° C.) (25).
- the present invention provides a method of performing the Edman degradation which dispenses with both acidic conditions and elevated temperature.
- an enzyme has been developed which accomplishes the cleavage step in a neutral, aqueous buffer. This enzyme avoids acidic conditions and high temperatures and decreases the cycle time for polypeptide sequencing by reducing or eliminating the need to change buffer and temperature conditions repeatedly.
- the Edman degradation enzyme (or “Edmanase”) accomplishes the chemical step of the N-terminal degradation by nucleophilic attack of the thiourea sulfur atom on the carbonyl group of the scissile peptide bond.
- the enzyme was made by modifying cruzain, a cysteine protease from the protozoan Trypanosoma cruzi (SEQ ID NO: 30).
- Cruzain prefers hydrophobic amino acids at the S2 position relative to the scissile bond, which corresponds to the phenyl moiety of the Edman reagent.
- the protease is relatively insensitive to the identity of the amino acid at the S1 position (29), allowing for promiscuous cleavage of diverse N-terminal residues.
- this protein has been the subject of extensive structural characterization (27).
- the Edman degradation enzyme differs from wild-type cruzain at four positions.
- One mutation C25G removes the catalytic cysteine residue while three mutations (G65S, A138C, L160Y) were selected to create steric fit with the phenyl moiety of the Edman reagent (PITC).
- FIGS. 4A-4B depict the latter three mutations (indicated by the arrows) introduced into a model of cruzain (pdb code: 1U9Q (27); SEQ ID NO: 30) to accommodate the phenyl moiety of the Edman reagent phenyl isothiocyanate.
- FIG. 4A depicts a model for the cleavage intermediate of an N-terminal alanine residue in the active site cleft.
- in FIG. 4A, two wild-type residues are shown as green sticks.
- FIG. 4B depicts a space-filling representation of the packing of the phenyl ring by protein side chains.
- in FIG. 4B, the methyl group of the ligand is shown in gray at the top of the panel.
- one aspect of the present invention relates to an isolated, synthetic, or recombinant Edman degradation enzyme comprising an amino acid sequence having a glycine residue at a position corresponding to position 25 of wild-type Trypanosoma cruzi cruzain; a serine residue at a position corresponding to position 65; a cysteine residue at a position corresponding to position 138; and a tryptophan residue at a position corresponding to position 160.
- the remaining amino acid sequence of the Edman degradation enzyme comprises a sequence similar to that of wild-type Trypanosoma cruzi cruzain, but may contain additional amino acid mutations (including deletions, insertions, and/or substitutions), so long as such mutations do not significantly impair the ability of the Edman degradation enzyme to cleave PITC-derivatized N-terminal amino acids.
- the remaining amino acid sequence can have at least about 80%, or at least 85%, at least 90%, at least 93%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% sequence identity with the sequence of the wild-type Trypanosoma cruzi cruzain.
- the Edman degradation enzyme comprises the sequence of SEQ ID NO: 29.
- the Edman degradation enzyme can consist of the sequence of SEQ ID NO: 29.
- the reagents for enzymatic Edman degradation comprise two or more enzymes.
- one point of concern is the ability to cleave proline residues. If a single mutant of cruzain cannot accomplish this reaction, then an additional enzyme would be required.
- Naturally occurring enzymes cleave dipeptides of the form Xaa-Pro from the N-terminus of peptides, for example, quiescent cell proline dipeptidase (QPP) (35) and Xaa-Pro aminopeptidase (pdb code: 30VK).
- Example 1 eLAP: a NAAB that Specifically Binds to N-Terminal Leucine Residues
- an E. coli methionine aminopeptidase (eMAP) was modified to create a NAAB that binds specifically to N-terminal leucine residues.
- Two mutually compatible leucine-contacting interactions were identified from the protein data bank (PDB) (15) that could be incorporated into the eMAP structure.
- the surrounding protein residues of eMAP were redesigned around these two interactions.
- the resulting NAAB for leucine (eLAP) has 19 amino acid mutations relative to eMAP.
- the eMAP and eLAP proteins were expressed and assayed for binding against a panel of peptides with different N-termini.
- the NAAB for N-terminal leucine amino acids was non-specifically labeled with Cy5 fluorophore on lysine side chains.
- Synthetic peptides with either N-terminal methionine, leucine, or asparagine amino acids were coupled to a nanogel surface by thiol linkage. An additional experiment was performed with no peptide added. The labeled NAAB was briefly incubated with the immobilized peptide, and unbound protein was removed by washing.
- FIGS. 2A-2B show the binding specificity of wild-type E. coli methionine aminopeptidase (eMAP) and eLAP in a single-molecule detection experiment.
- In FIG. 2A, fluorescently labeled eMAP and eLAP NAABs were visualized after binding to immobilized peptides with different N-terminal amino acids.
- FIG. 2B depicts histograms of quantitative binding. Digital analysis of NAAB binding for eMAP and eLAP showed that each NAAB was specific for the expected N-terminal amino acid. Both proteins exhibited roughly a 2:1 ratio of specific to non-specific binding.
- a truncated version of wild-type E. coli methionyl-tRNA synthetase (MetRS) was modified to make a NAAB that binds specifically to N-terminal methionine residues.
- a truncated version of MetRS having three amino acid mutations (L13S, Y260L, and H301L) that had been shown to pre-organize the binding site towards the methionine-bound conformation was obtained (16).
- a crystal structure is available of this mutant bound to free methionine (pdb code: 3h99).
- An additional mutation (D296G) was introduced to provide a more open binding pocket capable of accommodating a peptide and to avoid steric clashes.
- This mutation was introduced into MetRS and the altered protein was expressed in E. coli.
- the gene encoding MetRS was amplified from genomic DNA and cloned into the pET42a expression vector between the MfeI and XhoI sites. This yielded a genetic fusion of a thrombin-cleavable GST tag and MetRS.
- the mutations were introduced using the QuikChange protocol.
- the proteins were expressed at 16° C. overnight using the autoinduction protocol of Studier (17).
- the GST-MetRS fusion was purified from lysates by affinity chromatography using GSTrap columns on a Bio-Rad liquid chromatography system. Following purification, proteins were labeled with Cy5 fluorophore on lysine side chains for single-molecule binding assays.
- this NAAB comprises an internally truncated version of wild-type E. coli HisRS (residues 3-180; SEQ ID NO: 10) having seven fewer residues as compared to the wild-type sequence.
- The sequence of this N-terminal histidine-specific NAAB is provided by SEQ ID NO: 9.
- FIG. 7 shows that engineered HisNAAB (SEQ ID NO: 10) exhibits enhanced binding affinity for peptides with N-terminal histidine residues as compared to the wild-type fragment.
- Biolayer interferometry kinetics data show that the engineered HisNAAB (data in open circles) binds N-terminal histidine with the same off-rate as the wild-type fragment (SEQ ID NO: 9) (data in solid circles), but with an enhanced on-rate.
- the engineered His NAAB binds with an approximately 10-fold improvement in binding affinity.
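- The link between "same off-rate, enhanced on-rate" and "about 10-fold tighter binding" is the 1:1 binding relation K_D = k_off / k_on: with k_off unchanged, a tenfold larger k_on gives a tenfold smaller K_D. The sketch below only illustrates that arithmetic; the rate constants are hypothetical placeholders, not the measured biolayer interferometry values, and the simple 1:1 model is an assumption.

```python
def dissociation_constant(k_on: float, k_off: float) -> float:
    """K_D = k_off / k_on for a simple 1:1 binding model (units: M)."""
    if k_on <= 0 or k_off <= 0:
        raise ValueError("rate constants must be positive")
    return k_off / k_on


# Hypothetical rate constants for illustration only.
K_OFF = 1e-2            # s^-1, assumed identical for both proteins
K_ON_WILD_TYPE = 1e4    # M^-1 s^-1
K_ON_ENGINEERED = 1e5   # M^-1 s^-1 (tenfold faster association)

kd_wild_type = dissociation_constant(K_ON_WILD_TYPE, K_OFF)
kd_engineered = dissociation_constant(K_ON_ENGINEERED, K_OFF)
fold_improvement = kd_wild_type / kd_engineered

print(f"K_D wild type:  {kd_wild_type:.1e} M")
print(f"K_D engineered: {kd_engineered:.1e} M")
print(f"fold improvement in affinity: {fold_improvement:.0f}")
```

Because K_D is a ratio, an on-rate gain alone is enough to explain the improved affinity; no change in how long the bound complex persists is required.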
- a synthetic gene containing the Edman degradation enzyme was purchased from GenScript.
- The gene encoded a modified version of the cruzain enzyme of T. cruzi having the following substitution mutations: C25G, G65S, A138C, and L160Y.
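- Substitutions such as C25G are written as wild-type residue, position, new residue (here, Cys at position 25 replaced by Gly). A minimal sketch of applying such a list to a sequence is shown below, assuming 1-based positions that follow the reference numbering with no offset. It checks the expected wild-type residue first, which catches numbering mistakes. The function name and the toy sequence are hypothetical; the toy sequence is a placeholder, not the real cruzain sequence.

```python
import re

# <wild-type one-letter code><1-based position><new one-letter code>, e.g. "C25G"
MUTATION = re.compile(r"^([A-Z])(\d+)([A-Z])$")


def apply_substitutions(sequence: str, mutations: list[str]) -> str:
    """Apply point substitutions, verifying the wild-type residue at each position."""
    residues = list(sequence)
    for mutation in mutations:
        match = MUTATION.match(mutation)
        if match is None:
            raise ValueError(f"unrecognized mutation: {mutation}")
        wild_type, position, new = match.group(1), int(match.group(2)), match.group(3)
        index = position - 1
        if not 0 <= index < len(residues):
            raise ValueError(f"{mutation}: position outside sequence")
        if residues[index] != wild_type:
            raise ValueError(f"{mutation}: expected {wild_type}, found {residues[index]}")
        residues[index] = new
    return "".join(residues)


# Hypothetical placeholder sequence with the expected wild-type residues planted.
toy = ["S"] * 170
toy[24], toy[64], toy[137], toy[159] = "C", "G", "A", "L"
toy_sequence = "".join(toy)

mutant = apply_substitutions(toy_sequence, ["C25G", "G65S", "A138C", "L160Y"])
print(mutant[24], mutant[64], mutant[137], mutant[159])  # G S C Y
```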
- the gene was inserted between the NdeI and XhoI sites of a pET42a (Novagen) expression vector and transformed into chemically competent E. coli BL21(DE3) cells. Protein was then over-expressed following Studier's auto-induction protocol. Bacterial cells were harvested by centrifugation of the cell culture at 5000 rpm and 4° C. for 10 minutes. Cells were then suspended in 1× PBS with 10% glycerol and 6 M guanidine chloride, pH 7.4. Cells were then lysed by sonication (15 seconds at 20% power, 8 times on ice). The cell lysate was centrifuged at 18000 rpm and 4° C. for 20 minutes.
- the supernatant was then filtered through a 0.2 μm cellulose acetate filter.
- the filtered lysate was loaded onto a 5 mL HisTrap (Ni-NTA) column and washed with 5 column volumes of binding buffer (50 mM Tris-HCl, 150 mM NaCl, 6M guanidine chloride, 25 mM imidazole).
- Bound protein was then eluted in 50 mM Tris-HCl, 150 mM NaCl, 6M guanidine chloride, 500 mM imidazole. Purified fractions were prepared for SDS-PAGE analysis by mixing 2 parts sample with 1 part 4× loading dye.
- The ability of the Edman degradation enzyme to perform N-terminal cleavage was characterized on six substrates of the form Ed-X-AMC, where Ed denotes the Edman reagent, X is an amino acid from the set (Ala, Asp, Phe, Met, Pro, Arg), and AMC is the fluorogenic amidomethylcoumarin group. Cleavage of the X-AMC bond was monitored by the appearance of fluorescence (FIG. 6). The engineered protein displayed activity against all six substrates to varying degrees (see Table below).
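- Because free AMC fluoresces once the X-AMC bond is cut, the early slope of fluorescence versus time is a convenient proxy for cleavage rate when comparing substrates. The sketch below fits that slope by ordinary least squares. It assumes the fit is restricted to the initial linear region; the progress curves and the substrate labels are hypothetical, not the data behind FIG. 6 or the Table.

```python
def initial_rate(times, signal):
    """Least-squares slope of fluorescence versus time.

    Fit only the early, linear part of the progress curve, before substrate
    depletion bends it.
    """
    n = len(times)
    if n != len(signal) or n < 2:
        raise ValueError("need at least two paired (time, signal) points")
    mean_t = sum(times) / n
    mean_s = sum(signal) / n
    sxx = sum((t - mean_t) ** 2 for t in times)
    if sxx == 0:
        raise ValueError("time points must not all be identical")
    sxy = sum((t - mean_t) * (s - mean_s) for t, s in zip(times, signal))
    return sxy / sxx


# Hypothetical progress curves (minutes, arbitrary fluorescence units).
minutes = [0, 1, 2, 3, 4]
curves = {
    "substrate A": [100, 150, 200, 250, 300],
    "substrate B": [100, 105, 110, 115, 120],
}
rates = {name: initial_rate(minutes, values) for name, values in curves.items()}
for name, rate in sorted(rates.items(), key=lambda item: item[1], reverse=True):
    print(f"{name}: {rate:.1f} AU/min")
```

Converting AU/min into product formed per minute would additionally require a free-AMC calibration curve, which this sketch omits.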
- Protein was over-expressed following Studier's auto-induction protocol. Bacterial cells were harvested by centrifugation of the cell culture at 5000 rpm and 4° C. for 10 minutes. Cells were then suspended in 1× PBS with 10% glycerol, pH 7.4. Cells were then lysed by sonication (15 seconds at 20% power, 8 times on ice). The cell lysate was centrifuged at 18000 rpm and 4° C. for 20 minutes. The supernatant was then filtered through a 0.2 μm cellulose acetate filter. The filtered lysate was loaded onto a 1 mL GSTrap column and washed with 5 column volumes of binding buffer (1× PBS).
- Bound protein was then eluted in 50 mM Tris-HCl, 10 mM reduced glutathione. Purified fractions were prepared for SDS-PAGE analysis by mixing 2 parts sample with 1 part 4× loading dye. Samples were analyzed on 16% SDS-PAGE precast gels and visualized by Coomassie staining. Protein concentration was determined using the calculated molar extinction coefficient and measuring the A280 on an ND-8000 spectrophotometer (Thermo Fisher Scientific).
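- The text does not say how the molar extinction coefficient was calculated. One widely used approach (the Pace et al. values, also used by ExPASy ProtParam) sums 5500 M⁻¹cm⁻¹ per Trp, 1490 per Tyr, and 125 per cystine (disulfide-bonded Cys pair); the concentration then follows from the Beer-Lambert law, c = A280 / (ε·l). The sketch below assumes that method and an absorbance normalized to a 1 cm path length; the sequence and the A280 reading are hypothetical.

```python
def extinction_coefficient_280(sequence: str, cystines: int = 0) -> int:
    """Molar extinction coefficient at 280 nm (M^-1 cm^-1) from Trp, Tyr and cystine counts."""
    return 5500 * sequence.count("W") + 1490 * sequence.count("Y") + 125 * cystines


def concentration_molar(a280: float, epsilon: float, path_cm: float = 1.0) -> float:
    """Beer-Lambert law: c = A / (epsilon * l)."""
    if epsilon <= 0 or path_cm <= 0:
        raise ValueError("epsilon and path length must be positive")
    return a280 / (epsilon * path_cm)


# Hypothetical example: a short sequence with 2 Trp and 2 Tyr, no disulfides.
epsilon = extinction_coefficient_280("MKWYYWGS")
conc = concentration_molar(a280=0.699, epsilon=epsilon)
print(epsilon)                          # 13980
print(f"{conc * 1e6:.1f} uM")           # 50.0 uM
```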
- FIG. 8 is a full binding matrix that shows how well every engineered NAAB protein binds to every N-terminal amino acid.
- Each square in the binding matrix represents the binding affinity for a single NAAB with an N-terminal amino acid as measured by biolayer interferometry.
- Each row in the matrix contains all the binding data for a single NAAB, and each column contains the binding data for a single N-terminal amino acid (shown by single-letter code). Darker squares represent tighter binding.
- the NAABs exhibit cross-binding for chemically similar N-terminal amino acids. However, the predicted binding pattern for each amino acid is distinct. Thus, when taken as a set, the engineered NAAB proteins are capable of identifying amino acids at the N-terminus of peptides.
- For reference, the abbreviations of the amino acids are as follows:
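- The argument that cross-binding NAABs can still identify residues "when taken as a set" amounts to pattern matching: each N-terminal amino acid produces a characteristic profile across the whole panel, and an observation is assigned to the closest reference profile. The sketch below uses nearest-neighbor matching by Euclidean distance, which is one possible decoding rule and an assumption here. The three-NAAB panel and every number in the reference table are hypothetical and are not taken from FIG. 8.

```python
import math

# Hypothetical reference profiles: normalized binding signal of three NAABs
# (a Leu-preferring, a Met-preferring and a Phe-preferring binder) for each residue.
REFERENCE = {
    "L": (0.9, 0.3, 0.2),
    "I": (0.7, 0.2, 0.1),   # cross-binds the Leu-preferring NAAB, but more weakly
    "M": (0.4, 0.9, 0.1),
    "F": (0.2, 0.1, 0.9),
}


def identify(observed):
    """Return the residue whose reference profile is closest to the observed profile."""
    return min(REFERENCE, key=lambda residue: math.dist(observed, REFERENCE[residue]))


print(identify((0.85, 0.25, 0.15)))  # L
print(identify((0.65, 0.20, 0.12)))  # I
```

Leu and Ile both light up the first NAAB, yet the full profile still separates them, which is the point the paragraph makes about distinct binding patterns.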
Abstract
Description
- This application is a continuation of U.S. patent application Ser. No. 16/907,813, filed Jun. 22, 2020, which is a continuation of U.S. patent application Ser. No. 15/255,433, filed Sep. 2, 2016, now abandoned, which is a division of U.S. patent application Ser. No. 14/211,448, filed Mar. 14, 2014, now U.S. Pat. No. 9,435,810, issued Sep. 5, 2016, which claims the benefit of U.S. Provisional Application No. 61/798,705, filed Mar. 15, 2013, the entire disclosures of which are incorporated herein by reference.
- This invention was made with Government support under grant R01 GM101602 awarded by the National Institutes of Health. The Government has certain rights in the invention.
- The present invention generally relates to reagents and methods for the digital analysis of proteins or peptides. Specifically provided herein are proteins for identifying the N-terminal amino acid or N-terminal phosphorylated amino acid of a polypeptide. Another aspect of the invention is an enzyme for use in the cleavage step of the Edman degradation reaction and a method for using this enzyme.
- Proteins carry out the majority of signaling, metabolic, and regulatory tasks necessary for life. As a result, a quantitative description of the proteomic state of cells, tissues, and fluids is crucial for assessing the functionally relevant differences between diseased and unaffected tissues, between cells of different lineages or developmental states, and between cells executing different regulatory programs. Although powerful high-throughput techniques are available for determining the RNA content of a biological sample, the correlation between mRNA and protein levels is low (1).
- The preferred method for proteomic characterization is currently mass spectrometry. Despite its many successes, mass spectrometry possesses limitations. One limitation is quantification. Because different proteins ionize with different efficiencies, it is difficult to compare relative amounts between two samples without isotopic labeling (2). In ‘shotgun’ strategies for analyzing complex samples, the uncertainties of peptide assignment further complicate quantification, especially for low-abundance proteins (3). A second limitation of mass spectrometry is its dynamic range. For unbiased samples that have not undergone prefractionation or affinity purification, the dynamic range in analyte concentration is roughly 10²-10³, depending upon the instrument (4). This is problematic for complex samples such as blood, where two proteins whose levels are measured in clinical laboratories (albumin and interleukin-6) can differ in abundance by 10¹⁰ (5). Another limitation is the analysis of phosphopeptides, due to the loss of phosphate in some ionization modes. The power of proteomic approaches would increase dramatically with the introduction of a more quantitative high-throughput assay possessing greater dynamic range.
- One promising technology for the analysis of proteins in a sensitive and quantitative manner was developed by Mitra et al. (7). This technology, referred to as Digital Analysis of Proteins by End Sequencing or DAPES, features a method for single molecule protein analysis. To perform DAPES, a large number (ca. 10⁹) of protein molecules are denatured and cleaved into peptides. These peptides are immobilized on a nanogel surface applied to the surface of a microscope slide and their amino acid sequences are determined in parallel using a method related to Edman degradation. Phenyl isothiocyanate (PITC) is added to the slide and reacts with the N-terminal amino acid of each peptide to form a stable phenylthiourea derivative. Next, the identity of the N-terminal amino acid derivative is determined by performing, for example, 20 rounds of antibody binding with antibodies specific for each PITC-derivatized N-terminal amino acid, detection, and stripping. The N-terminal amino acid is removed by raising the temperature or lowering pH, and the cycle is repeated to sequence 12-20 amino acids from each peptide on the slide. The absolute concentration of every protein in the original sample can then be calculated based on the number of different peptide sequences observed.
- The phenyl isothiocyanate chemistry used in DAPES is the same used in Edman degradation and is efficient and robust (>99% efficiency). However, the cleavage of single amino acids requires strong anhydrous acid or alternatively, an aqueous buffer at elevated temperatures. Cycling between either of these harsh conditions is undesirable for multiple rounds of analysis on sensitive substrates used for single molecule protein detection (SMD). Thus, there is a need in the art for improved reagents and methods for the parallel analysis of peptides in single molecule protein detection (SMD) format.
- One aspect of the invention is an improved method for single molecule sequencing of proteins or peptides. Generally, the method for sequencing a polypeptide comprises (a) contacting the polypeptide with one or more fluorescently labeled N-terminal amino acid binding proteins (NAABs); (b) detecting fluorescence of a NAAB bound to an N-terminal amino acid of the polypeptide; (c) identifying the N-terminal amino acid of the polypeptide based on the fluorescence detected; (d) removing the NAAB from the polypeptide; (e) optionally repeating steps (a) through (d); (f) cleaving the N-terminal amino acid of the polypeptide via Edman degradation; and (g) repeating steps (a) through (f) one or more times.
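- The cycle of steps (a) through (g) can be sketched as an idealized simulation on a single peptide. This is a toy model under strong assumptions: read-out is error-free, a NAAB either recognizes the N-terminal residue or not (no cross-binding), and cleavage acts only on a PITC-derivatized residue, mirroring the two-part enzymatic Edman step described in this document. All class and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Set


@dataclass
class Peptide:
    residues: str              # one-letter codes, N-terminus first
    derivatized: bool = False  # True after the N-terminal residue reacts with PITC


def read_n_terminus(peptide: Peptide, naab_panel: Set[str]) -> Optional[str]:
    """Steps (a)-(d), idealized: a labeled NAAB binds, its fluorescence identifies
    the residue, and the NAAB is washed off. None means no NAAB recognized it."""
    if not peptide.residues:
        return None
    n_terminal = peptide.residues[0]
    return n_terminal if n_terminal in naab_panel else None


def derivatize(peptide: Peptide) -> None:
    """PITC couples to the free N-terminal amine."""
    peptide.derivatized = True


def cleave(peptide: Peptide) -> None:
    """Edman cleavage: removes only a PITC-derivatized N-terminal residue."""
    if not peptide.derivatized:
        raise RuntimeError("N-terminal residue must be derivatized before cleavage")
    peptide.residues = peptide.residues[1:]
    peptide.derivatized = False


def sequence_peptide(peptide: Peptide, naab_panel: Set[str], cycles: int) -> str:
    """Step (g): repeat read-out and cleavage; 'X' marks an unidentified residue."""
    calls = []
    for _ in range(cycles):
        if not peptide.residues:
            break
        call = read_n_terminus(peptide, naab_panel)
        calls.append(call if call is not None else "X")
        derivatize(peptide)  # step (f), part 1
        cleave(peptide)      # step (f), part 2
    return "".join(calls)


print(sequence_peptide(Peptide("LMFAK"), {"L", "M", "F"}, cycles=4))  # LMFX
```

In a real single-molecule experiment each read-out would be a noisy fluorescence measurement across many NAABs, so the identification step would be probabilistic rather than a set-membership test.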
- The present invention also generally relates to reagents for the digital analysis of proteins or peptides. Specifically provided herein are proteins for identifying the N-terminal amino acid or N-terminal phosphorylated amino acid of a polypeptide.
- Another aspect of the invention relates to an enzyme for use in the cleavage step of the Edman degradation reaction and a method for using this enzyme. Generally, the enzymatic Edman degradation method comprises reacting the N-terminal amino acid of the polypeptide with phenyl isothiocyanate (PITC) to form a PITC-derivatized N-terminal amino acid and cleaving the PITC-derivatized N-terminal amino acid using an Edman degradation enzyme.
- Other objects and features will be in part apparent and in part pointed out hereinafter.
- FIG. 1 depicts the Digital Analysis of Proteins by End Sequencing (DAPES) protocol utilizing N-terminal amino acid binding proteins in the identification step and a synthetic enzyme in the cleavage step.
- FIGS. 2A-2B show the binding specificity of wild-type E. coli methionine aminopeptidase (eMAP) and an engineered leucine-specific aminopeptidase (eLAP) of the present invention in a single-molecule detection experiment.
- FIG. 3 shows the binding specificity of an engineered mutant of methionyl-tRNA synthetase (MetRS) of the present invention that exhibits binding specificity for surface-immobilized peptides with N-terminal methionines.
- FIGS. 4A-4B depict three mutations (indicated by the arrows) introduced into a model of cruzain (pdb code: 1U9Q (27)) to accommodate the phenyl moiety of the Edman reagent phenyl isothiocyanate.
- FIG. 5A depicts a model for a cleavage intermediate for Edman degradation, generated using experimental small-molecule structures for similar compounds and geometrically optimized using quantum chemistry calculations.
- FIG. 5B shows the model for the intermediate fitted into the active site cleft of the enzyme cruzain. The wild-type catalytic cysteine was removed. The activating residues (the other two components of the ‘catalytic triad’) were retained. These are a histidine and an asparagine that are intended to activate the sulfur atom in the Edman reagent for nucleophilic attack on the peptide bond.
- FIG. 6 is a graphical representation of kinetic data from cleavage experiments using an Edman degradation enzyme of the present invention and the substrate Ed-Asp-AMC.
- FIG. 7 is a trace plot of biolayer interferometry kinetics data showing the binding affinity of two proteins for peptides with N-terminal histidine residues: (1) engineered His NAAB (open circles); (2) native wild-type protein (solid circles).
- FIG. 8 is a full binding matrix showing the binding affinity of each NAAB (row) for each N-terminal amino acid (column) as measured by biolayer interferometry.
- In one aspect, the present invention is directed to a method and reagents for sequencing a polypeptide. In particular, the present invention provides methods and reagents for the single-molecule, high-throughput sequencing of polypeptides. Recent advances in single-molecule protein detection (SMD) allow for the parallel analysis of large numbers of individual proteins utilizing digital protocols. In accordance with the present invention, reagents capable of specifically binding to N-terminal amino acids for an identification step are provided.
- The present invention also includes methods and reagents for identifying phosphorylated N-terminal amino acids. Quantitatively interrogating peptide sequences in neutral aqueous environments allows for proteomic analyses complementary to those afforded by mass spectrometry. NAABs specific for phosphorylated forms of N-terminal amino acids allow for quantitative comparison of proteomic inventories and signal transduction cascades in different samples.
- In another aspect, the present invention is directed to a method and reagents for enzymatic Edman degradation (i.e., for enzymatically cleaving the N-terminal amino acid of a polypeptide). In accordance with this aspect, a synthetic enzyme is provided that catalyzes the cleavage step of the Edman degradation reaction in an aqueous buffer and at neutral pH, thereby providing an alternative to the harsh chemical conditions typically employed in Edman degradation.
- Yet another aspect of the present invention is directed to an integrated high-throughput method for sequencing of polypeptides that includes use of reagents capable of specifically binding to N-terminal amino acids for an identification step and use of an enzymatic Edman degradation to remove N-terminal amino acids.
- In accordance with the present invention, reagents capable of specifically binding to N-terminal amino acids are provided. In various aspects of the invention, the N-terminal amino acid binders (NAABs) each selectively bind to a particular amino acid, for example one of the twenty standard naturally occurring amino acids. The standard, naturally-occurring amino acids include Alanine (A or Ala), Cysteine (C or Cys), Aspartic Acid (D or Asp), Glutamic Acid (E or Glu), Phenylalanine (F or Phe), Glycine (G or Gly), Histidine (H or His), Isoleucine (I or Ile), Lysine (K or Lys), Leucine (L or Leu), Methionine (M or Met), Asparagine (N or Asn), Proline (P or Pro), Glutamine (Q or Gln), Arginine (R or Arg), Serine (S or Ser), Threonine (T or Thr), Valine (V or Val), Tryptophan (W or Trp), and Tyrosine (Y or Tyr).
- The NAABs of the present invention can be made by modifying various naturally occurring proteins to introduce one or more mutations in the amino acid sequence to produce engineered proteins that bind to particular N-terminal amino acids. For example, aminopeptidases or tRNA synthetases can be modified to create NAABs that selectively bind to particular N-terminal amino acids.
- A. eLAP
- For example, a NAAB that binds specifically to N-terminal leucine residues has been developed by introducing mutations into E. coli methionine aminopeptidase (eMAP). This NAAB (eLAP) has 19 amino acid substitutions as compared to wild-type eMAP. In particular, eLAP has substitutions at the amino acid positions corresponding to positions 42, 46, 56-60, 62, 63, 65-70, 81, 101, 177, and 221 of wild-type eMAP. In eLAP, the aspartate at position 42 of eMAP is replaced with a glutamate, the asparagine at position 46 of eMAP is replaced with a tryptophan, the valine at position 56 of eMAP is replaced with a threonine, the serine at position 57 of eMAP is replaced with an aspartate, the alanine at position 58 of eMAP is replaced with a serine, the cysteine at position 59 of eMAP is replaced with a leucine, the leucine at position 60 of eMAP is replaced with a threonine, the tyrosine at position 62 of eMAP is replaced with a histidine, the histidine at position 63 of eMAP is replaced with an asparagine, the tyrosine at position 65 of eMAP is replaced with an isoleucine, the proline at position 66 of eMAP is replaced with an aspartate, the lysine at position 67 of eMAP is replaced with a glycine, the serine at position 68 of eMAP is replaced with a histidine, the valine at position 69 of eMAP is replaced with a glycine, the cysteine at position 70 of eMAP is replaced with a serine, the isoleucine at position 81 of eMAP is replaced with a valine, the isoleucine at position 101 of eMAP is replaced with an arginine, the phenylalanine at position 177 of eMAP is replaced with a histidine, and the tryptophan at position 221 of eMAP is replaced with a serine. Alternative substitutions could be made at selected positions.
For example, valine at 56 could be replaced instead by serine, leucine at 60 could be replaced instead by serine, tyrosine at 65 could be replaced instead by valine, cysteine at 70 could be replaced instead by threonine, and tryptophan at 221 could be replaced instead by threonine.
- Accordingly, one reagent in accordance with the present invention comprises an isolated, synthetic, or recombinant NAAB comprising an amino acid sequence having a glutamate residue at a position corresponding to position 42 of wild-type E. coli methionine aminopeptidase (eMAP) (SEQ ID NO: 1), a tryptophan residue at a position corresponding to position 46 of wild-type eMAP, a threonine or serine residue at a position corresponding to position 56 of wild-type eMAP, an aspartate residue at a position corresponding to position 57 of wild-type eMAP, a serine residue at a position corresponding to position 58 of wild-type eMAP, a leucine residue at a position corresponding to position 59 of wild-type eMAP, a threonine or serine residue at a position corresponding to position 60 of wild-type eMAP, a histidine residue at a position corresponding to position 62 of wild-type eMAP, an asparagine residue at a position corresponding to position 63 of wild-type eMAP, an isoleucine or valine residue at a position corresponding to position 65 of wild-type eMAP, an aspartate residue at a position corresponding to position 66 of wild-type eMAP, a glycine residue at a position corresponding to position 67 of wild-type eMAP, a histidine residue at a position corresponding to position 68 of wild-type eMAP, a glycine residue at a position corresponding to position 69 of wild-type eMAP, a serine or threonine residue at a position corresponding to position 70 of wild-type eMAP, a valine residue at a position corresponding to position 81 of wild-type eMAP, an arginine residue at a position corresponding to position 101 of wild-type eMAP, a histidine residue at a position corresponding to position 177 of wild-type eMAP, and a serine or threonine residue at a position corresponding to position 221 of wild-type eMAP.
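- The residue requirements above can be encoded as a table from wild-type eMAP position to the allowed residue(s), which makes it easy to check a candidate sequence. The table below is transcribed from the preceding paragraph. The check assumes the candidate is numbered exactly like wild-type eMAP (no insertions or deletions before the checked positions); a real check would first align the candidate to SEQ ID NO: 1. The function name and the placeholder sequences are hypothetical.

```python
# Allowed residues at wild-type eMAP positions (1-based), from the paragraph above.
ELAP_REQUIRED = {
    42: "E", 46: "W", 56: "TS", 57: "D", 58: "S", 59: "L", 60: "TS",
    62: "H", 63: "N", 65: "IV", 66: "D", 67: "G", 68: "H", 69: "G",
    70: "ST", 81: "V", 101: "R", 177: "H", 221: "ST",
}


def missing_requirements(sequence: str, required: dict[int, str]) -> list[int]:
    """Return positions whose residue is absent or not among the allowed residues."""
    failing = []
    for position, allowed in sorted(required.items()):
        index = position - 1
        if index >= len(sequence) or sequence[index] not in allowed:
            failing.append(position)
    return failing


# Hypothetical placeholder: poly-Ala fails every position...
print(len(missing_requirements("A" * 250, ELAP_REQUIRED)))  # 19

# ...and planting the first allowed residue at each position satisfies all of them.
candidate = ["A"] * 250
for position, allowed in ELAP_REQUIRED.items():
    candidate[position - 1] = allowed[0]
print(missing_requirements("".join(candidate), ELAP_REQUIRED))  # []
```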
- The remaining amino acid sequence of the NAAB comprises a sequence similar to that of wild-type eMAP, but which may contain additional amino acid mutations (including deletions, insertions, and/or substitutions), so long as such mutations do not significantly impair the ability of the NAAB to selectively bind to N-terminal leucine residues. For example, the remaining amino acid sequence can comprise an amino acid sequence having at least about 80% sequence identity to the amino acid sequence of wild-type eMAP (SEQ ID NO: 1), or at least 85%, at least 90%, at least 93%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or 100% sequence identity to the amino acid sequence of SEQ ID NO: 1.
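- Percent identity thresholds such as "at least 80%" depend on how identity is computed, and this section does not specify the alignment method or denominator. The sketch below assumes the two sequences have already been aligned by a separate tool and counts identical residues over columns where neither sequence has a gap; dividing by the full reference length instead would give a different number. The function name and example strings are hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity of two pre-aligned, equal-length sequences.

    Identity = identical residues / aligned columns in which neither sequence has a gap.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [(x, y) for x, y in zip(aligned_a, aligned_b) if x != gap and y != gap]
    if not columns:
        raise ValueError("no aligned columns")
    identical = sum(1 for x, y in columns if x == y)
    return 100.0 * identical / len(columns)


identity = percent_identity("ACDEFGHIKL", "ACDEFGHIKV")
print(identity)                                                     # 90.0
print([t for t in (80, 85, 90, 93, 95) if identity >= t])          # [80, 85, 90]
```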
- In some aspects of the present invention, the NAAB comprises the amino acid sequence of SEQ ID NO: 2. For example, the NAAB can consist of the amino acid sequence of SEQ ID NO: 2.
- The NAAB preferably selectively binds to N-terminal leucine residues with at least about a 1.5:1 ratio of specific to non-specific binding, more preferably about a 2:1 ratio of specific to non-specific binding. Non-specific binding refers to background binding, and is the amount of signal that is produced when the amino acid target of the NAAB is not present at the N-terminus of an immobilized peptide.
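- The specific-to-non-specific ratio compares the signal measured when the NAAB's target residue is at the peptide N-terminus with the background signal measured when it is absent. A minimal sketch follows; the spot counts are hypothetical, and using raw single-molecule spot counts as the "signal" is an assumption.

```python
def specificity_ratio(signal_with_target: float, signal_without_target: float) -> float:
    """Specific-to-non-specific binding ratio.

    signal_without_target is the background: the signal measured when the NAAB's
    target residue is absent from the N-terminus of the immobilized peptide.
    """
    if signal_without_target <= 0:
        raise ValueError("background signal must be positive")
    return signal_with_target / signal_without_target


# Hypothetical single-molecule spot counts per field of view.
ratio = specificity_ratio(signal_with_target=1200, signal_without_target=600)
print(f"{ratio:.1f}:1")                      # 2.0:1
print("meets 1.5:1 minimum:", ratio >= 1.5)  # True
```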
- B. tRNA Synthetase-Based NAABs
- NAABs can also be made by introducing mutations into class I and class II tRNA synthetases (RSs). NAABs for use in the polypeptide sequencing processes described herein should possess high affinity and specificity for amino acids at the N-terminus of peptides. Because tRNA synthetases have intrinsic specificity for free amino acids, they are useful scaffolds for developing NAABs for use in protein sequencing. The inherent specificity of these scaffold proteins is retained, while broadening the binding capabilities of these proteins from free monomers to peptides, and removing unnecessary domains or functions. The Protein Data Bank contains multiple crystal structures for RSs specific for all twenty canonical amino acids. Moreover, unlike other classes of amino acid binding molecules, such as riboswitches, RSs do not envelop the entire amino acid, as the C-terminus must be available for adenylation. The binding pocket in these molecules can be modified to permit the entry of peptides presenting the specifically bound amino acid. This results in a complete set of engineered RS fragments that can bind to their cognate amino acids at the N-termini of peptides.
- The class I RS proteins form a distinct structural family that is identified by sequence homology and has been extensively characterized both biochemically and biophysically. RS proteins possess a modular architecture, and the domains conferring specificity for a particular amino acid are readily identified (18). Several types of mutations to improve the performance of the amino acid binding domain of an RS as a NAAB can be introduced. First, one or more mutations can be introduced into the binding domain to lock the domain into the bound conformation, eliminating the energetic cost of any induced conformational change (16). Second, one or more mutations can be introduced to widen the binding pocket for the amino acid, making room for entry of a peptide. This approach can be used for each of the RS proteins.
- For example, mutations can be introduced into methionyl-tRNA synthetase (MetRS) from E. coli to create a NAAB that binds specifically to N-terminal methionine residues. This NAAB comprises a truncated version of wild-type E. coli MetRS (residues 4-547; SEQ ID NO: 3) having four substitution mutations as compared to the wild-type sequence (SEQ ID NO: 5). The sequence of this N-terminal methionine-specific NAAB is provided by SEQ ID NO: 4. In particular, in the methionine-specific NAAB, the leucine at position 13 of wild-type E. coli MetRS is replaced with a serine (L13S), the tyrosine at position 260 is replaced with a leucine (Y260L), the aspartic acid at position 296 is replaced with a glycine (D296G), and the histidine at position 301 is replaced with a leucine (H301L).
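- Because the NAAB is a fragment beginning at wild-type residue 4, mutation positions quoted in wild-type numbering shift when counted within the fragment itself. The sketch below converts them, assuming the fragment carries no extra N-terminal residues (such as a tag remnant or an initiator Met); if it does, the offset changes. The function name is hypothetical.

```python
def fragment_position(wild_type_position: int, fragment_first_residue: int) -> int:
    """Map a 1-based wild-type position onto a fragment starting at fragment_first_residue."""
    if wild_type_position < fragment_first_residue:
        raise ValueError("position precedes the start of the fragment")
    return wild_type_position - fragment_first_residue + 1


METRS_MUTATIONS = ["L13S", "Y260L", "D296G", "H301L"]
FRAGMENT_START = 4  # truncated MetRS spans wild-type residues 4-547

mapped = {}
for mutation in METRS_MUTATIONS:
    wild_type_position = int(mutation[1:-1])
    mapped[mutation] = fragment_position(wild_type_position, FRAGMENT_START)
    print(mutation, "-> fragment position", mapped[mutation])
```

This prints fragment positions 10, 257, 293, and 298 for the four substitutions.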
- Accordingly, one reagent in accordance with the present invention comprises an isolated, synthetic, or recombinant NAAB comprising an amino acid sequence having a serine residue at a position corresponding to position 13 of wild-type E. coli methionyl-tRNA synthetase (MetRS); a leucine residue at a position corresponding to position 260 of wild-type E. coli MetRS; a glycine residue at a position corresponding to position 296 of wild-type E. coli MetRS; and a leucine residue at a position corresponding to position 301 of wild-type E. coli MetRS.
- The remaining amino acid sequence of the NAAB comprises a sequence similar to that of amino acids 4-547 of wild-type MetRS, but may contain additional amino acid mutations (including deletions, insertions, and/or substitutions), so long as such mutations do not significantly impair the ability of the NAAB to selectively bind to N-terminal methionine residues. For example, the remaining amino acid sequence can comprise an amino acid sequence having at least about 80% sequence identity to the amino acid sequence of SEQ ID NO: 3, or at least 85%, at least 90%, at least 93%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or 100% sequence identity to the amino acid sequence of SEQ ID NO: 3.
- In certain aspects of the invention, the NAAB comprises the amino acid sequence of SEQ ID NO: 4. For example, the NAAB can consist of the amino acid sequence of SEQ ID NO: 4.
- The NAAB preferably selectively binds to N-terminal methionine residues with at least about a 2:1 ratio of specific to non-specific binding, more preferably at least about a 7:1 ratio, at least about a 10:1 ratio, or about a 13:1 ratio of specific to non-specific binding.
- The starting point for the phenylalanine NAAB (Phe NAAB) was the phenylalanine-tRNA synthetase (PheRS) from Thermus thermophilus, for which a crystal structure is available. Normally the operational unit is a tetramer with two copies each of two separate proteins. Only one of the proteins has the amino acid binding specificity, so a model was made of one copy of that protein in isolation. Truncating the N-terminus of the protein exposed a significant amount of surface area that was previously buried in contacts with other proteins. This surface was hydrophobic, so mutations were made to the surface to make the protein stable and soluble as a monomer. Tighter binding of the mutant to peptides was observed when compared to the wild-type protein.
- For example, mutations can be introduced into PheRS from Thermus thermophilus to create a NAAB that binds specifically to N-terminal phenylalanine residues. This NAAB comprises a truncated version of wild-type Thermus thermophilus PheRS (residues 86-350; SEQ ID NO: 6) having 22 substitution mutations as compared to the wild-type sequence. The sequence of this N-terminal phenylalanine-specific NAAB is provided by SEQ ID NO: 7. In particular, PheNAAB has substitutions at the amino acid positions corresponding to positions 100, 142, 143, 152-154, 165, 205, 212, 228-232, 234, 257, 287, 289, 303, 336, 338, and 340 of wild-type PheRS. In the NAAB, the leucine at position 100 of PheRS is replaced with an aspartate, the histidine at position 142 of PheRS is replaced with an asparagine, the histidine at position 143 of PheRS is replaced with a glycine, the phenylalanine at position 152 of PheRS is replaced with a valine, the tryptophan at position 153 of PheRS is replaced with a glycine, the leucine at position 154 of PheRS is replaced with a lysine, the leucine at position 165 of PheRS is replaced with an aspartate, the phenylalanine at position 205 of PheRS is replaced with an alanine, the histidine at position 212 of PheRS is replaced with an alanine, the isoleucine at position 228 of PheRS is replaced with a valine, the alanine at position 229 of PheRS is replaced with an asparagine, the methionine at position 230 of PheRS is replaced with a glutamate, the alanine at position 231 of PheRS is replaced with a glycine, the histidine at position 232 of PheRS is replaced with an aspartate, the lysine at position 234 of PheRS is replaced with a tyrosine, the tyrosine at position 257 of PheRS is replaced with a threonine, the histidine at position 287 of PheRS is replaced with a glycine, the lysine at position 289 of PheRS is replaced with an asparagine, the leucine at position 303 of PheRS is replaced with an aspartate, the phenylalanine at position 336 of PheRS is replaced with an alanine, the glycine at position 338 of PheRS is replaced with a threonine, and the leucine at position 340 of PheRS is replaced with a glycine.
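- The compact position lists (with ranges such as 228-232) can be expanded and counted to confirm they match the stated number of substitutions: 22 for the Phe NAAB and 19 for eLAP. A minimal sketch follows; the parser is hypothetical and handles only the comma, "and", and hyphen-range forms used in this document.

```python
def expand_positions(spec: str) -> list[int]:
    """Expand a list such as '100, 142, 152-154, and 165' into individual positions."""
    positions = []
    for part in spec.replace(" and ", ",").split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            start, end = (int(x) for x in part.split("-"))
            positions.extend(range(start, end + 1))
        else:
            positions.append(int(part))
    return positions


PHE_NAAB = "100, 142, 143, 152-154, 165, 205, 212, 228-232, 234, 257, 287, 289, 303, 336, 338, and 340"
ELAP = "42, 46, 56-60, 62, 63, 65-70, 81, 101, 177, and 221"

phe_positions = expand_positions(PHE_NAAB)
elap_positions = expand_positions(ELAP)
print(len(phe_positions), len(elap_positions))  # 22 19
```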
- Accordingly, one reagent in accordance with the present invention comprises an isolated, synthetic, or recombinant NAAB comprising an amino acid sequence having an aspartate residue at a position corresponding to position 100 of wild-type PheRS from Thermus thermophilus (SEQ ID NO: 8), an asparagine residue at a position corresponding to position 142 of wild-type PheRS, a glycine residue at a position corresponding to position 143 of wild-type PheRS, a valine residue at a position corresponding to position 152 of wild-type PheRS, a glycine residue at a position corresponding to position 153 of wild-type PheRS, a lysine residue at a position corresponding to position 154 of wild-type PheRS, an aspartate residue at a position corresponding to position 165 of wild-type PheRS, an alanine residue at a position corresponding to position 205 of wild-type PheRS, an alanine residue at a position corresponding to position 212 of wild-type PheRS, a valine residue at a position corresponding to position 228 of wild-type PheRS, an asparagine residue at a position corresponding to position 229 of wild-type PheRS, a glutamate residue at a position corresponding to position 230 of wild-type PheRS, a glycine residue at a position corresponding to position 231 of wild-type PheRS, an aspartate residue at a position corresponding to position 232 of wild-type PheRS, a tyrosine residue at a position corresponding to position 234 of wild-type PheRS, a threonine residue at a position corresponding to position 257 of wild-type PheRS, a glycine residue at a position corresponding to position 287 of wild-type PheRS, an asparagine residue at a position corresponding to position 289 of wild-type PheRS, an aspartate residue at a position corresponding to position 303 of wild-type PheRS, an alanine residue at a position corresponding to position 336 of wild-type PheRS, a threonine residue at a position corresponding to position 338 of wild-type PheRS, and a glycine residue at a position corresponding to position 340 of wild-type PheRS.
- The remaining amino acid sequence of the NAAB comprises a sequence similar to that of wild-type PheRS, but which may contain additional amino acid mutations (including deletions, insertions, and/or substitutions), so long as such mutations do not significantly impair the ability of the NAAB to selectively bind to N-terminal phenylalanine residues. For example, the remaining amino acid sequence can comprise an amino acid sequence having at least about 80% sequence identity to the amino acid sequence of truncated wild-type PheRS (SEQ ID NO: 6), or at least 85%, at least 90%, at least 93%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or 100% sequence identity to the amino acid sequence of SEQ ID NO: 6.
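The percent-identity thresholds above depend on how identity is counted. The following is a minimal Python sketch that works on two rows of an existing alignment, with '-' marking gaps. It divides the number of identical aligned residues by the number of residues in the reference row (for example, SEQ ID NO: 6). This denominator is an assumption. Other conventions divide by the alignment length or by the length of the shorter sequence, and they can give different percentages for the same alignment.

```python
def percent_identity(aligned_query: str, aligned_ref: str) -> float:
    """Percent identity of a query row to a reference row of one alignment.

    Counts columns where both rows carry the same residue (gaps never match),
    divided by the number of residues in the reference row (assumed convention).
    """
    if len(aligned_query) != len(aligned_ref):
        raise ValueError("aligned rows must have equal length")
    ref_length = sum(r != "-" for r in aligned_ref)
    if ref_length == 0:
        raise ValueError("reference row contains no residues")
    identical = sum(q == r and r != "-" for q, r in zip(aligned_query, aligned_ref))
    return 100.0 * identical / ref_length


def meets_threshold(aligned_query: str, aligned_ref: str, threshold: float = 80.0) -> bool:
    """True if identity to the reference is at least `threshold` percent."""
    return percent_identity(aligned_query, aligned_ref) >= threshold
```

For example, `percent_identity("ACDEFGHIKL", "ACDEFGHIKA")` gives 90.0: nine of the ten reference residues are matched.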
- In some aspects of the present invention, the NAAB comprises the amino acid sequence of SEQ ID NO: 7. For example, the NAAB can consist of the amino acid sequence of SEQ ID NO: 7.
- The NAAB preferably selectively binds to N-terminal phenylalanine residues with at least about a 1.5:1 ratio of specific to non-specific binding, more preferably about a 2:1 ratio of specific to non-specific binding.
- The starting point for the histidine NAAB (His NAAB) was the histidine-tRNA synthetase (HisRS) from E. coli, for which a crystal structure is available. A fragment of wild-type HisRS spanning residues 1-320 has previously been shown by others to be monomeric. After inspection of the crystal structure, further residues were truncated from both ends, and the initial fragment tested spans lysine 3 to alanine 180. Protein design was then used to replace a long loop near the binding site with a shorter loop, with the aim of creating a more open pocket and tighter binding to N-terminal histidine residues. This involved the removal of 7 residues (arginine 113 to lysine 119) and two substitutions, in which the arginine at position 121 of HisRS is replaced with an asparagine and the tyrosine at position 122 of HisRS is replaced with an alanine. Thus, this NAAB comprises a truncated version of wild-type E. coli HisRS (residues 3-180; SEQ ID NO: 9) having a seven-residue deletion and two substitution mutations as compared to the wild-type sequence. The sequence of this N-terminal histidine-specific NAAB is provided by SEQ ID NO: 10.
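The His NAAB construct can be expressed as operations in wild-type numbering: truncation to residues 3-180, deletion of residues 113-119, and the substitutions R121N and Y122A. The following is a minimal Python sketch. It assumes that the caller supplies the full-length wild-type HisRS sequence, with residue 1 at index 0. The sequence itself lives in the sequence listing and is not reproduced here. The function name is hypothetical. The guard residues come from the text.

```python
def build_his_naab(full_length_wt: str) -> str:
    """Sketch of the His NAAB construct in wild-type E. coli HisRS numbering.

    full_length_wt: wild-type HisRS sequence, residue 1 at index 0 (assumed input).
    """
    def res(pos: int) -> str:
        return full_length_wt[pos - 1]  # 1-based wild-type position

    # Residues named in the text, used to catch numbering errors.
    expected = {3: "K", 113: "R", 119: "K", 121: "R", 122: "Y", 180: "A"}
    for pos, aa in expected.items():
        if res(pos) != aa:
            raise ValueError(f"expected {aa} at position {pos}, found {res(pos)}")

    substitutions = {121: "N", 122: "A"}  # R121N, Y122A
    deleted = set(range(113, 120))        # Arg113-Lys119, seven residues

    construct = []
    for pos in range(3, 181):             # Lys3-Ala180 fragment
        if pos in deleted:
            continue
        construct.append(substitutions.get(pos, res(pos)))
    return "".join(construct)
```

The resulting construct has 178 - 7 = 171 residues. Because of the deletion, wild-type positions 121 and 122 fall at construct indices 111 and 112. This offset should be kept in mind when mutations are later specified in construct numbering rather than wild-type numbering.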
- Accordingly, one reagent in accordance with the present invention comprises an isolated, synthetic, or recombinant NAAB comprising an amino acid sequence having an asparagine residue at a position corresponding to position 121 of wild-type HisRS from E. coli (SEQ ID NO: 9) and an alanine residue at a position corresponding to position 122 of wild-type HisRS.
- The remaining amino acid sequence of the NAAB comprises a sequence similar to that of wild-type HisRS, but which may contain additional amino acid mutations (including deletions, insertions, and/or substitutions), so long as such mutations do not significantly impair the ability of the NAAB to selectively bind to N-terminal histidine residues. For example, the remaining amino acid sequence can comprise an amino acid sequence having at least about 80% sequence identity to the amino acid sequence of wild-type HisRS (SEQ ID NO: 9), or at least 85%, at least 90%, at least 93%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or 100% sequence identity to the amino acid sequence of SEQ ID NO: 9.
- In some aspects of the present invention, the NAAB comprises the amino acid sequence of SEQ ID NO: 10. For example, the NAAB can consist of the amino acid sequence of SEQ ID NO: 10.
- The NAAB preferably selectively binds to N-terminal histidine residues with at least about a 1.5:1 ratio of specific to non-specific binding, more preferably about a 2:1 ratio of specific to non-specific binding.
- Full-length or truncated fragments of wild-type synthetases from E. coli may be used as NAABs for the remaining amino acids. See Table A for the sequence of each NAAB. Accordingly, in some aspects of the present invention, the NAAB comprises an amino acid sequence selected from the group consisting of SEQ ID NO: 11; SEQ ID NO: 12; SEQ ID NO: 13; SEQ ID NO: 14; SEQ ID NO: 15; SEQ ID NO: 16; SEQ ID NO: 17; SEQ ID NO: 18; SEQ ID NO: 19; SEQ ID NO: 20; SEQ ID NO: 21; SEQ ID NO: 22; SEQ ID NO: 23; SEQ ID NO: 24; SEQ ID NO: 25; SEQ ID NO: 26; SEQ ID NO: 27; and SEQ ID NO: 28. In various embodiments, a set of NAABs comprises at least 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15 or more of the amino acid sequences of SEQ ID NO: 2; SEQ ID NO: 4; SEQ ID NO: 7; SEQ ID NO: 10; SEQ ID NO: 11; SEQ ID NO: 12; SEQ ID NO: 13; SEQ ID NO: 14; SEQ ID NO: 15; SEQ ID NO: 16; SEQ ID NO: 17; SEQ ID NO: 18; SEQ ID NO: 19; SEQ ID NO: 20; SEQ ID NO: 21; SEQ ID NO: 22; SEQ ID NO: 23; SEQ ID NO: 24; SEQ ID NO: 25; SEQ ID NO: 26; SEQ ID NO: 27; and SEQ ID NO: 28. For example, a set of NAABs comprises the amino acid sequences of SEQ ID NO: 2; SEQ ID NO: 4; SEQ ID NO: 7; SEQ ID NO: 10; SEQ ID NO: 11; SEQ ID NO: 12; SEQ ID NO: 13; SEQ ID NO: 14; SEQ ID NO: 15; SEQ ID NO: 16; SEQ ID NO: 17; SEQ ID NO: 18; SEQ ID NO: 19; SEQ ID NO: 20; SEQ ID NO: 21; SEQ ID NO: 22; SEQ ID NO: 23; SEQ ID NO: 24; SEQ ID NO: 25; SEQ ID NO: 26; SEQ ID NO: 27; and SEQ ID NO: 28.
- The phenyl isothiocyanate (PITC) reagent used to activate peptide N-termini for stepwise degradation also reacts with the Nε atom of the lysine side chain. As a result, domains derived from lysine tRNA synthetase (LysRS) proteins cannot be used for specific recognition of modified lysine, and a NAAB that is specific for PITC-derivatized lysine is required. The class II RS for pyrrolysine (PylRS) served as a starting point for development. Pyrrolysine is a lysine derivative that possesses a pyrroline ring attached to the Nε atom by an amide linkage (Structure A). Crystal structures have been determined for PylRS bound to several ligands (23). One of these ligands is one bond longer than pyrrolysine (Structure B) and is sterically similar to a model of PITC-derivatized lysine (Structure C).
- Genomic DNA for the archaeon Methanosarcina mazei, the source organism for the crystal structure, will be obtained from the American Type Culture Collection (ATCC), and the PylRS gene will be cloned and expressed. The relevant substrate for assessing compatibility with the DAPES strategy is a peptide with an N-terminal lysine that has been modified with PITC on its side chain but not on its amino terminus. The side chain is expected to have been derivatized during previous cycles, whereas the N-terminus is regenerated by the cleavage step of the preceding cycle. A peptide with the sequence DKGMMGSSC will be obtained and derivatized with PITC, which modifies both the N-terminus and the side chain of the lysine at the second position. The modified N-terminal aspartate residue will then be removed with the designed enzyme, which has high activity against PITC-modified aspartate. The resulting peptide, with an N-terminal lysine modified only on its side chain, will be purified from the reaction mixture by HPLC and immobilized on the nanogel surface via its C-terminal cysteine. The liberated PylRS domain will be fluorescently labeled with Cy5 and assayed for binding to the immobilized peptide.
- In the event that the engineered domain exhibits poor binding, a structural model of the NAAB in complex with pyrrolysine will be constructed using the crystal structure as a template. Computational design will be performed with the program RosettaDesign (24) to optimize the shape complementarity between the protein and the amino acid. We will introduce the suggested mutations into the gene for the NAAB, express and purify the protein, and reassess the binding properties of the new mutant NAAB.
- In accordance with various aspects of the present invention, the NAABs may also include reagents capable of specifically binding to phosphorylated N-terminal amino acids (e.g., phosphotyrosine, phosphoserine, and phosphothreonine).
- The proteome is elaborated by post-translational modifications. These marks are reversible and provide a snapshot of the current state of a cell with respect to signaling pathways and other regulatory control. Side chain phosphorylation, which occurs primarily on tyrosine, serine, and threonine residues, is a well-known post-translational modification. However, characterization of phosphorylated amino acids by mass spectrometry is difficult. Phosphate groups can be altered or lost during ionization, and sample enrichment is typically required to cope with issues of dynamic range (2). Digital protocols (e.g., DAPES) are better suited to identifying phosphorylated amino acids because of the greater dynamic range and mild buffer conditions afforded by the present invention. Moreover, the ability to distinguish between phosphorylated and unphosphorylated amino acids could have a substantial impact on the characterization of cellular and disease states.
- NAABs that specifically bind to phosphoserine, phosphotyrosine, or phosphothreonine can be made by modifying certain tRNA synthetases to include one or more mutations. For example, methanogenic archaea possess an RS for phosphoserine. Unlike most organisms, methanogenic archaea lack a CysRS. In these organisms, phosphoserine (Sep) is first ligated to the tRNA for cysteine and then converted to Cys-tRNA in a subsequent step. A crystal structure of SepRS, a class II synthetase, in complex with Sep is available from the PDB (pdb code: 2DU3 (36)).
- While there are no known phosphotyrosine tRNA synthetases, RSs for several chemically similar analogs have been obtained via directed evolution (37-39). The class I TyrRS from Methanococcus jannaschii is the parental protein for these mutants, and crystal structures are available for engineering (pdb codes: 1U7D (apo), 1J1U (holo)). Several of these mutant RSs are relevant, most notably those for sulfotyrosine (37), p-acetyl-L-phenylalanine (pAF), and p-carboxymethyl-L-phenylalanine (pCMF).
- Phosphate and sulfate are stereochemically similar, and phosphatases and phosphoryltransferases often accept sulfates and sulfuryl groups as substrates (40). Consistent with this, the sulfotyrosine RS has been found to recognize phosphotyrosine without further modification. The pAF RS, for which a crystal structure is available (pdb code: 1ZH6), differs from the sulfotyrosine RS at only two residues (38). Thus, if necessary, a template is available for structural modeling and further protein engineering.
- There are no reported pThrRSs or previously engineered RSs that recognize pThr analogs. Consequently, generation of a pThrRS may require more extensive protein engineering. We will approach this task from two directions. First, we will use computational design to widen the binding pocket of SepRS to accommodate the additional methyl group present in pThr. Second, we will use the motif-directed design approach to graft previously observed phosphate-binding interactions into the binding pocket of ThrRS. The PDB contains hundreds of examples of binding interactions involving phosphotyrosine (308 examples), phosphoserine (385), and phosphothreonine (325) that are suitable for building a motif library of protein-phosphate interactions. The same design protocol successfully used to switch the specificity of eMAP to eLAP will be applied to transplant these interaction motifs into E. coli ThrRS. Mutagenesis of SepRS and ThrRS proteins will be performed using the QuikChange protocol. We will purchase a peptide with the sequence pTGMMGSSC for attachment to the nanogel surface and characterization of binding by single-molecule detection.
- It is expected that a NAAB for pThr may also bind to N-terminal pSer. If so, this NAAB can be used for pThr and pSer, and then the specific amino acid can be inferred by evaluating the surrounding sequence to map the peptide onto a reference proteome library. Alternatively, if de novo, phosphorylation-sensitive sequencing is required, then the efficacy of applying a pSer NAAB, detecting binding, then applying a pThr NAAB without an intervening wash step will be assessed. Bound pSer termini will be blocked by the pSer NAAB, and only additional fluorescent spots will be identified as pThr residues.
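The no-wash scheme for separating pSer from pThr amounts to a set difference between two images. The following is a minimal Python sketch. It assumes that each image has already been reduced to a set of peptide locations with a fluorescent spot. It also assumes, as the text expects, that termini bound by the pSer NAAB stay blocked. The function name and location labels are hypothetical.

```python
def assign_phospho_calls(spots_after_pser: set, spots_after_pthr: set) -> dict:
    """Call N-terminal pSer/pThr from two sequential images taken without a wash.

    spots_after_pser: locations fluorescent after the pSer NAAB is applied.
    spots_after_pthr: locations fluorescent after the pThr NAAB is then added.
    Assumes pSer-bound termini remain blocked, so only newly appearing spots
    are called pThr.
    """
    calls = {loc: "pSer" for loc in spots_after_pser}
    for loc in spots_after_pthr - spots_after_pser:
        calls[loc] = "pThr"
    return calls
```

If the pSer NAAB dissociates during the second step, some pSer termini could be bound by the pThr NAAB. They would not be miscalled, however, because they already appear in the first image.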
- In accordance with various aspects of the present invention, the NAABs are fluorescently labeled such that when a NAAB binds to an amino acid, fluorescence can be detected. Fluorophores useful as fluorescent labels on the NAABs include, for example, but are not limited to, Cy3 and Cy5. The fluorophores are usually coupled non-specifically to free amine groups (e.g., lysine side chains) of the NAABs.
- II. Method of Making NAABs by Introducing Mutations into tRNA Synthetase Proteins
- The present invention also relates to a method for making a NAAB by introducing mutations into the amino acid sequence of a tRNA synthetase (RS) to produce a NAAB that selectively binds to a particular N-terminal amino acid. For example, such methods can involve introducing one or more mutations into a naturally occurring RS (e.g., into a wild-type E. coli RS). Such methods can also involve introducing one or more additional mutations into an RS that already includes one or more amino acid mutations in its sequence as compared to the sequence of a corresponding wild-type RS.
- The methods for making NAABs comprise identifying the amino acid binding domain of a tRNA synthetase, introducing one or more mutations into the amino acid binding domain to create a NAAB, and assaying the NAAB for specific binding to an N-terminal amino acid of a polypeptide.
- Where the tRNA synthetase is a class I tRNA synthetase, the amino acid binding domain can be identified, for example, by constructing a sequence alignment that pairwise aligns the amino acid sequences of two or more class I tRNA synthetases, wherein one of the class I tRNA synthetases has a previously defined amino acid binding domain. Such an alignment identifies corresponding sequence positions between proteins, so that useful mutations can be shared between NAABs. Thus, in certain aspects of these methods, the tRNA synthetase is a first class I tRNA synthetase and the identifying step comprises aligning an amino acid sequence of the first class I tRNA synthetase with an amino acid sequence of a second class I tRNA synthetase having a previously defined amino acid binding domain. For example, the amino acid binding domain of E. coli MetRS is known to be encompassed within amino acids 4 to 547 of the protein. Thus, the amino acid sequence of the second class I tRNA synthetase can comprise the amino acid sequence of full-length E. coli MetRS (SEQ ID NO: 5) or a fragment thereof which includes the amino acid binding domain. In addition, the amino acid sequence of the second class I tRNA synthetase can comprise a wild-type sequence or a sequence containing one or more mutations, so long as the mutations do not significantly impair the ability of the sequence to align with other class I tRNA synthetases. For example, the amino acid sequence of the second class I tRNA synthetase can comprise the amino acid sequence of the engineered MetRS fragment described above (SEQ ID NO: 4), which contains four amino acid substitutions as compared to the corresponding fragment of wild-type E. coli MetRS. The identifying step can comprise aligning the amino acid sequence of full-length E. coli MetRS (SEQ ID NO: 5), or a fragment thereof which includes the amino acid binding domain, with a class I tRNA synthetase selected from the group consisting of the tRNA synthetases for arginine, cysteine, glutamate, glutamine, isoleucine, leucine, lysine, methionine, tyrosine, tryptophan, and valine.
- The method can also involve constructing a multiple sequence alignment that aligns the amino acid sequences of the first class I tRNA synthetase, the second class I tRNA synthetase, and at least one additional class I tRNA synthetase. For example, the multiple sequence alignment can align the sequences of at least five, at least seven, or at least nine class I tRNA synthetases. Thus, the multiple sequence alignment can align the amino acid sequence of full-length E. coli MetRS (SEQ ID NO: 5), or a fragment thereof which includes the amino acid binding domain, with the amino acid sequences of at least two other class I tRNA synthetases selected from the group consisting of the tRNA synthetases for arginine, cysteine, glutamate, glutamine, isoleucine, leucine, lysine, methionine, tyrosine, tryptophan, and valine.
- Following alignment of an amino acid sequence of a first class I tRNA synthetase with an amino acid sequence of a second class I tRNA synthetase having a previously defined amino acid binding domain, the boundaries of the amino acid binding domain of the first class I tRNA synthetase can be identified using the known boundaries of the amino acid binding domain in the second class I tRNA synthetase as a guide.
- Once the amino acid binding domain of a given class I tRNA synthetase has been identified, mutations homologous to the four substitution mutations present in the engineered MetRS fragment described above are introduced into the amino acid binding domain of the class I tRNA synthetase. Thus, for each class I tRNA synthetase, the residues at the positions corresponding to positions 13, 260, 296, and 301 of wild-type E. coli MetRS are replaced with a serine, a leucine, a glycine, and a leucine, respectively. In MetRS itself, these substitutions replace the leucine at position 13 with a serine (L13S), the tyrosine at position 260 with a leucine (Y260L), the aspartic acid at position 296 with a glycine (D296G), and the histidine at position 301 with a leucine (H301L).
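Both steps in the class I workflow, locating domain boundaries and placing homologous mutations, depend on mapping residue numbers from the reference synthetase (here MetRS) to the target through an alignment. The following is a minimal Python sketch. It assumes that the two aligned rows ('-' for gaps) were produced beforehand by any pairwise or multiple aligner. The function names are hypothetical. Reference residues that align to a gap in the target have no counterpart, so they are skipped.

```python
def position_map(aligned_ref: str, aligned_target: str) -> dict:
    """Map 1-based reference positions to 1-based target positions (None if gapped)."""
    mapping, ref_pos, tgt_pos = {}, 0, 0
    for r, t in zip(aligned_ref, aligned_target):
        if t != "-":
            tgt_pos += 1
        if r != "-":
            ref_pos += 1
            mapping[ref_pos] = tgt_pos if t != "-" else None
    return mapping


def domain_bounds(aligned_ref, aligned_target, start, end):
    """Target residue range covering reference residues start..end (e.g. 4..547)."""
    mapping = position_map(aligned_ref, aligned_target)
    hits = [mapping[p] for p in range(start, end + 1) if mapping.get(p) is not None]
    return (min(hits), max(hits)) if hits else None


# Replacement residues for the MetRS substitutions L13S, Y260L, D296G, H301L.
METRS_MUTATIONS = {13: "S", 260: "L", 296: "G", 301: "L"}


def transfer_mutations(aligned_ref, aligned_target, mutations=METRS_MUTATIONS):
    """Return {target position: new residue} for mutations with an aligned counterpart."""
    mapping = position_map(aligned_ref, aligned_target)
    return {mapping[p]: aa for p, aa in mutations.items() if mapping.get(p) is not None}
```

In practice, each transferred position should be checked against the structure, because a sequence alignment alone does not guarantee that the corresponding residue lines the binding pocket.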
- The binding affinity of each NAAB containing these mutations against a panel of N-terminal amino acids can be predicted in silico using a computer modeling program (e.g., the Rosetta modeling program). Any NAAB with significant predicted cross-binding to undesired (off-target) peptides can be subjected to computational redesign for specificity using a multi-state strategy (11). For example, the computational redesign may identify one or more additional mutations likely to improve the binding specificity of the NAAB for a particular N-terminal amino acid. In this approach, structural models of the NAAB in complex with both the desired and undesired amino acids are constructed in silico.
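The screening step that decides which NAABs need multi-state redesign can be stated as a specificity gap: the best-scoring off-target amino acid minus the desired target. The following is a minimal Python sketch. It assumes energy-like predicted scores in which lower values mean tighter binding. The margin, function names, and example numbers are hypothetical and are not Rosetta outputs.

```python
def specificity_gap(scores: dict, target: str) -> float:
    """Best off-target score minus target score (lower score = tighter binding).

    A positive gap means the target is predicted to bind more tightly than
    every off-target amino acid in the panel.
    """
    off_target = [s for aa, s in scores.items() if aa != target]
    return min(off_target) - scores[target]


def flag_cross_binders(designs: dict, target: str, margin: float = 1.0) -> list:
    """Names of designs whose specificity gap is below `margin` (hypothetical cutoff)."""
    return sorted(name for name, scores in designs.items()
                  if specificity_gap(scores, target) < margin)
```

With the hypothetical designs `{"d1": {"I": -10.0, "V": -9.5, "L": -6.0}, "d2": {"I": -10.0, "V": -7.0, "L": -6.0}}` and target `"I"`, only `"d1"` is flagged. Its valine gap of 0.5 falls below the margin, so it would be a candidate for redesign.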
- If computational redesign identifies any further mutations as being likely to improve the binding specificity of the NAAB for a particular N-terminal amino acid, such mutations can be introduced into the NAAB.
- Similar methods can be used to identify the amino acid binding domains of the class II RSs and introduce mutations into those domains to produce NAABs that selectively bind to N-terminal amino acids that are activated by class II RSs (Ala, Pro, Ser, Thr, His, Asp, Asn, Lys, Gly, and Phe).
- The catalytic domain of class II RS proteins contains the amino acid specificity for the enzyme, and these domains can be used as a starting point for developing additional NAABs. Although class II RSs function as multimers, the catalytic domain of the HisRS from E. coli can be made monomeric by liberating it from its activation domain (20). The crystal structure of the enzyme in complex with histidyl-adenylate is available (pdb code 1KMM (21)), and can serve as a basis for computational structure-based design. At least one RS crystal structure is available for each of the amino acids activated by class II RSs (Ala, Pro, Ser, Thr, His, Asp, Asn, Lys, Gly, and Phe).
- For example, the amino acid binding domains for each of the class II RSs can be identified by using the monomeric fragment of E. coli HisRS (SEQ ID NO: 9) as a guide to the corresponding domains in other class II RSs. Structural alignments between the monomeric fragment of E. coli HisRS (residues 3-180) and corresponding domains in other class II RSs can be obtained from the Dali web server (22). Multiple sequence alignments for the conserved class II catalytic domain can be obtained from the Pfam database (19). Using these alignments, boundaries for the amino acid binding domains of class II RSs can be identified.
- Thus, in some aspects of the method of making a NAAB, the tRNA synthetase is a first class II tRNA synthetase and the step of identifying the amino acid binding domain comprises aligning an amino acid sequence of the first class II tRNA synthetase with an amino acid sequence of a second class II tRNA synthetase having a previously defined amino acid binding domain. The amino acid sequence of the second class II tRNA synthetase can comprise the amino acid sequence of a monomeric fragment of E. coli HisRS that contains the amino acid binding domain (e.g., SEQ ID NO: 9). The amino acid sequence of the second class II tRNA synthetase can comprise a wild-type sequence or a sequence containing one or more mutations, so long as the mutations do not significantly impair the ability of the sequence to align with other class II tRNA synthetases.
- For example, the identifying step can comprise aligning the amino acid sequence of the monomeric fragment of E. coli HisRS with a corresponding domain of a class II tRNA synthetase selected from the group consisting of AlaRS, ProRS, SerRS, ThrRS, AspRS, AsnRS, LysRS, GlyRS, and PheRS.
- The identifying step can also comprise constructing a multiple sequence alignment that aligns the amino acid sequences of the first class II tRNA synthetase, the second class II tRNA synthetase, and at least one additional class II tRNA synthetase. For example, the multiple sequence alignment can align the sequences of at least five, at least seven, or at least nine class II tRNA synthetases. The multiple sequence alignment can align the amino acid sequence of a monomeric fragment of E. coli HisRS that contains the amino acid binding domain with a corresponding domain of at least two other class II tRNA synthetases selected from the group consisting of AlaRS, ProRS, SerRS, ThrRS, AspRS, AsnRS, LysRS, GlyRS, and PheRS. Alternatively, the multiple sequence alignment can align the amino acid sequence of a monomeric fragment of E. coli HisRS that contains the amino acid binding domain with corresponding domains of AlaRS, ProRS, SerRS, ThrRS, AspRS, AsnRS, LysRS, GlyRS, and PheRS.
- Once the amino acid binding domain of a given class II tRNA synthetase has been identified, mutations (e.g., substitution mutations) are introduced into the amino acid binding domain in order to increase the binding affinity of the domain for a particular N-terminal amino acid.
- As with the methods involving class I tRNA synthetases, the methods involving class II tRNA synthetases can also further comprise using a computer modeling program to predict the binding affinity of the NAAB against a panel of N-terminal amino acids. In addition, the NAAB can be subjected to computational redesign to identify one or more additional mutations to improve the binding specificity of the NAAB for a particular N-terminal amino acid. Any additional mutations identified using computational redesign can then be introduced into the NAAB.
- The NAABs designed and made using any of the above methods can be cloned into an expression vector, expressed in a host cell (e.g., an E. coli host cell), purified, and assayed for specific binding to an N-terminal amino acid of a polypeptide. For example, the binding activity of each NAAB can be assayed against a standard set of polypeptides having different N-terminal residues (e.g., custom synthesized peptides of the form XGMMGSSC, where X is a variable position occupied by each of the twenty amino acids).
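The standard XGMMGSSC panel can be enumerated directly. The following is a minimal Python sketch that uses one-letter codes for the twenty standard amino acids. The names are illustrative. The C-terminal cysteine in every member is the residue used for surface attachment elsewhere in this description.

```python
STANDARD_AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def peptide_panel(template: str = "XGMMGSSC", variable: str = "X") -> dict:
    """Return {amino acid: peptide}, filling the single variable position."""
    if template.count(variable) != 1:
        raise ValueError("template must contain exactly one variable position")
    return {aa: template.replace(variable, aa) for aa in STANDARD_AMINO_ACIDS}
```

For example, `peptide_panel()["K"]` is `"KGMMGSSC"`.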
- For NAABs derived from class II tRNA synthetases, if any of the E. coli protein fragments prove to be insoluble or perform poorly as NAABs, protein design can be used to redesign hydrophobic residues that become exposed upon monomerization. If a crystal structure is unavailable for the E. coli protein, a synthetic gene for an RS with an experimentally determined structure can be obtained. The availability of structures for these proteins allows several design strategies: protein surface redesign if domain truncation results in loss of solubility, binding pocket redesign for enhanced affinity if binding is weak, and multi-state design for enhanced specificity if promiscuous binding is observed (11).
- In any of the above methods, the tRNA synthetase amino acid sequences can be E. coli tRNA synthetase amino acid sequences.
- In addition, in any of the above methods, the sequences can be aligned pairwise by various methods known in the art, for example, hidden Markov models available in the Pfam database (19), dynamic programming, and heuristic methods such as BLAST.
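As an illustration of the dynamic-programming option, the following is a minimal Needleman-Wunsch global alignment in Python. The scoring is an assumption made to keep the sketch short: +1 for a match, -1 for a mismatch, and a linear gap penalty of -2. Real synthetase alignments would normally use a substitution matrix such as BLOSUM62 with affine gap penalties. The function returns the two aligned rows and the alignment score.

```python
def needleman_wunsch(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -2):
    """Global alignment of a and b with a linear gap penalty (illustrative scoring)."""
    def sub(x, y):
        return match if x == y else mismatch

    n, m = len(a), len(b)
    # score[i][j] = best score aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(score[i - 1][j - 1] + sub(a[i - 1], b[j - 1]),
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)

    # Traceback from the bottom-right cell to recover one optimal alignment.
    row_a, row_b, i, j = [], [], n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(a[i - 1], b[j - 1]):
            row_a.append(a[i - 1]); row_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            row_a.append(a[i - 1]); row_b.append("-"); i -= 1
        else:
            row_a.append("-"); row_b.append(b[j - 1]); j -= 1
    return "".join(reversed(row_a)), "".join(reversed(row_b)), score[n][m]
```

For example, `needleman_wunsch("ACDEF", "ACEF")` returns `("ACDEF", "AC-EF", 2)`, a score made up of four matches and one gap. The aligned rows can then be used to find corresponding residue positions between synthetases.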
- Also, in any of the above methods, mutations that favor desired binding and disfavor undesired binding can be introduced into any of the wild-type proteins described above by various methods, for example, by using mutagenic primers to introduce mutations via site-directed mutagenesis, PCR-based mutagenesis, or Kunkel mutagenesis. Various computer programs can be used to design suitable primers (e.g., the QUICKCHANGE (Agilent Technologies) primer design program).
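Mutagenic primer design can be sketched as follows: swap one codon in the coding sequence, take equal flanks on either side, and pair the resulting primer with its reverse complement. The melting-temperature estimate, Tm = 81.5 + 0.41(%GC) - 675/N - %mismatch, follows the formula given in Agilent's QuikChange II instruction manual as we recall it. Check it against the manual for the kit in use. The function names and the 15-nt flank default are assumptions for illustration.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def mutagenic_primers(cds: str, codon_number: int, new_codon: str, flank: int = 15):
    """Forward/reverse primers replacing one codon (1-based) with `new_codon`.

    Returns (forward, reverse, number of mismatched bases vs. the template).
    """
    if len(new_codon) != 3:
        raise ValueError("new_codon must be three bases")
    start = (codon_number - 1) * 3
    end = start + 3
    if start - flank < 0 or end + flank > len(cds):
        raise ValueError("not enough flanking sequence around the codon")
    template = cds[start - flank:end + flank]
    forward = cds[start - flank:start] + new_codon + cds[end:end + flank]
    mismatches = sum(x != y for x, y in zip(template, forward))
    return forward, reverse_complement(forward), mismatches


def quikchange_tm(primer: str, mismatches: int) -> float:
    """Tm estimate: 81.5 + 0.41(%GC) - 675/N - %mismatch (N = primer length)."""
    n = len(primer)
    gc_percent = 100.0 * sum(base in "GC" for base in primer) / n
    return 81.5 + 0.41 * gc_percent - 675.0 / n - 100.0 * mismatches / n
```

The forward and reverse primers are fully complementary, which is the design used in QuikChange-style whole-plasmid mutagenesis. The flank length can be increased until the estimated Tm reaches the value recommended in the kit manual.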
- In accordance with various aspects of the present invention, the NAABs discussed above are used as reagents in a method of polypeptide sequencing. Generally, the method of sequencing a polypeptide comprises the steps of:
- (a) contacting the polypeptide with one or more fluorescently labeled N-terminal amino acid binding proteins (NAABs);
- (b) detecting fluorescence of a NAAB bound to an N-terminal amino acid of the polypeptide;
- (c) identifying the N-terminal amino acid of the polypeptide based on the fluorescence detected;
- (d) removing the NAAB from the polypeptide;
- (e) optionally repeating steps (a) through (d);
- (f) cleaving the N-terminal amino acid of the polypeptide via Edman degradation; and
- (g) repeating steps (a) through (f) one or more times.
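Steps (a) through (g) can be simulated on an idealized peptide to show how the binding/wash inner loop and the Edman outer loop interact. The following is a minimal Python sketch. It assumes perfectly specific NAABs and error-free detection. Each probing round is a group of NAAB target amino acids applied together with distinguishable fluorophores. Function and parameter names are hypothetical. A "?" is recorded when no round in the set recognizes the current N-terminal residue.

```python
def sequence_peptide(peptide: str, naab_rounds: list, cycles: int) -> str:
    """Simulate steps (a)-(g) on one idealized peptide.

    peptide: true sequence, used only to simulate which NAAB binds.
    naab_rounds: list of rounds; each round lists the amino acids whose
        NAABs are applied together in step (a).
    cycles: number of Edman cycles, i.e. repetitions of step (g).
    """
    read, remaining = [], peptide
    for _ in range(cycles):
        if not remaining:
            break
        call = "?"
        for targets in naab_rounds:        # steps (a)-(e): probe, detect, wash
            if remaining[0] in targets:    # (b) fluorescence detected
                call = remaining[0]        # (c) identified from the fluorophore
                break                      # (d) wash; no further probing needed
        read.append(call)
        remaining = remaining[1:]          # (f) Edman cleavage of the N-terminus
    return "".join(read)
```

For example, `sequence_peptide("MKAGC", [["M", "A"], ["K", "G"]], 4)` returns `"MKAG"`.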
- In step (a), the polypeptide is contacted with one or more NAABs. In various aspects, the polypeptide is contacted with a single type of NAAB that selectively binds to a single type of N-terminal amino acid residue (e.g., a NAAB that selectively binds to N-terminal alanine residues or a NAAB that selectively binds to N-terminal methionine residues). In other embodiments, the polypeptide is contacted with a mixture of two or more types of NAABs that each selectively binds to different amino acid residues. For example, the mixture may comprise two NAABs such as a NAAB that selectively binds to N-terminal alanine residues and a NAAB that selectively binds to N-terminal cysteine residues. A mixture comprising two or more NAABs that selectively bind to different amino acid residues is especially useful when sequencing several polypeptides simultaneously. Introducing multiple different NAABs also reduces sequencing time because multiple N-terminal amino acid residues can be identified during a single iteration of steps (a) through (d). As such, in various embodiments, the method comprises sequencing a plurality of polypeptides. These embodiments are especially suited for high throughput sequencing methods.
- In various aspects of the invention, the polypeptide may be immobilized on a substrate prior to contact with the one or more NAABs. The polypeptide may be immobilized on any suitable substrate. For example, nanogel substrates have been developed that show low non-specific adsorption of proteins and allow single attached molecules to be visualized on the surface (8, 9). Moreover, a plurality of polypeptides may be immobilized on the substrate for sequencing. Immobilizing a plurality of polypeptides is especially suited for high throughput sequencing methods.
- The NAABs of the present invention are fluorescently labeled with a fluorophore such that when a NAAB binds to an N-terminal amino acid, fluorescence emitted by the fluorophore can be detected by an appropriate detector. Suitable fluorophores include, but are not limited to, Cy3 and Cy5. Fluorescence can suitably be detected by detectors known in the art. Based on the fluorescence detected, the N-terminal amino acid of the polypeptide can be identified.
- In aspects of the method where the contacting step comprises contacting the polypeptide with a mixture of two or more types of NAABs that each selectively binds to different amino acid residues, each type of NAAB is suitably labeled with different fluorophores having different fluorescence emission spectra. For example, the contacting step can comprise contacting the polypeptide with a first type of NAAB and a second type of NAAB, wherein the first type of NAAB selectively binds to a first type of N-terminal amino acid residue and the second type of NAAB selectively binds to a second type of N-terminal amino acid residue different from the first type of N-terminal amino acid residue. In such methods, the first type of NAAB is suitably coupled to a first fluorophore and the second type of NAAB is suitably coupled to a second fluorophore, wherein the first and second fluorophores have different fluorescence emission spectra.
- In step (d), the one or more NAABs are removed from the polypeptide(s). Removing the one or more NAABs includes removing any excess NAABs present in solution and/or removing any NAABs that are bound to N-terminal amino acids of the polypeptides. Removal of the NAABs is suitably accomplished by washing the polypeptide with a suitable wash buffer to cause dissociation of any bound NAABs. In embodiments where the polypeptide is immobilized on a solid substrate, the NAABs may be removed by contacting the substrate with a suitable wash buffer.
- Steps (a)-(d) may be repeated any number of times until the N-terminal amino acid of the polypeptide has been identified. In embodiments where a plurality of polypeptides is being sequenced, steps (a)-(d) may be repeated any number of times until all of the N-terminal amino acids of the polypeptide(s) have been identified. During each repetition, a different NAAB or a set of NAABs may be used in step (a) to probe the N-terminal amino acid of the polypeptide(s). Thus, for example, where step (a) comprises contacting the polypeptide with a single type of NAAB that selectively binds to a single type of N-terminal amino acid residue, it may be necessary to repeat steps (a) through (d) up to 24 or more times in order to probe the polypeptide with a NAAB specific for each of the twenty standard amino acids, for PITC-derivatized lysine, and for each of the three common phosphorylated amino acids. Alternatively, where step (a) comprises contacting the polypeptide with two or more different types of NAABs simultaneously, fewer repetitions of steps (a) through (d) will be necessary to identify the N-terminal amino acid of the polypeptide.
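The trade-off between multiplexing and the number of repetitions of steps (a)-(d) can be made concrete. The following is a minimal Python sketch that groups the 24 targets named above into rounds. Each round holds as many NAABs as there are spectrally distinct fluorophores available. The target labels and function name are illustrative assumptions.

```python
# 20 standard amino acids, PITC-derivatized lysine, and the three common
# phosphorylated amino acids (labels are illustrative).
TARGETS = list("ACDEFGHIKLMNPQRSTVWY") + ["K(PITC)", "pS", "pT", "pY"]


def probe_rounds(targets: list, colors_per_round: int) -> list:
    """Group NAAB targets into rounds, one distinct fluorophore per NAAB in a round."""
    if colors_per_round < 1:
        raise ValueError("need at least one fluorophore per round")
    return [targets[i:i + colors_per_round]
            for i in range(0, len(targets), colors_per_round)]
```

With a single fluorophore, probing takes 24 rounds. With two fluorophores, such as Cy3 and Cy5, it takes 12 rounds. In general, it takes ceil(24 / k) rounds for k colors, so five colors would need five rounds.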
- After the N-terminal amino acid has been identified, or after all of the N-terminal amino acids have been identified (when sequencing multiple polypeptides simultaneously), the N-terminal amino acid(s) may be cleaved from the polypeptide(s) via Edman degradation. Generally, the Edman degradation comprises reacting the N-terminal amino acid of the polypeptide with phenyl isothiocyanate (PITC) to form a PITC-derivatized N-terminal amino acid, and then cleaving the PITC-derivatized N-terminal amino acid. In various aspects of the invention, the modified N-terminal amino acid may be cleaved using an Edman degradation enzyme, as described in further detail below. In other embodiments, the modified N-terminal amino acid may be cleaved by methods known in the art, including contact with acid or exposure to high temperature. In these embodiments, any substrate comprising the immobilized polypeptide(s) should be compatible with the acidic conditions or high temperatures.
- FIG. 1 provides a diagrammatic representation of the steps of a method of polypeptide sequencing according to the present invention. In step 1 of FIG. 1, multiple polypeptide molecules are immobilized on a substrate. The individual peptide molecules are suitably spatially segregated on the substrate. Analyte proteins may be fragmented into two or more polypeptides prior to immobilization on the substrate.
- In step 2 of FIG. 1, the immobilized polypeptides are contacted with a fluorescently labeled NAAB, and fluorescence from NAAB bound to the N-terminal amino acid of any of the peptides is detected. An image of the substrate is suitably captured at this stage. The NAAB is then washed off the substrate. This cycle of binding, detection, and removal of the NAAB is repeated until the N-terminal amino acids of all of the immobilized polypeptides have been identified (step 3). Next, in step 4, the N-termini of the polypeptides are reacted with phenyl isothiocyanate (PITC) (black ovals in FIG. 1). In step 5, an Edman degradation enzyme ("Edmanase") catalyzes the removal of the PITC-derivatized N-terminal amino acid under mild conditions. In each complete cycle, one amino acid is sequenced from each peptide, and a new N-terminus is generated for identification in subsequent cycles (step 6).
- In the polypeptide sequencing methods described herein, some of the NAABs may bind smaller, sterically similar off-target amino acids. For example, the isoleucine-specific NAAB derived from IleRS and the threonine-specific NAAB derived from ThrRS may bind N-terminal valine and serine residues, respectively, in addition to their intended targets. However, this does not hinder the effectiveness of this protein sequencing technique. Although various aspects of the present invention relate to a reagent comprising NAABs for all twenty amino acids, the optimal set size for actual sequencing may be fewer than twenty. Reducing the number of NAABs trades absolute specificity for a smaller set of binding molecules, because a reduced alphabet is used for protein sequences. It may be more efficient to identify multiple amino acids (such as isoleucine and valine) with a single NAAB and to treat these amino acids as interchangeable when matching against a sequence database. Specificity can also be enforced in digital protocols such as DAPES by introducing the NAABs in a step-wise fashion. For example, the valine-specific NAAB derived from ValRS can be added before the isoleucine-specific NAAB derived from IleRS. N-terminal valine residues are then identified and capped before the isoleucine-targeting NAAB, which can also bind valine, is introduced.
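The two strategies described above can be sketched in a few lines of Python: step-wise introduction with capping, and a reduced alphabet for database matching. The binding profiles are simplified and hypothetical. The IleRS-derived NAAB is taken to bind valine as well as isoleucine, as the text states. The ValRS-derived NAAB is *assumed* not to bind isoleucine. Capping is abstracted as "a residue called by an earlier NAAB is not offered to later ones". The names `BINDS`, `stepwise_call`, and `matching_proteins` are illustrative only.

```python
from typing import Dict, List, Optional, Set

# Hypothetical binding profiles: NAAB name -> N-terminal residues it binds.
BINDS: Dict[str, Set[str]] = {
    "ValNAAB": {"V"},        # assumed not to bind isoleucine
    "IleNAAB": {"I", "V"},   # cross-reacts with valine, per the text
}
CALL: Dict[str, str] = {"ValNAAB": "V", "IleNAAB": "I"}


def stepwise_call(residue: str, order: List[str]) -> Optional[str]:
    """Introduce NAABs one at a time in `order`. The first NAAB that binds
    calls the residue, which is then treated as capped, so NAABs added
    later never see it."""
    for naab in order:
        if residue in BINDS[naab]:
            return CALL[naab]
    return None


# Reduced alphabet: isoleucine and valine are treated as one symbol.
REDUCE = str.maketrans({"V": "I"})


def reduced(seq: str) -> str:
    return seq.translate(REDUCE)


def matching_proteins(read: str, database: Dict[str, str]) -> List[str]:
    """Database entries that contain the read under the reduced alphabet."""
    r = reduced(read)
    return [name for name, seq in database.items() if r in reduced(seq)]
```

With the order `["ValNAAB", "IleNAAB"]`, a valine N-terminus is called "V". Reversing the order makes the IleRS-derived NAAB claim it first and call it "I", which is the misassignment that step-wise addition is meant to avoid.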
- Methods of the present invention possess attractive features relative to mass spectrometry. Because detection operates at the single-molecule level, the method is expected to have excellent dynamic range and to be suitable for extremely small amounts of sample. Furthermore, the digital nature of the detection produces inherently quantitative data. Finally, because all steps can be carried out in neutral aqueous buffer, post-translational modifications (e.g., phosphorylation) remain stable and available for analysis.
- In another aspect, the present invention is directed to a method and reagents for enzymatic Edman degradation (i.e., cleaving the N-terminal amino acid of a polypeptide). In accordance with this aspect, one or more enzymes are provided that catalyze the cleavage step of the Edman degradation in aqueous buffer at neutral pH. This provides an alternative to the harsh chemical conditions typically employed in conventional Edman degradation. In one aspect, the Edman degradation enzyme is a modified cruzain enzyme. Cruzain is a cysteine protease of the protozoan parasite Trypanosoma cruzi, and it was found to possess many of the characteristics desired for creating an Edman degradation enzyme.
- In conventional Edman degradation, polypeptides are sequenced by degradation from their N-terminus using the Edman reagent, phenyl isothiocyanate (PITC). The process requires two steps: coupling and cleavage. In the first step (coupling), the N-terminal amino group of a peptide reacts with phenyl isothiocyanate to form a thiourea. In the second step (cleavage), treatment of the thiourea with anhydrous acid (e.g., trifluoroacetic acid) cleaves the peptide bond between the first and second amino acids. The N-terminal amino acid is released as a thiazolinone derivative. The thiazolinone derivative may be extracted into an organic solvent, dried, and converted to the more stable phenylthiohydantoin (PTH) derivative for analysis. The most convenient method for identifying the PTH-amino acids generated during each sequencing cycle is HPLC with UV detection: each amino acid is detected by its UV absorbance at 269 nm and identified by its characteristic retention time.
- In digital protocols such as DAPES, the N-terminal amino acid has already been identified, so there is no need to generate or detect a phenylthiohydantoin derivative of the terminal amino acid. However, the strongly acidic conditions typically used in the cleavage step of conventional Edman degradation protocols are incompatible with the substrate surface on which the polypeptides are immobilized for single-molecule protein detection (SMD) (e.g., a nanogel surface). One modification of the conventional Edman degradation dispenses with the acidic conditions and instead promotes cleavage at elevated temperature (e.g., 70-75° C.) (25). However, some substrate surfaces used to immobilize peptides include bovine serum albumin (BSA), which has a melting temperature of approximately 60° C. in the absence of stabilizing additives (26). Further, repeated cycles of heating and cooling of the substrate surface (e.g., nanogel) may be undesirable. Thus, the present invention provides a method of performing the Edman degradation that dispenses with both acidic conditions and elevated temperature. Advantageously, an enzyme has been developed that accomplishes the cleavage step in a neutral, aqueous buffer. This enzyme avoids acidic conditions and high temperatures. It also decreases the cycle time for polypeptide sequencing by reducing or eliminating the need to change buffer and temperature conditions repeatedly.
- The Edman degradation enzyme (or "Edmanase") according to the present invention accomplishes the chemical step of N-terminal degradation by nucleophilic attack of the thiourea sulfur atom on the carbonyl group of the scissile peptide bond. As noted, the enzyme was made by modifying cruzain, a cysteine protease from the protozoan Trypanosoma cruzi (SEQ ID NO: 30). Cruzain prefers hydrophobic amino acids at the S2 position relative to the scissile bond, which corresponds to the phenyl moiety of the Edman reagent. The protease is relatively insensitive to the identity of the amino acid at the S1 position (29), allowing promiscuous cleavage of diverse N-terminal residues. Furthermore, this protein has been the subject of extensive structural characterization (27).
- The Edman degradation enzyme differs from the wild-type cysteine protease cruzain at four positions. One mutation (C25G) removes the catalytic cysteine residue, while the other three (G65S, A138C, and L160Y) were selected to create a steric fit with the phenyl moiety of the Edman reagent (PITC).
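Point mutations such as C25G, G65S, A138C, and L160Y use the notation <wild-type residue><position><new residue>. The minimal Python sketch below applies mutations written this way. It checks that the stated wild-type residue is actually present at each position, which catches numbering errors. The cruzain sequence (SEQ ID NO: 30) is not reproduced in this section, so the example runs on a toy sequence. The `offset` parameter is a hypothetical convenience for cases where the numbering scheme (here, the position numbers used in the text) does not start at the first character of the string.

```python
import re
from typing import List

# <wild-type><position><mutant>, e.g. "C25G"
MUTATION = re.compile(r"^([A-Z])(\d+)([A-Z])$")


def apply_mutations(seq: str, mutations: List[str], offset: int = 1) -> str:
    """Apply point substitutions to `seq`.

    `offset` is the residue number of seq[0] in the numbering scheme the
    mutations use (1 by default). A mismatch between the stated wild-type
    residue and the sequence raises ValueError.
    """
    residues = list(seq)
    for m in mutations:
        match = MUTATION.match(m)
        if match is None:
            raise ValueError(f"cannot parse mutation {m!r}")
        wt, pos, new = match.group(1), int(match.group(2)), match.group(3)
        i = pos - offset
        if not 0 <= i < len(residues):
            raise IndexError(f"{m}: position lies outside the sequence")
        if residues[i] != wt:
            raise ValueError(f"{m}: expected {wt} at {pos}, found {residues[i]}")
        residues[i] = new
    return "".join(residues)
```

For example, `apply_mutations("ACGTL", ["C2G", "L5Y"])` returns `"AGGTY"`. Writing `"A2G"` for the same sequence raises an error, because residue 2 is C.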
- FIGS. 4A-4B depict the latter three mutations (indicated by arrows) introduced into a model of cruzain (pdb code: 1U9Q (27); SEQ ID NO: 30) to accommodate the phenyl moiety of the Edman reagent phenyl isothiocyanate. FIG. 4A depicts a model for the cleavage intermediate of an N-terminal alanine residue in the active-site cleft. In addition to the engineered residues, two wild-type residues (shown as green sticks) contribute to forming a complementary pocket. FIG. 4B depicts a space-filling representation of the packing of the phenyl ring by protein side chains. The methyl group of the ligand (in gray at the top of the panel) corresponds to the side chain of the N-terminal residue to be cleaved and is not involved in the tight packing between enzyme and substrate. The enzyme was expressed and purified.
- Accordingly, one aspect of the present invention relates to an isolated, synthetic, or recombinant Edman degradation enzyme comprising an amino acid sequence having a glycine residue at a position corresponding to position 25 of wild-type Trypanosoma cruzi cruzain; a serine residue at a position corresponding to position 65; a cysteine residue at a position corresponding to position 138; and a tryptophan residue at a position corresponding to position 160.
- The remaining amino acid sequence of the Edman degradation enzyme is similar to that of wild-type Trypanosoma cruzi cruzain. It may contain additional amino acid mutations (including deletions, insertions, and/or substitutions), so long as such mutations do not significantly impair the ability of the Edman degradation enzyme to cleave PITC-derivatized N-terminal amino acids. For example, the remaining amino acid sequence can have at least about 80%, at least 85%, at least 90%, at least 93%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% sequence identity with the sequence of wild-type Trypanosoma cruzi cruzain.
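Percent identity thresholds like these can be checked with a short calculation. The Python sketch below computes the simplest form of identity: for two sequences that are already aligned and have equal length, it is the fraction of positions that match. This applies directly only to substitution-only variants. Variants with insertions or deletions would first need a global alignment (for example Needleman-Wunsch), which is not shown. The definition of the denominator used in the claims is not stated in this section, so it is an assumption here.

```python
def percent_identity(a: str, b: str) -> float:
    """Percent identity of two pre-aligned, equal-length sequences
    (no gaps): matching positions / length * 100."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to equal length first")
    if not a:
        raise ValueError("empty sequences")
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)


def meets_threshold(a: str, b: str, threshold_pct: float) -> bool:
    """True if the identity is at least `threshold_pct` percent."""
    return percent_identity(a, b) >= threshold_pct
```

As an illustration with a hypothetical length, four substitutions in a 200-residue sequence give 98% identity. Such a variant would satisfy every threshold listed above up to "at least 98%".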
- In some aspects of the invention, the Edman degradation enzyme comprises the sequence of SEQ ID NO: 29. For example, the Edman degradation enzyme can consist of the sequence of SEQ ID NO: 29.
- In various aspects of the invention, the reagents for enzymatic Edman degradation comprise two or more enzymes. For example, one point of concern is the ability to cleave proline residues. If a single mutant of cruzain cannot accomplish this reaction, an additional enzyme would be required. Naturally occurring enzymes cleave dipeptides of the form Xaa-Pro from the N-terminus of peptides; examples include quiescent cell proline dipeptidase (QPP) (35) and Xaa-Pro aminopeptidase (pdb code: 30VK). PITC-coupled N-terminal proline is chemically and sterically very similar to a dipeptide. These enzymes are therefore excellent starting points for engineering a proline-specific activity.
- When introducing elements of the present invention or the preferred embodiment(s) thereof, the articles "a", "an", "the" and "said" are intended to mean that there are one or more of the elements. The terms "comprising", "including" and "having" are intended to be inclusive and mean that there may be additional elements other than the listed elements.
- As various changes could be made in the above products, compositions and processes without departing from the scope of the invention, it is intended that all matter contained in the above description and shown in the accompanying drawings shall be interpreted as illustrative and not in a limiting sense.
- The following non-limiting examples are provided to further illustrate the present invention.
- In this example, an E. coli methionine aminopeptidase (eMAP) was modified to create a NAAB that binds specifically to N-terminal leucine residues. Two mutually compatible leucine-contacting interactions that could be incorporated into the eMAP structure were identified from the Protein Data Bank (PDB) (15). The surrounding protein residues of eMAP were redesigned around these two interactions. The resulting NAAB for leucine (eLAP) has 19 amino acid mutations relative to eMAP.
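The mutation count can be checked directly against Table A. Wild-type eMAP (SEQ ID NO: 1) and eLAP (SEQ ID NO: 2) have the same length there, so an ungapped, position-by-position comparison is enough. The Python sketch below lists the differences in the usual <reference><position><variant> notation. Numbering starts at Met1 of SEQ ID NO: 1. This is an assumption and may differ from the structural numbering used during design.

```python
from typing import List

# SEQ ID NO: 1 (wild-type eMAP) and SEQ ID NO: 2 (eLAP), transcribed from Table A.
EMAP = (
    "MAISIKTPEDIEKMRVAGRLAAEVLEMIEPYVKPGVSTGELD"
    "RICNDYIVNEQHAVSACLGYHGYPKSVCISINEVVCHGIPDD"
    "AKLLKDGDIVNIDVTVIKDGFHGDTSKMFIVGKPTIMGERLC"
    "RITQESLYLALRMVKPGINLREIGAAIQKFVEAEGFSVVREYC"
    "GHGIGRGFHEEPQVLHYDSRETNVVLKPGMTFTIEPMVNAG"
    "KKEIRTMKDGWTVKTKDRSLSAQYEHTIVVTDNGCEILTLR"
    "KDDTIPAIISHDE"
)
ELAP = (
    "MAISIKTPEDIEKMRVAGRLAAEVLEMIEPYVKPGVSTGELE"
    "RICWDYIVNEQHATDSLTGHNGIDGHGSISINEVVCHGVPDD"
    "AKLLKDGDIVNIDVTVRKDGFHGDTSKMFIVGKPTIMGERLC"
    "RITQESLYLALRMVKPGINLREIGAAIQKFVEAEGFSVVREYC"
    "GHGIGRGHHEEPQVLHYDSRETNVVLKPGMTFTIEPMVNAG"
    "KKEIRTMKDGSTVKTKDRSLSAQYEHTIVVTDNGCEILTLRK"
    "DDTIPAIISHDE"
)


def substitutions(reference: str, variant: str, start: int = 1) -> List[str]:
    """List point substitutions as <ref><position><variant>.

    Assumes the sequences are aligned without gaps (equal length), which
    holds for these two Table A entries.
    """
    if len(reference) != len(variant):
        raise ValueError("sequences differ in length; align them first")
    return [f"{a}{pos}{b}"
            for pos, (a, b) in enumerate(zip(reference, variant), start)
            if a != b]


if __name__ == "__main__":
    muts = substitutions(EMAP, ELAP)
    print(len(muts), muts)
```

Run on the two Table A sequences, the script reports 19 substitutions, consistent with the count stated above. The first is D42E.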
- The eMAP and eLAP proteins were expressed and assayed for binding against a panel of peptides with different N-termini. The NAAB for N-terminal leucine amino acids was non-specifically labeled with Cy5 fluorophore on lysine side chains. Synthetic peptides with either N-terminal methionine, leucine, or asparagine amino acids were coupled to a nanogel surface by thiol linkage. An additional experiment was performed with no peptide added. The labeled NAAB was briefly incubated with the immobilized peptide, and unbound protein was removed by washing. Bound protein, which may be bound specifically to peptides or non-specifically to the surface, was imaged by total internal reflection fluorescence (TIRF) microscopy. Spots exceeding a detection threshold were deemed to indicate bound protein and were converted to a number of counts per field-of-view.
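The counting step described above converts spots above a detection threshold into counts per field of view and compares peptide surfaces with a control. The Python sketch below shows that step under stated assumptions. Spot intensities are taken as already extracted from the TIRF images (spot finding itself is not shown). The threshold is arbitrary, and the control is a surface with no peptide or an off-target peptide. The function names are illustrative, not part of the described method.

```python
from statistics import mean
from typing import List, Sequence


def counts_per_fov(fields: Sequence[Sequence[float]], threshold: float) -> List[int]:
    """Number of spots brighter than `threshold` in each field of view."""
    return [sum(1 for s in spots if s > threshold) for spots in fields]


def specific_to_nonspecific(target: Sequence[Sequence[float]],
                            control: Sequence[Sequence[float]],
                            threshold: float) -> float:
    """Mean counts per field of view on the target surface divided by the
    mean counts per field of view on the control surface."""
    background = mean(counts_per_fov(control, threshold))
    if background == 0:
        raise ZeroDivisionError("no spots above threshold on the control surface")
    return mean(counts_per_fov(target, threshold)) / background
```

Using a mean over several fields of view, rather than a single image, reduces the influence of uneven illumination or local surface defects on the ratio.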
- FIGS. 2A-2B show the binding specificity of wild-type E. coli methionine aminopeptidase (eMAP) and eLAP in a single-molecule detection experiment. In FIG. 2A, fluorescently labeled eMAP and eLAP NAABs were visualized after binding to immobilized peptides with different N-terminal amino acids. FIG. 2B depicts histograms of quantitative binding. Digital analysis of NAAB binding for eMAP and eLAP showed that each NAAB was specific for the expected N-terminal amino acid. Both proteins exhibited roughly a 2:1 ratio of specific to non-specific binding.
- These results demonstrate that individual N-terminal amino acids can be identified in an SMD format using NAABs that are selective for a particular amino acid.
- In this example, a truncated version of wild-type E. coli methionyl-tRNA synthetase (MetRS) was modified to make a NAAB that binds specifically to N-terminal methionine residues. A truncated version of MetRS (residues 1-547) was obtained (16). It carries three amino acid mutations (L13S, Y260L, and H301L) that had been shown to pre-organize the binding site toward the methionine-bound conformation. A crystal structure of this mutant bound to free methionine is available (pdb code: 3h99). An additional mutation (D296G) was introduced to provide a more open binding pocket that can accommodate a peptide while avoiding steric clashes. This mutation was introduced into MetRS, and the altered protein was expressed in E. coli. The gene encoding MetRS was amplified from genomic DNA and cloned into the pET42a expression vector between the MfeI and XhoI sites, yielding a genetic fusion of a thrombin-cleavable GST tag and MetRS. The mutations were introduced using the QuikChange protocol. The proteins were expressed at 16° C. overnight using the autoinduction protocol of Studier (17). The GST-MetRS fusion was purified from lysates by affinity chromatography using GSTrap columns on a Bio-Rad liquid chromatography system. Following purification, proteins were labeled with Cy5 fluorophore on lysine side chains for single-molecule binding assays.
- Using an SMD assay, we then tested the specificity of the mutant MetRS for peptides with different amino acids at the N-terminus. Peptides of the form XGMMGSSC, where X is methionine, leucine, or asparagine, were purchased commercially. The peptides were immobilized on a nanogel surface via thiol linkages, and the engineered MetRS domain was applied to the surface. Bound MetRS was imaged at the single-molecule level by total internal reflection fluorescence (TIRF) microscopy. The resulting images are shown in FIG. 3. Quantitation of single-molecule binding events yields ratios of specific to non-specific binding of ~7:1 and ~13:1 relative to the two alternative N-terminal amino acids. The data in FIG. 3 show that the domain exhibits specific binding for N-terminal methionine. This indicates that engineered RS fragments are excellent molecular reagents for DAPES, and that computational protein design is an efficient method for producing NAABs with specificity for particular N-terminal amino acids.
- In this example, histidyl-tRNA synthetase (HisRS) from E. coli was modified to create a NAAB that binds specifically to N-terminal histidine residues. A fragment of wild-type HisRS spanning residues 1-320 has been shown by others to be monomeric. After inspection of the crystal structure of HisRS, further residues were truncated from both ends; the initial fragment tested spans Lysine3 to Alanine180. Protein design was then used to replace a long loop near the binding site with a shorter loop, creating a more open pocket intended to bind N-terminal histidine residues more tightly. This involved replacing an 11-residue loop (from Arginine113 to Lysine123) with a 4-residue turn consisting of Glycine, Asparagine, Alanine, and Proline. Thus, this NAAB comprises an internally truncated version of the wild-type E. coli HisRS fragment (residues 3-180; SEQ ID NO: 9) that has seven fewer residues than the wild-type sequence. The sequence of this N-terminal histidine-specific NAAB is provided by SEQ ID NO: 10.
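The loop replacement can be checked against Table A. In the excerpts below, transcribed from SEQ ID NO: 9 and SEQ ID NO: 10, the stretch RHERPQKGRYR (11 residues) in the wild-type fragment is replaced by GNAP in the NAAB. The Python sketch performs that swap and confirms the net loss of seven residues. `replace_loop` is a hypothetical helper written for this illustration.

```python
# Excerpts transcribed from Table A around the redesigned loop.
WT_EXCERPT = "RLWYIGPMFRHERPQKGRYRQFHQLGCEVFG"  # SEQ ID NO: 9
NAAB_EXCERPT = "RLWYIGPMFGNAPQFHQLGCEVFG"       # SEQ ID NO: 10

LOOP = "RHERPQKGRYR"  # 11-residue loop removed
TURN = "GNAP"         # Gly-Asn-Ala-Pro turn inserted


def replace_loop(seq: str, loop: str, turn: str) -> str:
    """Swap a loop segment that occurs exactly once for a shorter turn."""
    hits = seq.count(loop)
    if hits != 1:
        raise ValueError(f"loop found {hits} times; expected exactly once")
    return seq.replace(loop, turn)


if __name__ == "__main__":
    redesigned = replace_loop(WT_EXCERPT, LOOP, TURN)
    print(redesigned == NAAB_EXCERPT, len(WT_EXCERPT) - len(redesigned))
```

Requiring the loop to occur exactly once guards against silently editing a repeated motif elsewhere in a longer sequence.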
- FIG. 7 shows that the engineered HisNAAB (SEQ ID NO: 10) exhibits enhanced binding affinity for peptides with N-terminal histidine residues as compared to the wild-type fragment. Biolayer interferometry kinetics data show that the engineered HisNAAB (open circles) binds N-terminal histidine with the same off-rate as the wild-type fragment (SEQ ID NO: 9; solid circles), but with an enhanced on-rate. As a result, the engineered HisNAAB binds with an approximately 10-fold improvement in binding affinity.
- A synthetic gene encoding the Edman degradation enzyme was purchased from GenScript. The gene encoded a modified version of the cruzain enzyme of T. cruzi having the following substitution mutations: C25G, G65S, A138C, and L160Y.
- The gene was inserted between the NdeI and XhoI sites of a pET42a (Novagen) expression vector and transformed into E. coli BL21(DE3) chemically competent cells. Protein was then overexpressed following Studier's auto-induction protocol. Bacterial cells were harvested by centrifugation of the cell culture at 5000 rpm and 4° C. for 10 minutes. Cells were then suspended in 1×PBS with 10% glycerol and 6 M guanidine chloride, pH 7.4, and lysed by sonication (15 seconds at 20% power, 8 times on ice). The cell lysate was centrifuged at 18000 rpm and 4° C. for 20 minutes. The supernatant was then filtered through a 0.2 μm cellulose acetate filter. The filtered lysate was loaded onto a 5 mL HisTrap (Ni-NTA) column and washed with 5 column volumes of binding buffer (50 mM Tris-HCl, 150 mM NaCl, 6 M guanidine chloride, 25 mM imidazole). Bound protein was then eluted in 50 mM Tris-HCl, 150 mM NaCl, 6 M guanidine chloride, 500 mM imidazole. Purified fractions were prepared for SDS-PAGE analysis by mixing 2 parts sample with 1 part 4× loading dye. Samples were analyzed on 16% SDS-PAGE precast gels and visualized by Coomassie staining. The purified protein was then refolded by successive overnight dialyses into 1×PBS containing 5 M, 3 M, 1 M, 0.5 M, and 0 M guanidine chloride. Protein concentration was determined by measuring the A280 on an ND-8000 spectrophotometer (Thermo Fisher Scientific) and applying the calculated molar extinction coefficient.
- Single-amino-acid compounds containing aminomethylcoumarin (AMC) were obtained from Bachem (Bubendorf, Switzerland): Arg-AMC, Asn-AMC, Phe-AMC, Met-AMC, Ala-AMC, and Pro-AMC. Phenyl isothiocyanate (PITC) was purchased from Thermo Scientific and coupled to the N-terminus of each substrate by incubation for 10 minutes at room temperature in a 100 μL solution of acetonitrile:pyridine:water (10:5:3) containing 5 μL of PITC. The derivatized substrate was then dried by rotary evaporation and resuspended in 250 μL of 1× phosphate-buffered saline (PBS). The inhibitor compound 1-(2-anilino-5-methyl-1,3-thiazol-4-yl)-ethanone was ordered from Sigma-Aldrich (St. Louis, Mo.).
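The protein concentration step above combines an A280 reading with a calculated molar extinction coefficient through the Beer-Lambert law, c = A280 / (ε · l). The Python sketch below estimates ε from sequence using widely used residue contributions: Trp 5500, Tyr 1490, and cystine 125 M⁻¹ cm⁻¹. The text does not state which coefficient set or path length was used, so both are assumptions. The spectrophotometer reading is taken as normalized to a 1 cm path.

```python
# Residue contributions at 280 nm (M^-1 cm^-1), a common estimation scheme.
EPS_TRP = 5500
EPS_TYR = 1490
EPS_CYSTINE = 125  # per disulfide-bonded cysteine pair


def extinction_coefficient(seq: str, disulfides: int = 0) -> int:
    """Estimated molar extinction coefficient at 280 nm from sequence."""
    return (seq.count("W") * EPS_TRP
            + seq.count("Y") * EPS_TYR
            + disulfides * EPS_CYSTINE)


def concentration_molar(a280: float, epsilon: float, path_cm: float = 1.0) -> float:
    """Beer-Lambert law: c = A / (epsilon * l), returned in mol/L."""
    return a280 / (epsilon * path_cm)
```

For a purely hypothetical sequence containing one Trp and two Tyr, ε = 8480 M⁻¹ cm⁻¹. An A280 of 0.848 then corresponds to 100 μM.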
- The ability of the Edman degradation enzyme to perform N-terminal cleavage was characterized on six substrates of the form Ed-X-AMC. Here, Ed denotes the Edman reagent, X is an amino acid from the set (Ala, Asn, Phe, Met, Pro, Arg), and AMC is the fluorogenic aminomethylcoumarin group. Cleavage of the X-AMC bond was monitored by the appearance of fluorescence (FIG. 6). The engineered protein displayed activity against all six substrates to varying degrees (see Table below).
- All kinetic measurements were performed in a 96-well Corning plate on a BioTek Synergy2 plate reader at 30° C. Reactions were started by adding 5-20 μL of purified enzyme to 100 μL of 10 mM substrate solution. Final enzyme concentration was between 1 nM and 100 nM, depending on the experiment. Fluorescence of the cleaved product was measured with excitation at 370 nm and emission at 460 nm, read at 30-second intervals for 1-10 hours. A standard curve prepared with AMC from Invitrogen was used to quantitate product formation.
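Turning raw fluorescence into a rate takes two linear fits. First, the AMC standard curve converts fluorescence units into product concentration. Second, the early, linear part of the progress curve gives the initial velocity v0. The Python sketch below shows both fits. The number of early points to use is an assumption, and so is the interpretation of v0/[E]: it approximates kcat only when the substrate is saturating. Estimating kcat and KM as in the table below would require v0 at several substrate concentrations fit to the Michaelis-Menten equation, which is not shown.

```python
from typing import List, Sequence, Tuple


def fit_line(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least-squares slope and intercept."""
    n = len(x)
    if n < 2 or n != len(y):
        raise ValueError("need at least two paired points")
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    slope = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    return slope, my - slope * mx


def rfu_to_uM(rfu: Sequence[float], std_conc_uM: Sequence[float],
              std_rfu: Sequence[float]) -> List[float]:
    """Convert fluorescence readings to AMC concentration (uM) using a
    linear standard curve."""
    slope, intercept = fit_line(std_conc_uM, std_rfu)
    return [(f - intercept) / slope for f in rfu]


def initial_velocity(times_s: Sequence[float], rfu: Sequence[float],
                     std_conc_uM: Sequence[float], std_rfu: Sequence[float],
                     n_points: int = 10) -> float:
    """Initial rate (uM product per second) from the first `n_points` reads."""
    product = rfu_to_uM(rfu[:n_points], std_conc_uM, std_rfu)
    v0, _ = fit_line(times_s[:n_points], product)
    return v0


def turnover_per_s(v0_uM_per_s: float, enzyme_uM: float) -> float:
    """v0 / [E]; approximates kcat only under saturating substrate."""
    return v0_uM_per_s / enzyme_uM
```

Restricting the fit to early reads avoids the curvature caused by substrate depletion later in a multi-hour trace.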
TABLE Measured kinetic rates for Edmanase
Substrate (X-AMC)    kcat (s⁻¹)    KM (μM)    kcat/KM (M⁻¹ s⁻¹)
Alanine              0.55          21.3       2.6 × 10⁴
Arginine             0.087         167.8      5.2 × 10²
Asparagine           3.6           124.5      2.9 × 10⁴
Methionine           0.54          271.8      2.0 × 10³
Phenylalanine        0.47          122.8      3.8 × 10³
Proline              0.0014        252.0      5.7 × 10¹
- Assays were conducted as described above in Example 5, with 5 μM substrate, 100 nM enzyme, and 500 nM to 15 μM 1-(2-anilino-5-methyl-1,3-thiazol-4-yl)-ethanone. Reaction velocity was determined as above, plotted against the inverse of inhibitor concentration, and fit by non-linear least squares to determine the inhibition constant.
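The inhibition-constant fit can be sketched as a one-parameter non-linear least-squares problem. The rate model is an assumption, because the text does not state it. The sketch assumes simple reversible inhibition, v = v0 / (1 + [I]/Ki,app), with the uninhibited rate v0 held fixed, and searches for Ki,app by golden-section minimization on a log scale. If the inhibitor is competitive, which is also an assumption, the true constant is Ki = Ki,app / (1 + [S]/KM). At 5 μM substrate this correction factor is about 1.23 for the alanine substrate (KM 21.3 μM) and smaller for substrates with larger KM.

```python
import math
from typing import Sequence


def predicted_rate(v0: float, inhibitor_uM: float, ki_uM: float) -> float:
    """Assumed model: v = v0 / (1 + [I]/Ki,app)."""
    return v0 / (1.0 + inhibitor_uM / ki_uM)


def fit_ki_app(v0: float, inhibitor_uM: Sequence[float], rates: Sequence[float],
               lo_uM: float = 1e-3, hi_uM: float = 1e3, iters: int = 200) -> float:
    """Least-squares Ki,app (uM) by golden-section search on log10(Ki),
    with the uninhibited rate v0 held fixed."""
    def sse(log_ki: float) -> float:
        ki = 10.0 ** log_ki
        return sum((r - predicted_rate(v0, i, ki)) ** 2
                   for i, r in zip(inhibitor_uM, rates))

    a, b = math.log10(lo_uM), math.log10(hi_uM)
    g = (math.sqrt(5.0) - 1.0) / 2.0  # inverse golden ratio
    c, d = b - g * (b - a), a + g * (b - a)
    for _ in range(iters):
        if sse(c) < sse(d):
            b, d = d, c          # minimum lies in [a, d]
            c = b - g * (b - a)
        else:
            a, c = c, d          # minimum lies in [c, b]
            d = a + g * (b - a)
    return 10.0 ** ((a + b) / 2.0)


def competitive_ki(ki_app_uM: float, substrate_uM: float, km_uM: float) -> float:
    """For a competitive inhibitor: Ki = Ki,app / (1 + [S]/KM)."""
    return ki_app_uM / (1.0 + substrate_uM / km_uM)
```

Searching on log10(Ki) keeps the search well behaved across the several orders of magnitude a screening range might span. A package routine such as SciPy's `curve_fit` would be the usual choice when v0 is also fitted.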
- Primers specific for each NAAB gene were ordered from Integrated DNA Technologies. Each NAAB was then amplified from isolated E. coli genomic DNA and cloned into a pET42a expression vector at sites that varied depending on the gene sequence. These constructs were transformed into either E. coli BL21(DE3) or E. coli ArcticExpress competent cells for expression.
- Protein was overexpressed following Studier's auto-induction protocol. Bacterial cells were harvested by centrifugation of the cell culture at 5000 rpm and 4° C. for 10 minutes. Cells were then suspended in 1×PBS with 10% glycerol, pH 7.4, and lysed by sonication (15 seconds at 20% power, 8 times on ice). The cell lysate was centrifuged at 18000 rpm and 4° C. for 20 minutes. The supernatant was then filtered through a 0.2 μm cellulose acetate filter. The filtered lysate was loaded onto a 1 mL GSTrap column and washed with 5 column volumes of binding buffer (1×PBS). Bound protein was then eluted in 50 mM Tris-HCl, 10 mM reduced glutathione. Purified fractions were prepared for SDS-PAGE analysis by mixing 2 parts sample with 1 part 4× loading dye. Samples were analyzed on 16% SDS-PAGE precast gels and visualized by Coomassie staining. Protein concentration was determined by measuring the A280 on an ND-8000 spectrophotometer (Thermo Fisher Scientific) and applying the calculated molar extinction coefficient.
- Real-time binding assays between peptides and purified NAABs were performed using biolayer interferometry on a BLItz system (ForteBio, Menlo Park, Calif.). This system monitors the interference of light reflected from the surface of a fiber-optic sensor to measure the thickness of the molecular layer bound to the sensor surface. Peptide-coated sensors were exposed to the NAABs in 1×PBS at several different protein concentrations. Association rate constants were calculated using the BLItz software package, which fits the observed binding curves to a 1:1 binding model. The NAABs were then allowed to dissociate by incubating the sensors in 1×PBS, and the dissociation curves were fit to a 1:1 model to calculate the dissociation rate constants. Binding affinities (equilibrium dissociation constants, KD) were calculated as the dissociation rate constant divided by the association rate constant.
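The rate constants and affinity above are linked by simple 1:1 binding relations. The Python sketch below uses a common alternative to the vendor's curve fitting; it is not necessarily what the BLItz software does. For 1:1 binding, each association phase rises as R(t) = Req(1 − e^(−kobs·t)), and kobs = kon·C + koff. A straight-line fit of kobs against NAAB concentration C therefore gives kon (slope) and koff (intercept), and KD = koff / kon. Extracting kobs from each sensorgram is assumed to have been done already. The function names are illustrative.

```python
from typing import Sequence, Tuple


def fit_line(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least-squares slope and intercept."""
    n = len(x)
    if n < 2 or n != len(y):
        raise ValueError("need at least two paired points")
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("concentrations must not all be equal")
    slope = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    return slope, my - slope * mx


def rate_constants(conc_M: Sequence[float],
                   kobs_per_s: Sequence[float]) -> Tuple[float, float]:
    """1:1 binding: k_obs = k_on * C + k_off. Returns (k_on, k_off)."""
    return fit_line(conc_M, kobs_per_s)


def dissociation_constant(k_on: float, k_off: float) -> float:
    """K_D = k_off / k_on (in M when k_on is in M^-1 s^-1, k_off in s^-1)."""
    return k_off / k_on
```

The same relation explains the HisNAAB result described for FIG. 7. With koff unchanged, a 10-fold faster kon lowers KD 10-fold, which is the reported improvement in affinity.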
TABLE Measured Affinity Constants
N-terminal amino acid    Affinity constant (KD)
Glutamate                2.12 μM
Phenylalanine            3.44 μM
Histidine                98.7 μM
Methionine               1.07 μM
Asparagine               754 nM
Arginine                 129 nM
Tryptophan               48.9 nM
Tyrosine                 57.6 μM
Phosphoserine            7.72 μM
Phosphotyrosine          1.07 μM
Aspartate                411 nM
Isoleucine               3.01 μM
Leucine                  1.88 μM
Glutamine                531 nM
Serine                   938 nM
Threonine                1.01 μM
Valine                   1.22 μM
Lysine                   2.61 μM
- FIG. 8 is a full binding matrix showing how strongly each engineered NAAB binds to each N-terminal amino acid. Each square in the matrix represents the binding affinity of a single NAAB for a single N-terminal amino acid, as measured by biolayer interferometry. Each row contains all of the binding data for one NAAB, and each column contains the binding data for one N-terminal amino acid (indicated by its one-letter code). Darker squares represent tighter binding. The NAABs exhibit cross-binding to chemically similar N-terminal amino acids. However, the predicted binding patterns for the individual amino acids are distinct. Thus, taken as a set, the engineered NAAB proteins are capable of identifying amino acids at the N-terminus of peptides.
- For reference, the abbreviations of the amino acids are as follows:
Amino acid                     Three-letter code    One-letter code
alanine                        ala                  A
arginine                       arg                  R
asparagine                     asn                  N
aspartic acid                  asp                  D
asparagine or aspartic acid    asx                  B
cysteine                       cys                  C
glutamic acid                  glu                  E
glutamine                      gln                  Q
glutamine or glutamic acid     glx                  Z
glycine                        gly                  G
histidine                      his                  H
isoleucine                     ile                  I
leucine                        leu                  L
lysine                         lys                  K
methionine                     met                  M
phenylalanine                  phe                  F
proline                        pro                  P
serine                         ser                  S
threonine                      thr                  T
tryptophan                     trp                  W
tyrosine                       tyr                  Y
valine                         val                  V
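The set-based identification described for the FIG. 8 binding matrix can be sketched as nearest-pattern decoding. Each amino acid has an expected binding pattern across the NAAB set. An observed pattern is assigned to the amino acid whose expected pattern is closest, and ties are reported as ambiguous calls. The three-NAAB panel and its binary profiles below are hypothetical. Isoleucine/valine cross-binding follows the text; the other entries are assumptions. A real decoder would build its profiles from measured affinities such as those in the table above, for example by applying a KD cutoff, as the `binarize` helper illustrates.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical panel and expected binary binding profiles (1 = binds),
# ordered as NAABS. These stand in for rows of a measured binding matrix.
NAABS: Tuple[str, ...] = ("IleNAAB", "ValNAAB", "LeuNAAB")
PROFILES: Dict[str, Tuple[int, ...]] = {
    "I": (1, 0, 0),
    "V": (1, 1, 0),  # valine is also bound by the IleRS-derived NAAB
    "L": (0, 0, 1),
}


def binarize(kd_uM: Sequence[float], cutoff_uM: float) -> Tuple[int, ...]:
    """1 where the measured KD is at or below the cutoff (tighter binding)."""
    return tuple(1 if kd <= cutoff_uM else 0 for kd in kd_uM)


def decode(observed: Tuple[int, ...]) -> List[str]:
    """Residues whose expected pattern has the smallest Hamming distance to
    `observed`; more than one entry means the call is ambiguous."""
    def distance(aa: str) -> int:
        return sum(p != o for p, o in zip(PROFILES[aa], observed))
    best = min(distance(aa) for aa in PROFILES)
    return sorted(aa for aa in PROFILES if distance(aa) == best)
```

An observed pattern of (1, 1, 0) decodes unambiguously to valine, even though no single NAAB in this panel binds valine alone. A pattern such as (1, 0, 1) is equally close to isoleucine and leucine and is returned as an ambiguous pair.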
TABLE A NAAB sequences SEQ ID NO: SEQ ID wild-type MAISIKTPEDIEKMRVAGRLAAEVLEMIEPYVKPGVSTGELD NO: 1 eMAP RICNDYIVNEQHAVSACLGYHGYPKSVCISINEVVCHGIPDD AKLLKDGDIVNIDVTVIKDGFHGDTSKMFIVGKPTIMGERLC RITQESLYLALRMVKPGINLREIGAAIQKFVEAEGFSVVREYC GHGIGRGFHEEPQVLHYDSRETNVVLKPGMTFTIEPMVNAG KKEIRTMKDGWTVKTKDRSLSAQYEHTIVVTDNGCEILTLR KDDTIPAIISHDE SEQ ID eLAP MAISIKTPEDIEKMRVAGRLAAEVLEMIEPYVKPGVSTGELE NO: 2 RICWDYIVNEQHATDSLTGHNGIDGHGSISINEVVCHGVPDD AKLLKDGDIVNIDVTVRKDGFHGDTSKMFIVGKPTIMGERLC RITQESLYLALRMVKPGINLREIGAAIQKFVEAEGFSVVREYC GHGIGRGHHEEPQVLHYDSRETNVVLKPGMTFTIEPMVNAG KKEIRTMKDGSTVKTKDRSLSAQYEHTIVVTDNGCEILTLRK DDTIPAIISHDE SEQ ID truncated AKKILVTCALPYANGSIHLGHMLEHIQADVWVRYQRMRGH NO: 3 wild-type EVNFICADDAHGTPIMLKAQQLGITPEQMIGEMSQEHQTDFA MetRS GFNISYDNYHSTHSEENRQLSELIYSRLKENGFIKNRTISQLY (4-547) DPEKGMFLPDRFVKGTCPKCKSPDQYGDNCEVCGATYSPTE LIEPKSVVSGATPVMRDSEHFFFDLPSFSEMLQAWTRSGALQ EQVANKMQEWFESGLQQWDISRDAPYFGFEIPNAPGKYFYV WLDAPIGYMGSFKNLCDKRGDSVSFDEYWKKDSTAELYHFI GKDIVYFHSLFWPAMLEGSNFRKPSNLFVHGYVTVNGAKMS KSRGTFIKASTWLNHFDADSLRYYYTAKLSSRIDDIDLNLED FVQRVNADIVNKVVNLASRNAGFINKRFDGVLASELADPQL YKTFTDAAEVIGEAWESREFGKAVREIMALADLANRYVDEQ APWVVAKQEGRDADLQAICSMGINLFRVLMTYLKPVLPKLT ERAEAFLNTELTWDGIQQPLLGHKVNPFKALYNRIDMRQVE ALVEASK SEQ ID Met AKKILVTCASPYANGSIHLGHMLEHIQADVWVRYQRMRGH NO: 4 NAAB* EVNFICADDAHGTPIMLKAQQLGITPEQMIGEMSQEHQTDFA GFNISYDNYHSTHSEENRQLSELIYSRLKENGFIKNRTISQLY DPEKGMFLPDRFVKGTCPKCKSPDQYGDNCEVCGATYSPTE LIEPKSVVSGATPVMRDSEHFFFDLPSFSEMLQAWTRSGALQ EQVANKMQEWFESGLQQWDISRDAPYFGFEIPNAPGKYFYV WLDAPIGLMGSFKNLCDKRGDSVSFDEYWKKDSTAELYHFI GKGIVYFLSLFWPAMLEGSNFRKPSNLFVHGYVTVNGAKMS KSRGTFIKASTWLNHFDADSLRYYYTAKLSSRIDDIDLNLED FVQRVNADIVNKVVNLASRNAGFINKRFDGVLASELADPQL YKTFTDAAEVIGEAWESREFGKAVREIMALADLANRYVDEQ APWVVAKQEGRDADLQAICSMGINLFRVLMTYLKPVLPKLT ERAEAFLNTELTWDGIQQPLLGHKVNPFKALYNRIDMRQVE ALVEASK SEQ ID wild-type MTQVAKKILVTCALPYANGSIHLGHMLEHIQADVWVRYQR NO: 5 MetRS MRGHEVNFICADDAHGTPIMLKAQQLGITPEQMIGEMSQEH (full QTDFAGFNISYDNYHSTHSEENRQLSELIYSRLKENGFIKNRT length) 
ISQLYDPEKGMFLPDRFVKGTCPKCKSPDQYGDNCEVCGAT YSPTELIEPKSVVSGATPVMRDSEHFFFDLPSFSEMLQAWTRS GALQEQVANKMQEWFESGLQQWDISRDAPYFGFEIPNAPGK YFYVWLDAPIGYMGSFKNLCDKRGDSVSFDEYWKKDSTAE LYHFIGKDIVYFHSLFWPAMLEGSNFRKPSNLFVHGYVTVN GAKMSKSRGTFIKASTWLNHFDADSLRYYYTAKLSSRIDDID LNLEDFVQRVNADIVNKVVNLASRNAGFINKRFDGVLASEL ADPQLYKTFTDAAEVIGEAWESREFGKAVREIMALADLANR YVDEQAPWVVAKQEGRDADLQAICSMGINLFRVLMTYLKP VLPKLTERAEAFLNTELTWDGIQQPLLGHKVNPFKALYNRID MRQVEALVEASKEEVKAAAAPVTGPLADDPIQETITFDDFA KVDLRVALIENAEFVEGSDKLLRLTLDLGGEKRNVFSGIRSA YPDPQALIGRHTIMVANLAPRKMRFGISEGMVMAAGPGGKD IFLLSPDAGAKPGHQVK SEQ ID truncated VDVSLPGASLFSGGLHPITLMERELVEIFRALGYQAVEGPEV NO: 6 wild-type ESEFFNFDALNIPEHHPARDMWDTFWLTGEGFRLEGPLGEEV PheRS EGRLLLRTHTSPMQVRYMVAHTPPFRIVVPGRVFRFEQTDAT (86-350) HEAVFHQLEGLVVGEGIAMAHLKGAIYELAQALFGPDSKVR FQPVYFPFVEPGAQFAVWWPEGGKWLELGGAGMVHPKVFQ AVDAYRERLGLPPAYRGVTGFAFGLGVERLAMLRYGIPDIR YFFGGRLKFLEQFKGVL SEQ ID PheNAAB VDVSLPGASLFSGGDHPITLMERELVEIFRALGYQAVEGPEV NO: 7 (86-350) ESEFFNFDALNIPENGPARDMWDTVGKTGEGFRLEGPDGEE VEGRLLLRTHTSPMQVRYMVAHTPPFRIVVPGRVFRAEQTD ATAEAVFHQLEGLVVGEGVNEGDLYGAIYELAQALFGPDSK VRFQPVTFPFVEPGAQFAVWWPEGGKWLELGGAGMVGPNV FQAVDAYRERLGDPPAYRGVTGFAFGLGVERLAMLRYGIPD IRYF SEQ ID wild-type MLEEALAAIQNARDLEELKALKARYLGKKGLLTQEMKGLS NO: 8 PheRS ALPLEERRKRGQELNAIKAALEAALEAREKALEEAALKEAL (full ERERVDVSLPGASLFSGGLHPITLMERELVEIFRALGYQAVE length) GPEVESEFFNFDALNIPEHHPARDMWDTFWLTGEGFRLEGPL GEEVEGRLLLRTHTSPMQVRYMVAHTPPFRIVVPGRVFRFEQ TDATHEAVFHQLEGLVVGEGIAMAHLKGAIYELAQALFGPD SKVRFQPVYFPFVEPGAQFAVWWPEGGKWLELGGAGMVHP KVFQAVDAYRERLGLPPAYRGVTGFAFGLGVERLAMLRYGI PDIRYFFGGRLKFLEQFKGVL SEQ ID truncated NIQAIRGMNDYLPGETAIWQRIEGTLKNVLGSYGYSEIRLPIV NO: 9 wild-type EQTPLFKRAIGEVTDVVEKEMYTFEDRNGDSLTLRPEGTAGC HisRS VRAGIEHGLLYNQEQRLWYIGPMFRHERPQKGRYRQFHQLG (3-180) CEVFGLQGPDIDAELIMLTARWWRALGISEHVTLELNSIGSL EARANYRDA SEQ ID HisNAAB KNIQAIRGMNDYLPGETAIWQRIEGTLKNVLGSYGYSEIRLPI NO: 10 (3-180) VEQTPLFKRAIGEVTDVVEKEMYTFEDRNGDSLTLRPEGTA GCVRAGIEHGLLYNQEQRLWYIGPMFGNAPQFHQLGCEVFG 
LQGPDIDAELIMLTARWWRALGISEHVTLELNSIGSLEARANYRDA
SEQ ID NO: 11 | AlaNAAB | SKSTAEIRQAFLDFFHSKGHQVVASSSLVPHNDPTLLFTNAGMNQFKDVFLGLDKRNYSRATTSQRCVRAGGKHNDLENVGYTARHHTFFEMLGNFSFGDYFKHDAIQFAWELLTSEKWFALPKERLWVTVYESDDEAYEIWEKEVGIPRERIIRIGDNKGAPYASDNFWQMGDTGPCGPCTEIFYDHGDHIWGGPPGSPEEDGDRYIEIWNIVFMQFNRQADGTMEPLPKPSVDTGMGLERIAAVLQHVNSNYDIDL |
SEQ ID NO: 12 | ArgNAAB | EKQTIVVDYSAPNVAKEMHVGHLRSTIIGDAAVRTLEFLGHKVIRANHVGDWGTQFGMLIAWLEKQQQENAGEMELADLEGFYRDAKKHYDEDEEFAERARNYVVKLQSGDEYFREMWRKLVDITMTQNQITYDRLNVTLTRDDVMGESLYNPMLPGIVADLKAKGLAVESEGATVVFLDEFKNKEGEPMGVIIQKKDGGYLYTTTDIACAKYRYESLHADRVLYYIDSRQHQHLMQAWAIVRKAGYVPESVPLEHHMFGMMLGKDGKPFKTRAGGTVKLADLLDETLERARRLVAEKNPDMPADELEKLANAVGIGAVKYADLSKNRTTDYIFDWDNMLAFEGNTAPYMQYAYTRVLSVFRKAEINEEQLAAAPVIIREDREAQLAARLLQFEETLTVVAREGTPHVMCAYLYDLAGLFSGFYEHCPILSAENEEVRNSRLKLAQLTAKTLKLGLDTLGIETVERM |
SEQ ID NO: 13 | AsnNAAB | SIEYLREVAHLRPRTNLIGAVARVRHTLAQALHRFFNEQGFFWVSTPLITASDTEGAGEMFRVSTLDLENLPRNDQGKVDFDKDFFGKESFLTVSGQLNGETYACALSKIYTFGPTFRAENSNTSRHLAEFWMLEPEVAFANLNDIAGLAEAMLKYVFKAVLEERADDMKFFAERVDKDAVSRLERFIEADFAQVDYTDAVTILENCGRKFENPVYWGVDLSSEHERYLAEEHFKAPVVVKNYPKDIKAFYMRLNEDGKTVAAMDVLAPGIGEIIGGSQREERLDVLDERMLEMGLNKEDYWWYRDLRRYGTVPHSGFGLGFERLIAYVTGVQNVRDVIPFPRTP |
SEQ ID NO: 14 | AspNAAB | LPLDSNHVNTEEARLKYRYLDLRRPEMAQRLKTRAKITSLVRRFMDDHGFLDIETPMLTKATPEGARDYLVPSRVHKGKFYALPQSPQLFKQLLMMSGFDRYYQIVKCFRDEDLRADRQPEFTQIDVETSFMTAPQVREVMEALVRHLWLEVKGVDLGDFPVMTFAEAERRYGSDKPDLRNPMELTDVADLLRSVEFAVFAGPANDPKGRVAALRVPGGASLTRKQIDEYDNFVKIYGAKGLAYIKVNERAKGLEGINSPVAKFLNAEHEAILDRTAAQDGDMIFFGADNKKIVADAMGALRLKVGKDLGLTDESKWAPLWVIDFPMFEDDGEGGLTAMHHPFTSPKDMTAAELKAAPENAVANAYDMVINGYEVGGGSVRIHNGDMQQTVFGILGINEEEQREKFGFLLDALKYGTPPHAGLAFGLDRLTMLLTGTDNIRDVIAFPK |
SEQ ID NO: 15 | CysNAAB | MLKIFNTLTRQKEEFKPIHAGEVGMYVCGITVYDLCHIGHGRTFVAFDVVARYLRFLGYKLKYVRNITDIDDKIIKRANENGESFVAMVDRMIAEMHKDFDALNILRPDMEPRATHHIAEIIELTEQLIAKGHAYVADNGDVMFDVPTDPTYGVLSRQDLDQLQAGARVDVVDDKRNPMDFVLWKMSKEGEPSWPSPWGAGRPGWHIECSAMNCKQLGNHFDIHGGGSDLMFPHHENEIAQSTCAHDGQYVNYWMHSGMVMVDREKMSKSLGNFFTVRDVLKYYDAETVRYFLMSGHYRSQLNY |
SEQ ID NO: 16 | GlnNAAB | TNFIRQIIDEDLASGKHTTVHTRFPPEPNGYLHIGHAKSICLNFGIAQDYKGQCNLRFDDTNPVKEDIEYVESIKNDVEWLGFHWSGNVRYSSDYFDQLHAYAIELINKGLAYVDELTPEQIREYRGTLTQPGKNSPYRDRSVEENLALFEKMRTGGFEEGKACLRAKIDMASPFIVMRDPVLYRIKFAEHHQTGNKWCIYPMYDFTHCISDALEGITHSLCTLEFQDNRRLYDWVLDNITIPVHPRQYEFSR |
SEQ ID NO: 17 | GluNAAB | IKTRFAPSPTGYLHVGGARTALYSWLFARNHGGEFVLRIEDTDLERSTPEAIEAIMDGMNWLSLEWDEGPYYQTKRFDRYNAVIDQMLEEGTAYKCYCSKERLEALREEQMAKGEKPRYDGRCRHSHEHHADDEPCVVRFANPQEGSVVFDDQIRGPIEFSNQELDDLIIRRTDGSPTYNFCVVVDDWDMEITHVIRGEDHINNTPRQINILKALNAPVPVYAHVSMINGDDGKKLSKRHGAVSVMQYRDDGYLPEALLNYLVRLGWSHGDQEIFTREEMIKYFTLNAVSKSASAFNTDKLLWLNHHYI |
SEQ ID NO: 18 | IleNAAB | FPMRGDLAKREPGMLARWTDDDLYGIIRAAKKGKKTFILHDGPPYANGSIHIGHSVNKILKDIIIKSKGLSGYDSPYVPGWDCHGLPIELKVEQEYGKPGEKFTAAEFRAKCREYAATQVDGQRKDFIRLGVLGDWSHPYLTMDFKTEANIIRALGKIIGNGHLHKGAKPVHWCVDCRSALAEAEVEYYDKTSPSIVAFQAVDQDALKTKFGVSNVNGPISLVIWTTTPWTLPANRAISIAPDFDYALVQIDGQAVILAKDLVESMQRIGVSDYTILGTVKGAELELLRFTHPFMDFDVPAILGDHVTLDAGTGAVHTAPGHGPDDYVIGQKYGLETANPVGPDGTYLPGTYPTLDGVNVFKANDIVVALLQEKGALLHVEKMQHSYPCCWRHKTPIIFRATPQWFVSMDQKGLRAQSLKEIKGVQWIPDWGQARIESMVANRPDWCISRQRTWGVPMSLFVHKDTEELHPRTLELMEEVAKRVEVDGIQAWWDLDAKEILGDEADQYVKVPDTLDVWFDSGSTHSSVVDVRPEFAGHAADMYLEGSDQHRGWFMSSLMISTAMKGKAPYRQVLTHGFTVDGQGRKMSKSIGNTVSPQDVMNKLGADILRLWVASTDYTGEMAVSDEILKRAADSYRRIRNTARFLLANLNGFDPAKDMVKPEEMVVLDRWAVGCAKAAQEDILKAYEAYDFHEVVQRLMRFCSVEMGSFYLDIIKDRQYTAKADSVARRSCQTALYHIAEALVRWMAPILSFTADEVWGYLPGERE |
SEQ ID NO: 19 | LeuNAAB | IESKVQLHWDEKRTFEVTEDESKEKYYCLSMLPYPSGRLHMGHVRNYTIGDVIARYQRMLGKNVLQPIGWDAFGLPAEGAAVKNNTAPAPWTYDNIAYMKNQLKMLGFGYDWSRELATCTPEYYRWEQKCFTELYKKGLVYKKTSAVNWCPNDQTVLANEQVIDGCCWRCDTKVERKEIPQWFIKITAYADELLNDLDKLDHWPDTVKTMQRNWIGRSEGVEITFNVKDYDNTLTVYTTRPDTFMGCTYLAVAAGHPLAQKAAENNPELAAFIDECRNTKVAEAEMATMEKKGVDTGFKAVHPLTGEEIPVWAANFVLMEYGTGAVMAVPGHDQRDYEFASKYGLNIKPVILAADGSEPDLSQQALTEKGVLFNSGEFNGLDHEAAFNAIADKLTEMGVGERKVNYRLRDWGVSRQRYWGAPIPMVTLEDGTVMPTPDDQLPVILPEDVVMDGITSPIKADPEWAKTTVNGMPALRETDTFDTFMESSWYYARYTCPEYKEGMLDSKAANYWLPVDIYIGGIEHAIMHLLYFRFFHKLMRDAGMVNSDEPAKQLLCQGMVLADAFYYVGENGERNWVSPVDAIVERDEKGRIVKAKDAAGHELVYTGMSKMSKSKNNGIDPQVMVERYGADTVRLFMMFASPADMTLEWQESGVEGANRFLKRVWKLVYEHTAKGDVAALNVDALTEDQKALRRDVHKTIAKVTDDIGRRQTFNTAIAAIMELMNKLAKAPTDGEQDRALMQEALLAVVRMLNPFTPHICFTLWQELKGEGDIDNAPWP |
SEQ ID NO: 20 | LysNAAB | ANDKSRQTFVVRSKILAAIRQFMVARGFMEVETPMMQVIPGGASARPFITHHNALDLDMYLRIAPELYLKRLVVGGFERVFEINRNFRNEGISVRHNPEFTMMELYMAYADYHDLIELTESLFRTLAQEVLGTTKVTYGEHVFDFGKPFEKLTMREAIKKYRPETDMADLDNFDAAKALAESIGITVEKSWGLGRIVTEIFDEVAEAHLIQPTFITEYPAEVSPLARRNDVNPEITDRFEFFIGGREIGNGFSELNDAEDQAERFQEQVNAKAAGDDEAMFYDEDYVTALEYGLPPTAGLGIGIDRMIMLFTNSHTIRDVILFPAMRP |
SEQ ID NO: 21 | ProNAAB | MIRKLASGLYTWLPTGVRVLKKVENIVREEMNNAGAIEVLMPVVQPSELWQESGRWEQYGPELLRIADRGDRPFVLGPTHEEVITDLIRNELSSYKQLPLNFYQIQTKFRDEVRPRFGVMRSREFLMKDAYSFHTSQESLQETYDAMYAAYSKIFSRMGLDFRAVQADTGSIGGSASHEFQVLAQSGEDDVVFSDTSDYAANIELAEAIAPKEPRAAATQEMTLVDTPNAKTIAELVEQFNLPIEKTVKTLLVKAVEGSSFPLVALLVRGDHELNEVKAEKLPQVASPLTFATEEEIRAVVKAGPGSLGPVNMPIPVVIDRTVAAMSDFAAGANIDGKHYFGINWDRDVATPEIADIRNVVAGDPSPDGQGTLLIKRGIEVGHIFQLG |
SEQ ID NO: 22 | SerNAAB | MLDPNLLRNEPDAVAEKLARRGFKLDVDKLGALEERRKVLQVKTENLQAERNSRSKSIGQAKARGEDIEPLRLEVNKLGEELDAAKAELDALQAEIRDIALTIPNLPADEVPVGKDENDNVEVSRWGTPREFDFEVRDHVTLGEMYSGLDFAAAVKLTGSRFVVMKGQIARMHRALSQFMLDLHTEQHGYSENYVPYLVNQDTLYGTGQLPKFAGDLFHTRPLEEEADTSNYALIPTAEVPLTNLVRGEIIDEDDLPIKMTAHTPCFRSEAGSYGRDTRGLIRMHQFDKVEMVQIVRPEDSMAALEEMTGHAEKVLQLLGLPYRKIILCTGDMGFGACKTYDLEVWIPAQNTYREISSCSNVWDFQARRMQARCRSKSDKKTRLVHTLNGSGLAVGRTLVAVMENYQQADGRIEVPEVLRPYMNGLEYI |
SEQ ID NO: 23 | ThreNAAB | RDHRKIGKQLDLYHMQEEAPGMVFWHNDGWTIFRELEVFVRSKLKEYQYQEVKGPFMMDRVLWEKTGHWDNYKDAMFTTSSENREYCIKPMNCPGHVQIFNQGLKSYRDLPLRMAEFGSCHRNEPSGSLHGLGRVRGFTQDDAHIFCTEEQIRDEVNGCIRLVYDMYSTFGFEKIVVKLSTRPEKRIGSDEMWDRAEADLAVALEENNIPFEYQLGEGAFYGPKIEFTLYDCLDRAAQCGTVQLDFSLPSRLSASYVGEDNERKVPVMIHRAILGSMEVFIGILTEEFAGFFPTWLAPVQVVIMNITDSQSEYVNELTQKLSNAGIRVKADLRNEKIGFKIREHTLRRVPYMLVCGDKEVESGKVAVRTRRGKDLGSMDVNEVIEKLQQEIRSRSLKQLEE |
SEQ ID NO: 24 | TrpNAAB | MTKPIVFSGAQPSGELTIGNYMGALRQWINMQDDYHCIYCIVDQHAITVRQDAQKLRKATLDTLALYLACGIDPEKSTIFVQSHVPEHAQLGWALNCYTYFGELSRMTQFKDKSARYAENINAGLFDYPVLMAADILLYQTNLVPVGEDQKQHLELSRDIAQRFNALYGDIFKVPEPFIPKSGARVMSLLEPTKKMSKSDDNRNNVIGLLEDPKSVVKKIKRAVTDSDEPPVVRYDVQNKAGVSNLLDILSAVTGQSIPELEKQ |
SEQ ID NO: 25 | TyrNAAB | MASSNLIKQLQERGLVAQVTDEEALVERLAQGPIALYCGFDPTADSLHLGHLVPLLCLKRFQQAGHKPVALVGGATGLIGDPSFKAAERKLNTEETVQEWVDKIRKQVAPFLDFDCGENSAIAANNYDWFGNMNVLTFLRDIGKHFSVNQMINKEAVKQRLNREDQGISFTEFSYNLLQGYDFACLNKQYGVVLQIGGSDQWGNITSGIDLTRRLHQNQVFGLTVPLITKADGTKFGKTEGGAVWLDPKKTSPYKFYQFWINTADADVYRFLKFFTFMSIEEINALEEEDKNSGKAPRAQYVLAEQVTRLVHGEEGLQAAKRITECLFSGSLSALSEADFEQLAQDGVPMVKMEKGADLMQALVDSELQPSRGQARKTIASNAITINGEKQSDPEYFFKEEDRLFGRFTLLRRGKKNYCLICWK |
SEQ ID NO: 26 | ValNAAB | MEKTYNPQDIEQPLYEHWEKQGYFKPNGDESQESFCIMIPPPNVTGSLHMGHAFQQTIMDTMIRYQRMQGKNTLWQVGTDHAGIATQMVVERKIAAEEGKTRHDYGREAFIDKIWEWKAESGGTITRQMRRLGNSVDWERERFTMDEGLSNAVKEVFVRLYKEDLIYRGKRLVNWDPKLRTAISDLEVENRESKGSMWHIRYPLADGAKTADGKDYLVVATTRPETLLGDTGVAVNPEDPRYKDLIGKYVILPLVNRRIPIVGDEHADMEKGTGCVKITPAHDFNDYEVGKRHALPMINILTFDGDIRESAQVFDTKGNESDVYSSEIPAEFQKLERFAARKAVVAAIDALGLLEEIKPHDLTVPYGDRGGVVIEPMLTDQWYVRADVLAKPAVEAVENGDIQFVPKQYENMYFSWMRDIQDWCISRQLWWGHRIPAWYDEAGNVYVGRNEEEVRKENNLGADVALRQDEDVLDTWFSSALWTFSTLGWPENTDALRQFHPTSVMVSGFDIIFFWIARMIMMTMHFIKDENGKPQVPFHTVYMTGLIRDDEGQKMSKSKGNVIDPLDMVDGISLPELLEKRTGNMMQPQLADKIRKRTEKQFPNGIEPHGTDALRFTLAALASTGRDINWDMKRLEGYRNFCNKLWNASRFVLMNTEGQDCGFNGGEMTLSLADRWILAEFNQTIKAYREALDSFRFDIAAGILYEFTWNQFCDWYLELTKPVMNGGTEAELRGTRHTLVTVLEGLLRLAHPIIPFITETIWQ |
SEQ ID NO: 27 | Phosphotyrosine NAAB** | MDEFEMIKRNTSEIISELREVLKKDEKSALIGFEPSGKIHLGHYLQKKMIDLQNAGFDIIIPLADLHAYLNQKGELDEIRKIGDYNKKVFEAMLKAKYVYGSEFQLDKYTLNVYRLALKTTLKARRSMELIAREDENPVAEVIYPIMQVNGCHYKGVDVAVGGMEQRKIMLARELLPKKVVCIHPVLTGLDGEGKMSSSGNFIAVDDSPEEIRAFKKAYCPAGVVEGNPEIAKYFLEYPLTIKPEKFGGDLTVNSYEESLFKNKELHPMDLKAVAEELIKILEPIRK |
SEQ ID NO: 28 | Phosphoserine NAAB | MRFDPEKIKKDAKENFDLTWNEGKKMVKTPTLNERYPRTTFRYGKAHPVYDTIQKLREAYLRMGFEEMMNPLIVDEKEVHKQFGSEALAVLDRCFYLAGLPRPNVGISDERIAQINGILGDIGDEGIDKVRKVLHAYKKGKVEGDDLVPEISAALEVSDALVADMIEKVFPEFKELVAQASTKTLRSHMTSGWFISLGALLERKEPPFHFFSIDRCFRREQQEDASRLMTYYSASCVIMDENVTVDHGKAVAEGLLSQFGFEKFLFRPDEKRSKYYVPDTQTEVFAFHPKLVGSNSKYSDGWIEIATFGIYSPTALAEYDIPCPVMNLGLGVERLAMILHDAPDIRSLTYPQIPQYSEWEMSDSELAKQVFVDKTPETPEGREIADAVVAQCELHGEEPSPCEFPAWEGEVCGRKVKVSVIEPEENTKLCGPAAFNEVVTYQGDILGIPNTKKWQKAFENHSAMAGIRFIEAFAAQAAREIEEAAMSGADEHIVRVRIVKVPSEVNIKIGATAQRYITGKNKKIDMRGPIFTSAKAEFE |
*Uses the base truncation mutant reported in reference (3), with an additional mutation of our own design.
**Truncated version of the sulfotyrosine tRNA synthetase mutant from reference (2). The full-length mutant is patented (U.S. Pat. No. 8,114,652 B2).
TABLE B
Edmanase | Sequence |
SEQ ID NO: 29 | APAAVDWRARGAVTAVKDSGQCGSGWAFAAIGNVECQWFLAGHPLTNLSEQMLVSCDKTDSGCSSGLMDNAFEWIVQENNGAVYTEDSYPYASATGISPPCTTSGHTVGATITGHVELPQDEAQIAAWLAVNGPVAVCVDASSWMTYTGGVMTSCVSESYDHGVLLVGYNDSHKVPYWIIKNSWTTQWGEEGYIRIAKGSNQCLVKEEASSAVVG |
- 1. Ingolia N T, Ghaemmaghami S, Newman J R S, Weissman J S. Genome-wide analysis in vivo of translation with nucleotide resolution using ribosome profiling. Science. 2009, 324: 218.
- 2. Grimsrud P A, Swaney D L, Wenger C D, Beauchene N A, Coon J J. Phosphoproteomics for the masses. ACS Chem Biol. 2010, 5: 105-119.
- 3. Duncan M W, Aebersold R, Caprioli R M. The pros and cons of peptide-centric proteomics. Nat Biotechnol. 2010.
- 4. Gillette M A, Mani D R, Carr S A. Place of Pattern in Proteomic Biomarker Discovery. J Proteome Res. 2005, 4: 1143-1154.
- 5. Anderson N L, Anderson N G. The human plasma proteome: history, character, and diagnostic prospects. Mol Cell Proteomics. 2002.
- 6. Edman P. Method for determination of the amino acid sequence in peptides. Acta Chem Scand. 1950, 4: 283-293.
- 7. Mitra R D, Tessler L A. Single Molecule Protein Screening. WO 2010/065531 A1.
- 8. Tessler L A, Donahoe C D, Garcia D J, Jun Y S, Elbert D L, Mitra R D. Nanogel surface coatings for improved single-molecule imaging substrates. J R Soc Interface. 2011.
- 9. Tessler L A, Reifenberger J G, Mitra R D. Protein Quantification in Complex Mixtures by Solid Phase Single Molecule Counting. Anal Chem. 2009, 81: 7141-7148.
- 10. Emmert-Buck M R, Bonner R F, Smith P D, Chuaqui R F, Zhuang Z, Goldstein S R, Weiss R A, Liotta L A. Laser capture microdissection. Science. 1996, 274: 998.
- 11. Havranek J J, Harbury P B. Automated design of specificity in molecular recognition. Nat Struct Biol. 2003, 10: 45-52.
- 12. Ashworth J, Havranek J J, Duarte C M, Sussman D, Monnat R J Jr, Stoddard B L, Baker D. Computational redesign of endonuclease DNA binding and cleavage specificity. Nature. 2006.
- 13. Ashworth J, Taylor G K, Havranek J J, Quadri S A, Stoddard B L, Baker D. Computational reprogramming of homing endonuclease specificity at multiple adjacent base pairs. Nucleic Acids Res. 2010, 38: 5601.
- 14. Havranek J J, Baker D. Motif-directed flexible backbone design of functional interactions. Protein Sci. 2009, 18: 1293-1305.
- 15. Berman H, Henrick K, Nakamura H, Markley J L. The worldwide Protein Data Bank (wwPDB): ensuring a single, uniform archive of PDB data. Nucleic Acids Res. 2006, 35: D301.
- 16. Schmitt E, Tanrikulu I C, Yoo T H, Panvert M, Tirrell D A, Mechulam Y. Switching from an induced-fit to a lock-and-key mechanism in an aminoacyl-tRNA synthetase with modified specificity. J Mol Biol. 2009, 394: 843-851.
- 17. Studier F W. Protein production by auto-induction in high-density shaking cultures. Protein Expr Purif. 2005, 41: 207-234.
- 18. Wolf Y I, Aravind L, Grishin N V, Koonin E V. Evolution of aminoacyl-tRNA synthetases-analysis of unique domain architectures and phylogenetic trees reveals a complex history of horizontal gene transfer events. Genome Res. 1999, 9: 689.
- 19. Finn R D, Tate J, Mistry J, Coggill P C, Sammut S J, Hotz H R, Ceric G, Forslund K, Eddy S R, Sonnhammer E L, Bateman A. The Pfam protein families database. Nucleic Acids Res. 2008, 36: D281-8.
- 20. Augustine J, Francklyn C. Design of an active fragment of a class II aminoacyl-tRNA synthetase and its significance for synthetase evolution. Biochemistry. 1997, 36: 3473-3482.
- 21. Arnez J G, Augustine J G, Moras D, Francklyn C S. The first step of aminoacylation at the atomic level in histidyl-tRNA synthetase. Proc Natl Acad Sci USA. 1997, 94: 7144.
- 22. Holm L, Rosenstrom P. Dali server: conservation mapping in 3D. Nucleic Acids Res. 2010, 38: W545.
- 23. Kavran J M, Gundllapalli S, O'Donoghue P, Englert M, Soll D, Steitz T A. Structure of pyrrolysyl-tRNA synthetase, an archaeal enzyme for genetic code innovation. Proc Natl Acad Sci USA. 2007, 104: 11268.
- 24. Kuhlman B, Baker D. Native protein sequences are close to optimal for their structures. Proc Natl Acad Sci USA. 2000, 97: 10383-10388.
- 25. Barrett G C, Penglis A J. Edman Stepwise degradation of polypeptides: a new strategy employing mild basic cleavage conditions. Tetrahedron Lett. 1985, 26: 4375-4378.
- 26. Celej M S, Montich G G, Fidelio G D. Protein stability induced by ligand binding correlates with changes in protein flexibility. Protein Sci. 2003, 12: 1496-1506.
- 27. Choe Y, Brinen L S, Price M S, Engel J C, Lange M, Grisostomi C, Weston S G, Pallai P V, Cheng H, Hardy L W. Development of α-keto-based inhibitors of cruzain, a cysteine protease implicated in Chagas disease. Bioorg Med Chem. 2005, 13: 2141-2156.
- 28. Carter P, Wells J A. Engineering enzyme specificity by “substrate-assisted catalysis”. Science. 1987, 237: 394.
- 29. McGrath M E. The lysosomal cysteine proteases. Annu Rev Biophys Biomol Struct. 1999, 28: 181-204.
- 30. Jiang L, Althoff E A, Clemente F R, Doyle L, Rothlisberger D, Zanghellini A, Gallaher J L, Betker J L, Tanaka F, Barbas C F 3rd, Hilvert D, Houk K N, Stoddard B L, Baker D. De novo computational design of retro-aldol enzymes. Science. 2008, 319: 1387-1391.
- 31. Rothlisberger D, Khersonsky O, Wollacott A M, Jiang L, DeChancie J, Betker J, Gallaher J L, Althoff E A, Zanghellini A, Dym O, Albeck S, Houk K N, Tawfik D S, Baker D. Kemp elimination catalysts by computational enzyme design. Nature. 2008, 453: 190-195.
- 32. Schmidt M W, Baldridge K K, Boatz J A, Elbert S T, Gordon M S, Jensen J H, Koseki S, Matsunaga N, Nguyen K A, Su S. General atomic and molecular electronic structure system. J Comput Chem. 1993, 14: 1347-1363.
- 33. Dantas G, Corrent C, Reichow S L, Havranek J J, Eletr Z M, Isern N G, Kuhlman B, Varani G, Merritt E A, Baker D. High-resolution structural and thermodynamic analysis of extreme stabilization of human procarboxypeptidase by computational protein design. J Mol Biol. 2007, 366: 1209-1221.
- 34. Dunbrack R L. Backbone-dependent rotamer library for proteins. Application to side-chain prediction. J Mol Biol. 1993, 230: 543-574.
- 35. Chiravuri M, Agarraberes F, Mathieu S L, Lee H, Huber B T. Vesicular localization and characterization of a novel post-proline-cleaving aminodipeptidase, quiescent cell proline dipeptidase. J Immunol. 2000, 165: 5695.
- 36. Fukunaga R, Yokoyama S. Structural insights into the first step of RNA-dependent cysteine biosynthesis in archaea. Nat Struct Mol Biol. 2007, 14: 272-279.
- 37. Liu C C, Schultz P G. Recombinant expression of selectively sulfated proteins in Escherichia coli. Nat Biotechnol. 2006, 24: 1436-1440.
- 38. Turner J M, Graziano J, Spraggon G, Schultz P G. Structural characterization of a p-acetylphenylalanyl aminoacyl-tRNA synthetase. J Am Chem Soc. 2005, 127: 14976-14977.
- 39. Xie J, Supekova L, Schultz P G. A genetically encoded metabolically stable analogue of phosphotyrosine in Escherichia coli. ACS Chem Biol. 2007, 2: 474-478.
- 40. O'Brien P J, Herschlag D. Catalytic promiscuity and the evolution of new enzymatic activities. Chem Biol. 1999, 6: R91-R105.
Claims (32)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US17/088,898 US20210072252A1 (en) | 2013-03-15 | 2020-11-04 | Molecules and methods for iterative polypeptide analysis and processing |
Applications Claiming Priority (5)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US201361798705P | 2013-03-15 | 2013-03-15 | |
US14/211,448 US9435810B2 (en) | 2013-03-15 | 2014-03-14 | Molecules and methods for iterative polypeptide analysis and processing |
US15/255,433 US20170052194A1 (en) | 2013-03-15 | 2016-09-02 | Molecules and methods for iterative polypeptide analysis and processing |
US16/907,813 US10852305B2 (en) | 2013-03-15 | 2020-06-22 | Molecules and methods for iterative polypeptide analysis and processing |
US17/088,898 US20210072252A1 (en) | 2013-03-15 | 2020-11-04 | Molecules and methods for iterative polypeptide analysis and processing |
Related Parent Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US16/907,813 Continuation US10852305B2 (en) | 2013-03-15 | 2020-06-22 | Molecules and methods for iterative polypeptide analysis and processing |
Publications (1)
Publication Number | Publication Date |
---|---|
US20210072252A1 (en) | 2021-03-11 |
Family ID: 51528735
Family Applications (4)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US14/211,448 Active 2034-08-20 US9435810B2 (en) | 2013-03-15 | 2014-03-14 | Molecules and methods for iterative polypeptide analysis and processing |
US15/255,433 Abandoned US20170052194A1 (en) | 2013-03-15 | 2016-09-02 | Molecules and methods for iterative polypeptide analysis and processing |
US16/907,813 Active US10852305B2 (en) | 2013-03-15 | 2020-06-22 | Molecules and methods for iterative polypeptide analysis and processing |
US17/088,898 Abandoned US20210072252A1 (en) | 2013-03-15 | 2020-11-04 | Molecules and methods for iterative polypeptide analysis and processing |
Family Applications Before (3)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US14/211,448 Active 2034-08-20 US9435810B2 (en) | 2013-03-15 | 2014-03-14 | Molecules and methods for iterative polypeptide analysis and processing |
US15/255,433 Abandoned US20170052194A1 (en) | 2013-03-15 | 2016-09-02 | Molecules and methods for iterative polypeptide analysis and processing |
US16/907,813 Active US10852305B2 (en) | 2013-03-15 | 2020-06-22 | Molecules and methods for iterative polypeptide analysis and processing |
Country Status (1)
Country | Link |
---|---|
US (4) | US9435810B2 (en) |
Families Citing this family (32)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US10337054B2 (en) | 2004-02-02 | 2019-07-02 | Quantum-Si Incorporated | Enrichment of nucleic acid targets |
US9566335B1 (en) | 2009-09-25 | 2017-02-14 | The Governing Council Of The University Of Toronto | Protein sequencing method and reagents |
US9625469B2 (en) | 2011-06-23 | 2017-04-18 | Board Of Regents, The University Of Texas System | Identifying peptides at the single molecule level |
US11435358B2 (en) | 2011-06-23 | 2022-09-06 | Board Of Regents, The University Of Texas System | Single molecule peptide sequencing |
EP3543703A1 (en) | 2014-09-15 | 2019-09-25 | Board of Regents, The University of Texas System | Improved single molecule peptide sequencing |
US10174363B2 (en) | 2015-05-20 | 2019-01-08 | Quantum-Si Incorporated | Methods for nucleic acid sequencing |
US11130986B2 (en) | 2015-05-20 | 2021-09-28 | Quantum-Si Incorporated | Method for isolating target nucleic acid using heteroduplex binding proteins |
WO2017063093A1 (en) | 2015-10-16 | 2017-04-20 | Andrew Emili | Protein sequencing methods and reagents |
EP4299803A3 (en) | 2016-05-02 | 2024-03-27 | Encodia, Inc. | Macromolecule analysis employing nucleic acid encoding |
CA3047321A1 (en) | 2016-12-19 | 2018-06-28 | Quantum-Si Incorporated | Polymerizing enzymes for sequencing reactions |
AU2018261007A1 (en) | 2017-05-05 | 2019-11-07 | Quantum-Si Incorporated | Substrates having modified surface reactivity and antifouling properties in biological reactions |
KR102626317B1 (en) | 2017-07-24 | 2024-01-18 | Quantum-Si Incorporated | High-Intensity Labeled Reactant Compositions and Sequencing Methods |
GB201715684D0 (en) * | 2017-09-28 | 2017-11-15 | Univ Gent | Means and methods for single molecule peptide sequencing |
AU2018358057B2 (en) | 2017-10-31 | 2023-03-02 | Encodia, Inc. | Kits for analysis using nucleic acid encoding and/or label |
BR112021008098A2 (en) | 2018-11-15 | 2021-08-10 | Quantum-Si Incorporated | methods and compositions for protein sequencing |
JP2022523624A (en) | 2019-01-08 | 2022-04-26 | Massachusetts Institute of Technology | Monomolecular protein and peptide sequencing |
WO2020154546A1 (en) | 2019-01-23 | 2020-07-30 | Quantum-Si Incorporated | High intensity labeled reactant compositions and methods for sequencing |
CA3134776A1 (en) * | 2019-03-26 | 2020-10-01 | Encodia, Inc. | Modified cleavases, uses thereof and related kits |
US11427814B2 (en) | 2019-03-26 | 2022-08-30 | Encodia, Inc. | Modified cleavases, uses thereof and related kits |
GB201904697D0 (en) | 2019-04-03 | 2019-05-15 | Vib Vzw | Means and methods for single molecule peptide sequencing |
EP3963070A4 (en) | 2019-04-30 | 2023-02-22 | Encodia, Inc. | Methods for preparing analytes and related kits |
US11346842B2 (en) | 2019-06-20 | 2022-05-31 | Massachusetts Institute Of Technology | Single molecule peptide sequencing methods |
KR20220029708A (en) | 2019-06-28 | 2022-03-08 | Quantum-Si Incorporated | Polymerases for Sequencing Reactions |
AU2020364058A1 (en) | 2019-10-11 | 2022-05-26 | Quantum-Si Incorporated | Surface modification in the vapor phase |
BR112022007937A2 (en) * | 2019-10-28 | 2022-08-30 | Quantum Si Inc | METHODS OF SEQUENCING AND RECONSTRUCTION OF A SINGLE POLYPEPTIDE |
JP2023500485A (en) * | 2019-10-28 | 2023-01-06 | Quantum-Si Incorporated | Methods for single cell protein and nucleic acid sequencing |
JP2023500486A (en) * | 2019-10-28 | 2023-01-06 | Quantum-Si Incorporated | Methods, Kits, and Devices for Preparing Samples for Multiplexed Polypeptide Sequencing |
AU2021210878A1 (en) | 2020-01-21 | 2022-09-15 | Quantum-Si Incorporated | Compounds and methods for selective C-terminal labeling |
GB2610078A (en) * | 2020-04-13 | 2023-02-22 | Erisyon Inc | Single molecule N-terminal sequencing using electrical signals |
WO2023196642A1 (en) * | 2022-04-08 | 2023-10-12 | Glyphic Biotechnologies, Inc. | Methods and systems for processing polymeric analytes |
WO2024015875A2 (en) * | 2022-07-12 | 2024-01-18 | Abrus Bio, Inc. | Determination of protein information by recoding amino acid polymers into dna polymers |
WO2024072614A1 (en) * | 2022-09-27 | 2024-04-04 | Nautilus Subsidiary, Inc. | Polypeptide capture, in situ fragmentation and identification |
Family Cites Families (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20040005582A1 (en) | 2000-08-10 | 2004-01-08 | Nanobiodynamics, Incorporated | Biospecific desorption microflow systems and methods for studying biospecific interactions and their modulators |
US7611834B2 (en) | 2005-09-30 | 2009-11-03 | Sandia Corporation | Methods and devices for protein assays |
US20070218503A1 (en) | 2006-02-13 | 2007-09-20 | Mitra Robi D | Methods of polypeptide identification, and compositions therefor |
US8340951B2 (en) | 2007-12-13 | 2012-12-25 | University Of Washington | Synthetic enzymes derived from computational design |
WO2010065531A1 (en) | 2008-12-01 | 2010-06-10 | Robi David Mitra | Single molecule protein screening |
2014
- 2014-03-14 US14/211,448 (US9435810B2): Active
2016
- 2016-09-02 US15/255,433 (US20170052194A1): Abandoned
2020
- 2020-06-22 US16/907,813 (US10852305B2): Active
- 2020-11-04 US17/088,898 (US20210072252A1): Abandoned
Also Published As
Publication number | Publication date |
---|---|
US20200319200A1 (en) | 2020-10-08 |
US9435810B2 (en) | 2016-09-06 |
US20140273004A1 (en) | 2014-09-18 |
US20170052194A1 (en) | 2017-02-23 |
US10852305B2 (en) | 2020-12-01 |
Similar Documents
Publication | Title | Publication Date |
---|---|---|
US10852305B2 (en) | Molecules and methods for iterative polypeptide analysis and processing | |
Webb et al. | Identification of protein N-terminal methyltransferases in yeast and humans | |
US9566335B1 (en) | Protein sequencing method and reagents | |
Stiffler et al. | Uncovering quantitative protein interaction networks for mouse PDZ domains using protein microarrays | |
US20040110186A1 (en) | Methods for isolating and labeling sample molecules | |
Merkel et al. | Functional protein microarrays: just how functional are they? | |
CA2745197A1 (en) | Concurrent identification of multitudes of polypeptides | |
Turk | Understanding and exploiting substrate recognition by protein kinases | |
US20150087526A1 (en) | Peptide identification and sequencing by single-molecule detection of peptides undergoing degradation | |
RU2714156C2 (en) | C5-binding polypeptides | |
Chojnacki et al. | Polyubiquitin-photoactivatable crosslinking reagents for mapping ubiquitin interactome identify Rpn1 as a proteasome ubiquitin-associating subunit | |
Lee et al. | Accurate MALDI-TOF/TOF Sequencing of One-Bead-One-Compound Peptide Libraries with Application to the Identification of Multiligand Protein Affinity Agents Using in Situ Click Chemistry Screening | |
Huse et al. | Semisynthesis of hyperphosphorylated type I TGFβ receptor: addressing the mechanism of kinase activation | |
Zhou et al. | Site-selective protein immobilization by covalent modification of GST fusion proteins | |
Trinh et al. | Profiling the substrate specificity of protein kinases by on-bead screening of peptide libraries | |
Pellegrini et al. | Mapping the subsite preferences of protein tyrosine phosphatase PTP-1B using combinatorial chemistry approaches | |
Jung et al. | A fusion protein expression analysis using surface plasmon resonance imaging | |
Tivon et al. | Covalent flexible peptide docking in Rosetta | |
Hayne et al. | We FRET so you don’t have to: New models of the lipoprotein lipase dimer | |
Yeung et al. | Inference of multisite phosphorylation rate constants and their modulation by pathogenic mutations | |
Warthaka et al. | Phosphopeptide Modification and Enrichment by Oxidation–Reduction Condensation | |
US20060105407A1 (en) | Protein chip for analyzing interaction between protein and substrate peptide thereof | |
Kunys et al. | Specificity Profiling of Protein‐Binding Domains Using One‐Bead‐One‐Compound Peptide Libraries | |
WO2004065928A2 (en) | Light induced immobilisation | |
Sun et al. | Peptide microarrays for high-throughput studies of Ser/Thr phosphatases |
Legal Events
Code | Title | Description |
---|---|---|
AS | Assignment | Owner name: WASHINGTON UNIVERSITY, MISSOURI. Free format text: ASSIGNMENT OF ASSIGNORS INTEREST; ASSIGNORS: HAVRANEK, JAMES; BORGO, BENJAMIN; REEL/FRAME: 054270/0315. Effective date: 20120828 |
STPP | Information on status: patent application and granting procedure in general | Free format text: APPLICATION DISPATCHED FROM PREEXAM, NOT YET DOCKETED |
STPP | Information on status: patent application and granting procedure in general | Free format text: DOCKETED NEW CASE - READY FOR EXAMINATION |
STPP | Information on status: patent application and granting procedure in general | Free format text: RESPONSE TO NON-FINAL OFFICE ACTION ENTERED AND FORWARDED TO EXAMINER |
STPP | Information on status: patent application and granting procedure in general | Free format text: NON FINAL ACTION MAILED |
STPP | Information on status: patent application and granting procedure in general | Free format text: RESPONSE TO NON-FINAL OFFICE ACTION ENTERED AND FORWARDED TO EXAMINER |
STPP | Information on status: patent application and granting procedure in general | Free format text: NON FINAL ACTION MAILED |
STCB | Information on status: application discontinuation | Free format text: ABANDONED -- FAILURE TO RESPOND TO AN OFFICE ACTION |