US20210101945A1 - Polypeptides Capable of Forming Homo-Oligomers with Modular Hydrogen Bond Network-Mediated Specificity and Their Design
- Publication number
- US20210101945A1 (application US16/993,975)
- Authority
- US
- United States
- Prior art keywords
- hydrogen bond
- residue
- networks
- seq
- network
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- C—CHEMISTRY; METALLURGY
- C07—ORGANIC CHEMISTRY
- C07K—PEPTIDES
- C07K14/00—Peptides having more than 20 amino acids; Gastrins; Somatostatins; Melanotropins; Derivatives thereof
- C07K14/435—Peptides having more than 20 amino acids; Gastrins; Somatostatins; Melanotropins; Derivatives thereof from animals; from humans
-
- C—CHEMISTRY; METALLURGY
- C07—ORGANIC CHEMISTRY
- C07K—PEPTIDES
- C07K1/00—General methods for the preparation of peptides, i.e. processes for the organic chemical preparation of peptides or proteins of any length
- C07K1/107—General methods for the preparation of peptides, i.e. processes for the organic chemical preparation of peptides or proteins of any length by chemical modification of precursor peptides
- C07K1/113—General methods for the preparation of peptides, i.e. processes for the organic chemical preparation of peptides or proteins of any length by chemical modification of precursor peptides without change of the primary structure
-
- C—CHEMISTRY; METALLURGY
- C07—ORGANIC CHEMISTRY
- C07K—PEPTIDES
- C07K14/00—Peptides having more than 20 amino acids; Gastrins; Somatostatins; Melanotropins; Derivatives thereof
- C07K14/001—Peptides having more than 20 amino acids; Gastrins; Somatostatins; Melanotropins; Derivatives thereof by chemical synthesis
-
- C—CHEMISTRY; METALLURGY
- C07—ORGANIC CHEMISTRY
- C07K—PEPTIDES
- C07K16/00—Immunoglobulins [IGs], e.g. monoclonal or polyclonal antibodies
- C07K16/18—Immunoglobulins [IGs], e.g. monoclonal or polyclonal antibodies against material from animals or humans
-
- C—CHEMISTRY; METALLURGY
- C07—ORGANIC CHEMISTRY
- C07K—PEPTIDES
- C07K2/00—Peptides of undefined number of amino acids; Derivatives thereof
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B15/00—ICT specially adapted for analysing two-dimensional or three-dimensional molecular structures, e.g. structural or functional relations or structure alignment
- G16B15/20—Protein or domain folding
Definitions
- Hydrogen bonds play key roles in the structure, function, and interaction specificity of biomolecules.
- the DNA double helix elegantly resolves both challenges; paired bases come together such that all buried polar atoms make hydrogen bonds that are self-contained between the two bases and have near ideal geometry.
- a computing device determines a search space for hydrogen bond networks related to one or more molecules.
- the search space includes a plurality of energy terms related to a plurality of residues related to the hydrogen bond networks.
- the computing device searches the search space to identify one or more hydrogen bond networks based on the plurality of energy terms.
- the computing device screens the identified one or more hydrogen bond networks to identify one or more screened hydrogen bond networks based on scores for the one or more identified hydrogen bond networks.
- the computing device generates an output related to the one or more screened hydrogen bond networks.
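- The screening step described above can be pictured with a minimal sketch. It assumes each identified network already carries a single numeric score where lower is better (as for an energy); the function name, the residue labels, and the cutoff values are hypothetical and are not taken from the disclosure.

```python
def screen_networks(scored_networks, max_score, keep=None):
    """Keep networks scoring at or below max_score, best (lowest) first.

    scored_networks: iterable of (network, score) pairs (hypothetical layout).
    keep: optional cap on how many screened networks are returned.
    """
    passed = [(net, score) for net, score in scored_networks if score <= max_score]
    passed.sort(key=lambda item: item[1])
    return passed if keep is None else passed[:keep]


# Illustrative data only: residue labels and scores are made up.
identified = [
    (("N12", "Q45", "S78"), -3.2),
    (("T9", "N30", "Y61"), -1.1),
    (("S4", "Q50", "H66"), 0.8),
]

screened = screen_networks(identified, max_score=-1.0)
for network, score in screened:
    print(network, score)
```

- The output of such a step would be the ordered list of screened networks; a real scoring function would combine several energy terms rather than a single precomputed number.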
- in another aspect, a computing device includes one or more data processors and a computer-readable medium.
- the computer-readable medium is configured to store at least computer-readable instructions that, when executed, cause the computing device to perform functions.
- the functions include: determining a search space for hydrogen bond networks related to one or more molecules, where the search space includes a plurality of energy terms related to a plurality of residues related to the hydrogen bond networks; searching the search space to identify one or more hydrogen bond networks based on the plurality of energy terms; screening the identified one or more hydrogen bond networks to identify one or more screened hydrogen bond networks based on scores for the one or more identified hydrogen bond networks; and generating an output related to the one or more screened hydrogen bond networks.
- in another aspect, a computer-readable medium is configured to store at least computer-readable instructions that, when executed by one or more processors of a computing device, cause the computing device to perform functions.
- the functions include: determining a search space for hydrogen bond networks related to one or more molecules, where the search space includes a plurality of energy terms related to a plurality of residues related to the hydrogen bond networks; searching the search space to identify one or more hydrogen bond networks based on the plurality of energy terms; screening the identified one or more hydrogen bond networks to identify one or more screened hydrogen bond networks based on scores for the one or more identified hydrogen bond networks; and generating an output related to the one or more screened hydrogen bond networks.
- in another aspect, an apparatus includes: means for determining a search space for hydrogen bond networks related to one or more molecules, where the search space includes a plurality of energy terms related to a plurality of residues related to the hydrogen bond networks; means for searching the search space to identify one or more hydrogen bond networks based on the plurality of energy terms; means for screening the identified one or more hydrogen bond networks to identify one or more screened hydrogen bond networks based on scores for the one or more identified hydrogen bond networks; and means for generating an output related to the one or more screened hydrogen bond networks.
- the invention provides polypeptides comprising an amino acid sequence that is at least 75% identical over its full length to the amino acid sequence selected from the group consisting of SEQ ID NOS:2-79.
- polypeptides comprising or consisting of the amino acid sequence of Formula 1:
- Z1 is a helix initiating sequence comprising the amino acid sequence of Formula 2:
- Z3 is a helix connecting sequence having the amino acid sequence of Formula 3:
- Z5 is a helix terminating sequence comprising the amino acid sequence of Formula 4:
- Z2 is selected from the group consisting of general formulae BX1BX2, X1BBX2, X1BX2B, X1X2BB, BX1X2B, and BBX1X2, wherein:
- Z4 is selected from the group consisting of general formulae B2X3B2X4, X3B2B2X4, X3B2X4B2, X2X2B2B2, B2X1X2B2, and B2B2X1X2, wherein
- B2 is xx-L-A-xx-xx-Q-xx
- the invention provides nucleic acids that encode the polypeptides of the invention, expression vectors comprising the nucleic acids of the invention operatively linked to a promoter sequence, and host cells comprising the expression vectors.
- FIG. 1 Overview of the HBNet™ method and design strategy.
- (A) (left) All sidechain conformations (rotamers) of polar amino acid types considered for design at each residue position; (middle) many combinations of hydrogen-bonding rotamers are possible and the challenge is to traverse this space and extract (right) networks of connected hydrogen bonds.
- (B-D) HBNet™.
- HBNet™ precomputes the hydrogen bond and steric repulsive interaction energies between sidechain rotamers at all pairs of positions and stores them in a graph structure; nodes are residue positions, residue pairs close enough to interact are connected by edges, and for each edge there is an interaction energy matrix; yellow indicates rotamer pairs with energies below a specified threshold (hydrogen bonds with good geometry and little steric repulsion). Traversing the graph elucidates all possible connectivities of hydrogen bonding rotamers (networks) that do not clash with each other.
- (E-G) Design strategy:
- (F) HBNet™ is applied to parametric backbones to identify the best hydrogen bond networks. (G) Networks are maintained while remaining residue positions are designed in the context of the assembled symmetric oligomer.
- FIG. 2 The outer ring of helices increases thermostability and can overcome poor helical propensity of the inner helices.
- (A) CD spectrum (260-195 nm) of design 2L4HC2_23 at 25° C. (blue), 75° C. (red), 95° C. (green), and 25° C. after cooling (purple).
- (B) 2L4HC2_23, denaturation by GdmCl monitored at 222 nm;
- (C) 2L4HC2_9, a supercoiled C2 homodimer colored by chain, looking down the supercoil axis.
- (D) CD spectrum of 2L4HC2_9 as in (A).
- (E) Inner ring design of 2L4HC2_9.
- FIG. 3 Structural characterization by x-ray crystallography.
- (A-F) Crystal structures (white) are superimposed onto the design models for six different topologies; (left) the full backbone is shown with cross-sections corresponding to the (middle) designed hydrogen bond networks; panel outline color corresponds to cross-section color on the left; RMSD over all network residue heavy-atoms is reported inside each panel.
- FIG. 4 Structural characterization by small angle x-ray scattering (SAXS).
- (left) backbones and (middle) h-bond networks for the design models are displayed as in FIG. 3;
- (right) design models were fit to experimental scattering data (black) using FoXS; χ² values of fit (χ) are indicated inside each panel.
- (A) 5L8HC4_6 (χ = 1.36), an untwisted C4 homotetramer with two identical h-bond networks.
- FIG. 5 The hydrogen bond networks confer specificity.
- (A) Interaction surfaces of monomer subunits for six structurally verified designs, ordered by increasing contiguous hydrophobic interface area, as calculated by h-patch; hydrogen bond network residues are colored.
- (B) Binding heat-map from yeast two-hybrid assay. Designs in (A) were fused to both DNA-binding domain and activation domain constructs, and binding was measured by determining the cell growth rate (maximum ΔOD/hour): darker cells indicate more rapid growth, hence stronger binding; values are the average of at least 3 biological replicates. The heat-map is ordered as in (A), and designs with more extensive networks and better-partitioned hydrophobic interface area exhibit higher interaction specificity.
- (C-G) Modular networks confer specificity in a programmable fashion.
- (C) The backbone corresponding to designs 2L6HC3_13 (FIG. 3A) and 2L6HC3_6 (FIG. 3B) can accommodate different networks at each of four repeating geometric cross-sections.
- (D) Three possibilities for each cross-section: Network “A”, Network “B”, or hydrophobic, “X”.
- (E) Combinatorial designs using this three-letter “alphabet” were tested for interaction specificity using the yeast two-hybrid assay as in (B). Axis labels denote the network pattern; for example, “AXBX” indicates Network A at cross-section 1, Network B at cross-section 3, and X (hydrophobic) at the two others.
- AAXX, XXBB, and XXXX correspond to designs 2L6HC3_13, 2L6HC3_6, and 2L6HC3_1, respectively.
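- The three-letter alphabet over four cross-sections can be enumerated and decoded with a short sketch. The helper names are hypothetical, and the enumeration lists every pattern the alphabet allows, which is not a claim that all of them were built or tested.

```python
from itertools import product

ALPHABET = ("A", "B", "X")   # Network A, Network B, hydrophobic
N_SECTIONS = 4               # four repeating geometric cross-sections
MEANING = {"A": "Network A", "B": "Network B", "X": "hydrophobic"}


def all_patterns():
    """Every pattern that can be written with the alphabet, e.g. 'AXBX'."""
    return ["".join(p) for p in product(ALPHABET, repeat=N_SECTIONS)]


def decode(pattern):
    """Map each cross-section number (1-based) to what occupies it."""
    if len(pattern) != N_SECTIONS or set(pattern) - set(ALPHABET):
        raise ValueError(f"not a valid network pattern: {pattern!r}")
    return {i: MEANING[letter] for i, letter in enumerate(pattern, start=1)}


print(len(all_patterns()))   # 3**4 possible patterns
print(decode("AXBX"))
```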
- FIG. 6 is a block diagram of an example computing network, in accordance with an example embodiment.
- FIG. 7A is a block diagram of an example computing device, in accordance with an example embodiment.
- FIG. 7B depicts a network of computing devices arranged as a cloud-based server system, in accordance with an example embodiment.
- FIG. 8 is a flowchart of a method, in accordance with an example embodiment.
- amino acid residues are abbreviated as follows: alanine (Ala; A), asparagine (Asn; N), aspartic acid (Asp; D), arginine (Arg; R), cysteine (Cys; C), glutamic acid (Glu; E), glutamine (Gln; Q), glycine (Gly; G), histidine (His; H), isoleucine (Ile; I), leucine (Leu; L), lysine (Lys; K), methionine (Met; M), phenylalanine (Phe; F), proline (Pro; P), serine (Ser; S), threonine (Thr; T), tryptophan (Trp; W), tyrosine (Tyr; Y), and valine (Val; V).
- the invention provides polypeptides comprising an amino acid sequence that is at least 75% identical over its full length to the amino acid sequence selected from the group consisting of SEQ ID NOS:2-79.
- AXAX (SEQ ID NO: 2) TRTRSLREQEEIIRELERSLREQEELLRELERLQREGSSDEDVRELLREIKKLAREQKYLVEELKKLAREQKRQD
- XAAX (SEQ ID NO: 3) TRTEIIRELERSLREQERSLREQEELLRELERLQREGSSDEDVRELLREIKKLAREQKKLAREQKYLVEELKRQD
- XAXA (SEQ ID NO: 4) TRTEIIRELERSLREQEELAKRLKRSLREQERLQREGSSDEDVRKLAREQKELVEEIEKLAREQKYLVEELKRQD
- XXAA (SEQ ID NO: 5) TRTEIIRELEELAKRLKRSLREQERSLREQERLQREGSSDEDVRKLAREQKKLAREQKELVEEIEYLVEELKRQD
- Polypeptide Nomenclature The name of the polypeptides shown below indicates oligomerization state and topology, and sequences below are organized by topology and oligomerization state.
- the first two characters indicate supercoil geometry.
- ‘2L’ refers to a two-layer heptad repeat that results in a left-handed supercoil
- ‘3L’ refers to a three-layer 11-residue repeat with a right-handed supercoil
- ‘5L’ refers to untwisted designs with a five-layer 18-residue repeat and straight helices (no supercoiling), where “layer” in this context is the number of unique repeating geometric slices, or layers, along the supercoil axis.
- the middle two characters indicate the total number of helices, and the final two indicate symmetry.
- “2L6HC3” denotes a left-handed, six-helix trimer with C3 symmetry. Underlined residues are optional.
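- The naming scheme above can be decoded mechanically. The sketch below is one possible parser; the regular expression, the function name, and the treatment of a trailing "_N" as a design index (as in names such as 2L4HC2_23 used elsewhere in the text) are assumptions for illustration.

```python
import re

# Two characters of geometry, two of helix count, two of symmetry,
# plus an optional "_N" suffix assumed here to be a design index.
NAME_RE = re.compile(r"^(?P<layers>[235])L(?P<helices>\d)H(?P<sym>C\d)(?:_(?P<design>\d+))?$")

GEOMETRY = {
    "2": "two-layer heptad repeat, left-handed supercoil",
    "3": "three-layer 11-residue repeat, right-handed supercoil",
    "5": "five-layer 18-residue repeat, untwisted (straight helices)",
}


def parse_design_name(name):
    """Split a name such as '2L6HC3' or '5L8HC4_6' into its parts."""
    match = NAME_RE.match(name)
    if match is None:
        raise ValueError(f"unrecognized design name: {name!r}")
    design = match.group("design")
    return {
        "geometry": GEOMETRY[match.group("layers")],
        "helices": int(match.group("helices")),
        "symmetry": match.group("sym"),
        "design": int(design) if design is not None else None,
    }


print(parse_design_name("2L6HC3"))
```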
- polypeptides of this aspect of the invention have been shown in the examples that follow to be capable of forming homo-oligomers with modular hydrogen bond network-mediated specificity.
- the polypeptides comprise or consist of an amino acid sequence that is at least 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or 100% identical over its full length to the amino acid sequence selected from the group consisting of SEQ ID NOS:2-79.
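- A percent-identity threshold of this kind can be illustrated with a minimal sketch. It assumes the two sequences have the same length and are compared position by position without gaps; in practice sequences are usually aligned first, and the function names here are hypothetical.

```python
def percent_identity(candidate, reference):
    """Ungapped percent identity over the full length of the reference."""
    if len(candidate) != len(reference) or not reference:
        raise ValueError("this sketch only handles non-empty, equal-length sequences")
    matches = sum(a == b for a, b in zip(candidate, reference))
    return 100.0 * matches / len(reference)


def meets_identity_threshold(candidate, reference, threshold=75.0):
    """True if the candidate is at least `threshold` percent identical."""
    return percent_identity(candidate, reference) >= threshold


# Toy four-residue example: three of four positions match.
print(percent_identity("ACDE", "ACDF"))   # 75.0
```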
- polypeptide is used in its broadest sense to refer to a sequence of subunit amino acids.
- the polypeptides of the invention may comprise L-amino acids, D-amino acids (which are resistant to L-amino acid-specific proteases in vivo), or a combination of D- and L-amino acids.
- the polypeptides described herein may be chemically synthesized or recombinantly expressed.
- the polypeptides may be linked to other compounds to promote an increased half-life in vivo, such as by PEGylation, HESylation, PASylation, glycosylation, or may be produced as an Fc-fusion or in deimmunized variants. Such linkage can be covalent or non-covalent as is understood by those of skill in the art.
- polypeptides of the invention may include additional residues at the N-terminus, C-terminus, or both that are not present in the polypeptides of the invention; these additional residues are not included in determining the percent identity of the polypeptides of the invention relative to the reference polypeptide.
- changes from the reference polypeptide are conservative amino acid substitutions.
- conservative amino acid substitution means an amino acid substitution that does not alter or substantially alter polypeptide function or other characteristics.
- a given amino acid can be replaced by a residue having similar physicochemical characteristics, e.g., substituting one aliphatic residue for another (such as Ile, Val, Leu, or Ala for one another), or substitution of one polar residue for another (such as between Lys and Arg; Glu and Asp; or Gln and Asn).
- Other such conservative substitutions, e.g., substitutions of entire regions having similar hydrophobicity characteristics, are well known.
- Polypeptides comprising conservative amino acid substitutions can be tested in any one of the assays described herein to confirm that a desired activity, e.g. antigen-binding activity and specificity of a native or reference polypeptide is retained.
- Amino acids can be grouped according to similarities in the properties of their side chains (A. L. Lehninger, in Biochemistry, second ed., pp. 73-75, Worth Publishers, New York (1975)): (1) non-polar: Ala (A), Val (V), Leu (L), Ile (I), Pro (P), Phe (F), Trp (W), Met (M); (2) uncharged polar: Gly (G), Ser (S), Thr (T), Cys (C), Tyr (Y), Asn (N), Gln (Q); (3) acidic: Asp (D), Glu (E); (4) basic: Lys (K), Arg (R), His (H).
- Naturally occurring residues can be divided into groups based on common side-chain properties: (1) hydrophobic: Norleucine, Met, Ala, Val, Leu, Ile; (2) neutral hydrophilic: Cys, Ser, Thr, Asn, Gln; (3) acidic: Asp, Glu; (4) basic: His, Lys, Arg; (5) residues that influence chain orientation: Gly, Pro; (6) aromatic: Trp, Tyr, Phe.
- Non-conservative substitutions will entail exchanging a member of one of these classes for another class.
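- The class-based rule can be sketched as a simple lookup over the six side-chain groups listed above. This covers only class membership; it does not encode any separately enumerated substitution pairs, and the names used ("Nle" for norleucine, the function names) are illustrative choices.

```python
# Side-chain classes as listed in the text; "Nle" stands for norleucine.
CLASSES = {
    "hydrophobic": {"Nle", "Met", "Ala", "Val", "Leu", "Ile"},
    "neutral hydrophilic": {"Cys", "Ser", "Thr", "Asn", "Gln"},
    "acidic": {"Asp", "Glu"},
    "basic": {"His", "Lys", "Arg"},
    "chain orientation": {"Gly", "Pro"},
    "aromatic": {"Trp", "Tyr", "Phe"},
}


def residue_class(residue):
    """Return the name of the class a three-letter residue code belongs to."""
    for name, members in CLASSES.items():
        if residue in members:
            return name
    raise KeyError(f"unknown residue: {residue!r}")


def crosses_classes(original, replacement):
    """True if the substitution exchanges a member of one class for another class."""
    return residue_class(original) != residue_class(replacement)


print(crosses_classes("Asp", "Glu"))   # False: both acidic
print(crosses_classes("Leu", "Lys"))   # True: hydrophobic to basic
```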
- Particular conservative substitutions include, for example: Ala into Gly or into Ser; Arg into Lys; Asn into Gln or into His; Asp into Glu; Cys into Ser; Gln into Asn; Glu into Asp; Gly into Ala or into Pro; His into Asn or into Gln; Ile into Leu or into Val; Leu into Ile or into Val; Lys into Arg, into Gln or into Glu; Met into Leu, into Tyr or into Ile; Phe into Met, into Leu or into Tyr; Ser into Thr; Thr into Ser; Trp into Tyr; Tyr into Trp; and/or Phe into Val, into Ile or into Leu.
- polypeptides of the invention may include additional residues at the N-terminus, C-terminus, or both.
- residues may be any residues suitable for an intended use, including but not limited to detection tags (i.e.: fluorescent proteins, antibody epitope tags, etc.), linkers, ligands suitable for purposes of purification (His tags, etc.), and peptide domains that add functionality to the polypeptides.
- polypeptides comprising or consisting of the amino acid sequence of Formula 1:
- Z1 is a helix initiating sequence comprising the amino acid sequence of Formula 2:
- Z3 is a helix connecting sequence having the amino acid sequence of Formula 3:
- Z5 is a helix terminating sequence comprising the amino acid sequence of Formula 4:
- Z2 is selected from the group consisting of general formulae BX1BX2, X1BBX2, X1BX2B, X1X2BB, BX1X2B, and BBX1X2, wherein:
- Z4 is selected from the group consisting of general formulae B2X3B2X4, X3B2B2X4, X3B2X4B2, X2X2B2B2, B2X1X2B2, and B2B2X1X2, wherein
- B2 is xx-L-A-xx-xx-Q-xx
- polypeptides of this aspect of the invention have been shown in the examples that follow to be capable of forming homo-oligomers with modular hydrogen bond network-mediated specificity.
- J3 is present.
- Z1 is TRT.
- Z3 is RLQREGSSDEDVR (SEQ ID NO: 81).
- Z5 is RQD.
- B is RSLREQE (SEQ ID NO: 82).
- O1, O4, O5, and O7 are independently selected from the group consisting of E, R, and K.
- X1 and X2 are independently selected from the group consisting of EIIRELE (SEQ ID NO: 83), ELLRELE (SEQ ID NO: 84), and ELAKRLK (SEQ ID NO: 85).
- B2 is KLAREQK (SEQ ID NO: 86).
- O12 and O15 are independently selected from the group consisting of I, L, V, and A.
- X3 and X4 are independently selected from the group consisting of [YE]-LVEELK (SEQ ID NO: 87), [YE]-LLREIK (SEQ ID NO: 88), and [YE]-LVEEIE (SEQ ID NO: 89).
- residues in brackets are alternative residues for a given position within the recited peptide domain.
- X3 and X4 are independently selected from the group consisting of ELVEELK (SEQ ID NO: 90), ELLREIK (SEQ ID NO: 91), and ELVEEIE (SEQ ID NO: 92).
- Z2 is selected from the group consisting of general formulae BX1BX2, X1BBX2, X1BX2B, and X1X2BB; and Z4 is selected from the group consisting of general formulae B2X3B2X4, X3B2B2X4, X3B2X4B2, and X2X2B2B2.
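- The modular construction can be checked against a listed sequence with a short sketch. It assumes that Formula 1 (not reproduced in this text) amounts to the concatenation Z1-Z2-Z3-Z4-Z5; the `assemble` helper is hypothetical, and the X2 module is written as ELLRELE, the form that occurs within SEQ ID NO: 2.

```python
Z1 = "TRT"
Z3 = "RLQREGSSDEDVR"   # SEQ ID NO: 81
Z5 = "RQD"
B = "RSLREQE"          # SEQ ID NO: 82
B2 = "KLAREQK"         # SEQ ID NO: 86


def assemble(z2_modules, z4_modules):
    """Concatenate Z1-Z2-Z3-Z4-Z5 (assumed reading of Formula 1)."""
    return Z1 + "".join(z2_modules) + Z3 + "".join(z4_modules) + Z5


# AXAX pattern: Z2 = B X1 B X2 and Z4 = X3 B2 X4 B2.
axax = assemble(
    [B, "EIIRELE", B, "ELLRELE"],
    ["ELLREIK", B2, "YLVEELK", B2],
)

# SEQ ID NO: 2 as listed in the text, joined across its line wrap.
SEQ_ID_NO_2 = (
    "TRTRSLREQEEIIRELERSLREQEELLRELERLQREGSSDEDVRELLR"
    "EIKKLAREQKYLVEELKKLAREQKRQD"
)

print(axax == SEQ_ID_NO_2)
```

- Under that assumption, each of the four module slots in Z2 and Z4 is seven residues long, so every combination drawn from the listed modules yields a sequence of the same length.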
- the polypeptides of this aspect of the invention comprise a polypeptide that is at least 75% identical over its full length to the amino acid sequence selected from the group consisting of SEQ ID NOS:2-5.
- the polypeptides are linked to a cargo.
- the “cargo” can be any suitable component, including but not limited to nucleic acids, peptides, small molecules, amino acids, a detectable label, etc.
- the polypeptides of the invention can be modified to facilitate covalent linkage to a “cargo” of interest.
- the polypeptides can be modified, such as by introduction of various cysteine residues at defined positions to facilitate linkage to one or more antigens of interest, such that a nanostructure of the polypeptides would provide a scaffold to provide a large number of antigens for delivery as a vaccine to generate an improved immune response.
- some or all native cysteine residues that are present in the polypeptides but not intended to be used for conjugation may be mutated to other amino acids to facilitate conjugation at defined positions.
- the polypeptides of the invention may be modified by linkage (covalent or non-covalent) with a moiety to help facilitate “endosomal escape.”
- a critical step can be escape from the endosome, a membrane-bound organelle that is the entry point of the delivery vehicle into the cell. Endosomes mature into lysosomes, which degrade their contents. Thus, if the delivery vehicle does not somehow “escape” from the endosome before it becomes a lysosome, it will be degraded and will not perform its function.
- lipids or organic polymers that disrupt the endosome and allow escape into the cytosol.
- the polypeptides can be modified, for example, by introducing cysteine residues that will allow chemical conjugation of such a lipid or organic polymer to the monomer or resulting assembly surface.
- the polypeptides can be modified, for example, by introducing cysteine residues that will allow chemical conjugation of fluorophores or other imaging agents that allow visualization of the nanostructures of the invention in vitro or in vivo.
- the invention provides homo-oligomers (e.g., homodimers, homotrimers, homotetramers, etc.) comprising a plurality of polypeptides of the present invention having the same amino acid sequence.
- the polypeptides of the invention are capable of forming homo-oligomers with modular hydrogen bond network-mediated specificity.
- the present invention provides isolated nucleic acids encoding a polypeptide of the present invention.
- the isolated nucleic acid sequence may comprise RNA or DNA.
- isolated nucleic acids are those that have been removed from their normal surrounding nucleic acid sequences in the genome or in cDNA sequences.
- Such isolated nucleic acid sequences may comprise additional sequences useful for promoting expression and/or purification of the encoded protein, including but not limited to polyA sequences, modified Kozak sequences, and sequences encoding epitope tags, export signals, secretory signals, nuclear localization signals, and plasma membrane localization signals. It will be apparent to those of skill in the art, based on the teachings herein, what nucleic acid sequences will encode the polypeptides of the invention.
- the present invention provides recombinant expression vectors comprising the isolated nucleic acid of any aspect of the invention operatively linked to a suitable control sequence.
- “Recombinant expression vector” includes vectors that operatively link a nucleic acid coding region or gene to any control sequences capable of effecting expression of the gene product.
- “Control sequences” operably linked to the nucleic acid sequences of the invention are nucleic acid sequences capable of effecting the expression of the nucleic acid molecules. The control sequences need not be contiguous with the nucleic acid sequences, so long as they function to direct the expression thereof.
- intervening untranslated yet transcribed sequences can be present between a promoter sequence and the nucleic acid sequences and the promoter sequence can still be considered “operably linked” to the coding sequence.
- Other such control sequences include, but are not limited to, polyadenylation signals, termination signals, and ribosome binding sites.
- Such expression vectors can be of any type known in the art, including but not limited to plasmid and viral-based expression vectors.
- the control sequence used to drive expression of the disclosed nucleic acid sequences in a mammalian system may be constitutive (driven by any of a variety of promoters, including but not limited to, CMV, SV40, RSV, actin, EF) or inducible (driven by any of a number of inducible promoters including, but not limited to, tetracycline, ecdysone, steroid-responsive).
- the construction of expression vectors for use in transfecting host cells is well known in the art, and thus can be accomplished via standard techniques. (See, for example, Sambrook, Fritsch, and Maniatis, in: Molecular Cloning, A Laboratory Manual, Cold Spring Harbor Laboratory Press, 1989 ; Gene Transfer and Expression Protocols , pp.
- the expression vector must be replicable in the host organisms either as an episome or by integration into host chromosomal DNA.
- the expression vector may comprise a plasmid, viral-based vector, or any other suitable expression vector.
- the present invention provides host cells that comprise the recombinant expression vectors disclosed herein, wherein the host cells can be either prokaryotic or eukaryotic.
- the cells can be transiently or stably engineered to incorporate the expression vector of the invention, using standard techniques in the art, including but not limited to standard bacterial transformations, calcium phosphate co-precipitation, electroporation, or liposome mediated-, DEAE dextran mediated-, polycationic mediated-, or viral mediated transfection.
- a method of producing a polypeptide according to the invention is an additional part of the invention.
- the method comprises the steps of (a) culturing a host according to this aspect of the invention under conditions conducive to the expression of the polypeptide, and (b) optionally, recovering the expressed polypeptide.
- the expressed polypeptide can be recovered from the cell free extract, but preferably it is recovered from the culture medium. Methods to recover polypeptide from cell free extracts or culture medium are well known to the person skilled in the art.
- HBNet™ starts by precomputing the hydrogen bonding and steric repulsion interactions between all conformations (rotameric states) of all pairs of polar sidechains. These energies are stored in a graph data structure where the nodes are residue positions, positions close in three-dimensional space are connected by edges, and for each edge there is a matrix representing the interaction energies between the different rotameric states at the two positions. HBNet™ then traverses this graph to identify all networks of three or more residues connected by low energy hydrogen bonds with little steric repulsion (FIG. 1B).
- The most extensive and lowest energy networks (FIG. 1C) are kept fixed in subsequent design calculations at the remaining residue positions. Networks with buried donors and acceptors not making hydrogen bonds (unsatisfied) are rejected (FIG. 1D). Details of the method, as well as scripts for carrying out the design calculations, are described herein.
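- The graph described above can be illustrated with a toy sketch. This is not the HBNet implementation: the data layout, the two cutoffs, and the energies are hypothetical, and the enumeration is a plain exhaustive search suitable only for small examples. Each edge between two positions holds a matrix of rotamer-pair energies; rotamer pairs at or below a hydrogen bond cutoff are treated as connected, and networks are grown so that each position contributes one rotamer and no member clashes with another.

```python
def find_networks(edge_energies, hbond_cutoff=-0.5, clash_cutoff=1.0, min_size=3):
    """Enumerate connected sets of (position, rotamer) choices.

    edge_energies maps a position pair (i, j) to a matrix whose entry
    [ri][rj] is the energy of rotamer ri at i with rotamer rj at j.
    """
    def energy(a, b):
        (i, ri), (j, rj) = a, b
        if (i, j) in edge_energies:
            return edge_energies[(i, j)][ri][rj]
        if (j, i) in edge_energies:
            return edge_energies[(j, i)][rj][ri]
        return 0.0  # no edge: positions too far apart to interact

    # Rotamer-level adjacency: pairs with a favorable (low) energy.
    adjacent = {}
    for (i, j), matrix in edge_energies.items():
        for ri, row in enumerate(matrix):
            for rj, e in enumerate(row):
                if e <= hbond_cutoff:
                    adjacent.setdefault((i, ri), set()).add((j, rj))
                    adjacent.setdefault((j, rj), set()).add((i, ri))

    seen, networks = set(), set()

    def grow(net):
        if len(net) >= min_size:
            networks.add(net)
        used_positions = {pos for pos, _ in net}
        for member in net:
            for cand in adjacent.get(member, ()):
                if cand[0] in used_positions:
                    continue  # one rotamer per residue position
                if any(energy(cand, other) > clash_cutoff for other in net):
                    continue  # candidate clashes with a network member
                bigger = net | {cand}
                if bigger not in seen:
                    seen.add(bigger)
                    grow(bigger)

    for start in adjacent:
        single = frozenset([start])
        seen.add(single)
        grow(single)
    return networks


# Toy input: positions 1-2 and 2-3 are close enough to interact.
toy_edges = {
    (1, 2): [[-1.2, 0.3], [0.0, 5.0]],   # 2 rotamers at 1, 2 rotamers at 2
    (2, 3): [[-0.9], [-1.5]],            # 2 rotamers at 2, 1 rotamer at 3
}
networks = find_networks(toy_edges)
print(networks)
```

- In this toy input only one three-residue network exists: rotamer 0 at position 1, rotamer 0 at position 2, and rotamer 0 at position 3. Rotamer 1 at position 2 bonds to position 3 but has no favorable partner at position 1, so it never reaches three residues.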
- “Two-ring” topologies were built from helical hairpin monomer subunits comprising an inner and outer helix connected by a short loop using a generalization of the Crick coiled-coil parameterization. Wide ranges of backbones were generated by systematically sampling the radii and helical phases of the inner and outer helices, the z-offset between inner and outer helices, and the overall supercoil twist (FIG. 1E). HBNet™ was then used to search these backbones for networks that span the intermolecular interface, have all heavy atom donors and acceptors satisfied, and involve at least three sidechains ( FIG.
- RosettaDesign™ was then used to optimize rotamers at the remaining residue positions in the context of the cyclic symmetry of the oligomer (FIG. 1G). Designs were ranked based on the total oligomer energy using the Rosetta™ all atom force field, filtering to remove designs with large cavities or poor packing around the networks. The top-ranked designs were evaluated using Rosetta™ “fold-and-dock” calculations. Designs with energy landscapes shaped like funnels leading into the target designed structure were identified, and a total of 114 dimeric, trimeric, and tetrameric designs spanning a broad range of superhelical parameters and hydrogen bond networks were selected for experimental characterization.
- Synthetic genes encoding the selected designs were obtained and the proteins expressed in Escherichia coli.
- the ~90% (101/114) of designs that were expressed and soluble were purified by affinity chromatography, and their oligomerization state evaluated by size-exclusion chromatography multi-angle light scattering (SEC-MALS).
- Sixty-six of the 101 were found to have the designed oligomerization state.
- the 101 soluble designs span eight different topologies; of these, the supercoiled tetramers have the largest buried interface area, yielded the fewest designs with all buried donors and acceptors satisfied, and had the lowest success rate (only 3 of the 13 soluble designs properly assembled).
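- The success rates implied by these counts can be worked out directly. The sketch below only restates the counts given in the text (114 designs tested, 101 soluble, 66 in the designed oligomerization state, 3 of 13 soluble supercoiled tetramers); the helper name is illustrative.

```python
def percent(part, whole):
    """Share of `whole` represented by `part`, in percent."""
    return 100.0 * part / whole


soluble = percent(101, 114)       # expressed and soluble, of all tested designs
assembled = percent(66, 101)      # designed oligomerization state, of soluble designs
overall = percent(66, 114)        # designed oligomerization state, of all tested designs
tetramers = percent(3, 13)        # properly assembled supercoiled tetramers

print(f"soluble: {soluble:.1f}%")              # 88.6%
print(f"assembled: {assembled:.1f}%")          # 65.3%
print(f"overall: {overall:.1f}%")              # 57.9%
print(f"supercoiled tetramers: {tetramers:.1f}%")   # 23.1%
```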
- Tested peptides include the following:
- AXAX (SEQ ID NO: 2) TRTRSLREQEEIIRELERSLREQEELLRELERLQREGSSDEDVRELLREIKKLAREQKYLVEELKKLAREQKRQD
- XAAX (SEQ ID NO: 3) TRTEIIRELERSLREQERSLREQEELLRELERLQREGSSDEDVRELLREIKKLAREQKKLAREQKYLVEELKRQD
- XAXA (SEQ ID NO: 4) TRTEIIRELERSLREQEELAKRLKRSLREQERLQREGSSDEDVRKLAREQKELVEEIEKLAREQKYLVEELKRQD
- XXAA (SEQ ID NO: 5) TRTEIIRELEELAKRLKRSLREQERSLREQERLQREGSSDEDVRKLAREQKKLAREQKELVEEIEYLVEELKRQD
- Polypeptide Nomenclature The name of the polypeptides shown below indicates oligomerization state and topology, and sequences below are organized by topology and oligomerization state.
- the first two characters indicate supercoil geometry: ‘2L’ refers to a two-layer heptad repeat that results in a left-handed supercoil; ‘3L’ refers to a three-layer 11-residue repeat with a right-handed supercoil; and ‘5L’ refers to untwisted designs with a five-layer 18-residue repeat and straight helices (no supercoiling), where “layer” in this context is the number of unique repeating geometric slices, or layers, along the supercoil axis.
- the middle two characters indicate the total number of helices, and the final two indicate symmetry.
- “2L6HC3” denotes a left-handed, six-helix trimer with C3 symmetry. Underlined residues are optional.
- FIG. 2G a supercoiled homotrimer is also folded and thermostable ( FIG. 2H ); however, the corresponding inner ring peptide ( FIG. 2I ) in isolation is unfolded ( FIG. 2J ) and monomeric.
- the sequence of this inner helix is notable because it has four Asn residues at canonical a or d heptad packing positions where Asn is destabilizing, and also because its other a and d positions are Leu and Ile respectively, which has been found to favor homotetramers.
- the two-ring design assembles to the intended trimeric structure as elucidated by x-ray crystallography ( FIG. 3A ).
- FIG. 4 shows ten crystal structures spanning a range of oligomerization states, superhelical parameters, and hydrogen bond networks.
- Designs for which crystals were not obtained were characterized by small angle x-ray scattering (SAXS) ( FIG. 4 ). Structures for three left-handed trimers, four left-handed dimers, a left-handed tetramer, and an untwisted triangle-shaped trimer were solved. Additional topologies characterized by SAXS include square-shaped untwisted tetramers ( FIG. 4A ) and dimers ( FIG. 4B ), as well as six-helix dimers (two inner, one outer helix) with either parallel right-handed ( FIG.
- Five of the x-ray crystallography-verified designs ( FIG. 3A , C-F) were also characterized by SAXS, and the experimentally determined spectra were found to closely match those computed from the design models, suggesting that very similar structures are populated in solution.
- the three left-handed trimer structures (2L6HC3_6, 2L6HC3_12, and 2L6HC3_13) are remarkably similar to the design models with sub-angstrom RMSD across all backbone Cα atoms and across all heavy atoms of the hydrogen bond networks ( FIG. 3A-B ).
- These structures are constructed with supercoil phases of 0, 120 and 240 degrees for the inner helices, and 60, 180, and 300 degrees for the outer helices; loops connect outer N-terminal helices to inner C-terminal helices (at ⁇ 60 degrees from the outer helix).
- Extensive nine or twelve-residue networks form the intended hydrogen bonds in the crystal structures ( FIGS. 3 , A and B middle).
- the four left-handed dimer crystal structures (2L4HC2_9, 2L4HC2_23, 2L4HC2_11, and 2L4HC2_24) all have the designed parallel two-ring topology.
- Two of the dimer structures have hydrogen bond networks in close agreement to the designs: 2L4HC2_9 ( FIG. 3D ) and 2L4HC2_23 ( FIG. 3E ) have 0.39 Å and 0.92 Å RMSD across all network residue heavy-atoms, respectively, and 0.39 Å and 1.16 Å RMSD over all Cα atoms.
- the other two, 2L4HC2_11 and 2L4HC2_24, have slight structural deviations from the design models caused by water displacing designed network sidechains; in the former, the interface shifts approximately 2 Å due to a buried water molecule bridging two network residues, and in the latter, the backbone is nearly identical to the design model but sidechains of the designed network are displaced by ordered water molecules.
- These two cases highlight the need for high connectivity and satisfaction (all polar atoms participating in hydrogen bonds) of the networks.
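The RMSD values quoted for the crystal structures compare matched atoms (Cα atoms or network heavy atoms) between design model and crystal structure. As a minimal sketch of that arithmetic, assuming the two coordinate sets are already superimposed and listed in the same atom order (the superposition step itself is not shown, and the function name is hypothetical):

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float, float]


def rmsd(model: Sequence[Point], crystal: Sequence[Point]) -> float:
    """Root-mean-square deviation between two pre-aligned, matched atom sets."""
    if len(model) != len(crystal) or not model:
        raise ValueError("coordinate sets must be non-empty and equal in length")
    total = 0.0
    for (x1, y1, z1), (x2, y2, z2) in zip(model, crystal):
        # Squared distance between the matched pair of atoms.
        total += (x1 - x2) ** 2 + (y1 - y2) ** 2 + (z1 - z2) ** 2
    return math.sqrt(total / len(model))
```

Passing only the network heavy atoms gives a network RMSD, and passing only Cα atoms gives a backbone RMSD, which is how the two kinds of figures above differ.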
- the left-handed tetramer structure has the designed overall topology ( FIG. 3C ), and SAXS data are in close agreement with the design model, but sidechain density was uncertain due to low (3.8 Å) resolution.
- the amino acid sequence is unrelated to any known sequence, and the top hit in structure-based searches of the Protein Data Bank (PDB) has a quite different helical bundle arrangement.
- the five antiparallel dimers (2L6Hanti_1-5) were soluble and assembled to the designed oligomeric state, with SAXS data in agreement with the design models ( FIG. 4D ).
- Design 2L6Hanti_3 contains a hydrogen bond network with a buried Tyr at the dimer interface ( FIG. 4D ).
- Designs 3L6HC2_4 ( FIG. 4C ) and 3L6HC2_7 exhibited scattering in agreement with the design models, whereas 3L6HC2_2 did not.
- Although 3L6HC2_2 was designed to form a parallel dimer, its crystal structure revealed an antiparallel dimer interface, highlighting two design lessons: first, the importance of intermolecular hydrogen bonds at the binding interface (the 3L6HC2_2 design model has only two across the interface compared to 9 in 2L6HC3_6 ( FIG. 3B )), and second, the importance of favorable hydrophobic contacts complementing the networks (the 3L6HC2_2 design model has mainly alanines at the interface).
- 5L8HC4_6 has a distinctive network with a Trp making a buried hydrogen bond at one end of the network, which then propagates outwards towards solvent, connecting to a Glu on the surface ( FIG. 4A ). It is believed that oligomers with such uniformly straight helices do not exist in nature, nor have these topologies been designed previously.
- the 2.36 Å crystal structure of the untwisted trimer (5L6HC3_1) reveals straight helices with 0.51 Å RMSD to the design model over all Cα atoms ( FIG. 3F ).
- the two hydrogen bond networks ( FIG. 3F middle), as well as the hydrophobic packing residues surrounding the networks ( FIG. 3F right), are nearly identical between the crystal structure and design model, with 0.41 Å and 0.48 Å RMSD over all network heavy-atoms.
- each of these networks contains sidechains from every helix, and helices were constructed to be uniformly symmetrical and equidistant.
- Design 2L6HC3_13 also has two additional smaller networks comprising a single symmetric Asn making two hydrogen bonds but with one polar hydrogen unsatisfied; in the crystal structure, these residues move away from the design model, displaced by water molecules.
- Two-ring structures are a new class of protein oligomers that have the potential for programmable interaction specificity analogous to that of Watson-Crick base pairing. Whereas Watson-Crick base pairing is largely limited to the antiparallel double helix, the designed protein hydrogen bond networks allow the specification of two-ring structures with a range of oligomerization states (dimers, trimers, and tetramers) and supercoil geometries. Adding an outer ring of helices to enable hydrogen bond networks extends upon elegant studies from Keating, Woolfson, and others demonstrating the designability of coiled coils with a wide range of hetero and homo-oligomeric specificities.
- the design models and crystal structures show that a wide range of hydrogen bond network composition and geometry are possible in repeating two-ring topologies, and that multiple networks can be engineered into the same backbone at varying positions without sacrificing thermostability, enabling stable building blocks with uniform shape but orthogonal binding interfaces ( FIG. 5 ).
- the DNA nanotechnology field has demonstrated that a spectacular array of shapes and interactions can be built from a relatively limited set of hydrogen bonding interactions. It should now become possible to develop new protein-based materials with the advantages of both polymers: DNA-like programmability and tunable specificity, coupled with the geometric variability, interaction diversity, and catalytic function intrinsic to proteins.
- HBNetTM Hydrogen Bond Network Method
- the HBNetTM method can include three steps. First, an exhaustive but efficient search identifies the hydrogen bond networks possible within a given search space (which consists of all allowed sidechain rotamers of all amino acid types being considered for a particular backbone conformation). Second, networks are scored and ranked based on the RosettaTM energy function, satisfaction (all buried polar atoms participating in hydrogen bonds), and user-defined options.
- the best networks, or combinations of the best networks are iteratively placed onto the design scaffold and held in relative position with constraints that serve as ‘seeds’ for any subsequent RosettaTM method to design around the network and optimize rotamers for the remaining positions in the scaffold.
- HBNetTM makes use of RosettaTM's Interaction Graph (IG) data structure, initially populating it with only the sidechain hydrogen bond and Lennard-Jones (steric repulsive) energy terms.
- the nodes of the graph are the residue positions of all designable or packable residues, and the edges represent putative interactions between those residues, pointing to sparse matrices that store the two-body energies between all pairs of interacting rotamers (of all amino acid types being considered) at those two positions. Only using the hydrogen bond and repulsive energies allows for instant look-up of all rotamer pairs with favorable (low energy) hydrogen bond geometry and no steric clashing.
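The data structure described above can be illustrated with a toy model: one edge per pair of residue positions, each edge pointing to a sparse table of two-body energies keyed by rotamer pair. This is a sketch only; the class below is a hypothetical stand-in written for illustration and does not reproduce the interaction graph classes of the RosettaTM software.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Rotamer = Tuple[int, int]  # (residue position, rotamer index), illustrative only


class InteractionGraph:
    """Toy rotamer interaction graph; not the Rosetta implementation."""

    def __init__(self) -> None:
        # edges[(pos_a, pos_b)] is a sparse matrix stored as
        # {(rot_a, rot_b): two-body energy}; absent pairs do not interact.
        self.edges: Dict[Tuple[int, int], Dict[Tuple[int, int], float]] = defaultdict(dict)

    def add_energy(self, a: Rotamer, b: Rotamer, energy: float) -> None:
        """Store the hydrogen bond plus repulsive energy for one rotamer pair."""
        (pa, ra), (pb, rb) = sorted([a, b])
        self.edges[(pa, pb)][(ra, rb)] = energy

    def favorable_pairs(self, pos_a: int, pos_b: int,
                        cutoff: float) -> List[Tuple[Rotamer, Rotamer]]:
        """Rotamer pairs at two positions whose stored energy is below cutoff."""
        lo, hi = sorted([pos_a, pos_b])
        table = self.edges.get((lo, hi), {})
        return [((lo, ra), (hi, rb))
                for (ra, rb), energy in sorted(table.items())
                if energy < cutoff]
```

Because only hydrogen bond and repulsive terms are stored, a pair that clashes carries a high energy and is excluded by the cutoff, while a pair with good hydrogen bond geometry is returned by a single dictionary lookup.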
- Monte Carlo or similar randomized methods can be used to search this rotamer interaction space.
- the entire rotamer interaction space can be searched.
- the search through the entire rotamer interaction space can be performed using a recursive depth-first search or a recursive breadth-first search of the interaction graph, enumerating all compatible, non-clashing connectivities of hydrogen bonded sidechain rotamers.
- the graph traversal algorithm can check the rotamer to ensure it does not clash with any existing rotamers in that network. If it is accepted, a recursive call is made on this rotamer. These recursive calls continue until a stop condition is reached: either no additional hydrogen bonding interactions can be found, or the network connects back to one of the original starting residues.
- Some polar amino acids, such as Asn and Gln, can make three or more hydrogen bonds, serving as branch points in hydrogen bond networks; depth-first search misses these branching amino acids, and to account for this, a look-back function identifies networks that share one or more identical rotamers and, after checking for clashes or conflicting residues, merges them together into complete networks. Redundant networks are eliminated.
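The traversal and look-back merge described above can be sketched on a toy graph. The code below is an illustration under simplifying assumptions, not the HBNetTM implementation: rotamers are `(position, label)` tuples, `hbonds` maps each rotamer to the rotamers it can hydrogen bond with, and `clashes` is a set of clashing rotamer pairs. All names are hypothetical.

```python
from typing import Dict, FrozenSet, Iterable, List, Set, Tuple

Rotamer = Tuple[int, str]  # (residue position, rotamer label)
Network = FrozenSet[Rotamer]


def consistent(rotamers: Iterable[Rotamer], clashes: Set[FrozenSet[Rotamer]]) -> bool:
    """True if no position is used twice and no pair of rotamers clashes."""
    rots = sorted(rotamers)
    for i, a in enumerate(rots):
        for b in rots[i + 1:]:
            if a[0] == b[0] or frozenset((a, b)) in clashes:
                return False
    return True


def enumerate_networks(hbonds: Dict[Rotamer, Set[Rotamer]],
                       clashes: Set[FrozenSet[Rotamer]],
                       starts: List[Rotamer]) -> Set[Network]:
    """Recursive depth-first enumeration of hydrogen bonded rotamer chains."""
    found: Set[Network] = set()

    def visit(rot: Rotamer, network: Set[Rotamer]) -> None:
        network = network | {rot}
        extended = False
        for nxt in sorted(hbonds.get(rot, ())):
            # A rotamer at an already used position (including the start)
            # or one that clashes with the network is rejected.
            if nxt not in network and consistent(network | {nxt}, clashes):
                extended = True
                visit(nxt, network)  # recursive call on the accepted rotamer
        if not extended and len(network) > 1:
            found.add(frozenset(network))  # stop: nothing left to add

    for start in starts:
        visit(start, set())
    return found


def merge_branches(networks: Set[Network],
                   clashes: Set[FrozenSet[Rotamer]]) -> Set[Network]:
    """Look-back step: union networks sharing a rotamer, then drop subsets."""
    merged = set(networks)
    changed = True
    while changed:
        changed = False
        for a in list(merged):
            for b in list(merged):
                union = a | b
                if a != b and a & b and union not in merged and consistent(union, clashes):
                    merged.add(union)
                    changed = True
    return {n for n in merged if not any(n < m for m in merged)}
```

Starting the traversal at a branching residue (for example an Asn bonded to two partners that do not bond each other) yields two separate two-residue chains; `merge_branches` joins them into one three-residue network, provided the two partners do not clash, and removes the now redundant fragments.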
- HBNetStapleInterfaceTM An instance of HBNetTM, “HBNetStapleInterfaceTM”, was written, in which graph traversals are initiated at residue positions at the intermolecular interface.
- This implementation of HBNetTM offers two advantages: first, starting the traversal at only the interface positions reduces the search space, speeding up runtime; second, it ensures that only networks at the interface are found, which was the goal of the approach in this study. Requiring that at least two residues in each network come from different polypeptide chains ensures that each network spans the intermolecular interface.
- the identified networks are scored and ranked to determine the “best” networks. For each network, buried polar atoms are identified by solvent-accessible surface area (SASA); networks with buried heavy atom donors or acceptors not making hydrogen bonds (unsatisfied) are eliminated. The remaining networks are then ranked based on the least number of unsatisfied polar hydrogens. The networks are then scored against each other in the context of a background reference structure: all designable or packable positions in the scaffold are mutated to poly-alanine, the network rotamers are placed onto the scaffold, and the network is scored with the full RosettaTM energy function (talaris2013).
- In Step 1, sidechain-backbone hydrogen bonds are not explicitly considered because the backbone is fixed (the number of sidechain-backbone hydrogen bonds for any given rotamer is constant).
- In Step 2, sidechain-backbone hydrogen bonds are scored when the networks are placed onto the reference structure, and are therefore included in evaluation for satisfaction (how many of the buried polar atoms participate in hydrogen bonds).
- HBNetTM captures networks with sidechain-backbone hydrogen bonds. Networks with additional hydrogen bonds to backbone polar atoms will generally score better than similar networks without hydrogen bonds to the backbone, because connectivity and satisfaction are improved.
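The screening and ranking of Step 2 can be summarized as a filter followed by a sort. The sketch below assumes each network has already been annotated with its counts of unsatisfied atoms and its reference-structure score; the `Network` record and `rank_networks` function are hypothetical names, and using the energy as a tie-breaker after the unsatisfied polar hydrogen count is an assumption about ordering rather than something stated in the text.

```python
from dataclasses import dataclass
from typing import FrozenSet, List


@dataclass
class Network:
    name: str
    chains: FrozenSet[str]     # chain IDs contributing residues to the network
    unsat_buried_heavy: int    # buried donors/acceptors making no hydrogen bond
    unsat_polar_h: int         # polar hydrogens making no hydrogen bond
    energy: float              # score on the poly-alanine reference (lower is better)


def rank_networks(networks: List[Network]) -> List[Network]:
    """Drop unsatisfied or non-interface networks, then rank the remainder."""
    kept = [n for n in networks
            if n.unsat_buried_heavy == 0   # all buried heavy atoms satisfied
            and len(n.chains) >= 2]        # spans the intermolecular interface
    # Fewest unsatisfied polar hydrogens first; energy breaks ties (assumption).
    return sorted(kept, key=lambda n: (n.unsat_polar_h, n.energy))
```

A network with one unsatisfied buried acceptor is discarded even if its energy is the lowest in the set, which reflects the emphasis on full satisfaction in the text above.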
- Step 3 For Each of the Best-Scoring H-bond Networks, Perform Design.
- the best networks as ranked by Step 2 are iteratively placed onto the input scaffold and passed back to the RosettaScriptsTM protocol for user-defined design of the remaining residue positions.
- Atom-pair constraints are automatically turned on for each pair of atoms making a hydrogen bond in the network; these constraints are tracked throughout the remainder of the design run to ensure the network residues are fixed in relative position during the downstream design.
- HBNetTM also outputs a RosettaTM constraint (.cst) file that can be used to specify the same constraints in subsequent Rosetta design runs.
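As an illustration of what such atom-pair constraints might look like when written out, the sketch below emits one line per hydrogen bond in the general `AtomPair <atom1> <res1> <atom2> <res2> <function> <parameters>` layout used by Rosetta constraint files. The choice of a `HARMONIC` function, the 2.8 Å target distance, the 0.2 Å standard deviation, and all Python names are assumptions made for the example; the source does not state which function or values HBNetTM writes.

```python
from typing import Iterable, List, NamedTuple


class HBond(NamedTuple):
    donor_atom: str      # e.g. "ND2"
    donor_res: int       # residue number of the donor
    acceptor_atom: str   # e.g. "OE1"
    acceptor_res: int    # residue number of the acceptor


def atom_pair_lines(hbonds: Iterable[HBond],
                    distance: float = 2.8,
                    sd: float = 0.2) -> List[str]:
    """Format one AtomPair constraint line per hydrogen bond."""
    # distance and sd are illustrative defaults, not values from the source.
    return [
        f"AtomPair {h.donor_atom} {h.donor_res} "
        f"{h.acceptor_atom} {h.acceptor_res} "
        f"HARMONIC {distance:.2f} {sd:.2f}"
        for h in hbonds
    ]
```

Joining the returned lines with newlines gives text that could be saved with a `.cst` extension and inspected alongside the file produced by the design run.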
- Combinations of multiple networks at the same interface can also be considered and specified by the user.
- Unlike conventional RosettaTM design, in which one input structure yields one output structure (the lowest energy solution found by sequence design and combinatorial sidechain optimization), this approach allows for hundreds of design possibilities to be output for each input structure.
- HBNetTM will only search for networks within a given search space (all possible rotamers of all possible amino acid types being considered for a given input backbone), which can be defined by the user.
- HBNetTM functions as a “Mover” within the RosettaScriptsTM framework and can be passed “task operations” to specify which residue positions are fixed, packable (amino acid type is fixed but sidechain conformation is not), and designable—for designable positions, task operations can also specify which amino acid types are allowed at each position.
- the default setting in the absence of any task operations is that all residues are considered for design and all polar amino acids are considered in the network search.
- All positions in the scaffold can be set to be designable; for HBNetTM, buried positions (defined based on solvent-accessible surface area (SASA)) can be allowed to be any noncharged polar amino acid, and solvent-exposed positions can be allowed to be any polar amino acid.
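The burial-dependent restriction above amounts to a lookup keyed on SASA. The following minimal sketch assumes a per-residue SASA value is available; the 20 Å² cutoff, the membership of the two residue sets (in particular the treatment of His as noncharged), and the function name are illustrative assumptions and are not taken from the source.

```python
from typing import Set

# One-letter amino acid codes; the exact sets are assumptions for illustration.
CHARGED_POLAR: Set[str] = set("DEKR")
NONCHARGED_POLAR: Set[str] = set("STNQYWH")


def allowed_network_residues(sasa: float, buried_cutoff: float = 20.0) -> Set[str]:
    """Amino acid types considered for the network search at one position."""
    if sasa < buried_cutoff:
        # Buried position: noncharged polar residues only.
        return set(NONCHARGED_POLAR)
    # Solvent-exposed position: any polar residue, charged or not.
    return NONCHARGED_POLAR | CHARGED_POLAR
```

In a RosettaScriptsTM protocol this kind of restriction would be expressed through task operations rather than Python; the function only makes the rule explicit.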
- a generalization of the Crick coiled-coil parameters was used to independently vary parameters of two or more helices supercoiled around the same axis, parameters defined as described previously.
- Each monomer subunit has at least one inner helix and an outer helix ( FIG. 1D ).
- the supercoil phase ( φ0,in ) and z-offset of the first inner helix were fixed to 0 to serve as a relative reference point; all other parameters varied independently between the inner and outer helices, with the exception of the supercoil twist ( ω0 ) and helical twist ( ω1 ). Because these two parameters are coupled and determine handedness, ideal values were used for ω1, with ω0 and ω1 held constant between the inner and outer helices for the majority of designs.
- 3L6HC2 parallel six-helix dimer designs
- $$\omega_0' = \frac{\omega_0}{\sqrt{1 + \dfrac{\omega_0^{2}\,\left(R'^{2} - R^{2}\right)}{d^{2}}}}$$
- Constraining the pitch results in the outer helix maintaining more contacts to the inner helices throughout the length of the helical bundle, which allows for different hydrogen bond network and packing solutions.
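The pitch constraint can be checked numerically. The sketch below assumes the relation ω0′ = ω0 / sqrt(1 + ω0²(R′² − R²)/d²), with R and R′ read as the supercoil radii of the inner and outer helices and d as the rise per residue along the helix path; these symbol meanings are assumptions, since the text does not define them. Under that reading, a helix with twist ω0 at radius R has pitch 2π·sqrt(d² − R²ω0²)/|ω0|, and the adjusted twist gives the same pitch at radius R′.

```python
import math


def pitch(omega0: float, radius: float, d: float) -> float:
    """Supercoil pitch (Å) for twist omega0 (rad/residue) at a given radius (Å).

    d is the assumed rise per residue along the helix path (Å).
    """
    rise_along_axis = math.sqrt(d * d - (radius * omega0) ** 2)
    return 2.0 * math.pi * rise_along_axis / abs(omega0)


def constrained_twist(omega0: float, r_inner: float, r_outer: float, d: float) -> float:
    """Outer-helix supercoil twist that keeps the pitch of the inner helix."""
    return omega0 / math.sqrt(1.0 + omega0 ** 2 * (r_outer ** 2 - r_inner ** 2) / d ** 2)


if __name__ == "__main__":
    # Illustrative numbers only: 0.035 rad/residue, radii of 6 and 14 Å, d = 1.51 Å.
    w_inner = 0.035
    w_outer = constrained_twist(w_inner, 6.0, 14.0, 1.51)
    print(pitch(w_inner, 6.0, 1.51), pitch(w_outer, 14.0, 1.51))
```

The outer twist is smaller in magnitude than the inner twist, which is what keeps the outer helix in register with the inner helices along the bundle.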
- HBNetTM is written in C++ as part of the RosettaTM software suite. HBNetTM was developed to be modular and is compatible with all symmetric RosettaTM applications, as well as the RosettaScriptsTM XML framework so that it can be plugged into most existing design protocols, and users can customize options specific to their design tasks. HBNetTM is written as an abstract base class, from which specialized “mover” classes can be derived for specific design cases. In particular, the instance of HBNetTM described herein as “HBNetStapleInterfaceTM” was written to search for hydrogen bond networks that span across intermolecular interfaces.
- Table 1 shows example RosettaScriptsTM XML used for design calculations, example command lines and flags used for design calculations, and customized score weighting information.
- Parametrically generated backbones were first regularized using Cartesian space minimization in RosettaTM to alleviate any torsional strain introduced by ideal backbone generation. For each topology, an initial search of only the inner helix was performed to identify parameter ranges that resulted in the most favorable core sidechain packing; outer helix parameters were then extensively sampled in context of these inner helix parameter ranges, generating tens of thousands of backbones.
- HBNetTM was used to search these backbones for hydrogen bond networks that span the intermolecular interface, have all heavy atom donors and acceptors satisfied, and contain at least three sidechains contributing hydrogen bonds.
- For the designs described herein, generally on the order of 100,000 networks were detected after Step 1, but only a handful of networks, if any, passed all of the criteria outlined in Step 2 and were carried forward. After downstream design (Step 3), packing around the networks was evaluated. Because the hydrogen bond networks are constrained during downstream design, models were minimized and sidechains repacked without the constraints to measure how well the networks remained intact in the absence of the constraints.
- Candidates under a stringent alignment tolerance (within 0.35 Å RMSD) were then fully aligned to the target backbone via torsion-space minimization under stringent coordinate constraints to the target backbone heavy-atom coordinates and soft coordinate constraints to the aligned candidate backbone heavy-atom coordinates.
- candidate loop sequences were then designed under sequence profile constraints generated via alignment of the loop backbone to the source structure database, and the lowest-scoring candidate selected as the final loop design.
- Protein BLASTTM searches were performed using the National Center for Biotechnology Information (NCBI) web server, searching against all non-redundant protein sequences (‘nr’ database) using an Expect threshold (E-value cutoff) of 10.0 and the BLOSUM62 substitution matrix.
- the Coiled-coil Crick Parameterization (CCCP) web server was used with the “Global symmetric” optimization option, as the structures of interest are all symmetric homo-oligomers.
- Because parameters varied between the inner and outer helices of a given structure, parameters were calculated separately for the inner ring and the outer ring helices, inputting .pdb files corresponding to either all helical residues of the inner ring helices or all helical residues of the outer ring helices for each crystal structure.
- Synthetic genes were ordered from Genscript Inc. (Piscataway, N.J., USA) and delivered in either pET21-NESG or pET-28b+ E. coli expression vectors, inserted at the NdeI and XhoI sites of each vector.
- For pET21-NESG constructs, synthesized DNA was cloned in frame with the C-terminal hexahistidine tag.
- For pET-28b+ constructs, synthesized DNA was cloned in frame with the N-terminal hexahistidine tag and thrombin cleavage site, and a stop codon was introduced at the C-terminus. Plasmids were transformed into chemically competent E. coli BL21(DE3)Star or BL21(DE3)Star-pLysS cells (Invitrogen) for protein expression.
- Constructs for yeast two-hybrid assays were made by Gibson assembly; inserts were generated by PCR from pET-21 or pET-28 E. coli expression vectors as templates, or ordered as gBlocks®(IDT). All primers and gBlocks® were ordered from Integrated DNA Technologies (IDT).
- Starter cultures were grown at 37° C. in either Luria-Bertani (LB) medium overnight, or in Terrific Broth for 8 hours, in the presence of 50 µg/ml carbenicillin (pET21-NESG) or 30 µg/ml kanamycin (pET-28b+). Starter cultures were used to inoculate 500 mL of LB, Terrific Broth, or Terrific Broth II (MP Biomedicals) containing antibiotic. Cultures were induced with 0.2-0.5 mM IPTG at an OD600 of 0.6-0.9 and expressed overnight at 18° C. (many designs were also later expressed at 37° C. for 4 hours with no noticeable difference in yield).
- lysis buffer (20 mM Tris, 300 mM NaCl, 20 mM Imidazole, pH 8.0 at room temperature)
- Lysates were cleared by centrifugation at 4° C. and 18,000 rpm for at least 30 minutes and applied to Ni-NTA (Qiagen) columns pre-equilibrated in lysis buffer.
- the column was washed three times with 5 column volumes (CV) of wash buffer (20 mM Tris, 300 mM NaCl, 30 mM Imidazole, pH 8.0 at room temperature), followed by 3-5 CV of high-salt wash buffer (20 mM Tris, 1 M NaCl, 30 mM imidazole, pH 8.0 at room temperature), and then 5 CV of wash buffer.
- Protein was eluted with 20 mM Tris, 300 mM NaCl, 250 mM Imidazole, pH 8.0 at room temperature. Proteins were initially screened by SEC-MALS and CD with His tags intact; if possible, the tags were cleaved and samples were further purified for crystallography, SAXS, and GdmCl melts.
- N-terminal hexahistidine tags of the pET-28 constructs were cleaved with restriction grade thrombin (EMD Millipore 69671-3) at room temperature for 4 hours or overnight, using a 1:5000 dilution of enzyme into sample solution; full cleavage was observed after 2 hours via SDS-PAGE analysis and no spurious cleavage was observed at time points upwards of 18 hours.
- Prior to addition of thrombin, buffer was exchanged into lysis buffer (20 mM Tris, 300 mM NaCl, 20 mM Imidazole).
- CD wavelength scans (260 to 195 nm) and temperature melts (23 to 95° C.) were measured using a JASCO J-1500 or an AVIV model 420 CD spectrometer. Temperature melts monitored absorption signal at 222 nm and were carried out at a heating rate of 4° C./min; protein samples were at 0.2-0.5 mg/mL in phosphate buffered saline (PBS) pH 7.4 in a 0.1 cm cuvette.
- Guanidinium chloride (GdmCl) titrations were performed on the same spectrometers with automated titration apparatus in PBS pH 7.4 at 25° C., monitored at 222 nm, using a protein concentration of 0.025-0.06 mg/mL in a 1 cm cuvette with stir bar; each titration consisted of at least 40 evenly distributed concentration points with one minute mixing time for each step.
- Titrant solution consisted of the same concentration of protein in PBS+GdmCl; GdmCl concentration was determined by refractive index.
- Peptides 2L4HC2_9_inner and 2L6HC3_13_inner were ordered from Genscript Inc. (Piscataway, N.J., USA) with N-terminal acetylation and C-terminal amidation.
- Peptides were dissolved in PBS pH 7.4 and further dialyzed into PBS pH 7.4 for CD experiments.
- Purified protein samples were concentrated to approximately 12 mg/ml in 20 mM Tris pH 8.0 and 100 mM NaCl. Samples were screened using the sparse matrix method (Jancarik and Kim, 1991) with a Phoenix Robot (Art Robbins Instruments, Sunnyvale, Calif.) utilizing the following crystallization screens: Berkeley Screen (Lawrence Berkeley National Laboratory), Crystal Screen, PEG/Ion, Index and PEGRx (Hampton Research, Aliso Viejo, Calif.).
- the crystals of the designed proteins were placed in a reservoir solution containing 15 to 20% (v/v) glycerol, and then flash-cooled in liquid nitrogen.
- the X-ray data sets were collected at the Berkeley Center for Structural Biology beamlines 5.0.1, 8.2.1 and 8.2.2 of the Advanced Light Source at Lawrence Berkeley National Laboratory (LBNL). Data sets were indexed and scaled using HKL2000. All the design structures were determined by the molecular-replacement method with the program PHASER within the Phenix suite using the design models as the initial search model. The atomic positions obtained from molecular replacement and the resulting electron density maps were used to build the design structures and initiate crystallographic refinement and model rebuilding. Structure refinement was performed using the phenix.refine program.
- Protein binders were cloned into plasmids bearing the GAL4 DNA-binding domain (pOBD2) and/or the GAL4 transcription activation domain (poAD) using Gibson assembly and sequence verified.
- the yeast strain PJ69-4a was transformed with the appropriate pair of plasmids using a modified LiOAc transformation protocol where rescue and selection of the transformed yeast was performed in minimal liquid media lacking tryptophan and leucine. Before the assay, transformed cells were diluted 1:10 and grown for 16 hours in fresh minimal media lacking tryptophan and leucine.
- a computing device determines a search space for hydrogen bond networks related to one or more molecules.
- the search space includes a plurality of energy terms related to a plurality of residues related to the hydrogen bond networks.
- the computing device searches the search space to identify one or more hydrogen bond networks based on the plurality of energy terms.
- the computing device screens the identified one or more hydrogen bond networks to identify one or more screened hydrogen bond networks based on scores for the one or more identified hydrogen bond networks.
- the computing device generates an output related to the one or more screened hydrogen bond networks.
- In another aspect, a computing device includes one or more data processors and a computer-readable medium.
- the computer-readable medium is configured to store at least computer-readable instructions that, when executed, cause the computing device to perform functions.
- the functions include: determining a search space for hydrogen bond networks related to one or more molecules, where the search space includes a plurality of energy terms related to a plurality of residues related to the hydrogen bond networks; searching the search space to identify one or more hydrogen bond networks based on the plurality of energy terms; screening the identified one or more hydrogen bond networks to identify one or more screened hydrogen bond networks based on scores for the one or more identified hydrogen bond networks; and generating an output related to the one or more screened hydrogen bond networks.
- In another aspect, a computer-readable medium is configured to store at least computer-readable instructions that, when executed by one or more processors of a computing device, cause the computing device to perform functions.
- the functions include: determining a search space for hydrogen bond networks related to one or more molecules, where the search space includes a plurality of energy terms related to a plurality of residues related to the hydrogen bond networks; searching the search space to identify one or more hydrogen bond networks based on the plurality of energy terms; screening the identified one or more hydrogen bond networks to identify one or more screened hydrogen bond networks based on scores for the one or more identified hydrogen bond networks; and generating an output related to the one or more screened hydrogen bond networks.
- In another aspect, an apparatus includes: means for determining a search space for hydrogen bond networks related to one or more molecules, where the search space includes a plurality of energy terms related to a plurality of residues related to the hydrogen bond networks; means for searching the search space to identify one or more hydrogen bond networks based on the plurality of energy terms; means for screening the identified one or more hydrogen bond networks to identify one or more screened hydrogen bond networks based on scores for the one or more identified hydrogen bond networks; and means for generating an output related to the one or more screened hydrogen bond networks.
- FIG. 6 is a block diagram of an example computing network.
- FIG. 6 shows protein design system 602 configured to communicate, via network 606 , with client devices 604 a , 604 b , and 604 c and protein database 608 .
- protein design system 602 and/or protein database 608 can be a computing device configured to perform some or all of the herein described methods and techniques, such as but not limited to, method 800 and functionality described as being part of or related to RosettaTM.
- Protein database 608 can, in some embodiments, store information related to and/or used by RosettaTM.
- Network 606 may correspond to a LAN, a wide area network (WAN), a corporate intranet, the public Internet, or any other type of network configured to provide a communications path between networked computing devices.
- Network 606 may also correspond to a combination of one or more LANs, WANs, corporate intranets, and/or the public Internet.
- Although FIG. 6 shows only three client devices 604 a , 604 b , 604 c , such a system may serve tens, hundreds, or thousands of client devices.
- client devices 604 a , 604 b , 604 c may be any sort of computing device, such as an ordinary laptop computer, desktop computer, network terminal, wireless communication device (e.g., a cell phone or smart phone), and so on.
- client devices 604 a , 604 b , 604 c can be dedicated to problem solving/using the RosettaTM software suite.
- client devices 604 a , 604 b , 604 c can be used as general purpose computers that are configured to perform a number of tasks and need not be dedicated to problem solving/using RosettaTM.
- part or all of the functionality of protein design system 602 and/or protein database 608 can be incorporated in a client device, such as client device 604 a , 604 b , and/or 604 c.
- FIG. 7A is a block diagram of an example computing device (e.g., system).
- computing device 700 shown in FIG. 7A can be configured to: include components of and/or perform one or more functions of protein design system 602 , client device 604 a , 604 b , 604 c , network 606 , and/or protein database 608 and/or carry out part or all of any herein-described methods and techniques, such as but not limited to method 800 .
- Computing device 700 may include a user interface module 701 , a network-communication interface module 702 , one or more processors 703 , and data storage 704 , all of which may be linked together via a system bus, network, or other connection mechanism 705 .
- User interface module 701 can be operable to send data to and/or receive data from external user input/output devices.
- user interface module 701 can be configured to send and/or receive data to and/or from user input devices such as a keyboard, a keypad, a touch screen, a computer mouse, a track ball, a joystick, a camera, a voice recognition module, and/or other similar devices.
- User interface module 701 can also be configured to provide output to user display devices, such as one or more cathode ray tubes (CRT), liquid crystal displays (LCD), light emitting diodes (LEDs), displays using digital light processing (DLP) technology, printers, light bulbs, and/or other similar devices, either now known or later developed.
- User interface module 701 can also be configured to generate audible output(s) using devices such as a speaker, speaker jack, audio output port, audio output device, earphones, and/or other similar devices.
- Network-communications interface module 702 can include one or more wireless interfaces 707 and/or one or more wireline interfaces 708 that are configurable to communicate via a network, such as network 606 shown in FIG. 6 .
- Wireless interfaces 707 can include one or more wireless transmitters, receivers, and/or transceivers, such as a Bluetooth transceiver, a Zigbee transceiver, a Wi-Fi transceiver, a WiMAX transceiver, and/or other similar type of wireless transceiver configurable to communicate via a wireless network.
- Wireline interfaces 708 can include one or more wireline transmitters, receivers, and/or transceivers, such as an Ethernet transceiver, a Universal Serial Bus (USB) transceiver, or similar transceiver configurable to communicate via a twisted pair, one or more wires, a coaxial cable, a fiber-optic link, or a similar physical connection to a wireline network.
- network communications interface module 702 can be configured to provide reliable, secured, and/or authenticated communications. For each communication described herein, information for ensuring reliable communications (i.e., guaranteed message delivery) can be provided, perhaps as part of a message header and/or footer (e.g., packet/message sequencing information, encapsulation header(s) and/or footer(s), size/time information, and transmission verification information such as CRC and/or parity check values). Communications can be made secure (e.g., be encoded or encrypted) and/or decrypted/decoded using one or more cryptographic protocols and/or algorithms, such as, but not limited to, DES, AES, RSA, Diffie-Hellman and/or DSA. Other cryptographic protocols and/or algorithms can be used as well or in addition to those listed herein to secure (and then decrypt/decode) communications.
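- As a minimal sketch of the transmission-verification idea described above, the following Python frames a message with a sequence-number header and a CRC-32 footer, then checks the footer on receipt. The frame layout (4-byte sequence number, 4-byte CRC) and the function names `frame_message` and `unframe_message` are illustrative assumptions and are not taken from this disclosure.

```python
import struct
import zlib
from typing import Tuple


def frame_message(seq: int, payload: bytes) -> bytes:
    """Prefix a 4-byte sequence number and append a 4-byte CRC-32 footer."""
    body = struct.pack(">I", seq) + payload
    return body + struct.pack(">I", zlib.crc32(body))


def unframe_message(frame: bytes) -> Tuple[int, bytes]:
    """Verify the CRC-32 footer and return (sequence number, payload)."""
    if len(frame) < 8:
        raise ValueError("frame too short")
    body, footer = frame[:-4], frame[-4:]
    (expected,) = struct.unpack(">I", footer)
    if zlib.crc32(body) != expected:
        raise ValueError("CRC mismatch: message corrupted in transit")
    (seq,) = struct.unpack(">I", body[:4])
    return seq, body[4:]
```

A CRC only detects accidental corruption. It does not provide the secrecy or authentication that the cryptographic algorithms listed above are meant to supply.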
- Processors 703 can include one or more general purpose processors and/or one or more special purpose processors (e.g., digital signal processors, application specific integrated circuits, etc.). Processors 703 can be configured to execute computer-readable program instructions 706 contained in data storage 704 and/or other instructions as described herein.
- Data storage 704 can include one or more computer-readable storage media that can be read and/or accessed by at least one of processors 703 .
- the one or more computer-readable storage media can include volatile and/or non-volatile storage components, such as optical, magnetic, organic or other memory or disc storage, which can be integrated in whole or in part with at least one of processors 703 .
- data storage 704 can be implemented using a single physical device (e.g., one optical, magnetic, organic or other memory or disc storage unit), while in other embodiments, data storage 704 can be implemented using two or more physical devices.
- Data storage 704 can include computer-readable program instructions 706 and perhaps additional data.
- data storage 704 can store part or all of data utilized by a protein design system and/or a protein database; e.g., protein design system 602 , protein database 608 .
- data storage 704 can additionally include storage required to perform at least part of the herein-described methods and techniques and/or at least part of the functionality of the herein-described devices and networks.
- FIG. 7B depicts a network 606 of computing clusters 709 a , 709 b , 709 c arranged as a cloud-based server system in accordance with an example embodiment.
- Data and/or software for protein design system 602 can be stored on one or more cloud-based devices that store program logic and/or data of cloud-based applications and/or services.
- protein design system 602 can be a single computing device residing in a single computing center.
- protein design system 602 can include multiple computing devices in a single computing center, or even multiple computing devices located in multiple computing centers located in diverse geographic locations.
- data and/or software for protein design system 602 can be encoded as computer readable information stored in tangible computer readable media (or computer readable storage media) and accessible by client devices 604 a , 604 b , and 604 c , and/or other computing devices.
- data and/or software for protein design system 602 can be stored on a single disk drive or other tangible storage media, or can be implemented on multiple disk drives or other tangible storage media located at one or more diverse geographic locations.
- FIG. 7B depicts a cloud-based server system in accordance with an example embodiment.
- the functions of protein design system 602 can be distributed among three computing clusters 709 a , 709 b , and 709 c .
- Computing cluster 709 a can include one or more computing devices 700 a , cluster storage arrays 710 a , and cluster routers 711 a connected by a local cluster network 712 a .
- computing cluster 709 b can include one or more computing devices 700 b , cluster storage arrays 710 b , and cluster routers 711 b connected by a local cluster network 712 b .
- computing cluster 709 c can include one or more computing devices 700 c , cluster storage arrays 710 c , and cluster routers 711 c connected by a local cluster network 712 c.
- each of the computing clusters 709 a , 709 b , and 709 c can have an equal number of computing devices, an equal number of cluster storage arrays, and an equal number of cluster routers. In other embodiments, however, each computing cluster can have different numbers of computing devices, different numbers of cluster storage arrays, and different numbers of cluster routers. The number of computing devices, cluster storage arrays, and cluster routers in each computing cluster can depend on the computing task or tasks assigned to each computing cluster.
- computing devices 700 a can be configured to perform various computing tasks of protein design system 602 .
- the various functionalities of protein design system 602 can be distributed among one or more of computing devices 700 a, 700 b, and 700 c.
- Computing devices 700 b and 700 c in computing clusters 709 b and 709 c can be configured similarly to computing devices 700 a in computing cluster 709 a.
- computing devices 700 a, 700 b, and 700 c can be configured to perform different functions.
- computing tasks and stored data associated with protein design system 602 can be distributed across computing devices 700 a, 700 b, and 700 c based at least in part on the processing requirements of protein design system 602 , the processing capabilities of computing devices 700 a, 700 b, and 700 c, the latency of the network links between the computing devices in each computing cluster and between the computing clusters themselves, and/or other factors that can contribute to the cost, speed, fault-tolerance, resiliency, efficiency, and/or other design goals of the overall system architecture.
- the cluster storage arrays 710 a, 710 b, and 710 c of the computing clusters 709 a, 709 b , and 709 c can be data storage arrays that include disk array controllers configured to manage read and write access to groups of hard disk drives.
- the disk array controllers alone or in conjunction with their respective computing devices, can also be configured to manage backup or redundant copies of the data stored in the cluster storage arrays to protect against disk drive or other cluster storage array failures and/or network failures that prevent one or more computing devices from accessing one or more cluster storage arrays.
- cluster storage arrays 710 a, 710 b, and 710 c can be configured to store one portion of the data and/or software of protein design system 602 , while other cluster storage arrays can store a separate portion of the data and/or software of protein design system 602 . Additionally, some cluster storage arrays can be configured to store backup versions of data stored in other cluster storage arrays.
- the cluster routers 711 a, 711 b, and 711 c in computing clusters 709 a, 709 b, and 709 c can include networking equipment configured to provide internal and external communications for the computing clusters.
- the cluster routers 711 a in computing cluster 709 a can include one or more internet switching and routing devices configured to provide (i) local area network communications between the computing devices 700 a and the cluster storage arrays 710 a via the local cluster network 712 a, and (ii) wide area network communications between the computing cluster 709 a and the computing clusters 709 b and 709 c via the wide area network connection 713 a to network 606 .
- Cluster routers 711 b and 711 c can include network equipment similar to the cluster routers 711 a, and cluster routers 711 b and 711 c can perform similar networking functions for computing clusters 709 b and 709 c that cluster routers 711 a perform for computing cluster 709 a.
- the configuration of the cluster routers 711 a, 711 b , and 711 c can be based at least in part on the data communication requirements of the computing devices and cluster storage arrays, the data communications capabilities of the network equipment in the cluster routers 711 a, 711 b , and 711 c , the latency and throughput of local networks 712 a, 712 b , 712 c, the latency, throughput, and cost of wide area network links 713 a, 713 b, and 713 c, and/or other factors that can contribute to the cost, speed, fault-tolerance, resiliency, efficiency and/or other design goals of the overall system architecture.
- FIG. 8 is a flow chart of an example method 800 .
- Method 800 can be carried out by a computing device, such as computing device 700 described in the context of at least FIG. 7A .
- Method 800 can begin at block 810 , where the computing device can determine a search space for hydrogen bond networks related to one or more molecules, where the search space includes a plurality of energy terms related to a plurality of residues related to the hydrogen bond networks, such as discussed above at least in the “Computational Techniques” section.
- the search space can be configured as a graph having a plurality of nodes connected by one or more edges, where a node of the plurality of nodes is based on a particular residue of the plurality of residues, the particular residue having a residue position, and where an edge of the one or more edges connects a first node and a second node of the plurality of nodes based on a possible interaction between the first and second nodes, such as discussed above at least in the “Computational Techniques” section.
- the first node can relate to a first residue of the plurality of residues and the second node can relate to a second residue of the plurality of residues, where the possible interaction between the first and second nodes relates to a possible interaction between a rotamer of the first residue and/or a rotamer of the second residue, such as discussed above at least in the “Computational Techniques” section.
- the possible interaction between the first and second nodes can relate to an interaction energy between the first residue and the second residue, such as discussed above at least in the “Computational Techniques” section.
- determining the search space can include: determining whether the interaction energy between the first residue and the second residue is less than a threshold interaction energy; and after determining that the interaction energy between the first residue and the second residue is less than the threshold interaction energy, adding a hydrogen bond network including the first node, the second node, and at least one edge between the first and second nodes to the search space, such as discussed above at least in the “Computational Techniques” section.
- at least one edge between the first and second nodes can include information about the interaction energy between the first residue and the second residue, such as discussed above at least in the “Computational Techniques” section.
- the information about the interaction energy between the first residue and the second residue can include a plurality of interaction energy values, where each interaction energy value in the plurality of interaction energy values is associated with a particular rotamer of the first residue and a particular rotamer of the second residue, such as discussed above at least in the “Computational Techniques” section.
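- The graph-shaped search space described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the disclosed implementation: the input table `energies`, the function name `build_search_space`, and the representation of an edge as a dictionary of below-threshold rotamer pairs are all hypothetical choices made for the example.

```python
from typing import Dict, List, Tuple

# Hypothetical input: energies[(pos_a, pos_b)][r_a][r_b] is the interaction
# energy between rotamer r_a at position pos_a and rotamer r_b at pos_b.
EnergyTable = Dict[Tuple[int, int], List[List[float]]]
# Each edge keeps only the rotamer pairs whose energy is below the threshold.
Graph = Dict[Tuple[int, int], Dict[Tuple[int, int], float]]


def build_search_space(energies: EnergyTable, threshold: float) -> Graph:
    """Connect two residue positions when any rotamer pair is below threshold."""
    graph: Graph = {}
    for (pos_a, pos_b), matrix in energies.items():
        good = {
            (r_a, r_b): e
            for r_a, row in enumerate(matrix)
            for r_b, e in enumerate(row)
            if e < threshold
        }
        if good:  # add an edge only if some rotamer pair interacts favorably
            graph[(pos_a, pos_b)] = good
    return graph
```

Keeping the per-rotamer-pair energies on each edge mirrors the statement above that an edge can carry a plurality of interaction energy values, one per rotamer pair.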
- determining the search space can include: determining at least a first residue position and a second residue position at an intermolecular interface between a first molecule and a second molecule, the first residue position associated with a first residue of the first molecule and the second residue position associated with a second residue of the second molecule; and determining the search space based on the at least the first residue position and the second residue position, such as discussed above at least in the “Computational Techniques” section.
- at least one of the first molecule and the second molecule can include a polypeptide chain, such as discussed above at least in the “Computational Techniques” section.
- the computing device can search the search space to identify one or more hydrogen bond networks based on the plurality of energy terms, such as discussed above at least in the “Computational Techniques” section.
- searching the search space includes searching all of the search space, such as discussed above at least in the “Computational Techniques” section.
- searching all of the search space can include searching all of the search space using a depth-first search and/or a breadth-first search, such as discussed above at least in the “Computational Techniques” section.
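- An exhaustive depth-first traversal of such a search space might look like the sketch below. It assumes a symmetric adjacency map `hbonds` from each (residue position, rotamer index) pair to the rotamers it can hydrogen bond with; that data structure and the name `enumerate_networks` are assumptions made for illustration. The enumeration grows exponentially with network size, so it is only practical for small examples.

```python
from typing import Dict, FrozenSet, Set, Tuple

Rotamer = Tuple[int, int]  # (residue position, rotamer index)


def enumerate_networks(
    hbonds: Dict[Rotamer, Set[Rotamer]],
) -> Set[FrozenSet[Rotamer]]:
    """Depth-first enumeration of connected rotamer sets, one rotamer per position."""
    found: Set[FrozenSet[Rotamer]] = set()

    def extend(network: FrozenSet[Rotamer]) -> None:
        if network in found:
            return  # already explored from another starting point
        found.add(network)
        used_positions = {pos for pos, _ in network}
        for member in network:
            for partner in hbonds.get(member, ()):
                # A residue position can adopt only one rotamer at a time.
                if partner[0] not in used_positions:
                    extend(network | {partner})

    for start in hbonds:
        extend(frozenset({start}))
    # Keep only sets with at least two residues, i.e. at least one hydrogen bond.
    return {n for n in found if len(n) >= 2}
```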
- searching the search space can include: performing a first search of the search space to identify one or more initial hydrogen bond networks; and identifying the one or more identified hydrogen bond networks by at least merging a first hydrogen bond network and a second hydrogen bond network of the one or more initial hydrogen bond networks, such as discussed above at least in the “Computational Techniques” section.
- merging the first hydrogen bond network and the second hydrogen bond network can include: determining whether the first hydrogen bond network and the second hydrogen bond network share an identical rotamer; and after determining that the first hydrogen bond network and the second hydrogen bond network share an identical rotamer, merging the first hydrogen bond network and the second hydrogen bond network, such as discussed above at least in the “Computational Techniques” section.
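- The merging step described above can be sketched as follows. Two networks are merged when they share an identical rotamer; the additional check that no residue position ends up with two different rotamers is an assumption added for this example, as are the names `compatible` and `merge_networks`.

```python
from typing import FrozenSet, List, Tuple

Rotamer = Tuple[int, int]  # (residue position, rotamer index)
Network = FrozenSet[Rotamer]


def compatible(a: Network, b: Network) -> bool:
    """True if no residue position carries two different rotamers."""
    chosen = dict(a)
    return all(chosen.get(pos, rot) == rot for pos, rot in b)


def merge_networks(networks: List[Network]) -> List[Network]:
    """Repeatedly merge networks that share an identical rotamer."""
    merged = list(dict.fromkeys(networks))  # drop duplicates, keep order
    changed = True
    while changed:
        changed = False
        for i in range(len(merged)):
            for j in range(i + 1, len(merged)):
                a, b = merged[i], merged[j]
                if a & b and compatible(a, b):
                    union = a | b
                    # Replace the two inputs with their union.
                    merged = [n for k, n in enumerate(merged) if k not in (i, j)]
                    if union not in merged:
                        merged.append(union)
                    changed = True
                    break
            if changed:
                break
    return merged
```

Each merge removes two networks and adds at most one, so the loop always terminates.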
- the computing device can screen the identified one or more hydrogen bond networks to identify one or more screened hydrogen bond networks based on scores for the one or more identified hydrogen bond networks, such as discussed above at least in the “Computational Techniques” section.
- a particular score for a particular identified hydrogen bond network of the one or more identified hydrogen bond networks can be based on a number of polar atoms that participate in the particular hydrogen bond network, such as discussed above at least in the “Computational Techniques” section.
- a particular score for a particular identified hydrogen bond network of the one or more identified hydrogen bond networks can be based on a background reference structure, such as discussed above at least in the “Computational Techniques” section.
- the particular score for the particular identified hydrogen bond network can be based on a score related to one or more sidechain-backbone hydrogen bonds, where the one or more sidechain-backbone hydrogen bonds can be related to the background reference structure, such as discussed above at least in the “Computational Techniques” section.
- a particular score for a particular identified hydrogen bond network of the one or more identified hydrogen bond networks can be based on an energy function, such as discussed above at least in the “Computational Techniques” section.
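- A simple screening pass over scored networks could be sketched as below. The `NetworkStats` fields, the satisfaction ratio, and the default cutoffs are hypothetical placeholders chosen for illustration; the disclosure states only that scores can be based on the number of participating polar atoms, a background reference structure, and an energy function.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class NetworkStats:
    name: str
    polar_atoms_total: int   # buried polar atoms in the network's residues
    polar_atoms_bonded: int  # polar atoms that participate in a hydrogen bond
    energy: float            # energy-function score; lower is better


def satisfaction(stats: NetworkStats) -> float:
    """Fraction of the network's polar atoms that make a hydrogen bond."""
    if stats.polar_atoms_total == 0:
        return 0.0
    return stats.polar_atoms_bonded / stats.polar_atoms_total


def screen(
    networks: List[NetworkStats],
    min_satisfaction: float = 1.0,  # assumed cutoff: every polar atom satisfied
    max_energy: float = 0.0,        # assumed cutoff on the energy score
) -> List[NetworkStats]:
    """Keep networks passing both cutoffs, best (lowest) energy first."""
    kept = [
        n for n in networks
        if satisfaction(n) >= min_satisfaction and n.energy <= max_energy
    ]
    return sorted(kept, key=lambda n: n.energy)
```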
- an output related to the one or more screened hydrogen bond networks can be generated.
- generating the output related to the one or more screened hydrogen bond networks can include designing one or more molecules based on the screened hydrogen bond networks, such as discussed above at least in the “Computational Techniques” section.
- designing the one or more molecules based on the screened hydrogen bond networks includes allowing one or more relatively-small movements of one or more rotamers in a screened hydrogen bond network, such as discussed above at least in the “Computational Techniques” section.
- generating the output related to the one or more screened hydrogen bond networks can include generating a plurality of outputs related to the one or more screened hydrogen bond networks, such as discussed above at least in the “Computational Techniques” section.
- generating the output related to the one or more screened hydrogen bond networks can include: generating a synthetic gene that is based on the one or more screened hydrogen bond networks; expressing a particular protein in vivo using the synthetic gene; and purifying the particular protein.
- expressing the particular protein sequence in vivo using the synthetic gene includes expressing the particular protein sequence in one or more Escherichia coli that include the synthetic gene, such as discussed above at least in the “Experimental Methods” section.
- each block and/or communication may represent a processing of information and/or a transmission of information in accordance with example embodiments.
- Alternative embodiments are included within the scope of these example embodiments.
- functions described as blocks, transmissions, communications, requests, responses, and/or messages may be executed out of order from that shown or discussed, including substantially concurrent or in reverse order, depending on the functionality involved.
- more or fewer blocks and/or functions may be used with any of the ladder diagrams, scenarios, and flow charts discussed herein, and these ladder diagrams, scenarios, and flow charts may be combined with one another, in part or in whole.
- a block that represents a processing of information may correspond to circuitry that can be configured to perform the specific logical functions of a herein-described method or technique.
- a block that represents a processing of information may correspond to a module, a segment, or a portion of program code (including related data).
- the program code may include one or more instructions executable by a processor for implementing specific logical functions or actions in the method or technique.
- the program code and/or related data may be stored on any type of computer readable medium such as a storage device including a disk or hard drive or other storage medium.
- the computer readable medium may also include non-transitory computer readable media such as computer-readable media that stores data for short periods of time like register memory, processor cache, and random access memory (RAM).
- the computer readable media may also include non-transitory computer readable media that stores program code and/or data for longer periods of time, such as secondary or persistent long term storage, like read only memory (ROM), optical or magnetic disks, compact-disc read only memory (CD-ROM), for example.
- the computer readable media may also be any other volatile or non-volatile storage systems.
- a computer readable medium may be considered a computer readable storage medium, for example, or a tangible storage device.
- a block that represents one or more information transmissions may correspond to information transmissions between software and/or hardware modules in the same physical device. However, other information transmissions may be between software modules and/or hardware modules in different physical devices.
Landscapes
- Chemical & Material Sciences (AREA)
- Organic Chemistry (AREA)
- Health & Medical Sciences (AREA)
- Life Sciences & Earth Sciences (AREA)
- Biophysics (AREA)
- General Health & Medical Sciences (AREA)
- Proteomics, Peptides & Aminoacids (AREA)
- Biochemistry (AREA)
- Genetics & Genomics (AREA)
- Medicinal Chemistry (AREA)
- Molecular Biology (AREA)
- General Chemical & Material Sciences (AREA)
- Chemical Kinetics & Catalysis (AREA)
- Gastroenterology & Hepatology (AREA)
- Physics & Mathematics (AREA)
- Spectroscopy & Molecular Physics (AREA)
- Analytical Chemistry (AREA)
- Toxicology (AREA)
- Immunology (AREA)
- Zoology (AREA)
- Biotechnology (AREA)
- Engineering & Computer Science (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Bioinformatics & Computational Biology (AREA)
- Crystallography & Structural Chemistry (AREA)
- Evolutionary Biology (AREA)
- Medical Informatics (AREA)
- Theoretical Computer Science (AREA)
- Peptides Or Proteins (AREA)
- Micro-Organisms Or Cultivation Processes Thereof (AREA)
- Apparatus Associated With Microorganisms And Enzymes (AREA)
- Preparation Of Compounds By Using Micro-Organisms (AREA)
Abstract
Description
- This application claims priority to U.S. Provisional Patent Application Ser. No. 62/317,190 filed Apr. 1, 2016, incorporated by reference herein in its entirety.
- Hydrogen bonds play key roles in the structure, function, and interaction specificity of biomolecules. There are two main challenges facing de novo design of hydrogen bonding interactions: first, hydrogen bonding atoms are geometrically restricted to narrow ranges of orientation and distance, and second, nearly all polar atoms must participate in hydrogen bonds either with other macromolecular polar atoms, or with solvent; if not, there is a considerable energetic penalty associated with stripping away water upon folding or binding. The DNA double helix elegantly resolves both challenges; paired bases come together such that all buried polar atoms make hydrogen bonds that are self-contained between the two bases and have near ideal geometry. In proteins, meeting these challenges is more complicated because backbone geometry is highly variable and pairs of polar amino acids cannot generally interact so as to fully satisfy their mutual hydrogen bonding capabilities; hence sidechain hydrogen bonding usually involves networks of multiple amino acids with variable geometry and composition, and there are generally very different networks at different sites within a single protein or interface pre-organizing polar residues for binding and catalysis.
- In nature, structural specificity in DNA and proteins is encoded quite differently: in DNA, specificity arises from modular hydrogen bonds in the core of the double helix, whereas in proteins, specificity arises largely from buried hydrophobic packing complemented by irregular peripheral polar interactions. Herein is described a general approach for designing a wide range of protein homo-oligomers with specificity determined by modular arrays of central hydrogen bond networks. This approach can be used to design dimers, trimers, and tetramers comprising two concentric rings of helices, including previously unseen triangular, square, and supercoiled topologies. X-ray crystallography confirms that the structures overall, and the hydrogen bond networks in particular, are nearly identical to the design models, and the networks confer interaction specificity in vivo. The ability to design extensive hydrogen bond networks with atomic accuracy is a milestone for protein design and enables the programming of protein interaction specificity for a broad range of synthetic biology applications. Also described herein is a class of protein oligomers with regular arrays of hydrogen bond networks that enable programming of interaction specificity.
- In one aspect, a method is provided. A computing device determines a search space for hydrogen bond networks related to one or more molecules. The search space includes a plurality of energy terms related to a plurality of residues related to the hydrogen bond networks. The computing device searches the search space to identify one or more hydrogen bond networks based on the plurality of energy terms. The computing device screens the identified one or more hydrogen bond networks to identify one or more screened hydrogen bond networks based on scores for the one or more identified hydrogen bond networks. The computing device generates an output related to the one or more screened hydrogen bond networks.
- In another aspect, a computing device is provided. The computing device includes one or more data processors and a computer-readable medium. The computer-readable medium is configured to store at least computer-readable instructions that, when executed, cause the computing device to perform functions. The functions include: determining a search space for hydrogen bond networks related to one or more molecules, where the search space includes a plurality of energy terms related to a plurality of residues related to the hydrogen bond networks; searching the search space to identify one or more hydrogen bond networks based on the plurality of energy terms; screening the identified one or more hydrogen bond networks to identify one or more screened hydrogen bond networks based on scores for the one or more identified hydrogen bond networks; and generating an output related to the one or more screened hydrogen bond networks.
- In another aspect, a computer-readable medium is provided. The computer-readable medium is configured to store at least computer-readable instructions that, when executed by one or more processors of a computing device, cause the computing device to perform functions. The functions include: determining a search space for hydrogen bond networks related to one or more molecules, where the search space includes a plurality of energy terms related to a plurality of residues related to the hydrogen bond networks; searching the search space to identify one or more hydrogen bond networks based on the plurality of energy terms; screening the identified one or more hydrogen bond networks to identify one or more screened hydrogen bond networks based on scores for the one or more identified hydrogen bond networks; and generating an output related to the one or more screened hydrogen bond networks.
- In another aspect, an apparatus is provided. The apparatus includes: means for determining a search space for hydrogen bond networks related to one or more molecules, where the search space includes a plurality of energy terms related to a plurality of residues related to the hydrogen bond networks; means for searching the search space to identify one or more hydrogen bond networks based on the plurality of energy terms; means for screening the identified one or more hydrogen bond networks to identify one or more screened hydrogen bond networks based on scores for the one or more identified hydrogen bond networks; and means for generating an output related to the one or more screened hydrogen bond networks.
- In one aspect, the invention provides polypeptides comprising an amino acid sequence that is at least 75% identical over its full length to the amino acid sequence selected from the group consisting of SEQ ID NOS:2-79.
- In another aspect, the invention provides polypeptides comprising or consisting of the amino acid sequence of Formula 1:
Z1-Z2-Z3-Z4-Z5, wherein:
- Z1 is a helix initiating sequence comprising the amino acid sequence of Formula 2:
J1-J2-J3, wherein:
- J1 is selected from the group consisting of S, T, N, and D;
- J2 is selected from the group consisting of P, E, R, K, L, A; and
- J3 is selected from the group consisting of E, D, R, K, I, L, V, A, S, T, Y, or is absent;
- Z3 is a helix connecting sequence having the amino acid sequence of Formula 3:
[RKED]-L-[NQEDRKST]-[NQEDRKST]-[NQEDRKST]-G-[STNQED]-[STNQED]-[STND]-E-[EDRK]-V-[RKED] (SEQ ID NO: 1);
- Z5 is a helix terminating sequence comprising the amino acid sequence of Formula 4:
xx-xx-[RKEDSTNQYA] (SEQ ID NO: 80);
- Z2 is selected from the group consisting of general formulae BX1BX2, X1BBX2, X1BX2B, X1X2BB, BX1X2B, and BBX1X2, wherein:
B is xx-S-L-xx-xx-Q-xx;
- X1 and X2 independently have the amino acid sequence of Formula 5:
O1O2O3O4O5O6O7, wherein:
- O1, O4, O5, and O7 are xx;
- O2 and O3 are independently selected from the group consisting of I, L, and A; and
- O6 is L; and
- Z4 is selected from the group consisting of general formulae B2X3B2X4, X3B2B2X4, X3B2X4B2, X2X2B2B2, B2X1X2B2, and B2B2X1X2, wherein:
B2 is xx-L-A-xx-xx-Q-xx; and
- X3 and X4 independently have the amino acid sequence of Formula 6:
O10O11O12O13O14O15O16, wherein:
- O10, O13, O14, and O16 are xx;
- O11 is L; and
- O12 and O15 are independently selected from the group consisting of I, L, V, and A;
- wherein xx is any amino acid, and
- wherein:
- (i) when Z1 is BX1BX2 then Z2 is X3B2X4B2;
- (ii) when Z1 is X1BBX2 then Z2 is X3B2B2X4;
- (iii) when Z1 is X1BX2B then Z2 is B2X3B2X4;
- (iv) when Z1 is X1X2BB then Z2 is B2B2X3X4;
- (v) when Z1 is BX1X2B then Z2 is B2X3X4B2; and
- (vi) when Z1 is BBX1X2 then Z2 is X3X4B2B2.
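- The bracketed motifs in the formulae above map naturally onto regular expressions. The sketch below converts a hyphen-separated motif into a compiled pattern and applies it to Formula 3 (SEQ ID NO: 1) and to the heptad B. Treating "xx" as one of the 20 standard amino acids is an assumption made for this example, and the helper name `motif_to_regex` and the test sequences are hypothetical.

```python
import re

AA = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids, standing in for "xx"


def motif_to_regex(motif: str) -> re.Pattern:
    """Convert a hyphen-separated motif such as 'xx-S-L-[ED]' into a regex."""
    parts = []
    for token in motif.split("-"):
        token = token.strip()
        # "xx" means any amino acid; other tokens are already valid regex atoms.
        parts.append(f"[{AA}]" if token == "xx" else token)
    return re.compile("".join(parts))


# Formula 3, the helix connecting sequence Z3 (SEQ ID NO: 1).
Z3 = motif_to_regex(
    "[RKED]-L-[NQEDRKST]-[NQEDRKST]-[NQEDRKST]-G-"
    "[STNQED]-[STNQED]-[STND]-E-[EDRK]-V-[RKED]"
)
# The heptad B used to build Z2.
B = motif_to_regex("xx-S-L-xx-xx-Q-xx")


def matches(pattern: re.Pattern, sequence: str) -> bool:
    """True if the whole sequence fits the motif."""
    return pattern.fullmatch(sequence) is not None
```

For example, `matches(Z3, "RLNQEGSTSEEVK")` returns `True`, while replacing the fixed glycine with alanine makes the match fail.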
- In other aspects, the invention provides nucleic acids that encode the polypeptides of the invention, expression vectors comprising the nucleic acids of the invention operatively linked to a promoter sequence, and host cells comprising the expression vectors.
-
FIG. 1 . Overview of the HBNet™ method and design strategy. (A) (left) All sidechain conformations (rotamers) of polar amino acid types considered for design at each residue position; (middle) many combinations of hydrogen-bonding rotamers are possible and the challenge is to traverse this space and extract (right) networks of connected hydrogen bonds. (B-D) HBNet™. (B) HBNet™ precomputes the hydrogen bond and steric repulsive interaction energies between sidechain rotamers at all pairs of positions and stores them in a graph structure; nodes are residue positions, residue pairs close enough to interact are connected by edges, and for each edge there is an interaction energy matrix; yellow indicates rotamer pairs with energies below a specified threshold (hydrogen bonds with good geometry and little steric repulsion). Traversing the graph elucidates all possible connectivities of hydrogen bonding rotamers (networks) that do not clash with each other. In the simple example shown, two pairs of sidechain rotamers at Resi and Resj make good-geometry hydrogen bonds, but graph traversal shows that only one of these (left) can be extended into a connected network: (C) Resi rotamer 3 (i:3) can also hydrogen bond to Resk rotamer 2 (k:2) and Resl rotamer 4 (l:4), yielding a “good” network of fully connected Asn residues with all heavy-atom donors and acceptors satisfied, whereas (D) would be rejected because the hydrogen-bonding rotamers i:6 (Gln) and j:4 (Ser) cannot form additional hydrogen bonds to nearby positions k and l, leaving unsatisfied buried polar atoms.
(E-G) Design strategy: (E) Parametric backbone generation of two-ring coiled coils: a C3 symmetric trimer is shown, colored by monomer subunit, labeled with parameters sampled: supercoil radius of inner (Rin) and outer (Rout) helices, helical phase of the inner (Δφ1,in) and outer (Δφ1,out) helices, supercoil phase of the outer helix (Δφ0), z-offset between the inner and outer helices (Zoff), and the supercoil twist (ω0). (F) HBNet™ is applied to parametric backbones to identify the best hydrogen bond networks. (G) Networks are maintained while remaining residue positions are designed in context of the assembled symmetric oligomer.
FIG. 2 . The outer ring of helices increases thermostability and can overcome poor helical propensity of the inner helices. (A) CD spectrum (260-195 nm) of design 2L4HC2_23 at 25° C. (blue), 75° C. (red), 95° C. (green), and 25° C. after cooling (purple). (B) 2L4HC2_23, denaturation by GdmCl monitoring 222 nm; (C) 2L4HC2_9, a supercoiled C2 homodimer colored by chain, looking down the supercoil axis. (D) CD spectrum of 2L4HC2_9 as in (A). (E) Inner ring design of 2L4HC2_9. (F) CD temperature melt monitoring absorption at 222 nm; 2L4HC2_9 (black) is significantly more stable than 2L4HC2_9_inner (gray). (G) 2L6HC3_13, a supercoiled C3 homotrimer. (H) CD spectrum of 2L6HC3_13 at different temperatures as in (A). (I) 2L6HC3_13_inner. (J) CD spectrum of 2L6HC3_13 (black) versus 2L6HC3_13_inner (gray) shows that the inner helix by itself is primarily unfolded. All CD data is plotted in Mean Residue Ellipticity (MRE, 10³ deg cm² dmol⁻¹).
FIG. 3 . Structural characterization by x-ray crystallography. (A-F) Crystal structures (white) are superimposed onto the design models for six different topologies; (left) the full backbone is shown with cross-sections corresponding to the (middle) designed hydrogen bond networks; panel outline color corresponds to cross-section color on the left; RMSD over all network residue heavy-atoms is reported inside each panel. (A) 2L6HC3_13 (1.64 Å resolution; RMSD=0.51 Å over all Cα atoms) and (B) 2L6HC3_6 (2.26 Å resolution; RMSD=0.77 Å over all Cα atoms) are left-handed C3 homotrimers, each with two identical networks at different locations that span the entire interface, contacting all six helices. (C) 2L8HC4_12, a left-handed C4 homotetramer with two different hydrogen bond networks; the low (3.8 Å) resolution does not allow assessment of the hydrogen bond network sidechains. (D) 2L4HC2_9 (2.56 Å resolution; RMSD=0.39 Å over all Cα atoms) and (E) 2L4HC2_23 (1.54 Å resolution; RMSD=1.16 Å over all Cα atoms) are left-handed C2 homodimers, each with one network. (F) 5L6HC3_1 (2.36 Å resolution; RMSD=0.51 Å over all Cα atoms) is a C3 homotrimer with straight, untwisted helices and two identical networks at different cross-sections. (G, H) Schematics of hydrogen bond networks from 2L6HC3_13 (A) and 5L6HC3_1 (F). The indicated hydrogen bonds are present in both design model and crystal structure.
FIG. 4. Structural characterization by small angle x-ray scattering (SAXS). (left) backbones and (middle) h-bond networks for the design models are displayed as in FIG. 3; (right) design models were fit to experimental scattering data (black) using FoXS; Chi² values of fit (X) are indicated inside each panel. (A) 5L8HC4_6 (X=1.36), an untwisted C4 homotetramer with two identical h-bond networks. (B) 5L4HC2_12 (X=1.45), an untwisted C2 homodimer with a single h-bond network. (C) 3L6HC2_4 (X=2.04), a parallel right-handed C2 homodimer with two repeated networks and two inner helices, one outer helix. (D) 2L6Hanti_3 (X=1.80), a left-handed anti-parallel homodimer with two inner helices, one outer helix; because of the anti-parallel geometry, the same network occurs in two locations. -
FIG. 5. The hydrogen bond networks confer specificity. (A) Interaction surfaces of monomer subunits for six structurally verified designs, ordered by increasing contiguous hydrophobic interface area, as calculated by h-patch; hydrogen bond network residues are colored. (B) Binding heat-map from yeast two-hybrid assay. Designs in (A) were fused to both DNA-binding domain and activation domain constructs, and binding was measured by determining the cell growth rate (maximum ΔOD/hour): darker cells indicate more rapid growth, hence stronger binding; values are the average of at least 3 biological replicates. The heat-map is ordered as in (A), and designs with more extensive networks and better-partitioned hydrophobic interface area exhibit higher interaction specificity. (C-G) Modular networks confer specificity in a programmable fashion. (C) The backbone corresponding to designs 2L6HC3_13 (FIG. 3A) and 2L6HC3_6 (FIG. 3B) can accommodate different networks at each of four repeating geometric cross-sections. (D) Three possibilities for each cross-section: Network “A”, Network “B”, or hydrophobic, “X”. (E) Combinatorial designs using this three letter “alphabet” were tested for interaction specificity using the yeast two-hybrid assay as in (B). Axis labels denote the network pattern; for example, “AXBX” indicates Network A at cross-section 1, Network B at cross-section 3, and X (hydrophobic) at the two others. (F) SAXS profiles for combinatorial designs as in FIG. 4; (G) SEC chromatograms and estimated molecular weights (from MALS); designs range from ~27-30 kDa. AAXX, XXBB, and XXXX correspond to designs 2L6HC3_13, 2L6HC3_6, and 2L6HC3_1, respectively. -
FIG. 6 is a block diagram of an example computing network, in accordance with an example embodiment. -
FIG. 7A is a block diagram of an example computing device, in accordance with an example embodiment. -
FIG. 7B depicts a network of computing devices arranged as a cloud-based server system, in accordance with an example embodiment. -
FIG. 8 is a flowchart of a method, in accordance with an example embodiment.
- All references cited are herein incorporated by reference in their entirety. Within this application, unless otherwise stated, the techniques utilized may be found in any of several well-known references such as: Molecular Cloning: A Laboratory Manual (Sambrook, et al., 1989, Cold Spring Harbor Laboratory Press), Gene Expression Technology (Methods in Enzymology, Vol. 185, edited by D. Goeddel, 1991. Academic Press, San Diego, Calif.), “Guide to Protein Purification” in Methods in Enzymology (M. P. Deutscher, ed., (1990) Academic Press), PCR Protocols: A Guide to Methods and Applications (Innis, et al. 1990. Academic Press, San Diego, Calif.), Culture of Animal Cells: A Manual of Basic Technique, 2nd Ed. (R. I. Freshney. 1987. Liss, Inc. New York, N.Y.), Gene Transfer and Expression Protocols, pp. 109-128 (ed. E. J. Murray, The Humana Press Inc., Clifton, N.J.), and the Ambion 1998 Catalog (Ambion, Austin, Tex.).
- As used herein, the singular forms “a”, “an” and “the” include plural referents unless the context clearly dictates otherwise. “And” as used herein is interchangeably used with “or” unless expressly stated otherwise.
- As used herein, the amino acid residues are abbreviated as follows: alanine (Ala; A), asparagine (Asn; N), aspartic acid (Asp; D), arginine (Arg; R), cysteine (Cys; C), glutamic acid (Glu; E), glutamine (Gln; Q), glycine (Gly; G), histidine (His; H), isoleucine (Ile; I), leucine (Leu; L), lysine (Lys; K), methionine (Met; M), phenylalanine (Phe; F), proline (Pro; P), serine (Ser; S), threonine (Thr; T), tryptophan (Trp; W), tyrosine (Tyr; Y), and valine (Val; V).
- All embodiments of any aspect of the invention can be used in combination, unless the context clearly dictates otherwise.
- In one aspect, the invention provides polypeptides comprising an amino acid sequence that is at least 75% identical over its full length to the amino acid sequence selected from the group consisting of SEQ ID NOS:2-79.
AXAX (SEQ ID NO: 2) TRTRSLREQEEIIRELERSLREQEELLRELERLQREGSSDEDVRELLREIKKLAREQKYLVEELKKLAREQKRQD;
XAAX (SEQ ID NO: 3) TRTEIIRELERSLREQERSLREQEELLRELERLQREGSSDEDVRELLREIKKLAREQKKLAREQKYLVEELKRQD;
XAXA (SEQ ID NO: 4) TRTEIIRELERSLREQEELAKRLKRSLREQERLQREGSSDEDVRKLAREQKELVEEIEKLAREQKYLVEELKRQD; and
XXAA (SEQ ID NO: 5) TRTEIIRELEELAKRLKRSLREQERSLREQERLQREGSSDEDVRKLAREQKKLAREQKELVEEIEYLVEELKRQD
- Polypeptide Nomenclature: The name of the polypeptides shown below indicates oligomerization state and topology, and sequences below are organized by topology and oligomerization state. The first two characters indicate supercoil geometry. ‘2L’ refers to a two-layer heptad repeat that results in a left-handed supercoil; ‘3L’ refers to a three-layer 11-residue repeat with a right-handed supercoil; and ‘5L’ refers to untwisted designs with a five-layer 18-residue repeat and straight helices (no supercoiling), where “layer” in this context is the number of unique repeating geometric slices, or layers, along the supercoil axis. The middle two characters indicate the total number of helices, and the final two indicate symmetry. Thus, “2L6HC3” denotes a left-handed, six-helix trimer with C3 symmetry. Underlined residues are optional.
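- By way of illustration only, the naming scheme described above can be decoded mechanically. The following Python sketch is not part of the disclosed method; the function name `parse_design_name` and the returned field names are hypothetical. It handles only names of the form described in the nomenclature paragraph (a ‘2L’, ‘3L’, or ‘5L’ prefix, a helix count followed by ‘H’, and a cyclic symmetry such as ‘C3’) and returns `None` for any other name.

```python
import re

# Repeat geometry for each supercoil prefix, as stated in the nomenclature paragraph.
SUPERCOIL = {
    "2L": {"layers": 2, "repeat_length": 7, "handedness": "left"},
    "3L": {"layers": 3, "repeat_length": 11, "handedness": "right"},
    "5L": {"layers": 5, "repeat_length": 18, "handedness": "untwisted"},
}

# prefix, total helix count, 'H', cyclic symmetry, optional "_suffix"
NAME_RE = re.compile(r"^(2L|3L|5L)(\d+)H(C\d+)(?:_(\w+))?$")


def parse_design_name(name):
    """Decode a design name; return None if it does not follow the scheme."""
    match = NAME_RE.match(name)
    if match is None:
        return None
    prefix, helices, symmetry, suffix = match.groups()
    info = dict(SUPERCOIL[prefix])
    info.update(
        helices=int(helices),        # total number of helices in the oligomer
        symmetry=symmetry,           # e.g. "C3"
        subunits=int(symmetry[1:]),  # C3 -> 3 subunits (a trimer)
        design_id=suffix,            # e.g. "13" or "AXBX"; None if absent
    )
    return info


print(parse_design_name("2L6HC3_13"))
```

- For “2L6HC3_13” the sketch reports a left-handed heptad-repeat design with six helices and C3 symmetry, i.e. three subunits. Names outside the described scheme, such as 2L6Hanti_3, are deliberately left undecoded.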
2L8HC4_12 (SEQ ID NO: 6) GTAIEANSRMLKALIEAKAIWKALWANSLLLEATSRGDTERMRQWAEEARIYKEAKKIIDEADEIVKEAKERHD
5L6HC3_1 (SEQ ID NO: 7) SEELRAVADLQRLNIELARKLLEAVARLQELNIDLVRKTSELTDEKTIREEIRKVKEESKRIVEEAEEEIRRAKEESRYIADESRGS
2L4HC2_9 (SEQ ID NO: 8) GTSDYIIEQIQRDQEEARKKVEEAEERLERVKEASKRGVSSDQLLDLIRELAEIIEELIRIIRRSNEAIKELIKNQS
2L4HC2_11 (SEQ ID NO: 9) GSEDYKLREAQRELDKQRKDTEEIRKRLKEIQRLTDERTSTADELIKELREIIRRLQEQSEKLREIIEELEKIIRKR
2L4HC2_23 (SEQ ID NO: 10) GTRTEIIRELERSLREQEELAKRLKELLRELERLQREGSSDEDVRELLREIKELVEEIEKLAREQKYLVEELKRQD
2L4HC2_24 (SEQ ID NO: 11) GTDTDELLRLAKEQAELLKEIKKLVEEIARLVKEIQEDPSDELLKTLAELVRKLKELVEDMERSMKEQLYIIKKQKS
5L8HC4_6 (SEQ ID NO: 12) GSKDTEDSRKIWEDIRRLLEEARKNSEEIWKEITKNPDTSEIARLLSEQLLEIAEMLVRIAELLSRQTEQR
2L6HC3_AXAX (SEQ ID NO: 13) GTKYEIREALKEAQKQLEDLKRMLDELRRNLEELKRNPSEDALVENNELIVRVLEVIVENNRSIIEILKLLAKSD
2L6HC3_AXBX (SEQ ID NO: 14) GTKYKIREMLEEAKRSLEELRRILEKLKESLRELRRNPSEDALVNNNEVIVKAIEASVENQRIIIELARMLAESD
2L6HC3_AXXB (SEQ ID NO: 15) GTKYRIKDTLRELKRALEELKKILEELQRSLEELRRNPSEDALVNNNEVIVKAIEAAVRAIEISAENQRMLAESD
2L6HC3_XAAA (SEQ ID NO: 16) GTKYEARKQLEEMKKQLKDLKRSLERLREILERLEENPSEDVIVEAIRAIVENNKQIVENNRSIIENNETIIRSD
2L6HC3_XBXA (SEQ ID NO: 17) GTKYELRRQLEELEKLLRELRKSLDELRKILEELERNPSEDVIVRAIKASVKNQEIIVEVLRAIIENNKTIAKSD
2L6HC3_12 (SEQ ID NO: 18) GTKYELRRALEELEKALQELREMLRKLKESLEELKKNPSEDALVRNNELIVEVLRVIVEVLSIIARVLEINARSD
2L6HC3_13 (SEQ ID NO: 19) GTKYELRRALEELEKALRELKKSLDELERSLEELEKNPSEDALVENNRLNVENNKIIVEVLRIIAEVLKINAKSD
2L6HC3_6 (SEQ ID NO: 20) GTKYKIKETLKRLEDSLRELRRILEELKEMLERLEKNPDKDVIVEVLKVIVKAIEASVENQRISAENQKALAESD
2L6HC3_10 (SEQ ID NO: 21) GTKYEIKKALKELEEAIQKLKKSLKELKESLKELQKNPSEDALVKNNSLNVANNEIIVEVLEIIARILELLARSD
2L6HC3_11 (SEQ ID NO: 22) GTKYEIKEALRELNRALKELKEALRELERSLRELQKNPDKDALVRNNELNVDVARIIVEVLSIIARVLELLAKSD
2L6HC3_12 (SEQ ID NO: 23) GTKYELRRALEELEKALQELREMLRKLKESLEELKKNPSEDALVRNNELIVEVLRVIVEVLSIIARVLEINARSD
2L6HC3_13 (SEQ ID NO: 24) GTKYELRRALEELEKALRELKKSLDELERSLEELEKNPSEDALVENNRLNVENNKIIVEVLRIIAEVLKINAKSD
2L6HC3_14 (SEQ ID NO: 25) GTKYELREAIRKLEEALRKLKKALDELRKSLEELKKNPSEDALVRNNELNVKVAEIIVKVLKIIAEAIKINAKSD
2L6HC3_19 (SEQ ID NO: 26) GTEEYKLRELLKRHNEVLKELQKAAKEAEEVAERFKKTNDITEAIRVIADLLRAIVKAIETNSRVVKMIVELNE
2L6HCL2 (SEQ ID NO: 27) GTKYIEKLLREAQRTLEELKRLLEELKEMLKELERANATDARLIAEVIRVIVEVLRASVENQEMIIRILKAITEE
2L6HC3_23 (SEQ ID NO: 28) TEKDVLRIIVKNNEIIVKVLSVIAEVLKIIAKILENPSEYMLKELKKALKELEKMLKELRKSLKELKEALRELEGS
2L6HC3_37 (SEQ ID NO: 29) GTLDYKLDEMLKKLEKSREEMEKMAQELRRALEELEKNSNVDKVLKIIIKAIQLSIENQKLNLEAVRLLIEAQKS
2L6HC3_6 (SEQ ID NO: 30) GTKYKIKETLKRLEDSLRELRRILEELKEMLERLEKNPDKDVIVEVLKVIVKAIEASVENQRISAENQKALAESD
2L8HC4_3 (SEQ ID NO: 31) GTDEYKWKEEVRRFEEEAKKWEEELKEMRKRIEDAKKGRPTLKVNLEAAEALLEAARLIVEAAKLLLAAAKLNEKQN
2L8HC4_9 (SEQ ID NO: 32) GSDEDRKAKELIERQRKLTDEAEEWAKQNEEIAKKIEKQPDTSLVARMLANVSRMLLATNRALLANTEALEALIRKT
2L8HC4_12 (SEQ ID NO: 33) GTAIEANSRMLKALIEIAKAIWKALWANSLLLEATSRGDTERMRQWAEEAREIYKEAKKIIDEADEIVKEAKERHD
3L6HC2_4 (SEQ ID NO: 34) SALEKIAKLIIEAARLSAELARRAARASAEMARKAIEAVSEERGSESLLKIVADLIVESQEAVVRLIIESQQIAAKLAEDLIRAAKEAASDESKMEEVAKEVQERAERAARDIERKLKRVLEELDYKLKESRDGS
3L6HC2_6 (SEQ ID NO: 35) TALEIAVRLNREAAREAARENADTARKAARRIAEVAKRLAEENRDAKLAARLLAEIARLLAELIARQSELLAEWLATQSKLAAELARKDTSATDEAERIRKESEELLDKVREEIKRLEDEVSKTIEELSERVRGS
3L6HC2_7 (SEQ ID NO: 36) SILELAHESNRRALEMASRANREAMKAAREMIRAASEAARRAGSSNDKDSLRMIEEALRLALRMIEETNKKAVRMVLENNRKMVEAEKKKLSEEEIKRIAKETEDRMREIARRASEEARRLAEEIKREADYRSGS
5L6HC3_1 (SEQ ID NO: 37) SEELRAVADLQRLNIELARKLLEAVARLQELNIDLVRKTSELTDEKTIREEIRKVKEESKRIVEEAEEEIRRAKEESRYIADESRGS
5L6HC3_3 (SEQ ID NO: 38) GTERKDRLRKELKRIAEETDKWVEELKEELERILRTIEELRKDPSSEVIVDIARIQLEALREVIRVVAENSKAILEAIHRVIEEG
5L6HC3_5 (SEQ ID NO: 39) SKEVRLQKLNAEIMKEIMELIIRLQEANARIIEELVRLIIDLERSTDSKRMIEEIRKVAERAIEESKRLLEEAEKAMRRAIYESEDALREGS
5L8HC4_1 (SEQ ID NO: 40) GSKVEELLRKSEEAAERAKRELERLLEESERIVAEAQALAEKYESQKVWVRILIELIRATNRMLAEIARILLEMIEVTNRMIAESTK
5L8HC4_2 (SEQ ID NO: 41) SEQLKEIARILIKLIESLTRFILEVARILIELIEETQRLIVASTDSDESELERIARESKKKAKKALDELKKIVDDQRREAKKAIEELEYDGS
5L8HC4_6 (SEQ ID NO: 42) GSKDTEDSRKIWEDIRRLLEEARKNSEEIWKEITKNPDTSEIARLLSEQLLEIAEMLVRIAELLSRQTEQR
2L4HC2_1 (SEQ ID NO: 43) GTAYELLRKAEELEKKQQELLKRQEELAKTAEELRKKGGNADSMMKIIKESTRIVRESTEIVKELLKIIRELRRQS
2L4HC2_5 (SEQ ID NO: 44) GTRTEYLKKLAEEAKELAKRSRELSKESRRLSEEARRDPDKEKLLRVVKKLQEVIEELQRVIEELLRVIKEALENQS
2L4HC2_6 (SEQ ID NO: 45) GTETEYQRELAREARRLAKRSRELSERSRKLSEDAKRDPDKDKLLEVVERLQQVIEELQKVIEELLRVIESSLKTIS
2L4HC2_9 (SEQ ID NO: 46) GTSDYIIEQIQRDQEEARKKVEEAEERLERVKEASKRGVSSDQLLDLIRELAEIIEELIRIIRRSNEAIKELIKNQS
2L4HC2_10 (SEQ ID NO: 47) GTEEYRRKEQEERTKEQQERTERQRRKTEELKRATKEGTLTPEEAIRQAQKQSENAERQSREAEKQSREANEALRKR
2L4HC2_11 (SEQ ID NO: 48) GSEDYKLREAQRELDKQRKDTEEIRKRLKEIQRLTDERTSTADELIKELREIIRRLQEQSEKLREIIEELEKIIRKR
2L4HC2_12 (SEQ ID NO: 49) GSEDYKLKELQKRNKKQEEEAKRNDDERKKIEELTRKRTSTADELIRELQRSNEEMQRSQREMQDQSRRLEDIIRKR
2L4HC2_14 (SEQ ID NO: 50) GTEDYKRREAERKLQKQQEELKELKRKLEEIRELHEKGVGSPDRLIRELERIIRELQRMQKENEKIIKELQRIIKKR
2L4HC2_18 (SEQ ID NO: 51) GTESKYLLEEARRLKDEARKLKEEAKKVKEESRKLIERIDRGEDSDRELLERLKEQNNRLLEIIERLLEIIERLLKLIEEWTRDS
2L4HC2_19 (SEQ ID NO: 52) GTEEDYAEREIRKMKEEQKRQRKRLEELERELQEMQEKKREGTSDAKEVIDQLERIIRELQEIIRSQEDITRKLEEIIRRMKENS
2L4HC2_20 (SEQ ID NO: 53) GTNKEELKRTMEEQQRILEKLLRTIKEQKEILRKQEEGRATKEELKRLTKLAQEQERMMRELIDLARKQAYLLKRES
2L4HC2_21 (SEQ ID NO: 54) GTREEKIRRILEEIQKIMEEIKRIMEEIKRTQEEAEKHGSSKKAIEKQKELLRRLEELLRKLERLLRELEYLMRDEK
2L4HC2_22 (SEQ ID NO: 55) GTREEWLYRILELIERIERLIKEIIRLSRRALELLENNASNEEWAQEIKEMQRKIQEWLKQILEWLKKIKEWIRESQ
2L4HC2_23 (SEQ ID NO: 56) GTRTEIIRELERSLREQEELAKRLKELLRELERLQREGSSDEDVRELLREIKELVEEIEKLAREQKYLVEELKRQD
2L4HC2_24 (SEQ ID NO: 57) GTDTDELLRLAKEQAELLKEIKKLVEEIARLVKEIQEDPSDELLKTLAELVRKLKELVEDMERSMKEQLYIIKKQKS
5L4HC2_1 (SEQ ID NO: 58) GTEETKNSKRVLDIIEELMRQVEENSRELEKRIKELLRQTKEGKTKKELERDVRRTIEEQKKELRRLKEQVRKTKEEQREEQYRS
5L4HC2_2 (SEQ ID NO: 59) GTRTEKLMKEVEEIQRRQIELLKKLMKEVEDSSKRNQEATERGTTKKKWKEEQEKILEDLKREVRRIIEESRKWLEDLKKKVYES
5L4HC2_3 (SEQ ID NO: 60) GTEKYRLREEVRRTIEEQKENLERLKQEVKETERKTEEWRERNTTTEDAQREQIKIIRRLMKEVERNSRRLEKELRRLVEETRES
5L4HC2_6 (SEQ ID NO: 61) GTEKYRLIRESERALRELKRKVRELEEDQRERLDEQRKKVEEGQTTDELLRQNEENSRRMLKETKKLLREIERIQREQQRQNQEN
5L4HC2_7 (SEQ ID NO: 62) GTEKEKEIEKNSREVIKQVEDILREIKENSKRNIEIIKELQKDPSDEKMRETIEQQRENLERLERKARELIRRQERNLRETQYKD
5L4HC2_9 (SEQ ID NO: 63) GTEKYRIIEEQRRNLEDLEREIREIIKKLKEALERLRELVERNSTNDRLLDEVRKIIEEAIEDMKRLLEKVERSIRQNIEELRRS
5L4HC2_10 (SEQ ID NO: 64) GTNKEYLRRKVKELKDQQKRNLEELEREVRRLIKEIEEWRERNTTTDRALKEIIRQIQRLLEEARRNSEEVLRQIEEIMEETRES
5L4HC2_11 (SEQ ID NO: 65) GTEEERALERIIRAIRELMREVERNSKEVLQWIKEMLRLTKENSSTKELEERWREIEERQRRNLEKLKEEVRRLEDEIRQETYRS
5L4HC2_12 (SEQ ID NO: 66) GTETKKLVEEVERALRELLKTSEDLVRKVEKALRELLELIRRGGTKDKIEEKIRRVLEEIKRELERQKRKIEDVLRQIKEELYRS
2L6Hanti_1 (SEQ ID NO: 67) SDYLRLATEHNKLAVEANRLAIELAKSAVELAETDPSKTALEHAELAARLLEMMVQFTKAAQELTREAIRKEGRNEESEKVLRKSKEAYKESEKALEDARRLLDELRKKGS
2L6Hanti_2 (SEQ ID NO: 68) SEELRKAAENNELAVRLAEAALRMARSALHLFEENPSDEMLKFLELAMEVAKMAAELLKASLKMLKKAAEERGSDESVKYLADKSRDIMRQITEELKKLEEEAKRAQKRGS
2L6Hanti_3 (SEQ ID NO: 69) SEKARIAVENLEAALRLNRAAAEMQKSAIKIMDDNRSDEKALRYLRLTTKVLRMSVELLRASLELAEKALREEGSDDSAEKVRKEAEEILKESTEILKEADKETKRADEEGS
2L6Hanti_4 (SEQ ID NO: 70) SRRLELAARINKAAAENARSAIEIQELAARLADELSSSKKVIDFARATTEVLRMSVKLLKLSLEMLEEAARQDGRSEEVRYLAEESKKILEEARKALEDADRLTKRIEEEGS
2L6Hanti_5 (SEQ ID NO: 71) TDVLRIAAENLKAAVELAKAALEMAKSAIEIAKTLTEDDEALKFARAAAEVLRMAAKLLKLSIELARKAAEEEGSDDEVRYILDEARKQADELREALKKVDEIMKELDKRGS
5H2LD_10 (SEQ ID NO: 72) TRRKQEMKRLKYEMEKIREETEEVKKEIEESKKRPQSESAKNLILIMQLLINQIRLLALQIRMLALQLQE
5H2LD_13 (SEQ ID NO: 73) TEDQERLRKQMEYERKHTEKVEKEIRKVEQKMKSHEDTSLRLLVLIARLLINQIRLLILQIRSLSNLERN
5H2LD_15 (SEQ ID NO: 74) TESTLLILIMRLLVQQSELLQLQIQMLQLLLKANNGTNKTEIERRSKEMEEELKRMKESNREMTKRIKEME
5H2LD_18 (SEQ ID NO: 75) TESDLLRQISKLLIIQIRLLLLQIQMLILLLKMNTGTNTTQITKEAKRIEKEAQEARKELEKMQESNKKQT
6H2LD_8 (SEQ ID NO: 76) TEDEIRKLRKLLEEAEKKLYKLEDKTRRSEEISKTDDDPKAQSLQLIAESLMLIAESLLIIAISLLLSSRNG
7H2LD_3 (SEQ ID NO: 77) TEDEELQRVEEEIRELERKAKELHYKSEEIRKKVNGRSPQAEALLMIAQALLNISESLLAIAKALLMIARST
8H2LD_4 (SEQ ID NO: 78) TDEREIIKRVKRLLEEVEYLIERLRDQIEKAEKGLLDSRKAQQNAEALVNLIKAMVLVLKALLLAKELER
8H2LD_4_KE (SEQ ID NO: 79) TEEQYIIEEVKKLLEEVKKLIEELKKQIEKAEKGEEDSRKAQQNAEALVNLIKAMVLVLKALLLAKELER
- The polypeptides of this aspect of the invention have been shown in the examples that follow to be capable of forming homo-oligomers with modular hydrogen bond network-mediated specificity. In various embodiments, the polypeptides comprise or consist of an amino acid sequence that is at least 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or 100% identical over its full length to the amino acid sequence selected from the group consisting of SEQ ID NOS:2-79.
- As used throughout the present application, the term “polypeptide” is used in its broadest sense to refer to a sequence of subunit amino acids. The polypeptides of the invention may comprise L-amino acids, D-amino acids (which are resistant to L-amino acid-specific proteases in vivo), or a combination of D- and L-amino acids. The polypeptides described herein may be chemically synthesized or recombinantly expressed. The polypeptides may be linked to other compounds to promote an increased half-life in vivo, such as by PEGylation, HESylation, PASylation, glycosylation, or may be produced as an Fc-fusion or in deimmunized variants. Such linkage can be covalent or non-covalent as is understood by those of skill in the art.
- As will be understood by those of skill in the art, the polypeptides of the invention may include additional residues at the N-terminus, C-terminus, or both that are not present in the polypeptides of the invention; these additional residues are not included in determining the percent identity of the polypeptides of the invention relative to the reference polypeptide.
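- By way of illustration only, one way to compute a percent identity of the kind referred to above is sketched below in Python. The specification does not prescribe an alignment algorithm, so the following are assumptions made for this sketch: a simple dynamic-programming alignment with hypothetical scores (match +1, mismatch −1, gap −1), no penalty for extra residues at either terminus of the candidate polypeptide (consistent with additional terminal residues being excluded from the comparison), and identity expressed relative to the full length of the reference sequence. The function name `percent_identity` is likewise hypothetical.

```python
def percent_identity(reference, query, match=1, mismatch=-1, gap=-1):
    """Percent of reference positions aligned to an identical query residue."""
    n, m = len(reference), len(query)
    if n == 0:
        raise ValueError("reference sequence must not be empty")
    # best[i][j] = (alignment score, identical positions) for
    # reference[:i] aligned against query[:j].
    best = [[(0, 0)] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        best[i][0] = (i * gap, 0)  # unaligned reference residues are penalised
    # best[0][j] stays (0, 0): leading extra query residues cost nothing.
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            same = reference[i - 1] == query[j - 1]
            score, ident = best[i - 1][j - 1]
            diagonal = (score + (match if same else mismatch), ident + int(same))
            score, ident = best[i - 1][j]
            skip_query = (score + gap, ident)      # gap in the query
            score, ident = best[i][j - 1]
            skip_reference = (score + gap, ident)  # gap in the reference
            best[i][j] = max(diagonal, skip_query, skip_reference)
    # Trailing extra query residues cost nothing: take the best cell
    # anywhere in the last row.
    _, identities = max(best[n])
    return 100.0 * identities / n


# A candidate carrying extra terminal residues, compared with a short reference.
print(percent_identity("KLAREQK", "GSKLAREQKHHHHHH"))
```

- Under these assumptions, a candidate that contains the complete reference sequence plus terminal tags scores 100%, while each substituted reference position lowers the value by 100 divided by the reference length.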
- In one embodiment, changes from the reference polypeptide are conservative amino acid substitutions. As used herein, “conservative amino acid substitution” means an amino acid substitution that does not alter or substantially alter polypeptide function or other characteristics. A given amino acid can be replaced by a residue having similar physicochemical characteristics, e.g., substituting one aliphatic residue for another (such as Ile, Val, Leu, or Ala for one another), or substitution of one polar residue for another (such as between Lys and Arg; Glu and Asp; or Gln and Asn). Other such conservative substitutions, e.g., substitutions of entire regions having similar hydrophobicity characteristics, are well known. Polypeptides comprising conservative amino acid substitutions can be tested in any one of the assays described herein to confirm that a desired activity, e.g. antigen-binding activity and specificity of a native or reference polypeptide is retained.
- Amino acids can be grouped according to similarities in the properties of their side chains (in A. L. Lehninger, in Biochemistry, second ed., pp. 73-75, Worth Publishers, New York (1975)): (1) non-polar: Ala (A), Val (V), Leu (L), Ile (I), Pro (P), Phe (F), Trp (W), Met (M); (2) uncharged polar: Gly (G), Ser (S), Thr (T), Cys (C), Tyr (Y), Asn (N), Gln (Q); (3) acidic: Asp (D), Glu (E); (4) basic: Lys (K), Arg (R), His (H). Alternatively, naturally occurring residues can be divided into groups based on common side-chain properties: (1) hydrophobic: Norleucine, Met, Ala, Val, Leu, Ile; (2) neutral hydrophilic: Cys, Ser, Thr, Asn, Gln; (3) acidic: Asp, Glu; (4) basic: His, Lys, Arg; (5) residues that influence chain orientation: Gly, Pro; (6) aromatic: Trp, Tyr, Phe. Non-conservative substitutions will entail exchanging a member of one of these classes for another class. Particular conservative substitutions include, for example: Ala into Gly or into Ser; Arg into Lys; Asn into Gln or into His; Asp into Glu; Cys into Ser; Gln into Asn; Glu into Asp; Gly into Ala or into Pro; His into Asn or into Gln; Ile into Leu or into Val; Leu into Ile or into Val; Lys into Arg, into Gln or into Glu; Met into Leu, into Tyr or into Ile; Phe into Met, into Leu or into Tyr; Ser into Thr; Thr into Ser; Trp into Tyr; Tyr into Trp; and/or Phe into Val, into Ile or into Leu.
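- By way of illustration only, the class-based definition above (a non-conservative substitution exchanges a member of one side-chain class for a member of another) can be expressed as a lookup. The Python sketch below uses the second, six-class grouping; norleucine is omitted because it has no standard one-letter code, and the function names are hypothetical. The sketch is stricter than the list of particular conservative substitutions given above, some of which (for example Ala into Gly) cross class boundaries.

```python
# Six side-chain classes from the second grouping, in one-letter codes.
SIDE_CHAIN_CLASSES = {
    "hydrophobic": "MAVLI",
    "neutral hydrophilic": "CSTNQ",
    "acidic": "DE",
    "basic": "HKR",
    "chain orientation": "GP",
    "aromatic": "WYF",
}

# Reverse lookup: residue -> class name.
CLASS_OF = {
    residue: name
    for name, members in SIDE_CHAIN_CLASSES.items()
    for residue in members
}


def is_conservative(original, replacement):
    """True if both residues belong to the same side-chain class."""
    return CLASS_OF[original] == CLASS_OF[replacement]


def non_conservative_positions(reference, variant):
    """1-based positions where the variant leaves the reference residue's class."""
    if len(reference) != len(variant):
        raise ValueError("sequences must have equal length")
    return [
        position
        for position, (a, b) in enumerate(zip(reference, variant), start=1)
        if a != b and not is_conservative(a, b)
    ]


# K1R stays within the basic class; K7D moves from basic to acidic.
print(non_conservative_positions("KLAREQK", "RLAREQD"))
```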
- As noted above, the polypeptides of the invention may include additional residues at the N-terminus, C-terminus, or both. Such residues may be any residues suitable for an intended use, including but not limited to detection tags (e.g., fluorescent proteins, antibody epitope tags, etc.), linkers, ligands suitable for purposes of purification (His tags, etc.), and peptide domains that add functionality to the polypeptides.
- In another aspect, the invention provides polypeptides comprising or consisting of the amino acid sequence of Formula 1:
Z1-Z2-Z3-Z4-Z5, wherein:
- Z1 is a helix initiating sequence comprising the amino acid sequence of Formula 2:
J1-J2-J3, wherein
- J1 is selected from the group consisting of S, T, N, and D;
- J2 is selected from the group consisting of P, E, R, K, L, and A; and
- J3 is selected from the group consisting of E, D, R, K, I, L, V, A, S, T, and Y, or is absent;
- Z3 is a helix connecting sequence having the amino acid sequence of Formula 3:
(SEQ ID NO: 1) [RKED]-L-[NQEDRKST]-[NQEDRKST]-[NQEDRKST]-G-[STNQED]-[STNQED]-[STND]-E-[EDRK]-V-[RKED];
- Z5 is a helix terminating sequence comprising the amino acid sequence of Formula 4:
xx-xx-[RKEDSTNQYA] (SEQ ID NO: 80);
- Z2 is selected from the group consisting of general formulae BX1BX2, X1BBX2, X1BX2B, X1X2BB, BX1X2B, and BBX1X2, wherein:
B is xx-S-L-xx-xx-Q-xx;
- X1 and X2 independently have the amino acid sequence of Formula 5:
O1O2O3O4O5O6O7 wherein:
- O1, O4, O5, and O7 are xx;
- O2 and O3 are independently selected from the group consisting of I, L, and A; and
- O6 is L; and
- Z4 is selected from the group consisting of general formulae B2X3B2X4, X3B2B2X4, X3B2X4B2, X3X4B2B2, B2X3X4B2, and B2B2X3X4, wherein
B2 is xx-L-A-xx-xx-Q-xx; and
- X3 and X4 independently have the amino acid sequence of Formula 6:
O10O11O12O13O14O15O16 wherein
- O10, O13, O14, and O16 are xx;
- O11 is L; and
- O12 and O15 are independently selected from the group consisting of I, L, V, and A;
- wherein xx is any amino acid, and
- wherein:
- (i) when Z2 is BX1BX2 then Z4 is X3B2X4B2;
- (ii) when Z2 is X1BBX2 then Z4 is X3B2B2X4;
- (iii) when Z2 is X1BX2B then Z4 is B2X3B2X4;
- (iv) when Z2 is X1X2BB then Z4 is B2B2X3X4;
- (v) when Z2 is BX1X2B then Z4 is B2X3X4B2; and
- (vi) when Z2 is BBX1X2 then Z4 is X3X4B2B2.
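- By way of illustration only, Formula 1 can be written as a set of regular expressions and used to check whether a given sequence fits the formula. The Python sketch below is not part of the disclosure and rests on two stated assumptions: “xx” is restricted to the twenty standard amino acids, and conditions (i)-(vi) are read as pairing each Z2 layout with a Z4 layout (Z1 being the helix initiating sequence). The names `HEPTAD`, `LAYOUTS`, and `formula1_layout` are hypothetical. In the layout strings, upper-case B and X stand for B and X1/X2 in Z2, and lower-case b and x stand for B2 and X3/X4 in Z4.

```python
import re

AA = "[ACDEFGHIKLMNPQRSTVWY]"  # "xx": any of the 20 standard amino acids

Z1 = "[STND][PERKLA][EDRKILVASTY]?"  # Formula 2: J1-J2-J3, J3 may be absent
Z3 = "[RKED]L[NQEDRKST]{3}G[STNQED]{2}[STND]E[EDRK]V[RKED]"  # Formula 3
Z5 = AA + AA + "[RKEDSTNQYA]"  # Formula 4

HEPTAD = {
    "B": AA + "SL" + AA + AA + "Q" + AA,            # B:  xx-S-L-xx-xx-Q-xx
    "X": AA + "[ILA][ILA]" + AA + AA + "L" + AA,    # X1/X2 (Formula 5)
    "b": AA + "LA" + AA + AA + "Q" + AA,            # B2: xx-L-A-xx-xx-Q-xx
    "x": AA + "L[ILVA]" + AA + AA + "[ILVA]" + AA,  # X3/X4 (Formula 6)
}

# Z2 layout -> Z4 layout, following conditions (i)-(vi).
LAYOUTS = {
    "BXBX": "xbxb",
    "XBBX": "xbbx",
    "XBXB": "bxbx",
    "XXBB": "bbxx",
    "BXXB": "bxxb",
    "BBXX": "xxbb",
}


def formula1_layout(sequence):
    """Return the Z2 layout under which the sequence fits Formula 1, else None."""
    for z2, z4 in LAYOUTS.items():
        pattern = (
            Z1
            + "".join(HEPTAD[c] for c in z2)
            + Z3
            + "".join(HEPTAD[c] for c in z4)
            + Z5
        )
        if re.fullmatch(pattern, sequence):
            return z2
    return None


# SEQ ID NO: 2 (design AXAX) as listed in this document.
SEQ_ID_NO_2 = (
    "TRTRSLREQEEIIRELERSLREQEELLRELERLQREGSSDEDVRELLR"
    "EIKKLAREQKYLVEELKKLAREQKRQD"
)
print(formula1_layout(SEQ_ID_NO_2))
```

- Applied to SEQ ID NO: 2, the sketch decomposes the sequence as TRT, then RSLREQE-EIIRELE-RSLREQE-ELLRELE (layout BX1BX2), then RLQREGSSDEDVR, then ELLREIK-KLAREQK-YLVEELK-KLAREQK (layout X3B2X4B2), then RQD, which corresponds to condition (i) under the reading assumed here.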
- The polypeptides of this aspect of the invention have been shown in the examples that follow to be capable of forming homo-oligomers with modular hydrogen bond network-mediated specificity.
- In one embodiment, J3 is present. In another embodiment, Z1 is TRT. In a further embodiment, Z3 is RLQREGSSDEDVR (SEQ ID NO: 81). In a still further embodiment, Z5 is RQD. In another embodiment, B is RSLREQE (SEQ ID NO: 82). In a further embodiment, O1, O4, O5, and O7 are independently selected from the group consisting of E, R, and K. In a still further embodiment, X1 and X2 are independently selected from the group consisting of EIIRELE (SEQ ID NO: 83), ELLRELE (SEQ ID NO: 84), and ELAKRLK (SEQ ID NO: 85). In another embodiment, B2 is KLAREQK (SEQ ID NO: 86). In one embodiment, O12 and O15 are independently selected from the group consisting of I, L, V, and A. In another embodiment, X3 and X4 are independently selected from the group consisting of [YE]-LVEELK (SEQ ID NO: 87), [YE]-LLREIK (SEQ ID NO: 88), and [YE]-LVEEIE (SEQ ID NO: 89). As used herein, residues in brackets are alternative residues for a given position within the recited peptide domain. In a further embodiment, X3 and X4 are independently selected from the group consisting of ELVEELK (SEQ ID NO: 90), ELLREIK (SEQ ID NO: 91), and ELVEEIE (SEQ ID NO: 92). In a still further embodiment, Z2 is selected from the group consisting of general formulae BX1BX2, X1BBX2, X1BX2B, and X1X2BB; and Z4 is selected from the group consisting of general formulae B2X3B2X4, X3B2B2X4, X3B2X4B2, and X3X4B2B2. In a further embodiment, the polypeptides of this aspect of the invention comprise a polypeptide that is at least 75% identical over its full length to the amino acid sequence selected from the group consisting of SEQ ID NOS:2-5.
- In another embodiment of any aspect, embodiment, or combination of embodiments of the invention, the polypeptides are linked to a cargo. As used herein, the “cargo” can be any suitable component, including but not limited to nucleic acids, peptides, small molecules, amino acids, a detectable label, etc. In one non-limiting embodiment, the polypeptides of the invention can be modified to facilitate covalent linkage to a “cargo” of interest. In one non-limiting example, the polypeptides can be modified, such as by introduction of various cysteine residues at defined positions to facilitate linkage to one or more antigens of interest, such that a nanostructure of the polypeptides would serve as a scaffold presenting a large number of antigens for delivery as a vaccine to generate an improved immune response. In some embodiments, some or all native cysteine residues that are present in the polypeptides but not intended to be used for conjugation may be mutated to other amino acids to facilitate conjugation at defined positions. In another non-limiting embodiment, the polypeptides of the invention may be modified by linkage (covalent or non-covalent) with a moiety to help facilitate “endosomal escape.” For applications that involve delivering molecules of interest to a target cell, such as targeted delivery, a critical step can be escape from the endosome, a membrane-bound organelle that is the entry point of the delivery vehicle into the cell. Endosomes mature into lysosomes, which degrade their contents. Thus, if the delivery vehicle does not somehow “escape” from the endosome before it becomes a lysosome, it will be degraded and will not perform its function. There are a variety of lipids or organic polymers that disrupt the endosome and allow escape into the cytosol. Thus, in this embodiment, the polypeptides can be modified, for example, by introducing cysteine residues that will allow chemical conjugation of such a lipid or organic polymer to the monomer or resulting assembly surface. In another non-limiting example, the polypeptides can be modified, for example, by introducing cysteine residues that will allow chemical conjugation of fluorophores or other imaging agents that allow visualization of the nanostructures of the invention in vitro or in vivo.
- In another embodiment, the invention provides homo-oligomers (e.g., homodimers, homotrimers, homotetramers, etc.) comprising a plurality of polypeptides of the present invention having the same amino acid sequence. As shown in the examples that follow, the polypeptides of the invention are capable of forming homo-oligomers with modular hydrogen bond network-mediated specificity.
- In a further aspect, the present invention provides isolated nucleic acids encoding a polypeptide of the present invention. The isolated nucleic acid sequence may comprise RNA or DNA. As used herein, “isolated nucleic acids” are those that have been removed from their normal surrounding nucleic acid sequences in the genome or in cDNA sequences. Such isolated nucleic acid sequences may comprise additional sequences useful for promoting expression and/or purification of the encoded protein, including but not limited to polyA sequences, modified Kozak sequences, and sequences encoding epitope tags, export signals, secretory signals, nuclear localization signals, and plasma membrane localization signals. It will be apparent to those of skill in the art, based on the teachings herein, what nucleic acid sequences will encode the polypeptides of the invention.
- In another aspect, the present invention provides recombinant expression vectors comprising the isolated nucleic acid of any aspect of the invention operatively linked to a suitable control sequence. “Recombinant expression vector” includes vectors that operatively link a nucleic acid coding region or gene to any control sequences capable of effecting expression of the gene product. “Control sequences” operably linked to the nucleic acid sequences of the invention are nucleic acid sequences capable of effecting the expression of the nucleic acid molecules. The control sequences need not be contiguous with the nucleic acid sequences, so long as they function to direct the expression thereof. Thus, for example, intervening untranslated yet transcribed sequences can be present between a promoter sequence and the nucleic acid sequences and the promoter sequence can still be considered “operably linked” to the coding sequence. Other such control sequences include, but are not limited to, polyadenylation signals, termination signals, and ribosome binding sites. Such expression vectors can be of any type known in the art, including but not limited to plasmid and viral-based expression vectors. The control sequence used to drive expression of the disclosed nucleic acid sequences in a mammalian system may be constitutive (driven by any of a variety of promoters, including but not limited to, CMV, SV40, RSV, actin, EF) or inducible (driven by any of a number of inducible promoters including, but not limited to, tetracycline, ecdysone, steroid-responsive). The construction of expression vectors for use in transfecting host cells is well known in the art, and thus can be accomplished via standard techniques. (See, for example, Sambrook, Fritsch, and Maniatis, in: Molecular Cloning, A Laboratory Manual, Cold Spring Harbor Laboratory Press, 1989; Gene Transfer and Expression Protocols, pp. 109-128, ed. E. J. Murray, The Humana Press Inc., Clifton, N.J.; and the Ambion 1998 Catalog (Ambion, Austin, Tex.).) The expression vector must be replicable in the host organisms either as an episome or by integration into host chromosomal DNA. In various embodiments, the expression vector may comprise a plasmid, viral-based vector, or any other suitable expression vector.
- In a further aspect, the present invention provides host cells that comprise the recombinant expression vectors disclosed herein, wherein the host cells can be either prokaryotic or eukaryotic. The cells can be transiently or stably engineered to incorporate the expression vector of the invention, using standard techniques in the art, including but not limited to standard bacterial transformations, calcium phosphate co-precipitation, electroporation, or liposome mediated-, DEAE dextran mediated-, polycationic mediated-, or viral mediated transfection. (See, for example, Molecular Cloning: A Laboratory Manual (Sambrook, et al., 1989, Cold Spring Harbor Laboratory Press); Culture of Animal Cells: A Manual of Basic Technique, 2nd Ed. (R. I. Freshney. 1987. Liss, Inc. New York, N.Y.).) A method of producing a polypeptide according to the invention is an additional part of the invention. The method comprises the steps of (a) culturing a host cell according to this aspect of the invention under conditions conducive to the expression of the polypeptide, and (b) optionally, recovering the expressed polypeptide. The expressed polypeptide can be recovered from the cell free extract, but preferably it is recovered from the culture medium. Methods to recover polypeptide from cell free extracts or culture medium are well known to the person skilled in the art.
- The modular and predictable nature of DNA interaction specificity is central to molecular biology manipulations and DNA nanotechnology, but without parallels in nature, it has not been evident how to achieve analogous programmable specificity with proteins. There are more polar amino acids than DNA bases, each of which can adopt numerous sidechain conformations in the context of different backbones, allowing for countless network possibilities. The inventors have developed a general computational method, HBNet™ was developed to rapidly enumerate all sidechain hydrogen bond networks possible in an input backbone structure (
FIG. 1A ). - Traditional protein design algorithms are not well suited for this purpose because the total system energy is generally expressed as the sum of interactions between pairs of residues for computational efficiency, and hence cannot clearly distinguish a connected hydrogen bond network from a set of disconnected hydrogen bonds. HBNet™ starts by precomputing the hydrogen bonding and steric repulsion interactions between all conformations (rotameric states) of all pairs of polar sidechains. These energies are stored in a graph data structure where the nodes are residue positions, positions close in three-dimensional space are connected by edges, and for each edge there is a matrix representing the interaction energies between the different rotameric states at the two positions. HBNet™ then traverses this graph to identify all networks of three or more residues connected by low energy hydrogen bonds with little steric repulsion (
FIG. 1B ). The mast extensive and lowest energy networks (FIG. 1C ) are kept fixed in subsequent design calculations at the remaining residue positions. Networks with buried donors and acceptors not making hydrogen bonds (unsatisfied) are rejected (FIG. 1D ). Details of the method, as well as scripts for carrying out the design calculations, are described herein. - Inspired by the DNA double helix, it was attempted to host the hydrogen bond networks in protein oligomers with an inherent repeat structure to enable networks to be reutilized within the same scaffold. Attention was paid to coiled-coils, which are abundant in nature, the subject of many protein design studies, and can be generated parametrically, resulting in repeating geometric cross-sections. In natural and designed coiled coils, buried polar interactions can also alter specificity; however, most of these cases involve at most one or two sidechain-sidechain hydrogen bonds with remaining polar atoms satisfied by water or ions—the relatively small cross-sectional interface area of canonical coiled-coils limits the diversity and location of possible networks. To overcome these limitations, focus was placed on oligomeric structures with two concentric rings of helices (
FIG. 1E ). - “Two-ring” topologies were built from helical hairpin monomer subunits comprising an inner and outer helix connected by a short loop using a generalization of the Crick coiled-coil parameterization. Wide ranges of backbones were generated by systematically sampling the radii and helical phases of the inner and outer helices, the z-offset between inner and outer helices, and the overall supercoil twist (
FIG. 1E). HBNet™ was then used to search these backbones for networks that span the intermolecular interface, have all heavy atom donors and acceptors satisfied, and involve at least three sidechains (FIG. 1F). Because of these stringent requirements, only a small fraction of backbones can support such networks; however, by systematically varying the degrees of freedom of the two-ring structures, tens of thousands of backbones can be generated, and the efficiency of HBNet™ makes searching for networks in large numbers of backbones computationally tractable. RosettaDesign™ was then used to optimize rotamers at the remaining residue positions in the context of the cyclic symmetry of the oligomer (FIG. 1G). Designs were ranked based on the total oligomer energy using the Rosetta™ all-atom force field, filtering to remove designs with large cavities or poor packing around the networks. The top-ranked designs were evaluated using Rosetta™ “fold-and-dock” calculations. Designs with energy landscapes shaped like funnels leading into the target designed structure were identified, and a total of 114 dimeric, trimeric, and tetrameric designs spanning a broad range of superhelical parameters and hydrogen bond networks were selected for experimental characterization.
- Synthetic genes encoding the selected designs were obtained and the proteins expressed in Escherichia coli. The ~90% (101/114) of designs that were expressed and soluble were purified by affinity chromatography, and their oligomerization state was evaluated by size-exclusion chromatography multi-angle light scattering (SEC-MALS). Sixty-six of the 101 were found to have the designed oligomerization state. The 101 soluble designs span eight different topologies; of these, the supercoiled tetramers have the largest buried interface area, yielded the fewest designs with all buried donors and acceptors satisfied, and had the lowest success rate (only 3 of the 13 soluble designs properly assembled).
Excluding supercoiled tetramers, 72% (63/88) assembled to the designed oligomeric state, and of these, 89% (56/63) eluted as a single peak from the SEC column. The designed proteins were further characterized by circular dichroism (CD) spectroscopy; all designs tested exhibited characteristic α-helical spectra, and CD-monitored unfolding experiments showed that more than 90% of these were stable at 95° C. (
FIG. 2 ). Tested peptides include the following: -
AXAX (SEQ ID NO: 2) TRTRSLREQEEIIRELERSLREQEELLRELERLQREGSSDEDVR ELLREIKKLAREQKYLVEELKKLAREQKRQD;
XAAX (SEQ ID NO: 3) TRTEIIRELERSLREQERSLREQEELLRELERLQREGSSDEDVR ELLREIKKLAREQKKLAREQKYLVEELKRQD;
XAXA (SEQ ID NO: 4) TRTEIIRELERSLREQEELAKRLKRSLREQERLQREGSSDEDVR KLAREQKELVEEIEKLAREQKYLVEELKRQD; and
XXAA (SEQ ID NO: 5) TRTEIIRELEELAKRLKRSLREQERSLREQERLQREGSSDEDVR KLAREQKKLAREQKELVEEIEYLVEELKRQD
- Polypeptide Nomenclature: The name of the polypeptides shown below indicates oligomerization state and topology, and sequences below are organized by topology and oligomerization state. The first two characters indicate supercoil geometry: ‘2L’ refers to a two-layer heptad repeat that results in a left-handed supercoil; ‘3L’ refers to a three-layer 11-residue repeat with a right-handed supercoil; and ‘5L’ refers to untwisted designs with a five-layer 18-residue repeat and straight helices (no supercoiling), where “layer” in this context is the number of unique repeating geometric slices, or layers, along the supercoil axis. The middle two characters indicate the total number of helices, and the final two indicate symmetry. Thus, “2L6HC3” denotes a left-handed, six-helix trimer with C3 symmetry. Underlined residues are optional.
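The naming convention above can be decoded mechanically. The following is a minimal Python sketch, not part of the described method: the function name `parse_design_name`, the `DesignName` record, and the regular expression are illustrative assumptions, and names that do not follow the three-part pattern (for example "2L6Hanti_1" or "5H2LD_10") are simply reported as unparsed.

```python
import re
from typing import NamedTuple, Optional

# Supercoil geometry codes, taken from the nomenclature paragraph.
GEOMETRY = {
    "2L": "two-layer heptad repeat, left-handed supercoil",
    "3L": "three-layer 11-residue repeat, right-handed supercoil",
    "5L": "five-layer 18-residue repeat, untwisted (straight helices)",
}


class DesignName(NamedTuple):
    geometry: str    # description looked up from GEOMETRY
    n_helices: int   # total number of helices in the oligomer
    symmetry: str    # cyclic symmetry label, e.g. "C3"
    n_subunits: int  # order of the cyclic symmetry
    suffix: str      # design identifier after the underscore ("" if absent)


# Hypothetical pattern: <layers>L<helices>H<C><order>[_suffix]
_PATTERN = re.compile(r"^(\dL)(\d)H(C(\d))(?:_(\w+))?$")


def parse_design_name(name: str) -> Optional[DesignName]:
    """Decode a name such as '2L6HC3_13'; return None if it does not fit."""
    match = _PATTERN.match(name)
    if match is None or match.group(1) not in GEOMETRY:
        return None
    layer, helices, symmetry, order, suffix = match.groups()
    return DesignName(GEOMETRY[layer], int(helices), symmetry, int(order), suffix or "")
```

For instance, `parse_design_name("2L6HC3_13")` yields a left-handed geometry with six helices and C3 symmetry, matching the "six-helix trimer" reading given in the text.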
-
2L8HC4_12 (SEQ ID NO: 6) GTAIEANSRMLKALIEAKAIWKALWANSLLLEATSRGDTERMRQWAEE ARIYKEAKKIIDEADEIVKEAKERHD 5L6HC3_1 (SEQ ID NO: 7) SEELRAVADLQRLNIELARKLLEAVARLQELNIDLVRKTSELTDEKTI REEIRKVKEESKRIVEEAEEEIRRAKEESRYIADESRGS 2L4HC2_9 (SEQ ID NO: 8) GTSDYIIEQIQRDQEEARKKVEEAEERLERVKEASKRGVSSDQLLDLI RELAEIIEELIRIIRRSNEAIKELIKNQS 2L4HC2_11 (SEQ ID NO: 9) GSEDYKLREAQRELDKQRKDTEEIRKRLKEIQRLTDERTSTADELIKE LREIIRRLQEQSEKLREIIEELEKIIRKR 2L4HC2_23 (SEQ ID NO: 10) GTRTEIIRELERSLREQEELAKRLKELLRELERLQREGSSDEDVRELL REIKELVEEIEKLAREQKYLVEELKRQD 2L4HC2_24 (SEQ ID NO: 11) GTDTDELLRLAKEQAELLKEIKKLVEEIARLVKEIQEDPSDELLKTLA ELVRKLKELVEDMERSMKEQLYIIKKQKS 5L8HC4_6 (SEQ ID NO: 12) GSKDTEDSRKIWEDIRRLLEEARKNSEEIWKEITKNPDTSEIARLLSE QLLEIAEMLVRIAELLSRQTEQR 2L6HC3_AXAX (SEQ ID NO: 13) GTKYEIREALKEAQKQLEDLKRMLDELRRNLEELKRNPSEDALVENNE LIVRVLEVIVENNRSIIEILKLLAKSD 2L6HC3_AXBX (SEQ ID NO: 14) GTKYKIREMLEEAKRSLEELRRILEKLKESLRELRRNPSEDALVNNNE VIVKAIEASVENQRIIIELARMLAESD 2L6HC3_AXXB (SEQ ID NO: 15) GTKYRIKDTLRELKRALEELKKILEELQRSLEELRRNPSEDALVNNNE VIVKAIEAAVRAIEISAENQRMLAESD 2L6HC3_XAAA (SEQ ID NO: 16) GTKYEARKQLEEMKKQLKDLKRSLERLREILERLEENPSEDVIVEAIR AIVENNKQIVENNRSIIENNETIIRSD 2L6HC3_XBXA (SEQ ID NO: 17) GTKYELRRQLEELEKLLRELRKSLDELRKILEELERNPSEDVIVRAIK ASVKNQEIIVEVLRAIIENNKTIAKSD 2L6HC3_12 (SEQ ID NO: 18) GTKYELRRALEELEKALQELREMLRKLKESLEELKKNPSEDALVRNNE LIVEVLRVIVEVLSIIARVLEINARSD 2L6HC3_13 (SEQ ID NO: 19) GTKYELRRALEELEKALRELKKSLDELERSLEELEKNPSEDALVENNR LNVENNKIIVEVLRIIAEVLKINAKSD 2L6HC3_6 (SEQ ID NO: 20) GTKYKIKETLKRLEDSLRELRRILEELKEMLERLEKNPDKDVIVEVLK VIVKAIEASVENQRISAENQKALAESD 2L6HC3_10 (SEQ ID NO: 21) GTKYEIKKALKELEEAIQKLKKSLKELKESLKELQKNPSEDALVKNNS LNVANNEIIVEVLEIIARILELLARSD 2L6HC3_11 (SEQ ID NO: 22) GTKYEIKEALRELNRALKELKEALRELERSLRELQKNPDKDALVRNNE LNVDVARIIVEVLSIIARVLELLAKSD 2L6HC3_12 (SEQ ID NO: 23) GTKYELRRALEELEKALQELREMLRKLKESLEELKKNPSEDALVRNNE LIVEVLRVIVEVLSIIARVLEINARSD 2L6HC3_13 (SEQ ID NO: 24) GTKYELRRALEELEKALRELKKSLDELERSLEELEKNPSEDALVENNR LNVENNKIIVEVLRIIAEVLKINAKSD 2L6HC3_14 (SEQ ID NO: 25) 
GTKYELREAIRKLEEALRKLKKALDELRKSLEELKKNPSEDALVRNNE LNVKVAEIIVKVLKIIAEAIKINAKSD 2L6HC3_19 (SEQ ID NO: 26) GTEEYKLRELLKRHNEVLKELQKAAKEAEEVAERFKKTNDITEAIRVI ADLLRAIVKAIETNSRVVKMIVELNE 2L6HCL2 (SEQ ID NO: 27) GTKYIEKLLREAQRTLEELKRLLEELKEMLKELERANATDARLIAEVI RVIVEVLRASVENQEMIIRILKAITEE 2L6HC3_23 (SEQ ID NO: 28) TEKDVLRIIVKNNEIIVKVLSVIAEVLKIIAKILENPSEYMLKELKKA LKELEKMLKELRKSLKELKEALRELEGS 2L6HC3_37 (SEQ ID NO: 29) GTLDYKLDEMLKKLEKSREEMEKMAQELRRALEELEKNSNVDKVLKII IKAIQLSIENQKLNLEAVRLLIEAQKS 2L6HC3_6 (SEQ ID NO: 30) GTKYKIKETLKRLEDSLRELRRILEELKEMLERLEKNPDKDVIVEVLK VIVKAIEASVENQRISAENQKALAESD 2L8HC4_3 (SEQ ID NO: 31) GTDEYKWKEEVRRFEEEAKKWEEELKEMRKRIEDAKKGRPTLKVNLEA AEALLEAARLIVEAAKLLLAAAKLNEKQN 2L8HC4_9 (SEQ ID NO: 32) GSDEDRKAKELIERQRKLTDEAEEWAKQNEEIAKKIEKQPDTSLVARM LANVSRMLLATNRALLANTEALEALIRKT 2L8HC4_12 (SEQ ID NO: 33) GTAIEANSRMLKALIEIAKAIWKALWANSLLLEATSRGDTERMRQWAE EAREIYKEAKKIIDEADEIVKEAKERHD 3L6HC 2_4 (SEQ ID NO: 34) SALEKIAKLIIEAARLSAELARRAARASAEMARKAIEAVSEERGSESL LKIVADLIVESQEAVVRLIIESQQIAAKLAEDLIRAAKEAASDESKME EVAKEVQERAERAARDIERKLKRVLEELDYKLKESRDGS 3L6HC2_6 (SEQ ID NO: 35) TALEIAVRLNREAAREAARENADTARKAARRIAEVAKRLAEENRDAKL AARLLAEIARLLAELIARQSELLAEWLATQSKLAAELARKDTSATDEA ERIRKESEELLDKVREEIKRLEDEVSKTIEELSERVRGS 3L6HC2_7 (SEQ ID NO: 36) SILELAHESNRRALEMASRANREAMKAAREMIRAASEAARRAGSSNDK DSLRMIEEALRLALRMIEETNKKAVRMVLENNRKMVEAEKKKLSEEEI KRIAKETEDRMREIARRASEEARRLAEEIKREADYRSGS 5L6HC3_1 (SEQ ID NO: 37) SEELRAVADLQRLNIELARKLLEAVARLQELNIDLVRKTSELTDEKTI REEIRKVKEESKRIVEEAEEEIRRAKEESRYIADESRGS 5L6HC3_3 (SEQ ID NO: 38) GTERKDRLRKELKRIAEETDKWVEELKEELERILRTIEELRKDPSSEV IVDIARIQLEALREVIRVVAENSKAILEAIHRVIEEG 5L6HC3_5 (SEQ ID NO: 39) SKEVRLQKLNAEIMKEIMELIIRLQEANARIIEELVRLIIDLERSTDS KRMIEEIRKVAERAIEESKRLLEEAEKAMRRAIYESEDALREGS 5L8HC4_1 (SEQ ID NO: 40) GSKVEELLRKSEEAAERAKRELERLLEESERIVAEAQALAEKYESQKV WVRILIELIRATNRMLAEIARILLEMIEVTNRMIAESTK 5L8HC4_2 (SEQ ID NO: 41) SEQLKEIARILIKLIESLTRFILEVARILIELIEETQRLIVASTDSDE SELERIARESKKKAKKALDELKKIVDDQRREAKKAIEELEYDGS 5L8HC4_6 (SEQ ID NO: 42) 
GSKDTEDSRKIWEDIRRLLEEARKNSEEIWKEITKNPDTSEIARLLSE QLLEIAEMLVRIAELLSRQTEQR 2L4HC2_1 (SEQ ID NO: 43) GTAYELLRKAEELEKKQQELLKRQEELAKTAEELRKKGGNADSMMKII KESTRIVRESTEIVKELLKIIRELRRQS 2L4HC2_5 (SEQ ID NO: 44) GTRTEYLKKLAEEAKELAKRSRELSKESRRLSEEARRDPDKEKLLRVV KKLQEVIEELQRVIEELLRVIKEALENQS 2L4HC2_6 (SEQ ID NO: 45) GTETEYQRELAREARRLAKRSRELSERSRKLSEDAKRDPDKDKLLEVV ERLQQVIEELQKVIEELLRVIESSLKTIS 2L4HC2_9 (SEQ ID NO: 46) GTSDYIIEQIQRDQEEARKKVEEAEERLERVKEASKRGVSSDQLLDLI RELAEIIEELIRIIRRSNEAIKELIKNQS 2L4HC2_10 (SEQ ID NO: 47) GTEEYRRKEQEERTKEQQERTERQRRKTEELKRATKEGTLTPEEAIRQ AQKQSENAERQSREAEKQSREANEALRKR 2L4HC2_11 (SEQ ID NO: 48) GSEDYKLREAQRELDKQRKDTEEIRKRLKEIQRLTDERTSTADELIKE LREIIRRLQEQSEKLREIIEELEKIIRKR 2L4HC2_12 (SEQ ID NO: 49) GSEDYKLKELQKRNKKQEEEAKRNDDERKKIEELTRKRTSTADELIRE LQRSNEEMQRSQREMQDQSRRLEDIIRKR 2L4HC2_14 (SEQ ID NO: 50) GTEDYKRREAERKLQKQQEELKELKRKLEEIRELHEKGVGSPDRLIRE LERIIRELQRMQKENEKIIKELQRIIKKR 2L4HC2_18 (SEQ ID NO: 51) GTESKYLLEEARRLKDEARKLKEEAKKVKEESRKLIERIDRGEDSDRE LLERLKEQNNRLLEIIERLLEIIERLLKLIEEWTRDS 2L4HC2_19 (SEQ ID NO: 52) GTEEDYAEREIRKMKEEQKRQRKRLEELERELQEMQEKKREGTSDAKE VIDQLERIIRELQEIIRSQEDITRKLEEIIRRMKENS 2L4HC2_20 (SEQ ID NO: 53) GTNKEELKRTMEEQQRILEKLLRTIKEQKEILRKQEEGRATKEELKRL TKLAQEQERMMRELIDLARKQAYLLKRES 2L4HC2_21 (SEQ ID NO: 54) GTREEKIRRILEEIQKIMEEIKRIMEEIKRTQEEAEKHGSSKKAIEKQ KELLRRLEELLRKLERLLRELEYLMRDEK 2L4HC2_22 (SEQ ID NO: 55) GTREEWLYRILELIERIERLIKEIIRLSRRALELLENNASNEEWAQEI KEMQRKIQEWLKQILEWLKKIKEWIRESQ 2L4HC2_23 (SEQ ID NO: 56) GTRTEIIRELERSLREQEELAKRLKELLRELERLQREGSSDEDVRELL REIKELVEEIEKLAREQKYLVEELKRQD 2L4HC2_24 (SEQ ID NO: 57) GTDTDELLRLAKEQAELLKEIKKLVEEIARLVKEIQEDPSDELLKTLA ELVRKLKELVEDMERSMKEQLYIIKKQKS 5L4HC2_1 (SEQ ID NO: 58) GTEETKNSKRVLDIIEELMRQVEENSRELEKRIKELLRQTKEGKTKKE LERDVRRTIEEQKKELRRLKEQVRKTKEEQREEQYRS 5L4HC2_2 (SEQ ID NO: 59) GTRTEKLMKEVEEIQRRQIELLKKLMKEVEDSSKRNQEATERGTTKKK WKEEQEKILEDLKREVRRIIEESRKWLEDLKKKVYES 5L4HC2_3 (SEQ ID NO: 60) GTEKYRLREEVRRTIEEQKENLERLKQEVKETERKTEEWRERNTTTED AQREQIKIIRRLMKEVERNSRRLEKELRRLVEETRES 
5L4HC2_6 (SEQ ID NO: 61) GTEKYRLIRESERALRELKRKVRELEEDQRERLDEQRKKVEEGQTTDE LLRQNEENSRRMLKETKKLLREIERIQREQQRQNQEN 5L411C2_7 (SEQ ID NO: 62) GTEKEKEIEKNSREVIKQVEDILREIKENSKRNIEIIKELQKDPSDEK MRETIEQQRENLERLERKARELIRRQERNLRETQYKD 5L4HC2_9 (SEQ ID NO: 63) GTEKYRIIEEQRRNLEDLEREIREIIKKLKEALERLRELVERNSTNDR LLDEVRKIIEEAIEDMKRLLEKVERSIRQNIEELRRS 5L4HC2_10 (SEQ ID NO: 64) GTNKEYLRRKVKELKDQQKRNLEELEREVRRLIKEIEEWRERNTTTDR ALKEIIRQIQRLLEEARRNSEEVLRQIEEIMEETRES 5L4HC2_11 (SEQ ID NO: 65) GTEEERALERIIRAIRELMREVERNSKEVLQWIKEMLRLTKENSSTKE LEERWREIEERQRRNLEKLKEEVRRLEDEIRQETYRS 5L4HC2_12 (SEQ ID NO: 66) GTETKKLVEEVERALRELLKTSEDLVRKVEKALRELLELIRRGGTKDK IEEKIRRVLEEIKRELERQKRKIEDVLRQIKEELYRS 2L6Hanti_1 (SEQ ID NO: 67) SDYLRLATEHNKLAVEANRLAIELAKSAVELAETDPSKTALEHAELAA RLLEMMVQFTKAAQELTREAIRKEGRNEESEKVLRKSKEAYKESEKAL EDARRLLDELRKKGS 2L6Hanti_2 (SEQ ID NO: 68) SEELRKAAENNELAVRLAEAALRMARSALHLFEENPSDEMLKFLELAM EVAKMAAELLKASLKMLKKAAEERGSDESVKYLADKSRDIMRQITEEL KKLEEEAKRAQKRGS 2L6Hanti_3 (SEQ ID NO: 69) SEKARIAVENLEAALRLNRAAAEMQKSAIKIMDDNRSDEKALRYLRLT TKVLRMSVELLRASLELAEKALREEGSDDSAEKVRKEAEEILKESTEI LKEADKETKRADEEGS 2L6Hanti_4 (SEQ ID NO: 70) SRRLELAARINKAAAENARSAIEIQELAARLADELSSSKKVIDFARAT TEVLRMSVKLLKLSLEMLEEAARQDGRSEEVRYLAEESKKILEEARKA LEDADRLTKRIEEEGS 2L6Hanti_5 (SEQ ID NO: 71) TDVLRIAAENLKAAVELAKAALEMAKSAIEIAKTLTEDDEALKFARAA AEVLRMAAKLLKLSIELARKAAEEEGSDDEVRYILDEARKQADELREA LKKVDEIMKELDKRGS 5H2LD_10 (SEQ ID NO: 72) TRRKQEMKRLKYEMEKIREETEEVKKEIEESKKRPQSESAKNLILIMQ LLINQIRLLALQIRMLALQLQE 5H2LD_13 (SEQ ID NO: 73) TEDQERLRKQMEYERKHTEKVEKEIRKVEQKMKSHEDTSLRLLVLIAR LLINQIRLLILQIRSLSNLERN 5H2LD_15 SEQ ID NO: 74) TESTLLILIMRLLVQQSELLQLQIQMLQLLLKANNGTNKTEIERRSKE MEEELKRMKESNREMTKRIKEME 5H2LD_18 (SEQ ID NO: 75) TESDLLRQISKLLIIQIRLLLLQIQMLILLLKMNTGTNTTQITKEAKR IEKEAQEARKELEKMQESNKKQT 6H2LD_8 (SEQ ID NO: 76) TEDEIRKLRKLLEEAEKKLYKLEDKTRRSEEISKTDDDPKAQSLQLIA ESLMLIAESLLIIAISLLLSSRNG 7H2LD_3 (SEQ ID NO: 77) TEDEELQRVEEEIRELERKAKELHYKSEEIRKKVNGRSPQAEALLMIA QALLNISESLLAIAKALLMIARST 8H2LD_4 (SEQ ID NO: 78) 
TDEREIIKRVKRLLEEVEYLIERLRDQIEKAEKGLLDSRKAQQNAEAL VNLIKAMVLVLKALLLAKELER 8H2LD_4_KE (SEQ ID NO: 79) TEEQYIIEEVKKLLEEVKKLIEELKKQIEKAEKGEEDSRKAQQNAEAL VNLIKAMVLVLKALLLAKELER - To probe the energetic contribution of the outer ring of helices, the stability of the two-ring designs was compared to corresponding designs with only the inner ring; core interface positions of the inner helices, including hydrogen bond network residues, were retained and solvent-exposed surface positions were redesigned in the same manner as the surface of the two-ring designs. 2L4HC2_9 (
FIG. 2C ), a supercoiled homodimer is folded and thermostable (FIG. 2D ); its inner helix peptide, 2L4HC2_9 inner (FIG. 2E ), also forms a homodimeric coiled-coil, but with markedly decreased thermostability (FIG. 2F ). 2L6HC3_13 (FIG. 2G ), a supercoiled homotrimer is also folded and thermostable (FIG. 2H ); however, the corresponding inner ring peptide (FIG. 2I ) in isolation is unfolded (FIG. 2J ) and monomeric. The sequence of this inner helix is notable because it has four Asn residues at canonical a or d heptad packing positions where Asn is destabilizing, and also because its other a and d positions are Leu and Ile respectively, which has been found to favor homotetramers. In the presence of the outer helix and designed hydrogen bond networks, the two-ring design assembles to the intended trimeric structure as elucidated by x-ray crystallography (FIG. 3A ). Together, these results suggest that the outer ring of helices not only increases thermostability but also can drive coiled-coil assembly, even in the context of an inner helix with low helical propensity and non-canonical helical packing, permitting greater sequence diversity across larger interfaces. - To assess the accuracy of the designs, ten crystal structures were determined spanning a range of oligomerization states, superhelical parameters, and hydrogen bond networks (
FIG. 3A-F ). Designs for which crystals were not obtained were characterized by small angle x-ray scattering (SAXS) (FIG. 4 ). Structures for three left-handed trimers, four left-handed dimers, a left-handed tetramer, and an untwisted triangle-shaped trimer were solved. Additional topologies characterized by SAXS include square-shaped untwisted tetramers (FIG. 4A ) and dimers (FIG. 4B ), as well as six-helix dimers (two inner, one outer helix) with either parallel right-handed (FIG. 4C ) or antiparallel left-handed (FIG. 4D ) supercoil geometry. Five of the x-ray crystallography-verified designs (FIG. 3A , C-F) were also characterized by SAXS, and the experimentally determined spectra were found to closely match those computed from the design models, suggesting that very similar structures are populated in solution. - The three left-handed trimer structures (2L6HC3_6, 2L6HC3_12, and 2L6HC3_13) are remarkably similar to the design models with sub-angstrom RMSD across all backbone Cα atoms and across all heavy atoms of the hydrogen bond networks (
FIG. 3A-B). These structures are constructed with supercoil phases of 0, 120, and 240 degrees for the inner helices, and 60, 180, and 300 degrees for the outer helices; loops connect outer N-terminal helices to inner C-terminal helices (at −60 degrees from the outer helix). Extensive nine- or twelve-residue networks form the intended hydrogen bonds in the crystal structures (FIGS. 3A and 3B, middle). Unlike previously designed single-ring trimers, where three buried asparagines resulted in substantially decreased thermostability, these two-ring trimers are stable up to 95° C. and ~4.5 M guanidinium chloride with numerous buried polar residues; 2L6HC3_13 has twelve completely buried asparagines, and 2L6HC3_6 has 24 buried polar residues confined to a small region of the interface, including six asparagines and six glutamines.
- The four left-handed dimer crystal structures (2L4HC2_9, 2L4HC2_23, 2L4HC2_11, and 2L4HC2_24) all have the designed parallel two-ring topology. Two of the dimer structures have hydrogen bond networks in close agreement with the designs: 2L4HC2_9 (
FIG. 3D) and 2L4HC2_23 (FIG. 3E) have 0.39 Å and 0.92 Å RMSD across all network residue heavy atoms, respectively, and 0.39 Å and 1.16 Å RMSD over all Cα atoms. The other two, 2L4HC2_11 and 2L4HC2_24, have slight structural deviations from the design models caused by water displacing designed network sidechains; in the former, the interface shifts ~2 Å due to a buried water molecule bridging two network residues, and in the latter, the backbone is nearly identical to the design model but sidechains of the designed network are displaced by ordered water molecules. These two cases highlight the need for high connectivity and satisfaction (all polar atoms participating in hydrogen bonds) of the networks. The left-handed tetramer structure has the designed overall topology (FIG. 3C), and SAXS data are in close agreement with the design model, but sidechain density was uncertain due to low (3.8 Å) resolution. The amino acid sequence is unrelated to any known sequence, and the top hit in structure-based searches of the Protein Data Bank (PDB) has a quite different helical bundle arrangement.
- The five antiparallel dimers (2L6Hanti_1-5) were soluble and assembled to the designed oligomeric state, with SAXS data in agreement with the design models (
FIG. 4D). Design 2L6Hanti_3 contains a hydrogen bond network with a buried Tyr at the dimer interface (FIG. 4D). Of the three right-handed six-helix dimers characterized by SAXS, 3L6HC2_4 (FIG. 4C) and 3L6HC2_7 exhibited scattering in agreement with the design models, whereas 3L6HC2_2 did not. While 3L6HC2_2 was designed to form a parallel dimer, its crystal structure revealed an antiparallel dimer interface, highlighting two design lessons: first, the importance of intermolecular hydrogen bonds at the binding interface (the 3L6HC2_2 design model has only two across the interface, compared to nine in 2L6HC3_6 (FIG. 3B)), and second, the importance of favorable hydrophobic contacts complementing the networks (the 3L6HC2_2 design model has mainly alanines at the interface).
- SAXS data suggest that untwisted dimer, trimer, and tetramer designs assemble into the target triangular and square conformations (
FIG. 4A-B). Guinier analysis and fit of the low-q region of the scattering vector indicate that the seven untwisted dimers tested are in the correct oligomeric state, four of which have very close agreement between the experimental spectra and design models (FIG. 4B). The SAXS data on the three untwisted tetramers (5L8HC4_1, 5L8HC4_2, and 5L8HC4_6) were all in close agreement with the corresponding design models (FIG. 4A). 5L8HC4_6 has a distinctive network with a Trp making a buried hydrogen bond at one end of the network, which then propagates outwards towards solvent, connecting to a Glu on the surface (FIG. 4A). It is believed that oligomers with such uniformly straight helices do not exist in nature, nor have these topologies been designed previously.
- The 2.36 Å crystal structure of the untwisted trimer (5L6HC3_1) reveals straight helices with 0.51 Å RMSD to the design model over all Cα atoms (
FIG. 3F). The two hydrogen bond networks (FIG. 3F middle), as well as the hydrophobic packing residues surrounding the networks (FIG. 3F right), are nearly identical between the crystal structure and design model, with 0.41 Å and 0.48 Å RMSD over all network heavy atoms. Like the supercoiled trimers, each of these networks contains sidechains from every helix, and helices were constructed to be uniformly symmetrical and equidistant. The helices are nearly perfectly straight in the crystal structure, with supercoil twist values very close to the idealized design value of zero: ω0=−0.036 degrees/residue for the inner three helices and ω0=−0.037 degrees/residue for the outer three helices. BLAST searches with the amino acid sequence returned no matches with E-values better than 10, and the top hit in a search for similar structures in the PDB has three supercoiled helices flanked by long extended regions.
- Several trends emerged distinguishing successful designs. First, in successful designs nearly all buried polar groups made hydrogen bonds. Designs with all heavy atom donors and acceptors satisfied were selected, but the networks had varying numbers of polar hydrogens unsatisfied. Networks with the largest fraction of satisfied polar groups generally had relatively high connectivity, both with respect to the total number of hydrogen bonds and the number of sidechains contributing to the network. Networks with the highest connectivity and structural accuracy were those that spanned the entire cross-sectional interface, with each helix contributing at least one sidechain (
FIG. 3A, 3B, 3E, 3F ). Design 2L6HC3_13 also has two additional smaller networks comprising a single symmetric Asn making two hydrogen bonds but with one polar hydrogen unsatisfied; in the crystal structure, these residues move away from the design model, displaced by water molecules. - To test the role of the designed hydrogen bond networks in conferring specificity for the target oligomeric state, control design calculations were carried out using the same protein backbones without HBNet™, yielding uniformly hydrophobic interfaces. In silico, despite having lower total energy in the designed oligomeric state, these designs exhibit more pronounced alternative energy-minima in fold-and-dock and asymmetric docking calculations, consistent with the much less restrictive geometry of nonpolar packing interactions. Experimentally, these hydrophobic designs exhibited less soluble expression than their counterparts with hydrogen bond networks and tended to precipitate during purification; of those that remained in solution long enough to collect SEC-MALS data, all but one formed higher molecular weight aggregates, eluting as multiple peaks from the SEC column. These results suggest that the designed hydrogen bond networks confer specificity for the target oligomeric state and resolve the degeneracy of alternative states observed with purely hydrophobic packing (this degeneracy is considerably more pronounced for herein-described 2 ring structures than traditional single ring coiled coils, which have many fewer total hydrophobic residues and less inter-helical interface area).
- An in vivo yeast-two-hybrid assay was used to further probe the interaction specificity of the designed oligomers. Sequences encoding a range of dimers, trimers, and tetramers were crossed against each other in all-by-all binding assays (
FIG. 5 ); synthetic genes for the designs were cloned in frame with both DNA-binding domains and transcriptional activation domains in separate vectors such that binding of the designed protein interaction is necessary for cell growth. Designs in which the hydrogen bond networks partition hydrophobic interface area into relatively small regions are considerably more specific than designs with large contiguous hydrophobic patches at the helical interface (FIG. 5 , A and B). The designs with the best-partitioned hydrophobic area had networks spanning the entire oligomeric interface, with each helix contributing at least one sidechain. This unifying design principle can readily be enforced using HBNet™. - To test if regular arrays of networks can confer specificity in a modular, programmable manner, an additional set of trimers were designed, each with identical backbones and hydrophobic packing motifs, the only difference being placement and composition of the hydrogen bond networks. The designs are based on 2L6HC3_13 (
FIG. 3A ) and 2L6HC3_6 (FIG. 3B ), which originated from the same superhelical parameters but have unique networks referred to as “A” and “B”, respectively; cross-sections with only nonpolar residues are labeled “X”. This three-letter code was used to generate new designs in combinatorial fashion: at each of the 4 repeating cross-sections of the supercoil (FIG. 5C ), either the A, B, or X (FIG. 4D ) were placed followed by the same design strategy and selection process as before. Six of these combinatorial designs were synthesized and 5/6 were found to be folded, thermostable, and assembled to the designed trimeric oligomerization state in vitro. These five, along with the two parent designs (2L6HC3_13=AAXX and 2L6HC3_6=XXBB) and an all-hydrophobic control (XXXX), were crossed in all-by-all yeast-two-hybrid binding experiments (FIG. 5E ). The combinatorial designs exhibit a level of specificity that is striking given that all have identical backbones and high overall sequence similarity, whereas the hydrophobic control is relatively promiscuous; the central hydrogen bond networks are clearly responsible for mediating specificity. - Previous de novo protein design efforts have focused on jigsaw-puzzle-like hydrophobic core packing to design new structures and interactions. Unlike the multi-body problem of designing highly connected and satisfied hydrogen bond networks, hydrophobic packing is readily captured by established pairwise-decomposable potentials; consequently, most protein interface designs have been predominantly hydrophobic, and attempts to design buried hydrogen bonds across interfaces have routinely failed. Polar interfaces have been designed in specialized cases but have been difficult to generalize, with many interface design efforts requiring directed evolution to optimize polar contacts and achieve desired specificity. HBNet™ now provides a general computational method to accurately design hydrogen bond networks. 
This ability to precisely pre-organize polar contacts without buried unsatisfied polar atoms should be broadly useful in protein design challenges such as enzyme design, small molecule binding, and polar protein interface targeting.
- Two-ring structures are a new class of protein oligomers that have the potential for programmable interaction specificity analogous to that of Watson-Crick base pairing. Whereas Watson-Crick base pairing is largely limited to the antiparallel double helix, the designed protein hydrogen bond networks allow the specification of two-ring structures with a range of oligomerization states (dimers, trimers, and tetramers) and supercoil geometries. Adding an outer ring of helices to enable hydrogen bond networks extends elegant studies from Keating, Woolfson, and others demonstrating the designability of coiled coils with a wide range of hetero- and homo-oligomeric specificities. The design models and crystal structures show that a wide range of hydrogen bond network compositions and geometries is possible in repeating two-ring topologies, and that multiple networks can be engineered into the same backbone at varying positions without sacrificing thermostability, enabling stable building blocks with uniform shape but orthogonal binding interfaces (
FIG. 5 ). The DNA nanotechnology field has demonstrated that a spectacular array of shapes and interactions can be built from a relatively limited set of hydrogen bonding interactions. It should now become possible to develop new protein-based materials with the advantages of both polymers: DNA-like programmability and tunable specificity, coupled with the geometric variability, interaction diversity, and catalytic function intrinsic to proteins. - Computational techniques related to protein design based on a Hydrogen Bond Network method (HBNet™) are described in detail below. The HBNet™ method can include three steps. First, an exhaustive but efficient search identifies the hydrogen bond networks possible within a given search space (which consists of all allowed sidechain rotamers of all amino acid types being considered for a particular backbone conformation). Second, networks are scored and ranked based on the Rosetta™ energy function, satisfaction (all buried polar atoms participating in hydrogen bonds), and user-defined options. And, third, the best networks, or combinations of the best networks, are iteratively placed onto the design scaffold and held in relative position with constraints that serve as ‘seeds’ for any subsequent Rosetta™ method to design around the network and optimize rotamers for the remaining positions in the scaffold.
- HBNet™ makes use of Rosetta™'s Interaction Graph (IG) data structure, initially populating it with only the sidechain hydrogen bond and Lennard-Jones (steric repulsive) energy terms. The nodes of the graph are the residue positions of all designable or packable residues, and the edges represent putative interactions between those residues, pointing to sparse matrices that store the two-body energies between all pairs of interacting rotamers (of all amino acid types being considered) at those two positions. Only using the hydrogen bond and repulsive energies allows for instant look-up of all rotamer pairs with favorable (low energy) hydrogen bond geometry and no steric clashing. In some embodiments, Monte Carlo or similar randomized methods can be used to search this rotamer interaction space.
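The graph layout described above can be illustrated with a small stand-in. The following is a minimal Python sketch, assuming a dictionary-of-dictionaries in place of sparse matrices; the class and method names (`InteractionGraph`, `set_energy`, `hbond_pairs`, `neighbors`) are hypothetical and are not Rosetta™ APIs. The default cutoff of −0.75 mirrors the default threshold mentioned elsewhere in this description.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

RotamerPair = Tuple[int, int]  # (rotamer index at lower position, at higher position)
Edge = Tuple[int, int]         # (pos_a, pos_b) stored with pos_a < pos_b


class InteractionGraph:
    """Toy interaction graph: nodes are residue positions, and each edge
    holds a sparse table of two-body energies between rotamer pairs."""

    def __init__(self) -> None:
        self.edges: Dict[Edge, Dict[RotamerPair, float]] = defaultdict(dict)

    def set_energy(self, pos_a: int, rot_a: int, pos_b: int, rot_b: int,
                   energy: float) -> None:
        # Store each edge once, keyed by the ordered pair of positions.
        if pos_a > pos_b:
            pos_a, rot_a, pos_b, rot_b = pos_b, rot_b, pos_a, rot_a
        if energy != 0.0:  # sparse: zero entries are not stored
            self.edges[(pos_a, pos_b)][(rot_a, rot_b)] = energy

    def neighbors(self, pos: int) -> List[int]:
        """Positions that share an edge with `pos`."""
        found = {b if a == pos else a for (a, b) in self.edges if pos in (a, b)}
        return sorted(found)

    def hbond_pairs(self, pos_a: int, pos_b: int,
                    threshold: float = -0.75) -> List[RotamerPair]:
        """Rotamer pairs (rot at pos_a, rot at pos_b) with energy below threshold."""
        swapped = pos_a > pos_b
        key = (pos_b, pos_a) if swapped else (pos_a, pos_b)
        pairs = []
        for (r_lo, r_hi), energy in self.edges.get(key, {}).items():
            if energy < threshold:
                pairs.append((r_hi, r_lo) if swapped else (r_lo, r_hi))
        return sorted(pairs)
```

Because only hydrogen bond and repulsive terms are stored in this sketch, a single comparison against the threshold separates favorable hydrogen-bonding pairs from clashing or weakly interacting ones.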
- In other embodiments, the entire rotamer interaction space can be searched. The search through the entire rotamer interaction space can be performed using a recursive depth-first search or a recursive breadth-first search of the interaction graph, enumerating all compatible, non-clashing connectivities of hydrogen-bonded sidechain rotamers. Since the search traverses not only the nodes of the graph but also the matrices pointed to by each edge (multiple rotamers per node, and multiple pairs of rotamers per edge), an implementation of a graph traversal algorithm for this graph can consider the connected nodes (residue positions) of networks as well as the hydrogen bonds between atoms of particular rotamers at each node. This latter hydrogen-bond criterion requires additional steps and behavior in the graph traversal algorithm.
- Each time a new hydrogen bonding rotamer is considered, the graph traversal algorithm can check the rotamer to ensure it does not clash with any existing rotamers in that network. If it is accepted, a recursive call is made on this rotamer. These recursive calls continue until a stop condition is reached: either no additional hydrogen bonding interactions can be found, or the network connects back to one of the original starting residues.
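The recursion just described can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the HBNet™ implementation: rotamers are encoded as hypothetical `(position, rotamer index)` tuples, hydrogen-bonding partners are supplied as a precomputed dictionary, and clashes as a set of unordered pairs. The sketch enumerates linear chains only; a partner at an already-occupied position (including the starting residue) ends that branch.

```python
from typing import Dict, FrozenSet, List, Sequence, Set, Tuple

Rotamer = Tuple[int, int]  # (residue position, rotamer index); hypothetical encoding


def find_networks(
    start: Rotamer,
    hbonds: Dict[Rotamer, Sequence[Rotamer]],
    clashes: Set[FrozenSet[Rotamer]],
    min_size: int = 3,
) -> List[List[Rotamer]]:
    """Depth-first enumeration of hydrogen-bonded rotamer chains from `start`."""
    networks: List[List[Rotamer]] = []

    def extend(current: Rotamer, path: List[Rotamer]) -> None:
        used_positions = {pos for pos, _ in path}
        extended = False
        for candidate in hbonds.get(current, ()):
            # A position holds one rotamer; reaching an occupied position
            # (e.g. looping back to the start residue) stops this branch.
            if candidate[0] in used_positions:
                continue
            # Reject candidates that clash with any rotamer already in the network.
            if any(frozenset((candidate, member)) in clashes for member in path):
                continue
            extended = True
            extend(candidate, path + [candidate])  # recursive call on accepted rotamer
        # Stop condition: nothing further to add. Keep sufficiently large networks.
        if not extended and len(path) >= min_size:
            networks.append(path)

    extend(start, [start])
    return networks
```

The `min_size` default of 3 reflects the requirement stated earlier that networks involve at least three residues.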
- Some polar amino acids, such as Asn and Gln, can make three or more hydrogen bonds, serving as branch points in hydrogen bond networks; depth-first search misses these branching amino acids, and to account for this, a look-back function identifies networks that share one or more identical rotamers and, after checking for clashes or conflicting residues, merges them together into complete networks. Redundant networks are eliminated.
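The look-back merge can be sketched in the same toy encoding. This is a minimal Python illustration, assuming networks are represented as frozensets of hypothetical `(position, rotamer index)` tuples; the names `compatible` and `merge_networks` are illustrative, and the fixed-point loop is one simple way to realize the merge, not necessarily the one used by HBNet™.

```python
from typing import Dict, FrozenSet, Iterable, Set, Tuple

Rotamer = Tuple[int, int]          # (residue position, rotamer index)
Network = FrozenSet[Rotamer]


def compatible(a: Network, b: Network, clashes: Set[FrozenSet[Rotamer]]) -> bool:
    """True if a and b share a rotamer, agree at every position, and do not clash."""
    if not (a & b):
        return False                       # must share at least one identical rotamer
    chosen: Dict[int, int] = {}
    for pos, rot in a | b:
        if chosen.setdefault(pos, rot) != rot:
            return False                   # conflicting rotamers at the same position
    return not any(frozenset((x, y)) in clashes for x in a for y in b if x != y)


def merge_networks(networks: Iterable[Network],
                   clashes: Set[FrozenSet[Rotamer]]) -> Set[Network]:
    """Merge compatible networks until no new unions appear; drop redundant ones."""
    merged: Set[Network] = set(networks)   # exact duplicates collapse here
    changed = True
    while changed:
        changed = False
        for a in list(merged):
            for b in list(merged):
                if a != b and compatible(a, b, clashes):
                    union = a | b
                    if union not in merged:
                        merged.add(union)
                        changed = True
    # A network strictly contained in a larger one is redundant.
    return {n for n in merged if not any(n < m for m in merged)}
```

In this sketch, two chains that pass through the same Asn or Gln rotamer are combined into one branched network, which is the case the depth-first pass alone would miss.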
- An instance of HBNet™, “HBNetStapleInterface™”, was written, in which graph traversals are initiated at residue positions at the intermolecular interface. This implementation of HBNet™ offers two advantages: first, starting the traversal at only the interface positions reduces the search space, speeding up runtime, and second, it ensures that only networks at the interface are found, which was the goal of the approach in this study; requiring that at least two residues in each network come from different polypeptide chains ensures that the network spans the intermolecular interface. For each starting residue, HBNetStapleInterface™ iterates through each edge; at each edge, networks are initiated for rotamer pairs with interaction energies less than a threshold value (default=−0.75). Because the interaction energy consists only of hydrogen bonding and repulsive contributions, a positive energy indicates clashing, and a negative energy indicates hydrogen bonding. Setting a threshold allows for both selection of hydrogen bonds with favorable (low energy) geometry and faster computational runtime: because of the multiple recursive steps, runtime is exponentially dependent upon the number of hydrogen bonding rotamer pairs (which increases as the threshold is made less stringent). The total number of hydrogen bonding rotamer pairs differs vastly between input structures and cannot be calculated ahead of time; through extensive empirical testing, threshold values ranging from −0.65 to −0.85 were found to result in favorable hydrogen bonds and runtimes on the order of ~0.2-10 minutes for complete design runs that included downstream design of numerous network possibilities for a given input structure.
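The two selection rules in this paragraph, the energy threshold for seeding networks and the requirement that a network span more than one chain, can be written down compactly. The following is a minimal Python sketch; the function names and the label strings are hypothetical, and energies between the threshold and zero are treated here simply as "not seeded", which is an assumption about how weak hydrogen bonds are handled.

```python
from typing import Dict, Iterable


def classify_pair(energy: float, threshold: float = -0.75) -> str:
    """Label a rotamer-pair energy built only from H-bond and repulsive terms."""
    if energy > 0.0:
        return "clash"        # positive energy indicates steric clashing
    if energy < threshold:
        return "seed"         # favorable hydrogen bond; start a network here
    return "not_seeded"       # too weak to seed at this threshold (assumption)


def spans_interface(positions: Iterable[int], chain_of: Dict[int, str]) -> bool:
    """True if the network residues come from at least two polypeptide chains."""
    return len({chain_of[pos] for pos in positions}) >= 2
```

Making the threshold less stringent (closer to zero) admits more seed pairs, which is why runtime grows as the cutoff is relaxed.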
- Once all possible networks are identified, the identified networks are scored and ranked to determine the “best” networks. For each network, buried polar atoms are identified by solvent-accessible surface area (SASA); networks with buried heavy atom donors or acceptors not making hydrogen bonds (unsatisfied) are eliminated. The remaining networks are then ranked based on the least number of unsatisfied polar hydrogens. The networks are then scored against each other in the context of a background reference structure: all designable or packable positions in the scaffold are mutated to poly-alanine, the network rotamers are placed onto the scaffold, and the network is scored with the full Rosetta™ energy function (talaris2013).
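- The filtering and ranking step can be sketched in Python as shown here. The `Network` record and its field names are hypothetical, and using the reference-structure score as a tie-breaker after the count of unsatisfied polar hydrogens is an assumption; the text does not state how the two criteria are combined.

```python
from dataclasses import dataclass


@dataclass
class Network:
    name: str
    unsat_heavy_atoms: int      # buried heavy-atom donors/acceptors without an h-bond
    unsat_polar_hydrogens: int  # buried polar hydrogens without an h-bond
    score: float                # energy on the poly-Ala reference; lower is better


def rank_networks(networks):
    """Drop networks with unsatisfied buried heavy atoms, then rank the rest."""
    kept = [n for n in networks if n.unsat_heavy_atoms == 0]
    # Fewest unsatisfied polar hydrogens first; score breaks ties (assumed).
    return sorted(kept, key=lambda n: (n.unsat_polar_hydrogens, n.score))
```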
- During Step 1, sidechain-backbone hydrogen bonds are not explicitly considered because the backbone is fixed (the number of sidechain-backbone hydrogen bonds for any given rotamer is constant). During Step 2, sidechain-backbone hydrogen bonds are scored when the networks are placed onto the reference structure, and are therefore included in evaluation for satisfaction (how many of the buried polar atoms participate in hydrogen bonds). Thus, even though they are not searched for explicitly, HBNet™ captures networks with sidechain-backbone hydrogen bonds. Networks with additional hydrogen bonds to backbone polar atoms will generally score better than a similar network without h-bonds to backbone in that the connectivity and satisfaction are improved.
- The best networks as ranked by Step 2 are iteratively placed onto the input scaffold and passed back to the RosettaScripts™ protocol for user-defined design of the remaining residue positions. Atom-pair constraints are automatically turned on for each pair of atoms making a hydrogen bond in the network; these constraints are tracked throughout the remainder of the design run to ensure the network residues are fixed in relative position during the downstream design. HBNet™ also outputs a Rosetta™ constraint (.cst) file that can be used to specify the same constraints in subsequent Rosetta™ design runs.
- It should be noted that these atom-pair “constraints” in Rosetta™ nomenclature are really “restraints” in that the rotamers are allowed to move, and an energy penalty is applied if the constraint is broken (i.e., if the hydrogen bond is broken). This approach, as opposed to simply fixing the coordinates of the network atoms, allows small movements of the network rotamers, allowing for a larger number of solutions for packing additional rotamers around the network. A trend that emerged is that tight packing around the networks, as well as satisfaction of all buried heavy-atom donors and acceptors, is paramount to design success; it is more important to have hydrogen bonds satisfying all polar atoms in the network with mediocre h-bond geometry than it is to have ideal h-bond geometry but poor packing around them and/or unsatisfied donors/acceptors.
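- The distinction between a hard constraint and a restraint can be made concrete with a small Python sketch. The flat-bottomed harmonic form and the numeric defaults below are assumptions chosen for illustration; the text does not specify the functional form that Rosetta™ applies.

```python
def restraint_penalty(distance, ideal=2.9, tolerance=0.3, weight=1.0):
    """Energy penalty for an atom pair restrained near an ideal distance.

    No penalty is applied within `tolerance` of `ideal` (all values hypothetical,
    in angstroms); beyond that the penalty grows quadratically, so the atoms may
    still move but are discouraged from breaking the hydrogen bond.
    """
    deviation = abs(distance - ideal) - tolerance
    return weight * deviation ** 2 if deviation > 0 else 0.0
```

Because the penalty is finite, the packer can trade a slightly distorted hydrogen bond for better packing around the network, which a fixed-coordinate treatment would not allow.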
- Combinations of multiple networks at the same interface can also be considered and specified by the user. Unlike typical Rosetta™ design, in which one input structure yields one output structure (the lowest energy solution found by sequence design and combinatorial sidechain optimization), this approach allows for hundreds of design possibilities to be output for each input structure.
- HBNet™ will only search for networks within a given search space (all possible rotamers of all possible amino acid types being considered for a given input backbone), which can be defined by the user. HBNet™ functions as a “Mover” within the RosettaScripts™ framework and can be passed “task operations” to specify which residue positions are fixed, packable (amino acid type is fixed but sidechain conformation is not), and designable; for designable positions, task operations can also specify which amino acid types are allowed at each position. The default setting in the absence of any task operations is that all residues are considered for design and all polar amino acids are considered in the network search.
- All positions in the scaffold can be set to be designable; for HBNet™, buried positions (defined based on solvent-accessible surface area (SASA)) can be allowed to be any noncharged polar amino acid, and solvent-exposed positions can be allowed to be any polar amino acid.
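- The burial-dependent choice of allowed residue types can be sketched in Python as follows. The membership of the two amino acid sets and the SASA cutoff are assumptions made for illustration (the cutoff value is borrowed from the core_asa setting in Table 1); the function name is hypothetical.

```python
# One-letter codes; the exact set membership is an illustrative assumption.
NONCHARGED_POLAR = frozenset("STNQHYW")
CHARGED = frozenset("DEKR")


def allowed_network_residues(sasa, buried_cutoff=35.0):
    """Return the polar residue types allowed at a position with the given SASA.

    Buried positions (SASA below the cutoff) allow only non-charged polar
    residues; solvent-exposed positions allow any polar residue.
    """
    if sasa < buried_cutoff:
        return NONCHARGED_POLAR
    return NONCHARGED_POLAR | CHARGED
```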
- A generalization of the Crick coiled-coil parameters was used to independently vary parameters of two or more helices supercoiled around the same axis, with parameters defined as described previously. Each monomer subunit has at least one inner helix and an outer helix (FIG. 1D). The supercoil phase (Δφ0 in) and z-offset of the first inner helix were fixed to 0 to serve as a relative reference point; all other parameters varied independently between the inner and outer helices, with the exception of the supercoil twist (ω0) and helical twist (ω1). Because these two parameters are coupled and determine handedness, ideal values were used for ω1 with ω0 and ω1 held constant between the inner and outer helices for the majority of designs. A left-handed supercoil results from ω0<0 and ω1=102.85, a right-handed supercoil from ω0>0 and ω1=98.18, and a straight bundle (no supercoiling) from ω0=0 and ω1=100. For the parallel six-helix dimer designs (3L6HC2), which have two inner helices and one outer helix, ω0 of the outer helix was allowed to deviate from that of the inner helix, but was required to be positive to maintain a right-handed supercoil.
- Additional sets of supercoiled dimer backbones were generated by constraining the pitch of the outer helix to match that of the inner helix via the following equation.
-
- where:
-
- ω′0: superhelical twist of outer helix
- ω0: superhelical twist of inner helix
- R′: superhelical radius of outer helix
- R: superhelical radius of inner helix
- d: rise per residue (set to 1.51)
- Constraining the pitch results in the outer helix maintaining more contacts to the inner helices throughout the length of the helical bundle, which allows for different hydrogen bond network and packing solutions.
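- As a sketch of how a pitch-matching relation of this kind can be derived (assuming standard Crick geometry, with the superhelical twists in radians per residue and d measured along the path of the helix axis; this is an illustrative derivation under those assumptions, not a reproduction of the original equation): each residue advances the helix axis by sqrt(d² − R²ω0²) along the supercoil axis, so the pitch is 2π·sqrt(d² − R²ω0²)/|ω0|. Setting the pitch of the outer helix equal to that of the inner helix and solving for ω′0 gives:

```latex
\frac{\sqrt{d^{2} - R'^{2}\,\omega_0'^{2}}}{\lvert \omega_0' \rvert}
  = \frac{\sqrt{d^{2} - R^{2}\,\omega_0^{2}}}{\lvert \omega_0 \rvert}
\quad\Longrightarrow\quad
\omega_0' = \frac{\omega_0\, d}{\sqrt{d^{2} + \omega_0^{2}\left(R'^{2} - R^{2}\right)}}
```

In this form ω′0 keeps the sign (handedness) of ω0, and because R′ > R for an outer helix, its magnitude is smaller than that of ω0.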
- HBNet™ is written in C++ as part of the Rosetta™ software suite. HBNet™ was developed to be modular and is compatible with all symmetric Rosetta™ applications, as well as the RosettaScripts™ XML framework so that it can be plugged into most existing design protocols, and users can customize options specific to their design tasks. HBNet™ is written as an abstract base class, from which specialized “mover” classes can be derived for specific design cases. In particular, the instance of HBNet™ described herein as “HBNetStapleInterface™” was written to search for hydrogen bond networks that span across intermolecular interfaces.
- Table 1 shows example RosettaScripts™ XML used for design calculations, example command lines and flags used for design calculations, and customized score weighting information.
-
TABLE 1
<ROSETTASCRIPTS>
#Design of symmetric homo-oligomers using HBNet, updated to work with new XSD
  <SCOREFXNS>
    <ScoreFunction name="hard_symm" weights="talaris2013_cst" symmetric="1">
      <Reweight scoretype="coordinate_constraint" weight="0.5" />
    </ScoreFunction>
    <ScoreFunction name="hard_bb" weights="bb_only" symmetric="1">
      <Reweight scoretype="coordinate_constraint" weight="2." />
      <Reweight scoretype="cart_bonded" weight="0.5" />
    </ScoreFunction>
    <ScoreFunction name="hard_symm_no_cst" weights="talaris2013" symmetric="1"/>
  </SCOREFXNS>
  <TASKOPERATIONS>
    <InitializeFromCommandline name="init"/>
    <IncludeCurrent name="current"/>
    <LimitAromaChi2 name="arochi" />
    <ExtraRotamersGeneric name="ex1_ex2" ex1="1" ex2="1"/>
    <ExtraRotamersGeneric name="ex1" ex1="1"/>
    <RestrictAbsentCanonicalAAS name="ala_only" resnum="0" keep_aas="A" />
    <LayerDesign name="init_layers" layer="other" make_pymol_script="0">
      <TaskLayer>
        <SelectBySASA name="symmetric_inteface_core" state="bound" mode="mc" core="1" probe_radius="2.0" core_asa="35" surface_asa="45" verbose="1"/>
        <all copy_layer="core" />
        <Helix append="NQSTH"/>
      </TaskLayer>
      <TaskLayer>
        <SelectBySASA name="symmetric_inteface_surface" state="bound" mode="mc" surface="1" probe_radius="2.0" core_asa="35" surface_asa="45" verbose="1"/>
        <all copy_layer="surface" />
      </TaskLayer>
      <TaskLayer>
        <SelectBySASA name="symmetric_inteface_boundary" state="bound" mode="mc" boundary="1" probe_radius="2.0" core_asa="35" surface_asa="45" verbose="1"/>
        <all copy_layer="boundary" />
        <Helix exclude="EKRW"/>
      </TaskLayer>
    </LayerDesign>
    <SelectBySASA name="select_core" state="bound" mode="mc" core="1" probe_radius="2.0" core_asa="35" surface_asa="45" verbose="1"/>
    <SelectBySASA name="select_boundary" state="bound" mode="mc" boundary="1" probe_radius="2.0" core_asa="35" surface_asa="45" verbose="1"/>
    <SelectBySASA name="select_surface" state="bound" mode="mc" surface="1" probe_radius="2.0" core_asa="35" surface_asa="45" verbose="1"/>
    <SelectBySASA name="select_all" state="bound" mode="mc" core="1" boundary="1" surface="1" probe_radius="2.2" core_asa="35" surface_asa="45" verbose="1"/>
  </TASKOPERATIONS>
  <FILTERS>
    <EnzScore name="cst_score" score_type="cstE" scorefxn="hard_symm" whole_pose="1" energy_cutoff="10.0" />
    <SymUnsatHbonds name="uhb" cutoff="1000"/>
    <Holes name="holes" threshold="1.8" confidence="0"/>
    <PackStat name="packstat" threshold="0.65" confidence="0"/>
    <PackStat name="init_pstat" threshold="0.575" confidence="0"/>
    <ScoreType name="cart_bonded_filter" scorefxn="hard_symm" score_type="cart_bonded" threshold="30." confidence="1." />
    <Geometry name="geo" omega="165" cart_bonded="35" confidence="1"/>
  </FILTERS>
  <MOVERS>
    #define symmetry of homo-oligomer; in this example, it's C3 symmetry
    <SetupForSymmetry name="setup_symm" definition="C3_Z.sym"/>
    <SymPackRotamersMover name="transform_sc" scorefxn="hard_symm" task_operations="ala_only" />
    <AddConstraintsToCurrentConformationMover name="add_cst" use_distance_cst="0" max_distance="12." coord_dev="2.5" min_seq_sep="8" />
    <ClearConstraintsMover name="clearconstraints"/>
    <SymMinMover name="hardmin_bb" scorefxn="hard_bb" type="lbfgs_armijo_nonmonotone" tolerance="0.0001" chi="1" bb="1" bondangle="1" bondlength="1" jump="all" cartesian="1"/>
    #HBNet Mover definition
    <HBNetStapleInterface name="hbnet_interf" hb_threshold="-0.75" upper_score_limit="3.5" write_network_pdbs="1" pore_radius="3.5" minimize="0" min_helices_contacted_by_network="6" min_network_size="6" max_unsat="2" max_staples_per_interface="4" combos="2" stringent_satisfaction="1" onebody_hb_threshold="-0.3" task_operations="init,current,arochi,ex1_ex2,init_layers" />
    #MultiplePoseMover (MPM) is needed because HBNet will pass back multiple poses -- one for each network, or combination of networks that is tried
    # The MPM collects all poses passed to it by HBNet, and then runs a nested ROSETTASCRIPTS protocol iteratively on each pose
    # Constraints are automatically turned on to keep the given network fixed in relative position during downstream design
    <MultiplePoseMover name="MPM_design" max_input_poses="100">
      <SELECT>
      </SELECT>
      <ROSETTASCRIPTS>
        <SCOREFXNS>
          <ScoreFunction name="soft_symm" weights="soft_rep_trp_ala" symmetric="1"/>
          <ScoreFunction name="hard_symm" weights="talaris2013_cst" symmetric="1">
            <Reweight scoretype="coordinate_constraint" weight="0.5" />
          </ScoreFunction>
          <ScoreFunction name="up_ele" weights="talaris2013" symmetric="1">
            <Reweight scoretype="fa_elec" weight="1.4" />
            <Reweight scoretype="hbond_sc" weight="2.0" />
          </ScoreFunction>
        </SCOREFXNS>
        <TASKOPERATIONS>
          <InitializeFromCommandline name="init"/>
          <IncludeCurrent name="current"/>
          <LimitAromaChi2 name="arochi" />
          <ExtraRotamersGeneric name="ex1_ex2" ex1="1" ex2="1"/>
          <ExtraRotamersGeneric name="ex1" ex1="1"/>
          <LayerDesign name="all_layers" layer="other" make_pymol_script="0">
            <TaskLayer>
              <SelectBySASA name="symmetric_inteface_core" state="bound" mode="mc" core="1" probe_radius="2.0" core_asa="35" surface_asa="45" verbose="1"/>
              <all copy_layer="core" />
              <Helix append="M"/>
            </TaskLayer>
            <TaskLayer>
              <SelectBySASA name="symmetric_inteface_surface" state="bound" mode="mc" surface="1" probe_radius="2.0" core_asa="35" surface_asa="45" verbose="1"/>
              <all copy_layer="surface" />
            </TaskLayer>
            <TaskLayer>
              <SelectBySASA name="symmetric_inteface_boundary" state="bound" mode="mc" boundary="1" probe_radius="2.0" core_asa="35" surface_asa="45" verbose="1"/>
              <all copy_layer="boundary" />
              <Helix exclude="D"/>
            </TaskLayer>
          </LayerDesign>
          <SelectBySASA name="select_core" state="bound" mode="mc" core="1" probe_radius="2.0" core_asa="35" surface_asa="45" verbose="1"/>
          <SelectBySASA name="select_boundary" state="bound" mode="mc" boundary="1" probe_radius="2.0" core_asa="35" surface_asa="45" verbose="1"/>
          <SelectBySASA name="select_surface" state="bound" mode="mc" surface="1" probe_radius="2.0" core_asa="35" surface_asa="45" verbose="1"/>
          <ConstrainHBondNetwork name="hbnet_task" />
        </TASKOPERATIONS>
        <MOVERS>
          <SymPackRotamersMover name="softpack_core" scorefxn="soft_symm" task_operations="init,all_layers,select_core,current,arochi,hbnet_task"/>
          <SymPackRotamersMover name="softpack_boundary" scorefxn="soft_symm" task_operations="init,all_layers,select_boundary,current,arochi,hbnet_task"/>
          <SymPackRotamersMover name="softpack_surface" scorefxn="soft_symm" task_operations="init,all_layers,select_surface,current,arochi,hbnet_task"/>
          <SymPackRotamersMover name="hardpack_core" scorefxn="hard_symm" task_operations="init,all_layers,select_core,current,arochi,ex1_ex2,hbnet_task"/>
          <SymPackRotamersMover name="hardpack_boundary" scorefxn="hard_symm" task_operations="init,all_layers,select_boundary,current,arochi,ex1_ex2,hbnet_task"/>
          <SymPackRotamersMover name="hardpack_surface" scorefxn="up_ele" task_operations="init,all_layers,select_surface,current,arochi,ex1,hbnet_task"/>
          <SymMinMover name="hardmin_sconly" scorefxn="hard_symm" chi="1" bb="0" bondangle="0" bondlength="0" />
        </MOVERS>
        <APPLY_TO_POSE>
        </APPLY_TO_POSE>
        <PROTOCOLS>
          <Add mover="softpack_core"/>
          <Add mover="softpack_boundary"/>
          <Add mover="softpack_surface"/>
          <Add mover="hardmin_sconly"/>
          <Add mover="hardpack_core"/>
          <Add mover="hardpack_boundary"/>
          <Add mover="hardpack_surface"/>
        </PROTOCOLS>
      </ROSETTASCRIPTS>
    </MultiplePoseMover>
    <MultiplePoseMover name="MPM_min_repack" max_input_poses="100">
      <ROSETTASCRIPTS>
        <SCOREFXNS>
          <ScoreFunction name="hard_symm_no_cst" weights="talaris2013" symmetric="1"/>
          <ScoreFunction name="talaris_cart_sym" weights="talaris2013_cart" symmetric="1"/>
        </SCOREFXNS>
        <TASKOPERATIONS>
          <RestrictToRepacking name="repack_only" />
        </TASKOPERATIONS>
        <MOVERS>
          <SymMinMover name="hardmin_cart" scorefxn="talaris_cart_sym" type="lbfgs_armijo_nonmonotone" tolerance="0.0001" chi="1" bb="1" bondangle="1" bondlength="1" jump="ALL" cartesian="1"/>
          <SymPackRotamersMover name="repack" scorefxn="hard_symm_no_cst" task_operations="repack_only" />
        </MOVERS>
        <APPLY_TO_POSE>
        </APPLY_TO_POSE>
        <PROTOCOLS>
          <Add mover="hardmin_cart" />
          <Add mover="repack" />
        </PROTOCOLS>
      </ROSETTASCRIPTS>
    </MultiplePoseMover>
    #minimize and repack without constraints on the network residues; if there is good packing around the networks, they should stay
    # in place in absence of the constraints.
    <MultiplePoseMover name="MPM_filters" max_input_poses="100">
      <SELECT>
        <AndSelector>
          <Filter filter="cst_score"/> #this score represents how much the network moved during repacking without constraints
          <Filter filter="uhb"/> #number of buried unsatisfied polar atoms in the entire pose
          <Filter filter="holes"/> #filter out designs with large cavities
        </AndSelector>
      </SELECT>
    </MultiplePoseMover>
  </MOVERS>
  <PROTOCOLS>
    #SETUP THE POSE
    #only do these first steps if starting with the python script parametric backbones
    #generate the symmetric backbone
    <Add mover="setup_symm"/>
    #transform all sidechains to Ala (need CB for minimization), then minimize with coordinate constraints on the backbone
    <Add mover="transform_sc"/>
    #constraints on the backbone
    <Add mover="add_cst"/>
    #minimize away bad torsions that may be present in the "ideal" generated backbone
    <Add mover="hardmin_bb"/>
    <Add mover="clearconstraints"/>
    #if using BGS, start here and comment out above:
    #NOW LOOK FOR NETWORKS
    #find h-bond networks using HBNet
    <Add mover_name="hbnet_interf"/>
    #EVERYTHING AFTER HERE IS WITH MULTIPLE_POSE_MOVER (MPM)
    #design the rest of the pose around the networks
    <Add mover_name="MPM_design"/>
    #minimize and repack without the network csts turned on (this acts as a filter for networks with poor packing around them, or bad sidechains)
    <Add mover_name="MPM_min_repack"/>
    #filters
    <Add mover_name="MPM_filters"/>
  </PROTOCOLS>
</ROSETTASCRIPTS>
- Parametrically generated backbones were first regularized using Cartesian space minimization in Rosetta™ to alleviate any torsional strain introduced by ideal backbone generation.
For each topology, an initial search of only the inner helix was performed to identify parameter ranges that resulted in the most favorable core sidechain packing; outer helix parameters were then extensively sampled in the context of these inner helix parameter ranges, generating tens of thousands of backbones. HBNet™ was used to search these backbones for hydrogen bond networks that span the intermolecular interface, have all heavy atom donors and acceptors satisfied, and contain at least three sidechains contributing hydrogen bonds. For buried interface positions, only non-charged polar amino acids were considered; for residue positions that were at least partially solvent-exposed, all polar amino acids were considered. Finer sampling was performed around backbone parameters that could accommodate both favorable hydrogen bond networks and hydrophobic packing. The helices of monomer subunits were connected into a single chain and the assembled proteins were designed using symmetric Rosetta™ sequence design calculations coupled with HBNet™ (FIG. 1F-G).
- For the designs described herein, generally on the order of ˜100,000 networks were detected after Step 1, but only a handful of networks, if any, passed all of the criteria outlined in Step 2 and were carried forward. After downstream design (Step 3), packing around the networks was evaluated. Because the hydrogen bond networks are constrained during downstream design, models were minimized and sidechains repacked without the constraints to measure how well the networks remained intact in the absence of the constraints.
- Lastly, models were evaluated for how closely the designed structure was recapitulated by “fold-and-dock” symmetric Rosetta™ structure prediction calculations: starting from an extended chain, the energy of the assembled oligomer was optimized by Monte Carlo sampling of the internal degrees of freedom of the monomer along with the rigid body transforms relating monomer subunits in the target cyclic symmetry group. Precedence was given to designs with funnel-shaped energy landscapes, in which the ab initio predicted structures converge upon the designed structure, serving as an in silico consistency check, and checking for the possibility that the amino acid sequence can adopt alternate states. Many designs with multiple networks and high polar content at the intermolecular interfaces did not exhibit strong “funneling”, although they did exhibit large “energy gaps”, meaning that the designed structure was significantly lower in energy than any structure sampled during ab initio “fold-and-dock” calculations. Designs with large energy gaps were also considered for selection for experimental testing.
- Designs selected for experimental validation were synthesized with the exact amino acid sequence resulting from the computational design method. The only exception was for designs lacking Tyr or Trp residues, for which a Tyr was added to the surface at non-interface positions in order to monitor A280 for purification and concentration measurements. Additionally, in a few cases, charged surface residues were modified to move the estimated isoelectric point (pI) of the protein away from buffer pH.
- To connect helices of the monomer into a single chain, an exhaustive database of backbone samples was generated, composed of fragments spanning two helical regions via a loop of five or fewer residues, as identified by DSSP, in high-resolution crystallographic structures. Candidate loops were identified in this database via rigid alignment of the terminal residues of the fragment and target parametrically designed backbone using an optimized superposition algorithm.
- Candidates under a stringent alignment tolerance (within 0.35 Å RMSD) were then fully aligned to the target backbone via torsion-space minimization under stringent coordinate constraints to the target backbone heavy-atom coordinates and soft coordinate constraints to the aligned candidate backbone heavy-atom coordinates. Candidate loop sequences were then designed under sequence profile constraints generated via alignment of the loop backbone to the source structure database, and the lowest-scoring candidate selected as the final loop design.
- Protein BLAST™ searches were performed using the National Center for Biotechnology Information (NCBI) web server, searching against all non-redundant protein sequences (‘nr’ database) using an Expect threshold (E-value cutoff) of 10.0 and the BLOSUM62 substitution matrix.
- Crystal structures and design models were superimposed through structure-based alignment using all heavy atoms. From this alignment, RMSD was calculated across all alpha-carbon atoms, and also across heavy atoms of the hydrogen bond network residues.
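- The RMSD calculation over a chosen atom set (for example, all alpha-carbon atoms, or the heavy atoms of the network residues) can be sketched in Python as below. The sketch assumes the two structures have already been superimposed and that the coordinate lists are in matching atom order; the function name is hypothetical.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two matched lists of (x, y, z) points.

    Assumes the structures are already superimposed and atoms are paired by index.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    total = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b)
    )
    return math.sqrt(total / len(coords_a))
```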
- To investigate the structural uniqueness of our designs, the MICAN alignment algorithm was used to search against homo-oligomer bio-units of the same symmetry group in the Protein Data Bank (PDB).
- To calculate parameters for the crystallized two-ring structures, the Coiled-coil Crick Parameterization (CCCP) web server with the “Global symmetric” optimization option was used, as structures of interest are all symmetric homo-oligomers. As parameters varied between the inner and outer helices of a given structure, parameters were calculated separately for inner ring and the outer ring helices, inputting .pdb files corresponding to either all helical residues of the inner ring helices, or all helical residues of the outer ring helices, for each crystal structure.
- All structural images for figures were generated using PyMOL™.
- Synthetic genes were ordered from Genscript Inc. (Piscataway, N.J., USA) and delivered in either pET21-NESG or pET-28b+ E. coli expression vectors, inserted at the NdeI and XhoI sites of each vector. For the pET21-NESG constructs, synthesized DNA was cloned in frame with the C-terminal hexahistidine tag. For the pET-28b+ constructs, synthesized DNA was cloned in frame with the N-terminal hexahistidine tag and thrombin cleavage site, and a stop codon was introduced at the C-terminus. Plasmids were transformed into chemically competent E. coli BL21(DE3)Star or BL21(DE3)Star-pLysS cells (Invitrogen) for protein expression. Constructs for yeast two-hybrid assays were made by Gibson assembly; inserts were generated by PCR from pET-21 or pET-28 E. coli expression vectors as templates, or ordered as gBlocks® (IDT). All primers and gBlocks® were ordered from Integrated DNA Technologies (IDT).
- Starter cultures were grown at 37° C. in either Luria-Bertani (LB) medium overnight, or in Terrific Broth for 8 hours, in the presence of 50 μg/ml carbenicillin (pET21-NESG) or 30 μg/ml kanamycin (pET-28b+). Starter cultures were used to inoculate 500 mL of LB, Terrific Broth, or Terrific Broth II (MP Biomedicals) containing antibiotic. Cultures were induced with 0.2-0.5 mM IPTG at an OD600 of 0.6-0.9 and expressed overnight at 18° C. (many designs were also later expressed at 37° C. for 4 hours with no noticeable difference in yield). Cells were harvested by centrifugation for 15 minutes at 5000 rcf at 4° C. and resuspended in lysis buffer (20 mM Tris, 300 mM NaCl, 20 mM Imidazole, pH 8.0 at room temperature), then lysed by sonication in presence of lysozyme, DNAse, and EDTA-free cocktail protease inhibitor (Roche) or 1 mM PMSF. Lysates were cleared by centrifugation at 4° C. and 18,000 rpm for at least 30 minutes and applied to Ni-NTA (Qiagen) columns pre-equilibrated in lysis buffer. The column was washed three times with 5 column volumes (CV) of wash buffer (20 mM Tris, 300 mM NaCl, 30 mM Imidazole, pH 8.0 at room temperature), followed by 3-5 CV of high-salt wash buffer (20 mM Tris, 1 M NaCl, 30 mM imidazole, pH 8.0 at room temperature), and then 5 CV of wash buffer. Protein was eluted with 20 mM Tris, 300 mM NaCl, 250 mM Imidazole, pH 8.0 at room temperature. Proteins were initially screened by SEC-MALS and CD with His tags intact; if possible, the tags were cleaved and samples were further purified for crystallography, SAXS, and GdmCl melts.
- N-terminal hexahistidine tags of the pET-28 constructs were cleaved with restriction grade thrombin (EMD Millipore 69671-3) at room temperature for 4 hours or overnight, using a 1:5000 dilution of enzyme into sample solution; full cleavage was observed after 2 hours via SDS-PAGE analysis and no spurious cleavage was observed at time points upwards of 18 hours. Prior to addition of thrombin, buffer was exchanged into lysis buffer (20 mM Tris, 300 mM NaCl, 20 mM Imidazole). After cleavage, the sample was applied to a column of benzamidine resin (GE Healthcare/Pharmacia, Fisher #45-000-280); resin was resuspended and the sample was incubated on the column for 30-60 minutes with nutation. Flow-through was collected and additional sample was obtained by washing the benzamidine resin with 1.5 CV of lysis buffer. 1 mM PMSF was added to inhibit any remaining free thrombin. Sample was then passed over an additional Ni-NTA column and washed with 1.5 CV of lysis buffer. Proteins were further purified by FPLC size-exclusion chromatography (SEC) using a Superdex 75 10/300 column (GE Healthcare). For SAXS, gel filtration buffer was 20 mM Tris pH 8.0 at room temperature, 150 mM NaCl and 2% glycerol; for crystallography, 20 mM Tris pH 8.0, 100 mM NaCl was used. No reducing agents were added, as none of the designed proteins contained cysteines.
- SEC-MALS experiments used a Superdex 75 10/300 column connected to a miniDAWN TREOS multi-angle static light scattering and an Optilab T-rEX (refractometer with Extended range) detector (Wyatt Technology Corporation, Santa Barbara Calif., USA). Protein samples were injected at concentrations of 3-5 mg/mL in TBS (pH 8.0) or PBS (pH 7.4). Data was analyzed using ASTRA™ (Wyatt Technologies) software to estimate the weight average molar mass (Mw) of eluted species, as well as the number average molar mass (Mn) to assess monodispersity by polydispersity index (PDI)=Mw/Mn.
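- The relationship between Mw, Mn, and PDI can be illustrated with a short Python sketch. It uses the standard definitions of the weight average and number average molar mass over elution slices, each with a mass concentration and a molar mass; it is not a description of how the ASTRA™ software computes these values, and the function name is hypothetical.

```python
def molar_mass_averages(slices):
    """Return (Mw, Mn, PDI) from (mass_concentration, molar_mass) slices.

    Mw = sum(c_i * M_i) / sum(c_i); Mn = sum(c_i) / sum(c_i / M_i); PDI = Mw / Mn.
    """
    slices = list(slices)
    total_c = sum(c for c, _ in slices)
    mw = sum(c * m for c, m in slices) / total_c
    mn = total_c / sum(c / m for c, m in slices)
    return mw, mn, mw / mn
```

A PDI close to 1 indicates a monodisperse species, whereas a mixture of oligomeric states gives a PDI greater than 1.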
- CD wavelength scans (260 to 195 nm) and temperature melts (23 to 95° C.) were measured using a JASCO J-1500 or an AVIV model 420 CD spectrometer. Temperature melts monitored absorption signal at 222 nm and were carried out at a heating rate of 4° C./min; protein samples were at 0.2-0.5 mg/mL in phosphate buffered saline (PBS) pH 7.4 in a 0.1 cm cuvette.
- Guanidinium chloride (GdmCl) titrations were performed on the same spectrometers with automated titration apparatus in PBS pH 7.4 at 25° C., monitored at 222 nm, using a protein concentration of 0.025-0.06 mg/mL in a 1 cm cuvette with stir bar; each titration consisted of at least 40 evenly distributed concentration points with one minute mixing time for each step. Titrant solution consisted of the same concentration of protein in PBS+GdmCl; GdmCl concentration was determined by refractive index.
- Peptides 2L4HC2_9_inner and 2L6HC3_13_inner were ordered from Genscript Inc. (Piscataway, N.J., USA) with N-terminal acetylation and C-terminal amidation. 2L4HC2_9_inner=SSDYLRETIEELRERIRELEREIRRSNEEIERLREEKS (SEQ ID NO: 93) and 2L6HC3_13_inner=TERENNYRNEENNRKIEEEIREIKKEIKKNKERD (SEQ ID NO: 94). Peptides were dissolved in PBS pH 7.4 and further dialyzed into PBS pH 7.4 for CD experiments.
- Purified protein samples were concentrated to approximately 12 mg/ml in 20 mM Tris pH 8.0 and 100 mM NaCl. Samples were screened using the sparse matrix method (Jancarik and Kim, 1991) with a Phoenix Robot (Art Robbins Instruments, Sunnyvale, Calif.) utilizing the following crystallization screens: Berkeley Screen (Lawrence Berkeley National Laboratory), Crystal Screen, PEG/Ion, Index and PEGRx (Hampton Research, Aliso Viejo, Calif.). The optimum conditions for crystallization of the different designs were found as follows: 2L6HC3_6, 0.2 M Sodium Fluoride, 0.1 M MES pH 5.5 and 20% PEG 400; 2L6HC3_12, 2.2 M Sodium Malonate pH 5.0; 2L6HC3_13, 0.06 M Citric acid, 0.04 M BIS-TRIS propane pH 4.1 and 16% PEG 3,350; 2L8HC4_12, 0.2 M Sodium Acetate trihydrate, 0.1 M Tris hydrochloride pH 8.5 and 30% PEG 4,000; 3L6HC2_2, 0.1 M Sodium Acetate trihydrate pH 4.5 and 3.0 M Sodium chloride; 2L4HC2_23, 0.2 M Lithium chloride and 20% PEG 3,350; 2L4HC2_9, 0.1 M Sodium citrate tribasic dihydrate pH 30% PEG MME 550; 2L4HC2_11, 0.1 M Tris pH 8.5 and 2.0 M Ammonium sulfate; 5L6HC3_1, 0.1 M acid pH 3.5 and 3.0 M Sodium chloride; and 2L4HC2_24 was concentrated to 20 mg/ml and crystallized in 0.1 M Citric acid pH 3.5, 2.0 M Ammonium sulfate. Crystals were obtained after 1 to 14 days by the sitting-drop vapor-diffusion method with the drops consisting of a 1:1 mixture of 0.2 μL protein solution and 0.2 μL reservoir solution.
- The crystals of the designed proteins were placed in a reservoir solution containing 15 to 20% (v/v) glycerol, and then flash-cooled in liquid nitrogen. The X-ray data sets were collected at the Berkeley Center for Structural Biology beamlines 5.0.1, 8.2.1 and 8.2.2 of the Advanced Light Source at Lawrence Berkeley National Laboratory (LBNL). Data sets were indexed and scaled using HKL2000. All the design structures were determined by the molecular-replacement method with the program PHASER within the Phenix suite using the design models as the initial search model. The atomic positions obtained from molecular replacement and the resulting electron density maps were used to build the design structures and initiate crystallographic refinement and model rebuilding. Structure refinement was performed using the phenix.refine program. Manual rebuilding using COOT and the addition of water molecules allowed construction of the final models. Root-mean-square deviation differences from ideal geometries for bond lengths, angles and dihedrals were calculated with Phenix. The overall stereochemical quality of all final models was assessed using the program MOLPROBITY.
- Samples were purified by gel filtration in 20 mM Tris pH 8.0 at room temperature, 150 mM NaCl and 2% glycerol; fractions preceding the void volume of the column were used as blanks for buffer subtraction. Scattering measurements were performed at the SIBYLS 12.3.1 beamline at the Advanced Light Source. The X-ray wavelength (λ) was 1 Å, and the sample-to-detector distance of the Mar 165 detector was 1.5 m, corresponding to a scattering vector q (q=4πsin θ/λ, where 2θ is the scattering angle) range of 0.01 to 0.3 Å−1. Data sets were collected using exposures of 0.5, 1, and 6 seconds at 12 keV. For longer exposures that resulted in saturation of low q signal or radiation damage, datasets were merged with lower exposures from the same sample. For each sample, data was collected for at least two different concentrations to test for concentration-dependent effects; “high” concentration samples ranged from 3-7 mg/ml and “low” concentration samples ranged from 1-2 mg/ml. Data was analyzed using the ScÅtter software package as previously described; for samples that did not exhibit concentration-dependence, the best data set based on signal-to-noise and Guinier fitting was used for analysis. FoXS was used to compare design models to experimental scattering profiles and calculate quality of fit (χ) values. For the design models, extra residues introduced by the expression vector were added to the computational models using Rosetta™ Remodel so that the design sequence matched that of the experimental sample. To capture the conformational flexibility of these extra tag residues in solution, 100 independent models were generated per design. These 100 models were then clustered by Rosetta™, and to avoid bias, the cluster center of the largest cluster was selected as the single representative model used for fitting to experimental data.
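- The scattering vector relation q=4πsin θ/λ given above can be evaluated with a short Python sketch, which also inverts it to recover the scattering angle for a given q. The function names are hypothetical; wavelength is in Å and q in Å−1.

```python
import math


def scattering_vector(two_theta_deg, wavelength):
    """Return q = 4*pi*sin(theta)/lambda, where two_theta_deg is the scattering angle."""
    theta = math.radians(two_theta_deg) / 2.0
    return 4.0 * math.pi * math.sin(theta) / wavelength


def scattering_angle(q, wavelength):
    """Return the scattering angle 2*theta in degrees for a given q."""
    return 2.0 * math.degrees(math.asin(q * wavelength / (4.0 * math.pi)))
```

For example, `scattering_angle(0.3, 1.0)` gives the angle at the upper end of the stated q range for a 1 Å wavelength.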
- Protein binders were cloned into plasmids bearing the GAL4 DNA-binding domain (pOBD2) and/or the GAL4 transcription activation domain (pOAD) using Gibson assembly and sequence verified. For each pair of binders tested, the yeast strain PJ69-4a was transformed with the appropriate pair of plasmids using a modified LiOAc transformation protocol where rescue and selection of the transformed yeast was performed in minimal liquid media lacking tryptophan and leucine. Before the assay, transformed cells were diluted 1:10 and grown for 16 hours in fresh minimal media lacking tryptophan and leucine. After this initial incubation, cells were diluted again 1:10 and grown, with shaking, in a 96-well plate, this time in 200 μl of minimal media lacking tryptophan, leucine and histidine. Since a protein interaction between the DNA-binding domain and the transcription activation domain is necessary for the cells to grow in the absence of histidine, successful interactions can be approximated by growth rate. The optical density (OD) of cells was measured every 10 minutes over the span of 48 hours, and the growth rate was calculated for every 60-minute span. The maximum growth rate per hour (maxV) was used as a proxy for interactions between binder pairs.
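- The maxV calculation described above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions: the growth rate for a 60-minute span is taken here as the change in OD across a sliding window of six 10-minute readings, which is one plausible reading of the text; the function name and the toy OD values are hypothetical and are not experimental data.

```python
def max_growth_rate(od_readings, interval_min=10, window_min=60):
    """Return the largest change in OD per hour over any window_min-long span.

    od_readings: OD values taken every interval_min minutes, in time order.
    Assumption: windows slide one reading at a time and growth rate is the
    simple OD difference across the window divided by its length in hours.
    """
    steps = window_min // interval_min
    if len(od_readings) <= steps:
        raise ValueError("need more than one full window of readings")
    hours = window_min / 60.0
    rates = [
        (od_readings[i + steps] - od_readings[i]) / hours
        for i in range(len(od_readings) - steps)
    ]
    return max(rates)


# Hypothetical OD trace for one well (two hours of readings, 10 minutes apart).
toy_trace = [0.10, 0.10, 0.11, 0.12, 0.14, 0.17, 0.21,
             0.26, 0.30, 0.33, 0.35, 0.36, 0.36]

if __name__ == "__main__":
    print("maxV (OD per hour):", max_growth_rate(toy_trace))
```

A full 48-hour trace sampled every 10 minutes would simply be a longer list passed to the same function.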
- Gel bands were isolated, washed with ammonium bicarbonate, and reduced with DTT at 60° C. for 15 minutes. After cooling, gel pieces were treated with iodoacetamide for 15 minutes, in the dark at room temperature, to alkylate reduced thiol groups. Protease digestion was accomplished with sequencing grade trypsin at a 10:1 substrate-to-enzyme ratio for 4 hours at 37° C. Peptide samples were dried under vacuum and resuspended in 0.1% formic acid prior to LC-MS/MS analysis. Liquid chromatography consisted of a 60-minute gradient across a 15 cm column packed with C18 resin downstream of a 3 cm Kasil frit trap packed with C12 resin. Spectra were collected using data-dependent acquisition on a Thermo Velos Pro mass spectrometer. Each sample was injected with three technical replicates and peptides were identified using SEQUEST and Percolator followed by IDPicker for protein inference.
- In a further aspect, a method is provided. A computing device determines a search space for hydrogen bond networks related to one or more molecules. The search space includes a plurality of energy terms related to a plurality of residues related to the hydrogen bond networks. The computing device searches the search space to identify one or more hydrogen bond networks based on the plurality of energy terms. The computing device screens the identified one or more hydrogen bond networks to identify one or more screened hydrogen bond networks based on scores for the one or more identified hydrogen bond networks. The computing device generates an output related to the one or more screened hydrogen bond networks.
- In another aspect, a computing device is provided. The computing device includes one or more data processors and a computer-readable medium. The computer-readable medium is configured to store at least computer-readable instructions that, when executed, cause the computing device to perform functions. The functions include: determining a search space for hydrogen bond networks related to one or more molecules, where the search space includes a plurality of energy terms related to a plurality of residues related to the hydrogen bond networks; searching the search space to identify one or more hydrogen bond networks based on the plurality of energy terms; screening the identified one or more hydrogen bond networks to identify one or more screened hydrogen bond networks based on scores for the one or more identified hydrogen bond networks; and generating an output related to the one or more screened hydrogen bond networks.
- In another aspect, a computer-readable medium is provided. The computer-readable medium is configured to store at least computer-readable instructions that, when executed by one or more processors of a computing device, cause the computing device to perform functions. The functions include: determining a search space for hydrogen bond networks related to one or more molecules, where the search space includes a plurality of energy terms related to a plurality of residues related to the hydrogen bond networks; searching the search space to identify one or more hydrogen bond networks based on the plurality of energy terms; screening the identified one or more hydrogen bond networks to identify one or more screened hydrogen bond networks based on scores for the one or more identified hydrogen bond networks; and generating an output related to the one or more screened hydrogen bond networks.
- In another aspect, an apparatus is provided. The apparatus includes: means for determining a search space for hydrogen bond networks related to one or more molecules, where the search space includes a plurality of energy terms related to a plurality of residues related to the hydrogen bond networks; means for searching the search space to identify one or more hydrogen bond networks based on the plurality of energy terms; means for screening the identified one or more hydrogen bond networks to identify one or more screened hydrogen bond networks based on scores for the one or more identified hydrogen bond networks; and means for generating an output related to the one or more screened hydrogen bond networks.
-
FIG. 6 is a block diagram of an example computing network. Some or all of the above-mentioned techniques disclosed herein, such as but not limited to techniques disclosed as part of and/or being performed by software, the Rosetta™ software suite, RosettaDesign™, Rosetta™ applications, and/or other herein-described computer software and computer hardware, can be part of and/or performed by a computing device. For example, FIG. 6 shows protein design system 602 configured to communicate, via network 606, with client devices and protein database 608. In some embodiments, protein design system 602 and/or protein database 608 can be a computing device configured to perform some or all of the herein described methods and techniques, such as but not limited to, method 800 and functionality described as being part of or related to Rosetta™. Protein database 608 can, in some embodiments, store information related to and/or used by Rosetta™. -
Network 606 may correspond to a LAN, a wide area network (WAN), a corporate intranet, the public Internet, or any other type of network configured to provide a communications path between networked computing devices. Network 606 may also correspond to a combination of one or more LANs, WANs, corporate intranets, and/or the public Internet. - Although
FIG. 6 only shows three client devices client devices client devices client devices protein database 608 can be incorporated in a client device, such as client device -
FIG. 7A is a block diagram of an example computing device (e.g., system). In particular, computing device 700 shown in FIG. 7A can be configured to: include components of and/or perform one or more functions of protein design system 602, client device, network 606, and/or protein database 608 and/or carry out part or all of any herein-described methods and techniques, such as but not limited to method 800. Computing device 700 may include a user interface module 701, a network-communication interface module 702, one or more processors 703, and data storage 704, all of which may be linked together via a system bus, network, or other connection mechanism 705. - User interface module 701 can be operable to send data to and/or receive data from external user input/output devices. For example, user interface module 701 can be configured to send and/or receive data to and/or from user input devices such as a keyboard, a keypad, a touch screen, a computer mouse, a track ball, a joystick, a camera, a voice recognition module, and/or other similar devices. User interface module 701 can also be configured to provide output to user display devices, such as one or more cathode ray tubes (CRT), liquid crystal displays (LCD), light emitting diodes (LEDs), displays using digital light processing (DLP) technology, printers, light bulbs, and/or other similar devices, either now known or later developed. User interface module 701 can also be configured to generate audible output(s), such as a speaker, speaker jack, audio output port, audio output device, earphones, and/or other similar devices.
- Network-communications interface module 702 can include one or more wireless interfaces 707 and/or one or more wireline interfaces 708 that are configurable to communicate via a network, such as network 606 shown in FIG. 6. Wireless interfaces 707 can include one or more wireless transmitters, receivers, and/or transceivers, such as a Bluetooth transceiver, a Zigbee transceiver, a Wi-Fi transceiver, a WiMAX transceiver, and/or other similar type of wireless transceiver configurable to communicate via a wireless network. Wireline interfaces 708 can include one or more wireline transmitters, receivers, and/or transceivers, such as an Ethernet transceiver, a Universal Serial Bus (USB) transceiver, or similar transceiver configurable to communicate via a twisted pair, one or more wires, a coaxial cable, a fiber-optic link, or a similar physical connection to a wireline network. - In some embodiments, network
communications interface module 702 can be configured to provide reliable, secured, and/or authenticated communications. For each communication described herein, information for ensuring reliable communications (i.e., guaranteed message delivery) can be provided, perhaps as part of a message header and/or footer (e.g., packet/message sequencing information, encapsulation header(s) and/or footer(s), size/time information, and transmission verification information such as CRC and/or parity check values). Communications can be made secure (e.g., be encoded or encrypted) and/or decrypted/decoded using one or more cryptographic protocols and/or algorithms, such as, but not limited to, DES, AES, RSA, Diffie-Hellman and/or DSA. Other cryptographic protocols and/or algorithms can be used as well or in addition to those listed herein to secure (and then decrypt/decode) communications. -
Processors 703 can include one or more general purpose processors and/or one or more special purpose processors (e.g., digital signal processors, application specific integrated circuits, etc.). Processors 703 can be configured to execute computer-readable program instructions 706 contained in data storage 704 and/or other instructions as described herein. Data storage 704 can include one or more computer-readable storage media that can be read and/or accessed by at least one of processors 703. The one or more computer-readable storage media can include volatile and/or non-volatile storage components, such as optical, magnetic, organic or other memory or disc storage, which can be integrated in whole or in part with at least one of processors 703. In some embodiments, data storage 704 can be implemented using a single physical device (e.g., one optical, magnetic, organic or other memory or disc storage unit), while in other embodiments, data storage 704 can be implemented using two or more physical devices. -
Data storage 704 can include computer-readable program instructions 706 and perhaps additional data. For example, in some embodiments, data storage 704 can store part or all of data utilized by a protein design system and/or a protein database; e.g., protein design system 602, protein database 608. In some embodiments, data storage 704 can additionally include storage required to perform at least part of the herein-described methods and techniques and/or at least part of the functionality of the herein-described devices and networks. -
FIG. 7B depicts a network 606 of computing clusters 709 a, 709 b, 709 c arranged as a cloud-based server system in accordance with an example embodiment. Data and/or software for protein design system 602 can be stored on one or more cloud-based devices that store program logic and/or data of cloud-based applications and/or services. In some embodiments, protein design system 602 can be a single computing device residing in a single computing center. In other embodiments, protein design system 602 can include multiple computing devices in a single computing center, or even multiple computing devices located in multiple computing centers located in diverse geographic locations. - In some embodiments, data and/or software for protein design system 602 can be encoded as computer readable information stored in tangible computer readable media (or computer readable storage media) and accessible by
client devices -
FIG. 7B depicts a cloud-based server system in accordance with an example embodiment. In FIG. 7B, the functions of protein design system 602 can be distributed among three computing clusters 709 a, 709 b, and 709 c. Computing cluster 709 a can include one or more computing devices 700 a, cluster storage arrays 710 a, and cluster routers 711 a connected by a local cluster network 712 a. Similarly, computing cluster 709 b can include one or more computing devices 700 b, cluster storage arrays 710 b, and cluster routers 711 b connected by a local cluster network 712 b. Likewise, computing cluster 709 c can include one or more computing devices 700 c, cluster storage arrays 710 c, and cluster routers 711 c connected by a local cluster network 712 c. - In some embodiments, each of the
computing clusters 709 a, 709 b, and 709 c can have an equal number of computing devices, an equal number of cluster storage arrays, and an equal number of cluster routers. In other embodiments, however, each computing cluster can have different numbers of computing devices, different numbers of cluster storage arrays, and different numbers of cluster routers. The number of computing devices, cluster storage arrays, and cluster routers in each computing cluster can depend on the computing task or tasks assigned to each computing cluster. - In computing cluster 709 a, for example,
computing devices 700 a can be configured to perform various computing tasks of protein design system 602. In one embodiment, the various functionalities of protein design system 602 can be distributed among one or more of computing devices Computing devices computing clusters 709 b and 709 c can be configured similarly to computing devices 700 a in computing cluster 709 a. On the other hand, in some embodiments, computing devices - In some embodiments, computing tasks and stored data associated with protein design system 602 can be distributed across
computing devices computing devices - The
cluster storage arrays 710 a, 710 b, and 710 c of the computing clusters 709 a, 709 b, and 709 c can be data storage arrays that include disk array controllers configured to manage read and write access to groups of hard disk drives. The disk array controllers, alone or in conjunction with their respective computing devices, can also be configured to manage backup or redundant copies of the data stored in the cluster storage arrays to protect against disk drive or other cluster storage array failures and/or network failures that prevent one or more computing devices from accessing one or more cluster storage arrays. - Similar to the manner in which the functions of protein design system 602 can be distributed across
computing devices computing clusters 709 a, 709 b, and 709 c, various active portions and/or backup portions of these components can be distributed across cluster storage arrays 710 a, 710 b, and 710 c. For example, some cluster storage arrays can be configured to store one portion of the data and/or software of protein design system 602, while other cluster storage arrays can store a separate portion of the data and/or software of protein design system 602. Additionally, some cluster storage arrays can be configured to store backup versions of data stored in other cluster storage arrays. - The cluster routers 711 a, 711 b, and 711 c in
computing clusters 709 a, 709 b, and 709 c can include networking equipment configured to provide internal and external communications for the computing clusters. For example, the cluster routers 711 a in computing cluster 709 a can include one or more internet switching and routing devices configured to provide (i) local area network communications between the computing devices 700 a and the cluster storage arrays 710 a via the local cluster network 712 a, and (ii) wide area network communications between the computing cluster 709 a and the computing clusters 709 b and 709 c via the wide area network connection 713 a to network 606. Cluster routers 711 b and 711 c can include network equipment similar to the cluster routers 711 a, and cluster routers 711 b and 711 c can perform similar networking functions for computing clusters - In some embodiments, the configuration of the cluster routers 711 a, 711 b, and 711 c can be based at least in part on the data communication requirements of the computing devices and cluster storage arrays, the data communications capabilities of the network equipment in the cluster routers 711 a, 711 b, and 711 c, the latency and throughput of
local networks -
FIG. 8 is a flow chart of an example method 800. Method 800 can be carried out by a computing device, such as computing device 700 described in the context of at least FIG. 7A. -
Method 800 can begin at block 810, where the computing device can determine a search space for hydrogen bond networks related to one or more molecules, where the search space includes a plurality of energy terms related to a plurality of residues related to the hydrogen bond networks, such as discussed above at least in the “Computational Techniques” section. - In some embodiments, the search space can be configured as a graph having a plurality of nodes connected by one or more edges, where a node of the plurality of nodes is based on a particular residue of the plurality of residues, the particular residue having a residue position, and where an edge of the one or more edges connects a first node and a second node of the plurality of nodes based on a possible interaction between the first and second nodes, such as discussed above at least in the “Computational Techniques” section. In particular of these embodiments, the first node can relate to a first residue of the plurality of residues where the second node relates to a second residue of the plurality of residues, and where the possible interaction between the first and second nodes relates to a possible interaction between a rotamer of the first residue and/or a rotamer of the second residue, such as discussed above at least in the “Computational Techniques” section. In more particular of these embodiments, the possible interaction between the first and second nodes can relate to an interaction energy between the first residue and the second residue, such as discussed above at least in the “Computational Techniques” section.
In even more particular of these embodiments, determining the search space can include: determining whether the interaction energy between the first residue and the second residue is less than a threshold interaction energy; and after determining that the interaction energy between the first residue and the second residue is less than the threshold interaction energy, adding a hydrogen bond network including the first node, the second node, and at least one edge between the first and second nodes to the search space, such as discussed above at least in the “Computational Techniques” section. In further more particular of these embodiments, at least one edge between the first and second nodes can include information about the interaction energy between the first residue and the second residue, such as discussed above at least in the “Computational Techniques” section. In even further particular of these embodiments, the information about the interaction energy between the first residue and the second residue can include a plurality of interaction energy values, where each interaction energy value in the plurality of interaction energy values is associated with a particular rotamer of the first residue and a particular rotamer of the second residue, such as discussed above at least in the “Computational Techniques” section.
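- The graph-style search space described for block 810 can be illustrated with a minimal Python sketch. It assumes, for illustration only, that nodes are residue positions, that each edge stores one interaction energy value per rotamer pair, and that a rotamer pair is kept only when its energy is less than a threshold. The function name, the rotamer labels, the callable used to look up pair energies, and the toy energy values are all hypothetical; this is not the Rosetta™ implementation.

```python
from itertools import combinations


def build_search_space(rotamers, pair_energy, threshold):
    """Build a position-level graph of possible hydrogen-bonding interactions.

    rotamers:    dict mapping residue position -> list of rotamer labels.
    pair_energy: callable (pos_a, rot_a, pos_b, rot_b) -> interaction energy.
    threshold:   a rotamer pair is kept only if its energy is below this value.

    Returns a dict mapping (pos_a, pos_b) -> {(rot_a, rot_b): energy}, so each
    edge carries a plurality of energy values, one per retained rotamer pair.
    """
    edges = {}
    for pos_a, pos_b in combinations(sorted(rotamers), 2):
        kept = {}
        for rot_a in rotamers[pos_a]:
            for rot_b in rotamers[pos_b]:
                energy = pair_energy(pos_a, rot_a, pos_b, rot_b)
                if energy < threshold:
                    kept[(rot_a, rot_b)] = energy
        if kept:
            edges[(pos_a, pos_b)] = kept
    return edges


# Hypothetical pairwise energies; any pair not listed is treated as 0.0.
toy_energies = {
    (1, "SER_a", 5, "ASN_a"): -1.2,
    (1, "SER_a", 5, "ASN_b"): 0.3,
    (5, "ASN_a", 9, "GLN_a"): -0.8,
}


def toy_pair_energy(pos_a, rot_a, pos_b, rot_b):
    return toy_energies.get((pos_a, rot_a, pos_b, rot_b), 0.0)


toy_rotamers = {1: ["SER_a"], 5: ["ASN_a", "ASN_b"], 9: ["GLN_a"]}
toy_space = build_search_space(toy_rotamers, toy_pair_energy, threshold=-0.5)

if __name__ == "__main__":
    for edge, pairs in toy_space.items():
        print(edge, pairs)
```

Storing the per-rotamer-pair energies on the edge keeps the graph small at the position level while still letting a later search choose specific rotamers.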
- In other embodiments, determining the search space can include: determining at least a first residue position and a second residue position at an intermolecular interface between a first molecule and a second molecule, the first residue position associated with a first residue of the first molecule and the second residue position associated with a second residue of the second molecule; and determining the search space based on the at least the first residue position and the second residue position, such as discussed above at least in the “Computational Techniques” section. In some of these embodiments, at least one of the first molecule and the second molecule can include a polypeptide chain, such as discussed above at least in the “Computational Techniques” section.
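- One simple way to pick candidate residue positions at an intermolecular interface is a distance cutoff between representative atoms of the two molecules. The sketch below is an assumption-laden illustration only: the passage above does not specify how interface positions are determined, and the cutoff value, the choice of one representative coordinate per residue, the function name, and the toy coordinates are all hypothetical.

```python
import math


def interface_position_pairs(chain_a, chain_b, cutoff=8.0):
    """Return (position_in_a, position_in_b) pairs closer than cutoff.

    chain_a, chain_b: dict mapping residue position -> (x, y, z) coordinate of
    one representative atom per residue (an assumption for this sketch).
    cutoff: distance in Angstrom; the default is an arbitrary illustrative value.
    """
    pairs = []
    for pos_a, xyz_a in chain_a.items():
        for pos_b, xyz_b in chain_b.items():
            if math.dist(xyz_a, xyz_b) <= cutoff:
                pairs.append((pos_a, pos_b))
    return sorted(pairs)


# Hypothetical coordinates for two short chains.
toy_chain_a = {10: (0.0, 0.0, 0.0), 14: (0.0, 6.0, 0.0), 18: (0.0, 30.0, 0.0)}
toy_chain_b = {3: (5.0, 0.0, 0.0), 7: (5.0, 6.0, 0.0)}

toy_pairs = interface_position_pairs(toy_chain_a, toy_chain_b)

if __name__ == "__main__":
    print(toy_pairs)
```

The resulting position pairs could then seed the search space so that only residues facing the other molecule are considered.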
- At
block 820, the computing device can search the search space to identify one or more hydrogen bond networks based on the plurality of energy terms, such as discussed above at least in the “Computational Techniques” section. In some embodiments, searching the search space includes searching all of the search space, such as discussed above at least in the “Computational Techniques” section. In particular of these embodiments, searching all of the search space includes searching all of the search space using a depth-first search. In other particular of these embodiments, searching all of the search space includes searching all of the search space using a breadth-first search, such as discussed above at least in the “Computational Techniques” section. - In other embodiments, searching the search space can include: performing a first search of the search space to identify one or more initial hydrogen bond networks; and identifying the one or more identified hydrogen bond networks by at least merging a first hydrogen bond network and a second hydrogen bond network of the one or more initial hydrogen bond networks, such as discussed above at least in the “Computational Techniques” section. In particular of these embodiments, merging the first hydrogen bond network and the second hydrogen bond network can include: determining whether the first hydrogen bond network and the second hydrogen bond network share an identical rotamer; and after determining that the first hydrogen bond network and the second hydrogen bond network share an identical rotamer, merging the first hydrogen bond network and the second hydrogen bond network, such as discussed above at least in the “Computational Techniques” section.
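- The depth-first search and the merging of networks that share an identical rotamer can be sketched as follows. This is a minimal illustration under stated assumptions, not the Rosetta™ implementation: a network is represented as a set of (position, rotamer) pairs, the first search records linear chains of interacting rotamers, and two networks are merged only if they share a rotamer and do not place two different rotamers at the same position (the second condition is an added assumption). All names and the toy edge data are hypothetical.

```python
from itertools import combinations


def rotamer_adjacency(edges):
    """Expand position-level edges into a rotamer-level adjacency map."""
    adjacency = {}
    for (pos_a, pos_b), pairs in edges.items():
        for rot_a, rot_b in pairs:
            a, b = (pos_a, rot_a), (pos_b, rot_b)
            adjacency.setdefault(a, set()).add(b)
            adjacency.setdefault(b, set()).add(a)
    return adjacency


def depth_first_networks(edges):
    """Exhaustively walk the graph depth-first, recording each maximal chain."""
    adjacency = rotamer_adjacency(edges)
    found = set()

    def visit(node, network):
        used_positions = {pos for pos, _ in network}
        extended = False
        for nxt in sorted(adjacency[node]):
            if nxt[0] in used_positions:
                continue  # one rotamer per residue position
            extended = True
            visit(nxt, network | {nxt})
        if not extended and len(network) > 1:
            found.add(network)

    for start in sorted(adjacency):
        visit(start, frozenset([start]))
    return found


def can_merge(net_a, net_b):
    """True if the networks share a rotamer and never disagree at a position."""
    chosen = {}
    for pos, rot in net_a | net_b:
        if chosen.setdefault(pos, rot) != rot:
            return False
    return bool(net_a & net_b)


def merge_networks(networks):
    """Repeatedly add unions of mergeable networks until nothing new appears."""
    merged = set(networks)
    while True:
        new = {a | b for a, b in combinations(merged, 2) if can_merge(a, b)}
        new -= merged
        if not new:
            return merged
        merged |= new


# Hypothetical edges: position 5 can hydrogen bond to positions 1, 9 and 12.
toy_edges = {
    (1, 5): {("SER_a", "ASN_a"): -1.2},
    (5, 9): {("ASN_a", "GLN_a"): -0.8},
    (5, 12): {("ASN_a", "TYR_a"): -0.9},
}
initial_networks = depth_first_networks(toy_edges)
merged_networks = merge_networks(initial_networks)
branched = frozenset({(1, "SER_a"), (5, "ASN_a"), (9, "GLN_a"), (12, "TYR_a")})

if __name__ == "__main__":
    print(len(initial_networks), "initial;", len(merged_networks), "after merging")
```

In this toy case the first search yields only linear chains, and the merge step recovers the branched four-residue network centered on position 5. The exhaustive enumeration shown here grows quickly with graph size, so it is meant only to make the two steps concrete.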
- At
block 830, the computing device can screen the identified one or more hydrogen bond networks to identify one or more screened hydrogen bond networks based on scores for the one or more identified hydrogen bond networks, such as discussed above at least in the “Computational Techniques” section. In some embodiments, a particular score for a particular identified hydrogen bond network of the one or more identified hydrogen bond networks can be based on a number of polar atoms that participate in the particular hydrogen bond network, such as discussed above at least in the “Computational Techniques” section. In other embodiments, a particular score for a particular identified hydrogen bond network of the one or more identified hydrogen bond networks can be based on a background reference structure, such as discussed above at least in the “Computational Techniques” section. In particular of these embodiments, the particular score for the particular identified hydrogen bond network can be based on a score related to one or more sidechain-backbone hydrogen bonds, where the one or more sidechain-backbone hydrogen bonds can be related to the background reference structure, such as discussed above at least in the “Computational Techniques” section. In still other embodiments, a particular score for a particular identified hydrogen bond network of the one or more identified hydrogen bond networks can be based on an energy function, such as discussed above at least in the “Computational Techniques” section. - At
block 840, an output related to the one or more screened hydrogen bond networks can be generated. In some embodiments, generating the output related to the one or more screened hydrogen bond networks can include designing one or more molecules based on the screened hydrogen bond networks, such as discussed above at least in the “Computational Techniques” section. In particular of these embodiments, designing the one or more molecules based on the screened hydrogen bond networks includes allowing one or more relatively-small movements of one or more rotamers in a screened hydrogen bond network, such as discussed above at least in the “Computational Techniques” section. - In other embodiments, generating the output related to the one or more screened hydrogen bond networks can include generating a plurality of outputs related to the one or more screened hydrogen bond networks, such as discussed above at least in the “Computational Techniques” section. In still other embodiments, generating the output related to the one or more screened hydrogen bond networks can include: generating a synthetic gene that is based on the one or more screened hydrogen bond networks; expressing a particular protein in vivo using the synthetic gene; and purifying the particular protein. In particular of these embodiments, expressing the particular protein sequence in vivo using the synthetic gene includes expressing the particular protein sequence in one or more Escherichia coli that include the synthetic gene, such as discussed above at least in the “Experimental Methods” section.
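- The score-based screening of block 830, followed by the selection of networks to pass on as output at block 840, can be sketched as a simple filter and sort. This is an illustrative sketch only: the record fields, the two cutoffs, the rule that lower energy ranks first, and the toy values are assumptions chosen for the example, and the scoring terms actually used are those discussed in the “Computational Techniques” section.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScoredNetwork:
    name: str                 # hypothetical identifier for the network
    polar_atoms_bonded: int   # polar atoms participating in hydrogen bonds
    energy: float             # score from some energy function (lower is better)


def screen_networks(networks, min_polar_atoms, max_energy):
    """Keep networks meeting both cutoffs, best (lowest) energy first."""
    kept = [
        net for net in networks
        if net.polar_atoms_bonded >= min_polar_atoms and net.energy <= max_energy
    ]
    return sorted(kept, key=lambda net: net.energy)


# Hypothetical identified networks with made-up scores.
toy_networks = [
    ScoredNetwork("net_A", polar_atoms_bonded=6, energy=-4.1),
    ScoredNetwork("net_B", polar_atoms_bonded=3, energy=-5.0),
    ScoredNetwork("net_C", polar_atoms_bonded=8, energy=-6.3),
    ScoredNetwork("net_D", polar_atoms_bonded=7, energy=-0.2),
]
screened = screen_networks(toy_networks, min_polar_atoms=5, max_energy=-1.0)

if __name__ == "__main__":
    for net in screened:
        print(net.name, net.polar_atoms_bonded, net.energy)
```

Each screened network would then be carried forward to the design step, for example with the network rotamers held nearly fixed while the surrounding sequence is designed.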
- The particulars shown herein are by way of example and for purposes of illustrative discussion of the preferred embodiments of the present invention only and are presented in the cause of providing what is believed to be the most useful and readily understood description of the principles and conceptual aspects of various embodiments of the invention. In this regard, no attempt is made to show structural details of the invention in more detail than is necessary for the fundamental understanding of the invention, the description taken with the drawings and/or examples making apparent to those skilled in the art how the several forms of the invention may be embodied in practice.
- The above definitions and explanations are meant and intended to be controlling in any future construction unless clearly and unambiguously modified in the following examples or when application of the meaning renders any construction meaningless or essentially meaningless. In cases where the construction of the term would render it meaningless or essentially meaningless, the definition should be taken from Webster's Dictionary, 3rd Edition or a dictionary known to those of skill in the art, such as the Oxford Dictionary of Biochemistry and Molecular Biology (Ed. Anthony Smith, Oxford University Press, Oxford, 2004).
- As used herein and unless otherwise indicated, the terms “a” and “an” are taken to mean “one”, “at least one” or “one or more”. Unless otherwise required by context, singular terms used herein shall include pluralities and plural terms shall include the singular.
- Unless the context clearly requires otherwise, throughout the description and the claims, the words ‘comprise’, ‘comprising’, and the like are to be construed in an inclusive sense as opposed to an exclusive or exhaustive sense; that is to say, in the sense of “including, but not limited to”. Words using the singular or plural number also include the plural or singular number, respectively. Additionally, the words “herein,” “above” and “below” and words of similar import, when used in this application, shall refer to this application as a whole and not to any particular portions of this application.
- The above description provides specific details for a thorough understanding of, and enabling description for, embodiments of the disclosure. However, one skilled in the art will understand that the disclosure may be practiced without these details. In other instances, well-known structures and functions have not been shown or described in detail to avoid unnecessarily obscuring the description of the embodiments of the disclosure. The description of embodiments of the disclosure is not intended to be exhaustive or to limit the disclosure to the precise form disclosed. While specific embodiments of, and examples for, the disclosure are described herein for illustrative purposes, various equivalent modifications are possible within the scope of the disclosure, as those skilled in the relevant art will recognize.
- All of the references cited herein are incorporated by reference. Aspects of the disclosure can be modified, if necessary, to employ the systems, functions and concepts of the above references and application to provide yet further embodiments of the disclosure. These and other changes can be made to the disclosure in light of the detailed description.
- Specific elements of any of the foregoing embodiments can be combined or substituted for elements in other embodiments. Furthermore, while advantages associated with certain embodiments of the disclosure have been described in the context of these embodiments, other embodiments may also exhibit such advantages, and not all embodiments need necessarily exhibit such advantages to fall within the scope of the disclosure.
- The above detailed description describes various features and functions of the disclosed systems, devices, and methods with reference to the accompanying figures. In the figures, similar symbols typically identify similar components, unless context dictates otherwise. The illustrative embodiments described in the detailed description, figures, and claims are not meant to be limiting. Other embodiments can be utilized, and other changes can be made, without departing from the spirit or scope of the subject matter presented herein. It will be readily understood that the aspects of the present disclosure, as generally described herein, and illustrated in the figures, can be arranged, substituted, combined, separated, and designed in a wide variety of different configurations, all of which are explicitly contemplated herein.
- With respect to any or all of the ladder diagrams, scenarios, and flow charts in the figures and as discussed herein, each block and/or communication may represent a processing of information and/or a transmission of information in accordance with example embodiments. Alternative embodiments are included within the scope of these example embodiments. In these alternative embodiments, for example, functions described as blocks, transmissions, communications, requests, responses, and/or messages may be executed out of order from that shown or discussed, including substantially concurrent or in reverse order, depending on the functionality involved. Further, more or fewer blocks and/or functions may be used with any of the ladder diagrams, scenarios, and flow charts discussed herein, and these ladder diagrams, scenarios, and flow charts may be combined with one another, in part or in whole.
- A block that represents a processing of information may correspond to circuitry that can be configured to perform the specific logical functions of a herein-described method or technique. Alternatively or additionally, a block that represents a processing of information may correspond to a module, a segment, or a portion of program code (including related data). The program code may include one or more instructions executable by a processor for implementing specific logical functions or actions in the method or technique. The program code and/or related data may be stored on any type of computer readable medium such as a storage device including a disk or hard drive or other storage medium.
- The computer readable medium may also include non-transitory computer readable media such as computer-readable media that store data for short periods of time, like register memory, processor cache, and random access memory (RAM). The computer readable media may also include non-transitory computer readable media that store program code and/or data for longer periods of time, such as secondary or persistent long term storage, for example read only memory (ROM), optical or magnetic disks, or compact-disc read only memory (CD-ROM). The computer readable media may also be any other volatile or non-volatile storage systems. A computer readable medium may be considered a computer readable storage medium, for example, or a tangible storage device. Moreover, a block that represents one or more information transmissions may correspond to information transmissions between software and/or hardware modules in the same physical device. However, other information transmissions may be between software modules and/or hardware modules in different physical devices.
- Numerous modifications and variations of the present disclosure are possible in light of the above teachings.
- The particulars shown herein are by way of example and for purposes of illustrative discussion of the preferred embodiments of the present invention only and are presented in the cause of providing what is believed to be the most useful and readily understood description of the principles and conceptual aspects of various embodiments of the invention. In this regard, no attempt is made to show structural details of the invention in more detail than is necessary for the fundamental understanding of the invention, the description taken with the drawings and/or examples making apparent to those skilled in the art how the several forms of the invention may be embodied in practice.
Claims (27)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US16/993,975 US20210101945A1 (en) | 2016-04-01 | 2020-08-14 | Polypeptides Capable of Forming Homo-Oligomers with Modular Hydrogen Bond Network-Mediated Specificity and Their Design |
Applications Claiming Priority (4)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US201662317190P | 2016-04-01 | 2016-04-01 | |
PCT/US2017/025532 WO2017173356A1 (en) | 2016-04-01 | 2017-03-31 | Polypeptides capable of forming homo-oligomers with modular hydrogen bond network-mediated specificity and their design |
US201816088686A | 2018-09-26 | 2018-09-26 | |
US16/993,975 US20210101945A1 (en) | 2016-04-01 | 2020-08-14 | Polypeptides Capable of Forming Homo-Oligomers with Modular Hydrogen Bond Network-Mediated Specificity and Their Design |
Related Parent Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/US2017/025532 Division WO2017173356A1 (en) | 2016-04-01 | 2017-03-31 | Polypeptides capable of forming homo-oligomers with modular hydrogen bond network-mediated specificity and their design |
US16/088,686 Division US10988514B2 (en) | 2016-04-01 | 2017-03-31 | Polypeptdes capable of forming homo-oligomers with modular hydrogen bond network-mediated specificity and their design |
Publications (1)
Publication Number | Publication Date |
---|---|
US20210101945A1 (en) | 2021-04-08 |
Family
ID=59965333
Family Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US16/088,686 Active 2037-04-20 US10988514B2 (en) | 2016-04-01 | 2017-03-31 | Polypeptdes capable of forming homo-oligomers with modular hydrogen bond network-mediated specificity and their design |
US16/993,975 Pending US20210101945A1 (en) | 2016-04-01 | 2020-08-14 | Polypeptides Capable of Forming Homo-Oligomers with Modular Hydrogen Bond Network-Mediated Specificity and Their Design |
Family Applications Before (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US16/088,686 Active 2037-04-20 US10988514B2 (en) | 2016-04-01 | 2017-03-31 | Polypeptdes capable of forming homo-oligomers with modular hydrogen bond network-mediated specificity and their design |
Country Status (6)
Country | Link |
---|---|
US (2) | US10988514B2 (en) |
EP (1) | EP3436470A4 (en) |
JP (3) | JP2019519468A (en) |
CN (1) | CN109311934B (en) |
CA (1) | CA3019594A1 (en) |
WO (1) | WO2017173356A1 (en) |
Families Citing this family (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109311934B (en) * | 2016-04-01 | 2022-04-29 | University of Washington | Polypeptides capable of forming homooligomers with modular hydrogen bonding network mediated specificity and design thereof |
US20210040469A1 (en) | 2018-01-25 | 2021-02-11 | University Of Washington | Engineered cell death-inducing enzymes and methods of use |
JP2022505866A (en) | 2018-11-02 | 2022-01-14 | ユニバーシティ オブ ワシントン | Orthogonal protein heterodimer |
KR20210138573A (en) * | 2019-01-07 | 2021-11-19 | The Regents of the University of California | Caged-degron-based molecular feedback circuit and method of use thereof |
WO2020214549A1 (en) * | 2019-04-15 | 2020-10-22 | University Of Washington | Self-assembling 2d arrays with de novo protein building blocks |
EP3956344A1 (en) * | 2019-04-16 | 2022-02-23 | University of Washington | Amantadine binding protein |
WO2021155132A1 (en) | 2020-01-29 | 2021-08-05 | University Of Washington | De novo stable, modular pd-1 binding proteins and oligomeric variants |
US20230192773A1 (en) * | 2020-04-30 | 2023-06-22 | University Of Washington | De novo designed alpha-helical protein channels |
US20230287402A1 (en) * | 2020-08-25 | 2023-09-14 | University Of Washington | Two component co-assembling two dimensional protein structures |
JP2022103481A (en) * | 2020-12-28 | 2022-07-08 | Fujitsu Limited | Stable structure search method of cyclic peptide, stable structure search program of cyclic peptide, and stable structure search apparatus of cyclic peptide |
Family Cites Families (13)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO1998047089A1 (en) | 1997-04-11 | 1998-10-22 | California Institute Of Technology | Apparatus and method for automated protein design |
US20040031072A1 (en) * | 1999-05-06 | 2004-02-12 | La Rosa Thomas J. | Soy nucleic acid molecules and other molecules associated with transcription plants and uses thereof for plant improvement |
US7574306B1 (en) * | 2003-11-20 | 2009-08-11 | University Of Washington | Method and system for optimization of polymer sequences to produce polymers with stable, 3-dimensional conformations |
US20050202510A1 (en) | 2004-02-24 | 2005-09-15 | The Board Of Trustees Of The Leland Stanford Junior University | Method for identifying a site of protein-protein interaction for the rational design of short peptides that interfere with that interaction |
NZ576023A (en) * | 2006-10-03 | 2012-06-29 | Cadila Healthcare Ltd | Antidiabetic compounds comprising a fragment of a glucagon peptide and derivatives thereof |
US20110264432A1 (en) * | 2008-01-17 | 2011-10-27 | Aarhus Universitet | System and method for modelling a molecule with a graph |
BRPI0923346A2 (en) | 2008-12-08 | 2017-07-11 | Complix Nv | SINGLE CHAIN ANTIPARALLEL SUPERHELIC PROTEINS |
WO2010123898A1 (en) * | 2009-04-20 | 2010-10-28 | Fox Chase Cancer Center | Rotamer libraries and methods of use thereof |
WO2011047684A1 (en) * | 2009-10-19 | 2011-04-28 | Andersen Joergen Ellegaard | System and method for associating a moduli space with a molecule |
US20110224404A1 (en) | 2010-03-11 | 2011-09-15 | Florida State University Research Foundation | Method for development of a peptide building block useful for de novo protein design |
US10248758B2 (en) * | 2013-02-07 | 2019-04-02 | University Of Washington Through Its Center For Commercialization | Self-assembling protein nanostructures |
CA2925067C (en) * | 2013-09-25 | 2022-08-23 | Zymeworks Inc. | Systems and methods for making two dimensional graphs of complex molecules |
CN109311934B (en) * | 2016-04-01 | 2022-04-29 | University of Washington | Polypeptides capable of forming homooligomers with modular hydrogen bonding network mediated specificity and design thereof |
2017
- 2017-03-31 CN CN201780026994.6A patent/CN109311934B/en active Active
- 2017-03-31 WO PCT/US2017/025532 patent/WO2017173356A1/en active Application Filing
- 2017-03-31 JP JP2018550752A patent/JP2019519468A/en active Pending
- 2017-03-31 EP EP17776838.9A patent/EP3436470A4/en active Pending
- 2017-03-31 CA CA3019594A patent/CA3019594A1/en active Pending
- 2017-03-31 US US16/088,686 patent/US10988514B2/en active Active
2020
- 2020-08-14 US US16/993,975 patent/US20210101945A1/en active Pending
2022
- 2022-02-21 JP JP2022025023A patent/JP2022082471A/en not_active Withdrawn
2024
- 2024-01-05 JP JP2024000870A patent/JP2024056682A/en active Pending
Non-Patent Citations (1)
Title |
---|
Francis, D.M. and Page, R. (2010), Strategies to Optimize Protein Expression in E. coli. Current Protocols in Protein Science, 61: 5.24.1-5.24.29 (Year: 2010) * |
Also Published As
Publication number | Publication date |
---|---|
CN109311934A (en) | 2019-02-05 |
US20190112345A1 (en) | 2019-04-18 |
JP2019519468A (en) | 2019-07-11 |
JP2022082471A (en) | 2022-06-01 |
US10988514B2 (en) | 2021-04-27 |
EP3436470A4 (en) | 2019-11-20 |
CN109311934B (en) | 2022-04-29 |
EP3436470A1 (en) | 2019-02-06 |
WO2017173356A1 (en) | 2017-10-05 |
JP2024056682A (en) | 2024-04-23 |
CA3019594A1 (en) | 2017-10-05 |
Similar Documents
Publication | Title | Publication Date |
---|---|---|
US20210101945A1 (en) | Polypeptides Capable of Forming Homo-Oligomers with Modular Hydrogen Bond Network-Mediated Specificity and Their Design | |
US20240038331A1 (en) | Self-Assembling Protein Nanostructures | |
US20210134388A1 (en) | Hyperstable Constrained Peptides and Their Design | |
Liu et al. | Bacterial Vipp1 and PspA are members of the ancient ESCRT-III membrane-remodeling superfamily | |
Pornillos et al. | Disulfide bond stabilization of the hexameric capsomer of human immunodeficiency virus | |
US10818377B2 (en) | Computational design of self-assembling cyclic protein homo-oligomers | |
US8969521B2 (en) | General method for designing self-assembling protein nanomaterials | |
WO2017106728A2 (en) | Repeat protein architectures | |
Harris et al. | An engineered switch in T cell receptor specificity leads to an unusual but functional binding geometry | |
Mylemans et al. | Structural plasticity of a designer protein sheds light on β‐propeller protein evolution | |
Hallin et al. | Crystal and solution structures reveal oligomerization of individual capsid homology domains of Drosophila Arc | |
US20210363214A1 (en) | Transmembrane polypeptides | |
US20230279055A1 (en) | De Novo Design of Immunoglobulin-like Domains | |
Gaur et al. | Design of human ACE2 mimic miniprotein binders that interact with RBD of SARS-CoV-2 variants of concerns | |
Liu et al. | Homology models of the tetramerization domain of six eukaryotic voltage-gated potassium channels Kv1.1-Kv1.6 | |
US11802141B2 (en) | De novo designed non-local beta sheet proteins | |
US20240029824A1 (en) | De novo design of obligate ABC heterotrimeric proteins | |
US20230295230A1 (en) | Transmembrane beta barrel proteins | |
US20240013853A1 (en) | De Novo Designed Homo-Oligomeric Protein Assemblies | |
WO2013003752A2 (en) | Methods for design of epitope scaffolds | |
Martinez-Felices et al. | Structural and biochemical analysis of a B12 super-binder | |
Hall-Beauvais et al. | De novo designed proteins: a study in engineering novel folds and functions | |
Fraser et al. | Template strand deoxyuridine promoter recognition by a viral RNA polymerase | |
Cappele et al. | Structural and biochemical analysis of OrfG: the VirB8-like component of the integrative and conjugative element ICESt3 from Streptococcus thermophilus | |
Vance et al. | SPACA6 structure reveals a conserved superfamily of gamete fusion-associated proteins | |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
 | AS | Assignment | Owner name: UNIVERSITY OF WASHINGTON, WASHINGTON. Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:HOWARD HUGHES MEDICAL INSTITUTE;REEL/FRAME:053516/0781. Effective date: 20200409. Owner name: UNIVERSITY OF WASHINGTON, WASHINGTON. Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:BOYKEN, SCOTT;CHEN, ZIBO;XU, CHUNFU;AND OTHERS;REEL/FRAME:053516/0662. Effective date: 20170407. Owner name: HOWARD HUGHES MEDICAL INSTITUTE, MARYLAND. Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:BAKER, DAVID;REEL/FRAME:053516/0712. Effective date: 20160525 |
 | STPP | Information on status: patent application and granting procedure in general | Free format text: APPLICATION DISPATCHED FROM PREEXAM, NOT YET DOCKETED |
 | STPP | Information on status: patent application and granting procedure in general | Free format text: DOCKETED NEW CASE - READY FOR EXAMINATION |
 | AS | Assignment | Owner name: HOWARD HUGHES MEDICAL INSTITUTE, MARYLAND. Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:BOYKEN, SCOTT;REEL/FRAME:063776/0787. Effective date: 20160906 |
 | STPP | Information on status: patent application and granting procedure in general | Free format text: NON FINAL ACTION MAILED |
 | STPP | Information on status: patent application and granting procedure in general | Free format text: RESPONSE TO NON-FINAL OFFICE ACTION ENTERED AND FORWARDED TO EXAMINER |
 | STPP | Information on status: patent application and granting procedure in general | Free format text: FINAL REJECTION MAILED |