CN114835818B - Gene editing fusion protein, adenine base editor constructed by same and application thereof
- Publication number
- CN114835818B (application CN202210265179.8A)
- Authority
- CN
- China
- Prior art keywords
- lys
- leu
- ile
- glu
- asn
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Links
- GFFGJBXGBJISGV-UHFFFAOYSA-N Adenine Chemical compound NC1=NC=NC2=C1N=CN2 GFFGJBXGBJISGV-UHFFFAOYSA-N 0.000 title claims abstract description 61
- 229930024421 Adenine Natural products 0.000 title claims abstract description 60
- 229960000643 adenine Drugs 0.000 title claims abstract description 60
- 102000037865 fusion proteins Human genes 0.000 title claims abstract description 27
- 108020001507 fusion proteins Proteins 0.000 title claims abstract description 27
- 238000010362 genome editing Methods 0.000 title claims abstract description 27
- 238000003780 insertion Methods 0.000 claims abstract description 16
- 230000037431 insertion Effects 0.000 claims abstract description 16
- UYTPUPDQBNUYGX-UHFFFAOYSA-N guanine Chemical compound O=C1NC(N)=NC2=C1N=CN2 UYTPUPDQBNUYGX-UHFFFAOYSA-N 0.000 claims abstract description 12
- 125000003275 alpha amino acid group Chemical group 0.000 claims abstract 3
- 239000013612 plasmid Substances 0.000 claims description 24
- 235000014469 Bacillus subtilis Nutrition 0.000 claims description 17
- 244000063299 Bacillus subtilis Species 0.000 claims description 16
- 108090000623 proteins and genes Proteins 0.000 claims description 11
- 101710089384 Extracellular protease Proteins 0.000 claims description 7
- 230000001276 controlling effect Effects 0.000 claims description 4
- 239000002773 nucleotide Substances 0.000 claims description 4
- 125000003729 nucleotide group Chemical group 0.000 claims description 4
- 230000001105 regulatory effect Effects 0.000 claims description 4
- 230000000415 inactivating effect Effects 0.000 claims description 2
- 230000001939 inductive effect Effects 0.000 claims description 2
- 230000000694 effects Effects 0.000 abstract description 14
- 238000000034 method Methods 0.000 abstract description 11
- 230000008569 process Effects 0.000 abstract description 8
- 108010003700 lysyl aspartic acid Proteins 0.000 description 41
- 108091033409 CRISPR Proteins 0.000 description 30
- XMBSYZWANAQXEV-UHFFFAOYSA-N N-alpha-L-glutamyl-L-phenylalanine Natural products OC(=O)CCC(N)C(=O)NC(C(O)=O)CC1=CC=CC=C1 XMBSYZWANAQXEV-UHFFFAOYSA-N 0.000 description 29
- 108010092854 aspartyllysine Proteins 0.000 description 28
- YLRAFVVWZRSZQC-DZKIICNBSA-N Val-Phe-Glu Chemical compound CC(C)[C@@H](C(=O)N[C@@H](CC1=CC=CC=C1)C(=O)N[C@@H](CCC(=O)O)C(=O)O)N YLRAFVVWZRSZQC-DZKIICNBSA-N 0.000 description 21
- YBAFDPFAUTYYRW-UHFFFAOYSA-N N-L-alpha-glutamyl-L-leucine Natural products CC(C)CC(C(O)=O)NC(=O)C(N)CCC(O)=O YBAFDPFAUTYYRW-UHFFFAOYSA-N 0.000 description 18
- KOSRFJWDECSPRO-UHFFFAOYSA-N alpha-L-glutamyl-L-glutamic acid Natural products OC(=O)CCC(N)C(=O)NC(CCC(O)=O)C(O)=O KOSRFJWDECSPRO-UHFFFAOYSA-N 0.000 description 18
- 108010034529 leucyl-lysine Proteins 0.000 description 18
- 238000010354 CRISPR gene editing Methods 0.000 description 17
- RMNMUUCYTMLWNA-ZPFDUUQYSA-N Ile-Lys-Asp Chemical compound CC[C@H](C)[C@@H](C(=O)N[C@@H](CCCCN)C(=O)N[C@@H](CC(=O)O)C(=O)O)N RMNMUUCYTMLWNA-ZPFDUUQYSA-N 0.000 description 17
- 241000880493 Leptailurus serval Species 0.000 description 17
- FSXRLASFHBWESK-UHFFFAOYSA-N dipeptide phenylalanyl-tyrosine Natural products C=1C=C(O)C=CC=1CC(C(O)=O)NC(=O)C(N)CC1=CC=CC=C1 FSXRLASFHBWESK-UHFFFAOYSA-N 0.000 description 17
- 108010051242 phenylalanylserine Proteins 0.000 description 17
- FADYJNXDPBKVCA-UHFFFAOYSA-N L-Phenylalanyl-L-lysin Natural products NCCCCC(C(O)=O)NC(=O)C(N)CC1=CC=CC=C1 FADYJNXDPBKVCA-UHFFFAOYSA-N 0.000 description 16
- 108010050848 glycylleucine Proteins 0.000 description 16
- 108020004414 DNA Proteins 0.000 description 15
- 108010054155 lysyllysine Proteins 0.000 description 15
- JCFYLFOCALSNLQ-GUBZILKMSA-N Lys-Ala-Gln Chemical compound [H]N[C@@H](CCCCN)C(=O)N[C@@H](C)C(=O)N[C@@H](CCC(N)=O)C(O)=O JCFYLFOCALSNLQ-GUBZILKMSA-N 0.000 description 14
- YWAQATDNEKZFFK-BYPYZUCNSA-N Gly-Gly-Ser Chemical compound NCC(=O)NCC(=O)N[C@@H](CO)C(O)=O YWAQATDNEKZFFK-BYPYZUCNSA-N 0.000 description 13
- GVKKVHNRTUFCCE-BJDJZHNGSA-N Ile-Leu-Ser Chemical compound CC[C@H](C)[C@@H](C(=O)N[C@@H](CC(C)C)C(=O)N[C@@H](CO)C(=O)O)N GVKKVHNRTUFCCE-BJDJZHNGSA-N 0.000 description 13
- 108010038633 aspartylglutamate Proteins 0.000 description 13
- VPZXBVLAVMBEQI-UHFFFAOYSA-N glycyl-DL-alpha-alanine Natural products OC(=O)C(C)NC(=O)CN VPZXBVLAVMBEQI-UHFFFAOYSA-N 0.000 description 13
- 108010064235 lysylglycine Proteins 0.000 description 13
- JBGSZRYCXBPWGX-BQBZGAKWSA-N Ala-Arg-Gly Chemical compound OC(=O)CNC(=O)[C@@H](NC(=O)[C@@H](N)C)CCCN=C(N)N JBGSZRYCXBPWGX-BQBZGAKWSA-N 0.000 description 12
- GOWZVQXTHUCNSQ-NHCYSSNCSA-N Arg-Glu-Val Chemical compound [H]N[C@@H](CCCNC(N)=N)C(=O)N[C@@H](CCC(O)=O)C(=O)N[C@@H](C(C)C)C(O)=O GOWZVQXTHUCNSQ-NHCYSSNCSA-N 0.000 description 12
- UGKZHCBLMLSANF-CIUDSAMLSA-N Asp-Asn-Leu Chemical compound [H]N[C@@H](CC(O)=O)C(=O)N[C@@H](CC(N)=O)C(=O)N[C@@H](CC(C)C)C(O)=O UGKZHCBLMLSANF-CIUDSAMLSA-N 0.000 description 12
- TZOZNVLBTAFJRW-UGYAYLCHSA-N Asp-Ile-Asp Chemical compound CC[C@H](C)[C@@H](C(=O)N[C@@H](CC(=O)O)C(=O)O)NC(=O)[C@H](CC(=O)O)N TZOZNVLBTAFJRW-UGYAYLCHSA-N 0.000 description 12
- NNKLKUUGESXCBS-KBPBESRZSA-N Lys-Gly-Tyr Chemical compound [H]N[C@@H](CCCCN)C(=O)NCC(=O)N[C@@H](CC1=CC=C(O)C=C1)C(O)=O NNKLKUUGESXCBS-KBPBESRZSA-N 0.000 description 12
- SITLTJHOQZFJGG-UHFFFAOYSA-N N-L-alpha-glutamyl-L-valine Natural products CC(C)C(C(O)=O)NC(=O)C(N)CCC(O)=O SITLTJHOQZFJGG-UHFFFAOYSA-N 0.000 description 12
- 108010057821 leucylproline Proteins 0.000 description 12
- VJTWLBMESLDOMK-WDSKDSINSA-N Asn-Gln-Gly Chemical compound NC(=O)C[C@H](N)C(=O)N[C@@H](CCC(N)=O)C(=O)NCC(O)=O VJTWLBMESLDOMK-WDSKDSINSA-N 0.000 description 11
- BJDHEININLSZOT-KKUMJFAQSA-N Asp-Tyr-Lys Chemical compound [H]N[C@@H](CC(O)=O)C(=O)N[C@@H](CC1=CC=C(O)C=C1)C(=O)N[C@@H](CCCCN)C(O)=O BJDHEININLSZOT-KKUMJFAQSA-N 0.000 description 11
- JPHYJQHPILOKHC-ACZMJKKPSA-N Glu-Asp-Asp Chemical compound OC(=O)CC[C@H](N)C(=O)N[C@@H](CC(O)=O)C(=O)N[C@@H](CC(O)=O)C(O)=O JPHYJQHPILOKHC-ACZMJKKPSA-N 0.000 description 11
- UERORLSAFUHDGU-AVGNSLFASA-N Glu-Phe-Asn Chemical compound C1=CC=C(C=C1)C[C@@H](C(=O)N[C@@H](CC(=O)N)C(=O)O)NC(=O)[C@H](CCC(=O)O)N UERORLSAFUHDGU-AVGNSLFASA-N 0.000 description 11
- MRWYPDWDZSLWJM-ACZMJKKPSA-N Glu-Ser-Asp Chemical compound [H]N[C@@H](CCC(O)=O)C(=O)N[C@@H](CO)C(=O)N[C@@H](CC(O)=O)C(O)=O MRWYPDWDZSLWJM-ACZMJKKPSA-N 0.000 description 11
- NYEYYMLUABXDMC-NHCYSSNCSA-N Ile-Gly-Leu Chemical compound CC[C@H](C)[C@@H](C(=O)NCC(=O)N[C@@H](CC(C)C)C(=O)O)N NYEYYMLUABXDMC-NHCYSSNCSA-N 0.000 description 11
- KBAPKNDWAGVGTH-IGISWZIWSA-N Ile-Ile-Tyr Chemical compound CC[C@H](C)[C@H](N)C(=O)N[C@@H]([C@@H](C)CC)C(=O)N[C@H](C(O)=O)CC1=CC=C(O)C=C1 KBAPKNDWAGVGTH-IGISWZIWSA-N 0.000 description 11
- ZNOBVZFCHNHKHA-KBIXCLLPSA-N Ile-Ser-Glu Chemical compound CC[C@H](C)[C@@H](C(=O)N[C@@H](CO)C(=O)N[C@@H](CCC(=O)O)C(=O)O)N ZNOBVZFCHNHKHA-KBIXCLLPSA-N 0.000 description 11
- MPOHDJKRBLVGCT-CIUDSAMLSA-N Lys-Ala-Asn Chemical compound C[C@@H](C(=O)N[C@@H](CC(=O)N)C(=O)O)NC(=O)[C@H](CCCCN)N MPOHDJKRBLVGCT-CIUDSAMLSA-N 0.000 description 11
- WSXTWLJHTLRFLW-SRVKXCTJSA-N Lys-Ala-Lys Chemical compound NCCCC[C@H](N)C(=O)N[C@@H](C)C(=O)N[C@@H](CCCCN)C(O)=O WSXTWLJHTLRFLW-SRVKXCTJSA-N 0.000 description 11
- ZENDEDYRYVHBEG-SRVKXCTJSA-N Phe-Asp-Asp Chemical compound OC(=O)C[C@@H](C(O)=O)NC(=O)[C@H](CC(O)=O)NC(=O)[C@@H](N)CC1=CC=CC=C1 ZENDEDYRYVHBEG-SRVKXCTJSA-N 0.000 description 11
- NHCKESBLOMHIIE-IRXDYDNUSA-N Phe-Gly-Phe Chemical compound C([C@H](N)C(=O)NCC(=O)N[C@@H](CC=1C=CC=CC=1)C(O)=O)C1=CC=CC=C1 NHCKESBLOMHIIE-IRXDYDNUSA-N 0.000 description 11
- GPLWGAYGROGDEN-BZSNNMDCSA-N Phe-Phe-Ser Chemical compound [H]N[C@@H](CC1=CC=CC=C1)C(=O)N[C@@H](CC1=CC=CC=C1)C(=O)N[C@@H](CO)C(O)=O GPLWGAYGROGDEN-BZSNNMDCSA-N 0.000 description 11
- GOUWCZRDTWTODO-YDHLFZDLSA-N Phe-Val-Asn Chemical compound [H]N[C@@H](CC1=CC=CC=C1)C(=O)N[C@@H](C(C)C)C(=O)N[C@@H](CC(N)=O)C(O)=O GOUWCZRDTWTODO-YDHLFZDLSA-N 0.000 description 11
- LRZLZIUXQBIWTB-KATARQTJSA-N Ser-Lys-Thr Chemical compound [H]N[C@@H](CO)C(=O)N[C@@H](CCCCN)C(=O)N[C@@H]([C@@H](C)O)C(O)=O LRZLZIUXQBIWTB-KATARQTJSA-N 0.000 description 11
- 108010062796 arginyllysine Proteins 0.000 description 11
- 108010092114 histidylphenylalanine Proteins 0.000 description 11
- 108010057952 lysyl-phenylalanyl-lysine Proteins 0.000 description 11
- DOXQMJCSSYZSNM-BZSNNMDCSA-N Phe-Lys-Leu Chemical compound [H]N[C@@H](CC1=CC=CC=C1)C(=O)N[C@@H](CCCCN)C(=O)N[C@@H](CC(C)C)C(O)=O DOXQMJCSSYZSNM-BZSNNMDCSA-N 0.000 description 10
- JIPVNVNKXJLFJF-BJDJZHNGSA-N Ser-Ile-Lys Chemical compound CC[C@H](C)[C@@H](C(=O)N[C@@H](CCCCN)C(=O)O)NC(=O)[C@H](CO)N JIPVNVNKXJLFJF-BJDJZHNGSA-N 0.000 description 10
- WAPFQMXRSDEGOE-IHRRRGAJSA-N Tyr-Glu-Gln Chemical compound [H]N[C@@H](CC1=CC=C(O)C=C1)C(=O)N[C@@H](CCC(O)=O)C(=O)N[C@@H](CCC(N)=O)C(O)=O WAPFQMXRSDEGOE-IHRRRGAJSA-N 0.000 description 10
- 230000035772 mutation Effects 0.000 description 10
- VFDRDMOMHBJGKD-UFYCRDLUSA-N Phe-Tyr-Arg Chemical compound C1=CC=C(C=C1)C[C@@H](C(=O)N[C@@H](CC2=CC=C(C=C2)O)C(=O)N[C@@H](CCCN=C(N)N)C(=O)O)N VFDRDMOMHBJGKD-UFYCRDLUSA-N 0.000 description 9
- JCLAFVNDBJMLBC-JBDRJPRFSA-N Ser-Ser-Ile Chemical compound [H]N[C@@H](CO)C(=O)N[C@@H](CO)C(=O)N[C@@H]([C@@H](C)CC)C(O)=O JCLAFVNDBJMLBC-JBDRJPRFSA-N 0.000 description 9
- 108010052875 Adenine deaminase Proteins 0.000 description 8
- HPBNLFLSSQDFQW-WHFBIAKZSA-N Asn-Ser-Gly Chemical compound NC(=O)C[C@H](N)C(=O)N[C@@H](CO)C(=O)NCC(O)=O HPBNLFLSSQDFQW-WHFBIAKZSA-N 0.000 description 8
- 108010002311 N-glycylglutamic acid Proteins 0.000 description 8
- YMTLKLXDFCSCNX-BYPYZUCNSA-N Ser-Gly-Gly Chemical compound OC[C@H](N)C(=O)NCC(=O)NCC(O)=O YMTLKLXDFCSCNX-BYPYZUCNSA-N 0.000 description 8
- 108010040443 aspartyl-aspartic acid Proteins 0.000 description 8
- 230000002779 inactivation Effects 0.000 description 8
- 108010038320 lysylphenylalanine Proteins 0.000 description 8
- 230000001404 mediated effect Effects 0.000 description 8
- 108090000765 processed proteins & peptides Proteins 0.000 description 8
- RLMISHABBKUNFO-WHFBIAKZSA-N Ala-Ala-Gly Chemical compound C[C@H](N)C(=O)N[C@@H](C)C(=O)NCC(O)=O RLMISHABBKUNFO-WHFBIAKZSA-N 0.000 description 7
- CCDFBRZVTDDJNM-GUBZILKMSA-N Ala-Leu-Glu Chemical compound [H]N[C@@H](C)C(=O)N[C@@H](CC(C)C)C(=O)N[C@@H](CCC(O)=O)C(O)=O CCDFBRZVTDDJNM-GUBZILKMSA-N 0.000 description 7
- VCSABYLVNWQYQE-UHFFFAOYSA-N Ala-Lys-Lys Natural products NCCCCC(NC(=O)C(N)C)C(=O)NC(CCCCN)C(O)=O VCSABYLVNWQYQE-UHFFFAOYSA-N 0.000 description 7
- QISZHYWZHJRDAO-CIUDSAMLSA-N Asn-Asp-Lys Chemical compound C(CCN)C[C@@H](C(=O)O)NC(=O)[C@H](CC(=O)O)NC(=O)[C@H](CC(=O)N)N QISZHYWZHJRDAO-CIUDSAMLSA-N 0.000 description 7
- KHBLRHKVXICFMY-GUBZILKMSA-N Asp-Glu-Lys Chemical compound N[C@@H](CC(=O)O)C(=O)N[C@@H](CCC(=O)O)C(=O)N[C@@H](CCCCN)C(=O)O KHBLRHKVXICFMY-GUBZILKMSA-N 0.000 description 7
- GQNZIAGMRXOFJX-GUBZILKMSA-N Cys-Val-Met Chemical compound [H]N[C@@H](CS)C(=O)N[C@@H](C(C)C)C(=O)N[C@@H](CCSC)C(O)=O GQNZIAGMRXOFJX-GUBZILKMSA-N 0.000 description 7
- 241000588724 Escherichia coli Species 0.000 description 7
- HJIFPJUEOGZWRI-GUBZILKMSA-N Glu-Asp-Lys Chemical compound C(CCN)C[C@@H](C(=O)O)NC(=O)[C@H](CC(=O)O)NC(=O)[C@H](CCC(=O)O)N HJIFPJUEOGZWRI-GUBZILKMSA-N 0.000 description 7
- FGPLUIQCSKGLTI-WDSKDSINSA-N Gly-Ser-Glu Chemical compound NCC(=O)N[C@@H](CO)C(=O)N[C@H](C(O)=O)CCC(O)=O FGPLUIQCSKGLTI-WDSKDSINSA-N 0.000 description 7
- WCORRBXVISTKQL-WHFBIAKZSA-N Gly-Ser-Ser Chemical compound NCC(=O)N[C@@H](CO)C(=O)N[C@@H](CO)C(O)=O WCORRBXVISTKQL-WHFBIAKZSA-N 0.000 description 7
- LMMPTUVWHCFTOT-GARJFASQSA-N His-Asp-Pro Chemical compound C1C[C@@H](N(C1)C(=O)[C@H](CC(=O)O)NC(=O)[C@H](CC2=CN=CN2)N)C(=O)O LMMPTUVWHCFTOT-GARJFASQSA-N 0.000 description 7
- ADDYYRVQQZFIMW-MNXVOIDGSA-N Ile-Lys-Glu Chemical compound CC[C@H](C)[C@@H](C(=O)N[C@@H](CCCCN)C(=O)N[C@@H](CCC(=O)O)C(=O)O)N ADDYYRVQQZFIMW-MNXVOIDGSA-N 0.000 description 7
- WXJKFRMKJORORD-DCAQKATOSA-N Lys-Arg-Ala Chemical compound NC(=N)NCCC[C@@H](C(=O)N[C@@H](C)C(O)=O)NC(=O)[C@@H](N)CCCCN WXJKFRMKJORORD-DCAQKATOSA-N 0.000 description 7
- LMVOVCYVZBBWQB-SRVKXCTJSA-N Lys-Asp-Lys Chemical compound NCCCC[C@H](N)C(=O)N[C@@H](CC(O)=O)C(=O)N[C@H](C(O)=O)CCCCN LMVOVCYVZBBWQB-SRVKXCTJSA-N 0.000 description 7
- NJNRBRKHOWSGMN-SRVKXCTJSA-N Lys-Leu-Asn Chemical compound [H]N[C@@H](CCCCN)C(=O)N[C@@H](CC(C)C)C(=O)N[C@@H](CC(N)=O)C(O)=O NJNRBRKHOWSGMN-SRVKXCTJSA-N 0.000 description 7
- RBEATVHTWHTHTJ-KKUMJFAQSA-N Lys-Leu-Lys Chemical compound NCCCC[C@H](N)C(=O)N[C@@H](CC(C)C)C(=O)N[C@@H](CCCCN)C(O)=O RBEATVHTWHTHTJ-KKUMJFAQSA-N 0.000 description 7
- VSTNAUBHKQPVJX-IHRRRGAJSA-N Lys-Met-Leu Chemical compound [H]N[C@@H](CCCCN)C(=O)N[C@@H](CCSC)C(=O)N[C@@H](CC(C)C)C(O)=O VSTNAUBHKQPVJX-IHRRRGAJSA-N 0.000 description 7
- WIVCOAKLPICYGY-KKUMJFAQSA-N Phe-Asp-Lys Chemical compound C1=CC=C(C=C1)C[C@@H](C(=O)N[C@@H](CC(=O)O)C(=O)N[C@@H](CCCCN)C(=O)O)N WIVCOAKLPICYGY-KKUMJFAQSA-N 0.000 description 7
- FMMIYCMOVGXZIP-AVGNSLFASA-N Phe-Glu-Asn Chemical compound [H]N[C@@H](CC1=CC=CC=C1)C(=O)N[C@@H](CCC(O)=O)C(=O)N[C@@H](CC(N)=O)C(O)=O FMMIYCMOVGXZIP-AVGNSLFASA-N 0.000 description 7
- KYYMILWEGJYPQZ-IHRRRGAJSA-N Phe-Glu-Glu Chemical compound OC(=O)CC[C@@H](C(O)=O)NC(=O)[C@H](CCC(O)=O)NC(=O)[C@@H](N)CC1=CC=CC=C1 KYYMILWEGJYPQZ-IHRRRGAJSA-N 0.000 description 7
- HMRAQFJFTOLDKW-GUBZILKMSA-N Ser-His-Glu Chemical compound [H]N[C@@H](CO)C(=O)N[C@@H](CC1=CNC=N1)C(=O)N[C@@H](CCC(O)=O)C(O)=O HMRAQFJFTOLDKW-GUBZILKMSA-N 0.000 description 7
- XXNYYSXNXCJYKX-DCAQKATOSA-N Ser-Leu-Met Chemical compound [H]N[C@@H](CO)C(=O)N[C@@H](CC(C)C)C(=O)N[C@@H](CCSC)C(O)=O XXNYYSXNXCJYKX-DCAQKATOSA-N 0.000 description 7
- LRWBCWGEUCKDTN-BJDJZHNGSA-N Ser-Lys-Ile Chemical compound [H]N[C@@H](CO)C(=O)N[C@@H](CCCCN)C(=O)N[C@@H]([C@@H](C)CC)C(O)=O LRWBCWGEUCKDTN-BJDJZHNGSA-N 0.000 description 7
- STGXWWBXWXZOER-MBLNEYKQSA-N Thr-Ala-His Chemical compound C[C@@H](O)[C@H](N)C(=O)N[C@@H](C)C(=O)N[C@H](C(O)=O)CC1=CN=CN1 STGXWWBXWXZOER-MBLNEYKQSA-N 0.000 description 7
- BVOVIGCHYNFJBZ-JXUBOQSCSA-N Thr-Leu-Ala Chemical compound [H]N[C@@H]([C@@H](C)O)C(=O)N[C@@H](CC(C)C)C(=O)N[C@@H](C)C(O)=O BVOVIGCHYNFJBZ-JXUBOQSCSA-N 0.000 description 7
- MXDOAJQRJBMGMO-FJXKBIBVSA-N Thr-Pro-Gly Chemical compound C[C@@H](O)[C@H](N)C(=O)N1CCC[C@H]1C(=O)NCC(O)=O MXDOAJQRJBMGMO-FJXKBIBVSA-N 0.000 description 7
- XHWCDRUPDNSDAZ-XKBZYTNZSA-N Thr-Ser-Glu Chemical compound C[C@H]([C@@H](C(=O)N[C@@H](CO)C(=O)N[C@@H](CCC(=O)O)C(=O)O)N)O XHWCDRUPDNSDAZ-XKBZYTNZSA-N 0.000 description 7
- FOADDSDHGRFUOC-DZKIICNBSA-N Val-Glu-Phe Chemical compound CC(C)[C@@H](C(=O)N[C@@H](CCC(=O)O)C(=O)N[C@@H](CC1=CC=CC=C1)C(=O)O)N FOADDSDHGRFUOC-DZKIICNBSA-N 0.000 description 7
- 108010024078 alanyl-glycyl-serine Proteins 0.000 description 7
- 108010087924 alanylproline Proteins 0.000 description 7
- 108010078144 glutaminyl-glycine Proteins 0.000 description 7
- 108010055341 glutamyl-glutamic acid Proteins 0.000 description 7
- 230000006698 induction Effects 0.000 description 7
- 108010044374 isoleucyl-tyrosine Proteins 0.000 description 7
- 102000004169 proteins and genes Human genes 0.000 description 7
- 238000012163 sequencing technique Methods 0.000 description 7
- FJVAQLJNTSUQPY-CIUDSAMLSA-N Ala-Ala-Lys Chemical compound C[C@H](N)C(=O)N[C@@H](C)C(=O)N[C@H](C(O)=O)CCCCN FJVAQLJNTSUQPY-CIUDSAMLSA-N 0.000 description 6
- CVGNCMIULZNYES-WHFBIAKZSA-N Ala-Asn-Gly Chemical compound [H]N[C@@H](C)C(=O)N[C@@H](CC(N)=O)C(=O)NCC(O)=O CVGNCMIULZNYES-WHFBIAKZSA-N 0.000 description 6
- SHYYAQLDNVHPFT-DLOVCJGASA-N Ala-Asn-Phe Chemical compound C[C@H](N)C(=O)N[C@@H](CC(N)=O)C(=O)N[C@H](C(O)=O)CC1=CC=CC=C1 SHYYAQLDNVHPFT-DLOVCJGASA-N 0.000 description 6
- FXKNPWNXPQZLES-ZLUOBGJFSA-N Ala-Asn-Ser Chemical compound [H]N[C@@H](C)C(=O)N[C@@H](CC(N)=O)C(=O)N[C@@H](CO)C(O)=O FXKNPWNXPQZLES-ZLUOBGJFSA-N 0.000 description 6
- GGNHBHYDMUDXQB-KBIXCLLPSA-N Ala-Glu-Ile Chemical compound CC[C@H](C)[C@@H](C(O)=O)NC(=O)[C@H](CCC(O)=O)NC(=O)[C@H](C)N GGNHBHYDMUDXQB-KBIXCLLPSA-N 0.000 description 6
- TZDNWXDLYFIFPT-BJDJZHNGSA-N Ala-Ile-Leu Chemical compound [H]N[C@@H](C)C(=O)N[C@@H]([C@@H](C)CC)C(=O)N[C@@H](CC(C)C)C(O)=O TZDNWXDLYFIFPT-BJDJZHNGSA-N 0.000 description 6
- IHRGVZXPTIQNIP-NAKRPEOUSA-N Ala-Met-Ile Chemical compound CC[C@H](C)[C@@H](C(=O)O)NC(=O)[C@H](CCSC)NC(=O)[C@H](C)N IHRGVZXPTIQNIP-NAKRPEOUSA-N 0.000 description 6
- IORKCNUBHNIMKY-CIUDSAMLSA-N Ala-Pro-Glu Chemical compound C[C@H](N)C(=O)N1CCC[C@H]1C(=O)N[C@@H](CCC(O)=O)C(O)=O IORKCNUBHNIMKY-CIUDSAMLSA-N 0.000 description 6
- BTRULDJUUVGRNE-DCAQKATOSA-N Ala-Pro-Lys Chemical compound C[C@H](N)C(=O)N1CCC[C@H]1C(=O)N[C@@H](CCCCN)C(O)=O BTRULDJUUVGRNE-DCAQKATOSA-N 0.000 description 6
- DCVYRWFAMZFSDA-ZLUOBGJFSA-N Ala-Ser-Ala Chemical compound [H]N[C@@H](C)C(=O)N[C@@H](CO)C(=O)N[C@@H](C)C(O)=O DCVYRWFAMZFSDA-ZLUOBGJFSA-N 0.000 description 6
- BHFOJPDOQPWJRN-XDTLVQLUSA-N Ala-Tyr-Gln Chemical compound C[C@H](N)C(=O)N[C@@H](Cc1ccc(O)cc1)C(=O)N[C@@H](CCC(N)=O)C(O)=O BHFOJPDOQPWJRN-XDTLVQLUSA-N 0.000 description 6
- QRIYOHQJRDHFKF-UWJYBYFXSA-N Ala-Tyr-Ser Chemical compound OC[C@@H](C(O)=O)NC(=O)[C@@H](NC(=O)[C@@H](N)C)CC1=CC=C(O)C=C1 QRIYOHQJRDHFKF-UWJYBYFXSA-N 0.000 description 6
- VBFJESQBIWCWRL-DCAQKATOSA-N Arg-Ala-Lys Chemical compound NCCCC[C@@H](C(O)=O)NC(=O)[C@H](C)NC(=O)[C@@H](N)CCCNC(N)=N VBFJESQBIWCWRL-DCAQKATOSA-N 0.000 description 6
- OZNSCVPYWZRQPY-CIUDSAMLSA-N Arg-Asp-Glu Chemical compound [H]N[C@@H](CCCNC(N)=N)C(=O)N[C@@H](CC(O)=O)C(=O)N[C@@H](CCC(O)=O)C(O)=O OZNSCVPYWZRQPY-CIUDSAMLSA-N 0.000 description 6
- SLNCSSWAIDUUGF-LSJOCFKGSA-N Arg-His-Ala Chemical compound [H]N[C@@H](CCCNC(N)=N)C(=O)N[C@@H](CC1=CNC=N1)C(=O)N[C@@H](C)C(O)=O SLNCSSWAIDUUGF-LSJOCFKGSA-N 0.000 description 6
- NVUIWHJLPSZZQC-CYDGBPFRSA-N Arg-Ile-Arg Chemical compound NC(N)=NCCC[C@H](N)C(=O)N[C@@H]([C@@H](C)CC)C(=O)N[C@@H](CCCN=C(N)N)C(O)=O NVUIWHJLPSZZQC-CYDGBPFRSA-N 0.000 description 6
- RIIVUOJDDQXHRV-SRVKXCTJSA-N Arg-Lys-Gln Chemical compound [H]N[C@@H](CCCNC(N)=N)C(=O)N[C@@H](CCCCN)C(=O)N[C@@H](CCC(N)=O)C(O)=O RIIVUOJDDQXHRV-SRVKXCTJSA-N 0.000 description 6
- XUGATJVGQUGQKY-ULQDDVLXSA-N Arg-Lys-Phe Chemical compound NC(=N)NCCC[C@H](N)C(=O)N[C@@H](CCCCN)C(=O)N[C@H](C(O)=O)CC1=CC=CC=C1 XUGATJVGQUGQKY-ULQDDVLXSA-N 0.000 description 6
- DPLFNLDACGGBAK-KKUMJFAQSA-N Arg-Phe-Glu Chemical compound C1=CC=C(C=C1)C[C@@H](C(=O)N[C@@H](CCC(=O)O)C(=O)O)NC(=O)[C@H](CCCN=C(N)N)N DPLFNLDACGGBAK-KKUMJFAQSA-N 0.000 description 6
- XHFXZQHTLJVZBN-FXQIFTODSA-N Asn-Arg-Asn Chemical compound C(C[C@@H](C(=O)N[C@@H](CC(=O)N)C(=O)O)NC(=O)[C@H](CC(=O)N)N)CN=C(N)N XHFXZQHTLJVZBN-FXQIFTODSA-N 0.000 description 6
- RCENDENBBJFJHZ-ACZMJKKPSA-N Asn-Asn-Gln Chemical compound [H]N[C@@H](CC(N)=O)C(=O)N[C@@H](CC(N)=O)C(=O)N[C@@H](CCC(N)=O)C(O)=O RCENDENBBJFJHZ-ACZMJKKPSA-N 0.000 description 6
- BVLIJXXSXBUGEC-SRVKXCTJSA-N Asn-Asn-Tyr Chemical compound [H]N[C@@H](CC(N)=O)C(=O)N[C@@H](CC(N)=O)C(=O)N[C@@H](CC1=CC=C(O)C=C1)C(O)=O BVLIJXXSXBUGEC-SRVKXCTJSA-N 0.000 description 6
- BHQQRVARKXWXPP-ACZMJKKPSA-N Asn-Asp-Glu Chemical compound C(CC(=O)O)[C@@H](C(=O)O)NC(=O)[C@H](CC(=O)O)NC(=O)[C@H](CC(=O)N)N BHQQRVARKXWXPP-ACZMJKKPSA-N 0.000 description 6
- ZWASIOHRQWRWAS-UGYAYLCHSA-N Asn-Asp-Ile Chemical compound [H]N[C@@H](CC(N)=O)C(=O)N[C@@H](CC(O)=O)C(=O)N[C@@H]([C@@H](C)CC)C(O)=O ZWASIOHRQWRWAS-UGYAYLCHSA-N 0.000 description 6
- XWFPGQVLOVGSLU-CIUDSAMLSA-N Asn-Gln-Arg Chemical compound NC(=O)C[C@H](N)C(=O)N[C@@H](CCC(N)=O)C(=O)N[C@H](C(O)=O)CCCN=C(N)N XWFPGQVLOVGSLU-CIUDSAMLSA-N 0.000 description 6
- BZMWJLLUAKSIMH-FXQIFTODSA-N Asn-Glu-Glu Chemical compound [H]N[C@@H](CC(N)=O)C(=O)N[C@@H](CCC(O)=O)C(=O)N[C@@H](CCC(O)=O)C(O)=O BZMWJLLUAKSIMH-FXQIFTODSA-N 0.000 description 6
- ODBSSLHUFPJRED-CIUDSAMLSA-N Asn-His-Asn Chemical compound C1=C(NC=N1)C[C@@H](C(=O)N[C@@H](CC(=O)N)C(=O)O)NC(=O)[C@H](CC(=O)N)N ODBSSLHUFPJRED-CIUDSAMLSA-N 0.000 description 6
- AITGTTNYKAWKDR-CIUDSAMLSA-N Asn-His-Ser Chemical compound [H]N[C@@H](CC(N)=O)C(=O)N[C@@H](CC1=CNC=N1)C(=O)N[C@@H](CO)C(O)=O AITGTTNYKAWKDR-CIUDSAMLSA-N 0.000 description 6
- YYSYDIYQTUPNQQ-SXTJYALSSA-N Asn-Ile-Ile Chemical compound [H]N[C@@H](CC(N)=O)C(=O)N[C@@H]([C@@H](C)CC)C(=O)N[C@@H]([C@@H](C)CC)C(O)=O YYSYDIYQTUPNQQ-SXTJYALSSA-N 0.000 description 6
- ACKNRKFVYUVWAC-ZPFDUUQYSA-N Asn-Ile-Lys Chemical compound CC[C@H](C)[C@@H](C(=O)N[C@@H](CCCCN)C(=O)O)NC(=O)[C@H](CC(=O)N)N ACKNRKFVYUVWAC-ZPFDUUQYSA-N 0.000 description 6
- BXUHCIXDSWRSBS-CIUDSAMLSA-N Asn-Leu-Asp Chemical compound [H]N[C@@H](CC(N)=O)C(=O)N[C@@H](CC(C)C)C(=O)N[C@@H](CC(O)=O)C(O)=O BXUHCIXDSWRSBS-CIUDSAMLSA-N 0.000 description 6
- BZWRLDPIWKOVKB-ZPFDUUQYSA-N Asn-Leu-Ile Chemical compound [H]N[C@@H](CC(N)=O)C(=O)N[C@@H](CC(C)C)C(=O)N[C@@H]([C@@H](C)CC)C(O)=O BZWRLDPIWKOVKB-ZPFDUUQYSA-N 0.000 description 6
- TZFQICWZWFNIKU-KKUMJFAQSA-N Asn-Leu-Tyr Chemical compound NC(=O)C[C@H](N)C(=O)N[C@@H](CC(C)C)C(=O)N[C@H](C(O)=O)CC1=CC=C(O)C=C1 TZFQICWZWFNIKU-KKUMJFAQSA-N 0.000 description 6
- JWKDQOORUCYUIW-ZPFDUUQYSA-N Asn-Lys-Ile Chemical compound [H]N[C@@H](CC(N)=O)C(=O)N[C@@H](CCCCN)C(=O)N[C@@H]([C@@H](C)CC)C(O)=O JWKDQOORUCYUIW-ZPFDUUQYSA-N 0.000 description 6
- WXVGISRWSYGEDK-KKUMJFAQSA-N Asn-Lys-Phe Chemical compound C1=CC=C(C=C1)C[C@@H](C(=O)O)NC(=O)[C@H](CCCCN)NC(=O)[C@H](CC(=O)N)N WXVGISRWSYGEDK-KKUMJFAQSA-N 0.000 description 6
- WCRQQIPFSXFIRN-LPEHRKFASA-N Asn-Met-Pro Chemical compound CSCC[C@@H](C(=O)N1CCC[C@@H]1C(=O)O)NC(=O)[C@H](CC(=O)N)N WCRQQIPFSXFIRN-LPEHRKFASA-N 0.000 description 6
- CDGHMJJJHYKMPA-DLOVCJGASA-N Asn-Phe-Ala Chemical compound C[C@@H](C(=O)O)NC(=O)[C@H](CC1=CC=CC=C1)NC(=O)[C@H](CC(=O)N)N CDGHMJJJHYKMPA-DLOVCJGASA-N 0.000 description 6
- GMUOCGCDOYYWPD-FXQIFTODSA-N Asn-Pro-Ser Chemical compound [H]N[C@@H](CC(N)=O)C(=O)N1CCC[C@H]1C(=O)N[C@@H](CO)C(O)=O GMUOCGCDOYYWPD-FXQIFTODSA-N 0.000 description 6
- VPPXTHJNTYDNFJ-CIUDSAMLSA-N Asp-Ala-Lys Chemical compound C[C@@H](C(=O)N[C@@H](CCCCN)C(=O)O)NC(=O)[C@H](CC(=O)O)N VPPXTHJNTYDNFJ-CIUDSAMLSA-N 0.000 description 6
- ZELQAFZSJOBEQS-ACZMJKKPSA-N Asp-Asn-Glu Chemical compound [H]N[C@@H](CC(O)=O)C(=O)N[C@@H](CC(N)=O)C(=O)N[C@@H](CCC(O)=O)C(O)=O ZELQAFZSJOBEQS-ACZMJKKPSA-N 0.000 description 6
- QOVWVLLHMMCFFY-ZLUOBGJFSA-N Asp-Asp-Asn Chemical compound [H]N[C@@H](CC(O)=O)C(=O)N[C@@H](CC(O)=O)C(=O)N[C@@H](CC(N)=O)C(O)=O QOVWVLLHMMCFFY-ZLUOBGJFSA-N 0.000 description 6
- SBHUBSDEZQFJHJ-CIUDSAMLSA-N Asp-Asp-Leu Chemical compound CC(C)C[C@@H](C(O)=O)NC(=O)[C@H](CC(O)=O)NC(=O)[C@@H](N)CC(O)=O SBHUBSDEZQFJHJ-CIUDSAMLSA-N 0.000 description 6
- CELPEWWLSXMVPH-CIUDSAMLSA-N Asp-Asp-Lys Chemical compound NCCCC[C@@H](C(O)=O)NC(=O)[C@H](CC(O)=O)NC(=O)[C@@H](N)CC(O)=O CELPEWWLSXMVPH-CIUDSAMLSA-N 0.000 description 6
- QCLHLXDWRKOHRR-GUBZILKMSA-N Asp-Glu-His Chemical compound C1=C(NC=N1)C[C@@H](C(=O)O)NC(=O)[C@H](CCC(=O)O)NC(=O)[C@H](CC(=O)O)N QCLHLXDWRKOHRR-GUBZILKMSA-N 0.000 description 6
- SPWXXPFDTMYTRI-IUKAMOBKSA-N Asp-Ile-Thr Chemical compound [H]N[C@@H](CC(O)=O)C(=O)N[C@@H]([C@@H](C)CC)C(=O)N[C@@H]([C@@H](C)O)C(O)=O SPWXXPFDTMYTRI-IUKAMOBKSA-N 0.000 description 6
- JTRDJYIZIKCIRC-AJNGGQMLSA-N Asp-Leu-Leu-Gln Chemical compound CC(C)C[C@H](NC(=O)[C@@H](N)CC(O)=O)C(=O)N[C@@H](CC(C)C)C(=O)N[C@@H](CCC(N)=O)C(O)=O JTRDJYIZIKCIRC-AJNGGQMLSA-N 0.000 description 6
- UMHUHHJMEXNSIV-CIUDSAMLSA-N Asp-Leu-Ser Chemical compound OC[C@@H](C(O)=O)NC(=O)[C@H](CC(C)C)NC(=O)[C@@H](N)CC(O)=O UMHUHHJMEXNSIV-CIUDSAMLSA-N 0.000 description 6
- GKWFMNNNYZHJHV-SRVKXCTJSA-N Asp-Lys-Leu Chemical compound CC(C)C[C@@H](C(O)=O)NC(=O)[C@H](CCCCN)NC(=O)[C@@H](N)CC(O)=O GKWFMNNNYZHJHV-SRVKXCTJSA-N 0.000 description 6
- JUWISGAGWSDGDH-KKUMJFAQSA-N Asp-Phe-Lys Chemical compound NCCCC[C@@H](C(O)=O)NC(=O)[C@@H](NC(=O)[C@@H](N)CC(O)=O)CC1=CC=CC=C1 JUWISGAGWSDGDH-KKUMJFAQSA-N 0.000 description 6
- GWWSUMLEWKQHLR-NUMRIWBASA-N Asp-Thr-Glu Chemical compound C[C@H]([C@@H](C(=O)N[C@@H](CCC(=O)O)C(=O)O)NC(=O)[C@H](CC(=O)O)N)O GWWSUMLEWKQHLR-NUMRIWBASA-N 0.000 description 6
- GXHDGYOXPNQCKM-XVSYOHENSA-N Asp-Thr-Phe Chemical compound C[C@H]([C@@H](C(=O)N[C@@H](CC1=CC=CC=C1)C(=O)O)NC(=O)[C@H](CC(=O)O)N)O GXHDGYOXPNQCKM-XVSYOHENSA-N 0.000 description 6
- NWAHPBGBDIFUFD-KKUMJFAQSA-N Asp-Tyr-Leu Chemical compound [H]N[C@@H](CC(O)=O)C(=O)N[C@@H](CC1=CC=C(O)C=C1)C(=O)N[C@@H](CC(C)C)C(O)=O NWAHPBGBDIFUFD-KKUMJFAQSA-N 0.000 description 6
- XQFLFQWOBXPMHW-NHCYSSNCSA-N Asp-Val-His Chemical compound N[C@@H](CC(=O)O)C(=O)N[C@@H](C(C)C)C(=O)N[C@@H](CC1=CNC=N1)C(=O)O XQFLFQWOBXPMHW-NHCYSSNCSA-N 0.000 description 6
- ZUNMTUPRQMWMHX-LSJOCFKGSA-N Asp-Val-Val Chemical compound [H]N[C@@H](CC(O)=O)C(=O)N[C@@H](C(C)C)C(=O)N[C@@H](C(C)C)C(O)=O ZUNMTUPRQMWMHX-LSJOCFKGSA-N 0.000 description 6
- AEJSNWMRPXAKCW-WHFBIAKZSA-N Cys-Ala-Gly Chemical compound SC[C@H](N)C(=O)N[C@@H](C)C(=O)NCC(O)=O AEJSNWMRPXAKCW-WHFBIAKZSA-N 0.000 description 6
- XBELMDARIGXDKY-GUBZILKMSA-N Cys-Pro-Val Chemical compound CC(C)[C@@H](C(=O)O)NC(=O)[C@@H]1CCCN1C(=O)[C@H](CS)N XBELMDARIGXDKY-GUBZILKMSA-N 0.000 description 6
- 108010053770 Deoxyribonucleases Proteins 0.000 description 6
- 102000016911 Deoxyribonucleases Human genes 0.000 description 6
- TWHDOEYLXXQYOZ-FXQIFTODSA-N Gln-Asn-Gln Chemical compound C(CC(=O)N)[C@@H](C(=O)N[C@@H](CC(=O)N)C(=O)N[C@@H](CCC(=O)N)C(=O)O)N TWHDOEYLXXQYOZ-FXQIFTODSA-N 0.000 description 6
- ODBLJLZVLAWVMS-GUBZILKMSA-N Gln-Asn-Lys Chemical compound C(CCN)C[C@@H](C(=O)O)NC(=O)[C@H](CC(=O)N)NC(=O)[C@H](CCC(=O)N)N ODBLJLZVLAWVMS-GUBZILKMSA-N 0.000 description 6
- AJDMYLOISOCHHC-YVNDNENWSA-N Gln-Gln-Ile Chemical compound [H]N[C@@H](CCC(N)=O)C(=O)N[C@@H](CCC(N)=O)C(=O)N[C@@H]([C@@H](C)CC)C(O)=O AJDMYLOISOCHHC-YVNDNENWSA-N 0.000 description 6
- XSBGUANSZDGULP-IUCAKERBSA-N Gln-Gly-Lys Chemical compound [H]N[C@@H](CCC(N)=O)C(=O)NCC(=O)N[C@@H](CCCCN)C(O)=O XSBGUANSZDGULP-IUCAKERBSA-N 0.000 description 6
- FKXCBKCOSVIGCT-AVGNSLFASA-N Gln-Lys-Leu Chemical compound [H]N[C@@H](CCC(N)=O)C(=O)N[C@@H](CCCCN)C(=O)N[C@@H](CC(C)C)C(O)=O FKXCBKCOSVIGCT-AVGNSLFASA-N 0.000 description 6
- FTTHLXOMDMLKKW-FHWLQOOXSA-N Gln-Phe-Phe Chemical compound C([C@H](NC(=O)[C@H](CCC(N)=O)N)C(=O)N[C@@H](CC=1C=CC=CC=1)C(O)=O)C1=CC=CC=C1 FTTHLXOMDMLKKW-FHWLQOOXSA-N 0.000 description 6
- ZGHMRONFHDVXEF-AVGNSLFASA-N Gln-Ser-Phe Chemical compound [H]N[C@@H](CCC(N)=O)C(=O)N[C@@H](CO)C(=O)N[C@@H](CC1=CC=CC=C1)C(O)=O ZGHMRONFHDVXEF-AVGNSLFASA-N 0.000 description 6
- ININBLZFFVOQIO-JHEQGTHGSA-N Gln-Thr-Gly Chemical compound C[C@H]([C@@H](C(=O)NCC(=O)O)NC(=O)[C@H](CCC(=O)N)N)O ININBLZFFVOQIO-JHEQGTHGSA-N 0.000 description 6
- CSMHMEATMDCQNY-DZKIICNBSA-N Gln-Val-Tyr Chemical compound [H]N[C@@H](CCC(N)=O)C(=O)N[C@@H](C(C)C)C(=O)N[C@@H](CC1=CC=C(O)C=C1)C(O)=O CSMHMEATMDCQNY-DZKIICNBSA-N 0.000 description 6
- LKDIBBOKUAASNP-FXQIFTODSA-N Glu-Ala-Glu Chemical compound OC(=O)CC[C@H](N)C(=O)N[C@@H](C)C(=O)N[C@@H](CCC(O)=O)C(O)=O LKDIBBOKUAASNP-FXQIFTODSA-N 0.000 description 6
- CVPXINNKRTZBMO-CIUDSAMLSA-N Glu-Arg-Asn Chemical compound C(C[C@@H](C(=O)N[C@@H](CC(=O)N)C(=O)O)NC(=O)[C@H](CCC(=O)O)N)CN=C(N)N CVPXINNKRTZBMO-CIUDSAMLSA-N 0.000 description 6
- PBEQPAZRHDVJQI-SRVKXCTJSA-N Glu-Arg-His Chemical compound C1=C(NC=N1)C[C@@H](C(=O)O)NC(=O)[C@H](CCCN=C(N)N)NC(=O)[C@H](CCC(=O)O)N PBEQPAZRHDVJQI-SRVKXCTJSA-N 0.000 description 6
- NADWTMLCUDMDQI-ACZMJKKPSA-N Glu-Asp-Cys Chemical compound C(CC(=O)O)[C@@H](C(=O)N[C@@H](CC(=O)O)C(=O)N[C@@H](CS)C(=O)O)N NADWTMLCUDMDQI-ACZMJKKPSA-N 0.000 description 6
- IESFZVCAVACGPH-PEFMBERDSA-N Glu-Asp-Ile Chemical compound CC[C@H](C)[C@@H](C(O)=O)NC(=O)[C@H](CC(O)=O)NC(=O)[C@@H](N)CCC(O)=O IESFZVCAVACGPH-PEFMBERDSA-N 0.000 description 6
- KRGZZKWSBGPLKL-IUCAKERBSA-N Glu-Gly-Lys Chemical compound C(CCN)C[C@@H](C(=O)O)NC(=O)CNC(=O)[C@H](CCC(=O)O)N KRGZZKWSBGPLKL-IUCAKERBSA-N 0.000 description 6
- ZWABFSSWTSAMQN-KBIXCLLPSA-N Glu-Ile-Ala Chemical compound [H]N[C@@H](CCC(O)=O)C(=O)N[C@@H]([C@@H](C)CC)C(=O)N[C@@H](C)C(O)=O ZWABFSSWTSAMQN-KBIXCLLPSA-N 0.000 description 6
- NJCALAAIGREHDR-WDCWCFNPSA-N Glu-Leu-Thr Chemical compound [H]N[C@@H](CCC(O)=O)C(=O)N[C@@H](CC(C)C)C(=O)N[C@@H]([C@@H](C)O)C(O)=O NJCALAAIGREHDR-WDCWCFNPSA-N 0.000 description 6
- OQXDUSZKISQQSS-GUBZILKMSA-N Glu-Lys-Ala Chemical compound [H]N[C@@H](CCC(O)=O)C(=O)N[C@@H](CCCCN)C(=O)N[C@@H](C)C(O)=O OQXDUSZKISQQSS-GUBZILKMSA-N 0.000 description 6
- MFNUFCFRAZPJFW-JYJNAYRXSA-N Glu-Lys-Phe Chemical compound OC(=O)CC[C@H](N)C(=O)N[C@@H](CCCCN)C(=O)N[C@H](C(O)=O)CC1=CC=CC=C1 MFNUFCFRAZPJFW-JYJNAYRXSA-N 0.000 description 6
- IDEODOAVGCMUQV-GUBZILKMSA-N Glu-Ser-Leu Chemical compound [H]N[C@@H](CCC(O)=O)C(=O)N[C@@H](CO)C(=O)N[C@@H](CC(C)C)C(O)=O IDEODOAVGCMUQV-GUBZILKMSA-N 0.000 description 6
- HJTSRYLPAYGEEC-SIUGBPQLSA-N Glu-Tyr-Ile Chemical compound CC[C@H](C)[C@@H](C(=O)O)NC(=O)[C@H](CC1=CC=C(C=C1)O)NC(=O)[C@H](CCC(=O)O)N HJTSRYLPAYGEEC-SIUGBPQLSA-N 0.000 description 6
- KKBWDNZXYLGJEY-UHFFFAOYSA-N Gly-Arg-Pro Natural products NCC(=O)NC(CCNC(=N)N)C(=O)N1CCCC1C(=O)O KKBWDNZXYLGJEY-UHFFFAOYSA-N 0.000 description 6
- AIJAPFVDBFYNKN-WHFBIAKZSA-N Gly-Asn-Asp Chemical compound C([C@@H](C(=O)N[C@@H](CC(=O)O)C(=O)O)NC(=O)CN)C(=O)N AIJAPFVDBFYNKN-WHFBIAKZSA-N 0.000 description 6
- MHHUEAIBJZWDBH-YUMQZZPRSA-N Gly-Asp-Lys Chemical compound C(CCN)C[C@@H](C(=O)O)NC(=O)[C@H](CC(=O)O)NC(=O)CN MHHUEAIBJZWDBH-YUMQZZPRSA-N 0.000 description 6
- XTQFHTHIAKKCTM-YFKPBYRVSA-N Gly-Glu-Gly Chemical compound NCC(=O)N[C@@H](CCC(O)=O)C(=O)NCC(O)=O XTQFHTHIAKKCTM-YFKPBYRVSA-N 0.000 description 6
- QITBQGJOXQYMOA-ZETCQYMHSA-N Gly-Gly-Lys Chemical compound NCCCC[C@@H](C(O)=O)NC(=O)CNC(=O)CN QITBQGJOXQYMOA-ZETCQYMHSA-N 0.000 description 6
- DGKBSGNCMCLDSL-BYULHYEWSA-N Gly-Ile-Asn Chemical compound CC[C@H](C)[C@@H](C(=O)N[C@@H](CC(=O)N)C(=O)O)NC(=O)CN DGKBSGNCMCLDSL-BYULHYEWSA-N 0.000 description 6
- VEPBEGNDJYANCF-QWRGUYRKSA-N Gly-Lys-Lys Chemical compound NCCCC[C@@H](C(O)=O)NC(=O)[C@@H](NC(=O)CN)CCCCN VEPBEGNDJYANCF-QWRGUYRKSA-N 0.000 description 6
- ICUTTWWCDIIIEE-BQBZGAKWSA-N Gly-Met-Asn Chemical compound CSCC[C@@H](C(=O)N[C@@H](CC(=O)N)C(=O)O)NC(=O)CN ICUTTWWCDIIIEE-BQBZGAKWSA-N 0.000 description 6
- FFJQHWKSGAWSTJ-BFHQHQDPSA-N Gly-Thr-Ala Chemical compound [H]NCC(=O)N[C@@H]([C@@H](C)O)C(=O)N[C@@H](C)C(O)=O FFJQHWKSGAWSTJ-BFHQHQDPSA-N 0.000 description 6
- UMBDRSMLCUYIRI-DVJZZOLTSA-N Gly-Trp-Thr Chemical compound C[C@H]([C@@H](C(=O)O)NC(=O)[C@H](CC1=CNC2=CC=CC=C21)NC(=O)CN)O UMBDRSMLCUYIRI-DVJZZOLTSA-N 0.000 description 6
- WYWBYSPRCFADBM-GARJFASQSA-N His-Cys-Pro Chemical compound C1C[C@@H](N(C1)C(=O)[C@H](CS)NC(=O)[C@H](CC2=CN=CN2)N)C(=O)O WYWBYSPRCFADBM-GARJFASQSA-N 0.000 description 6
- ZVKDCQVQTGYBQT-LSJOCFKGSA-N His-Pro-Ala Chemical compound [H]N[C@@H](CC1=CNC=N1)C(=O)N1CCC[C@H]1C(=O)N[C@@H](C)C(O)=O ZVKDCQVQTGYBQT-LSJOCFKGSA-N 0.000 description 6
- PYNPBMCLAKTHJL-SRVKXCTJSA-N His-Pro-Glu Chemical compound [H]N[C@@H](CC1=CNC=N1)C(=O)N1CCC[C@H]1C(=O)N[C@@H](CCC(O)=O)C(O)=O PYNPBMCLAKTHJL-SRVKXCTJSA-N 0.000 description 6
- LQSBBHNVAVNZSX-GHCJXIJMSA-N Ile-Ala-Asn Chemical compound CC[C@H](C)[C@@H](C(=O)N[C@@H](C)C(=O)N[C@@H](CC(=O)N)C(=O)O)N LQSBBHNVAVNZSX-GHCJXIJMSA-N 0.000 description 6
- QYZYJFXHXYUZMZ-UGYAYLCHSA-N Ile-Asn-Asn Chemical compound CC[C@H](C)[C@@H](C(=O)N[C@@H](CC(=O)N)C(=O)N[C@@H](CC(=O)N)C(=O)O)N QYZYJFXHXYUZMZ-UGYAYLCHSA-N 0.000 description 6
- IDAHFEPYTJJZFD-PEFMBERDSA-N Ile-Asp-Glu Chemical compound CC[C@H](C)[C@@H](C(=O)N[C@@H](CC(=O)O)C(=O)N[C@@H](CCC(=O)O)C(=O)O)N IDAHFEPYTJJZFD-PEFMBERDSA-N 0.000 description 6
- HGNUKGZQASSBKQ-PCBIJLKTSA-N Ile-Asp-Phe Chemical compound CC[C@H](C)[C@@H](C(=O)N[C@@H](CC(=O)O)C(=O)N[C@@H](CC1=CC=CC=C1)C(=O)O)N HGNUKGZQASSBKQ-PCBIJLKTSA-N 0.000 description 6
- PFTFEWHJSAXGED-ZKWXMUAHSA-N Ile-Cys-Gly Chemical compound CC[C@H](C)[C@@H](C(=O)N[C@@H](CS)C(=O)NCC(=O)O)N PFTFEWHJSAXGED-ZKWXMUAHSA-N 0.000 description 6
- VCYVLFAWCJRXFT-HJPIBITLSA-N Ile-Cys-Tyr Chemical compound CC[C@H](C)[C@@H](C(=O)N[C@@H](CS)C(=O)N[C@@H](CC1=CC=C(C=C1)O)C(=O)O)N VCYVLFAWCJRXFT-HJPIBITLSA-N 0.000 description 6
- PHIXPNQDGGILMP-YVNDNENWSA-N Ile-Glu-Glu Chemical compound CC[C@H](C)[C@@H](C(=O)N[C@@H](CCC(=O)O)C(=O)N[C@@H](CCC(=O)O)C(=O)O)N PHIXPNQDGGILMP-YVNDNENWSA-N 0.000 description 6
- CSQNHSGHAPRGPQ-YTFOTSKYSA-N Ile-Ile-Lys Chemical compound CC[C@H](C)[C@@H](C(=O)N[C@@H]([C@@H](C)CC)C(=O)N[C@@H](CCCCN)C(=O)O)N CSQNHSGHAPRGPQ-YTFOTSKYSA-N 0.000 description 6
- KLBVGHCGHUNHEA-BJDJZHNGSA-N Ile-Leu-Ala Chemical compound CC[C@H](C)[C@@H](C(=O)N[C@@H](CC(C)C)C(=O)N[C@@H](C)C(=O)O)N KLBVGHCGHUNHEA-BJDJZHNGSA-N 0.000 description 6
- YGDWPQCLFJNMOL-MNXVOIDGSA-N Ile-Leu-Gln Chemical compound CC[C@H](C)[C@@H](C(=O)N[C@@H](CC(C)C)C(=O)N[C@@H](CCC(=O)N)C(=O)O)N YGDWPQCLFJNMOL-MNXVOIDGSA-N 0.000 description 6
- RQQCJTLBSJMVCR-DSYPUSFNSA-N Ile-Leu-Trp Chemical compound CC[C@H](C)[C@@H](C(=O)N[C@@H](CC(C)C)C(=O)N[C@@H](CC1=CNC2=CC=CC=C21)C(=O)O)N RQQCJTLBSJMVCR-DSYPUSFNSA-N 0.000 description 6
- USXAYNCLFSUSBA-MGHWNKPDSA-N Ile-Phe-His Chemical compound CC[C@H](C)[C@@H](C(=O)N[C@@H](CC1=CC=CC=C1)C(=O)N[C@@H](CC2=CN=CN2)C(=O)O)N USXAYNCLFSUSBA-MGHWNKPDSA-N 0.000 description 6
- XMYURPUVJSKTMC-KBIXCLLPSA-N Ile-Ser-Gln Chemical compound CC[C@H](C)[C@@H](C(=O)N[C@@H](CO)C(=O)N[C@@H](CCC(=O)N)C(=O)O)N XMYURPUVJSKTMC-KBIXCLLPSA-N 0.000 description 6
- JNLSTRPWUXOORL-MMWGEVLESA-N Ile-Ser-Pro Chemical compound CC[C@H](C)[C@@H](C(=O)N[C@@H](CO)C(=O)N1CCC[C@@H]1C(=O)O)N JNLSTRPWUXOORL-MMWGEVLESA-N 0.000 description 6
- BCISUQVFDGYZBO-QSFUFRPTSA-N Ile-Val-Asp Chemical compound CC[C@H](C)[C@H](N)C(=O)N[C@@H](C(C)C)C(=O)N[C@H](C(O)=O)CC(O)=O BCISUQVFDGYZBO-QSFUFRPTSA-N 0.000 description 6
- QSXSHZIRKTUXNG-STECZYCISA-N Ile-Val-Tyr Chemical compound CC[C@H](C)[C@H](N)C(=O)N[C@@H](C(C)C)C(=O)N[C@H](C(O)=O)CC1=CC=C(O)C=C1 QSXSHZIRKTUXNG-STECZYCISA-N 0.000 description 6
- 108010065920 Insulin Lispro Proteins 0.000 description 6
- SITWEMZOJNKJCH-UHFFFAOYSA-N L-alanine-L-arginine Natural products CC(N)C(=O)NC(C(O)=O)CCCNC(N)=N SITWEMZOJNKJCH-UHFFFAOYSA-N 0.000 description 6
- SENJXOPIZNYLHU-UHFFFAOYSA-N L-leucyl-L-arginine Natural products CC(C)CC(N)C(=O)NC(C(O)=O)CCCN=C(N)N SENJXOPIZNYLHU-UHFFFAOYSA-N 0.000 description 6
- KFKWRHQBZQICHA-STQMWFEESA-N L-leucyl-L-phenylalanine Natural products CC(C)C[C@H](N)C(=O)N[C@H](C(O)=O)CC1=CC=CC=C1 KFKWRHQBZQICHA-STQMWFEESA-N 0.000 description 6
- CQQGCWPXDHTTNF-GUBZILKMSA-N Leu-Ala-Glu Chemical compound CC(C)C[C@H](N)C(=O)N[C@@H](C)C(=O)N[C@H](C(O)=O)CCC(O)=O CQQGCWPXDHTTNF-GUBZILKMSA-N 0.000 description 6
- SUPVSFFZWVOEOI-CQDKDKBSSA-N Leu-Ala-Tyr Chemical compound CC(C)C[C@H](N)C(=O)N[C@@H](C)C(=O)N[C@H](C(O)=O)CC1=CC=C(O)C=C1 SUPVSFFZWVOEOI-CQDKDKBSSA-N 0.000 description 6
- SUPVSFFZWVOEOI-UHFFFAOYSA-N Leu-Ala-Tyr Natural products CC(C)CC(N)C(=O)NC(C)C(=O)NC(C(O)=O)CC1=CC=C(O)C=C1 SUPVSFFZWVOEOI-UHFFFAOYSA-N 0.000 description 6
- RFUBXQQFJFGJFV-GUBZILKMSA-N Leu-Asn-Gln Chemical compound [H]N[C@@H](CC(C)C)C(=O)N[C@@H](CC(N)=O)C(=O)N[C@@H](CCC(N)=O)C(O)=O RFUBXQQFJFGJFV-GUBZILKMSA-N 0.000 description 6
- OIARJGNVARWKFP-YUMQZZPRSA-N Leu-Asn-Gly Chemical compound [H]N[C@@H](CC(C)C)C(=O)N[C@@H](CC(N)=O)C(=O)NCC(O)=O OIARJGNVARWKFP-YUMQZZPRSA-N 0.000 description 6
- FIJMQLGQLBLBOL-HJGDQZAQSA-N Leu-Asn-Thr Chemical compound [H]N[C@@H](CC(C)C)C(=O)N[C@@H](CC(N)=O)C(=O)N[C@@H]([C@@H](C)O)C(O)=O FIJMQLGQLBLBOL-HJGDQZAQSA-N 0.000 description 6
- WXHFZJFZWNCDNB-KKUMJFAQSA-N Leu-Asn-Tyr Chemical compound CC(C)C[C@H](N)C(=O)N[C@@H](CC(N)=O)C(=O)N[C@H](C(O)=O)CC1=CC=C(O)C=C1 WXHFZJFZWNCDNB-KKUMJFAQSA-N 0.000 description 6
- ZURHXHNAEJJRNU-CIUDSAMLSA-N Leu-Asp-Asn Chemical compound [H]N[C@@H](CC(C)C)C(=O)N[C@@H](CC(O)=O)C(=O)N[C@@H](CC(N)=O)C(O)=O ZURHXHNAEJJRNU-CIUDSAMLSA-N 0.000 description 6
- PJYSOYLLTJKZHC-GUBZILKMSA-N Leu-Asp-Gln Chemical compound CC(C)C[C@H](N)C(=O)N[C@@H](CC(O)=O)C(=O)N[C@H](C(O)=O)CCC(N)=O PJYSOYLLTJKZHC-GUBZILKMSA-N 0.000 description 6
- MYGQXVYRZMKRDB-SRVKXCTJSA-N Leu-Asp-Lys Chemical compound CC(C)C[C@H](N)C(=O)N[C@@H](CC(O)=O)C(=O)N[C@H](C(O)=O)CCCCN MYGQXVYRZMKRDB-SRVKXCTJSA-N 0.000 description 6
- ZYLJULGXQDNXDK-GUBZILKMSA-N Leu-Gln-Asp Chemical compound [H]N[C@@H](CC(C)C)C(=O)N[C@@H](CCC(N)=O)C(=O)N[C@@H](CC(O)=O)C(O)=O ZYLJULGXQDNXDK-GUBZILKMSA-N 0.000 description 6
- FQZPTCNSNPWHLJ-AVGNSLFASA-N Leu-Gln-Lys Chemical compound CC(C)C[C@H](N)C(=O)N[C@@H](CCC(N)=O)C(=O)N[C@@H](CCCCN)C(O)=O FQZPTCNSNPWHLJ-AVGNSLFASA-N 0.000 description 6
- USLNHQZCDQJBOV-ZPFDUUQYSA-N Leu-Ile-Asn Chemical compound [H]N[C@@H](CC(C)C)C(=O)N[C@@H]([C@@H](C)CC)C(=O)N[C@@H](CC(N)=O)C(O)=O USLNHQZCDQJBOV-ZPFDUUQYSA-N 0.000 description 6
- OMHLATXVNQSALM-FQUUOJAGSA-N Leu-Ile-Pro Chemical compound CC[C@H](C)[C@@H](C(=O)N1CCC[C@@H]1C(=O)O)NC(=O)[C@H](CC(C)C)N OMHLATXVNQSALM-FQUUOJAGSA-N 0.000 description 6
- KYIIALJHAOIAHF-KKUMJFAQSA-N Leu-Leu-His Chemical compound CC(C)C[C@H](N)C(=O)N[C@@H](CC(C)C)C(=O)N[C@H](C(O)=O)CC1=CN=CN1 KYIIALJHAOIAHF-KKUMJFAQSA-N 0.000 description 6
- XVZCXCTYGHPNEM-UHFFFAOYSA-N Leu-Leu-Pro Natural products CC(C)CC(N)C(=O)NC(CC(C)C)C(=O)N1CCCC1C(O)=O XVZCXCTYGHPNEM-UHFFFAOYSA-N 0.000 description 6
- ZGUMORRUBUCXEH-AVGNSLFASA-N Leu-Lys-Gln Chemical compound [H]N[C@@H](CC(C)C)C(=O)N[C@@H](CCCCN)C(=O)N[C@@H](CCC(N)=O)C(O)=O ZGUMORRUBUCXEH-AVGNSLFASA-N 0.000 description 6
- ZAVCJRJOQKIOJW-KKUMJFAQSA-N Leu-Phe-Asp Chemical compound CC(C)C[C@H](N)C(=O)N[C@H](C(=O)N[C@@H](CC(O)=O)C(O)=O)CC1=CC=CC=C1 ZAVCJRJOQKIOJW-KKUMJFAQSA-N 0.000 description 6
- KQFZKDITNUEVFJ-JYJNAYRXSA-N Leu-Phe-Gln Chemical compound NC(=O)CC[C@@H](C(O)=O)NC(=O)[C@@H](NC(=O)[C@@H](N)CC(C)C)CC1=CC=CC=C1 KQFZKDITNUEVFJ-JYJNAYRXSA-N 0.000 description 6
- PJWOOBTYQNNRBF-BZSNNMDCSA-N Leu-Phe-Lys Chemical compound CC(C)C[C@@H](C(=O)N[C@@H](CC1=CC=CC=C1)C(=O)N[C@@H](CCCCN)C(=O)O)N PJWOOBTYQNNRBF-BZSNNMDCSA-N 0.000 description 6
- MVVSHHJKJRZVNY-ACRUOGEOSA-N Leu-Phe-Tyr Chemical compound [H]N[C@@H](CC(C)C)C(=O)N[C@@H](CC1=CC=CC=C1)C(=O)N[C@@H](CC1=CC=C(O)C=C1)C(O)=O MVVSHHJKJRZVNY-ACRUOGEOSA-N 0.000 description 6
- ZJZNLRVCZWUONM-JXUBOQSCSA-N Leu-Thr-Ala Chemical compound CC(C)C[C@H](N)C(=O)N[C@@H]([C@@H](C)O)C(=O)N[C@@H](C)C(O)=O ZJZNLRVCZWUONM-JXUBOQSCSA-N 0.000 description 6
- HQVDJTYKCMIWJP-YUMQZZPRSA-N Lys-Asn-Gly Chemical compound [H]N[C@@H](CCCCN)C(=O)N[C@@H](CC(N)=O)C(=O)NCC(O)=O HQVDJTYKCMIWJP-YUMQZZPRSA-N 0.000 description 6
- YVSHZSUKQHNDHD-KKUMJFAQSA-N Lys-Asn-Phe Chemical compound C1=CC=C(C=C1)C[C@@H](C(=O)O)NC(=O)[C@H](CC(=O)N)NC(=O)[C@H](CCCCN)N YVSHZSUKQHNDHD-KKUMJFAQSA-N 0.000 description 6
- LZWNAOIMTLNMDW-NHCYSSNCSA-N Lys-Asn-Val Chemical compound CC(C)[C@@H](C(=O)O)NC(=O)[C@H](CC(=O)N)NC(=O)[C@H](CCCCN)N LZWNAOIMTLNMDW-NHCYSSNCSA-N 0.000 description 6
- NTBFKPBULZGXQL-KKUMJFAQSA-N Lys-Asp-Tyr Chemical compound NCCCC[C@H](N)C(=O)N[C@@H](CC(O)=O)C(=O)N[C@H](C(O)=O)CC1=CC=C(O)C=C1 NTBFKPBULZGXQL-KKUMJFAQSA-N 0.000 description 6
- YVMQJGWLHRWMDF-MNXVOIDGSA-N Lys-Gln-Ile Chemical compound CC[C@H](C)[C@@H](C(=O)O)NC(=O)[C@H](CCC(=O)N)NC(=O)[C@H](CCCCN)N YVMQJGWLHRWMDF-MNXVOIDGSA-N 0.000 description 6
- GJJQCBVRWDGLMQ-GUBZILKMSA-N Lys-Glu-Ala Chemical compound [H]N[C@@H](CCCCN)C(=O)N[C@@H](CCC(O)=O)C(=O)N[C@@H](C)C(O)=O GJJQCBVRWDGLMQ-GUBZILKMSA-N 0.000 description 6
- VEGLGAOVLFODGC-GUBZILKMSA-N Lys-Glu-Ser Chemical compound [H]N[C@@H](CCCCN)C(=O)N[C@@H](CCC(O)=O)C(=O)N[C@@H](CO)C(O)=O VEGLGAOVLFODGC-GUBZILKMSA-N 0.000 description 6
- GQZMPWBZQALKJO-UWVGGRQHSA-N Lys-Gly-Arg Chemical compound [H]N[C@@H](CCCCN)C(=O)NCC(=O)N[C@@H](CCCNC(N)=N)C(O)=O GQZMPWBZQALKJO-UWVGGRQHSA-N 0.000 description 6
- KNKJPYAZQUFLQK-IHRRRGAJSA-N Lys-His-Arg Chemical compound C1=C(NC=N1)C[C@@H](C(=O)N[C@@H](CCCN=C(N)N)C(=O)O)NC(=O)[C@H](CCCCN)N KNKJPYAZQUFLQK-IHRRRGAJSA-N 0.000 description 6
- MUXNCRWTWBMNHX-SRVKXCTJSA-N Lys-Leu-Asp Chemical compound [H]N[C@@H](CCCCN)C(=O)N[C@@H](CC(C)C)C(=O)N[C@@H](CC(O)=O)C(O)=O MUXNCRWTWBMNHX-SRVKXCTJSA-N 0.000 description 6
- AIRZWUMAHCDDHR-KKUMJFAQSA-N Lys-Leu-Leu Chemical compound [H]N[C@@H](CCCCN)C(=O)N[C@@H](CC(C)C)C(=O)N[C@@H](CC(C)C)C(O)=O AIRZWUMAHCDDHR-KKUMJFAQSA-N 0.000 description 6
- OIQSIMFSVLLWBX-VOAKCMCISA-N Lys-Leu-Thr Chemical compound [H]N[C@@H](CCCCN)C(=O)N[C@@H](CC(C)C)C(=O)N[C@@H]([C@@H](C)O)C(O)=O OIQSIMFSVLLWBX-VOAKCMCISA-N 0.000 description 6
- VUTWYNQUSJWBHO-BZSNNMDCSA-N Lys-Leu-Tyr Chemical compound [H]N[C@@H](CCCCN)C(=O)N[C@@H](CC(C)C)C(=O)N[C@@H](CC1=CC=C(O)C=C1)C(O)=O VUTWYNQUSJWBHO-BZSNNMDCSA-N 0.000 description 6
- RIJCHEVHFWMDKD-SRVKXCTJSA-N Lys-Lys-Asn Chemical compound NCCCC[C@H](N)C(=O)N[C@@H](CCCCN)C(=O)N[C@@H](CC(N)=O)C(O)=O RIJCHEVHFWMDKD-SRVKXCTJSA-N 0.000 description 6
- ALGGDNMLQNFVIZ-SRVKXCTJSA-N Lys-Lys-Asp Chemical compound C(CCN)C[C@@H](C(=O)N[C@@H](CCCCN)C(=O)N[C@@H](CC(=O)O)C(=O)O)N ALGGDNMLQNFVIZ-SRVKXCTJSA-N 0.000 description 6
- ATNKHRAIZCMCCN-BZSNNMDCSA-N Lys-Lys-Phe Chemical compound C1=CC=C(C=C1)C[C@@H](C(=O)O)NC(=O)[C@H](CCCCN)NC(=O)[C@H](CCCCN)N ATNKHRAIZCMCCN-BZSNNMDCSA-N 0.000 description 6
- PLDJDCJLRCYPJB-VOAKCMCISA-N Lys-Lys-Thr Chemical compound [H]N[C@@H](CCCCN)C(=O)N[C@@H](CCCCN)C(=O)N[C@@H]([C@@H](C)O)C(O)=O PLDJDCJLRCYPJB-VOAKCMCISA-N 0.000 description 6
- MIMXMVDLMDMOJD-BZSNNMDCSA-N Lys-Tyr-Leu Chemical compound [H]N[C@@H](CCCCN)C(=O)N[C@@H](CC1=CC=C(O)C=C1)C(=O)N[C@@H](CC(C)C)C(O)=O MIMXMVDLMDMOJD-BZSNNMDCSA-N 0.000 description 6
- WINFHLHJTRGLCV-BZSNNMDCSA-N Lys-Tyr-Lys Chemical compound NCCCC[C@H](N)C(=O)N[C@H](C(=O)N[C@@H](CCCCN)C(O)=O)CC1=CC=C(O)C=C1 WINFHLHJTRGLCV-BZSNNMDCSA-N 0.000 description 6
- QEVRUYFHWJJUHZ-DCAQKATOSA-N Met-Ala-Leu Chemical compound CSCC[C@H](N)C(=O)N[C@@H](C)C(=O)N[C@H](C(O)=O)CC(C)C QEVRUYFHWJJUHZ-DCAQKATOSA-N 0.000 description 6
- BLIPQDLSCFGUFA-GUBZILKMSA-N Met-Arg-Asn Chemical compound CSCC[C@H](N)C(=O)N[C@@H](CCCNC(N)=N)C(=O)N[C@@H](CC(N)=O)C(O)=O BLIPQDLSCFGUFA-GUBZILKMSA-N 0.000 description 6
- JACAKCWAOHKQBV-UWVGGRQHSA-N Met-Gly-Lys Chemical compound CSCC[C@H](N)C(=O)NCC(=O)N[C@H](C(O)=O)CCCCN JACAKCWAOHKQBV-UWVGGRQHSA-N 0.000 description 6
- RMLLCGYYVZKKRT-CIUDSAMLSA-N Met-Ser-Glu Chemical compound CSCC[C@H](N)C(=O)N[C@@H](CO)C(=O)N[C@H](C(O)=O)CCC(O)=O RMLLCGYYVZKKRT-CIUDSAMLSA-N 0.000 description 6
- GGXZOTSDJJTDGB-GUBZILKMSA-N Met-Ser-Val Chemical compound [H]N[C@@H](CCSC)C(=O)N[C@@H](CO)C(=O)N[C@@H](C(C)C)C(O)=O GGXZOTSDJJTDGB-GUBZILKMSA-N 0.000 description 6
- AUEJLPRZGVVDNU-UHFFFAOYSA-N N-L-tyrosyl-L-leucine Natural products CC(C)CC(C(O)=O)NC(=O)C(N)CC1=CC=C(O)C=C1 AUEJLPRZGVVDNU-UHFFFAOYSA-N 0.000 description 6
- AJHCSUXXECOXOY-UHFFFAOYSA-N N-glycyl-L-tryptophan Natural products C1=CC=C2C(CC(NC(=O)CN)C(O)=O)=CNC2=C1 AJHCSUXXECOXOY-UHFFFAOYSA-N 0.000 description 6
- 108010079364 N-glycylalanine Proteins 0.000 description 6
- 101710163270 Nuclease Proteins 0.000 description 6
- AYPMIIKUMNADSU-IHRRRGAJSA-N Phe-Arg-Asn Chemical compound [H]N[C@@H](CC1=CC=CC=C1)C(=O)N[C@@H](CCCNC(N)=N)C(=O)N[C@@H](CC(N)=O)C(O)=O AYPMIIKUMNADSU-IHRRRGAJSA-N 0.000 description 6
- MRNRMSDVVSKPGM-AVGNSLFASA-N Phe-Asn-Gln Chemical compound [H]N[C@@H](CC1=CC=CC=C1)C(=O)N[C@@H](CC(N)=O)C(=O)N[C@@H](CCC(N)=O)C(O)=O MRNRMSDVVSKPGM-AVGNSLFASA-N 0.000 description 6
- CSDMCMITJLKBAH-SOUVJXGZSA-N Phe-Glu-Pro Chemical compound C1C[C@@H](N(C1)C(=O)[C@H](CCC(=O)O)NC(=O)[C@H](CC2=CC=CC=C2)N)C(=O)O CSDMCMITJLKBAH-SOUVJXGZSA-N 0.000 description 6
- BIYWZVCPZIFGPY-QWRGUYRKSA-N Phe-Gly-Ser Chemical compound [H]N[C@@H](CC1=CC=CC=C1)C(=O)NCC(=O)N[C@@H](CO)C(O)=O BIYWZVCPZIFGPY-QWRGUYRKSA-N 0.000 description 6
- WEMYTDDMDBLPMI-DKIMLUQUSA-N Phe-Ile-Lys Chemical compound CC[C@H](C)[C@@H](C(=O)N[C@@H](CCCCN)C(=O)O)NC(=O)[C@H](CC1=CC=CC=C1)N WEMYTDDMDBLPMI-DKIMLUQUSA-N 0.000 description 6
- CBENHWCORLVGEQ-HJOGWXRNSA-N Phe-Phe-Phe Chemical compound C([C@H](N)C(=O)N[C@@H](CC=1C=CC=CC=1)C(=O)N[C@@H](CC=1C=CC=CC=1)C(O)=O)C1=CC=CC=C1 CBENHWCORLVGEQ-HJOGWXRNSA-N 0.000 description 6
- ZYNBEWGJFXTBDU-ACRUOGEOSA-N Phe-Tyr-Leu Chemical compound CC(C)C[C@@H](C(=O)O)NC(=O)[C@H](CC1=CC=C(C=C1)O)NC(=O)[C@H](CC2=CC=CC=C2)N ZYNBEWGJFXTBDU-ACRUOGEOSA-N 0.000 description 6
- KUSYCSMTTHSZOA-DZKIICNBSA-N Phe-Val-Gln Chemical compound CC(C)[C@@H](C(=O)N[C@@H](CCC(=O)N)C(=O)O)NC(=O)[C@H](CC1=CC=CC=C1)N KUSYCSMTTHSZOA-DZKIICNBSA-N 0.000 description 6
- BQMFWUKNOCJDNV-HJWJTTGWSA-N Phe-Val-Ile Chemical compound [H]N[C@@H](CC1=CC=CC=C1)C(=O)N[C@@H](C(C)C)C(=O)N[C@@H]([C@@H](C)CC)C(O)=O BQMFWUKNOCJDNV-HJWJTTGWSA-N 0.000 description 6
- AMBLXEMWFARNNQ-DCAQKATOSA-N Pro-Asn-Leu Chemical compound CC(C)C[C@@H](C(=O)O)NC(=O)[C@H](CC(=O)N)NC(=O)[C@@H]1CCCN1 AMBLXEMWFARNNQ-DCAQKATOSA-N 0.000 description 6
- LXVLKXPFIDDHJG-CIUDSAMLSA-N Pro-Glu-Ser Chemical compound [H]N1CCC[C@H]1C(=O)N[C@@H](CCC(O)=O)C(=O)N[C@@H](CO)C(O)=O LXVLKXPFIDDHJG-CIUDSAMLSA-N 0.000 description 6
- SXMSEHDMNIUTSP-DCAQKATOSA-N Pro-Lys-Asn Chemical compound [H]N1CCC[C@H]1C(=O)N[C@@H](CCCCN)C(=O)N[C@@H](CC(N)=O)C(O)=O SXMSEHDMNIUTSP-DCAQKATOSA-N 0.000 description 6
- SMFQZMGHCODUPQ-ULQDDVLXSA-N Pro-Lys-Phe Chemical compound [H]N1CCC[C@H]1C(=O)N[C@@H](CCCCN)C(=O)N[C@@H](CC1=CC=CC=C1)C(O)=O SMFQZMGHCODUPQ-ULQDDVLXSA-N 0.000 description 6
- WOIFYRZPIORBRY-AVGNSLFASA-N Pro-Lys-Val Chemical compound [H]N1CCC[C@H]1C(=O)N[C@@H](CCCCN)C(=O)N[C@@H](C(C)C)C(O)=O WOIFYRZPIORBRY-AVGNSLFASA-N 0.000 description 6
- SXJOPONICMGFCR-DCAQKATOSA-N Pro-Ser-Lys Chemical compound C1C[C@H](NC1)C(=O)N[C@@H](CO)C(=O)N[C@@H](CCCCN)C(=O)O SXJOPONICMGFCR-DCAQKATOSA-N 0.000 description 6
- VEUACYMXJKXALX-IHRRRGAJSA-N Pro-Tyr-Ser Chemical compound [H]N1CCC[C@H]1C(=O)N[C@@H](CC1=CC=C(O)C=C1)C(=O)N[C@@H](CO)C(O)=O VEUACYMXJKXALX-IHRRRGAJSA-N 0.000 description 6
- FUOGXAQMNJMBFG-WPRPVWTQSA-N Pro-Val-Gly Chemical compound OC(=O)CNC(=O)[C@H](C(C)C)NC(=O)[C@@H]1CCCN1 FUOGXAQMNJMBFG-WPRPVWTQSA-N 0.000 description 6
- JPIDMRXXNMIVKY-VZFHVOOUSA-N Ser-Ala-Thr Chemical compound [H]N[C@@H](CO)C(=O)N[C@@H](C)C(=O)N[C@@H]([C@@H](C)O)C(O)=O JPIDMRXXNMIVKY-VZFHVOOUSA-N 0.000 description 6
- QWZIOCFPXMAXET-CIUDSAMLSA-N Ser-Arg-Gln Chemical compound [H]N[C@@H](CO)C(=O)N[C@@H](CCCNC(N)=N)C(=O)N[C@@H](CCC(N)=O)C(O)=O QWZIOCFPXMAXET-CIUDSAMLSA-N 0.000 description 6
- BYIROAKULFFTEK-CIUDSAMLSA-N Ser-Asp-Lys Chemical compound NCCCC[C@@H](C(O)=O)NC(=O)[C@H](CC(O)=O)NC(=O)[C@@H](N)CO BYIROAKULFFTEK-CIUDSAMLSA-N 0.000 description 6
- ZOHGLPQGEHSLPD-FXQIFTODSA-N Ser-Gln-Glu Chemical compound [H]N[C@@H](CO)C(=O)N[C@@H](CCC(N)=O)C(=O)N[C@@H](CCC(O)=O)C(O)=O ZOHGLPQGEHSLPD-FXQIFTODSA-N 0.000 description 6
- HJEBZBMOTCQYDN-ACZMJKKPSA-N Ser-Glu-Asp Chemical compound [H]N[C@@H](CO)C(=O)N[C@@H](CCC(O)=O)C(=O)N[C@@H](CC(O)=O)C(O)=O HJEBZBMOTCQYDN-ACZMJKKPSA-N 0.000 description 6
- WBINSDOPZHQPPM-AVGNSLFASA-N Ser-Glu-Tyr Chemical compound C1=CC(=CC=C1C[C@@H](C(=O)O)NC(=O)[C@H](CCC(=O)O)NC(=O)[C@H](CO)N)O WBINSDOPZHQPPM-AVGNSLFASA-N 0.000 description 6
- OHKFXGKHSJKKAL-NRPADANISA-N Ser-Glu-Val Chemical compound [H]N[C@@H](CO)C(=O)N[C@@H](CCC(O)=O)C(=O)N[C@@H](C(C)C)C(O)=O OHKFXGKHSJKKAL-NRPADANISA-N 0.000 description 6
- UQFYNFTYDHUIMI-WHFBIAKZSA-N Ser-Gly-Ala Chemical compound OC(=O)[C@H](C)NC(=O)CNC(=O)[C@@H](N)CO UQFYNFTYDHUIMI-WHFBIAKZSA-N 0.000 description 6
- DJACUBDEDBZKLQ-KBIXCLLPSA-N Ser-Ile-Glu Chemical compound [H]N[C@@H](CO)C(=O)N[C@@H]([C@@H](C)CC)C(=O)N[C@@H](CCC(O)=O)C(O)=O DJACUBDEDBZKLQ-KBIXCLLPSA-N 0.000 description 6
- MUJQWSAWLLRJCE-KATARQTJSA-N Ser-Leu-Thr Chemical compound [H]N[C@@H](CO)C(=O)N[C@@H](CC(C)C)C(=O)N[C@@H]([C@@H](C)O)C(O)=O MUJQWSAWLLRJCE-KATARQTJSA-N 0.000 description 6
- PPNPDKGQRFSCAC-CIUDSAMLSA-N Ser-Lys-Asp Chemical compound NCCCC[C@H](NC(=O)[C@@H](N)CO)C(=O)N[C@@H](CC(O)=O)C(O)=O PPNPDKGQRFSCAC-CIUDSAMLSA-N 0.000 description 6
- FPCGZYMRFFIYIH-CIUDSAMLSA-N Ser-Lys-Ser Chemical compound [H]N[C@@H](CO)C(=O)N[C@@H](CCCCN)C(=O)N[C@@H](CO)C(O)=O FPCGZYMRFFIYIH-CIUDSAMLSA-N 0.000 description 6
- XVWDJUROVRQKAE-KKUMJFAQSA-N Ser-Phe-Lys Chemical compound NCCCC[C@@H](C(O)=O)NC(=O)[C@@H](NC(=O)[C@@H](N)CO)CC1=CC=CC=C1 XVWDJUROVRQKAE-KKUMJFAQSA-N 0.000 description 6
- WNDUPCKKKGSKIQ-CIUDSAMLSA-N Ser-Pro-Gln Chemical compound OC[C@H](N)C(=O)N1CCC[C@H]1C(=O)N[C@@H](CCC(N)=O)C(O)=O WNDUPCKKKGSKIQ-CIUDSAMLSA-N 0.000 description 6
- NADLKBTYNKUJEP-KATARQTJSA-N Ser-Thr-Leu Chemical compound [H]N[C@@H](CO)C(=O)N[C@@H]([C@@H](C)O)C(=O)N[C@@H](CC(C)C)C(O)=O NADLKBTYNKUJEP-KATARQTJSA-N 0.000 description 6
- MFQMZDPAZRZAPV-NAKRPEOUSA-N Ser-Val-Ile Chemical compound CC[C@H](C)[C@@H](C(=O)O)NC(=O)[C@H](C(C)C)NC(=O)[C@H](CO)N MFQMZDPAZRZAPV-NAKRPEOUSA-N 0.000 description 6
- VIBXMCZWVUOZLA-OLHMAJIHSA-N Thr-Asn-Asn Chemical compound C[C@H]([C@@H](C(=O)N[C@@H](CC(=O)N)C(=O)N[C@@H](CC(=O)N)C(=O)O)N)O VIBXMCZWVUOZLA-OLHMAJIHSA-N 0.000 description 6
- LAFLAXHTDVNVEL-WDCWCFNPSA-N Thr-Gln-Lys Chemical compound C[C@H]([C@@H](C(=O)N[C@@H](CCC(=O)N)C(=O)N[C@@H](CCCCN)C(=O)O)N)O LAFLAXHTDVNVEL-WDCWCFNPSA-N 0.000 description 6
- XPNSAQMEAVSQRD-FBCQKBJTSA-N Thr-Gly-Gly Chemical compound C[C@@H](O)[C@H](N)C(=O)NCC(=O)NCC(O)=O XPNSAQMEAVSQRD-FBCQKBJTSA-N 0.000 description 6
- ZTPXSEUVYNNZRB-CDMKHQONSA-N Thr-Gly-Phe Chemical compound [H]N[C@@H]([C@@H](C)O)C(=O)NCC(=O)N[C@@H](CC1=CC=CC=C1)C(O)=O ZTPXSEUVYNNZRB-CDMKHQONSA-N 0.000 description 6
- SXAGUVRFGJSFKC-ZEILLAHLSA-N Thr-His-Thr Chemical compound [H]N[C@@H]([C@@H](C)O)C(=O)N[C@@H](CC1=CNC=N1)C(=O)N[C@@H]([C@@H](C)O)C(O)=O SXAGUVRFGJSFKC-ZEILLAHLSA-N 0.000 description 6
- GMXIJHCBTZDAPD-QPHKQPEJSA-N Thr-Ile-Ile Chemical compound CC[C@H](C)[C@@H](C(=O)N[C@@H]([C@@H](C)CC)C(=O)O)NC(=O)[C@H]([C@@H](C)O)N GMXIJHCBTZDAPD-QPHKQPEJSA-N 0.000 description 6
- AHOLTQCAVBSUDP-PPCPHDFISA-N Thr-Ile-Lys Chemical compound CC[C@H](C)[C@H](NC(=O)[C@@H](N)[C@@H](C)O)C(=O)N[C@@H](CCCCN)C(O)=O AHOLTQCAVBSUDP-PPCPHDFISA-N 0.000 description 6
- VTVVYQOXJCZVEB-WDCWCFNPSA-N Thr-Leu-Glu Chemical compound [H]N[C@@H]([C@@H](C)O)C(=O)N[C@@H](CC(C)C)C(=O)N[C@@H](CCC(O)=O)C(O)=O VTVVYQOXJCZVEB-WDCWCFNPSA-N 0.000 description 6
- MECLEFZMPPOEAC-VOAKCMCISA-N Thr-Leu-Lys Chemical compound C[C@H]([C@@H](C(=O)N[C@@H](CC(C)C)C(=O)N[C@@H](CCCCN)C(=O)O)N)O MECLEFZMPPOEAC-VOAKCMCISA-N 0.000 description 6
- PJCYRZVSACOYSN-ZJDVBMNYSA-N Thr-Thr-Met Chemical compound [H]N[C@@H]([C@@H](C)O)C(=O)N[C@@H]([C@@H](C)O)C(=O)N[C@@H](CCSC)C(O)=O PJCYRZVSACOYSN-ZJDVBMNYSA-N 0.000 description 6
- XVHAUVJXBFGUPC-RPTUDFQQSA-N Thr-Tyr-Phe Chemical compound [H]N[C@@H]([C@@H](C)O)C(=O)N[C@@H](CC1=CC=C(O)C=C1)C(=O)N[C@@H](CC1=CC=CC=C1)C(O)=O XVHAUVJXBFGUPC-RPTUDFQQSA-N 0.000 description 6
- HJTYJQVRIQXMHM-XIRDDKMYSA-N Trp-Asp-Lys Chemical compound C1=CC=C2C(=C1)C(=CN2)C[C@@H](C(=O)N[C@@H](CC(=O)O)C(=O)N[C@@H](CCCCN)C(=O)O)N HJTYJQVRIQXMHM-XIRDDKMYSA-N 0.000 description 6
- NKUIXQOJUAEIET-AQZXSJQPSA-N Trp-Asp-Thr Chemical compound C1=CC=C2C(C[C@H](N)C(=O)N[C@@H](CC(O)=O)C(=O)N[C@@H]([C@H](O)C)C(O)=O)=CNC2=C1 NKUIXQOJUAEIET-AQZXSJQPSA-N 0.000 description 6
- KRCPXGSWDOGHAM-XIRDDKMYSA-N Trp-Lys-Asp Chemical compound [H]N[C@@H](CC1=CNC2=C1C=CC=C2)C(=O)N[C@@H](CCCCN)C(=O)N[C@@H](CC(O)=O)C(O)=O KRCPXGSWDOGHAM-XIRDDKMYSA-N 0.000 description 6
- UUIYFDAWNBSWPG-IHPCNDPISA-N Trp-Lys-Lys Chemical compound C1=CC=C2C(=C1)C(=CN2)C[C@@H](C(=O)N[C@@H](CCCCN)C(=O)N[C@@H](CCCCN)C(=O)O)N UUIYFDAWNBSWPG-IHPCNDPISA-N 0.000 description 6
- SCCKSNREWHMKOJ-SRVKXCTJSA-N Tyr-Asn-Ser Chemical compound N[C@@H](Cc1ccc(O)cc1)C(=O)N[C@@H](CC(N)=O)C(=O)N[C@@H](CO)C(O)=O SCCKSNREWHMKOJ-SRVKXCTJSA-N 0.000 description 6
- RCLOWEZASFJFEX-KKUMJFAQSA-N Tyr-Asp-Leu Chemical compound CC(C)C[C@@H](C(O)=O)NC(=O)[C@H](CC(O)=O)NC(=O)[C@@H](N)CC1=CC=C(O)C=C1 RCLOWEZASFJFEX-KKUMJFAQSA-N 0.000 description 6
- ZRPLVTZTKPPSBT-AVGNSLFASA-N Tyr-Glu-Ser Chemical compound [H]N[C@@H](CC1=CC=C(O)C=C1)C(=O)N[C@@H](CCC(O)=O)C(=O)N[C@@H](CO)C(O)=O ZRPLVTZTKPPSBT-AVGNSLFASA-N 0.000 description 6
- IJUTXXAXQODRMW-KBPBESRZSA-N Tyr-Gly-His Chemical compound C1=CC(=CC=C1C[C@@H](C(=O)NCC(=O)N[C@@H](CC2=CN=CN2)C(=O)O)N)O IJUTXXAXQODRMW-KBPBESRZSA-N 0.000 description 6
- ZOBLBMGJKVJVEV-BZSNNMDCSA-N Tyr-Lys-Lys Chemical compound C1=CC(=CC=C1C[C@@H](C(=O)N[C@@H](CCCCN)C(=O)N[C@@H](CCCCN)C(=O)O)N)O ZOBLBMGJKVJVEV-BZSNNMDCSA-N 0.000 description 6
- LRHBBGDMBLFYGL-FHWLQOOXSA-N Tyr-Phe-Glu Chemical compound C([C@H](N)C(=O)N[C@@H](CC=1C=CC=CC=1)C(=O)N[C@@H](CCC(O)=O)C(O)=O)C1=CC=C(O)C=C1 LRHBBGDMBLFYGL-FHWLQOOXSA-N 0.000 description 6
- XGZBEGGGAUQBMB-KJEVXHAQSA-N Tyr-Pro-Thr Chemical compound C[C@H]([C@@H](C(=O)O)NC(=O)[C@@H]1CCCN1C(=O)[C@H](CC2=CC=C(C=C2)O)N)O XGZBEGGGAUQBMB-KJEVXHAQSA-N 0.000 description 6
- MQGGXGKQSVEQHR-KKUMJFAQSA-N Tyr-Ser-Leu Chemical compound CC(C)C[C@@H](C(O)=O)NC(=O)[C@H](CO)NC(=O)[C@@H](N)CC1=CC=C(O)C=C1 MQGGXGKQSVEQHR-KKUMJFAQSA-N 0.000 description 6
- UMSZZGTXGKHTFJ-SRVKXCTJSA-N Tyr-Ser-Ser Chemical compound OC[C@@H](C(O)=O)NC(=O)[C@H](CO)NC(=O)[C@@H](N)CC1=CC=C(O)C=C1 UMSZZGTXGKHTFJ-SRVKXCTJSA-N 0.000 description 6
- PWKMJDQXKCENMF-MEYUZBJRSA-N Tyr-Thr-Leu Chemical compound [H]N[C@@H](CC1=CC=C(O)C=C1)C(=O)N[C@@H]([C@@H](C)O)C(=O)N[C@@H](CC(C)C)C(O)=O PWKMJDQXKCENMF-MEYUZBJRSA-N 0.000 description 6
- YOTRXXBHTZHKLU-BVSLBCMMSA-N Tyr-Trp-Met Chemical compound C([C@H](N)C(=O)N[C@@H](CC=1C2=CC=CC=C2NC=1)C(=O)N[C@@H](CCSC)C(O)=O)C1=CC=C(O)C=C1 YOTRXXBHTZHKLU-BVSLBCMMSA-N 0.000 description 6
- FZSPNKUFROZBSG-ZKWXMUAHSA-N Val-Ala-Asp Chemical compound CC(C)[C@H](N)C(=O)N[C@@H](C)C(=O)N[C@H](C(O)=O)CC(O)=O FZSPNKUFROZBSG-ZKWXMUAHSA-N 0.000 description 6
- QHDXUYOYTPWCSK-RCOVLWMOSA-N Val-Asp-Gly Chemical compound CC(C)[C@@H](C(=O)N[C@@H](CC(=O)O)C(=O)NCC(=O)O)N QHDXUYOYTPWCSK-RCOVLWMOSA-N 0.000 description 6
- ZXAGTABZUOMUDO-GVXVVHGQSA-N Val-Glu-Lys Chemical compound CC(C)[C@@H](C(=O)N[C@@H](CCC(=O)O)C(=O)N[C@@H](CCCCN)C(=O)O)N ZXAGTABZUOMUDO-GVXVVHGQSA-N 0.000 description 6
- SDUBQHUJJWQTEU-XUXIUFHCSA-N Val-Ile-Lys Chemical compound CC[C@H](C)[C@@H](C(=O)N[C@@H](CCCCN)C(=O)O)NC(=O)[C@H](C(C)C)N SDUBQHUJJWQTEU-XUXIUFHCSA-N 0.000 description 6
- OTJMMKPMLUNTQT-AVGNSLFASA-N Val-Leu-Arg Chemical compound CC(C)C[C@@H](C(=O)N[C@@H](CCCN=C(N)N)C(=O)O)NC(=O)[C@H](C(C)C)N OTJMMKPMLUNTQT-AVGNSLFASA-N 0.000 description 6
- LYERIXUFCYVFFX-GVXVVHGQSA-N Val-Leu-Glu Chemical compound CC(C)C[C@@H](C(=O)N[C@@H](CCC(=O)O)C(=O)O)NC(=O)[C@H](C(C)C)N LYERIXUFCYVFFX-GVXVVHGQSA-N 0.000 description 6
- RWOGENDAOGMHLX-DCAQKATOSA-N Val-Lys-Ala Chemical compound C[C@@H](C(=O)O)NC(=O)[C@H](CCCCN)NC(=O)[C@H](C(C)C)N RWOGENDAOGMHLX-DCAQKATOSA-N 0.000 description 6
- MHHAWNPHDLCPLF-ULQDDVLXSA-N Val-Phe-Lys Chemical compound NCCCC[C@@H](C(O)=O)NC(=O)[C@@H](NC(=O)[C@@H](N)C(C)C)CC1=CC=CC=C1 MHHAWNPHDLCPLF-ULQDDVLXSA-N 0.000 description 6
- KISFXYYRKKNLOP-IHRRRGAJSA-N Val-Phe-Ser Chemical compound CC(C)[C@@H](C(=O)N[C@@H](CC1=CC=CC=C1)C(=O)N[C@@H](CO)C(=O)O)N KISFXYYRKKNLOP-IHRRRGAJSA-N 0.000 description 6
- QTPQHINADBYBNA-DCAQKATOSA-N Val-Ser-Lys Chemical compound CC(C)[C@H](N)C(=O)N[C@@H](CO)C(=O)N[C@H](C(O)=O)CCCCN QTPQHINADBYBNA-DCAQKATOSA-N 0.000 description 6
- RLVTVHSDKHBFQP-ULQDDVLXSA-N Val-Tyr-Lys Chemical compound NCCCC[C@@H](C(O)=O)NC(=O)[C@@H](NC(=O)[C@@H](N)C(C)C)CC1=CC=C(O)C=C1 RLVTVHSDKHBFQP-ULQDDVLXSA-N 0.000 description 6
- 108010008685 alanyl-glutamyl-aspartic acid Proteins 0.000 description 6
- 108010044940 alanylglutamine Proteins 0.000 description 6
- 108010050025 alpha-glutamyltryptophan Proteins 0.000 description 6
- 108010013835 arginine glutamate Proteins 0.000 description 6
- 108010009111 arginyl-glycyl-glutamic acid Proteins 0.000 description 6
- 108010077245 asparaginyl-proline Proteins 0.000 description 6
- 108010068265 aspartyltyrosine Proteins 0.000 description 6
- 108010063718 gamma-glutamylaspartic acid Proteins 0.000 description 6
- 108010049041 glutamylalanine Proteins 0.000 description 6
- 108010033719 glycyl-histidyl-glycine Proteins 0.000 description 6
- 108010059898 glycyl-tyrosyl-lysine Proteins 0.000 description 6
- 108010084389 glycyltryptophan Proteins 0.000 description 6
- 108010060857 isoleucyl-valyl-tyrosine Proteins 0.000 description 6
- 108010027338 isoleucylcysteine Proteins 0.000 description 6
- 108010044056 leucyl-phenylalanine Proteins 0.000 description 6
- 108010000761 leucylarginine Proteins 0.000 description 6
- 108010025153 lysyl-alanyl-alanine Proteins 0.000 description 6
- 108010045397 lysyl-tyrosyl-lysine Proteins 0.000 description 6
- 108010009298 lysylglutamic acid Proteins 0.000 description 6
- 108010069117 seryl-lysyl-aspartic acid Proteins 0.000 description 6
- 108010080629 tryptophan-leucine Proteins 0.000 description 6
- 108010078580 tyrosylleucine Proteins 0.000 description 6
- FVSOUJZKYWEFOB-KBIXCLLPSA-N Ala-Gln-Ile Chemical compound CC[C@H](C)[C@@H](C(O)=O)NC(=O)[C@H](CCC(N)=O)NC(=O)[C@H](C)N FVSOUJZKYWEFOB-KBIXCLLPSA-N 0.000 description 5
- AWAXZRDKUHOPBO-GUBZILKMSA-N Ala-Gln-Lys Chemical compound C[C@H](N)C(=O)N[C@@H](CCC(N)=O)C(=O)N[C@@H](CCCCN)C(O)=O AWAXZRDKUHOPBO-GUBZILKMSA-N 0.000 description 5
- QHASENCZLDHBGX-ONGXEEELSA-N Ala-Gly-Phe Chemical compound C[C@H](N)C(=O)NCC(=O)N[C@H](C(O)=O)CC1=CC=CC=C1 QHASENCZLDHBGX-ONGXEEELSA-N 0.000 description 5
- IFKQPMZRDQZSHI-GHCJXIJMSA-N Ala-Ile-Asn Chemical compound [H]N[C@@H](C)C(=O)N[C@@H]([C@@H](C)CC)C(=O)N[C@@H](CC(N)=O)C(O)=O IFKQPMZRDQZSHI-GHCJXIJMSA-N 0.000 description 5
- OKIKVSXTXVVFDV-MMWGEVLESA-N Ala-Ile-Pro Chemical compound CC[C@H](C)[C@@H](C(=O)N1CCC[C@@H]1C(=O)O)NC(=O)[C@H](C)N OKIKVSXTXVVFDV-MMWGEVLESA-N 0.000 description 5
- NINQYGGNRIBFSC-CIUDSAMLSA-N Ala-Lys-Ser Chemical compound NCCCC[C@H](NC(=O)[C@@H](N)C)C(=O)N[C@@H](CO)C(O)=O NINQYGGNRIBFSC-CIUDSAMLSA-N 0.000 description 5
- VYMJAWXRWHJIMS-LKTVYLICSA-N Ala-Tyr-His Chemical compound C[C@@H](C(=O)N[C@@H](CC1=CC=C(C=C1)O)C(=O)N[C@@H](CC2=CN=CN2)C(=O)O)N VYMJAWXRWHJIMS-LKTVYLICSA-N 0.000 description 5
- NYDIVDKTULRINZ-AVGNSLFASA-N Arg-Met-Lys Chemical compound CSCC[C@@H](C(=O)N[C@@H](CCCCN)C(=O)O)NC(=O)[C@H](CCCN=C(N)N)N NYDIVDKTULRINZ-AVGNSLFASA-N 0.000 description 5
- KZXPVYVSHUJCEO-ULQDDVLXSA-N Arg-Phe-Lys Chemical compound NC(=N)NCCC[C@H](N)C(=O)N[C@H](C(=O)N[C@@H](CCCCN)C(O)=O)CC1=CC=CC=C1 KZXPVYVSHUJCEO-ULQDDVLXSA-N 0.000 description 5
- AOHKLEBWKMKITA-IHRRRGAJSA-N Arg-Phe-Ser Chemical compound C1=CC=C(C=C1)C[C@@H](C(=O)N[C@@H](CO)C(=O)O)NC(=O)[C@H](CCCN=C(N)N)N AOHKLEBWKMKITA-IHRRRGAJSA-N 0.000 description 5
- PBSQFBAJKPLRJY-BYULHYEWSA-N Asn-Gly-Ile Chemical compound CC[C@H](C)[C@@H](C(=O)O)NC(=O)CNC(=O)[C@H](CC(=O)N)N PBSQFBAJKPLRJY-BYULHYEWSA-N 0.000 description 5
- FBODFHMLALOPHP-GUBZILKMSA-N Asn-Lys-Glu Chemical compound [H]N[C@@H](CC(N)=O)C(=O)N[C@@H](CCCCN)C(=O)N[C@@H](CCC(O)=O)C(O)=O FBODFHMLALOPHP-GUBZILKMSA-N 0.000 description 5
- RAUPFUCUDBQYHE-AVGNSLFASA-N Asn-Phe-Glu Chemical compound [H]N[C@@H](CC(N)=O)C(=O)N[C@@H](CC1=CC=CC=C1)C(=O)N[C@@H](CCC(O)=O)C(O)=O RAUPFUCUDBQYHE-AVGNSLFASA-N 0.000 description 5
- RVHGJNGNKGDCPX-KKUMJFAQSA-N Asn-Phe-Lys Chemical compound C1=CC=C(C=C1)C[C@@H](C(=O)N[C@@H](CCCCN)C(=O)O)NC(=O)[C@H](CC(=O)N)N RVHGJNGNKGDCPX-KKUMJFAQSA-N 0.000 description 5
- FTNRWCPWDWRPAV-BZSNNMDCSA-N Asn-Phe-Phe Chemical compound C([C@H](NC(=O)[C@H](CC(N)=O)N)C(=O)N[C@@H](CC=1C=CC=CC=1)C(O)=O)C1=CC=CC=C1 FTNRWCPWDWRPAV-BZSNNMDCSA-N 0.000 description 5
- DPWDPEVGACCWTC-SRVKXCTJSA-N Asn-Tyr-Ser Chemical compound [H]N[C@@H](CC(N)=O)C(=O)N[C@@H](CC1=CC=C(O)C=C1)C(=O)N[C@@H](CO)C(O)=O DPWDPEVGACCWTC-SRVKXCTJSA-N 0.000 description 5
- KDFQZBWWPYQBEN-ZLUOBGJFSA-N Asp-Ala-Asn Chemical compound C[C@@H](C(=O)N[C@@H](CC(=O)N)C(=O)O)NC(=O)[C@H](CC(=O)O)N KDFQZBWWPYQBEN-ZLUOBGJFSA-N 0.000 description 5
- UGIBTKGQVWFTGX-BIIVOSGPSA-N Asp-Asn-Pro Chemical compound C1C[C@@H](N(C1)C(=O)[C@H](CC(=O)N)NC(=O)[C@H](CC(=O)O)N)C(=O)O UGIBTKGQVWFTGX-BIIVOSGPSA-N 0.000 description 5
- WCFCYFDBMNFSPA-ACZMJKKPSA-N Asp-Asp-Glu Chemical compound OC(=O)C[C@H](N)C(=O)N[C@@H](CC(O)=O)C(=O)N[C@H](C(O)=O)CCC(O)=O WCFCYFDBMNFSPA-ACZMJKKPSA-N 0.000 description 5
- CLUMZOKVGUWUFD-CIUDSAMLSA-N Asp-Leu-Asn Chemical compound OC(=O)C[C@H](N)C(=O)N[C@@H](CC(C)C)C(=O)N[C@@H](CC(N)=O)C(O)=O CLUMZOKVGUWUFD-CIUDSAMLSA-N 0.000 description 5
- RXBGWGRSWXOBGK-KKUMJFAQSA-N Asp-Lys-Tyr Chemical compound [H]N[C@@H](CC(O)=O)C(=O)N[C@@H](CCCCN)C(=O)N[C@@H](CC1=CC=C(O)C=C1)C(O)=O RXBGWGRSWXOBGK-KKUMJFAQSA-N 0.000 description 5
- JSHWXQIZOCVWIA-ZKWXMUAHSA-N Asp-Ser-Val Chemical compound [H]N[C@@H](CC(O)=O)C(=O)N[C@@H](CO)C(=O)N[C@@H](C(C)C)C(O)=O JSHWXQIZOCVWIA-ZKWXMUAHSA-N 0.000 description 5
- QOCFFCUFZGDHTP-NUMRIWBASA-N Asp-Thr-Gln Chemical compound [H]N[C@@H](CC(O)=O)C(=O)N[C@@H]([C@@H](C)O)C(=O)N[C@@H](CCC(N)=O)C(O)=O QOCFFCUFZGDHTP-NUMRIWBASA-N 0.000 description 5
- KBJVTFWQWXCYCQ-IUKAMOBKSA-N Asp-Thr-Ile Chemical compound [H]N[C@@H](CC(O)=O)C(=O)N[C@@H]([C@@H](C)O)C(=O)N[C@@H]([C@@H](C)CC)C(O)=O KBJVTFWQWXCYCQ-IUKAMOBKSA-N 0.000 description 5
- JGLWFWXGOINXEA-YDHLFZDLSA-N Asp-Val-Tyr Chemical compound OC(=O)C[C@H](N)C(=O)N[C@@H](C(C)C)C(=O)N[C@H](C(O)=O)CC1=CC=C(O)C=C1 JGLWFWXGOINXEA-YDHLFZDLSA-N 0.000 description 5
- 238000010443 CRISPR/Cpf1 gene editing Methods 0.000 description 5
- CYTSBCIIEHUPDU-ACZMJKKPSA-N Gln-Asp-Ala Chemical compound [H]N[C@@H](CCC(N)=O)C(=O)N[C@@H](CC(O)=O)C(=O)N[C@@H](C)C(O)=O CYTSBCIIEHUPDU-ACZMJKKPSA-N 0.000 description 5
- UFNSPPFJOHNXRE-AUTRQRHGSA-N Gln-Gln-Val Chemical compound [H]N[C@@H](CCC(N)=O)C(=O)N[C@@H](CCC(N)=O)C(=O)N[C@@H](C(C)C)C(O)=O UFNSPPFJOHNXRE-AUTRQRHGSA-N 0.000 description 5
- RGAOLBZBLOJUTP-GRLWGSQLSA-N Gln-Ile-Ile Chemical compound CC[C@H](C)[C@@H](C(=O)N[C@@H]([C@@H](C)CC)C(=O)O)NC(=O)[C@H](CCC(=O)N)N RGAOLBZBLOJUTP-GRLWGSQLSA-N 0.000 description 5
- FTIJVMLAGRAYMJ-MNXVOIDGSA-N Gln-Ile-Leu Chemical compound CC(C)C[C@@H](C(O)=O)NC(=O)[C@H]([C@@H](C)CC)NC(=O)[C@@H](N)CCC(N)=O FTIJVMLAGRAYMJ-MNXVOIDGSA-N 0.000 description 5
- ITYRYNUZHPNCIK-GUBZILKMSA-N Glu-Ala-Leu Chemical compound [H]N[C@@H](CCC(O)=O)C(=O)N[C@@H](C)C(=O)N[C@@H](CC(C)C)C(O)=O ITYRYNUZHPNCIK-GUBZILKMSA-N 0.000 description 5
- KIMXNQXJJWWVIN-AVGNSLFASA-N Glu-Cys-Tyr Chemical compound C1=CC(=CC=C1C[C@@H](C(=O)O)NC(=O)[C@H](CS)NC(=O)[C@H](CCC(=O)O)N)O KIMXNQXJJWWVIN-AVGNSLFASA-N 0.000 description 5
- LGYZYFFDELZWRS-DCAQKATOSA-N Glu-Glu-Lys Chemical compound NCCCC[C@@H](C(O)=O)NC(=O)[C@H](CCC(O)=O)NC(=O)[C@@H](N)CCC(O)=O LGYZYFFDELZWRS-DCAQKATOSA-N 0.000 description 5
- ZCOJVESMNGBGLF-GRLWGSQLSA-N Glu-Ile-Ile Chemical compound [H]N[C@@H](CCC(O)=O)C(=O)N[C@@H]([C@@H](C)CC)C(=O)N[C@@H]([C@@H](C)CC)C(O)=O ZCOJVESMNGBGLF-GRLWGSQLSA-N 0.000 description 5
- VGBSZQSKQRMLHD-MNXVOIDGSA-N Glu-Leu-Ile Chemical compound [H]N[C@@H](CCC(O)=O)C(=O)N[C@@H](CC(C)C)C(=O)N[C@@H]([C@@H](C)CC)C(O)=O VGBSZQSKQRMLHD-MNXVOIDGSA-N 0.000 description 5
- WNRZUESNGGDCJX-JYJNAYRXSA-N Glu-Leu-Phe Chemical compound [H]N[C@@H](CCC(O)=O)C(=O)N[C@@H](CC(C)C)C(=O)N[C@@H](CC1=CC=CC=C1)C(O)=O WNRZUESNGGDCJX-JYJNAYRXSA-N 0.000 description 5
- ILWHFUZZCFYSKT-AVGNSLFASA-N Glu-Lys-Leu Chemical compound [H]N[C@@H](CCC(O)=O)C(=O)N[C@@H](CCCCN)C(=O)N[C@@H](CC(C)C)C(O)=O ILWHFUZZCFYSKT-AVGNSLFASA-N 0.000 description 5
- ZGEJRLJEAMPEDV-SRVKXCTJSA-N Glu-Lys-Met Chemical compound CSCC[C@@H](C(=O)O)NC(=O)[C@H](CCCCN)NC(=O)[C@H](CCC(=O)O)N ZGEJRLJEAMPEDV-SRVKXCTJSA-N 0.000 description 5
- YQAQQKPWFOBSMU-WDCWCFNPSA-N Glu-Thr-Leu Chemical compound [H]N[C@@H](CCC(O)=O)C(=O)N[C@@H]([C@@H](C)O)C(=O)N[C@@H](CC(C)C)C(O)=O YQAQQKPWFOBSMU-WDCWCFNPSA-N 0.000 description 5
- LZEUDRYSAZAJIO-AUTRQRHGSA-N Glu-Val-Glu Chemical compound [H]N[C@@H](CCC(O)=O)C(=O)N[C@@H](C(C)C)C(=O)N[C@@H](CCC(O)=O)C(O)=O LZEUDRYSAZAJIO-AUTRQRHGSA-N 0.000 description 5
- OCQUNKSFDYDXBG-QXEWZRGKSA-N Gly-Arg-Ile Chemical compound CC[C@H](C)[C@@H](C(O)=O)NC(=O)[C@@H](NC(=O)CN)CCCN=C(N)N OCQUNKSFDYDXBG-QXEWZRGKSA-N 0.000 description 5
- HDNXXTBKOJKWNN-WDSKDSINSA-N Gly-Glu-Asn Chemical compound NCC(=O)N[C@@H](CCC(O)=O)C(=O)N[C@@H](CC(N)=O)C(O)=O HDNXXTBKOJKWNN-WDSKDSINSA-N 0.000 description 5
- BIRKKBCSAIHDDF-WDSKDSINSA-N Gly-Glu-Cys Chemical compound NCC(=O)N[C@@H](CCC(O)=O)C(=O)N[C@@H](CS)C(O)=O BIRKKBCSAIHDDF-WDSKDSINSA-N 0.000 description 5
- YKJUITHASJAGHO-HOTGVXAUSA-N Gly-Lys-Trp Chemical compound C1=CC=C2C(=C1)C(=CN2)C[C@@H](C(=O)O)NC(=O)[C@H](CCCCN)NC(=O)CN YKJUITHASJAGHO-HOTGVXAUSA-N 0.000 description 5
- ABPRMMYHROQBLY-NKWVEPMBSA-N Gly-Ser-Pro Chemical compound C1C[C@@H](N(C1)C(=O)[C@H](CO)NC(=O)CN)C(=O)O ABPRMMYHROQBLY-NKWVEPMBSA-N 0.000 description 5
- DBUNZBWUWCIELX-JHEQGTHGSA-N Gly-Thr-Glu Chemical compound [H]NCC(=O)N[C@@H]([C@@H](C)O)C(=O)N[C@@H](CCC(O)=O)C(O)=O DBUNZBWUWCIELX-JHEQGTHGSA-N 0.000 description 5
- RIYIFUFFFBIOEU-KBPBESRZSA-N Gly-Tyr-Leu Chemical compound CC(C)C[C@@H](C(O)=O)NC(=O)[C@@H](NC(=O)CN)CC1=CC=C(O)C=C1 RIYIFUFFFBIOEU-KBPBESRZSA-N 0.000 description 5
- MUGLKCQHTUFLGF-WPRPVWTQSA-N Gly-Val-Met Chemical compound CC(C)[C@@H](C(=O)N[C@@H](CCSC)C(=O)O)NC(=O)CN MUGLKCQHTUFLGF-WPRPVWTQSA-N 0.000 description 5
- TVRMJKNELJKNRS-GUBZILKMSA-N His-Glu-Asn Chemical compound C1=C(NC=N1)C[C@@H](C(=O)N[C@@H](CCC(=O)O)C(=O)N[C@@H](CC(=O)N)C(=O)O)N TVRMJKNELJKNRS-GUBZILKMSA-N 0.000 description 5
- CCUSLCQWVMWTIS-IXOXFDKPSA-N His-Thr-Leu Chemical compound [H]N[C@@H](CC1=CNC=N1)C(=O)N[C@@H]([C@@H](C)O)C(=O)N[C@@H](CC(C)C)C(O)=O CCUSLCQWVMWTIS-IXOXFDKPSA-N 0.000 description 5
- NKVZTQVGUNLLQW-JBDRJPRFSA-N Ile-Ala-Ala Chemical compound CC[C@H](C)[C@@H](C(=O)N[C@@H](C)C(=O)N[C@@H](C)C(=O)O)N NKVZTQVGUNLLQW-JBDRJPRFSA-N 0.000 description 5
- HZMLFETXHFHGBB-UGYAYLCHSA-N Ile-Asn-Asp Chemical compound CC[C@H](C)[C@@H](C(=O)N[C@@H](CC(=O)N)C(=O)N[C@@H](CC(=O)O)C(=O)O)N HZMLFETXHFHGBB-UGYAYLCHSA-N 0.000 description 5
- XENGULNPUDGALZ-ZPFDUUQYSA-N Ile-Asn-Leu Chemical compound CC[C@H](C)[C@@H](C(=O)N[C@@H](CC(=O)N)C(=O)N[C@@H](CC(C)C)C(=O)O)N XENGULNPUDGALZ-ZPFDUUQYSA-N 0.000 description 5
- UIEZQYNXCYHMQS-BJDJZHNGSA-N Ile-Lys-Ala Chemical compound CC[C@H](C)[C@@H](C(=O)N[C@@H](CCCCN)C(=O)N[C@@H](C)C(=O)O)N UIEZQYNXCYHMQS-BJDJZHNGSA-N 0.000 description 5
- IDMNOFVUXYYZPF-DKIMLUQUSA-N Ile-Lys-Phe Chemical compound CC[C@H](C)[C@@H](C(=O)N[C@@H](CCCCN)C(=O)N[C@@H](CC1=CC=CC=C1)C(=O)O)N IDMNOFVUXYYZPF-DKIMLUQUSA-N 0.000 description 5
- HJDZMPFEXINXLO-QPHKQPEJSA-N Ile-Thr-Ile Chemical compound CC[C@H](C)[C@@H](C(=O)N[C@@H]([C@@H](C)O)C(=O)N[C@@H]([C@@H](C)CC)C(=O)O)N HJDZMPFEXINXLO-QPHKQPEJSA-N 0.000 description 5
- FXJLRZFMKGHYJP-CFMVVWHZSA-N Ile-Tyr-Asn Chemical compound CC[C@H](C)[C@@H](C(=O)N[C@@H](CC1=CC=C(C=C1)O)C(=O)N[C@@H](CC(=O)N)C(=O)O)N FXJLRZFMKGHYJP-CFMVVWHZSA-N 0.000 description 5
- DZMWFIRHFFVBHS-ZEWNOJEFSA-N Ile-Tyr-Phe Chemical compound CC[C@H](C)[C@@H](C(=O)N[C@@H](CC1=CC=C(C=C1)O)C(=O)N[C@@H](CC2=CC=CC=C2)C(=O)O)N DZMWFIRHFFVBHS-ZEWNOJEFSA-N 0.000 description 5
- ILJREDZFPHTUIE-GUBZILKMSA-N Leu-Asp-Glu Chemical compound [H]N[C@@H](CC(C)C)C(=O)N[C@@H](CC(O)=O)C(=O)N[C@@H](CCC(O)=O)C(O)=O ILJREDZFPHTUIE-GUBZILKMSA-N 0.000 description 5
- YVKSMSDXKMSIRX-GUBZILKMSA-N Leu-Glu-Asn Chemical compound [H]N[C@@H](CC(C)C)C(=O)N[C@@H](CCC(O)=O)C(=O)N[C@@H](CC(N)=O)C(O)=O YVKSMSDXKMSIRX-GUBZILKMSA-N 0.000 description 5
- HGFGEMSVBMCFKK-MNXVOIDGSA-N Leu-Ile-Glu Chemical compound [H]N[C@@H](CC(C)C)C(=O)N[C@@H]([C@@H](C)CC)C(=O)N[C@@H](CCC(O)=O)C(O)=O HGFGEMSVBMCFKK-MNXVOIDGSA-N 0.000 description 5
- KUIDCYNIEJBZBU-AJNGGQMLSA-N Leu-Ile-Leu Chemical compound [H]N[C@@H](CC(C)C)C(=O)N[C@@H]([C@@H](C)CC)C(=O)N[C@@H](CC(C)C)C(O)=O KUIDCYNIEJBZBU-AJNGGQMLSA-N 0.000 description 5
- JNDYEOUZBLOVOF-AVGNSLFASA-N Leu-Leu-Gln Chemical compound [H]N[C@@H](CC(C)C)C(=O)N[C@@H](CC(C)C)C(=O)N[C@@H](CCC(N)=O)C(O)=O JNDYEOUZBLOVOF-AVGNSLFASA-N 0.000 description 5
- LXKNSJLSGPNHSK-KKUMJFAQSA-N Leu-Leu-Lys Chemical compound CC(C)C[C@@H](C(=O)N[C@@H](CC(C)C)C(=O)N[C@@H](CCCCN)C(=O)O)N LXKNSJLSGPNHSK-KKUMJFAQSA-N 0.000 description 5
- AMSSKPUHBUQBOQ-SRVKXCTJSA-N Leu-Ser-Lys Chemical compound CC(C)C[C@@H](C(=O)N[C@@H](CO)C(=O)N[C@@H](CCCCN)C(=O)O)N AMSSKPUHBUQBOQ-SRVKXCTJSA-N 0.000 description 5
- ILDSIMPXNFWKLH-KATARQTJSA-N Leu-Thr-Ser Chemical compound CC(C)C[C@H](N)C(=O)N[C@@H]([C@@H](C)O)C(=O)N[C@@H](CO)C(O)=O ILDSIMPXNFWKLH-KATARQTJSA-N 0.000 description 5
- AXVIGSRGTMNSJU-YESZJQIVSA-N Leu-Tyr-Pro Chemical compound CC(C)C[C@@H](C(=O)N[C@@H](CC1=CC=C(C=C1)O)C(=O)N2CCC[C@@H]2C(=O)O)N AXVIGSRGTMNSJU-YESZJQIVSA-N 0.000 description 5
- NFLFJGGKOHYZJF-BJDJZHNGSA-N Lys-Ala-Ile Chemical compound CC[C@H](C)[C@@H](C(O)=O)NC(=O)[C@H](C)NC(=O)[C@@H](N)CCCCN NFLFJGGKOHYZJF-BJDJZHNGSA-N 0.000 description 5
- GAOJCVKPIGHTGO-UWVGGRQHSA-N Lys-Arg-Gly Chemical compound [H]N[C@@H](CCCCN)C(=O)N[C@@H](CCCNC(N)=N)C(=O)NCC(O)=O GAOJCVKPIGHTGO-UWVGGRQHSA-N 0.000 description 5
- WALVCOOOKULCQM-ULQDDVLXSA-N Lys-Arg-Phe Chemical compound [H]N[C@@H](CCCCN)C(=O)N[C@@H](CCCNC(N)=N)C(=O)N[C@@H](CC1=CC=CC=C1)C(O)=O WALVCOOOKULCQM-ULQDDVLXSA-N 0.000 description 5
- ZQCVMVCVPFYXHZ-SRVKXCTJSA-N Lys-Asn-Lys Chemical compound NCCCC[C@H](N)C(=O)N[C@@H](CC(N)=O)C(=O)N[C@H](C(O)=O)CCCCN ZQCVMVCVPFYXHZ-SRVKXCTJSA-N 0.000 description 5
- QIJVAFLRMVBHMU-KKUMJFAQSA-N Lys-Asp-Phe Chemical compound [H]N[C@@H](CCCCN)C(=O)N[C@@H](CC(O)=O)C(=O)N[C@@H](CC1=CC=CC=C1)C(O)=O QIJVAFLRMVBHMU-KKUMJFAQSA-N 0.000 description 5
- GUYHHBZCBQZLFW-GUBZILKMSA-N Lys-Gln-Cys Chemical compound C(CCN)C[C@@H](C(=O)N[C@@H](CCC(=O)N)C(=O)N[C@@H](CS)C(=O)O)N GUYHHBZCBQZLFW-GUBZILKMSA-N 0.000 description 5
- ZXEUFAVXODIPHC-GUBZILKMSA-N Lys-Glu-Asn Chemical compound NCCCC[C@H](N)C(=O)N[C@@H](CCC(O)=O)C(=O)N[C@@H](CC(N)=O)C(O)=O ZXEUFAVXODIPHC-GUBZILKMSA-N 0.000 description 5
- GRADYHMSAUIKPS-DCAQKATOSA-N Lys-Glu-Gln Chemical compound [H]N[C@@H](CCCCN)C(=O)N[C@@H](CCC(O)=O)C(=O)N[C@@H](CCC(N)=O)C(O)=O GRADYHMSAUIKPS-DCAQKATOSA-N 0.000 description 5
- DCRWPTBMWMGADO-AVGNSLFASA-N Lys-Glu-Leu Chemical compound [H]N[C@@H](CCCCN)C(=O)N[C@@H](CCC(O)=O)C(=O)N[C@@H](CC(C)C)C(O)=O DCRWPTBMWMGADO-AVGNSLFASA-N 0.000 description 5
- QZONCCHVHCOBSK-YUMQZZPRSA-N Lys-Gly-Asn Chemical compound [H]N[C@@H](CCCCN)C(=O)NCC(=O)N[C@@H](CC(N)=O)C(O)=O QZONCCHVHCOBSK-YUMQZZPRSA-N 0.000 description 5
- XNKDCYABMBBEKN-IUCAKERBSA-N Lys-Gly-Gln Chemical compound NCCCC[C@H](N)C(=O)NCC(=O)N[C@H](C(O)=O)CCC(N)=O XNKDCYABMBBEKN-IUCAKERBSA-N 0.000 description 5
- GQFDWEDHOQRNLC-QWRGUYRKSA-N Lys-Gly-Leu Chemical compound CC(C)C[C@@H](C(O)=O)NC(=O)CNC(=O)[C@@H](N)CCCCN GQFDWEDHOQRNLC-QWRGUYRKSA-N 0.000 description 5
- PBLLTSKBTAHDNA-KBPBESRZSA-N Lys-Gly-Phe Chemical compound [H]N[C@@H](CCCCN)C(=O)NCC(=O)N[C@@H](CC1=CC=CC=C1)C(O)=O PBLLTSKBTAHDNA-KBPBESRZSA-N 0.000 description 5
- ZJWIXBZTAAJERF-IHRRRGAJSA-N Lys-Lys-Arg Chemical compound NCCCC[C@H](N)C(=O)N[C@@H](CCCCN)C(=O)N[C@H](C(O)=O)CCCN=C(N)N ZJWIXBZTAAJERF-IHRRRGAJSA-N 0.000 description 5
- GAHJXEMYXKLZRQ-AJNGGQMLSA-N Lys-Lys-Ile Chemical compound [H]N[C@@H](CCCCN)C(=O)N[C@@H](CCCCN)C(=O)N[C@@H]([C@@H](C)CC)C(O)=O GAHJXEMYXKLZRQ-AJNGGQMLSA-N 0.000 description 5
- YDDDRTIPNTWGIG-SRVKXCTJSA-N Lys-Lys-Ser Chemical compound [H]N[C@@H](CCCCN)C(=O)N[C@@H](CCCCN)C(=O)N[C@@H](CO)C(O)=O YDDDRTIPNTWGIG-SRVKXCTJSA-N 0.000 description 5
- TWPCWKVOZDUYAA-KKUMJFAQSA-N Lys-Phe-Asp Chemical compound [H]N[C@@H](CCCCN)C(=O)N[C@@H](CC1=CC=CC=C1)C(=O)N[C@@H](CC(O)=O)C(O)=O TWPCWKVOZDUYAA-KKUMJFAQSA-N 0.000 description 5
- VHTOGMKQXXJOHG-RHYQMDGZSA-N Lys-Thr-Val Chemical compound [H]N[C@@H](CCCCN)C(=O)N[C@@H]([C@@H](C)O)C(=O)N[C@@H](C(C)C)C(O)=O VHTOGMKQXXJOHG-RHYQMDGZSA-N 0.000 description 5
- ORRNBLTZBBESPN-HJWJTTGWSA-N Met-Ile-Phe Chemical compound CSCC[C@H](N)C(=O)N[C@@H]([C@@H](C)CC)C(=O)N[C@H](C(O)=O)CC1=CC=CC=C1 ORRNBLTZBBESPN-HJWJTTGWSA-N 0.000 description 5
- SODXFJOPSCXOHE-IHRRRGAJSA-N Met-Leu-Leu Chemical compound CSCC[C@H](N)C(=O)N[C@@H](CC(C)C)C(=O)N[C@@H](CC(C)C)C(O)=O SODXFJOPSCXOHE-IHRRRGAJSA-N 0.000 description 5
- WPTHAGXMYDRPFD-SRVKXCTJSA-N Met-Lys-Glu Chemical compound CSCC[C@H](N)C(=O)N[C@@H](CCCCN)C(=O)N[C@@H](CCC(O)=O)C(O)=O WPTHAGXMYDRPFD-SRVKXCTJSA-N 0.000 description 5
- PHKBGZKVOJCIMZ-SRVKXCTJSA-N Met-Pro-Arg Chemical compound CSCC[C@H](N)C(=O)N1CCC[C@H]1C(=O)N[C@@H](CCCNC(N)=N)C(O)=O PHKBGZKVOJCIMZ-SRVKXCTJSA-N 0.000 description 5
- ULECEJGNDHWSKD-QEJZJMRPSA-N Phe-Ala-Lys Chemical compound NCCCC[C@@H](C(O)=O)NC(=O)[C@H](C)NC(=O)[C@@H](N)CC1=CC=CC=C1 ULECEJGNDHWSKD-QEJZJMRPSA-N 0.000 description 5
- IUVYJBMTHARMIP-PCBIJLKTSA-N Phe-Asp-Ile Chemical compound [H]N[C@@H](CC1=CC=CC=C1)C(=O)N[C@@H](CC(O)=O)C(=O)N[C@@H]([C@@H](C)CC)C(O)=O IUVYJBMTHARMIP-PCBIJLKTSA-N 0.000 description 5
- OYQBFWWQSVIHBN-FHWLQOOXSA-N Phe-Glu-Phe Chemical compound [H]N[C@@H](CC1=CC=CC=C1)C(=O)N[C@@H](CCC(O)=O)C(=O)N[C@@H](CC1=CC=CC=C1)C(O)=O OYQBFWWQSVIHBN-FHWLQOOXSA-N 0.000 description 5
- RMKGXGPQIPLTFC-KKUMJFAQSA-N Phe-Lys-Asn Chemical compound [H]N[C@@H](CC1=CC=CC=C1)C(=O)N[C@@H](CCCCN)C(=O)N[C@@H](CC(N)=O)C(O)=O RMKGXGPQIPLTFC-KKUMJFAQSA-N 0.000 description 5
- CJZTUKSFZUSNCC-FXQIFTODSA-N Pro-Asp-Asn Chemical compound NC(=O)C[C@@H](C(O)=O)NC(=O)[C@H](CC(O)=O)NC(=O)[C@@H]1CCCN1 CJZTUKSFZUSNCC-FXQIFTODSA-N 0.000 description 5
- CLNJSLSHKJECME-BQBZGAKWSA-N Pro-Gly-Ala Chemical compound OC(=O)[C@H](C)NC(=O)CNC(=O)[C@@H]1CCCN1 CLNJSLSHKJECME-BQBZGAKWSA-N 0.000 description 5
- JIWJRKNYLSHONY-KKUMJFAQSA-N Pro-Phe-Glu Chemical compound [H]N1CCC[C@H]1C(=O)N[C@@H](CC1=CC=CC=C1)C(=O)N[C@@H](CCC(O)=O)C(O)=O JIWJRKNYLSHONY-KKUMJFAQSA-N 0.000 description 5
- RMJZWERKFFNNNS-XGEHTFHBSA-N Pro-Thr-Ser Chemical compound [H]N1CCC[C@H]1C(=O)N[C@@H]([C@@H](C)O)C(=O)N[C@@H](CO)C(O)=O RMJZWERKFFNNNS-XGEHTFHBSA-N 0.000 description 5
- WTUJZHKANPDPIN-CIUDSAMLSA-N Ser-Ala-Lys Chemical compound C[C@@H](C(=O)N[C@@H](CCCCN)C(=O)O)NC(=O)[C@H](CO)N WTUJZHKANPDPIN-CIUDSAMLSA-N 0.000 description 5
- IXUGADGDCQDLSA-FXQIFTODSA-N Ser-Gln-Gln Chemical compound C(CC(=O)N)[C@@H](C(=O)N[C@@H](CCC(=O)N)C(=O)O)NC(=O)[C@H](CO)N IXUGADGDCQDLSA-FXQIFTODSA-N 0.000 description 5
- QKQDTEYDEIJPNK-GUBZILKMSA-N Ser-Glu-Lys Chemical compound NCCCC[C@@H](C(O)=O)NC(=O)[C@H](CCC(O)=O)NC(=O)[C@@H](N)CO QKQDTEYDEIJPNK-GUBZILKMSA-N 0.000 description 5
- IOVHBRCQOGWAQH-ZKWXMUAHSA-N Ser-Gly-Ile Chemical compound [H]N[C@@H](CO)C(=O)NCC(=O)N[C@@H]([C@@H](C)CC)C(O)=O IOVHBRCQOGWAQH-ZKWXMUAHSA-N 0.000 description 5
- MOINZPRHJGTCHZ-MMWGEVLESA-N Ser-Ile-Pro Chemical compound CC[C@H](C)[C@@H](C(=O)N1CCC[C@@H]1C(=O)O)NC(=O)[C@H](CO)N MOINZPRHJGTCHZ-MMWGEVLESA-N 0.000 description 5
- UIPXCLNLUUAMJU-JBDRJPRFSA-N Ser-Ile-Ser Chemical compound [H]N[C@@H](CO)C(=O)N[C@@H]([C@@H](C)CC)C(=O)N[C@@H](CO)C(O)=O UIPXCLNLUUAMJU-JBDRJPRFSA-N 0.000 description 5
- ZIFYDQAFEMIZII-GUBZILKMSA-N Ser-Leu-Glu Chemical compound [H]N[C@@H](CO)C(=O)N[C@@H](CC(C)C)C(=O)N[C@@H](CCC(O)=O)C(O)=O ZIFYDQAFEMIZII-GUBZILKMSA-N 0.000 description 5
- VMLONWHIORGALA-SRVKXCTJSA-N Ser-Leu-Leu Chemical compound CC(C)C[C@@H](C([O-])=O)NC(=O)[C@H](CC(C)C)NC(=O)[C@@H]([NH3+])CO VMLONWHIORGALA-SRVKXCTJSA-N 0.000 description 5
- UGTZYIPOBYXWRW-SRVKXCTJSA-N Ser-Phe-Asp Chemical compound [H]N[C@@H](CO)C(=O)N[C@@H](CC1=CC=CC=C1)C(=O)N[C@@H](CC(O)=O)C(O)=O UGTZYIPOBYXWRW-SRVKXCTJSA-N 0.000 description 5
- UBTNVMGPMYDYIU-HJPIBITLSA-N Ser-Tyr-Ile Chemical compound [H]N[C@@H](CO)C(=O)N[C@@H](CC1=CC=C(O)C=C1)C(=O)N[C@@H]([C@@H](C)CC)C(O)=O UBTNVMGPMYDYIU-HJPIBITLSA-N 0.000 description 5
- SGZVZUCRAVSPKQ-FXQIFTODSA-N Ser-Val-Cys Chemical compound CC(C)[C@@H](C(=O)N[C@@H](CS)C(=O)O)NC(=O)[C@H](CO)N SGZVZUCRAVSPKQ-FXQIFTODSA-N 0.000 description 5
- JVTHIXKSVYEWNI-JRQIVUDYSA-N Thr-Asn-Tyr Chemical compound [H]N[C@@H]([C@@H](C)O)C(=O)N[C@@H](CC(N)=O)C(=O)N[C@@H](CC1=CC=C(O)C=C1)C(O)=O JVTHIXKSVYEWNI-JRQIVUDYSA-N 0.000 description 5
- XOWKUMFHEZLKLT-CIQUZCHMSA-N Thr-Ile-Ala Chemical compound [H]N[C@@H]([C@@H](C)O)C(=O)N[C@@H]([C@@H](C)CC)C(=O)N[C@@H](C)C(O)=O XOWKUMFHEZLKLT-CIQUZCHMSA-N 0.000 description 5
- TZJSEJOXAIWOST-RHYQMDGZSA-N Thr-Lys-Arg Chemical compound C[C@@H](O)[C@H](N)C(=O)N[C@@H](CCCCN)C(=O)N[C@H](C(O)=O)CCCN=C(N)N TZJSEJOXAIWOST-RHYQMDGZSA-N 0.000 description 5
- QNCFWHZVRNXAKW-OEAJRASXSA-N Thr-Lys-Phe Chemical compound [H]N[C@@H]([C@@H](C)O)C(=O)N[C@@H](CCCCN)C(=O)N[C@@H](CC1=CC=CC=C1)C(O)=O QNCFWHZVRNXAKW-OEAJRASXSA-N 0.000 description 5
- VGYVVSQFSSKZRJ-OEAJRASXSA-N Thr-Phe-Lys Chemical compound NCCCC[C@@H](C(O)=O)NC(=O)[C@@H](NC(=O)[C@@H](N)[C@H](O)C)CC1=CC=CC=C1 VGYVVSQFSSKZRJ-OEAJRASXSA-N 0.000 description 5
- 108091028113 Trans-activating crRNA Proteins 0.000 description 5
- IYHNBRUWVBIVJR-IHRRRGAJSA-N Tyr-Gln-Glu Chemical compound OC(=O)CC[C@@H](C(O)=O)NC(=O)[C@H](CCC(N)=O)NC(=O)[C@@H](N)CC1=CC=C(O)C=C1 IYHNBRUWVBIVJR-IHRRRGAJSA-N 0.000 description 5
- BXPOOVDVGWEXDU-WZLNRYEVSA-N Tyr-Ile-Thr Chemical compound [H]N[C@@H](CC1=CC=C(O)C=C1)C(=O)N[C@@H]([C@@H](C)CC)C(=O)N[C@@H]([C@@H](C)O)C(O)=O BXPOOVDVGWEXDU-WZLNRYEVSA-N 0.000 description 5
- BYAKMYBZADCNMN-JYJNAYRXSA-N Tyr-Lys-Gln Chemical compound [H]N[C@@H](CC1=CC=C(O)C=C1)C(=O)N[C@@H](CCCCN)C(=O)N[C@@H](CCC(N)=O)C(O)=O BYAKMYBZADCNMN-JYJNAYRXSA-N 0.000 description 5
- LVILBTSHPTWDGE-PMVMPFDFSA-N Tyr-Trp-Lys Chemical compound C([C@H](N)C(=O)N[C@@H](CC=1C2=CC=CC=C2NC=1)C(=O)N[C@@H](CCCCN)C(O)=O)C1=CC=C(O)C=C1 LVILBTSHPTWDGE-PMVMPFDFSA-N 0.000 description 5
- HZDQUVQEVVYDDA-ACRUOGEOSA-N Tyr-Tyr-Leu Chemical compound C([C@@H](C(=O)N[C@@H](CC(C)C)C(O)=O)NC(=O)[C@@H](N)CC=1C=CC(O)=CC=1)C1=CC=C(O)C=C1 HZDQUVQEVVYDDA-ACRUOGEOSA-N 0.000 description 5
- OBKOPLHSRDATFO-XHSDSOJGSA-N Tyr-Val-Pro Chemical compound CC(C)[C@@H](C(=O)N1CCC[C@@H]1C(=O)O)NC(=O)[C@H](CC2=CC=C(C=C2)O)N OBKOPLHSRDATFO-XHSDSOJGSA-N 0.000 description 5
- GXAZTLJYINLMJL-LAEOZQHASA-N Val-Asn-Gln Chemical compound CC(C)[C@@H](C(=O)N[C@@H](CC(=O)N)C(=O)N[C@@H](CCC(=O)N)C(=O)O)N GXAZTLJYINLMJL-LAEOZQHASA-N 0.000 description 5
- UDNYEPLJTRDMEJ-RCOVLWMOSA-N Val-Asn-Gly Chemical compound CC(C)[C@@H](C(=O)N[C@@H](CC(=O)N)C(=O)NCC(=O)O)N UDNYEPLJTRDMEJ-RCOVLWMOSA-N 0.000 description 5
- NZGOVKLVQNOEKP-YDHLFZDLSA-N Val-Phe-Asn Chemical compound CC(C)[C@@H](C(=O)N[C@@H](CC1=CC=CC=C1)C(=O)N[C@@H](CC(=O)N)C(=O)O)N NZGOVKLVQNOEKP-YDHLFZDLSA-N 0.000 description 5
- 108010041407 alanylaspartic acid Proteins 0.000 description 5
- 238000004458 analytical method Methods 0.000 description 5
- XBGGUPMXALFZOT-UHFFFAOYSA-N glycyl-L-tyrosine hemihydrate Natural products NCC(=O)NC(C(O)=O)CC1=CC=C(O)C=C1 XBGGUPMXALFZOT-UHFFFAOYSA-N 0.000 description 5
- 229930027917 kanamycin Natural products 0.000 description 5
- 229960000318 kanamycin Drugs 0.000 description 5
- SBUJHOSQTJFQJX-NOAMYHISSA-N kanamycin Chemical compound O[C@@H]1[C@@H](O)[C@H](O)[C@@H](CN)O[C@@H]1O[C@H]1[C@H](O)[C@@H](O[C@@H]2[C@@H]([C@@H](N)[C@H](O)[C@@H](CO)O2)O)[C@H](N)C[C@@H]1N SBUJHOSQTJFQJX-NOAMYHISSA-N 0.000 description 5
- 229930182823 kanamycin A Natural products 0.000 description 5
- 108010091871 leucylmethionine Proteins 0.000 description 5
- 108010074082 phenylalanyl-alanyl-lysine Proteins 0.000 description 5
- 108010061238 threonyl-glycine Proteins 0.000 description 5
- 108010031491 threonyl-lysyl-glutamic acid Proteins 0.000 description 5
- 101710169336 5'-deoxyadenosine deaminase Proteins 0.000 description 4
- 102000055025 Adenosine deaminases Human genes 0.000 description 4
- KGHLGJAXYSVNJP-WHFBIAKZSA-N Asp-Ser-Gly Chemical compound OC(=O)C[C@H](N)C(=O)N[C@@H](CO)C(=O)NCC(O)=O KGHLGJAXYSVNJP-WHFBIAKZSA-N 0.000 description 4
- GIKOVDMXBAFXDF-NHCYSSNCSA-N Asp-Val-Leu Chemical compound [H]N[C@@H](CC(O)=O)C(=O)N[C@@H](C(C)C)C(=O)N[C@@H](CC(C)C)C(O)=O GIKOVDMXBAFXDF-NHCYSSNCSA-N 0.000 description 4
- 238000010453 CRISPR/Cas method Methods 0.000 description 4
- 102000053602 DNA Human genes 0.000 description 4
- 102000004190 Enzymes Human genes 0.000 description 4
- 108090000790 Enzymes Proteins 0.000 description 4
- 108020005004 Guide RNA Proteins 0.000 description 4
- 108091028043 Nucleic acid sequence Proteins 0.000 description 4
- PYTKULIABVRXSC-BWBBJGPYSA-N Ser-Ser-Thr Chemical compound [H]N[C@@H](CO)C(=O)N[C@@H](CO)C(=O)N[C@@H]([C@@H](C)O)C(O)=O PYTKULIABVRXSC-BWBBJGPYSA-N 0.000 description 4
- LUMQYLVYUIRHHU-YJRXYDGGSA-N Tyr-Ser-Thr Chemical compound [H]N[C@@H](CC1=CC=C(O)C=C1)C(=O)N[C@@H](CO)C(=O)N[C@@H]([C@@H](C)O)C(O)=O LUMQYLVYUIRHHU-YJRXYDGGSA-N 0.000 description 4
- 108010069205 aspartyl-phenylalanine Proteins 0.000 description 4
- 230000001580 bacterial effect Effects 0.000 description 4
- 238000006243 chemical reaction Methods 0.000 description 4
- 238000010276 construction Methods 0.000 description 4
- BPHPUYQFMNQIOC-NXRLNHOXSA-N isopropyl beta-D-thiogalactopyranoside Chemical compound CC(C)S[C@@H]1O[C@H](CO)[C@H](O)[C@H](O)[C@H]1O BPHPUYQFMNQIOC-NXRLNHOXSA-N 0.000 description 4
- 101150112117 nprE gene Proteins 0.000 description 4
- DXHINQUXBZNUCF-MELADBBJSA-N Asn-Tyr-Pro Chemical compound C1C[C@@H](N(C1)C(=O)[C@H](CC2=CC=C(C=C2)O)NC(=O)[C@H](CC(=O)N)N)C(=O)O DXHINQUXBZNUCF-MELADBBJSA-N 0.000 description 3
- CBHVAFXKOYAHOY-NHCYSSNCSA-N Asn-Val-Leu Chemical compound [H]N[C@@H](CC(N)=O)C(=O)N[C@@H](C(C)C)C(=O)N[C@@H](CC(C)C)C(O)=O CBHVAFXKOYAHOY-NHCYSSNCSA-N 0.000 description 3
- 241000193830 Bacillus <bacterium> Species 0.000 description 3
- 101100329224 Coprinopsis cinerea (strain Okayama-7 / 130 / ATCC MYA-4618 / FGSC 9003) cpf1 gene Proteins 0.000 description 3
- PMGDADKJMCOXHX-UHFFFAOYSA-N L-Arginyl-L-glutamin-acetat Natural products NC(=N)NCCCC(N)C(=O)NC(CCC(N)=O)C(O)=O PMGDADKJMCOXHX-UHFFFAOYSA-N 0.000 description 3
- RRSLQOLASISYTB-CIUDSAMLSA-N Leu-Cys-Asp Chemical compound [H]N[C@@H](CC(C)C)C(=O)N[C@@H](CS)C(=O)N[C@@H](CC(O)=O)C(O)=O RRSLQOLASISYTB-CIUDSAMLSA-N 0.000 description 3
- 108091005804 Peptidases Proteins 0.000 description 3
- OXKJSGGTHFMGDT-UFYCRDLUSA-N Phe-Phe-Arg Chemical compound C([C@H](N)C(=O)N[C@@H](CC=1C=CC=CC=1)C(=O)N[C@@H](CCCNC(N)=N)C(O)=O)C1=CC=CC=C1 OXKJSGGTHFMGDT-UFYCRDLUSA-N 0.000 description 3
- 239000004365 Protease Substances 0.000 description 3
- ABSXSJZNRAQDDI-KJEVXHAQSA-N Tyr-Val-Thr Chemical compound [H]N[C@@H](CC1=CC=C(O)C=C1)C(=O)N[C@@H](C(C)C)C(=O)N[C@@H]([C@@H](C)O)C(O)=O ABSXSJZNRAQDDI-KJEVXHAQSA-N 0.000 description 3
- 230000009471 action Effects 0.000 description 3
- 101150009206 aprE gene Proteins 0.000 description 3
- 230000033228 biological regulation Effects 0.000 description 3
- 101150059443 cas12a gene Proteins 0.000 description 3
- 230000000052 comparative effect Effects 0.000 description 3
- 230000000295 complement effect Effects 0.000 description 3
- 239000013604 expression vector Substances 0.000 description 3
- 239000012634 fragment Substances 0.000 description 3
- 239000007788 liquid Substances 0.000 description 3
- 230000004048 modification Effects 0.000 description 3
- 238000012986 modification Methods 0.000 description 3
- 102000004196 processed proteins & peptides Human genes 0.000 description 3
- HAJWYALLJIATCX-FXQIFTODSA-N Asn-Asn-Arg Chemical compound C(C[C@@H](C(=O)O)NC(=O)[C@H](CC(=O)N)NC(=O)[C@H](CC(=O)N)N)CN=C(N)N HAJWYALLJIATCX-FXQIFTODSA-N 0.000 description 2
- 101150005393 CBF1 gene Proteins 0.000 description 2
- 102000016928 DNA-directed DNA polymerase Human genes 0.000 description 2
- 108010014303 DNA-directed DNA polymerase Proteins 0.000 description 2
- 102100035102 E3 ubiquitin-protein ligase MYCBP2 Human genes 0.000 description 2
- ZMXZGYLINVNTKH-DZKIICNBSA-N Gln-Val-Phe Chemical compound NC(=O)CC[C@H](N)C(=O)N[C@@H](C(C)C)C(=O)N[C@H](C(O)=O)CC1=CC=CC=C1 ZMXZGYLINVNTKH-DZKIICNBSA-N 0.000 description 2
- RJONUNZIMUXUOI-GUBZILKMSA-N Glu-Asn-Lys Chemical compound C(CCN)C[C@@H](C(=O)O)NC(=O)[C@H](CC(=O)N)NC(=O)[C@H](CCC(=O)O)N RJONUNZIMUXUOI-GUBZILKMSA-N 0.000 description 2
- UMIRPYLZFKOEOH-YVNDNENWSA-N Glu-Gln-Ile Chemical compound [H]N[C@@H](CCC(O)=O)C(=O)N[C@@H](CCC(N)=O)C(=O)N[C@@H]([C@@H](C)CC)C(O)=O UMIRPYLZFKOEOH-YVNDNENWSA-N 0.000 description 2
- GGJOGFJIPPGNRK-JSGCOSHPSA-N Glu-Gly-Trp Chemical compound C1=CC=C2C(C[C@H](NC(=O)CNC(=O)[C@H](CCC(O)=O)N)C(O)=O)=CNC2=C1 GGJOGFJIPPGNRK-JSGCOSHPSA-N 0.000 description 2
- JBSLJUPMTYLLFH-MELADBBJSA-N His-His-Pro Chemical compound C1C[C@@H](N(C1)C(=O)[C@H](CC2=CN=CN2)NC(=O)[C@H](CC3=CN=CN3)N)C(=O)O JBSLJUPMTYLLFH-MELADBBJSA-N 0.000 description 2
- 229930010555 Inosine Natural products 0.000 description 2
- UGQMRVRMYYASKQ-KQYNXXCUSA-N Inosine Chemical compound O[C@@H]1[C@H](O)[C@@H](CO)O[C@H]1N1C2=NC=NC(O)=C2N=C1 UGQMRVRMYYASKQ-KQYNXXCUSA-N 0.000 description 2
- SJNZALDHDUYDBU-IHRRRGAJSA-N Lys-Arg-Lys Chemical compound NCCCC[C@H](N)C(=O)N[C@@H](CCCN=C(N)N)C(=O)N[C@@H](CCCCN)C(O)=O SJNZALDHDUYDBU-IHRRRGAJSA-N 0.000 description 2
- JHNOXVASMSXSNB-WEDXCCLWSA-N Lys-Thr-Gly Chemical compound [H]N[C@@H](CCCCN)C(=O)N[C@@H]([C@@H](C)O)C(=O)NCC(O)=O JHNOXVASMSXSNB-WEDXCCLWSA-N 0.000 description 2
- 240000007594 Oryza sativa Species 0.000 description 2
- 235000007164 Oryza sativa Nutrition 0.000 description 2
- 102000035195 Peptidases Human genes 0.000 description 2
- 102000006382 Ribonucleases Human genes 0.000 description 2
- 108010083644 Ribonucleases Proteins 0.000 description 2
- 102000004389 Ribonucleoproteins Human genes 0.000 description 2
- 108010081734 Ribonucleoproteins Proteins 0.000 description 2
- FAPWRFPIFSIZLT-UHFFFAOYSA-M Sodium chloride Chemical compound [Na+].[Cl-] FAPWRFPIFSIZLT-UHFFFAOYSA-M 0.000 description 2
- UKEVLVBHRKWECS-LSJOCFKGSA-N Val-Ile-Gly Chemical compound CC[C@H](C)[C@@H](C(=O)NCC(=O)O)NC(=O)[C@H](C(C)C)N UKEVLVBHRKWECS-LSJOCFKGSA-N 0.000 description 2
- 150000001413 amino acids Chemical group 0.000 description 2
- 239000003242 anti bacterial agent Substances 0.000 description 2
- 229940088710 antibiotic agent Drugs 0.000 description 2
- 238000003776 cleavage reaction Methods 0.000 description 2
- OPTASPLRGRRNAP-UHFFFAOYSA-N cytosine Chemical compound NC=1C=CNC(=O)N=1 OPTASPLRGRRNAP-UHFFFAOYSA-N 0.000 description 2
- 238000013461 design Methods 0.000 description 2
- 230000006870 function Effects 0.000 description 2
- 230000004927 fusion Effects 0.000 description 2
- 108010037850 glycylvaline Proteins 0.000 description 2
- 239000001963 growth medium Substances 0.000 description 2
- 229960003786 inosine Drugs 0.000 description 2
- 239000002609 medium Substances 0.000 description 2
- 239000000203 mixture Substances 0.000 description 2
- 230000037361 pathway Effects 0.000 description 2
- 229920001184 polypeptide Polymers 0.000 description 2
- 238000004321 preservation Methods 0.000 description 2
- 108010029020 prolylglycine Proteins 0.000 description 2
- 235000009566 rice Nutrition 0.000 description 2
- 230000007017 scission Effects 0.000 description 2
- 125000006850 spacer group Chemical group 0.000 description 2
- 238000006467 substitution reaction Methods 0.000 description 2
- 101150011891 tadA gene Proteins 0.000 description 2
- 230000009466 transformation Effects 0.000 description 2
- 238000011282 treatment Methods 0.000 description 2
- 238000011144 upstream manufacturing Methods 0.000 description 2
- 238000012795 verification Methods 0.000 description 2
- CNKBMTKICGGSCQ-ACRUOGEOSA-N (2S)-2-[[(2S)-2-[[(2S)-2,6-diamino-1-oxohexyl]amino]-1-oxo-3-phenylpropyl]amino]-3-(4-hydroxyphenyl)propanoic acid Chemical compound C([C@H](NC(=O)[C@@H](N)CCCCN)C(=O)N[C@@H](CC=1C=CC(O)=CC=1)C(O)=O)C1=CC=CC=C1 CNKBMTKICGGSCQ-ACRUOGEOSA-N 0.000 description 1
- XVZCXCTYGHPNEM-IHRRRGAJSA-N (2s)-1-[(2s)-2-[[(2s)-2-amino-4-methylpentanoyl]amino]-4-methylpentanoyl]pyrrolidine-2-carboxylic acid Chemical compound CC(C)C[C@H](N)C(=O)N[C@@H](CC(C)C)C(=O)N1CCC[C@H]1C(O)=O XVZCXCTYGHPNEM-IHRRRGAJSA-N 0.000 description 1
- PAHHYDSPOXDASW-VGWMRTNUSA-N (2s)-6-amino-2-[[(2s)-6-amino-2-[[(2s)-1-[(2s)-2-amino-3-hydroxypropanoyl]pyrrolidine-2-carbonyl]amino]hexanoyl]amino]hexanoic acid Chemical compound NCCCC[C@@H](C(O)=O)NC(=O)[C@H](CCCCN)NC(=O)[C@@H]1CCCN1C(=O)[C@@H](N)CO PAHHYDSPOXDASW-VGWMRTNUSA-N 0.000 description 1
- 108091032973 (ribonucleotides)n+m Proteins 0.000 description 1
- IPAMZHCXCQLRJR-UHFFFAOYSA-N 2-[[2-[[2-[(2-amino-3-methylbutanoyl)amino]-4-methylpentanoyl]amino]-3-methylbutanoyl]amino]-4-methylpentanoic acid Chemical compound CC(C)CC(C(O)=O)NC(=O)C(C(C)C)NC(=O)C(CC(C)C)NC(=O)C(N)C(C)C IPAMZHCXCQLRJR-UHFFFAOYSA-N 0.000 description 1
- STACJSVFHSEZJV-GHCJXIJMSA-N Ala-Asn-Ile Chemical compound [H]N[C@@H](C)C(=O)N[C@@H](CC(N)=O)C(=O)N[C@@H]([C@@H](C)CC)C(O)=O STACJSVFHSEZJV-GHCJXIJMSA-N 0.000 description 1
- ZVFVBBGVOILKPO-WHFBIAKZSA-N Ala-Gly-Ala Chemical compound C[C@H](N)C(=O)NCC(=O)N[C@@H](C)C(O)=O ZVFVBBGVOILKPO-WHFBIAKZSA-N 0.000 description 1
- CKLDHDOIYBVUNP-KBIXCLLPSA-N Ala-Ile-Glu Chemical compound [H]N[C@@H](C)C(=O)N[C@@H]([C@@H](C)CC)C(=O)N[C@@H](CCC(O)=O)C(O)=O CKLDHDOIYBVUNP-KBIXCLLPSA-N 0.000 description 1
- RZZMZYZXNJRPOJ-BJDJZHNGSA-N Ala-Ile-Lys Chemical compound CC[C@H](C)[C@@H](C(=O)N[C@@H](CCCCN)C(=O)O)NC(=O)[C@H](C)N RZZMZYZXNJRPOJ-BJDJZHNGSA-N 0.000 description 1
- YHKANGMVQWRMAP-DCAQKATOSA-N Ala-Leu-Arg Chemical compound C[C@H](N)C(=O)N[C@@H](CC(C)C)C(=O)N[C@H](C(O)=O)CCCN=C(N)N YHKANGMVQWRMAP-DCAQKATOSA-N 0.000 description 1
- SDZRIBWEVVRDQI-CIUDSAMLSA-N Ala-Lys-Asp Chemical compound [H]N[C@@H](C)C(=O)N[C@@H](CCCCN)C(=O)N[C@@H](CC(O)=O)C(O)=O SDZRIBWEVVRDQI-CIUDSAMLSA-N 0.000 description 1
- PMQXMXAASGFUDX-SRVKXCTJSA-N Ala-Lys-Leu Chemical compound CC(C)C[C@@H](C(O)=O)NC(=O)[C@@H](NC(=O)[C@H](C)N)CCCCN PMQXMXAASGFUDX-SRVKXCTJSA-N 0.000 description 1
- KQESEZXHYOUIIM-CQDKDKBSSA-N Ala-Lys-Tyr Chemical compound [H]N[C@@H](C)C(=O)N[C@@H](CCCCN)C(=O)N[C@@H](CC1=CC=C(O)C=C1)C(O)=O KQESEZXHYOUIIM-CQDKDKBSSA-N 0.000 description 1
- WEZNQZHACPSMEF-QEJZJMRPSA-N Ala-Phe-Lys Chemical compound NCCCC[C@@H](C(O)=O)NC(=O)[C@@H](NC(=O)[C@@H](N)C)CC1=CC=CC=C1 WEZNQZHACPSMEF-QEJZJMRPSA-N 0.000 description 1
- IOFVWPYSRSCWHI-JXUBOQSCSA-N Ala-Thr-Leu Chemical compound CC(C)C[C@@H](C(O)=O)NC(=O)[C@H]([C@@H](C)O)NC(=O)[C@H](C)N IOFVWPYSRSCWHI-JXUBOQSCSA-N 0.000 description 1
- QOIGKCBMXUCDQU-KDXUFGMBSA-N Ala-Thr-Pro Chemical compound C[C@H]([C@@H](C(=O)N1CCC[C@@H]1C(=O)O)NC(=O)[C@H](C)N)O QOIGKCBMXUCDQU-KDXUFGMBSA-N 0.000 description 1
- DCGLNNVKIZXQOJ-FXQIFTODSA-N Arg-Asn-Ala Chemical compound C[C@@H](C(=O)O)NC(=O)[C@H](CC(=O)N)NC(=O)[C@H](CCCN=C(N)N)N DCGLNNVKIZXQOJ-FXQIFTODSA-N 0.000 description 1
- LKDHUGLXOHYINY-XUXIUFHCSA-N Arg-Ile-Lys Chemical compound CC[C@H](C)[C@@H](C(=O)N[C@@H](CCCCN)C(=O)O)NC(=O)[C@H](CCCN=C(N)N)N LKDHUGLXOHYINY-XUXIUFHCSA-N 0.000 description 1
- MJINRRBEMOLJAK-DCAQKATOSA-N Arg-Lys-Asp Chemical compound OC(=O)C[C@@H](C(O)=O)NC(=O)[C@H](CCCCN)NC(=O)[C@@H](N)CCCN=C(N)N MJINRRBEMOLJAK-DCAQKATOSA-N 0.000 description 1
- LCBSSOCDWUTQQV-SDDRHHMPSA-N Arg-Met-Pro Chemical compound CSCC[C@@H](C(=O)N1CCC[C@@H]1C(=O)O)NC(=O)[C@H](CCCN=C(N)N)N LCBSSOCDWUTQQV-SDDRHHMPSA-N 0.000 description 1
- PRLPSDIHSRITSF-UNQGMJICSA-N Arg-Phe-Thr Chemical compound [H]N[C@@H](CCCNC(N)=N)C(=O)N[C@@H](CC1=CC=CC=C1)C(=O)N[C@@H]([C@@H](C)O)C(O)=O PRLPSDIHSRITSF-UNQGMJICSA-N 0.000 description 1
- ULBHWNVWSCJLCO-NHCYSSNCSA-N Arg-Val-Glu Chemical compound OC(=O)CC[C@@H](C(O)=O)NC(=O)[C@H](C(C)C)NC(=O)[C@@H](N)CCCN=C(N)N ULBHWNVWSCJLCO-NHCYSSNCSA-N 0.000 description 1
- PDQBXRSOSCTGKY-ACZMJKKPSA-N Asn-Ala-Gln Chemical compound C[C@@H](C(=O)N[C@@H](CCC(=O)N)C(=O)O)NC(=O)[C@H](CC(=O)N)N PDQBXRSOSCTGKY-ACZMJKKPSA-N 0.000 description 1
- CMLGVVWQQHUXOZ-GHCJXIJMSA-N Asn-Ala-Ile Chemical compound [H]N[C@@H](CC(N)=O)C(=O)N[C@@H](C)C(=O)N[C@@H]([C@@H](C)CC)C(O)=O CMLGVVWQQHUXOZ-GHCJXIJMSA-N 0.000 description 1
- VDCIPFYVCICPEC-FXQIFTODSA-N Asn-Arg-Ala Chemical compound [H]N[C@@H](CC(N)=O)C(=O)N[C@@H](CCCNC(N)=N)C(=O)N[C@@H](C)C(O)=O VDCIPFYVCICPEC-FXQIFTODSA-N 0.000 description 1
- JJGRJMKUOYXZRA-LPEHRKFASA-N Asn-Arg-Pro Chemical compound C1C[C@@H](N(C1)C(=O)[C@H](CCCN=C(N)N)NC(=O)[C@H](CC(=O)N)N)C(=O)O JJGRJMKUOYXZRA-LPEHRKFASA-N 0.000 description 1
- WPOLSNAQGVHROR-GUBZILKMSA-N Asn-Gln-Leu Chemical compound CC(C)C[C@@H](C(=O)O)NC(=O)[C@H](CCC(=O)N)NC(=O)[C@H](CC(=O)N)N WPOLSNAQGVHROR-GUBZILKMSA-N 0.000 description 1
- DXVMJJNAOVECBA-WHFBIAKZSA-N Asn-Gly-Asn Chemical compound NC(=O)C[C@H](N)C(=O)NCC(=O)N[C@@H](CC(N)=O)C(O)=O DXVMJJNAOVECBA-WHFBIAKZSA-N 0.000 description 1
- SPCONPVIDFMDJI-QSFUFRPTSA-N Asn-Ile-Val Chemical compound [H]N[C@@H](CC(N)=O)C(=O)N[C@@H]([C@@H](C)CC)C(=O)N[C@@H](C(C)C)C(O)=O SPCONPVIDFMDJI-QSFUFRPTSA-N 0.000 description 1
- GLWFAWNYGWBMOC-SRVKXCTJSA-N Asn-Leu-Leu Chemical compound [H]N[C@@H](CC(N)=O)C(=O)N[C@@H](CC(C)C)C(=O)N[C@@H](CC(C)C)C(O)=O GLWFAWNYGWBMOC-SRVKXCTJSA-N 0.000 description 1
- FTSAJSADJCMDHH-CIUDSAMLSA-N Asn-Lys-Asp Chemical compound C(CCN)C[C@@H](C(=O)N[C@@H](CC(=O)O)C(=O)O)NC(=O)[C@H](CC(=O)N)N FTSAJSADJCMDHH-CIUDSAMLSA-N 0.000 description 1
- AWXDRZJQCVHCIT-DCAQKATOSA-N Asn-Pro-Lys Chemical compound NCCCC[C@@H](C(O)=O)NC(=O)[C@@H]1CCCN1C(=O)[C@@H](N)CC(N)=O AWXDRZJQCVHCIT-DCAQKATOSA-N 0.000 description 1
- QIRJQYQOIKBPBZ-IHRRRGAJSA-N Asn-Tyr-Arg Chemical compound [H]N[C@@H](CC(N)=O)C(=O)N[C@@H](CC1=CC=C(O)C=C1)C(=O)N[C@@H](CCCNC(N)=N)C(O)=O QIRJQYQOIKBPBZ-IHRRRGAJSA-N 0.000 description 1
- KTDWFWNZLLFEFU-KKUMJFAQSA-N Asn-Tyr-His Chemical compound C1=CC(=CC=C1C[C@@H](C(=O)N[C@@H](CC2=CN=CN2)C(=O)O)NC(=O)[C@H](CC(=O)N)N)O KTDWFWNZLLFEFU-KKUMJFAQSA-N 0.000 description 1
- BEHQTVDBCLSCBY-CFMVVWHZSA-N Asn-Tyr-Ile Chemical compound [H]N[C@@H](CC(N)=O)C(=O)N[C@@H](CC1=CC=C(O)C=C1)C(=O)N[C@@H]([C@@H](C)CC)C(O)=O BEHQTVDBCLSCBY-CFMVVWHZSA-N 0.000 description 1
- XACXDSRQIXRMNS-OLHMAJIHSA-N Asp-Asn-Thr Chemical compound C[C@H]([C@@H](C(=O)O)NC(=O)[C@H](CC(=O)N)NC(=O)[C@H](CC(=O)O)N)O XACXDSRQIXRMNS-OLHMAJIHSA-N 0.000 description 1
- QXHVOUSPVAWEMX-ZLUOBGJFSA-N Asp-Asp-Ser Chemical compound OC(=O)C[C@H](N)C(=O)N[C@@H](CC(O)=O)C(=O)N[C@@H](CO)C(O)=O QXHVOUSPVAWEMX-ZLUOBGJFSA-N 0.000 description 1
- VZNOVQKGJQJOCS-SRVKXCTJSA-N Asp-Asp-Tyr Chemical compound [H]N[C@@H](CC(O)=O)C(=O)N[C@@H](CC(O)=O)C(=O)N[C@@H](CC1=CC=C(O)C=C1)C(O)=O VZNOVQKGJQJOCS-SRVKXCTJSA-N 0.000 description 1
- RATOMFTUDRYMKX-ACZMJKKPSA-N Asp-Glu-Cys Chemical compound C(CC(=O)O)[C@@H](C(=O)N[C@@H](CS)C(=O)O)NC(=O)[C@H](CC(=O)O)N RATOMFTUDRYMKX-ACZMJKKPSA-N 0.000 description 1
- YDJVIBMKAMQPPP-LAEOZQHASA-N Asp-Glu-Val Chemical compound [H]N[C@@H](CC(O)=O)C(=O)N[C@@H](CCC(O)=O)C(=O)N[C@@H](C(C)C)C(O)=O YDJVIBMKAMQPPP-LAEOZQHASA-N 0.000 description 1
- RPUYTJJZXQBWDT-SRVKXCTJSA-N Asp-Phe-Ser Chemical compound C1=CC=C(C=C1)C[C@@H](C(=O)N[C@@H](CO)C(=O)O)NC(=O)[C@H](CC(=O)O)N RPUYTJJZXQBWDT-SRVKXCTJSA-N 0.000 description 1
- WMLFFCRUSPNENW-ZLUOBGJFSA-N Asp-Ser-Ala Chemical compound [H]N[C@@H](CC(O)=O)C(=O)N[C@@H](CO)C(=O)N[C@@H](C)C(O)=O WMLFFCRUSPNENW-ZLUOBGJFSA-N 0.000 description 1
- 241000894006 Bacteria Species 0.000 description 1
- 108700026883 Bacteria AprE Proteins 0.000 description 1
- 102220484559 C-type lectin domain family 4 member A_H36L_mutation Human genes 0.000 description 1
- 108091079001 CRISPR RNA Proteins 0.000 description 1
- 238000010356 CRISPR-Cas9 genome editing Methods 0.000 description 1
- HPZAJRPYUIHDIN-BZSNNMDCSA-N Cys-Tyr-Phe Chemical compound C1=CC=C(C=C1)C[C@@H](C(=O)O)NC(=O)[C@H](CC2=CC=C(C=C2)O)NC(=O)[C@H](CS)N HPZAJRPYUIHDIN-BZSNNMDCSA-N 0.000 description 1
- 230000007018 DNA scission Effects 0.000 description 1
- 241000196324 Embryophyta Species 0.000 description 1
- 108091029865 Exogenous DNA Proteins 0.000 description 1
- 241000233866 Fungi Species 0.000 description 1
- PCKOTDPDHIBGRW-CIUDSAMLSA-N Gln-Cys-Arg Chemical compound C(C[C@@H](C(=O)O)NC(=O)[C@H](CS)NC(=O)[C@H](CCC(=O)N)N)CN=C(N)N PCKOTDPDHIBGRW-CIUDSAMLSA-N 0.000 description 1
- KDXKFBSNIJYNNR-YVNDNENWSA-N Gln-Glu-Ile Chemical compound [H]N[C@@H](CCC(N)=O)C(=O)N[C@@H](CCC(O)=O)C(=O)N[C@@H]([C@@H](C)CC)C(O)=O KDXKFBSNIJYNNR-YVNDNENWSA-N 0.000 description 1
- LFIVHGMKWFGUGK-IHRRRGAJSA-N Gln-Glu-Phe Chemical compound C1=CC=C(C=C1)C[C@@H](C(=O)O)NC(=O)[C@H](CCC(=O)O)NC(=O)[C@H](CCC(=O)N)N LFIVHGMKWFGUGK-IHRRRGAJSA-N 0.000 description 1
- YXQCLIVLWCKCRS-RYUDHWBXSA-N Gln-Gly-Tyr Chemical compound C1=CC(=CC=C1C[C@@H](C(=O)O)NC(=O)CNC(=O)[C@H](CCC(=O)N)N)O YXQCLIVLWCKCRS-RYUDHWBXSA-N 0.000 description 1
- HDUDGCZEOZEFOA-KBIXCLLPSA-N Gln-Ile-Ala Chemical compound CC[C@H](C)[C@@H](C(=O)N[C@@H](C)C(=O)O)NC(=O)[C@H](CCC(=O)N)N HDUDGCZEOZEFOA-KBIXCLLPSA-N 0.000 description 1
- ITZWDGBYBPUZRG-KBIXCLLPSA-N Gln-Ile-Ser Chemical compound [H]N[C@@H](CCC(N)=O)C(=O)N[C@@H]([C@@H](C)CC)C(=O)N[C@@H](CO)C(O)=O ITZWDGBYBPUZRG-KBIXCLLPSA-N 0.000 description 1
- FLLRAEJOLZPSMN-CIUDSAMLSA-N Glu-Asn-Arg Chemical compound OC(=O)CC[C@H](N)C(=O)N[C@@H](CC(N)=O)C(=O)N[C@H](C(O)=O)CCCN=C(N)N FLLRAEJOLZPSMN-CIUDSAMLSA-N 0.000 description 1
- VAZZOGXDUQSVQF-NUMRIWBASA-N Glu-Asn-Thr Chemical compound C[C@H]([C@@H](C(=O)O)NC(=O)[C@H](CC(=O)N)NC(=O)[C@H](CCC(=O)O)N)O VAZZOGXDUQSVQF-NUMRIWBASA-N 0.000 description 1
- XKPOCESCRTVRPL-KBIXCLLPSA-N Glu-Cys-Ile Chemical compound [H]N[C@@H](CCC(O)=O)C(=O)N[C@@H](CS)C(=O)N[C@@H]([C@@H](C)CC)C(O)=O XKPOCESCRTVRPL-KBIXCLLPSA-N 0.000 description 1
- XHUCVVHRLNPZSZ-CIUDSAMLSA-N Glu-Gln-Glu Chemical compound [H]N[C@@H](CCC(O)=O)C(=O)N[C@@H](CCC(N)=O)C(=O)N[C@@H](CCC(O)=O)C(O)=O XHUCVVHRLNPZSZ-CIUDSAMLSA-N 0.000 description 1
- GRHXUHCFENOCOS-ZPFDUUQYSA-N Glu-Ile-Met Chemical compound CC[C@H](C)[C@@H](C(=O)N[C@@H](CCSC)C(=O)O)NC(=O)[C@H](CCC(=O)O)N GRHXUHCFENOCOS-ZPFDUUQYSA-N 0.000 description 1
- HVYWQYLBVXMXSV-GUBZILKMSA-N Glu-Leu-Ala Chemical compound [H]N[C@@H](CCC(O)=O)C(=O)N[C@@H](CC(C)C)C(=O)N[C@@H](C)C(O)=O HVYWQYLBVXMXSV-GUBZILKMSA-N 0.000 description 1
- IRXNJYPKBVERCW-DCAQKATOSA-N Glu-Leu-Glu Chemical compound [H]N[C@@H](CCC(O)=O)C(=O)N[C@@H](CC(C)C)C(=O)N[C@@H](CCC(O)=O)C(O)=O IRXNJYPKBVERCW-DCAQKATOSA-N 0.000 description 1
- RBXSZQRSEGYDFG-GUBZILKMSA-N Glu-Lys-Ser Chemical compound [H]N[C@@H](CCC(O)=O)C(=O)N[C@@H](CCCCN)C(=O)N[C@@H](CO)C(O)=O RBXSZQRSEGYDFG-GUBZILKMSA-N 0.000 description 1
- FGSGPLRPQCZBSQ-AVGNSLFASA-N Glu-Phe-Ser Chemical compound [H]N[C@@H](CCC(O)=O)C(=O)N[C@@H](CC1=CC=CC=C1)C(=O)N[C@@H](CO)C(O)=O FGSGPLRPQCZBSQ-AVGNSLFASA-N 0.000 description 1
- VNCNWQPIQYAMAK-ACZMJKKPSA-N Glu-Ser-Ser Chemical compound [H]N[C@@H](CCC(O)=O)C(=O)N[C@@H](CO)C(=O)N[C@@H](CO)C(O)=O VNCNWQPIQYAMAK-ACZMJKKPSA-N 0.000 description 1
- RMWAOBGCZZSJHE-UMNHJUIQSA-N Glu-Val-Pro Chemical compound CC(C)[C@@H](C(=O)N1CCC[C@@H]1C(=O)O)NC(=O)[C@H](CCC(=O)O)N RMWAOBGCZZSJHE-UMNHJUIQSA-N 0.000 description 1
- BRFJMRSRMOMIMU-WHFBIAKZSA-N Gly-Ala-Asn Chemical compound NCC(=O)N[C@@H](C)C(=O)N[C@@H](CC(N)=O)C(O)=O BRFJMRSRMOMIMU-WHFBIAKZSA-N 0.000 description 1
- WKJKBELXHCTHIJ-WPRPVWTQSA-N Gly-Arg-Val Chemical compound CC(C)[C@@H](C(O)=O)NC(=O)[C@@H](NC(=O)CN)CCCN=C(N)N WKJKBELXHCTHIJ-WPRPVWTQSA-N 0.000 description 1
- BGVYNAQWHSTTSP-BYULHYEWSA-N Gly-Asn-Ile Chemical compound [H]NCC(=O)N[C@@H](CC(N)=O)C(=O)N[C@@H]([C@@H](C)CC)C(O)=O BGVYNAQWHSTTSP-BYULHYEWSA-N 0.000 description 1
- KTSZUNRRYXPZTK-BQBZGAKWSA-N Gly-Gln-Glu Chemical compound NCC(=O)N[C@@H](CCC(N)=O)C(=O)N[C@@H](CCC(O)=O)C(O)=O KTSZUNRRYXPZTK-BQBZGAKWSA-N 0.000 description 1
- XPJBQTCXPJNIFE-ZETCQYMHSA-N Gly-Gly-Leu Chemical compound CC(C)C[C@@H](C(O)=O)NC(=O)CNC(=O)CN XPJBQTCXPJNIFE-ZETCQYMHSA-N 0.000 description 1
- HKSNHPVETYYJBK-LAEOZQHASA-N Gly-Ile-Glu Chemical compound CC[C@H](C)[C@@H](C(=O)N[C@@H](CCC(=O)O)C(=O)O)NC(=O)CN HKSNHPVETYYJBK-LAEOZQHASA-N 0.000 description 1
- UESJMAMHDLEHGM-NHCYSSNCSA-N Gly-Ile-Leu Chemical compound NCC(=O)N[C@@H]([C@@H](C)CC)C(=O)N[C@@H](CC(C)C)C(O)=O UESJMAMHDLEHGM-NHCYSSNCSA-N 0.000 description 1
- SCWYHUQOOFRVHP-MBLNEYKQSA-N Gly-Ile-Thr Chemical compound NCC(=O)N[C@@H]([C@@H](C)CC)C(=O)N[C@@H]([C@@H](C)O)C(O)=O SCWYHUQOOFRVHP-MBLNEYKQSA-N 0.000 description 1
- DHNXGWVNLFPOMQ-KBPBESRZSA-N Gly-Phe-His Chemical compound C1=CC=C(C=C1)C[C@@H](C(=O)N[C@@H](CC2=CN=CN2)C(=O)O)NC(=O)CN DHNXGWVNLFPOMQ-KBPBESRZSA-N 0.000 description 1
- WNZOCXUOGVYYBJ-CDMKHQONSA-N Gly-Phe-Thr Chemical compound C[C@H]([C@@H](C(=O)O)NC(=O)[C@H](CC1=CC=CC=C1)NC(=O)CN)O WNZOCXUOGVYYBJ-CDMKHQONSA-N 0.000 description 1
- VSLXGYMEHVAJBH-DLOVCJGASA-N His-Ala-Leu Chemical compound [H]N[C@@H](CC1=CNC=N1)C(=O)N[C@@H](C)C(=O)N[C@@H](CC(C)C)C(O)=O VSLXGYMEHVAJBH-DLOVCJGASA-N 0.000 description 1
- ISQOVWDWRUONJH-YESZJQIVSA-N His-Tyr-Pro Chemical compound C1C[C@@H](N(C1)C(=O)[C@H](CC2=CC=C(C=C2)O)NC(=O)[C@H](CC3=CN=CN3)N)C(=O)O ISQOVWDWRUONJH-YESZJQIVSA-N 0.000 description 1
- HDOYNXLPTRQLAD-JBDRJPRFSA-N Ile-Ala-Ser Chemical compound CC[C@H](C)[C@@H](C(=O)N[C@@H](C)C(=O)N[C@@H](CO)C(=O)O)N HDOYNXLPTRQLAD-JBDRJPRFSA-N 0.000 description 1
- LEDRIAHEWDJRMF-CFMVVWHZSA-N Ile-Asn-Tyr Chemical compound CC[C@H](C)[C@H](N)C(=O)N[C@@H](CC(N)=O)C(=O)N[C@H](C(O)=O)CC1=CC=C(O)C=C1 LEDRIAHEWDJRMF-CFMVVWHZSA-N 0.000 description 1
- QSPLUJGYOPZINY-ZPFDUUQYSA-N Ile-Asp-Lys Chemical compound CC[C@H](C)[C@@H](C(=O)N[C@@H](CC(=O)O)C(=O)N[C@@H](CCCCN)C(=O)O)N QSPLUJGYOPZINY-ZPFDUUQYSA-N 0.000 description 1
- DFJJAVZIHDFOGQ-MNXVOIDGSA-N Ile-Glu-Lys Chemical compound CC[C@H](C)[C@@H](C(=O)N[C@@H](CCC(=O)O)C(=O)N[C@@H](CCCCN)C(=O)O)N DFJJAVZIHDFOGQ-MNXVOIDGSA-N 0.000 description 1
- JXMSHKFPDIUYGS-SIUGBPQLSA-N Ile-Glu-Tyr Chemical compound CC[C@H](C)[C@@H](C(=O)N[C@@H](CCC(=O)O)C(=O)N[C@@H](CC1=CC=C(C=C1)O)C(=O)O)N JXMSHKFPDIUYGS-SIUGBPQLSA-N 0.000 description 1
- NHJKZMDIMMTVCK-QXEWZRGKSA-N Ile-Gly-Arg Chemical compound CC[C@H](C)[C@H](N)C(=O)NCC(=O)N[C@H](C(O)=O)CCCN=C(N)N NHJKZMDIMMTVCK-QXEWZRGKSA-N 0.000 description 1
- PWDSHAAAFXISLE-SXTJYALSSA-N Ile-Ile-Asp Chemical compound CC[C@H](C)[C@H](N)C(=O)N[C@@H]([C@@H](C)CC)C(=O)N[C@@H](CC(O)=O)C(O)=O PWDSHAAAFXISLE-SXTJYALSSA-N 0.000 description 1
- OUUCIIJSBIBCHB-ZPFDUUQYSA-N Ile-Leu-Asp Chemical compound CC[C@H](C)[C@H](N)C(=O)N[C@@H](CC(C)C)C(=O)N[C@@H](CC(O)=O)C(O)=O OUUCIIJSBIBCHB-ZPFDUUQYSA-N 0.000 description 1
- PNTWNAXGBOZMBO-MNXVOIDGSA-N Ile-Lys-Gln Chemical compound CC[C@H](C)[C@@H](C(=O)N[C@@H](CCCCN)C(=O)N[C@@H](CCC(=O)N)C(=O)O)N PNTWNAXGBOZMBO-MNXVOIDGSA-N 0.000 description 1
- YSGBJIQXTIVBHZ-AJNGGQMLSA-N Ile-Lys-Leu Chemical compound CC[C@H](C)[C@H](N)C(=O)N[C@@H](CCCCN)C(=O)N[C@@H](CC(C)C)C(O)=O YSGBJIQXTIVBHZ-AJNGGQMLSA-N 0.000 description 1
- FFJQAEYLAQMGDL-MGHWNKPDSA-N Ile-Lys-Tyr Chemical compound CC[C@H](C)[C@H](N)C(=O)N[C@@H](CCCCN)C(=O)N[C@H](C(O)=O)CC1=CC=C(O)C=C1 FFJQAEYLAQMGDL-MGHWNKPDSA-N 0.000 description 1
- IIWQTXMUALXGOV-PCBIJLKTSA-N Ile-Phe-Asp Chemical compound CC[C@H](C)[C@@H](C(=O)N[C@@H](CC1=CC=CC=C1)C(=O)N[C@@H](CC(=O)O)C(=O)O)N IIWQTXMUALXGOV-PCBIJLKTSA-N 0.000 description 1
- NLZVTPYXYXMCIP-XUXIUFHCSA-N Ile-Pro-Lys Chemical compound CC[C@H](C)[C@H](N)C(=O)N1CCC[C@H]1C(=O)N[C@@H](CCCCN)C(O)=O NLZVTPYXYXMCIP-XUXIUFHCSA-N 0.000 description 1
- CZWANIQKACCEKW-CYDGBPFRSA-N Ile-Pro-Met Chemical compound CC[C@H](C)[C@@H](C(=O)N1CCC[C@H]1C(=O)N[C@@H](CCSC)C(=O)O)N CZWANIQKACCEKW-CYDGBPFRSA-N 0.000 description 1
- VGSPNSSCMOHRRR-BJDJZHNGSA-N Ile-Ser-Lys Chemical compound CC[C@H](C)[C@@H](C(=O)N[C@@H](CO)C(=O)N[C@@H](CCCCN)C(=O)O)N VGSPNSSCMOHRRR-BJDJZHNGSA-N 0.000 description 1
- NAFIFZNBSPWYOO-RWRJDSDZSA-N Ile-Thr-Gln Chemical compound CC[C@H](C)[C@@H](C(=O)N[C@@H]([C@@H](C)O)C(=O)N[C@@H](CCC(=O)N)C(=O)O)N NAFIFZNBSPWYOO-RWRJDSDZSA-N 0.000 description 1
- COWHUQXTSYTKQC-RWRJDSDZSA-N Ile-Thr-Glu Chemical compound CC[C@H](C)[C@@H](C(=O)N[C@@H]([C@@H](C)O)C(=O)N[C@@H](CCC(=O)O)C(=O)O)N COWHUQXTSYTKQC-RWRJDSDZSA-N 0.000 description 1
- HQLSBZFLOUHQJK-STECZYCISA-N Ile-Tyr-Arg Chemical compound CC[C@H](C)[C@@H](C(=O)N[C@@H](CC1=CC=C(C=C1)O)C(=O)N[C@@H](CCCN=C(N)N)C(=O)O)N HQLSBZFLOUHQJK-STECZYCISA-N 0.000 description 1
- 108020005350 Initiator Codon Proteins 0.000 description 1
- RCFDOSNHHZGBOY-UHFFFAOYSA-N L-isoleucyl-L-alanine Natural products CCC(C)C(N)C(=O)NC(C)C(O)=O RCFDOSNHHZGBOY-UHFFFAOYSA-N 0.000 description 1
- LHSGPCFBGJHPCY-UHFFFAOYSA-N L-leucine-L-tyrosine Natural products CC(C)CC(N)C(=O)NC(C(O)=O)CC1=CC=C(O)C=C1 LHSGPCFBGJHPCY-UHFFFAOYSA-N 0.000 description 1
- NFHJQETXTSDZSI-DCAQKATOSA-N Leu-Cys-Arg Chemical compound CC(C)C[C@H](N)C(=O)N[C@@H](CS)C(=O)N[C@@H](CCCN=C(N)N)C(O)=O NFHJQETXTSDZSI-DCAQKATOSA-N 0.000 description 1
- PIHFVNPEAHFNLN-KKUMJFAQSA-N Leu-Cys-Tyr Chemical compound CC(C)C[C@@H](C(=O)N[C@@H](CS)C(=O)N[C@@H](CC1=CC=C(C=C1)O)C(=O)O)N PIHFVNPEAHFNLN-KKUMJFAQSA-N 0.000 description 1
- DLCXCECTCPKKCD-GUBZILKMSA-N Leu-Gln-Asn Chemical compound [H]N[C@@H](CC(C)C)C(=O)N[C@@H](CCC(N)=O)C(=O)N[C@@H](CC(N)=O)C(O)=O DLCXCECTCPKKCD-GUBZILKMSA-N 0.000 description 1
- OGUUKPXUTHOIAV-SDDRHHMPSA-N Leu-Glu-Pro Chemical compound CC(C)C[C@@H](C(=O)N[C@@H](CCC(=O)O)C(=O)N1CCC[C@@H]1C(=O)O)N OGUUKPXUTHOIAV-SDDRHHMPSA-N 0.000 description 1
- ZFNLIDNJUWNIJL-WDCWCFNPSA-N Leu-Glu-Thr Chemical compound [H]N[C@@H](CC(C)C)C(=O)N[C@@H](CCC(O)=O)C(=O)N[C@@H]([C@@H](C)O)C(O)=O ZFNLIDNJUWNIJL-WDCWCFNPSA-N 0.000 description 1
- DBSLVQBXKVKDKJ-BJDJZHNGSA-N Leu-Ile-Ala Chemical compound [H]N[C@@H](CC(C)C)C(=O)N[C@@H]([C@@H](C)CC)C(=O)N[C@@H](C)C(O)=O DBSLVQBXKVKDKJ-BJDJZHNGSA-N 0.000 description 1
- KOSWSHVQIVTVQF-ZPFDUUQYSA-N Leu-Ile-Asp Chemical compound [H]N[C@@H](CC(C)C)C(=O)N[C@@H]([C@@H](C)CC)C(=O)N[C@@H](CC(O)=O)C(O)=O KOSWSHVQIVTVQF-ZPFDUUQYSA-N 0.000 description 1
- IFMPDNRWZZEZSL-SRVKXCTJSA-N Leu-Leu-Cys Chemical compound CC(C)C[C@H](N)C(=O)N[C@@H](CC(C)C)C(=O)N[C@@H](CS)C(O)=O IFMPDNRWZZEZSL-SRVKXCTJSA-N 0.000 description 1
- YOKVEHGYYQEQOP-QWRGUYRKSA-N Leu-Leu-Gly Chemical compound CC(C)C[C@H](N)C(=O)N[C@@H](CC(C)C)C(=O)NCC(O)=O YOKVEHGYYQEQOP-QWRGUYRKSA-N 0.000 description 1
- UBZGNBKMIJHOHL-BZSNNMDCSA-N Leu-Leu-Phe Chemical compound CC(C)C[C@H]([NH3+])C(=O)N[C@@H](CC(C)C)C(=O)N[C@H](C([O-])=O)CC1=CC=CC=C1 UBZGNBKMIJHOHL-BZSNNMDCSA-N 0.000 description 1
- HVHRPWQEQHIQJF-AVGNSLFASA-N Leu-Lys-Glu Chemical compound [H]N[C@@H](CC(C)C)C(=O)N[C@@H](CCCCN)C(=O)N[C@@H](CCC(O)=O)C(O)=O HVHRPWQEQHIQJF-AVGNSLFASA-N 0.000 description 1
- IZPVWNSAVUQBGP-CIUDSAMLSA-N Leu-Ser-Asp Chemical compound [H]N[C@@H](CC(C)C)C(=O)N[C@@H](CO)C(=O)N[C@@H](CC(O)=O)C(O)=O IZPVWNSAVUQBGP-CIUDSAMLSA-N 0.000 description 1
- LMDVGHQPPPLYAR-IHRRRGAJSA-N Leu-Val-His Chemical compound N[C@@H](CC(C)C)C(=O)N[C@@H](C(C)C)C(=O)N[C@@H](CC1=CNC=N1)C(=O)O LMDVGHQPPPLYAR-IHRRRGAJSA-N 0.000 description 1
- 102000003960 Ligases Human genes 0.000 description 1
- 108090000364 Ligases Proteins 0.000 description 1
- FZIJIFCXUCZHOL-CIUDSAMLSA-N Lys-Ala-Ala Chemical compound OC(=O)[C@H](C)NC(=O)[C@H](C)NC(=O)[C@@H](N)CCCCN FZIJIFCXUCZHOL-CIUDSAMLSA-N 0.000 description 1
- 108010062166 Lys-Asn-Asp Proteins 0.000 description 1
- FACUGMGEFUEBTI-SRVKXCTJSA-N Lys-Asn-Leu Chemical compound CC(C)C[C@@H](C(O)=O)NC(=O)[C@H](CC(N)=O)NC(=O)[C@@H](N)CCCCN FACUGMGEFUEBTI-SRVKXCTJSA-N 0.000 description 1
- FLCMXEFCTLXBTL-DCAQKATOSA-N Lys-Asp-Arg Chemical compound C(CCN)C[C@@H](C(=O)N[C@@H](CC(=O)O)C(=O)N[C@@H](CCCN=C(N)N)C(=O)O)N FLCMXEFCTLXBTL-DCAQKATOSA-N 0.000 description 1
- NNCDAORZCMPZPX-GUBZILKMSA-N Lys-Gln-Ser Chemical compound C(CCN)C[C@@H](C(=O)N[C@@H](CCC(=O)N)C(=O)N[C@@H](CO)C(=O)O)N NNCDAORZCMPZPX-GUBZILKMSA-N 0.000 description 1
- PAMDBWYMLWOELY-SDDRHHMPSA-N Lys-Glu-Pro Chemical compound C1C[C@@H](N(C1)C(=O)[C@H](CCC(=O)O)NC(=O)[C@H](CCCCN)N)C(=O)O PAMDBWYMLWOELY-SDDRHHMPSA-N 0.000 description 1
- MXMDJEJWERYPMO-XUXIUFHCSA-N Lys-Ile-Arg Chemical compound [H]N[C@@H](CCCCN)C(=O)N[C@@H]([C@@H](C)CC)C(=O)N[C@@H](CCCNC(N)=N)C(O)=O MXMDJEJWERYPMO-XUXIUFHCSA-N 0.000 description 1
- NCZIQZYZPUPMKY-PPCPHDFISA-N Lys-Ile-Thr Chemical compound [H]N[C@@H](CCCCN)C(=O)N[C@@H]([C@@H](C)CC)C(=O)N[C@@H]([C@@H](C)O)C(O)=O NCZIQZYZPUPMKY-PPCPHDFISA-N 0.000 description 1
- SKRGVGLIRUGANF-AVGNSLFASA-N Lys-Leu-Glu Chemical compound [H]N[C@@H](CCCCN)C(=O)N[C@@H](CC(C)C)C(=O)N[C@@H](CCC(O)=O)C(O)=O SKRGVGLIRUGANF-AVGNSLFASA-N 0.000 description 1
- ODTZHNZPINULEU-KKUMJFAQSA-N Lys-Phe-Asn Chemical compound C1=CC=C(C=C1)C[C@@H](C(=O)N[C@@H](CC(=O)N)C(=O)O)NC(=O)[C@H](CCCCN)N ODTZHNZPINULEU-KKUMJFAQSA-N 0.000 description 1
- ZJSZPXISKMDJKQ-JYJNAYRXSA-N Lys-Phe-Glu Chemical compound NCCCC[C@H](N)C(=O)N[C@H](C(=O)N[C@@H](CCC(O)=O)C(O)=O)CC1=CC=CC=C1 ZJSZPXISKMDJKQ-JYJNAYRXSA-N 0.000 description 1
- MGKFCQFVPKOWOL-CIUDSAMLSA-N Lys-Ser-Asp Chemical compound C(CCN)C[C@@H](C(=O)N[C@@H](CO)C(=O)N[C@@H](CC(=O)O)C(=O)O)N MGKFCQFVPKOWOL-CIUDSAMLSA-N 0.000 description 1
- JOSAKOKSPXROGQ-BJDJZHNGSA-N Lys-Ser-Ile Chemical compound [H]N[C@@H](CCCCN)C(=O)N[C@@H](CO)C(=O)N[C@@H]([C@@H](C)CC)C(O)=O JOSAKOKSPXROGQ-BJDJZHNGSA-N 0.000 description 1
- BIWVMACFGZFIEB-VFAJRCTISA-N Lys-Trp-Thr Chemical compound C[C@H]([C@@H](C(=O)O)NC(=O)[C@H](CC1=CNC2=CC=CC=C21)NC(=O)[C@H](CCCCN)N)O BIWVMACFGZFIEB-VFAJRCTISA-N 0.000 description 1
- PSVAVKGDUAKZKU-BZSNNMDCSA-N Lys-Tyr-His Chemical compound C1=CC(=CC=C1C[C@@H](C(=O)N[C@@H](CC2=CN=CN2)C(=O)O)NC(=O)[C@H](CCCCN)N)O PSVAVKGDUAKZKU-BZSNNMDCSA-N 0.000 description 1
- JQEBITVYKUCBMC-SRVKXCTJSA-N Met-Arg-Arg Chemical compound CSCC[C@H](N)C(=O)N[C@@H](CCCNC(N)=N)C(=O)N[C@@H](CCCNC(N)=N)C(O)=O JQEBITVYKUCBMC-SRVKXCTJSA-N 0.000 description 1
- BQVJARUIXRXDKN-DCAQKATOSA-N Met-Asn-His Chemical compound CSCC[C@H](N)C(=O)N[C@@H](CC(N)=O)C(=O)N[C@H](C(O)=O)CC1=CNC=N1 BQVJARUIXRXDKN-DCAQKATOSA-N 0.000 description 1
- NLHSFJQUHGCWSD-PYJNHQTQSA-N Met-Ile-His Chemical compound N[C@@H](CCSC)C(=O)N[C@@H]([C@@H](C)CC)C(=O)N[C@@H](CC1=CNC=N1)C(O)=O NLHSFJQUHGCWSD-PYJNHQTQSA-N 0.000 description 1
- LCPUWQLULVXROY-RHYQMDGZSA-N Met-Lys-Thr Chemical compound CSCC[C@H](N)C(=O)N[C@@H](CCCCN)C(=O)N[C@@H]([C@@H](C)O)C(O)=O LCPUWQLULVXROY-RHYQMDGZSA-N 0.000 description 1
- WUGMRIBZSVSJNP-UHFFFAOYSA-N N-L-alanyl-L-tryptophan Natural products C1=CC=C2C(CC(NC(=O)C(N)C)C(O)=O)=CNC2=C1 WUGMRIBZSVSJNP-UHFFFAOYSA-N 0.000 description 1
- 108091034117 Oligonucleotide Proteins 0.000 description 1
- 238000012408 PCR amplification Methods 0.000 description 1
- KIEPQOIQHFKQLK-PCBIJLKTSA-N Phe-Asn-Ile Chemical compound [H]N[C@@H](CC1=CC=CC=C1)C(=O)N[C@@H](CC(N)=O)C(=O)N[C@@H]([C@@H](C)CC)C(O)=O KIEPQOIQHFKQLK-PCBIJLKTSA-N 0.000 description 1
- CUMXHKAOHNWRFQ-BZSNNMDCSA-N Phe-Asp-Tyr Chemical compound C([C@H](N)C(=O)N[C@@H](CC(O)=O)C(=O)N[C@@H](CC=1C=CC(O)=CC=1)C(O)=O)C1=CC=CC=C1 CUMXHKAOHNWRFQ-BZSNNMDCSA-N 0.000 description 1
- MGECUMGTSHYHEJ-QEWYBTABSA-N Phe-Glu-Ile Chemical compound CC[C@H](C)[C@@H](C(O)=O)NC(=O)[C@H](CCC(O)=O)NC(=O)[C@@H](N)CC1=CC=CC=C1 MGECUMGTSHYHEJ-QEWYBTABSA-N 0.000 description 1
- BFYHIHGIHGROAT-HTUGSXCWSA-N Phe-Glu-Thr Chemical compound [H]N[C@@H](CC1=CC=CC=C1)C(=O)N[C@@H](CCC(O)=O)C(=O)N[C@@H]([C@@H](C)O)C(O)=O BFYHIHGIHGROAT-HTUGSXCWSA-N 0.000 description 1
- MJAYDXWQQUOURZ-JYJNAYRXSA-N Phe-Lys-Gln Chemical compound [H]N[C@@H](CC1=CC=CC=C1)C(=O)N[C@@H](CCCCN)C(=O)N[C@@H](CCC(N)=O)C(O)=O MJAYDXWQQUOURZ-JYJNAYRXSA-N 0.000 description 1
- PEFJUUYFEGBXFA-BZSNNMDCSA-N Phe-Lys-Lys Chemical compound NCCCC[C@@H](C(O)=O)NC(=O)[C@H](CCCCN)NC(=O)[C@@H](N)CC1=CC=CC=C1 PEFJUUYFEGBXFA-BZSNNMDCSA-N 0.000 description 1
- SCKXGHWQPPURGT-KKUMJFAQSA-N Phe-Lys-Ser Chemical compound [H]N[C@@H](CC1=CC=CC=C1)C(=O)N[C@@H](CCCCN)C(=O)N[C@@H](CO)C(O)=O SCKXGHWQPPURGT-KKUMJFAQSA-N 0.000 description 1
- KAJLHCWRWDSROH-BZSNNMDCSA-N Phe-Phe-Asp Chemical compound C([C@H](N)C(=O)N[C@@H](CC=1C=CC=CC=1)C(=O)N[C@@H](CC(O)=O)C(O)=O)C1=CC=CC=C1 KAJLHCWRWDSROH-BZSNNMDCSA-N 0.000 description 1
- XDMMOISUAHXXFD-SRVKXCTJSA-N Phe-Ser-Asp Chemical compound [H]N[C@@H](CC1=CC=CC=C1)C(=O)N[C@@H](CO)C(=O)N[C@@H](CC(O)=O)C(O)=O XDMMOISUAHXXFD-SRVKXCTJSA-N 0.000 description 1
- IPFXYNKCXYGSSV-KKUMJFAQSA-N Phe-Ser-Lys Chemical compound C1=CC=C(C=C1)C[C@@H](C(=O)N[C@@H](CO)C(=O)N[C@@H](CCCCN)C(=O)O)N IPFXYNKCXYGSSV-KKUMJFAQSA-N 0.000 description 1
- CPRLKHJUFAXVTD-ULQDDVLXSA-N Pro-Leu-Tyr Chemical compound [H]N1CCC[C@H]1C(=O)N[C@@H](CC(C)C)C(=O)N[C@@H](CC1=CC=C(O)C=C1)C(O)=O CPRLKHJUFAXVTD-ULQDDVLXSA-N 0.000 description 1
- 108010003201 RGH 0205 Proteins 0.000 description 1
- 102100037486 Reverse transcriptase/ribonuclease H Human genes 0.000 description 1
- 240000004808 Saccharomyces cerevisiae Species 0.000 description 1
- KYKKKSWGEPFUMR-NAKRPEOUSA-N Ser-Arg-Ile Chemical compound [H]N[C@@H](CO)C(=O)N[C@@H](CCCNC(N)=N)C(=O)N[C@@H]([C@@H](C)CC)C(O)=O KYKKKSWGEPFUMR-NAKRPEOUSA-N 0.000 description 1
- BGOWRLSWJCVYAQ-CIUDSAMLSA-N Ser-Asp-Leu Chemical compound [H]N[C@@H](CO)C(=O)N[C@@H](CC(O)=O)C(=O)N[C@@H](CC(C)C)C(O)=O BGOWRLSWJCVYAQ-CIUDSAMLSA-N 0.000 description 1
- GYDFRTRSSXOZCR-ACZMJKKPSA-N Ser-Ser-Glu Chemical compound OC[C@H](N)C(=O)N[C@@H](CO)C(=O)N[C@H](C(O)=O)CCC(O)=O GYDFRTRSSXOZCR-ACZMJKKPSA-N 0.000 description 1
- SRSPTFBENMJHMR-WHFBIAKZSA-N Ser-Ser-Gly Chemical compound OC[C@H](N)C(=O)N[C@@H](CO)C(=O)NCC(O)=O SRSPTFBENMJHMR-WHFBIAKZSA-N 0.000 description 1
- HSWXBJCBYSWBPT-GUBZILKMSA-N Ser-Val-Val Chemical compound CC(C)[C@H](NC(=O)[C@@H](NC(=O)[C@@H](N)CO)C(C)C)C(O)=O HSWXBJCBYSWBPT-GUBZILKMSA-N 0.000 description 1
- 108091081024 Start codon Proteins 0.000 description 1
- QILPDQCTQZDHFM-HJGDQZAQSA-N Thr-Gln-Arg Chemical compound [H]N[C@@H]([C@@H](C)O)C(=O)N[C@@H](CCC(N)=O)C(=O)N[C@@H](CCCNC(N)=N)C(O)=O QILPDQCTQZDHFM-HJGDQZAQSA-N 0.000 description 1
- LHEZGZQRLDBSRR-WDCWCFNPSA-N Thr-Glu-Leu Chemical compound [H]N[C@@H]([C@@H](C)O)C(=O)N[C@@H](CCC(O)=O)C(=O)N[C@@H](CC(C)C)C(O)=O LHEZGZQRLDBSRR-WDCWCFNPSA-N 0.000 description 1
- PAXANSWUSVPFNK-IUKAMOBKSA-N Thr-Ile-Asn Chemical compound CC[C@H](C)[C@@H](C(=O)N[C@@H](CC(=O)N)C(=O)O)NC(=O)[C@H]([C@@H](C)O)N PAXANSWUSVPFNK-IUKAMOBKSA-N 0.000 description 1
- YOOAQCZYZHGUAZ-KATARQTJSA-N Thr-Leu-Ser Chemical compound [H]N[C@@H]([C@@H](C)O)C(=O)N[C@@H](CC(C)C)C(=O)N[C@@H](CO)C(O)=O YOOAQCZYZHGUAZ-KATARQTJSA-N 0.000 description 1
- IJVNLNRVDUTWDD-MEYUZBJRSA-N Thr-Leu-Tyr Chemical compound [H]N[C@@H]([C@@H](C)O)C(=O)N[C@@H](CC(C)C)C(=O)N[C@@H](CC1=CC=C(O)C=C1)C(O)=O IJVNLNRVDUTWDD-MEYUZBJRSA-N 0.000 description 1
- NQQMWWVVGIXUOX-SVSWQMSJSA-N Thr-Ser-Ile Chemical compound [H]N[C@@H]([C@@H](C)O)C(=O)N[C@@H](CO)C(=O)N[C@@H]([C@@H](C)CC)C(O)=O NQQMWWVVGIXUOX-SVSWQMSJSA-N 0.000 description 1
- IEZVHOULSUULHD-XGEHTFHBSA-N Thr-Ser-Val Chemical compound [H]N[C@@H]([C@@H](C)O)C(=O)N[C@@H](CO)C(=O)N[C@@H](C(C)C)C(O)=O IEZVHOULSUULHD-XGEHTFHBSA-N 0.000 description 1
- KPMIQCXJDVKWKO-IFFSRLJSSA-N Thr-Val-Glu Chemical compound [H]N[C@@H]([C@@H](C)O)C(=O)N[C@@H](C(C)C)C(=O)N[C@@H](CCC(O)=O)C(O)=O KPMIQCXJDVKWKO-IFFSRLJSSA-N 0.000 description 1
- FKAPNDWDLDWZNF-QEJZJMRPSA-N Trp-Asp-Glu Chemical compound C1=CC=C2C(=C1)C(=CN2)C[C@@H](C(=O)N[C@@H](CC(=O)O)C(=O)N[C@@H](CCC(=O)O)C(=O)O)N FKAPNDWDLDWZNF-QEJZJMRPSA-N 0.000 description 1
- NLLARHRWSFNEMH-NUTKFTJISA-N Trp-Lys-Ala Chemical compound C[C@@H](C(=O)O)NC(=O)[C@H](CCCCN)NC(=O)[C@H](CC1=CNC2=CC=CC=C21)N NLLARHRWSFNEMH-NUTKFTJISA-N 0.000 description 1
- SNWIAPVRCNYFNI-SZMVWBNQSA-N Trp-Met-Arg Chemical compound CSCC[C@@H](C(=O)N[C@@H](CCCN=C(N)N)C(=O)O)NC(=O)[C@H](CC1=CNC2=CC=CC=C21)N SNWIAPVRCNYFNI-SZMVWBNQSA-N 0.000 description 1
- AYPAIRCDLARHLM-KKUMJFAQSA-N Tyr-Asn-Lys Chemical compound C1=CC(=CC=C1C[C@@H](C(=O)N[C@@H](CC(=O)N)C(=O)N[C@@H](CCCCN)C(=O)O)N)O AYPAIRCDLARHLM-KKUMJFAQSA-N 0.000 description 1
- KIJLSRYAUGGZIN-CFMVVWHZSA-N Tyr-Ile-Asp Chemical compound [H]N[C@@H](CC1=CC=C(O)C=C1)C(=O)N[C@@H]([C@@H](C)CC)C(=O)N[C@@H](CC(O)=O)C(O)=O KIJLSRYAUGGZIN-CFMVVWHZSA-N 0.000 description 1
- DWAMXBFJNZIHMC-KBPBESRZSA-N Tyr-Leu-Gly Chemical compound [H]N[C@@H](CC1=CC=C(O)C=C1)C(=O)N[C@@H](CC(C)C)C(=O)NCC(O)=O DWAMXBFJNZIHMC-KBPBESRZSA-N 0.000 description 1
- SINRIKQYQJRGDQ-MEYUZBJRSA-N Tyr-Lys-Thr Chemical compound C[C@@H](O)[C@@H](C(O)=O)NC(=O)[C@H](CCCCN)NC(=O)[C@@H](N)CC1=CC=C(O)C=C1 SINRIKQYQJRGDQ-MEYUZBJRSA-N 0.000 description 1
- PSALWJCUIAQKFW-ACRUOGEOSA-N Tyr-Phe-Lys Chemical compound C1=CC=C(C=C1)C[C@@H](C(=O)N[C@@H](CCCCN)C(=O)O)NC(=O)[C@H](CC2=CC=C(C=C2)O)N PSALWJCUIAQKFW-ACRUOGEOSA-N 0.000 description 1
- JXGUUJMPCRXMSO-HJOGWXRNSA-N Tyr-Phe-Phe Chemical compound C([C@H](N)C(=O)N[C@@H](CC=1C=CC=CC=1)C(=O)N[C@@H](CC=1C=CC=CC=1)C(O)=O)C1=CC=C(O)C=C1 JXGUUJMPCRXMSO-HJOGWXRNSA-N 0.000 description 1
- SZEIFUXUTBBQFQ-STQMWFEESA-N Tyr-Pro-Gly Chemical compound [H]N[C@@H](CC1=CC=C(O)C=C1)C(=O)N1CCC[C@H]1C(=O)NCC(O)=O SZEIFUXUTBBQFQ-STQMWFEESA-N 0.000 description 1
- MNWINJDPGBNOED-ULQDDVLXSA-N Tyr-Pro-Lys Chemical compound NCCCC[C@@H](C(O)=O)NC(=O)[C@@H]1CCCN1C(=O)[C@@H](N)CC1=CC=C(O)C=C1 MNWINJDPGBNOED-ULQDDVLXSA-N 0.000 description 1
- GQVZBMROTPEPIF-SRVKXCTJSA-N Tyr-Ser-Asp Chemical compound [H]N[C@@H](CC1=CC=C(O)C=C1)C(=O)N[C@@H](CO)C(=O)N[C@@H](CC(O)=O)C(O)=O GQVZBMROTPEPIF-SRVKXCTJSA-N 0.000 description 1
- OGNMURQZFMHFFD-NHCYSSNCSA-N Val-Asn-Lys Chemical compound CC(C)[C@@H](C(=O)N[C@@H](CC(=O)N)C(=O)N[C@@H](CCCCN)C(=O)O)N OGNMURQZFMHFFD-NHCYSSNCSA-N 0.000 description 1
- FBVUOEYVGNMRMD-NAKRPEOUSA-N Val-Cys-Ile Chemical compound CC[C@H](C)[C@@H](C(=O)O)NC(=O)[C@H](CS)NC(=O)[C@H](C(C)C)N FBVUOEYVGNMRMD-NAKRPEOUSA-N 0.000 description 1
- CVIXTAITYJQMPE-LAEOZQHASA-N Val-Glu-Asn Chemical compound CC(C)[C@H](N)C(=O)N[C@@H](CCC(O)=O)C(=O)N[C@@H](CC(N)=O)C(O)=O CVIXTAITYJQMPE-LAEOZQHASA-N 0.000 description 1
- CELJCNRXKZPTCX-XPUUQOCRSA-N Val-Gly-Ala Chemical compound CC(C)[C@H](N)C(=O)NCC(=O)N[C@@H](C)C(O)=O CELJCNRXKZPTCX-XPUUQOCRSA-N 0.000 description 1
- SVFRYKBZHUGKLP-QXEWZRGKSA-N Val-Met-Asn Chemical compound CC(C)[C@@H](C(=O)N[C@@H](CCSC)C(=O)N[C@@H](CC(=O)N)C(=O)O)N SVFRYKBZHUGKLP-QXEWZRGKSA-N 0.000 description 1
- OFQGGTGZTOTLGH-NHCYSSNCSA-N Val-Met-Gln Chemical compound CC(C)[C@@H](C(=O)N[C@@H](CCSC)C(=O)N[C@@H](CCC(=O)N)C(=O)O)N OFQGGTGZTOTLGH-NHCYSSNCSA-N 0.000 description 1
- CKTMJBPRVQWPHU-JSGCOSHPSA-N Val-Phe-Gly Chemical compound CC(C)[C@@H](C(=O)N[C@@H](CC1=CC=CC=C1)C(=O)NCC(=O)O)N CKTMJBPRVQWPHU-JSGCOSHPSA-N 0.000 description 1
- XBJKAZATRJBDCU-GUBZILKMSA-N Val-Pro-Ala Chemical compound CC(C)[C@H](N)C(=O)N1CCC[C@H]1C(=O)N[C@@H](C)C(O)=O XBJKAZATRJBDCU-GUBZILKMSA-N 0.000 description 1
- PGBMPFKFKXYROZ-UFYCRDLUSA-N Val-Tyr-Phe Chemical compound CC(C)[C@@H](C(=O)N[C@@H](CC1=CC=C(C=C1)O)C(=O)N[C@@H](CC2=CC=CC=C2)C(=O)O)N PGBMPFKFKXYROZ-UFYCRDLUSA-N 0.000 description 1
- 108010047495 alanylglycine Proteins 0.000 description 1
- 230000003321 amplification Effects 0.000 description 1
- 238000000137 annealing Methods 0.000 description 1
- 238000013459 approach Methods 0.000 description 1
- 230000008901 benefit Effects 0.000 description 1
- 230000015572 biosynthetic process Effects 0.000 description 1
- 230000008859 change Effects 0.000 description 1
- 239000003153 chemical reaction reagent Substances 0.000 description 1
- 238000001816 cooling Methods 0.000 description 1
- 238000012258 culturing Methods 0.000 description 1
- 108010069495 cysteinyltyrosine Proteins 0.000 description 1
- 229940104302 cytosine Drugs 0.000 description 1
- 238000012217 deletion Methods 0.000 description 1
- 230000037430 deletion Effects 0.000 description 1
- 230000001419 dependent effect Effects 0.000 description 1
- 238000007865 diluting Methods 0.000 description 1
- 108010054813 diprotin B Proteins 0.000 description 1
- 230000005782 double-strand break Effects 0.000 description 1
- 230000011559 double-strand break repair via nonhomologous end joining Effects 0.000 description 1
- 239000012636 effector Substances 0.000 description 1
- 230000008030 elimination Effects 0.000 description 1
- 238000003379 elimination reaction Methods 0.000 description 1
- 230000007613 environmental effect Effects 0.000 description 1
- 238000002474 experimental method Methods 0.000 description 1
- 238000000605 extraction Methods 0.000 description 1
- 238000003209 gene knockout Methods 0.000 description 1
- 108010057083 glutamyl-aspartyl-leucine Proteins 0.000 description 1
- 108010027668 glycyl-alanyl-valine Proteins 0.000 description 1
- 108010025306 histidylleucine Proteins 0.000 description 1
- 108010018006 histidylserine Proteins 0.000 description 1
- 230000007062 hydrolysis Effects 0.000 description 1
- 238000006460 hydrolysis reaction Methods 0.000 description 1
- 230000036737 immune function Effects 0.000 description 1
- 238000011419 induction treatment Methods 0.000 description 1
- 230000003993 interaction Effects 0.000 description 1
- 108010078274 isoleucylvaline Proteins 0.000 description 1
- 231100000518 lethal Toxicity 0.000 description 1
- 230000001665 lethal effect Effects 0.000 description 1
- 108010012058 leucyltyrosine Proteins 0.000 description 1
- 108010017391 lysylvaline Proteins 0.000 description 1
- 239000000463 material Substances 0.000 description 1
- 230000035800 maturation Effects 0.000 description 1
- 108020004999 messenger RNA Proteins 0.000 description 1
- 238000003199 nucleic acid amplification method Methods 0.000 description 1
- 238000001821 nucleic acid purification Methods 0.000 description 1
- 108010012581 phenylalanylglutamate Proteins 0.000 description 1
- 239000000843 powder Substances 0.000 description 1
- 239000002243 precursor Substances 0.000 description 1
- 108010004914 prolylarginine Proteins 0.000 description 1
- 108010070643 prolylglutamic acid Proteins 0.000 description 1
- 108010053725 prolylvaline Proteins 0.000 description 1
- 238000002818 protein evolution Methods 0.000 description 1
- 230000017854 proteolysis Effects 0.000 description 1
- 230000008263 repair mechanism Effects 0.000 description 1
- 230000010076 replication Effects 0.000 description 1
- 108091008146 restriction endonucleases Proteins 0.000 description 1
- 230000000717 retained effect Effects 0.000 description 1
- 102200012576 rs111033648 Human genes 0.000 description 1
- 102220273513 rs373435521 Human genes 0.000 description 1
- 102220089709 rs869320709 Human genes 0.000 description 1
- 239000000523 sample Substances 0.000 description 1
- 238000007480 sanger sequencing Methods 0.000 description 1
- 230000003248 secreting effect Effects 0.000 description 1
- 108010015840 seryl-prolyl-lysyl-lysine Proteins 0.000 description 1
- 235000020183 skimmed milk Nutrition 0.000 description 1
- 239000011780 sodium chloride Substances 0.000 description 1
- 241000894007 species Species 0.000 description 1
- 238000003786 synthesis reaction Methods 0.000 description 1
- 108091035705 tRNA adenine Proteins 0.000 description 1
- 108010072986 threonyl-seryl-lysine Proteins 0.000 description 1
- 239000012137 tryptone Substances 0.000 description 1
- 108010051110 tyrosyl-lysine Proteins 0.000 description 1
Classifications
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12N—MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
- C12N9/00—Enzymes; Proenzymes; Compositions thereof; Processes for preparing, activating, inhibiting, separating or purifying enzymes
- C12N9/14—Hydrolases (3)
- C12N9/78—Hydrolases (3) acting on carbon to nitrogen bonds other than peptide bonds (3.5)
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12N—MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
- C12N15/00—Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
- C12N15/09—Recombinant DNA-technology
- C12N15/63—Introduction of foreign genetic material using vectors; Vectors; Use of hosts therefor; Regulation of expression
- C12N15/74—Vectors or expression systems specially adapted for prokaryotic hosts other than E. coli, e.g. Lactobacillus, Micromonospora
- C12N15/75—Vectors or expression systems specially adapted for prokaryotic hosts other than E. coli, e.g. Lactobacillus, Micromonospora for Bacillus
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12N—MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
- C12N9/00—Enzymes; Proenzymes; Compositions thereof; Processes for preparing, activating, inhibiting, separating or purifying enzymes
- C12N9/14—Hydrolases (3)
- C12N9/16—Hydrolases (3) acting on ester bonds (3.1)
- C12N9/22—Ribonucleases RNAses, DNAses
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12Y—ENZYMES
- C12Y305/00—Hydrolases acting on carbon-nitrogen bonds, other than peptide bonds (3.5)
- C12Y305/04—Hydrolases acting on carbon-nitrogen bonds, other than peptide bonds (3.5) in cyclic amidines (3.5.4)
- C12Y305/04004—Adenosine deaminase (3.5.4.4)
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12N—MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
- C12N2310/00—Structure or type of the nucleic acid
- C12N2310/10—Type of nucleic acid
- C12N2310/20—Type of nucleic acid involving clustered regularly interspaced short palindromic repeats [CRISPRs]
Landscapes
- Health & Medical Sciences (AREA)
- Life Sciences & Earth Sciences (AREA)
- Chemical & Material Sciences (AREA)
- Genetics & Genomics (AREA)
- Organic Chemistry (AREA)
- Engineering & Computer Science (AREA)
- Wood Science & Technology (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Zoology (AREA)
- General Engineering & Computer Science (AREA)
- General Health & Medical Sciences (AREA)
- Biotechnology (AREA)
- Biochemistry (AREA)
- Biomedical Technology (AREA)
- Molecular Biology (AREA)
- Microbiology (AREA)
- Medicinal Chemistry (AREA)
- Physics & Mathematics (AREA)
- Biophysics (AREA)
- Plant Pathology (AREA)
- Enzymes And Modification Thereof (AREA)
- Micro-Organisms Or Cultivation Processes Thereof (AREA)
Abstract
The invention relates to a gene editing fusion protein, an adenine base editor constructed from it, and applications thereof. The amino acid sequence of the gene editing fusion protein is one of the sequences shown in SEQ ID NO.2-4. The adenine base editor constructed from the fusion protein further comprises a crRNA array insertion region and is used to convert adenine to guanine at multiple genomic sites. Existing adenine base editors can target only a limited range of the genome, and multi-site base editing with them is inefficient and complicated to carry out. The invention addresses these problems by providing an adenine base editor with multiple editing sites, a large editing window, high editing efficiency and high editing activity.
Description
Technical Field
The invention relates to the field of biotechnology, and in particular to a gene editing fusion protein, an adenine base editor constructed from it, and applications thereof.
Background
Gene editing is a technique that achieves gene knockout, insertion of exogenous DNA fragments, or DNA base mutation by introducing a sequence change at a specific site in DNA. CRISPR/Cas systems are very widely used in gene editing. Three types of CRISPR/Cas system have been found (type I, type II and type III), of which the type II system is the one most successfully engineered into an artificial nuclease to date, because it needs only a single protein, Cas9, to carry out the immune function of the CRISPR system. In this system, crRNA (CRISPR-derived RNA) binds tracrRNA (trans-activating crRNA) by base pairing to form a tracrRNA/crRNA complex, which directs the Cas9 nuclease to cleave double-stranded DNA at the target site whose sequence pairs with the crRNA. The process comprises three stages: (1) spacer acquisition: the bacterium integrates a short DNA sequence from an invading phage or plasmid between two repeats at the 5' end of the CRISPR locus in its own genome, thereby acquiring a highly variable spacer; (2) expression of the CRISPR locus and maturation of crRNA: the CRISPR locus is first transcribed into precursor CRISPR RNA (pre-crRNA), which is then processed into mature crRNA by Cas9 and a nuclease; (3) cleavage of foreign genetic material by the CRISPR/Cas system: mature tracrRNA, crRNA and Cas9 form a ribonucleoprotein complex, and the crRNA recognizes and binds the matching foreign DNA sequence, thereby mediating cleavage of the foreign genetic material by the ribonucleoprotein complex.
To simplify the process and increase editing efficiency, crRNA and tracrRNA are often expressed as a single chimeric sgRNA (single guide RNA), so that only the sgRNA and the Cas9 protein need to be expressed for gene editing. Besides the CRISPR/Cas9 system, the CRISPR/Cpf1 system is now also commonly used for gene editing. Unlike CRISPR/Cas9, CRISPR/Cpf1 needs only a crRNA to function. Cpf1 also has RNase activity and can cut and process an immature RNA transcript that contains several crRNAs. A crRNA array containing several crRNAs can therefore be designed so that processing yields several independently functional crRNAs, each of which guides Cpf1 to its corresponding genomic target, allowing multiple sites to be edited simultaneously.
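As an illustration of the crRNA array idea described above, the following minimal Python sketch concatenates repeat-spacer units into one array and then recovers the individual spacers by cutting at each repeat, loosely mimicking processing by Cpf1. The function names and the short placeholder sequences are assumptions made for illustration only; they are not sequences or methods from the patent.

```python
def build_crrna_array(direct_repeat: str, spacers: list[str]) -> str:
    """Concatenate repeat-spacer units into a single array sequence.

    `direct_repeat` and `spacers` are hypothetical inputs supplied by the user.
    """
    for seq in (direct_repeat, *spacers):
        if not seq or set(seq.upper()) - set("ACGT"):
            raise ValueError(f"not a DNA sequence: {seq!r}")
    repeat = direct_repeat.upper()
    return "".join(repeat + spacer.upper() for spacer in spacers)


def split_crrna_array(array: str, direct_repeat: str) -> list[str]:
    """Cut the array at every repeat and return the spacers in order.

    This simple split assumes that no spacer contains the repeat sequence.
    """
    return [part for part in array.split(direct_repeat.upper()) if part]


# Placeholder sequences, far shorter than real repeats and spacers.
array = build_crrna_array("AATT", ["GGGG", "CCCC"])
spacers = split_crrna_array(array, "AATT")
```

One array therefore encodes several guides, which is why a single insertion region can serve multi-site editing.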
Many species lack the NHEJ pathway (or have only weak NHEJ activity), while the HDR pathway requires a homologous template to function, and the two repair mechanisms also compete with each other; as a result, gene editing based on the above approach suffers from high lethality and low editing efficiency. Inactivating the DNase activity of Cas9 yields nCas9, which cleaves only one DNA strand, and dCas9, which cannot cleave DNA. Guided by an sgRNA, nCas9 and dCas9 can still bind specific sites in the genome, but without generating a lethal double-strand break. TadA7.10, a mutant of the Escherichia coli tRNA adenosine deaminase TadA, can deaminate adenine (A) to inosine (I), and inosine is read as guanine (G) during replication at the DNA level, so that an A-to-G conversion is ultimately achieved. Fusion proteins obtained by fusing TadA7.10 to the N-terminus of nCas9 or dCas9 (nCas9-ABE or dCas9-ABE) can therefore convert A to G at a specific genomic site under the guidance of an sgRNA; such a system is known as an adenine base editing system.
Chinese patent CN201811613264.9 discloses a CRISPR/Cas9-mediated adenine base editing system that performs site-directed single-base substitution (A>G) and is used to improve broad-spectrum resistance to rice blast. However, when that system is used for multi-site base editing, multiple corresponding sgRNA expression cassettes have to be constructed, which not only complicates construction but also reduces the stability of the DNA sequence because elements such as promoters are used repeatedly. In addition, the site recognized by Cas9 must carry an NGG PAM sequence (N = A, T, C, G), which limits the operable range of Cas9-based cytosine base editors on the genome. Chinese patent CN201811563073.6 discloses a plant base editing method using a base editing fusion protein formed from a nuclease-inactivated CRISPR effector protein, which is a Cas9 nuclease or a Cpf1 nuclease, and a DNA-dependent adenine deaminase, which is a variant of Escherichia coli tRNA adenine deaminase TadA (ecTadA) comprising one or more sets of mutations selected from the group consisting of: 1) A106V and D108N; 2) D147Y and E155V; 3) L84F, H123Y and I156F; 4) A142N; 5) H36L, R51L, S146C and K157N; 6) P48S/T/A; 7) A142N; 8) W23L/R; 9) R152H/P. Chinese patent 201811578853.8 discloses a base editing system based on the Cpf1 protein, in which the base editing fusion protein comprises Cpf1 lacking DNA cleavage activity (D917A mutation) and a deaminase whose mutants are the same as above, and which can substitute one or more C to T or A to G in the target sequence. The above systems use only the currently widely used ecTadA mutants ABE7.9 (TadA7.9) and ABE7.10 (TadA7.10), and the performance of the base editors needs further improvement. The invention therefore aims to provide an adenine base editor with multiple editing sites, a large editing window, high editing efficiency and high editing activity.
Disclosure of Invention
In order to solve the above technical problems, the invention provides a CRISPR/Cpf1-based adenine base editor, which addresses the limited operable range of existing A>G adenine base editors on the genome and the low efficiency and complex operation of multi-site base editing.
The first object of the present invention is to provide a gene editing fusion protein whose amino acid sequence is one of the sequences shown in SEQ ID NO. 2-4, where SEQ ID NO. 2 is the fusion protein sequence containing TadA8.20, SEQ ID NO. 3 is the fusion protein sequence containing TadA8e, and SEQ ID NO. 4 is the fusion protein sequence containing TadA9.
The invention provides a gene editing fusion protein that recognizes TTV as the PAM, where V = A, C or G. When the fusion protein is used for gene editing, adenine bases are edited to guanine within an editing window located at the 9th base, between the 7th and 9th bases, or between the 4th and 23rd bases of the crRNA.
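The PAM and editing-window rules above can be illustrated with a minimal Python sketch. It is not part of the disclosed method: it assumes a 23-nt recognition sequence lying immediately 3' of the TTV PAM on the scanned strand, scans only that one strand, and uses a hypothetical demo sequence. The window values are those reported for the three fusion proteins in Example 2.

```python
import re

# Editing windows reported in the text, as 1-based crRNA positions (inclusive).
WINDOWS = {"TadA9": (4, 23), "TadA8e": (7, 9), "TadA8.20": (9, 9)}

SPACER_LEN = 23  # assumption: 23-nt recognition sequence next to the PAM

# A lookahead is used so that overlapping candidate sites are all reported.
SITE = re.compile(r"(?=(TT[ACG])([ACGT]{%d}))" % SPACER_LEN)


def find_targets(seq, window):
    """Return (pam_start, pam, spacer, editable_A_positions) for each TTV site."""
    lo, hi = window
    hits = []
    for m in SITE.finditer(seq.upper()):
        pam, spacer = m.group(1), m.group(2)
        editable = [i for i, base in enumerate(spacer, start=1)
                    if base == "A" and lo <= i <= hi]
        hits.append((m.start(), pam, spacer, editable))
    return hits


# Hypothetical sequence: 2-nt flank, TTC PAM, 23-nt recognition sequence, 2-nt flank.
demo = "GG" + "TTC" + "GCTAGCAAGTCCGTACGGATCGT" + "CC"
for name, window in WINDOWS.items():
    print(name, find_targets(demo, window))
```

A wider window simply exposes more adenines of the same recognition sequence to editing, which is why the window size matters for the range of mutants that one crRNA can produce.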
A second object of the present invention is to provide an adenine base editor comprising the above gene editing fusion protein and a crRNA array insertion region. The crRNA expressed from the crRNA array insertion region pairs with the DNase-inactivated Cpf1 mutant dCpf1 and guides it to a specific target for editing.
Further, the nucleotide sequence of the crRNA array insertion region is shown in SEQ ID NO. 5. The two ends of the crRNA array insertion region contain two direct-repeat crRNA handle sequences, with two inverted Eco31I restriction sites inserted between them, so that a desired recognition sequence or crRNA array can be placed between the two handle sequences by restriction digestion and ligation.
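As a rough illustration of the layout just described, the following Python sketch checks a sequence for one Eco31I site on each strand. Eco31I recognizes GGTCTC; everything else here is an assumption made for the example: the handle is an illustrative placeholder and the region is a made-up stand-in, not SEQ ID NO. 5.

```python
ECO31I = "GGTCTC"  # Eco31I recognition sequence


def reverse_complement(seq):
    """Reverse complement of an upper-case DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def find_sites(seq, site=ECO31I):
    """Return 0-based start positions of the site on the top and bottom strands."""
    seq = seq.upper()
    rc = reverse_complement(site)
    top = [i for i in range(len(seq) - len(site) + 1) if seq.startswith(site, i)]
    bottom = [i for i in range(len(seq) - len(rc) + 1) if seq.startswith(rc, i)]
    return top, bottom


# Illustrative placeholder handle and a hypothetical insertion region:
# handle, inverted site, stuffer, forward site, handle.
HANDLE = "AATTTCTACTGTTGTAGAT"
region = HANDLE + "A" + reverse_complement(ECO31I) + "TTAACCGG" + ECO31I + "T" + HANDLE

top, bottom = find_sites(region)
print("top-strand sites:", top, "bottom-strand sites:", bottom)
```

Finding exactly one site in each orientation between the two handles is a quick sanity check before digestion and ligation; it says nothing about the overhangs, which depend on the actual sequence.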
Further, expression of the above gene editing fusion protein is controlled by the inducible promoter Pgrac100.
Further, expression of the crRNA array insertion region is controlled by the constitutive promoter Pveg.
Further, the adenine base editor is obtained by integrating a gene encoding a fusion protein of SEQ ID NO.2-4 into an expression vector and inserting a crRNA array insertion region into the expression vector.
Further, when the adenine base editor is used for gene editing, the above expression vector is introduced into a eukaryotic organism or a prokaryotic organism.
Further, the adenine base editor includes a plasmid comprising the gene encoding the gene editing fusion protein and the crRNA array insertion region.
Further, the nucleotide sequence of the plasmid is one of SEQ ID NO. 9-11, where SEQ ID NO. 9 is the plasmid sequence containing TadA8.20, SEQ ID NO. 10 is the plasmid sequence containing TadA8e, and SEQ ID NO. 11 is the plasmid sequence containing TadA9.
It is a third object of the present invention to provide the use of the adenine base editor as described above in gene editing.
Further, the adenine base editor is used to convert adenine to guanine at multiple sites on the Bacillus subtilis genome.
Further, the adenine base editor described above is used to obtain mutants or for extracellular protease inactivation.
Further, the Bacillus subtilis is Bacillus subtilis 168.
By means of the scheme, the invention has at least the following advantages:
The invention designs and constructs a CRISPR/Cpf1-based adenine base editor (dCpf1-ABE) that can perform base editing (A→G) at 5 genomic sites simultaneously using a single crRNA array, thereby generating a rich set of mutant combinations and widening the operable range of adenine base editors on the genome. The editing efficiency at some sites reaches 100%, with high editing activity and a large editing window.
The foregoing is only an overview of the technical solution of the invention. Preferred embodiments are described in detail below with reference to the accompanying drawings.
Drawings
In order that the invention may be more readily understood, a more particular description of the invention will be rendered by reference to specific embodiments thereof that are illustrated in the appended drawings.
FIG. 1 is a schematic of the CRISPR/Cpf1-based adenine base editor;
FIG. 2 is a plasmid map of the Bacillus subtilis adenine base editing system;
FIG. 3 shows multi-site base editing mediated by the adenine base editor constructed with TadA9 as the adenosine deaminase;
FIG. 4 is a specific composition of mutants produced after treatment by a multisite adenine base editing system;
FIG. 5 shows the extracellular protease activity after inactivation using the multi-site adenine base editing system.
Detailed Description
The present invention will be further described with reference to the accompanying drawings and specific examples, which are not intended to be limiting, so that those skilled in the art will better understand the invention and practice it.
Materials and reagents involved in the experiment:
DNA polymerase was purchased from Takara; restriction enzymes and T4 ligase were purchased from NEB; the plasmid extraction kit was purchased from Biotechnology (Shanghai) Co., Ltd.; and the PCR product nucleic acid purification kit was purchased from Thermo Scientific.
The cells were cultured in LB medium containing 10 g/L tryptone, 5 g/L yeast extract and 10 g/L NaCl. The final concentration of kanamycin in the medium was 50 μg/mL and that of IPTG was 1 mM.
Example 1 Design and construction of a CRISPR/Cpf1-based adenine base editor
As shown in FIG. 1A, a CRISPR/Cpf1-based adenine base editing system was constructed, consisting of two basic elements:
(1) A fusion protein consisting of a mutant of the Escherichia coli tRNA adenosine deaminase TadA and the DNase-inactivated mutant (D917A) of Cpf1, dCpf1. This fusion protein is the adenine base editor dCpf1-ABE; with the help of dCpf1 it binds to a specific target in the genome, where the adenine deaminase converts adenine (A) to guanine (G).
Specifically, the mutants TadA8.17 (TadA7.10 V82S, Q154R), TadA8.20 (TadA7.10 I76Y, V82S, Y123H, Y147R, Q154R), TadA8e (TadA7.10 A109S, T111R, D119N, H122N, Y147D, F149Y, T166I, D167N) and TadA9 (TadA7.10 V82S, A109S, T111R, D119N, H122N, Y147D, F149Y, Q154R, T166I, D167N) of the tRNA adenosine deaminase TadA7.10 were each fused to the N-terminus of dCpf1 via a short peptide linker (SGGSSGGSSGSETPGTSESATPESSGGSSGGS); the resulting fusion amino acid sequences are shown in sequences 1 to 4, respectively. In addition, the crRNA array insertion region shown in DNA sequence 5 was designed: its two ends contain two direct-repeat crRNA handle sequences, with two inverted Eco31I restriction sites inserted between them, so that a desired recognition sequence or crRNA array can be placed between the two handle sequences by restriction digestion and ligation.
(2) A crRNA matched to Cpf1, comprising a fixed handle sequence and a recognition sequence complementary to the target. Through the interaction between dCpf1 and the crRNA handle sequence, dCpf1-ABE binds the crRNA to form a complex, which recognizes and targets the specific genomic site through the complementary sequence, thereby achieving base editing (A→G). Because dCpf1 retains its RNase activity, several crRNAs can be expressed as an array; after dCpf1 cuts the crRNA array, dCpf1-ABE can be targeted to several sites to carry out base editing (A→G) (FIG. 1B).
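The array logic in element (2) can be sketched in a few lines of Python. This is only a string-level model of the idea that one transcript is cut into individual handle-plus-recognition-sequence units; the handle and the two recognition sequences are illustrative placeholders, not the sequences of the invention, and the trailing handle of the insertion region is omitted for simplicity.

```python
# Illustrative placeholder handle (RNA alphabet); the actual handle is part of SEQ ID NO. 5.
HANDLE = "AAUUUCUACUGUUGUAGAU"


def build_array(spacers, handle=HANDLE):
    """Concatenate handle + recognition sequence units into one array transcript."""
    return "".join(handle + spacer for spacer in spacers)


def process_array(array, handle=HANDLE):
    """Model RNase processing: split the transcript into individual crRNAs.

    Assumes no recognition sequence itself contains the handle sequence.
    """
    return [handle + spacer for spacer in array.split(handle) if spacer]


# Hypothetical recognition sequences for two target sites.
spacers = ["GCUAGCAAGUCCGUACGGAUCGU", "CAGGUUACCGAUAGCUUAGGCAA"]
array = build_array(spacers)
for crrna in process_array(array):
    print(crrna)
```

Each recovered crRNA carries its own handle, which is what allows every unit to load onto a separate dCpf1-ABE molecule and address a different site.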
Example 2 Use of the CRISPR/Cpf1-based adenine base editor in multi-site base editing
The CRISPR/Cpf1-based adenine base editing system was verified and applied in Bacillus subtilis. As shown in FIG. 2, the IPTG-inducible promoter Pgrac100 was used to express the different adenine base editor constructs dCpf1-ABE, and the crRNA array insertion region was placed downstream of the constitutive promoter Pveg to express the crRNA array. Both expression cassettes were placed on a plasmid containing the temperature-sensitive replicon pE194 for base editing (A→G) in B. subtilis.
Plasmid construction was carried out as follows. The plasmid backbone came from pJOE8999 (Altenbuchner, J., 2016. Editing of the Bacillus subtilis genome by the CRISPR-Cas9 system. Applied and Environmental Microbiology, 5421-5427), which confers kanamycin resistance (KanR) in both E. coli and B. subtilis, carries the multi-copy replicon pBR322 for plasmid construction and storage in E. coli, and carries the temperature-sensitive replicon pE194 for B. subtilis (stable replication at 30 °C, elimination at 50 °C). The Pgrac100 promoter and the dCpf1 protein were derived from plasmid pLCg6-dCpf1 (Wu, Y., Liu, Y., Lv, X., Li, J., Du, G., Liu, L., 2020. CAMERS-B: CRISPR/Cpf1 assisted multiple-genes editing and regulation system for Bacillus subtilis. Biotechnology and Bioengineering 117, 1817-1825). The crRNA array insertion region and the promoter Pveg came from plasmid pcra2 (Wu et al., 2020, as above). The adenine deaminase mutants were obtained by gene synthesis. After PCR amplification, the fragments were joined using a seamless ligation kit (Beyotime, D7010M) and transformed into E. coli for sequencing and storage. The plasmid sequences containing TadA8.17 (TadA7.10 V82S, Q154R), TadA8.20 (TadA7.10 I76Y, V82S, Y123H, Y147R, Q154R), TadA8e (TadA7.10 A109S, T111R, D119N, H122N, Y147D, F149Y, T166I, D167N) and TadA9 (TadA7.10 V82S, A109S, T111R, D119N, H122N, Y147D, F149Y, Q154R, T166I, D167N) are shown in SEQ ID NO. 8-11, respectively.
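The mutation notation used for the deaminase variants (for example I76Y: isoleucine at position 76 replaced by tyrosine) can be made concrete with a short Python sketch. The 167-residue TadA8.17 sequence below is transcribed into one-letter code from residues 1 to 167 of SEQ ID NO. 1 in the sequence listing; the helper function is hypothetical and only illustrates the notation. Since TadA8.17 already carries V82S and Q154R, applying I76Y, Y123H and Y147R to it gives the TadA8.20 mutation set described above.

```python
import re

# Residues 1-167 of SEQ ID NO. 1 (the TadA8.17 portion), one-letter code.
TADA8_17 = (
    "MSEVEFSHEYWMRHAL" "TLAKRARDEREVPVGA" "VLVLNNRVIGEGWNRA"
    "IGLHDPTAHAEIMALR" "QGGLVMQNYRLIDATL" "YSTFEPCVMCAGAMIH"
    "SRIGRVVFGVRNAKTG" "AAGSLMDVLHYPGMNH" "RVEITEGILADECAAL"
    "LCYFFRMPRRVFNAQK" "KAQSSTD"
)


def apply_mutations(seq, mutations):
    """Apply substitutions written as <old residue><1-based position><new residue>."""
    residues = list(seq)
    for mutation in mutations:
        m = re.fullmatch(r"([A-Z])(\d+)([A-Z])", mutation)
        if m is None:
            raise ValueError(f"cannot parse mutation {mutation!r}")
        old, pos, new = m.group(1), int(m.group(2)), m.group(3)
        if residues[pos - 1] != old:
            raise ValueError(f"{mutation}: found {residues[pos - 1]} at position {pos}")
        residues[pos - 1] = new
    return "".join(residues)


tada8_20 = apply_mutations(TADA8_17, ["I76Y", "Y123H", "Y147R"])
changed = [i + 1 for i, (a, b) in enumerate(zip(TADA8_17, tada8_20)) if a != b]
print(changed)
```

Checking the expected original residue before each substitution catches numbering mistakes, which are easy to make when a mutation list is copied between parent sequences.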
Multi-site base editing was performed using 5 sites in the Bacillus subtilis aprE and nprE genes as targets. A total of 5 crRNAs were designed, as shown in Table 1, and assembled into the crRNA array insertion region by the SOMACA (Synthetic Oligos Mediated Assembly of crRNA Array) method (reference: Wu, Y., Liu, Y., Lv, X., Li, J., Du, G., Liu, L., 2020. CAMERS-B: CRISPR/Cpf1 assisted multiple-genes editing and regulation system for Bacillus subtilis. Biotechnology and Bioengineering 117, 1817-1825), giving editing plasmids capable of base editing (A→G) at all 5 sites.
Specifically, when a single crRNA is to be expressed, it can be obtained directly by annealing a pair of primers with an overlapping region (primer concentration 10 μM; 10 μL each of the upstream and downstream primers in a 20 μL system; reaction conditions: 98 °C for 2 min, then cooling to 4 °C at 0.1 °C/s and holding). The annealed product is diluted 10-fold, and 1 μL is ligated to the vector digested with Eco31I. When several crRNAs are designed to form a crRNA array, PCR is performed first (primer concentration 10 μM; a 20 μL system containing 10 μL of DNA polymerase and 5 μL each of the upstream and downstream primers; standard PCR program with an extension time of 5 s, 10 cycles). The products are diluted 10-fold to give double-stranded DNA fragments with Eco31I sites at both ends, and 1 μL of each fragment is used for Golden Gate assembly with the plasmid. After the products are transformed into E. coli, colony PCR is performed with one primer on the plasmid backbone and one on the crRNA, and single colonies giving a band are picked for sequencing to screen for positive clones containing the desired crRNA.
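The quantities implied by these conditions can be worked out with a small Python calculation. It uses only the numbers stated above (10 μM primer stocks, 20 μL reactions, a 0.1 °C/s ramp from 98 °C to 4 °C); the function names are introduced here for illustration.

```python
def final_concentration_um(stock_um, volume_ul, total_ul):
    """Concentration of a component after dilution into the reaction (μM)."""
    return stock_um * volume_ul / total_ul


def ramp_seconds(start_c, end_c, rate_c_per_s):
    """Time needed to cool from start_c to end_c at a constant rate (s)."""
    return (start_c - end_c) / rate_c_per_s


# Annealing reaction: 10 μL of each 10 μM primer in 20 μL.
anneal_primer = final_concentration_um(10, 10, 20)
# PCR for array fragments: 5 μL of each 10 μM primer in 20 μL.
pcr_primer = final_concentration_um(10, 5, 20)
# Cooling from 98 °C to 4 °C at 0.1 °C/s.
cooling = ramp_seconds(98, 4, 0.1)

print(f"annealing: {anneal_primer} μM per primer")
print(f"PCR: {pcr_primer} μM per primer")
print(f"cooling ramp: about {cooling / 60:.1f} min")
```

The slow ramp of roughly a quarter of an hour is the point of the annealing program: the two primers pair gradually as the temperature falls, rather than being snap-cooled.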
TABLE 1 crRNAs for guiding multi-site base editing (A→G)
The procedure for multi-site base editing (A→G) and its analysis is as follows. First, the editing plasmid was transformed into Bacillus subtilis, and the cells were spread on kanamycin-containing plates and incubated at 30 °C for 12 h. Next, one single colony was picked and inoculated into a 14 mL shaking tube containing 2 mL of LB medium (with kanamycin) and cultured with shaking at 30 °C for 12 h. Then, 5 μL of the culture was transferred into a 14 mL shaking tube containing 2 mL of LB medium (with kanamycin and IPTG) and cultured with shaking at 30 °C for 12 h to induce expression of the base editing system and carry out multi-site base editing (A→G). Finally, 1 μL of the induced culture was added to a 50 μL PCR reaction to amplify the aprE and nprE genes, which were analyzed by Sanger sequencing. The primers used to amplify aprE and nprE are shown in Table 2; the DNA fragments were purified and sequenced using the primers shown in Table 2, and the sequencing results were analyzed with the BEAT software (https://hanlab.cc/bat/). Because the plasmid is temperature sensitive, after editing the culture can be streaked on an antibiotic-free LB plate and cultured overnight at 50 °C to eliminate the plasmid.
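To make the notion of editing efficiency from a mixed-population Sanger trace concrete, a simplified Python sketch is given here. It is not the BEAT algorithm, whose handling of background signal is not described in this text; it only assumes that, at a target adenine, efficiency is estimated as the G peak signal divided by the combined A and G signal. The peak values in the example are invented.

```python
def editing_efficiency(a_signal, g_signal):
    """Estimate A-to-G editing efficiency (%) from peak signals at one position.

    Simplified assumption: efficiency = G / (A + G), ignoring background noise.
    """
    total = a_signal + g_signal
    if total <= 0:
        raise ValueError("no A or G signal at this position")
    return 100.0 * g_signal / total


# Hypothetical peak heights at three target adenines.
peaks = {"site 1": (250, 750), "site 2": (0, 500), "site 3": (900, 100)}
for site, (a_peak, g_peak) in peaks.items():
    print(site, f"{editing_efficiency(a_peak, g_peak):.0f}%")
```

Because the template is a mixed culture, a value below 100% reflects the fraction of cells edited at that position, and very rare edits can fall below what such a trace can resolve.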
TABLE 2 primer sequences
Multi-site base editing mediated by adenine base editors (dCpf1-ABE) containing different adenine deaminase TadA mutants is shown in Table 3. When TadA9 was fused to dCpf1, the base editing efficiency was highest: A was converted to G at all 5 sites, the editing window spanned the 4th to the 23rd base of the crRNA, and the editing efficiency at some sites reached 100%. When TadA8e was fused, the base editing efficiency was slightly lower than with TadA9: only A in site 2, site 3 and site 4 was converted to G, and the editing window was narrower, spanning the 7th to the 9th base of the crRNA. When TadA8.20 was fused, the base editing efficiency was lower than with TadA9 and TadA8e: only A in site 2 and site 3 was converted to G, and the editing window was narrower still, with editing occurring only at the 9th base of the crRNA. When TadA8.17 was fused, no base editing was achieved at any of the 5 sites.
TABLE 3 Adenine base editor-mediated multi-site base editing (A→G) and its efficiency
Example 3 Effect of Induction time on efficiency of Multi-site base editing
Because the adenine base editor containing the TadA7.10 mutant TadA9 was the most efficient and edited all five sites simultaneously, the influence of IPTG induction time on the base editing efficiency of this editor was examined further.
As shown in Table 4, the editing efficiency at each site improved somewhat as the induction time increased, but only to a limited extent, and the editing window did not change with induction time. These results indicate that 12 h of induction is sufficient for fairly complete editing. The sequencing results after 36 h of induction are shown in FIG. 3.
TABLE 4 influence of Induction time on efficiency of Multi-site base editing
Example 4 Analysis of mutants generated by the multi-site adenine base editing system
To analyze the specific composition of the mutant obtained after the treatment with adenine base editor, the bacterial solution after 36 hours of induction treatment with the adenine base editor containing adenine deaminase TadA7.10 mutant TadA9 was streaked onto LB plates without antibiotics and incubated at 37 ℃ overnight. Then, 8 single colonies were picked to amplify the aprE and nprE sites, respectively, for sequencing analysis.
As shown in FIG. 4, designing and constructing a single specific crRNA array is enough to generate, at the same time, a rich set of mutants carrying different combinations of mutations, which is very valuable for protein evolution and strain engineering. We also observed mutation sites that were not detected by mixed-template sequencing (e.g., A10G in site 2; A11G, A12G and A15G in site 3). This indicates that some sites are mutated at efficiencies too low to be detected by the method of Example 2, and also that the base editor has a larger editing window. In addition, some of the low-frequency mutation sites in Table 4 may not have been detected among the above mutants because few colonies were picked.
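The size of the mutant space behind this observation can be illustrated with a short Python sketch: if a recognition sequence contains n adenines inside the editing window and each may or may not be edited, there are 2^n possible genotypes at that site. The recognition sequence below is a hypothetical 10-nt fragment chosen for brevity; the window is the 4-to-23 window reported for TadA9.

```python
from itertools import product


def edit_combinations(spacer, window):
    """List every A-to-G genotype of a recognition sequence within the window."""
    lo, hi = window  # 1-based, inclusive
    sites = [i for i, base in enumerate(spacer) if base == "A" and lo <= i + 1 <= hi]
    variants = []
    for choice in product("AG", repeat=len(sites)):
        bases = list(spacer)
        for index, base in zip(sites, choice):
            bases[index] = base
        variants.append("".join(bases))
    return variants


# Hypothetical fragment with adenines at positions 4, 7 and 8.
variants = edit_combinations("GCTAGCAAGT", (4, 23))
print(len(variants))
for variant in variants:
    print(variant)
```

With several sites targeted by one array, the per-site counts multiply, which is why picking only 8 colonies samples a small part of the possible combinations.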
Example 5 use of a Multisite adenine base editor in extracellular protease inactivation
Bacillus subtilis has 6 major proteases that hydrolyze proteins secreted outside the cell, which is very unfavorable for efficient secretory expression of a target protein. We therefore used the 6 crRNAs shown in Table 5 to mutate A (A→G) on the strand complementary to the 6 start codons (ATG, TTG or GTG) shown in bold in the table, so that T in the start codon is mutated to C and expression can no longer be initiated, inactivating the protease. Mutant strains in which all 6 proteases were inactivated were selected by sequencing analysis, and extracellular protease activity was examined on skim milk plates. Compared with the wild-type strain, the mutant strain no longer produced a hydrolysis halo, indicating that extracellular protease activity had been successfully eliminated (FIG. 5).
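The start codon logic can be checked with a brief Python sketch. It models only the statement above: an A→G edit on the complementary strand appears as T→C in the codon, and the edited codon is compared against the three start codons named in the text. Whether any other codon can initiate translation in B. subtilis is outside the scope of this illustration.

```python
from itertools import combinations

START_CODONS = {"ATG", "TTG", "GTG"}  # start codons named in the text


def edited_variants(codon):
    """Codons obtainable by T-to-C changes at one or more T positions."""
    t_positions = [i for i, base in enumerate(codon) if base == "T"]
    variants = set()
    for count in range(1, len(t_positions) + 1):
        for subset in combinations(t_positions, count):
            bases = list(codon)
            for i in subset:
                bases[i] = "C"
            variants.add("".join(bases))
    return sorted(variants)


for codon in sorted(START_CODONS):
    results = edited_variants(codon)
    still_start = [v for v in results if v in START_CODONS]
    print(codon, "->", results, "| still a listed start codon:", still_start)
```

For all three codons every T-to-C outcome falls outside the listed set, consistent with the loss of protease activity reported for the edited strains.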
TABLE 5 crRNA for directing the inactivation of extracellular proteases
Comparative example 1 other adenine deaminase TadA mutant mediated base editing
The earliest adenine deaminase TadA mutant used in CRISPR/Cas9-based adenine base editing systems was TadA7.10, and one original TadA and one mutant TadA7.10 were often fused together to the N-terminus of nCas9. Chinese patent CN201811563073.6 and Chinese patent 201811578853.8 both also use this mutant to construct adenine base editing systems and apply them to rice. We therefore further attempted to fuse TadA together with TadA7.10 to the N-terminus of dCpf1 to construct a CRISPR/Cpf1-based adenine base editing system, as shown in sequence 6.
However, like TadA8.17, TadA7.10 showed no base editing ability (A→G) after fusion with dCpf1. This indicates either that dCpf1 is not compatible with these TadA mutants, so that base editing (A→G) cannot be completed, or that the mutations produced by this adenine base editor occur at an efficiency too low to be detected by the analytical method of Example 2.
Comparative example 2 Cpf1 mutant mediated base editing with different DNase inactivation
Besides the commonly used D917A mutant dCpf1, the double-site inactivated mutant ddCpf1 (D917A, E1006A) also has only its DNase activity inactivated while retaining its RNase activity. We therefore constructed an adenine base editor composed of TadA9 and ddCpf1, as shown in SEQ ID NO. 7, and verified it using the 5 crRNAs in Example 2.
As shown in Table 6, the adenine base editor composed of TadA9 and ddCpf1 likewise mutated all 5 sites simultaneously, but its mutation efficiency was lower than that obtained with dCpf1.
TABLE 6 Cpf1 mutant mediated base editing with different DNase inactivation
Comparative example 3 Effect of the linker on the constructed base editing system
As described in Example 1, the adenine deaminase mutant of the invention is connected to dCpf1 through a short peptide linker (SGGSSGGSSGSETPGTSESATPESSGGSSGGS). To verify the effect of the linker on the base editing system, we removed the short peptide between TadA9 and dCpf1 so that the two proteins were fused directly. However, when validated using the 5 crRNAs in Example 3, the fusion protein without the linker peptide no longer had base editing activity, indicating that the linker peptide is necessary for proper operation of the base editor.
The above examples are given by way of illustration only and do not limit the embodiments. Other variations and modifications will be apparent to those of ordinary skill in the art in light of the foregoing description. It is neither necessary nor possible to enumerate all embodiments here, and obvious variations or modifications derived from them fall within the scope of the present invention.
Sequence listing
<110> Jiangnan University
<120> a gene editing fusion protein, adenine base editor constructed by the same and use thereof
<160> 11
<170> SIPOSequenceListing 1.0
<210> 1
<211> 1509
<212> PRT
<213> (Artificial sequence)
<400> 1
Met Ser Glu Val Glu Phe Ser His Glu Tyr Trp Met Arg His Ala Leu
1 5 10 15
Thr Leu Ala Lys Arg Ala Arg Asp Glu Arg Glu Val Pro Val Gly Ala
20 25 30
Val Leu Val Leu Asn Asn Arg Val Ile Gly Glu Gly Trp Asn Arg Ala
35 40 45
Ile Gly Leu His Asp Pro Thr Ala His Ala Glu Ile Met Ala Leu Arg
50 55 60
Gln Gly Gly Leu Val Met Gln Asn Tyr Arg Leu Ile Asp Ala Thr Leu
65 70 75 80
Tyr Ser Thr Phe Glu Pro Cys Val Met Cys Ala Gly Ala Met Ile His
85 90 95
Ser Arg Ile Gly Arg Val Val Phe Gly Val Arg Asn Ala Lys Thr Gly
100 105 110
Ala Ala Gly Ser Leu Met Asp Val Leu His Tyr Pro Gly Met Asn His
115 120 125
Arg Val Glu Ile Thr Glu Gly Ile Leu Ala Asp Glu Cys Ala Ala Leu
130 135 140
Leu Cys Tyr Phe Phe Arg Met Pro Arg Arg Val Phe Asn Ala Gln Lys
145 150 155 160
Lys Ala Gln Ser Ser Thr Asp Ser Gly Gly Ser Ser Gly Gly Ser Ser
165 170 175
Gly Ser Glu Thr Pro Gly Thr Ser Glu Ser Ala Thr Pro Glu Ser Ser
180 185 190
Gly Gly Ser Ser Gly Gly Ser Ser Ile Tyr Gln Glu Phe Val Asn Lys
195 200 205
Tyr Ser Leu Ser Lys Thr Leu Arg Phe Glu Leu Ile Pro Gln Gly Lys
210 215 220
Thr Leu Glu Asn Ile Lys Ala Arg Gly Leu Ile Leu Asp Asp Glu Lys
225 230 235 240
Arg Ala Lys Asp Tyr Lys Lys Ala Lys Gln Ile Ile Asp Lys Tyr His
245 250 255
Gln Phe Phe Ile Glu Glu Ile Leu Ser Ser Val Cys Ile Ser Glu Asp
260 265 270
Leu Leu Gln Asn Tyr Ser Asp Val Tyr Phe Lys Leu Lys Lys Ser Asp
275 280 285
Asp Asp Asn Leu Gln Lys Asp Phe Lys Ser Ala Lys Asp Thr Ile Lys
290 295 300
Lys Gln Ile Ser Glu Tyr Ile Lys Asp Ser Glu Lys Phe Lys Asn Leu
305 310 315 320
Phe Asn Gln Asn Leu Ile Asp Ala Lys Lys Gly Gln Glu Ser Asp Leu
325 330 335
Ile Leu Trp Leu Lys Gln Ser Lys Asp Asn Gly Ile Glu Leu Phe Lys
340 345 350
Ala Asn Ser Asp Ile Thr Asp Ile Asp Glu Ala Leu Glu Ile Ile Lys
355 360 365
Ser Phe Lys Gly Trp Thr Thr Tyr Phe Lys Gly Phe His Glu Asn Arg
370 375 380
Lys Asn Val Tyr Ser Ser Asn Asp Ile Pro Thr Ser Ile Ile Tyr Arg
385 390 395 400
Ile Val Asp Asp Asn Leu Pro Lys Phe Leu Glu Asn Lys Ala Lys Tyr
405 410 415
Glu Ser Leu Lys Asp Lys Ala Pro Glu Ala Ile Asn Tyr Glu Gln Ile
420 425 430
Lys Lys Asp Leu Ala Glu Glu Leu Thr Phe Asp Ile Asp Tyr Lys Thr
435 440 445
Ser Glu Val Asn Gln Arg Val Phe Ser Leu Asp Glu Val Phe Glu Ile
450 455 460
Ala Asn Phe Asn Asn Tyr Leu Asn Gln Ser Gly Ile Thr Lys Phe Asn
465 470 475 480
Thr Ile Ile Gly Gly Lys Phe Val Asn Gly Glu Asn Thr Lys Arg Lys
485 490 495
Gly Ile Asn Glu Tyr Ile Asn Leu Tyr Ser Gln Gln Ile Asn Asp Lys
500 505 510
Thr Leu Lys Lys Tyr Lys Met Ser Val Leu Phe Lys Gln Ile Leu Ser
515 520 525
Asp Thr Glu Ser Lys Ser Phe Val Ile Asp Lys Leu Glu Asp Asp Ser
530 535 540
Asp Val Val Thr Thr Met Gln Ser Phe Tyr Glu Gln Ile Ala Ala Phe
545 550 555 560
Lys Thr Val Glu Glu Lys Ser Ile Lys Glu Thr Leu Ser Leu Leu Phe
565 570 575
Asp Asp Leu Lys Ala Gln Lys Leu Asp Leu Ser Lys Ile Tyr Phe Lys
580 585 590
Asn Asp Lys Ser Leu Thr Asp Leu Ser Gln Gln Val Phe Asp Asp Tyr
595 600 605
Ser Val Ile Gly Thr Ala Val Leu Glu Tyr Ile Thr Gln Gln Ile Ala
610 615 620
Pro Lys Asn Leu Asp Asn Pro Ser Lys Lys Glu Gln Glu Leu Ile Ala
625 630 635 640
Lys Lys Thr Glu Lys Ala Lys Tyr Leu Ser Leu Glu Thr Ile Lys Leu
645 650 655
Ala Leu Glu Glu Phe Asn Lys His Arg Asp Ile Asp Lys Gln Cys Arg
660 665 670
Phe Glu Glu Ile Leu Ala Asn Phe Ala Ala Ile Pro Met Ile Phe Asp
675 680 685
Glu Ile Ala Gln Asn Lys Asp Asn Leu Ala Gln Ile Ser Ile Lys Tyr
690 695 700
Gln Asn Gln Gly Lys Lys Asp Leu Leu Gln Ala Ser Ala Glu Asp Asp
705 710 715 720
Val Lys Ala Ile Lys Asp Leu Leu Asp Gln Thr Asn Asn Leu Leu His
725 730 735
Lys Leu Lys Ile Phe His Ile Ser Gln Ser Glu Asp Lys Ala Asn Ile
740 745 750
Leu Asp Lys Asp Glu His Phe Tyr Leu Val Phe Glu Glu Cys Tyr Phe
755 760 765
Glu Leu Ala Asn Ile Val Pro Leu Tyr Asn Lys Ile Arg Asn Tyr Ile
770 775 780
Thr Gln Lys Pro Tyr Ser Asp Glu Lys Phe Lys Leu Asn Phe Glu Asn
785 790 795 800
Ser Thr Leu Ala Asn Gly Trp Asp Lys Asn Lys Glu Pro Asp Asn Thr
805 810 815
Ala Ile Leu Phe Ile Lys Asp Asp Lys Tyr Tyr Leu Gly Val Met Asn
820 825 830
Lys Lys Asn Asn Lys Ile Phe Asp Asp Lys Ala Ile Lys Glu Asn Lys
835 840 845
Gly Glu Gly Tyr Lys Lys Ile Val Tyr Lys Leu Leu Pro Gly Ala Asn
850 855 860
Lys Met Leu Pro Lys Val Phe Phe Ser Ala Lys Ser Ile Lys Phe Tyr
865 870 875 880
Asn Pro Ser Glu Asp Ile Leu Arg Ile Arg Asn His Ser Thr His Thr
885 890 895
Lys Asn Gly Ser Pro Gln Lys Gly Tyr Glu Lys Phe Glu Phe Asn Ile
900 905 910
Glu Asp Cys Arg Lys Phe Ile Asp Phe Tyr Lys Gln Ser Ile Ser Lys
915 920 925
His Pro Glu Trp Lys Asp Phe Gly Phe Arg Phe Ser Asp Thr Gln Arg
930 935 940
Tyr Asn Ser Ile Asp Glu Phe Tyr Arg Glu Val Glu Asn Gln Gly Tyr
945 950 955 960
Lys Leu Thr Phe Glu Asn Ile Ser Glu Ser Tyr Ile Asp Ser Val Val
965 970 975
Asn Gln Gly Lys Leu Tyr Leu Phe Gln Ile Tyr Asn Lys Asp Phe Ser
980 985 990
Ala Tyr Ser Lys Gly Arg Pro Asn Leu His Thr Leu Tyr Trp Lys Ala
995 1000 1005
Leu Phe Asp Glu Arg Asn Leu Gln Asp Val Val Tyr Lys Leu Asn Gly
1010 1015 1020
Glu Ala Glu Leu Phe Tyr Arg Lys Gln Ser Ile Pro Lys Lys Ile Thr
1025 1030 1035 1040
His Pro Ala Lys Glu Ala Ile Ala Asn Lys Asn Lys Asp Asn Pro Lys
1045 1050 1055
Lys Glu Ser Val Phe Glu Tyr Asp Leu Ile Lys Asp Lys Arg Phe Thr
1060 1065 1070
Glu Asp Lys Phe Phe Phe His Cys Pro Ile Thr Ile Asn Phe Lys Ser
1075 1080 1085
Ser Gly Ala Asn Lys Phe Asn Asp Glu Ile Asn Leu Leu Leu Lys Glu
1090 1095 1100
Lys Ala Asn Asp Val His Ile Leu Ser Ile Ala Arg Gly Glu Arg His
1105 1110 1115 1120
Leu Ala Tyr Tyr Thr Leu Val Asp Gly Lys Gly Asn Ile Ile Lys Gln
1125 1130 1135
Asp Thr Phe Asn Ile Ile Gly Asn Asp Arg Met Lys Thr Asn Tyr His
1140 1145 1150
Asp Lys Leu Ala Ala Ile Glu Lys Asp Arg Asp Ser Ala Arg Lys Asp
1155 1160 1165
Trp Lys Lys Ile Asn Asn Ile Lys Glu Met Lys Glu Gly Tyr Leu Ser
1170 1175 1180
Gln Val Val His Glu Ile Ala Lys Leu Val Ile Glu Tyr Asn Ala Ile
1185 1190 1195 1200
Val Val Phe Glu Asp Leu Asn Phe Gly Phe Lys Arg Gly Arg Phe Lys
1205 1210 1215
Val Glu Lys Gln Val Tyr Gln Lys Leu Glu Lys Met Leu Ile Glu Lys
1220 1225 1230
Leu Asn Tyr Leu Val Phe Lys Asp Asn Glu Phe Asp Lys Thr Gly Gly
1235 1240 1245
Val Leu Arg Ala Tyr Gln Leu Thr Ala Pro Phe Glu Thr Phe Lys Lys
1250 1255 1260
Met Gly Lys Gln Thr Gly Ile Ile Tyr Tyr Val Pro Ala Gly Phe Thr
1265 1270 1275 1280
Ser Lys Ile Cys Pro Val Thr Gly Phe Val Asn Gln Leu Tyr Pro Lys
1285 1290 1295
Tyr Glu Ser Val Ser Lys Ser Gln Glu Phe Phe Ser Lys Phe Asp Lys
1300 1305 1310
Ile Cys Tyr Asn Leu Asp Lys Gly Tyr Phe Glu Phe Ser Phe Asp Tyr
1315 1320 1325
Lys Asn Phe Gly Asp Lys Ala Ala Lys Gly Lys Trp Thr Ile Ala Ser
1330 1335 1340
Phe Gly Ser Arg Leu Ile Asn Phe Arg Asn Ser Asp Lys Asn His Asn
1345 1350 1355 1360
Trp Asp Thr Arg Glu Val Tyr Pro Thr Lys Glu Leu Glu Lys Leu Leu
1365 1370 1375
Lys Asp Tyr Ser Ile Glu Tyr Gly His Gly Glu Cys Ile Lys Ala Ala
1380 1385 1390
Ile Cys Gly Glu Ser Asp Lys Lys Phe Phe Ala Lys Leu Thr Ser Val
1395 1400 1405
Leu Asn Thr Ile Leu Gln Met Arg Asn Ser Lys Thr Gly Thr Glu Leu
1410 1415 1420
Asp Tyr Leu Ile Ser Pro Val Ala Asp Val Asn Gly Asn Phe Phe Asp
1425 1430 1435 1440
Ser Arg Gln Ala Pro Lys Asn Met Pro Gln Asp Ala Asp Ala Asn Gly
1445 1450 1455
Ala Tyr His Ile Gly Leu Lys Gly Leu Met Leu Leu Gly Arg Ile Lys
1460 1465 1470
Asn Asn Gln Glu Gly Lys Lys Leu Asn Leu Val Ile Lys Asn Glu Glu
1475 1480 1485
Tyr Phe Glu Phe Val Gln Asn Arg Asn Asn Ser Gly Gly Ser Pro Lys
1490 1495 1500
Lys Lys Arg Lys Val
1505
<210> 2
<211> 1509
<212> PRT
<213> (Artificial sequence)
<400> 2
Met Ser Glu Val Glu Phe Ser His Glu Tyr Trp Met Arg His Ala Leu
1 5 10 15
Thr Leu Ala Lys Arg Ala Arg Asp Glu Arg Glu Val Pro Val Gly Ala
20 25 30
Val Leu Val Leu Asn Asn Arg Val Ile Gly Glu Gly Trp Asn Arg Ala
35 40 45
Ile Gly Leu His Asp Pro Thr Ala His Ala Glu Ile Met Ala Leu Arg
50 55 60
Gln Gly Gly Leu Val Met Gln Asn Tyr Arg Leu Tyr Asp Ala Thr Leu
65 70 75 80
Tyr Ser Thr Phe Glu Pro Cys Val Met Cys Ala Gly Ala Met Ile His
85 90 95
Ser Arg Ile Gly Arg Val Val Phe Gly Val Arg Asn Ala Lys Thr Gly
100 105 110
Ala Ala Gly Ser Leu Met Asp Val Leu His His Pro Gly Met Asn His
115 120 125
Arg Val Glu Ile Thr Glu Gly Ile Leu Ala Asp Glu Cys Ala Ala Leu
130 135 140
Leu Cys Arg Phe Phe Arg Met Pro Arg Arg Val Phe Asn Ala Gln Lys
145 150 155 160
Lys Ala Gln Ser Ser Thr Asp Ser Gly Gly Ser Ser Gly Gly Ser Ser
165 170 175
Gly Ser Glu Thr Pro Gly Thr Ser Glu Ser Ala Thr Pro Glu Ser Ser
180 185 190
Gly Gly Ser Ser Gly Gly Ser Ser Ile Tyr Gln Glu Phe Val Asn Lys
195 200 205
Tyr Ser Leu Ser Lys Thr Leu Arg Phe Glu Leu Ile Pro Gln Gly Lys
210 215 220
Thr Leu Glu Asn Ile Lys Ala Arg Gly Leu Ile Leu Asp Asp Glu Lys
225 230 235 240
Arg Ala Lys Asp Tyr Lys Lys Ala Lys Gln Ile Ile Asp Lys Tyr His
245 250 255
Gln Phe Phe Ile Glu Glu Ile Leu Ser Ser Val Cys Ile Ser Glu Asp
260 265 270
Leu Leu Gln Asn Tyr Ser Asp Val Tyr Phe Lys Leu Lys Lys Ser Asp
275 280 285
Asp Asp Asn Leu Gln Lys Asp Phe Lys Ser Ala Lys Asp Thr Ile Lys
290 295 300
Lys Gln Ile Ser Glu Tyr Ile Lys Asp Ser Glu Lys Phe Lys Asn Leu
305 310 315 320
Phe Asn Gln Asn Leu Ile Asp Ala Lys Lys Gly Gln Glu Ser Asp Leu
325 330 335
Ile Leu Trp Leu Lys Gln Ser Lys Asp Asn Gly Ile Glu Leu Phe Lys
340 345 350
Ala Asn Ser Asp Ile Thr Asp Ile Asp Glu Ala Leu Glu Ile Ile Lys
355 360 365
Ser Phe Lys Gly Trp Thr Thr Tyr Phe Lys Gly Phe His Glu Asn Arg
370 375 380
Lys Asn Val Tyr Ser Ser Asn Asp Ile Pro Thr Ser Ile Ile Tyr Arg
385 390 395 400
Ile Val Asp Asp Asn Leu Pro Lys Phe Leu Glu Asn Lys Ala Lys Tyr
405 410 415
Glu Ser Leu Lys Asp Lys Ala Pro Glu Ala Ile Asn Tyr Glu Gln Ile
420 425 430
Lys Lys Asp Leu Ala Glu Glu Leu Thr Phe Asp Ile Asp Tyr Lys Thr
435 440 445
Ser Glu Val Asn Gln Arg Val Phe Ser Leu Asp Glu Val Phe Glu Ile
450 455 460
Ala Asn Phe Asn Asn Tyr Leu Asn Gln Ser Gly Ile Thr Lys Phe Asn
465 470 475 480
Thr Ile Ile Gly Gly Lys Phe Val Asn Gly Glu Asn Thr Lys Arg Lys
485 490 495
Gly Ile Asn Glu Tyr Ile Asn Leu Tyr Ser Gln Gln Ile Asn Asp Lys
500 505 510
Thr Leu Lys Lys Tyr Lys Met Ser Val Leu Phe Lys Gln Ile Leu Ser
515 520 525
Asp Thr Glu Ser Lys Ser Phe Val Ile Asp Lys Leu Glu Asp Asp Ser
530 535 540
Asp Val Val Thr Thr Met Gln Ser Phe Tyr Glu Gln Ile Ala Ala Phe
545 550 555 560
Lys Thr Val Glu Glu Lys Ser Ile Lys Glu Thr Leu Ser Leu Leu Phe
565 570 575
Asp Asp Leu Lys Ala Gln Lys Leu Asp Leu Ser Lys Ile Tyr Phe Lys
580 585 590
Asn Asp Lys Ser Leu Thr Asp Leu Ser Gln Gln Val Phe Asp Asp Tyr
595 600 605
Ser Val Ile Gly Thr Ala Val Leu Glu Tyr Ile Thr Gln Gln Ile Ala
610 615 620
Pro Lys Asn Leu Asp Asn Pro Ser Lys Lys Glu Gln Glu Leu Ile Ala
625 630 635 640
Lys Lys Thr Glu Lys Ala Lys Tyr Leu Ser Leu Glu Thr Ile Lys Leu
645 650 655
Ala Leu Glu Glu Phe Asn Lys His Arg Asp Ile Asp Lys Gln Cys Arg
660 665 670
Phe Glu Glu Ile Leu Ala Asn Phe Ala Ala Ile Pro Met Ile Phe Asp
675 680 685
Glu Ile Ala Gln Asn Lys Asp Asn Leu Ala Gln Ile Ser Ile Lys Tyr
690 695 700
Gln Asn Gln Gly Lys Lys Asp Leu Leu Gln Ala Ser Ala Glu Asp Asp
705 710 715 720
Val Lys Ala Ile Lys Asp Leu Leu Asp Gln Thr Asn Asn Leu Leu His
725 730 735
Lys Leu Lys Ile Phe His Ile Ser Gln Ser Glu Asp Lys Ala Asn Ile
740 745 750
Leu Asp Lys Asp Glu His Phe Tyr Leu Val Phe Glu Glu Cys Tyr Phe
755 760 765
Glu Leu Ala Asn Ile Val Pro Leu Tyr Asn Lys Ile Arg Asn Tyr Ile
770 775 780
Thr Gln Lys Pro Tyr Ser Asp Glu Lys Phe Lys Leu Asn Phe Glu Asn
785 790 795 800
Ser Thr Leu Ala Asn Gly Trp Asp Lys Asn Lys Glu Pro Asp Asn Thr
805 810 815
Ala Ile Leu Phe Ile Lys Asp Asp Lys Tyr Tyr Leu Gly Val Met Asn
820 825 830
Lys Lys Asn Asn Lys Ile Phe Asp Asp Lys Ala Ile Lys Glu Asn Lys
835 840 845
Gly Glu Gly Tyr Lys Lys Ile Val Tyr Lys Leu Leu Pro Gly Ala Asn
850 855 860
Lys Met Leu Pro Lys Val Phe Phe Ser Ala Lys Ser Ile Lys Phe Tyr
865 870 875 880
Asn Pro Ser Glu Asp Ile Leu Arg Ile Arg Asn His Ser Thr His Thr
885 890 895
Lys Asn Gly Ser Pro Gln Lys Gly Tyr Glu Lys Phe Glu Phe Asn Ile
900 905 910
Glu Asp Cys Arg Lys Phe Ile Asp Phe Tyr Lys Gln Ser Ile Ser Lys
915 920 925
His Pro Glu Trp Lys Asp Phe Gly Phe Arg Phe Ser Asp Thr Gln Arg
930 935 940
Tyr Asn Ser Ile Asp Glu Phe Tyr Arg Glu Val Glu Asn Gln Gly Tyr
945 950 955 960
Lys Leu Thr Phe Glu Asn Ile Ser Glu Ser Tyr Ile Asp Ser Val Val
965 970 975
Asn Gln Gly Lys Leu Tyr Leu Phe Gln Ile Tyr Asn Lys Asp Phe Ser
980 985 990
Ala Tyr Ser Lys Gly Arg Pro Asn Leu His Thr Leu Tyr Trp Lys Ala
995 1000 1005
Leu Phe Asp Glu Arg Asn Leu Gln Asp Val Val Tyr Lys Leu Asn Gly
1010 1015 1020
Glu Ala Glu Leu Phe Tyr Arg Lys Gln Ser Ile Pro Lys Lys Ile Thr
1025 1030 1035 1040
His Pro Ala Lys Glu Ala Ile Ala Asn Lys Asn Lys Asp Asn Pro Lys
1045 1050 1055
Lys Glu Ser Val Phe Glu Tyr Asp Leu Ile Lys Asp Lys Arg Phe Thr
1060 1065 1070
Glu Asp Lys Phe Phe Phe His Cys Pro Ile Thr Ile Asn Phe Lys Ser
1075 1080 1085
Ser Gly Ala Asn Lys Phe Asn Asp Glu Ile Asn Leu Leu Leu Lys Glu
1090 1095 1100
Lys Ala Asn Asp Val His Ile Leu Ser Ile Ala Arg Gly Glu Arg His
1105 1110 1115 1120
Leu Ala Tyr Tyr Thr Leu Val Asp Gly Lys Gly Asn Ile Ile Lys Gln
1125 1130 1135
Asp Thr Phe Asn Ile Ile Gly Asn Asp Arg Met Lys Thr Asn Tyr His
1140 1145 1150
Asp Lys Leu Ala Ala Ile Glu Lys Asp Arg Asp Ser Ala Arg Lys Asp
1155 1160 1165
Trp Lys Lys Ile Asn Asn Ile Lys Glu Met Lys Glu Gly Tyr Leu Ser
1170 1175 1180
Gln Val Val His Glu Ile Ala Lys Leu Val Ile Glu Tyr Asn Ala Ile
1185 1190 1195 1200
Val Val Phe Glu Asp Leu Asn Phe Gly Phe Lys Arg Gly Arg Phe Lys
1205 1210 1215
Val Glu Lys Gln Val Tyr Gln Lys Leu Glu Lys Met Leu Ile Glu Lys
1220 1225 1230
Leu Asn Tyr Leu Val Phe Lys Asp Asn Glu Phe Asp Lys Thr Gly Gly
1235 1240 1245
Val Leu Arg Ala Tyr Gln Leu Thr Ala Pro Phe Glu Thr Phe Lys Lys
1250 1255 1260
Met Gly Lys Gln Thr Gly Ile Ile Tyr Tyr Val Pro Ala Gly Phe Thr
1265 1270 1275 1280
Ser Lys Ile Cys Pro Val Thr Gly Phe Val Asn Gln Leu Tyr Pro Lys
1285 1290 1295
Tyr Glu Ser Val Ser Lys Ser Gln Glu Phe Phe Ser Lys Phe Asp Lys
1300 1305 1310
Ile Cys Tyr Asn Leu Asp Lys Gly Tyr Phe Glu Phe Ser Phe Asp Tyr
1315 1320 1325
Lys Asn Phe Gly Asp Lys Ala Ala Lys Gly Lys Trp Thr Ile Ala Ser
1330 1335 1340
Phe Gly Ser Arg Leu Ile Asn Phe Arg Asn Ser Asp Lys Asn His Asn
1345 1350 1355 1360
Trp Asp Thr Arg Glu Val Tyr Pro Thr Lys Glu Leu Glu Lys Leu Leu
1365 1370 1375
Lys Asp Tyr Ser Ile Glu Tyr Gly His Gly Glu Cys Ile Lys Ala Ala
1380 1385 1390
Ile Cys Gly Glu Ser Asp Lys Lys Phe Phe Ala Lys Leu Thr Ser Val
1395 1400 1405
Leu Asn Thr Ile Leu Gln Met Arg Asn Ser Lys Thr Gly Thr Glu Leu
1410 1415 1420
Asp Tyr Leu Ile Ser Pro Val Ala Asp Val Asn Gly Asn Phe Phe Asp
1425 1430 1435 1440
Ser Arg Gln Ala Pro Lys Asn Met Pro Gln Asp Ala Asp Ala Asn Gly
1445 1450 1455
Ala Tyr His Ile Gly Leu Lys Gly Leu Met Leu Leu Gly Arg Ile Lys
1460 1465 1470
Asn Asn Gln Glu Gly Lys Lys Leu Asn Leu Val Ile Lys Asn Glu Glu
1475 1480 1485
Tyr Phe Glu Phe Val Gln Asn Arg Asn Asn Ser Gly Gly Ser Pro Lys
1490 1495 1500
Lys Lys Arg Lys Val
1505
<210> 3
<211> 1509
<212> PRT
<213> (Artificial sequence)
<400> 3
Met Ser Glu Val Glu Phe Ser His Glu Tyr Trp Met Arg His Ala Leu
1 5 10 15
Thr Leu Ala Lys Arg Ala Arg Asp Glu Arg Glu Val Pro Val Gly Ala
20 25 30
Val Leu Val Leu Asn Asn Arg Val Ile Gly Glu Gly Trp Asn Arg Ala
35 40 45
Ile Gly Leu His Asp Pro Thr Ala His Ala Glu Ile Met Ala Leu Arg
50 55 60
Gln Gly Gly Leu Val Met Gln Asn Tyr Arg Leu Ile Asp Ala Thr Leu
65 70 75 80
Tyr Val Thr Phe Glu Pro Cys Val Met Cys Ala Gly Ala Met Ile His
85 90 95
Ser Arg Ile Gly Arg Val Val Phe Gly Val Arg Asn Ser Lys Arg Gly
100 105 110
Ala Ala Gly Ser Leu Met Asn Val Leu Asn Tyr Pro Gly Met Asn His
115 120 125
Arg Val Glu Ile Thr Glu Gly Ile Leu Ala Asp Glu Cys Ala Ala Leu
130 135 140
Leu Cys Asp Phe Tyr Arg Met Pro Arg Gln Val Phe Asn Ala Gln Lys
145 150 155 160
Lys Ala Gln Ser Ser Ile Asn Ser Gly Gly Ser Ser Gly Gly Ser Ser
165 170 175
Gly Ser Glu Thr Pro Gly Thr Ser Glu Ser Ala Thr Pro Glu Ser Ser
180 185 190
Gly Gly Ser Ser Gly Gly Ser Ser Ile Tyr Gln Glu Phe Val Asn Lys
195 200 205
Tyr Ser Leu Ser Lys Thr Leu Arg Phe Glu Leu Ile Pro Gln Gly Lys
210 215 220
Thr Leu Glu Asn Ile Lys Ala Arg Gly Leu Ile Leu Asp Asp Glu Lys
225 230 235 240
Arg Ala Lys Asp Tyr Lys Lys Ala Lys Gln Ile Ile Asp Lys Tyr His
245 250 255
Gln Phe Phe Ile Glu Glu Ile Leu Ser Ser Val Cys Ile Ser Glu Asp
260 265 270
Leu Leu Gln Asn Tyr Ser Asp Val Tyr Phe Lys Leu Lys Lys Ser Asp
275 280 285
Asp Asp Asn Leu Gln Lys Asp Phe Lys Ser Ala Lys Asp Thr Ile Lys
290 295 300
Lys Gln Ile Ser Glu Tyr Ile Lys Asp Ser Glu Lys Phe Lys Asn Leu
305 310 315 320
Phe Asn Gln Asn Leu Ile Asp Ala Lys Lys Gly Gln Glu Ser Asp Leu
325 330 335
Ile Leu Trp Leu Lys Gln Ser Lys Asp Asn Gly Ile Glu Leu Phe Lys
340 345 350
Ala Asn Ser Asp Ile Thr Asp Ile Asp Glu Ala Leu Glu Ile Ile Lys
355 360 365
Ser Phe Lys Gly Trp Thr Thr Tyr Phe Lys Gly Phe His Glu Asn Arg
370 375 380
Lys Asn Val Tyr Ser Ser Asn Asp Ile Pro Thr Ser Ile Ile Tyr Arg
385 390 395 400
Ile Val Asp Asp Asn Leu Pro Lys Phe Leu Glu Asn Lys Ala Lys Tyr
405 410 415
Glu Ser Leu Lys Asp Lys Ala Pro Glu Ala Ile Asn Tyr Glu Gln Ile
420 425 430
Lys Lys Asp Leu Ala Glu Glu Leu Thr Phe Asp Ile Asp Tyr Lys Thr
435 440 445
Ser Glu Val Asn Gln Arg Val Phe Ser Leu Asp Glu Val Phe Glu Ile
450 455 460
Ala Asn Phe Asn Asn Tyr Leu Asn Gln Ser Gly Ile Thr Lys Phe Asn
465 470 475 480
Thr Ile Ile Gly Gly Lys Phe Val Asn Gly Glu Asn Thr Lys Arg Lys
485 490 495
Gly Ile Asn Glu Tyr Ile Asn Leu Tyr Ser Gln Gln Ile Asn Asp Lys
500 505 510
Thr Leu Lys Lys Tyr Lys Met Ser Val Leu Phe Lys Gln Ile Leu Ser
515 520 525
Asp Thr Glu Ser Lys Ser Phe Val Ile Asp Lys Leu Glu Asp Asp Ser
530 535 540
Asp Val Val Thr Thr Met Gln Ser Phe Tyr Glu Gln Ile Ala Ala Phe
545 550 555 560
Lys Thr Val Glu Glu Lys Ser Ile Lys Glu Thr Leu Ser Leu Leu Phe
565 570 575
Asp Asp Leu Lys Ala Gln Lys Leu Asp Leu Ser Lys Ile Tyr Phe Lys
580 585 590
Asn Asp Lys Ser Leu Thr Asp Leu Ser Gln Gln Val Phe Asp Asp Tyr
595 600 605
Ser Val Ile Gly Thr Ala Val Leu Glu Tyr Ile Thr Gln Gln Ile Ala
610 615 620
Pro Lys Asn Leu Asp Asn Pro Ser Lys Lys Glu Gln Glu Leu Ile Ala
625 630 635 640
Lys Lys Thr Glu Lys Ala Lys Tyr Leu Ser Leu Glu Thr Ile Lys Leu
645 650 655
Ala Leu Glu Glu Phe Asn Lys His Arg Asp Ile Asp Lys Gln Cys Arg
660 665 670
Phe Glu Glu Ile Leu Ala Asn Phe Ala Ala Ile Pro Met Ile Phe Asp
675 680 685
Glu Ile Ala Gln Asn Lys Asp Asn Leu Ala Gln Ile Ser Ile Lys Tyr
690 695 700
Gln Asn Gln Gly Lys Lys Asp Leu Leu Gln Ala Ser Ala Glu Asp Asp
705 710 715 720
Val Lys Ala Ile Lys Asp Leu Leu Asp Gln Thr Asn Asn Leu Leu His
725 730 735
Lys Leu Lys Ile Phe His Ile Ser Gln Ser Glu Asp Lys Ala Asn Ile
740 745 750
Leu Asp Lys Asp Glu His Phe Tyr Leu Val Phe Glu Glu Cys Tyr Phe
755 760 765
Glu Leu Ala Asn Ile Val Pro Leu Tyr Asn Lys Ile Arg Asn Tyr Ile
770 775 780
Thr Gln Lys Pro Tyr Ser Asp Glu Lys Phe Lys Leu Asn Phe Glu Asn
785 790 795 800
Ser Thr Leu Ala Asn Gly Trp Asp Lys Asn Lys Glu Pro Asp Asn Thr
805 810 815
Ala Ile Leu Phe Ile Lys Asp Asp Lys Tyr Tyr Leu Gly Val Met Asn
820 825 830
Lys Lys Asn Asn Lys Ile Phe Asp Asp Lys Ala Ile Lys Glu Asn Lys
835 840 845
Gly Glu Gly Tyr Lys Lys Ile Val Tyr Lys Leu Leu Pro Gly Ala Asn
850 855 860
Lys Met Leu Pro Lys Val Phe Phe Ser Ala Lys Ser Ile Lys Phe Tyr
865 870 875 880
Asn Pro Ser Glu Asp Ile Leu Arg Ile Arg Asn His Ser Thr His Thr
885 890 895
Lys Asn Gly Ser Pro Gln Lys Gly Tyr Glu Lys Phe Glu Phe Asn Ile
900 905 910
Glu Asp Cys Arg Lys Phe Ile Asp Phe Tyr Lys Gln Ser Ile Ser Lys
915 920 925
His Pro Glu Trp Lys Asp Phe Gly Phe Arg Phe Ser Asp Thr Gln Arg
930 935 940
Tyr Asn Ser Ile Asp Glu Phe Tyr Arg Glu Val Glu Asn Gln Gly Tyr
945 950 955 960
Lys Leu Thr Phe Glu Asn Ile Ser Glu Ser Tyr Ile Asp Ser Val Val
965 970 975
Asn Gln Gly Lys Leu Tyr Leu Phe Gln Ile Tyr Asn Lys Asp Phe Ser
980 985 990
Ala Tyr Ser Lys Gly Arg Pro Asn Leu His Thr Leu Tyr Trp Lys Ala
995 1000 1005
Leu Phe Asp Glu Arg Asn Leu Gln Asp Val Val Tyr Lys Leu Asn Gly
1010 1015 1020
Glu Ala Glu Leu Phe Tyr Arg Lys Gln Ser Ile Pro Lys Lys Ile Thr
1025 1030 1035 1040
His Pro Ala Lys Glu Ala Ile Ala Asn Lys Asn Lys Asp Asn Pro Lys
1045 1050 1055
Lys Glu Ser Val Phe Glu Tyr Asp Leu Ile Lys Asp Lys Arg Phe Thr
1060 1065 1070
Glu Asp Lys Phe Phe Phe His Cys Pro Ile Thr Ile Asn Phe Lys Ser
1075 1080 1085
Ser Gly Ala Asn Lys Phe Asn Asp Glu Ile Asn Leu Leu Leu Lys Glu
1090 1095 1100
Lys Ala Asn Asp Val His Ile Leu Ser Ile Ala Arg Gly Glu Arg His
1105 1110 1115 1120
Leu Ala Tyr Tyr Thr Leu Val Asp Gly Lys Gly Asn Ile Ile Lys Gln
1125 1130 1135
Asp Thr Phe Asn Ile Ile Gly Asn Asp Arg Met Lys Thr Asn Tyr His
1140 1145 1150
Asp Lys Leu Ala Ala Ile Glu Lys Asp Arg Asp Ser Ala Arg Lys Asp
1155 1160 1165
Trp Lys Lys Ile Asn Asn Ile Lys Glu Met Lys Glu Gly Tyr Leu Ser
1170 1175 1180
Gln Val Val His Glu Ile Ala Lys Leu Val Ile Glu Tyr Asn Ala Ile
1185 1190 1195 1200
Val Val Phe Glu Asp Leu Asn Phe Gly Phe Lys Arg Gly Arg Phe Lys
1205 1210 1215
Val Glu Lys Gln Val Tyr Gln Lys Leu Glu Lys Met Leu Ile Glu Lys
1220 1225 1230
Leu Asn Tyr Leu Val Phe Lys Asp Asn Glu Phe Asp Lys Thr Gly Gly
1235 1240 1245
Val Leu Arg Ala Tyr Gln Leu Thr Ala Pro Phe Glu Thr Phe Lys Lys
1250 1255 1260
Met Gly Lys Gln Thr Gly Ile Ile Tyr Tyr Val Pro Ala Gly Phe Thr
1265 1270 1275 1280
Ser Lys Ile Cys Pro Val Thr Gly Phe Val Asn Gln Leu Tyr Pro Lys
1285 1290 1295
Tyr Glu Ser Val Ser Lys Ser Gln Glu Phe Phe Ser Lys Phe Asp Lys
1300 1305 1310
Ile Cys Tyr Asn Leu Asp Lys Gly Tyr Phe Glu Phe Ser Phe Asp Tyr
1315 1320 1325
Lys Asn Phe Gly Asp Lys Ala Ala Lys Gly Lys Trp Thr Ile Ala Ser
1330 1335 1340
Phe Gly Ser Arg Leu Ile Asn Phe Arg Asn Ser Asp Lys Asn His Asn
1345 1350 1355 1360
Trp Asp Thr Arg Glu Val Tyr Pro Thr Lys Glu Leu Glu Lys Leu Leu
1365 1370 1375
Lys Asp Tyr Ser Ile Glu Tyr Gly His Gly Glu Cys Ile Lys Ala Ala
1380 1385 1390
Ile Cys Gly Glu Ser Asp Lys Lys Phe Phe Ala Lys Leu Thr Ser Val
1395 1400 1405
Leu Asn Thr Ile Leu Gln Met Arg Asn Ser Lys Thr Gly Thr Glu Leu
1410 1415 1420
Asp Tyr Leu Ile Ser Pro Val Ala Asp Val Asn Gly Asn Phe Phe Asp
1425 1430 1435 1440
Ser Arg Gln Ala Pro Lys Asn Met Pro Gln Asp Ala Asp Ala Asn Gly
1445 1450 1455
Ala Tyr His Ile Gly Leu Lys Gly Leu Met Leu Leu Gly Arg Ile Lys
1460 1465 1470
Asn Asn Gln Glu Gly Lys Lys Leu Asn Leu Val Ile Lys Asn Glu Glu
1475 1480 1485
Tyr Phe Glu Phe Val Gln Asn Arg Asn Asn Ser Gly Gly Ser Pro Lys
1490 1495 1500
Lys Lys Arg Lys Val
1505
<210> 4
<211> 1509
<212> PRT
<213> (Artificial sequence)
<400> 4
Met Ser Glu Val Glu Phe Ser His Glu Tyr Trp Met Arg His Ala Leu
1 5 10 15
Thr Leu Ala Lys Arg Ala Arg Asp Glu Arg Glu Val Pro Val Gly Ala
20 25 30
Val Leu Val Leu Asn Asn Arg Val Ile Gly Glu Gly Trp Asn Arg Ala
35 40 45
Ile Gly Leu His Asp Pro Thr Ala His Ala Glu Ile Met Ala Leu Arg
50 55 60
Gln Gly Gly Leu Val Met Gln Asn Tyr Arg Leu Ile Asp Ala Thr Leu
65 70 75 80
Tyr Ser Thr Phe Glu Pro Cys Val Met Cys Ala Gly Ala Met Ile His
85 90 95
Ser Arg Ile Gly Arg Val Val Phe Gly Val Arg Asn Ser Lys Arg Gly
100 105 110
Ala Ala Gly Ser Leu Met Asn Val Leu Asn Tyr Pro Gly Met Asn His
115 120 125
Arg Val Glu Ile Thr Glu Gly Ile Leu Ala Asp Glu Cys Ala Ala Leu
130 135 140
Leu Cys Asp Phe Tyr Arg Met Pro Arg Arg Val Phe Asn Ala Gln Lys
145 150 155 160
Lys Ala Gln Ser Ser Ile Asn Ser Gly Gly Ser Ser Gly Gly Ser Ser
165 170 175
Gly Ser Glu Thr Pro Gly Thr Ser Glu Ser Ala Thr Pro Glu Ser Ser
180 185 190
Gly Gly Ser Ser Gly Gly Ser Ser Ile Tyr Gln Glu Phe Val Asn Lys
195 200 205
Tyr Ser Leu Ser Lys Thr Leu Arg Phe Glu Leu Ile Pro Gln Gly Lys
210 215 220
Thr Leu Glu Asn Ile Lys Ala Arg Gly Leu Ile Leu Asp Asp Glu Lys
225 230 235 240
Arg Ala Lys Asp Tyr Lys Lys Ala Lys Gln Ile Ile Asp Lys Tyr His
245 250 255
Gln Phe Phe Ile Glu Glu Ile Leu Ser Ser Val Cys Ile Ser Glu Asp
260 265 270
Leu Leu Gln Asn Tyr Ser Asp Val Tyr Phe Lys Leu Lys Lys Ser Asp
275 280 285
Asp Asp Asn Leu Gln Lys Asp Phe Lys Ser Ala Lys Asp Thr Ile Lys
290 295 300
Lys Gln Ile Ser Glu Tyr Ile Lys Asp Ser Glu Lys Phe Lys Asn Leu
305 310 315 320
Phe Asn Gln Asn Leu Ile Asp Ala Lys Lys Gly Gln Glu Ser Asp Leu
325 330 335
Ile Leu Trp Leu Lys Gln Ser Lys Asp Asn Gly Ile Glu Leu Phe Lys
340 345 350
Ala Asn Ser Asp Ile Thr Asp Ile Asp Glu Ala Leu Glu Ile Ile Lys
355 360 365
Ser Phe Lys Gly Trp Thr Thr Tyr Phe Lys Gly Phe His Glu Asn Arg
370 375 380
Lys Asn Val Tyr Ser Ser Asn Asp Ile Pro Thr Ser Ile Ile Tyr Arg
385 390 395 400
Ile Val Asp Asp Asn Leu Pro Lys Phe Leu Glu Asn Lys Ala Lys Tyr
405 410 415
Glu Ser Leu Lys Asp Lys Ala Pro Glu Ala Ile Asn Tyr Glu Gln Ile
420 425 430
Lys Lys Asp Leu Ala Glu Glu Leu Thr Phe Asp Ile Asp Tyr Lys Thr
435 440 445
Ser Glu Val Asn Gln Arg Val Phe Ser Leu Asp Glu Val Phe Glu Ile
450 455 460
Ala Asn Phe Asn Asn Tyr Leu Asn Gln Ser Gly Ile Thr Lys Phe Asn
465 470 475 480
Thr Ile Ile Gly Gly Lys Phe Val Asn Gly Glu Asn Thr Lys Arg Lys
485 490 495
Gly Ile Asn Glu Tyr Ile Asn Leu Tyr Ser Gln Gln Ile Asn Asp Lys
500 505 510
Thr Leu Lys Lys Tyr Lys Met Ser Val Leu Phe Lys Gln Ile Leu Ser
515 520 525
Asp Thr Glu Ser Lys Ser Phe Val Ile Asp Lys Leu Glu Asp Asp Ser
530 535 540
Asp Val Val Thr Thr Met Gln Ser Phe Tyr Glu Gln Ile Ala Ala Phe
545 550 555 560
Lys Thr Val Glu Glu Lys Ser Ile Lys Glu Thr Leu Ser Leu Leu Phe
565 570 575
Asp Asp Leu Lys Ala Gln Lys Leu Asp Leu Ser Lys Ile Tyr Phe Lys
580 585 590
Asn Asp Lys Ser Leu Thr Asp Leu Ser Gln Gln Val Phe Asp Asp Tyr
595 600 605
Ser Val Ile Gly Thr Ala Val Leu Glu Tyr Ile Thr Gln Gln Ile Ala
610 615 620
Pro Lys Asn Leu Asp Asn Pro Ser Lys Lys Glu Gln Glu Leu Ile Ala
625 630 635 640
Lys Lys Thr Glu Lys Ala Lys Tyr Leu Ser Leu Glu Thr Ile Lys Leu
645 650 655
Ala Leu Glu Glu Phe Asn Lys His Arg Asp Ile Asp Lys Gln Cys Arg
660 665 670
Phe Glu Glu Ile Leu Ala Asn Phe Ala Ala Ile Pro Met Ile Phe Asp
675 680 685
Glu Ile Ala Gln Asn Lys Asp Asn Leu Ala Gln Ile Ser Ile Lys Tyr
690 695 700
Gln Asn Gln Gly Lys Lys Asp Leu Leu Gln Ala Ser Ala Glu Asp Asp
705 710 715 720
Val Lys Ala Ile Lys Asp Leu Leu Asp Gln Thr Asn Asn Leu Leu His
725 730 735
Lys Leu Lys Ile Phe His Ile Ser Gln Ser Glu Asp Lys Ala Asn Ile
740 745 750
Leu Asp Lys Asp Glu His Phe Tyr Leu Val Phe Glu Glu Cys Tyr Phe
755 760 765
Glu Leu Ala Asn Ile Val Pro Leu Tyr Asn Lys Ile Arg Asn Tyr Ile
770 775 780
Thr Gln Lys Pro Tyr Ser Asp Glu Lys Phe Lys Leu Asn Phe Glu Asn
785 790 795 800
Ser Thr Leu Ala Asn Gly Trp Asp Lys Asn Lys Glu Pro Asp Asn Thr
805 810 815
Ala Ile Leu Phe Ile Lys Asp Asp Lys Tyr Tyr Leu Gly Val Met Asn
820 825 830
Lys Lys Asn Asn Lys Ile Phe Asp Asp Lys Ala Ile Lys Glu Asn Lys
835 840 845
Gly Glu Gly Tyr Lys Lys Ile Val Tyr Lys Leu Leu Pro Gly Ala Asn
850 855 860
Lys Met Leu Pro Lys Val Phe Phe Ser Ala Lys Ser Ile Lys Phe Tyr
865 870 875 880
Asn Pro Ser Glu Asp Ile Leu Arg Ile Arg Asn His Ser Thr His Thr
885 890 895
Lys Asn Gly Ser Pro Gln Lys Gly Tyr Glu Lys Phe Glu Phe Asn Ile
900 905 910
Glu Asp Cys Arg Lys Phe Ile Asp Phe Tyr Lys Gln Ser Ile Ser Lys
915 920 925
His Pro Glu Trp Lys Asp Phe Gly Phe Arg Phe Ser Asp Thr Gln Arg
930 935 940
Tyr Asn Ser Ile Asp Glu Phe Tyr Arg Glu Val Glu Asn Gln Gly Tyr
945 950 955 960
Lys Leu Thr Phe Glu Asn Ile Ser Glu Ser Tyr Ile Asp Ser Val Val
965 970 975
Asn Gln Gly Lys Leu Tyr Leu Phe Gln Ile Tyr Asn Lys Asp Phe Ser
980 985 990
Ala Tyr Ser Lys Gly Arg Pro Asn Leu His Thr Leu Tyr Trp Lys Ala
995 1000 1005
Leu Phe Asp Glu Arg Asn Leu Gln Asp Val Val Tyr Lys Leu Asn Gly
1010 1015 1020
Glu Ala Glu Leu Phe Tyr Arg Lys Gln Ser Ile Pro Lys Lys Ile Thr
1025 1030 1035 1040
His Pro Ala Lys Glu Ala Ile Ala Asn Lys Asn Lys Asp Asn Pro Lys
1045 1050 1055
Lys Glu Ser Val Phe Glu Tyr Asp Leu Ile Lys Asp Lys Arg Phe Thr
1060 1065 1070
Glu Asp Lys Phe Phe Phe His Cys Pro Ile Thr Ile Asn Phe Lys Ser
1075 1080 1085
Ser Gly Ala Asn Lys Phe Asn Asp Glu Ile Asn Leu Leu Leu Lys Glu
1090 1095 1100
Lys Ala Asn Asp Val His Ile Leu Ser Ile Ala Arg Gly Glu Arg His
1105 1110 1115 1120
Leu Ala Tyr Tyr Thr Leu Val Asp Gly Lys Gly Asn Ile Ile Lys Gln
1125 1130 1135
Asp Thr Phe Asn Ile Ile Gly Asn Asp Arg Met Lys Thr Asn Tyr His
1140 1145 1150
Asp Lys Leu Ala Ala Ile Glu Lys Asp Arg Asp Ser Ala Arg Lys Asp
1155 1160 1165
Trp Lys Lys Ile Asn Asn Ile Lys Glu Met Lys Glu Gly Tyr Leu Ser
1170 1175 1180
Gln Val Val His Glu Ile Ala Lys Leu Val Ile Glu Tyr Asn Ala Ile
1185 1190 1195 1200
Val Val Phe Glu Asp Leu Asn Phe Gly Phe Lys Arg Gly Arg Phe Lys
1205 1210 1215
Val Glu Lys Gln Val Tyr Gln Lys Leu Glu Lys Met Leu Ile Glu Lys
1220 1225 1230
Leu Asn Tyr Leu Val Phe Lys Asp Asn Glu Phe Asp Lys Thr Gly Gly
1235 1240 1245
Val Leu Arg Ala Tyr Gln Leu Thr Ala Pro Phe Glu Thr Phe Lys Lys
1250 1255 1260
Met Gly Lys Gln Thr Gly Ile Ile Tyr Tyr Val Pro Ala Gly Phe Thr
1265 1270 1275 1280
Ser Lys Ile Cys Pro Val Thr Gly Phe Val Asn Gln Leu Tyr Pro Lys
1285 1290 1295
Tyr Glu Ser Val Ser Lys Ser Gln Glu Phe Phe Ser Lys Phe Asp Lys
1300 1305 1310
Ile Cys Tyr Asn Leu Asp Lys Gly Tyr Phe Glu Phe Ser Phe Asp Tyr
1315 1320 1325
Lys Asn Phe Gly Asp Lys Ala Ala Lys Gly Lys Trp Thr Ile Ala Ser
1330 1335 1340
Phe Gly Ser Arg Leu Ile Asn Phe Arg Asn Ser Asp Lys Asn His Asn
1345 1350 1355 1360
Trp Asp Thr Arg Glu Val Tyr Pro Thr Lys Glu Leu Glu Lys Leu Leu
1365 1370 1375
Lys Asp Tyr Ser Ile Glu Tyr Gly His Gly Glu Cys Ile Lys Ala Ala
1380 1385 1390
Ile Cys Gly Glu Ser Asp Lys Lys Phe Phe Ala Lys Leu Thr Ser Val
1395 1400 1405
Leu Asn Thr Ile Leu Gln Met Arg Asn Ser Lys Thr Gly Thr Glu Leu
1410 1415 1420
Asp Tyr Leu Ile Ser Pro Val Ala Asp Val Asn Gly Asn Phe Phe Asp
1425 1430 1435 1440
Ser Arg Gln Ala Pro Lys Asn Met Pro Gln Asp Ala Asp Ala Asn Gly
1445 1450 1455
Ala Tyr His Ile Gly Leu Lys Gly Leu Met Leu Leu Gly Arg Ile Lys
1460 1465 1470
Asn Asn Gln Glu Gly Lys Lys Leu Asn Leu Val Ile Lys Asn Glu Glu
1475 1480 1485
Tyr Phe Glu Phe Val Gln Asn Arg Asn Asn Ser Gly Gly Ser Pro Lys
1490 1495 1500
Lys Lys Arg Lys Val
1505
<210> 5
<211> 82
<212> DNA
<213> (Artificial sequence)
<400> 5
gtctaagaac tttaaataat ttctactgtt gtagatagag accgtgaagt taataaggtc 60
tcaaatttct actgttgtag at 82
<210> 6
<211> 1707
<212> PRT
<213> (Artificial sequence)
<400> 6
Met Ser Glu Val Glu Phe Ser His Glu Tyr Trp Met Arg His Ala Leu
1 5 10 15
Thr Leu Ala Lys Arg Ala Trp Asp Glu Arg Glu Val Pro Val Gly Ala
20 25 30
Val Leu Val His Asn Asn Arg Val Ile Gly Glu Gly Trp Asn Arg Pro
35 40 45
Ile Gly Arg His Asp Pro Thr Ala His Ala Glu Ile Met Ala Leu Arg
50 55 60
Gln Gly Gly Leu Val Met Gln Asn Tyr Arg Leu Ile Asp Ala Thr Leu
65 70 75 80
Tyr Val Thr Leu Glu Pro Cys Val Met Cys Ala Gly Ala Met Ile His
85 90 95
Ser Arg Ile Gly Arg Val Val Phe Gly Ala Arg Asp Ala Lys Thr Gly
100 105 110
Ala Ala Gly Ser Leu Met Asp Val Leu His His Pro Gly Met Asn His
115 120 125
Arg Val Glu Ile Thr Glu Gly Ile Leu Ala Asp Glu Cys Ala Ala Leu
130 135 140
Leu Ser Asp Phe Phe Arg Met Arg Arg Gln Glu Ile Lys Ala Gln Lys
145 150 155 160
Lys Ala Gln Ser Ser Thr Asp Ser Gly Gly Ser Ser Gly Gly Ser Ser
165 170 175
Gly Ser Glu Thr Pro Gly Thr Ser Glu Ser Ala Thr Pro Glu Ser Ser
180 185 190
Gly Gly Ser Ser Gly Gly Ser Ser Glu Val Glu Phe Ser His Glu Tyr
195 200 205
Trp Met Arg His Ala Leu Thr Leu Ala Lys Arg Ala Arg Asp Glu Arg
210 215 220
Glu Val Pro Val Gly Ala Val Leu Val Leu Asn Asn Arg Val Ile Gly
225 230 235 240
Glu Gly Trp Asn Arg Ala Ile Gly Leu His Asp Pro Thr Ala His Ala
245 250 255
Glu Ile Met Ala Leu Arg Gln Gly Gly Leu Val Met Gln Asn Tyr Arg
260 265 270
Leu Ile Asp Ala Thr Leu Tyr Val Thr Phe Glu Pro Cys Val Met Cys
275 280 285
Ala Gly Ala Met Ile His Ser Arg Ile Gly Arg Val Val Phe Gly Val
290 295 300
Arg Asn Ala Lys Thr Gly Ala Ala Gly Ser Leu Met Asp Val Leu His
305 310 315 320
Tyr Pro Gly Met Asn His Arg Val Glu Ile Thr Glu Gly Ile Leu Ala
325 330 335
Asp Glu Cys Ala Ala Leu Leu Cys Tyr Phe Phe Arg Met Pro Arg Gln
340 345 350
Val Phe Asn Ala Gln Lys Lys Ala Gln Ser Ser Thr Asp Ser Gly Gly
355 360 365
Ser Ser Gly Gly Ser Ser Gly Ser Glu Thr Pro Gly Thr Ser Glu Ser
370 375 380
Ala Thr Pro Glu Ser Ser Gly Gly Ser Ser Gly Gly Ser Ser Ile Tyr
385 390 395 400
Gln Glu Phe Val Asn Lys Tyr Ser Leu Ser Lys Thr Leu Arg Phe Glu
405 410 415
Leu Ile Pro Gln Gly Lys Thr Leu Glu Asn Ile Lys Ala Arg Gly Leu
420 425 430
Ile Leu Asp Asp Glu Lys Arg Ala Lys Asp Tyr Lys Lys Ala Lys Gln
435 440 445
Ile Ile Asp Lys Tyr His Gln Phe Phe Ile Glu Glu Ile Leu Ser Ser
450 455 460
Val Cys Ile Ser Glu Asp Leu Leu Gln Asn Tyr Ser Asp Val Tyr Phe
465 470 475 480
Lys Leu Lys Lys Ser Asp Asp Asp Asn Leu Gln Lys Asp Phe Lys Ser
485 490 495
Ala Lys Asp Thr Ile Lys Lys Gln Ile Ser Glu Tyr Ile Lys Asp Ser
500 505 510
Glu Lys Phe Lys Asn Leu Phe Asn Gln Asn Leu Ile Asp Ala Lys Lys
515 520 525
Gly Gln Glu Ser Asp Leu Ile Leu Trp Leu Lys Gln Ser Lys Asp Asn
530 535 540
Gly Ile Glu Leu Phe Lys Ala Asn Ser Asp Ile Thr Asp Ile Asp Glu
545 550 555 560
Ala Leu Glu Ile Ile Lys Ser Phe Lys Gly Trp Thr Thr Tyr Phe Lys
565 570 575
Gly Phe His Glu Asn Arg Lys Asn Val Tyr Ser Ser Asn Asp Ile Pro
580 585 590
Thr Ser Ile Ile Tyr Arg Ile Val Asp Asp Asn Leu Pro Lys Phe Leu
595 600 605
Glu Asn Lys Ala Lys Tyr Glu Ser Leu Lys Asp Lys Ala Pro Glu Ala
610 615 620
Ile Asn Tyr Glu Gln Ile Lys Lys Asp Leu Ala Glu Glu Leu Thr Phe
625 630 635 640
Asp Ile Asp Tyr Lys Thr Ser Glu Val Asn Gln Arg Val Phe Ser Leu
645 650 655
Asp Glu Val Phe Glu Ile Ala Asn Phe Asn Asn Tyr Leu Asn Gln Ser
660 665 670
Gly Ile Thr Lys Phe Asn Thr Ile Ile Gly Gly Lys Phe Val Asn Gly
675 680 685
Glu Asn Thr Lys Arg Lys Gly Ile Asn Glu Tyr Ile Asn Leu Tyr Ser
690 695 700
Gln Gln Ile Asn Asp Lys Thr Leu Lys Lys Tyr Lys Met Ser Val Leu
705 710 715 720
Phe Lys Gln Ile Leu Ser Asp Thr Glu Ser Lys Ser Phe Val Ile Asp
725 730 735
Lys Leu Glu Asp Asp Ser Asp Val Val Thr Thr Met Gln Ser Phe Tyr
740 745 750
Glu Gln Ile Ala Ala Phe Lys Thr Val Glu Glu Lys Ser Ile Lys Glu
755 760 765
Thr Leu Ser Leu Leu Phe Asp Asp Leu Lys Ala Gln Lys Leu Asp Leu
770 775 780
Ser Lys Ile Tyr Phe Lys Asn Asp Lys Ser Leu Thr Asp Leu Ser Gln
785 790 795 800
Gln Val Phe Asp Asp Tyr Ser Val Ile Gly Thr Ala Val Leu Glu Tyr
805 810 815
Ile Thr Gln Gln Ile Ala Pro Lys Asn Leu Asp Asn Pro Ser Lys Lys
820 825 830
Glu Gln Glu Leu Ile Ala Lys Lys Thr Glu Lys Ala Lys Tyr Leu Ser
835 840 845
Leu Glu Thr Ile Lys Leu Ala Leu Glu Glu Phe Asn Lys His Arg Asp
850 855 860
Ile Asp Lys Gln Cys Arg Phe Glu Glu Ile Leu Ala Asn Phe Ala Ala
865 870 875 880
Ile Pro Met Ile Phe Asp Glu Ile Ala Gln Asn Lys Asp Asn Leu Ala
885 890 895
Gln Ile Ser Ile Lys Tyr Gln Asn Gln Gly Lys Lys Asp Leu Leu Gln
900 905 910
Ala Ser Ala Glu Asp Asp Val Lys Ala Ile Lys Asp Leu Leu Asp Gln
915 920 925
Thr Asn Asn Leu Leu His Lys Leu Lys Ile Phe His Ile Ser Gln Ser
930 935 940
Glu Asp Lys Ala Asn Ile Leu Asp Lys Asp Glu His Phe Tyr Leu Val
945 950 955 960
Phe Glu Glu Cys Tyr Phe Glu Leu Ala Asn Ile Val Pro Leu Tyr Asn
965 970 975
Lys Ile Arg Asn Tyr Ile Thr Gln Lys Pro Tyr Ser Asp Glu Lys Phe
980 985 990
Lys Leu Asn Phe Glu Asn Ser Thr Leu Ala Asn Gly Trp Asp Lys Asn
995 1000 1005
Lys Glu Pro Asp Asn Thr Ala Ile Leu Phe Ile Lys Asp Asp Lys Tyr
1010 1015 1020
Tyr Leu Gly Val Met Asn Lys Lys Asn Asn Lys Ile Phe Asp Asp Lys
1025 1030 1035 1040
Ala Ile Lys Glu Asn Lys Gly Glu Gly Tyr Lys Lys Ile Val Tyr Lys
1045 1050 1055
Leu Leu Pro Gly Ala Asn Lys Met Leu Pro Lys Val Phe Phe Ser Ala
1060 1065 1070
Lys Ser Ile Lys Phe Tyr Asn Pro Ser Glu Asp Ile Leu Arg Ile Arg
1075 1080 1085
Asn His Ser Thr His Thr Lys Asn Gly Ser Pro Gln Lys Gly Tyr Glu
1090 1095 1100
Lys Phe Glu Phe Asn Ile Glu Asp Cys Arg Lys Phe Ile Asp Phe Tyr
1105 1110 1115 1120
Lys Gln Ser Ile Ser Lys His Pro Glu Trp Lys Asp Phe Gly Phe Arg
1125 1130 1135
Phe Ser Asp Thr Gln Arg Tyr Asn Ser Ile Asp Glu Phe Tyr Arg Glu
1140 1145 1150
Val Glu Asn Gln Gly Tyr Lys Leu Thr Phe Glu Asn Ile Ser Glu Ser
1155 1160 1165
Tyr Ile Asp Ser Val Val Asn Gln Gly Lys Leu Tyr Leu Phe Gln Ile
1170 1175 1180
Tyr Asn Lys Asp Phe Ser Ala Tyr Ser Lys Gly Arg Pro Asn Leu His
1185 1190 1195 1200
Thr Leu Tyr Trp Lys Ala Leu Phe Asp Glu Arg Asn Leu Gln Asp Val
1205 1210 1215
Val Tyr Lys Leu Asn Gly Glu Ala Glu Leu Phe Tyr Arg Lys Gln Ser
1220 1225 1230
Ile Pro Lys Lys Ile Thr His Pro Ala Lys Glu Ala Ile Ala Asn Lys
1235 1240 1245
Asn Lys Asp Asn Pro Lys Lys Glu Ser Val Phe Glu Tyr Asp Leu Ile
1250 1255 1260
Lys Asp Lys Arg Phe Thr Glu Asp Lys Phe Phe Phe His Cys Pro Ile
1265 1270 1275 1280
Thr Ile Asn Phe Lys Ser Ser Gly Ala Asn Lys Phe Asn Asp Glu Ile
1285 1290 1295
Asn Leu Leu Leu Lys Glu Lys Ala Asn Asp Val His Ile Leu Ser Ile
1300 1305 1310
Ala Arg Gly Glu Arg His Leu Ala Tyr Tyr Thr Leu Val Asp Gly Lys
1315 1320 1325
Gly Asn Ile Ile Lys Gln Asp Thr Phe Asn Ile Ile Gly Asn Asp Arg
1330 1335 1340
Met Lys Thr Asn Tyr His Asp Lys Leu Ala Ala Ile Glu Lys Asp Arg
1345 1350 1355 1360
Asp Ser Ala Arg Lys Asp Trp Lys Lys Ile Asn Asn Ile Lys Glu Met
1365 1370 1375
Lys Glu Gly Tyr Leu Ser Gln Val Val His Glu Ile Ala Lys Leu Val
1380 1385 1390
Ile Glu Tyr Asn Ala Ile Val Val Phe Glu Asp Leu Asn Phe Gly Phe
1395 1400 1405
Lys Arg Gly Arg Phe Lys Val Glu Lys Gln Val Tyr Gln Lys Leu Glu
1410 1415 1420
Lys Met Leu Ile Glu Lys Leu Asn Tyr Leu Val Phe Lys Asp Asn Glu
1425 1430 1435 1440
Phe Asp Lys Thr Gly Gly Val Leu Arg Ala Tyr Gln Leu Thr Ala Pro
1445 1450 1455
Phe Glu Thr Phe Lys Lys Met Gly Lys Gln Thr Gly Ile Ile Tyr Tyr
1460 1465 1470
Val Pro Ala Gly Phe Thr Ser Lys Ile Cys Pro Val Thr Gly Phe Val
1475 1480 1485
Asn Gln Leu Tyr Pro Lys Tyr Glu Ser Val Ser Lys Ser Gln Glu Phe
1490 1495 1500
Phe Ser Lys Phe Asp Lys Ile Cys Tyr Asn Leu Asp Lys Gly Tyr Phe
1505 1510 1515 1520
Glu Phe Ser Phe Asp Tyr Lys Asn Phe Gly Asp Lys Ala Ala Lys Gly
1525 1530 1535
Lys Trp Thr Ile Ala Ser Phe Gly Ser Arg Leu Ile Asn Phe Arg Asn
1540 1545 1550
Ser Asp Lys Asn His Asn Trp Asp Thr Arg Glu Val Tyr Pro Thr Lys
1555 1560 1565
Glu Leu Glu Lys Leu Leu Lys Asp Tyr Ser Ile Glu Tyr Gly His Gly
1570 1575 1580
Glu Cys Ile Lys Ala Ala Ile Cys Gly Glu Ser Asp Lys Lys Phe Phe
1585 1590 1595 1600
Ala Lys Leu Thr Ser Val Leu Asn Thr Ile Leu Gln Met Arg Asn Ser
1605 1610 1615
Lys Thr Gly Thr Glu Leu Asp Tyr Leu Ile Ser Pro Val Ala Asp Val
1620 1625 1630
Asn Gly Asn Phe Phe Asp Ser Arg Gln Ala Pro Lys Asn Met Pro Gln
1635 1640 1645
Asp Ala Asp Ala Asn Gly Ala Tyr His Ile Gly Leu Lys Gly Leu Met
1650 1655 1660
Leu Leu Gly Arg Ile Lys Asn Asn Gln Glu Gly Lys Lys Leu Asn Leu
1665 1670 1675 1680
Val Ile Lys Asn Glu Glu Tyr Phe Glu Phe Val Gln Asn Arg Asn Asn
1685 1690 1695
Ser Gly Gly Ser Pro Lys Lys Lys Arg Lys Val
1700 1705
<210> 7
<211> 1509
<212> PRT
<213> (Artificial sequence)
<400> 7
Met Ser Glu Val Glu Phe Ser His Glu Tyr Trp Met Arg His Ala Leu
1 5 10 15
Thr Leu Ala Lys Arg Ala Arg Asp Glu Arg Glu Val Pro Val Gly Ala
20 25 30
Val Leu Val Leu Asn Asn Arg Val Ile Gly Glu Gly Trp Asn Arg Ala
35 40 45
Ile Gly Leu His Asp Pro Thr Ala His Ala Glu Ile Met Ala Leu Arg
50 55 60
Gln Gly Gly Leu Val Met Gln Asn Tyr Arg Leu Ile Asp Ala Thr Leu
65 70 75 80
Tyr Ser Thr Phe Glu Pro Cys Val Met Cys Ala Gly Ala Met Ile His
85 90 95
Ser Arg Ile Gly Arg Val Val Phe Gly Val Arg Asn Ser Lys Arg Gly
100 105 110
Ala Ala Gly Ser Leu Met Asn Val Leu Asn Tyr Pro Gly Met Asn His
115 120 125
Arg Val Glu Ile Thr Glu Gly Ile Leu Ala Asp Glu Cys Ala Ala Leu
130 135 140
Leu Cys Asp Phe Tyr Arg Met Pro Arg Arg Val Phe Asn Ala Gln Lys
145 150 155 160
Lys Ala Gln Ser Ser Ile Asn Ser Gly Gly Ser Ser Gly Gly Ser Ser
165 170 175
Gly Ser Glu Thr Pro Gly Thr Ser Glu Ser Ala Thr Pro Glu Ser Ser
180 185 190
Gly Gly Ser Ser Gly Gly Ser Ser Ile Tyr Gln Glu Phe Val Asn Lys
195 200 205
Tyr Ser Leu Ser Lys Thr Leu Arg Phe Glu Leu Ile Pro Gln Gly Lys
210 215 220
Thr Leu Glu Asn Ile Lys Ala Arg Gly Leu Ile Leu Asp Asp Glu Lys
225 230 235 240
Arg Ala Lys Asp Tyr Lys Lys Ala Lys Gln Ile Ile Asp Lys Tyr His
245 250 255
Gln Phe Phe Ile Glu Glu Ile Leu Ser Ser Val Cys Ile Ser Glu Asp
260 265 270
Leu Leu Gln Asn Tyr Ser Asp Val Tyr Phe Lys Leu Lys Lys Ser Asp
275 280 285
Asp Asp Asn Leu Gln Lys Asp Phe Lys Ser Ala Lys Asp Thr Ile Lys
290 295 300
Lys Gln Ile Ser Glu Tyr Ile Lys Asp Ser Glu Lys Phe Lys Asn Leu
305 310 315 320
Phe Asn Gln Asn Leu Ile Asp Ala Lys Lys Gly Gln Glu Ser Asp Leu
325 330 335
Ile Leu Trp Leu Lys Gln Ser Lys Asp Asn Gly Ile Glu Leu Phe Lys
340 345 350
Ala Asn Ser Asp Ile Thr Asp Ile Asp Glu Ala Leu Glu Ile Ile Lys
355 360 365
Ser Phe Lys Gly Trp Thr Thr Tyr Phe Lys Gly Phe His Glu Asn Arg
370 375 380
Lys Asn Val Tyr Ser Ser Asn Asp Ile Pro Thr Ser Ile Ile Tyr Arg
385 390 395 400
Ile Val Asp Asp Asn Leu Pro Lys Phe Leu Glu Asn Lys Ala Lys Tyr
405 410 415
Glu Ser Leu Lys Asp Lys Ala Pro Glu Ala Ile Asn Tyr Glu Gln Ile
420 425 430
Lys Lys Asp Leu Ala Glu Glu Leu Thr Phe Asp Ile Asp Tyr Lys Thr
435 440 445
Ser Glu Val Asn Gln Arg Val Phe Ser Leu Asp Glu Val Phe Glu Ile
450 455 460
Ala Asn Phe Asn Asn Tyr Leu Asn Gln Ser Gly Ile Thr Lys Phe Asn
465 470 475 480
Thr Ile Ile Gly Gly Lys Phe Val Asn Gly Glu Asn Thr Lys Arg Lys
485 490 495
Gly Ile Asn Glu Tyr Ile Asn Leu Tyr Ser Gln Gln Ile Asn Asp Lys
500 505 510
Thr Leu Lys Lys Tyr Lys Met Ser Val Leu Phe Lys Gln Ile Leu Ser
515 520 525
Asp Thr Glu Ser Lys Ser Phe Val Ile Asp Lys Leu Glu Asp Asp Ser
530 535 540
Asp Val Val Thr Thr Met Gln Ser Phe Tyr Glu Gln Ile Ala Ala Phe
545 550 555 560
Lys Thr Val Glu Glu Lys Ser Ile Lys Glu Thr Leu Ser Leu Leu Phe
565 570 575
Asp Asp Leu Lys Ala Gln Lys Leu Asp Leu Ser Lys Ile Tyr Phe Lys
580 585 590
Asn Asp Lys Ser Leu Thr Asp Leu Ser Gln Gln Val Phe Asp Asp Tyr
595 600 605
Ser Val Ile Gly Thr Ala Val Leu Glu Tyr Ile Thr Gln Gln Ile Ala
610 615 620
Pro Lys Asn Leu Asp Asn Pro Ser Lys Lys Glu Gln Glu Leu Ile Ala
625 630 635 640
Lys Lys Thr Glu Lys Ala Lys Tyr Leu Ser Leu Glu Thr Ile Lys Leu
645 650 655
Ala Leu Glu Glu Phe Asn Lys His Arg Asp Ile Asp Lys Gln Cys Arg
660 665 670
Phe Glu Glu Ile Leu Ala Asn Phe Ala Ala Ile Pro Met Ile Phe Asp
675 680 685
Glu Ile Ala Gln Asn Lys Asp Asn Leu Ala Gln Ile Ser Ile Lys Tyr
690 695 700
Gln Asn Gln Gly Lys Lys Asp Leu Leu Gln Ala Ser Ala Glu Asp Asp
705 710 715 720
Val Lys Ala Ile Lys Asp Leu Leu Asp Gln Thr Asn Asn Leu Leu His
725 730 735
Lys Leu Lys Ile Phe His Ile Ser Gln Ser Glu Asp Lys Ala Asn Ile
740 745 750
Leu Asp Lys Asp Glu His Phe Tyr Leu Val Phe Glu Glu Cys Tyr Phe
755 760 765
Glu Leu Ala Asn Ile Val Pro Leu Tyr Asn Lys Ile Arg Asn Tyr Ile
770 775 780
Thr Gln Lys Pro Tyr Ser Asp Glu Lys Phe Lys Leu Asn Phe Glu Asn
785 790 795 800
Ser Thr Leu Ala Asn Gly Trp Asp Lys Asn Lys Glu Pro Asp Asn Thr
805 810 815
Ala Ile Leu Phe Ile Lys Asp Asp Lys Tyr Tyr Leu Gly Val Met Asn
820 825 830
Lys Lys Asn Asn Lys Ile Phe Asp Asp Lys Ala Ile Lys Glu Asn Lys
835 840 845
Gly Glu Gly Tyr Lys Lys Ile Val Tyr Lys Leu Leu Pro Gly Ala Asn
850 855 860
Lys Met Leu Pro Lys Val Phe Phe Ser Ala Lys Ser Ile Lys Phe Tyr
865 870 875 880
Asn Pro Ser Glu Asp Ile Leu Arg Ile Arg Asn His Ser Thr His Thr
885 890 895
Lys Asn Gly Ser Pro Gln Lys Gly Tyr Glu Lys Phe Glu Phe Asn Ile
900 905 910
Glu Asp Cys Arg Lys Phe Ile Asp Phe Tyr Lys Gln Ser Ile Ser Lys
915 920 925
His Pro Glu Trp Lys Asp Phe Gly Phe Arg Phe Ser Asp Thr Gln Arg
930 935 940
Tyr Asn Ser Ile Asp Glu Phe Tyr Arg Glu Val Glu Asn Gln Gly Tyr
945 950 955 960
Lys Leu Thr Phe Glu Asn Ile Ser Glu Ser Tyr Ile Asp Ser Val Val
965 970 975
Asn Gln Gly Lys Leu Tyr Leu Phe Gln Ile Tyr Asn Lys Asp Phe Ser
980 985 990
Ala Tyr Ser Lys Gly Arg Pro Asn Leu His Thr Leu Tyr Trp Lys Ala
995 1000 1005
Leu Phe Asp Glu Arg Asn Leu Gln Asp Val Val Tyr Lys Leu Asn Gly
1010 1015 1020
Glu Ala Glu Leu Phe Tyr Arg Lys Gln Ser Ile Pro Lys Lys Ile Thr
1025 1030 1035 1040
His Pro Ala Lys Glu Ala Ile Ala Asn Lys Asn Lys Asp Asn Pro Lys
1045 1050 1055
Lys Glu Ser Val Phe Glu Tyr Asp Leu Ile Lys Asp Lys Arg Phe Thr
1060 1065 1070
Glu Asp Lys Phe Phe Phe His Cys Pro Ile Thr Ile Asn Phe Lys Ser
1075 1080 1085
Ser Gly Ala Asn Lys Phe Asn Asp Glu Ile Asn Leu Leu Leu Lys Glu
1090 1095 1100
Lys Ala Asn Asp Val His Ile Leu Ser Ile Ala Arg Gly Glu Arg His
1105 1110 1115 1120
Leu Ala Tyr Tyr Thr Leu Val Asp Gly Lys Gly Asn Ile Ile Lys Gln
1125 1130 1135
Asp Thr Phe Asn Ile Ile Gly Asn Asp Arg Met Lys Thr Asn Tyr His
1140 1145 1150
Asp Lys Leu Ala Ala Ile Glu Lys Asp Arg Asp Ser Ala Arg Lys Asp
1155 1160 1165
Trp Lys Lys Ile Asn Asn Ile Lys Glu Met Lys Glu Gly Tyr Leu Ser
1170 1175 1180
Gln Val Val His Glu Ile Ala Lys Leu Val Ile Glu Tyr Asn Ala Ile
1185 1190 1195 1200
Val Val Phe Ala Asp Leu Asn Phe Gly Phe Lys Arg Gly Arg Phe Lys
1205 1210 1215
Val Glu Lys Gln Val Tyr Gln Lys Leu Glu Lys Met Leu Ile Glu Lys
1220 1225 1230
Leu Asn Tyr Leu Val Phe Lys Asp Asn Glu Phe Asp Lys Thr Gly Gly
1235 1240 1245
Val Leu Arg Ala Tyr Gln Leu Thr Ala Pro Phe Glu Thr Phe Lys Lys
1250 1255 1260
Met Gly Lys Gln Thr Gly Ile Ile Tyr Tyr Val Pro Ala Gly Phe Thr
1265 1270 1275 1280
Ser Lys Ile Cys Pro Val Thr Gly Phe Val Asn Gln Leu Tyr Pro Lys
1285 1290 1295
Tyr Glu Ser Val Ser Lys Ser Gln Glu Phe Phe Ser Lys Phe Asp Lys
1300 1305 1310
Ile Cys Tyr Asn Leu Asp Lys Gly Tyr Phe Glu Phe Ser Phe Asp Tyr
1315 1320 1325
Lys Asn Phe Gly Asp Lys Ala Ala Lys Gly Lys Trp Thr Ile Ala Ser
1330 1335 1340
Phe Gly Ser Arg Leu Ile Asn Phe Arg Asn Ser Asp Lys Asn His Asn
1345 1350 1355 1360
Trp Asp Thr Arg Glu Val Tyr Pro Thr Lys Glu Leu Glu Lys Leu Leu
1365 1370 1375
Lys Asp Tyr Ser Ile Glu Tyr Gly His Gly Glu Cys Ile Lys Ala Ala
1380 1385 1390
Ile Cys Gly Glu Ser Asp Lys Lys Phe Phe Ala Lys Leu Thr Ser Val
1395 1400 1405
Leu Asn Thr Ile Leu Gln Met Arg Asn Ser Lys Thr Gly Thr Glu Leu
1410 1415 1420
Asp Tyr Leu Ile Ser Pro Val Ala Asp Val Asn Gly Asn Phe Phe Asp
1425 1430 1435 1440
Ser Arg Gln Ala Pro Lys Asn Met Pro Gln Asp Ala Asp Ala Asn Gly
1445 1450 1455
Ala Tyr His Ile Gly Leu Lys Gly Leu Met Leu Leu Gly Arg Ile Lys
1460 1465 1470
Asn Asn Gln Glu Gly Lys Lys Leu Asn Leu Val Ile Lys Asn Glu Glu
1475 1480 1485
Tyr Phe Glu Phe Val Gln Asn Arg Asn Asn Ser Gly Gly Ser Pro Lys
1490 1495 1500
Lys Lys Arg Lys Val
1505
<210> 8
<211> 9445
<212> DNA
<213> (Artificial sequence)
<400> 8
atgtcatgac attggtgtac agaaatggcg cagcaatggc aagaacgtcc cgggcggagc 60
tcaggcctta actcacatta attgcgttgc gctcactgcc cgctttccag tcgggaaacc 120
tgtcgtgcca gctgcattaa tgaatcggcc aacgcgcggg gagaggcggt ttgcgtattg 180
ggcgccaggg tggtttttct tttcaccagt gagacgggca acagctgatt gcccttcacc 240
gcctggccct gagagagttg cagcaagcgg tccacgctgg tttgccccag caggcgaaaa 300
tcctgtttga tggtggttaa cggcgggata taacatgagc tgtcttcggt atcgtcgtat 360
cccactaccg agatatccgc accaacgcgc agcccggact cggtaatggc gcgcattgcg 420
cccagcgcca tctgatcgtt ggcaaccagc atcgcagtgg gaacgatgcc ctcattcagc 480
atttgcatgg tttgttgaaa accggacatg gcactccagt cgccttcccg ttccgctatc 540
ggctgaattt gattgcgagt gagatattta tgccagccag ccagacgcag acgcgccgag 600
acagaactta atgggcccgc taacagcgcg atttgctggt gacccaatgc gaccagatgc 660
tccacgccca gtcgcgtacc gtcttcatgg gagaaaataa tactgttgat gggtgtctgg 720
tcagagacat caagaaataa cgccggaaca ttagtgcagg cagcttccac agcaatggca 780
tcctggtcat ccagcggata gttaatgatc agcccactga cgcgttgcgc gagaagattg 840
tgcaccgccg ttttacaggc ttcgacgccg cttcgttcta ccatcgacac caccacgctg 900
gcacccagtt gatcggcgcg agatttaatc gccgcgacaa tttgcgacgg cgcgtgcagg 960
gccagactgg aggtggcaac gccaatcagc aacgactgtt tgcccgccag ttgttgtgcc 1020
acgcggttgg gaatgtaatt cagctccgcc atcgccgctt ccactttttc ccgcgttttc 1080
gcagaaacgt ggctggcctg gttcaccacg cgggaaacgg tctgataaga gacaccggca 1140
tactctgcga catcgtataa cgttactggt ttcatcaaaa tcgtctccct ccgtttgaat 1200
atttgattga tcgtaaccag atgaagcact ctttccacta tccctacagt gttatggctt 1260
gaacaatcac gaaacaataa ttggtacgta cgatctttca gccgactcaa acatcaaatc 1320
ttacaaatgt agtctttgaa agtattacat atgtaagatt taaatgcaac cgttttttcg 1380
gaaggaaatg atgacctcgt ttccaccgga attagcttgg taccagctat tgtaacataa 1440
tcggtacggg ggtgaaaaag ctaacggaaa agggagcgga aaagaatgat gtaagcgtga 1500
aaaatttttt aaaaaatctc ttgacattgg aagggagata tgttattata agaattgcgg 1560
aattgtgagc ggataacaat tcataattgt gagcggataa caattcaacc ccaaaggagg 1620
tgggatccat gagcgaagtt gaattcagcc acgagtactg gatgcgccat gcccttacac 1680
ttgccaaacg cgctcgcgat gaacgcgaag tcccagttgg cgccgttctt gttcttaaca 1740
accgcgtcat cggagaaggc tggaaccgcg ctatcggact tcatgatccg acagctcatg 1800
ccgaaattat ggctctgcgc caaggcggac ttgtcatgca gaattacaga cttattgatg 1860
ccacgcttta ctcaacgttc gaaccgtgcg tcatgtgcgc cggcgccatg attcatagcc 1920
gcatcggcag agttgtcttc ggcgtccgca atgctaaaac gggcgccgcc ggctccctta 1980
tggacgtcct tcattacccg ggcatgaatc accgcgttga aatcacagaa ggcattctgg 2040
ccgatgagtg cgccgctctg ctgtgctact tcttcagaat gccgagaaga gtcttcaacg 2100
cccagaagaa agcccaaagc agcacagact ctggaggatc atccggaggc agctctggaa 2160
gtgaaacacc aggaacaagc gaatcagcta caccagagtc ctctggaggc tcatctggag 2220
gaagctcaat ttatcaagaa tttgttaata aatatagttt aagtaaaact ctaagatttg 2280
agttaatccc acagggtaaa acacttgaaa acataaaagc aagaggtttg attttagatg 2340
atgagaaaag agctaaagac tacaaaaagg ctaaacaaat aattgataaa tatcatcagt 2400
tttttataga ggagatatta agttcggttt gtattagcga agatttatta caaaactatt 2460
ctgatgttta ttttaaactt aaaaagagtg atgatgataa tctacaaaaa gattttaaaa 2520
gtgcaaaaga tacgataaag aaacaaatat ctgaatatat aaaggactca gagaaattta 2580
agaatttgtt taatcaaaac cttatcgatg ctaaaaaagg gcaagagtca gatttaattc 2640
tatggctaaa gcaatctaag gataatggta tagaactatt taaagccaat agtgatatca 2700
cagatataga tgaggcgtta gaaataatca aatcttttaa aggttggaca acttatttta 2760
agggttttca tgaaaataga aaaaatgttt atagtagcaa tgatattcct acatctatta 2820
tttataggat agtagatgat aatttgccta aatttctaga aaataaagct aagtatgaga 2880
gtttaaaaga caaagctcca gaagctataa actatgaaca aattaaaaaa gatttggcag 2940
aagagctaac ctttgatatt gactacaaaa catctgaagt taatcaaaga gttttttcac 3000
ttgatgaagt ttttgagata gcaaacttta ataattatct aaatcaaagt ggtattacta 3060
aatttaatac tattattggt ggtaaatttg taaatggtga aaatacaaag agaaaaggta 3120
taaatgaata tataaatcta tactcacagc aaataaatga taaaacactc aaaaaatata 3180
aaatgagtgt tttatttaag caaattttaa gtgatacaga atctaaatct tttgtaattg 3240
ataagttaga agatgatagt gatgtagtta caacgatgca aagtttttat gagcaaatag 3300
cagcttttaa aacagtagaa gaaaaatcta ttaaagaaac actatcttta ttatttgatg 3360
atttaaaagc tcaaaaactt gatttgagta aaatttattt taaaaatgat aaatctctta 3420
ctgatctatc acaacaagtt tttgatgatt atagtgttat tggtacagcg gtactagaat 3480
atataactca acaaatagca cctaaaaatc ttgataaccc tagtaagaaa gagcaagaat 3540
taatagccaa aaaaactgaa aaagcaaaat acttatctct agaaactata aagcttgcct 3600
tagaagaatt taataagcat agagatatag ataaacagtg taggtttgaa gaaatacttg 3660
caaactttgc ggctattccg atgatatttg atgaaatagc tcaaaacaaa gacaatttgg 3720
cacagatatc tatcaaatat caaaatcaag gtaaaaaaga cctacttcaa gctagtgcgg 3780
aagatgatgt taaagctatc aaggatcttt tagatcaaac taataatctc ttacataaac 3840
taaaaatatt tcatattagt cagtcagaag ataaggcaaa tattttagac aaggatgagc 3900
atttttatct agtatttgag gagtgctact ttgagctagc gaatatagtg cctctttata 3960
acaaaattag aaactatata actcaaaagc catatagtga tgagaaattt aagctcaatt 4020
ttgagaactc gactttggct aatggttggg ataaaaataa agagcctgac aatacggcaa 4080
ttttatttat caaagatgat aaatattatc tgggtgtgat gaataagaaa aataacaaaa 4140
tatttgatga taaagctatc aaagaaaata aaggcgaggg ttataaaaaa attgtttata 4200
aacttttacc tggcgcaaat aaaatgttac ctaaggtttt cttttctgct aaatctataa 4260
aattttataa tcctagtgaa gatatactta gaataagaaa tcattccaca catacaaaaa 4320
atggtagtcc tcaaaaagga tatgaaaaat ttgagtttaa tattgaagat tgccgaaaat 4380
ttatagattt ttataaacag tctataagta agcatccgga gtggaaagat tttggattta 4440
gattttctga tactcaaaga tataattcta tagatgaatt ttatagagaa gttgaaaatc 4500
aaggctacaa actaactttt gaaaatatat cagagagcta tattgatagc gtagttaatc 4560
agggtaaatt gtacctattc caaatctata ataaagattt ttcagcttat agcaaagggc 4620
gaccaaatct acatacttta tattggaaag cgctgtttga tgagagaaat cttcaagatg 4680
tggtttataa gctaaatggt gaggcagagc ttttttatcg taaacaatca atacctaaaa 4740
aaatcactca cccagctaaa gaggcaatag ctaataaaaa caaagataat cctaaaaaag 4800
agagtgtttt tgaatatgat ttaatcaaag ataaacgctt tactgaagat aagtttttct 4860
ttcactgtcc tattacaatc aattttaaat ctagtggagc taataagttt aatgatgaaa 4920
tcaatttatt gctaaaagaa aaagcaaatg atgttcatat attaagtata gctagaggtg 4980
aaagacattt agcttactat actttggtag atggtaaagg caatatcatc aaacaagata 5040
ctttcaacat cattggtaat gatagaatga aaacaaacta ccatgataag cttgctgcaa 5100
tagagaaaga tagggattca gctaggaaag actggaaaaa gataaataac atcaaagaga 5160
tgaaagaggg ctatctatct caggtagttc atgaaatagc taagctagtt atagagtata 5220
atgctattgt ggtttttgag gatttaaatt ttggatttaa aagagggcgt ttcaaggtag 5280
agaagcaggt ctatcaaaag ttagaaaaaa tgctaattga gaaactaaac tatctagttt 5340
tcaaagataa tgagtttgat aaaactgggg gagtgcttag agcttatcag ctaacagcac 5400
cttttgagac ttttaaaaag atgggtaaac aaacaggtat tatctactat gtaccagctg 5460
gttttacttc aaaaatttgt cctgtaactg gttttgtaaa tcagttatat cctaagtatg 5520
aaagtgtcag caaatctcaa gagttcttta gtaagtttga caagatttgt tataaccttg 5580
ataagggcta ttttgagttt agttttgatt ataaaaactt tggtgacaag gctgccaaag 5640
gcaagtggac tatagctagc tttgggagta gattgattaa ctttagaaat tcagataaaa 5700
atcataattg ggatactcga gaagtttatc caactaaaga gttggagaaa ttgctaaaag 5760
attattctat cgaatatggg catggcgaat gtatcaaagc agctatttgc ggtgagagcg 5820
acaaaaagtt ttttgctaag ctaactagtg tcctaaatac tatcttacaa atgcgtaact 5880
caaaaacagg tactgagtta gattatctaa tttcaccagt agcagatgta aatggcaatt 5940
tctttgattc gcgacaggcg ccaaaaaata tgcctcaaga tgctgatgcc aatggtgctt 6000
atcatattgg gctaaaaggt ctgatgctac taggtaggat caaaaataat caagagggca 6060
aaaaactcaa tttggttatc aaaaatgaag agtattttga gttcgtgcag aataggaata 6120
actctggtgg ttctcccaag aagaagagga aagtctaact gcagtataat cagaaacagc 6180
ccgcggatgt tgatctgcgg gctgtttttt attgatcgaa tggccatgac caaaatccct 6240
taacgtgaag ttttcgttcc actgagcgtc agaccccgta gaaaagatca aaggatcttc 6300
ttgagatcct ttttttctgc gcgtaatctg ctgcttgcaa acaaaaaaac caccgctacc 6360
agcggtggtt tgtttgccgg atcaagagct accaactctt tttccgaagg taactggctt 6420
cagcagagcg cagataccaa atactgtcct tctagtgtag ccgtagttag gccaccactt 6480
caagaactct gtagcaccgc ctacatacct cgctctgcta atcctgttac cagtggctgc 6540
tgccagtggc gataagtcgt gtcttaccgg gttggactca agacgatagt taccggataa 6600
ggcgcagcgg tcgggctgaa cggggggttc gtgcacacag cccagcttgg agcgaacgac 6660
ctacaccgaa ctgagatacc tacagcgtga gctatgagaa agcgccacgc ttcccgaagg 6720
gagaaaggcg gacaggtatc cggtaagcgg cagggtcgga acaggagagc gcacgaggga 6780
gcttccaggg ggaaacgcct ggtatcttta tagtcctgtc gggtttcgcc acctctgact 6840
tgagcgtcga tttttgtgat gctcgtcagg ggggcggagc ctatggaaaa acgccagcaa 6900
cgcggccttt ttacggttcc tggccttttg ctggcctttt gctcacatgt tctttcctgc 6960
gttatcccct gattctgtgg ataaccgtat taccgccttt gagtgagctg agacaggtca 7020
ttcagactgg ctaatgcacc cagtaaggca gcggtatcat caactcaaaa tggtatgcgt 7080
tttgacacat ccactatata tccgtgtcgt tctgtccact cctgaatccc attccagaaa 7140
ttctctagcg attccagaag tttctcagag tcggaaagtt gaccagacat tacgaactgg 7200
cacagatggt cataacctga aggaagatct gattgcttaa ctgcttcagt taagaccgaa 7260
gcgctcgtcg tataacagat gcgatgatgc agaccaatca acatggcacc tgccattgct 7320
acctgcacag tcaaggatgg tagaaatgtt gtcggtcctt gcacacgaat attacgccat 7380
ttgcctgcat attcaaacag ctcttctacg ataagggcac aaatcgcatc gtggaacgtt 7440
tgggcttcta ccgatttagc agtttgatac actttctcta agtatccacc tgaatcataa 7500
atcggcaaaa tagagaaaaa ttgaccatgt gtaagcggcc aatctgattc cacctgagat 7560
gcataatcta gtagaatctc ttcgctatca aaattcactt ccaccttcca ctcaccggtt 7620
gtccattcat ggctgaactc tgcttcctct gttgacatga cacacatcat ctcaatatcc 7680
gaatagggcc catcagtctg acgaccaaga gagccataaa caccaatagc cttaacatca 7740
tccccatatt tatccaatat tcgttcctta atttcatgaa caatcttcat tctttcttct 7800
ctagtcatta ttattggtcc attcactatt ctcattcccc tttcagataa ttttagattt 7860
gcttttctaa ataagaatat ttggagagca ccgttcttat tcagctatta aacccattat 7920
atcgggtttt tgaggggatt tcaactgcag acacctaaat tcaaaatcta tcggtcagat 7980
ttataccgat ttgattttat atattcttga ataacatacg ccgagttatc acataaaagc 8040
gggaaccaat catcaaattt aaacttcatt gcataatcca ttaaactctt aaattctacg 8100
attccttgtt catcaataaa ctcaatcatt tctttaatta atttatatct atctgttgtt 8160
gttttcttta ataattcatc aacatctaca ccgccataaa ctatcatatc ttctttttga 8220
tatttaaatt tattaggatc gtccatgtga agcatatatc tcacaagacc tttcacactt 8280
cctgcaatct gcggaatagt cgcattcaat tcttctgtaa ttatttttat ctgttcataa 8340
gatttattac cctcatacat cactagaata tgataatgct cttttttcat cctatcttct 8400
gtatcagtat ccctatcatg taatggagac actacaaatt gaatgtgtaa ctcttttaaa 8460
tactctaacc actcggcttt tgctgattct ggatataaaa caaatgtcca attacgtcct 8520
cttgaatttt tcttgttttc agtttctttt attacatttt cgctcatgat ataataacgg 8580
tgctaataca tttaacaaaa tttagtcata gataggcagc atgccagtgc tgtctatctt 8640
tttttgttta aaatgcaccg tattcctcct ttgcatattt ttttattaga ataccggttg 8700
catctgattt gctaatatta tatttttctt tgattctatt taatatctca ttttcttctg 8760
ttgtaagtct taaagtaaca gcaacttttt tctcttcttt tctatctaca accatcactg 8820
tacctcccaa catctgtttt tttcacttta acataaaaaa caacctttta acattaaaaa 8880
cccaatattt atttatttgt ttggacaatg gacaatggac acctaggggg gaggtcgtag 8940
taccccccta tgttttctcc cctaaataac cccaaaaatc taagaaaaaa agacctcaaa 9000
aaggtcttta attaacatct caaatttcgc atttattcca atttcctttt tgcgtgtgat 9060
gcgttattaa cgttgatata atttaaattt tatttgacaa aaatgggctc gtgttgtaca 9120
ataaatgtag aggtagagac gcgaggtcta agaactttaa ataatttcta ctgttgtaga 9180
tagagaccgt gaagttaata aggtctcaaa tttctactgt tgtagatcgt ctctgaactg 9240
attcaagcaa gcttaaaccc agctcaatga gctgggtttt ttgtttgttt tttcaaactt 9300
agttagcttg gccagtgcct ctagagtcaa gtaaagagtc gacctgttac gaacggcaga 9360
tcagaatttt gtaataaaaa aagagcctgc tcattacact gcgggctctt tttcatggtc 9420
agaagacggg taaccaagat aacaa 9445
<210> 9
<211> 9445
<212> DNA
<213> (Artificial sequence)
<400> 9
atgtcatgac attggtgtac agaaatggcg cagcaatggc aagaacgtcc cgggcggagc 60
tcaggcctta actcacatta attgcgttgc gctcactgcc cgctttccag tcgggaaacc 120
tgtcgtgcca gctgcattaa tgaatcggcc aacgcgcggg gagaggcggt ttgcgtattg 180
ggcgccaggg tggtttttct tttcaccagt gagacgggca acagctgatt gcccttcacc 240
gcctggccct gagagagttg cagcaagcgg tccacgctgg tttgccccag caggcgaaaa 300
tcctgtttga tggtggttaa cggcgggata taacatgagc tgtcttcggt atcgtcgtat 360
cccactaccg agatatccgc accaacgcgc agcccggact cggtaatggc gcgcattgcg 420
cccagcgcca tctgatcgtt ggcaaccagc atcgcagtgg gaacgatgcc ctcattcagc 480
atttgcatgg tttgttgaaa accggacatg gcactccagt cgccttcccg ttccgctatc 540
ggctgaattt gattgcgagt gagatattta tgccagccag ccagacgcag acgcgccgag 600
acagaactta atgggcccgc taacagcgcg atttgctggt gacccaatgc gaccagatgc 660
tccacgccca gtcgcgtacc gtcttcatgg gagaaaataa tactgttgat gggtgtctgg 720
tcagagacat caagaaataa cgccggaaca ttagtgcagg cagcttccac agcaatggca 780
tcctggtcat ccagcggata gttaatgatc agcccactga cgcgttgcgc gagaagattg 840
tgcaccgccg ttttacaggc ttcgacgccg cttcgttcta ccatcgacac caccacgctg 900
gcacccagtt gatcggcgcg agatttaatc gccgcgacaa tttgcgacgg cgcgtgcagg 960
gccagactgg aggtggcaac gccaatcagc aacgactgtt tgcccgccag ttgttgtgcc 1020
acgcggttgg gaatgtaatt cagctccgcc atcgccgctt ccactttttc ccgcgttttc 1080
gcagaaacgt ggctggcctg gttcaccacg cgggaaacgg tctgataaga gacaccggca 1140
tactctgcga catcgtataa cgttactggt ttcatcaaaa tcgtctccct ccgtttgaat 1200
atttgattga tcgtaaccag atgaagcact ctttccacta tccctacagt gttatggctt 1260
gaacaatcac gaaacaataa ttggtacgta cgatctttca gccgactcaa acatcaaatc 1320
ttacaaatgt agtctttgaa agtattacat atgtaagatt taaatgcaac cgttttttcg 1380
gaaggaaatg atgacctcgt ttccaccgga attagcttgg taccagctat tgtaacataa 1440
tcggtacggg ggtgaaaaag ctaacggaaa agggagcgga aaagaatgat gtaagcgtga 1500
aaaatttttt aaaaaatctc ttgacattgg aagggagata tgttattata agaattgcgg 1560
aattgtgagc ggataacaat tcataattgt gagcggataa caattcaacc ccaaaggagg 1620
tgggatccat gagcgaagtc gagttctccc acgaatactg gatgcgccat gcccttacac 1680
ttgccaaacg cgctcgcgat gaacgcgaag tcccagttgg cgccgttctt gttcttaaca 1740
accgcgtcat cggagaaggc tggaaccgcg ctatcggact tcatgatccg acagctcatg 1800
ccgaaattat ggctctgcgc caaggcggac ttgtcatgca gaattacaga ctttatgatg 1860
ccacgcttta ctcaacgttc gaaccgtgcg tcatgtgcgc cggcgccatg attcatagcc 1920
gcatcggcag agttgtcttc ggcgtccgca atgctaaaac gggcgccgcc ggctccctta 1980
tggacgtcct tcatcacccg ggcatgaatc accgcgttga aatcacagaa ggcattctgg 2040
ccgatgagtg cgccgctctg ctgtgcagat tcttcagaat gccgagaaga gtcttcaacg 2100
cccagaagaa agcccaaagc agcacagact ctggaggatc atccggaggc agctctggaa 2160
gtgaaacacc aggaacaagc gaatcagcta caccagagtc ctctggaggc tcatctggag 2220
gaagctcaat ttatcaagaa tttgttaata aatatagttt aagtaaaact ctaagatttg 2280
agttaatccc acagggtaaa acacttgaaa acataaaagc aagaggtttg attttagatg 2340
atgagaaaag agctaaagac tacaaaaagg ctaaacaaat aattgataaa tatcatcagt 2400
tttttataga ggagatatta agttcggttt gtattagcga agatttatta caaaactatt 2460
ctgatgttta ttttaaactt aaaaagagtg atgatgataa tctacaaaaa gattttaaaa 2520
gtgcaaaaga tacgataaag aaacaaatat ctgaatatat aaaggactca gagaaattta 2580
agaatttgtt taatcaaaac cttatcgatg ctaaaaaagg gcaagagtca gatttaattc 2640
tatggctaaa gcaatctaag gataatggta tagaactatt taaagccaat agtgatatca 2700
cagatataga tgaggcgtta gaaataatca aatcttttaa aggttggaca acttatttta 2760
agggttttca tgaaaataga aaaaatgttt atagtagcaa tgatattcct acatctatta 2820
tttataggat agtagatgat aatttgccta aatttctaga aaataaagct aagtatgaga 2880
gtttaaaaga caaagctcca gaagctataa actatgaaca aattaaaaaa gatttggcag 2940
aagagctaac ctttgatatt gactacaaaa catctgaagt taatcaaaga gttttttcac 3000
ttgatgaagt ttttgagata gcaaacttta ataattatct aaatcaaagt ggtattacta 3060
aatttaatac tattattggt ggtaaatttg taaatggtga aaatacaaag agaaaaggta 3120
taaatgaata tataaatcta tactcacagc aaataaatga taaaacactc aaaaaatata 3180
aaatgagtgt tttatttaag caaattttaa gtgatacaga atctaaatct tttgtaattg 3240
ataagttaga agatgatagt gatgtagtta caacgatgca aagtttttat gagcaaatag 3300
cagcttttaa aacagtagaa gaaaaatcta ttaaagaaac actatcttta ttatttgatg 3360
atttaaaagc tcaaaaactt gatttgagta aaatttattt taaaaatgat aaatctctta 3420
ctgatctatc acaacaagtt tttgatgatt atagtgttat tggtacagcg gtactagaat 3480
atataactca acaaatagca cctaaaaatc ttgataaccc tagtaagaaa gagcaagaat 3540
taatagccaa aaaaactgaa aaagcaaaat acttatctct agaaactata aagcttgcct 3600
tagaagaatt taataagcat agagatatag ataaacagtg taggtttgaa gaaatacttg 3660
caaactttgc ggctattccg atgatatttg atgaaatagc tcaaaacaaa gacaatttgg 3720
cacagatatc tatcaaatat caaaatcaag gtaaaaaaga cctacttcaa gctagtgcgg 3780
aagatgatgt taaagctatc aaggatcttt tagatcaaac taataatctc ttacataaac 3840
taaaaatatt tcatattagt cagtcagaag ataaggcaaa tattttagac aaggatgagc 3900
atttttatct agtatttgag gagtgctact ttgagctagc gaatatagtg cctctttata 3960
acaaaattag aaactatata actcaaaagc catatagtga tgagaaattt aagctcaatt 4020
ttgagaactc gactttggct aatggttggg ataaaaataa agagcctgac aatacggcaa 4080
ttttatttat caaagatgat aaatattatc tgggtgtgat gaataagaaa aataacaaaa 4140
tatttgatga taaagctatc aaagaaaata aaggcgaggg ttataaaaaa attgtttata 4200
aacttttacc tggcgcaaat aaaatgttac ctaaggtttt cttttctgct aaatctataa 4260
aattttataa tcctagtgaa gatatactta gaataagaaa tcattccaca catacaaaaa 4320
atggtagtcc tcaaaaagga tatgaaaaat ttgagtttaa tattgaagat tgccgaaaat 4380
ttatagattt ttataaacag tctataagta agcatccgga gtggaaagat tttggattta 4440
gattttctga tactcaaaga tataattcta tagatgaatt ttatagagaa gttgaaaatc 4500
aaggctacaa actaactttt gaaaatatat cagagagcta tattgatagc gtagttaatc 4560
agggtaaatt gtacctattc caaatctata ataaagattt ttcagcttat agcaaagggc 4620
gaccaaatct acatacttta tattggaaag cgctgtttga tgagagaaat cttcaagatg 4680
tggtttataa gctaaatggt gaggcagagc ttttttatcg taaacaatca atacctaaaa 4740
aaatcactca cccagctaaa gaggcaatag ctaataaaaa caaagataat cctaaaaaag 4800
agagtgtttt tgaatatgat ttaatcaaag ataaacgctt tactgaagat aagtttttct 4860
ttcactgtcc tattacaatc aattttaaat ctagtggagc taataagttt aatgatgaaa 4920
tcaatttatt gctaaaagaa aaagcaaatg atgttcatat attaagtata gctagaggtg 4980
aaagacattt agcttactat actttggtag atggtaaagg caatatcatc aaacaagata 5040
ctttcaacat cattggtaat gatagaatga aaacaaacta ccatgataag cttgctgcaa 5100
tagagaaaga tagggattca gctaggaaag actggaaaaa gataaataac atcaaagaga 5160
tgaaagaggg ctatctatct caggtagttc atgaaatagc taagctagtt atagagtata 5220
atgctattgt ggtttttgag gatttaaatt ttggatttaa aagagggcgt ttcaaggtag 5280
agaagcaggt ctatcaaaag ttagaaaaaa tgctaattga gaaactaaac tatctagttt 5340
tcaaagataa tgagtttgat aaaactgggg gagtgcttag agcttatcag ctaacagcac 5400
cttttgagac ttttaaaaag atgggtaaac aaacaggtat tatctactat gtaccagctg 5460
gttttacttc aaaaatttgt cctgtaactg gttttgtaaa tcagttatat cctaagtatg 5520
aaagtgtcag caaatctcaa gagttcttta gtaagtttga caagatttgt tataaccttg 5580
ataagggcta ttttgagttt agttttgatt ataaaaactt tggtgacaag gctgccaaag 5640
gcaagtggac tatagctagc tttgggagta gattgattaa ctttagaaat tcagataaaa 5700
atcataattg ggatactcga gaagtttatc caactaaaga gttggagaaa ttgctaaaag 5760
attattctat cgaatatggg catggcgaat gtatcaaagc agctatttgc ggtgagagcg 5820
acaaaaagtt ttttgctaag ctaactagtg tcctaaatac tatcttacaa atgcgtaact 5880
caaaaacagg tactgagtta gattatctaa tttcaccagt agcagatgta aatggcaatt 5940
tctttgattc gcgacaggcg ccaaaaaata tgcctcaaga tgctgatgcc aatggtgctt 6000
atcatattgg gctaaaaggt ctgatgctac taggtaggat caaaaataat caagagggca 6060
aaaaactcaa tttggttatc aaaaatgaag agtattttga gttcgtgcag aataggaata 6120
actctggtgg ttctcccaag aagaagagga aagtctaact gcagtataat cagaaacagc 6180
ccgcggatgt tgatctgcgg gctgtttttt attgatcgaa tggccatgac caaaatccct 6240
taacgtgaag ttttcgttcc actgagcgtc agaccccgta gaaaagatca aaggatcttc 6300
ttgagatcct ttttttctgc gcgtaatctg ctgcttgcaa acaaaaaaac caccgctacc 6360
agcggtggtt tgtttgccgg atcaagagct accaactctt tttccgaagg taactggctt 6420
cagcagagcg cagataccaa atactgtcct tctagtgtag ccgtagttag gccaccactt 6480
caagaactct gtagcaccgc ctacatacct cgctctgcta atcctgttac cagtggctgc 6540
tgccagtggc gataagtcgt gtcttaccgg gttggactca agacgatagt taccggataa 6600
ggcgcagcgg tcgggctgaa cggggggttc gtgcacacag cccagcttgg agcgaacgac 6660
ctacaccgaa ctgagatacc tacagcgtga gctatgagaa agcgccacgc ttcccgaagg 6720
gagaaaggcg gacaggtatc cggtaagcgg cagggtcgga acaggagagc gcacgaggga 6780
gcttccaggg ggaaacgcct ggtatcttta tagtcctgtc gggtttcgcc acctctgact 6840
tgagcgtcga tttttgtgat gctcgtcagg ggggcggagc ctatggaaaa acgccagcaa 6900
cgcggccttt ttacggttcc tggccttttg ctggcctttt gctcacatgt tctttcctgc 6960
gttatcccct gattctgtgg ataaccgtat taccgccttt gagtgagctg agacaggtca 7020
ttcagactgg ctaatgcacc cagtaaggca gcggtatcat caactcaaaa tggtatgcgt 7080
tttgacacat ccactatata tccgtgtcgt tctgtccact cctgaatccc attccagaaa 7140
ttctctagcg attccagaag tttctcagag tcggaaagtt gaccagacat tacgaactgg 7200
cacagatggt cataacctga aggaagatct gattgcttaa ctgcttcagt taagaccgaa 7260
gcgctcgtcg tataacagat gcgatgatgc agaccaatca acatggcacc tgccattgct 7320
acctgcacag tcaaggatgg tagaaatgtt gtcggtcctt gcacacgaat attacgccat 7380
ttgcctgcat attcaaacag ctcttctacg ataagggcac aaatcgcatc gtggaacgtt 7440
tgggcttcta ccgatttagc agtttgatac actttctcta agtatccacc tgaatcataa 7500
atcggcaaaa tagagaaaaa ttgaccatgt gtaagcggcc aatctgattc cacctgagat 7560
gcataatcta gtagaatctc ttcgctatca aaattcactt ccaccttcca ctcaccggtt 7620
gtccattcat ggctgaactc tgcttcctct gttgacatga cacacatcat ctcaatatcc 7680
gaatagggcc catcagtctg acgaccaaga gagccataaa caccaatagc cttaacatca 7740
tccccatatt tatccaatat tcgttcctta atttcatgaa caatcttcat tctttcttct 7800
ctagtcatta ttattggtcc attcactatt ctcattcccc tttcagataa ttttagattt 7860
gcttttctaa ataagaatat ttggagagca ccgttcttat tcagctatta aacccattat 7920
atcgggtttt tgaggggatt tcaactgcag acacctaaat tcaaaatcta tcggtcagat 7980
ttataccgat ttgattttat atattcttga ataacatacg ccgagttatc acataaaagc 8040
gggaaccaat catcaaattt aaacttcatt gcataatcca ttaaactctt aaattctacg 8100
attccttgtt catcaataaa ctcaatcatt tctttaatta atttatatct atctgttgtt 8160
gttttcttta ataattcatc aacatctaca ccgccataaa ctatcatatc ttctttttga 8220
tatttaaatt tattaggatc gtccatgtga agcatatatc tcacaagacc tttcacactt 8280
cctgcaatct gcggaatagt cgcattcaat tcttctgtaa ttatttttat ctgttcataa 8340
gatttattac cctcatacat cactagaata tgataatgct cttttttcat cctatcttct 8400
gtatcagtat ccctatcatg taatggagac actacaaatt gaatgtgtaa ctcttttaaa 8460
tactctaacc actcggcttt tgctgattct ggatataaaa caaatgtcca attacgtcct 8520
cttgaatttt tcttgttttc agtttctttt attacatttt cgctcatgat ataataacgg 8580
tgctaataca tttaacaaaa tttagtcata gataggcagc atgccagtgc tgtctatctt 8640
tttttgttta aaatgcaccg tattcctcct ttgcatattt ttttattaga ataccggttg 8700
catctgattt gctaatatta tatttttctt tgattctatt taatatctca ttttcttctg 8760
ttgtaagtct taaagtaaca gcaacttttt tctcttcttt tctatctaca accatcactg 8820
tacctcccaa catctgtttt tttcacttta acataaaaaa caacctttta acattaaaaa 8880
cccaatattt atttatttgt ttggacaatg gacaatggac acctaggggg gaggtcgtag 8940
taccccccta tgttttctcc cctaaataac cccaaaaatc taagaaaaaa agacctcaaa 9000
aaggtcttta attaacatct caaatttcgc atttattcca atttcctttt tgcgtgtgat 9060
gcgttattaa cgttgatata atttaaattt tatttgacaa aaatgggctc gtgttgtaca 9120
ataaatgtag aggtagagac gcgaggtcta agaactttaa ataatttcta ctgttgtaga 9180
tagagaccgt gaagttaata aggtctcaaa tttctactgt tgtagatcgt ctctgaactg 9240
attcaagcaa gcttaaaccc agctcaatga gctgggtttt ttgtttgttt tttcaaactt 9300
agttagcttg gccagtgcct ctagagtcaa gtaaagagtc gacctgttac gaacggcaga 9360
tcagaatttt gtaataaaaa aagagcctgc tcattacact gcgggctctt tttcatggtc 9420
agaagacggg taaccaagat aacaa 9445
<210> 10
<211> 9445
<212> DNA
<213> (Artificial sequence)
<400> 10
atgtcatgac attggtgtac agaaatggcg cagcaatggc aagaacgtcc cgggcggagc 60
tcaggcctta actcacatta attgcgttgc gctcactgcc cgctttccag tcgggaaacc 120
tgtcgtgcca gctgcattaa tgaatcggcc aacgcgcggg gagaggcggt ttgcgtattg 180
ggcgccaggg tggtttttct tttcaccagt gagacgggca acagctgatt gcccttcacc 240
gcctggccct gagagagttg cagcaagcgg tccacgctgg tttgccccag caggcgaaaa 300
tcctgtttga tggtggttaa cggcgggata taacatgagc tgtcttcggt atcgtcgtat 360
cccactaccg agatatccgc accaacgcgc agcccggact cggtaatggc gcgcattgcg 420
cccagcgcca tctgatcgtt ggcaaccagc atcgcagtgg gaacgatgcc ctcattcagc 480
atttgcatgg tttgttgaaa accggacatg gcactccagt cgccttcccg ttccgctatc 540
ggctgaattt gattgcgagt gagatattta tgccagccag ccagacgcag acgcgccgag 600
acagaactta atgggcccgc taacagcgcg atttgctggt gacccaatgc gaccagatgc 660
tccacgccca gtcgcgtacc gtcttcatgg gagaaaataa tactgttgat gggtgtctgg 720
tcagagacat caagaaataa cgccggaaca ttagtgcagg cagcttccac agcaatggca 780
tcctggtcat ccagcggata gttaatgatc agcccactga cgcgttgcgc gagaagattg 840
tgcaccgccg ttttacaggc ttcgacgccg cttcgttcta ccatcgacac caccacgctg 900
gcacccagtt gatcggcgcg agatttaatc gccgcgacaa tttgcgacgg cgcgtgcagg 960
gccagactgg aggtggcaac gccaatcagc aacgactgtt tgcccgccag ttgttgtgcc 1020
acgcggttgg gaatgtaatt cagctccgcc atcgccgctt ccactttttc ccgcgttttc 1080
gcagaaacgt ggctggcctg gttcaccacg cgggaaacgg tctgataaga gacaccggca 1140
tactctgcga catcgtataa cgttactggt ttcatcaaaa tcgtctccct ccgtttgaat 1200
atttgattga tcgtaaccag atgaagcact ctttccacta tccctacagt gttatggctt 1260
gaacaatcac gaaacaataa ttggtacgta cgatctttca gccgactcaa acatcaaatc 1320
ttacaaatgt agtctttgaa agtattacat atgtaagatt taaatgcaac cgttttttcg 1380
gaaggaaatg atgacctcgt ttccaccgga attagcttgg taccagctat tgtaacataa 1440
tcggtacggg ggtgaaaaag ctaacggaaa agggagcgga aaagaatgat gtaagcgtga 1500
aaaatttttt aaaaaatctc ttgacattgg aagggagata tgttattata agaattgcgg 1560
aattgtgagc ggataacaat tcataattgt gagcggataa caattcaacc ccaaaggagg 1620
tgggatccat gagcgaagtc gagttctccc acgaatactg gatgcgccat gcccttacac 1680
ttgccaaacg cgctcgcgat gaacgcgaag tcccagttgg cgccgttctt gttcttaaca 1740
accgcgtcat cggagaaggc tggaaccgcg ctatcggact tcatgatccg acagctcatg 1800
ccgaaattat ggctctgcgc caaggcggac ttgtcatgca gaattacaga cttattgatg 1860
ccacgcttta cgtcacgttc gaaccgtgcg tcatgtgcgc cggcgccatg attcatagcc 1920
gcatcggcag agttgtcttc ggcgtccgca attcaaaaag aggcgccgcc ggctccctta 1980
tgaacgtcct taactacccg ggcatgaatc accgcgttga aatcacagaa ggcattctgg 2040
ccgatgagtg cgccgctctg ctgtgcgact tctatagaat gccgagacaa gtcttcaacg 2100
cccagaagaa agcccaaagc agcattaact ctggaggatc atccggaggc agctctggaa 2160
gtgaaacacc aggaacaagc gaatcagcta caccagagtc ctctggaggc tcatctggag 2220
gaagctcaat ttatcaagaa tttgttaata aatatagttt aagtaaaact ctaagatttg 2280
agttaatccc acagggtaaa acacttgaaa acataaaagc aagaggtttg attttagatg 2340
atgagaaaag agctaaagac tacaaaaagg ctaaacaaat aattgataaa tatcatcagt 2400
tttttataga ggagatatta agttcggttt gtattagcga agatttatta caaaactatt 2460
ctgatgttta ttttaaactt aaaaagagtg atgatgataa tctacaaaaa gattttaaaa 2520
gtgcaaaaga tacgataaag aaacaaatat ctgaatatat aaaggactca gagaaattta 2580
agaatttgtt taatcaaaac cttatcgatg ctaaaaaagg gcaagagtca gatttaattc 2640
tatggctaaa gcaatctaag gataatggta tagaactatt taaagccaat agtgatatca 2700
cagatataga tgaggcgtta gaaataatca aatcttttaa aggttggaca acttatttta 2760
agggttttca tgaaaataga aaaaatgttt atagtagcaa tgatattcct acatctatta 2820
tttataggat agtagatgat aatttgccta aatttctaga aaataaagct aagtatgaga 2880
gtttaaaaga caaagctcca gaagctataa actatgaaca aattaaaaaa gatttggcag 2940
aagagctaac ctttgatatt gactacaaaa catctgaagt taatcaaaga gttttttcac 3000
ttgatgaagt ttttgagata gcaaacttta ataattatct aaatcaaagt ggtattacta 3060
aatttaatac tattattggt ggtaaatttg taaatggtga aaatacaaag agaaaaggta 3120
taaatgaata tataaatcta tactcacagc aaataaatga taaaacactc aaaaaatata 3180
aaatgagtgt tttatttaag caaattttaa gtgatacaga atctaaatct tttgtaattg 3240
ataagttaga agatgatagt gatgtagtta caacgatgca aagtttttat gagcaaatag 3300
cagcttttaa aacagtagaa gaaaaatcta ttaaagaaac actatcttta ttatttgatg 3360
atttaaaagc tcaaaaactt gatttgagta aaatttattt taaaaatgat aaatctctta 3420
ctgatctatc acaacaagtt tttgatgatt atagtgttat tggtacagcg gtactagaat 3480
atataactca acaaatagca cctaaaaatc ttgataaccc tagtaagaaa gagcaagaat 3540
taatagccaa aaaaactgaa aaagcaaaat acttatctct agaaactata aagcttgcct 3600
tagaagaatt taataagcat agagatatag ataaacagtg taggtttgaa gaaatacttg 3660
caaactttgc ggctattccg atgatatttg atgaaatagc tcaaaacaaa gacaatttgg 3720
cacagatatc tatcaaatat caaaatcaag gtaaaaaaga cctacttcaa gctagtgcgg 3780
aagatgatgt taaagctatc aaggatcttt tagatcaaac taataatctc ttacataaac 3840
taaaaatatt tcatattagt cagtcagaag ataaggcaaa tattttagac aaggatgagc 3900
atttttatct agtatttgag gagtgctact ttgagctagc gaatatagtg cctctttata 3960
acaaaattag aaactatata actcaaaagc catatagtga tgagaaattt aagctcaatt 4020
ttgagaactc gactttggct aatggttggg ataaaaataa agagcctgac aatacggcaa 4080
ttttatttat caaagatgat aaatattatc tgggtgtgat gaataagaaa aataacaaaa 4140
tatttgatga taaagctatc aaagaaaata aaggcgaggg ttataaaaaa attgtttata 4200
aacttttacc tggcgcaaat aaaatgttac ctaaggtttt cttttctgct aaatctataa 4260
aattttataa tcctagtgaa gatatactta gaataagaaa tcattccaca catacaaaaa 4320
atggtagtcc tcaaaaagga tatgaaaaat ttgagtttaa tattgaagat tgccgaaaat 4380
ttatagattt ttataaacag tctataagta agcatccgga gtggaaagat tttggattta 4440
gattttctga tactcaaaga tataattcta tagatgaatt ttatagagaa gttgaaaatc 4500
aaggctacaa actaactttt gaaaatatat cagagagcta tattgatagc gtagttaatc 4560
agggtaaatt gtacctattc caaatctata ataaagattt ttcagcttat agcaaagggc 4620
gaccaaatct acatacttta tattggaaag cgctgtttga tgagagaaat cttcaagatg 4680
tggtttataa gctaaatggt gaggcagagc ttttttatcg taaacaatca atacctaaaa 4740
aaatcactca cccagctaaa gaggcaatag ctaataaaaa caaagataat cctaaaaaag 4800
agagtgtttt tgaatatgat ttaatcaaag ataaacgctt tactgaagat aagtttttct 4860
ttcactgtcc tattacaatc aattttaaat ctagtggagc taataagttt aatgatgaaa 4920
tcaatttatt gctaaaagaa aaagcaaatg atgttcatat attaagtata gctagaggtg 4980
aaagacattt agcttactat actttggtag atggtaaagg caatatcatc aaacaagata 5040
ctttcaacat cattggtaat gatagaatga aaacaaacta ccatgataag cttgctgcaa 5100
tagagaaaga tagggattca gctaggaaag actggaaaaa gataaataac atcaaagaga 5160
tgaaagaggg ctatctatct caggtagttc atgaaatagc taagctagtt atagagtata 5220
atgctattgt ggtttttgag gatttaaatt ttggatttaa aagagggcgt ttcaaggtag 5280
agaagcaggt ctatcaaaag ttagaaaaaa tgctaattga gaaactaaac tatctagttt 5340
tcaaagataa tgagtttgat aaaactgggg gagtgcttag agcttatcag ctaacagcac 5400
cttttgagac ttttaaaaag atgggtaaac aaacaggtat tatctactat gtaccagctg 5460
gttttacttc aaaaatttgt cctgtaactg gttttgtaaa tcagttatat cctaagtatg 5520
aaagtgtcag caaatctcaa gagttcttta gtaagtttga caagatttgt tataaccttg 5580
ataagggcta ttttgagttt agttttgatt ataaaaactt tggtgacaag gctgccaaag 5640
gcaagtggac tatagctagc tttgggagta gattgattaa ctttagaaat tcagataaaa 5700
atcataattg ggatactcga gaagtttatc caactaaaga gttggagaaa ttgctaaaag 5760
attattctat cgaatatggg catggcgaat gtatcaaagc agctatttgc ggtgagagcg 5820
acaaaaagtt ttttgctaag ctaactagtg tcctaaatac tatcttacaa atgcgtaact 5880
caaaaacagg tactgagtta gattatctaa tttcaccagt agcagatgta aatggcaatt 5940
tctttgattc gcgacaggcg ccaaaaaata tgcctcaaga tgctgatgcc aatggtgctt 6000
atcatattgg gctaaaaggt ctgatgctac taggtaggat caaaaataat caagagggca 6060
aaaaactcaa tttggttatc aaaaatgaag agtattttga gttcgtgcag aataggaata 6120
actctggtgg ttctcccaag aagaagagga aagtctaact gcagtataat cagaaacagc 6180
ccgcggatgt tgatctgcgg gctgtttttt attgatcgaa tggccatgac caaaatccct 6240
taacgtgaag ttttcgttcc actgagcgtc agaccccgta gaaaagatca aaggatcttc 6300
ttgagatcct ttttttctgc gcgtaatctg ctgcttgcaa acaaaaaaac caccgctacc 6360
agcggtggtt tgtttgccgg atcaagagct accaactctt tttccgaagg taactggctt 6420
cagcagagcg cagataccaa atactgtcct tctagtgtag ccgtagttag gccaccactt 6480
caagaactct gtagcaccgc ctacatacct cgctctgcta atcctgttac cagtggctgc 6540
tgccagtggc gataagtcgt gtcttaccgg gttggactca agacgatagt taccggataa 6600
ggcgcagcgg tcgggctgaa cggggggttc gtgcacacag cccagcttgg agcgaacgac 6660
ctacaccgaa ctgagatacc tacagcgtga gctatgagaa agcgccacgc ttcccgaagg 6720
gagaaaggcg gacaggtatc cggtaagcgg cagggtcgga acaggagagc gcacgaggga 6780
gcttccaggg ggaaacgcct ggtatcttta tagtcctgtc gggtttcgcc acctctgact 6840
tgagcgtcga tttttgtgat gctcgtcagg ggggcggagc ctatggaaaa acgccagcaa 6900
cgcggccttt ttacggttcc tggccttttg ctggcctttt gctcacatgt tctttcctgc 6960
gttatcccct gattctgtgg ataaccgtat taccgccttt gagtgagctg agacaggtca 7020
ttcagactgg ctaatgcacc cagtaaggca gcggtatcat caactcaaaa tggtatgcgt 7080
tttgacacat ccactatata tccgtgtcgt tctgtccact cctgaatccc attccagaaa 7140
ttctctagcg attccagaag tttctcagag tcggaaagtt gaccagacat tacgaactgg 7200
cacagatggt cataacctga aggaagatct gattgcttaa ctgcttcagt taagaccgaa 7260
gcgctcgtcg tataacagat gcgatgatgc agaccaatca acatggcacc tgccattgct 7320
acctgcacag tcaaggatgg tagaaatgtt gtcggtcctt gcacacgaat attacgccat 7380
ttgcctgcat attcaaacag ctcttctacg ataagggcac aaatcgcatc gtggaacgtt 7440
tgggcttcta ccgatttagc agtttgatac actttctcta agtatccacc tgaatcataa 7500
atcggcaaaa tagagaaaaa ttgaccatgt gtaagcggcc aatctgattc cacctgagat 7560
gcataatcta gtagaatctc ttcgctatca aaattcactt ccaccttcca ctcaccggtt 7620
gtccattcat ggctgaactc tgcttcctct gttgacatga cacacatcat ctcaatatcc 7680
gaatagggcc catcagtctg acgaccaaga gagccataaa caccaatagc cttaacatca 7740
tccccatatt tatccaatat tcgttcctta atttcatgaa caatcttcat tctttcttct 7800
ctagtcatta ttattggtcc attcactatt ctcattcccc tttcagataa ttttagattt 7860
gcttttctaa ataagaatat ttggagagca ccgttcttat tcagctatta aacccattat 7920
atcgggtttt tgaggggatt tcaactgcag acacctaaat tcaaaatcta tcggtcagat 7980
ttataccgat ttgattttat atattcttga ataacatacg ccgagttatc acataaaagc 8040
gggaaccaat catcaaattt aaacttcatt gcataatcca ttaaactctt aaattctacg 8100
attccttgtt catcaataaa ctcaatcatt tctttaatta atttatatct atctgttgtt 8160
gttttcttta ataattcatc aacatctaca ccgccataaa ctatcatatc ttctttttga 8220
tatttaaatt tattaggatc gtccatgtga agcatatatc tcacaagacc tttcacactt 8280
cctgcaatct gcggaatagt cgcattcaat tcttctgtaa ttatttttat ctgttcataa 8340
gatttattac cctcatacat cactagaata tgataatgct cttttttcat cctatcttct 8400
gtatcagtat ccctatcatg taatggagac actacaaatt gaatgtgtaa ctcttttaaa 8460
tactctaacc actcggcttt tgctgattct ggatataaaa caaatgtcca attacgtcct 8520
cttgaatttt tcttgttttc agtttctttt attacatttt cgctcatgat ataataacgg 8580
tgctaataca tttaacaaaa tttagtcata gataggcagc atgccagtgc tgtctatctt 8640
tttttgttta aaatgcaccg tattcctcct ttgcatattt ttttattaga ataccggttg 8700
catctgattt gctaatatta tatttttctt tgattctatt taatatctca ttttcttctg 8760
ttgtaagtct taaagtaaca gcaacttttt tctcttcttt tctatctaca accatcactg 8820
tacctcccaa catctgtttt tttcacttta acataaaaaa caacctttta acattaaaaa 8880
cccaatattt atttatttgt ttggacaatg gacaatggac acctaggggg gaggtcgtag 8940
taccccccta tgttttctcc cctaaataac cccaaaaatc taagaaaaaa agacctcaaa 9000
aaggtcttta attaacatct caaatttcgc atttattcca atttcctttt tgcgtgtgat 9060
gcgttattaa cgttgatata atttaaattt tatttgacaa aaatgggctc gtgttgtaca 9120
ataaatgtag aggtagagac gcgaggtcta agaactttaa ataatttcta ctgttgtaga 9180
tagagaccgt gaagttaata aggtctcaaa tttctactgt tgtagatcgt ctctgaactg 9240
attcaagcaa gcttaaaccc agctcaatga gctgggtttt ttgtttgttt tttcaaactt 9300
agttagcttg gccagtgcct ctagagtcaa gtaaagagtc gacctgttac gaacggcaga 9360
tcagaatttt gtaataaaaa aagagcctgc tcattacact gcgggctctt tttcatggtc 9420
agaagacggg taaccaagat aacaa 9445
<210> 11
<211> 9445
<212> DNA
<213> (Artificial sequence)
<400> 11
atgtcatgac attggtgtac agaaatggcg cagcaatggc aagaacgtcc cgggcggagc 60
tcaggcctta actcacatta attgcgttgc gctcactgcc cgctttccag tcgggaaacc 120
tgtcgtgcca gctgcattaa tgaatcggcc aacgcgcggg gagaggcggt ttgcgtattg 180
ggcgccaggg tggtttttct tttcaccagt gagacgggca acagctgatt gcccttcacc 240
gcctggccct gagagagttg cagcaagcgg tccacgctgg tttgccccag caggcgaaaa 300
tcctgtttga tggtggttaa cggcgggata taacatgagc tgtcttcggt atcgtcgtat 360
cccactaccg agatatccgc accaacgcgc agcccggact cggtaatggc gcgcattgcg 420
cccagcgcca tctgatcgtt ggcaaccagc atcgcagtgg gaacgatgcc ctcattcagc 480
atttgcatgg tttgttgaaa accggacatg gcactccagt cgccttcccg ttccgctatc 540
ggctgaattt gattgcgagt gagatattta tgccagccag ccagacgcag acgcgccgag 600
acagaactta atgggcccgc taacagcgcg atttgctggt gacccaatgc gaccagatgc 660
tccacgccca gtcgcgtacc gtcttcatgg gagaaaataa tactgttgat gggtgtctgg 720
tcagagacat caagaaataa cgccggaaca ttagtgcagg cagcttccac agcaatggca 780
tcctggtcat ccagcggata gttaatgatc agcccactga cgcgttgcgc gagaagattg 840
tgcaccgccg ttttacaggc ttcgacgccg cttcgttcta ccatcgacac caccacgctg 900
gcacccagtt gatcggcgcg agatttaatc gccgcgacaa tttgcgacgg cgcgtgcagg 960
gccagactgg aggtggcaac gccaatcagc aacgactgtt tgcccgccag ttgttgtgcc 1020
acgcggttgg gaatgtaatt cagctccgcc atcgccgctt ccactttttc ccgcgttttc 1080
gcagaaacgt ggctggcctg gttcaccacg cgggaaacgg tctgataaga gacaccggca 1140
tactctgcga catcgtataa cgttactggt ttcatcaaaa tcgtctccct ccgtttgaat 1200
atttgattga tcgtaaccag atgaagcact ctttccacta tccctacagt gttatggctt 1260
gaacaatcac gaaacaataa ttggtacgta cgatctttca gccgactcaa acatcaaatc 1320
ttacaaatgt agtctttgaa agtattacat atgtaagatt taaatgcaac cgttttttcg 1380
gaaggaaatg atgacctcgt ttccaccgga attagcttgg taccagctat tgtaacataa 1440
tcggtacggg ggtgaaaaag ctaacggaaa agggagcgga aaagaatgat gtaagcgtga 1500
aaaatttttt aaaaaatctc ttgacattgg aagggagata tgttattata agaattgcgg 1560
aattgtgagc ggataacaat tcataattgt gagcggataa caattcaacc ccaaaggagg 1620
tgggatccat gagcgaagtc gagttctccc acgaatactg gatgcgccat gcccttacac 1680
ttgccaaacg cgctcgcgat gaacgcgaag tcccagttgg cgccgttctt gttcttaaca 1740
accgcgtcat cggagaaggc tggaaccgcg ctatcggact tcatgatccg acagctcatg 1800
ccgaaattat ggctctgcgc caaggcggac ttgtcatgca gaattacaga cttattgatg 1860
ccacgcttta ctcaacgttc gaaccgtgcg tcatgtgcgc cggcgccatg attcatagcc 1920
gcatcggcag agttgtcttc ggcgtccgca attcaaaaag aggcgccgcc ggctccctta 1980
tgaacgtcct taactacccg ggcatgaatc accgcgttga aatcacagaa ggcattctgg 2040
ccgatgagtg cgccgctctg ctgtgcgact tctatagaat gccgagaaga gtcttcaacg 2100
cccagaagaa agcccaaagc agcattaact ctggaggatc atccggaggc agctctggaa 2160
gtgaaacacc aggaacaagc gaatcagcta caccagagtc ctctggaggc tcatctggag 2220
gaagctcaat ttatcaagaa tttgttaata aatatagttt aagtaaaact ctaagatttg 2280
agttaatccc acagggtaaa acacttgaaa acataaaagc aagaggtttg attttagatg 2340
atgagaaaag agctaaagac tacaaaaagg ctaaacaaat aattgataaa tatcatcagt 2400
tttttataga ggagatatta agttcggttt gtattagcga agatttatta caaaactatt 2460
ctgatgttta ttttaaactt aaaaagagtg atgatgataa tctacaaaaa gattttaaaa 2520
gtgcaaaaga tacgataaag aaacaaatat ctgaatatat aaaggactca gagaaattta 2580
agaatttgtt taatcaaaac cttatcgatg ctaaaaaagg gcaagagtca gatttaattc 2640
tatggctaaa gcaatctaag gataatggta tagaactatt taaagccaat agtgatatca 2700
cagatataga tgaggcgtta gaaataatca aatcttttaa aggttggaca acttatttta 2760
agggttttca tgaaaataga aaaaatgttt atagtagcaa tgatattcct acatctatta 2820
tttataggat agtagatgat aatttgccta aatttctaga aaataaagct aagtatgaga 2880
gtttaaaaga caaagctcca gaagctataa actatgaaca aattaaaaaa gatttggcag 2940
aagagctaac ctttgatatt gactacaaaa catctgaagt taatcaaaga gttttttcac 3000
ttgatgaagt ttttgagata gcaaacttta ataattatct aaatcaaagt ggtattacta 3060
aatttaatac tattattggt ggtaaatttg taaatggtga aaatacaaag agaaaaggta 3120
taaatgaata tataaatcta tactcacagc aaataaatga taaaacactc aaaaaatata 3180
aaatgagtgt tttatttaag caaattttaa gtgatacaga atctaaatct tttgtaattg 3240
ataagttaga agatgatagt gatgtagtta caacgatgca aagtttttat gagcaaatag 3300
cagcttttaa aacagtagaa gaaaaatcta ttaaagaaac actatcttta ttatttgatg 3360
atttaaaagc tcaaaaactt gatttgagta aaatttattt taaaaatgat aaatctctta 3420
ctgatctatc acaacaagtt tttgatgatt atagtgttat tggtacagcg gtactagaat 3480
atataactca acaaatagca cctaaaaatc ttgataaccc tagtaagaaa gagcaagaat 3540
taatagccaa aaaaactgaa aaagcaaaat acttatctct agaaactata aagcttgcct 3600
tagaagaatt taataagcat agagatatag ataaacagtg taggtttgaa gaaatacttg 3660
caaactttgc ggctattccg atgatatttg atgaaatagc tcaaaacaaa gacaatttgg 3720
cacagatatc tatcaaatat caaaatcaag gtaaaaaaga cctacttcaa gctagtgcgg 3780
aagatgatgt taaagctatc aaggatcttt tagatcaaac taataatctc ttacataaac 3840
taaaaatatt tcatattagt cagtcagaag ataaggcaaa tattttagac aaggatgagc 3900
atttttatct agtatttgag gagtgctact ttgagctagc gaatatagtg cctctttata 3960
acaaaattag aaactatata actcaaaagc catatagtga tgagaaattt aagctcaatt 4020
ttgagaactc gactttggct aatggttggg ataaaaataa agagcctgac aatacggcaa 4080
ttttatttat caaagatgat aaatattatc tgggtgtgat gaataagaaa aataacaaaa 4140
tatttgatga taaagctatc aaagaaaata aaggcgaggg ttataaaaaa attgtttata 4200
aacttttacc tggcgcaaat aaaatgttac ctaaggtttt cttttctgct aaatctataa 4260
aattttataa tcctagtgaa gatatactta gaataagaaa tcattccaca catacaaaaa 4320
atggtagtcc tcaaaaagga tatgaaaaat ttgagtttaa tattgaagat tgccgaaaat 4380
ttatagattt ttataaacag tctataagta agcatccgga gtggaaagat tttggattta 4440
gattttctga tactcaaaga tataattcta tagatgaatt ttatagagaa gttgaaaatc 4500
aaggctacaa actaactttt gaaaatatat cagagagcta tattgatagc gtagttaatc 4560
agggtaaatt gtacctattc caaatctata ataaagattt ttcagcttat agcaaagggc 4620
gaccaaatct acatacttta tattggaaag cgctgtttga tgagagaaat cttcaagatg 4680
tggtttataa gctaaatggt gaggcagagc ttttttatcg taaacaatca atacctaaaa 4740
aaatcactca cccagctaaa gaggcaatag ctaataaaaa caaagataat cctaaaaaag 4800
agagtgtttt tgaatatgat ttaatcaaag ataaacgctt tactgaagat aagtttttct 4860
ttcactgtcc tattacaatc aattttaaat ctagtggagc taataagttt aatgatgaaa 4920
tcaatttatt gctaaaagaa aaagcaaatg atgttcatat attaagtata gctagaggtg 4980
aaagacattt agcttactat actttggtag atggtaaagg caatatcatc aaacaagata 5040
ctttcaacat cattggtaat gatagaatga aaacaaacta ccatgataag cttgctgcaa 5100
tagagaaaga tagggattca gctaggaaag actggaaaaa gataaataac atcaaagaga 5160
tgaaagaggg ctatctatct caggtagttc atgaaatagc taagctagtt atagagtata 5220
atgctattgt ggtttttgag gatttaaatt ttggatttaa aagagggcgt ttcaaggtag 5280
agaagcaggt ctatcaaaag ttagaaaaaa tgctaattga gaaactaaac tatctagttt 5340
tcaaagataa tgagtttgat aaaactgggg gagtgcttag agcttatcag ctaacagcac 5400
cttttgagac ttttaaaaag atgggtaaac aaacaggtat tatctactat gtaccagctg 5460
gttttacttc aaaaatttgt cctgtaactg gttttgtaaa tcagttatat cctaagtatg 5520
aaagtgtcag caaatctcaa gagttcttta gtaagtttga caagatttgt tataaccttg 5580
ataagggcta ttttgagttt agttttgatt ataaaaactt tggtgacaag gctgccaaag 5640
gcaagtggac tatagctagc tttgggagta gattgattaa ctttagaaat tcagataaaa 5700
atcataattg ggatactcga gaagtttatc caactaaaga gttggagaaa ttgctaaaag 5760
attattctat cgaatatggg catggcgaat gtatcaaagc agctatttgc ggtgagagcg 5820
acaaaaagtt ttttgctaag ctaactagtg tcctaaatac tatcttacaa atgcgtaact 5880
caaaaacagg tactgagtta gattatctaa tttcaccagt agcagatgta aatggcaatt 5940
tctttgattc gcgacaggcg ccaaaaaata tgcctcaaga tgctgatgcc aatggtgctt 6000
atcatattgg gctaaaaggt ctgatgctac taggtaggat caaaaataat caagagggca 6060
aaaaactcaa tttggttatc aaaaatgaag agtattttga gttcgtgcag aataggaata 6120
actctggtgg ttctcccaag aagaagagga aagtctaact gcagtataat cagaaacagc 6180
ccgcggatgt tgatctgcgg gctgtttttt attgatcgaa tggccatgac caaaatccct 6240
taacgtgaag ttttcgttcc actgagcgtc agaccccgta gaaaagatca aaggatcttc 6300
ttgagatcct ttttttctgc gcgtaatctg ctgcttgcaa acaaaaaaac caccgctacc 6360
agcggtggtt tgtttgccgg atcaagagct accaactctt tttccgaagg taactggctt 6420
cagcagagcg cagataccaa atactgtcct tctagtgtag ccgtagttag gccaccactt 6480
caagaactct gtagcaccgc ctacatacct cgctctgcta atcctgttac cagtggctgc 6540
tgccagtggc gataagtcgt gtcttaccgg gttggactca agacgatagt taccggataa 6600
ggcgcagcgg tcgggctgaa cggggggttc gtgcacacag cccagcttgg agcgaacgac 6660
ctacaccgaa ctgagatacc tacagcgtga gctatgagaa agcgccacgc ttcccgaagg 6720
gagaaaggcg gacaggtatc cggtaagcgg cagggtcgga acaggagagc gcacgaggga 6780
gcttccaggg ggaaacgcct ggtatcttta tagtcctgtc gggtttcgcc acctctgact 6840
tgagcgtcga tttttgtgat gctcgtcagg ggggcggagc ctatggaaaa acgccagcaa 6900
cgcggccttt ttacggttcc tggccttttg ctggcctttt gctcacatgt tctttcctgc 6960
gttatcccct gattctgtgg ataaccgtat taccgccttt gagtgagctg agacaggtca 7020
ttcagactgg ctaatgcacc cagtaaggca gcggtatcat caactcaaaa tggtatgcgt 7080
tttgacacat ccactatata tccgtgtcgt tctgtccact cctgaatccc attccagaaa 7140
ttctctagcg attccagaag tttctcagag tcggaaagtt gaccagacat tacgaactgg 7200
cacagatggt cataacctga aggaagatct gattgcttaa ctgcttcagt taagaccgaa 7260
gcgctcgtcg tataacagat gcgatgatgc agaccaatca acatggcacc tgccattgct 7320
acctgcacag tcaaggatgg tagaaatgtt gtcggtcctt gcacacgaat attacgccat 7380
ttgcctgcat attcaaacag ctcttctacg ataagggcac aaatcgcatc gtggaacgtt 7440
tgggcttcta ccgatttagc agtttgatac actttctcta agtatccacc tgaatcataa 7500
atcggcaaaa tagagaaaaa ttgaccatgt gtaagcggcc aatctgattc cacctgagat 7560
gcataatcta gtagaatctc ttcgctatca aaattcactt ccaccttcca ctcaccggtt 7620
gtccattcat ggctgaactc tgcttcctct gttgacatga cacacatcat ctcaatatcc 7680
gaatagggcc catcagtctg acgaccaaga gagccataaa caccaatagc cttaacatca 7740
tccccatatt tatccaatat tcgttcctta atttcatgaa caatcttcat tctttcttct 7800
ctagtcatta ttattggtcc attcactatt ctcattcccc tttcagataa ttttagattt 7860
gcttttctaa ataagaatat ttggagagca ccgttcttat tcagctatta aacccattat 7920
atcgggtttt tgaggggatt tcaactgcag acacctaaat tcaaaatcta tcggtcagat 7980
ttataccgat ttgattttat atattcttga ataacatacg ccgagttatc acataaaagc 8040
gggaaccaat catcaaattt aaacttcatt gcataatcca ttaaactctt aaattctacg 8100
attccttgtt catcaataaa ctcaatcatt tctttaatta atttatatct atctgttgtt 8160
gttttcttta ataattcatc aacatctaca ccgccataaa ctatcatatc ttctttttga 8220
tatttaaatt tattaggatc gtccatgtga agcatatatc tcacaagacc tttcacactt 8280
cctgcaatct gcggaatagt cgcattcaat tcttctgtaa ttatttttat ctgttcataa 8340
gatttattac cctcatacat cactagaata tgataatgct cttttttcat cctatcttct 8400
gtatcagtat ccctatcatg taatggagac actacaaatt gaatgtgtaa ctcttttaaa 8460
tactctaacc actcggcttt tgctgattct ggatataaaa caaatgtcca attacgtcct 8520
cttgaatttt tcttgttttc agtttctttt attacatttt cgctcatgat ataataacgg 8580
tgctaataca tttaacaaaa tttagtcata gataggcagc atgccagtgc tgtctatctt 8640
tttttgttta aaatgcaccg tattcctcct ttgcatattt ttttattaga ataccggttg 8700
catctgattt gctaatatta tatttttctt tgattctatt taatatctca ttttcttctg 8760
ttgtaagtct taaagtaaca gcaacttttt tctcttcttt tctatctaca accatcactg 8820
tacctcccaa catctgtttt tttcacttta acataaaaaa caacctttta acattaaaaa 8880
cccaatattt atttatttgt ttggacaatg gacaatggac acctaggggg gaggtcgtag 8940
taccccccta tgttttctcc cctaaataac cccaaaaatc taagaaaaaa agacctcaaa 9000
aaggtcttta attaacatct caaatttcgc atttattcca atttcctttt tgcgtgtgat 9060
gcgttattaa cgttgatata atttaaattt tatttgacaa aaatgggctc gtgttgtaca 9120
ataaatgtag aggtagagac gcgaggtcta agaactttaa ataatttcta ctgttgtaga 9180
tagagaccgt gaagttaata aggtctcaaa tttctactgt tgtagatcgt ctctgaactg 9240
attcaagcaa gcttaaaccc agctcaatga gctgggtttt ttgtttgttt tttcaaactt 9300
agttagcttg gccagtgcct ctagagtcaa gtaaagagtc gacctgttac gaacggcaga 9360
tcagaatttt gtaataaaaa aagagcctgc tcattacact gcgggctctt tttcatggtc 9420
agaagacggg taaccaagat aacaa 9445
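The listing above follows the usual sequence-listing layout: each row holds groups of ten bases followed by a running residue count, and the `<211>` field gives the declared total length (9445 for SEQ ID NO. 11). A minimal sketch of how such rows could be re-joined and sanity-checked is shown below. The function name `parse_st25_lines` and the two-row sample are illustrative assumptions, not part of the patent.

```python
import re


def parse_st25_lines(lines):
    """Join sequence-listing rows and verify each row's running residue count.

    Hypothetical helper: assumes each row is whitespace-separated base groups
    followed by one cumulative count, as in the listing above.
    """
    parts = []
    total = 0
    for line in lines:
        tokens = line.split()
        if not tokens:
            continue  # skip blank rows
        # The final token is the cumulative count; the rest are base groups.
        *groups, count = tokens
        chunk = "".join(groups)
        if not re.fullmatch(r"[acgtn]+", chunk):
            raise ValueError(f"unexpected characters in row: {line!r}")
        total += len(chunk)
        if total != int(count):
            raise ValueError(f"running count mismatch: listed {count}, computed {total}")
        parts.append(chunk)
    return "".join(parts)


# First two rows of SEQ ID NO. 11, copied from the listing above.
rows = [
    "atgtcatgac attggtgtac agaaatggcg cagcaatggc aagaacgtcc cgggcggagc 60",
    "tcaggcctta actcacatta attgcgttgc gctcactgcc cgctttccag tcgggaaacc 120",
]
sequence = parse_st25_lines(rows)
print(len(sequence))
```

Run over every row of a sequence, the returned string's length can then be compared against the declared `<211>` value to catch rows lost or truncated during extraction.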
Claims (8)
1. Use of a gene editing fusion protein in Bacillus subtilis gene editing, characterized in that the amino acid sequence of the gene editing fusion protein is shown in SEQ ID NO. 4.
2. A Bacillus subtilis adenine base editor, characterized by comprising a gene editing fusion protein and a crRNA array insertion region, wherein the amino acid sequence of the gene editing fusion protein is shown in SEQ ID NO. 4 and the nucleotide sequence of the crRNA array insertion region is shown in SEQ ID NO. 5.
3. The Bacillus subtilis adenine base editor of claim 2, wherein expression of the gene editing fusion protein is regulated by the inducible promoter Pgrac100.
4. The Bacillus subtilis adenine base editor of claim 2, wherein expression of the crRNA array insertion region is regulated by the constitutive promoter Pveg.
5. The Bacillus subtilis adenine base editor of claim 2, wherein the Bacillus subtilis adenine base editor comprises a plasmid containing the gene encoding the gene editing fusion protein and the crRNA array insertion region, and the nucleotide sequence of the plasmid is shown in SEQ ID NO. 11.
6. Use of the Bacillus subtilis adenine base editor of any one of claims 2-5 in gene editing.
7. The use according to claim 6, characterized in that the gene editing converts a plurality of adenine sites on the genome to guanine.
8. The use according to claim 6, characterized in that the Bacillus subtilis adenine base editor is used for obtaining mutants or inactivating extracellular proteases.
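As an illustration of the outcome described in claim 7 (several adenine sites converted to guanine), the sketch below applies A-to-G substitutions at chosen positions of a toy sequence. It models only the resulting sequence change, not the editor's targeting or editing window. The function name `apply_a_to_g`, the toy sequence, and the positions are hypothetical and do not come from the patent.

```python
def apply_a_to_g(sequence, positions):
    """Return a copy of `sequence` with A replaced by G at each 0-based position.

    Hypothetical helper for illustration: it refuses to edit a position that
    does not hold an adenine, since an adenine base editor acts on A only.
    """
    bases = list(sequence.upper())
    for pos in positions:
        if bases[pos] != "A":
            raise ValueError(f"position {pos} holds {bases[pos]!r}, not 'A'")
        bases[pos] = "G"
    return "".join(bases)


toy = "ATGAAACTA"                      # made-up 9-nt sequence
edited = apply_a_to_g(toy, [3, 5])     # edit two adenine sites in one pass
print(edited)                          # prints ATGGAGCTA
```

Passing several positions in one call mirrors the multi-site editing of claim 7; a real workflow would compare sequencing reads against the reference rather than edit strings directly.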
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210265179.8A CN114835818B (en) | 2022-03-17 | 2022-03-17 | Gene editing fusion protein, adenine base editor constructed by same and application thereof |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210265179.8A CN114835818B (en) | 2022-03-17 | 2022-03-17 | Gene editing fusion protein, adenine base editor constructed by same and application thereof |
Publications (2)
Publication Number | Publication Date |
---|---|
CN114835818A (en) | 2022-08-02 |
CN114835818B (en) | 2024-03-22 |
Family
ID=82561443
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210265179.8A Active CN114835818B (en) | 2022-03-17 | 2022-03-17 | Gene editing fusion protein, adenine base editor constructed by same and application thereof |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114835818B (en) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116751799B (en) * | 2023-06-14 | 2024-01-26 | Jiangnan University | Multi-site double-base editor and application thereof |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109295186A (en) * | 2018-09-30 | 2019-02-01 | Sun Yat-sen University | Method for detecting off-target effects of an adenine single-base editing system based on genome sequencing, and its application in gene editing |
KR20190044157A (en) * | 2017-10-20 | 2019-04-30 | Gyeongsang National University Industry-Academic Cooperation Foundation | Composition for single base editing comprising adenine or adenosine deaminase as effective component and uses thereof |
CN109957569A (en) * | 2017-12-22 | 2019-07-02 | Institute of Genetics and Developmental Biology, Chinese Academy of Sciences | Base editing system and method based on Cpf1 protein |
CN112080513A (en) * | 2020-09-16 | 2020-12-15 | Institute of Plant Protection, Chinese Academy of Agricultural Sciences | Rice artificial genome editing system with expanded editing range and application thereof |
CN112143753A (en) * | 2020-09-17 | 2020-12-29 | Institute of Plant Protection, Chinese Academy of Agricultural Sciences | Adenine base editor and related biological material and application thereof |
- 2022-03-17 CN CN202210265179.8A patent/CN114835818B/en active Active
Patent Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
KR20190044157A (en) * | 2017-10-20 | 2019-04-30 | Gyeongsang National University Industry-Academic Cooperation Foundation | Composition for single base editing comprising adenine or adenosine deaminase as effective component and uses thereof |
CN109957569A (en) * | 2017-12-22 | 2019-07-02 | Institute of Genetics and Developmental Biology, Chinese Academy of Sciences | Base editing system and method based on Cpf1 protein |
CN109295186A (en) * | 2018-09-30 | 2019-02-01 | Sun Yat-sen University | Method for detecting off-target effects of an adenine single-base editing system based on genome sequencing, and its application in gene editing |
CN112080513A (en) * | 2020-09-16 | 2020-12-15 | Institute of Plant Protection, Chinese Academy of Agricultural Sciences | Rice artificial genome editing system with expanded editing range and application thereof |
CN112143753A (en) * | 2020-09-17 | 2020-12-29 | Institute of Plant Protection, Chinese Academy of Agricultural Sciences | Adenine base editor and related biological material and application thereof |
Non-Patent Citations (4)
Title |
---|
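High-efficiency and multiplex adenine base editing in plants using new TadA variants; Yan et al.; Mol. Plant.; vol. 14; p. 724, left column, paragraph 2, and Fig. 2A *
Phage-assisted evolution of an adenine base editor with improved Cas domain compatibility and activity; Richter et al.; Nature Biotechnology; vol. 38; abstract, p. 886, right column, paragraphs 1-2, and additional information *
Design, construction and application of CRISPR-based gene editing and expression regulation systems for Bacillus subtilis; Wu Yaokang; China Doctoral Dissertations Full-text Database (Electronic Journal), Basic Sciences; pp. 13-16 *
Single-base gene editing technology based on the CRISPR/Cas9 system and its application in medical research; Zhang Aixia; Zhao Yu; An Jing; Luo Ying; Chen Zhiguo; Chinese Journal of Pharmacology and Toxicology (07); pp. 607-514 *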
Also Published As
Publication number | Publication date |
---|---|
CN114835818A (en) | 2022-08-02 |
Similar Documents
Publication | Title | Publication Date |
---|---|---|
JP4469005B2 (en) | Artificial promoter library for selected organisms and promoters derived from the library | |
Kirschbaum et al. | Isolation of a specialized lambda transducing bacteriophage carrying the beta subunit gene for Escherichia coli ribonucleic acid polymerase | |
KR100519902B1 (en) | Variants of soluble pyrroloquinoline quinone-dependent glucose dehydrogenase | |
CN108384784A (en) | A method of knocking out Endoglin genes using CRISPR/Cas9 technologies | |
CN110066829B (en) | CRISPR/Cas9 gene editing system and application thereof | |
CN113481136B (en) | Recombinant halophilic monad, construction method and application of catalyzing citric acid to prepare itaconic acid | |
US20190024099A1 (en) | Methods and compositions for recombinase-based genetic diversification | |
CA2747462A1 (en) | Systems and methods for the secretion of recombinant proteins in gram negative bacteria | |
CA2652689C (en) | Method of constructing gene transport support | |
CN114835818B (en) | Gene editing fusion protein, adenine base editor constructed by same and application thereof | |
US6558924B1 (en) | Recombinant expression of insulin C-peptide | |
WO2020169221A1 (en) | Production of plant-based active substances (e.g. cannabinoids) by recombinant microorganisms | |
CN102094037A (en) | Reference internal type dual-luciferase reporter vector and application thereof | |
CN112301018B (en) | Novel Cas protein, crispr-Cas system and use thereof in the field of gene editing | |
CN116286931B (en) | Double-plasmid system for rapid gene editing of Ralstonia eutropha and application thereof | |
CN112961832A (en) | Cell strain and preparation method and application thereof | |
CN111534578A (en) | Method for high-throughput screening of target gene of interaction between eukaryotic cells and pesticides | |
US6387683B1 (en) | Recombinant yeast PDI and process for production thereof | |
CN113151130A (en) | Genetically engineered bacterium and application thereof in preparation of isobutanol by bioconversion of methane | |
CN112195190B (en) | Replication element derived from Bacillus belgii plasmid and application thereof | |
CN115605589A (en) | Improved process for the production of isoprenoids | |
KR20120107519A (en) | Nucleic acid structure containing a pyripyropene biosynthesis gene cluster and a marker gene | |
CN106715697A (en) | Transformation method of sugar beet protoplasts by TALEN platform technology | |
KR102669217B1 (en) | Expression vector for use in methanogens | |
CN113462701B (en) | High-temperature polyphenol oxidase and application thereof in treatment of phenol-containing wastewater |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||