CA2294473A1 - Novel family of pheromone receptors
- Publication number
- CA2294473A1
- Authority
- CA
- Canada
- Prior art keywords
- leu
- ser
- ile
- phe
- val
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Abandoned
- 108010002724 Pheromone Receptors Proteins 0.000 title claims abstract description 370
- 102100038344 Vomeronasal type-1 receptor 2 Human genes 0.000 title claims abstract description 358
- 239000002427 pheromone receptor Substances 0.000 title claims abstract description 355
- 108090000765 processed proteins & peptides Proteins 0.000 claims abstract description 296
- 102000004196 processed proteins & peptides Human genes 0.000 claims abstract description 289
- 229920001184 polypeptide Polymers 0.000 claims abstract description 282
- 150000007523 nucleic acids Chemical class 0.000 claims abstract description 119
- 102000039446 nucleic acids Human genes 0.000 claims abstract description 115
- 108020004707 nucleic acids Proteins 0.000 claims abstract description 115
- 239000012634 fragment Substances 0.000 claims abstract description 108
- 238000000034 method Methods 0.000 claims abstract description 83
- 230000027455 binding Effects 0.000 claims description 152
- 238000009739 binding Methods 0.000 claims description 143
- 210000004027 cell Anatomy 0.000 claims description 138
- 239000002299 complementary DNA Substances 0.000 claims description 131
- 108090000623 proteins and genes Proteins 0.000 claims description 119
- 210000001121 vomeronasal organ Anatomy 0.000 claims description 115
- 239000000523 sample Substances 0.000 claims description 93
- 239000003446 ligand Substances 0.000 claims description 79
- 239000002773 nucleotide Substances 0.000 claims description 76
- 125000003729 nucleotide group Chemical group 0.000 claims description 76
- 150000001413 amino acids Chemical class 0.000 claims description 64
- 239000003795 chemical substances by application Substances 0.000 claims description 59
- 210000002569 neuron Anatomy 0.000 claims description 59
- 230000014509 gene expression Effects 0.000 claims description 56
- 125000003275 alpha amino acid group Chemical group 0.000 claims description 54
- 230000000694 effects Effects 0.000 claims description 49
- 239000000203 mixture Substances 0.000 claims description 44
- 102000004169 proteins and genes Human genes 0.000 claims description 43
- 239000003016 pheromone Substances 0.000 claims description 32
- 230000003834 intracellular effect Effects 0.000 claims description 28
- 241001465754 Metazoa Species 0.000 claims description 25
- 239000002831 pharmacologic agent Substances 0.000 claims description 25
- 102000003688 G-Protein-Coupled Receptors Human genes 0.000 claims description 22
- 108090000045 G-Protein-Coupled Receptors Proteins 0.000 claims description 22
- 230000019491 signal transduction Effects 0.000 claims description 22
- 239000013604 expression vector Substances 0.000 claims description 16
- 230000001404 mediated effect Effects 0.000 claims description 16
- 230000000692 anti-sense effect Effects 0.000 claims description 13
- 108091028043 Nucleic acid sequence Proteins 0.000 claims description 11
- 230000035558 fertility Effects 0.000 claims description 11
- 239000011159 matrix material Substances 0.000 claims description 11
- 238000006243 chemical reaction Methods 0.000 claims description 10
- 210000000056 organ Anatomy 0.000 claims description 10
- 238000012360 testing method Methods 0.000 claims description 10
- 230000002068 genetic effect Effects 0.000 claims description 9
- 150000002611 lead compounds Chemical class 0.000 claims description 9
- 125000002924 primary amino group Chemical group [H]N([H])* 0.000 claims description 9
- 239000007787 solid Substances 0.000 claims description 8
- 241000251539 Vertebrata <Metazoa> Species 0.000 claims description 7
- 230000003321 amplification Effects 0.000 claims description 7
- 125000003178 carboxy group Chemical group [H]OC(*)=O 0.000 claims description 7
- 239000003937 drug carrier Substances 0.000 claims description 7
- 238000003199 nucleic acid amplification method Methods 0.000 claims description 7
- 241000238631 Hexapoda Species 0.000 claims description 6
- 108010054477 Immunoglobulin Fab Fragments Proteins 0.000 claims description 6
- 108020004711 Nucleic Acid Probes Proteins 0.000 claims description 6
- 108010076504 Protein Sorting Signals Proteins 0.000 claims description 6
- 238000003745 diagnosis Methods 0.000 claims description 6
- 239000002853 nucleic acid probe Substances 0.000 claims description 6
- 108020004705 Codon Proteins 0.000 claims description 5
- 102000001706 Immunoglobulin Fab Fragments Human genes 0.000 claims description 5
- 230000007423 decrease Effects 0.000 claims description 5
- 208000037265 diseases, disorders, signs and symptoms Diseases 0.000 claims description 5
- 229960005486 vaccine Drugs 0.000 claims description 5
- 201000010099 disease Diseases 0.000 claims description 4
- 210000004962 mammalian cell Anatomy 0.000 claims description 4
- 108010021625 Immunoglobulin Fragments Proteins 0.000 claims description 3
- 102000008394 Immunoglobulin Fragments Human genes 0.000 claims description 3
- 230000028993 immune response Effects 0.000 claims description 3
- 108020001756 ligand binding domains Proteins 0.000 claims description 2
- 230000009467 reduction Effects 0.000 claims description 2
- 230000003247 decreasing effect Effects 0.000 claims 1
- 108700005084 Multigene Family Proteins 0.000 abstract description 9
- 108020003175 receptors Proteins 0.000 description 106
- 102000005962 receptors Human genes 0.000 description 98
- 241000699666 Mus <mouse, genus> Species 0.000 description 73
- 235000001014 amino acid Nutrition 0.000 description 70
- 229940024606 amino acid Drugs 0.000 description 61
- 108020004635 Complementary DNA Proteins 0.000 description 59
- 241000700159 Rattus Species 0.000 description 47
- 235000018102 proteins Nutrition 0.000 description 40
- 108091026890 Coding region Proteins 0.000 description 36
- 108020004414 DNA Proteins 0.000 description 36
- 108010050848 glycylleucine Proteins 0.000 description 28
- 108091034117 Oligonucleotide Proteins 0.000 description 25
- 238000003556 assay Methods 0.000 description 25
- 238000009396 hybridization Methods 0.000 description 25
- COYHRQWNJDJCNA-NUJDXYNKSA-N Thr-Thr-Thr Chemical compound C[C@@H](O)[C@H](N)C(=O)N[C@@H]([C@@H](C)O)C(=O)N[C@@H]([C@@H](C)O)C(O)=O COYHRQWNJDJCNA-NUJDXYNKSA-N 0.000 description 24
- 108091032973 (ribonucleotides)n+m Proteins 0.000 description 20
- 108020004999 messenger RNA Proteins 0.000 description 18
- 238000012216 screening Methods 0.000 description 18
- 108010050543 Calcium-Sensing Receptors Proteins 0.000 description 16
- 102000013830 Calcium-Sensing Receptors Human genes 0.000 description 16
- 239000013598 vector Substances 0.000 description 16
- 241000880493 Leptailurus serval Species 0.000 description 14
- 238000002105 Southern blotting Methods 0.000 description 14
- 238000004458 analytical method Methods 0.000 description 14
- 239000000074 antisense oligonucleotide Substances 0.000 description 14
- 238000012230 antisense oligonucleotides Methods 0.000 description 14
- 108010038633 aspartylglutamate Proteins 0.000 description 14
- 239000000872 buffer Substances 0.000 description 14
- 239000000047 product Substances 0.000 description 14
- 238000006467 substitution reaction Methods 0.000 description 14
- 230000006399 behavior Effects 0.000 description 13
- 108091006027 G proteins Proteins 0.000 description 12
- 102000030782 GTP binding Human genes 0.000 description 12
- 108091000058 GTP-Binding Proteins 0.000 description 12
- 102000016193 Metabotropic glutamate receptors Human genes 0.000 description 12
- 108010010914 Metabotropic glutamate receptors Proteins 0.000 description 12
- 108050002069 Olfactory receptors Proteins 0.000 description 12
- 210000001519 tissue Anatomy 0.000 description 12
- 108020000948 Antisense Oligonucleotides Proteins 0.000 description 11
- 102000012547 Olfactory receptors Human genes 0.000 description 11
- 238000013518 transcription Methods 0.000 description 11
- 230000035897 transcription Effects 0.000 description 11
- MJTOYIHCKVQICL-ULQDDVLXSA-N Leu-Met-Phe Chemical compound CC(C)C[C@@H](C(=O)N[C@@H](CCSC)C(=O)N[C@@H](CC1=CC=CC=C1)C(=O)O)N MJTOYIHCKVQICL-ULQDDVLXSA-N 0.000 description 10
- XZFYRXDAULDNFX-UHFFFAOYSA-N N-L-cysteinyl-L-phenylalanine Natural products SCC(N)C(=O)NC(C(O)=O)CC1=CC=CC=C1 XZFYRXDAULDNFX-UHFFFAOYSA-N 0.000 description 10
- JLCPHMBAVCMARE-UHFFFAOYSA-N [3-[[3-[[3-[[3-[[3-[[3-[[3-[[3-[[3-[[3-[[3-[[5-(2-amino-6-oxo-1H-purin-9-yl)-3-[[3-[[3-[[3-[[3-[[3-[[5-(2-amino-6-oxo-1H-purin-9-yl)-3-[[5-(2-amino-6-oxo-1H-purin-9-yl)-3-hydroxyoxolan-2-yl]methoxy-hydroxyphosphoryl]oxyoxolan-2-yl]methoxy-hydroxyphosphoryl]oxy-5-(5-methyl-2,4-dioxopyrimidin-1-yl)oxolan-2-yl]methoxy-hydroxyphosphoryl]oxy-5-(6-aminopurin-9-yl)oxolan-2-yl]methoxy-hydroxyphosphoryl]oxy-5-(6-aminopurin-9-yl)oxolan-2-yl]methoxy-hydroxyphosphoryl]oxy-5-(6-aminopurin-9-yl)oxolan-2-yl]methoxy-hydroxyphosphoryl]oxy-5-(6-aminopurin-9-yl)oxolan-2-yl]methoxy-hydroxyphosphoryl]oxyoxolan-2-yl]methoxy-hydroxyphosphoryl]oxy-5-(5-methyl-2,4-dioxopyrimidin-1-yl)oxolan-2-yl]methoxy-hydroxyphosphoryl]oxy-5-(4-amino-2-oxopyrimidin-1-yl)oxolan-2-yl]methoxy-hydroxyphosphoryl]oxy-5-(5-methyl-2,4-dioxopyrimidin-1-yl)oxolan-2-yl]methoxy-hydroxyphosphoryl]oxy-5-(5-methyl-2,4-dioxopyrimidin-1-yl)oxolan-2-yl]methoxy-hydroxyphosphoryl]oxy-5-(6-aminopurin-9-yl)oxolan-2-yl]methoxy-hydroxyphosphoryl]oxy-5-(6-aminopurin-9-yl)oxolan-2-yl]methoxy-hydroxyphosphoryl]oxy-5-(4-amino-2-oxopyrimidin-1-yl)oxolan-2-yl]methoxy-hydroxyphosphoryl]oxy-5-(4-amino-2-oxopyrimidin-1-yl)oxolan-2-yl]methoxy-hydroxyphosphoryl]oxy-5-(4-amino-2-oxopyrimidin-1-yl)oxolan-2-yl]methoxy-hydroxyphosphoryl]oxy-5-(6-aminopurin-9-yl)oxolan-2-yl]methoxy-hydroxyphosphoryl]oxy-5-(4-amino-2-oxopyrimidin-1-yl)oxolan-2-yl]methyl [5-(6-aminopurin-9-yl)-2-(hydroxymethyl)oxolan-3-yl] hydrogen phosphate Chemical class 
Cc1cn(C2CC(OP(O)(=O)OCC3OC(CC3OP(O)(=O)OCC3OC(CC3O)n3cnc4c3nc(N)[nH]c4=O)n3cnc4c3nc(N)[nH]c4=O)C(COP(O)(=O)OC3CC(OC3COP(O)(=O)OC3CC(OC3COP(O)(=O)OC3CC(OC3COP(O)(=O)OC3CC(OC3COP(O)(=O)OC3CC(OC3COP(O)(=O)OC3CC(OC3COP(O)(=O)OC3CC(OC3COP(O)(=O)OC3CC(OC3COP(O)(=O)OC3CC(OC3COP(O)(=O)OC3CC(OC3COP(O)(=O)OC3CC(OC3COP(O)(=O)OC3CC(OC3COP(O)(=O)OC3CC(OC3COP(O)(=O)OC3CC(OC3COP(O)(=O)OC3CC(OC3COP(O)(=O)OC3CC(OC3COP(O)(=O)OC3CC(OC3CO)n3cnc4c(N)ncnc34)n3ccc(N)nc3=O)n3cnc4c(N)ncnc34)n3ccc(N)nc3=O)n3ccc(N)nc3=O)n3ccc(N)nc3=O)n3cnc4c(N)ncnc34)n3cnc4c(N)ncnc34)n3cc(C)c(=O)[nH]c3=O)n3cc(C)c(=O)[nH]c3=O)n3ccc(N)nc3=O)n3cc(C)c(=O)[nH]c3=O)n3cnc4c3nc(N)[nH]c4=O)n3cnc4c(N)ncnc34)n3cnc4c(N)ncnc34)n3cnc4c(N)ncnc34)n3cnc4c(N)ncnc34)O2)c(=O)[nH]c1=O JLCPHMBAVCMARE-UHFFFAOYSA-N 0.000 description 10
- 210000001072 colon Anatomy 0.000 description 10
- 238000007901 in situ hybridization Methods 0.000 description 10
- 239000003112 inhibitor Substances 0.000 description 10
- 210000001706 olfactory mucosa Anatomy 0.000 description 10
- 238000002360 preparation method Methods 0.000 description 10
- 238000007423 screening assay Methods 0.000 description 10
- HGCNKOLVKRAVHD-UHFFFAOYSA-N L-Met-L-Phe Natural products CSCCC(N)C(=O)NC(C(O)=O)CC1=CC=CC=C1 HGCNKOLVKRAVHD-UHFFFAOYSA-N 0.000 description 9
- KWTVLKBOQATPHJ-SRVKXCTJSA-N Leu-Ala-Lys Chemical compound C[C@@H](C(=O)N[C@@H](CCCCN)C(=O)O)NC(=O)[C@H](CC(C)C)N KWTVLKBOQATPHJ-SRVKXCTJSA-N 0.000 description 9
- 238000000338 in vitro Methods 0.000 description 9
- 238000001727 in vivo Methods 0.000 description 9
- 108010044374 isoleucyl-tyrosine Proteins 0.000 description 9
- 108010034529 leucyl-lysine Proteins 0.000 description 9
- 239000012528 membrane Substances 0.000 description 9
- 230000036961 partial effect Effects 0.000 description 9
- 241000699667 Mus spretus Species 0.000 description 8
- VMLONWHIORGALA-SRVKXCTJSA-N Ser-Leu-Leu Chemical compound CC(C)C[C@@H](C([O-])=O)NC(=O)[C@H](CC(C)C)NC(=O)[C@@H]([NH3+])CO VMLONWHIORGALA-SRVKXCTJSA-N 0.000 description 8
- 238000007792 addition Methods 0.000 description 8
- 108010047495 alanylglycine Proteins 0.000 description 8
- 239000011230 binding agent Substances 0.000 description 8
- 150000001875 compounds Chemical class 0.000 description 8
- 108010004073 cysteinylcysteine Proteins 0.000 description 8
- 238000001514 detection method Methods 0.000 description 8
- 108010054812 diprotin A Proteins 0.000 description 8
- 238000002955 isolation Methods 0.000 description 8
- 108010076756 leucyl-alanyl-phenylalanine Proteins 0.000 description 8
- 108010068488 methionylphenylalanine Proteins 0.000 description 8
- 230000035772 mutation Effects 0.000 description 8
- 108010051242 phenylalanylserine Proteins 0.000 description 8
- 230000001105 regulatory effect Effects 0.000 description 8
- 230000014616 translation Effects 0.000 description 8
- KLBVGHCGHUNHEA-BJDJZHNGSA-N Ile-Leu-Ala Chemical compound CC[C@H](C)[C@@H](C(=O)N[C@@H](CC(C)C)C(=O)N[C@@H](C)C(=O)O)N KLBVGHCGHUNHEA-BJDJZHNGSA-N 0.000 description 7
- WXLYNEHOGRYNFU-URLPEUOOSA-N Ile-Thr-Phe Chemical compound CC[C@H](C)[C@@H](C(=O)N[C@@H]([C@@H](C)O)C(=O)N[C@@H](CC1=CC=CC=C1)C(=O)O)N WXLYNEHOGRYNFU-URLPEUOOSA-N 0.000 description 7
- XOEDPXDZJHBQIX-ULQDDVLXSA-N Leu-Val-Phe Chemical compound CC(C)C[C@H](N)C(=O)N[C@@H](C(C)C)C(=O)N[C@H](C(O)=O)CC1=CC=CC=C1 XOEDPXDZJHBQIX-ULQDDVLXSA-N 0.000 description 7
- AUEJLPRZGVVDNU-UHFFFAOYSA-N N-L-tyrosyl-L-leucine Natural products CC(C)CC(C(O)=O)NC(=O)C(N)CC1=CC=C(O)C=C1 AUEJLPRZGVVDNU-UHFFFAOYSA-N 0.000 description 7
- FUMGHWDRRFCKEP-CIUDSAMLSA-N Ser-Leu-Ala Chemical compound [H]N[C@@H](CO)C(=O)N[C@@H](CC(C)C)C(=O)N[C@@H](C)C(O)=O FUMGHWDRRFCKEP-CIUDSAMLSA-N 0.000 description 7
- UPLYXVPQLJVWMM-KKUMJFAQSA-N Ser-Phe-Leu Chemical compound [H]N[C@@H](CO)C(=O)N[C@@H](CC1=CC=CC=C1)C(=O)N[C@@H](CC(C)C)C(O)=O UPLYXVPQLJVWMM-KKUMJFAQSA-N 0.000 description 7
- MQUZANJDFOQOBX-SRVKXCTJSA-N Ser-Phe-Ser Chemical compound [H]N[C@@H](CO)C(=O)N[C@@H](CC1=CC=CC=C1)C(=O)N[C@@H](CO)C(O)=O MQUZANJDFOQOBX-SRVKXCTJSA-N 0.000 description 7
- FAPWRFPIFSIZLT-UHFFFAOYSA-M Sodium chloride Chemical compound [Na+].[Cl-] FAPWRFPIFSIZLT-UHFFFAOYSA-M 0.000 description 7
- UMPVMAYCLYMYGA-ONGXEEELSA-N Val-Leu-Gly Chemical compound CC(C)[C@H](N)C(=O)N[C@@H](CC(C)C)C(=O)NCC(O)=O UMPVMAYCLYMYGA-ONGXEEELSA-N 0.000 description 7
- YQMILNREHKTFBS-IHRRRGAJSA-N Val-Phe-Cys Chemical compound CC(C)[C@@H](C(=O)N[C@@H](CC1=CC=CC=C1)C(=O)N[C@@H](CS)C(=O)O)N YQMILNREHKTFBS-IHRRRGAJSA-N 0.000 description 7
- 239000000427 antigen Substances 0.000 description 7
- 108091007433 antigens Proteins 0.000 description 7
- 102000036639 antigens Human genes 0.000 description 7
- 108010013835 arginine glutamate Proteins 0.000 description 7
- 108010040443 aspartyl-aspartic acid Proteins 0.000 description 7
- 230000002759 chromosomal effect Effects 0.000 description 7
- 108010060199 cysteinylproline Proteins 0.000 description 7
- FSXRLASFHBWESK-UHFFFAOYSA-N dipeptide phenylalanyl-tyrosine Natural products C=1C=C(O)C=CC=1CC(C(O)=O)NC(=O)C(N)CC1=CC=CC=C1 FSXRLASFHBWESK-UHFFFAOYSA-N 0.000 description 7
- 230000002209 hydrophobic effect Effects 0.000 description 7
- 108010054155 lysyllysine Proteins 0.000 description 7
- 239000000463 material Substances 0.000 description 7
- 210000004379 membrane Anatomy 0.000 description 7
- 239000000758 substrate Substances 0.000 description 7
- 108010073969 valyllysine Proteins 0.000 description 7
- 238000005406 washing Methods 0.000 description 7
- 102000040650 (ribonucleotides)n+m Human genes 0.000 description 6
- REWSWYIDQIELBE-FXQIFTODSA-N Ala-Val-Ser Chemical compound [H]N[C@@H](C)C(=O)N[C@@H](C(C)C)C(=O)N[C@@H](CO)C(O)=O REWSWYIDQIELBE-FXQIFTODSA-N 0.000 description 6
- JBCLFWXMTIKCCB-UHFFFAOYSA-N H-Gly-Phe-OH Natural products NCC(=O)NC(C(O)=O)CC1=CC=CC=C1 JBCLFWXMTIKCCB-UHFFFAOYSA-N 0.000 description 6
- NYEYYMLUABXDMC-NHCYSSNCSA-N Ile-Gly-Leu Chemical compound CC[C@H](C)[C@@H](C(=O)NCC(=O)N[C@@H](CC(C)C)C(=O)O)N NYEYYMLUABXDMC-NHCYSSNCSA-N 0.000 description 6
- KFKWRHQBZQICHA-STQMWFEESA-N L-leucyl-L-phenylalanine Natural products CC(C)C[C@H](N)C(=O)N[C@H](C(O)=O)CC1=CC=CC=C1 KFKWRHQBZQICHA-STQMWFEESA-N 0.000 description 6
- KAFOIVJDVSZUMD-UHFFFAOYSA-N Leu-Gln-Gln Natural products CC(C)CC(N)C(=O)NC(CCC(N)=O)C(=O)NC(CCC(N)=O)C(O)=O KAFOIVJDVSZUMD-UHFFFAOYSA-N 0.000 description 6
- FEHQLKKBVJHSEC-SZMVWBNQSA-N Leu-Glu-Trp Chemical compound C1=CC=C2C(C[C@H](NC(=O)[C@H](CCC(O)=O)NC(=O)[C@@H](N)CC(C)C)C(O)=O)=CNC2=C1 FEHQLKKBVJHSEC-SZMVWBNQSA-N 0.000 description 6
- VGPCJSXPPOQPBK-YUMQZZPRSA-N Leu-Gly-Ser Chemical compound CC(C)C[C@H](N)C(=O)NCC(=O)N[C@@H](CO)C(O)=O VGPCJSXPPOQPBK-YUMQZZPRSA-N 0.000 description 6
- 108700026244 Open Reading Frames Proteins 0.000 description 6
- HHJFMHQYEAAOBM-ZLUOBGJFSA-N Ser-Ser-Ala Chemical compound [H]N[C@@H](CO)C(=O)N[C@@H](CO)C(=O)N[C@@H](C)C(O)=O HHJFMHQYEAAOBM-ZLUOBGJFSA-N 0.000 description 6
- DBMJMQXJHONAFJ-UHFFFAOYSA-M Sodium laurylsulphate Chemical compound [Na+].CCCCCCCCCCCCOS([O-])(=O)=O DBMJMQXJHONAFJ-UHFFFAOYSA-M 0.000 description 6
- GXUWHVZYDAHFSV-FLBSBUHZSA-N Thr-Ile-Thr Chemical compound [H]N[C@@H]([C@@H](C)O)C(=O)N[C@@H]([C@@H](C)CC)C(=O)N[C@@H]([C@@H](C)O)C(O)=O GXUWHVZYDAHFSV-FLBSBUHZSA-N 0.000 description 6
- IMDMLDSVUSMAEJ-HJGDQZAQSA-N Thr-Leu-Asn Chemical compound [H]N[C@@H]([C@@H](C)O)C(=O)N[C@@H](CC(C)C)C(=O)N[C@@H](CC(N)=O)C(O)=O IMDMLDSVUSMAEJ-HJGDQZAQSA-N 0.000 description 6
- BIBYEFRASCNLAA-CDMKHQONSA-N Thr-Phe-Gly Chemical compound C[C@@H](O)[C@H](N)C(=O)N[C@H](C(=O)NCC(O)=O)CC1=CC=CC=C1 BIBYEFRASCNLAA-CDMKHQONSA-N 0.000 description 6
- AHERARIZBPOMNU-KATARQTJSA-N Thr-Ser-Leu Chemical compound [H]N[C@@H]([C@@H](C)O)C(=O)N[C@@H](CO)C(=O)N[C@@H](CC(C)C)C(O)=O AHERARIZBPOMNU-KATARQTJSA-N 0.000 description 6
- BMOFUVHDBROBSE-DCAQKATOSA-N Val-Leu-Cys Chemical compound CC(C)C[C@@H](C(=O)N[C@@H](CS)C(=O)O)NC(=O)[C@H](C(C)C)N BMOFUVHDBROBSE-DCAQKATOSA-N 0.000 description 6
- QZKVWWIUSQGWMY-IHRRRGAJSA-N Val-Ser-Phe Chemical compound CC(C)[C@H](N)C(=O)N[C@@H](CO)C(=O)N[C@H](C(O)=O)CC1=CC=CC=C1 QZKVWWIUSQGWMY-IHRRRGAJSA-N 0.000 description 6
- 230000004913 activation Effects 0.000 description 6
- 238000010367 cloning Methods 0.000 description 6
- 230000001276 controlling effect Effects 0.000 description 6
- 238000002474 experimental method Methods 0.000 description 6
- 239000003205 fragrance Substances 0.000 description 6
- 230000006870 function Effects 0.000 description 6
- 108020001507 fusion proteins Proteins 0.000 description 6
- 102000037865 fusion proteins Human genes 0.000 description 6
- 108010063718 gamma-glutamylaspartic acid Proteins 0.000 description 6
- XBGGUPMXALFZOT-UHFFFAOYSA-N glycyl-L-tyrosine hemihydrate Natural products NCC(=O)NC(C(O)=O)CC1=CC=C(O)C=C1 XBGGUPMXALFZOT-UHFFFAOYSA-N 0.000 description 6
- 238000011534 incubation Methods 0.000 description 6
- 108010044056 leucyl-phenylalanine Proteins 0.000 description 6
- 108010057821 leucylproline Proteins 0.000 description 6
- 108010038320 lysylphenylalanine Proteins 0.000 description 6
- 108010056582 methionylglutamic acid Proteins 0.000 description 6
- 230000004048 modification Effects 0.000 description 6
- 238000012986 modification Methods 0.000 description 6
- 238000010369 molecular cloning Methods 0.000 description 6
- 239000013612 plasmid Substances 0.000 description 6
- 108010070643 prolylglutamic acid Proteins 0.000 description 6
- 210000001044 sensory neuron Anatomy 0.000 description 6
- 238000000926 separation method Methods 0.000 description 6
- 108010048818 seryl-histidine Proteins 0.000 description 6
- 108010071207 serylmethionine Proteins 0.000 description 6
- 235000019333 sodium laurylsulphate Nutrition 0.000 description 6
- 239000000243 solution Substances 0.000 description 6
- 238000001890 transfection Methods 0.000 description 6
- 238000013519 translation Methods 0.000 description 6
- 108010078580 tyrosylleucine Proteins 0.000 description 6
- JNTMAZFVYNDPLB-PEDHHIEDSA-N (2S,3S)-2-[[[(2S)-1-[(2S,3S)-2-amino-3-methyl-1-oxopentyl]-2-pyrrolidinyl]-oxomethyl]amino]-3-methylpentanoic acid Chemical compound CC[C@H](C)[C@H](N)C(=O)N1CCC[C@H]1C(=O)N[C@@H]([C@@H](C)CC)C(O)=O JNTMAZFVYNDPLB-PEDHHIEDSA-N 0.000 description 5
- XVZCXCTYGHPNEM-IHRRRGAJSA-N (2s)-1-[(2s)-2-[[(2s)-2-amino-4-methylpentanoyl]amino]-4-methylpentanoyl]pyrrolidine-2-carboxylic acid Chemical compound CC(C)C[C@H](N)C(=O)N[C@@H](CC(C)C)C(=O)N1CCC[C@H]1C(O)=O XVZCXCTYGHPNEM-IHRRRGAJSA-N 0.000 description 5
- LXAARTARZJJCMB-CIQUZCHMSA-N Ala-Ile-Thr Chemical compound [H]N[C@@H](C)C(=O)N[C@@H]([C@@H](C)CC)C(=O)N[C@@H]([C@@H](C)O)C(O)=O LXAARTARZJJCMB-CIQUZCHMSA-N 0.000 description 5
- WEZNQZHACPSMEF-QEJZJMRPSA-N Ala-Phe-Lys Chemical compound NCCCC[C@@H](C(O)=O)NC(=O)[C@@H](NC(=O)[C@@H](N)C)CC1=CC=CC=C1 WEZNQZHACPSMEF-QEJZJMRPSA-N 0.000 description 5
- BTRULDJUUVGRNE-DCAQKATOSA-N Ala-Pro-Lys Chemical compound C[C@H](N)C(=O)N1CCC[C@H]1C(=O)N[C@@H](CCCCN)C(O)=O BTRULDJUUVGRNE-DCAQKATOSA-N 0.000 description 5
- SNAKIVFVLVUCKB-UHFFFAOYSA-N Asn-Glu-Ala-Lys Natural products NCCCCC(C(O)=O)NC(=O)C(C)NC(=O)C(CCC(O)=O)NC(=O)C(N)CC(N)=O SNAKIVFVLVUCKB-UHFFFAOYSA-N 0.000 description 5
- SUIJFTJDTJKSRK-IHRRRGAJSA-N Asn-Pro-Tyr Chemical compound NC(=O)C[C@H](N)C(=O)N1CCC[C@H]1C(=O)N[C@H](C(O)=O)CC1=CC=C(O)C=C1 SUIJFTJDTJKSRK-IHRRRGAJSA-N 0.000 description 5
- FRSGNOZCTWDVFZ-ACZMJKKPSA-N Asp-Asp-Gln Chemical compound [H]N[C@@H](CC(O)=O)C(=O)N[C@@H](CC(O)=O)C(=O)N[C@@H](CCC(N)=O)C(O)=O FRSGNOZCTWDVFZ-ACZMJKKPSA-N 0.000 description 5
- 108010054576 Deoxyribonuclease EcoRI Proteins 0.000 description 5
- NPTGGVQJYRSMCM-GLLZPBPUSA-N Gln-Gln-Thr Chemical compound [H]N[C@@H](CCC(N)=O)C(=O)N[C@@H](CCC(N)=O)C(=O)N[C@@H]([C@@H](C)O)C(O)=O NPTGGVQJYRSMCM-GLLZPBPUSA-N 0.000 description 5
- KMSGYZQRXPUKGI-BYPYZUCNSA-N Gly-Gly-Asn Chemical compound NCC(=O)NCC(=O)N[C@H](C(O)=O)CC(N)=O KMSGYZQRXPUKGI-BYPYZUCNSA-N 0.000 description 5
- DBJYVKDPGIFXFO-BQBZGAKWSA-N Gly-Met-Ala Chemical compound [H]NCC(=O)N[C@@H](CCSC)C(=O)N[C@@H](C)C(O)=O DBJYVKDPGIFXFO-BQBZGAKWSA-N 0.000 description 5
- AYUOWUNWZGTNKB-ULQDDVLXSA-N His-Phe-Arg Chemical compound [H]N[C@@H](CC1=CNC=N1)C(=O)N[C@@H](CC1=CC=CC=C1)C(=O)N[C@@H](CCCNC(N)=N)C(O)=O AYUOWUNWZGTNKB-ULQDDVLXSA-N 0.000 description 5
- NUKXXNFEUZGPRO-BJDJZHNGSA-N Ile-Leu-Cys Chemical compound CC[C@H](C)[C@@H](C(=O)N[C@@H](CC(C)C)C(=O)N[C@@H](CS)C(=O)O)N NUKXXNFEUZGPRO-BJDJZHNGSA-N 0.000 description 5
- RCFDOSNHHZGBOY-UHFFFAOYSA-N L-isoleucyl-L-alanine Natural products CCC(C)C(N)C(=O)NC(C)C(O)=O RCFDOSNHHZGBOY-UHFFFAOYSA-N 0.000 description 5
- XQXGNBFMAXWIGI-MXAVVETBSA-N Leu-His-Ile Chemical compound CC[C@H](C)[C@@H](C(O)=O)NC(=O)[C@@H](NC(=O)[C@@H](N)CC(C)C)CC1=CN=CN1 XQXGNBFMAXWIGI-MXAVVETBSA-N 0.000 description 5
- DSFYPIUSAMSERP-IHRRRGAJSA-N Leu-Leu-Arg Chemical compound CC(C)C[C@H](N)C(=O)N[C@@H](CC(C)C)C(=O)N[C@H](C(O)=O)CCCN=C(N)N DSFYPIUSAMSERP-IHRRRGAJSA-N 0.000 description 5
- KTOIECMYZZGVSI-BZSNNMDCSA-N Leu-Phe-His Chemical compound C([C@H](NC(=O)[C@@H](N)CC(C)C)C(=O)N[C@@H](CC=1NC=NC=1)C(O)=O)C1=CC=CC=C1 KTOIECMYZZGVSI-BZSNNMDCSA-N 0.000 description 5
- KSIPKXNIQOWMIC-RCWTZXSCSA-N Met-Thr-Arg Chemical compound CSCC[C@H](N)C(=O)N[C@@H]([C@@H](C)O)C(=O)N[C@H](C(O)=O)CCCNC(N)=N KSIPKXNIQOWMIC-RCWTZXSCSA-N 0.000 description 5
- 241000699670 Mus sp. Species 0.000 description 5
- KXUZHWXENMYOHC-QEJZJMRPSA-N Phe-Leu-Ala Chemical compound [H]N[C@@H](CC1=CC=CC=C1)C(=O)N[C@@H](CC(C)C)C(=O)N[C@@H](C)C(O)=O KXUZHWXENMYOHC-QEJZJMRPSA-N 0.000 description 5
- YTILBRIUASDGBL-BZSNNMDCSA-N Phe-Leu-Leu Chemical compound CC(C)C[C@@H](C(O)=O)NC(=O)[C@H](CC(C)C)NC(=O)[C@@H](N)CC1=CC=CC=C1 YTILBRIUASDGBL-BZSNNMDCSA-N 0.000 description 5
- FGWUALWGCZJQDJ-URLPEUOOSA-N Phe-Thr-Ile Chemical compound [H]N[C@@H](CC1=CC=CC=C1)C(=O)N[C@@H]([C@@H](C)O)C(=O)N[C@@H]([C@@H](C)CC)C(O)=O FGWUALWGCZJQDJ-URLPEUOOSA-N 0.000 description 5
- VIIRRNQMMIHYHQ-XHSDSOJGSA-N Phe-Val-Pro Chemical compound CC(C)[C@@H](C(=O)N1CCC[C@@H]1C(=O)O)NC(=O)[C@H](CC2=CC=CC=C2)N VIIRRNQMMIHYHQ-XHSDSOJGSA-N 0.000 description 5
- LSHUNRICNSEEAN-BPUTZDHNSA-N Ser-Val-Trp Chemical compound CC(C)[C@@H](C(=O)N[C@@H](CC1=CNC2=CC=CC=C21)C(=O)O)NC(=O)[C@H](CO)N LSHUNRICNSEEAN-BPUTZDHNSA-N 0.000 description 5
- KZURUCDWKDEAFZ-XVSYOHENSA-N Thr-Phe-Asn Chemical compound C[C@H]([C@@H](C(=O)N[C@@H](CC1=CC=CC=C1)C(=O)N[C@@H](CC(=O)N)C(=O)O)N)O KZURUCDWKDEAFZ-XVSYOHENSA-N 0.000 description 5
- 108091036066 Three prime untranslated region Proteins 0.000 description 5
- KISFXYYRKKNLOP-IHRRRGAJSA-N Val-Phe-Ser Chemical compound CC(C)[C@@H](C(=O)N[C@@H](CC1=CC=CC=C1)C(=O)N[C@@H](CO)C(=O)O)N KISFXYYRKKNLOP-IHRRRGAJSA-N 0.000 description 5
- ODUHAIXFXFACDY-SRVKXCTJSA-N Val-Val-Met Chemical compound CSCC[C@@H](C(O)=O)NC(=O)[C@H](C(C)C)NC(=O)[C@@H](N)C(C)C ODUHAIXFXFACDY-SRVKXCTJSA-N 0.000 description 5
- 108010047857 aspartylglycine Proteins 0.000 description 5
- 239000011324 bead Substances 0.000 description 5
- 210000004556 brain Anatomy 0.000 description 5
- 239000003153 chemical reaction reagent Substances 0.000 description 5
- 230000000295 complement effect Effects 0.000 description 5
- 238000012217 deletion Methods 0.000 description 5
- 230000037430 deletion Effects 0.000 description 5
- 238000009826 distribution Methods 0.000 description 5
- 108010013768 glutamyl-aspartyl-proline Proteins 0.000 description 5
- 108010015792 glycyllysine Proteins 0.000 description 5
- 108010092114 histidylphenylalanine Proteins 0.000 description 5
- 108010018006 histidylserine Proteins 0.000 description 5
- 239000000545 human pheromone Substances 0.000 description 5
- 108010027338 isoleucylcysteine Proteins 0.000 description 5
- 108010009298 lysylglutamic acid Proteins 0.000 description 5
- 108010031719 prolyl-serine Proteins 0.000 description 5
- 108010026333 seryl-proline Proteins 0.000 description 5
- 230000009870 specific binding Effects 0.000 description 5
- 239000000126 substance Substances 0.000 description 5
- 230000001225 therapeutic effect Effects 0.000 description 5
- 238000011144 upstream manufacturing Methods 0.000 description 5
- MDNAVFBZPROEHO-DCAQKATOSA-N Ala-Lys-Val Chemical compound [H]N[C@@H](C)C(=O)N[C@@H](CCCCN)C(=O)N[C@@H](C(C)C)C(O)=O MDNAVFBZPROEHO-DCAQKATOSA-N 0.000 description 4
- MDNAVFBZPROEHO-UHFFFAOYSA-N Ala-Lys-Val Natural products CC(C)C(C(O)=O)NC(=O)C(NC(=O)C(C)N)CCCCN MDNAVFBZPROEHO-UHFFFAOYSA-N 0.000 description 4
- VNFSAYFQLXPHPY-CIQUZCHMSA-N Ala-Thr-Ile Chemical compound [H]N[C@@H](C)C(=O)N[C@@H]([C@@H](C)O)C(=O)N[C@@H]([C@@H](C)CC)C(O)=O VNFSAYFQLXPHPY-CIQUZCHMSA-N 0.000 description 4
- NABSCJGZKWSNHX-RCWTZXSCSA-N Arg-Arg-Thr Chemical compound NC(N)=NCCC[C@@H](C(=O)N[C@@H]([C@H](O)C)C(O)=O)NC(=O)[C@@H](N)CCCN=C(N)N NABSCJGZKWSNHX-RCWTZXSCSA-N 0.000 description 4
- JTKLCCFLSLCCST-SZMVWBNQSA-N Arg-Arg-Trp Chemical compound C1=CC=C2C(C[C@H](NC(=O)[C@H](CCCN=C(N)N)NC(=O)[C@H](CCCN=C(N)N)N)C(O)=O)=CNC2=C1 JTKLCCFLSLCCST-SZMVWBNQSA-N 0.000 description 4
- CFGHCPUPFHWMCM-FDARSICLSA-N Arg-Ile-Trp Chemical compound CC[C@H](C)[C@@H](C(=O)N[C@@H](CC1=CNC2=CC=CC=C21)C(=O)O)NC(=O)[C@H](CCCN=C(N)N)N CFGHCPUPFHWMCM-FDARSICLSA-N 0.000 description 4
- HXWUJJADFMXNKA-BQBZGAKWSA-N Asn-Leu Chemical compound CC(C)C[C@@H](C(O)=O)NC(=O)[C@@H](N)CC(N)=O HXWUJJADFMXNKA-BQBZGAKWSA-N 0.000 description 4
- NYGILGUOUOXGMJ-YUMQZZPRSA-N Asn-Lys-Gly Chemical compound [H]N[C@@H](CC(N)=O)C(=O)N[C@@H](CCCCN)C(=O)NCC(O)=O NYGILGUOUOXGMJ-YUMQZZPRSA-N 0.000 description 4
- SNYCNNPOFYBCEK-ZLUOBGJFSA-N Asn-Ser-Ser Chemical compound [H]N[C@@H](CC(N)=O)C(=O)N[C@@H](CO)C(=O)N[C@@H](CO)C(O)=O SNYCNNPOFYBCEK-ZLUOBGJFSA-N 0.000 description 4
- QUCCLIXMVPIVOB-BZSNNMDCSA-N Asn-Tyr-Phe Chemical compound C1=CC=C(C=C1)C[C@@H](C(=O)O)NC(=O)[C@H](CC2=CC=C(C=C2)O)NC(=O)[C@H](CC(=O)N)N QUCCLIXMVPIVOB-BZSNNMDCSA-N 0.000 description 4
- KLYPOCBLKMPBIQ-GHCJXIJMSA-N Asp-Ile-Ser Chemical compound CC[C@H](C)[C@@H](C(=O)N[C@@H](CO)C(=O)O)NC(=O)[C@H](CC(=O)O)N KLYPOCBLKMPBIQ-GHCJXIJMSA-N 0.000 description 4
- 108091003079 Bovine Serum Albumin Proteins 0.000 description 4
- 108010047041 Complementarity Determining Regions Proteins 0.000 description 4
- KXUKWRVYDYIPSQ-CIUDSAMLSA-N Cys-Leu-Ala Chemical compound [H]N[C@@H](CS)C(=O)N[C@@H](CC(C)C)C(=O)N[C@@H](C)C(O)=O KXUKWRVYDYIPSQ-CIUDSAMLSA-N 0.000 description 4
- SRUKWJMBAALPQV-IHPCNDPISA-N Cys-Phe-Trp Chemical compound [H]N[C@@H](CS)C(=O)N[C@@H](CC1=CC=CC=C1)C(=O)N[C@@H](CC1=CNC2=C1C=CC=C2)C(O)=O SRUKWJMBAALPQV-IHPCNDPISA-N 0.000 description 4
- MBRWOKXNHTUJMB-CIUDSAMLSA-N Cys-Pro-Glu Chemical compound [H]N[C@@H](CS)C(=O)N1CCC[C@H]1C(=O)N[C@@H](CCC(O)=O)C(O)=O MBRWOKXNHTUJMB-CIUDSAMLSA-N 0.000 description 4
- ABLQPNMKLMFDQU-BIIVOSGPSA-N Cys-Ser-Pro Chemical compound C1C[C@@H](N(C1)C(=O)[C@H](CO)NC(=O)[C@H](CS)N)C(=O)O ABLQPNMKLMFDQU-BIIVOSGPSA-N 0.000 description 4
- 108700024394 Exon Proteins 0.000 description 4
- 108010008177 Fd immunoglobulins Proteins 0.000 description 4
- 108700039691 Genetic Promoter Regions Proteins 0.000 description 4
- ZDJZEGYVKANKED-NRPADANISA-N Gln-Cys-Val Chemical compound [H]N[C@@H](CCC(N)=O)C(=O)N[C@@H](CS)C(=O)N[C@@H](C(C)C)C(O)=O ZDJZEGYVKANKED-NRPADANISA-N 0.000 description 4
- UFNSPPFJOHNXRE-AUTRQRHGSA-N Gln-Gln-Val Chemical compound [H]N[C@@H](CCC(N)=O)C(=O)N[C@@H](CCC(N)=O)C(=O)N[C@@H](C(C)C)C(O)=O UFNSPPFJOHNXRE-AUTRQRHGSA-N 0.000 description 4
- JKGHMESJHRTHIC-SIUGBPQLSA-N Gln-Ile-Tyr Chemical compound CC[C@H](C)[C@@H](C(=O)N[C@@H](CC1=CC=C(C=C1)O)C(=O)O)NC(=O)[C@H](CCC(=O)N)N JKGHMESJHRTHIC-SIUGBPQLSA-N 0.000 description 4
- JRCUFCXYZLPSDZ-ACZMJKKPSA-N Glu-Asp-Ser Chemical compound OC(=O)CC[C@H](N)C(=O)N[C@@H](CC(O)=O)C(=O)N[C@@H](CO)C(O)=O JRCUFCXYZLPSDZ-ACZMJKKPSA-N 0.000 description 4
- JYXKPJVDCAWMDG-ZPFDUUQYSA-N Glu-Pro-Ile Chemical compound CC[C@H](C)[C@@H](C(=O)O)NC(=O)[C@@H]1CCCN1C(=O)[C@H](CCC(=O)O)N JYXKPJVDCAWMDG-ZPFDUUQYSA-N 0.000 description 4
- VLPMGIJPAWENQB-SRVKXCTJSA-N His-Cys-Leu Chemical compound [H]N[C@@H](CC1=CNC=N1)C(=O)N[C@@H](CS)C(=O)N[C@@H](CC(C)C)C(O)=O VLPMGIJPAWENQB-SRVKXCTJSA-N 0.000 description 4
- KBAPKNDWAGVGTH-IGISWZIWSA-N Ile-Ile-Tyr Chemical compound CC[C@H](C)[C@H](N)C(=O)N[C@@H]([C@@H](C)CC)C(=O)N[C@H](C(O)=O)CC1=CC=C(O)C=C1 KBAPKNDWAGVGTH-IGISWZIWSA-N 0.000 description 4
- JWBXCSQZLLIOCI-GUBZILKMSA-N Ile-Leu Chemical compound CC[C@H](C)[C@H](N)C(=O)N[C@H](C(O)=O)CC(C)C JWBXCSQZLLIOCI-GUBZILKMSA-N 0.000 description 4
- GVKKVHNRTUFCCE-BJDJZHNGSA-N Ile-Leu-Ser Chemical compound CC[C@H](C)[C@@H](C(=O)N[C@@H](CC(C)C)C(=O)N[C@@H](CO)C(=O)O)N GVKKVHNRTUFCCE-BJDJZHNGSA-N 0.000 description 4
- SHVFUCSSACPBTF-VGDYDELISA-N Ile-Ser-His Chemical compound CC[C@H](C)[C@@H](C(=O)N[C@@H](CO)C(=O)N[C@@H](CC1=CN=CN1)C(=O)O)N SHVFUCSSACPBTF-VGDYDELISA-N 0.000 description 4
- NURNJECQNNCRBK-FLBSBUHZSA-N Ile-Thr-Thr Chemical compound CC[C@H](C)[C@H](N)C(=O)N[C@@H]([C@@H](C)O)C(=O)N[C@@H]([C@@H](C)O)C(O)=O NURNJECQNNCRBK-FLBSBUHZSA-N 0.000 description 4
- NJGXXYLPDMMFJB-XUXIUFHCSA-N Ile-Val-Lys Chemical compound CC[C@H](C)[C@@H](C(=O)N[C@@H](C(C)C)C(=O)N[C@@H](CCCCN)C(=O)O)N NJGXXYLPDMMFJB-XUXIUFHCSA-N 0.000 description 4
- IIKJNQWOQIWWMR-CIUDSAMLSA-N Leu-Cys-Ala Chemical compound C[C@@H](C(=O)O)NC(=O)[C@H](CS)NC(=O)[C@H](CC(C)C)N IIKJNQWOQIWWMR-CIUDSAMLSA-N 0.000 description 4
- QJUWBDPGGYVRHY-YUMQZZPRSA-N Leu-Gly-Cys Chemical compound CC(C)C[C@@H](C(=O)NCC(=O)N[C@@H](CS)C(=O)O)N QJUWBDPGGYVRHY-YUMQZZPRSA-N 0.000 description 4
- CCQLQKZTXZBXTN-NHCYSSNCSA-N Leu-Gly-Ile Chemical compound [H]N[C@@H](CC(C)C)C(=O)NCC(=O)N[C@@H]([C@@H](C)CC)C(O)=O CCQLQKZTXZBXTN-NHCYSSNCSA-N 0.000 description 4
- PPQRKXHCLYCBSP-IHRRRGAJSA-N Leu-Leu-Met Chemical compound CC(C)C[C@@H](C(=O)N[C@@H](CC(C)C)C(=O)N[C@@H](CCSC)C(=O)O)N PPQRKXHCLYCBSP-IHRRRGAJSA-N 0.000 description 4
- BMVFXOQHDQZAQU-DCAQKATOSA-N Leu-Pro-Asp Chemical compound CC(C)C[C@@H](C(=O)N1CCC[C@H]1C(=O)N[C@@H](CC(=O)O)C(=O)O)N BMVFXOQHDQZAQU-DCAQKATOSA-N 0.000 description 4
- JDBQSGMJBMPNFT-AVGNSLFASA-N Leu-Pro-Val Chemical compound CC(C)C[C@H](N)C(=O)N1CCC[C@H]1C(=O)N[C@@H](C(C)C)C(O)=O JDBQSGMJBMPNFT-AVGNSLFASA-N 0.000 description 4
- SBANPBVRHYIMRR-UHFFFAOYSA-N Leu-Ser-Pro Natural products CC(C)CC(N)C(=O)NC(CO)C(=O)N1CCCC1C(O)=O SBANPBVRHYIMRR-UHFFFAOYSA-N 0.000 description 4
- QQXJROOJCMIHIV-AVGNSLFASA-N Leu-Val-Met Chemical compound [H]N[C@@H](CC(C)C)C(=O)N[C@@H](C(C)C)C(=O)N[C@@H](CCSC)C(O)=O QQXJROOJCMIHIV-AVGNSLFASA-N 0.000 description 4
- FDBTVENULFNTAL-XQQFMLRXSA-N Leu-Val-Pro Chemical compound CC(C)C[C@@H](C(=O)N[C@@H](C(C)C)C(=O)N1CCC[C@@H]1C(=O)O)N FDBTVENULFNTAL-XQQFMLRXSA-N 0.000 description 4
- QESXLSQLQHHTIX-RHYQMDGZSA-N Leu-Val-Thr Chemical compound CC(C)C[C@H](N)C(=O)N[C@@H](C(C)C)C(=O)N[C@@H]([C@@H](C)O)C(O)=O QESXLSQLQHHTIX-RHYQMDGZSA-N 0.000 description 4
- DZQYZKPINJLLEN-KKUMJFAQSA-N Lys-Cys-Tyr Chemical compound C1=CC(=CC=C1C[C@@H](C(=O)O)NC(=O)[C@H](CS)NC(=O)[C@H](CCCCN)N)O DZQYZKPINJLLEN-KKUMJFAQSA-N 0.000 description 4
- WTZUSCUIVPVCRH-SRVKXCTJSA-N Lys-Gln-Arg Chemical compound NCCCC[C@H](N)C(=O)N[C@@H](CCC(N)=O)C(=O)N[C@H](C(O)=O)CCCN=C(N)N WTZUSCUIVPVCRH-SRVKXCTJSA-N 0.000 description 4
- FHIAJWBDZVHLAH-YUMQZZPRSA-N Lys-Gly-Ser Chemical compound NCCCC[C@H](N)C(=O)NCC(=O)N[C@@H](CO)C(O)=O FHIAJWBDZVHLAH-YUMQZZPRSA-N 0.000 description 4
- XREQQOATSMMAJP-MGHWNKPDSA-N Lys-Ile-Tyr Chemical compound [H]N[C@@H](CCCCN)C(=O)N[C@@H]([C@@H](C)CC)C(=O)N[C@@H](CC1=CC=C(O)C=C1)C(O)=O XREQQOATSMMAJP-MGHWNKPDSA-N 0.000 description 4
- GIKFNMZSGYAPEJ-HJGDQZAQSA-N Lys-Thr-Asp Chemical compound [H]N[C@@H](CCCCN)C(=O)N[C@@H]([C@@H](C)O)C(=O)N[C@@H](CC(O)=O)C(O)=O GIKFNMZSGYAPEJ-HJGDQZAQSA-N 0.000 description 4
- UAPZLLPGGOOCRO-IHRRRGAJSA-N Met-Asn-Phe Chemical compound CSCC[C@@H](C(=O)N[C@@H](CC(=O)N)C(=O)N[C@@H](CC1=CC=CC=C1)C(=O)O)N UAPZLLPGGOOCRO-IHRRRGAJSA-N 0.000 description 4
- CAODKDAPYGUMLK-FXQIFTODSA-N Met-Asn-Ser Chemical compound CSCC[C@H](N)C(=O)N[C@@H](CC(N)=O)C(=O)N[C@@H](CO)C(O)=O CAODKDAPYGUMLK-FXQIFTODSA-N 0.000 description 4
- MSSJHBAKDDIRMJ-SRVKXCTJSA-N Met-Lys-Gln Chemical compound [H]N[C@@H](CCSC)C(=O)N[C@@H](CCCCN)C(=O)N[C@@H](CCC(N)=O)C(O)=O MSSJHBAKDDIRMJ-SRVKXCTJSA-N 0.000 description 4
- SITLTJHOQZFJGG-UHFFFAOYSA-N N-L-alpha-glutamyl-L-valine Natural products CC(C)C(C(O)=O)NC(=O)C(N)CCC(O)=O SITLTJHOQZFJGG-UHFFFAOYSA-N 0.000 description 4
- KZNQNBZMBZJQJO-UHFFFAOYSA-N N-glycyl-L-proline Natural products NCC(=O)N1CCCC1C(O)=O KZNQNBZMBZJQJO-UHFFFAOYSA-N 0.000 description 4
- 108010002311 N-glycylglutamic acid Proteins 0.000 description 4
- 101150112539 OR gene Proteins 0.000 description 4
- WGXOKDLDIWSOCV-MELADBBJSA-N Phe-Asn-Pro Chemical compound C1C[C@@H](N(C1)C(=O)[C@H](CC(=O)N)NC(=O)[C@H](CC2=CC=CC=C2)N)C(=O)O WGXOKDLDIWSOCV-MELADBBJSA-N 0.000 description 4
- GYEPCBNTTRORKW-PCBIJLKTSA-N Phe-Ile-Asp Chemical compound [H]N[C@@H](CC1=CC=CC=C1)C(=O)N[C@@H]([C@@H](C)CC)C(=O)N[C@@H](CC(O)=O)C(O)=O GYEPCBNTTRORKW-PCBIJLKTSA-N 0.000 description 4
- LRBSWBVUCLLRLU-BZSNNMDCSA-N Phe-Leu-Lys Chemical compound CC(C)C[C@H](NC(=O)[C@@H](N)Cc1ccccc1)C(=O)N[C@@H](CCCCN)C(O)=O LRBSWBVUCLLRLU-BZSNNMDCSA-N 0.000 description 4
- YCCUXNNKXDGMAM-KKUMJFAQSA-N Phe-Leu-Ser Chemical compound [H]N[C@@H](CC1=CC=CC=C1)C(=O)N[C@@H](CC(C)C)C(=O)N[C@@H](CO)C(O)=O YCCUXNNKXDGMAM-KKUMJFAQSA-N 0.000 description 4
- CMHTUJQZQXFNTQ-OEAJRASXSA-N Phe-Leu-Thr Chemical compound C[C@H]([C@@H](C(=O)O)NC(=O)[C@H](CC(C)C)NC(=O)[C@H](CC1=CC=CC=C1)N)O CMHTUJQZQXFNTQ-OEAJRASXSA-N 0.000 description 4
- JKJSIYKSGIDHPM-WBAXXEDZSA-N Phe-Phe-Ala Chemical compound C[C@H](NC(=O)[C@H](Cc1ccccc1)NC(=O)[C@@H](N)Cc1ccccc1)C(O)=O JKJSIYKSGIDHPM-WBAXXEDZSA-N 0.000 description 4
- GKRCCTYAGQPMMP-IHRRRGAJSA-N Phe-Ser-Met Chemical compound [H]N[C@@H](CC1=CC=CC=C1)C(=O)N[C@@H](CO)C(=O)N[C@@H](CCSC)C(O)=O GKRCCTYAGQPMMP-IHRRRGAJSA-N 0.000 description 4
- GOUWCZRDTWTODO-YDHLFZDLSA-N Phe-Val-Asn Chemical compound [H]N[C@@H](CC1=CC=CC=C1)C(=O)N[C@@H](C(C)C)C(=O)N[C@@H](CC(N)=O)C(O)=O GOUWCZRDTWTODO-YDHLFZDLSA-N 0.000 description 4
- DMKWYMWNEKIPFC-IUCAKERBSA-N Pro-Gly-Arg Chemical compound [H]N1CCC[C@H]1C(=O)NCC(=O)N[C@@H](CCCNC(N)=N)C(O)=O DMKWYMWNEKIPFC-IUCAKERBSA-N 0.000 description 4
- YHUBAXGAAYULJY-ULQDDVLXSA-N Pro-Tyr-Leu Chemical compound [H]N1CCC[C@H]1C(=O)N[C@@H](CC1=CC=C(O)C=C1)C(=O)N[C@@H](CC(C)C)C(O)=O YHUBAXGAAYULJY-ULQDDVLXSA-N 0.000 description 4
- FUOGXAQMNJMBFG-WPRPVWTQSA-N Pro-Val-Gly Chemical compound OC(=O)CNC(=O)[C@H](C(C)C)NC(=O)[C@@H]1CCCN1 FUOGXAQMNJMBFG-WPRPVWTQSA-N 0.000 description 4
- 108020004518 RNA Probes Proteins 0.000 description 4
- 239000003391 RNA probe Substances 0.000 description 4
- 241000283984 Rodentia Species 0.000 description 4
- ZXLUWXWISXIFIX-ACZMJKKPSA-N Ser-Asn-Glu Chemical compound [H]N[C@@H](CO)C(=O)N[C@@H](CC(N)=O)C(=O)N[C@@H](CCC(O)=O)C(O)=O ZXLUWXWISXIFIX-ACZMJKKPSA-N 0.000 description 4
- WBINSDOPZHQPPM-AVGNSLFASA-N Ser-Glu-Tyr Chemical compound C1=CC(=CC=C1C[C@@H](C(=O)O)NC(=O)[C@H](CCC(=O)O)NC(=O)[C@H](CO)N)O WBINSDOPZHQPPM-AVGNSLFASA-N 0.000 description 4
- NFDYGNFETJVMSE-BQBZGAKWSA-N Ser-Leu Chemical compound CC(C)C[C@@H](C(O)=O)NC(=O)[C@@H](N)CO NFDYGNFETJVMSE-BQBZGAKWSA-N 0.000 description 4
- MUJQWSAWLLRJCE-KATARQTJSA-N Ser-Leu-Thr Chemical compound [H]N[C@@H](CO)C(=O)N[C@@H](CC(C)C)C(=O)N[C@@H]([C@@H](C)O)C(O)=O MUJQWSAWLLRJCE-KATARQTJSA-N 0.000 description 4
- JAWGSPUJAXYXJA-IHRRRGAJSA-N Ser-Phe-Arg Chemical compound NC(N)=NCCC[C@@H](C(O)=O)NC(=O)[C@@H](NC(=O)[C@H](CO)N)CC1=CC=CC=C1 JAWGSPUJAXYXJA-IHRRRGAJSA-N 0.000 description 4
- ILVGMCVCQBJPSH-WDSKDSINSA-N Ser-Val Chemical compound CC(C)[C@@H](C(O)=O)NC(=O)[C@@H](N)CO ILVGMCVCQBJPSH-WDSKDSINSA-N 0.000 description 4
- AYCQVUUPIJHJTA-IXOXFDKPSA-N Thr-His-Leu Chemical compound [H]N[C@@H]([C@@H](C)O)C(=O)N[C@@H](CC1=CNC=N1)C(=O)N[C@@H](CC(C)C)C(O)=O AYCQVUUPIJHJTA-IXOXFDKPSA-N 0.000 description 4
- VTVVYQOXJCZVEB-WDCWCFNPSA-N Thr-Leu-Glu Chemical compound [H]N[C@@H]([C@@H](C)O)C(=O)N[C@@H](CC(C)C)C(=O)N[C@@H](CCC(O)=O)C(O)=O VTVVYQOXJCZVEB-WDCWCFNPSA-N 0.000 description 4
- ABWNZPOIUJMNKT-IXOXFDKPSA-N Thr-Phe-Ser Chemical compound [H]N[C@@H]([C@@H](C)O)C(=O)N[C@@H](CC1=CC=CC=C1)C(=O)N[C@@H](CO)C(O)=O ABWNZPOIUJMNKT-IXOXFDKPSA-N 0.000 description 4
- 108091007498 Transmembrane domain 2 Proteins 0.000 description 4
- SUGLEXVWEJOCGN-ONUFPDRFSA-N Trp-Thr-Trp Chemical compound C[C@H]([C@@H](C(=O)N[C@@H](CC1=CNC2=CC=CC=C21)C(=O)O)NC(=O)[C@H](CC3=CNC4=CC=CC=C43)N)O SUGLEXVWEJOCGN-ONUFPDRFSA-N 0.000 description 4
- RCLOWEZASFJFEX-KKUMJFAQSA-N Tyr-Asp-Leu Chemical compound CC(C)C[C@@H](C(O)=O)NC(=O)[C@H](CC(O)=O)NC(=O)[C@@H](N)CC1=CC=C(O)C=C1 RCLOWEZASFJFEX-KKUMJFAQSA-N 0.000 description 4
- GIOBXJSONRQHKQ-RYUDHWBXSA-N Tyr-Gly-Glu Chemical compound [H]N[C@@H](CC1=CC=C(O)C=C1)C(=O)NCC(=O)N[C@@H](CCC(O)=O)C(O)=O GIOBXJSONRQHKQ-RYUDHWBXSA-N 0.000 description 4
- JHORGUYURUBVOM-KKUMJFAQSA-N Tyr-His-Ser Chemical compound [H]N[C@@H](CC1=CC=C(O)C=C1)C(=O)N[C@@H](CC1=CNC=N1)C(=O)N[C@@H](CO)C(O)=O JHORGUYURUBVOM-KKUMJFAQSA-N 0.000 description 4
- QJKMCQRFHJRIPU-XDTLVQLUSA-N Tyr-Ile Chemical compound CC[C@H](C)[C@@H](C(O)=O)NC(=O)[C@@H](N)CC1=CC=C(O)C=C1 QJKMCQRFHJRIPU-XDTLVQLUSA-N 0.000 description 4
- DWAMXBFJNZIHMC-KBPBESRZSA-N Tyr-Leu-Gly Chemical compound [H]N[C@@H](CC1=CC=C(O)C=C1)C(=O)N[C@@H](CC(C)C)C(=O)NCC(O)=O DWAMXBFJNZIHMC-KBPBESRZSA-N 0.000 description 4
- ARJASMXQBRNAGI-YESZJQIVSA-N Tyr-Leu-Pro Chemical compound CC(C)C[C@@H](C(=O)N1CCC[C@@H]1C(=O)O)NC(=O)[C@H](CC2=CC=C(C=C2)O)N ARJASMXQBRNAGI-YESZJQIVSA-N 0.000 description 4
- BJCILVZEZRDIDR-PMVMPFDFSA-N Tyr-Leu-Trp Chemical compound C([C@H](N)C(=O)N[C@@H](CC(C)C)C(=O)N[C@@H](CC=1C2=CC=CC=C2NC=1)C(O)=O)C1=CC=C(O)C=C1 BJCILVZEZRDIDR-PMVMPFDFSA-N 0.000 description 4
- HSBZWINKRYZCSQ-KKUMJFAQSA-N Tyr-Lys-Asp Chemical compound [H]N[C@@H](CC1=CC=C(O)C=C1)C(=O)N[C@@H](CCCCN)C(=O)N[C@@H](CC(O)=O)C(O)=O HSBZWINKRYZCSQ-KKUMJFAQSA-N 0.000 description 4
- REJBPZVUHYNMEN-LSJOCFKGSA-N Val-Ala-His Chemical compound C[C@@H](C(=O)N[C@@H](CC1=CN=CN1)C(=O)O)NC(=O)[C@H](C(C)C)N REJBPZVUHYNMEN-LSJOCFKGSA-N 0.000 description 4
- HIZMLPKDJAXDRG-FXQIFTODSA-N Val-Cys-Ser Chemical compound CC(C)[C@@H](C(=O)N[C@@H](CS)C(=O)N[C@@H](CO)C(=O)O)N HIZMLPKDJAXDRG-FXQIFTODSA-N 0.000 description 4
- VHRLUTIMTDOVCG-PEDHHIEDSA-N Val-Ile-Ile Chemical compound CC[C@H](C)[C@@H](C(=O)N[C@@H]([C@@H](C)CC)C(=O)O)NC(=O)[C@H](C(C)C)N VHRLUTIMTDOVCG-PEDHHIEDSA-N 0.000 description 4
- PYXQBKJPHNCTNW-CYDGBPFRSA-N Val-Ile-Met Chemical compound CC[C@H](C)[C@@H](C(=O)N[C@@H](CCSC)C(=O)O)NC(=O)[C@H](C(C)C)N PYXQBKJPHNCTNW-CYDGBPFRSA-N 0.000 description 4
- NLNCNKIVJPEFBC-DLOVCJGASA-N Val-Val-Glu Chemical compound CC(C)[C@H](N)C(=O)N[C@@H](C(C)C)C(=O)N[C@H](C(O)=O)CCC(O)=O NLNCNKIVJPEFBC-DLOVCJGASA-N 0.000 description 4
- 239000000443 aerosol Substances 0.000 description 4
- 108010036533 arginylvaline Proteins 0.000 description 4
- 108010092854 aspartyllysine Proteins 0.000 description 4
- 229940098773 bovine serum albumin Drugs 0.000 description 4
- 238000000423 cell based assay Methods 0.000 description 4
- 125000003636 chemical group Chemical group 0.000 description 4
- 210000000349 chromosome Anatomy 0.000 description 4
- 230000005021 gait Effects 0.000 description 4
- VPZXBVLAVMBEQI-UHFFFAOYSA-N glycyl-DL-alpha-alanine Natural products OC(=O)C(C)NC(=O)CN VPZXBVLAVMBEQI-UHFFFAOYSA-N 0.000 description 4
- 108010074027 glycyl-seryl-phenylalanine Proteins 0.000 description 4
- 108010089804 glycyl-threonine Proteins 0.000 description 4
- 108010081551 glycylphenylalanine Proteins 0.000 description 4
- 108010037850 glycylvaline Proteins 0.000 description 4
- 108010025306 histidylleucine Proteins 0.000 description 4
- 238000003018 immunoassay Methods 0.000 description 4
- 230000003993 interaction Effects 0.000 description 4
- 239000002502 liposome Substances 0.000 description 4
- 108010064235 lysylglycine Proteins 0.000 description 4
- 238000013507 mapping Methods 0.000 description 4
- 210000000956 olfactory bulb Anatomy 0.000 description 4
- 230000008520 organization Effects 0.000 description 4
- 238000002823 phage display Methods 0.000 description 4
- 108010084525 phenylalanyl-phenylalanyl-glycine Proteins 0.000 description 4
- 108010018625 phenylalanylarginine Proteins 0.000 description 4
- -1 phosphate triesters Chemical class 0.000 description 4
- 238000003752 polymerase chain reaction Methods 0.000 description 4
- 108010090894 prolylleucine Proteins 0.000 description 4
- 230000010076 replication Effects 0.000 description 4
- 150000003839 salts Chemical class 0.000 description 4
- 230000001568 sexual effect Effects 0.000 description 4
- 239000011780 sodium chloride Substances 0.000 description 4
- 230000008685 targeting Effects 0.000 description 4
- 238000010399 three-hybrid screening Methods 0.000 description 4
- 238000010396 two-hybrid screening Methods 0.000 description 4
- 239000003981 vehicle Substances 0.000 description 4
- AUXMWYRZQPIXCC-KNIFDHDWSA-N (2s)-2-amino-4-methylpentanoic acid;(2s)-2-aminopropanoic acid Chemical compound C[C@H](N)C(O)=O.CC(C)C[C@H](N)C(O)=O AUXMWYRZQPIXCC-KNIFDHDWSA-N 0.000 description 3
- DVWVZSJAYIJZFI-FXQIFTODSA-N Ala-Arg-Asn Chemical compound [H]N[C@@H](C)C(=O)N[C@@H](CCCNC(N)=N)C(=O)N[C@@H](CC(N)=O)C(O)=O DVWVZSJAYIJZFI-FXQIFTODSA-N 0.000 description 3
- NXSFUECZFORGOG-CIUDSAMLSA-N Ala-Asn-Leu Chemical compound [H]N[C@@H](C)C(=O)N[C@@H](CC(N)=O)C(=O)N[C@@H](CC(C)C)C(O)=O NXSFUECZFORGOG-CIUDSAMLSA-N 0.000 description 3
- 108010076441 Ala-His-His Proteins 0.000 description 3
- MNZHHDPWDWQJCQ-YUMQZZPRSA-N Ala-Leu-Gly Chemical compound C[C@H](N)C(=O)N[C@@H](CC(C)C)C(=O)NCC(O)=O MNZHHDPWDWQJCQ-YUMQZZPRSA-N 0.000 description 3
- OPZJWMJPCNNZNT-DCAQKATOSA-N Ala-Leu-Met Chemical compound C[C@@H](C(=O)N[C@@H](CC(C)C)C(=O)N[C@@H](CCSC)C(=O)O)N OPZJWMJPCNNZNT-DCAQKATOSA-N 0.000 description 3
- KQESEZXHYOUIIM-CQDKDKBSSA-N Ala-Lys-Tyr Chemical compound [H]N[C@@H](C)C(=O)N[C@@H](CCCCN)C(=O)N[C@@H](CC1=CC=C(O)C=C1)C(O)=O KQESEZXHYOUIIM-CQDKDKBSSA-N 0.000 description 3
- ZBLQIYPCUWZSRZ-QEJZJMRPSA-N Ala-Phe-Leu Chemical compound CC(C)C[C@@H](C(O)=O)NC(=O)[C@@H](NC(=O)[C@H](C)N)CC1=CC=CC=C1 ZBLQIYPCUWZSRZ-QEJZJMRPSA-N 0.000 description 3
- WNHNMKOFKCHKKD-BFHQHQDPSA-N Ala-Thr-Gly Chemical compound [H]N[C@@H](C)C(=O)N[C@@H]([C@@H](C)O)C(=O)NCC(O)=O WNHNMKOFKCHKKD-BFHQHQDPSA-N 0.000 description 3
- 108700028369 Alleles Proteins 0.000 description 3
- 108010032595 Antibody Binding Sites Proteins 0.000 description 3
- JSHVMZANPXCDTL-GMOBBJLQSA-N Arg-Asp-Ile Chemical compound [H]N[C@@H](CCCNC(N)=N)C(=O)N[C@@H](CC(O)=O)C(=O)N[C@@H]([C@@H](C)CC)C(O)=O JSHVMZANPXCDTL-GMOBBJLQSA-N 0.000 description 3
- PNQWAUXQDBIJDY-GUBZILKMSA-N Arg-Glu-Glu Chemical compound [H]N[C@@H](CCCNC(N)=N)C(=O)N[C@@H](CCC(O)=O)C(=O)N[C@@H](CCC(O)=O)C(O)=O PNQWAUXQDBIJDY-GUBZILKMSA-N 0.000 description 3
- WMEVEPXNCMKNGH-IHRRRGAJSA-N Arg-Leu-His Chemical compound CC(C)C[C@@H](C(=O)N[C@@H](CC1=CN=CN1)C(=O)O)NC(=O)[C@H](CCCN=C(N)N)N WMEVEPXNCMKNGH-IHRRRGAJSA-N 0.000 description 3
- OMKZPCPZEFMBIT-SRVKXCTJSA-N Arg-Met-Arg Chemical compound NC(=N)NCCC[C@H](N)C(=O)N[C@@H](CCSC)C(=O)N[C@@H](CCCNC(N)=N)C(O)=O OMKZPCPZEFMBIT-SRVKXCTJSA-N 0.000 description 3
- HAJWYALLJIATCX-FXQIFTODSA-N Asn-Asn-Arg Chemical compound C(C[C@@H](C(=O)O)NC(=O)[C@H](CC(=O)N)NC(=O)[C@H](CC(=O)N)N)CN=C(N)N HAJWYALLJIATCX-FXQIFTODSA-N 0.000 description 3
- MSBDSTRUMZFSEU-PEFMBERDSA-N Asn-Glu-Ile Chemical compound [H]N[C@@H](CC(N)=O)C(=O)N[C@@H](CCC(O)=O)C(=O)N[C@@H]([C@@H](C)CC)C(O)=O MSBDSTRUMZFSEU-PEFMBERDSA-N 0.000 description 3
- IBLAOXSULLECQZ-IUKAMOBKSA-N Asn-Ile-Thr Chemical compound C[C@@H](O)[C@@H](C(O)=O)NC(=O)[C@H]([C@@H](C)CC)NC(=O)[C@@H](N)CC(N)=O IBLAOXSULLECQZ-IUKAMOBKSA-N 0.000 description 3
- QGABLMITFKUQDF-DCAQKATOSA-N Asn-Met-Lys Chemical compound CSCC[C@@H](C(=O)N[C@@H](CCCCN)C(=O)O)NC(=O)[C@H](CC(=O)N)N QGABLMITFKUQDF-DCAQKATOSA-N 0.000 description 3
- BYLSYQASFJJBCL-DCAQKATOSA-N Asn-Pro-Leu Chemical compound [H]N[C@@H](CC(N)=O)C(=O)N1CCC[C@H]1C(=O)N[C@@H](CC(C)C)C(O)=O BYLSYQASFJJBCL-DCAQKATOSA-N 0.000 description 3
- GZXOUBTUAUAVHD-ACZMJKKPSA-N Asn-Ser-Glu Chemical compound NC(=O)C[C@H](N)C(=O)N[C@@H](CO)C(=O)N[C@H](C(O)=O)CCC(O)=O GZXOUBTUAUAVHD-ACZMJKKPSA-N 0.000 description 3
- VPSHHQXIWLGVDD-ZLUOBGJFSA-N Asp-Asp-Asp Chemical compound OC(=O)C[C@H](N)C(=O)N[C@@H](CC(O)=O)C(=O)N[C@@H](CC(O)=O)C(O)=O VPSHHQXIWLGVDD-ZLUOBGJFSA-N 0.000 description 3
- WCFCYFDBMNFSPA-ACZMJKKPSA-N Asp-Asp-Glu Chemical compound OC(=O)C[C@H](N)C(=O)N[C@@H](CC(O)=O)C(=O)N[C@H](C(O)=O)CCC(O)=O WCFCYFDBMNFSPA-ACZMJKKPSA-N 0.000 description 3
- SMZCLQGDQMGESY-ACZMJKKPSA-N Asp-Gln-Cys Chemical compound C(CC(=O)N)[C@@H](C(=O)N[C@@H](CS)C(=O)O)NC(=O)[C@H](CC(=O)O)N SMZCLQGDQMGESY-ACZMJKKPSA-N 0.000 description 3
- OGTCOKZFOJIZFG-CIUDSAMLSA-N Asp-His-Asp Chemical compound [H]N[C@@H](CC(O)=O)C(=O)N[C@@H](CC1=CNC=N1)C(=O)N[C@@H](CC(O)=O)C(O)=O OGTCOKZFOJIZFG-CIUDSAMLSA-N 0.000 description 3
- DWOGMPWRQQWPPF-GUBZILKMSA-N Asp-Leu-Glu Chemical compound [H]N[C@@H](CC(O)=O)C(=O)N[C@@H](CC(C)C)C(=O)N[C@@H](CCC(O)=O)C(O)=O DWOGMPWRQQWPPF-GUBZILKMSA-N 0.000 description 3
- YFGUZQQCSDZRBN-DCAQKATOSA-N Asp-Pro-Leu Chemical compound [H]N[C@@H](CC(O)=O)C(=O)N1CCC[C@H]1C(=O)N[C@@H](CC(C)C)C(O)=O YFGUZQQCSDZRBN-DCAQKATOSA-N 0.000 description 3
- XXAMCEGRCZQGEM-ZLUOBGJFSA-N Asp-Ser-Asn Chemical compound [H]N[C@@H](CC(O)=O)C(=O)N[C@@H](CO)C(=O)N[C@@H](CC(N)=O)C(O)=O XXAMCEGRCZQGEM-ZLUOBGJFSA-N 0.000 description 3
- UXRVDHVARNBOIO-QSFUFRPTSA-N Asp-Val-Ile Chemical compound CC[C@H](C)[C@@H](C(=O)O)NC(=O)[C@H](C(C)C)NC(=O)[C@H](CC(=O)O)N UXRVDHVARNBOIO-QSFUFRPTSA-N 0.000 description 3
- OYPRJOBELJOOCE-UHFFFAOYSA-N Calcium Chemical compound [Ca] OYPRJOBELJOOCE-UHFFFAOYSA-N 0.000 description 3
- QJUDRFBUWAGUSG-SRVKXCTJSA-N Cys-Cys-Phe Chemical compound C1=CC=C(C=C1)C[C@@H](C(=O)O)NC(=O)[C@H](CS)NC(=O)[C@H](CS)N QJUDRFBUWAGUSG-SRVKXCTJSA-N 0.000 description 3
- ANRWXLYGJRSQEQ-CIUDSAMLSA-N Cys-His-Asp Chemical compound [H]N[C@@H](CS)C(=O)N[C@@H](CC1=CNC=N1)C(=O)N[C@@H](CC(O)=O)C(O)=O ANRWXLYGJRSQEQ-CIUDSAMLSA-N 0.000 description 3
- XIZWKXATMJODQW-KKUMJFAQSA-N Cys-His-Phe Chemical compound C1=CC=C(C=C1)C[C@@H](C(=O)O)NC(=O)[C@H](CC2=CN=CN2)NC(=O)[C@H](CS)N XIZWKXATMJODQW-KKUMJFAQSA-N 0.000 description 3
- RESAHOSBQHMOKH-KKUMJFAQSA-N Cys-Phe-Leu Chemical compound CC(C)C[C@@H](C(=O)O)NC(=O)[C@H](CC1=CC=CC=C1)NC(=O)[C@H](CS)N RESAHOSBQHMOKH-KKUMJFAQSA-N 0.000 description 3
- SRZZZTMJARUVPI-JBDRJPRFSA-N Cys-Ser-Ile Chemical compound CC[C@H](C)[C@@H](C(=O)O)NC(=O)[C@H](CO)NC(=O)[C@H](CS)N SRZZZTMJARUVPI-JBDRJPRFSA-N 0.000 description 3
- NDNZRWUDUMTITL-FXQIFTODSA-N Cys-Ser-Val Chemical compound [H]N[C@@H](CS)C(=O)N[C@@H](CO)C(=O)N[C@@H](C(C)C)C(O)=O NDNZRWUDUMTITL-FXQIFTODSA-N 0.000 description 3
- SHIBSTMRCDJXLN-UHFFFAOYSA-N Digoxigenin Natural products C1CC(C2C(C3(C)CCC(O)CC3CC2)CC2O)(O)C2(C)C1C1=CC(=O)OC1 SHIBSTMRCDJXLN-UHFFFAOYSA-N 0.000 description 3
- KCXVZYZYPLLWCC-UHFFFAOYSA-N EDTA Chemical compound OC(=O)CN(CC(O)=O)CCN(CC(O)=O)CC(O)=O KCXVZYZYPLLWCC-UHFFFAOYSA-N 0.000 description 3
- 102100031780 Endonuclease Human genes 0.000 description 3
- WSFSSNUMVMOOMR-UHFFFAOYSA-N Formaldehyde Chemical compound O=C WSFSSNUMVMOOMR-UHFFFAOYSA-N 0.000 description 3
- LZRMPXRYLLTAJX-GUBZILKMSA-N Gln-Arg-Glu Chemical compound [H]N[C@@H](CCC(N)=O)C(=O)N[C@@H](CCCNC(N)=N)C(=O)N[C@@H](CCC(O)=O)C(O)=O LZRMPXRYLLTAJX-GUBZILKMSA-N 0.000 description 3
- IKDOHQHEFPPGJG-FXQIFTODSA-N Gln-Asp-Glu Chemical compound [H]N[C@@H](CCC(N)=O)C(=O)N[C@@H](CC(O)=O)C(=O)N[C@@H](CCC(O)=O)C(O)=O IKDOHQHEFPPGJG-FXQIFTODSA-N 0.000 description 3
- JHPFPROFOAJRFN-IHRRRGAJSA-N Gln-Glu-Tyr Chemical compound C1=CC(=CC=C1C[C@@H](C(=O)O)NC(=O)[C@H](CCC(=O)O)NC(=O)[C@H](CCC(=O)N)N)O JHPFPROFOAJRFN-IHRRRGAJSA-N 0.000 description 3
- JRHPEMVLTRADLJ-AVGNSLFASA-N Gln-Lys-Lys Chemical compound C(CCN)C[C@@H](C(=O)N[C@@H](CCCCN)C(=O)O)NC(=O)[C@H](CCC(=O)N)N JRHPEMVLTRADLJ-AVGNSLFASA-N 0.000 description 3
- XZUUUKNKNWVPHQ-JYJNAYRXSA-N Gln-Phe-Leu Chemical compound [H]N[C@@H](CCC(N)=O)C(=O)N[C@@H](CC1=CC=CC=C1)C(=O)N[C@@H](CC(C)C)C(O)=O XZUUUKNKNWVPHQ-JYJNAYRXSA-N 0.000 description 3
- ZZLDMBMFKZFQMU-NRPADANISA-N Gln-Val-Ala Chemical compound [H]N[C@@H](CCC(N)=O)C(=O)N[C@@H](C(C)C)C(=O)N[C@@H](C)C(O)=O ZZLDMBMFKZFQMU-NRPADANISA-N 0.000 description 3
- FITIQFSXXBKFFM-NRPADANISA-N Gln-Val-Ser Chemical compound [H]N[C@@H](CCC(N)=O)C(=O)N[C@@H](C(C)C)C(=O)N[C@@H](CO)C(O)=O FITIQFSXXBKFFM-NRPADANISA-N 0.000 description 3
- HNAUFGBKJLTWQE-IFFSRLJSSA-N Gln-Val-Thr Chemical compound C[C@H]([C@@H](C(=O)O)NC(=O)[C@H](C(C)C)NC(=O)[C@H](CCC(=O)N)N)O HNAUFGBKJLTWQE-IFFSRLJSSA-N 0.000 description 3
- LJLPOZGRPLORTF-CIUDSAMLSA-N Glu-Asn-Met Chemical compound [H]N[C@@H](CCC(O)=O)C(=O)N[C@@H](CC(N)=O)C(=O)N[C@@H](CCSC)C(O)=O LJLPOZGRPLORTF-CIUDSAMLSA-N 0.000 description 3
- MUSGDMDGNGXULI-DCAQKATOSA-N Glu-Glu-Leu Chemical compound CC(C)C[C@@H](C(O)=O)NC(=O)[C@H](CCC(O)=O)NC(=O)[C@@H](N)CCC(O)=O MUSGDMDGNGXULI-DCAQKATOSA-N 0.000 description 3
- CXRWMMRLEMVSEH-PEFMBERDSA-N Glu-Ile-Asn Chemical compound [H]N[C@@H](CCC(O)=O)C(=O)N[C@@H]([C@@H](C)CC)C(=O)N[C@@H](CC(N)=O)C(O)=O CXRWMMRLEMVSEH-PEFMBERDSA-N 0.000 description 3
- HVYWQYLBVXMXSV-GUBZILKMSA-N Glu-Leu-Ala Chemical compound [H]N[C@@H](CCC(O)=O)C(=O)N[C@@H](CC(C)C)C(=O)N[C@@H](C)C(O)=O HVYWQYLBVXMXSV-GUBZILKMSA-N 0.000 description 3
- GJBUAAAIZSRCDC-GVXVVHGQSA-N Glu-Leu-Val Chemical compound [H]N[C@@H](CCC(O)=O)C(=O)N[C@@H](CC(C)C)C(=O)N[C@@H](C(C)C)C(O)=O GJBUAAAIZSRCDC-GVXVVHGQSA-N 0.000 description 3
- QNJNPKSWAHPYGI-JYJNAYRXSA-N Glu-Phe-Leu Chemical compound OC(=O)CC[C@H](N)C(=O)N[C@H](C(=O)N[C@@H](CC(C)C)C(O)=O)CC1=CC=CC=C1 QNJNPKSWAHPYGI-JYJNAYRXSA-N 0.000 description 3
- BIYNPVYAZOUVFQ-CIUDSAMLSA-N Glu-Pro-Ser Chemical compound [H]N[C@@H](CCC(O)=O)C(=O)N1CCC[C@H]1C(=O)N[C@@H](CO)C(O)=O BIYNPVYAZOUVFQ-CIUDSAMLSA-N 0.000 description 3
- IDEODOAVGCMUQV-GUBZILKMSA-N Glu-Ser-Leu Chemical compound [H]N[C@@H](CCC(O)=O)C(=O)N[C@@H](CO)C(=O)N[C@@H](CC(C)C)C(O)=O IDEODOAVGCMUQV-GUBZILKMSA-N 0.000 description 3
- YOTHMZZSJKKEHZ-SZMVWBNQSA-N Glu-Trp-Lys Chemical compound C1=CC=C2C(C[C@@H](C(=O)N[C@@H](CCCCN)C(O)=O)NC(=O)[C@@H](N)CCC(O)=O)=CNC2=C1 YOTHMZZSJKKEHZ-SZMVWBNQSA-N 0.000 description 3
- HHSKZJZWQFPSKN-AVGNSLFASA-N Glu-Tyr-Asp Chemical compound [H]N[C@@H](CCC(O)=O)C(=O)N[C@@H](CC1=CC=C(O)C=C1)C(=O)N[C@@H](CC(O)=O)C(O)=O HHSKZJZWQFPSKN-AVGNSLFASA-N 0.000 description 3
- WQZGKKKJIJFFOK-GASJEMHNSA-N Glucose Natural products OC[C@H]1OC(O)[C@H](O)[C@@H](O)[C@@H]1O WQZGKKKJIJFFOK-GASJEMHNSA-N 0.000 description 3
- PUUYVMYCMIWHFE-BQBZGAKWSA-N Gly-Ala-Arg Chemical compound NCC(=O)N[C@@H](C)C(=O)N[C@H](C(O)=O)CCCN=C(N)N PUUYVMYCMIWHFE-BQBZGAKWSA-N 0.000 description 3
- UPOJUWHGMDJUQZ-IUCAKERBSA-N Gly-Arg-Arg Chemical compound NC(=N)NCCC[C@H](NC(=O)CN)C(=O)N[C@@H](CCCNC(N)=N)C(O)=O UPOJUWHGMDJUQZ-IUCAKERBSA-N 0.000 description 3
- FZQLXNIMCPJVJE-YUMQZZPRSA-N Gly-Asp-Leu Chemical compound [H]NCC(=O)N[C@@H](CC(O)=O)C(=O)N[C@@H](CC(C)C)C(O)=O FZQLXNIMCPJVJE-YUMQZZPRSA-N 0.000 description 3
- MHHUEAIBJZWDBH-YUMQZZPRSA-N Gly-Asp-Lys Chemical compound C(CCN)C[C@@H](C(=O)O)NC(=O)[C@H](CC(=O)O)NC(=O)CN MHHUEAIBJZWDBH-YUMQZZPRSA-N 0.000 description 3
- LEGMTEAZGRRIMY-ZKWXMUAHSA-N Gly-Cys-Ile Chemical compound CC[C@H](C)[C@@H](C(=O)O)NC(=O)[C@H](CS)NC(=O)CN LEGMTEAZGRRIMY-ZKWXMUAHSA-N 0.000 description 3
- UHPAZODVFFYEEL-QWRGUYRKSA-N Gly-Leu-Leu Chemical compound CC(C)C[C@@H](C(O)=O)NC(=O)[C@H](CC(C)C)NC(=O)CN UHPAZODVFFYEEL-QWRGUYRKSA-N 0.000 description 3
- MIIVFRCYJABHTQ-ONGXEEELSA-N Gly-Leu-Val Chemical compound [H]NCC(=O)N[C@@H](CC(C)C)C(=O)N[C@@H](C(C)C)C(O)=O MIIVFRCYJABHTQ-ONGXEEELSA-N 0.000 description 3
- OQQKUTVULYLCDG-ONGXEEELSA-N Gly-Lys-Val Chemical compound CC(C)[C@H](NC(=O)[C@H](CCCCN)NC(=O)CN)C(O)=O OQQKUTVULYLCDG-ONGXEEELSA-N 0.000 description 3
- FXLVSYVJDPCIHH-STQMWFEESA-N Gly-Phe-Arg Chemical compound [H]NCC(=O)N[C@@H](CC1=CC=CC=C1)C(=O)N[C@@H](CCCNC(N)=N)C(O)=O FXLVSYVJDPCIHH-STQMWFEESA-N 0.000 description 3
- PGTISAJTWZPFGN-PEXQALLHSA-N His-Gly-Ile Chemical compound [H]N[C@@H](CC1=CNC=N1)C(=O)NCC(=O)N[C@@H]([C@@H](C)CC)C(O)=O PGTISAJTWZPFGN-PEXQALLHSA-N 0.000 description 3
- CKONPJHGMIDMJP-IHRRRGAJSA-N His-Val-His Chemical compound C([C@H](N)C(=O)N[C@@H](C(C)C)C(=O)N[C@@H](CC=1NC=NC=1)C(O)=O)C1=CN=CN1 CKONPJHGMIDMJP-IHRRRGAJSA-N 0.000 description 3
- NKVZTQVGUNLLQW-JBDRJPRFSA-N Ile-Ala-Ala Chemical compound CC[C@H](C)[C@@H](C(=O)N[C@@H](C)C(=O)N[C@@H](C)C(=O)O)N NKVZTQVGUNLLQW-JBDRJPRFSA-N 0.000 description 3
- YKRIXHPEIZUDDY-GMOBBJLQSA-N Ile-Asn-Arg Chemical compound CC[C@H](C)[C@H](N)C(=O)N[C@@H](CC(N)=O)C(=O)N[C@H](C(O)=O)CCCN=C(N)N YKRIXHPEIZUDDY-GMOBBJLQSA-N 0.000 description 3
- FGBRXCZYVRFNKQ-MXAVVETBSA-N Ile-Phe-Ser Chemical compound CC[C@H](C)[C@@H](C(=O)N[C@@H](CC1=CC=CC=C1)C(=O)N[C@@H](CO)C(=O)O)N FGBRXCZYVRFNKQ-MXAVVETBSA-N 0.000 description 3
- JODPUDMBQBIWCK-GHCJXIJMSA-N Ile-Ser-Asn Chemical compound [H]N[C@@H]([C@@H](C)CC)C(=O)N[C@@H](CO)C(=O)N[C@@H](CC(N)=O)C(O)=O JODPUDMBQBIWCK-GHCJXIJMSA-N 0.000 description 3
- WHUUTDBJXJRKMK-VKHMYHEASA-N L-glutamic acid Chemical compound OC(=O)[C@@H](N)CCC(O)=O WHUUTDBJXJRKMK-VKHMYHEASA-N 0.000 description 3
- AGPKZVBTJJNPAG-WHFBIAKZSA-N L-isoleucine Chemical compound CC[C@H](C)[C@H](N)C(O)=O AGPKZVBTJJNPAG-WHFBIAKZSA-N 0.000 description 3
- DLFAACQHIRSQGG-CIUDSAMLSA-N Leu-Asp-Asp Chemical compound [H]N[C@@H](CC(C)C)C(=O)N[C@@H](CC(O)=O)C(=O)N[C@@H](CC(O)=O)C(O)=O DLFAACQHIRSQGG-CIUDSAMLSA-N 0.000 description 3
- XVSJMWYYLHPDKY-DCAQKATOSA-N Leu-Asp-Met Chemical compound [H]N[C@@H](CC(C)C)C(=O)N[C@@H](CC(O)=O)C(=O)N[C@@H](CCSC)C(O)=O XVSJMWYYLHPDKY-DCAQKATOSA-N 0.000 description 3
- AUBMZAMQCOYSIC-MNXVOIDGSA-N Leu-Ile-Gln Chemical compound [H]N[C@@H](CC(C)C)C(=O)N[C@@H]([C@@H](C)CC)C(=O)N[C@@H](CCC(N)=O)C(O)=O AUBMZAMQCOYSIC-MNXVOIDGSA-N 0.000 description 3
- HRTRLSRYZZKPCO-BJDJZHNGSA-N Leu-Ile-Ser Chemical compound [H]N[C@@H](CC(C)C)C(=O)N[C@@H]([C@@H](C)CC)C(=O)N[C@@H](CO)C(O)=O HRTRLSRYZZKPCO-BJDJZHNGSA-N 0.000 description 3
- GZRABTMNWJXFMH-UVOCVTCTSA-N Leu-Thr-Thr Chemical compound [H]N[C@@H](CC(C)C)C(=O)N[C@@H]([C@@H](C)O)C(=O)N[C@@H]([C@@H](C)O)C(O)=O GZRABTMNWJXFMH-UVOCVTCTSA-N 0.000 description 3
- WAIHHELKYSFIQN-XUXIUFHCSA-N Lys-Ile-Val Chemical compound [H]N[C@@H](CCCCN)C(=O)N[C@@H]([C@@H](C)CC)C(=O)N[C@@H](C(C)C)C(O)=O WAIHHELKYSFIQN-XUXIUFHCSA-N 0.000 description 3
- MYZMQWHPDAYKIE-SRVKXCTJSA-N Lys-Leu-Ala Chemical compound [H]N[C@@H](CCCCN)C(=O)N[C@@H](CC(C)C)C(=O)N[C@@H](C)C(O)=O MYZMQWHPDAYKIE-SRVKXCTJSA-N 0.000 description 3
- MTBLFIQZECOEBY-IHRRRGAJSA-N Lys-Met-Lys Chemical compound NCCCC[C@H](N)C(=O)N[C@@H](CCSC)C(=O)N[C@@H](CCCCN)C(O)=O MTBLFIQZECOEBY-IHRRRGAJSA-N 0.000 description 3
- MIMXMVDLMDMOJD-BZSNNMDCSA-N Lys-Tyr-Leu Chemical compound [H]N[C@@H](CCCCN)C(=O)N[C@@H](CC1=CC=C(O)C=C1)C(=O)N[C@@H](CC(C)C)C(O)=O MIMXMVDLMDMOJD-BZSNNMDCSA-N 0.000 description 3
- NYTDJEZBAAFLLG-IHRRRGAJSA-N Lys-Val-Lys Chemical compound NCCCC[C@H](N)C(=O)N[C@@H](C(C)C)C(=O)N[C@@H](CCCCN)C(O)=O NYTDJEZBAAFLLG-IHRRRGAJSA-N 0.000 description 3
- QEVRUYFHWJJUHZ-DCAQKATOSA-N Met-Ala-Leu Chemical compound CSCC[C@H](N)C(=O)N[C@@H](C)C(=O)N[C@H](C(O)=O)CC(C)C QEVRUYFHWJJUHZ-DCAQKATOSA-N 0.000 description 3
- GTRWUQSSISWRTL-NAKRPEOUSA-N Met-Cys-Ile Chemical compound CC[C@H](C)[C@@H](C(=O)O)NC(=O)[C@H](CS)NC(=O)[C@H](CCSC)N GTRWUQSSISWRTL-NAKRPEOUSA-N 0.000 description 3
- HHCOOFPGNXKFGR-HJGDQZAQSA-N Met-Gln-Thr Chemical compound [H]N[C@@H](CCSC)C(=O)N[C@@H](CCC(N)=O)C(=O)N[C@@H]([C@@H](C)O)C(O)=O HHCOOFPGNXKFGR-HJGDQZAQSA-N 0.000 description 3
- DJDFBVNNDAUPRW-GUBZILKMSA-N Met-Glu-Gln Chemical compound CSCC[C@H](N)C(=O)N[C@@H](CCC(O)=O)C(=O)N[C@@H](CCC(N)=O)C(O)=O DJDFBVNNDAUPRW-GUBZILKMSA-N 0.000 description 3
- BKIFWLQFOOKUCA-DCAQKATOSA-N Met-His-Ser Chemical compound CSCC[C@@H](C(=O)N[C@@H](CC1=CN=CN1)C(=O)N[C@@H](CO)C(=O)O)N BKIFWLQFOOKUCA-DCAQKATOSA-N 0.000 description 3
- HZLSUXCMSIBCRV-RVMXOQNASA-N Met-Ile-Pro Chemical compound CC[C@H](C)[C@@H](C(=O)N1CCC[C@@H]1C(=O)O)NC(=O)[C@H](CCSC)N HZLSUXCMSIBCRV-RVMXOQNASA-N 0.000 description 3
- USBFEVBHEQBWDD-AVGNSLFASA-N Met-Leu-Val Chemical compound CSCC[C@H](N)C(=O)N[C@@H](CC(C)C)C(=O)N[C@@H](C(C)C)C(O)=O USBFEVBHEQBWDD-AVGNSLFASA-N 0.000 description 3
- 101100480205 Mus musculus Syt4 gene Proteins 0.000 description 3
- XMBSYZWANAQXEV-UHFFFAOYSA-N N-alpha-L-glutamyl-L-phenylalanine Natural products OC(=O)CCC(N)C(=O)NC(C(O)=O)CC1=CC=CC=C1 XMBSYZWANAQXEV-UHFFFAOYSA-N 0.000 description 3
- 238000000636 Northern blotting Methods 0.000 description 3
- 108091005461 Nucleic proteins Proteins 0.000 description 3
- 239000004677 Nylon Substances 0.000 description 3
- WFDAEEUZPZSMOG-SRVKXCTJSA-N Phe-Cys-Ser Chemical compound [H]N[C@@H](CC1=CC=CC=C1)C(=O)N[C@@H](CS)C(=O)N[C@@H](CO)C(O)=O WFDAEEUZPZSMOG-SRVKXCTJSA-N 0.000 description 3
- DHZOGDVYRQOGAC-BZSNNMDCSA-N Phe-Cys-Tyr Chemical compound C1=CC=C(C=C1)C[C@@H](C(=O)N[C@@H](CS)C(=O)N[C@@H](CC2=CC=C(C=C2)O)C(=O)O)N DHZOGDVYRQOGAC-BZSNNMDCSA-N 0.000 description 3
- JWBLQDDHSDGEGR-DRZSPHRISA-N Phe-Ile Chemical compound CC[C@H](C)[C@@H](C(O)=O)NC(=O)[C@@H](N)CC1=CC=CC=C1 JWBLQDDHSDGEGR-DRZSPHRISA-N 0.000 description 3
- MJQFZGOIVBDIMZ-WHOFXGATSA-N Phe-Ile-Gly Chemical compound N[C@@H](CC1=CC=CC=C1)C(=O)N[C@@H]([C@@H](C)CC)C(=O)NCC(=O)O MJQFZGOIVBDIMZ-WHOFXGATSA-N 0.000 description 3
- TXKWKTWYTIAZSV-KKUMJFAQSA-N Phe-Leu-Cys Chemical compound CC(C)C[C@@H](C(=O)N[C@@H](CS)C(=O)O)NC(=O)[C@H](CC1=CC=CC=C1)N TXKWKTWYTIAZSV-KKUMJFAQSA-N 0.000 description 3
- OSBADCBXAMSPQD-YESZJQIVSA-N Phe-Leu-Pro Chemical compound CC(C)C[C@@H](C(=O)N1CCC[C@@H]1C(=O)O)NC(=O)[C@H](CC2=CC=CC=C2)N OSBADCBXAMSPQD-YESZJQIVSA-N 0.000 description 3
- UNBFGVQVQGXXCK-KKUMJFAQSA-N Phe-Ser-Leu Chemical compound [H]N[C@@H](CC1=CC=CC=C1)C(=O)N[C@@H](CO)C(=O)N[C@@H](CC(C)C)C(O)=O UNBFGVQVQGXXCK-KKUMJFAQSA-N 0.000 description 3
- KLYYKKGCPOGDPE-OEAJRASXSA-N Phe-Thr-Leu Chemical compound [H]N[C@@H](CC1=CC=CC=C1)C(=O)N[C@@H]([C@@H](C)O)C(=O)N[C@@H](CC(C)C)C(O)=O KLYYKKGCPOGDPE-OEAJRASXSA-N 0.000 description 3
- GTMSCDVFQLNEOY-BZSNNMDCSA-N Phe-Tyr-Asn Chemical compound C1=CC=C(C=C1)C[C@@H](C(=O)N[C@@H](CC2=CC=C(C=C2)O)C(=O)N[C@@H](CC(=O)N)C(=O)O)N GTMSCDVFQLNEOY-BZSNNMDCSA-N 0.000 description 3
- RGMLUHANLDVMPB-ULQDDVLXSA-N Phe-Val-Lys Chemical compound CC(C)[C@@H](C(=O)N[C@@H](CCCCN)C(=O)O)NC(=O)[C@H](CC1=CC=CC=C1)N RGMLUHANLDVMPB-ULQDDVLXSA-N 0.000 description 3
- KIPIKSXPPLABPN-CIUDSAMLSA-N Pro-Glu-Asn Chemical compound NC(=O)C[C@@H](C(O)=O)NC(=O)[C@H](CCC(O)=O)NC(=O)[C@@H]1CCCN1 KIPIKSXPPLABPN-CIUDSAMLSA-N 0.000 description 3
- VWXGFAIZUQBBBG-UWVGGRQHSA-N Pro-His-Gly Chemical compound C([C@@H](C(=O)NCC(=O)[O-])NC(=O)[C@H]1[NH2+]CCC1)C1=CN=CN1 VWXGFAIZUQBBBG-UWVGGRQHSA-N 0.000 description 3
- KWMUAKQOVYCQJQ-ZPFDUUQYSA-N Pro-Ile-Glu Chemical compound CC[C@H](C)[C@@H](C(=O)N[C@@H](CCC(=O)O)C(=O)O)NC(=O)[C@@H]1CCCN1 KWMUAKQOVYCQJQ-ZPFDUUQYSA-N 0.000 description 3
- ZLXKLMHAMDENIO-DCAQKATOSA-N Pro-Lys-Asp Chemical compound [H]N1CCC[C@H]1C(=O)N[C@@H](CCCCN)C(=O)N[C@@H](CC(O)=O)C(O)=O ZLXKLMHAMDENIO-DCAQKATOSA-N 0.000 description 3
- DWPXHLIBFQLKLK-CYDGBPFRSA-N Pro-Pro-Ile Chemical compound CC[C@H](C)[C@@H](C(O)=O)NC(=O)[C@@H]1CCCN1C(=O)[C@H]1NCCC1 DWPXHLIBFQLKLK-CYDGBPFRSA-N 0.000 description 3
- NAIPAPCKKRCMBL-JYJNAYRXSA-N Pro-Pro-Phe Chemical compound C([C@@H](C(=O)O)NC(=O)[C@H]1N(CCC1)C(=O)[C@H]1NCCC1)C1=CC=CC=C1 NAIPAPCKKRCMBL-JYJNAYRXSA-N 0.000 description 3
- DNIAPMSPPWPWGF-UHFFFAOYSA-N Propylene glycol Chemical compound CC(O)CO DNIAPMSPPWPWGF-UHFFFAOYSA-N 0.000 description 3
- 238000012300 Sequence Analysis Methods 0.000 description 3
- YQHZVYJAGWMHES-ZLUOBGJFSA-N Ser-Ala-Ser Chemical compound OC[C@H](N)C(=O)N[C@@H](C)C(=O)N[C@@H](CO)C(O)=O YQHZVYJAGWMHES-ZLUOBGJFSA-N 0.000 description 3
- KNZQGAUEYZJUSQ-ZLUOBGJFSA-N Ser-Asp-Ala Chemical compound C[C@@H](C(=O)O)NC(=O)[C@H](CC(=O)O)NC(=O)[C@H](CO)N KNZQGAUEYZJUSQ-ZLUOBGJFSA-N 0.000 description 3
- BGOWRLSWJCVYAQ-CIUDSAMLSA-N Ser-Asp-Leu Chemical compound [H]N[C@@H](CO)C(=O)N[C@@H](CC(O)=O)C(=O)N[C@@H](CC(C)C)C(O)=O BGOWRLSWJCVYAQ-CIUDSAMLSA-N 0.000 description 3
- KCFKKAQKRZBWJB-ZLUOBGJFSA-N Ser-Cys-Ala Chemical compound [H]N[C@@H](CO)C(=O)N[C@@H](CS)C(=O)N[C@@H](C)C(O)=O KCFKKAQKRZBWJB-ZLUOBGJFSA-N 0.000 description 3
- RNMRYWZYFHHOEV-CIUDSAMLSA-N Ser-Gln-Arg Chemical compound [H]N[C@@H](CO)C(=O)N[C@@H](CCC(N)=O)C(=O)N[C@@H](CCCNC(N)=N)C(O)=O RNMRYWZYFHHOEV-CIUDSAMLSA-N 0.000 description 3
- DGHFNYXVIXNNMC-GUBZILKMSA-N Ser-Gln-His Chemical compound C1=C(NC=N1)C[C@@H](C(=O)O)NC(=O)[C@H](CCC(=O)N)NC(=O)[C@H](CO)N DGHFNYXVIXNNMC-GUBZILKMSA-N 0.000 description 3
- QGAHMVHBORDHDC-YUMQZZPRSA-N Ser-His-Gly Chemical compound OC[C@H](N)C(=O)N[C@H](C(=O)NCC(O)=O)CC1=CN=CN1 QGAHMVHBORDHDC-YUMQZZPRSA-N 0.000 description 3
- HBTCFCHYALPXME-HTFCKZLJSA-N Ser-Ile-Ile Chemical compound [H]N[C@@H](CO)C(=O)N[C@@H]([C@@H](C)CC)C(=O)N[C@@H]([C@@H](C)CC)C(O)=O HBTCFCHYALPXME-HTFCKZLJSA-N 0.000 description 3
- ZSLFCBHEINFXRS-LPEHRKFASA-N Ser-Met-Pro Chemical compound CSCC[C@@H](C(=O)N1CCC[C@@H]1C(=O)O)NC(=O)[C@H](CO)N ZSLFCBHEINFXRS-LPEHRKFASA-N 0.000 description 3
- UGTZYIPOBYXWRW-SRVKXCTJSA-N Ser-Phe-Asp Chemical compound [H]N[C@@H](CO)C(=O)N[C@@H](CC1=CC=CC=C1)C(=O)N[C@@H](CC(O)=O)C(O)=O UGTZYIPOBYXWRW-SRVKXCTJSA-N 0.000 description 3
- BMKNXTJLHFIAAH-CIUDSAMLSA-N Ser-Ser-Leu Chemical compound [H]N[C@@H](CO)C(=O)N[C@@H](CO)C(=O)N[C@@H](CC(C)C)C(O)=O BMKNXTJLHFIAAH-CIUDSAMLSA-N 0.000 description 3
- XPVIVVLLLOFBRH-XIRDDKMYSA-N Ser-Trp-Lys Chemical compound NCCCC[C@H](NC(=O)[C@H](Cc1c[nH]c2ccccc12)NC(=O)[C@@H](N)CO)C(O)=O XPVIVVLLLOFBRH-XIRDDKMYSA-N 0.000 description 3
- SGZVZUCRAVSPKQ-FXQIFTODSA-N Ser-Val-Cys Chemical compound CC(C)[C@@H](C(=O)N[C@@H](CS)C(=O)O)NC(=O)[C@H](CO)N SGZVZUCRAVSPKQ-FXQIFTODSA-N 0.000 description 3
- 101150064314 Syt3 gene Proteins 0.000 description 3
- HYLXOQURIOCKIH-VQVTYTSYSA-N Thr-Arg Chemical compound C[C@@H](O)[C@H](N)C(=O)N[C@H](C(O)=O)CCCNC(N)=N HYLXOQURIOCKIH-VQVTYTSYSA-N 0.000 description 3
- TWLMXDWFVNEFFK-FJXKBIBVSA-N Thr-Arg-Gly Chemical compound [H]N[C@@H]([C@@H](C)O)C(=O)N[C@@H](CCCNC(N)=N)C(=O)NCC(O)=O TWLMXDWFVNEFFK-FJXKBIBVSA-N 0.000 description 3
- JNQZPAWOPBZGIX-RCWTZXSCSA-N Thr-Arg-Val Chemical compound CC(C)[C@@H](C(O)=O)NC(=O)[C@@H](NC(=O)[C@@H](N)[C@@H](C)O)CCCN=C(N)N JNQZPAWOPBZGIX-RCWTZXSCSA-N 0.000 description 3
- JTEICXDKGWKRRV-HJGDQZAQSA-N Thr-Asn-Lys Chemical compound C[C@H]([C@@H](C(=O)N[C@@H](CC(=O)N)C(=O)N[C@@H](CCCCN)C(=O)O)N)O JTEICXDKGWKRRV-HJGDQZAQSA-N 0.000 description 3
- QNJZOAHSYPXTAB-VEVYYDQMSA-N Thr-Asn-Met Chemical compound [H]N[C@@H]([C@@H](C)O)C(=O)N[C@@H](CC(N)=O)C(=O)N[C@@H](CCSC)C(O)=O QNJZOAHSYPXTAB-VEVYYDQMSA-N 0.000 description 3
- KCRQEJSKXAIULJ-FJXKBIBVSA-N Thr-Gly-Arg Chemical compound [H]N[C@@H]([C@@H](C)O)C(=O)NCC(=O)N[C@@H](CCCNC(N)=N)C(O)=O KCRQEJSKXAIULJ-FJXKBIBVSA-N 0.000 description 3
- MSIYNSBKKVMGFO-BHNWBGBOSA-N Thr-Gly-Pro Chemical compound C[C@H]([C@@H](C(=O)NCC(=O)N1CCC[C@@H]1C(=O)O)N)O MSIYNSBKKVMGFO-BHNWBGBOSA-N 0.000 description 3
- ADPHPKGWVDHWML-PPCPHDFISA-N Thr-Ile-Leu Chemical compound CC[C@H](C)[C@@H](C(=O)N[C@@H](CC(C)C)C(=O)O)NC(=O)[C@H]([C@@H](C)O)N ADPHPKGWVDHWML-PPCPHDFISA-N 0.000 description 3
- WNQJTLATMXYSEL-OEAJRASXSA-N Thr-Phe-Leu Chemical compound [H]N[C@@H]([C@@H](C)O)C(=O)N[C@@H](CC1=CC=CC=C1)C(=O)N[C@@H](CC(C)C)C(O)=O WNQJTLATMXYSEL-OEAJRASXSA-N 0.000 description 3
- VTMGKRABARCZAX-OSUNSFLBSA-N Thr-Pro-Ile Chemical compound CC[C@H](C)[C@@H](C(O)=O)NC(=O)[C@@H]1CCCN1C(=O)[C@@H](N)[C@@H](C)O VTMGKRABARCZAX-OSUNSFLBSA-N 0.000 description 3
- WPSKTVVMQCXPRO-BWBBJGPYSA-N Thr-Ser-Ser Chemical compound [H]N[C@@H]([C@@H](C)O)C(=O)N[C@@H](CO)C(=O)N[C@@H](CO)C(O)=O WPSKTVVMQCXPRO-BWBBJGPYSA-N 0.000 description 3
- CSNBWOJOEOPYIJ-UVOCVTCTSA-N Thr-Thr-Lys Chemical compound [H]N[C@@H]([C@@H](C)O)C(=O)N[C@@H]([C@@H](C)O)C(=O)N[C@@H](CCCCN)C(O)=O CSNBWOJOEOPYIJ-UVOCVTCTSA-N 0.000 description 3
- JONPRIHUYSPIMA-UWJYBYFXSA-N Tyr-Ala-Asn Chemical compound NC(=O)C[C@@H](C(O)=O)NC(=O)[C@H](C)NC(=O)[C@@H](N)CC1=CC=C(O)C=C1 JONPRIHUYSPIMA-UWJYBYFXSA-N 0.000 description 3
- DKKHULUSOSWGHS-UWJYBYFXSA-N Tyr-Asn-Ala Chemical compound C[C@@H](C(=O)O)NC(=O)[C@H](CC(=O)N)NC(=O)[C@H](CC1=CC=C(C=C1)O)N DKKHULUSOSWGHS-UWJYBYFXSA-N 0.000 description 3
- NLMXVDDEQFKQQU-CFMVVWHZSA-N Tyr-Asp-Ile Chemical compound CC[C@H](C)[C@@H](C(O)=O)NC(=O)[C@H](CC(O)=O)NC(=O)[C@@H](N)CC1=CC=C(O)C=C1 NLMXVDDEQFKQQU-CFMVVWHZSA-N 0.000 description 3
- JFDGVHXRCKEBAU-KKUMJFAQSA-N Tyr-Asp-Lys Chemical compound C1=CC(=CC=C1C[C@@H](C(=O)N[C@@H](CC(=O)O)C(=O)N[C@@H](CCCCN)C(=O)O)N)O JFDGVHXRCKEBAU-KKUMJFAQSA-N 0.000 description 3
- HVPPEXXUDXAPOM-MGHWNKPDSA-N Tyr-Ile-Leu Chemical compound CC(C)C[C@@H](C(O)=O)NC(=O)[C@H]([C@@H](C)CC)NC(=O)[C@@H](N)CC1=CC=C(O)C=C1 HVPPEXXUDXAPOM-MGHWNKPDSA-N 0.000 description 3
- BGFCXQXETBDEHP-BZSNNMDCSA-N Tyr-Phe-Asn Chemical compound [H]N[C@@H](CC1=CC=C(O)C=C1)C(=O)N[C@@H](CC1=CC=CC=C1)C(=O)N[C@@H](CC(N)=O)C(O)=O BGFCXQXETBDEHP-BZSNNMDCSA-N 0.000 description 3
- VBFVQTPETKJCQW-RPTUDFQQSA-N Tyr-Phe-Thr Chemical compound [H]N[C@@H](CC1=CC=C(O)C=C1)C(=O)N[C@@H](CC1=CC=CC=C1)C(=O)N[C@@H]([C@@H](C)O)C(O)=O VBFVQTPETKJCQW-RPTUDFQQSA-N 0.000 description 3
- IQQYYFPCWKWUHW-YDHLFZDLSA-N Val-Asn-Tyr Chemical compound CC(C)[C@@H](C(=O)N[C@@H](CC(=O)N)C(=O)N[C@@H](CC1=CC=C(C=C1)O)C(=O)O)N IQQYYFPCWKWUHW-YDHLFZDLSA-N 0.000 description 3
- ZSZFTYVFQLUWBF-QXEWZRGKSA-N Val-Asp-Met Chemical compound CC(C)[C@@H](C(=O)N[C@@H](CC(=O)O)C(=O)N[C@@H](CCSC)C(=O)O)N ZSZFTYVFQLUWBF-QXEWZRGKSA-N 0.000 description 3
- ZTKGDWOUYRRAOQ-ULQDDVLXSA-N Val-His-Phe Chemical compound CC(C)[C@@H](C(=O)N[C@@H](CC1=CN=CN1)C(=O)N[C@@H](CC2=CC=CC=C2)C(=O)O)N ZTKGDWOUYRRAOQ-ULQDDVLXSA-N 0.000 description 3
- FTKXYXACXYOHND-XUXIUFHCSA-N Val-Ile-Leu Chemical compound CC(C)[C@H](N)C(=O)N[C@@H]([C@@H](C)CC)C(=O)N[C@@H](CC(C)C)C(O)=O FTKXYXACXYOHND-XUXIUFHCSA-N 0.000 description 3
- OVBMCNDKCWAXMZ-NAKRPEOUSA-N Val-Ile-Ser Chemical compound CC[C@H](C)[C@@H](C(=O)N[C@@H](CO)C(=O)O)NC(=O)[C@H](C(C)C)N OVBMCNDKCWAXMZ-NAKRPEOUSA-N 0.000 description 3
- RWOGENDAOGMHLX-DCAQKATOSA-N Val-Lys-Ala Chemical compound C[C@@H](C(=O)O)NC(=O)[C@H](CCCCN)NC(=O)[C@H](C(C)C)N RWOGENDAOGMHLX-DCAQKATOSA-N 0.000 description 3
- IEBGHUMBJXIXHM-AVGNSLFASA-N Val-Lys-Met Chemical compound CC(C)[C@@H](C(=O)N[C@@H](CCCCN)C(=O)N[C@@H](CCSC)C(=O)O)N IEBGHUMBJXIXHM-AVGNSLFASA-N 0.000 description 3
- ZEBRMWPTJNHXAJ-JYJNAYRXSA-N Val-Phe-Met Chemical compound CC(C)[C@@H](C(=O)N[C@@H](CC1=CC=CC=C1)C(=O)N[C@@H](CCSC)C(=O)O)N ZEBRMWPTJNHXAJ-JYJNAYRXSA-N 0.000 description 3
- BCBFMJYTNKDALA-UFYCRDLUSA-N Val-Phe-Phe Chemical compound N[C@@H](C(C)C)C(=O)N[C@@H](CC1=CC=CC=C1)C(=O)N[C@@H](CC1=CC=CC=C1)C(=O)O BCBFMJYTNKDALA-UFYCRDLUSA-N 0.000 description 3
- JPBGMZDTPVGGMQ-ULQDDVLXSA-N Val-Tyr-His Chemical compound CC(C)[C@@H](C(=O)N[C@@H](CC1=CC=C(C=C1)O)C(=O)N[C@@H](CC2=CN=CN2)C(=O)O)N JPBGMZDTPVGGMQ-ULQDDVLXSA-N 0.000 description 3
- 239000011543 agarose gel Substances 0.000 description 3
- 230000016571 aggressive behavior Effects 0.000 description 3
- 108010070944 alanylhistidine Proteins 0.000 description 3
- 108010050025 alpha-glutamyltryptophan Proteins 0.000 description 3
- 125000000539 amino acid group Chemical group 0.000 description 3
- 108010052670 arginyl-glutamyl-glutamic acid Proteins 0.000 description 3
- 108010068380 arginylarginine Proteins 0.000 description 3
- 108010062796 arginyllysine Proteins 0.000 description 3
- 108010077245 asparaginyl-proline Proteins 0.000 description 3
- 230000001580 bacterial effect Effects 0.000 description 3
- 239000011575 calcium Substances 0.000 description 3
- 229910052791 calcium Inorganic materials 0.000 description 3
- 108091036078 conserved sequence Proteins 0.000 description 3
- 238000010276 construction Methods 0.000 description 3
- 230000008878 coupling Effects 0.000 description 3
- 238000010168 coupling process Methods 0.000 description 3
- 238000005859 coupling reaction Methods 0.000 description 3
- 238000011161 development Methods 0.000 description 3
- 239000008121 dextrose Substances 0.000 description 3
- QONQRTHLHBTMGP-UHFFFAOYSA-N digitoxigenin Natural products CC12CCC(C3(CCC(O)CC3CC3)C)C3C11OC1CC2C1=CC(=O)OC1 QONQRTHLHBTMGP-UHFFFAOYSA-N 0.000 description 3
- SHIBSTMRCDJXLN-KCZCNTNESA-N digoxigenin Chemical compound C1([C@@H]2[C@@]3([C@@](CC2)(O)[C@H]2[C@@H]([C@@]4(C)CC[C@H](O)C[C@H]4CC2)C[C@H]3O)C)=CC(=O)OC1 SHIBSTMRCDJXLN-KCZCNTNESA-N 0.000 description 3
- 210000002257 embryonic structure Anatomy 0.000 description 3
- 210000000981 epithelium Anatomy 0.000 description 3
- 239000000284 extract Substances 0.000 description 3
- 239000012530 fluid Substances 0.000 description 3
- 230000037433 frameshift Effects 0.000 description 3
- 238000001415 gene therapy Methods 0.000 description 3
- 108010073628 glutamyl-valyl-phenylalanine Proteins 0.000 description 3
- 108010020688 glycylhistidine Proteins 0.000 description 3
- 230000006698 induction Effects 0.000 description 3
- 238000001990 intravenous administration Methods 0.000 description 3
- 108010003700 lysyl aspartic acid Proteins 0.000 description 3
- 210000001161 mammalian embryo Anatomy 0.000 description 3
- 238000004519 manufacturing process Methods 0.000 description 3
- 229920001778 nylon Polymers 0.000 description 3
- 150000002894 organic compounds Chemical class 0.000 description 3
- 239000008194 pharmaceutical composition Substances 0.000 description 3
- 108010064486 phenylalanyl-leucyl-valine Proteins 0.000 description 3
- 108010073025 phenylalanylphenylalanine Proteins 0.000 description 3
- 108010020432 prolyl-prolylisoleucine Proteins 0.000 description 3
- 108010015796 prolylisoleucine Proteins 0.000 description 3
- 238000000159 protein binding assay Methods 0.000 description 3
- 230000002285 radioactive effect Effects 0.000 description 3
- 230000002829 reductive effect Effects 0.000 description 3
- 108091008146 restriction endonucleases Proteins 0.000 description 3
- 238000003757 reverse transcription PCR Methods 0.000 description 3
- 238000012163 sequencing technique Methods 0.000 description 3
- 125000003607 serino group Chemical group [H]N([H])[C@]([H])(C(=O)[*])C(O[H])([H])[H] 0.000 description 3
- 235000000346 sugar Nutrition 0.000 description 3
- 238000010361 transduction Methods 0.000 description 3
- 230000026683 transduction Effects 0.000 description 3
- 108010051110 tyrosyl-lysine Proteins 0.000 description 3
- 108010020532 tyrosyl-proline Proteins 0.000 description 3
- 241000701161 unidentified adenovirus Species 0.000 description 3
- 241001430294 unidentified retrovirus Species 0.000 description 3
- YBJHBAHKTGYVGT-ZKWXMUAHSA-N (+)-Biotin Chemical compound N1C(=O)N[C@@H]2[C@H](CCCCC(=O)O)SC[C@@H]21 YBJHBAHKTGYVGT-ZKWXMUAHSA-N 0.000 description 2
- 108091027075 5S-rRNA precursor Proteins 0.000 description 2
- UWQJHXKARZWDIJ-ZLUOBGJFSA-N Ala-Ala-Cys Chemical compound C[C@H](N)C(=O)N[C@@H](C)C(=O)N[C@@H](CS)C(O)=O UWQJHXKARZWDIJ-ZLUOBGJFSA-N 0.000 description 2
- SDMAQFGBPOJFOM-GUBZILKMSA-N Ala-Arg-Arg Chemical compound NC(=N)NCCC[C@H](NC(=O)[C@@H](N)C)C(=O)N[C@@H](CCCNC(N)=N)C(O)=O SDMAQFGBPOJFOM-GUBZILKMSA-N 0.000 description 2
- LBJYAILUMSUTAM-ZLUOBGJFSA-N Ala-Asn-Asn Chemical compound [H]N[C@@H](C)C(=O)N[C@@H](CC(N)=O)C(=O)N[C@@H](CC(N)=O)C(O)=O LBJYAILUMSUTAM-ZLUOBGJFSA-N 0.000 description 2
- STACJSVFHSEZJV-GHCJXIJMSA-N Ala-Asn-Ile Chemical compound [H]N[C@@H](C)C(=O)N[C@@H](CC(N)=O)C(=O)N[C@@H]([C@@H](C)CC)C(O)=O STACJSVFHSEZJV-GHCJXIJMSA-N 0.000 description 2
- WXERCAHAIKMTKX-ZLUOBGJFSA-N Ala-Asp-Asp Chemical compound [H]N[C@@H](C)C(=O)N[C@@H](CC(O)=O)C(=O)N[C@@H](CC(O)=O)C(O)=O WXERCAHAIKMTKX-ZLUOBGJFSA-N 0.000 description 2
- WQVYAWIMAWTGMW-ZLUOBGJFSA-N Ala-Asp-Cys Chemical compound C[C@@H](C(=O)N[C@@H](CC(=O)O)C(=O)N[C@@H](CS)C(=O)O)N WQVYAWIMAWTGMW-ZLUOBGJFSA-N 0.000 description 2
- CXISPYVYMQWFLE-VKHMYHEASA-N Ala-Gly Chemical compound C[C@H]([NH3+])C(=O)NCC([O-])=O CXISPYVYMQWFLE-VKHMYHEASA-N 0.000 description 2
- IFKQPMZRDQZSHI-GHCJXIJMSA-N Ala-Ile-Asn Chemical compound [H]N[C@@H](C)C(=O)N[C@@H]([C@@H](C)CC)C(=O)N[C@@H](CC(N)=O)C(O)=O IFKQPMZRDQZSHI-GHCJXIJMSA-N 0.000 description 2
- DPNZTBKGAUAZQU-DLOVCJGASA-N Ala-Leu-His Chemical compound C[C@@H](C(=O)N[C@@H](CC(C)C)C(=O)N[C@@H](CC1=CN=CN1)C(=O)O)N DPNZTBKGAUAZQU-DLOVCJGASA-N 0.000 description 2
- CHFFHQUVXHEGBY-GARJFASQSA-N Ala-Lys-Pro Chemical compound C[C@@H](C(=O)N[C@@H](CCCCN)C(=O)N1CCC[C@@H]1C(=O)O)N CHFFHQUVXHEGBY-GARJFASQSA-N 0.000 description 2
- OMDNCNKNEGFOMM-BQBZGAKWSA-N Ala-Met-Gly Chemical compound [H]N[C@@H](C)C(=O)N[C@@H](CCSC)C(=O)NCC(O)=O OMDNCNKNEGFOMM-BQBZGAKWSA-N 0.000 description 2
- BFMIRJBURUXDRG-DLOVCJGASA-N Ala-Phe-Asp Chemical compound OC(=O)C[C@@H](C(O)=O)NC(=O)[C@@H](NC(=O)[C@@H](N)C)CC1=CC=CC=C1 BFMIRJBURUXDRG-DLOVCJGASA-N 0.000 description 2
- HYIDEIQUCBKIPL-CQDKDKBSSA-N Ala-Phe-His Chemical compound C[C@@H](C(=O)N[C@@H](CC1=CC=CC=C1)C(=O)N[C@@H](CC2=CN=CN2)C(=O)O)N HYIDEIQUCBKIPL-CQDKDKBSSA-N 0.000 description 2
- YCRAFFCYWOUEOF-DLOVCJGASA-N Ala-Phe-Ser Chemical compound OC[C@@H](C(O)=O)NC(=O)[C@@H](NC(=O)[C@@H](N)C)CC1=CC=CC=C1 YCRAFFCYWOUEOF-DLOVCJGASA-N 0.000 description 2
- PEEYDECOOVQKRZ-DLOVCJGASA-N Ala-Ser-Phe Chemical compound [H]N[C@@H](C)C(=O)N[C@@H](CO)C(=O)N[C@@H](CC1=CC=CC=C1)C(O)=O PEEYDECOOVQKRZ-DLOVCJGASA-N 0.000 description 2
- NCQMBSJGJMYKCK-ZLUOBGJFSA-N Ala-Ser-Ser Chemical compound [H]N[C@@H](C)C(=O)N[C@@H](CO)C(=O)N[C@@H](CO)C(O)=O NCQMBSJGJMYKCK-ZLUOBGJFSA-N 0.000 description 2
- BGGAIXWIZCIFSG-XDTLVQLUSA-N Ala-Tyr-Glu Chemical compound [H]N[C@@H](C)C(=O)N[C@@H](CC1=CC=C(O)C=C1)C(=O)N[C@@H](CCC(O)=O)C(O)=O BGGAIXWIZCIFSG-XDTLVQLUSA-N 0.000 description 2
- YJHKTAMKPGFJCT-NRPADANISA-N Ala-Val-Glu Chemical compound [H]N[C@@H](C)C(=O)N[C@@H](C(C)C)C(=O)N[C@@H](CCC(O)=O)C(O)=O YJHKTAMKPGFJCT-NRPADANISA-N 0.000 description 2
- 101710117290 Aldo-keto reductase family 1 member C4 Proteins 0.000 description 2
- IIABBYGHLYWVOS-FXQIFTODSA-N Arg-Asn-Ser Chemical compound [H]N[C@@H](CCCNC(N)=N)C(=O)N[C@@H](CC(N)=O)C(=O)N[C@@H](CO)C(O)=O IIABBYGHLYWVOS-FXQIFTODSA-N 0.000 description 2
- NVUIWHJLPSZZQC-CYDGBPFRSA-N Arg-Ile-Arg Chemical compound NC(N)=NCCC[C@H](N)C(=O)N[C@@H]([C@@H](C)CC)C(=O)N[C@@H](CCCN=C(N)N)C(O)=O NVUIWHJLPSZZQC-CYDGBPFRSA-N 0.000 description 2
- OOIMKQRCPJBGPD-XUXIUFHCSA-N Arg-Ile-Leu Chemical compound [H]N[C@@H](CCCNC(N)=N)C(=O)N[C@@H]([C@@H](C)CC)C(=O)N[C@@H](CC(C)C)C(O)=O OOIMKQRCPJBGPD-XUXIUFHCSA-N 0.000 description 2
- LKDHUGLXOHYINY-XUXIUFHCSA-N Arg-Ile-Lys Chemical compound CC[C@H](C)[C@@H](C(=O)N[C@@H](CCCCN)C(=O)O)NC(=O)[C@H](CCCN=C(N)N)N LKDHUGLXOHYINY-XUXIUFHCSA-N 0.000 description 2
- NMRHDSAOIURTNT-RWMBFGLXSA-N Arg-Leu-Pro Chemical compound CC(C)C[C@@H](C(=O)N1CCC[C@@H]1C(=O)O)NC(=O)[C@H](CCCN=C(N)N)N NMRHDSAOIURTNT-RWMBFGLXSA-N 0.000 description 2
- OGSQONVYSTZIJB-WDSOQIARSA-N Arg-Leu-Trp Chemical compound CC(C)C[C@H](NC(=O)[C@@H](N)CCCN=C(N)N)C(=O)N[C@@H](Cc1c[nH]c2ccccc12)C(O)=O OGSQONVYSTZIJB-WDSOQIARSA-N 0.000 description 2
- MTYLORHAQXVQOW-AVGNSLFASA-N Arg-Lys-Met Chemical compound [H]N[C@@H](CCCNC(N)=N)C(=O)N[C@@H](CCCCN)C(=O)N[C@@H](CCSC)C(O)=O MTYLORHAQXVQOW-AVGNSLFASA-N 0.000 description 2
- XSPKAHFVDKRGRL-DCAQKATOSA-N Arg-Pro-Glu Chemical compound [H]N[C@@H](CCCNC(N)=N)C(=O)N1CCC[C@H]1C(=O)N[C@@H](CCC(O)=O)C(O)=O XSPKAHFVDKRGRL-DCAQKATOSA-N 0.000 description 2
- IJYZHIOOBGIINM-WDSKDSINSA-N Arg-Ser Chemical compound OC[C@@H](C(O)=O)NC(=O)[C@@H](N)CCCN=C(N)N IJYZHIOOBGIINM-WDSKDSINSA-N 0.000 description 2
- LFWOQHSQNCKXRU-UFYCRDLUSA-N Arg-Tyr-Phe Chemical compound C([C@H](NC(=O)[C@H](CCCN=C(N)N)N)C(=O)N[C@@H](CC=1C=CC=CC=1)C(O)=O)C1=CC=C(O)C=C1 LFWOQHSQNCKXRU-UFYCRDLUSA-N 0.000 description 2
- WTUZDHWWGUQEKN-SRVKXCTJSA-N Arg-Val-Met Chemical compound [H]N[C@@H](CCCNC(N)=N)C(=O)N[C@@H](C(C)C)C(=O)N[C@@H](CCSC)C(O)=O WTUZDHWWGUQEKN-SRVKXCTJSA-N 0.000 description 2
- VKCOHFFSTKCXEQ-OLHMAJIHSA-N Asn-Asn-Thr Chemical compound [H]N[C@@H](CC(N)=O)C(=O)N[C@@H](CC(N)=O)C(=O)N[C@@H]([C@@H](C)O)C(O)=O VKCOHFFSTKCXEQ-OLHMAJIHSA-N 0.000 description 2
- ZMWDUIIACVLIHK-GHCJXIJMSA-N Asn-Cys-Ile Chemical compound CC[C@H](C)[C@@H](C(=O)O)NC(=O)[C@H](CS)NC(=O)[C@H](CC(=O)N)N ZMWDUIIACVLIHK-GHCJXIJMSA-N 0.000 description 2
- YQNBILXAUIAUCF-CIUDSAMLSA-N Asn-Cys-Lys Chemical compound C(CCN)C[C@@H](C(=O)O)NC(=O)[C@H](CS)NC(=O)[C@H](CC(=O)N)N YQNBILXAUIAUCF-CIUDSAMLSA-N 0.000 description 2
- SPIPSJXLZVTXJL-ZLUOBGJFSA-N Asn-Cys-Ser Chemical compound NC(=O)C[C@H](N)C(=O)N[C@@H](CS)C(=O)N[C@@H](CO)C(O)=O SPIPSJXLZVTXJL-ZLUOBGJFSA-N 0.000 description 2
- FAEFJTCTNZTPHX-ACZMJKKPSA-N Asn-Gln-Ala Chemical compound [H]N[C@@H](CC(N)=O)C(=O)N[C@@H](CCC(N)=O)C(=O)N[C@@H](C)C(O)=O FAEFJTCTNZTPHX-ACZMJKKPSA-N 0.000 description 2
- QGNXYDHVERJIAY-ACZMJKKPSA-N Asn-Gln-Cys Chemical compound C(CC(=O)N)[C@@H](C(=O)N[C@@H](CS)C(=O)O)NC(=O)[C@H](CC(=O)N)N QGNXYDHVERJIAY-ACZMJKKPSA-N 0.000 description 2
- GWNMUVANAWDZTI-YUMQZZPRSA-N Asn-Gly-His Chemical compound C1=C(NC=N1)C[C@@H](C(=O)O)NC(=O)CNC(=O)[C@H](CC(=O)N)N GWNMUVANAWDZTI-YUMQZZPRSA-N 0.000 description 2
- QUAWOKPCAKCHQL-SRVKXCTJSA-N Asn-His-Lys Chemical compound C1=C(NC=N1)C[C@@H](C(=O)N[C@@H](CCCCN)C(=O)O)NC(=O)[C@H](CC(=O)N)N QUAWOKPCAKCHQL-SRVKXCTJSA-N 0.000 description 2
- NLRJGXZWTKXRHP-DCAQKATOSA-N Asn-Leu-Arg Chemical compound [H]N[C@@H](CC(N)=O)C(=O)N[C@@H](CC(C)C)C(=O)N[C@@H](CCCNC(N)=N)C(O)=O NLRJGXZWTKXRHP-DCAQKATOSA-N 0.000 description 2
- JLNFZLNDHONLND-GARJFASQSA-N Asn-Leu-Pro Chemical compound CC(C)C[C@@H](C(=O)N1CCC[C@@H]1C(=O)O)NC(=O)[C@H](CC(=O)N)N JLNFZLNDHONLND-GARJFASQSA-N 0.000 description 2
- KHCNTVRVAYCPQE-CIUDSAMLSA-N Asn-Lys-Asn Chemical compound [H]N[C@@H](CC(N)=O)C(=O)N[C@@H](CCCCN)C(=O)N[C@@H](CC(N)=O)C(O)=O KHCNTVRVAYCPQE-CIUDSAMLSA-N 0.000 description 2
- KEUNWIXNKVWCFL-FXQIFTODSA-N Asn-Met-Ser Chemical compound [H]N[C@@H](CC(N)=O)C(=O)N[C@@H](CCSC)C(=O)N[C@@H](CO)C(O)=O KEUNWIXNKVWCFL-FXQIFTODSA-N 0.000 description 2
- MVXJBVVLACEGCG-PCBIJLKTSA-N Asn-Phe-Ile Chemical compound [H]N[C@@H](CC(N)=O)C(=O)N[C@@H](CC1=CC=CC=C1)C(=O)N[C@@H]([C@@H](C)CC)C(O)=O MVXJBVVLACEGCG-PCBIJLKTSA-N 0.000 description 2
- HZZIFFOVHLWGCS-KKUMJFAQSA-N Asn-Phe-Leu Chemical compound [H]N[C@@H](CC(N)=O)C(=O)N[C@@H](CC1=CC=CC=C1)C(=O)N[C@@H](CC(C)C)C(O)=O HZZIFFOVHLWGCS-KKUMJFAQSA-N 0.000 description 2
- ZJIFRAPZHAGLGR-MELADBBJSA-N Asn-Phe-Pro Chemical compound C1C[C@@H](N(C1)C(=O)[C@H](CC2=CC=CC=C2)NC(=O)[C@H](CC(=O)N)N)C(=O)O ZJIFRAPZHAGLGR-MELADBBJSA-N 0.000 description 2
- UYCPJVYQYARFGB-YDHLFZDLSA-N Asn-Phe-Val Chemical compound [H]N[C@@H](CC(N)=O)C(=O)N[C@@H](CC1=CC=CC=C1)C(=O)N[C@@H](C(C)C)C(O)=O UYCPJVYQYARFGB-YDHLFZDLSA-N 0.000 description 2
- JTXVXGXTRXMOFJ-FXQIFTODSA-N Asn-Pro-Asn Chemical compound NC(=O)C[C@H](N)C(=O)N1CCC[C@H]1C(=O)N[C@@H](CC(N)=O)C(O)=O JTXVXGXTRXMOFJ-FXQIFTODSA-N 0.000 description 2
- VBKIFHUVGLOJKT-FKZODXBYSA-N Asn-Thr Chemical compound C[C@@H]([C@@H](C(=O)O)NC(=O)[C@H](CC(=O)N)N)O VBKIFHUVGLOJKT-FKZODXBYSA-N 0.000 description 2
- BIGRHVNFFJTHEB-UBHSHLNASA-N Asn-Trp-Asp Chemical compound [H]N[C@@H](CC(N)=O)C(=O)N[C@@H](CC1=CNC2=C1C=CC=C2)C(=O)N[C@@H](CC(O)=O)C(O)=O BIGRHVNFFJTHEB-UBHSHLNASA-N 0.000 description 2
- UPAGTDJAORYMEC-VHWLVUOQSA-N Asn-Trp-Ile Chemical compound CC[C@H](C)[C@@H](C(=O)O)NC(=O)[C@H](CC1=CNC2=CC=CC=C21)NC(=O)[C@H](CC(=O)N)N UPAGTDJAORYMEC-VHWLVUOQSA-N 0.000 description 2
- JNCRAQVYJZGIOW-QSFUFRPTSA-N Asn-Val-Ile Chemical compound [H]N[C@@H](CC(N)=O)C(=O)N[C@@H](C(C)C)C(=O)N[C@@H]([C@@H](C)CC)C(O)=O JNCRAQVYJZGIOW-QSFUFRPTSA-N 0.000 description 2
- NECWUSYTYSIFNC-DLOVCJGASA-N Asp-Ala-Phe Chemical compound OC(=O)C[C@H](N)C(=O)N[C@@H](C)C(=O)N[C@H](C(O)=O)CC1=CC=CC=C1 NECWUSYTYSIFNC-DLOVCJGASA-N 0.000 description 2
- GWTLRDMPMJCNMH-WHFBIAKZSA-N Asp-Asn-Gly Chemical compound [H]N[C@@H](CC(O)=O)C(=O)N[C@@H](CC(N)=O)C(=O)NCC(O)=O GWTLRDMPMJCNMH-WHFBIAKZSA-N 0.000 description 2
- HOQGTAIGQSDCHR-SRVKXCTJSA-N Asp-Asn-Phe Chemical compound [H]N[C@@H](CC(O)=O)C(=O)N[C@@H](CC(N)=O)C(=O)N[C@@H](CC1=CC=CC=C1)C(O)=O HOQGTAIGQSDCHR-SRVKXCTJSA-N 0.000 description 2
- CKAJHWFHHFSCDT-WHFBIAKZSA-N Asp-Glu Chemical compound OC(=O)C[C@H](N)C(=O)N[C@H](C(O)=O)CCC(O)=O CKAJHWFHHFSCDT-WHFBIAKZSA-N 0.000 description 2
- VFUXXFVCYZPOQG-WDSKDSINSA-N Asp-Glu-Gly Chemical compound [H]N[C@@H](CC(O)=O)C(=O)N[C@@H](CCC(O)=O)C(=O)NCC(O)=O VFUXXFVCYZPOQG-WDSKDSINSA-N 0.000 description 2
- QCLHLXDWRKOHRR-GUBZILKMSA-N Asp-Glu-His Chemical compound C1=C(NC=N1)C[C@@H](C(=O)O)NC(=O)[C@H](CCC(=O)O)NC(=O)[C@H](CC(=O)O)N QCLHLXDWRKOHRR-GUBZILKMSA-N 0.000 description 2
- YDJVIBMKAMQPPP-LAEOZQHASA-N Asp-Glu-Val Chemical compound [H]N[C@@H](CC(O)=O)C(=O)N[C@@H](CCC(O)=O)C(=O)N[C@@H](C(C)C)C(O)=O YDJVIBMKAMQPPP-LAEOZQHASA-N 0.000 description 2
- PYXXJFRXIYAESU-PCBIJLKTSA-N Asp-Ile-Phe Chemical compound [H]N[C@@H](CC(O)=O)C(=O)N[C@@H]([C@@H](C)CC)C(=O)N[C@@H](CC1=CC=CC=C1)C(O)=O PYXXJFRXIYAESU-PCBIJLKTSA-N 0.000 description 2
- MYOHQBFRJQFIDZ-KKUMJFAQSA-N Asp-Leu-Tyr Chemical compound [H]N[C@@H](CC(O)=O)C(=O)N[C@@H](CC(C)C)C(=O)N[C@@H](CC1=CC=C(O)C=C1)C(O)=O MYOHQBFRJQFIDZ-KKUMJFAQSA-N 0.000 description 2
- JXGJJQJHXHXJQF-CIUDSAMLSA-N Asp-Met-Glu Chemical compound [H]N[C@@H](CC(O)=O)C(=O)N[C@@H](CCSC)C(=O)N[C@@H](CCC(O)=O)C(O)=O JXGJJQJHXHXJQF-CIUDSAMLSA-N 0.000 description 2
- VWWAFGHMPWBKEP-GMOBBJLQSA-N Asp-Met-Ile Chemical compound CC[C@H](C)[C@@H](C(=O)O)NC(=O)[C@H](CCSC)NC(=O)[C@H](CC(=O)O)N VWWAFGHMPWBKEP-GMOBBJLQSA-N 0.000 description 2
- CUQDCPXNZPDYFQ-ZLUOBGJFSA-N Asp-Ser-Asp Chemical compound [H]N[C@@H](CC(O)=O)C(=O)N[C@@H](CO)C(=O)N[C@@H](CC(O)=O)C(O)=O CUQDCPXNZPDYFQ-ZLUOBGJFSA-N 0.000 description 2
- FIAKNCXQFFKSSI-ZLUOBGJFSA-N Asp-Ser-Cys Chemical compound OC(=O)C[C@H](N)C(=O)N[C@@H](CO)C(=O)N[C@@H](CS)C(O)=O FIAKNCXQFFKSSI-ZLUOBGJFSA-N 0.000 description 2
- ZQFRDAZBTSFGGW-SRVKXCTJSA-N Asp-Ser-Phe Chemical compound [H]N[C@@H](CC(O)=O)C(=O)N[C@@H](CO)C(=O)N[C@@H](CC1=CC=CC=C1)C(O)=O ZQFRDAZBTSFGGW-SRVKXCTJSA-N 0.000 description 2
- XAPPCWUWHNWCPQ-PBCZWWQYSA-N Asp-Thr-His Chemical compound N[C@@H](CC(=O)O)C(=O)N[C@@H]([C@H](O)C)C(=O)N[C@@H](CC1=CNC=N1)C(=O)O XAPPCWUWHNWCPQ-PBCZWWQYSA-N 0.000 description 2
- 238000011746 C57BL/6J (JAX™ mouse strain) Methods 0.000 description 2
- HAYVTMHUNMMXCV-IMJSIDKUSA-N Cys-Ala Chemical compound OC(=O)[C@H](C)NC(=O)[C@@H](N)CS HAYVTMHUNMMXCV-IMJSIDKUSA-N 0.000 description 2
- ZOLXQKZHYOHHMD-DLOVCJGASA-N Cys-Ala-Phe Chemical compound C[C@@H](C(=O)N[C@@H](CC1=CC=CC=C1)C(=O)O)NC(=O)[C@H](CS)N ZOLXQKZHYOHHMD-DLOVCJGASA-N 0.000 description 2
- WVJHEDOLHPZLRV-CIUDSAMLSA-N Cys-Asn-Lys Chemical compound C(CCN)C[C@@H](C(=O)O)NC(=O)[C@H](CC(=O)N)NC(=O)[C@H](CS)N WVJHEDOLHPZLRV-CIUDSAMLSA-N 0.000 description 2
- MGAWEOHYNIMOQJ-ACZMJKKPSA-N Cys-Gln-Asp Chemical compound C(CC(=O)N)[C@@H](C(=O)N[C@@H](CC(=O)O)C(=O)O)NC(=O)[C@H](CS)N MGAWEOHYNIMOQJ-ACZMJKKPSA-N 0.000 description 2
- BPHKULHWEIUDOB-FXQIFTODSA-N Cys-Gln-Gln Chemical compound SC[C@H](N)C(=O)N[C@@H](CCC(N)=O)C(=O)N[C@@H](CCC(N)=O)C(O)=O BPHKULHWEIUDOB-FXQIFTODSA-N 0.000 description 2
- OXFOKRAFNYSREH-BJDJZHNGSA-N Cys-Ile-Leu Chemical compound CC[C@H](C)[C@@H](C(=O)N[C@@H](CC(C)C)C(=O)O)NC(=O)[C@H](CS)N OXFOKRAFNYSREH-BJDJZHNGSA-N 0.000 description 2
- KKUVRYLJEXJSGX-MXAVVETBSA-N Cys-Ile-Phe Chemical compound CC[C@H](C)[C@@H](C(=O)N[C@@H](CC1=CC=CC=C1)C(=O)O)NC(=O)[C@H](CS)N KKUVRYLJEXJSGX-MXAVVETBSA-N 0.000 description 2
- XZKJEOMFLDVXJG-KATARQTJSA-N Cys-Leu-Thr Chemical compound C[C@H]([C@@H](C(=O)O)NC(=O)[C@H](CC(C)C)NC(=O)[C@H](CS)N)O XZKJEOMFLDVXJG-KATARQTJSA-N 0.000 description 2
- YYLBXQJGWOQZOU-IHRRRGAJSA-N Cys-Phe-Val Chemical compound CC(C)[C@@H](C(=O)O)NC(=O)[C@H](CC1=CC=CC=C1)NC(=O)[C@H](CS)N YYLBXQJGWOQZOU-IHRRRGAJSA-N 0.000 description 2
- GGRDJANMZPGMNS-CIUDSAMLSA-N Cys-Ser-Leu Chemical compound [H]N[C@@H](CS)C(=O)N[C@@H](CO)C(=O)N[C@@H](CC(C)C)C(O)=O GGRDJANMZPGMNS-CIUDSAMLSA-N 0.000 description 2
- HJXSYJVCMUOUNY-SRVKXCTJSA-N Cys-Ser-Phe Chemical compound C1=CC=C(C=C1)C[C@@H](C(=O)O)NC(=O)[C@H](CO)NC(=O)[C@H](CS)N HJXSYJVCMUOUNY-SRVKXCTJSA-N 0.000 description 2
- SAEVTQWAYDPXMU-KATARQTJSA-N Cys-Thr-Leu Chemical compound [H]N[C@@H](CS)C(=O)N[C@@H]([C@@H](C)O)C(=O)N[C@@H](CC(C)C)C(O)=O SAEVTQWAYDPXMU-KATARQTJSA-N 0.000 description 2
- KZZYVYWSXMFYEC-DCAQKATOSA-N Cys-Val-Leu Chemical compound [H]N[C@@H](CS)C(=O)N[C@@H](C(C)C)C(=O)N[C@@H](CC(C)C)C(O)=O KZZYVYWSXMFYEC-DCAQKATOSA-N 0.000 description 2
- 241000701022 Cytomegalovirus Species 0.000 description 2
- 239000003298 DNA probe Substances 0.000 description 2
- 238000001712 DNA sequencing Methods 0.000 description 2
- 102000004190 Enzymes Human genes 0.000 description 2
- 108090000790 Enzymes Proteins 0.000 description 2
- 241000588724 Escherichia coli Species 0.000 description 2
- KZKBJEUWNMQTLV-XDTLVQLUSA-N Gln-Ala-Tyr Chemical compound [H]N[C@@H](CCC(N)=O)C(=O)N[C@@H](C)C(=O)N[C@@H](CC1=CC=C(O)C=C1)C(O)=O KZKBJEUWNMQTLV-XDTLVQLUSA-N 0.000 description 2
- WMOMPXKOKASNBK-PEFMBERDSA-N Gln-Asn-Ile Chemical compound [H]N[C@@H](CCC(N)=O)C(=O)N[C@@H](CC(N)=O)C(=O)N[C@@H]([C@@H](C)CC)C(O)=O WMOMPXKOKASNBK-PEFMBERDSA-N 0.000 description 2
- AJDMYLOISOCHHC-YVNDNENWSA-N Gln-Gln-Ile Chemical compound [H]N[C@@H](CCC(N)=O)C(=O)N[C@@H](CCC(N)=O)C(=O)N[C@@H]([C@@H](C)CC)C(O)=O AJDMYLOISOCHHC-YVNDNENWSA-N 0.000 description 2
- GNMQDOGFWYWPNM-LAEOZQHASA-N Gln-Gly-Ile Chemical compound CC[C@H](C)[C@H](NC(=O)CNC(=O)[C@@H](N)CCC(N)=O)C(O)=O GNMQDOGFWYWPNM-LAEOZQHASA-N 0.000 description 2
- FGYPOQPQTUNESW-IUCAKERBSA-N Gln-Gly-Leu Chemical compound CC(C)C[C@@H](C(=O)O)NC(=O)CNC(=O)[C@H](CCC(=O)N)N FGYPOQPQTUNESW-IUCAKERBSA-N 0.000 description 2
- FYAULIGIFPPOAA-ZPFDUUQYSA-N Gln-Ile-Met Chemical compound [H]N[C@@H](CCC(N)=O)C(=O)N[C@@H]([C@@H](C)CC)C(=O)N[C@@H](CCSC)C(O)=O FYAULIGIFPPOAA-ZPFDUUQYSA-N 0.000 description 2
- VZRAXPGTUNDIDK-GUBZILKMSA-N Gln-Leu-Asn Chemical compound CC(C)C[C@@H](C(=O)N[C@@H](CC(=O)N)C(=O)O)NC(=O)[C@H](CCC(=O)N)N VZRAXPGTUNDIDK-GUBZILKMSA-N 0.000 description 2
- SHAUZYVSXAMYAZ-JYJNAYRXSA-N Gln-Leu-Phe Chemical compound CC(C)C[C@@H](C(=O)N[C@@H](CC1=CC=CC=C1)C(=O)O)NC(=O)[C@H](CCC(=O)N)N SHAUZYVSXAMYAZ-JYJNAYRXSA-N 0.000 description 2
- HPCOBEHVEHWREJ-DCAQKATOSA-N Gln-Lys-Glu Chemical compound [H]N[C@@H](CCC(N)=O)C(=O)N[C@@H](CCCCN)C(=O)N[C@@H](CCC(O)=O)C(O)=O HPCOBEHVEHWREJ-DCAQKATOSA-N 0.000 description 2
- CELXWPDNIGWCJN-WDCWCFNPSA-N Gln-Lys-Thr Chemical compound [H]N[C@@H](CCC(N)=O)C(=O)N[C@@H](CCCCN)C(=O)N[C@@H]([C@@H](C)O)C(O)=O CELXWPDNIGWCJN-WDCWCFNPSA-N 0.000 description 2
- RWQCWSGOOOEGPB-FXQIFTODSA-N Gln-Ser-Glu Chemical compound [H]N[C@@H](CCC(N)=O)C(=O)N[C@@H](CO)C(=O)N[C@@H](CCC(O)=O)C(O)=O RWQCWSGOOOEGPB-FXQIFTODSA-N 0.000 description 2
- WTJIWXMJESRHMM-XDTLVQLUSA-N Gln-Tyr-Ala Chemical compound [H]N[C@@H](CCC(N)=O)C(=O)N[C@@H](CC1=CC=C(O)C=C1)C(=O)N[C@@H](C)C(O)=O WTJIWXMJESRHMM-XDTLVQLUSA-N 0.000 description 2
- HPBKQFJXDUVNQV-FHWLQOOXSA-N Gln-Tyr-Tyr Chemical compound C1=CC(=CC=C1C[C@@H](C(=O)N[C@@H](CC2=CC=C(C=C2)O)C(=O)O)NC(=O)[C@H](CCC(=O)N)N)O HPBKQFJXDUVNQV-FHWLQOOXSA-N 0.000 description 2
- RJONUNZIMUXUOI-GUBZILKMSA-N Glu-Asn-Lys Chemical compound C(CCN)C[C@@H](C(=O)O)NC(=O)[C@H](CC(=O)N)NC(=O)[C@H](CCC(=O)O)N RJONUNZIMUXUOI-GUBZILKMSA-N 0.000 description 2
- CKOFNWCLWRYUHK-XHNCKOQMSA-N Glu-Asp-Pro Chemical compound C1C[C@@H](N(C1)C(=O)[C@H](CC(=O)O)NC(=O)[C@H](CCC(=O)O)N)C(=O)O CKOFNWCLWRYUHK-XHNCKOQMSA-N 0.000 description 2
- XMVLTPMCUJTJQP-FXQIFTODSA-N Glu-Gln-Cys Chemical compound C(CC(=O)O)[C@@H](C(=O)N[C@@H](CCC(=O)N)C(=O)N[C@@H](CS)C(=O)O)N XMVLTPMCUJTJQP-FXQIFTODSA-N 0.000 description 2
- VFZIDQZAEBORGY-GLLZPBPUSA-N Glu-Gln-Thr Chemical compound [H]N[C@@H](CCC(O)=O)C(=O)N[C@@H](CCC(N)=O)C(=O)N[C@@H]([C@@H](C)O)C(O)=O VFZIDQZAEBORGY-GLLZPBPUSA-N 0.000 description 2
- IQACOVZVOMVILH-FXQIFTODSA-N Glu-Glu-Ser Chemical compound [H]N[C@@H](CCC(O)=O)C(=O)N[C@@H](CCC(O)=O)C(=O)N[C@@H](CO)C(O)=O IQACOVZVOMVILH-FXQIFTODSA-N 0.000 description 2
- VOORMNJKNBGYGK-YUMQZZPRSA-N Glu-Gly-Met Chemical compound CSCC[C@@H](C(=O)O)NC(=O)CNC(=O)[C@H](CCC(=O)O)N VOORMNJKNBGYGK-YUMQZZPRSA-N 0.000 description 2
- YVYVMJNUENBOOL-KBIXCLLPSA-N Glu-Ile-Cys Chemical compound CC[C@H](C)[C@@H](C(=O)N[C@@H](CS)C(=O)O)NC(=O)[C@H](CCC(=O)O)N YVYVMJNUENBOOL-KBIXCLLPSA-N 0.000 description 2
- GXMXPCXXKVWOSM-KQXIARHKSA-N Glu-Ile-Pro Chemical compound CC[C@H](C)[C@@H](C(=O)N1CCC[C@@H]1C(=O)O)NC(=O)[C@H](CCC(=O)O)N GXMXPCXXKVWOSM-KQXIARHKSA-N 0.000 description 2
- IVGJYOOGJLFKQE-AVGNSLFASA-N Glu-Leu-Lys Chemical compound CC(C)C[C@@H](C(=O)N[C@@H](CCCCN)C(=O)O)NC(=O)[C@H](CCC(=O)O)N IVGJYOOGJLFKQE-AVGNSLFASA-N 0.000 description 2
- BCYGDJXHAGZNPQ-DCAQKATOSA-N Glu-Lys-Glu Chemical compound OC(=O)CC[C@H](N)C(=O)N[C@@H](CCCCN)C(=O)N[C@@H](CCC(O)=O)C(O)=O BCYGDJXHAGZNPQ-DCAQKATOSA-N 0.000 description 2
- DAHLWSFUXOHMIA-FXQIFTODSA-N Glu-Ser-Gln Chemical compound OC(=O)CC[C@H](N)C(=O)N[C@@H](CO)C(=O)N[C@@H](CCC(N)=O)C(O)=O DAHLWSFUXOHMIA-FXQIFTODSA-N 0.000 description 2
- BXSZPACYCMNKLS-AVGNSLFASA-N Glu-Ser-Phe Chemical compound [H]N[C@@H](CCC(O)=O)C(=O)N[C@@H](CO)C(=O)N[C@@H](CC1=CC=CC=C1)C(O)=O BXSZPACYCMNKLS-AVGNSLFASA-N 0.000 description 2
- RZMXBFUSQNLEQF-QEJZJMRPSA-N Glu-Trp-Asn Chemical compound C1=CC=C2C(=C1)C(=CN2)C[C@@H](C(=O)N[C@@H](CC(=O)N)C(=O)O)NC(=O)[C@H](CCC(=O)O)N RZMXBFUSQNLEQF-QEJZJMRPSA-N 0.000 description 2
- UUTGYDAKPISJAO-JYJNAYRXSA-N Glu-Tyr-Leu Chemical compound OC(=O)CC[C@H](N)C(=O)N[C@H](C(=O)N[C@@H](CC(C)C)C(O)=O)CC1=CC=C(O)C=C1 UUTGYDAKPISJAO-JYJNAYRXSA-N 0.000 description 2
- WJZLEENECIOOSA-WDSKDSINSA-N Gly-Asn-Gln Chemical compound NCC(=O)N[C@@H](CC(N)=O)C(=O)N[C@@H](CCC(N)=O)C(=O)O WJZLEENECIOOSA-WDSKDSINSA-N 0.000 description 2
- MBOAPAXLTUSMQI-JHEQGTHGSA-N Gly-Glu-Thr Chemical compound [H]NCC(=O)N[C@@H](CCC(O)=O)C(=O)N[C@@H]([C@@H](C)O)C(O)=O MBOAPAXLTUSMQI-JHEQGTHGSA-N 0.000 description 2
- VIIBEIQMLJEUJG-LAEOZQHASA-N Gly-Ile-Gln Chemical compound [H]NCC(=O)N[C@@H]([C@@H](C)CC)C(=O)N[C@@H](CCC(N)=O)C(O)=O VIIBEIQMLJEUJG-LAEOZQHASA-N 0.000 description 2
- UYPPAMNTTMJHJW-KCTSRDHCSA-N Gly-Ile-Trp Chemical compound [H]NCC(=O)N[C@@H]([C@@H](C)CC)C(=O)N[C@@H](CC1=CNC2=C1C=CC=C2)C(O)=O UYPPAMNTTMJHJW-KCTSRDHCSA-N 0.000 description 2
- LLZXNUUIBOALNY-QWRGUYRKSA-N Gly-Leu-Lys Chemical compound NCC(=O)N[C@@H](CC(C)C)C(=O)N[C@H](C(O)=O)CCCCN LLZXNUUIBOALNY-QWRGUYRKSA-N 0.000 description 2
- MHXKHKWHPNETGG-QWRGUYRKSA-N Gly-Lys-Leu Chemical compound [H]NCC(=O)N[C@@H](CCCCN)C(=O)N[C@@H](CC(C)C)C(O)=O MHXKHKWHPNETGG-QWRGUYRKSA-N 0.000 description 2
- YHYDTTUSJXGTQK-UWVGGRQHSA-N Gly-Met-Leu Chemical compound CSCC[C@H](NC(=O)CN)C(=O)N[C@@H](CC(C)C)C(O)=O YHYDTTUSJXGTQK-UWVGGRQHSA-N 0.000 description 2
- FKYQEVBRZSFAMJ-QWRGUYRKSA-N Gly-Ser-Tyr Chemical compound NCC(=O)N[C@@H](CO)C(=O)N[C@H](C(O)=O)CC1=CC=C(O)C=C1 FKYQEVBRZSFAMJ-QWRGUYRKSA-N 0.000 description 2
- ZLCLYFGMKFCDCN-XPUUQOCRSA-N Gly-Ser-Val Chemical compound CC(C)[C@H](NC(=O)[C@H](CO)NC(=O)CN)C(O)=O ZLCLYFGMKFCDCN-XPUUQOCRSA-N 0.000 description 2
- HUFUVTYGPOUCBN-MBLNEYKQSA-N Gly-Thr-Ile Chemical compound [H]NCC(=O)N[C@@H]([C@@H](C)O)C(=O)N[C@@H]([C@@H](C)CC)C(O)=O HUFUVTYGPOUCBN-MBLNEYKQSA-N 0.000 description 2
- ZZWUYQXMIFTIIY-WEDXCCLWSA-N Gly-Thr-Leu Chemical compound [H]NCC(=O)N[C@@H]([C@@H](C)O)C(=O)N[C@@H](CC(C)C)C(O)=O ZZWUYQXMIFTIIY-WEDXCCLWSA-N 0.000 description 2
- FFALDIDGPLUDKV-ZDLURKLDSA-N Gly-Thr-Ser Chemical compound [H]NCC(=O)N[C@@H]([C@@H](C)O)C(=O)N[C@@H](CO)C(O)=O FFALDIDGPLUDKV-ZDLURKLDSA-N 0.000 description 2
- VSLXGYMEHVAJBH-DLOVCJGASA-N His-Ala-Leu Chemical compound [H]N[C@@H](CC1=CNC=N1)C(=O)N[C@@H](C)C(=O)N[C@@H](CC(C)C)C(O)=O VSLXGYMEHVAJBH-DLOVCJGASA-N 0.000 description 2
- YPLYIXGKCRQZGW-SRVKXCTJSA-N His-Arg-Glu Chemical compound [H]N[C@@H](CC1=CNC=N1)C(=O)N[C@@H](CCCNC(N)=N)C(=O)N[C@@H](CCC(O)=O)C(O)=O YPLYIXGKCRQZGW-SRVKXCTJSA-N 0.000 description 2
- UZZXGLOJRZKYEL-DJFWLOJKSA-N His-Asn-Ile Chemical compound [H]N[C@@H](CC1=CNC=N1)C(=O)N[C@@H](CC(N)=O)C(=O)N[C@@H]([C@@H](C)CC)C(O)=O UZZXGLOJRZKYEL-DJFWLOJKSA-N 0.000 description 2
- VOKCBYNCZVSILJ-KKUMJFAQSA-N His-Asn-Tyr Chemical compound C1=CC(=CC=C1C[C@@H](C(=O)O)NC(=O)[C@H](CC(=O)N)NC(=O)[C@H](CC2=CN=CN2)N)O VOKCBYNCZVSILJ-KKUMJFAQSA-N 0.000 description 2
- YOSQCYUFZGPIPC-PBCZWWQYSA-N His-Asp-Thr Chemical compound [H]N[C@@H](CC1=CNC=N1)C(=O)N[C@@H](CC(O)=O)C(=O)N[C@@H]([C@@H](C)O)C(O)=O YOSQCYUFZGPIPC-PBCZWWQYSA-N 0.000 description 2
- ZYDYEPDFFVCUBI-SRVKXCTJSA-N His-Glu-Met Chemical compound CSCC[C@@H](C(=O)O)NC(=O)[C@H](CCC(=O)O)NC(=O)[C@H](CC1=CN=CN1)N ZYDYEPDFFVCUBI-SRVKXCTJSA-N 0.000 description 2
- PYNUBZSXKQKAHL-UWVGGRQHSA-N His-Gly-Arg Chemical compound [H]N[C@@H](CC1=CNC=N1)C(=O)NCC(=O)N[C@@H](CCCNC(N)=N)C(O)=O PYNUBZSXKQKAHL-UWVGGRQHSA-N 0.000 description 2
- NQKRILCJYCASDV-QWRGUYRKSA-N His-Gly-Lys Chemical compound NCCCC[C@@H](C(O)=O)NC(=O)CNC(=O)[C@@H](N)CC1=CN=CN1 NQKRILCJYCASDV-QWRGUYRKSA-N 0.000 description 2
- CSTNMMIHMYJGFR-IHRRRGAJSA-N His-His-Arg Chemical compound C([C@H](N)C(=O)N[C@@H](CC=1NC=NC=1)C(=O)N[C@@H](CCCN=C(N)N)C(O)=O)C1=CN=CN1 CSTNMMIHMYJGFR-IHRRRGAJSA-N 0.000 description 2
- JIUYRPFQJJRSJB-QWRGUYRKSA-N His-His-Gly Chemical compound C([C@H](N)C(=O)N[C@@H](CC=1NC=NC=1)C(=O)NCC(O)=O)C1=CN=CN1 JIUYRPFQJJRSJB-QWRGUYRKSA-N 0.000 description 2
- MLZVJIREOKTDAR-SIGLWIIPSA-N His-Ile-Ile Chemical compound [H]N[C@@H](CC1=CNC=N1)C(=O)N[C@@H]([C@@H](C)CC)C(=O)N[C@@H]([C@@H](C)CC)C(O)=O MLZVJIREOKTDAR-SIGLWIIPSA-N 0.000 description 2
- ZRSJXIKQXUGKRB-TUBUOCAGSA-N His-Ile-Thr Chemical compound [H]N[C@@H](CC1=CNC=N1)C(=O)N[C@@H]([C@@H](C)CC)C(=O)N[C@@H]([C@@H](C)O)C(O)=O ZRSJXIKQXUGKRB-TUBUOCAGSA-N 0.000 description 2
- WHKLDLQHSYAVGU-ACRUOGEOSA-N His-Phe-Tyr Chemical compound [H]N[C@@H](CC1=CNC=N1)C(=O)N[C@@H](CC1=CC=CC=C1)C(=O)N[C@@H](CC1=CC=C(O)C=C1)C(O)=O WHKLDLQHSYAVGU-ACRUOGEOSA-N 0.000 description 2
- DGLAHESNTJWGDO-SRVKXCTJSA-N His-Ser-His Chemical compound C1=C(NC=N1)C[C@@H](C(=O)N[C@@H](CO)C(=O)N[C@@H](CC2=CN=CN2)C(=O)O)N DGLAHESNTJWGDO-SRVKXCTJSA-N 0.000 description 2
- 241000701044 Human gammaherpesvirus 4 Species 0.000 description 2
- FVEWRQXNISSYFO-ZPFDUUQYSA-N Ile-Arg-Glu Chemical compound CC[C@H](C)[C@@H](C(=O)N[C@@H](CCCN=C(N)N)C(=O)N[C@@H](CCC(=O)O)C(=O)O)N FVEWRQXNISSYFO-ZPFDUUQYSA-N 0.000 description 2
- DMHGKBGOUAJRHU-UHFFFAOYSA-N Ile-Arg-Pro Natural products CCC(C)C(N)C(=O)NC(CCCN=C(N)N)C(=O)N1CCCC1C(O)=O DMHGKBGOUAJRHU-UHFFFAOYSA-N 0.000 description 2
- DMHGKBGOUAJRHU-RVMXOQNASA-N Ile-Arg-Pro Chemical compound CC[C@H](C)[C@@H](C(=O)N[C@@H](CCCN=C(N)N)C(=O)N1CCC[C@@H]1C(=O)O)N DMHGKBGOUAJRHU-RVMXOQNASA-N 0.000 description 2
- HZYHBDVRCBDJJV-HAFWLYHUSA-N Ile-Asn Chemical compound CC[C@H](C)[C@H](N)C(=O)N[C@H](C(O)=O)CC(N)=O HZYHBDVRCBDJJV-HAFWLYHUSA-N 0.000 description 2
- FJWYJQRCVNGEAQ-ZPFDUUQYSA-N Ile-Asn-Lys Chemical compound CC[C@H](C)[C@@H](C(=O)N[C@@H](CC(=O)N)C(=O)N[C@@H](CCCCN)C(=O)O)N FJWYJQRCVNGEAQ-ZPFDUUQYSA-N 0.000 description 2
- UMYZBHKAVTXWIW-GMOBBJLQSA-N Ile-Asp-Arg Chemical compound CC[C@H](C)[C@@H](C(=O)N[C@@H](CC(=O)O)C(=O)N[C@@H](CCCN=C(N)N)C(=O)O)N UMYZBHKAVTXWIW-GMOBBJLQSA-N 0.000 description 2
- IDAHFEPYTJJZFD-PEFMBERDSA-N Ile-Asp-Glu Chemical compound CC[C@H](C)[C@@H](C(=O)N[C@@H](CC(=O)O)C(=O)N[C@@H](CCC(=O)O)C(=O)O)N IDAHFEPYTJJZFD-PEFMBERDSA-N 0.000 description 2
- NKRJALPCDNXULF-BYULHYEWSA-N Ile-Asp-Gly Chemical compound [H]N[C@@H]([C@@H](C)CC)C(=O)N[C@@H](CC(O)=O)C(=O)NCC(O)=O NKRJALPCDNXULF-BYULHYEWSA-N 0.000 description 2
- RGSOCXHDOPQREB-ZPFDUUQYSA-N Ile-Asp-Leu Chemical compound CC[C@H](C)[C@@H](C(=O)N[C@@H](CC(=O)O)C(=O)N[C@@H](CC(C)C)C(=O)O)N RGSOCXHDOPQREB-ZPFDUUQYSA-N 0.000 description 2
- FADXGVVLSPPEQY-GHCJXIJMSA-N Ile-Cys-Asn Chemical compound CC[C@H](C)[C@@H](C(=O)N[C@@H](CS)C(=O)N[C@@H](CC(=O)N)C(=O)O)N FADXGVVLSPPEQY-GHCJXIJMSA-N 0.000 description 2
- PPSQSIDMOVPKPI-BJDJZHNGSA-N Ile-Cys-Leu Chemical compound N[C@@H]([C@@H](C)CC)C(=O)N[C@@H](CS)C(=O)N[C@@H](CC(C)C)C(=O)O PPSQSIDMOVPKPI-BJDJZHNGSA-N 0.000 description 2
- VQUCKIAECLVLAD-SVSWQMSJSA-N Ile-Cys-Thr Chemical compound CC[C@H](C)[C@@H](C(=O)N[C@@H](CS)C(=O)N[C@@H]([C@@H](C)O)C(=O)O)N VQUCKIAECLVLAD-SVSWQMSJSA-N 0.000 description 2
- KUHFPGIVBOCRMV-MNXVOIDGSA-N Ile-Gln-Leu Chemical compound CC[C@H](C)[C@@H](C(=O)N[C@@H](CCC(=O)N)C(=O)N[C@@H](CC(C)C)C(=O)O)N KUHFPGIVBOCRMV-MNXVOIDGSA-N 0.000 description 2
- GAZGFPOZOLEYAJ-YTFOTSKYSA-N Ile-Leu-Ile Chemical compound CC[C@H](C)[C@@H](C(=O)N[C@@H](CC(C)C)C(=O)N[C@@H]([C@@H](C)CC)C(=O)O)N GAZGFPOZOLEYAJ-YTFOTSKYSA-N 0.000 description 2
- HPCFRQWLTRDGHT-AJNGGQMLSA-N Ile-Leu-Leu Chemical compound CC[C@H](C)[C@H](N)C(=O)N[C@@H](CC(C)C)C(=O)N[C@@H](CC(C)C)C(O)=O HPCFRQWLTRDGHT-AJNGGQMLSA-N 0.000 description 2
- TVYWVSJGSHQWMT-AJNGGQMLSA-N Ile-Leu-Lys Chemical compound CC[C@H](C)[C@@H](C(=O)N[C@@H](CC(C)C)C(=O)N[C@@H](CCCCN)C(=O)O)N TVYWVSJGSHQWMT-AJNGGQMLSA-N 0.000 description 2
- FCWFBHMAJZGWRY-XUXIUFHCSA-N Ile-Leu-Met Chemical compound CC[C@H](C)[C@@H](C(=O)N[C@@H](CC(C)C)C(=O)N[C@@H](CCSC)C(=O)O)N FCWFBHMAJZGWRY-XUXIUFHCSA-N 0.000 description 2
- PWUMCBLVWPCKNO-MGHWNKPDSA-N Ile-Leu-Tyr Chemical compound CC[C@H](C)[C@H](N)C(=O)N[C@@H](CC(C)C)C(=O)N[C@H](C(O)=O)CC1=CC=C(O)C=C1 PWUMCBLVWPCKNO-MGHWNKPDSA-N 0.000 description 2
- UYNXBNHVWFNVIN-HJWJTTGWSA-N Ile-Phe-Arg Chemical compound NC(N)=NCCC[C@@H](C(O)=O)NC(=O)[C@@H](NC(=O)[C@@H](N)[C@@H](C)CC)CC1=CC=CC=C1 UYNXBNHVWFNVIN-HJWJTTGWSA-N 0.000 description 2
- XHBYEMIUENPZLY-GMOBBJLQSA-N Ile-Pro-Asn Chemical compound CC[C@H](C)[C@H](N)C(=O)N1CCC[C@H]1C(=O)N[C@@H](CC(N)=O)C(O)=O XHBYEMIUENPZLY-GMOBBJLQSA-N 0.000 description 2
- OWSWUWDMSNXTNE-GMOBBJLQSA-N Ile-Pro-Asp Chemical compound CC[C@H](C)[C@@H](C(=O)N1CCC[C@H]1C(=O)N[C@@H](CC(=O)O)C(=O)O)N OWSWUWDMSNXTNE-GMOBBJLQSA-N 0.000 description 2
- TWVKGYNQQAUNRN-ACZMJKKPSA-N Ile-Ser Chemical compound CC[C@H](C)[C@H]([NH3+])C(=O)N[C@@H](CO)C([O-])=O TWVKGYNQQAUNRN-ACZMJKKPSA-N 0.000 description 2
- JZNVOBUNTWNZPW-GHCJXIJMSA-N Ile-Ser-Asp Chemical compound CC[C@H](C)[C@@H](C(=O)N[C@@H](CO)C(=O)N[C@@H](CC(=O)O)C(=O)O)N JZNVOBUNTWNZPW-GHCJXIJMSA-N 0.000 description 2
- VGSPNSSCMOHRRR-BJDJZHNGSA-N Ile-Ser-Lys Chemical compound CC[C@H](C)[C@@H](C(=O)N[C@@H](CO)C(=O)N[C@@H](CCCCN)C(=O)O)N VGSPNSSCMOHRRR-BJDJZHNGSA-N 0.000 description 2
- HXIDVIFHRYRXLZ-NAKRPEOUSA-N Ile-Ser-Val Chemical compound CC[C@H](C)[C@@H](C(=O)N[C@@H](CO)C(=O)N[C@@H](C(C)C)C(=O)O)N HXIDVIFHRYRXLZ-NAKRPEOUSA-N 0.000 description 2
- YCKPUHHMCFSUMD-IUKAMOBKSA-N Ile-Thr-Asp Chemical compound CC[C@H](C)[C@@H](C(=O)N[C@@H]([C@@H](C)O)C(=O)N[C@@H](CC(=O)O)C(=O)O)N YCKPUHHMCFSUMD-IUKAMOBKSA-N 0.000 description 2
- HJDZMPFEXINXLO-QPHKQPEJSA-N Ile-Thr-Ile Chemical compound CC[C@H](C)[C@@H](C(=O)N[C@@H]([C@@H](C)O)C(=O)N[C@@H]([C@@H](C)CC)C(=O)O)N HJDZMPFEXINXLO-QPHKQPEJSA-N 0.000 description 2
- KBDIBHQICWDGDL-PPCPHDFISA-N Ile-Thr-Leu Chemical compound CC[C@H](C)[C@@H](C(=O)N[C@@H]([C@@H](C)O)C(=O)N[C@@H](CC(C)C)C(=O)O)N KBDIBHQICWDGDL-PPCPHDFISA-N 0.000 description 2
- XVUAQNRNFMVWBR-BLMTYFJBSA-N Ile-Trp-Ile Chemical compound CC[C@H](C)[C@@H](C(=O)N[C@@H](CC1=CNC2=CC=CC=C21)C(=O)N[C@@H]([C@@H](C)CC)C(=O)O)N XVUAQNRNFMVWBR-BLMTYFJBSA-N 0.000 description 2
- RTSQPLLOYSGMKM-DSYPUSFNSA-N Ile-Trp-Leu Chemical compound CC[C@H](C)[C@@H](C(=O)N[C@@H](CC1=CNC2=CC=CC=C21)C(=O)N[C@@H](CC(C)C)C(=O)O)N RTSQPLLOYSGMKM-DSYPUSFNSA-N 0.000 description 2
- DLEBSGAVWRPTIX-PEDHHIEDSA-N Ile-Val-Ile Chemical compound CC[C@H](C)[C@H](N)C(=O)N[C@@H](C(C)C)C(=O)N[C@H](C(O)=O)[C@@H](C)CC DLEBSGAVWRPTIX-PEDHHIEDSA-N 0.000 description 2
- WIYDLTIBHZSPKY-HJWJTTGWSA-N Ile-Val-Phe Chemical compound CC[C@H](C)[C@H](N)C(=O)N[C@@H](C(C)C)C(=O)N[C@H](C(O)=O)CC1=CC=CC=C1 WIYDLTIBHZSPKY-HJWJTTGWSA-N 0.000 description 2
- 108091092195 Intron Proteins 0.000 description 2
- HFKJBCPRWWGPEY-BQBZGAKWSA-N L-arginyl-L-glutamic acid Chemical compound NC(=N)NCCC[C@H](N)C(=O)N[C@@H](CCC(O)=O)C(O)=O HFKJBCPRWWGPEY-BQBZGAKWSA-N 0.000 description 2
- QOOWRKBDDXQRHC-BQBZGAKWSA-N L-lysyl-L-alanine Chemical compound OC(=O)[C@H](C)NC(=O)[C@@H](N)CCCCN QOOWRKBDDXQRHC-BQBZGAKWSA-N 0.000 description 2
- LZDNBBYBDGBADK-UHFFFAOYSA-N L-valyl-L-tryptophan Natural products C1=CC=C2C(CC(NC(=O)C(N)C(C)C)C(O)=O)=CNC2=C1 LZDNBBYBDGBADK-UHFFFAOYSA-N 0.000 description 2
- KVRKAGGMEWNURO-CIUDSAMLSA-N Leu-Ala-Cys Chemical compound C[C@@H](C(=O)N[C@@H](CS)C(=O)O)NC(=O)[C@H](CC(C)C)N KVRKAGGMEWNURO-CIUDSAMLSA-N 0.000 description 2
- DQPQTXMIRBUWKO-DCAQKATOSA-N Leu-Ala-Met Chemical compound C[C@@H](C(=O)N[C@@H](CCSC)C(=O)O)NC(=O)[C@H](CC(C)C)N DQPQTXMIRBUWKO-DCAQKATOSA-N 0.000 description 2
- XIRYQRLFHWWWTC-QEJZJMRPSA-N Leu-Ala-Phe Chemical compound CC(C)C[C@H](N)C(=O)N[C@@H](C)C(=O)N[C@H](C(O)=O)CC1=CC=CC=C1 XIRYQRLFHWWWTC-QEJZJMRPSA-N 0.000 description 2
- JKGHDYGZRDWHGA-SRVKXCTJSA-N Leu-Asn-Leu Chemical compound [H]N[C@@H](CC(C)C)C(=O)N[C@@H](CC(N)=O)C(=O)N[C@@H](CC(C)C)C(O)=O JKGHDYGZRDWHGA-SRVKXCTJSA-N 0.000 description 2
- HIZYETOZLYFUFF-BQBZGAKWSA-N Leu-Cys Chemical compound CC(C)C[C@H](N)C(=O)N[C@@H](CS)C(O)=O HIZYETOZLYFUFF-BQBZGAKWSA-N 0.000 description 2
- DKEZVKFLETVJFY-CIUDSAMLSA-N Leu-Cys-Asn Chemical compound CC(C)C[C@@H](C(=O)N[C@@H](CS)C(=O)N[C@@H](CC(=O)N)C(=O)O)N DKEZVKFLETVJFY-CIUDSAMLSA-N 0.000 description 2
- FOEHRHOBWFQSNW-KATARQTJSA-N Leu-Cys-Thr Chemical compound C[C@H]([C@@H](C(=O)O)NC(=O)[C@H](CS)NC(=O)[C@H](CC(C)C)N)O FOEHRHOBWFQSNW-KATARQTJSA-N 0.000 description 2
- VPKIQULSKFVCSM-SRVKXCTJSA-N Leu-Gln-Arg Chemical compound [H]N[C@@H](CC(C)C)C(=O)N[C@@H](CCC(N)=O)C(=O)N[C@@H](CCCNC(N)=N)C(O)=O VPKIQULSKFVCSM-SRVKXCTJSA-N 0.000 description 2
- KAFOIVJDVSZUMD-DCAQKATOSA-N Leu-Gln-Gln Chemical compound [H]N[C@@H](CC(C)C)C(=O)N[C@@H](CCC(N)=O)C(=O)N[C@@H](CCC(N)=O)C(O)=O KAFOIVJDVSZUMD-DCAQKATOSA-N 0.000 description 2
- LOLUPZNNADDTAA-AVGNSLFASA-N Leu-Gln-Leu Chemical compound CC(C)C[C@H](N)C(=O)N[C@@H](CCC(N)=O)C(=O)N[C@@H](CC(C)C)C(O)=O LOLUPZNNADDTAA-AVGNSLFASA-N 0.000 description 2
- AXZGZMGRBDQTEY-SRVKXCTJSA-N Leu-Gln-Met Chemical compound [H]N[C@@H](CC(C)C)C(=O)N[C@@H](CCC(N)=O)C(=O)N[C@@H](CCSC)C(O)=O AXZGZMGRBDQTEY-SRVKXCTJSA-N 0.000 description 2
- QVFGXCVIXXBFHO-AVGNSLFASA-N Leu-Glu-Leu Chemical compound CC(C)C[C@H](N)C(=O)N[C@@H](CCC(O)=O)C(=O)N[C@@H](CC(C)C)C(O)=O QVFGXCVIXXBFHO-AVGNSLFASA-N 0.000 description 2
- LESXFEZIFXFIQR-LURJTMIESA-N Leu-Gly Chemical compound CC(C)C[C@H](N)C(=O)NCC(O)=O LESXFEZIFXFIQR-LURJTMIESA-N 0.000 description 2
- OXRLYTYUXAQTHP-YUMQZZPRSA-N Leu-Gly-Ala Chemical compound [H]N[C@@H](CC(C)C)C(=O)NCC(=O)N[C@@H](C)C(O)=O OXRLYTYUXAQTHP-YUMQZZPRSA-N 0.000 description 2
- HYIFFZAQXPUEAU-QWRGUYRKSA-N Leu-Gly-Leu Chemical compound CC(C)C[C@H](N)C(=O)NCC(=O)N[C@H](C(O)=O)CC(C)C HYIFFZAQXPUEAU-QWRGUYRKSA-N 0.000 description 2
- APFJUBGRZGMQFF-QWRGUYRKSA-N Leu-Gly-Lys Chemical compound CC(C)C[C@H](N)C(=O)NCC(=O)N[C@H](C(O)=O)CCCCN APFJUBGRZGMQFF-QWRGUYRKSA-N 0.000 description 2
- QPXBPQUGXHURGP-UWVGGRQHSA-N Leu-Gly-Met Chemical compound CC(C)C[C@@H](C(=O)NCC(=O)N[C@@H](CCSC)C(=O)O)N QPXBPQUGXHURGP-UWVGGRQHSA-N 0.000 description 2
- TVEOVCYCYGKVPP-HSCHXYMDSA-N Leu-Ile-Trp Chemical compound CC[C@H](C)[C@@H](C(=O)N[C@@H](CC1=CNC2=CC=CC=C21)C(=O)O)NC(=O)[C@H](CC(C)C)N TVEOVCYCYGKVPP-HSCHXYMDSA-N 0.000 description 2
- FAELBUXXFQLUAX-AJNGGQMLSA-N Leu-Leu-Ile Chemical compound CC[C@H](C)[C@@H](C(O)=O)NC(=O)[C@H](CC(C)C)NC(=O)[C@@H](N)CC(C)C FAELBUXXFQLUAX-AJNGGQMLSA-N 0.000 description 2
- FIICHHJDINDXKG-IHPCNDPISA-N Leu-Lys-Trp Chemical compound [H]N[C@@H](CC(C)C)C(=O)N[C@@H](CCCCN)C(=O)N[C@@H](CC1=CNC2=C1C=CC=C2)C(O)=O FIICHHJDINDXKG-IHPCNDPISA-N 0.000 description 2
- NTISAKGPIGTIJJ-IUCAKERBSA-N Leu-Met Chemical compound CSCC[C@@H](C(O)=O)NC(=O)[C@@H](N)CC(C)C NTISAKGPIGTIJJ-IUCAKERBSA-N 0.000 description 2
- IBSGMIPRBMPMHE-IHRRRGAJSA-N Leu-Met-Lys Chemical compound CC(C)C[C@H](N)C(=O)N[C@@H](CCSC)C(=O)N[C@@H](CCCCN)C(O)=O IBSGMIPRBMPMHE-IHRRRGAJSA-N 0.000 description 2
- YWKNKRAKOCLOLH-OEAJRASXSA-N Leu-Phe-Thr Chemical compound CC(C)C[C@H](N)C(=O)N[C@H](C(=O)N[C@@H]([C@@H](C)O)C(O)=O)CC1=CC=CC=C1 YWKNKRAKOCLOLH-OEAJRASXSA-N 0.000 description 2
- HGUUMQWGYCVPKG-DCAQKATOSA-N Leu-Pro-Cys Chemical compound CC(C)C[C@@H](C(=O)N1CCC[C@H]1C(=O)N[C@@H](CS)C(=O)O)N HGUUMQWGYCVPKG-DCAQKATOSA-N 0.000 description 2
- IZPVWNSAVUQBGP-CIUDSAMLSA-N Leu-Ser-Asp Chemical compound [H]N[C@@H](CC(C)C)C(=O)N[C@@H](CO)C(=O)N[C@@H](CC(O)=O)C(O)=O IZPVWNSAVUQBGP-CIUDSAMLSA-N 0.000 description 2
- RGUXWMDNCPMQFB-YUMQZZPRSA-N Leu-Ser-Gly Chemical compound CC(C)C[C@H](N)C(=O)N[C@@H](CO)C(=O)NCC(O)=O RGUXWMDNCPMQFB-YUMQZZPRSA-N 0.000 description 2
- ZGGVHTQAPHVMKM-IHPCNDPISA-N Leu-Trp-Lys Chemical compound CC(C)C[C@@H](C(=O)N[C@@H](CC1=CNC2=CC=CC=C21)C(=O)N[C@@H](CCCCN)C(=O)O)N ZGGVHTQAPHVMKM-IHPCNDPISA-N 0.000 description 2
- VQHUBNVKFFLWRP-ULQDDVLXSA-N Leu-Tyr-Val Chemical compound CC(C)C[C@H](N)C(=O)N[C@H](C(=O)N[C@@H](C(C)C)C(O)=O)CC1=CC=C(O)C=C1 VQHUBNVKFFLWRP-ULQDDVLXSA-N 0.000 description 2
- FPPCCQGECVKLDY-IHRRRGAJSA-N Leu-Val-Leu Chemical compound CC(C)C[C@H](N)C(=O)N[C@@H](C(C)C)C(=O)N[C@H](C(O)=O)CC(C)C FPPCCQGECVKLDY-IHRRRGAJSA-N 0.000 description 2
- VKVDRTGWLVZJOM-DCAQKATOSA-N Leu-Val-Ser Chemical compound CC(C)C[C@H](N)C(=O)N[C@@H](C(C)C)C(=O)N[C@@H](CO)C(O)=O VKVDRTGWLVZJOM-DCAQKATOSA-N 0.000 description 2
- JPNRPAJITHRXRH-BQBZGAKWSA-N Lys-Asn Chemical compound NCCCC[C@H](N)C(=O)N[C@H](C(O)=O)CC(N)=O JPNRPAJITHRXRH-BQBZGAKWSA-N 0.000 description 2
- MKBIVWXCFINCLE-SRVKXCTJSA-N Lys-Asn-His Chemical compound C1=C(NC=N1)C[C@@H](C(=O)O)NC(=O)[C@H](CC(=O)N)NC(=O)[C@H](CCCCN)N MKBIVWXCFINCLE-SRVKXCTJSA-N 0.000 description 2
- LZWNAOIMTLNMDW-NHCYSSNCSA-N Lys-Asn-Val Chemical compound CC(C)[C@@H](C(=O)O)NC(=O)[C@H](CC(=O)N)NC(=O)[C@H](CCCCN)N LZWNAOIMTLNMDW-NHCYSSNCSA-N 0.000 description 2
- QIJVAFLRMVBHMU-KKUMJFAQSA-N Lys-Asp-Phe Chemical compound [H]N[C@@H](CCCCN)C(=O)N[C@@H](CC(O)=O)C(=O)N[C@@H](CC1=CC=CC=C1)C(O)=O QIJVAFLRMVBHMU-KKUMJFAQSA-N 0.000 description 2
- KWUKZRFFKPLUPE-HJGDQZAQSA-N Lys-Asp-Thr Chemical compound [H]N[C@@H](CCCCN)C(=O)N[C@@H](CC(O)=O)C(=O)N[C@@H]([C@@H](C)O)C(O)=O KWUKZRFFKPLUPE-HJGDQZAQSA-N 0.000 description 2
- NNCDAORZCMPZPX-GUBZILKMSA-N Lys-Gln-Ser Chemical compound C(CCN)C[C@@H](C(=O)N[C@@H](CCC(=O)N)C(=O)N[C@@H](CO)C(=O)O)N NNCDAORZCMPZPX-GUBZILKMSA-N 0.000 description 2
- WGLAORUKDGRINI-WDCWCFNPSA-N Lys-Glu-Thr Chemical compound [H]N[C@@H](CCCCN)C(=O)N[C@@H](CCC(O)=O)C(=O)N[C@@H]([C@@H](C)O)C(O)=O WGLAORUKDGRINI-WDCWCFNPSA-N 0.000 description 2
- ITWQLSZTLBKWJM-YUMQZZPRSA-N Lys-Gly-Ala Chemical compound OC(=O)[C@H](C)NC(=O)CNC(=O)[C@@H](N)CCCCN ITWQLSZTLBKWJM-YUMQZZPRSA-N 0.000 description 2
- DTUZCYRNEJDKSR-NHCYSSNCSA-N Lys-Gly-Ile Chemical compound CC[C@H](C)[C@@H](C(O)=O)NC(=O)CNC(=O)[C@@H](N)CCCCN DTUZCYRNEJDKSR-NHCYSSNCSA-N 0.000 description 2
- NKKFVJRLCCUJNA-QWRGUYRKSA-N Lys-Gly-Lys Chemical compound NCCCC[C@H](N)C(=O)NCC(=O)N[C@H](C(O)=O)CCCCN NKKFVJRLCCUJNA-QWRGUYRKSA-N 0.000 description 2
- HAUUXTXKJNVIFY-ONGXEEELSA-N Lys-Gly-Val Chemical compound [H]N[C@@H](CCCCN)C(=O)NCC(=O)N[C@@H](C(C)C)C(O)=O HAUUXTXKJNVIFY-ONGXEEELSA-N 0.000 description 2
- OIQSIMFSVLLWBX-VOAKCMCISA-N Lys-Leu-Thr Chemical compound [H]N[C@@H](CCCCN)C(=O)N[C@@H](CC(C)C)C(=O)N[C@@H]([C@@H](C)O)C(O)=O OIQSIMFSVLLWBX-VOAKCMCISA-N 0.000 description 2
- XOQMURBBIXRRCR-SRVKXCTJSA-N Lys-Lys-Ala Chemical compound OC(=O)[C@H](C)NC(=O)[C@H](CCCCN)NC(=O)[C@@H](N)CCCCN XOQMURBBIXRRCR-SRVKXCTJSA-N 0.000 description 2
- ATNKHRAIZCMCCN-BZSNNMDCSA-N Lys-Lys-Phe Chemical compound C1=CC=C(C=C1)C[C@@H](C(=O)O)NC(=O)[C@H](CCCCN)NC(=O)[C@H](CCCCN)N ATNKHRAIZCMCCN-BZSNNMDCSA-N 0.000 description 2
- YDDDRTIPNTWGIG-SRVKXCTJSA-N Lys-Lys-Ser Chemical compound [H]N[C@@H](CCCCN)C(=O)N[C@@H](CCCCN)C(=O)N[C@@H](CO)C(O)=O YDDDRTIPNTWGIG-SRVKXCTJSA-N 0.000 description 2
- GZGWILAQHOVXTD-DCAQKATOSA-N Lys-Met-Asp Chemical compound [H]N[C@@H](CCCCN)C(=O)N[C@@H](CCSC)C(=O)N[C@@H](CC(O)=O)C(O)=O GZGWILAQHOVXTD-DCAQKATOSA-N 0.000 description 2
- LNMKRJJLEFASGA-BZSNNMDCSA-N Lys-Phe-Leu Chemical compound [H]N[C@@H](CCCCN)C(=O)N[C@@H](CC1=CC=CC=C1)C(=O)N[C@@H](CC(C)C)C(O)=O LNMKRJJLEFASGA-BZSNNMDCSA-N 0.000 description 2
- CAVRAQIDHUPECU-UVOCVTCTSA-N Lys-Thr-Thr Chemical compound [H]N[C@@H](CCCCN)C(=O)N[C@@H]([C@@H](C)O)C(=O)N[C@@H]([C@@H](C)O)C(O)=O CAVRAQIDHUPECU-UVOCVTCTSA-N 0.000 description 2
- MYTOTTSMVMWVJN-STQMWFEESA-N Lys-Tyr Chemical compound NCCCC[C@H](N)C(=O)N[C@H](C(O)=O)CC1=CC=C(O)C=C1 MYTOTTSMVMWVJN-STQMWFEESA-N 0.000 description 2
- SUZVLFWOCKHWET-CQDKDKBSSA-N Lys-Tyr-Ala Chemical compound [H]N[C@@H](CCCCN)C(=O)N[C@@H](CC1=CC=C(O)C=C1)C(=O)N[C@@H](C)C(O)=O SUZVLFWOCKHWET-CQDKDKBSSA-N 0.000 description 2
- MDDUIRLQCYVRDO-NHCYSSNCSA-N Lys-Val-Asn Chemical compound NC(=O)C[C@@H](C(O)=O)NC(=O)[C@H](C(C)C)NC(=O)[C@@H](N)CCCCN MDDUIRLQCYVRDO-NHCYSSNCSA-N 0.000 description 2
- RPWQJSBMXJSCPD-XUXIUFHCSA-N Lys-Val-Ile Chemical compound CC[C@H](C)[C@H](NC(=O)[C@@H](NC(=O)[C@@H](N)CCCCN)C(C)C)C(O)=O RPWQJSBMXJSCPD-XUXIUFHCSA-N 0.000 description 2
- VWJFOUBDZIUXGA-AVGNSLFASA-N Lys-Val-Met Chemical compound CC(C)[C@@H](C(=O)N[C@@H](CCSC)C(=O)O)NC(=O)[C@H](CCCCN)N VWJFOUBDZIUXGA-AVGNSLFASA-N 0.000 description 2
- 102000018697 Membrane Proteins Human genes 0.000 description 2
- 108010052285 Membrane Proteins Proteins 0.000 description 2
- HDNOQCZWJGGHSS-VEVYYDQMSA-N Met-Asn-Thr Chemical compound CSCC[C@H](N)C(=O)N[C@@H](CC(N)=O)C(=O)N[C@@H]([C@@H](C)O)C(O)=O HDNOQCZWJGGHSS-VEVYYDQMSA-N 0.000 description 2
- RMHHNLKYPOOKQN-FXQIFTODSA-N Met-Cys-Ser Chemical compound [H]N[C@@H](CCSC)C(=O)N[C@@H](CS)C(=O)N[C@@H](CO)C(O)=O RMHHNLKYPOOKQN-FXQIFTODSA-N 0.000 description 2
- XKJUFUPCHARJKX-UWVGGRQHSA-N Met-Gly-His Chemical compound CSCC[C@H](N)C(=O)NCC(=O)N[C@H](C(O)=O)CC1=CNC=N1 XKJUFUPCHARJKX-UWVGGRQHSA-N 0.000 description 2
- IMTUWVJPCQPJEE-IUCAKERBSA-N Met-Lys Chemical compound CSCC[C@H](N)C(=O)N[C@H](C(O)=O)CCCCN IMTUWVJPCQPJEE-IUCAKERBSA-N 0.000 description 2
- NHXXGBXJTLRGJI-GUBZILKMSA-N Met-Pro-Ser Chemical compound [H]N[C@@H](CCSC)C(=O)N1CCC[C@H]1C(=O)N[C@@H](CO)C(O)=O NHXXGBXJTLRGJI-GUBZILKMSA-N 0.000 description 2
- CIIJWIAORKTXAH-FJXKBIBVSA-N Met-Thr-Gly Chemical compound CSCC[C@H](N)C(=O)N[C@@H]([C@@H](C)O)C(=O)NCC(O)=O CIIJWIAORKTXAH-FJXKBIBVSA-N 0.000 description 2
- ZBLSZPYQQRIHQU-RCWTZXSCSA-N Met-Thr-Val Chemical compound CSCC[C@H](N)C(=O)N[C@@H]([C@@H](C)O)C(=O)N[C@@H](C(C)C)C(O)=O ZBLSZPYQQRIHQU-RCWTZXSCSA-N 0.000 description 2
- JZXKNNOWPBVZEV-XIRDDKMYSA-N Met-Trp-Glu Chemical compound CSCC[C@@H](C(=O)N[C@@H](CC1=CNC2=CC=CC=C21)C(=O)N[C@@H](CCC(=O)O)C(=O)O)N JZXKNNOWPBVZEV-XIRDDKMYSA-N 0.000 description 2
- LBSWWNKMVPAXOI-GUBZILKMSA-N Met-Val-Ser Chemical compound CSCC[C@H](N)C(=O)N[C@@H](C(C)C)C(=O)N[C@@H](CO)C(O)=O LBSWWNKMVPAXOI-GUBZILKMSA-N 0.000 description 2
- YBAFDPFAUTYYRW-UHFFFAOYSA-N N-L-alpha-glutamyl-L-leucine Natural products CC(C)CC(C(O)=O)NC(=O)C(N)CCC(O)=O YBAFDPFAUTYYRW-UHFFFAOYSA-N 0.000 description 2
- 108010066427 N-valyltryptophan Proteins 0.000 description 2
- BQVUABVGYYSDCJ-UHFFFAOYSA-N Nalpha-L-Leucyl-L-tryptophan Natural products C1=CC=C2C(CC(NC(=O)C(N)CC(C)C)C(O)=O)=CNC2=C1 BQVUABVGYYSDCJ-UHFFFAOYSA-N 0.000 description 2
- 108091005804 Peptidases Proteins 0.000 description 2
- LBSARGIQACMGDF-WBAXXEDZSA-N Phe-Ala-Phe Chemical compound C([C@H](N)C(=O)N[C@@H](C)C(=O)N[C@@H](CC=1C=CC=CC=1)C(O)=O)C1=CC=CC=C1 LBSARGIQACMGDF-WBAXXEDZSA-N 0.000 description 2
- LZDIENNKWVXJMX-JYJNAYRXSA-N Phe-Arg-Arg Chemical compound NC(N)=NCCC[C@@H](C(O)=O)NC(=O)[C@H](CCCN=C(N)N)NC(=O)[C@@H](N)CC1=CC=CC=C1 LZDIENNKWVXJMX-JYJNAYRXSA-N 0.000 description 2
- LGBVMDMZZFYSFW-HJWJTTGWSA-N Phe-Arg-Ile Chemical compound CC[C@H](C)[C@@H](C(=O)O)NC(=O)[C@H](CCCN=C(N)N)NC(=O)[C@H](CC1=CC=CC=C1)N LGBVMDMZZFYSFW-HJWJTTGWSA-N 0.000 description 2
- HTTYNOXBBOWZTB-SRVKXCTJSA-N Phe-Asn-Asn Chemical compound C1=CC=C(C=C1)C[C@@H](C(=O)N[C@@H](CC(=O)N)C(=O)N[C@@H](CC(=O)N)C(=O)O)N HTTYNOXBBOWZTB-SRVKXCTJSA-N 0.000 description 2
- HPECNYCQLSVCHH-BZSNNMDCSA-N Phe-Cys-Phe Chemical compound C1=CC=C(C=C1)C[C@@H](C(=O)N[C@@H](CS)C(=O)N[C@@H](CC2=CC=CC=C2)C(=O)O)N HPECNYCQLSVCHH-BZSNNMDCSA-N 0.000 description 2
- MGECUMGTSHYHEJ-QEWYBTABSA-N Phe-Glu-Ile Chemical compound CC[C@H](C)[C@@H](C(O)=O)NC(=O)[C@H](CCC(O)=O)NC(=O)[C@@H](N)CC1=CC=CC=C1 MGECUMGTSHYHEJ-QEWYBTABSA-N 0.000 description 2
- GLUBLISJVJFHQS-VIFPVBQESA-N Phe-Gly Chemical compound OC(=O)CNC(=O)[C@@H](N)CC1=CC=CC=C1 GLUBLISJVJFHQS-VIFPVBQESA-N 0.000 description 2
- NPLGQVKZFGJWAI-QWHCGFSZSA-N Phe-Gly-Pro Chemical compound C1C[C@@H](N(C1)C(=O)CNC(=O)[C@H](CC2=CC=CC=C2)N)C(=O)O NPLGQVKZFGJWAI-QWHCGFSZSA-N 0.000 description 2
- FENSZYFJQOFSQR-FIRPJDEBSA-N Phe-Phe-Ile Chemical compound C([C@@H](C(=O)N[C@@H]([C@@H](C)CC)C(O)=O)NC(=O)[C@@H](N)CC=1C=CC=CC=1)C1=CC=CC=C1 FENSZYFJQOFSQR-FIRPJDEBSA-N 0.000 description 2
- CZQZSMJXFGGBHM-KKUMJFAQSA-N Phe-Pro-Gln Chemical compound C1C[C@H](N(C1)C(=O)[C@H](CC2=CC=CC=C2)N)C(=O)N[C@@H](CCC(=O)N)C(=O)O CZQZSMJXFGGBHM-KKUMJFAQSA-N 0.000 description 2
- XOHJOMKCRLHGCY-UNQGMJICSA-N Phe-Pro-Thr Chemical compound [H]N[C@@H](CC1=CC=CC=C1)C(=O)N1CCC[C@H]1C(=O)N[C@@H]([C@@H](C)O)C(O)=O XOHJOMKCRLHGCY-UNQGMJICSA-N 0.000 description 2
- AFNJAQVMTIQTCB-DLOVCJGASA-N Phe-Ser-Ala Chemical compound OC(=O)[C@H](C)NC(=O)[C@H](CO)NC(=O)[C@@H](N)CC1=CC=CC=C1 AFNJAQVMTIQTCB-DLOVCJGASA-N 0.000 description 2
- QSWKNJAPHQDAAS-MELADBBJSA-N Phe-Ser-Pro Chemical compound C1C[C@@H](N(C1)C(=O)[C@H](CO)NC(=O)[C@H](CC2=CC=CC=C2)N)C(=O)O QSWKNJAPHQDAAS-MELADBBJSA-N 0.000 description 2
- GMWNQSGWWGKTSF-LFSVMHDDSA-N Phe-Thr-Ala Chemical compound [H]N[C@@H](CC1=CC=CC=C1)C(=O)N[C@@H]([C@@H](C)O)C(=O)N[C@@H](C)C(O)=O GMWNQSGWWGKTSF-LFSVMHDDSA-N 0.000 description 2
- MSSXKZBDKZAHCX-UNQGMJICSA-N Phe-Thr-Val Chemical compound [H]N[C@@H](CC1=CC=CC=C1)C(=O)N[C@@H]([C@@H](C)O)C(=O)N[C@@H](C(C)C)C(O)=O MSSXKZBDKZAHCX-UNQGMJICSA-N 0.000 description 2
- GCFNFKNPCMBHNT-IRXDYDNUSA-N Phe-Tyr-Gly Chemical compound C1=CC=C(C=C1)C[C@@H](C(=O)N[C@@H](CC2=CC=C(C=C2)O)C(=O)NCC(=O)O)N GCFNFKNPCMBHNT-IRXDYDNUSA-N 0.000 description 2
- 101710175536 Pheromone receptor 1 Proteins 0.000 description 2
- 241000288906 Primates Species 0.000 description 2
- GRIRJQGZZJVANI-CYDGBPFRSA-N Pro-Arg-Ile Chemical compound CC[C@H](C)[C@@H](C(=O)O)NC(=O)[C@H](CCCN=C(N)N)NC(=O)[C@@H]1CCCN1 GRIRJQGZZJVANI-CYDGBPFRSA-N 0.000 description 2
- GLEOIKLQBZNKJZ-WDSKDSINSA-N Pro-Asp Chemical compound OC(=O)C[C@@H](C(O)=O)NC(=O)[C@@H]1CCCN1 GLEOIKLQBZNKJZ-WDSKDSINSA-N 0.000 description 2
- ZCXQTRXYZOSGJR-FXQIFTODSA-N Pro-Asp-Ser Chemical compound [H]N1CCC[C@H]1C(=O)N[C@@H](CC(O)=O)C(=O)N[C@@H](CO)C(O)=O ZCXQTRXYZOSGJR-FXQIFTODSA-N 0.000 description 2
- SFECXGVELZFBFJ-VEVYYDQMSA-N Pro-Asp-Thr Chemical compound [H]N1CCC[C@H]1C(=O)N[C@@H](CC(O)=O)C(=O)N[C@@H]([C@@H](C)O)C(O)=O SFECXGVELZFBFJ-VEVYYDQMSA-N 0.000 description 2
- ZPPVJIJMIKTERM-YUMQZZPRSA-N Pro-Gln-Gly Chemical compound OC(=O)CNC(=O)[C@H](CCC(=O)N)NC(=O)[C@@H]1CCCN1 ZPPVJIJMIKTERM-YUMQZZPRSA-N 0.000 description 2
- SMFQZMGHCODUPQ-ULQDDVLXSA-N Pro-Lys-Phe Chemical compound [H]N1CCC[C@H]1C(=O)N[C@@H](CCCCN)C(=O)N[C@@H](CC1=CC=CC=C1)C(O)=O SMFQZMGHCODUPQ-ULQDDVLXSA-N 0.000 description 2
- DLZBBDSPTJBOOD-BPNCWPANSA-N Pro-Tyr-Ala Chemical compound [H]N1CCC[C@H]1C(=O)N[C@@H](CC1=CC=C(O)C=C1)C(=O)N[C@@H](C)C(O)=O DLZBBDSPTJBOOD-BPNCWPANSA-N 0.000 description 2
- OOZJHTXCLJUODH-QXEWZRGKSA-N Pro-Val-Asp Chemical compound OC(=O)C[C@@H](C(O)=O)NC(=O)[C@H](C(C)C)NC(=O)[C@@H]1CCCN1 OOZJHTXCLJUODH-QXEWZRGKSA-N 0.000 description 2
- 239000004365 Protease Substances 0.000 description 2
- 108010092799 RNA-directed DNA polymerase Proteins 0.000 description 2
- LVVBAKCGXXUHFO-ZLUOBGJFSA-N Ser-Ala-Asp Chemical compound [H]N[C@@H](CO)C(=O)N[C@@H](C)C(=O)N[C@@H](CC(O)=O)C(O)=O LVVBAKCGXXUHFO-ZLUOBGJFSA-N 0.000 description 2
- GXXTUIUYTWGPMV-FXQIFTODSA-N Ser-Arg-Ala Chemical compound [H]N[C@@H](CO)C(=O)N[C@@H](CCCNC(N)=N)C(=O)N[C@@H](C)C(O)=O GXXTUIUYTWGPMV-FXQIFTODSA-N 0.000 description 2
- MESDJCNHLZBMEP-ZLUOBGJFSA-N Ser-Asp-Asp Chemical compound [H]N[C@@H](CO)C(=O)N[C@@H](CC(O)=O)C(=O)N[C@@H](CC(O)=O)C(O)=O MESDJCNHLZBMEP-ZLUOBGJFSA-N 0.000 description 2
- DGPGKMKUNGKHPK-QEJZJMRPSA-N Ser-Gln-Trp Chemical compound C1=CC=C2C(=C1)C(=CN2)C[C@@H](C(=O)O)NC(=O)[C@H](CCC(=O)N)NC(=O)[C@H](CO)N DGPGKMKUNGKHPK-QEJZJMRPSA-N 0.000 description 2
- HJEBZBMOTCQYDN-ACZMJKKPSA-N Ser-Glu-Asp Chemical compound [H]N[C@@H](CO)C(=O)N[C@@H](CCC(O)=O)C(=O)N[C@@H](CC(O)=O)C(O)=O HJEBZBMOTCQYDN-ACZMJKKPSA-N 0.000 description 2
- KDGARKCAKHBEDB-NKWVEPMBSA-N Ser-Gly-Pro Chemical compound C1C[C@@H](N(C1)C(=O)CNC(=O)[C@H](CO)N)C(=O)O KDGARKCAKHBEDB-NKWVEPMBSA-N 0.000 description 2
- CICQXRWZNVXFCU-SRVKXCTJSA-N Ser-His-Leu Chemical compound [H]N[C@@H](CO)C(=O)N[C@@H](CC1=CNC=N1)C(=O)N[C@@H](CC(C)C)C(O)=O CICQXRWZNVXFCU-SRVKXCTJSA-N 0.000 description 2
- RIAKPZVSNBBNRE-BJDJZHNGSA-N Ser-Ile-Leu Chemical compound OC[C@H](N)C(=O)N[C@@H]([C@@H](C)CC)C(=O)N[C@@H](CC(C)C)C(O)=O RIAKPZVSNBBNRE-BJDJZHNGSA-N 0.000 description 2
- LWMQRHDTXHQQOV-MXAVVETBSA-N Ser-Ile-Phe Chemical compound [H]N[C@@H](CO)C(=O)N[C@@H]([C@@H](C)CC)C(=O)N[C@@H](CC1=CC=CC=C1)C(O)=O LWMQRHDTXHQQOV-MXAVVETBSA-N 0.000 description 2
- NLOAIFSWUUFQFR-CIUDSAMLSA-N Ser-Leu-Asp Chemical compound [H]N[C@@H](CO)C(=O)N[C@@H](CC(C)C)C(=O)N[C@@H](CC(O)=O)C(O)=O NLOAIFSWUUFQFR-CIUDSAMLSA-N 0.000 description 2
- HEUVHBXOVZONPU-BJDJZHNGSA-N Ser-Leu-Ile Chemical compound [H]N[C@@H](CO)C(=O)N[C@@H](CC(C)C)C(=O)N[C@@H]([C@@H](C)CC)C(O)=O HEUVHBXOVZONPU-BJDJZHNGSA-N 0.000 description 2
- XXNYYSXNXCJYKX-DCAQKATOSA-N Ser-Leu-Met Chemical compound [H]N[C@@H](CO)C(=O)N[C@@H](CC(C)C)C(=O)N[C@@H](CCSC)C(O)=O XXNYYSXNXCJYKX-DCAQKATOSA-N 0.000 description 2
- VZQRNAYURWAEFE-KKUMJFAQSA-N Ser-Leu-Phe Chemical compound OC[C@H](N)C(=O)N[C@@H](CC(C)C)C(=O)N[C@H](C(O)=O)CC1=CC=CC=C1 VZQRNAYURWAEFE-KKUMJFAQSA-N 0.000 description 2
- YUJLIIRMIAGMCQ-CIUDSAMLSA-N Ser-Leu-Ser Chemical compound [H]N[C@@H](CO)C(=O)N[C@@H](CC(C)C)C(=O)N[C@@H](CO)C(O)=O YUJLIIRMIAGMCQ-CIUDSAMLSA-N 0.000 description 2
- HDBOEVPDIDDEPC-CIUDSAMLSA-N Ser-Lys-Asn Chemical compound [H]N[C@@H](CO)C(=O)N[C@@H](CCCCN)C(=O)N[C@@H](CC(N)=O)C(O)=O HDBOEVPDIDDEPC-CIUDSAMLSA-N 0.000 description 2
- LRZLZIUXQBIWTB-KATARQTJSA-N Ser-Lys-Thr Chemical compound [H]N[C@@H](CO)C(=O)N[C@@H](CCCCN)C(=O)N[C@@H]([C@@H](C)O)C(O)=O LRZLZIUXQBIWTB-KATARQTJSA-N 0.000 description 2
- KZPRPBLHYMZIMH-MXAVVETBSA-N Ser-Phe-Ile Chemical compound [H]N[C@@H](CO)C(=O)N[C@@H](CC1=CC=CC=C1)C(=O)N[C@@H]([C@@H](C)CC)C(O)=O KZPRPBLHYMZIMH-MXAVVETBSA-N 0.000 description 2
- RWDVVSKYZBNDCO-MELADBBJSA-N Ser-Phe-Pro Chemical compound C1C[C@@H](N(C1)C(=O)[C@H](CC2=CC=CC=C2)NC(=O)[C@H](CO)N)C(=O)O RWDVVSKYZBNDCO-MELADBBJSA-N 0.000 description 2
- FBLNYDYPCLFTSP-IXOXFDKPSA-N Ser-Phe-Thr Chemical compound [H]N[C@@H](CO)C(=O)N[C@@H](CC1=CC=CC=C1)C(=O)N[C@@H]([C@@H](C)O)C(O)=O FBLNYDYPCLFTSP-IXOXFDKPSA-N 0.000 description 2
- MHVXPTAMDHLTHB-IHPCNDPISA-N Ser-Phe-Trp Chemical compound C([C@H](NC(=O)[C@H](CO)N)C(=O)N[C@@H](CC=1C2=CC=CC=C2NC=1)C(O)=O)C1=CC=CC=C1 MHVXPTAMDHLTHB-IHPCNDPISA-N 0.000 description 2
- DINQYZRMXGWWTG-GUBZILKMSA-N Ser-Pro-Pro Chemical compound OC[C@H](N)C(=O)N1CCC[C@H]1C(=O)N1[C@H](C(O)=O)CCC1 DINQYZRMXGWWTG-GUBZILKMSA-N 0.000 description 2
- XZKQVQKUZMAADP-IMJSIDKUSA-N Ser-Ser Chemical compound OC[C@H](N)C(=O)N[C@@H](CO)C(O)=O XZKQVQKUZMAADP-IMJSIDKUSA-N 0.000 description 2
- OZPDGESCTGGNAD-CIUDSAMLSA-N Ser-Ser-Lys Chemical compound NCCCC[C@@H](C(O)=O)NC(=O)[C@H](CO)NC(=O)[C@@H](N)CO OZPDGESCTGGNAD-CIUDSAMLSA-N 0.000 description 2
- BDMWLJLPPUCLNV-XGEHTFHBSA-N Ser-Thr-Val Chemical compound [H]N[C@@H](CO)C(=O)N[C@@H]([C@@H](C)O)C(=O)N[C@@H](C(C)C)C(O)=O BDMWLJLPPUCLNV-XGEHTFHBSA-N 0.000 description 2
- FHXGMDRKJHKLKW-QWRGUYRKSA-N Ser-Tyr-Gly Chemical compound OC[C@H](N)C(=O)N[C@H](C(=O)NCC(O)=O)CC1=CC=C(O)C=C1 FHXGMDRKJHKLKW-QWRGUYRKSA-N 0.000 description 2
- PLQWGQUNUPMNOD-KKUMJFAQSA-N Ser-Tyr-Leu Chemical compound [H]N[C@@H](CO)C(=O)N[C@@H](CC1=CC=C(O)C=C1)C(=O)N[C@@H](CC(C)C)C(O)=O PLQWGQUNUPMNOD-KKUMJFAQSA-N 0.000 description 2
- ZVBCMFDJIMUELU-BZSNNMDCSA-N Ser-Tyr-Phe Chemical compound C1=CC=C(C=C1)C[C@@H](C(=O)O)NC(=O)[C@H](CC2=CC=C(C=C2)O)NC(=O)[C@H](CO)N ZVBCMFDJIMUELU-BZSNNMDCSA-N 0.000 description 2
- MFQMZDPAZRZAPV-NAKRPEOUSA-N Ser-Val-Ile Chemical compound CC[C@H](C)[C@@H](C(=O)O)NC(=O)[C@H](C(C)C)NC(=O)[C@H](CO)N MFQMZDPAZRZAPV-NAKRPEOUSA-N 0.000 description 2
- SIEBDTCABMZCLF-XGEHTFHBSA-N Ser-Val-Thr Chemical compound [H]N[C@@H](CO)C(=O)N[C@@H](C(C)C)C(=O)N[C@@H]([C@@H](C)O)C(O)=O SIEBDTCABMZCLF-XGEHTFHBSA-N 0.000 description 2
- 108010006785 Taq Polymerase Proteins 0.000 description 2
- NJEMRSFGDNECGF-GCJQMDKQSA-N Thr-Ala-Asp Chemical compound C[C@@H](O)[C@H](N)C(=O)N[C@@H](C)C(=O)N[C@H](C(O)=O)CC(O)=O NJEMRSFGDNECGF-GCJQMDKQSA-N 0.000 description 2
- YBXMGKCLOPDEKA-NUMRIWBASA-N Thr-Asp-Glu Chemical compound [H]N[C@@H]([C@@H](C)O)C(=O)N[C@@H](CC(O)=O)C(=O)N[C@@H](CCC(O)=O)C(O)=O YBXMGKCLOPDEKA-NUMRIWBASA-N 0.000 description 2
- GARULAKWZGFIKC-RWRJDSDZSA-N Thr-Gln-Ile Chemical compound [H]N[C@@H]([C@@H](C)O)C(=O)N[C@@H](CCC(N)=O)C(=O)N[C@@H]([C@@H](C)CC)C(O)=O GARULAKWZGFIKC-RWRJDSDZSA-N 0.000 description 2
- LAFLAXHTDVNVEL-WDCWCFNPSA-N Thr-Gln-Lys Chemical compound C[C@H]([C@@H](C(=O)N[C@@H](CCC(=O)N)C(=O)N[C@@H](CCCCN)C(=O)O)N)O LAFLAXHTDVNVEL-WDCWCFNPSA-N 0.000 description 2
- LKEKWDJCJSPXNI-IRIUXVKKSA-N Thr-Glu-Tyr Chemical compound C[C@@H](O)[C@H](N)C(=O)N[C@@H](CCC(O)=O)C(=O)N[C@H](C(O)=O)CC1=CC=C(O)C=C1 LKEKWDJCJSPXNI-IRIUXVKKSA-N 0.000 description 2
- IQHUITKNHOKGFC-MIMYLULJSA-N Thr-Phe Chemical compound C[C@@H](O)[C@H](N)C(=O)N[C@H](C(O)=O)CC1=CC=CC=C1 IQHUITKNHOKGFC-MIMYLULJSA-N 0.000 description 2
- WVVOFCVMHAXGLE-LFSVMHDDSA-N Thr-Phe-Ala Chemical compound [H]N[C@@H]([C@@H](C)O)C(=O)N[C@@H](CC1=CC=CC=C1)C(=O)N[C@@H](C)C(O)=O WVVOFCVMHAXGLE-LFSVMHDDSA-N 0.000 description 2
- GVMXJJAJLIEASL-ZJDVBMNYSA-N Thr-Pro-Thr Chemical compound C[C@@H](O)[C@H](N)C(=O)N1CCC[C@H]1C(=O)N[C@@H]([C@@H](C)O)C(O)=O GVMXJJAJLIEASL-ZJDVBMNYSA-N 0.000 description 2
- ABCLYRRGTZNIFU-BWAGICSOSA-N Thr-Tyr-His Chemical compound C[C@H]([C@@H](C(=O)N[C@@H](CC1=CC=C(C=C1)O)C(=O)N[C@@H](CC2=CN=CN2)C(=O)O)N)O ABCLYRRGTZNIFU-BWAGICSOSA-N 0.000 description 2
- CKHWEVXPLJBEOZ-VQVTYTSYSA-N Thr-Val Chemical compound CC(C)[C@@H](C([O-])=O)NC(=O)[C@@H]([NH3+])[C@@H](C)O CKHWEVXPLJBEOZ-VQVTYTSYSA-N 0.000 description 2
- OGOYMQWIWHGTGH-KZVJFYERSA-N Thr-Val-Ala Chemical compound C[C@@H](O)[C@H](N)C(=O)N[C@@H](C(C)C)C(=O)N[C@@H](C)C(O)=O OGOYMQWIWHGTGH-KZVJFYERSA-N 0.000 description 2
- MNYNCKZAEIAONY-XGEHTFHBSA-N Thr-Val-Ser Chemical compound C[C@@H](O)[C@H](N)C(=O)N[C@@H](C(C)C)C(=O)N[C@@H](CO)C(O)=O MNYNCKZAEIAONY-XGEHTFHBSA-N 0.000 description 2
- KZTLZZQTJMCGIP-ZJDVBMNYSA-N Thr-Val-Thr Chemical compound [H]N[C@@H]([C@@H](C)O)C(=O)N[C@@H](C(C)C)C(=O)N[C@@H]([C@@H](C)O)C(O)=O KZTLZZQTJMCGIP-ZJDVBMNYSA-N 0.000 description 2
- KIMOCKLJBXHFIN-YLVFBTJISA-N Trp-Ile-Gly Chemical compound C1=CC=C2C(C[C@H](N)C(=O)N[C@@H]([C@@H](C)CC)C(=O)NCC(O)=O)=CNC2=C1 KIMOCKLJBXHFIN-YLVFBTJISA-N 0.000 description 2
- YLGQHMHKAASRGJ-WDSOQIARSA-N Trp-Leu-Met Chemical compound CC(C)C[C@@H](C(=O)N[C@@H](CCSC)C(=O)O)NC(=O)[C@H](CC1=CNC2=CC=CC=C21)N YLGQHMHKAASRGJ-WDSOQIARSA-N 0.000 description 2
- 102000004243 Tubulin Human genes 0.000 description 2
- 108090000704 Tubulin Proteins 0.000 description 2
- XMNDQSYABVWZRK-BZSNNMDCSA-N Tyr-Asn-Phe Chemical compound [H]N[C@@H](CC1=CC=C(O)C=C1)C(=O)N[C@@H](CC(N)=O)C(=O)N[C@@H](CC1=CC=CC=C1)C(O)=O XMNDQSYABVWZRK-BZSNNMDCSA-N 0.000 description 2
- JRXKIVGWMMIIOF-YDHLFZDLSA-N Tyr-Asn-Val Chemical compound CC(C)[C@@H](C(=O)O)NC(=O)[C@H](CC(=O)N)NC(=O)[C@H](CC1=CC=C(C=C1)O)N JRXKIVGWMMIIOF-YDHLFZDLSA-N 0.000 description 2
- PDSLRCZINIDLMU-QWRGUYRKSA-N Tyr-Glu Chemical compound OC(=O)CC[C@@H](C(O)=O)NC(=O)[C@@H](N)CC1=CC=C(O)C=C1 PDSLRCZINIDLMU-QWRGUYRKSA-N 0.000 description 2
- HKYTWJOWZTWBQB-AVGNSLFASA-N Tyr-Glu-Asp Chemical compound OC(=O)C[C@@H](C(O)=O)NC(=O)[C@H](CCC(O)=O)NC(=O)[C@@H](N)CC1=CC=C(O)C=C1 HKYTWJOWZTWBQB-AVGNSLFASA-N 0.000 description 2
- KCPFDGNYAMKZQP-KBPBESRZSA-N Tyr-Gly-Leu Chemical compound [H]N[C@@H](CC1=CC=C(O)C=C1)C(=O)NCC(=O)N[C@@H](CC(C)C)C(O)=O KCPFDGNYAMKZQP-KBPBESRZSA-N 0.000 description 2
- NXRGXTBPMOGFID-CFMVVWHZSA-N Tyr-Ile-Asn Chemical compound [H]N[C@@H](CC1=CC=C(O)C=C1)C(=O)N[C@@H]([C@@H](C)CC)C(=O)N[C@@H](CC(N)=O)C(O)=O NXRGXTBPMOGFID-CFMVVWHZSA-N 0.000 description 2
- BYAKMYBZADCNMN-JYJNAYRXSA-N Tyr-Lys-Gln Chemical compound [H]N[C@@H](CC1=CC=C(O)C=C1)C(=O)N[C@@H](CCCCN)C(=O)N[C@@H](CCC(N)=O)C(O)=O BYAKMYBZADCNMN-JYJNAYRXSA-N 0.000 description 2
- RGYCVIZZTUBSSG-JYJNAYRXSA-N Tyr-Pro-Val Chemical compound [H]N[C@@H](CC1=CC=C(O)C=C1)C(=O)N1CCC[C@H]1C(=O)N[C@@H](C(C)C)C(O)=O RGYCVIZZTUBSSG-JYJNAYRXSA-N 0.000 description 2
- ZLFHAAGHGQBQQN-GUBZILKMSA-N Val-Ala-Pro Natural products CC(C)[C@H](N)C(=O)N[C@@H](C)C(=O)N1CCC[C@H]1C(O)=O ZLFHAAGHGQBQQN-GUBZILKMSA-N 0.000 description 2
- JOQSQZFKFYJKKJ-GUBZILKMSA-N Val-Arg-Cys Chemical compound CC(C)[C@@H](C(=O)N[C@@H](CCCN=C(N)N)C(=O)N[C@@H](CS)C(=O)O)N JOQSQZFKFYJKKJ-GUBZILKMSA-N 0.000 description 2
- CVUDMNSZAIZFAE-TUAOUCFPSA-N Val-Arg-Pro Chemical compound CC(C)[C@@H](C(=O)N[C@@H](CCCN=C(N)N)C(=O)N1CCC[C@@H]1C(=O)O)N CVUDMNSZAIZFAE-TUAOUCFPSA-N 0.000 description 2
- DCOOGDCRFXXQNW-ZKWXMUAHSA-N Val-Asn-Cys Chemical compound CC(C)[C@@H](C(=O)N[C@@H](CC(=O)N)C(=O)N[C@@H](CS)C(=O)O)N DCOOGDCRFXXQNW-ZKWXMUAHSA-N 0.000 description 2
- NWDOPHYLSORNEX-QXEWZRGKSA-N Val-Asn-Met Chemical compound CC(C)[C@@H](C(=O)N[C@@H](CC(=O)N)C(=O)N[C@@H](CCSC)C(=O)O)N NWDOPHYLSORNEX-QXEWZRGKSA-N 0.000 description 2
- JLFKWDAZBRYCGX-ZKWXMUAHSA-N Val-Asn-Ser Chemical compound CC(C)[C@@H](C(=O)N[C@@H](CC(=O)N)C(=O)N[C@@H](CO)C(=O)O)N JLFKWDAZBRYCGX-ZKWXMUAHSA-N 0.000 description 2
- XLDYBRXERHITNH-QSFUFRPTSA-N Val-Asp-Ile Chemical compound CC[C@H](C)[C@@H](C(O)=O)NC(=O)[C@H](CC(O)=O)NC(=O)[C@@H](N)C(C)C XLDYBRXERHITNH-QSFUFRPTSA-N 0.000 description 2
- BMGOFDMKDVVGJG-NHCYSSNCSA-N Val-Asp-Lys Chemical compound CC(C)[C@@H](C(=O)N[C@@H](CC(=O)O)C(=O)N[C@@H](CCCCN)C(=O)O)N BMGOFDMKDVVGJG-NHCYSSNCSA-N 0.000 description 2
- OQWNEUXPKHIEJO-NRPADANISA-N Val-Glu-Ser Chemical compound CC(C)[C@@H](C(=O)N[C@@H](CCC(=O)O)C(=O)N[C@@H](CO)C(=O)O)N OQWNEUXPKHIEJO-NRPADANISA-N 0.000 description 2
- FEFZWCSXEMVSPO-LSJOCFKGSA-N Val-His-Ala Chemical compound CC(C)[C@H](N)C(=O)N[C@@H](Cc1cnc[nH]1)C(=O)N[C@@H](C)C(O)=O FEFZWCSXEMVSPO-LSJOCFKGSA-N 0.000 description 2
- JZWZACGUZVCQPS-RNJOBUHISA-N Val-Ile-Pro Chemical compound CC[C@H](C)[C@@H](C(=O)N1CCC[C@@H]1C(=O)O)NC(=O)[C@H](C(C)C)N JZWZACGUZVCQPS-RNJOBUHISA-N 0.000 description 2
- WSUWDIVCPOJFCX-TUAOUCFPSA-N Val-Met-Pro Chemical compound CC(C)[C@@H](C(=O)N[C@@H](CCSC)C(=O)N1CCC[C@@H]1C(=O)O)N WSUWDIVCPOJFCX-TUAOUCFPSA-N 0.000 description 2
- MJFSRZZJQWZHFQ-SRVKXCTJSA-N Val-Met-Val Chemical compound CC(C)[C@@H](C(=O)N[C@@H](CCSC)C(=O)N[C@@H](C(C)C)C(=O)O)N MJFSRZZJQWZHFQ-SRVKXCTJSA-N 0.000 description 2
- SSYBNWFXCFNRFN-GUBZILKMSA-N Val-Pro-Ser Chemical compound CC(C)[C@H](N)C(=O)N1CCC[C@H]1C(=O)N[C@@H](CO)C(O)=O SSYBNWFXCFNRFN-GUBZILKMSA-N 0.000 description 2
- GBIUHAYJGWVNLN-UHFFFAOYSA-N Val-Ser-Pro Natural products CC(C)C(N)C(=O)NC(CO)C(=O)N1CCCC1C(O)=O GBIUHAYJGWVNLN-UHFFFAOYSA-N 0.000 description 2
- PZTZYZUTCPZWJH-FXQIFTODSA-N Val-Ser-Ser Chemical compound CC(C)[C@@H](C(=O)N[C@@H](CO)C(=O)N[C@@H](CO)C(=O)O)N PZTZYZUTCPZWJH-FXQIFTODSA-N 0.000 description 2
- UJMCYJKPDFQLHX-XGEHTFHBSA-N Val-Ser-Thr Chemical compound C[C@H]([C@@H](C(=O)O)NC(=O)[C@H](CO)NC(=O)[C@H](C(C)C)N)O UJMCYJKPDFQLHX-XGEHTFHBSA-N 0.000 description 2
- NZYNRRGJJVSSTJ-GUBZILKMSA-N Val-Ser-Val Chemical compound CC(C)[C@H](N)C(=O)N[C@@H](CO)C(=O)N[C@@H](C(C)C)C(O)=O NZYNRRGJJVSSTJ-GUBZILKMSA-N 0.000 description 2
- GVRKWABULJAONN-VQVTYTSYSA-N Val-Thr Chemical compound CC(C)[C@H](N)C(=O)N[C@@H]([C@@H](C)O)C(O)=O GVRKWABULJAONN-VQVTYTSYSA-N 0.000 description 2
- PDDJTOSAVNRJRH-UNQGMJICSA-N Val-Thr-Phe Chemical compound C[C@H]([C@@H](C(=O)N[C@@H](CC1=CC=CC=C1)C(=O)O)NC(=O)[C@H](C(C)C)N)O PDDJTOSAVNRJRH-UNQGMJICSA-N 0.000 description 2
- DVLWZWNAQUBZBC-ZNSHCXBVSA-N Val-Thr-Pro Chemical compound C[C@H]([C@@H](C(=O)N1CCC[C@@H]1C(=O)O)NC(=O)[C@H](C(C)C)N)O DVLWZWNAQUBZBC-ZNSHCXBVSA-N 0.000 description 2
- KJFBXCFOPAKPTM-BZSNNMDCSA-N Val-Trp-Val Chemical compound C1=CC=C2C(C[C@H](NC(=O)[C@@H](N)C(C)C)C(=O)N[C@@H](C(C)C)C(O)=O)=CNC2=C1 KJFBXCFOPAKPTM-BZSNNMDCSA-N 0.000 description 2
- QPJSIBAOZBVELU-BPNCWPANSA-N Val-Tyr-Ala Chemical compound C[C@@H](C(=O)O)NC(=O)[C@H](CC1=CC=C(C=C1)O)NC(=O)[C@H](C(C)C)N QPJSIBAOZBVELU-BPNCWPANSA-N 0.000 description 2
- JSOXWWFKRJKTMT-WOPDTQHZSA-N Val-Val-Pro Chemical compound CC(C)[C@@H](C(=O)N[C@@H](C(C)C)C(=O)N1CCC[C@@H]1C(=O)O)N JSOXWWFKRJKTMT-WOPDTQHZSA-N 0.000 description 2
- 241000700605 Viruses Species 0.000 description 2
- FKNHDDTXBWMZIR-GEMLJDPKSA-N acetic acid;(2s)-1-[(2r)-2-amino-3-sulfanylpropanoyl]pyrrolidine-2-carboxylic acid Chemical compound CC(O)=O.SC[C@H](N)C(=O)N1CCC[C@H]1C(O)=O FKNHDDTXBWMZIR-GEMLJDPKSA-N 0.000 description 2
- 238000001042 affinity chromatography Methods 0.000 description 2
- 239000000556 agonist Substances 0.000 description 2
- 108010087924 alanylproline Proteins 0.000 description 2
- KOSRFJWDECSPRO-UHFFFAOYSA-N alpha-L-glutamyl-L-glutamic acid Natural products OC(=O)CCC(N)C(=O)NC(CCC(O)=O)C(O)=O KOSRFJWDECSPRO-UHFFFAOYSA-N 0.000 description 2
- 230000004075 alteration Effects 0.000 description 2
- 230000000890 antigenic effect Effects 0.000 description 2
- 239000004599 antimicrobial Substances 0.000 description 2
- 238000013459 approach Methods 0.000 description 2
- 239000007864 aqueous solution Substances 0.000 description 2
- WQZGKKKJIJFFOK-VFUOTHLCSA-N beta-D-glucose Chemical compound OC[C@H]1O[C@@H](O)[C@H](O)[C@@H](O)[C@@H]1O WQZGKKKJIJFFOK-VFUOTHLCSA-N 0.000 description 2
- 230000004071 biological effect Effects 0.000 description 2
- 230000015572 biosynthetic process Effects 0.000 description 2
- 229960002685 biotin Drugs 0.000 description 2
- 239000011616 biotin Substances 0.000 description 2
- 230000003915 cell function Effects 0.000 description 2
- 108700010039 chimeric receptor Proteins 0.000 description 2
- 230000009137 competitive binding Effects 0.000 description 2
- 239000003184 complementary RNA Substances 0.000 description 2
- 125000004122 cyclic group Chemical group 0.000 description 2
- 235000018417 cysteine Nutrition 0.000 description 2
- 108010016616 cysteinylglycine Proteins 0.000 description 2
- 230000002950 deficient Effects 0.000 description 2
- 239000003599 detergent Substances 0.000 description 2
- 235000014113 dietary fatty acids Nutrition 0.000 description 2
- 239000003814 drug Substances 0.000 description 2
- 239000000839 emulsion Substances 0.000 description 2
- 229930195729 fatty acid Natural products 0.000 description 2
- 239000000194 fatty acid Substances 0.000 description 2
- 150000004665 fatty acids Chemical class 0.000 description 2
- 238000002866 fluorescence resonance energy transfer Methods 0.000 description 2
- 239000007850 fluorescent dye Substances 0.000 description 2
- 108010078144 glutaminyl-glycine Proteins 0.000 description 2
- 108010049041 glutamylalanine Proteins 0.000 description 2
- 108010077515 glycylproline Proteins 0.000 description 2
- 108010087823 glycyltyrosine Proteins 0.000 description 2
- 238000013537 high throughput screening Methods 0.000 description 2
- 108010040030 histidinoalanine Proteins 0.000 description 2
- 108010045383 histidyl-glycyl-glutamic acid Proteins 0.000 description 2
- 108010028295 histidylhistidine Proteins 0.000 description 2
- 238000002169 hydrotherapy Methods 0.000 description 2
- 125000002887 hydroxy group Chemical group [H]O* 0.000 description 2
- 230000001900 immune effect Effects 0.000 description 2
- 230000000977 initiatory effect Effects 0.000 description 2
- 238000002347 injection Methods 0.000 description 2
- 239000007924 injection Substances 0.000 description 2
- 108010091871 leucylmethionine Proteins 0.000 description 2
- 238000011694 lewis rat Methods 0.000 description 2
- 210000004185 liver Anatomy 0.000 description 2
- 244000144972 livestock Species 0.000 description 2
- 108010057952 lysyl-phenylalanyl-lysine Proteins 0.000 description 2
- 239000003550 marker Substances 0.000 description 2
- 230000007246 mechanism Effects 0.000 description 2
- 229930182817 methionine Natural products 0.000 description 2
- 229930014626 natural product Natural products 0.000 description 2
- 238000007857 nested PCR Methods 0.000 description 2
- 231100000252 nontoxic Toxicity 0.000 description 2
- 230000003000 nontoxic effect Effects 0.000 description 2
- 210000004940 nucleus Anatomy 0.000 description 2
- 210000001517 olfactory receptor neuron Anatomy 0.000 description 2
- 229940124276 oligodeoxyribonucleotide Drugs 0.000 description 2
- 230000003287 optical effect Effects 0.000 description 2
- 239000002245 particle Substances 0.000 description 2
- 239000000575 pesticide Substances 0.000 description 2
- 239000000825 pharmaceutical preparation Substances 0.000 description 2
- 108010065135 phenylalanyl-phenylalanyl-phenylalanine Proteins 0.000 description 2
- 108010083476 phenylalanyltryptophan Proteins 0.000 description 2
- 150000004713 phosphodiesters Chemical class 0.000 description 2
- 230000004962 physiological condition Effects 0.000 description 2
- 230000003389 potentiating effect Effects 0.000 description 2
- 239000003755 preservative agent Substances 0.000 description 2
- 230000037452 priming Effects 0.000 description 2
- 238000000746 purification Methods 0.000 description 2
- 229940044551 receptor antagonist Drugs 0.000 description 2
- 230000004044 response Effects 0.000 description 2
- 238000005204 segregation Methods 0.000 description 2
- 239000002904 solvent Substances 0.000 description 2
- 210000000952 spleen Anatomy 0.000 description 2
- 238000012453 sprague-dawley rat model Methods 0.000 description 2
- 150000003431 steroids Chemical class 0.000 description 2
- 150000008163 sugars Chemical class 0.000 description 2
- 230000008093 supporting effect Effects 0.000 description 2
- 239000000725 suspension Substances 0.000 description 2
- 238000003786 synthesis reaction Methods 0.000 description 2
- 108010061238 threonyl-glycine Proteins 0.000 description 2
- 108010072986 threonyl-seryl-lysine Proteins 0.000 description 2
- 108010036387 trimethionine Proteins 0.000 description 2
- 108010080629 tryptophan-leucine Proteins 0.000 description 2
- 108010035534 tyrosyl-leucyl-alanine Proteins 0.000 description 2
- 210000002700 urine Anatomy 0.000 description 2
- IBIDRSSEHFLGSD-UHFFFAOYSA-N valinyl-arginine Natural products CC(C)C(N)C(=O)NC(C(O)=O)CCCN=C(N)N IBIDRSSEHFLGSD-UHFFFAOYSA-N 0.000 description 2
- 108010009962 valyltyrosine Proteins 0.000 description 2
- CWFMWBHMIMNZLN-NAKRPEOUSA-N (2s)-1-[(2s)-2-[[(2s,3s)-2-amino-3-methylpentanoyl]amino]propanoyl]pyrrolidine-2-carboxylic acid Chemical compound CC[C@H](C)[C@H](N)C(=O)N[C@@H](C)C(=O)N1CCC[C@H]1C(O)=O CWFMWBHMIMNZLN-NAKRPEOUSA-N 0.000 description 1
- DLFVBJFMPXGRIB-UHFFFAOYSA-N Acetamide Chemical class CC(N)=O DLFVBJFMPXGRIB-UHFFFAOYSA-N 0.000 description 1
- 206010001488 Aggression Diseases 0.000 description 1
- YYSWCHMLFJLLBJ-ZLUOBGJFSA-N Ala-Ala-Ser Chemical compound C[C@H](N)C(=O)N[C@@H](C)C(=O)N[C@@H](CO)C(O)=O YYSWCHMLFJLLBJ-ZLUOBGJFSA-N 0.000 description 1
- SHYYAQLDNVHPFT-DLOVCJGASA-N Ala-Asn-Phe Chemical compound C[C@H](N)C(=O)N[C@@H](CC(N)=O)C(=O)N[C@H](C(O)=O)CC1=CC=CC=C1 SHYYAQLDNVHPFT-DLOVCJGASA-N 0.000 description 1
- GORKKVHIBWAQHM-GCJQMDKQSA-N Ala-Asn-Thr Chemical compound [H]N[C@@H](C)C(=O)N[C@@H](CC(N)=O)C(=O)N[C@@H]([C@@H](C)O)C(O)=O GORKKVHIBWAQHM-GCJQMDKQSA-N 0.000 description 1
- GSCLWXDNIMNIJE-ZLUOBGJFSA-N Ala-Asp-Asn Chemical compound [H]N[C@@H](C)C(=O)N[C@@H](CC(O)=O)C(=O)N[C@@H](CC(N)=O)C(O)=O GSCLWXDNIMNIJE-ZLUOBGJFSA-N 0.000 description 1
- FRFDXQWNDZMREB-ACZMJKKPSA-N Ala-Cys-Gln Chemical compound [H]N[C@@H](C)C(=O)N[C@@H](CS)C(=O)N[C@@H](CCC(N)=O)C(O)=O FRFDXQWNDZMREB-ACZMJKKPSA-N 0.000 description 1
- UQJUGHFKNKGHFQ-VZFHVOOUSA-N Ala-Cys-Thr Chemical compound [H]N[C@@H](C)C(=O)N[C@@H](CS)C(=O)N[C@@H]([C@@H](C)O)C(O)=O UQJUGHFKNKGHFQ-VZFHVOOUSA-N 0.000 description 1
- FVSOUJZKYWEFOB-KBIXCLLPSA-N Ala-Gln-Ile Chemical compound CC[C@H](C)[C@@H](C(O)=O)NC(=O)[C@H](CCC(N)=O)NC(=O)[C@H](C)N FVSOUJZKYWEFOB-KBIXCLLPSA-N 0.000 description 1
- LMFXXZPPZDCPTA-ZKWXMUAHSA-N Ala-Gly-Ile Chemical compound CC[C@H](C)[C@@H](C(O)=O)NC(=O)CNC(=O)[C@H](C)N LMFXXZPPZDCPTA-ZKWXMUAHSA-N 0.000 description 1
- PCIFXPRIFWKWLK-YUMQZZPRSA-N Ala-Gly-Leu Chemical compound CC(C)C[C@@H](C(O)=O)NC(=O)CNC(=O)[C@H](C)N PCIFXPRIFWKWLK-YUMQZZPRSA-N 0.000 description 1
- CWEAKSWWKHGTRJ-BQBZGAKWSA-N Ala-Gly-Met Chemical compound [H]N[C@@H](C)C(=O)NCC(=O)N[C@@H](CCSC)C(O)=O CWEAKSWWKHGTRJ-BQBZGAKWSA-N 0.000 description 1
- QHASENCZLDHBGX-ONGXEEELSA-N Ala-Gly-Phe Chemical compound C[C@H](N)C(=O)NCC(=O)N[C@H](C(O)=O)CC1=CC=CC=C1 QHASENCZLDHBGX-ONGXEEELSA-N 0.000 description 1
- NBTGEURICRTMGL-WHFBIAKZSA-N Ala-Gly-Ser Chemical compound C[C@H](N)C(=O)NCC(=O)N[C@@H](CO)C(O)=O NBTGEURICRTMGL-WHFBIAKZSA-N 0.000 description 1
- XZWXFWBHYRFLEF-FSPLSTOPSA-N Ala-His Chemical compound C[C@H](N)C(=O)N[C@H](C(O)=O)CC1=CN=CN1 XZWXFWBHYRFLEF-FSPLSTOPSA-N 0.000 description 1
- ZPXCNXMJEZKRLU-LSJOCFKGSA-N Ala-His-Arg Chemical compound NC(N)=NCCC[C@@H](C(O)=O)NC(=O)[C@@H](NC(=O)[C@@H](N)C)CC1=CN=CN1 ZPXCNXMJEZKRLU-LSJOCFKGSA-N 0.000 description 1
- NJWJSLCQEDMGNC-MBLNEYKQSA-N Ala-His-Thr Chemical compound C[C@H]([C@@H](C(=O)O)NC(=O)[C@H](CC1=CN=CN1)NC(=O)[C@H](C)N)O NJWJSLCQEDMGNC-MBLNEYKQSA-N 0.000 description 1
- PNALXAODQKTNLV-JBDRJPRFSA-N Ala-Ile-Ala Chemical compound C[C@H](N)C(=O)N[C@@H]([C@@H](C)CC)C(=O)N[C@@H](C)C(O)=O PNALXAODQKTNLV-JBDRJPRFSA-N 0.000 description 1
- SOBIAADAMRHGKH-CIUDSAMLSA-N Ala-Leu-Ser Chemical compound [H]N[C@@H](C)C(=O)N[C@@H](CC(C)C)C(=O)N[C@@H](CO)C(O)=O SOBIAADAMRHGKH-CIUDSAMLSA-N 0.000 description 1
- VHEVVUZDDUCAKU-FXQIFTODSA-N Ala-Met-Asp Chemical compound [H]N[C@@H](C)C(=O)N[C@@H](CCSC)C(=O)N[C@@H](CC(O)=O)C(O)=O VHEVVUZDDUCAKU-FXQIFTODSA-N 0.000 description 1
- MAEQBGQTDWDSJQ-LSJOCFKGSA-N Ala-Met-His Chemical compound C[C@@H](C(=O)N[C@@H](CCSC)C(=O)N[C@@H](CC1=CN=CN1)C(=O)O)N MAEQBGQTDWDSJQ-LSJOCFKGSA-N 0.000 description 1
- BDQNLQSWRAPHGU-DLOVCJGASA-N Ala-Phe-Cys Chemical compound C[C@@H](C(=O)N[C@@H](CC1=CC=CC=C1)C(=O)N[C@@H](CS)C(=O)O)N BDQNLQSWRAPHGU-DLOVCJGASA-N 0.000 description 1
- CYBJZLQSUJEMAS-LFSVMHDDSA-N Ala-Phe-Thr Chemical compound C[C@H]([C@@H](C(=O)O)NC(=O)[C@H](CC1=CC=CC=C1)NC(=O)[C@H](C)N)O CYBJZLQSUJEMAS-LFSVMHDDSA-N 0.000 description 1
- IHMCQESUJVZTKW-UBHSHLNASA-N Ala-Phe-Val Chemical compound CC(C)[C@@H](C(O)=O)NC(=O)[C@@H](NC(=O)[C@H](C)N)CC1=CC=CC=C1 IHMCQESUJVZTKW-UBHSHLNASA-N 0.000 description 1
- MAZZQZWCCYJQGZ-GUBZILKMSA-N Ala-Pro-Arg Chemical compound [H]N[C@@H](C)C(=O)N1CCC[C@H]1C(=O)N[C@@H](CCCNC(N)=N)C(O)=O MAZZQZWCCYJQGZ-GUBZILKMSA-N 0.000 description 1
- DYJJJCHDHLEFDW-FXQIFTODSA-N Ala-Pro-Cys Chemical compound C[C@@H](C(=O)N1CCC[C@H]1C(=O)N[C@@H](CS)C(=O)O)N DYJJJCHDHLEFDW-FXQIFTODSA-N 0.000 description 1
- ADSGHMXEAZJJNF-DCAQKATOSA-N Ala-Pro-Leu Chemical compound CC(C)C[C@@H](C(O)=O)NC(=O)[C@@H]1CCCN1C(=O)[C@H](C)N ADSGHMXEAZJJNF-DCAQKATOSA-N 0.000 description 1
- NZGRHTKZFSVPAN-BIIVOSGPSA-N Ala-Ser-Pro Chemical compound C[C@@H](C(=O)N[C@@H](CO)C(=O)N1CCC[C@@H]1C(=O)O)N NZGRHTKZFSVPAN-BIIVOSGPSA-N 0.000 description 1
- XQNRANMFRPCFFW-GCJQMDKQSA-N Ala-Thr-Asn Chemical compound C[C@H]([C@@H](C(=O)N[C@@H](CC(=O)N)C(=O)O)NC(=O)[C@H](C)N)O XQNRANMFRPCFFW-GCJQMDKQSA-N 0.000 description 1
- YNOCMHZSWJMGBB-GCJQMDKQSA-N Ala-Thr-Asp Chemical compound [H]N[C@@H](C)C(=O)N[C@@H]([C@@H](C)O)C(=O)N[C@@H](CC(O)=O)C(O)=O YNOCMHZSWJMGBB-GCJQMDKQSA-N 0.000 description 1
- AENHOIXXHKNIQL-AUTRQRHGSA-N Ala-Tyr-Ala Chemical compound [O-]C(=O)[C@H](C)NC(=O)[C@@H](NC(=O)[C@@H]([NH3+])C)CC1=CC=C(O)C=C1 AENHOIXXHKNIQL-AUTRQRHGSA-N 0.000 description 1
- SOTXLXCVCZAKFI-FXQIFTODSA-N Ala-Val-Ala Chemical compound C[C@H](N)C(=O)N[C@@H](C(C)C)C(=O)N[C@@H](C)C(O)=O SOTXLXCVCZAKFI-FXQIFTODSA-N 0.000 description 1
- ZCUFMRIQCPNOHZ-NRPADANISA-N Ala-Val-Gln Chemical compound C[C@@H](C(=O)N[C@@H](C(C)C)C(=O)N[C@@H](CCC(=O)N)C(=O)O)N ZCUFMRIQCPNOHZ-NRPADANISA-N 0.000 description 1
- CLOMBHBBUKAUBP-LSJOCFKGSA-N Ala-Val-His Chemical compound C[C@@H](C(=O)N[C@@H](C(C)C)C(=O)N[C@@H](CC1=CN=CN1)C(=O)O)N CLOMBHBBUKAUBP-LSJOCFKGSA-N 0.000 description 1
- LYILPUNCKACNGF-NAKRPEOUSA-N Ala-Val-Ile Chemical compound CC[C@H](C)[C@@H](C(O)=O)NC(=O)[C@H](C(C)C)NC(=O)[C@H](C)N LYILPUNCKACNGF-NAKRPEOUSA-N 0.000 description 1
- ZDILXFDENZVOTL-BPNCWPANSA-N Ala-Val-Tyr Chemical compound [H]N[C@@H](C)C(=O)N[C@@H](C(C)C)C(=O)N[C@@H](CC1=CC=C(O)C=C1)C(O)=O ZDILXFDENZVOTL-BPNCWPANSA-N 0.000 description 1
- 108010088751 Albumins Proteins 0.000 description 1
- 102000009027 Albumins Human genes 0.000 description 1
- 102000002260 Alkaline Phosphatase Human genes 0.000 description 1
- 108020004774 Alkaline Phosphatase Proteins 0.000 description 1
- 108020005544 Antisense RNA Proteins 0.000 description 1
- VKKYFICVTYKFIO-CIUDSAMLSA-N Arg-Ala-Glu Chemical compound OC(=O)CC[C@@H](C(O)=O)NC(=O)[C@H](C)NC(=O)[C@@H](N)CCCN=C(N)N VKKYFICVTYKFIO-CIUDSAMLSA-N 0.000 description 1
- IJPNNYWHXGADJG-GUBZILKMSA-N Arg-Ala-Val Chemical compound [H]N[C@@H](CCCNC(N)=N)C(=O)N[C@@H](C)C(=O)N[C@@H](C(C)C)C(O)=O IJPNNYWHXGADJG-GUBZILKMSA-N 0.000 description 1
- HJVGMOYJDDXLMI-AVGNSLFASA-N Arg-Arg-Lys Chemical compound NCCCC[C@@H](C(O)=O)NC(=O)[C@H](CCCNC(N)=N)NC(=O)[C@@H](N)CCCNC(N)=N HJVGMOYJDDXLMI-AVGNSLFASA-N 0.000 description 1
- WESHVRNMNFMVBE-FXQIFTODSA-N Arg-Asn-Asp Chemical compound C(C[C@@H](C(=O)N[C@@H](CC(=O)N)C(=O)N[C@@H](CC(=O)O)C(=O)O)N)CN=C(N)N WESHVRNMNFMVBE-FXQIFTODSA-N 0.000 description 1
- BVBKBQRPOJFCQM-DCAQKATOSA-N Arg-Asn-Leu Chemical compound [H]N[C@@H](CCCNC(N)=N)C(=O)N[C@@H](CC(N)=O)C(=O)N[C@@H](CC(C)C)C(O)=O BVBKBQRPOJFCQM-DCAQKATOSA-N 0.000 description 1
- KWTVWJPNHAOREN-IHRRRGAJSA-N Arg-Asn-Phe Chemical compound [H]N[C@@H](CCCNC(N)=N)C(=O)N[C@@H](CC(N)=O)C(=O)N[C@@H](CC1=CC=CC=C1)C(O)=O KWTVWJPNHAOREN-IHRRRGAJSA-N 0.000 description 1
- ZTKHZAXGTFXUDD-VEVYYDQMSA-N Arg-Asn-Thr Chemical compound [H]N[C@@H](CCCNC(N)=N)C(=O)N[C@@H](CC(N)=O)C(=O)N[C@@H]([C@@H](C)O)C(O)=O ZTKHZAXGTFXUDD-VEVYYDQMSA-N 0.000 description 1
- YFBGNGASPGRWEM-DCAQKATOSA-N Arg-Asp-His Chemical compound C1=C(NC=N1)C[C@@H](C(=O)O)NC(=O)[C@H](CC(=O)O)NC(=O)[C@H](CCCN=C(N)N)N YFBGNGASPGRWEM-DCAQKATOSA-N 0.000 description 1
- FBLMOFHNVQBKRR-IHRRRGAJSA-N Arg-Asp-Tyr Chemical compound NC(N)=NCCC[C@H](N)C(=O)N[C@@H](CC(O)=O)C(=O)N[C@H](C(O)=O)CC1=CC=C(O)C=C1 FBLMOFHNVQBKRR-IHRRRGAJSA-N 0.000 description 1
- VDBKFYYIBLXEIF-GUBZILKMSA-N Arg-Gln-Glu Chemical compound [H]N[C@@H](CCCNC(N)=N)C(=O)N[C@@H](CCC(N)=O)C(=O)N[C@@H](CCC(O)=O)C(O)=O VDBKFYYIBLXEIF-GUBZILKMSA-N 0.000 description 1
- RKRSYHCNPFGMTA-CIUDSAMLSA-N Arg-Glu-Asn Chemical compound [H]N[C@@H](CCCNC(N)=N)C(=O)N[C@@H](CCC(O)=O)C(=O)N[C@@H](CC(N)=O)C(O)=O RKRSYHCNPFGMTA-CIUDSAMLSA-N 0.000 description 1
- UFBURHXMKFQVLM-CIUDSAMLSA-N Arg-Glu-Ser Chemical compound [H]N[C@@H](CCCNC(N)=N)C(=O)N[C@@H](CCC(O)=O)C(=O)N[C@@H](CO)C(O)=O UFBURHXMKFQVLM-CIUDSAMLSA-N 0.000 description 1
- PHHRSPBBQUFULD-UWVGGRQHSA-N Arg-Gly-Lys Chemical compound C(CCN)C[C@@H](C(=O)O)NC(=O)CNC(=O)[C@H](CCCN=C(N)N)N PHHRSPBBQUFULD-UWVGGRQHSA-N 0.000 description 1
- BNODVYXZAAXSHW-IUCAKERBSA-N Arg-His Chemical compound NC(=N)NCCC[C@H](N)C(=O)N[C@H](C(O)=O)CC1=CNC=N1 BNODVYXZAAXSHW-IUCAKERBSA-N 0.000 description 1
- OCDJOVKIUJVUMO-SRVKXCTJSA-N Arg-His-Gln Chemical compound C1=C(NC=N1)C[C@@H](C(=O)N[C@@H](CCC(=O)N)C(=O)O)NC(=O)[C@H](CCCN=C(N)N)N OCDJOVKIUJVUMO-SRVKXCTJSA-N 0.000 description 1
- MSILNNHVVMMTHZ-UWVGGRQHSA-N Arg-His-Gly Chemical compound NC(N)=NCCC[C@H](N)C(=O)N[C@H](C(=O)NCC(O)=O)CC1=CN=CN1 MSILNNHVVMMTHZ-UWVGGRQHSA-N 0.000 description 1
- MMGCRPZQZWTZTA-IHRRRGAJSA-N Arg-His-His Chemical compound C1=C(NC=N1)C[C@@H](C(=O)N[C@@H](CC2=CN=CN2)C(=O)O)NC(=O)[C@H](CCCN=C(N)N)N MMGCRPZQZWTZTA-IHRRRGAJSA-N 0.000 description 1
- CRCCTGPNZUCAHE-DCAQKATOSA-N Arg-His-Ser Chemical compound NC(N)=NCCC[C@H](N)C(=O)N[C@H](C(=O)N[C@@H](CO)C(O)=O)CC1=CN=CN1 CRCCTGPNZUCAHE-DCAQKATOSA-N 0.000 description 1
- FRMQITGHXMUNDF-GMOBBJLQSA-N Arg-Ile-Asn Chemical compound CC[C@H](C)[C@@H](C(=O)N[C@@H](CC(=O)N)C(=O)O)NC(=O)[C@H](CCCN=C(N)N)N FRMQITGHXMUNDF-GMOBBJLQSA-N 0.000 description 1
- LVMUGODRNHFGRA-AVGNSLFASA-N Arg-Leu-Arg Chemical compound NC(N)=NCCC[C@H](N)C(=O)N[C@@H](CC(C)C)C(=O)N[C@@H](CCCN=C(N)N)C(O)=O LVMUGODRNHFGRA-AVGNSLFASA-N 0.000 description 1
- RTDZQOFEGPWSJD-AVGNSLFASA-N Arg-Leu-Val Chemical compound [H]N[C@@H](CCCNC(N)=N)C(=O)N[C@@H](CC(C)C)C(=O)N[C@@H](C(C)C)C(O)=O RTDZQOFEGPWSJD-AVGNSLFASA-N 0.000 description 1
- BNYNOWJESJJIOI-XUXIUFHCSA-N Arg-Lys-Ile Chemical compound CC[C@H](C)[C@@H](C(=O)O)NC(=O)[C@H](CCCCN)NC(=O)[C@H](CCCN=C(N)N)N BNYNOWJESJJIOI-XUXIUFHCSA-N 0.000 description 1
- BTJVOUQWFXABOI-IHRRRGAJSA-N Arg-Lys-Lys Chemical compound NCCCC[C@@H](C(O)=O)NC(=O)[C@H](CCCCN)NC(=O)[C@@H](N)CCCNC(N)=N BTJVOUQWFXABOI-IHRRRGAJSA-N 0.000 description 1
- VIINVRPKMUZYOI-DCAQKATOSA-N Arg-Met-Glu Chemical compound [H]N[C@@H](CCCNC(N)=N)C(=O)N[C@@H](CCSC)C(=O)N[C@@H](CCC(O)=O)C(O)=O VIINVRPKMUZYOI-DCAQKATOSA-N 0.000 description 1
- RFNDQEWMNJMQHD-SZMVWBNQSA-N Arg-Met-Trp Chemical compound CSCC[C@@H](C(=O)N[C@@H](CC1=CNC2=CC=CC=C21)C(=O)O)NC(=O)[C@H](CCCN=C(N)N)N RFNDQEWMNJMQHD-SZMVWBNQSA-N 0.000 description 1
- DPLFNLDACGGBAK-KKUMJFAQSA-N Arg-Phe-Glu Chemical compound C1=CC=C(C=C1)C[C@@H](C(=O)N[C@@H](CCC(=O)O)C(=O)O)NC(=O)[C@H](CCCN=C(N)N)N DPLFNLDACGGBAK-KKUMJFAQSA-N 0.000 description 1
- XFXZKCRBBOVJKS-BVSLBCMMSA-N Arg-Phe-Trp Chemical compound C([C@H](NC(=O)[C@H](CCCN=C(N)N)N)C(=O)N[C@@H](CC=1C2=CC=CC=C2NC=1)C(O)=O)C1=CC=CC=C1 XFXZKCRBBOVJKS-BVSLBCMMSA-N 0.000 description 1
- LFAUVOXPCGJKTB-DCAQKATOSA-N Arg-Ser-His Chemical compound C1=C(NC=N1)C[C@@H](C(=O)O)NC(=O)[C@H](CO)NC(=O)[C@H](CCCN=C(N)N)N LFAUVOXPCGJKTB-DCAQKATOSA-N 0.000 description 1
- KMFPQTITXUKJOV-DCAQKATOSA-N Arg-Ser-Leu Chemical compound [H]N[C@@H](CCCNC(N)=N)C(=O)N[C@@H](CO)C(=O)N[C@@H](CC(C)C)C(O)=O KMFPQTITXUKJOV-DCAQKATOSA-N 0.000 description 1
- XNSKSTRGQIPTSE-ACZMJKKPSA-N Arg-Thr Chemical compound C[C@@H]([C@@H](C(=O)O)NC(=O)[C@H](CCCN=C(N)N)N)O XNSKSTRGQIPTSE-ACZMJKKPSA-N 0.000 description 1
- XRNXPIGJPQHCPC-RCWTZXSCSA-N Arg-Thr-Val Chemical compound CC(C)[C@H](NC(=O)[C@@H](NC(=O)[C@@H](N)CCCNC(N)=N)[C@@H](C)O)C(O)=O XRNXPIGJPQHCPC-RCWTZXSCSA-N 0.000 description 1
- UVTGNSWSRSCPLP-UHFFFAOYSA-N Arg-Tyr Natural products NC(CCNC(=N)N)C(=O)NC(Cc1ccc(O)cc1)C(=O)O UVTGNSWSRSCPLP-UHFFFAOYSA-N 0.000 description 1
- NMTANZXPDAHUKU-ULQDDVLXSA-N Arg-Tyr-Lys Chemical compound NC(N)=NCCC[C@H](N)C(=O)N[C@H](C(=O)N[C@@H](CCCCN)C(O)=O)CC1=CC=C(O)C=C1 NMTANZXPDAHUKU-ULQDDVLXSA-N 0.000 description 1
- QCTOLCVIGRLMQS-HRCADAONSA-N Arg-Tyr-Pro Chemical compound C1C[C@@H](N(C1)C(=O)[C@H](CC2=CC=C(C=C2)O)NC(=O)[C@H](CCCN=C(N)N)N)C(=O)O QCTOLCVIGRLMQS-HRCADAONSA-N 0.000 description 1
- QHUOOCKNNURZSL-IHRRRGAJSA-N Arg-Tyr-Ser Chemical compound [H]N[C@@H](CCCNC(N)=N)C(=O)N[C@@H](CC1=CC=C(O)C=C1)C(=O)N[C@@H](CO)C(O)=O QHUOOCKNNURZSL-IHRRRGAJSA-N 0.000 description 1
- 239000004475 Arginine Substances 0.000 description 1
- QEYJFBMTSMLPKZ-ZKWXMUAHSA-N Asn-Ala-Val Chemical compound [H]N[C@@H](CC(N)=O)C(=O)N[C@@H](C)C(=O)N[C@@H](C(C)C)C(O)=O QEYJFBMTSMLPKZ-ZKWXMUAHSA-N 0.000 description 1
- NPDLYUOYAGBHFB-WDSKDSINSA-N Asn-Arg Chemical compound NC(=O)C[C@H](N)C(=O)N[C@H](C(O)=O)CCCN=C(N)N NPDLYUOYAGBHFB-WDSKDSINSA-N 0.000 description 1
- JZRLLSOWDYUKOK-SRVKXCTJSA-N Asn-Asp-Phe Chemical compound C1=CC=C(C=C1)C[C@@H](C(=O)O)NC(=O)[C@H](CC(=O)O)NC(=O)[C@H](CC(=O)N)N JZRLLSOWDYUKOK-SRVKXCTJSA-N 0.000 description 1
- VYLVOMUVLMGCRF-ZLUOBGJFSA-N Asn-Asp-Ser Chemical compound NC(=O)C[C@H](N)C(=O)N[C@@H](CC(O)=O)C(=O)N[C@@H](CO)C(O)=O VYLVOMUVLMGCRF-ZLUOBGJFSA-N 0.000 description 1
- QCWJKJLNCFEVPQ-WHFBIAKZSA-N Asn-Gln Chemical compound NC(=O)C[C@H](N)C(=O)N[C@H](C(O)=O)CCC(N)=O QCWJKJLNCFEVPQ-WHFBIAKZSA-N 0.000 description 1
- AYKKKGFJXIDYLX-ACZMJKKPSA-N Asn-Gln-Asn Chemical compound [H]N[C@@H](CC(N)=O)C(=O)N[C@@H](CCC(N)=O)C(=O)N[C@@H](CC(N)=O)C(O)=O AYKKKGFJXIDYLX-ACZMJKKPSA-N 0.000 description 1
- WPOLSNAQGVHROR-GUBZILKMSA-N Asn-Gln-Leu Chemical compound CC(C)C[C@@H](C(=O)O)NC(=O)[C@H](CCC(=O)N)NC(=O)[C@H](CC(=O)N)N WPOLSNAQGVHROR-GUBZILKMSA-N 0.000 description 1
- BZMWJLLUAKSIMH-FXQIFTODSA-N Asn-Glu-Glu Chemical compound [H]N[C@@H](CC(N)=O)C(=O)N[C@@H](CCC(O)=O)C(=O)N[C@@H](CCC(O)=O)C(O)=O BZMWJLLUAKSIMH-FXQIFTODSA-N 0.000 description 1
- GFFRWIJAFFMQGM-NUMRIWBASA-N Asn-Glu-Thr Chemical compound [H]N[C@@H](CC(N)=O)C(=O)N[C@@H](CCC(O)=O)C(=O)N[C@@H]([C@@H](C)O)C(O)=O GFFRWIJAFFMQGM-NUMRIWBASA-N 0.000 description 1
- KLKHFFMNGWULBN-VKHMYHEASA-N Asn-Gly Chemical compound NC(=O)C[C@H](N)C(=O)NCC(O)=O KLKHFFMNGWULBN-VKHMYHEASA-N 0.000 description 1
- HYQYLOSCICEYTR-YUMQZZPRSA-N Asn-Gly-Leu Chemical compound [H]N[C@@H](CC(N)=O)C(=O)NCC(=O)N[C@@H](CC(C)C)C(O)=O HYQYLOSCICEYTR-YUMQZZPRSA-N 0.000 description 1
- PHJPKNUWWHRAOC-PEFMBERDSA-N Asn-Ile-Gln Chemical compound CC[C@H](C)[C@@H](C(=O)N[C@@H](CCC(=O)N)C(=O)O)NC(=O)[C@H](CC(=O)N)N PHJPKNUWWHRAOC-PEFMBERDSA-N 0.000 description 1
- YYSYDIYQTUPNQQ-SXTJYALSSA-N Asn-Ile-Ile Chemical compound [H]N[C@@H](CC(N)=O)C(=O)N[C@@H]([C@@H](C)CC)C(=O)N[C@@H]([C@@H](C)CC)C(O)=O YYSYDIYQTUPNQQ-SXTJYALSSA-N 0.000 description 1
- PNHQRQTVBRDIEF-CIUDSAMLSA-N Asn-Leu-Ala Chemical compound C[C@@H](C(=O)O)NC(=O)[C@H](CC(C)C)NC(=O)[C@H](CC(=O)N)N PNHQRQTVBRDIEF-CIUDSAMLSA-N 0.000 description 1
- HFPXZWPUVFVNLL-GUBZILKMSA-N Asn-Leu-Gln Chemical compound [H]N[C@@H](CC(N)=O)C(=O)N[C@@H](CC(C)C)C(=O)N[C@@H](CCC(N)=O)C(O)=O HFPXZWPUVFVNLL-GUBZILKMSA-N 0.000 description 1
- HDHZCEDPLTVHFZ-GUBZILKMSA-N Asn-Leu-Glu Chemical compound [H]N[C@@H](CC(N)=O)C(=O)N[C@@H](CC(C)C)C(=O)N[C@@H](CCC(O)=O)C(O)=O HDHZCEDPLTVHFZ-GUBZILKMSA-N 0.000 description 1
- MYCSPQIARXTUTP-SRVKXCTJSA-N Asn-Leu-His Chemical compound CC(C)C[C@@H](C(=O)N[C@@H](CC1=CN=CN1)C(=O)O)NC(=O)[C@H](CC(=O)N)N MYCSPQIARXTUTP-SRVKXCTJSA-N 0.000 description 1
- DJIMLSXHXKWADV-CIUDSAMLSA-N Asn-Leu-Ser Chemical compound OC[C@@H](C(O)=O)NC(=O)[C@H](CC(C)C)NC(=O)[C@@H](N)CC(N)=O DJIMLSXHXKWADV-CIUDSAMLSA-N 0.000 description 1
- TZFQICWZWFNIKU-KKUMJFAQSA-N Asn-Leu-Tyr Chemical compound NC(=O)C[C@H](N)C(=O)N[C@@H](CC(C)C)C(=O)N[C@H](C(O)=O)CC1=CC=C(O)C=C1 TZFQICWZWFNIKU-KKUMJFAQSA-N 0.000 description 1
- QJMCHPGWFZZRID-BQBZGAKWSA-N Asn-Lys Chemical compound NCCCC[C@@H](C(O)=O)NC(=O)[C@@H](N)CC(N)=O QJMCHPGWFZZRID-BQBZGAKWSA-N 0.000 description 1
- FODVBOKTYKYRFJ-CIUDSAMLSA-N Asn-Lys-Cys Chemical compound C(CCN)C[C@@H](C(=O)N[C@@H](CS)C(=O)O)NC(=O)[C@H](CC(=O)N)N FODVBOKTYKYRFJ-CIUDSAMLSA-N 0.000 description 1
- ZYPWIUFLYMQZBS-SRVKXCTJSA-N Asn-Lys-Lys Chemical compound C(CCN)C[C@@H](C(=O)N[C@@H](CCCCN)C(=O)O)NC(=O)[C@H](CC(=O)N)N ZYPWIUFLYMQZBS-SRVKXCTJSA-N 0.000 description 1
- WXVGISRWSYGEDK-KKUMJFAQSA-N Asn-Lys-Phe Chemical compound C1=CC=C(C=C1)C[C@@H](C(=O)O)NC(=O)[C@H](CCCCN)NC(=O)[C@H](CC(=O)N)N WXVGISRWSYGEDK-KKUMJFAQSA-N 0.000 description 1
- NLDNNZKUSLAYFW-NHCYSSNCSA-N Asn-Lys-Val Chemical compound [H]N[C@@H](CC(N)=O)C(=O)N[C@@H](CCCCN)C(=O)N[C@@H](C(C)C)C(O)=O NLDNNZKUSLAYFW-NHCYSSNCSA-N 0.000 description 1
- WCRQQIPFSXFIRN-LPEHRKFASA-N Asn-Met-Pro Chemical compound CSCC[C@@H](C(=O)N1CCC[C@@H]1C(=O)O)NC(=O)[C@H](CC(=O)N)N WCRQQIPFSXFIRN-LPEHRKFASA-N 0.000 description 1
- ZVUMKOMKQCANOM-AVGNSLFASA-N Asn-Phe-Gln Chemical compound [H]N[C@@H](CC(N)=O)C(=O)N[C@@H](CC1=CC=CC=C1)C(=O)N[C@@H](CCC(N)=O)C(O)=O ZVUMKOMKQCANOM-AVGNSLFASA-N 0.000 description 1
- YUUIAUXBNOHFRJ-IHRRRGAJSA-N Asn-Phe-Met Chemical compound [H]N[C@@H](CC(N)=O)C(=O)N[C@@H](CC1=CC=CC=C1)C(=O)N[C@@H](CCSC)C(O)=O YUUIAUXBNOHFRJ-IHRRRGAJSA-N 0.000 description 1
- GKKUBLFXKRDMFC-BQBZGAKWSA-N Asn-Pro-Gly Chemical compound [H]N[C@@H](CC(N)=O)C(=O)N1CCC[C@H]1C(=O)NCC(O)=O GKKUBLFXKRDMFC-BQBZGAKWSA-N 0.000 description 1
- VHQSGALUSWIYOD-QXEWZRGKSA-N Asn-Pro-Val Chemical compound [H]N[C@@H](CC(N)=O)C(=O)N1CCC[C@H]1C(=O)N[C@@H](C(C)C)C(O)=O VHQSGALUSWIYOD-QXEWZRGKSA-N 0.000 description 1
- DOURAOODTFJRIC-CIUDSAMLSA-N Asn-Ser-His Chemical compound C1=C(NC=N1)C[C@@H](C(=O)O)NC(=O)[C@H](CO)NC(=O)[C@H](CC(=O)N)N DOURAOODTFJRIC-CIUDSAMLSA-N 0.000 description 1
- MKJBPDLENBUHQU-CIUDSAMLSA-N Asn-Ser-Leu Chemical compound [H]N[C@@H](CC(N)=O)C(=O)N[C@@H](CO)C(=O)N[C@@H](CC(C)C)C(O)=O MKJBPDLENBUHQU-CIUDSAMLSA-N 0.000 description 1
- WLVLIYYBPPONRJ-GCJQMDKQSA-N Asn-Thr-Ala Chemical compound [H]N[C@@H](CC(N)=O)C(=O)N[C@@H]([C@@H](C)O)C(=O)N[C@@H](C)C(O)=O WLVLIYYBPPONRJ-GCJQMDKQSA-N 0.000 description 1
- HPASIOLTWSNMFB-OLHMAJIHSA-N Asn-Thr-Asp Chemical compound [H]N[C@@H](CC(N)=O)C(=O)N[C@@H]([C@@H](C)O)C(=O)N[C@@H](CC(O)=O)C(O)=O HPASIOLTWSNMFB-OLHMAJIHSA-N 0.000 description 1
- ZUFPUBYQYWCMDB-NUMRIWBASA-N Asn-Thr-Glu Chemical compound NC(=O)C[C@H](N)C(=O)N[C@@H]([C@H](O)C)C(=O)N[C@@H](CCC(O)=O)C(O)=O ZUFPUBYQYWCMDB-NUMRIWBASA-N 0.000 description 1
- JBDLMLZNDRLDIX-HJGDQZAQSA-N Asn-Thr-Leu Chemical compound [H]N[C@@H](CC(N)=O)C(=O)N[C@@H]([C@@H](C)O)C(=O)N[C@@H](CC(C)C)C(O)=O JBDLMLZNDRLDIX-HJGDQZAQSA-N 0.000 description 1
- ULZOQOKFYMXHPZ-AQZXSJQPSA-N Asn-Trp-Thr Chemical compound [H]N[C@@H](CC(N)=O)C(=O)N[C@@H](CC1=CNC2=C1C=CC=C2)C(=O)N[C@@H]([C@@H](C)O)C(O)=O ULZOQOKFYMXHPZ-AQZXSJQPSA-N 0.000 description 1
- FYRVDDJMNISIKJ-UWVGGRQHSA-N Asn-Tyr Chemical compound NC(=O)C[C@H](N)C(=O)N[C@H](C(O)=O)CC1=CC=C(O)C=C1 FYRVDDJMNISIKJ-UWVGGRQHSA-N 0.000 description 1
- SKQTXVZTCGSRJS-SRVKXCTJSA-N Asn-Tyr-Asp Chemical compound C1=CC(=CC=C1C[C@@H](C(=O)N[C@@H](CC(=O)O)C(=O)O)NC(=O)[C@H](CC(=O)N)N)O SKQTXVZTCGSRJS-SRVKXCTJSA-N 0.000 description 1
- YNQMEIJEWSHOEO-SRVKXCTJSA-N Asn-Tyr-Cys Chemical compound C1=CC(=CC=C1C[C@@H](C(=O)N[C@@H](CS)C(=O)O)NC(=O)[C@H](CC(=O)N)N)O YNQMEIJEWSHOEO-SRVKXCTJSA-N 0.000 description 1
- BEHQTVDBCLSCBY-CFMVVWHZSA-N Asn-Tyr-Ile Chemical compound [H]N[C@@H](CC(N)=O)C(=O)N[C@@H](CC1=CC=C(O)C=C1)C(=O)N[C@@H]([C@@H](C)CC)C(O)=O BEHQTVDBCLSCBY-CFMVVWHZSA-N 0.000 description 1
- ZLGKHJHFYSRUBH-FXQIFTODSA-N Asp-Arg-Asp Chemical compound [H]N[C@@H](CC(O)=O)C(=O)N[C@@H](CCCNC(N)=N)C(=O)N[C@@H](CC(O)=O)C(O)=O ZLGKHJHFYSRUBH-FXQIFTODSA-N 0.000 description 1
- VGRHZPNRCLAHQA-IMJSIDKUSA-N Asp-Asn Chemical compound OC(=O)C[C@H](N)C(=O)N[C@@H](CC(N)=O)C(O)=O VGRHZPNRCLAHQA-IMJSIDKUSA-N 0.000 description 1
- QRULNKJGYQQZMW-ZLUOBGJFSA-N Asp-Asn-Asp Chemical compound OC(=O)C[C@H](N)C(=O)N[C@@H](CC(N)=O)C(=O)N[C@@H](CC(O)=O)C(O)=O QRULNKJGYQQZMW-ZLUOBGJFSA-N 0.000 description 1
- ATYWBXGNXZYZGI-ACZMJKKPSA-N Asp-Asn-Gln Chemical compound [H]N[C@@H](CC(O)=O)C(=O)N[C@@H](CC(N)=O)C(=O)N[C@@H](CCC(N)=O)C(O)=O ATYWBXGNXZYZGI-ACZMJKKPSA-N 0.000 description 1
- KNMRXHIAVXHCLW-ZLUOBGJFSA-N Asp-Asn-Ser Chemical compound C([C@@H](C(=O)N[C@@H](CC(=O)N)C(=O)N[C@@H](CO)C(=O)O)N)C(=O)O KNMRXHIAVXHCLW-ZLUOBGJFSA-N 0.000 description 1
- FRYULLIZUDQONW-IMJSIDKUSA-N Asp-Asp Chemical compound OC(=O)C[C@H](N)C(=O)N[C@@H](CC(O)=O)C(O)=O FRYULLIZUDQONW-IMJSIDKUSA-N 0.000 description 1
- QXHVOUSPVAWEMX-ZLUOBGJFSA-N Asp-Asp-Ser Chemical compound OC(=O)C[C@H](N)C(=O)N[C@@H](CC(O)=O)C(=O)N[C@@H](CO)C(O)=O QXHVOUSPVAWEMX-ZLUOBGJFSA-N 0.000 description 1
- AAIUGNSRQDGCDC-ZLUOBGJFSA-N Asp-Cys-Cys Chemical compound C([C@@H](C(=O)N[C@@H](CS)C(=O)N[C@@H](CS)C(=O)O)N)C(=O)O AAIUGNSRQDGCDC-ZLUOBGJFSA-N 0.000 description 1
- NURJSGZGBVJFAD-ZLUOBGJFSA-N Asp-Cys-Ser Chemical compound C([C@@H](C(=O)N[C@@H](CS)C(=O)N[C@@H](CO)C(=O)O)N)C(=O)O NURJSGZGBVJFAD-ZLUOBGJFSA-N 0.000 description 1
- PJERDVUTUDZPGX-ZKWXMUAHSA-N Asp-Cys-Val Chemical compound CC(C)[C@@H](C(O)=O)NC(=O)[C@H](CS)NC(=O)[C@@H](N)CC(O)=O PJERDVUTUDZPGX-ZKWXMUAHSA-N 0.000 description 1
- NYQHSUGFEWDWPD-ACZMJKKPSA-N Asp-Gln-Ala Chemical compound C[C@@H](C(=O)O)NC(=O)[C@H](CCC(=O)N)NC(=O)[C@H](CC(=O)O)N NYQHSUGFEWDWPD-ACZMJKKPSA-N 0.000 description 1
- ZSJFGGSPCCHMNE-LAEOZQHASA-N Asp-Gln-Val Chemical compound CC(C)[C@@H](C(=O)O)NC(=O)[C@H](CCC(=O)N)NC(=O)[C@H](CC(=O)O)N ZSJFGGSPCCHMNE-LAEOZQHASA-N 0.000 description 1
- VILLWIDTHYPSLC-PEFMBERDSA-N Asp-Glu-Ile Chemical compound [H]N[C@@H](CC(O)=O)C(=O)N[C@@H](CCC(O)=O)C(=O)N[C@@H]([C@@H](C)CC)C(O)=O VILLWIDTHYPSLC-PEFMBERDSA-N 0.000 description 1
- DGKCOYGQLNWNCJ-ACZMJKKPSA-N Asp-Glu-Ser Chemical compound [H]N[C@@H](CC(O)=O)C(=O)N[C@@H](CCC(O)=O)C(=O)N[C@@H](CO)C(O)=O DGKCOYGQLNWNCJ-ACZMJKKPSA-N 0.000 description 1
- WBDWQKRLTVCDSY-WHFBIAKZSA-N Asp-Gly-Asp Chemical compound OC(=O)C[C@H](N)C(=O)NCC(=O)N[C@@H](CC(O)=O)C(O)=O WBDWQKRLTVCDSY-WHFBIAKZSA-N 0.000 description 1
- ZSVJVIOVABDTTL-YUMQZZPRSA-N Asp-Gly-His Chemical compound C1=C(NC=N1)C[C@@H](C(=O)O)NC(=O)CNC(=O)[C@H](CC(=O)O)N ZSVJVIOVABDTTL-YUMQZZPRSA-N 0.000 description 1
- RWHHSFSWKFBTCF-KKUMJFAQSA-N Asp-His-Phe Chemical compound C1=CC=C(C=C1)C[C@@H](C(=O)O)NC(=O)[C@H](CC2=CN=CN2)NC(=O)[C@H](CC(=O)O)N RWHHSFSWKFBTCF-KKUMJFAQSA-N 0.000 description 1
- RKNIUWSZIAUEPK-PBCZWWQYSA-N Asp-His-Thr Chemical compound C[C@H]([C@@H](C(=O)O)NC(=O)[C@H](CC1=CN=CN1)NC(=O)[C@H](CC(=O)O)N)O RKNIUWSZIAUEPK-PBCZWWQYSA-N 0.000 description 1
- BSWHERGFUNMWGS-UHFFFAOYSA-N Asp-Ile Chemical compound CCC(C)C(C(O)=O)NC(=O)C(N)CC(O)=O BSWHERGFUNMWGS-UHFFFAOYSA-N 0.000 description 1
- TZOZNVLBTAFJRW-UGYAYLCHSA-N Asp-Ile-Asp Chemical compound CC[C@H](C)[C@@H](C(=O)N[C@@H](CC(=O)O)C(=O)O)NC(=O)[C@H](CC(=O)O)N TZOZNVLBTAFJRW-UGYAYLCHSA-N 0.000 description 1
- QNFRBNZGVVKBNJ-PEFMBERDSA-N Asp-Ile-Gln Chemical compound CC[C@H](C)[C@@H](C(=O)N[C@@H](CCC(=O)N)C(=O)O)NC(=O)[C@H](CC(=O)O)N QNFRBNZGVVKBNJ-PEFMBERDSA-N 0.000 description 1
- UJGRZQYSNYTCAX-SRVKXCTJSA-N Asp-Leu-Leu Chemical compound CC(C)C[C@@H](C(O)=O)NC(=O)[C@H](CC(C)C)NC(=O)[C@@H](N)CC(O)=O UJGRZQYSNYTCAX-SRVKXCTJSA-N 0.000 description 1
- UMHUHHJMEXNSIV-CIUDSAMLSA-N Asp-Leu-Ser Chemical compound OC[C@@H](C(O)=O)NC(=O)[C@H](CC(C)C)NC(=O)[C@@H](N)CC(O)=O UMHUHHJMEXNSIV-CIUDSAMLSA-N 0.000 description 1
- OAMLVOVXNKILLQ-BQBZGAKWSA-N Asp-Lys Chemical compound NCCCC[C@@H](C(O)=O)NC(=O)[C@@H](N)CC(O)=O OAMLVOVXNKILLQ-BQBZGAKWSA-N 0.000 description 1
- YWLDTBBUHZJQHW-KKUMJFAQSA-N Asp-Lys-Phe Chemical compound C1=CC=C(C=C1)C[C@@H](C(=O)O)NC(=O)[C@H](CCCCN)NC(=O)[C@H](CC(=O)O)N YWLDTBBUHZJQHW-KKUMJFAQSA-N 0.000 description 1
- DPNWSMBUYCLEDG-CIUDSAMLSA-N Asp-Lys-Ser Chemical compound [H]N[C@@H](CC(O)=O)C(=O)N[C@@H](CCCCN)C(=O)N[C@@H](CO)C(O)=O DPNWSMBUYCLEDG-CIUDSAMLSA-N 0.000 description 1
- RXBGWGRSWXOBGK-KKUMJFAQSA-N Asp-Lys-Tyr Chemical compound [H]N[C@@H](CC(O)=O)C(=O)N[C@@H](CCCCN)C(=O)N[C@@H](CC1=CC=C(O)C=C1)C(O)=O RXBGWGRSWXOBGK-KKUMJFAQSA-N 0.000 description 1
- SARSTIZOZFBDOM-FXQIFTODSA-N Asp-Met-Ala Chemical compound [H]N[C@@H](CC(O)=O)C(=O)N[C@@H](CCSC)C(=O)N[C@@H](C)C(O)=O SARSTIZOZFBDOM-FXQIFTODSA-N 0.000 description 1
- DJCAHYVLMSRBFR-QXEWZRGKSA-N Asp-Met-Val Chemical compound CC(C)[C@@H](C(O)=O)NC(=O)[C@H](CCSC)NC(=O)[C@@H](N)CC(O)=O DJCAHYVLMSRBFR-QXEWZRGKSA-N 0.000 description 1
- PCJOFZYFFMBZKC-PCBIJLKTSA-N Asp-Phe-Ile Chemical compound [H]N[C@@H](CC(O)=O)C(=O)N[C@@H](CC1=CC=CC=C1)C(=O)N[C@@H]([C@@H](C)CC)C(O)=O PCJOFZYFFMBZKC-PCBIJLKTSA-N 0.000 description 1
- USNJAPJZSGTTPX-XVSYOHENSA-N Asp-Phe-Thr Chemical compound [H]N[C@@H](CC(O)=O)C(=O)N[C@@H](CC1=CC=CC=C1)C(=O)N[C@@H]([C@@H](C)O)C(O)=O USNJAPJZSGTTPX-XVSYOHENSA-N 0.000 description 1
- ZKAOJVJQGVUIIU-GUBZILKMSA-N Asp-Pro-Arg Chemical compound OC(=O)C[C@H](N)C(=O)N1CCC[C@H]1C(=O)N[C@@H](CCCNC(N)=N)C(O)=O ZKAOJVJQGVUIIU-GUBZILKMSA-N 0.000 description 1
- DWBZEJHQQIURML-IMJSIDKUSA-N Asp-Ser Chemical compound OC(=O)C[C@H](N)C(=O)N[C@@H](CO)C(O)=O DWBZEJHQQIURML-IMJSIDKUSA-N 0.000 description 1
- QSFHZPQUAAQHAQ-CIUDSAMLSA-N Asp-Ser-Leu Chemical compound [H]N[C@@H](CC(O)=O)C(=O)N[C@@H](CO)C(=O)N[C@@H](CC(C)C)C(O)=O QSFHZPQUAAQHAQ-CIUDSAMLSA-N 0.000 description 1
- GXHDGYOXPNQCKM-XVSYOHENSA-N Asp-Thr-Phe Chemical compound C[C@H]([C@@H](C(=O)N[C@@H](CC1=CC=CC=C1)C(=O)O)NC(=O)[C@H](CC(=O)O)N)O GXHDGYOXPNQCKM-XVSYOHENSA-N 0.000 description 1
- PDIYGFYAMZZFCW-JIOCBJNQSA-N Asp-Thr-Pro Chemical compound C[C@H]([C@@H](C(=O)N1CCC[C@@H]1C(=O)O)NC(=O)[C@H](CC(=O)O)N)O PDIYGFYAMZZFCW-JIOCBJNQSA-N 0.000 description 1
- GCACQYDBDHRVGE-LKXGYXEUSA-N Asp-Thr-Ser Chemical compound OC[C@@H](C(O)=O)NC(=O)[C@H]([C@H](O)C)NC(=O)[C@@H](N)CC(O)=O GCACQYDBDHRVGE-LKXGYXEUSA-N 0.000 description 1
- XWKBWZXGNXTDKY-ZKWXMUAHSA-N Asp-Val-Ala Chemical compound OC(=O)[C@H](C)NC(=O)[C@H](C(C)C)NC(=O)[C@@H](N)CC(O)=O XWKBWZXGNXTDKY-ZKWXMUAHSA-N 0.000 description 1
- GIKOVDMXBAFXDF-NHCYSSNCSA-N Asp-Val-Leu Chemical compound [H]N[C@@H](CC(O)=O)C(=O)N[C@@H](C(C)C)C(=O)N[C@@H](CC(C)C)C(O)=O GIKOVDMXBAFXDF-NHCYSSNCSA-N 0.000 description 1
- SFJUYBCDQBAYAJ-YDHLFZDLSA-N Asp-Val-Phe Chemical compound OC(=O)C[C@H](N)C(=O)N[C@@H](C(C)C)C(=O)N[C@H](C(O)=O)CC1=CC=CC=C1 SFJUYBCDQBAYAJ-YDHLFZDLSA-N 0.000 description 1
- DCXYFEDJOCDNAF-UHFFFAOYSA-N Asparagine Natural products OC(=O)C(N)CC(N)=O DCXYFEDJOCDNAF-UHFFFAOYSA-N 0.000 description 1
- 241000894006 Bacteria Species 0.000 description 1
- 241000282465 Canis Species 0.000 description 1
- 241000283707 Capra Species 0.000 description 1
- 108090000565 Capsid Proteins Proteins 0.000 description 1
- OKTJSMMVPCPJKN-UHFFFAOYSA-N Carbon Chemical group [C] OKTJSMMVPCPJKN-UHFFFAOYSA-N 0.000 description 1
- 102100023321 Ceruloplasmin Human genes 0.000 description 1
- 102100035932 Cocaine- and amphetamine-regulated transcript protein Human genes 0.000 description 1
- 241000699800 Cricetinae Species 0.000 description 1
- TULNGKSILXCZQT-IMJSIDKUSA-N Cys-Asp Chemical compound SC[C@H](N)C(=O)N[C@H](C(O)=O)CC(O)=O TULNGKSILXCZQT-IMJSIDKUSA-N 0.000 description 1
- VZKXOWRNJDEGLZ-WHFBIAKZSA-N Cys-Asp-Gly Chemical compound SC[C@H](N)C(=O)N[C@@H](CC(O)=O)C(=O)NCC(O)=O VZKXOWRNJDEGLZ-WHFBIAKZSA-N 0.000 description 1
- VKAWJBQTFCBHQY-GUBZILKMSA-N Cys-Gln-Lys Chemical compound C(CCN)C[C@@H](C(=O)O)NC(=O)[C@H](CCC(=O)N)NC(=O)[C@H](CS)N VKAWJBQTFCBHQY-GUBZILKMSA-N 0.000 description 1
- UPURLDIGQGTUPJ-ZKWXMUAHSA-N Cys-Gly-Ile Chemical compound CC[C@H](C)[C@@H](C(=O)O)NC(=O)CNC(=O)[C@H](CS)N UPURLDIGQGTUPJ-ZKWXMUAHSA-N 0.000 description 1
- JDHMXPSXWMPYQZ-AAEUAGOBSA-N Cys-Gly-Trp Chemical compound C1=CC=C2C(=C1)C(=CN2)C[C@@H](C(=O)O)NC(=O)CNC(=O)[C@H](CS)N JDHMXPSXWMPYQZ-AAEUAGOBSA-N 0.000 description 1
- WTNLLMQAFPOCTJ-GARJFASQSA-N Cys-His-Pro Chemical compound C1C[C@@H](N(C1)C(=O)[C@H](CC2=CN=CN2)NC(=O)[C@H](CS)N)C(=O)O WTNLLMQAFPOCTJ-GARJFASQSA-N 0.000 description 1
- ZLHPWFSAUJEEAN-KBIXCLLPSA-N Cys-Ile-Gln Chemical compound CC[C@H](C)[C@@H](C(=O)N[C@@H](CCC(=O)N)C(=O)O)NC(=O)[C@H](CS)N ZLHPWFSAUJEEAN-KBIXCLLPSA-N 0.000 description 1
- PDRMRVHPAQKTLT-NAKRPEOUSA-N Cys-Ile-Val Chemical compound [H]N[C@@H](CS)C(=O)N[C@@H]([C@@H](C)CC)C(=O)N[C@@H](C(C)C)C(O)=O PDRMRVHPAQKTLT-NAKRPEOUSA-N 0.000 description 1
- HKALUUKHYNEDRS-GUBZILKMSA-N Cys-Leu-Gln Chemical compound SC[C@H](N)C(=O)N[C@@H](CC(C)C)C(=O)N[C@@H](CCC(N)=O)C(O)=O HKALUUKHYNEDRS-GUBZILKMSA-N 0.000 description 1
- UCSXXFRXHGUXCQ-SRVKXCTJSA-N Cys-Leu-Lys Chemical compound CC(C)C[C@@H](C(=O)N[C@@H](CCCCN)C(=O)O)NC(=O)[C@H](CS)N UCSXXFRXHGUXCQ-SRVKXCTJSA-N 0.000 description 1
- XZFYRXDAULDNFX-UWVGGRQHSA-N Cys-Phe Chemical compound SC[C@H](N)C(=O)N[C@H](C(O)=O)CC1=CC=CC=C1 XZFYRXDAULDNFX-UWVGGRQHSA-N 0.000 description 1
- LHJDLVVQRJIURS-SRVKXCTJSA-N Cys-Phe-Asp Chemical compound C1=CC=C(C=C1)C[C@@H](C(=O)N[C@@H](CC(=O)O)C(=O)O)NC(=O)[C@H](CS)N LHJDLVVQRJIURS-SRVKXCTJSA-N 0.000 description 1
- ZHCCYSDALWJITB-SRVKXCTJSA-N Cys-Phe-Cys Chemical compound N[C@@H](CS)C(=O)N[C@@H](Cc1ccccc1)C(=O)N[C@@H](CS)C(O)=O ZHCCYSDALWJITB-SRVKXCTJSA-N 0.000 description 1
- NMWZMKLDGZXRKP-BZSNNMDCSA-N Cys-Phe-Phe Chemical compound [H]N[C@@H](CS)C(=O)N[C@@H](CC1=CC=CC=C1)C(=O)N[C@@H](CC1=CC=CC=C1)C(O)=O NMWZMKLDGZXRKP-BZSNNMDCSA-N 0.000 description 1
- GFMJUESGWILPEN-MELADBBJSA-N Cys-Phe-Pro Chemical compound C1C[C@@H](N(C1)C(=O)[C@H](CC2=CC=CC=C2)NC(=O)[C@H](CS)N)C(=O)O GFMJUESGWILPEN-MELADBBJSA-N 0.000 description 1
- KJJASVYBTKRYSN-FXQIFTODSA-N Cys-Pro-Asp Chemical compound C1C[C@H](N(C1)C(=O)[C@H](CS)N)C(=O)N[C@@H](CC(=O)O)C(=O)O KJJASVYBTKRYSN-FXQIFTODSA-N 0.000 description 1
- JUNZLDGUJZIUCO-IHRRRGAJSA-N Cys-Pro-Tyr Chemical compound C1C[C@H](N(C1)C(=O)[C@H](CS)N)C(=O)N[C@@H](CC2=CC=C(C=C2)O)C(=O)O JUNZLDGUJZIUCO-IHRRRGAJSA-N 0.000 description 1
- YXQDRIRSAHTJKM-IMJSIDKUSA-N Cys-Ser Chemical compound SC[C@H](N)C(=O)N[C@@H](CO)C(O)=O YXQDRIRSAHTJKM-IMJSIDKUSA-N 0.000 description 1
- WYVKPHCYMTWUCW-YUPRTTJUSA-N Cys-Thr Chemical compound C[C@@H]([C@@H](C(=O)O)NC(=O)[C@H](CS)N)O WYVKPHCYMTWUCW-YUPRTTJUSA-N 0.000 description 1
- NRVQLLDIJJEIIZ-VZFHVOOUSA-N Cys-Thr-Ala Chemical compound C[C@H]([C@@H](C(=O)N[C@@H](C)C(=O)O)NC(=O)[C@H](CS)N)O NRVQLLDIJJEIIZ-VZFHVOOUSA-N 0.000 description 1
- JLZCAZJGWNRXCI-XKBZYTNZSA-N Cys-Thr-Glu Chemical compound [H]N[C@@H](CS)C(=O)N[C@@H]([C@@H](C)O)C(=O)N[C@@H](CCC(O)=O)C(O)=O JLZCAZJGWNRXCI-XKBZYTNZSA-N 0.000 description 1
- SPJRFUJMDJGDRO-UBHSHLNASA-N Cys-Trp-Ser Chemical compound C1=CC=C2C(C[C@H](NC(=O)[C@H](CS)N)C(=O)N[C@@H](CO)C(O)=O)=CNC2=C1 SPJRFUJMDJGDRO-UBHSHLNASA-N 0.000 description 1
- UGPCUUWZXRMCIJ-KKUMJFAQSA-N Cys-Tyr-Leu Chemical compound CC(C)C[C@@H](C(=O)O)NC(=O)[C@H](CC1=CC=C(C=C1)O)NC(=O)[C@H](CS)N UGPCUUWZXRMCIJ-KKUMJFAQSA-N 0.000 description 1
- ZOMMHASZJQRLFS-IHRRRGAJSA-N Cys-Tyr-Val Chemical compound CC(C)[C@@H](C(=O)O)NC(=O)[C@H](CC1=CC=C(C=C1)O)NC(=O)[C@H](CS)N ZOMMHASZJQRLFS-IHRRRGAJSA-N 0.000 description 1
- MHYHLWUGWUBUHF-GUBZILKMSA-N Cys-Val-Arg Chemical compound CC(C)[C@@H](C(=O)N[C@@H](CCCN=C(N)N)C(=O)O)NC(=O)[C@H](CS)N MHYHLWUGWUBUHF-GUBZILKMSA-N 0.000 description 1
- IOLWXFWVYYCVTJ-NRPADANISA-N Cys-Val-Gln Chemical compound CC(C)[C@@H](C(=O)N[C@@H](CCC(=O)N)C(=O)O)NC(=O)[C@H](CS)N IOLWXFWVYYCVTJ-NRPADANISA-N 0.000 description 1
- 102000004127 Cytokines Human genes 0.000 description 1
- 108090000695 Cytokines Proteins 0.000 description 1
- HMFHBZSHGGEWLO-SOOFDHNKSA-N D-ribofuranose Chemical compound OC[C@H]1OC(O)[C@H](O)[C@@H]1O HMFHBZSHGGEWLO-SOOFDHNKSA-N 0.000 description 1
- 108020003215 DNA Probes Proteins 0.000 description 1
- 108010008286 DNA nucleotidylexotransferase Proteins 0.000 description 1
- 102100029764 DNA-directed DNA/RNA polymerase mu Human genes 0.000 description 1
- 206010011878 Deafness Diseases 0.000 description 1
- BWGNESOTFCXPMA-UHFFFAOYSA-N Dihydrogen disulfide Chemical compound SS BWGNESOTFCXPMA-UHFFFAOYSA-N 0.000 description 1
- 101710201734 E3 protein Proteins 0.000 description 1
- LVGKNOAMLMIIKO-UHFFFAOYSA-N Elaidinsaeure-aethylester Natural products CCCCCCCCC=CCCCCCCCC(=O)OCC LVGKNOAMLMIIKO-UHFFFAOYSA-N 0.000 description 1
- 241000196324 Embryophyta Species 0.000 description 1
- 108010042407 Endonucleases Proteins 0.000 description 1
- 241000701959 Escherichia virus Lambda Species 0.000 description 1
- 229920001917 Ficoll Polymers 0.000 description 1
- 108700028146 Genetic Enhancer Elements Proteins 0.000 description 1
- REJJNXODKSHOKA-ACZMJKKPSA-N Gln-Ala-Asp Chemical compound C[C@@H](C(=O)N[C@@H](CC(=O)O)C(=O)O)NC(=O)[C@H](CCC(=O)N)N REJJNXODKSHOKA-ACZMJKKPSA-N 0.000 description 1
- LKUWAWGNJYJODH-KBIXCLLPSA-N Gln-Ala-Ile Chemical compound [H]N[C@@H](CCC(N)=O)C(=O)N[C@@H](C)C(=O)N[C@@H]([C@@H](C)CC)C(O)=O LKUWAWGNJYJODH-KBIXCLLPSA-N 0.000 description 1
- XXLBHPPXDUWYAG-XQXXSGGOSA-N Gln-Ala-Thr Chemical compound [H]N[C@@H](CCC(N)=O)C(=O)N[C@@H](C)C(=O)N[C@@H]([C@@H](C)O)C(O)=O XXLBHPPXDUWYAG-XQXXSGGOSA-N 0.000 description 1
- WOACHWLUOFZLGJ-GUBZILKMSA-N Gln-Arg-Gln Chemical compound [H]N[C@@H](CCC(N)=O)C(=O)N[C@@H](CCCNC(N)=N)C(=O)N[C@@H](CCC(N)=O)C(O)=O WOACHWLUOFZLGJ-GUBZILKMSA-N 0.000 description 1
- ODBLJLZVLAWVMS-GUBZILKMSA-N Gln-Asn-Lys Chemical compound C(CCN)C[C@@H](C(=O)O)NC(=O)[C@H](CC(=O)N)NC(=O)[C@H](CCC(=O)N)N ODBLJLZVLAWVMS-GUBZILKMSA-N 0.000 description 1
- QFTRCUPCARNIPZ-XHNCKOQMSA-N Gln-Cys-Pro Chemical compound C1C[C@@H](N(C1)C(=O)[C@H](CS)NC(=O)[C@H](CCC(=O)N)N)C(=O)O QFTRCUPCARNIPZ-XHNCKOQMSA-N 0.000 description 1
- LOJYQMFIIJVETK-WDSKDSINSA-N Gln-Gln Chemical compound NC(=O)CC[C@H](N)C(=O)N[C@@H](CCC(N)=O)C(O)=O LOJYQMFIIJVETK-WDSKDSINSA-N 0.000 description 1
- LPYPANUXJGFMGV-FXQIFTODSA-N Gln-Gln-Ala Chemical compound C[C@@H](C(=O)O)NC(=O)[C@H](CCC(=O)N)NC(=O)[C@H](CCC(=O)N)N LPYPANUXJGFMGV-FXQIFTODSA-N 0.000 description 1
- JZOYFBPIEHCDFV-YUMQZZPRSA-N Gln-His Chemical compound NC(=O)CC[C@H](N)C(=O)N[C@H](C(O)=O)CC1=CN=CN1 JZOYFBPIEHCDFV-YUMQZZPRSA-N 0.000 description 1
- NROSLUJMIQGFKS-IUCAKERBSA-N Gln-His-Gly Chemical compound C1=C(NC=N1)C[C@@H](C(=O)NCC(=O)O)NC(=O)[C@H](CCC(=O)N)N NROSLUJMIQGFKS-IUCAKERBSA-N 0.000 description 1
- IWUFOVSLWADEJC-AVGNSLFASA-N Gln-His-Leu Chemical compound [H]N[C@@H](CCC(N)=O)C(=O)N[C@@H](CC1=CNC=N1)C(=O)N[C@@H](CC(C)C)C(O)=O IWUFOVSLWADEJC-AVGNSLFASA-N 0.000 description 1
- JXBZEDIQFFCHPZ-PEFMBERDSA-N Gln-Ile-Asp Chemical compound CC[C@H](C)[C@@H](C(=O)N[C@@H](CC(=O)O)C(=O)O)NC(=O)[C@H](CCC(=O)N)N JXBZEDIQFFCHPZ-PEFMBERDSA-N 0.000 description 1
- ZNTDJIMJKNNSLR-RWRJDSDZSA-N Gln-Ile-Thr Chemical compound CC[C@H](C)[C@@H](C(=O)N[C@@H]([C@@H](C)O)C(=O)O)NC(=O)[C@H](CCC(=O)N)N ZNTDJIMJKNNSLR-RWRJDSDZSA-N 0.000 description 1
- FFVXLVGUJBCKRX-UKJIMTQDSA-N Gln-Ile-Val Chemical compound CC[C@H](C)[C@@H](C(=O)N[C@@H](C(C)C)C(=O)O)NC(=O)[C@H](CCC(=O)N)N FFVXLVGUJBCKRX-UKJIMTQDSA-N 0.000 description 1
- YPMDZWPZFOZYFG-GUBZILKMSA-N Gln-Leu-Ser Chemical compound [H]N[C@@H](CCC(N)=O)C(=O)N[C@@H](CC(C)C)C(=O)N[C@@H](CO)C(O)=O YPMDZWPZFOZYFG-GUBZILKMSA-N 0.000 description 1
- SXGMGNZEHFORAV-IUCAKERBSA-N Gln-Lys-Gly Chemical compound C(CCN)C[C@@H](C(=O)NCC(=O)O)NC(=O)[C@H](CCC(=O)N)N SXGMGNZEHFORAV-IUCAKERBSA-N 0.000 description 1
- KLKYKPXITJBSNI-CIUDSAMLSA-N Gln-Met-Ala Chemical compound [H]N[C@@H](CCC(N)=O)C(=O)N[C@@H](CCSC)C(=O)N[C@@H](C)C(O)=O KLKYKPXITJBSNI-CIUDSAMLSA-N 0.000 description 1
- KVQOVQVGVKDZNW-GUBZILKMSA-N Gln-Ser-His Chemical compound C1=C(NC=N1)C[C@@H](C(=O)O)NC(=O)[C@H](CO)NC(=O)[C@H](CCC(=O)N)N KVQOVQVGVKDZNW-GUBZILKMSA-N 0.000 description 1
- SYZZMPFLOLSMHL-XHNCKOQMSA-N Gln-Ser-Pro Chemical compound C1C[C@@H](N(C1)C(=O)[C@H](CO)NC(=O)[C@H](CCC(=O)N)N)C(=O)O SYZZMPFLOLSMHL-XHNCKOQMSA-N 0.000 description 1
- SYTFJIQPBRJSOK-NKIYYHGXSA-N Gln-Thr-His Chemical compound NC(=O)CC[C@H](N)C(=O)N[C@@H]([C@H](O)C)C(=O)N[C@H](C(O)=O)CC1=CN=CN1 SYTFJIQPBRJSOK-NKIYYHGXSA-N 0.000 description 1
- OUBUHIODTNUUTC-WDCWCFNPSA-N Gln-Thr-Lys Chemical compound C[C@H]([C@@H](C(=O)N[C@@H](CCCCN)C(=O)O)NC(=O)[C@H](CCC(=O)N)N)O OUBUHIODTNUUTC-WDCWCFNPSA-N 0.000 description 1
- RONJIBWTGKVKFY-HTUGSXCWSA-N Gln-Thr-Phe Chemical compound C[C@H]([C@@H](C(=O)N[C@@H](CC1=CC=CC=C1)C(=O)O)NC(=O)[C@H](CCC(=O)N)N)O RONJIBWTGKVKFY-HTUGSXCWSA-N 0.000 description 1
- XMWNHGKDDIFXQJ-NWLDYVSISA-N Gln-Thr-Trp Chemical compound C[C@H]([C@@H](C(=O)N[C@@H](CC1=CNC2=CC=CC=C21)C(=O)O)NC(=O)[C@H](CCC(=O)N)N)O XMWNHGKDDIFXQJ-NWLDYVSISA-N 0.000 description 1
- OEIDWQHTRYEYGG-QEJZJMRPSA-N Gln-Trp-Asp Chemical compound C1=CC=C2C(=C1)C(=CN2)C[C@@H](C(=O)N[C@@H](CC(=O)O)C(=O)O)NC(=O)[C@H](CCC(=O)N)N OEIDWQHTRYEYGG-QEJZJMRPSA-N 0.000 description 1
- JTWZNMUVQWWGOX-SOUVJXGZSA-N Gln-Tyr-Pro Chemical compound C1C[C@@H](N(C1)C(=O)[C@H](CC2=CC=C(C=C2)O)NC(=O)[C@H](CCC(=O)N)N)C(=O)O JTWZNMUVQWWGOX-SOUVJXGZSA-N 0.000 description 1
- MRVYVEQPNDSWLH-XPUUQOCRSA-N Gln-Val Chemical compound CC(C)[C@@H](C(O)=O)NC(=O)[C@@H](N)CCC(N)=O MRVYVEQPNDSWLH-XPUUQOCRSA-N 0.000 description 1
- ATRHMOJQJWPVBQ-DRZSPHRISA-N Glu-Ala-Phe Chemical compound [H]N[C@@H](CCC(O)=O)C(=O)N[C@@H](C)C(=O)N[C@@H](CC1=CC=CC=C1)C(O)=O ATRHMOJQJWPVBQ-DRZSPHRISA-N 0.000 description 1
- MXOODARRORARSU-ACZMJKKPSA-N Glu-Ala-Ser Chemical compound C[C@@H](C(=O)N[C@@H](CO)C(=O)O)NC(=O)[C@H](CCC(=O)O)N MXOODARRORARSU-ACZMJKKPSA-N 0.000 description 1
- CVPXINNKRTZBMO-CIUDSAMLSA-N Glu-Arg-Asn Chemical compound C(C[C@@H](C(=O)N[C@@H](CC(=O)N)C(=O)O)NC(=O)[C@H](CCC(=O)O)N)CN=C(N)N CVPXINNKRTZBMO-CIUDSAMLSA-N 0.000 description 1
- GLWXKFRTOHKGIT-ACZMJKKPSA-N Glu-Asn-Asn Chemical compound [H]N[C@@H](CCC(O)=O)C(=O)N[C@@H](CC(N)=O)C(=O)N[C@@H](CC(N)=O)C(O)=O GLWXKFRTOHKGIT-ACZMJKKPSA-N 0.000 description 1
- MLCPTRRNICEKIS-FXQIFTODSA-N Glu-Asn-Gln Chemical compound [H]N[C@@H](CCC(O)=O)C(=O)N[C@@H](CC(N)=O)C(=O)N[C@@H](CCC(N)=O)C(O)=O MLCPTRRNICEKIS-FXQIFTODSA-N 0.000 description 1
- YYOBUPFZLKQUAX-FXQIFTODSA-N Glu-Asn-Glu Chemical compound [H]N[C@@H](CCC(O)=O)C(=O)N[C@@H](CC(N)=O)C(=O)N[C@@H](CCC(O)=O)C(O)=O YYOBUPFZLKQUAX-FXQIFTODSA-N 0.000 description 1
- ZOXBSICWUDAOHX-GUBZILKMSA-N Glu-Asn-Leu Chemical compound CC(C)C[C@@H](C(O)=O)NC(=O)[C@H](CC(N)=O)NC(=O)[C@@H](N)CCC(O)=O ZOXBSICWUDAOHX-GUBZILKMSA-N 0.000 description 1
- RDDSZZJOKDVPAE-ACZMJKKPSA-N Glu-Asn-Ser Chemical compound [H]N[C@@H](CCC(O)=O)C(=O)N[C@@H](CC(N)=O)C(=O)N[C@@H](CO)C(O)=O RDDSZZJOKDVPAE-ACZMJKKPSA-N 0.000 description 1
- NTBDVNJIWCKURJ-ACZMJKKPSA-N Glu-Asp-Asn Chemical compound [H]N[C@@H](CCC(O)=O)C(=O)N[C@@H](CC(O)=O)C(=O)N[C@@H](CC(N)=O)C(O)=O NTBDVNJIWCKURJ-ACZMJKKPSA-N 0.000 description 1
- FLQAKQOBSPFGKG-CIUDSAMLSA-N Glu-Cys-Arg Chemical compound OC(=O)CC[C@H](N)C(=O)N[C@@H](CS)C(=O)N[C@H](C(O)=O)CCCN=C(N)N FLQAKQOBSPFGKG-CIUDSAMLSA-N 0.000 description 1
- ALCAUWPAMLVUDB-FXQIFTODSA-N Glu-Gln-Asn Chemical compound [H]N[C@@H](CCC(O)=O)C(=O)N[C@@H](CCC(N)=O)C(=O)N[C@@H](CC(N)=O)C(O)=O ALCAUWPAMLVUDB-FXQIFTODSA-N 0.000 description 1
- LVCHEMOPBORRLB-DCAQKATOSA-N Glu-Gln-Lys Chemical compound NCCCC[C@H](NC(=O)[C@H](CCC(N)=O)NC(=O)[C@@H](N)CCC(O)=O)C(O)=O LVCHEMOPBORRLB-DCAQKATOSA-N 0.000 description 1
- HUFCEIHAFNVSNR-IHRRRGAJSA-N Glu-Gln-Tyr Chemical compound OC(=O)CC[C@H](N)C(=O)N[C@@H](CCC(N)=O)C(=O)N[C@H](C(O)=O)CC1=CC=C(O)C=C1 HUFCEIHAFNVSNR-IHRRRGAJSA-N 0.000 description 1
- HNVFSTLPVJWIDV-CIUDSAMLSA-N Glu-Glu-Gln Chemical compound OC(=O)CC[C@H](N)C(=O)N[C@@H](CCC(O)=O)C(=O)N[C@@H](CCC(N)=O)C(O)=O HNVFSTLPVJWIDV-CIUDSAMLSA-N 0.000 description 1
- LGYZYFFDELZWRS-DCAQKATOSA-N Glu-Glu-Lys Chemical compound NCCCC[C@@H](C(O)=O)NC(=O)[C@H](CCC(O)=O)NC(=O)[C@@H](N)CCC(O)=O LGYZYFFDELZWRS-DCAQKATOSA-N 0.000 description 1
- KASDBWKLWJKTLJ-GUBZILKMSA-N Glu-Glu-Met Chemical compound [H]N[C@@H](CCC(O)=O)C(=O)N[C@@H](CCC(O)=O)C(=O)N[C@@H](CCSC)C(O)=O KASDBWKLWJKTLJ-GUBZILKMSA-N 0.000 description 1
- XMPAXPSENRSOSV-RYUDHWBXSA-N Glu-Gly-Tyr Chemical compound [H]N[C@@H](CCC(O)=O)C(=O)NCC(=O)N[C@@H](CC1=CC=C(O)C=C1)C(O)=O XMPAXPSENRSOSV-RYUDHWBXSA-N 0.000 description 1
- HILMIYALTUQTRC-XVKPBYJWSA-N Glu-Gly-Val Chemical compound [H]N[C@@H](CCC(O)=O)C(=O)NCC(=O)N[C@@H](C(C)C)C(O)=O HILMIYALTUQTRC-XVKPBYJWSA-N 0.000 description 1
- XIKYNVKEUINBGL-IUCAKERBSA-N Glu-His-Gly Chemical compound [H]N[C@@H](CCC(O)=O)C(=O)N[C@@H](CC1=CNC=N1)C(=O)NCC(O)=O XIKYNVKEUINBGL-IUCAKERBSA-N 0.000 description 1
- NJPQBTJSYCKCNS-HVTMNAMFSA-N Glu-His-Ile Chemical compound CC[C@H](C)[C@@H](C(=O)O)NC(=O)[C@H](CC1=CN=CN1)NC(=O)[C@H](CCC(=O)O)N NJPQBTJSYCKCNS-HVTMNAMFSA-N 0.000 description 1
- WVTIBGWZUMJBFY-GUBZILKMSA-N Glu-His-Ser Chemical compound [H]N[C@@H](CCC(O)=O)C(=O)N[C@@H](CC1=CNC=N1)C(=O)N[C@@H](CO)C(O)=O WVTIBGWZUMJBFY-GUBZILKMSA-N 0.000 description 1
- ZHNHJYYFCGUZNQ-KBIXCLLPSA-N Glu-Ile-Ser Chemical compound OC[C@@H](C(O)=O)NC(=O)[C@H]([C@@H](C)CC)NC(=O)[C@@H](N)CCC(O)=O ZHNHJYYFCGUZNQ-KBIXCLLPSA-N 0.000 description 1
- KRRFFAHEAOCBCQ-SIUGBPQLSA-N Glu-Ile-Tyr Chemical compound [H]N[C@@H](CCC(O)=O)C(=O)N[C@@H]([C@@H](C)CC)C(=O)N[C@@H](CC1=CC=C(O)C=C1)C(O)=O KRRFFAHEAOCBCQ-SIUGBPQLSA-N 0.000 description 1
- YBAFDPFAUTYYRW-YUMQZZPRSA-N Glu-Leu Chemical compound CC(C)C[C@@H](C(O)=O)NC(=O)[C@@H](N)CCC(O)=O YBAFDPFAUTYYRW-YUMQZZPRSA-N 0.000 description 1
- ATVYZJGOZLVXDK-IUCAKERBSA-N Glu-Leu-Gly Chemical compound [H]N[C@@H](CCC(O)=O)C(=O)N[C@@H](CC(C)C)C(=O)NCC(O)=O ATVYZJGOZLVXDK-IUCAKERBSA-N 0.000 description 1
- MWMJCGBSIORNCD-AVGNSLFASA-N Glu-Leu-Leu Chemical compound [H]N[C@@H](CCC(O)=O)C(=O)N[C@@H](CC(C)C)C(=O)N[C@@H](CC(C)C)C(O)=O MWMJCGBSIORNCD-AVGNSLFASA-N 0.000 description 1
- BBBXWRGITSUJPB-YUMQZZPRSA-N Glu-Lys Chemical compound NCCCC[C@@H](C(O)=O)NC(=O)[C@@H](N)CCC(O)=O BBBXWRGITSUJPB-YUMQZZPRSA-N 0.000 description 1
- CUPSDFQZTVVTSK-GUBZILKMSA-N Glu-Lys-Asp Chemical compound OC(=O)C[C@@H](C(O)=O)NC(=O)[C@H](CCCCN)NC(=O)[C@@H](N)CCC(O)=O CUPSDFQZTVVTSK-GUBZILKMSA-N 0.000 description 1
- RBXSZQRSEGYDFG-GUBZILKMSA-N Glu-Lys-Ser Chemical compound [H]N[C@@H](CCC(O)=O)C(=O)N[C@@H](CCCCN)C(=O)N[C@@H](CO)C(O)=O RBXSZQRSEGYDFG-GUBZILKMSA-N 0.000 description 1
- NPMSEUWUMOSEFM-CIUDSAMLSA-N Glu-Met-Asn Chemical compound CSCC[C@@H](C(=O)N[C@@H](CC(=O)N)C(=O)O)NC(=O)[C@H](CCC(=O)O)N NPMSEUWUMOSEFM-CIUDSAMLSA-N 0.000 description 1
- LKOAAMXDJGEYMS-ZPFDUUQYSA-N Glu-Met-Ile Chemical compound [H]N[C@@H](CCC(O)=O)C(=O)N[C@@H](CCSC)C(=O)N[C@@H]([C@@H](C)CC)C(O)=O LKOAAMXDJGEYMS-ZPFDUUQYSA-N 0.000 description 1
- CBEUFCJRFNZMCU-SRVKXCTJSA-N Glu-Met-Leu Chemical compound [H]N[C@@H](CCC(O)=O)C(=O)N[C@@H](CCSC)C(=O)N[C@@H](CC(C)C)C(O)=O CBEUFCJRFNZMCU-SRVKXCTJSA-N 0.000 description 1
- UQHGAYSULGRWRG-WHFBIAKZSA-N Glu-Ser Chemical compound OC(=O)CC[C@H](N)C(=O)N[C@@H](CO)C(O)=O UQHGAYSULGRWRG-WHFBIAKZSA-N 0.000 description 1
- BPLNJYHNAJVLRT-ACZMJKKPSA-N Glu-Ser-Ala Chemical compound [H]N[C@@H](CCC(O)=O)C(=O)N[C@@H](CO)C(=O)N[C@@H](C)C(O)=O BPLNJYHNAJVLRT-ACZMJKKPSA-N 0.000 description 1
- RFTVTKBHDXCEEX-WDSKDSINSA-N Glu-Ser-Gly Chemical compound [H]N[C@@H](CCC(O)=O)C(=O)N[C@@H](CO)C(=O)NCC(O)=O RFTVTKBHDXCEEX-WDSKDSINSA-N 0.000 description 1
- ZAPFAWQHBOHWLL-GUBZILKMSA-N Glu-Ser-His Chemical compound C1=C(NC=N1)C[C@@H](C(=O)O)NC(=O)[C@H](CO)NC(=O)[C@H](CCC(=O)O)N ZAPFAWQHBOHWLL-GUBZILKMSA-N 0.000 description 1
- JSIQVRIXMINMTA-ZDLURKLDSA-N Glu-Thr Chemical compound C[C@@H](O)[C@@H](C(O)=O)NC(=O)[C@@H](N)CCC(O)=O JSIQVRIXMINMTA-ZDLURKLDSA-N 0.000 description 1
- HZISRJBYZAODRV-XQXXSGGOSA-N Glu-Thr-Ala Chemical compound [H]N[C@@H](CCC(O)=O)C(=O)N[C@@H]([C@@H](C)O)C(=O)N[C@@H](C)C(O)=O HZISRJBYZAODRV-XQXXSGGOSA-N 0.000 description 1
- BDISFWMLMNBTGP-NUMRIWBASA-N Glu-Thr-Asp Chemical compound [H]N[C@@H](CCC(O)=O)C(=O)N[C@@H]([C@@H](C)O)C(=O)N[C@@H](CC(O)=O)C(O)=O BDISFWMLMNBTGP-NUMRIWBASA-N 0.000 description 1
- LLEUXCDZPQOJMY-AAEUAGOBSA-N Glu-Trp Chemical compound C1=CC=C2C(C[C@H](NC(=O)[C@H](CCC(O)=O)N)C(O)=O)=CNC2=C1 LLEUXCDZPQOJMY-AAEUAGOBSA-N 0.000 description 1
- HGJREIGJLUQBTJ-SZMVWBNQSA-N Glu-Trp-Leu Chemical compound [H]N[C@@H](CCC(O)=O)C(=O)N[C@@H](CC1=CNC2=C1C=CC=C2)C(=O)N[C@@H](CC(C)C)C(O)=O HGJREIGJLUQBTJ-SZMVWBNQSA-N 0.000 description 1
- CGWHAXBNGYQBBK-JBACZVJFSA-N Glu-Trp-Tyr Chemical compound C([C@H](NC(=O)[C@H](CC=1C2=CC=CC=C2NC=1)NC(=O)[C@H](CCC(O)=O)N)C(O)=O)C1=CC=C(O)C=C1 CGWHAXBNGYQBBK-JBACZVJFSA-N 0.000 description 1
- RXJFSLQVMGYQEL-IHRRRGAJSA-N Glu-Tyr-Glu Chemical compound OC(=O)CC[C@H](N)C(=O)N[C@H](C(=O)N[C@@H](CCC(O)=O)C(O)=O)CC1=CC=C(O)C=C1 RXJFSLQVMGYQEL-IHRRRGAJSA-N 0.000 description 1
- HJTSRYLPAYGEEC-SIUGBPQLSA-N Glu-Tyr-Ile Chemical compound CC[C@H](C)[C@@H](C(=O)O)NC(=O)[C@H](CC1=CC=C(C=C1)O)NC(=O)[C@H](CCC(=O)O)N HJTSRYLPAYGEEC-SIUGBPQLSA-N 0.000 description 1
- HBMRTXJZQDVRFT-DZKIICNBSA-N Glu-Tyr-Val Chemical compound [H]N[C@@H](CCC(O)=O)C(=O)N[C@@H](CC1=CC=C(O)C=C1)C(=O)N[C@@H](C(C)C)C(O)=O HBMRTXJZQDVRFT-DZKIICNBSA-N 0.000 description 1
- SITLTJHOQZFJGG-XPUUQOCRSA-N Glu-Val Chemical compound CC(C)[C@@H](C(O)=O)NC(=O)[C@@H](N)CCC(O)=O SITLTJHOQZFJGG-XPUUQOCRSA-N 0.000 description 1
- UZWUBBRJWFTHTD-LAEOZQHASA-N Glu-Val-Asn Chemical compound NC(=O)C[C@@H](C(O)=O)NC(=O)[C@H](C(C)C)NC(=O)[C@@H](N)CCC(O)=O UZWUBBRJWFTHTD-LAEOZQHASA-N 0.000 description 1
- QXUPRMQJDWJDFR-NRPADANISA-N Glu-Val-Ser Chemical compound CC(C)[C@H](NC(=O)[C@@H](N)CCC(O)=O)C(=O)N[C@@H](CO)C(O)=O QXUPRMQJDWJDFR-NRPADANISA-N 0.000 description 1
- QXPRJQPCFXMCIY-NKWVEPMBSA-N Gly-Ala-Pro Chemical compound C[C@@H](C(=O)N1CCC[C@@H]1C(=O)O)NC(=O)CN QXPRJQPCFXMCIY-NKWVEPMBSA-N 0.000 description 1
- JLXVRFDTDUGQEE-YFKPBYRVSA-N Gly-Arg Chemical compound NCC(=O)N[C@H](C(O)=O)CCCN=C(N)N JLXVRFDTDUGQEE-YFKPBYRVSA-N 0.000 description 1
- MXXXVOYFNVJHMA-IUCAKERBSA-N Gly-Arg-Met Chemical compound CSCC[C@@H](C(=O)O)NC(=O)[C@H](CCCN=C(N)N)NC(=O)CN MXXXVOYFNVJHMA-IUCAKERBSA-N 0.000 description 1
- XQHSBNVACKQWAV-WHFBIAKZSA-N Gly-Asp-Asn Chemical compound [H]NCC(=O)N[C@@H](CC(O)=O)C(=O)N[C@@H](CC(N)=O)C(O)=O XQHSBNVACKQWAV-WHFBIAKZSA-N 0.000 description 1
- MFBYPDKTAJXHNI-VKHMYHEASA-N Gly-Cys Chemical compound [NH3+]CC(=O)N[C@@H](CS)C([O-])=O MFBYPDKTAJXHNI-VKHMYHEASA-N 0.000 description 1
- YYPFZVIXAVDHIK-IUCAKERBSA-N Gly-Glu-Leu Chemical compound CC(C)C[C@@H](C(O)=O)NC(=O)[C@H](CCC(O)=O)NC(=O)CN YYPFZVIXAVDHIK-IUCAKERBSA-N 0.000 description 1
- CCQOOWAONKGYKQ-BYPYZUCNSA-N Gly-Gly-Ala Chemical compound OC(=O)[C@H](C)NC(=O)CNC(=O)CN CCQOOWAONKGYKQ-BYPYZUCNSA-N 0.000 description 1
- VILQXLMSDPJBFR-IUCAKERBSA-N Gly-Gly-Cys-His Natural products NCC(=O)NCC(=O)N[C@@H](CS)C(=O)N[C@@H](Cc1c[nH]cn1)C(=O)O VILQXLMSDPJBFR-IUCAKERBSA-N 0.000 description 1
- UQJNXZSSGQIPIQ-FBCQKBJTSA-N Gly-Gly-Thr Chemical compound C[C@@H](O)[C@@H](C(O)=O)NC(=O)CNC(=O)CN UQJNXZSSGQIPIQ-FBCQKBJTSA-N 0.000 description 1
- IVSWQHKONQIOHA-YUMQZZPRSA-N Gly-His-Cys Chemical compound C1=C(NC=N1)C[C@@H](C(=O)N[C@@H](CS)C(=O)O)NC(=O)CN IVSWQHKONQIOHA-YUMQZZPRSA-N 0.000 description 1
- HHSOPSCKAZKQHQ-PEXQALLHSA-N Gly-His-Ile Chemical compound CC[C@H](C)[C@@H](C(=O)O)NC(=O)[C@H](CC1=CN=CN1)NC(=O)CN HHSOPSCKAZKQHQ-PEXQALLHSA-N 0.000 description 1
- FSPVILZGHUJOHS-QWRGUYRKSA-N Gly-His-Leu Chemical compound CC(C)C[C@@H](C(O)=O)NC(=O)[C@@H](NC(=O)CN)CC1=CNC=N1 FSPVILZGHUJOHS-QWRGUYRKSA-N 0.000 description 1
- UUWOBINZFGTFMS-UWVGGRQHSA-N Gly-His-Met Chemical compound [H]NCC(=O)N[C@@H](CC1=CNC=N1)C(=O)N[C@@H](CCSC)C(O)=O UUWOBINZFGTFMS-UWVGGRQHSA-N 0.000 description 1
- YFGONBOFGGWKKY-VHSXEESVSA-N Gly-His-Pro Chemical compound C1C[C@@H](N(C1)C(=O)[C@H](CC2=CN=CN2)NC(=O)CN)C(=O)O YFGONBOFGGWKKY-VHSXEESVSA-N 0.000 description 1
- SWQALSGKVLYKDT-UHFFFAOYSA-N Gly-Ile-Ala Natural products NCC(=O)NC(C(C)CC)C(=O)NC(C)C(O)=O SWQALSGKVLYKDT-UHFFFAOYSA-N 0.000 description 1
- LUJVWKKYHSLULQ-ZKWXMUAHSA-N Gly-Ile-Cys Chemical compound CC[C@H](C)[C@@H](C(=O)N[C@@H](CS)C(=O)O)NC(=O)CN LUJVWKKYHSLULQ-ZKWXMUAHSA-N 0.000 description 1
- AAHSHTLISQUZJL-QSFUFRPTSA-N Gly-Ile-Ile Chemical compound [H]NCC(=O)N[C@@H]([C@@H](C)CC)C(=O)N[C@@H]([C@@H](C)CC)C(O)=O AAHSHTLISQUZJL-QSFUFRPTSA-N 0.000 description 1
- HAXARWKYFIIHKD-ZKWXMUAHSA-N Gly-Ile-Ser Chemical compound NCC(=O)N[C@@H]([C@@H](C)CC)C(=O)N[C@@H](CO)C(O)=O HAXARWKYFIIHKD-ZKWXMUAHSA-N 0.000 description 1
- COVXELOAORHTND-LSJOCFKGSA-N Gly-Ile-Val Chemical compound NCC(=O)N[C@@H]([C@@H](C)CC)C(=O)N[C@@H](C(C)C)C(O)=O COVXELOAORHTND-LSJOCFKGSA-N 0.000 description 1
- DKEXFJVMVGETOO-LURJTMIESA-N Gly-Leu Chemical compound CC(C)C[C@@H](C(O)=O)NC(=O)CN DKEXFJVMVGETOO-LURJTMIESA-N 0.000 description 1
- YTSVAIMKVLZUDU-YUMQZZPRSA-N Gly-Leu-Asp Chemical compound [H]NCC(=O)N[C@@H](CC(C)C)C(=O)N[C@@H](CC(O)=O)C(O)=O YTSVAIMKVLZUDU-YUMQZZPRSA-N 0.000 description 1
- LRQXRHGQEVWGPV-NHCYSSNCSA-N Gly-Leu-Ile Chemical compound CC[C@H](C)[C@@H](C(O)=O)NC(=O)[C@H](CC(C)C)NC(=O)CN LRQXRHGQEVWGPV-NHCYSSNCSA-N 0.000 description 1
- NNCSJUBVFBDDLC-YUMQZZPRSA-N Gly-Leu-Ser Chemical compound NCC(=O)N[C@@H](CC(C)C)C(=O)N[C@@H](CO)C(O)=O NNCSJUBVFBDDLC-YUMQZZPRSA-N 0.000 description 1
- PDUHNKAFQXQNLH-ZETCQYMHSA-N Gly-Lys-Gly Chemical compound NCCCC[C@H](NC(=O)CN)C(=O)NCC(O)=O PDUHNKAFQXQNLH-ZETCQYMHSA-N 0.000 description 1
- IUKIDFVOUHZRAK-QWRGUYRKSA-N Gly-Lys-His Chemical compound NCCCC[C@H](NC(=O)CN)C(=O)N[C@H](C(O)=O)CC1=CN=CN1 IUKIDFVOUHZRAK-QWRGUYRKSA-N 0.000 description 1
- MHZXESQPPXOING-KBPBESRZSA-N Gly-Lys-Phe Chemical compound [H]NCC(=O)N[C@@H](CCCCN)C(=O)N[C@@H](CC1=CC=CC=C1)C(O)=O MHZXESQPPXOING-KBPBESRZSA-N 0.000 description 1
- FXGRXIATVXUAHO-WEDXCCLWSA-N Gly-Lys-Thr Chemical compound C[C@@H](O)[C@@H](C(O)=O)NC(=O)[C@@H](NC(=O)CN)CCCCN FXGRXIATVXUAHO-WEDXCCLWSA-N 0.000 description 1
- RUDRIZRGOLQSMX-IUCAKERBSA-N Gly-Met-Met Chemical compound [H]NCC(=O)N[C@@H](CCSC)C(=O)N[C@@H](CCSC)C(O)=O RUDRIZRGOLQSMX-IUCAKERBSA-N 0.000 description 1
- OMOZPGCHVWOXHN-BQBZGAKWSA-N Gly-Met-Ser Chemical compound CSCC[C@@H](C(=O)N[C@@H](CO)C(=O)O)NC(=O)CN OMOZPGCHVWOXHN-BQBZGAKWSA-N 0.000 description 1
- YLEIWGJJBFBFHC-KBPBESRZSA-N Gly-Phe-Lys Chemical compound NCCCC[C@@H](C(O)=O)NC(=O)[C@@H](NC(=O)CN)CC1=CC=CC=C1 YLEIWGJJBFBFHC-KBPBESRZSA-N 0.000 description 1
- OOCFXNOVSLSHAB-IUCAKERBSA-N Gly-Pro-Pro Chemical compound NCC(=O)N1CCC[C@H]1C(=O)N1[C@H](C(O)=O)CCC1 OOCFXNOVSLSHAB-IUCAKERBSA-N 0.000 description 1
- ISSDODCYBOWWIP-GJZGRUSLSA-N Gly-Pro-Trp Chemical compound [H]NCC(=O)N1CCC[C@H]1C(=O)N[C@@H](CC1=CNC2=C1C=CC=C2)C(O)=O ISSDODCYBOWWIP-GJZGRUSLSA-N 0.000 description 1
- LBDXVCBAJJNJNN-WHFBIAKZSA-N Gly-Ser-Cys Chemical compound NCC(=O)N[C@@H](CO)C(=O)N[C@@H](CS)C(O)=O LBDXVCBAJJNJNN-WHFBIAKZSA-N 0.000 description 1
- WNGHUXFWEWTKAO-YUMQZZPRSA-N Gly-Ser-Leu Chemical compound CC(C)C[C@@H](C(O)=O)NC(=O)[C@H](CO)NC(=O)CN WNGHUXFWEWTKAO-YUMQZZPRSA-N 0.000 description 1
- OLIFSFOFKGKIRH-WUJLRWPWSA-N Gly-Thr Chemical compound C[C@@H](O)[C@@H](C(O)=O)NC(=O)CN OLIFSFOFKGKIRH-WUJLRWPWSA-N 0.000 description 1
- LLWQVJNHMYBLLK-CDMKHQONSA-N Gly-Thr-Phe Chemical compound [H]NCC(=O)N[C@@H]([C@@H](C)O)C(=O)N[C@@H](CC1=CC=CC=C1)C(O)=O LLWQVJNHMYBLLK-CDMKHQONSA-N 0.000 description 1
- ONSARSFSJHTMFJ-STQMWFEESA-N Gly-Trp-Ser Chemical compound [H]NCC(=O)N[C@@H](CC1=CNC2=C1C=CC=C2)C(=O)N[C@@H](CO)C(O)=O ONSARSFSJHTMFJ-STQMWFEESA-N 0.000 description 1
- YJDALMUYJIENAG-QWRGUYRKSA-N Gly-Tyr-Asn Chemical compound C1=CC(=CC=C1C[C@@H](C(=O)N[C@@H](CC(=O)N)C(=O)O)NC(=O)CN)O YJDALMUYJIENAG-QWRGUYRKSA-N 0.000 description 1
- IHDKKJVBLGXLEL-STQMWFEESA-N Gly-Tyr-Met Chemical compound CSCC[C@H](NC(=O)[C@H](Cc1ccc(O)cc1)NC(=O)CN)C(O)=O IHDKKJVBLGXLEL-STQMWFEESA-N 0.000 description 1
- 108090000288 Glycoproteins Proteins 0.000 description 1
- 102000003886 Glycoproteins Human genes 0.000 description 1
- 108091006067 Goα proteins Proteins 0.000 description 1
- 108010043121 Green Fluorescent Proteins Proteins 0.000 description 1
- 102000004144 Green Fluorescent Proteins Human genes 0.000 description 1
- XINDHUAGVGCNSF-QSFUFRPTSA-N His-Ala-Ile Chemical compound [H]N[C@@H](CC1=CNC=N1)C(=O)N[C@@H](C)C(=O)N[C@@H]([C@@H](C)CC)C(O)=O XINDHUAGVGCNSF-QSFUFRPTSA-N 0.000 description 1
- PDSUIXMZYNURGI-AVGNSLFASA-N His-Arg-Arg Chemical compound NC(N)=NCCC[C@@H](C(O)=O)NC(=O)[C@H](CCCN=C(N)N)NC(=O)[C@@H](N)CC1=CN=CN1 PDSUIXMZYNURGI-AVGNSLFASA-N 0.000 description 1
- UOAVQQRILDGZEN-SRVKXCTJSA-N His-Asp-Leu Chemical compound [H]N[C@@H](CC1=CNC=N1)C(=O)N[C@@H](CC(O)=O)C(=O)N[C@@H](CC(C)C)C(O)=O UOAVQQRILDGZEN-SRVKXCTJSA-N 0.000 description 1
- LDTJBEOANMQRJE-CIUDSAMLSA-N His-Cys-Asp Chemical compound C1=C(NC=N1)C[C@@H](C(=O)N[C@@H](CS)C(=O)N[C@@H](CC(=O)O)C(=O)O)N LDTJBEOANMQRJE-CIUDSAMLSA-N 0.000 description 1
- UJWYPUUXIAKEES-CUJWVEQBSA-N His-Cys-Thr Chemical compound [H]N[C@@H](CC1=CNC=N1)C(=O)N[C@@H](CS)C(=O)N[C@@H]([C@@H](C)O)C(O)=O UJWYPUUXIAKEES-CUJWVEQBSA-N 0.000 description 1
- NWGXCPUKPVISSJ-AVGNSLFASA-N His-Gln-Lys Chemical compound C1=C(NC=N1)C[C@@H](C(=O)N[C@@H](CCC(=O)N)C(=O)N[C@@H](CCCCN)C(=O)O)N NWGXCPUKPVISSJ-AVGNSLFASA-N 0.000 description 1
- IIVZNQCUUMBBKF-GVXVVHGQSA-N His-Gln-Val Chemical compound CC(C)[C@@H](C(O)=O)NC(=O)[C@H](CCC(N)=O)NC(=O)[C@@H](N)CC1=CN=CN1 IIVZNQCUUMBBKF-GVXVVHGQSA-N 0.000 description 1
- VHOLZZKNEBBHTH-YUMQZZPRSA-N His-Glu Chemical compound OC(=O)CC[C@@H](C(O)=O)NC(=O)[C@@H](N)CC1=CNC=N1 VHOLZZKNEBBHTH-YUMQZZPRSA-N 0.000 description 1
- HAPWZEVRQYGLSG-IUCAKERBSA-N His-Gly-Glu Chemical compound [H]N[C@@H](CC1=CNC=N1)C(=O)NCC(=O)N[C@@H](CCC(O)=O)C(O)=O HAPWZEVRQYGLSG-IUCAKERBSA-N 0.000 description 1
- NTXIJPDAHXSHNL-ONGXEEELSA-N His-Gly-Val Chemical compound [H]N[C@@H](CC1=CNC=N1)C(=O)NCC(=O)N[C@@H](C(C)C)C(O)=O NTXIJPDAHXSHNL-ONGXEEELSA-N 0.000 description 1
- CTJHHEQNUNIYNN-SRVKXCTJSA-N His-His-Asn Chemical compound [H]N[C@@H](CC1=CNC=N1)C(=O)N[C@@H](CC1=CNC=N1)C(=O)N[C@@H](CC(N)=O)C(O)=O CTJHHEQNUNIYNN-SRVKXCTJSA-N 0.000 description 1
- KAFZDWMZKGQDEE-SRVKXCTJSA-N His-His-Asp Chemical compound C1=C(NC=N1)C[C@@H](C(=O)N[C@@H](CC2=CN=CN2)C(=O)N[C@@H](CC(=O)O)C(=O)O)N KAFZDWMZKGQDEE-SRVKXCTJSA-N 0.000 description 1
- AKAPKBNIVNPIPO-KKUMJFAQSA-N His-His-Lys Chemical compound C([C@@H](C(=O)N[C@@H](CCCCN)C(O)=O)NC(=O)[C@@H](N)CC=1NC=NC=1)C1=CN=CN1 AKAPKBNIVNPIPO-KKUMJFAQSA-N 0.000 description 1
- MPXGJGBXCRQQJE-MXAVVETBSA-N His-Ile-Leu Chemical compound [H]N[C@@H](CC1=CNC=N1)C(=O)N[C@@H]([C@@H](C)CC)C(=O)N[C@@H](CC(C)C)C(O)=O MPXGJGBXCRQQJE-MXAVVETBSA-N 0.000 description 1
- BILZDIPAKWZFSG-PYJNHQTQSA-N His-Ile-Met Chemical compound CC[C@H](C)[C@@H](C(=O)N[C@@H](CCSC)C(=O)O)NC(=O)[C@H](CC1=CN=CN1)N BILZDIPAKWZFSG-PYJNHQTQSA-N 0.000 description 1
- MFQVZYSPCIZFMR-MGHWNKPDSA-N His-Ile-Phe Chemical compound CC[C@H](C)[C@@H](C(=O)N[C@@H](CC1=CC=CC=C1)C(=O)O)NC(=O)[C@H](CC2=CN=CN2)N MFQVZYSPCIZFMR-MGHWNKPDSA-N 0.000 description 1
- SKYULSWNBYAQMG-IHRRRGAJSA-N His-Leu-Arg Chemical compound [H]N[C@@H](CC1=CNC=N1)C(=O)N[C@@H](CC(C)C)C(=O)N[C@@H](CCCNC(N)=N)C(O)=O SKYULSWNBYAQMG-IHRRRGAJSA-N 0.000 description 1
- AIPUZFXMXAHZKY-QWRGUYRKSA-N His-Leu-Gly Chemical compound [H]N[C@@H](CC1=CNC=N1)C(=O)N[C@@H](CC(C)C)C(=O)NCC(O)=O AIPUZFXMXAHZKY-QWRGUYRKSA-N 0.000 description 1
- KHUFDBQXGLEIHC-BZSNNMDCSA-N His-Leu-Tyr Chemical compound C([C@H](N)C(=O)N[C@@H](CC(C)C)C(=O)N[C@@H](CC=1C=CC(O)=CC=1)C(O)=O)C1=CN=CN1 KHUFDBQXGLEIHC-BZSNNMDCSA-N 0.000 description 1
- QEYUCKCWTMIERU-SRVKXCTJSA-N His-Lys-Asp Chemical compound C1=C(NC=N1)C[C@@H](C(=O)N[C@@H](CCCCN)C(=O)N[C@@H](CC(=O)O)C(=O)O)N QEYUCKCWTMIERU-SRVKXCTJSA-N 0.000 description 1
- UMBKDWGQESDCTO-KKUMJFAQSA-N His-Lys-Lys Chemical compound [H]N[C@@H](CC1=CNC=N1)C(=O)N[C@@H](CCCCN)C(=O)N[C@@H](CCCCN)C(O)=O UMBKDWGQESDCTO-KKUMJFAQSA-N 0.000 description 1
- WYSJPCTWSBJFCO-AVGNSLFASA-N His-Met-Val Chemical compound CC(C)[C@@H](C(=O)O)NC(=O)[C@H](CCSC)NC(=O)[C@H](CC1=CN=CN1)N WYSJPCTWSBJFCO-AVGNSLFASA-N 0.000 description 1
- KYFGGRHWLFZXPU-KKUMJFAQSA-N His-Phe-Asn Chemical compound C1=CC=C(C=C1)C[C@@H](C(=O)N[C@@H](CC(=O)N)C(=O)O)NC(=O)[C@H](CC2=CN=CN2)N KYFGGRHWLFZXPU-KKUMJFAQSA-N 0.000 description 1
- GNBHSMFBUNEWCJ-DCAQKATOSA-N His-Pro-Asn Chemical compound [H]N[C@@H](CC1=CNC=N1)C(=O)N1CCC[C@H]1C(=O)N[C@@H](CC(N)=O)C(O)=O GNBHSMFBUNEWCJ-DCAQKATOSA-N 0.000 description 1
- BZAQOPHNBFOOJS-DCAQKATOSA-N His-Pro-Asp Chemical compound [H]N[C@@H](CC1=CNC=N1)C(=O)N1CCC[C@H]1C(=O)N[C@@H](CC(O)=O)C(O)=O BZAQOPHNBFOOJS-DCAQKATOSA-N 0.000 description 1
- KRBMQYPTDYSENE-BQBZGAKWSA-N His-Ser Chemical compound OC[C@@H](C(O)=O)NC(=O)[C@@H](N)CC1=CNC=N1 KRBMQYPTDYSENE-BQBZGAKWSA-N 0.000 description 1
- VIJMRAIWYWRXSR-CIUDSAMLSA-N His-Ser-Ser Chemical compound OC[C@@H](C(O)=O)NC(=O)[C@H](CO)NC(=O)[C@@H](N)CC1=CN=CN1 VIJMRAIWYWRXSR-CIUDSAMLSA-N 0.000 description 1
- ILUVWFTXAUYOBW-CUJWVEQBSA-N His-Ser-Thr Chemical compound C[C@H]([C@@H](C(=O)O)NC(=O)[C@H](CO)NC(=O)[C@H](CC1=CN=CN1)N)O ILUVWFTXAUYOBW-CUJWVEQBSA-N 0.000 description 1
- XVZJRZQIHJMUBG-TUBUOCAGSA-N His-Thr-Ile Chemical compound CC[C@H](C)[C@@H](C(=O)O)NC(=O)[C@H]([C@@H](C)O)NC(=O)[C@H](CC1=CN=CN1)N XVZJRZQIHJMUBG-TUBUOCAGSA-N 0.000 description 1
- DLTCGJZBNFOWFL-LKTVYLICSA-N His-Tyr-Ala Chemical compound C[C@@H](C(=O)O)NC(=O)[C@H](CC1=CC=C(C=C1)O)NC(=O)[C@H](CC2=CN=CN2)N DLTCGJZBNFOWFL-LKTVYLICSA-N 0.000 description 1
- LPBWRHRHEIYAIP-KKUMJFAQSA-N His-Tyr-Asp Chemical compound [H]N[C@@H](CC1=CNC=N1)C(=O)N[C@@H](CC1=CC=C(O)C=C1)C(=O)N[C@@H](CC(O)=O)C(O)=O LPBWRHRHEIYAIP-KKUMJFAQSA-N 0.000 description 1
- ZHMZWSFQRUGLEC-JYJNAYRXSA-N His-Tyr-Glu Chemical compound [H]N[C@@H](CC1=CNC=N1)C(=O)N[C@@H](CC1=CC=C(O)C=C1)C(=O)N[C@@H](CCC(O)=O)C(O)=O ZHMZWSFQRUGLEC-JYJNAYRXSA-N 0.000 description 1
- VLDVBZICYBVQHB-IUCAKERBSA-N His-Val Chemical compound CC(C)[C@@H](C([O-])=O)NC(=O)[C@@H]([NH3+])CC1=CN=CN1 VLDVBZICYBVQHB-IUCAKERBSA-N 0.000 description 1
- MCGOGXFMKHPMSQ-AVGNSLFASA-N His-Val-Met Chemical compound CSCC[C@@H](C(O)=O)NC(=O)[C@H](C(C)C)NC(=O)[C@@H](N)CC1=CN=CN1 MCGOGXFMKHPMSQ-AVGNSLFASA-N 0.000 description 1
- 101000690301 Homo sapiens Aldo-keto reductase family 1 member C4 Proteins 0.000 description 1
- 101000715592 Homo sapiens Cocaine- and amphetamine-regulated transcript protein Proteins 0.000 description 1
- 101000935587 Homo sapiens Flavin reductase (NADPH) Proteins 0.000 description 1
- 101001116548 Homo sapiens Protein CBFA2T1 Proteins 0.000 description 1
- 101000693970 Homo sapiens Scavenger receptor class A member 3 Proteins 0.000 description 1
- 108010001336 Horseradish Peroxidase Proteins 0.000 description 1
- 241000701024 Human betaherpesvirus 5 Species 0.000 description 1
- VWFWJPJMAFRNET-MQWMPURMSA-N IACI Chemical compound C1=CC=C2C([C@@H](O)[C@@H]3CC4CCN3C[C@@H]4CC)=CC=NC2=C1NC(=O)C1=CC([125I])=C(N=[N+]=[N-])C=C1O VWFWJPJMAFRNET-MQWMPURMSA-N 0.000 description 1
- AQCUAZTZSPQJFF-ZKWXMUAHSA-N Ile-Ala-Gly Chemical compound CC[C@H](C)[C@H](N)C(=O)N[C@@H](C)C(=O)NCC(O)=O AQCUAZTZSPQJFF-ZKWXMUAHSA-N 0.000 description 1
- YPWHUFAAMNHMGS-QSFUFRPTSA-N Ile-Ala-His Chemical compound CC[C@H](C)[C@@H](C(=O)N[C@@H](C)C(=O)N[C@@H](CC1=CN=CN1)C(=O)O)N YPWHUFAAMNHMGS-QSFUFRPTSA-N 0.000 description 1
- VAXBXNPRXPHGHG-BJDJZHNGSA-N Ile-Ala-Leu Chemical compound CC[C@H](C)[C@@H](C(=O)N[C@@H](C)C(=O)N[C@@H](CC(C)C)C(=O)O)N VAXBXNPRXPHGHG-BJDJZHNGSA-N 0.000 description 1
- HYXQKVOADYPQEA-CIUDSAMLSA-N Ile-Arg Chemical compound CC[C@H](C)[C@H](N)C(=O)N[C@H](C(O)=O)CCCN=C(N)N HYXQKVOADYPQEA-CIUDSAMLSA-N 0.000 description 1
- YOTNPRLPIPHQSB-XUXIUFHCSA-N Ile-Arg-Lys Chemical compound CC[C@H](C)[C@@H](C(=O)N[C@@H](CCCN=C(N)N)C(=O)N[C@@H](CCCCN)C(=O)O)N YOTNPRLPIPHQSB-XUXIUFHCSA-N 0.000 description 1
- CWJQMCPYXNVMBS-STECZYCISA-N Ile-Arg-Tyr Chemical compound CC[C@H](C)[C@@H](C(=O)N[C@@H](CCCN=C(N)N)C(=O)N[C@@H](CC1=CC=C(C=C1)O)C(=O)O)N CWJQMCPYXNVMBS-STECZYCISA-N 0.000 description 1
- UAVQIQOOBXFKRC-BYULHYEWSA-N Ile-Asn-Gly Chemical compound CC[C@H](C)[C@H](N)C(=O)N[C@@H](CC(N)=O)C(=O)NCC(O)=O UAVQIQOOBXFKRC-BYULHYEWSA-N 0.000 description 1
- YPQDTQJBOFOTJQ-SXTJYALSSA-N Ile-Asn-Ile Chemical compound CC[C@H](C)[C@@H](C(=O)N[C@@H](CC(=O)N)C(=O)N[C@@H]([C@@H](C)CC)C(=O)O)N YPQDTQJBOFOTJQ-SXTJYALSSA-N 0.000 description 1
- LEDRIAHEWDJRMF-CFMVVWHZSA-N Ile-Asn-Tyr Chemical compound CC[C@H](C)[C@H](N)C(=O)N[C@@H](CC(N)=O)C(=O)N[C@H](C(O)=O)CC1=CC=C(O)C=C1 LEDRIAHEWDJRMF-CFMVVWHZSA-N 0.000 description 1
- WKXVAXOSIPTXEC-HAFWLYHUSA-N Ile-Asp Chemical compound CC[C@H](C)[C@H](N)C(=O)N[C@H](C(O)=O)CC(O)=O WKXVAXOSIPTXEC-HAFWLYHUSA-N 0.000 description 1
- NBJAAWYRLGCJOF-UGYAYLCHSA-N Ile-Asp-Asn Chemical compound CC[C@H](C)[C@@H](C(=O)N[C@@H](CC(=O)O)C(=O)N[C@@H](CC(=O)N)C(=O)O)N NBJAAWYRLGCJOF-UGYAYLCHSA-N 0.000 description 1
- HVWXAQVMRBKKFE-UGYAYLCHSA-N Ile-Asp-Asp Chemical compound CC[C@H](C)[C@@H](C(=O)N[C@@H](CC(=O)O)C(=O)N[C@@H](CC(=O)O)C(=O)O)N HVWXAQVMRBKKFE-UGYAYLCHSA-N 0.000 description 1
- NPROWIBAWYMPAZ-GUDRVLHUSA-N Ile-Asp-Pro Chemical compound CC[C@H](C)[C@@H](C(=O)N[C@@H](CC(=O)O)C(=O)N1CCC[C@@H]1C(=O)O)N NPROWIBAWYMPAZ-GUDRVLHUSA-N 0.000 description 1
- JSZMKEYEVLDPDO-ACZMJKKPSA-N Ile-Cys Chemical compound CC[C@H](C)[C@H](N)C(=O)N[C@@H](CS)C(O)=O JSZMKEYEVLDPDO-ACZMJKKPSA-N 0.000 description 1
- CNPNWGHRMBQHBZ-ZKWXMUAHSA-N Ile-Gln Chemical compound CC[C@H](C)[C@H](N)C(=O)N[C@H](C(O)=O)CCC(N)=O CNPNWGHRMBQHBZ-ZKWXMUAHSA-N 0.000 description 1
- DMZOUKXXHJQPTL-GRLWGSQLSA-N Ile-Gln-Ile Chemical compound CC[C@H](C)[C@@H](C(=O)N[C@@H](CCC(=O)N)C(=O)N[C@@H]([C@@H](C)CC)C(=O)O)N DMZOUKXXHJQPTL-GRLWGSQLSA-N 0.000 description 1
- BALLIXFZYSECCF-QEWYBTABSA-N Ile-Gln-Phe Chemical compound CC[C@H](C)[C@@H](C(=O)N[C@@H](CCC(=O)N)C(=O)N[C@@H](CC1=CC=CC=C1)C(=O)O)N BALLIXFZYSECCF-QEWYBTABSA-N 0.000 description 1
- YBJWJQQBWRARLT-KBIXCLLPSA-N Ile-Gln-Ser Chemical compound CC[C@H](C)[C@H](N)C(=O)N[C@@H](CCC(N)=O)C(=O)N[C@@H](CO)C(O)=O YBJWJQQBWRARLT-KBIXCLLPSA-N 0.000 description 1
- LGMUPVWZEYYUMU-YVNDNENWSA-N Ile-Glu-Gln Chemical compound CC[C@H](C)[C@@H](C(=O)N[C@@H](CCC(=O)O)C(=O)N[C@@H](CCC(=O)N)C(=O)O)N LGMUPVWZEYYUMU-YVNDNENWSA-N 0.000 description 1
- LPXHYGGZJOCAFR-MNXVOIDGSA-N Ile-Glu-Leu Chemical compound CC[C@H](C)[C@@H](C(=O)N[C@@H](CCC(=O)O)C(=O)N[C@@H](CC(C)C)C(=O)O)N LPXHYGGZJOCAFR-MNXVOIDGSA-N 0.000 description 1
- DFJJAVZIHDFOGQ-MNXVOIDGSA-N Ile-Glu-Lys Chemical compound CC[C@H](C)[C@@H](C(=O)N[C@@H](CCC(=O)O)C(=O)N[C@@H](CCCCN)C(=O)O)N DFJJAVZIHDFOGQ-MNXVOIDGSA-N 0.000 description 1
- LEHPJMKVGFPSSP-ZQINRCPSSA-N Ile-Glu-Trp Chemical compound C1=CC=C2C(C[C@H](NC(=O)[C@H](CCC(O)=O)NC(=O)[C@@H](N)[C@@H](C)CC)C(O)=O)=CNC2=C1 LEHPJMKVGFPSSP-ZQINRCPSSA-N 0.000 description 1
- UCGDDTHMMVWVMV-FSPLSTOPSA-N Ile-Gly Chemical compound CC[C@H](C)[C@H](N)C(=O)NCC(O)=O UCGDDTHMMVWVMV-FSPLSTOPSA-N 0.000 description 1
- LPFBXFILACZHIB-LAEOZQHASA-N Ile-Gly-Glu Chemical compound CC[C@H](C)[C@@H](C(=O)NCC(=O)N[C@@H](CCC(=O)O)C(=O)O)N LPFBXFILACZHIB-LAEOZQHASA-N 0.000 description 1
- CDGLBYSAZFIIJO-RCOVLWMOSA-N Ile-Gly-Gly Chemical compound CC[C@H](C)[C@H]([NH3+])C(=O)NCC(=O)NCC([O-])=O CDGLBYSAZFIIJO-RCOVLWMOSA-N 0.000 description 1
- LBRCLQMZAHRTLV-ZKWXMUAHSA-N Ile-Gly-Ser Chemical compound CC[C@H](C)[C@H](N)C(=O)NCC(=O)N[C@@H](CO)C(O)=O LBRCLQMZAHRTLV-ZKWXMUAHSA-N 0.000 description 1
- VOBYAKCXGQQFLR-LSJOCFKGSA-N Ile-Gly-Val Chemical compound CC[C@H](C)[C@H](N)C(=O)NCC(=O)N[C@@H](C(C)C)C(O)=O VOBYAKCXGQQFLR-LSJOCFKGSA-N 0.000 description 1
- HYLIOBDWPQNLKI-HVTMNAMFSA-N Ile-His-Gln Chemical compound CC[C@H](C)[C@@H](C(=O)N[C@@H](CC1=CN=CN1)C(=O)N[C@@H](CCC(=O)N)C(=O)O)N HYLIOBDWPQNLKI-HVTMNAMFSA-N 0.000 description 1
- BBQABUDWDUKJMB-LZXPERKUSA-N Ile-Ile-Ile Chemical compound CC[C@H](C)[C@H]([NH3+])C(=O)N[C@@H]([C@@H](C)CC)C(=O)N[C@@H]([C@@H](C)CC)C([O-])=O BBQABUDWDUKJMB-LZXPERKUSA-N 0.000 description 1
- TWPSALMCEHCIOY-YTFOTSKYSA-N Ile-Ile-Leu Chemical compound CC[C@H](C)[C@@H](C(=O)N[C@@H]([C@@H](C)CC)C(=O)N[C@@H](CC(C)C)C(=O)O)N TWPSALMCEHCIOY-YTFOTSKYSA-N 0.000 description 1
- GLLAUPMJCGKPFY-BLMTYFJBSA-N Ile-Ile-Trp Chemical compound C1=CC=C2C(C[C@H](NC(=O)[C@H]([C@@H](C)CC)NC(=O)[C@@H](N)[C@@H](C)CC)C(O)=O)=CNC2=C1 GLLAUPMJCGKPFY-BLMTYFJBSA-N 0.000 description 1
- QZZIBQZLWBOOJH-PEDHHIEDSA-N Ile-Ile-Val Chemical compound N[C@@H]([C@@H](C)CC)C(=O)N[C@@H]([C@@H](C)CC)C(=O)N[C@@H](C(C)C)C(=O)O QZZIBQZLWBOOJH-PEDHHIEDSA-N 0.000 description 1
- OUUCIIJSBIBCHB-ZPFDUUQYSA-N Ile-Leu-Asp Chemical compound CC[C@H](C)[C@H](N)C(=O)N[C@@H](CC(C)C)C(=O)N[C@@H](CC(O)=O)C(O)=O OUUCIIJSBIBCHB-ZPFDUUQYSA-N 0.000 description 1
- DSDPLOODKXISDT-XUXIUFHCSA-N Ile-Leu-Val Chemical compound CC[C@H](C)[C@H](N)C(=O)N[C@@H](CC(C)C)C(=O)N[C@@H](C(C)C)C(O)=O DSDPLOODKXISDT-XUXIUFHCSA-N 0.000 description 1
- XDUVMJCBYUKNFJ-MXAVVETBSA-N Ile-Lys-His Chemical compound CC[C@H](C)[C@@H](C(=O)N[C@@H](CCCCN)C(=O)N[C@@H](CC1=CN=CN1)C(=O)O)N XDUVMJCBYUKNFJ-MXAVVETBSA-N 0.000 description 1
- WVUDHMBJNBWZBU-XUXIUFHCSA-N Ile-Lys-Met Chemical compound CC[C@H](C)[C@@H](C(=O)N[C@@H](CCCCN)C(=O)N[C@@H](CCSC)C(=O)O)N WVUDHMBJNBWZBU-XUXIUFHCSA-N 0.000 description 1
- MSASLZGZQAXVFP-PEDHHIEDSA-N Ile-Met-Ile Chemical compound CC[C@H](C)[C@@H](C(=O)N[C@@H](CCSC)C(=O)N[C@@H]([C@@H](C)CC)C(=O)O)N MSASLZGZQAXVFP-PEDHHIEDSA-N 0.000 description 1
- WMDZARSFSMZOQO-DRZSPHRISA-N Ile-Phe Chemical compound CC[C@H](C)[C@H](N)C(=O)N[C@H](C(O)=O)CC1=CC=CC=C1 WMDZARSFSMZOQO-DRZSPHRISA-N 0.000 description 1
- SAVXZJYTTQQQDD-QEWYBTABSA-N Ile-Phe-Glu Chemical compound CC[C@H](C)[C@@H](C(=O)N[C@@H](CC1=CC=CC=C1)C(=O)N[C@@H](CCC(=O)O)C(=O)O)N SAVXZJYTTQQQDD-QEWYBTABSA-N 0.000 description 1
- WYUHAXJAMDTOAU-IAVJCBSLSA-N Ile-Phe-Ile Chemical compound CC[C@H](C)[C@@H](C(=O)N[C@@H](CC1=CC=CC=C1)C(=O)N[C@@H]([C@@H](C)CC)C(=O)O)N WYUHAXJAMDTOAU-IAVJCBSLSA-N 0.000 description 1
- XLXPYSDGMXTTNQ-UHFFFAOYSA-N Ile-Phe-Leu Natural products CCC(C)C(N)C(=O)NC(C(=O)NC(CC(C)C)C(O)=O)CC1=CC=CC=C1 XLXPYSDGMXTTNQ-UHFFFAOYSA-N 0.000 description 1
- VISRCHQHQCLODA-NAKRPEOUSA-N Ile-Pro-Cys Chemical compound CC[C@H](C)[C@@H](C(=O)N1CCC[C@H]1C(=O)N[C@@H](CS)C(=O)O)N VISRCHQHQCLODA-NAKRPEOUSA-N 0.000 description 1
- NLZVTPYXYXMCIP-XUXIUFHCSA-N Ile-Pro-Lys Chemical compound CC[C@H](C)[C@H](N)C(=O)N1CCC[C@H]1C(=O)N[C@@H](CCCCN)C(O)=O NLZVTPYXYXMCIP-XUXIUFHCSA-N 0.000 description 1
- XMYURPUVJSKTMC-KBIXCLLPSA-N Ile-Ser-Gln Chemical compound CC[C@H](C)[C@@H](C(=O)N[C@@H](CO)C(=O)N[C@@H](CCC(=O)N)C(=O)O)N XMYURPUVJSKTMC-KBIXCLLPSA-N 0.000 description 1
- ZLFNNVATRMCAKN-ZKWXMUAHSA-N Ile-Ser-Gly Chemical compound CC[C@H](C)[C@@H](C(=O)N[C@@H](CO)C(=O)NCC(=O)O)N ZLFNNVATRMCAKN-ZKWXMUAHSA-N 0.000 description 1
- PXKACEXYLPBMAD-JBDRJPRFSA-N Ile-Ser-Ser Chemical compound CC[C@H](C)[C@@H](C(=O)N[C@@H](CO)C(=O)N[C@@H](CO)C(=O)O)N PXKACEXYLPBMAD-JBDRJPRFSA-N 0.000 description 1
- DRCKHKZYDLJYFQ-YWIQKCBGSA-N Ile-Thr Chemical compound CC[C@H](C)[C@H](N)C(=O)N[C@@H]([C@@H](C)O)C(O)=O DRCKHKZYDLJYFQ-YWIQKCBGSA-N 0.000 description 1
- WCNWGAUZWWSYDG-SVSWQMSJSA-N Ile-Thr-Ser Chemical compound CC[C@H](C)[C@@H](C(=O)N[C@@H]([C@@H](C)O)C(=O)N[C@@H](CO)C(=O)O)N WCNWGAUZWWSYDG-SVSWQMSJSA-N 0.000 description 1
- QHUREMVLLMNUAX-OSUNSFLBSA-N Ile-Thr-Val Chemical compound CC[C@H](C)[C@@H](C(=O)N[C@@H]([C@@H](C)O)C(=O)N[C@@H](C(C)C)C(=O)O)N QHUREMVLLMNUAX-OSUNSFLBSA-N 0.000 description 1
- BZUOLKFQVVBTJY-SLBDDTMCSA-N Ile-Trp-Asn Chemical compound CC[C@H](C)[C@@H](C(=O)N[C@@H](CC1=CNC2=CC=CC=C21)C(=O)N[C@@H](CC(=O)N)C(=O)O)N BZUOLKFQVVBTJY-SLBDDTMCSA-N 0.000 description 1
- DTPGSUQHUMELQB-GVARAGBVSA-N Ile-Tyr-Ala Chemical compound CC[C@H](C)[C@H](N)C(=O)N[C@H](C(=O)N[C@@H](C)C(O)=O)CC1=CC=C(O)C=C1 DTPGSUQHUMELQB-GVARAGBVSA-N 0.000 description 1
- GNXGAVNTVNOCLL-SIUGBPQLSA-N Ile-Tyr-Gln Chemical compound CC[C@H](C)[C@@H](C(=O)N[C@@H](CC1=CC=C(C=C1)O)C(=O)N[C@@H](CCC(=O)N)C(=O)O)N GNXGAVNTVNOCLL-SIUGBPQLSA-N 0.000 description 1
- NUEHSWNAFIEBCQ-NAKRPEOUSA-N Ile-Val-Cys Chemical compound CC[C@H](C)[C@@H](C(=O)N[C@@H](C(C)C)C(=O)N[C@@H](CS)C(=O)O)N NUEHSWNAFIEBCQ-NAKRPEOUSA-N 0.000 description 1
- JCGMFFQQHJQASB-PYJNHQTQSA-N Ile-Val-His Chemical compound N[C@@H]([C@@H](C)CC)C(=O)N[C@@H](C(C)C)C(=O)N[C@@H](CC1=CNC=N1)C(=O)O JCGMFFQQHJQASB-PYJNHQTQSA-N 0.000 description 1
- JZBVBOKASHNXAD-NAKRPEOUSA-N Ile-Val-Ser Chemical compound CC[C@H](C)[C@@H](C(=O)N[C@@H](C(C)C)C(=O)N[C@@H](CO)C(=O)O)N JZBVBOKASHNXAD-NAKRPEOUSA-N 0.000 description 1
- 108060003951 Immunoglobulin Proteins 0.000 description 1
- 108010067060 Immunoglobulin Variable Region Proteins 0.000 description 1
- 229930010555 Inosine Natural products 0.000 description 1
- UGQMRVRMYYASKQ-KQYNXXCUSA-N Inosine Chemical compound O[C@@H]1[C@H](O)[C@@H](CO)O[C@H]1N1C2=NC=NC(O)=C2N=C1 UGQMRVRMYYASKQ-KQYNXXCUSA-N 0.000 description 1
- PMGDADKJMCOXHX-UHFFFAOYSA-N L-Arginyl-L-glutamin-acetat Natural products NC(=N)NCCCC(N)C(=O)NC(CCC(N)=O)C(O)=O PMGDADKJMCOXHX-UHFFFAOYSA-N 0.000 description 1
- HHSJMSCOLJVTCX-ZDLURKLDSA-N L-Glutaminyl-L-threonine Chemical compound C[C@@H](O)[C@@H](C(O)=O)NC(=O)[C@@H](N)CCC(N)=O HHSJMSCOLJVTCX-ZDLURKLDSA-N 0.000 description 1
- FADYJNXDPBKVCA-UHFFFAOYSA-N L-Phenylalanyl-L-lysin Natural products NCCCCC(C(O)=O)NC(=O)C(N)CC1=CC=CC=C1 FADYJNXDPBKVCA-UHFFFAOYSA-N 0.000 description 1
- ONIBWKKTOPOVIA-BYPYZUCNSA-N L-Proline Chemical compound OC(=O)[C@@H]1CCCN1 ONIBWKKTOPOVIA-BYPYZUCNSA-N 0.000 description 1
- DCXYFEDJOCDNAF-REOHCLBHSA-N L-asparagine Chemical compound OC(=O)[C@@H](N)CC(N)=O DCXYFEDJOCDNAF-REOHCLBHSA-N 0.000 description 1
- FFEARJCKVFRZRR-BYPYZUCNSA-N L-methionine Chemical compound CSCC[C@H](N)C(O)=O FFEARJCKVFRZRR-BYPYZUCNSA-N 0.000 description 1
- RNKSNIBMTUYWSH-YFKPBYRVSA-N L-prolylglycine Chemical compound [O-]C(=O)CNC(=O)[C@@H]1CCC[NH2+]1 RNKSNIBMTUYWSH-YFKPBYRVSA-N 0.000 description 1
- MJOZZTKJZQFKDK-GUBZILKMSA-N Leu-Ala-Gln Chemical compound CC(C)C[C@H](N)C(=O)N[C@@H](C)C(=O)N[C@H](C(O)=O)CCC(N)=O MJOZZTKJZQFKDK-GUBZILKMSA-N 0.000 description 1
- XBBKIIGCUMBKCO-JXUBOQSCSA-N Leu-Ala-Thr Chemical compound [H]N[C@@H](CC(C)C)C(=O)N[C@@H](C)C(=O)N[C@@H]([C@@H](C)O)C(O)=O XBBKIIGCUMBKCO-JXUBOQSCSA-N 0.000 description 1
- SUPVSFFZWVOEOI-CQDKDKBSSA-N Leu-Ala-Tyr Chemical compound CC(C)C[C@H](N)C(=O)N[C@@H](C)C(=O)N[C@H](C(O)=O)CC1=CC=C(O)C=C1 SUPVSFFZWVOEOI-CQDKDKBSSA-N 0.000 description 1
- SUPVSFFZWVOEOI-UHFFFAOYSA-N Leu-Ala-Tyr Natural products CC(C)CC(N)C(=O)NC(C)C(=O)NC(C(O)=O)CC1=CC=C(O)C=C1 SUPVSFFZWVOEOI-UHFFFAOYSA-N 0.000 description 1
- HBJZFCIVFIBNSV-DCAQKATOSA-N Leu-Arg-Asn Chemical compound CC(C)C[C@H](N)C(=O)N[C@@H](CCCN=C(N)N)C(=O)N[C@@H](CC(N)=O)C(O)=O HBJZFCIVFIBNSV-DCAQKATOSA-N 0.000 description 1
- REPPKAMYTOJTFC-DCAQKATOSA-N Leu-Arg-Asp Chemical compound [H]N[C@@H](CC(C)C)C(=O)N[C@@H](CCCNC(N)=N)C(=O)N[C@@H](CC(O)=O)C(O)=O REPPKAMYTOJTFC-DCAQKATOSA-N 0.000 description 1
- HASRFYOMVPJRPU-SRVKXCTJSA-N Leu-Arg-Glu Chemical compound CC(C)C[C@H](N)C(=O)N[C@@H](CCCN=C(N)N)C(=O)N[C@@H](CCC(O)=O)C(O)=O HASRFYOMVPJRPU-SRVKXCTJSA-N 0.000 description 1
- MLTRLIITQPXHBJ-BQBZGAKWSA-N Leu-Asn Chemical compound CC(C)C[C@H](N)C(=O)N[C@H](C(O)=O)CC(N)=O MLTRLIITQPXHBJ-BQBZGAKWSA-N 0.000 description 1
- ILJREDZFPHTUIE-GUBZILKMSA-N Leu-Asp-Glu Chemical compound [H]N[C@@H](CC(C)C)C(=O)N[C@@H](CC(O)=O)C(=O)N[C@@H](CCC(O)=O)C(O)=O ILJREDZFPHTUIE-GUBZILKMSA-N 0.000 description 1
- PVMPDMIKUVNOBD-CIUDSAMLSA-N Leu-Asp-Ser Chemical compound CC(C)C[C@H](N)C(=O)N[C@@H](CC(O)=O)C(=O)N[C@@H](CO)C(O)=O PVMPDMIKUVNOBD-CIUDSAMLSA-N 0.000 description 1
- GZAUZBUKDXYPEH-CIUDSAMLSA-N Leu-Cys-Cys Chemical compound CC(C)C[C@@H](C(=O)N[C@@H](CS)C(=O)N[C@@H](CS)C(=O)O)N GZAUZBUKDXYPEH-CIUDSAMLSA-N 0.000 description 1
- PPBKJAQJAUHZKX-SRVKXCTJSA-N Leu-Cys-Leu Chemical compound CC(C)C[C@H](N)C(=O)N[C@@H](CS)C(=O)N[C@H](C(O)=O)CC(C)C PPBKJAQJAUHZKX-SRVKXCTJSA-N 0.000 description 1
- JYOAXOMPIXKMKK-YUMQZZPRSA-N Leu-Gln Chemical compound CC(C)C[C@H]([NH3+])C(=O)N[C@H](C([O-])=O)CCC(N)=O JYOAXOMPIXKMKK-YUMQZZPRSA-N 0.000 description 1
- VQPPIMUZCZCOIL-GUBZILKMSA-N Leu-Gln-Ala Chemical compound CC(C)C[C@H](N)C(=O)N[C@@H](CCC(N)=O)C(=O)N[C@@H](C)C(O)=O VQPPIMUZCZCOIL-GUBZILKMSA-N 0.000 description 1
- DXYBNWJZJVSZAE-GUBZILKMSA-N Leu-Gln-Cys Chemical compound CC(C)C[C@@H](C(=O)N[C@@H](CCC(=O)N)C(=O)N[C@@H](CS)C(=O)O)N DXYBNWJZJVSZAE-GUBZILKMSA-N 0.000 description 1
- RSFGIMMPWAXNML-MNXVOIDGSA-N Leu-Gln-Ile Chemical compound [H]N[C@@H](CC(C)C)C(=O)N[C@@H](CCC(N)=O)C(=O)N[C@@H]([C@@H](C)CC)C(O)=O RSFGIMMPWAXNML-MNXVOIDGSA-N 0.000 description 1
- GLBNEGIOFRVRHO-JYJNAYRXSA-N Leu-Gln-Phe Chemical compound [H]N[C@@H](CC(C)C)C(=O)N[C@@H](CCC(N)=O)C(=O)N[C@@H](CC1=CC=CC=C1)C(O)=O GLBNEGIOFRVRHO-JYJNAYRXSA-N 0.000 description 1
- GPICTNQYKHHHTH-GUBZILKMSA-N Leu-Gln-Ser Chemical compound CC(C)C[C@H](N)C(=O)N[C@@H](CCC(N)=O)C(=O)N[C@@H](CO)C(O)=O GPICTNQYKHHHTH-GUBZILKMSA-N 0.000 description 1
- YSKSXVKQLLBVEX-SZMVWBNQSA-N Leu-Gln-Trp Chemical compound C1=CC=C2C(C[C@H](NC(=O)[C@H](CCC(N)=O)NC(=O)[C@@H](N)CC(C)C)C(O)=O)=CNC2=C1 YSKSXVKQLLBVEX-SZMVWBNQSA-N 0.000 description 1
- DZQMXBALGUHGJT-GUBZILKMSA-N Leu-Glu-Ala Chemical compound [H]N[C@@H](CC(C)C)C(=O)N[C@@H](CCC(O)=O)C(=O)N[C@@H](C)C(O)=O DZQMXBALGUHGJT-GUBZILKMSA-N 0.000 description 1
- WMTOVWLLDGQGCV-GUBZILKMSA-N Leu-Glu-Cys Chemical compound CC(C)C[C@@H](C(=O)N[C@@H](CCC(=O)O)C(=O)N[C@@H](CS)C(=O)O)N WMTOVWLLDGQGCV-GUBZILKMSA-N 0.000 description 1
- PRZVBIAOPFGAQF-SRVKXCTJSA-N Leu-Glu-Met Chemical compound [H]N[C@@H](CC(C)C)C(=O)N[C@@H](CCC(O)=O)C(=O)N[C@@H](CCSC)C(O)=O PRZVBIAOPFGAQF-SRVKXCTJSA-N 0.000 description 1
- BABSVXFGKFLIGW-UWVGGRQHSA-N Leu-Gly-Arg Chemical compound CC(C)C[C@H](N)C(=O)NCC(=O)N[C@H](C(O)=O)CCCNC(N)=N BABSVXFGKFLIGW-UWVGGRQHSA-N 0.000 description 1
- VBZOAGIPCULURB-QWRGUYRKSA-N Leu-Gly-His Chemical compound CC(C)C[C@@H](C(=O)NCC(=O)N[C@@H](CC1=CN=CN1)C(=O)O)N VBZOAGIPCULURB-QWRGUYRKSA-N 0.000 description 1
- KEVYYIMVELOXCT-KBPBESRZSA-N Leu-Gly-Phe Chemical compound CC(C)C[C@H]([NH3+])C(=O)NCC(=O)N[C@H](C([O-])=O)CC1=CC=CC=C1 KEVYYIMVELOXCT-KBPBESRZSA-N 0.000 description 1
- UCDHVOALNXENLC-KBPBESRZSA-N Leu-Gly-Tyr Chemical compound CC(C)C[C@H]([NH3+])C(=O)NCC(=O)N[C@H](C([O-])=O)CC1=CC=C(O)C=C1 UCDHVOALNXENLC-KBPBESRZSA-N 0.000 description 1
- BTNXKBVLWJBTNR-SRVKXCTJSA-N Leu-His-Asn Chemical compound [H]N[C@@H](CC(C)C)C(=O)N[C@@H](CC1=CNC=N1)C(=O)N[C@@H](CC(N)=O)C(O)=O BTNXKBVLWJBTNR-SRVKXCTJSA-N 0.000 description 1
- OYQUOLRTJHWVSQ-SRVKXCTJSA-N Leu-His-Ser Chemical compound [H]N[C@@H](CC(C)C)C(=O)N[C@@H](CC1=CNC=N1)C(=O)N[C@@H](CO)C(O)=O OYQUOLRTJHWVSQ-SRVKXCTJSA-N 0.000 description 1
- SGIIOQQGLUUMDQ-IHRRRGAJSA-N Leu-His-Val Chemical compound CC(C)C[C@@H](C(=O)N[C@@H](CC1=CN=CN1)C(=O)N[C@@H](C(C)C)C(=O)O)N SGIIOQQGLUUMDQ-IHRRRGAJSA-N 0.000 description 1
- AVEGDIAXTDVBJS-XUXIUFHCSA-N Leu-Ile-Arg Chemical compound [H]N[C@@H](CC(C)C)C(=O)N[C@@H]([C@@H](C)CC)C(=O)N[C@@H](CCCNC(N)=N)C(O)=O AVEGDIAXTDVBJS-XUXIUFHCSA-N 0.000 description 1
- KOSWSHVQIVTVQF-ZPFDUUQYSA-N Leu-Ile-Asp Chemical compound [H]N[C@@H](CC(C)C)C(=O)N[C@@H]([C@@H](C)CC)C(=O)N[C@@H](CC(O)=O)C(O)=O KOSWSHVQIVTVQF-ZPFDUUQYSA-N 0.000 description 1
- QLDHBYRUNQZIJQ-DKIMLUQUSA-N Leu-Ile-Phe Chemical compound [H]N[C@@H](CC(C)C)C(=O)N[C@@H]([C@@H](C)CC)C(=O)N[C@@H](CC1=CC=CC=C1)C(O)=O QLDHBYRUNQZIJQ-DKIMLUQUSA-N 0.000 description 1
- LIINDKYIGYTDLG-PPCPHDFISA-N Leu-Ile-Thr Chemical compound [H]N[C@@H](CC(C)C)C(=O)N[C@@H]([C@@H](C)CC)C(=O)N[C@@H]([C@@H](C)O)C(O)=O LIINDKYIGYTDLG-PPCPHDFISA-N 0.000 description 1
- LCPYQJIKPJDLLB-UWVGGRQHSA-N Leu-Leu Chemical compound CC(C)C[C@H](N)C(=O)N[C@H](C(O)=O)CC(C)C LCPYQJIKPJDLLB-UWVGGRQHSA-N 0.000 description 1
- IAJFFZORSWOZPQ-SRVKXCTJSA-N Leu-Leu-Asn Chemical compound CC(C)C[C@H](N)C(=O)N[C@@H](CC(C)C)C(=O)N[C@@H](CC(N)=O)C(O)=O IAJFFZORSWOZPQ-SRVKXCTJSA-N 0.000 description 1
- DNDWZFHLZVYOGF-KKUMJFAQSA-N Leu-Leu-Leu Chemical compound CC(C)C[C@H](N)C(=O)N[C@@H](CC(C)C)C(=O)N[C@@H](CC(C)C)C(O)=O DNDWZFHLZVYOGF-KKUMJFAQSA-N 0.000 description 1
- LXKNSJLSGPNHSK-KKUMJFAQSA-N Leu-Leu-Lys Chemical compound CC(C)C[C@@H](C(=O)N[C@@H](CC(C)C)C(=O)N[C@@H](CCCCN)C(=O)O)N LXKNSJLSGPNHSK-KKUMJFAQSA-N 0.000 description 1
- XVZCXCTYGHPNEM-UHFFFAOYSA-N Leu-Leu-Pro Natural products CC(C)CC(N)C(=O)NC(CC(C)C)C(=O)N1CCCC1C(O)=O XVZCXCTYGHPNEM-UHFFFAOYSA-N 0.000 description 1
- OTXBNHIUIHNGAO-UWVGGRQHSA-N Leu-Lys Chemical compound CC(C)C[C@H](N)C(=O)N[C@H](C(O)=O)CCCCN OTXBNHIUIHNGAO-UWVGGRQHSA-N 0.000 description 1
- WXUOJXIGOPMDJM-SRVKXCTJSA-N Leu-Lys-Asn Chemical compound CC(C)C[C@H](N)C(=O)N[C@@H](CCCCN)C(=O)N[C@@H](CC(N)=O)C(O)=O WXUOJXIGOPMDJM-SRVKXCTJSA-N 0.000 description 1
- BGZCJDGBBUUBHA-KKUMJFAQSA-N Leu-Lys-Leu Chemical compound CC(C)C[C@H](N)C(=O)N[C@@H](CCCCN)C(=O)N[C@@H](CC(C)C)C(O)=O BGZCJDGBBUUBHA-KKUMJFAQSA-N 0.000 description 1
- OVZLLFONXILPDZ-VOAKCMCISA-N Leu-Lys-Thr Chemical compound [H]N[C@@H](CC(C)C)C(=O)N[C@@H](CCCCN)C(=O)N[C@@H]([C@@H](C)O)C(O)=O OVZLLFONXILPDZ-VOAKCMCISA-N 0.000 description 1
- LZHJZLHSRGWBBE-IHRRRGAJSA-N Leu-Lys-Val Chemical compound [H]N[C@@H](CC(C)C)C(=O)N[C@@H](CCCCN)C(=O)N[C@@H](C(C)C)C(O)=O LZHJZLHSRGWBBE-IHRRRGAJSA-N 0.000 description 1
- CPONGMJGVIAWEH-DCAQKATOSA-N Leu-Met-Ala Chemical compound CSCC[C@H](NC(=O)[C@@H](N)CC(C)C)C(=O)N[C@@H](C)C(O)=O CPONGMJGVIAWEH-DCAQKATOSA-N 0.000 description 1
- BJWKOATWNQJPSK-SRVKXCTJSA-N Leu-Met-Glu Chemical compound CC(C)C[C@@H](C(=O)N[C@@H](CCSC)C(=O)N[C@@H](CCC(=O)O)C(=O)O)N BJWKOATWNQJPSK-SRVKXCTJSA-N 0.000 description 1
- HDHQQEDVWQGBEE-DCAQKATOSA-N Leu-Met-Ser Chemical compound [H]N[C@@H](CC(C)C)C(=O)N[C@@H](CCSC)C(=O)N[C@@H](CO)C(O)=O HDHQQEDVWQGBEE-DCAQKATOSA-N 0.000 description 1
- SYRTUBLKWNDSDK-DKIMLUQUSA-N Leu-Phe-Ile Chemical compound [H]N[C@@H](CC(C)C)C(=O)N[C@@H](CC1=CC=CC=C1)C(=O)N[C@@H]([C@@H](C)CC)C(O)=O SYRTUBLKWNDSDK-DKIMLUQUSA-N 0.000 description 1
- VTJUNIYRYIAIHF-IUCAKERBSA-N Leu-Pro Chemical compound CC(C)C[C@H](N)C(=O)N1CCC[C@H]1C(O)=O VTJUNIYRYIAIHF-IUCAKERBSA-N 0.000 description 1
- RRVCZCNFXIFGRA-DCAQKATOSA-N Leu-Pro-Asn Chemical compound [H]N[C@@H](CC(C)C)C(=O)N1CCC[C@H]1C(=O)N[C@@H](CC(N)=O)C(O)=O RRVCZCNFXIFGRA-DCAQKATOSA-N 0.000 description 1
- KWLWZYMNUZJKMZ-IHRRRGAJSA-N Leu-Pro-Leu Chemical compound CC(C)C[C@H](N)C(=O)N1CCC[C@H]1C(=O)N[C@@H](CC(C)C)C(O)=O KWLWZYMNUZJKMZ-IHRRRGAJSA-N 0.000 description 1
- XGDCYUQSFDQISZ-BQBZGAKWSA-N Leu-Ser Chemical compound CC(C)C[C@H](N)C(=O)N[C@@H](CO)C(O)=O XGDCYUQSFDQISZ-BQBZGAKWSA-N 0.000 description 1
- IDGZVZJLYFTXSL-DCAQKATOSA-N Leu-Ser-Arg Chemical compound CC(C)C[C@H](N)C(=O)N[C@@H](CO)C(=O)N[C@H](C(O)=O)CCCN=C(N)N IDGZVZJLYFTXSL-DCAQKATOSA-N 0.000 description 1
- KZZCOWMDDXDKSS-CIUDSAMLSA-N Leu-Ser-Asn Chemical compound [H]N[C@@H](CC(C)C)C(=O)N[C@@H](CO)C(=O)N[C@@H](CC(N)=O)C(O)=O KZZCOWMDDXDKSS-CIUDSAMLSA-N 0.000 description 1
- AKVBOOKXVAMKSS-GUBZILKMSA-N Leu-Ser-Gln Chemical compound [H]N[C@@H](CC(C)C)C(=O)N[C@@H](CO)C(=O)N[C@@H](CCC(N)=O)C(O)=O AKVBOOKXVAMKSS-GUBZILKMSA-N 0.000 description 1
- IWMJFLJQHIDZQW-KKUMJFAQSA-N Leu-Ser-Phe Chemical compound CC(C)C[C@H](N)C(=O)N[C@@H](CO)C(=O)N[C@H](C(O)=O)CC1=CC=CC=C1 IWMJFLJQHIDZQW-KKUMJFAQSA-N 0.000 description 1
- SBANPBVRHYIMRR-GARJFASQSA-N Leu-Ser-Pro Chemical compound CC(C)C[C@@H](C(=O)N[C@@H](CO)C(=O)N1CCC[C@@H]1C(=O)O)N SBANPBVRHYIMRR-GARJFASQSA-N 0.000 description 1
- SQUFDMCWMFOEBA-KKUMJFAQSA-N Leu-Ser-Tyr Chemical compound CC(C)C[C@H](N)C(=O)N[C@@H](CO)C(=O)N[C@H](C(O)=O)CC1=CC=C(O)C=C1 SQUFDMCWMFOEBA-KKUMJFAQSA-N 0.000 description 1
- LRKCBIUDWAXNEG-CSMHCCOUSA-N Leu-Thr Chemical compound CC(C)C[C@H](N)C(=O)N[C@@H]([C@@H](C)O)C(O)=O LRKCBIUDWAXNEG-CSMHCCOUSA-N 0.000 description 1
- ZJZNLRVCZWUONM-JXUBOQSCSA-N Leu-Thr-Ala Chemical compound CC(C)C[C@H](N)C(=O)N[C@@H]([C@@H](C)O)C(=O)N[C@@H](C)C(O)=O ZJZNLRVCZWUONM-JXUBOQSCSA-N 0.000 description 1
- LINKCQUOMUDLKN-KATARQTJSA-N Leu-Thr-Cys Chemical compound C[C@H]([C@@H](C(=O)N[C@@H](CS)C(=O)O)NC(=O)[C@H](CC(C)C)N)O LINKCQUOMUDLKN-KATARQTJSA-N 0.000 description 1
- VDIARPPNADFEAV-WEDXCCLWSA-N Leu-Thr-Gly Chemical compound CC(C)C[C@H](N)C(=O)N[C@@H]([C@@H](C)O)C(=O)NCC(O)=O VDIARPPNADFEAV-WEDXCCLWSA-N 0.000 description 1
- WUHBLPVELFTPQK-KKUMJFAQSA-N Leu-Tyr-Asn Chemical compound [H]N[C@@H](CC(C)C)C(=O)N[C@@H](CC1=CC=C(O)C=C1)C(=O)N[C@@H](CC(N)=O)C(O)=O WUHBLPVELFTPQK-KKUMJFAQSA-N 0.000 description 1
- ARNIBBOXIAWUOP-MGHWNKPDSA-N Leu-Tyr-Ile Chemical compound [H]N[C@@H](CC(C)C)C(=O)N[C@@H](CC1=CC=C(O)C=C1)C(=O)N[C@@H]([C@@H](C)CC)C(O)=O ARNIBBOXIAWUOP-MGHWNKPDSA-N 0.000 description 1
- MDSUKZSLOATHMH-IUCAKERBSA-N Leu-Val Chemical compound CC(C)C[C@H]([NH3+])C(=O)N[C@@H](C(C)C)C([O-])=O MDSUKZSLOATHMH-IUCAKERBSA-N 0.000 description 1
- MKBVYCVTDBHWSZ-DCAQKATOSA-N Leu-Val-Ala Chemical compound CC(C)C[C@H](N)C(=O)N[C@@H](C(C)C)C(=O)N[C@@H](C)C(O)=O MKBVYCVTDBHWSZ-DCAQKATOSA-N 0.000 description 1
- FBNPMTNBFFAMMH-UHFFFAOYSA-N Leu-Val-Arg Natural products CC(C)CC(N)C(=O)NC(C(C)C)C(=O)NC(C(O)=O)CCCN=C(N)N FBNPMTNBFFAMMH-UHFFFAOYSA-N 0.000 description 1
- CGHXMODRYJISSK-NHCYSSNCSA-N Leu-Val-Asp Chemical compound CC(C)C[C@H](N)C(=O)N[C@@H](C(C)C)C(=O)N[C@H](C(O)=O)CC(O)=O CGHXMODRYJISSK-NHCYSSNCSA-N 0.000 description 1
- 108060001084 Luciferase Proteins 0.000 description 1
- 239000005089 Luciferase Substances 0.000 description 1
- MPOHDJKRBLVGCT-CIUDSAMLSA-N Lys-Ala-Asn Chemical compound C[C@@H](C(=O)N[C@@H](CC(=O)N)C(=O)O)NC(=O)[C@H](CCCCN)N MPOHDJKRBLVGCT-CIUDSAMLSA-N 0.000 description 1
- NPBGTPKLVJEOBE-IUCAKERBSA-N Lys-Arg Chemical compound NCCCC[C@H](N)C(=O)N[C@H](C(O)=O)CCCNC(N)=N NPBGTPKLVJEOBE-IUCAKERBSA-N 0.000 description 1
- ALSRJRIWBNENFY-DCAQKATOSA-N Lys-Arg-Asn Chemical compound [H]N[C@@H](CCCCN)C(=O)N[C@@H](CCCNC(N)=N)C(=O)N[C@@H](CC(N)=O)C(O)=O ALSRJRIWBNENFY-DCAQKATOSA-N 0.000 description 1
- WALVCOOOKULCQM-ULQDDVLXSA-N Lys-Arg-Phe Chemical compound [H]N[C@@H](CCCCN)C(=O)N[C@@H](CCCNC(N)=N)C(=O)N[C@@H](CC1=CC=CC=C1)C(O)=O WALVCOOOKULCQM-ULQDDVLXSA-N 0.000 description 1
- GGAPIOORBXHMNY-ULQDDVLXSA-N Lys-Arg-Tyr Chemical compound C1=CC(=CC=C1C[C@@H](C(=O)O)NC(=O)[C@H](CCCN=C(N)N)NC(=O)[C@H](CCCCN)N)O GGAPIOORBXHMNY-ULQDDVLXSA-N 0.000 description 1
- FACUGMGEFUEBTI-SRVKXCTJSA-N Lys-Asn-Leu Chemical compound CC(C)C[C@@H](C(O)=O)NC(=O)[C@H](CC(N)=O)NC(=O)[C@@H](N)CCCCN FACUGMGEFUEBTI-SRVKXCTJSA-N 0.000 description 1
- HGZHSNBZDOLMLH-DCAQKATOSA-N Lys-Asn-Met Chemical compound CSCC[C@@H](C(=O)O)NC(=O)[C@H](CC(=O)N)NC(=O)[C@H](CCCCN)N HGZHSNBZDOLMLH-DCAQKATOSA-N 0.000 description 1
- DGWXCIORNLWGGG-CIUDSAMLSA-N Lys-Asn-Ser Chemical compound [H]N[C@@H](CCCCN)C(=O)N[C@@H](CC(N)=O)C(=O)N[C@@H](CO)C(O)=O DGWXCIORNLWGGG-CIUDSAMLSA-N 0.000 description 1
- GKFNXYMAMKJSKD-NHCYSSNCSA-N Lys-Asp-Val Chemical compound [H]N[C@@H](CCCCN)C(=O)N[C@@H](CC(O)=O)C(=O)N[C@@H](C(C)C)C(O)=O GKFNXYMAMKJSKD-NHCYSSNCSA-N 0.000 description 1
- SFQPJNQDUUYCLA-BJDJZHNGSA-N Lys-Cys-Ile Chemical compound CC[C@H](C)[C@@H](C(=O)O)NC(=O)[C@H](CS)NC(=O)[C@H](CCCCN)N SFQPJNQDUUYCLA-BJDJZHNGSA-N 0.000 description 1
- KSFQPRLZAUXXPT-GARJFASQSA-N Lys-Cys-Pro Chemical compound C1C[C@@H](N(C1)C(=O)[C@H](CS)NC(=O)[C@H](CCCCN)N)C(=O)O KSFQPRLZAUXXPT-GARJFASQSA-N 0.000 description 1
- OPTCSTACHGNULU-DCAQKATOSA-N Lys-Cys-Val Chemical compound CC(C)[C@@H](C(O)=O)NC(=O)[C@H](CS)NC(=O)[C@@H](N)CCCCN OPTCSTACHGNULU-DCAQKATOSA-N 0.000 description 1
- NDORZBUHCOJQDO-GVXVVHGQSA-N Lys-Gln-Val Chemical compound [H]N[C@@H](CCCCN)C(=O)N[C@@H](CCC(N)=O)C(=O)N[C@@H](C(C)C)C(O)=O NDORZBUHCOJQDO-GVXVVHGQSA-N 0.000 description 1
- UGTZHPSKYRIGRJ-YUMQZZPRSA-N Lys-Glu Chemical compound NCCCC[C@H](N)C(=O)N[C@H](C(O)=O)CCC(O)=O UGTZHPSKYRIGRJ-YUMQZZPRSA-N 0.000 description 1
- GCMWRRQAKQXDED-IUCAKERBSA-N Lys-Glu-Gly Chemical compound [NH3+]CCCC[C@H]([NH3+])C(=O)N[C@@H](CCC([O-])=O)C(=O)NCC([O-])=O GCMWRRQAKQXDED-IUCAKERBSA-N 0.000 description 1
- DUTMKEAPLLUGNO-JYJNAYRXSA-N Lys-Glu-Phe Chemical compound [H]N[C@@H](CCCCN)C(=O)N[C@@H](CCC(O)=O)C(=O)N[C@@H](CC1=CC=CC=C1)C(O)=O DUTMKEAPLLUGNO-JYJNAYRXSA-N 0.000 description 1
- ISHNZELVUVPCHY-ZETCQYMHSA-N Lys-Gly-Gly Chemical compound NCCCC[C@H](N)C(=O)NCC(=O)NCC(O)=O ISHNZELVUVPCHY-ZETCQYMHSA-N 0.000 description 1
- IGRMTQMIDNDFAA-UWVGGRQHSA-N Lys-His Chemical compound NCCCC[C@H](N)C(=O)N[C@H](C(O)=O)CC1=CN=CN1 IGRMTQMIDNDFAA-UWVGGRQHSA-N 0.000 description 1
- KNKJPYAZQUFLQK-IHRRRGAJSA-N Lys-His-Arg Chemical compound C1=C(NC=N1)C[C@@H](C(=O)N[C@@H](CCCN=C(N)N)C(=O)O)NC(=O)[C@H](CCCCN)N KNKJPYAZQUFLQK-IHRRRGAJSA-N 0.000 description 1
- PRCHKVGXZVTALR-KKUMJFAQSA-N Lys-His-His Chemical compound C1=C(NC=N1)C[C@@H](C(=O)N[C@@H](CC2=CN=CN2)C(=O)O)NC(=O)[C@H](CCCCN)N PRCHKVGXZVTALR-KKUMJFAQSA-N 0.000 description 1
- WOEDRPCHKPSFDT-MXAVVETBSA-N Lys-His-Ile Chemical compound CC[C@H](C)[C@@H](C(=O)O)NC(=O)[C@H](CC1=CN=CN1)NC(=O)[C@H](CCCCN)N WOEDRPCHKPSFDT-MXAVVETBSA-N 0.000 description 1
- YXTKSLRSRXKXNV-IHRRRGAJSA-N Lys-His-Met Chemical compound CSCC[C@@H](C(=O)O)NC(=O)[C@H](CC1=CN=CN1)NC(=O)[C@H](CCCCN)N YXTKSLRSRXKXNV-IHRRRGAJSA-N 0.000 description 1
- MXMDJEJWERYPMO-XUXIUFHCSA-N Lys-Ile-Arg Chemical compound [H]N[C@@H](CCCCN)C(=O)N[C@@H]([C@@H](C)CC)C(=O)N[C@@H](CCCNC(N)=N)C(O)=O MXMDJEJWERYPMO-XUXIUFHCSA-N 0.000 description 1
- IUWMQCZOTYRXPL-ZPFDUUQYSA-N Lys-Ile-Asp Chemical compound [H]N[C@@H](CCCCN)C(=O)N[C@@H]([C@@H](C)CC)C(=O)N[C@@H](CC(O)=O)C(O)=O IUWMQCZOTYRXPL-ZPFDUUQYSA-N 0.000 description 1
- JYXBNQOKPRQNQS-YTFOTSKYSA-N Lys-Ile-Ile Chemical compound [H]N[C@@H](CCCCN)C(=O)N[C@@H]([C@@H](C)CC)C(=O)N[C@@H]([C@@H](C)CC)C(O)=O JYXBNQOKPRQNQS-YTFOTSKYSA-N 0.000 description 1
- PRSBSVAVOQOAMI-BJDJZHNGSA-N Lys-Ile-Ser Chemical compound OC[C@@H](C(O)=O)NC(=O)[C@H]([C@@H](C)CC)NC(=O)[C@@H](N)CCCCN PRSBSVAVOQOAMI-BJDJZHNGSA-N 0.000 description 1
- MUXNCRWTWBMNHX-SRVKXCTJSA-N Lys-Leu-Asp Chemical compound [H]N[C@@H](CCCCN)C(=O)N[C@@H](CC(C)C)C(=O)N[C@@H](CC(O)=O)C(O)=O MUXNCRWTWBMNHX-SRVKXCTJSA-N 0.000 description 1
- VMTYLUGCXIEDMV-QWRGUYRKSA-N Lys-Leu-Gly Chemical compound OC(=O)CNC(=O)[C@H](CC(C)C)NC(=O)[C@@H](N)CCCCN VMTYLUGCXIEDMV-QWRGUYRKSA-N 0.000 description 1
- QKXZCUCBFPEXNK-KKUMJFAQSA-N Lys-Leu-His Chemical compound NCCCC[C@H](N)C(=O)N[C@@H](CC(C)C)C(=O)N[C@H](C(O)=O)CC1=CN=CN1 QKXZCUCBFPEXNK-KKUMJFAQSA-N 0.000 description 1
- WRODMZBHNNPRLN-SRVKXCTJSA-N Lys-Leu-Ser Chemical compound [H]N[C@@H](CCCCN)C(=O)N[C@@H](CC(C)C)C(=O)N[C@@H](CO)C(O)=O WRODMZBHNNPRLN-SRVKXCTJSA-N 0.000 description 1
- LJADEBULDNKJNK-IHRRRGAJSA-N Lys-Leu-Val Chemical compound CC(C)C[C@H](NC(=O)[C@@H](N)CCCCN)C(=O)N[C@@H](C(C)C)C(O)=O LJADEBULDNKJNK-IHRRRGAJSA-N 0.000 description 1
- NVGBPTNZLWRQSY-UWVGGRQHSA-N Lys-Lys Chemical compound NCCCC[C@H](N)C(=O)N[C@H](C(O)=O)CCCCN NVGBPTNZLWRQSY-UWVGGRQHSA-N 0.000 description 1
- ALGGDNMLQNFVIZ-SRVKXCTJSA-N Lys-Lys-Asp Chemical compound C(CCN)C[C@@H](C(=O)N[C@@H](CCCCN)C(=O)N[C@@H](CC(=O)O)C(=O)O)N ALGGDNMLQNFVIZ-SRVKXCTJSA-N 0.000 description 1
- PYFNONMJYNJENN-AVGNSLFASA-N Lys-Lys-Gln Chemical compound C(CCN)C[C@@H](C(=O)N[C@@H](CCCCN)C(=O)N[C@@H](CCC(=O)N)C(=O)O)N PYFNONMJYNJENN-AVGNSLFASA-N 0.000 description 1
- PLDJDCJLRCYPJB-VOAKCMCISA-N Lys-Lys-Thr Chemical compound [H]N[C@@H](CCCCN)C(=O)N[C@@H](CCCCN)C(=O)N[C@@H]([C@@H](C)O)C(O)=O PLDJDCJLRCYPJB-VOAKCMCISA-N 0.000 description 1
- VSTNAUBHKQPVJX-IHRRRGAJSA-N Lys-Met-Leu Chemical compound [H]N[C@@H](CCCCN)C(=O)N[C@@H](CCSC)C(=O)N[C@@H](CC(C)C)C(O)=O VSTNAUBHKQPVJX-IHRRRGAJSA-N 0.000 description 1
- XFOAWKDQMRMCDN-ULQDDVLXSA-N Lys-Phe-Arg Chemical compound NC(N)=NCCC[C@@H](C(O)=O)NC(=O)[C@@H](NC(=O)[C@@H](N)CCCCN)CC1=CC=CC=C1 XFOAWKDQMRMCDN-ULQDDVLXSA-N 0.000 description 1
- ODTZHNZPINULEU-KKUMJFAQSA-N Lys-Phe-Asn Chemical compound C1=CC=C(C=C1)C[C@@H](C(=O)N[C@@H](CC(=O)N)C(=O)O)NC(=O)[C@H](CCCCN)N ODTZHNZPINULEU-KKUMJFAQSA-N 0.000 description 1
- LMGNWHDWJDIOPK-DKIMLUQUSA-N Lys-Phe-Ile Chemical compound [H]N[C@@H](CCCCN)C(=O)N[C@@H](CC1=CC=CC=C1)C(=O)N[C@@H]([C@@H](C)CC)C(O)=O LMGNWHDWJDIOPK-DKIMLUQUSA-N 0.000 description 1
- LUAJJLPHUXPQLH-KKUMJFAQSA-N Lys-Phe-Ser Chemical compound C1=CC=C(C=C1)C[C@@H](C(=O)N[C@@H](CO)C(=O)O)NC(=O)[C@H](CCCCN)N LUAJJLPHUXPQLH-KKUMJFAQSA-N 0.000 description 1
- UDXSLGLHFUBRRM-OEAJRASXSA-N Lys-Phe-Thr Chemical compound C[C@H]([C@@H](C(=O)O)NC(=O)[C@H](CC1=CC=CC=C1)NC(=O)[C@H](CCCCN)N)O UDXSLGLHFUBRRM-OEAJRASXSA-N 0.000 description 1
- LUTDBHBIHHREDC-IHRRRGAJSA-N Lys-Pro-Lys Chemical compound NCCCC[C@H](N)C(=O)N1CCC[C@H]1C(=O)N[C@@H](CCCCN)C(O)=O LUTDBHBIHHREDC-IHRRRGAJSA-N 0.000 description 1
- JMNRXRPBHFGXQX-GUBZILKMSA-N Lys-Ser-Glu Chemical compound NCCCC[C@H](N)C(=O)N[C@@H](CO)C(=O)N[C@H](C(O)=O)CCC(O)=O JMNRXRPBHFGXQX-GUBZILKMSA-N 0.000 description 1
- IOQWIOPSKJOEKI-SRVKXCTJSA-N Lys-Ser-Leu Chemical compound [H]N[C@@H](CCCCN)C(=O)N[C@@H](CO)C(=O)N[C@@H](CC(C)C)C(O)=O IOQWIOPSKJOEKI-SRVKXCTJSA-N 0.000 description 1
- YRNRVKTYDSLKMD-KKUMJFAQSA-N Lys-Ser-Tyr Chemical compound [H]N[C@@H](CCCCN)C(=O)N[C@@H](CO)C(=O)N[C@@H](CC1=CC=C(O)C=C1)C(O)=O YRNRVKTYDSLKMD-KKUMJFAQSA-N 0.000 description 1
- PLOUVAYOMTYJRG-JXUBOQSCSA-N Lys-Thr-Ala Chemical compound [H]N[C@@H](CCCCN)C(=O)N[C@@H]([C@@H](C)O)C(=O)N[C@@H](C)C(O)=O PLOUVAYOMTYJRG-JXUBOQSCSA-N 0.000 description 1
- CUHGAUZONORRIC-HJGDQZAQSA-N Lys-Thr-Asn Chemical compound C[C@H]([C@@H](C(=O)N[C@@H](CC(=O)N)C(=O)O)NC(=O)[C@H](CCCCN)N)O CUHGAUZONORRIC-HJGDQZAQSA-N 0.000 description 1
- DLCAXBGXGOVUCD-PPCPHDFISA-N Lys-Thr-Ile Chemical compound [H]N[C@@H](CCCCN)C(=O)N[C@@H]([C@@H](C)O)C(=O)N[C@@H]([C@@H](C)CC)C(O)=O DLCAXBGXGOVUCD-PPCPHDFISA-N 0.000 description 1
- RMOKGALPSPOYKE-KATARQTJSA-N Lys-Thr-Ser Chemical compound [H]N[C@@H](CCCCN)C(=O)N[C@@H]([C@@H](C)O)C(=O)N[C@@H](CO)C(O)=O RMOKGALPSPOYKE-KATARQTJSA-N 0.000 description 1
- VHTOGMKQXXJOHG-RHYQMDGZSA-N Lys-Thr-Val Chemical compound [H]N[C@@H](CCCCN)C(=O)N[C@@H]([C@@H](C)O)C(=O)N[C@@H](C(C)C)C(O)=O VHTOGMKQXXJOHG-RHYQMDGZSA-N 0.000 description 1
- WINFHLHJTRGLCV-BZSNNMDCSA-N Lys-Tyr-Lys Chemical compound NCCCC[C@H](N)C(=O)N[C@H](C(=O)N[C@@H](CCCCN)C(O)=O)CC1=CC=C(O)C=C1 WINFHLHJTRGLCV-BZSNNMDCSA-N 0.000 description 1
- LMMBAXJRYSXCOQ-ACRUOGEOSA-N Lys-Tyr-Phe Chemical compound NCCCC[C@H](N)C(=O)N[C@@H](Cc1ccc(O)cc1)C(=O)N[C@@H](Cc1ccccc1)C(O)=O LMMBAXJRYSXCOQ-ACRUOGEOSA-N 0.000 description 1
- TXTZMVNJIRZABH-ULQDDVLXSA-N Lys-Val-Phe Chemical compound NCCCC[C@H](N)C(=O)N[C@@H](C(C)C)C(=O)N[C@H](C(O)=O)CC1=CC=CC=C1 TXTZMVNJIRZABH-ULQDDVLXSA-N 0.000 description 1
- IKXQOBUBZSOWDY-AVGNSLFASA-N Lys-Val-Val Chemical compound CC(C)[C@@H](C(=O)N[C@@H](C(C)C)C(=O)O)NC(=O)[C@H](CCCCN)N IKXQOBUBZSOWDY-AVGNSLFASA-N 0.000 description 1
- 241000124008 Mammalia Species 0.000 description 1
- YRAWWKUTNBILNT-FXQIFTODSA-N Met-Ala-Ala Chemical compound CSCC[C@H](N)C(=O)N[C@@H](C)C(=O)N[C@@H](C)C(O)=O YRAWWKUTNBILNT-FXQIFTODSA-N 0.000 description 1
- WYEXWKAWMNJKPN-UBHSHLNASA-N Met-Ala-Phe Chemical compound C[C@@H](C(=O)N[C@@H](CC1=CC=CC=C1)C(=O)O)NC(=O)[C@H](CCSC)N WYEXWKAWMNJKPN-UBHSHLNASA-N 0.000 description 1
- WXHHTBVYQOSYSL-FXQIFTODSA-N Met-Ala-Ser Chemical compound CSCC[C@H](N)C(=O)N[C@@H](C)C(=O)N[C@@H](CO)C(O)=O WXHHTBVYQOSYSL-FXQIFTODSA-N 0.000 description 1
- DSWOTZCVCBEPOU-IUCAKERBSA-N Met-Arg-Gly Chemical compound CSCC[C@H](N)C(=O)N[C@H](C(=O)NCC(O)=O)CCCNC(N)=N DSWOTZCVCBEPOU-IUCAKERBSA-N 0.000 description 1
- ZAJNRWKGHWGPDQ-SDDRHHMPSA-N Met-Arg-Pro Chemical compound CSCC[C@@H](C(=O)N[C@@H](CCCN=C(N)N)C(=O)N1CCC[C@@H]1C(=O)O)N ZAJNRWKGHWGPDQ-SDDRHHMPSA-N 0.000 description 1
- JMEWFDUAFKVAAT-WDSKDSINSA-N Met-Asn Chemical compound CSCC[C@H]([NH3+])C(=O)N[C@H](C([O-])=O)CC(N)=O JMEWFDUAFKVAAT-WDSKDSINSA-N 0.000 description 1
- GODBLDDYHFTUAH-CIUDSAMLSA-N Met-Asp-Glu Chemical compound CSCC[C@H](N)C(=O)N[C@@H](CC(O)=O)C(=O)N[C@H](C(O)=O)CCC(O)=O GODBLDDYHFTUAH-CIUDSAMLSA-N 0.000 description 1
- FJVJLMZUIGMFFU-BQBZGAKWSA-N Met-Asp-Gly Chemical compound CSCC[C@H](N)C(=O)N[C@@H](CC(O)=O)C(=O)NCC(O)=O FJVJLMZUIGMFFU-BQBZGAKWSA-N 0.000 description 1
- HLYIDXAXQIJYIG-CIUDSAMLSA-N Met-Gln-Asp Chemical compound CSCC[C@H](N)C(=O)N[C@@H](CCC(N)=O)C(=O)N[C@@H](CC(O)=O)C(O)=O HLYIDXAXQIJYIG-CIUDSAMLSA-N 0.000 description 1
- ADHNYKZHPOEULM-BQBZGAKWSA-N Met-Glu Chemical compound CSCC[C@H](N)C(=O)N[C@H](C(O)=O)CCC(O)=O ADHNYKZHPOEULM-BQBZGAKWSA-N 0.000 description 1
- QXOHLNCNYLGICT-YFKPBYRVSA-N Met-Gly Chemical compound CSCC[C@H](N)C(=O)NCC(O)=O QXOHLNCNYLGICT-YFKPBYRVSA-N 0.000 description 1
- BMHIFARYXOJDLD-WPRPVWTQSA-N Met-Gly-Val Chemical compound [H]N[C@@H](CCSC)C(=O)NCC(=O)N[C@@H](C(C)C)C(O)=O BMHIFARYXOJDLD-WPRPVWTQSA-N 0.000 description 1
- RXWPLVRJQNWXRQ-IHRRRGAJSA-N Met-His-His Chemical compound C([C@H](NC(=O)[C@@H](N)CCSC)C(=O)N[C@@H](CC=1N=CNC=1)C(O)=O)C1=CNC=N1 RXWPLVRJQNWXRQ-IHRRRGAJSA-N 0.000 description 1
- TZHFJXDKXGZHEN-IHRRRGAJSA-N Met-His-Leu Chemical compound [H]N[C@@H](CCSC)C(=O)N[C@@H](CC1=CNC=N1)C(=O)N[C@@H](CC(C)C)C(O)=O TZHFJXDKXGZHEN-IHRRRGAJSA-N 0.000 description 1
- FWAHLGXNBLWIKB-NAKRPEOUSA-N Met-Ile-Ser Chemical compound OC[C@@H](C(O)=O)NC(=O)[C@H]([C@@H](C)CC)NC(=O)[C@@H](N)CCSC FWAHLGXNBLWIKB-NAKRPEOUSA-N 0.000 description 1
- LCPUWQLULVXROY-RHYQMDGZSA-N Met-Lys-Thr Chemical compound CSCC[C@H](N)C(=O)N[C@@H](CCCCN)C(=O)N[C@@H]([C@@H](C)O)C(O)=O LCPUWQLULVXROY-RHYQMDGZSA-N 0.000 description 1
- WTHGNAAQXISJHP-AVGNSLFASA-N Met-Lys-Val Chemical compound [H]N[C@@H](CCSC)C(=O)N[C@@H](CCCCN)C(=O)N[C@@H](C(C)C)C(O)=O WTHGNAAQXISJHP-AVGNSLFASA-N 0.000 description 1
- VWWGEKCAPBMIFE-SRVKXCTJSA-N Met-Met-Met Chemical compound CSCC[C@H](N)C(=O)N[C@@H](CCSC)C(=O)N[C@@H](CCSC)C(O)=O VWWGEKCAPBMIFE-SRVKXCTJSA-N 0.000 description 1
- ILKCLLLOGPDNIP-RCWTZXSCSA-N Met-Met-Thr Chemical compound CSCC[C@H](N)C(=O)N[C@@H](CCSC)C(=O)N[C@@H]([C@@H](C)O)C(O)=O ILKCLLLOGPDNIP-RCWTZXSCSA-N 0.000 description 1
- HGCNKOLVKRAVHD-RYUDHWBXSA-N Met-Phe Chemical compound CSCC[C@H](N)C(=O)N[C@H](C(O)=O)CC1=CC=CC=C1 HGCNKOLVKRAVHD-RYUDHWBXSA-N 0.000 description 1
- RSOMVHWMIAZNLE-HJWJTTGWSA-N Met-Phe-Ile Chemical compound [H]N[C@@H](CCSC)C(=O)N[C@@H](CC1=CC=CC=C1)C(=O)N[C@@H]([C@@H](C)CC)C(O)=O RSOMVHWMIAZNLE-HJWJTTGWSA-N 0.000 description 1
- WNJXJJSGUXAIQU-UFYCRDLUSA-N Met-Phe-Phe Chemical compound C([C@H](NC(=O)[C@@H](N)CCSC)C(=O)N[C@@H](CC=1C=CC=CC=1)C(O)=O)C1=CC=CC=C1 WNJXJJSGUXAIQU-UFYCRDLUSA-N 0.000 description 1
- RDLSEGZJMYGFNS-FXQIFTODSA-N Met-Ser-Asp Chemical compound [H]N[C@@H](CCSC)C(=O)N[C@@H](CO)C(=O)N[C@@H](CC(O)=O)C(O)=O RDLSEGZJMYGFNS-FXQIFTODSA-N 0.000 description 1
- DSZFTPCSFVWMKP-DCAQKATOSA-N Met-Ser-Lys Chemical compound CSCC[C@H](N)C(=O)N[C@@H](CO)C(=O)N[C@H](C(O)=O)CCCCN DSZFTPCSFVWMKP-DCAQKATOSA-N 0.000 description 1
- KAKJTZWHIUWTTD-VQVTYTSYSA-N Met-Thr Chemical compound CSCC[C@H]([NH3+])C(=O)N[C@@H]([C@@H](C)O)C([O-])=O KAKJTZWHIUWTTD-VQVTYTSYSA-N 0.000 description 1
- KYJHWKAMFISDJE-RCWTZXSCSA-N Met-Thr-Met Chemical compound CSCC[C@H](N)C(=O)N[C@@H]([C@@H](C)O)C(=O)N[C@H](C(O)=O)CCSC KYJHWKAMFISDJE-RCWTZXSCSA-N 0.000 description 1
- YGNUDKAPJARTEM-GUBZILKMSA-N Met-Val-Ala Chemical compound CSCC[C@H](N)C(=O)N[C@@H](C(C)C)C(=O)N[C@@H](C)C(O)=O YGNUDKAPJARTEM-GUBZILKMSA-N 0.000 description 1
- 101000803710 Mus musculus Vitronectin Proteins 0.000 description 1
- MDSUKZSLOATHMH-UHFFFAOYSA-N N-L-leucyl-L-valine Natural products CC(C)CC(N)C(=O)NC(C(C)C)C(O)=O MDSUKZSLOATHMH-UHFFFAOYSA-N 0.000 description 1
- PESQCPHRXOFIPX-UHFFFAOYSA-N N-L-methionyl-L-tyrosine Natural products CSCCC(N)C(=O)NC(C(O)=O)CC1=CC=C(O)C=C1 PESQCPHRXOFIPX-UHFFFAOYSA-N 0.000 description 1
- XUYPXLNMDZIRQH-LURJTMIESA-N N-acetyl-L-methionine Chemical compound CSCC[C@@H](C(O)=O)NC(C)=O XUYPXLNMDZIRQH-LURJTMIESA-N 0.000 description 1
- 108010079364 N-glycylalanine Proteins 0.000 description 1
- 206010028980 Neoplasm Diseases 0.000 description 1
- 101710163270 Nuclease Proteins 0.000 description 1
- 102000016979 Other receptors Human genes 0.000 description 1
- 238000012408 PCR amplification Methods 0.000 description 1
- 229910019142 PO4 Inorganic materials 0.000 description 1
- 102000035195 Peptidases Human genes 0.000 description 1
- 108010077524 Peptide Elongation Factor 1 Proteins 0.000 description 1
- 108010067902 Peptide Library Proteins 0.000 description 1
- 108010043958 Peptoids Proteins 0.000 description 1
- WSXKXSBOJXEZDV-DLOVCJGASA-N Phe-Ala-Asn Chemical compound NC(=O)C[C@@H](C([O-])=O)NC(=O)[C@H](C)NC(=O)[C@@H]([NH3+])CC1=CC=CC=C1 WSXKXSBOJXEZDV-DLOVCJGASA-N 0.000 description 1
- LSXGADJXBDFXQU-DLOVCJGASA-N Phe-Ala-Asp Chemical compound OC(=O)C[C@@H](C(O)=O)NC(=O)[C@H](C)NC(=O)[C@@H](N)CC1=CC=CC=C1 LSXGADJXBDFXQU-DLOVCJGASA-N 0.000 description 1
- YRKFKTQRVBJYLT-CQDKDKBSSA-N Phe-Ala-His Chemical compound C([C@H](N)C(=O)N[C@@H](C)C(=O)N[C@@H](CC=1NC=NC=1)C(O)=O)C1=CC=CC=C1 YRKFKTQRVBJYLT-CQDKDKBSSA-N 0.000 description 1
- ULECEJGNDHWSKD-QEJZJMRPSA-N Phe-Ala-Lys Chemical compound NCCCC[C@@H](C(O)=O)NC(=O)[C@H](C)NC(=O)[C@@H](N)CC1=CC=CC=C1 ULECEJGNDHWSKD-QEJZJMRPSA-N 0.000 description 1
- YYRCPTVAPLQRNC-ULQDDVLXSA-N Phe-Arg-Lys Chemical compound NCCCC[C@@H](C(O)=O)NC(=O)[C@H](CCCNC(N)=N)NC(=O)[C@@H](N)CC1=CC=CC=C1 YYRCPTVAPLQRNC-ULQDDVLXSA-N 0.000 description 1
- IWRZUGHCHFZYQZ-UFYCRDLUSA-N Phe-Arg-Tyr Chemical compound C([C@H](N)C(=O)N[C@@H](CCCNC(N)=N)C(=O)N[C@@H](CC=1C=CC(O)=CC=1)C(O)=O)C1=CC=CC=C1 IWRZUGHCHFZYQZ-UFYCRDLUSA-N 0.000 description 1
- BXNGIHFNNNSEOS-UWVGGRQHSA-N Phe-Asn Chemical compound NC(=O)C[C@@H](C(O)=O)NC(=O)[C@@H](N)CC1=CC=CC=C1 BXNGIHFNNNSEOS-UWVGGRQHSA-N 0.000 description 1
- KIAWKQJTSGRCSA-AVGNSLFASA-N Phe-Asn-Glu Chemical compound C1=CC=C(C=C1)C[C@@H](C(=O)N[C@@H](CC(=O)N)C(=O)N[C@@H](CCC(=O)O)C(=O)O)N KIAWKQJTSGRCSA-AVGNSLFASA-N 0.000 description 1
- HTKNPQZCMLBOTQ-XVSYOHENSA-N Phe-Asn-Thr Chemical compound C[C@H]([C@@H](C(=O)O)NC(=O)[C@H](CC(=O)N)NC(=O)[C@H](CC1=CC=CC=C1)N)O HTKNPQZCMLBOTQ-XVSYOHENSA-N 0.000 description 1
- WMGVYPPIMZPWPN-SRVKXCTJSA-N Phe-Asp-Asn Chemical compound C1=CC=C(C=C1)C[C@@H](C(=O)N[C@@H](CC(=O)O)C(=O)N[C@@H](CC(=O)N)C(=O)O)N WMGVYPPIMZPWPN-SRVKXCTJSA-N 0.000 description 1
- UEXCHCYDPAIVDE-SRVKXCTJSA-N Phe-Asp-Cys Chemical compound SC[C@@H](C(O)=O)NC(=O)[C@H](CC(O)=O)NC(=O)[C@@H](N)CC1=CC=CC=C1 UEXCHCYDPAIVDE-SRVKXCTJSA-N 0.000 description 1
- KNPVDQMEHSCAGX-UWVGGRQHSA-N Phe-Cys Chemical compound SC[C@@H](C(O)=O)NC(=O)[C@@H](N)CC1=CC=CC=C1 KNPVDQMEHSCAGX-UWVGGRQHSA-N 0.000 description 1
- OMHMIXFFRPMYHB-SRVKXCTJSA-N Phe-Cys-Asn Chemical compound C1=CC=C(C=C1)C[C@@H](C(=O)N[C@@H](CS)C(=O)N[C@@H](CC(=O)N)C(=O)O)N OMHMIXFFRPMYHB-SRVKXCTJSA-N 0.000 description 1
- KLAONOISLHWJEE-QWRGUYRKSA-N Phe-Gln Chemical compound NC(=O)CC[C@@H](C(O)=O)NC(=O)[C@@H](N)CC1=CC=CC=C1 KLAONOISLHWJEE-QWRGUYRKSA-N 0.000 description 1
- IILUKIJNFMUBNF-IHRRRGAJSA-N Phe-Gln-Gln Chemical compound [H]N[C@@H](CC1=CC=CC=C1)C(=O)N[C@@H](CCC(N)=O)C(=O)N[C@@H](CCC(N)=O)C(O)=O IILUKIJNFMUBNF-IHRRRGAJSA-N 0.000 description 1
- APJPXSFJBMMOLW-KBPBESRZSA-N Phe-Gly-Leu Chemical compound CC(C)C[C@@H](C(O)=O)NC(=O)CNC(=O)[C@@H](N)CC1=CC=CC=C1 APJPXSFJBMMOLW-KBPBESRZSA-N 0.000 description 1
- BIYWZVCPZIFGPY-QWRGUYRKSA-N Phe-Gly-Ser Chemical compound [H]N[C@@H](CC1=CC=CC=C1)C(=O)NCC(=O)N[C@@H](CO)C(O)=O BIYWZVCPZIFGPY-QWRGUYRKSA-N 0.000 description 1
- SWCOXQLDICUYOL-ULQDDVLXSA-N Phe-His-Arg Chemical compound [H]N[C@@H](CC1=CC=CC=C1)C(=O)N[C@@H](CC1=CNC=N1)C(=O)N[C@@H](CCCNC(N)=N)C(O)=O SWCOXQLDICUYOL-ULQDDVLXSA-N 0.000 description 1
- SFKOEHXABNPLRT-KBPBESRZSA-N Phe-His-Gly Chemical compound N[C@@H](Cc1ccccc1)C(=O)N[C@@H](Cc1cnc[nH]1)C(=O)NCC(O)=O SFKOEHXABNPLRT-KBPBESRZSA-N 0.000 description 1
- VADLTGVIOIOKGM-BZSNNMDCSA-N Phe-His-Leu Chemical compound C([C@@H](C(=O)N[C@@H](CC(C)C)C(O)=O)NC(=O)[C@@H](N)CC=1C=CC=CC=1)C1=CN=CN1 VADLTGVIOIOKGM-BZSNNMDCSA-N 0.000 description 1
- FINLZXKJWTYYLC-ACRUOGEOSA-N Phe-His-Phe Chemical compound C([C@H](N)C(=O)N[C@@H](CC=1N=CNC=1)C(=O)N[C@@H](CC=1C=CC=CC=1)C(O)=O)C1=CC=CC=C1 FINLZXKJWTYYLC-ACRUOGEOSA-N 0.000 description 1
- SPXWRYVHOZVYBU-ULQDDVLXSA-N Phe-His-Val Chemical compound CC(C)[C@@H](C(=O)O)NC(=O)[C@H](CC1=CN=CN1)NC(=O)[C@H](CC2=CC=CC=C2)N SPXWRYVHOZVYBU-ULQDDVLXSA-N 0.000 description 1
- FXPZZKBHNOMLGA-HJWJTTGWSA-N Phe-Ile-Arg Chemical compound CC[C@H](C)[C@@H](C(=O)N[C@@H](CCCN=C(N)N)C(=O)O)NC(=O)[C@H](CC1=CC=CC=C1)N FXPZZKBHNOMLGA-HJWJTTGWSA-N 0.000 description 1
- MIICYIIBVYQNKE-QEWYBTABSA-N Phe-Ile-Gln Chemical compound CC[C@H](C)[C@@H](C(=O)N[C@@H](CCC(=O)N)C(=O)O)NC(=O)[C@H](CC1=CC=CC=C1)N MIICYIIBVYQNKE-QEWYBTABSA-N 0.000 description 1
- GXDPQJUBLBZKDY-IAVJCBSLSA-N Phe-Ile-Ile Chemical compound [H]N[C@@H](CC1=CC=CC=C1)C(=O)N[C@@H]([C@@H](C)CC)C(=O)N[C@@H]([C@@H](C)CC)C(O)=O GXDPQJUBLBZKDY-IAVJCBSLSA-N 0.000 description 1
- WEMYTDDMDBLPMI-DKIMLUQUSA-N Phe-Ile-Lys Chemical compound CC[C@H](C)[C@@H](C(=O)N[C@@H](CCCCN)C(=O)O)NC(=O)[C@H](CC1=CC=CC=C1)N WEMYTDDMDBLPMI-DKIMLUQUSA-N 0.000 description 1
- BWTKUQPNOMMKMA-FIRPJDEBSA-N Phe-Ile-Phe Chemical compound C([C@H](N)C(=O)N[C@@H]([C@@H](C)CC)C(=O)N[C@@H](CC=1C=CC=CC=1)C(O)=O)C1=CC=CC=C1 BWTKUQPNOMMKMA-FIRPJDEBSA-N 0.000 description 1
- XMQSOOJRRVEHRO-ULQDDVLXSA-N Phe-Leu-Arg Chemical compound NC(N)=NCCC[C@@H](C(O)=O)NC(=O)[C@H](CC(C)C)NC(=O)[C@@H](N)CC1=CC=CC=C1 XMQSOOJRRVEHRO-ULQDDVLXSA-N 0.000 description 1
- KBVJZCVLQWCJQN-KKUMJFAQSA-N Phe-Leu-Asn Chemical compound [H]N[C@@H](CC1=CC=CC=C1)C(=O)N[C@@H](CC(C)C)C(=O)N[C@@H](CC(N)=O)C(O)=O KBVJZCVLQWCJQN-KKUMJFAQSA-N 0.000 description 1
- KZRQONDKKJCAOL-DKIMLUQUSA-N Phe-Leu-Ile Chemical compound [H]N[C@@H](CC1=CC=CC=C1)C(=O)N[C@@H](CC(C)C)C(=O)N[C@@H]([C@@H](C)CC)C(O)=O KZRQONDKKJCAOL-DKIMLUQUSA-N 0.000 description 1
- INHMISZWLJZQGH-ULQDDVLXSA-N Phe-Leu-Val Chemical compound CC(C)[C@@H](C(O)=O)NC(=O)[C@H](CC(C)C)NC(=O)[C@@H](N)CC1=CC=CC=C1 INHMISZWLJZQGH-ULQDDVLXSA-N 0.000 description 1
- RMKGXGPQIPLTFC-KKUMJFAQSA-N Phe-Lys-Asn Chemical compound [H]N[C@@H](CC1=CC=CC=C1)C(=O)N[C@@H](CCCCN)C(=O)N[C@@H](CC(N)=O)C(O)=O RMKGXGPQIPLTFC-KKUMJFAQSA-N 0.000 description 1
- PEFJUUYFEGBXFA-BZSNNMDCSA-N Phe-Lys-Lys Chemical compound NCCCC[C@@H](C(O)=O)NC(=O)[C@H](CCCCN)NC(=O)[C@@H](N)CC1=CC=CC=C1 PEFJUUYFEGBXFA-BZSNNMDCSA-N 0.000 description 1
- YOFKMVUAZGPFCF-IHRRRGAJSA-N Phe-Met-Asn Chemical compound [H]N[C@@H](CC1=CC=CC=C1)C(=O)N[C@@H](CCSC)C(=O)N[C@@H](CC(N)=O)C(O)=O YOFKMVUAZGPFCF-IHRRRGAJSA-N 0.000 description 1
- PTLMYJOMJLTMCB-KKUMJFAQSA-N Phe-Met-Gln Chemical compound CSCC[C@@H](C(=O)N[C@@H](CCC(=O)N)C(=O)O)NC(=O)[C@H](CC1=CC=CC=C1)N PTLMYJOMJLTMCB-KKUMJFAQSA-N 0.000 description 1
- UXQFHEKRGHYJRA-STQMWFEESA-N Phe-Met-Gly Chemical compound [H]N[C@@H](CC1=CC=CC=C1)C(=O)N[C@@H](CCSC)C(=O)NCC(O)=O UXQFHEKRGHYJRA-STQMWFEESA-N 0.000 description 1
- IWZRODDWOSIXPZ-IRXDYDNUSA-N Phe-Phe-Gly Chemical compound C([C@H](N)C(=O)N[C@@H](CC=1C=CC=CC=1)C(=O)NCC(O)=O)C1=CC=CC=C1 IWZRODDWOSIXPZ-IRXDYDNUSA-N 0.000 description 1
- TXJJXEXCZBHDNA-ACRUOGEOSA-N Phe-Phe-His Chemical compound C1=CC=C(C=C1)C[C@@H](C(=O)N[C@@H](CC2=CC=CC=C2)C(=O)N[C@@H](CC3=CN=CN3)C(=O)O)N TXJJXEXCZBHDNA-ACRUOGEOSA-N 0.000 description 1
- CBENHWCORLVGEQ-HJOGWXRNSA-N Phe-Phe-Phe Chemical compound C([C@H](N)C(=O)N[C@@H](CC=1C=CC=CC=1)C(=O)N[C@@H](CC=1C=CC=CC=1)C(O)=O)C1=CC=CC=C1 CBENHWCORLVGEQ-HJOGWXRNSA-N 0.000 description 1
- WEQJQNWXCSUVMA-RYUDHWBXSA-N Phe-Pro Chemical compound C([C@H]([NH3+])C(=O)N1[C@@H](CCC1)C([O-])=O)C1=CC=CC=C1 WEQJQNWXCSUVMA-RYUDHWBXSA-N 0.000 description 1
- GZGPMBKUJDRICD-ULQDDVLXSA-N Phe-Pro-His Chemical compound C1C[C@H](N(C1)C(=O)[C@H](CC2=CC=CC=C2)N)C(=O)N[C@@H](CC3=CN=CN3)C(=O)O GZGPMBKUJDRICD-ULQDDVLXSA-N 0.000 description 1
- ZVRJWDUPIDMHDN-ULQDDVLXSA-N Phe-Pro-Lys Chemical compound NCCCC[C@@H](C(O)=O)NC(=O)[C@@H]1CCCN1C(=O)[C@@H](N)CC1=CC=CC=C1 ZVRJWDUPIDMHDN-ULQDDVLXSA-N 0.000 description 1
- ODGNUUUDJONJSC-UFYCRDLUSA-N Phe-Pro-Tyr Chemical compound C1C[C@H](N(C1)C(=O)[C@H](CC2=CC=CC=C2)N)C(=O)N[C@@H](CC3=CC=C(C=C3)O)C(=O)O ODGNUUUDJONJSC-UFYCRDLUSA-N 0.000 description 1
- ZLAKUZDMKVKFAI-JYJNAYRXSA-N Phe-Pro-Val Chemical compound [H]N[C@@H](CC1=CC=CC=C1)C(=O)N1CCC[C@H]1C(=O)N[C@@H](C(C)C)C(O)=O ZLAKUZDMKVKFAI-JYJNAYRXSA-N 0.000 description 1
- XDMMOISUAHXXFD-SRVKXCTJSA-N Phe-Ser-Asp Chemical compound [H]N[C@@H](CC1=CC=CC=C1)C(=O)N[C@@H](CO)C(=O)N[C@@H](CC(O)=O)C(O)=O XDMMOISUAHXXFD-SRVKXCTJSA-N 0.000 description 1
- MCIXMYKSPQUMJG-SRVKXCTJSA-N Phe-Ser-Ser Chemical compound [H]N[C@@H](CC1=CC=CC=C1)C(=O)N[C@@H](CO)C(=O)N[C@@H](CO)C(O)=O MCIXMYKSPQUMJG-SRVKXCTJSA-N 0.000 description 1
- NYQBYASWHVRESG-MIMYLULJSA-N Phe-Thr Chemical compound C[C@@H](O)[C@@H](C(O)=O)NC(=O)[C@@H](N)CC1=CC=CC=C1 NYQBYASWHVRESG-MIMYLULJSA-N 0.000 description 1
- JHSRGEODDALISP-XVSYOHENSA-N Phe-Thr-Asn Chemical compound [H]N[C@@H](CC1=CC=CC=C1)C(=O)N[C@@H]([C@@H](C)O)C(=O)N[C@@H](CC(N)=O)C(O)=O JHSRGEODDALISP-XVSYOHENSA-N 0.000 description 1
- BPIMVBKDLSBKIJ-FCLVOEFKSA-N Phe-Thr-Phe Chemical compound C([C@H](N)C(=O)N[C@@H]([C@H](O)C)C(=O)N[C@@H](CC=1C=CC=CC=1)C(O)=O)C1=CC=CC=C1 BPIMVBKDLSBKIJ-FCLVOEFKSA-N 0.000 description 1
- CVAUVSOFHJKCHN-BZSNNMDCSA-N Phe-Tyr-Cys Chemical compound C([C@H](N)C(=O)N[C@@H](CC=1C=CC(O)=CC=1)C(=O)N[C@@H](CS)C(O)=O)C1=CC=CC=C1 CVAUVSOFHJKCHN-BZSNNMDCSA-N 0.000 description 1
- DBNGDEAQXGFGRA-ACRUOGEOSA-N Phe-Tyr-Lys Chemical compound C1=CC=C(C=C1)C[C@@H](C(=O)N[C@@H](CC2=CC=C(C=C2)O)C(=O)N[C@@H](CCCCN)C(=O)O)N DBNGDEAQXGFGRA-ACRUOGEOSA-N 0.000 description 1
- IPVPGAADZXRZSH-RNXOBYDBSA-N Phe-Tyr-Trp Chemical compound [H]N[C@@H](CC1=CC=CC=C1)C(=O)N[C@@H](CC1=CC=C(O)C=C1)C(=O)N[C@@H](CC1=CNC2=C1C=CC=C2)C(O)=O IPVPGAADZXRZSH-RNXOBYDBSA-N 0.000 description 1
- CDHURCQGUDNBMA-UBHSHLNASA-N Phe-Val-Ala Chemical compound OC(=O)[C@H](C)NC(=O)[C@H](C(C)C)NC(=O)[C@@H](N)CC1=CC=CC=C1 CDHURCQGUDNBMA-UBHSHLNASA-N 0.000 description 1
- KUSYCSMTTHSZOA-DZKIICNBSA-N Phe-Val-Gln Chemical compound CC(C)[C@@H](C(=O)N[C@@H](CCC(=O)N)C(=O)O)NC(=O)[C@H](CC1=CC=CC=C1)N KUSYCSMTTHSZOA-DZKIICNBSA-N 0.000 description 1
- XALFIVXGQUEGKV-JSGCOSHPSA-N Phe-Val-Gly Chemical compound OC(=O)CNC(=O)[C@H](C(C)C)NC(=O)[C@@H](N)CC1=CC=CC=C1 XALFIVXGQUEGKV-JSGCOSHPSA-N 0.000 description 1
- JTKGCYOOJLUETJ-ULQDDVLXSA-N Phe-Val-Leu Chemical compound CC(C)C[C@@H](C(O)=O)NC(=O)[C@H](C(C)C)NC(=O)[C@@H](N)CC1=CC=CC=C1 JTKGCYOOJLUETJ-ULQDDVLXSA-N 0.000 description 1
- 239000002202 Polyethylene glycol Substances 0.000 description 1
- 239000004793 Polystyrene Substances 0.000 description 1
- KQCCDMFIALWGTL-GUBZILKMSA-N Pro-Asn-Met Chemical compound CSCC[C@@H](C(O)=O)NC(=O)[C@H](CC(N)=O)NC(=O)[C@@H]1CCCN1 KQCCDMFIALWGTL-GUBZILKMSA-N 0.000 description 1
- RETPETNFPLNLRV-JYJNAYRXSA-N Pro-Asn-Trp Chemical compound C1C[C@H](NC1)C(=O)N[C@@H](CC(=O)N)C(=O)N[C@@H](CC2=CNC3=CC=CC=C32)C(=O)O RETPETNFPLNLRV-JYJNAYRXSA-N 0.000 description 1
- MLQVJYMFASXBGZ-IHRRRGAJSA-N Pro-Asn-Tyr Chemical compound C1C[C@H](NC1)C(=O)N[C@@H](CC(=O)N)C(=O)N[C@@H](CC2=CC=C(C=C2)O)C(=O)O MLQVJYMFASXBGZ-IHRRRGAJSA-N 0.000 description 1
- UTAUEDINXUMHLG-FXQIFTODSA-N Pro-Asp-Ala Chemical compound C[C@@H](C(=O)O)NC(=O)[C@H](CC(=O)O)NC(=O)[C@@H]1CCCN1 UTAUEDINXUMHLG-FXQIFTODSA-N 0.000 description 1
- CJZTUKSFZUSNCC-FXQIFTODSA-N Pro-Asp-Asn Chemical compound NC(=O)C[C@@H](C(O)=O)NC(=O)[C@H](CC(O)=O)NC(=O)[C@@H]1CCCN1 CJZTUKSFZUSNCC-FXQIFTODSA-N 0.000 description 1
- ILMLVTGTUJPQFP-FXQIFTODSA-N Pro-Asp-Asp Chemical compound [H]N1CCC[C@H]1C(=O)N[C@@H](CC(O)=O)C(=O)N[C@@H](CC(O)=O)C(O)=O ILMLVTGTUJPQFP-FXQIFTODSA-N 0.000 description 1
- HXNYBZQLBWIADP-WDSKDSINSA-N Pro-Cys Chemical compound OC(=O)[C@H](CS)NC(=O)[C@@H]1CCCN1 HXNYBZQLBWIADP-WDSKDSINSA-N 0.000 description 1
- ZBAGOWGNNAXMOY-IHRRRGAJSA-N Pro-Cys-Phe Chemical compound [H]N1CCC[C@H]1C(=O)N[C@@H](CS)C(=O)N[C@@H](CC1=CC=CC=C1)C(O)=O ZBAGOWGNNAXMOY-IHRRRGAJSA-N 0.000 description 1
- SHAQGFGGJSLLHE-BQBZGAKWSA-N Pro-Gln Chemical compound NC(=O)CC[C@@H](C([O-])=O)NC(=O)[C@@H]1CCC[NH2+]1 SHAQGFGGJSLLHE-BQBZGAKWSA-N 0.000 description 1
- LANQLYHLMYDWJP-SRVKXCTJSA-N Pro-Gln-Lys Chemical compound C1C[C@H](NC1)C(=O)N[C@@H](CCC(=O)N)C(=O)N[C@@H](CCCCN)C(=O)O LANQLYHLMYDWJP-SRVKXCTJSA-N 0.000 description 1
- PTLOFJZJADCNCD-DCAQKATOSA-N Pro-Glu-Met Chemical compound CSCC[C@@H](C(=O)O)NC(=O)[C@H](CCC(=O)O)NC(=O)[C@@H]1CCCN1 PTLOFJZJADCNCD-DCAQKATOSA-N 0.000 description 1
- QGOZJLYCGRYYRW-KKUMJFAQSA-N Pro-Glu-Tyr Chemical compound [H]N1CCC[C@H]1C(=O)N[C@@H](CCC(O)=O)C(=O)N[C@@H](CC1=CC=C(O)C=C1)C(O)=O QGOZJLYCGRYYRW-KKUMJFAQSA-N 0.000 description 1
- XQSREVQDGCPFRJ-STQMWFEESA-N Pro-Gly-Phe Chemical compound [H]N1CCC[C@H]1C(=O)NCC(=O)N[C@@H](CC1=CC=CC=C1)C(O)=O XQSREVQDGCPFRJ-STQMWFEESA-N 0.000 description 1
- BFXZQMWKTYWGCF-PYJNHQTQSA-N Pro-His-Ile Chemical compound [H]N1CCC[C@H]1C(=O)N[C@@H](CC1=CNC=N1)C(=O)N[C@@H]([C@@H](C)CC)C(O)=O BFXZQMWKTYWGCF-PYJNHQTQSA-N 0.000 description 1
- OCYROESYHWUPBP-CIUDSAMLSA-N Pro-Ile Chemical compound CC[C@H](C)[C@@H](C([O-])=O)NC(=O)[C@@H]1CCC[NH2+]1 OCYROESYHWUPBP-CIUDSAMLSA-N 0.000 description 1
- TYMBHHITTMGGPI-NAKRPEOUSA-N Pro-Ile-Cys Chemical compound CC[C@H](C)[C@@H](C(=O)N[C@@H](CS)C(=O)O)NC(=O)[C@@H]1CCCN1 TYMBHHITTMGGPI-NAKRPEOUSA-N 0.000 description 1
- AUQGUYPHJSMAKI-CYDGBPFRSA-N Pro-Ile-Val Chemical compound CC(C)[C@@H](C(O)=O)NC(=O)[C@H]([C@@H](C)CC)NC(=O)[C@@H]1CCCN1 AUQGUYPHJSMAKI-CYDGBPFRSA-N 0.000 description 1
- FXGIMYRVJJEIIM-UWVGGRQHSA-N Pro-Leu-Gly Chemical compound OC(=O)CNC(=O)[C@H](CC(C)C)NC(=O)[C@@H]1CCCN1 FXGIMYRVJJEIIM-UWVGGRQHSA-N 0.000 description 1
- DRKAXLDECUGLFE-ULQDDVLXSA-N Pro-Leu-Phe Chemical compound CC(C)C[C@H](NC(=O)[C@@H]1CCCN1)C(=O)N[C@@H](Cc1ccccc1)C(O)=O DRKAXLDECUGLFE-ULQDDVLXSA-N 0.000 description 1
- SUENWIFTSTWUKD-AVGNSLFASA-N Pro-Leu-Val Chemical compound [H]N1CCC[C@H]1C(=O)N[C@@H](CC(C)C)C(=O)N[C@@H](C(C)C)C(O)=O SUENWIFTSTWUKD-AVGNSLFASA-N 0.000 description 1
- RVQDZELMXZRSSI-IUCAKERBSA-N Pro-Lys Chemical compound NCCCC[C@@H](C(O)=O)NC(=O)[C@@H]1CCCN1 RVQDZELMXZRSSI-IUCAKERBSA-N 0.000 description 1
- JUJCUYWRJMFJJF-AVGNSLFASA-N Pro-Lys-Arg Chemical compound NC(N)=NCCC[C@@H](C(O)=O)NC(=O)[C@H](CCCCN)NC(=O)[C@@H]1CCCN1 JUJCUYWRJMFJJF-AVGNSLFASA-N 0.000 description 1
- RMODQFBNDDENCP-IHRRRGAJSA-N Pro-Lys-Leu Chemical compound [H]N1CCC[C@H]1C(=O)N[C@@H](CCCCN)C(=O)N[C@@H](CC(C)C)C(O)=O RMODQFBNDDENCP-IHRRRGAJSA-N 0.000 description 1
- WFIVLLFYUZZWOD-RHYQMDGZSA-N Pro-Lys-Thr Chemical compound [H]N1CCC[C@H]1C(=O)N[C@@H](CCCCN)C(=O)N[C@@H]([C@@H](C)O)C(O)=O WFIVLLFYUZZWOD-RHYQMDGZSA-N 0.000 description 1
- WLJYLAQSUSIQNH-GUBZILKMSA-N Pro-Met-Ser Chemical compound CSCC[C@@H](C(=O)N[C@@H](CO)C(=O)O)NC(=O)[C@@H]1CCCN1 WLJYLAQSUSIQNH-GUBZILKMSA-N 0.000 description 1
- IWIANZLCJVYEFX-RYUDHWBXSA-N Pro-Phe Chemical compound C([C@@H](C(=O)O)NC(=O)[C@H]1NCCC1)C1=CC=CC=C1 IWIANZLCJVYEFX-RYUDHWBXSA-N 0.000 description 1
- MLKVIVZCFYRTIR-KKUMJFAQSA-N Pro-Phe-Gln Chemical compound [H]N1CCC[C@H]1C(=O)N[C@@H](CC1=CC=CC=C1)C(=O)N[C@@H](CCC(N)=O)C(O)=O MLKVIVZCFYRTIR-KKUMJFAQSA-N 0.000 description 1
- MHBSUKYVBZVQRW-HJWJTTGWSA-N Pro-Phe-Ile Chemical compound [H]N1CCC[C@H]1C(=O)N[C@@H](CC1=CC=CC=C1)C(=O)N[C@@H]([C@@H](C)CC)C(O)=O MHBSUKYVBZVQRW-HJWJTTGWSA-N 0.000 description 1
- GFHXZNVJIKMAGO-IHRRRGAJSA-N Pro-Phe-Ser Chemical compound [H]N1CCC[C@H]1C(=O)N[C@@H](CC1=CC=CC=C1)C(=O)N[C@@H](CO)C(O)=O GFHXZNVJIKMAGO-IHRRRGAJSA-N 0.000 description 1
- XYAFCOJKICBRDU-JYJNAYRXSA-N Pro-Phe-Val Chemical compound [H]N1CCC[C@H]1C(=O)N[C@@H](CC1=CC=CC=C1)C(=O)N[C@@H](C(C)C)C(O)=O XYAFCOJKICBRDU-JYJNAYRXSA-N 0.000 description 1
- FHZJRBVMLGOHBX-GUBZILKMSA-N Pro-Pro-Asp Chemical compound OC(=O)C[C@H](NC(=O)[C@@H]1CCCN1C(=O)[C@@H]1CCCN1)C(O)=O FHZJRBVMLGOHBX-GUBZILKMSA-N 0.000 description 1
- CGSOWZUPLOKYOR-AVGNSLFASA-N Pro-Pro-Leu Chemical compound CC(C)C[C@@H](C(O)=O)NC(=O)[C@@H]1CCCN1C(=O)[C@H]1NCCC1 CGSOWZUPLOKYOR-AVGNSLFASA-N 0.000 description 1
- SXJOPONICMGFCR-DCAQKATOSA-N Pro-Ser-Lys Chemical compound C1C[C@H](NC1)C(=O)N[C@@H](CO)C(=O)N[C@@H](CCCCN)C(=O)O SXJOPONICMGFCR-DCAQKATOSA-N 0.000 description 1
- MKGIILKDUGDRRO-FXQIFTODSA-N Pro-Ser-Ser Chemical compound OC[C@@H](C(O)=O)NC(=O)[C@H](CO)NC(=O)[C@@H]1CCCN1 MKGIILKDUGDRRO-FXQIFTODSA-N 0.000 description 1
- XSXABUHLKPUVLX-JYJNAYRXSA-N Pro-Ser-Trp Chemical compound C1C[C@H](NC1)C(=O)N[C@@H](CO)C(=O)N[C@@H](CC2=CNC3=CC=CC=C32)C(=O)O XSXABUHLKPUVLX-JYJNAYRXSA-N 0.000 description 1
- PRKWBYCXBBSLSK-GUBZILKMSA-N Pro-Ser-Val Chemical compound [H]N1CCC[C@H]1C(=O)N[C@@H](CO)C(=O)N[C@@H](C(C)C)C(O)=O PRKWBYCXBBSLSK-GUBZILKMSA-N 0.000 description 1
- AJJDPGVVNPUZCR-RHYQMDGZSA-N Pro-Thr-Lys Chemical compound C[C@H]([C@@H](C(=O)N[C@@H](CCCCN)C(=O)O)NC(=O)[C@@H]1CCCN1)O AJJDPGVVNPUZCR-RHYQMDGZSA-N 0.000 description 1
- RMJZWERKFFNNNS-XGEHTFHBSA-N Pro-Thr-Ser Chemical compound [H]N1CCC[C@H]1C(=O)N[C@@H]([C@@H](C)O)C(=O)N[C@@H](CO)C(O)=O RMJZWERKFFNNNS-XGEHTFHBSA-N 0.000 description 1
- CXGLFEOYCJFKPR-RCWTZXSCSA-N Pro-Thr-Val Chemical compound [H]N1CCC[C@H]1C(=O)N[C@@H]([C@@H](C)O)C(=O)N[C@@H](C(C)C)C(O)=O CXGLFEOYCJFKPR-RCWTZXSCSA-N 0.000 description 1
- PGSWNLRYYONGPE-JYJNAYRXSA-N Pro-Val-Tyr Chemical compound [H]N1CCC[C@H]1C(=O)N[C@@H](C(C)C)C(=O)N[C@@H](CC1=CC=C(O)C=C1)C(O)=O PGSWNLRYYONGPE-JYJNAYRXSA-N 0.000 description 1
- ONIBWKKTOPOVIA-UHFFFAOYSA-N Proline Natural products OC(=O)C1CCCN1 ONIBWKKTOPOVIA-UHFFFAOYSA-N 0.000 description 1
- 108091008109 Pseudogenes Proteins 0.000 description 1
- 102000057361 Pseudogenes Human genes 0.000 description 1
- 101710185720 Putative ethidium bromide resistance protein Proteins 0.000 description 1
- 108010025216 RVF peptide Proteins 0.000 description 1
- 101100219388 Rattus norvegicus Ca2 gene Proteins 0.000 description 1
- 108700008625 Reporter Genes Proteins 0.000 description 1
- 102100037486 Reverse transcriptase/ribonuclease H Human genes 0.000 description 1
- 108091028664 Ribonucleotide Proteins 0.000 description 1
- PYMYPHUHKUWMLA-LMVFSUKVSA-N Ribose Natural products OC[C@@H](O)[C@@H](O)[C@@H](O)C=O PYMYPHUHKUWMLA-LMVFSUKVSA-N 0.000 description 1
- 240000004808 Saccharomyces cerevisiae Species 0.000 description 1
- SSJMZMUVNKEENT-IMJSIDKUSA-N Ser-Ala Chemical compound OC(=O)[C@H](C)NC(=O)[C@@H](N)CO SSJMZMUVNKEENT-IMJSIDKUSA-N 0.000 description 1
- BTKUIVBNGBFTTP-WHFBIAKZSA-N Ser-Ala-Gly Chemical compound [H]N[C@@H](CO)C(=O)N[C@@H](C)C(=O)NCC(O)=O BTKUIVBNGBFTTP-WHFBIAKZSA-N 0.000 description 1
- RZEQTVHJZCIUBT-WDSKDSINSA-N Ser-Arg Chemical compound OC[C@H](N)C(=O)N[C@H](C(O)=O)CCCNC(N)=N RZEQTVHJZCIUBT-WDSKDSINSA-N 0.000 description 1
- VQBLHWSPVYYZTB-DCAQKATOSA-N Ser-Arg-His Chemical compound C1=C(NC=N1)C[C@@H](C(=O)O)NC(=O)[C@H](CCCN=C(N)N)NC(=O)[C@H](CO)N VQBLHWSPVYYZTB-DCAQKATOSA-N 0.000 description 1
- HZWAHWQZPSXNCB-BPUTZDHNSA-N Ser-Arg-Trp Chemical compound [H]N[C@@H](CO)C(=O)N[C@@H](CCCNC(N)=N)C(=O)N[C@@H](CC1=CNC2=C1C=CC=C2)C(O)=O HZWAHWQZPSXNCB-BPUTZDHNSA-N 0.000 description 1
- FIDMVVBUOCMMJG-CIUDSAMLSA-N Ser-Asn-Leu Chemical compound CC(C)C[C@@H](C(O)=O)NC(=O)[C@H](CC(N)=O)NC(=O)[C@@H](N)CO FIDMVVBUOCMMJG-CIUDSAMLSA-N 0.000 description 1
- VBKBDLMWICBSCY-IMJSIDKUSA-N Ser-Asp Chemical compound OC[C@H](N)C(=O)N[C@H](C(O)=O)CC(O)=O VBKBDLMWICBSCY-IMJSIDKUSA-N 0.000 description 1
- FTVRVZNYIYWJGB-ACZMJKKPSA-N Ser-Asp-Glu Chemical compound [H]N[C@@H](CO)C(=O)N[C@@H](CC(O)=O)C(=O)N[C@@H](CCC(O)=O)C(O)=O FTVRVZNYIYWJGB-ACZMJKKPSA-N 0.000 description 1
- BTPAWKABYQMKKN-LKXGYXEUSA-N Ser-Asp-Thr Chemical compound [H]N[C@@H](CO)C(=O)N[C@@H](CC(O)=O)C(=O)N[C@@H]([C@@H](C)O)C(O)=O BTPAWKABYQMKKN-LKXGYXEUSA-N 0.000 description 1
- MOVJSUIKUNCVMG-ZLUOBGJFSA-N Ser-Cys-Ser Chemical compound C([C@@H](C(=O)N[C@@H](CS)C(=O)N[C@@H](CO)C(=O)O)N)O MOVJSUIKUNCVMG-ZLUOBGJFSA-N 0.000 description 1
- IXUGADGDCQDLSA-FXQIFTODSA-N Ser-Gln-Gln Chemical compound C(CC(=O)N)[C@@H](C(=O)N[C@@H](CCC(=O)N)C(=O)O)NC(=O)[C@H](CO)N IXUGADGDCQDLSA-FXQIFTODSA-N 0.000 description 1
- LAFKUZYWNCHOHT-WHFBIAKZSA-N Ser-Glu Chemical compound OC[C@H](N)C(=O)N[C@H](C(O)=O)CCC(O)=O LAFKUZYWNCHOHT-WHFBIAKZSA-N 0.000 description 1
- JFWDJFULOLKQFY-QWRGUYRKSA-N Ser-Gly-Phe Chemical compound [H]N[C@@H](CO)C(=O)NCC(=O)N[C@@H](CC1=CC=CC=C1)C(O)=O JFWDJFULOLKQFY-QWRGUYRKSA-N 0.000 description 1
- SFTZWNJFZYOLBD-ZDLURKLDSA-N Ser-Gly-Thr Chemical compound C[C@@H](O)[C@@H](C(O)=O)NC(=O)CNC(=O)[C@@H](N)CO SFTZWNJFZYOLBD-ZDLURKLDSA-N 0.000 description 1
- BXLYSRPHVMCOPS-ACZMJKKPSA-N Ser-Ile Chemical compound CC[C@H](C)[C@@H](C(O)=O)NC(=O)[C@@H](N)CO BXLYSRPHVMCOPS-ACZMJKKPSA-N 0.000 description 1
- YIUWWXVTYLANCJ-NAKRPEOUSA-N Ser-Ile-Arg Chemical compound [H]N[C@@H](CO)C(=O)N[C@@H]([C@@H](C)CC)C(=O)N[C@@H](CCCNC(N)=N)C(O)=O YIUWWXVTYLANCJ-NAKRPEOUSA-N 0.000 description 1
- XNCUYZKGQOCOQH-YUMQZZPRSA-N Ser-Leu-Gly Chemical compound [H]N[C@@H](CO)C(=O)N[C@@H](CC(C)C)C(=O)NCC(O)=O XNCUYZKGQOCOQH-YUMQZZPRSA-N 0.000 description 1
- IXZHZUGGKLRHJD-DCAQKATOSA-N Ser-Leu-Val Chemical compound [H]N[C@@H](CO)C(=O)N[C@@H](CC(C)C)C(=O)N[C@@H](C(C)C)C(O)=O IXZHZUGGKLRHJD-DCAQKATOSA-N 0.000 description 1
- PPNPDKGQRFSCAC-CIUDSAMLSA-N Ser-Lys-Asp Chemical compound NCCCC[C@H](NC(=O)[C@@H](N)CO)C(=O)N[C@@H](CC(O)=O)C(O)=O PPNPDKGQRFSCAC-CIUDSAMLSA-N 0.000 description 1
- PBUXMVYWOSKHMF-WDSKDSINSA-N Ser-Met Chemical compound CSCC[C@@H](C(O)=O)NC(=O)[C@@H](N)CO PBUXMVYWOSKHMF-WDSKDSINSA-N 0.000 description 1
- AXVNLRQLPLSIPQ-FXQIFTODSA-N Ser-Met-Cys Chemical compound CSCC[C@@H](C(=O)N[C@@H](CS)C(=O)O)NC(=O)[C@H](CO)N AXVNLRQLPLSIPQ-FXQIFTODSA-N 0.000 description 1
- PPQRSMGDOHLTBE-UWVGGRQHSA-N Ser-Phe Chemical compound OC[C@H](N)C(=O)N[C@H](C(O)=O)CC1=CC=CC=C1 PPQRSMGDOHLTBE-UWVGGRQHSA-N 0.000 description 1
- JLKWJWPDXPKKHI-FXQIFTODSA-N Ser-Pro-Asn Chemical compound C1C[C@H](N(C1)C(=O)[C@H](CO)N)C(=O)N[C@@H](CC(=O)N)C(=O)O JLKWJWPDXPKKHI-FXQIFTODSA-N 0.000 description 1
- WNDUPCKKKGSKIQ-CIUDSAMLSA-N Ser-Pro-Gln Chemical compound OC[C@H](N)C(=O)N1CCC[C@H]1C(=O)N[C@@H](CCC(N)=O)C(O)=O WNDUPCKKKGSKIQ-CIUDSAMLSA-N 0.000 description 1
- RHAPJNVNWDBFQI-BQBZGAKWSA-N Ser-Pro-Gly Chemical compound OC[C@H](N)C(=O)N1CCC[C@H]1C(=O)NCC(O)=O RHAPJNVNWDBFQI-BQBZGAKWSA-N 0.000 description 1
- GZGFSPWOMUKKCV-NAKRPEOUSA-N Ser-Pro-Ile Chemical compound CC[C@H](C)[C@@H](C(O)=O)NC(=O)[C@@H]1CCCN1C(=O)[C@@H](N)CO GZGFSPWOMUKKCV-NAKRPEOUSA-N 0.000 description 1
- CKDXFSPMIDSMGV-GUBZILKMSA-N Ser-Pro-Val Chemical compound [H]N[C@@H](CO)C(=O)N1CCC[C@H]1C(=O)N[C@@H](C(C)C)C(O)=O CKDXFSPMIDSMGV-GUBZILKMSA-N 0.000 description 1
- PPCZVWHJWJFTFN-ZLUOBGJFSA-N Ser-Ser-Asp Chemical compound [H]N[C@@H](CO)C(=O)N[C@@H](CO)C(=O)N[C@@H](CC(O)=O)C(O)=O PPCZVWHJWJFTFN-ZLUOBGJFSA-N 0.000 description 1
- NVNPWELENFJOHH-CIUDSAMLSA-N Ser-Ser-His Chemical compound C1=C(NC=N1)C[C@@H](C(=O)O)NC(=O)[C@H](CO)NC(=O)[C@H](CO)N NVNPWELENFJOHH-CIUDSAMLSA-N 0.000 description 1
- JCLAFVNDBJMLBC-JBDRJPRFSA-N Ser-Ser-Ile Chemical compound [H]N[C@@H](CO)C(=O)N[C@@H](CO)C(=O)N[C@@H]([C@@H](C)CC)C(O)=O JCLAFVNDBJMLBC-JBDRJPRFSA-N 0.000 description 1
- AABIBDJHSKIMJK-FXQIFTODSA-N Ser-Ser-Met Chemical compound [H]N[C@@H](CO)C(=O)N[C@@H](CO)C(=O)N[C@@H](CCSC)C(O)=O AABIBDJHSKIMJK-FXQIFTODSA-N 0.000 description 1
- DKGRNFUXVTYRAS-UBHSHLNASA-N Ser-Ser-Trp Chemical compound [H]N[C@@H](CO)C(=O)N[C@@H](CO)C(=O)N[C@@H](CC1=CNC2=C1C=CC=C2)C(O)=O DKGRNFUXVTYRAS-UBHSHLNASA-N 0.000 description 1
- VGQVAVQWKJLIRM-FXQIFTODSA-N Ser-Ser-Val Chemical compound [H]N[C@@H](CO)C(=O)N[C@@H](CO)C(=O)N[C@@H](C(C)C)C(O)=O VGQVAVQWKJLIRM-FXQIFTODSA-N 0.000 description 1
- SZRNDHWMVSFPSP-XKBZYTNZSA-N Ser-Thr-Gln Chemical compound C[C@H]([C@@H](C(=O)N[C@@H](CCC(=O)N)C(=O)O)NC(=O)[C@H](CO)N)O SZRNDHWMVSFPSP-XKBZYTNZSA-N 0.000 description 1
- QNBVFKZSSRYNFX-CUJWVEQBSA-N Ser-Thr-His Chemical compound C[C@H]([C@@H](C(=O)N[C@@H](CC1=CN=CN1)C(=O)O)NC(=O)[C@H](CO)N)O QNBVFKZSSRYNFX-CUJWVEQBSA-N 0.000 description 1
- PCJLFYBAQZQOFE-KATARQTJSA-N Ser-Thr-Lys Chemical compound C[C@H]([C@@H](C(=O)N[C@@H](CCCCN)C(=O)O)NC(=O)[C@H](CO)N)O PCJLFYBAQZQOFE-KATARQTJSA-N 0.000 description 1
- FVFUOQIYDPAIJR-XIRDDKMYSA-N Ser-Trp-Leu Chemical compound CC(C)C[C@@H](C(=O)O)NC(=O)[C@H](CC1=CNC2=CC=CC=C21)NC(=O)[C@H](CO)N FVFUOQIYDPAIJR-XIRDDKMYSA-N 0.000 description 1
- QYBRQMLZDDJBSW-AVGNSLFASA-N Ser-Tyr-Glu Chemical compound [H]N[C@@H](CO)C(=O)N[C@@H](CC1=CC=C(O)C=C1)C(=O)N[C@@H](CCC(O)=O)C(O)=O QYBRQMLZDDJBSW-AVGNSLFASA-N 0.000 description 1
- UBTNVMGPMYDYIU-HJPIBITLSA-N Ser-Tyr-Ile Chemical compound [H]N[C@@H](CO)C(=O)N[C@@H](CC1=CC=C(O)C=C1)C(=O)N[C@@H]([C@@H](C)CC)C(O)=O UBTNVMGPMYDYIU-HJPIBITLSA-N 0.000 description 1
- PMTWIUBUQRGCSB-FXQIFTODSA-N Ser-Val-Ala Chemical compound [H]N[C@@H](CO)C(=O)N[C@@H](C(C)C)C(=O)N[C@@H](C)C(O)=O PMTWIUBUQRGCSB-FXQIFTODSA-N 0.000 description 1
- LGIMRDKGABDMBN-DCAQKATOSA-N Ser-Val-Lys Chemical compound CC(C)[C@@H](C(=O)N[C@@H](CCCCN)C(=O)O)NC(=O)[C@H](CO)N LGIMRDKGABDMBN-DCAQKATOSA-N 0.000 description 1
- 229930182558 Sterol Natural products 0.000 description 1
- 241000282898 Sus scrofa Species 0.000 description 1
- 108700026226 TATA Box Proteins 0.000 description 1
- RYYWUUFWQRZTIU-UHFFFAOYSA-N Thiophosphoric acid Chemical class OP(O)(S)=O RYYWUUFWQRZTIU-UHFFFAOYSA-N 0.000 description 1
- IGROJMCBGRFRGI-YTLHQDLWSA-N Thr-Ala-Ala Chemical compound C[C@@H](O)[C@H](N)C(=O)N[C@@H](C)C(=O)N[C@@H](C)C(O)=O IGROJMCBGRFRGI-YTLHQDLWSA-N 0.000 description 1
- YRNBANYVJJBGDI-VZFHVOOUSA-N Thr-Ala-Cys Chemical compound C[C@H]([C@@H](C(=O)N[C@@H](C)C(=O)N[C@@H](CS)C(=O)O)N)O YRNBANYVJJBGDI-VZFHVOOUSA-N 0.000 description 1
- TYVAWPFQYFPSBR-BFHQHQDPSA-N Thr-Ala-Gly Chemical compound [H]N[C@@H]([C@@H](C)O)C(=O)N[C@@H](C)C(=O)NCC(O)=O TYVAWPFQYFPSBR-BFHQHQDPSA-N 0.000 description 1
- BSNZTJXVDOINSR-JXUBOQSCSA-N Thr-Ala-Leu Chemical compound [H]N[C@@H]([C@@H](C)O)C(=O)N[C@@H](C)C(=O)N[C@@H](CC(C)C)C(O)=O BSNZTJXVDOINSR-JXUBOQSCSA-N 0.000 description 1
- ZUXQFMVPAYGPFJ-JXUBOQSCSA-N Thr-Ala-Lys Chemical compound C[C@@H](O)[C@H](N)C(=O)N[C@@H](C)C(=O)N[C@H](C(O)=O)CCCCN ZUXQFMVPAYGPFJ-JXUBOQSCSA-N 0.000 description 1
- KEGBFULVYKYJRD-LFSVMHDDSA-N Thr-Ala-Phe Chemical compound C[C@@H](O)[C@H](N)C(=O)N[C@@H](C)C(=O)N[C@H](C(O)=O)CC1=CC=CC=C1 KEGBFULVYKYJRD-LFSVMHDDSA-N 0.000 description 1
- IRKWVRSEQFTGGV-VEVYYDQMSA-N Thr-Asn-Arg Chemical compound [H]N[C@@H]([C@@H](C)O)C(=O)N[C@@H](CC(N)=O)C(=O)N[C@@H](CCCNC(N)=N)C(O)=O IRKWVRSEQFTGGV-VEVYYDQMSA-N 0.000 description 1
- PZVGOVRNGKEFCB-KKHAAJSZSA-N Thr-Asn-Val Chemical compound C[C@H]([C@@H](C(=O)N[C@@H](CC(=O)N)C(=O)N[C@@H](C(C)C)C(=O)O)N)O PZVGOVRNGKEFCB-KKHAAJSZSA-N 0.000 description 1
- APIQKJYZDWVOCE-VEVYYDQMSA-N Thr-Asp-Met Chemical compound [H]N[C@@H]([C@@H](C)O)C(=O)N[C@@H](CC(O)=O)C(=O)N[C@@H](CCSC)C(O)=O APIQKJYZDWVOCE-VEVYYDQMSA-N 0.000 description 1
- DXNUZQGVOMCGNS-SWRJLBSHSA-N Thr-Gln-Trp Chemical compound C[C@H]([C@@H](C(=O)N[C@@H](CCC(=O)N)C(=O)N[C@@H](CC1=CNC2=CC=CC=C21)C(=O)O)N)O DXNUZQGVOMCGNS-SWRJLBSHSA-N 0.000 description 1
- DKDHTRVDOUZZTP-IFFSRLJSSA-N Thr-Gln-Val Chemical compound CC(C)[C@H](NC(=O)[C@H](CCC(N)=O)NC(=O)[C@@H](N)[C@@H](C)O)C(O)=O DKDHTRVDOUZZTP-IFFSRLJSSA-N 0.000 description 1
- GKWNLDNXMMLRMC-GLLZPBPUSA-N Thr-Glu-Gln Chemical compound C[C@H]([C@@H](C(=O)N[C@@H](CCC(=O)O)C(=O)N[C@@H](CCC(=O)N)C(=O)O)N)O GKWNLDNXMMLRMC-GLLZPBPUSA-N 0.000 description 1
- JMGJDTNUMAZNLX-RWRJDSDZSA-N Thr-Glu-Ile Chemical compound [H]N[C@@H]([C@@H](C)O)C(=O)N[C@@H](CCC(O)=O)C(=O)N[C@@H]([C@@H](C)CC)C(O)=O JMGJDTNUMAZNLX-RWRJDSDZSA-N 0.000 description 1
- BIYXEUAFGLTAEM-WUJLRWPWSA-N Thr-Gly Chemical compound C[C@@H](O)[C@H](N)C(=O)NCC(O)=O BIYXEUAFGLTAEM-WUJLRWPWSA-N 0.000 description 1
- SLUWOCTZVGMURC-BFHQHQDPSA-N Thr-Gly-Ala Chemical compound C[C@@H](O)[C@H](N)C(=O)NCC(=O)N[C@@H](C)C(O)=O SLUWOCTZVGMURC-BFHQHQDPSA-N 0.000 description 1
- QQWNRERCGGZOKG-WEDXCCLWSA-N Thr-Gly-Leu Chemical compound [H]N[C@@H]([C@@H](C)O)C(=O)NCC(=O)N[C@@H](CC(C)C)C(O)=O QQWNRERCGGZOKG-WEDXCCLWSA-N 0.000 description 1
- WXVIGTAUZBUDPZ-DTLFHODZSA-N Thr-His Chemical compound C[C@@H](O)[C@H](N)C(=O)N[C@H](C(O)=O)CC1=CN=CN1 WXVIGTAUZBUDPZ-DTLFHODZSA-N 0.000 description 1
- HEJJDUDEHLPDAW-CUJWVEQBSA-N Thr-His-Cys Chemical compound C[C@H]([C@@H](C(=O)N[C@@H](CC1=CN=CN1)C(=O)N[C@@H](CS)C(=O)O)N)O HEJJDUDEHLPDAW-CUJWVEQBSA-N 0.000 description 1
- UYTYTDMCDBPDSC-URLPEUOOSA-N Thr-Ile-Phe Chemical compound CC[C@H](C)[C@@H](C(=O)N[C@@H](CC1=CC=CC=C1)C(=O)O)NC(=O)[C@H]([C@@H](C)O)N UYTYTDMCDBPDSC-URLPEUOOSA-N 0.000 description 1
- IHAPJUHCZXBPHR-WZLNRYEVSA-N Thr-Ile-Tyr Chemical compound CC[C@H](C)[C@@H](C(=O)N[C@@H](CC1=CC=C(C=C1)O)C(=O)O)NC(=O)[C@H]([C@@H](C)O)N IHAPJUHCZXBPHR-WZLNRYEVSA-N 0.000 description 1
- BQBCIBCLXBKYHW-CSMHCCOUSA-N Thr-Leu Chemical compound CC(C)C[C@@H](C([O-])=O)NC(=O)[C@@H]([NH3+])[C@@H](C)O BQBCIBCLXBKYHW-CSMHCCOUSA-N 0.000 description 1
- PRNGXSILMXSWQQ-OEAJRASXSA-N Thr-Leu-Phe Chemical compound [H]N[C@@H]([C@@H](C)O)C(=O)N[C@@H](CC(C)C)C(=O)N[C@@H](CC1=CC=CC=C1)C(O)=O PRNGXSILMXSWQQ-OEAJRASXSA-N 0.000 description 1
- BDGBHYCAZJPLHX-HJGDQZAQSA-N Thr-Lys-Asn Chemical compound [H]N[C@@H]([C@@H](C)O)C(=O)N[C@@H](CCCCN)C(=O)N[C@@H](CC(N)=O)C(O)=O BDGBHYCAZJPLHX-HJGDQZAQSA-N 0.000 description 1
- ZSPQUTWLWGWTPS-HJGDQZAQSA-N Thr-Lys-Asp Chemical compound [H]N[C@@H]([C@@H](C)O)C(=O)N[C@@H](CCCCN)C(=O)N[C@@H](CC(O)=O)C(O)=O ZSPQUTWLWGWTPS-HJGDQZAQSA-N 0.000 description 1
- JLNMFGCJODTXDH-WEDXCCLWSA-N Thr-Lys-Gly Chemical compound [H]N[C@@H]([C@@H](C)O)C(=O)N[C@@H](CCCCN)C(=O)NCC(O)=O JLNMFGCJODTXDH-WEDXCCLWSA-N 0.000 description 1
- APIDTRXFGYOLLH-VQVTYTSYSA-N Thr-Met Chemical compound CSCC[C@@H](C(O)=O)NC(=O)[C@@H](N)[C@@H](C)O APIDTRXFGYOLLH-VQVTYTSYSA-N 0.000 description 1
- PCMDGXKXVMBIFP-VEVYYDQMSA-N Thr-Met-Asn Chemical compound [H]N[C@@H]([C@@H](C)O)C(=O)N[C@@H](CCSC)C(=O)N[C@@H](CC(N)=O)C(O)=O PCMDGXKXVMBIFP-VEVYYDQMSA-N 0.000 description 1
- PUEWAXRPXOEQOW-HJGDQZAQSA-N Thr-Met-Gln Chemical compound [H]N[C@@H]([C@@H](C)O)C(=O)N[C@@H](CCSC)C(=O)N[C@@H](CCC(N)=O)C(O)=O PUEWAXRPXOEQOW-HJGDQZAQSA-N 0.000 description 1
- WRQLCVIALDUQEQ-UNQGMJICSA-N Thr-Phe-Arg Chemical compound [H]N[C@@H]([C@@H](C)O)C(=O)N[C@@H](CC1=CC=CC=C1)C(=O)N[C@@H](CCCNC(N)=N)C(O)=O WRQLCVIALDUQEQ-UNQGMJICSA-N 0.000 description 1
- NZRUWPIYECBYRK-HTUGSXCWSA-N Thr-Phe-Glu Chemical compound [H]N[C@@H]([C@@H](C)O)C(=O)N[C@@H](CC1=CC=CC=C1)C(=O)N[C@@H](CCC(O)=O)C(O)=O NZRUWPIYECBYRK-HTUGSXCWSA-N 0.000 description 1
- BDYBHQWMHYDRKJ-UNQGMJICSA-N Thr-Phe-Met Chemical compound C[C@H]([C@@H](C(=O)N[C@@H](CC1=CC=CC=C1)C(=O)N[C@@H](CCSC)C(=O)O)N)O BDYBHQWMHYDRKJ-UNQGMJICSA-N 0.000 description 1
- JMBRNXUOLJFURW-BEAPCOKYSA-N Thr-Phe-Pro Chemical compound C[C@H]([C@@H](C(=O)N[C@@H](CC1=CC=CC=C1)C(=O)N2CCC[C@@H]2C(=O)O)N)O JMBRNXUOLJFURW-BEAPCOKYSA-N 0.000 description 1
- QOLYAJSZHIJCTO-VQVTYTSYSA-N Thr-Pro Chemical compound C[C@@H](O)[C@H](N)C(=O)N1CCC[C@H]1C(O)=O QOLYAJSZHIJCTO-VQVTYTSYSA-N 0.000 description 1
- MUAFDCVOHYAFNG-RCWTZXSCSA-N Thr-Pro-Arg Chemical compound [H]N[C@@H]([C@@H](C)O)C(=O)N1CCC[C@H]1C(=O)N[C@@H](CCCNC(N)=N)C(O)=O MUAFDCVOHYAFNG-RCWTZXSCSA-N 0.000 description 1
- YGZWVPBHYABGLT-KJEVXHAQSA-N Thr-Pro-Tyr Chemical compound C[C@@H](O)[C@H](N)C(=O)N1CCC[C@H]1C(=O)N[C@H](C(O)=O)CC1=CC=C(O)C=C1 YGZWVPBHYABGLT-KJEVXHAQSA-N 0.000 description 1
- GXDLGHLJTHMDII-WISUUJSJSA-N Thr-Ser Chemical compound C[C@@H](O)[C@H](N)C(=O)N[C@@H](CO)C(O)=O GXDLGHLJTHMDII-WISUUJSJSA-N 0.000 description 1
- DOBIBIXIHJKVJF-XKBZYTNZSA-N Thr-Ser-Gln Chemical compound C[C@@H](O)[C@H](N)C(=O)N[C@@H](CO)C(=O)N[C@H](C(O)=O)CCC(N)=O DOBIBIXIHJKVJF-XKBZYTNZSA-N 0.000 description 1
- VUXIQSUQQYNLJP-XAVMHZPKSA-N Thr-Ser-Pro Chemical compound C[C@H]([C@@H](C(=O)N[C@@H](CO)C(=O)N1CCC[C@@H]1C(=O)O)N)O VUXIQSUQQYNLJP-XAVMHZPKSA-N 0.000 description 1
- BBPCSGKKPJUYRB-UVOCVTCTSA-N Thr-Thr-Leu Chemical compound [H]N[C@@H]([C@@H](C)O)C(=O)N[C@@H]([C@@H](C)O)C(=O)N[C@@H](CC(C)C)C(O)=O BBPCSGKKPJUYRB-UVOCVTCTSA-N 0.000 description 1
- KHTIUAKJRUIEMA-HOUAVDHOSA-N Thr-Trp-Asp Chemical compound C1=CC=C2C(C[C@H](NC(=O)[C@@H](N)[C@H](O)C)C(=O)N[C@@H](CC(O)=O)C(O)=O)=CNC2=C1 KHTIUAKJRUIEMA-HOUAVDHOSA-N 0.000 description 1
- GJOBRAHDRIDAPT-NGTWOADLSA-N Thr-Trp-Ile Chemical compound CC[C@H](C)[C@@H](C(=O)O)NC(=O)[C@H](CC1=CNC2=CC=CC=C21)NC(=O)[C@H]([C@@H](C)O)N GJOBRAHDRIDAPT-NGTWOADLSA-N 0.000 description 1
- UMFLBPIPAJMNIM-LYARXQMPSA-N Thr-Trp-Phe Chemical compound C[C@H]([C@@H](C(=O)N[C@@H](CC1=CNC2=CC=CC=C21)C(=O)N[C@@H](CC3=CC=CC=C3)C(=O)O)N)O UMFLBPIPAJMNIM-LYARXQMPSA-N 0.000 description 1
- NJGMALCNYAMYCB-JRQIVUDYSA-N Thr-Tyr-Asn Chemical compound [H]N[C@@H]([C@@H](C)O)C(=O)N[C@@H](CC1=CC=C(O)C=C1)C(=O)N[C@@H](CC(N)=O)C(O)=O NJGMALCNYAMYCB-JRQIVUDYSA-N 0.000 description 1
- OMRWDMWXRWTQIU-YJRXYDGGSA-N Thr-Tyr-Cys Chemical compound C[C@H]([C@@H](C(=O)N[C@@H](CC1=CC=C(C=C1)O)C(=O)N[C@@H](CS)C(=O)O)N)O OMRWDMWXRWTQIU-YJRXYDGGSA-N 0.000 description 1
- XVHAUVJXBFGUPC-RPTUDFQQSA-N Thr-Tyr-Phe Chemical compound [H]N[C@@H]([C@@H](C)O)C(=O)N[C@@H](CC1=CC=C(O)C=C1)C(=O)N[C@@H](CC1=CC=CC=C1)C(O)=O XVHAUVJXBFGUPC-RPTUDFQQSA-N 0.000 description 1
- VYVBSMCZNHOZGD-RCWTZXSCSA-N Thr-Val-Val Chemical compound [H]N[C@@H]([C@@H](C)O)C(=O)N[C@@H](C(C)C)C(=O)N[C@@H](C(C)C)C(O)=O VYVBSMCZNHOZGD-RCWTZXSCSA-N 0.000 description 1
- AYFVYJQAPQTCCC-UHFFFAOYSA-N Threonine Natural products CC(O)C(N)C(O)=O AYFVYJQAPQTCCC-UHFFFAOYSA-N 0.000 description 1
- 239000004473 Threonine Substances 0.000 description 1
- HYVLNORXQGKONN-NUTKFTJISA-N Trp-Ala-Lys Chemical compound C1=CC=C2C(C[C@H](N)C(=O)N[C@@H](C)C(=O)N[C@@H](CCCCN)C(O)=O)=CNC2=C1 HYVLNORXQGKONN-NUTKFTJISA-N 0.000 description 1
- GRQCSEWEPIHLBI-JQWIXIFHSA-N Trp-Asn Chemical compound C1=CC=C2C(C[C@H](N)C(=O)N[C@@H](CC(N)=O)C(O)=O)=CNC2=C1 GRQCSEWEPIHLBI-JQWIXIFHSA-N 0.000 description 1
- RYXOUTORDIUWNI-BPUTZDHNSA-N Trp-Asn-Met Chemical compound CSCC[C@@H](C(=O)O)NC(=O)[C@H](CC(=O)N)NC(=O)[C@H](CC1=CNC2=CC=CC=C21)N RYXOUTORDIUWNI-BPUTZDHNSA-N 0.000 description 1
- UTQBQJNSNXJNIH-IHPCNDPISA-N Trp-Asn-Phe Chemical compound C1=CC=C(C=C1)C[C@@H](C(=O)O)NC(=O)[C@H](CC(=O)N)NC(=O)[C@H](CC2=CNC3=CC=CC=C32)N UTQBQJNSNXJNIH-IHPCNDPISA-N 0.000 description 1
- FKAPNDWDLDWZNF-QEJZJMRPSA-N Trp-Asp-Glu Chemical compound C1=CC=C2C(=C1)C(=CN2)C[C@@H](C(=O)N[C@@H](CC(=O)O)C(=O)N[C@@H](CCC(=O)O)C(=O)O)N FKAPNDWDLDWZNF-QEJZJMRPSA-N 0.000 description 1
- CZWIHKFGHICAJX-BPUTZDHNSA-N Trp-Glu-Glu Chemical compound C1=CC=C2C(C[C@H](N)C(=O)N[C@@H](CCC(O)=O)C(=O)N[C@@H](CCC(O)=O)C(O)=O)=CNC2=C1 CZWIHKFGHICAJX-BPUTZDHNSA-N 0.000 description 1
- VMBBTANKMSRJSS-JSGCOSHPSA-N Trp-Glu-Gly Chemical compound [H]N[C@@H](CC1=CNC2=C1C=CC=C2)C(=O)N[C@@H](CCC(O)=O)C(=O)NCC(O)=O VMBBTANKMSRJSS-JSGCOSHPSA-N 0.000 description 1
- UYKREHOKELZSPB-JTQLQIEISA-N Trp-Gly Chemical compound C1=CC=C2C(C[C@H](N)C(=O)NCC(O)=O)=CNC2=C1 UYKREHOKELZSPB-JTQLQIEISA-N 0.000 description 1
- NOBINHCGDUHOBV-NAZCDGGXSA-N Trp-His-Thr Chemical compound [H]N[C@@H](CC1=CNC2=C1C=CC=C2)C(=O)N[C@@H](CC1=CNC=N1)C(=O)N[C@@H]([C@@H](C)O)C(O)=O NOBINHCGDUHOBV-NAZCDGGXSA-N 0.000 description 1
- YDTKYBHPRULROG-LTHWPDAASA-N Trp-Ile-Thr Chemical compound CC[C@H](C)[C@@H](C(=O)N[C@@H]([C@@H](C)O)C(=O)O)NC(=O)[C@H](CC1=CNC2=CC=CC=C21)N YDTKYBHPRULROG-LTHWPDAASA-N 0.000 description 1
- WKCFCVBOFKEVKY-HSCHXYMDSA-N Trp-Leu-Ile Chemical compound CC[C@H](C)[C@@H](C(=O)O)NC(=O)[C@H](CC(C)C)NC(=O)[C@H](CC1=CNC2=CC=CC=C21)N WKCFCVBOFKEVKY-HSCHXYMDSA-N 0.000 description 1
- KWTRGSQOQHZKIA-PMVMPFDFSA-N Trp-Lys-Tyr Chemical compound C([C@H](NC(=O)[C@@H](NC(=O)[C@@H](N)CC=1C2=CC=CC=C2NC=1)CCCCN)C(O)=O)C1=CC=C(O)C=C1 KWTRGSQOQHZKIA-PMVMPFDFSA-N 0.000 description 1
- RERRMBXDSFMBQE-ZFWWWQNUSA-N Trp-Met-Gly Chemical compound CSCC[C@@H](C(=O)NCC(=O)O)NC(=O)[C@H](CC1=CNC2=CC=CC=C21)N RERRMBXDSFMBQE-ZFWWWQNUSA-N 0.000 description 1
- VUMCLPHXCBIJJB-PMVMPFDFSA-N Trp-Phe-His Chemical compound C1=CC=C(C=C1)C[C@@H](C(=O)N[C@@H](CC2=CN=CN2)C(=O)O)NC(=O)[C@H](CC3=CNC4=CC=CC=C43)N VUMCLPHXCBIJJB-PMVMPFDFSA-N 0.000 description 1
- UIRPULWLRODAEQ-QEJZJMRPSA-N Trp-Ser-Glu Chemical compound C1=CC=C2C(C[C@H](N)C(=O)N[C@@H](CO)C(=O)N[C@@H](CCC(O)=O)C(O)=O)=CNC2=C1 UIRPULWLRODAEQ-QEJZJMRPSA-N 0.000 description 1
- YBRHKUNWEYBZGT-WLTAIBSBSA-N Trp-Thr Chemical compound C1=CC=C2C(C[C@H](N)C(=O)N[C@@H]([C@H](O)C)C(O)=O)=CNC2=C1 YBRHKUNWEYBZGT-WLTAIBSBSA-N 0.000 description 1
- YCQXZDHDSUHUSG-FJHTZYQYSA-N Trp-Thr-Ala Chemical compound C1=CC=C2C(C[C@H](N)C(=O)N[C@@H]([C@H](O)C)C(=O)N[C@@H](C)C(O)=O)=CNC2=C1 YCQXZDHDSUHUSG-FJHTZYQYSA-N 0.000 description 1
- HTGJDTPQYFMKNC-VFAJRCTISA-N Trp-Thr-Leu Chemical compound C1=CC=C2C(C[C@H](N)C(=O)N[C@H](C(=O)N[C@@H](CC(C)C)C(O)=O)[C@@H](C)O)=CNC2=C1 HTGJDTPQYFMKNC-VFAJRCTISA-N 0.000 description 1
- UPUNWAXSLPBMRK-XTWBLICNSA-N Trp-Thr-Thr Chemical compound [H]N[C@@H](CC1=CNC2=C1C=CC=C2)C(=O)N[C@@H]([C@@H](C)O)C(=O)N[C@@H]([C@@H](C)O)C(O)=O UPUNWAXSLPBMRK-XTWBLICNSA-N 0.000 description 1
- DVLHKUWLNKDINO-PMVMPFDFSA-N Trp-Tyr-Leu Chemical compound [H]N[C@@H](CC1=CNC2=C1C=CC=C2)C(=O)N[C@@H](CC1=CC=C(O)C=C1)C(=O)N[C@@H](CC(C)C)C(O)=O DVLHKUWLNKDINO-PMVMPFDFSA-N 0.000 description 1
- UIRVSEPRMWDVEW-RNXOBYDBSA-N Trp-Tyr-Phe Chemical compound C1=CC=C(C=C1)C[C@@H](C(=O)O)NC(=O)[C@H](CC2=CC=C(C=C2)O)NC(=O)[C@H](CC3=CNC4=CC=CC=C43)N UIRVSEPRMWDVEW-RNXOBYDBSA-N 0.000 description 1
- RWTFCAMQLFNPTK-UMPQAUOISA-N Trp-Val-Thr Chemical compound C1=CC=C2C(C[C@H](N)C(=O)N[C@@H](C(C)C)C(=O)N[C@@H]([C@@H](C)O)C(O)=O)=CNC2=C1 RWTFCAMQLFNPTK-UMPQAUOISA-N 0.000 description 1
- 102000014384 Type C Phospholipases Human genes 0.000 description 1
- 108010079194 Type C Phospholipases Proteins 0.000 description 1
- NIHNMOSRSAYZIT-BPNCWPANSA-N Tyr-Ala-Arg Chemical compound NC(=N)NCCC[C@@H](C(O)=O)NC(=O)[C@H](C)NC(=O)[C@@H](N)CC1=CC=C(O)C=C1 NIHNMOSRSAYZIT-BPNCWPANSA-N 0.000 description 1
- QJBWZNTWJSZUOY-UWJYBYFXSA-N Tyr-Ala-Cys Chemical compound C[C@@H](C(=O)N[C@@H](CS)C(=O)O)NC(=O)[C@H](CC1=CC=C(C=C1)O)N QJBWZNTWJSZUOY-UWJYBYFXSA-N 0.000 description 1
- TVOGEPLDNYTAHD-CQDKDKBSSA-N Tyr-Ala-Leu Chemical compound CC(C)C[C@@H](C(O)=O)NC(=O)[C@H](C)NC(=O)[C@@H](N)CC1=CC=C(O)C=C1 TVOGEPLDNYTAHD-CQDKDKBSSA-N 0.000 description 1
- DXYWRYQRKPIGGU-BPNCWPANSA-N Tyr-Ala-Val Chemical compound CC(C)[C@@H](C(O)=O)NC(=O)[C@H](C)NC(=O)[C@@H](N)CC1=CC=C(O)C=C1 DXYWRYQRKPIGGU-BPNCWPANSA-N 0.000 description 1
- AYHSJESDFKREAR-KKUMJFAQSA-N Tyr-Asn-Leu Chemical compound CC(C)C[C@@H](C(O)=O)NC(=O)[C@H](CC(N)=O)NC(=O)[C@@H](N)CC1=CC=C(O)C=C1 AYHSJESDFKREAR-KKUMJFAQSA-N 0.000 description 1
- SCCKSNREWHMKOJ-SRVKXCTJSA-N Tyr-Asn-Ser Chemical compound N[C@@H](Cc1ccc(O)cc1)C(=O)N[C@@H](CC(N)=O)C(=O)N[C@@H](CO)C(O)=O SCCKSNREWHMKOJ-SRVKXCTJSA-N 0.000 description 1
- IXTQGBGHWQEEDE-AVGNSLFASA-N Tyr-Asp-Gln Chemical compound NC(=O)CC[C@@H](C(O)=O)NC(=O)[C@H](CC(O)=O)NC(=O)[C@@H](N)CC1=CC=C(O)C=C1 IXTQGBGHWQEEDE-AVGNSLFASA-N 0.000 description 1
- YRBHLWWGSSQICE-IHRRRGAJSA-N Tyr-Asp-Met Chemical compound [H]N[C@@H](CC1=CC=C(O)C=C1)C(=O)N[C@@H](CC(O)=O)C(=O)N[C@@H](CCSC)C(O)=O YRBHLWWGSSQICE-IHRRRGAJSA-N 0.000 description 1
- MNMYOSZWCKYEDI-JRQIVUDYSA-N Tyr-Asp-Thr Chemical compound [H]N[C@@H](CC1=CC=C(O)C=C1)C(=O)N[C@@H](CC(O)=O)C(=O)N[C@@H]([C@@H](C)O)C(O)=O MNMYOSZWCKYEDI-JRQIVUDYSA-N 0.000 description 1
- QOEZFICGUZTRFX-IHRRRGAJSA-N Tyr-Cys-Val Chemical compound [H]N[C@@H](CC1=CC=C(O)C=C1)C(=O)N[C@@H](CS)C(=O)N[C@@H](C(C)C)C(O)=O QOEZFICGUZTRFX-IHRRRGAJSA-N 0.000 description 1
- QUILOGWWLXMSAT-IHRRRGAJSA-N Tyr-Gln-Gln Chemical compound [H]N[C@@H](CC1=CC=C(O)C=C1)C(=O)N[C@@H](CCC(N)=O)C(=O)N[C@@H](CCC(N)=O)C(O)=O QUILOGWWLXMSAT-IHRRRGAJSA-N 0.000 description 1
- DXUVJJRTVACXSO-KKUMJFAQSA-N Tyr-Gln-Met Chemical compound CSCC[C@@H](C(=O)O)NC(=O)[C@H](CCC(=O)N)NC(=O)[C@H](CC1=CC=C(C=C1)O)N DXUVJJRTVACXSO-KKUMJFAQSA-N 0.000 description 1
- FJKXUIJOMUWCDD-FHWLQOOXSA-N Tyr-Gln-Tyr Chemical compound C1=CC(=CC=C1C[C@@H](C(=O)N[C@@H](CCC(=O)N)C(=O)N[C@@H](CC2=CC=C(C=C2)O)C(=O)O)N)O FJKXUIJOMUWCDD-FHWLQOOXSA-N 0.000 description 1
- AKLNEFNQWLHIGY-QWRGUYRKSA-N Tyr-Gly-Asp Chemical compound C1=CC(=CC=C1C[C@@H](C(=O)NCC(=O)N[C@@H](CC(=O)O)C(=O)O)N)O AKLNEFNQWLHIGY-QWRGUYRKSA-N 0.000 description 1
- AZGZDDNKFFUDEH-QWRGUYRKSA-N Tyr-Gly-Ser Chemical compound OC[C@@H](C(O)=O)NC(=O)CNC(=O)[C@@H](N)CC1=CC=C(O)C=C1 AZGZDDNKFFUDEH-QWRGUYRKSA-N 0.000 description 1
- ZQOOYCZQENFIMC-STQMWFEESA-N Tyr-His Chemical compound C([C@H](N)C(=O)N[C@@H](CC=1N=CNC=1)C(O)=O)C1=CC=C(O)C=C1 ZQOOYCZQENFIMC-STQMWFEESA-N 0.000 description 1
- GFJXBLSZOFWHAW-JYJNAYRXSA-N Tyr-His-Glu Chemical compound [H]N[C@@H](CC1=CC=C(O)C=C1)C(=O)N[C@@H](CC1=CNC=N1)C(=O)N[C@@H](CCC(O)=O)C(O)=O GFJXBLSZOFWHAW-JYJNAYRXSA-N 0.000 description 1
- HHFMNAVFGBYSAT-IGISWZIWSA-N Tyr-Ile-Ile Chemical compound CC[C@H](C)[C@@H](C(=O)N[C@@H]([C@@H](C)CC)C(=O)O)NC(=O)[C@H](CC1=CC=C(C=C1)O)N HHFMNAVFGBYSAT-IGISWZIWSA-N 0.000 description 1
- AZZLDIDWPZLCCW-ZEWNOJEFSA-N Tyr-Ile-Phe Chemical compound [H]N[C@@H](CC1=CC=C(O)C=C1)C(=O)N[C@@H]([C@@H](C)CC)C(=O)N[C@@H](CC1=CC=CC=C1)C(O)=O AZZLDIDWPZLCCW-ZEWNOJEFSA-N 0.000 description 1
- QARCDOCCDOLJSF-HJPIBITLSA-N Tyr-Ile-Ser Chemical compound CC[C@H](C)[C@@H](C(=O)N[C@@H](CO)C(=O)O)NC(=O)[C@H](CC1=CC=C(C=C1)O)N QARCDOCCDOLJSF-HJPIBITLSA-N 0.000 description 1
- GULIUBBXCYPDJU-CQDKDKBSSA-N Tyr-Leu-Ala Chemical compound [O-]C(=O)[C@H](C)NC(=O)[C@H](CC(C)C)NC(=O)[C@@H]([NH3+])CC1=CC=C(O)C=C1 GULIUBBXCYPDJU-CQDKDKBSSA-N 0.000 description 1
- PRONOHBTMLNXCZ-BZSNNMDCSA-N Tyr-Leu-Lys Chemical compound NCCCC[C@@H](C(O)=O)NC(=O)[C@H](CC(C)C)NC(=O)[C@@H](N)CC1=CC=C(O)C=C1 PRONOHBTMLNXCZ-BZSNNMDCSA-N 0.000 description 1
- OLYXUGBVBGSZDN-ACRUOGEOSA-N Tyr-Leu-Tyr Chemical compound C([C@H](N)C(=O)N[C@@H](CC(C)C)C(=O)N[C@@H](CC=1C=CC(O)=CC=1)C(O)=O)C1=CC=C(O)C=C1 OLYXUGBVBGSZDN-ACRUOGEOSA-N 0.000 description 1
- CDKZJGMPZHPAJC-ULQDDVLXSA-N Tyr-Leu-Val Chemical compound CC(C)[C@@H](C(O)=O)NC(=O)[C@H](CC(C)C)NC(=O)[C@@H](N)CC1=CC=C(O)C=C1 CDKZJGMPZHPAJC-ULQDDVLXSA-N 0.000 description 1
- AVFGBGGRZOKSFS-KJEVXHAQSA-N Tyr-Met-Thr Chemical compound C[C@H]([C@@H](C(=O)O)NC(=O)[C@H](CCSC)NC(=O)[C@H](CC1=CC=C(C=C1)O)N)O AVFGBGGRZOKSFS-KJEVXHAQSA-N 0.000 description 1
- UPODKYBYUBTWSV-BZSNNMDCSA-N Tyr-Phe-Cys Chemical compound C([C@H](N)C(=O)N[C@@H](CC=1C=CC=CC=1)C(=O)N[C@@H](CS)C(O)=O)C1=CC=C(O)C=C1 UPODKYBYUBTWSV-BZSNNMDCSA-N 0.000 description 1
- SCZJKZLFSSPJDP-ACRUOGEOSA-N Tyr-Phe-Leu Chemical compound [H]N[C@@H](CC1=CC=C(O)C=C1)C(=O)N[C@@H](CC1=CC=CC=C1)C(=O)N[C@@H](CC(C)C)C(O)=O SCZJKZLFSSPJDP-ACRUOGEOSA-N 0.000 description 1
- WURLIFOWSMBUAR-SLFFLAALSA-N Tyr-Phe-Pro Chemical compound C1C[C@@H](N(C1)C(=O)[C@H](CC2=CC=CC=C2)NC(=O)[C@H](CC3=CC=C(C=C3)O)N)C(=O)O WURLIFOWSMBUAR-SLFFLAALSA-N 0.000 description 1
- HRHYJNLMIJWGLF-BZSNNMDCSA-N Tyr-Ser-Phe Chemical compound C([C@H](N)C(=O)N[C@@H](CO)C(=O)N[C@@H](CC=1C=CC=CC=1)C(O)=O)C1=CC=C(O)C=C1 HRHYJNLMIJWGLF-BZSNNMDCSA-N 0.000 description 1
- RIVVDNTUSRVTQT-IRIUXVKKSA-N Tyr-Thr-Gln Chemical compound C[C@H]([C@@H](C(=O)N[C@@H](CCC(=O)N)C(=O)O)NC(=O)[C@H](CC1=CC=C(C=C1)O)N)O RIVVDNTUSRVTQT-IRIUXVKKSA-N 0.000 description 1
- JAQGKXUEKGKTKX-HOTGVXAUSA-N Tyr-Tyr Chemical compound C([C@H](N)C(=O)N[C@@H](CC=1C=CC(O)=CC=1)C(O)=O)C1=CC=C(O)C=C1 JAQGKXUEKGKTKX-HOTGVXAUSA-N 0.000 description 1
- GZWPQZDVTBZVEP-BZSNNMDCSA-N Tyr-Tyr-Asn Chemical compound [H]N[C@@H](CC1=CC=C(O)C=C1)C(=O)N[C@@H](CC1=CC=C(O)C=C1)C(=O)N[C@@H](CC(N)=O)C(O)=O GZWPQZDVTBZVEP-BZSNNMDCSA-N 0.000 description 1
- QVYFTFIBKCDHIE-ACRUOGEOSA-N Tyr-Tyr-Lys Chemical compound C1=CC(=CC=C1C[C@@H](C(=O)N[C@@H](CC2=CC=C(C=C2)O)C(=O)N[C@@H](CCCCN)C(=O)O)N)O QVYFTFIBKCDHIE-ACRUOGEOSA-N 0.000 description 1
- WOCYUGQDXPTQPY-FXQIFTODSA-N Val-Ala-Cys Chemical compound C[C@@H](C(=O)N[C@@H](CS)C(=O)O)NC(=O)[C@H](C(C)C)N WOCYUGQDXPTQPY-FXQIFTODSA-N 0.000 description 1
- SMKXLHVZIFKQRB-GUBZILKMSA-N Val-Ala-Met Chemical compound C[C@@H](C(=O)N[C@@H](CCSC)C(=O)O)NC(=O)[C@H](C(C)C)N SMKXLHVZIFKQRB-GUBZILKMSA-N 0.000 description 1
- SLLKXDSRVAOREO-KZVJFYERSA-N Val-Ala-Thr Chemical compound C[C@H]([C@@H](C(=O)O)NC(=O)[C@H](C)NC(=O)[C@H](C(C)C)N)O SLLKXDSRVAOREO-KZVJFYERSA-N 0.000 description 1
- CVUDMNSZAIZFAE-UHFFFAOYSA-N Val-Arg-Pro Natural products NC(N)=NCCCC(NC(=O)C(N)C(C)C)C(=O)N1CCCC1C(O)=O CVUDMNSZAIZFAE-UHFFFAOYSA-N 0.000 description 1
- QGFPYRPIUXBYGR-YDHLFZDLSA-N Val-Asn-Phe Chemical compound CC(C)[C@@H](C(=O)N[C@@H](CC(=O)N)C(=O)N[C@@H](CC1=CC=CC=C1)C(=O)O)N QGFPYRPIUXBYGR-YDHLFZDLSA-N 0.000 description 1
- VLOYGOZDPGYWFO-LAEOZQHASA-N Val-Asp-Glu Chemical compound CC(C)[C@H](N)C(=O)N[C@@H](CC(O)=O)C(=O)N[C@@H](CCC(O)=O)C(O)=O VLOYGOZDPGYWFO-LAEOZQHASA-N 0.000 description 1
- WPSXZFTVLIAPCN-WDSKDSINSA-N Val-Cys Chemical compound CC(C)[C@H](N)C(=O)N[C@@H](CS)C(O)=O WPSXZFTVLIAPCN-WDSKDSINSA-N 0.000 description 1
- PFMAFMPJJSHNDW-ZKWXMUAHSA-N Val-Cys-Asn Chemical compound CC(C)[C@@H](C(=O)N[C@@H](CS)C(=O)N[C@@H](CC(=O)N)C(=O)O)N PFMAFMPJJSHNDW-ZKWXMUAHSA-N 0.000 description 1
- VXCAZHCVDBQMTP-NRPADANISA-N Val-Cys-Gln Chemical compound CC(C)[C@@H](C(=O)N[C@@H](CS)C(=O)N[C@@H](CCC(=O)N)C(=O)O)N VXCAZHCVDBQMTP-NRPADANISA-N 0.000 description 1
- LMSBRIVOCYOKMU-NRPADANISA-N Val-Gln-Cys Chemical compound CC(C)[C@@H](C(=O)N[C@@H](CCC(=O)N)C(=O)N[C@@H](CS)C(=O)O)N LMSBRIVOCYOKMU-NRPADANISA-N 0.000 description 1
- NYTKXWLZSNRILS-IFFSRLJSSA-N Val-Gln-Thr Chemical compound C[C@H]([C@@H](C(=O)O)NC(=O)[C@H](CCC(=O)N)NC(=O)[C@H](C(C)C)N)O NYTKXWLZSNRILS-IFFSRLJSSA-N 0.000 description 1
- LAYSXAOGWHKNED-XPUUQOCRSA-N Val-Gly-Ser Chemical compound CC(C)[C@H](N)C(=O)NCC(=O)N[C@@H](CO)C(O)=O LAYSXAOGWHKNED-XPUUQOCRSA-N 0.000 description 1
- CPGJELLYDQEDRK-NAKRPEOUSA-N Val-Ile-Ala Chemical compound CC[C@H](C)[C@H](NC(=O)[C@@H](N)C(C)C)C(=O)N[C@@H](C)C(O)=O CPGJELLYDQEDRK-NAKRPEOUSA-N 0.000 description 1
- LKUDRJSNRWVGMS-QSFUFRPTSA-N Val-Ile-Asp Chemical compound CC[C@H](C)[C@@H](C(=O)N[C@@H](CC(=O)O)C(=O)O)NC(=O)[C@H](C(C)C)N LKUDRJSNRWVGMS-QSFUFRPTSA-N 0.000 description 1
- APQIVBCUIUDSMB-OSUNSFLBSA-N Val-Ile-Thr Chemical compound CC[C@H](C)[C@@H](C(=O)N[C@@H]([C@@H](C)O)C(=O)O)NC(=O)[C@H](C(C)C)N APQIVBCUIUDSMB-OSUNSFLBSA-N 0.000 description 1
- XCTHZFGSVQBHBW-IUCAKERBSA-N Val-Leu Chemical compound CC(C)C[C@@H](C([O-])=O)NC(=O)[C@@H]([NH3+])C(C)C XCTHZFGSVQBHBW-IUCAKERBSA-N 0.000 description 1
- HGJRMXOWUWVUOA-GVXVVHGQSA-N Val-Leu-Gln Chemical compound CC(C)C[C@@H](C(=O)N[C@@H](CCC(=O)N)C(=O)O)NC(=O)[C@H](C(C)C)N HGJRMXOWUWVUOA-GVXVVHGQSA-N 0.000 description 1
- ZZGPVSZDZQRJQY-ULQDDVLXSA-N Val-Leu-Phe Chemical compound CC(C)C[C@H](NC(=O)[C@@H](N)C(C)C)C(=O)N[C@@H](Cc1ccccc1)C(O)=O ZZGPVSZDZQRJQY-ULQDDVLXSA-N 0.000 description 1
- WDIWOIRFNMLNKO-ULQDDVLXSA-N Val-Leu-Tyr Chemical compound CC(C)[C@H](N)C(=O)N[C@@H](CC(C)C)C(=O)N[C@H](C(O)=O)CC1=CC=C(O)C=C1 WDIWOIRFNMLNKO-ULQDDVLXSA-N 0.000 description 1
- QRVPEKJBBRYISE-XUXIUFHCSA-N Val-Lys-Ile Chemical compound CC[C@H](C)[C@@H](C(=O)O)NC(=O)[C@H](CCCCN)NC(=O)[C@H](C(C)C)N QRVPEKJBBRYISE-XUXIUFHCSA-N 0.000 description 1
- UOUIMEGEPSBZIV-ULQDDVLXSA-N Val-Lys-Tyr Chemical compound CC(C)[C@H](N)C(=O)N[C@@H](CCCCN)C(=O)N[C@H](C(O)=O)CC1=CC=C(O)C=C1 UOUIMEGEPSBZIV-ULQDDVLXSA-N 0.000 description 1
- VPGCVZRRBYOGCD-AVGNSLFASA-N Val-Lys-Val Chemical compound CC(C)[C@H](N)C(=O)N[C@@H](CCCCN)C(=O)N[C@@H](C(C)C)C(O)=O VPGCVZRRBYOGCD-AVGNSLFASA-N 0.000 description 1
- OJOMXGVLFKYDKP-QXEWZRGKSA-N Val-Met-Asp Chemical compound CC(C)[C@@H](C(=O)N[C@@H](CCSC)C(=O)N[C@@H](CC(=O)O)C(=O)O)N OJOMXGVLFKYDKP-QXEWZRGKSA-N 0.000 description 1
- YDVDTCJGBBJGRT-GUBZILKMSA-N Val-Met-Ser Chemical compound CC(C)[C@@H](C(=O)N[C@@H](CCSC)C(=O)N[C@@H](CO)C(=O)O)N YDVDTCJGBBJGRT-GUBZILKMSA-N 0.000 description 1
- HJSLDXZAZGFPDK-ULQDDVLXSA-N Val-Phe-Leu Chemical compound CC(C)C[C@@H](C(=O)O)NC(=O)[C@H](CC1=CC=CC=C1)NC(=O)[C@H](C(C)C)N HJSLDXZAZGFPDK-ULQDDVLXSA-N 0.000 description 1
- YKNOJPJWNVHORX-UNQGMJICSA-N Val-Phe-Thr Chemical compound CC(C)[C@H](N)C(=O)N[C@H](C(=O)N[C@@H]([C@@H](C)O)C(O)=O)CC1=CC=CC=C1 YKNOJPJWNVHORX-UNQGMJICSA-N 0.000 description 1
- MJOUSKQHAIARKI-JYJNAYRXSA-N Val-Phe-Val Chemical compound CC(C)[C@H](N)C(=O)N[C@H](C(=O)N[C@@H](C(C)C)C(O)=O)CC1=CC=CC=C1 MJOUSKQHAIARKI-JYJNAYRXSA-N 0.000 description 1
- BGXVHVMJZCSOCA-AVGNSLFASA-N Val-Pro-Lys Chemical compound CC(C)[C@@H](C(=O)N1CCC[C@H]1C(=O)N[C@@H](CCCCN)C(=O)O)N BGXVHVMJZCSOCA-AVGNSLFASA-N 0.000 description 1
- RYHUIHUOYRNNIE-NRPADANISA-N Val-Ser-Gln Chemical compound CC(C)[C@@H](C(=O)N[C@@H](CO)C(=O)N[C@@H](CCC(=O)N)C(=O)O)N RYHUIHUOYRNNIE-NRPADANISA-N 0.000 description 1
- UGFMVXRXULGLNO-XPUUQOCRSA-N Val-Ser-Gly Chemical compound CC(C)[C@H](N)C(=O)N[C@@H](CO)C(=O)NCC(O)=O UGFMVXRXULGLNO-XPUUQOCRSA-N 0.000 description 1
- PGQUDQYHWICSAB-NAKRPEOUSA-N Val-Ser-Ile Chemical compound CC[C@H](C)[C@@H](C(=O)O)NC(=O)[C@H](CO)NC(=O)[C@H](C(C)C)N PGQUDQYHWICSAB-NAKRPEOUSA-N 0.000 description 1
- VHIZXDZMTDVFGX-DCAQKATOSA-N Val-Ser-Leu Chemical compound CC(C)C[C@@H](C(=O)O)NC(=O)[C@H](CO)NC(=O)[C@H](C(C)C)N VHIZXDZMTDVFGX-DCAQKATOSA-N 0.000 description 1
- CEKSLIVSNNGOKH-KZVJFYERSA-N Val-Thr-Ala Chemical compound C[C@H]([C@@H](C(=O)N[C@@H](C)C(=O)O)NC(=O)[C@H](C(C)C)N)O CEKSLIVSNNGOKH-KZVJFYERSA-N 0.000 description 1
- PQSNETRGCRUOGP-KKHAAJSZSA-N Val-Thr-Asn Chemical compound CC(C)[C@H](N)C(=O)N[C@@H]([C@@H](C)O)C(=O)N[C@H](C(O)=O)CC(N)=O PQSNETRGCRUOGP-KKHAAJSZSA-N 0.000 description 1
- YQYFYUSYEDNLSD-YEPSODPASA-N Val-Thr-Gly Chemical compound CC(C)[C@H](N)C(=O)N[C@@H]([C@@H](C)O)C(=O)NCC(O)=O YQYFYUSYEDNLSD-YEPSODPASA-N 0.000 description 1
- OFTXTCGQJXTNQS-XGEHTFHBSA-N Val-Thr-Ser Chemical compound C[C@H]([C@@H](C(=O)N[C@@H](CO)C(=O)O)NC(=O)[C@H](C(C)C)N)O OFTXTCGQJXTNQS-XGEHTFHBSA-N 0.000 description 1
- HTONZBWRYUKUKC-RCWTZXSCSA-N Val-Thr-Val Chemical compound CC(C)[C@H](N)C(=O)N[C@@H]([C@@H](C)O)C(=O)N[C@@H](C(C)C)C(O)=O HTONZBWRYUKUKC-RCWTZXSCSA-N 0.000 description 1
- LZDNBBYBDGBADK-KBPBESRZSA-N Val-Trp Chemical compound C1=CC=C2C(C[C@H](NC(=O)[C@@H](N)C(C)C)C(O)=O)=CNC2=C1 LZDNBBYBDGBADK-KBPBESRZSA-N 0.000 description 1
- VEYJKJORLPYVLO-RYUDHWBXSA-N Val-Tyr Chemical compound CC(C)[C@H](N)C(=O)N[C@H](C(O)=O)CC1=CC=C(O)C=C1 VEYJKJORLPYVLO-RYUDHWBXSA-N 0.000 description 1
- IECQJCJNPJVUSB-IHRRRGAJSA-N Val-Tyr-Ser Chemical compound CC(C)[C@H](N)C(=O)N[C@@H](Cc1ccc(O)cc1)C(=O)N[C@@H](CO)C(O)=O IECQJCJNPJVUSB-IHRRRGAJSA-N 0.000 description 1
- MOJFVLVTLZDQGW-AVGNSLFASA-N Val-Val-Leu Chemical compound CC(C)C[C@@H](C(O)=O)NC(=O)[C@H](C(C)C)NC(=O)[C@@H](N)C(C)C MOJFVLVTLZDQGW-AVGNSLFASA-N 0.000 description 1
- XNLUVJPMPAZHCY-JYJNAYRXSA-N Val-Val-Phe Chemical compound CC(C)[C@H]([NH3+])C(=O)N[C@@H](C(C)C)C(=O)N[C@H](C([O-])=O)CC1=CC=CC=C1 XNLUVJPMPAZHCY-JYJNAYRXSA-N 0.000 description 1
- STTYIMSDIYISRG-UHFFFAOYSA-N Valyl-Serine Chemical compound CC(C)C(N)C(=O)NC(CO)C(O)=O STTYIMSDIYISRG-UHFFFAOYSA-N 0.000 description 1
- 108091005722 Vomeronasal receptors Proteins 0.000 description 1
- 239000002253 acid Substances 0.000 description 1
- 150000007513 acids Chemical class 0.000 description 1
- 239000012190 activator Substances 0.000 description 1
- 239000004480 active ingredient Substances 0.000 description 1
- 230000010933 acylation Effects 0.000 description 1
- 238000005917 acylation reaction Methods 0.000 description 1
- 239000000654 additive Substances 0.000 description 1
- 102000030621 adenylate cyclase Human genes 0.000 description 1
- 108060000200 adenylate cyclase Proteins 0.000 description 1
- 239000002671 adjuvant Substances 0.000 description 1
- 208000012761 aggressive behavior Diseases 0.000 description 1
- 108010024078 alanyl-glycyl-serine Proteins 0.000 description 1
- 108010069490 alanyl-glycyl-seryl-glutamic acid Proteins 0.000 description 1
- 108010045350 alanyl-tyrosyl-alanine Proteins 0.000 description 1
- 108010011559 alanylphenylalanine Proteins 0.000 description 1
- 230000001476 alcoholic effect Effects 0.000 description 1
- 125000005600 alkyl phosphonate group Chemical group 0.000 description 1
- 230000029936 alkylation Effects 0.000 description 1
- 238000005804 alkylation reaction Methods 0.000 description 1
- HMFHBZSHGGEWLO-UHFFFAOYSA-N alpha-D-Furanose-Ribose Natural products OCC1OC(O)C(O)C1O HMFHBZSHGGEWLO-UHFFFAOYSA-N 0.000 description 1
- 150000001412 amines Chemical class 0.000 description 1
- 210000004727 amygdala Anatomy 0.000 description 1
- 238000010171 animal model Methods 0.000 description 1
- 239000003242 anti bacterial agent Substances 0.000 description 1
- 230000003466 anticipated effect Effects 0.000 description 1
- 229940088710 antibiotic agent Drugs 0.000 description 1
- 239000003963 antioxidant agent Substances 0.000 description 1
- 235000006708 antioxidants Nutrition 0.000 description 1
- 239000002787 antisense oligonucleotide Substances 0.000 description 1
- 239000008365 aqueous carrier Substances 0.000 description 1
- 239000003125 aqueous solvent Substances 0.000 description 1
- PYMYPHUHKUWMLA-WDCZJNDASA-N arabinose Chemical compound OC[C@@H](O)[C@@H](O)[C@H](O)C=O PYMYPHUHKUWMLA-WDCZJNDASA-N 0.000 description 1
- PYMYPHUHKUWMLA-UHFFFAOYSA-N arabinose Natural products OCC(O)C(O)C(O)C=O PYMYPHUHKUWMLA-UHFFFAOYSA-N 0.000 description 1
- ODKSFYDXXFIFQN-UHFFFAOYSA-N arginine Natural products OC(=O)C(N)CCCNC(N)=N ODKSFYDXXFIFQN-UHFFFAOYSA-N 0.000 description 1
- 108010008355 arginyl-glutamine Proteins 0.000 description 1
- 125000003118 aryl group Chemical group 0.000 description 1
- 229960001230 asparagine Drugs 0.000 description 1
- 235000009582 asparagine Nutrition 0.000 description 1
- 210000003050 axon Anatomy 0.000 description 1
- 230000008901 benefit Effects 0.000 description 1
- SRBFZHDQGSBBOR-UHFFFAOYSA-N beta-D-Pyranose-Lyxose Natural products OC1COC(O)C(O)C1O SRBFZHDQGSBBOR-UHFFFAOYSA-N 0.000 description 1
- 108091007497 betacoronavirus-specific marker domains Proteins 0.000 description 1
- 238000002799 binding type assay Methods 0.000 description 1
- 239000012472 biological sample Substances 0.000 description 1
- 235000020958 biotin Nutrition 0.000 description 1
- 210000004204 blood vessel Anatomy 0.000 description 1
- 239000006172 buffering agent Substances 0.000 description 1
- 210000004899 c-terminal region Anatomy 0.000 description 1
- 238000010804 cDNA synthesis Methods 0.000 description 1
- BPKIGYQJPYCAOW-FFJTTWKXSA-I calcium;potassium;disodium;(2s)-2-hydroxypropanoate;dichloride;dihydroxide;hydrate Chemical compound O.[OH-].[OH-].[Na+].[Na+].[Cl-].[Cl-].[K+].[Ca+2].C[C@H](O)C([O-])=O BPKIGYQJPYCAOW-FFJTTWKXSA-I 0.000 description 1
- 201000011510 cancer Diseases 0.000 description 1
- 150000004657 carbamic acid derivatives Chemical class 0.000 description 1
- 150000001720 carbohydrates Chemical class 0.000 description 1
- 229910052799 carbon Inorganic materials 0.000 description 1
- 150000004649 carbonic acid derivatives Chemical class 0.000 description 1
- 125000002915 carbonyl group Chemical group [*:2]C([*:1])=O 0.000 description 1
- 125000002057 carboxymethyl group Chemical group [H]OC(=O)C([H])([H])[*] 0.000 description 1
- 239000000969 carrier Substances 0.000 description 1
- 238000004113 cell culture Methods 0.000 description 1
- 230000001413 cellular effect Effects 0.000 description 1
- 230000030570 cellular localization Effects 0.000 description 1
- 230000033077 cellular process Effects 0.000 description 1
- 239000002738 chelating agent Substances 0.000 description 1
- 238000007385 chemical modification Methods 0.000 description 1
- 239000007795 chemical reaction product Substances 0.000 description 1
- 210000004978 chinese hamster ovary cell Anatomy 0.000 description 1
- 239000012501 chromatography medium Substances 0.000 description 1
- 238000003200 chromosome mapping Methods 0.000 description 1
- 238000003776 cleavage reaction Methods 0.000 description 1
- 239000013599 cloning vector Substances 0.000 description 1
- 230000024203 complement activation Effects 0.000 description 1
- 239000000356 contaminant Substances 0.000 description 1
- 238000013270 controlled release Methods 0.000 description 1
- 238000007796 conventional method Methods 0.000 description 1
- 230000001351 cycling effect Effects 0.000 description 1
- XUJNEKJLAYXESH-UHFFFAOYSA-N cysteine Natural products SCC(N)C(O)=O XUJNEKJLAYXESH-UHFFFAOYSA-N 0.000 description 1
- 125000000151 cysteine group Chemical group N[C@@H](CS)C(=O)* 0.000 description 1
- 150000001945 cysteines Chemical class 0.000 description 1
- 108010069495 cysteinyltyrosine Proteins 0.000 description 1
- 230000001086 cytosolic effect Effects 0.000 description 1
- 230000002939 deleterious effect Effects 0.000 description 1
- 239000005547 deoxyribonucleotide Substances 0.000 description 1
- 125000002637 deoxyribonucleotide group Chemical group 0.000 description 1
- 238000009795 derivation Methods 0.000 description 1
- 238000013461 design Methods 0.000 description 1
- 230000015155 detection of stimulus involved in sensory perception Effects 0.000 description 1
- 238000007865 diluting Methods 0.000 description 1
- 239000003085 diluting agent Substances 0.000 description 1
- 238000010790 dilution Methods 0.000 description 1
- 239000012895 dilution Substances 0.000 description 1
- LOKCTEFSRHRXRJ-UHFFFAOYSA-I dipotassium trisodium dihydrogen phosphate hydrogen phosphate dichloride Chemical compound P(=O)(O)(O)[O-].[K+].P(=O)(O)([O-])[O-].[Na+].[Na+].[Cl-].[K+].[Cl-].[Na+] LOKCTEFSRHRXRJ-UHFFFAOYSA-I 0.000 description 1
- 108010054813 diprotin B Proteins 0.000 description 1
- NAGJZTKCGNOGPW-UHFFFAOYSA-N dithiophosphoric acid Chemical class OP(O)(S)=S NAGJZTKCGNOGPW-UHFFFAOYSA-N 0.000 description 1
- 238000012377 drug delivery Methods 0.000 description 1
- 239000000975 dye Substances 0.000 description 1
- 229960001484 edetic acid Drugs 0.000 description 1
- 239000012636 effector Substances 0.000 description 1
- 239000003792 electrolyte Substances 0.000 description 1
- 238000002337 electrophoretic mobility shift assay Methods 0.000 description 1
- 230000003826 endocrine responses Effects 0.000 description 1
- 230000012202 endocytosis Effects 0.000 description 1
- 238000005516 engineering process Methods 0.000 description 1
- 230000002255 enzymatic effect Effects 0.000 description 1
- 230000032050 esterification Effects 0.000 description 1
- 238000005886 esterification reaction Methods 0.000 description 1
- 230000012173 estrus Effects 0.000 description 1
- LVGKNOAMLMIIKO-QXMHVHEDSA-N ethyl oleate Chemical compound CCCCCCCC\C=C/CCCCCCCC(=O)OCC LVGKNOAMLMIIKO-QXMHVHEDSA-N 0.000 description 1
- 229940093471 ethyl oleate Drugs 0.000 description 1
- 230000001747 exhibiting effect Effects 0.000 description 1
- 239000000945 filler Substances 0.000 description 1
- 238000001215 fluorescent labelling Methods 0.000 description 1
- 238000009472 formulation Methods 0.000 description 1
- 125000000524 functional group Chemical group 0.000 description 1
- 230000002538 fungal effect Effects 0.000 description 1
- 230000004927 fusion Effects 0.000 description 1
- 239000000499 gel Substances 0.000 description 1
- 108091008053 gene clusters Proteins 0.000 description 1
- 229930195712 glutamate Natural products 0.000 description 1
- 108010055341 glutamyl-glutamic acid Proteins 0.000 description 1
- HPAIKDPJURGQLN-UHFFFAOYSA-N glycyl-L-histidyl-L-phenylalanine Natural products C=1C=CC=CC=1CC(C(O)=O)NC(=O)C(NC(=O)CN)CC1=CN=CN1 HPAIKDPJURGQLN-UHFFFAOYSA-N 0.000 description 1
- 239000005090 green fluorescent protein Substances 0.000 description 1
- 210000003128 head Anatomy 0.000 description 1
- 229940094991 herring sperm dna Drugs 0.000 description 1
- 125000000623 heterocyclic group Chemical group 0.000 description 1
- 108010036413 histidylglycine Proteins 0.000 description 1
- 229940088597 hormone Drugs 0.000 description 1
- 239000005556 hormone Substances 0.000 description 1
- 102000054751 human RUNX1T1 Human genes 0.000 description 1
- 102000047147 human SCARA3 Human genes 0.000 description 1
- 210000003016 hypothalamus Anatomy 0.000 description 1
- 238000003384 imaging method Methods 0.000 description 1
- 230000003100 immobilizing effect Effects 0.000 description 1
- 230000003053 immunization Effects 0.000 description 1
- 238000002649 immunization Methods 0.000 description 1
- 238000003317 immunochromatography Methods 0.000 description 1
- 229940127121 immunoconjugate Drugs 0.000 description 1
- 102000018358 immunoglobulin Human genes 0.000 description 1
- 229940072221 immunoglobulins Drugs 0.000 description 1
- 230000001939 inductive effect Effects 0.000 description 1
- 239000011261 inert gas Substances 0.000 description 1
- 238000001802 infusion Methods 0.000 description 1
- 230000002401 inhibitory effect Effects 0.000 description 1
- 230000005764 inhibitory process Effects 0.000 description 1
- 238000013383 initial experiment Methods 0.000 description 1
- 229960003786 inosine Drugs 0.000 description 1
- 239000002917 insecticide Substances 0.000 description 1
- 238000003780 insertion Methods 0.000 description 1
- 230000037431 insertion Effects 0.000 description 1
- 230000002452 interceptive effect Effects 0.000 description 1
- 238000007918 intramuscular administration Methods 0.000 description 1
- 238000007912 intraperitoneal administration Methods 0.000 description 1
- 238000004255 ion exchange chromatography Methods 0.000 description 1
- 229960000310 isoleucine Drugs 0.000 description 1
- AGPKZVBTJJNPAG-UHFFFAOYSA-N isoleucine Natural products CCC(C)C(N)C(O)=O AGPKZVBTJJNPAG-UHFFFAOYSA-N 0.000 description 1
- 210000003734 kidney Anatomy 0.000 description 1
- 108010049589 leucyl-leucyl-leucine Proteins 0.000 description 1
- 108010047926 leucyl-lysyl-tyrosine Proteins 0.000 description 1
- 108010091798 leucylleucine Proteins 0.000 description 1
- 230000004807 localization Effects 0.000 description 1
- 230000007762 localization of cell Effects 0.000 description 1
- 238000004020 luminiscence type Methods 0.000 description 1
- 239000006166 lysate Substances 0.000 description 1
- 230000001320 lysogenic effect Effects 0.000 description 1
- 108010017391 lysylvaline Proteins 0.000 description 1
- 230000002101 lytic effect Effects 0.000 description 1
- 238000012423 maintenance Methods 0.000 description 1
- 230000007758 mating behavior Effects 0.000 description 1
- 239000002609 medium Substances 0.000 description 1
- 230000025350 membrane depolarization involved in regulation of action potential Effects 0.000 description 1
- 108010016686 methionyl-alanyl-serine Proteins 0.000 description 1
- 108010063431 methionyl-aspartyl-glycine Proteins 0.000 description 1
- 108010005942 methionylglycine Proteins 0.000 description 1
- 239000011325 microbead Substances 0.000 description 1
- 239000004005 microsphere Substances 0.000 description 1
- 230000003278 mimic effect Effects 0.000 description 1
- 230000011278 mitosis Effects 0.000 description 1
- 230000037230 mobility Effects 0.000 description 1
- 210000003205 muscle Anatomy 0.000 description 1
- 238000002703 mutagenesis Methods 0.000 description 1
- 231100000350 mutagenesis Toxicity 0.000 description 1
- 210000003928 nasal cavity Anatomy 0.000 description 1
- 210000000492 nasalseptum Anatomy 0.000 description 1
- 239000013642 negative control Substances 0.000 description 1
- 230000001537 neural effect Effects 0.000 description 1
- 230000007935 neutral effect Effects 0.000 description 1
- 239000012457 nonaqueous media Substances 0.000 description 1
- 239000000346 nonvolatile oil Substances 0.000 description 1
- 210000001331 nose Anatomy 0.000 description 1
- 238000007899 nucleic acid hybridization Methods 0.000 description 1
- 235000015097 nutrients Nutrition 0.000 description 1
- 210000002475 olfactory pathway Anatomy 0.000 description 1
- 239000004006 olive oil Substances 0.000 description 1
- 235000008390 olive oil Nutrition 0.000 description 1
- 238000010397 one-hybrid screening Methods 0.000 description 1
- 210000000287 oocyte Anatomy 0.000 description 1
- 238000005457 optimization Methods 0.000 description 1
- 150000002895 organic esters Chemical class 0.000 description 1
- 125000000962 organic group Chemical group 0.000 description 1
- 238000007911 parenteral administration Methods 0.000 description 1
- 230000037361 pathway Effects 0.000 description 1
- 230000008447 perception Effects 0.000 description 1
- 230000000737 periodic effect Effects 0.000 description 1
- 238000011170 pharmaceutical development Methods 0.000 description 1
- 108010074082 phenylalanyl-alanyl-lysine Proteins 0.000 description 1
- 108010082795 phenylalanyl-arginyl-arginine Proteins 0.000 description 1
- 108010012581 phenylalanylglutamate Proteins 0.000 description 1
- 108010073101 phenylalanylleucine Proteins 0.000 description 1
- 108091011065 pheromone binding proteins Proteins 0.000 description 1
- 239000010452 phosphate Substances 0.000 description 1
- 239000002953 phosphate buffered saline Substances 0.000 description 1
- 125000002467 phosphate group Chemical group [H]OP(=O)(O[H])O[*] 0.000 description 1
- 150000008298 phosphoramidates Chemical class 0.000 description 1
- 150000003014 phosphoric acid esters Chemical class 0.000 description 1
- 230000001766 physiological effect Effects 0.000 description 1
- 230000035479 physiological effects, processes and functions Effects 0.000 description 1
- INAAIJLSXJJHOZ-UHFFFAOYSA-N pibenzimol Chemical compound C1CN(C)CCN1C1=CC=C(N=C(N2)C=3C=C4NC(=NC4=CC=3)C=3C=CC(O)=CC=3)C2=C1 INAAIJLSXJJHOZ-UHFFFAOYSA-N 0.000 description 1
- 229920001223 polyethylene glycol Polymers 0.000 description 1
- 102000054765 polymorphisms of proteins Human genes 0.000 description 1
- 229920002223 polystyrene Polymers 0.000 description 1
- 230000016833 positive regulation of signal transduction Effects 0.000 description 1
- 239000002244 precipitate Substances 0.000 description 1
- 230000008569 process Effects 0.000 description 1
- 238000012545 processing Methods 0.000 description 1
- 238000004393 prognosis Methods 0.000 description 1
- 108700042769 prolyl-leucyl-glycine Proteins 0.000 description 1
- 108010029020 prolylglycine Proteins 0.000 description 1
- 108020001580 protein domains Proteins 0.000 description 1
- 238000001243 protein synthesis Methods 0.000 description 1
- 230000017854 proteolysis Effects 0.000 description 1
- 230000002685 pulmonary effect Effects 0.000 description 1
- 150000003212 purines Chemical class 0.000 description 1
- 150000003230 pyrimidines Chemical class 0.000 description 1
- 238000002708 random mutagenesis Methods 0.000 description 1
- 239000000018 receptor agonist Substances 0.000 description 1
- 229940044601 receptor agonist Drugs 0.000 description 1
- 239000002464 receptor antagonist Substances 0.000 description 1
- 230000022532 regulation of transcription, DNA-dependent Effects 0.000 description 1
- 230000001850 reproductive effect Effects 0.000 description 1
- 239000011347 resin Substances 0.000 description 1
- 229920005989 resin Polymers 0.000 description 1
- 230000000717 retained effect Effects 0.000 description 1
- 210000001995 reticulocyte Anatomy 0.000 description 1
- 238000010839 reverse transcription Methods 0.000 description 1
- 239000002336 ribonucleotide Substances 0.000 description 1
- 125000002652 ribonucleotide group Chemical group 0.000 description 1
- 125000000548 ribosyl group Chemical group C1([C@H](O)[C@H](O)[C@H](O1)CO)* 0.000 description 1
- 230000007017 scission Effects 0.000 description 1
- 230000035945 sensitivity Effects 0.000 description 1
- 230000001953 sensory effect Effects 0.000 description 1
- 210000002265 sensory receptor cell Anatomy 0.000 description 1
- 102000027509 sensory receptors Human genes 0.000 description 1
- 108091008691 sensory receptors Proteins 0.000 description 1
- 238000002864 sequence alignment Methods 0.000 description 1
- 108010069117 seryl-lysyl-aspartic acid Proteins 0.000 description 1
- 230000035938 sexual maturation Effects 0.000 description 1
- 239000002356 single layer Substances 0.000 description 1
- 238000001542 size-exclusion chromatography Methods 0.000 description 1
- 239000001509 sodium citrate Substances 0.000 description 1
- NLJMYIDDQXHKNR-UHFFFAOYSA-K sodium citrate Chemical compound O.O.[Na+].[Na+].[Na+].[O-]C(=O)CC(O)(CC([O-])=O)C([O-])=O NLJMYIDDQXHKNR-UHFFFAOYSA-K 0.000 description 1
- 239000001488 sodium phosphate Substances 0.000 description 1
- 229910000162 sodium phosphate Inorganic materials 0.000 description 1
- 241000894007 species Species 0.000 description 1
- 239000003381 stabilizer Substances 0.000 description 1
- 238000010186 staining Methods 0.000 description 1
- 238000012289 standard assay Methods 0.000 description 1
- 238000010561 standard procedure Methods 0.000 description 1
- 238000007619 statistical method Methods 0.000 description 1
- 150000003432 sterols Chemical class 0.000 description 1
- 235000003702 sterols Nutrition 0.000 description 1
- 230000004936 stimulating effect Effects 0.000 description 1
- 230000000638 stimulation Effects 0.000 description 1
- 238000007920 subcutaneous administration Methods 0.000 description 1
- 108060008004 synaptotagmin Proteins 0.000 description 1
- 230000002194 synthesizing effect Effects 0.000 description 1
- 150000003505 terpenes Chemical class 0.000 description 1
- 210000001550 testis Anatomy 0.000 description 1
- 229940124597 therapeutic agent Drugs 0.000 description 1
- 238000002560 therapeutic procedure Methods 0.000 description 1
- 230000005026 transcription initiation Effects 0.000 description 1
- 230000002103 transcriptional effect Effects 0.000 description 1
- 238000012546 transfer Methods 0.000 description 1
- 230000014621 translational initiation Effects 0.000 description 1
- 230000032258 transport Effects 0.000 description 1
- 230000001960 triggered effect Effects 0.000 description 1
- RYFMWSXOAZQYPI-UHFFFAOYSA-K trisodium phosphate Chemical compound [Na+].[Na+].[Na+].[O-]P([O-])([O-])=O RYFMWSXOAZQYPI-UHFFFAOYSA-K 0.000 description 1
- 108010038745 tryptophylglycine Proteins 0.000 description 1
- 108010079202 tyrosyl-alanyl-cysteine Proteins 0.000 description 1
- 108010003137 tyrosyltyrosine Proteins 0.000 description 1
- 241000701447 unidentified baculovirus Species 0.000 description 1
- 241001529453 unidentified herpesvirus Species 0.000 description 1
- YSGSDAIMSCVPHG-UHFFFAOYSA-N valyl-methionine Chemical compound CSCCC(C(O)=O)NC(=O)C(N)C(C)C YSGSDAIMSCVPHG-UHFFFAOYSA-N 0.000 description 1
- 108010036320 valylleucine Proteins 0.000 description 1
- 108010021889 valylvaline Proteins 0.000 description 1
- 235000015112 vegetable and seed oil Nutrition 0.000 description 1
- 239000008158 vegetable oil Substances 0.000 description 1
- 229920002554 vinyl polymer Polymers 0.000 description 1
- XLYOFNOQVPJJNP-UHFFFAOYSA-N water Substances O XLYOFNOQVPJJNP-UHFFFAOYSA-N 0.000 description 1
- 238000001086 yeast two-hybrid system Methods 0.000 description 1
Classifications
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12Q—MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
- C12Q1/00—Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
- C12Q1/68—Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
- C12Q1/6809—Methods for determination or identification of nucleic acids involving differential detection
-
- A—HUMAN NECESSITIES
- A61—MEDICAL OR VETERINARY SCIENCE; HYGIENE
- A61P—SPECIFIC THERAPEUTIC ACTIVITY OF CHEMICAL COMPOUNDS OR MEDICINAL PREPARATIONS
- A61P15/00—Drugs for genital or sexual disorders; Contraceptives
-
- C—CHEMISTRY; METALLURGY
- C07—ORGANIC CHEMISTRY
- C07K—PEPTIDES
- C07K14/00—Peptides having more than 20 amino acids; Gastrins; Somatostatins; Melanotropins; Derivatives thereof
- C07K14/435—Peptides having more than 20 amino acids; Gastrins; Somatostatins; Melanotropins; Derivatives thereof from animals; from humans
- C07K14/705—Receptors; Cell surface antigens; Cell surface determinants
-
- A—HUMAN NECESSITIES
- A61—MEDICAL OR VETERINARY SCIENCE; HYGIENE
- A61K—PREPARATIONS FOR MEDICAL, DENTAL OR TOILETRY PURPOSES
- A61K38/00—Medicinal preparations containing peptides
-
- A—HUMAN NECESSITIES
- A61—MEDICAL OR VETERINARY SCIENCE; HYGIENE
- A61K—PREPARATIONS FOR MEDICAL, DENTAL OR TOILETRY PURPOSES
- A61K39/00—Medicinal preparations containing antigens or antibodies
Landscapes
- Chemical & Material Sciences (AREA)
- Health & Medical Sciences (AREA)
- Life Sciences & Earth Sciences (AREA)
- Organic Chemistry (AREA)
- General Health & Medical Sciences (AREA)
- Zoology (AREA)
- Proteomics, Peptides & Aminoacids (AREA)
- Engineering & Computer Science (AREA)
- Immunology (AREA)
- Analytical Chemistry (AREA)
- Medicinal Chemistry (AREA)
- Molecular Biology (AREA)
- Biophysics (AREA)
- Wood Science & Technology (AREA)
- Genetics & Genomics (AREA)
- Biochemistry (AREA)
- Bioinformatics & Cheminformatics (AREA)
- General Chemical & Material Sciences (AREA)
- Public Health (AREA)
- Physics & Mathematics (AREA)
- Endocrinology (AREA)
- Reproductive Health (AREA)
- Chemical Kinetics & Catalysis (AREA)
- Biotechnology (AREA)
- Microbiology (AREA)
- Nuclear Medicine, Radiotherapy & Molecular Imaging (AREA)
- Pharmacology & Pharmacy (AREA)
- Animal Behavior & Ethology (AREA)
- General Engineering & Computer Science (AREA)
- Veterinary Medicine (AREA)
- Cell Biology (AREA)
- Toxicology (AREA)
- Gastroenterology & Hepatology (AREA)
- Peptides Or Proteins (AREA)
- Micro-Organisms Or Cultivation Processes Thereof (AREA)
- Medicines That Contain Protein Lipid Enzymes And Other Medicines (AREA)
- Medicines Containing Antibodies Or Antigens For Use As Internal Diagnostic Agents (AREA)
- Investigating Or Analysing Biological Materials (AREA)
- Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)
Abstract
The invention describes a multigene family encoding a collection of novel mammalian pheromone receptors. Nucleic acids encoding the pheromone receptor polypeptides, including fragments and biologically functional variants thereof are provided. Also included are polypeptides and fragments thereof encoded by such nucleic acids, and antibodies relating thereto. Methods and products for using such nucleic acids and polypeptides also are provided.
Description
Field of the Invention
This invention relates to nucleic acids and encoded polypeptides which are part of a multigene family encoding a collection of novel mammalian pheromone receptors. The invention further provides representative nucleic acids and encoded polypeptides in this multigene family. The representative polypeptides are expressed in the murine and rat vomeronasal organ (VNO). Agents which bind the nucleic acids or polypeptides also are provided. The invention further relates to methods of using such nucleic acids and polypeptides in the diagnosis and/or treatment of disease, including the use of these molecules in controlling fertility and behavior in vertebrates and invertebrates.
Background of the Invention
Pheromones are intraspecific chemical signals found throughout the animal kingdom. They regulate populations of animals by inducing innate behaviors and stereotyped changes in physiology (Karlson and Luscher, Nature, 1959, 183:55-56; Wilson, Sci. Am., 1963, 208:100-114; Sorensen, Chem. Sens., 1996, 21:245-256). Pheromones can serve as cues for overcrowding, impending danger, reproductive status, gender, or dominance. In rodents, a variety of pheromone effects have been reported. These include effects on estrus and the onset of puberty as well as the induction of mating and aggressive behaviors (Singer, A.G., J. Steroid. Biochem. Molec. Biol., 1991, 39:627-632; Halpern, M., Ann. Rev. Neurosci., 1987, 10:325-362; Wysocki, C.J., et al., In the Neurobiology of Taste and Smell, 1987, 125-150; Novotny et al., Chemical Signals in Vertebrates, 1990, Vol. 5, eds. D.W. Macdonald et al., Oxford University Press).
The detection of pheromones is mediated by the olfactory system. However, sensory neurons that detect pheromones are typically segregated from those that detect volatile odorants (Keverne, E.B., Trends Neurosci., 1983, 6:381-384; Halpern, M., Ann. Rev. Neurosci., 1987, 10:325-362; Wysocki, C.J., et al., In the Neurobiology of Taste and Smell, 1987, 125-150; Hildebrand, J.G., et al., Brain Res., 1997, 677:157-161). In mammals, sensory neurons in the nasal olfactory epithelium (OE) detect volatile odorants and some pheromones, while those in an accessory olfactory organ, called the vomeronasal organ (VNO), are thought to be specialized to detect pheromones. The VNO is a tubular structure, at the base of the nasal septum, which is connected to the nasal cavity by a small duct. Signals from the OE are relayed through the olfactory bulb (OB) to the olfactory cortex, and then to multiple brain regions, including those involved in conscious perception. In contrast, signals from the VNO are conveyed through the accessory olfactory bulb (AOB) to the amygdala and hypothalamus, areas associated with the endocrine and behavioral responses induced by pheromones.
Volatile odorants are detected in the OE by as many as 1000 different types of odorant receptors (ORs), which are differentially expressed by olfactory sensory neurons (Buck and Axel, Cell, 1991, 65:175-187; Levy, N.S., et al., J. Steroid Biochem. Mol. Biol., 1991, 39:633-637; Nef, P., et al., Proc. Natl. Acad. Sci., 1992, 89:8948-8952; Strotman, J., et al., Neuroreport, 1992, 3:1053-1056; Ngai, J., et al., Cell, 1993, 72:667-680; Ressler, K.J., et al., Cell, 1993, 73:597-609; Vassar, R., et al., Cell, 1993, 74:309-318). The ORs are thought to couple to the G protein α subunit, Gαolf, thereby initiating a cascade of transduction events which culminate in the generation of action potentials in the sensory axons (reviewed in Firestein, S., Curr. Opin. in Neurobiology, 1992, 2:444-448; Reed, R., Neuron, 1992, 8:205-209; Ronnett, G., et al., Trends Neurosci., 1992, 15:508-513). Current evidence suggests that each OR may recognize a particular molecular feature that can be shared by many odorants (Ressler, K., et al., Cell, 1994, 79:1245-1255; Vassar, R., et al., Cell, 1994, 79:981-991; Axel, R., Sci. Am., 1995, 273:154-159; Buck, L., Annu. Rev. Neurosci., 1996, 19:517-544). This is consistent with a combinatorial coding model in which the identities of different odorants are encoded by different combinations of receptors, but each receptor serves as one component of the codes for many odorants. By contrast, very little is known about how pheromones are detected or encoded in the VNO. Although VNO neurons (VNs) resemble olfactory sensory neurons in the nose, only a rare VN expresses an OR gene. VNs also lack a number of other olfactory sensory transduction molecules, including the G protein α subunit, Gαolf (Reed, R., Neuron, 1992, 8:205-209), which is highly expressed in olfactory neurons (Dulac and Axel, Cell, 1995, 83:195-206; Berghard, A., et al., Proc. Natl. Acad. Sci. USA, 1996, 93:2365-2369; Wu, Y., et al., Biochem. Biophys. Res. Com., 1996, 220:900-904). Instead, VNs express high levels of two other G protein α subunits, Gαo and Gαi2 (Dulac and Axel, Cell, 1995, 83:195-206; Halpern, M., Brain Res., 1995, 677:157-161; Berghard, A., et al., Proc. Natl. Acad. Sci. USA, 1996, 93:2365-2369). Gαo and Gαi2 are expressed in spatially segregated subsets of VNs that form longitudinal zones in the VNO neuroepithelium. Interestingly, Dulac and Axel have identified a family of 100 candidate pheromone receptors ("VNRs") which appear to be expressed exclusively in the Gαi2 subset (Dulac and Axel, Cell, 1995, 83:195-206).
This invention differs from the state of the art in providing a novel family of mammalian pheromone receptors. Accordingly, the objects of the invention relate to providing compositions containing these novel receptors and their binding partners and methods for using such compositions to modulate pheromone receptor activity.
The invention involves the discovery of a multigene family of mammalian pheromone receptors. In particular, the invention involves the cDNA cloning of multiple pheromone receptors from a murine VNO cDNA library and from a rat VNO cDNA library. Partial sequences of human homologs of these pheromone receptors also are provided.
In general, the invention provides isolated nucleic acid molecules encoding the novel pheromone receptors, unique fragments of the isolated nucleic acid molecules, expression vectors containing the foregoing, and host cells transfected with the foregoing. The invention also provides isolated pheromone receptor polypeptides and agents which bind such polypeptides, including antibodies. The foregoing can be used in the diagnosis or treatment of conditions, including the control of fertility, that are characterized by the expression of a pheromone receptor polypeptide. Methods for identifying pharmacological agents useful in the diagnosis or treatment of such conditions and methods for identifying additional members of this multigene family also are provided.
Applicants have discovered that the pheromone receptors disclosed herein are expressed in the vomeronasal organ (VNO), particularly in Gαo protein-expressing neurons. This is in contrast to the prior art VNO pheromone receptors, which are expressed in neurons which express different G-coupled proteins (Gαi2-expressing neurons). Thus, the novel pheromone receptors disclosed herein are distinct from, and expressly exclude, the prior art VNO pheromone receptors, which differ in primary structure as well as in cell localization. Although Applicants do not intend the invention to be limited to a particular theory or mechanism, the amino acid sequence homology and structural organization of the pheromone receptor polypeptides to other well-known G-protein coupled receptors suggest that the pheromone receptors disclosed herein also are G-protein coupled. Thus, it is anticipated that the binding to the pheromone receptor of its cognate ligand (pheromone) will be accompanied by G-protein signal transduction, an event which can be measured using conventional screening assays, such as assays that measure changes in the intracellular concentrations of calcium and/or cyclic nucleotides (see, e.g., PCT publication no. WO 94/18959, entitled "Calcium Receptor-Active Molecules", inventors E. Nemeth et al.).
According to one aspect of the invention, a family of pheromone receptor polypeptides is provided. Each polypeptide of the family shares amino acid sequence homology and structural organization with a pheromone receptor polypeptide selected from the group consisting of SEQ ID NO. 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 32, 34, 36, 38, 40, 42, 44, 46, 48, 50 and 52. Each polypeptide member of the receptor family contains, from amino terminus to carboxyl terminus, the following domains: (a) an amino-terminal extracellular domain containing from 30 to 600 amino acids; (b) a transmembrane region comprising: (i) seven non-contiguous transmembrane domains designated TM1, TM2, TM3, TM4, TM5, TM6 and TM7, (ii) three non-contiguous extracellular domains designated EC2, EC3 and EC4, and (iii) three non-contiguous intracellular domains designated IC1, IC2, and IC3, wherein the transmembrane domains, the extracellular domains and the intracellular domains are attached to one another from amino terminus to carboxyl terminus in the order TM1-IC1-TM2-EC2-TM3-IC2-TM4-EC3-TM5-IC3-TM6-EC4-TM7, and wherein the transmembrane region has at least about 35% homology and a length approximately equal to a transmembrane region of a polypeptide selected from the group consisting of SEQ ID NO. 2, 4, 6, 8, 10, 12, 14, 34, 36, 38, 40, 42, 44, 46, 48, and 50; and (c) a carboxyl-terminal intracellular domain containing from 5 to 200 amino acids.
Each polypeptide member of the family is expressed in a Gαo protein-expressing vomeronasal organ neuron or is expressed in another olfactory organ neuron in an animal which does not possess a vomeronasal organ. One skilled in the art can readily identify olfactory organs in animals which do not possess a vomeronasal organ.
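The domain organization recited above can be written down as a simple consistency check. The following is a minimal sketch only: the labels "NTD" and "CTD", the function name, and the example lengths are hypothetical, and the order used assumes the full alternating arrangement implied by seven transmembrane, three extracellular and three intracellular domains.

```python
# Hypothetical labels: "NTD" = amino-terminal extracellular domain,
# "CTD" = carboxyl-terminal intracellular domain. Order assumed from the text.
EXPECTED_ORDER = [
    "NTD", "TM1", "IC1", "TM2", "EC2", "TM3", "IC2", "TM4",
    "EC3", "TM5", "IC3", "TM6", "EC4", "TM7", "CTD",
]


def check_topology(domains):
    """Return a list of problems for (name, length) pairs given N- to C-terminus."""
    problems = []
    names = [name for name, _ in domains]
    if names != EXPECTED_ORDER:
        problems.append("domain order differs from the expected order")
    lengths = dict(domains)
    ntd = lengths.get("NTD")
    if ntd is None or not 30 <= ntd <= 600:
        problems.append("NTD length outside 30-600 residues")
    ctd = lengths.get("CTD")
    if ctd is None or not 5 <= ctd <= 200:
        problems.append("CTD length outside 5-200 residues")
    return problems


# Illustrative (made-up) domain lengths, not taken from any SEQ ID NO.
example = [(name, 22) for name in EXPECTED_ORDER]
example[0] = ("NTD", 550)
example[-1] = ("CTD", 20)
print(check_topology(example))  # prints []
```

Such a check says nothing about homology or expression pattern; it only captures the ordering and the two length ranges stated for the terminal domains.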
In general, the amino-terminal extracellular domains (NTDs) of the receptor family members share sequence homology to a pheromone receptor polypeptide selected from the group consisting of SEQ ID NO. 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 32, 34, 36, 38, 40, 42, 44, 46, 48, and 50 to a lesser extent than that observed for the transmembrane region. The length of the extracellular domain can vary among members of the family. Accordingly, certain embodiments of the invention have extracellular domains that contain at least 50, 100, 200, 300, 400 or 500 amino acids. Preferably, the transmembrane region has greater than 40% homology with the corresponding region of a pheromone receptor polypeptide selected from the group consisting of SEQ ID NO. 2, 4, 6, 8, 10, 12, 14, 34, 36, 38, 40, 42, 44, 46, 48, and 50, and more preferably, has even greater sequence homology (e.g., more than 50%, 60%, 70%, 80% or 90% homology). The length of the carboxyl-terminal intracellular domain can vary among members of the family. Accordingly, certain embodiments of the invention have carboxyl-terminal intracellular domains that contain between 5 and 50 amino acids. More preferably, carboxyl-terminal intracellular domains contain between 15 and 25 amino acids.
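As an illustration of how a percentage threshold of this kind might be evaluated, the sketch below computes percent identity over two sequences that have already been aligned. This rests on assumptions: the text does not specify how homology is calculated, identity over aligned positions is only one common proxy, and the function name and the short sequences are hypothetical.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent of aligned columns with identical residues.

    Columns where both sequences have a gap are ignored; a column with a gap
    in only one sequence counts as a mismatch.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == gap and b == gap:
            continue
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        return 0.0
    return 100.0 * matches / compared


# Made-up aligned fragments, not taken from any SEQ ID NO.
print(percent_identity("LSFI-VL", "LSFIAVM"))
```

A value from such a function could then be compared against the 35%, 40% or higher thresholds; the alignment itself would come from a separate alignment tool.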
According to another aspect of the invention, a method for identifying a nucleic acid encoding a pheromone receptor is provided. The method involves contacting a mixture of nucleic acid molecules (genomic library, cDNA library, genomic DNA, RNA, etc.) with at least one nucleic acid probe of a nucleic acid selected from the group consisting of: (a) a nucleic acid molecule selected from the group consisting of SEQ ID NO. 1, 3, 5, 7, 9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29, 31, 33, 35, 37, 39, 41, 43, 45, 47, 49, 51, 53, 54, and 55 that encodes a pheromone receptor polypeptide; (b) a unique fragment of (a); (c) a human homolog of (a) or (b); and (d) a set of degenerate primers of any of (a), (b) or (c); and identifying the sequences within the mixture that hybridize to the probe. Selected fragments of human homologs of a pheromone receptor are selected from the group consisting of SEQ ID NO. 51, 53, 54 and 55. In certain embodiments, the nucleic acid probe further includes a detectable label to facilitate identification of the sequence in the library which hybridizes to the probe. In certain embodiments, the probe is represented by a pair of degenerate polymerase chain reaction ("PCR") primers that amplify a unique fragment of a nucleic acid molecule selected from the group consisting of SEQ ID NO. 1, 3, 5, 7, 9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29, 31, 33, 35, 37, 39, 41, 43, 45, 47, 49, 51, 53, 54, and 55. The meaning of "unique fragment" in reference to a nucleic acid is provided below. By "degenerate PCR primers that amplify a unique fragment" is meant degenerate primers which result in the amplification of a unique fragment following a polymerase chain reaction.
According to this embodiment, the method for identifying a nucleic acid encoding a pheromone receptor polypeptide further involves subjecting a mixture of nucleic acids and the degenerate PCR primers to amplification conditions prior to identifying the sequences of the mixture that hybridize to the probe and that form part of the amplification reaction products. In some embodiments the pair of degenerate polymerase chain reaction primers is selected from a conserved sequence motif of a pheromone receptor polypeptide. A "conserved sequence motif" can be determined using the side-by-side comparison of the amino acid sequences of the different pheromone receptor polypeptides of the invention. Exemplary conserved sequence motifs include regions selected from the group consisting of amino acids 191-397, amino acids 565-825, amino acids 637-825, amino acids 637-804, and amino acids 619-784 of the polypeptide of, for example, SEQ ID NO. 2 (VR1). In preferred embodiments, the pair of degenerate polymerase chain reaction primers is selected from the group consisting of SEQ ID NOs. 60 and 61, SEQ ID NOs. 62 and 63, SEQ ID NOs. 64 and 63, SEQ ID NOs. 64 and 65, and SEQ ID NOs. 66 and 67.
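To make the notion of a degenerate primer concrete, the following sketch expands a primer written with standard IUPAC nucleotide ambiguity codes into the set of specific sequences it represents. The function names and the short example primers are hypothetical and are not the primers of SEQ ID NOs. 60-67.

```python
from itertools import product

# Standard IUPAC nucleotide ambiguity codes (DNA alphabet).
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def expand_degenerate(primer):
    """List every specific sequence encoded by a degenerate primer."""
    choices = (IUPAC[base] for base in primer.upper())
    return ["".join(combo) for combo in product(*choices)]


def degeneracy(primer):
    """Number of specific sequences in the degenerate primer pool."""
    count = 1
    for base in primer.upper():
        count *= len(IUPAC[base])
    return count


print(expand_degenerate("AYG"))  # prints ['ACG', 'ATG']
print(degeneracy("NNR"))         # prints 32
```

In practice the degeneracy grows multiplicatively with each ambiguous position, which is one reason primers are usually designed against the most conserved stretches of a motif.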
According to yet another aspect of the invention, an isolated nucleic acid molecule is provided. The isolated nucleic acid molecule hybridizes under high or low stringency conditions to a molecule consisting of a nucleic acid sequence selected from the group consisting of SEQ ID NO. 1, 3, 5, 7, 9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29, 31, 33, 35, 37, 39, 41, 43, 45, 47, 49, 51, 53, 54, and 55. The invention further embraces nucleic acid molecules that differ from the foregoing isolated nucleic acid molecules in codon sequence due to the degeneracy of the genetic code. The invention also embraces complements of the foregoing nucleic acids.
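The two ideas in the preceding paragraph, codon degeneracy and complementary strands, can be illustrated with a short sketch. It uses the standard genetic code; the function names and the six-nucleotide example sequences are hypothetical and unrelated to the disclosed sequences.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)
}


def translate(dna):
    """Translate a DNA coding sequence, ignoring any trailing partial codon."""
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


def reverse_complement(dna):
    """Return the complementary strand read 5' to 3'."""
    return dna.translate(str.maketrans("ACGT", "TGCA"))[::-1]


# Two different codon sequences that encode the same dipeptide (Leu-Ser).
print(translate("CTGAGC"), translate("TTATCA"))  # prints LS LS
print(reverse_complement("ATGC"))                # prints GCAT
```

The first call shows how nucleic acids can differ in codon sequence yet encode the same polypeptide; the second shows the complement of a given strand.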
The pheromone receptors of the invention are expressed in the vomeronasal organ or, in an animal which lacks such an organ, are expressed in another olfactory organ. More particularly, the receptors of the invention are expressed in a Gαo protein-expressing vomeronasal organ neuron. Although not intending to be bound to a particular mechanism, it is believed that the receptors of the invention are G-protein coupled receptors. This is supported by Applicants' discovery that the receptors of the invention are expressed in Gαo protein-expressing vomeronasal organ neurons.
The pheromone receptors of the invention bind to ligands (pheromones) which induce certain changes in receptor conformation. Methods for identifying ligands which bind to the pheromone receptors of the invention are provided below, e.g., by forming an affinity matrix containing immobilized receptor and using the matrix to isolate a cognate ligand from a complex mixture. The particular ligand bound by a particular receptor is dictated by the primary and secondary structure of the receptor. In certain embodiments, the immobilized pheromone receptor polypeptide is a pheromone receptor polypeptide selected from the group consisting of SEQ ID NO. 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 32, 34, 36, 38, 40, 42, 44, 46, 48, 50 and 52.
According to another aspect of the invention, an isolated nucleic acid molecule that is a unique fragment of any of the foregoing isolated nucleic acid molecules is provided. In general, the isolated nucleic acid molecule consists of a unique fragment between 12 and 4000 nucleotides in length, and complements thereof, of any cDNA (SEQ ID NOs. 1, 3, 5, 7, 9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29, 31, 33, 35, 37, 39, 41, 43, 45, 47, 49, 51, 53, 54, and 55) encoding a pheromone receptor polypeptide selected from the group consisting of SEQ ID NO.
2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 32, 34, 36, 38, 40, 42, 44, 46, 48, 50 and 52.
Depending upon its intended use (e.g., probe, primer), the unique fragment can be between 12 and 2000, 1000, 500, 250, 100, 50 or 25 nucleotides in length. Preferably, the isolated nucleic acid molecule consists of between 12 and 35 contiguous nucleotides of the foregoing cDNAs encoding the pheromone receptor polypeptides, or complements of such nucleic acid molecules.
More preferably, the unique fragment is at least 14, 15, 16, 17, 18, 20 or 22 contiguous nucleotides of the nucleic acid sequence of the foregoing cDNAs encoding the pheromone receptor polypeptides, or complements thereof. Particularly preferred isolated nucleic acid molecules are isolated fragments of the foregoing cDNAs which encode one or more of the following pheromone receptor polypeptide domains, alone or in combination (e.g., as fusion proteins): an amino-terminal extracellular domain, a transmembrane region, and a carboxy-terminal intracellular domain. In certain embodiments, the unique fragments are a pheromone receptor extracellular domain or a pheromone receptor intracellular domain coupled to at least one (e.g., 1, 2, 3, 4, 5, 6, or 7) transmembrane domain.
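As a rough illustration of how fragments of a given length might be screened for uniqueness, the sketch below lists every stretch of contiguous nucleotides of a chosen length that occurs in a target sequence but in none of a set of comparison sequences. It is a simplification made for this example: the sequences are hypothetical, only the strand as written is compared (complements are not checked), and the function name is not part of the disclosure.

```python
def unique_fragments(target, background, length):
    """List fragments of `length` contiguous nucleotides that occur in
    `target` but in none of the `background` sequences."""
    seen = set()
    result = []
    for start in range(len(target) - length + 1):
        fragment = target[start:start + length]
        if fragment in seen:
            continue  # report each distinct fragment once
        seen.add(fragment)
        if not any(fragment in other for other in background):
            result.append(fragment)
    return result


# Hypothetical 18-nucleotide sequences sharing their first 12 nucleotides.
target = "ATGGCTAGCTTGACCGTA"
background = ["ATGGCTAGCTTGTTTAAA"]

# 12 nucleotides is the lower bound of the fragment lengths recited above.
fragments = unique_fragments(target, background, length=12)
print(len(fragments), fragments[0])
```

In this example the shared leading 12-mer is rejected, while every 12-mer that extends into the divergent region is retained as a candidate probe or primer sequence.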
According to yet another aspect of the invention, an isolated nucleic acid molecule comprising a molecule having a sequence selected from the group consisting of SEQ ID NO. 51, 53, 54, 55, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, and 92, that encodes a pheromone receptor polypeptide is provided. This aspect of the invention further embraces nucleic acid molecules that differ from these nucleic acid molecules in codon sequence due to the degeneracy of the genetic code and diversity among pheromone receptors, and complements of the foregoing.
According to still other aspects of the invention, an expression vector comprising any of the foregoing isolated nucleic acid molecules operably linked to a promoter and host cells transformed or transfected with the same also are provided.
According to another aspect of the invention, an isolated polypeptide encoded by any of the above-described isolated nucleic acid molecules is provided. Preferably, the isolated polypeptide is a pheromone receptor polypeptide that has a pheromone receptor activity or an antigenic fragment thereof. As used herein, a pheromone receptor activity refers to the ability of the pheromone receptor to selectively bind to its cognate ligand (pheromone) and, optionally, upon binding, to induce signal transduction in a cell that expresses the pheromone receptor. In preferred embodiments, the isolated polypeptide comprises a pheromone receptor polypeptide having a sequence selected from the group consisting of SEQ ID NO. 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 32, 34, 36, 38, 40, 42, 44, 46, 48, 50 and 52.
According to yet other embodiments, the isolated polypeptide comprises a polypeptide encoded by a nucleic acid which hybridizes under high or low stringency conditions to the extracellular domain, transmembrane region and/or intracellular domain of a cDNA sequence selected from the group consisting of SEQ ID NO. 1, 3, 5, 7, 9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29, 31, 33, 35, 37, 39, 41, 43, 45, 47, 49, 51, 53, 54, and 55 that encodes a pheromone receptor polypeptide or fragment thereof. Thus, the invention embraces portions of a pheromone receptor polypeptide that may include, for example, an amino-terminal extracellular domain or a carboxy-terminal intracellular domain coupled to 1, 2, 3, 4, 5, 6, or 7 transmembrane domains.
Preferably, such polypeptides or fragments thereof are unique fragments and can function as, for example, antigens for making antibodies specific for pheromone receptor family members.
Accordingly, the polypeptides of the invention can be used to isolate additional members of the pheromone receptor family or, alternatively, can be used to induce in vivo an immune response to a pheromone receptor, i.e., can be incorporated into a vaccine preparation.
Such vaccine compositions are useful for controlling fertility or behavior in an animal by administering to the animal an effective amount of the vaccine to elicit an immune response to the pheromone receptor. Thus, the invention embraces fragments or variants of the foregoing pheromone receptors which exhibit certain detectable activities, e.g., a ligand binding activity, an antigenicity activity. In certain embodiments, the isolated polypeptide is encoded by a cDNA
selected from the group consisting of SEQ ID NO. 51, 53, 54, 55, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, and 92, that encodes a pheromone receptor polypeptide or one or more of its domains.
According to another aspect of the invention, there are provided isolated binding polypeptides which selectively bind a unique amino acid sequence of a pheromone receptor polypeptide or fragment thereof. The isolated binding polypeptide in certain embodiments binds to a polypeptide comprising the extracellular domain and/or 1, 2, 3, 4, 5, 6, or 7 transmembrane domains of a pheromone receptor polypeptide selected from the group consisting of SEQ ID
NO. 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 32, 34, 36, 38, 40, 42, 44, 46, 48, 50 and 52.
The isolated polypeptide preferably binds to a polypeptide consisting of the amino-terminal extracellular domain and/or one or more portions of the transmembrane region of a pheromone receptor polypeptide sequence selected from the group consisting of SEQ ID NO.
2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 32, 34, 36, 38, 40, 42, 44, 46, 48, 50 and 52.
In preferred embodiments, isolated binding polypeptides include antibodies and fragments of antibodies (e.g., Fab, F(ab)2, Fd and antibody fragments which include a CDR3 region which binds selectively to the unique sequences of the polypeptides of the invention). In the preferred embodiments, the isolated binding peptides do not bind to pheromone receptors that are expressed in vomeronasal organ neurons other than Gαo protein-expressing neurons.
The invention provides in yet other aspects, isolated nucleic acids or polypeptides of the invention that are: (a) immobilized to an insoluble support (an affinity matrix containing immobilized pheromone receptor polypeptide or a unique fragment thereof); (b) associated with, covalently coupled to, or encapsulated in a drug delivery device (e.g., a microsphere) to effect controlled release of the isolated nucleic acid or polypeptide in vivo or in vitro; (c) covalently coupled to another isolated nucleic acid or protein to form a chimeric molecule; and/or (d) labeled with a detectable agent (e.g., a radiolabel, a fluorescent label).
Thus, the invention provides chimeric molecules containing at least one first structural domain of one pheromone receptor polypeptide (e.g., an extracellular domain) coupled to a second structural domain (e.g., a transmembrane domain, such as TM1, TM2, etc.) of a different pheromone receptor polypeptide. The invention also provides a method for isolating a pheromone receptor by (1) contacting a composition containing a putative pheromone receptor of the above-described family with an affinity matrix containing immobilized binding polypeptide under conditions to permit the pheromone receptor to selectively bind to the immobilized binding polypeptide, and (2) isolating the polypeptides that bind to the affinity matrix.
According to still another aspect of the invention, pharmaceutical compositions containing any of the foregoing compounds of the invention in a pharmaceutically acceptable carrier and methods of producing same by placing the compositions in the carrier also are provided.
According to still another aspect of the invention, methods for modulating a pheromone receptor activity (e.g., a ligand binding activity, a signal transduction activity) in a cell (vertebrate or invertebrate) are provided. The cell can be located in vivo or in vitro and the methods can be used to down regulate (inhibit) or up regulate (stimulate) the pheromone receptor activity. For example, to inhibit a ligand binding activity, the cell is contacted with an inhibitor that can be an isolated binding polypeptide that binds to an extracellular portion of the receptor and, thereby, inhibits receptor binding to its cognate ligand. Such binding also can induce conformational changes in the receptor that alter the signal transduction activity of the receptor.
The inhibitor can be an isolated antibody (or functional equivalent thereof) which binds to an epitope located on an extracellular portion (such as EC2, EC3, EC4) of the pheromone receptor polypeptide, e.g., an amino-terminal extracellular domain or an "extracellular transmembrane region domain", i.e., an extracellular portion of the transmembrane region located between one or more transmembrane domains. Alternatively, the inhibitor can be an agent (e.g., an isolated competitive binding polypeptide) that inhibits receptor-ligand binding.
For example, the inhibitor can be an isolated fragment of a pheromone receptor (preferably, a soluble fragment), which fragment contains a ligand (pheromone) binding site. Other inhibitors can be identified in screening assays which test the ability of a putative inhibitor to inhibit pheromone receptor-mediated signal transduction or which test the ability of the putative inhibitor to inhibit binding of a pheromone receptor to its known cognate ligand. Similarly, such screening assays can be used to identify molecules which stimulate pheromone receptor-mediated signal transduction.
Exemplary molecules which stimulate transduction include the naturally-occurring ligands (e.g., isolated from a biological source (e.g., urine, vaginal fluid)), as well as synthetic ligands obtained from a non-biological source (e.g., a combinatorial library).
According to still another aspect of the invention, methods for inhibiting the binding of a pheromone having a binding domain to a pheromone receptor polypeptide having a ligand binding site that selectively binds to the binding domain are provided. The method involves contacting (in vivo or in vitro) the pheromone receptor polypeptide with an agent which binds to the ligand binding site under conditions to permit binding of the agent to the receptor. For example, the agent can be an isolated binding polypeptide that binds to the ligand binding site of the pheromone receptor. Thus, the agent can be an isolated antibody (or functionally equivalent fragment thereof) which selectively binds to the ligand binding site of the receptor.
Alternatively, the agent can be a pheromone receptor antagonist, e.g., a molecule that mimics the structure of the naturally-occurring ligand but that does not mimic the function (stimulating the receptor) of the naturally-occurring ligand. Agents which inhibit ligand binding can be identified in screening assays which test the ability of a putative binding inhibitor to inhibit binding of a pheromone receptor to its cognate ligand (e.g., pheromone). Such molecules can be isolated from a biological source or from a non-biological source.
According to another aspect of the invention, methods for modulating pheromone receptor-mediated signal transduction in a subject are provided. The methods involve administering to a subject in need of such treatment an agent that selectively binds to any of the above-described isolated nucleic acid molecules which encode a pheromone receptor or unique fragment thereof, or an expression product thereof, in an amount effective to modulate (down regulate or up regulate) pheromone receptor-mediated signal transduction in the subject.
Exemplary agents include antisense nucleic acid molecules and binding polypeptides.
Thus, according to yet another aspect of the invention, methods are provided for identifying lead compounds for a pharmacological agent useful in the diagnosis or treatment of a condition associated with pheromone receptor signal transduction activity or otherwise generally associated with binding of the receptor to its cognate ligand.
Preferably, cells expressing intact pheromone receptor polypeptides or portions thereof are used in the screening assays for identifying lead compounds which modulate pheromone receptor-mediated ligand binding or signal transduction activity. Cells expressing these polypeptides, isolated pheromone receptor polypeptides and fragments of these polypeptides which contain the ligand binding site can be used in the screening assays for identifying lead compounds which modulate binding of the receptor to a known ligand.
The screening methods involve forming a mixture of a pheromone receptor polypeptide (as noted above) or fragment thereof containing a ligand binding site; a molecule which is known to (1) interact with the foregoing receptor to effect pheromone receptor-mediated signal transduction or (2) bind to the ligand binding site of the receptor; and a candidate pharmacological agent. The mixture is incubated under conditions which, in the absence of the candidate pharmacological agent, permit a first amount of pheromone receptor-ligand binding or receptor-mediated signal transduction by the known ligand. A test amount of the selective binding of the ligand by receptor or of the specific activation of signal transduction is determined. Detection of an increase in the foregoing activities in the presence of the candidate pharmacological agent indicates that the candidate pharmacological agent is a lead compound for a pharmacological agent which increases specific activation of pheromone receptor-mediated signal transduction or selective binding of the ligand by the ligand binding site of the receptor.
Detection of a decrease in the foregoing activities in the presence of the candidate pharmacological agent indicates that the candidate pharmacological agent is a lead compound for a pharmacological agent which decreases specific activation of pheromone receptor-mediated signal transduction or selective binding of the ligand by the ligand binding site of the receptor.
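The comparison underlying the screening method can be summarized in a short sketch. The Python below is illustrative only: it assumes that binding or signal transduction has been reduced to a single number measured without the candidate agent (the first amount) and with it (the test amount). The function name, the arbitrary units and the optional tolerance for measurement noise are assumptions of this sketch and are not specified in the description.

```python
def classify_candidate(first_amount, test_amount, tolerance=0.0):
    """Classify a candidate agent by comparing the test amount (measured in
    its presence) with the first amount (measured in its absence)."""
    if test_amount > first_amount + tolerance:
        return "lead compound: increases activity"
    if test_amount < first_amount - tolerance:
        return "lead compound: decreases activity"
    return "no detectable change"


# Hypothetical readings in arbitrary units.
print(classify_candidate(first_amount=100.0, test_amount=140.0, tolerance=5.0))
print(classify_candidate(first_amount=100.0, test_amount=62.0, tolerance=5.0))
print(classify_candidate(first_amount=100.0, test_amount=103.0, tolerance=5.0))
```

A tolerance is included because a practical assay would need some threshold below which a difference is not treated as detected; its value would depend on the assay used.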
Pheromone receptor polypeptides that are useful in the screening assays, preferably, are those selected from the group consisting of SEQ ID NO. 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 32, 34, 36, 38, 40, 42, 44, 46, 48, 50 and 52. Extracellular domains or portions thereof and portions of the transmembrane region, alone or coupled to one another, of these pheromone receptor polypeptides (indicated in the Examples) can be tested for their ability to inhibit receptor-ligand binding.
These and other objects of the invention will be described in further detail in connection with the detailed description of the invention.
All patents, patent publications, references and other information identified in this document are incorporated in their entirety herein by reference.
Brief Description of the Drawings
Figure 1 depicts a comparison of the deduced protein sequences encoded by VR cDNA clones.
Figure 2 is a schematic comparison of ORs, VNRs, and VRs.
Figure 3 depicts a comparison of the deduced protein sequences encoded by the Go-VN cDNA clones.
Brief Description of the Sequences
SEQ ID NO. 1 is the nucleotide sequence of the mouse pheromone receptor VR1 cDNA (GenBank Accession No. AF011411).
SEQ ID NO. 2 is the predicted amino acid sequence of the polypeptide encoded by the mouse pheromone receptor VR1 cDNA (GenBank Accession No. AF011411).
SEQ ID NO. 3 is the nucleotide sequence of the mouse pheromone receptor VR2 cDNA (GenBank Accession No. AF011412).
SEQ ID NO. 4 is the predicted amino acid sequence of the polypeptide encoded by the mouse pheromone receptor VR2 cDNA (GenBank Accession No. AF011412).
SEQ ID NO. 5 is the nucleotide sequence of the mouse pheromone receptor VR3 cDNA (GenBank Accession No. AF011413).
SEQ ID NO. 6 is the predicted amino acid sequence of the polypeptide encoded by the mouse pheromone receptor VR3 cDNA (GenBank Accession No. AF011413).
SEQ ID NO. 7 is the nucleotide sequence of the mouse pheromone receptor VR4 cDNA (GenBank Accession No. AF011414).
SEQ ID NO. 8 is the predicted amino acid sequence of the polypeptide encoded by the mouse pheromone receptor VR4 cDNA (GenBank Accession No. AF011414).
SEQ ID NO. 9 is the nucleotide sequence of the mouse pheromone receptor VR5 cDNA (GenBank Accession No. AF011415).
SEQ ID NO. 10 is the predicted amino acid sequence of the polypeptide encoded by the mouse pheromone receptor VR5 cDNA (GenBank Accession No. AF011415).
SEQ ID NO. 11 is the nucleotide sequence of the mouse pheromone receptor VR6 cDNA (GenBank Accession No. AF011416).
SEQ ID NO. 12 is the predicted amino acid sequence of the polypeptide encoded by the mouse pheromone receptor VR6 cDNA (GenBank Accession No. AF011416).
SEQ ID NO. 13 is the nucleotide sequence of the mouse pheromone receptor VR7 cDNA (GenBank Accession No. AF011417).
SEQ ID NO. 14 is the predicted amino acid sequence of the polypeptide encoded by the mouse pheromone receptor VR7 cDNA (GenBank Accession No. AF011417).
SEQ ID NO. 15 is the nucleotide sequence of the mouse pheromone receptor VR8 cDNA (GenBank Accession No. AF011418).
SEQ ID NO. 16 is the predicted amino acid sequence of the polypeptide encoded by the mouse pheromone receptor VR8 cDNA (GenBank Accession No. AF011418).
SEQ ID NO. 17 is the nucleotide sequence of the mouse pheromone receptor VR9 cDNA (GenBank Accession No. AF011419).
SEQ ID NO. 18 is the predicted amino acid sequence of the polypeptide encoded by the mouse pheromone receptor VR9 cDNA (GenBank Accession No. AF011419).
SEQ ID NO. 19 is the nucleotide sequence of the mouse pheromone receptor VR10 cDNA (GenBank Accession No. AF011420).
SEQ ID NO. 20 is the predicted amino acid sequence of the polypeptide encoded by the mouse pheromone receptor VR10 cDNA (GenBank Accession No. AF011420).
SEQ ID NO. 21 is the nucleotide sequence of the mouse pheromone receptor VR11 cDNA (GenBank Accession No. AF011421).
SEQ ID NO. 22 is the predicted amino acid sequence of the polypeptide encoded by the mouse pheromone receptor VR11 cDNA (GenBank Accession No. AF011421).
SEQ ID NO. 23 is the nucleotide sequence of the mouse pheromone receptor VR12 cDNA (GenBank Accession No. AF011422).
SEQ ID NO. 24 is the predicted amino acid sequence of the polypeptide encoded by the mouse pheromone receptor VR12 cDNA (GenBank Accession No. AF011422).
SEQ ID NO. 25 is the nucleotide sequence of the mouse pheromone receptor VR13 cDNA (GenBank Accession No. AF011423).
SEQ ID NO. 26 is the predicted amino acid sequence of the polypeptide encoded by the mouse pheromone receptor VR13 cDNA (GenBank Accession No. AF011423).
SEQ ID NO. 27 is the nucleotide sequence of the mouse pheromone receptor VR14 cDNA (GenBank Accession No. AF011424).
SEQ ID NO. 28 is the predicted amino acid sequence of the polypeptide encoded by the mouse pheromone receptor VR14 cDNA (GenBank Accession No. AF011424).
SEQ ID NO. 29 is the nucleotide sequence of the mouse pheromone receptor VR15 cDNA (GenBank Accession No. AF011425).
SEQ ID NO. 30 is the predicted amino acid sequence of the polypeptide encoded by the mouse pheromone receptor VR15 cDNA (GenBank Accession No. AF011425).
SEQ ID NO. 31 is the nucleotide sequence of the mouse pheromone receptor VR16 cDNA (GenBank Accession No. AF011426).
SEQ ID NO. 32 is the predicted amino acid sequence of the polypeptide encoded by the mouse pheromone receptor VR16 cDNA (GenBank Accession No. AF011426).
SEQ ID NO. 33 is the nucleotide sequence of the rat pheromone receptor Go-VN1 cDNA (GenBank Accession No. AF016178).
SEQ ID NO. 34 is the predicted amino acid sequence of the polypeptide encoded by the rat pheromone receptor Go-VN1 cDNA (GenBank Accession No. AF016178).
SEQ ID NO. 35 is the nucleotide sequence of the rat pheromone receptor Go-VN2 cDNA (GenBank Accession No. AF016179).
SEQ ID NO. 36 is the predicted amino acid sequence of the polypeptide encoded by the rat pheromone receptor Go-VN2 cDNA (GenBank Accession No. AF016179).
SEQ ID NO. 37 is the nucleotide sequence of the rat pheromone receptor Go-VN3 cDNA (GenBank Accession No. AF016180).
SEQ ID NO. 38 is the predicted amino acid sequence of the polypeptide encoded by the rat pheromone receptor Go-VN3 cDNA (GenBank Accession No. AF016180).
SEQ ID NO. 39 is the nucleotide sequence of the rat pheromone receptor Go-VN4 cDNA (GenBank Accession No. AF016181).
SEQ ID NO. 40 is the predicted amino acid sequence of the polypeptide encoded by the rat pheromone receptor Go-VN4 cDNA (GenBank Accession No. AF016181).
SEQ ID NO. 41 is the nucleotide sequence of the rat pheromone receptor Go-VN5 cDNA (GenBank Accession No. AF016182).
SEQ ID NO. 42 is the predicted amino acid sequence of the polypeptide encoded by the rat pheromone receptor Go-VN5 cDNA (GenBank Accession No. AF016182).
SEQ ID NO. 43 is the nucleotide sequence of the rat pheromone receptor Go-VN6 cDNA (GenBank Accession No. AF016183).
SEQ ID NO. 44 is the predicted amino acid sequence of the polypeptide encoded by the rat pheromone receptor Go-VN6 cDNA (GenBank Accession No. AF016183).
SEQ ID NO. 45 is the nucleotide sequence of the rat pheromone receptor Go-VN7 cDNA (GenBank Accession No. AF016184).
SEQ ID NO. 46 is the predicted amino acid sequence of the polypeptide encoded by the rat pheromone receptor Go-VN7 cDNA (GenBank Accession No. AF016184).
SEQ ID NO. 47 is the nucleotide sequence of the rat pheromone receptor Go-VN13C cDNA (GenBank Accession No. AF016185).
SEQ ID NO. 48 is the predicted amino acid sequence of the polypeptide encoded by the rat pheromone receptor Go-VN13C cDNA (GenBank Accession No. AF016185).
SEQ ID NO. 49 is the nucleotide sequence of the rat pheromone receptor Go-VN13B cDNA (GenBank Accession No. AF016186).
SEQ ID NO. 50 is the predicted amino acid sequence of the polypeptide encoded by the rat pheromone receptor Go-VN13B cDNA (GenBank Accession No. AF016186).
SEQ ID NO. 51 is a partial nucleotide sequence of the human pheromone receptor hVR1.
SEQ ID NO. 52 is the predicted amino acid sequence of the polypeptide encoded by the partial sequence of the human pheromone receptor hVR1.
SEQ ID NO. 53 is a partial nucleotide sequence of the human pheromone receptor hVN01.
SEQ ID NO. 54 is a partial nucleotide sequence of the human pheromone receptor hVN02.
SEQ ID NO. 55 is a partial nucleotide sequence of the human pheromone receptor hVN03.
SEQ ID NO. 56 is the nucleotide sequence of primer AL1.
SEQ ID NO. 57 is the nucleotide sequence of primer AL3.
SEQ ID NO. 58 is a fifty amino acid sequence of Go-VN13B (SEQ ID NO. 50) that is absent from Go-VN13C (SEQ ID NO. 48).
SEQ ID NO. 59 is the amino acid sequence of a rat kidney extracellular calcium/polyvalent cation-sensing receptor.
SEQ ID NO. 60 is a degenerate oligonucleotide primer from a conserved VR domain.
SEQ ID NO. 61 is a degenerate oligonucleotide primer from a conserved VR domain.
SEQ ID NO. 62 is a degenerate oligonucleotide primer from a conserved VR domain.
SEQ ID NO. 63 is a degenerate oligonucleotide primer from a conserved VR domain.
SEQ ID NO. 64 is a degenerate oligonucleotide primer from a conserved VR domain.
SEQ ID NO. 65 is a degenerate oligonucleotide primer from a conserved VR domain.
SEQ ID NO. 66 is a degenerate oligonucleotide primer from a conserved VR domain.
SEQ ID NO. 67 is a degenerate oligonucleotide primer from a conserved VR domain.
SEQ ID NO. 68 is the nucleotide sequence of the coding region of the mouse pheromone receptor VR1.
SEQ ID NO. 69 is the nucleotide sequence of the coding region of the mouse pheromone receptor VR2.
SEQ ID NO. 70 is the nucleotide sequence of the coding region of the mouse pheromone receptor VR3.
SEQ ID NO. 71 is the nucleotide sequence of the coding region of the mouse pheromone receptor VR4.
SEQ ID NO. 72 is the nucleotide sequence of the coding region of the mouse pheromone receptor VR5.
SEQ ID NO. 73 is the nucleotide sequence of the coding region of the mouse pheromone receptor VR6.
SEQ ID NO. 74 is the nucleotide sequence of the coding region of the mouse pheromone receptor VR7.
SEQ ID NO. 75 is the nucleotide sequence of the coding region of the mouse pheromone receptor VR8.
SEQ ID NO. 76 is the nucleotide sequence of the coding region of the mouse pheromone receptor VR9.
SEQ ID NO. 77 is the nucleotide sequence of the coding region of the mouse pheromone receptor VR10.
SEQ ID NO. 78 is the nucleotide sequence of the coding region of the mouse pheromone receptor VR11.
SEQ ID NO. 79 is the nucleotide sequence of the coding region of the mouse pheromone receptor VR12.
SEQ ID NO. 80 is the nucleotide sequence of the coding region of the mouse pheromone receptor VR13.
SEQ ID NO. 81 is the nucleotide sequence of the coding region of the mouse pheromone receptor VR14.
SEQ ID NO. 82 is the nucleotide sequence of the coding region of the mouse pheromone receptor VR15.
SEQ ID NO. 83 is the nucleotide sequence of the coding region of the mouse pheromone receptor VR16.
SEQ ID NO. 84 is the nucleotide sequence of the coding region of the rat pheromone receptor GoVN1.
SEQ ID NO. 85 is the nucleotide sequence of the coding region of the rat pheromone receptor GoVN2.
SEQ ID NO. 86 is the nucleotide sequence of the coding region of the rat pheromone receptor GoVN3.
SEQ ID NO. 87 is the nucleotide sequence of the coding region of the rat pheromone receptor GoVN4.
SEQ ID NO. 88 is the nucleotide sequence of the coding region of the rat pheromone receptor GoVN5.
SEQ ID NO. 89 is the nucleotide sequence of the coding region of the rat pheromone receptor GoVN6.
SEQ ID NO. 90 is the nucleotide sequence of the coding region of the rat pheromone receptor GoVN7.
SEQ ID NO. 91 is the nucleotide sequence of the coding region of the rat pheromone receptor GoVN13C.
SEQ ID NO. 92 is the nucleotide sequence of the coding region of the rat pheromone receptor GoVN13B.
Detailed Description of the Invention
The present invention in one aspect involves the cloning of cDNAs encoding several members of a multigene family of pheromone receptors. Complete cDNA sequences for selected murine and rat pheromone receptors are provided. Partial sequences of the human gene also are provided. The present invention also relates to the discovery that this family of pheromone receptors is expressed in Gαo protein-expressing vomeronasal organ neurons ("Gαo+ VNO") or in another olfactory organ neuron in an animal (preferably, a mammal and more preferably, a human) which lacks a vomeronasal organ. Throughout this description, the pheromone receptors of the invention alternatively are referred to as "pheromone receptors", "Gαo+ VNO pheromone receptors" or, simply, "Gαo+ VNO receptors".
Analysis of the sequence homology between members of the receptor family by comparison to nucleic acid and protein databases established that the pheromone receptor family has several domains. These include, from amino terminus to carboxyl terminus:
(a) an amino-terminal extracellular domain containing from 30 to 600 amino acids;
(b) a transmembrane region comprising: (i) seven non-contiguous transmembrane domains designated TM1, TM2, TM3, TM4, TM5, TM6 and TM7, (ii) three non-contiguous extracellular domains designated EC2, EC3 and EC4, and (iii) three non-contiguous intracellular domains designated IC1, IC2, and IC3, wherein the transmembrane domains, the extracellular domains and the intracellular domains are attached to one another from amino terminus to carboxyl terminus in the order TM1-IC1-TM2-EC2-TM3-IC2-TM4-EC3-TM5-IC3-TM6-EC4-TM7, and wherein the transmembrane region has at least about 35% homology and a length approximately equal to a transmembrane region of a polypeptide selected from the group consisting of SEQ ID NO. 2, 4, 6, 8, 10, 12, 14, 34, 36, 38, 40, 42, 44, 46, 48, and 50; and
(c) a carboxyl-terminal intracellular domain containing from 5 to 200 amino acids.
Each polypeptide member of the family is expressed in a Gαo protein-expressing vomeronasal organ neuron or is expressed in another olfactory organ neuron in an animal which does not possess a vomeronasal organ. One skilled in the art can readily identify olfactory organs in animals which do not possess a vomeronasal organ. The homology can be calculated using various publicly available software tools developed by NCBI (Bethesda, Maryland) that can be obtained through the Internet (ftp://ncbi.nlm.nih.gov/pub/). Exemplary tools include the BLAST system.
Pairwise and ClustalW alignments (BLOSUM30 matrix setting) as well as Kyte-Doolittle hydropathic analysis can be obtained using the MacVector sequence analysis software (Oxford Molecular Group).
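For readers without access to a sequence analysis package, a Kyte-Doolittle hydropathic analysis can be approximated by a sliding-window average, as in the Python sketch below. The hydropathy values are those of the published Kyte-Doolittle scale. The window of 19 residues is a common choice for locating membrane-spanning stretches and is an assumption of this sketch rather than a setting taken from this description; the peptide is hypothetical and is not one of the disclosed sequences.

```python
# Kyte-Doolittle hydropathy index for the twenty standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(protein, window=19):
    """Return the mean hydropathy of every window of `window` residues."""
    scores = [KYTE_DOOLITTLE[residue] for residue in protein]
    return [
        sum(scores[start:start + window]) / window
        for start in range(len(scores) - window + 1)
    ]


# Hypothetical peptide: a 19-residue hydrophobic stretch flanked by charged residues.
peptide = "DDKKEE" + "LLIVVALLIVFALLIVVAL" + "KKDDEE"
profile = hydropathy_profile(peptide)
peak = max(profile)
print(profile.index(peak), round(peak, 2))
```

The profile peaks at the window that exactly covers the hydrophobic stretch; in a receptor of the family described here, seven such peaks would be expected, one for each transmembrane domain.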
The structure of the Gαo+ VNO pheromone receptors suggests that these receptors are members of the large G protein-coupled receptor (GPCR) superfamily. Like other GPCRs, the Gαo+ VNO pheromone receptors exhibit seven hydrophobic stretches ("hydrophobic domains") and are similar in structure to other types of GPCRs, the calcium sensing receptor (CSR, SEQ ID NO. 59) and the metabotropic glutamate receptors (mGluRs). The CSR and mGluRs are unusual among the GPCRs in that they have an extremely long N-terminal extracellular domain (e.g., 557-565 amino acids), a feature that is shared by the pheromone receptors of the invention. Despite this similarity, the receptors of the invention do not share substantial primary structure homology with the CSR and mGluRs. The receptors of the invention also are very different structurally from two other G-protein coupled receptors, the odorant receptors and Gαi2+ vomeronasal receptors, which share none of the characteristic sequence motifs of the receptors of the invention and, moreover, which have very small (~12-28 amino acids) N-terminal extracellular domains.
The receptors of the invention differ somewhat in amino acid sequence, with regions of relatively high sequence homology. Refer to Examples 1 and 2 for a discussion and illustration of the amino acid sequence homology for the murine and rat Gαo+ VNO receptors, respectively.
Other features of these members of the Gαo+ VNO receptor family also are discussed and illustrated in the Examples. For example, signal sequences have been identified for several of the Gαo+ VNO receptors disclosed in the Examples.
Homologs and alleles of the pheromone receptor nucleic acids of the invention can be identified by conventional techniques. Thus, an aspect of the invention is those nucleic acid sequences (SEQ ID NOs. 1, 3, 5, 7, 9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29, 31, 33, 35, 37, 39, 41, 43, 45, 47, 49, 51, 53, 54, and 55) which code for Gαo+ VNO pheromone receptors and which hybridize to a nucleic acid molecule consisting of the coding region of any one Gαo+ VNO pheromone receptor selected from the group consisting of SEQ ID NO. 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 32, 34, 36, 38, 40, 42, 44, 46, 48, 50 and 52, under high or low stringency conditions. The term "high or low stringency conditions" as used herein refers to parameters with which the art is familiar. Nucleic acid hybridization parameters may be found in references which compile such methods, e.g. Molecular Cloning: A Laboratory Manual, J. Sambrook, et al., eds., Second Edition, Cold Spring Harbor Laboratory Press, Cold Spring Harbor, New York, 1989, or Current Protocols in Molecular Biology, F.M. Ausubel, et al., eds., John Wiley & Sons, Inc., New York. More specifically, high stringency conditions, as used herein, refers, for example, to hybridization at 65°C in hybridization buffer (3.5 x SSC, 0.02% Ficoll, 0.02% polyvinyl pyrrolidone, 0.02% Bovine Serum Albumin, 2.5mM NaH2PO4 (pH7), 0.5% SDS, 2mM EDTA). SSC is 0.15M sodium chloride/0.15M sodium citrate, pH7; SDS is sodium dodecyl sulphate; and EDTA is ethylenediaminetetraacetic acid. Low stringency conditions would be the same, but with a lower temperature (e.g., 55°C). After hybridization, the membrane upon which the DNA is transferred is washed at 2 x SSC at room temperature and then at 0.2 x SSC/0.5% SDS at temperatures of up to 65°C. Additional conditions of varying stringency are provided in the Examples.
There are other conditions, reagents, and so forth which can be used, which result in a similar degree of stringency. The skilled artisan will be familiar with such conditions, and thus they are not given here. It will be understood, however, that the skilled artisan will be able to manipulate the conditions in a manner to permit the clear identification of homologs and alleles of the Gαo+ VNO pheromone receptor nucleic acids of the invention. The skilled artisan also is familiar with the methodology for screening cells and libraries for expression of such molecules which then are routinely isolated, followed by isolation of the pertinent nucleic acid molecule and sequencing.
In general, homologs and alleles typically will share at least 35% nucleotide identity and/or at least 50% amino acid identity to the cDNAs encoding a Gαo+ VNO pheromone receptor polypeptide selected from the group consisting of SEQ ID NO. 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 32, 34, 36, 38, 40, 42, 44, 46, 48, 50 and 52, in some instances will share at least 50% nucleotide identity and/or at least 65% amino acid identity and in still other instances will share at least 60% nucleotide identity and/or at least 75% amino acid identity. Watson-Crick complements of the foregoing nucleic acids also are embraced by the invention.
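The identity thresholds above can be illustrated with a short sketch. It assumes two sequences that have already been aligned (gaps written as "-") and defines percent identity as matching columns divided by compared columns; that definition, the function names and the tier labels are illustrative assumptions and are not taken from the specification. The amino acid threshold of the first tier is read here as 50%.

```python
from typing import List, Optional


def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over aligned columns (assumed definition).

    Columns in which both sequences carry a gap are skipped; a gap opposite
    a residue counts as a mismatch.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for x, y in zip(aligned_a.upper(), aligned_b.upper()):
        if x == "-" and y == "-":
            continue
        compared += 1
        if x == y:
            matches += 1
    if compared == 0:
        raise ValueError("no comparable positions")
    return 100.0 * matches / compared


# (label, minimum nucleotide identity, minimum amino acid identity),
# as read from the paragraph above; the labels are hypothetical.
TIERS = [
    ("typical", 35.0, 50.0),
    ("some instances", 50.0, 65.0),
    ("still other instances", 60.0, 75.0),
]


def identity_tiers(nt_identity: Optional[float] = None,
                   aa_identity: Optional[float] = None) -> List[str]:
    """Return the tiers met; "and/or" is taken to mean either criterion suffices."""
    met = []
    for label, nt_min, aa_min in TIERS:
        nt_ok = nt_identity is not None and nt_identity >= nt_min
        aa_ok = aa_identity is not None and aa_identity >= aa_min
        if nt_ok or aa_ok:
            met.append(label)
    return met
```

In practice the alignment itself would come from a standard alignment tool; the sketch only covers the comparison against the stated percentages.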
As discussed above in the Summary of the invention, certain domains within the pheromone receptors may share even greater sequence homology to a pheromone receptor polypeptide selected from the group consisting of SEQ ID NO. 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 32, 34, 36, 38, 40, 42, 44, 46, 48, 50 and 52.
In screening for Gαo+ VNO pheromone receptor polypeptides, a Southern blot may be performed using the foregoing conditions, together with a radioactive probe.
After washing the membrane to which the DNA is finally transferred, the membrane can be placed against X-ray film to detect the radioactive signal.
The invention also includes degenerate nucleic acids which include alternative codons to those present in the native materials. For example, serine residues are encoded by the codons TCA, AGT, TCC, TCG, TCT and AGC. Each of the six codons is equivalent for the purposes of encoding a serine residue. Thus, it will be apparent to one of ordinary skill in the art that any of the serine-encoding nucleotide triplets may be employed to direct the protein synthesis apparatus, in vitro or in vivo, to incorporate a serine residue into an elongating Gαo+ VNO pheromone receptor polypeptide. Similarly, nucleotide sequence triplets which encode other amino acid residues include, but are not limited to: CCA, CCC, CCG and CCT (proline codons); CGA, CGC, CGG, CGT, AGA and AGG (arginine codons); ACA, ACC, ACG and ACT (threonine codons); AAC and AAT (asparagine codons); and ATA, ATC and ATT (isoleucine codons). Other amino acid residues may be encoded similarly by multiple nucleotide sequences. Thus, the invention embraces degenerate nucleic acids that differ from the biologically isolated nucleic acids in codon sequence due to the degeneracy of the genetic code.
In addition, areas of high similarity among pheromone receptors may differ in amino acid sequences such that they share many, but not all, amino acids. Their nucleotide sequences all differ accordingly.
The invention also provides isolated unique fragments of the cDNAs encoding a Gαo+ VNO polypeptide selected from the group consisting of SEQ ID NO. 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 32, 34, 36, 38, 40, 42, 44, 46, 48, 50 and 52, or complements of these sequences. A unique fragment is one that is a 'signature' for the larger nucleic acid. It, for example, is long enough to assure that its precise sequence is not found in molecules outside of the Gαo+ VNO pheromone receptor nucleic acids defined above. Unique fragments can be used as probes in Southern blot assays to identify such nucleic acids, or can be used as primers in amplification assays such as those employing PCR. As known to those skilled in the art, large probes such as 200 nucleotides or more are preferred for certain uses such as Southern blots, while smaller fragments will be preferred for uses such as PCR. Unique fragments also can be used to produce fusion proteins for generating antibodies or determining binding of the polypeptide fragments, as demonstrated in the Examples, or for generating immunoassay components. Likewise, unique fragments can be employed to produce nonfused fragments of the Gαo+ VNO pheromone receptor polypeptides, useful, for example, in the preparation of antibodies, in immunoassays, and as a competitive binding partner of the pheromones and/or other ligands which bind to the Gαo+ VNO pheromone receptor polypeptides, for example, in therapeutic applications. Unique fragments further can be used as antisense molecules to inhibit the expression of Gαo+ VNO pheromone receptor nucleic acids and polypeptides, particularly for the insecticide and other fertility control purposes as described in greater detail below.
As will be recognized by those skilled in the art, the size of the unique fragment will depend upon its conservancy in the genetic code. Thus, some regions of a cDNA selected from the group consisting of SEQ ID NO. 51, 53, 54, 55, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, and 92, that encodes a Gαo+ VNO polypeptide, and its complement will require longer segments to be unique while others will require only short segments, typically between 12 and 32 nucleotides (e.g. 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31 and 32 bases long). Virtually any segment of the region of the cDNAs encoding the full length Gαo+ VNO polypeptide or their complements, that is 18 or more nucleotides in length will be unique. Those skilled in the art are well versed in methods for selecting such sequences, typically on the basis of the ability of the unique fragment to selectively distinguish the sequence of interest from non-Gαo+ VNO pheromone receptor nucleic acids. A comparison of the sequence of the fragment to those on known data bases typically is all that is necessary, although in vitro confirmatory hybridization and sequencing analysis may be performed.
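The database comparison mentioned above may be sketched, purely as an illustration, as an exact-match scan of candidate fragments against a set of background sequences. The background list stands in for a sequence database and is an assumption, as are the function names; a real search would use an established similarity-search tool and would tolerate mismatches, which this sketch does not.

```python
def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def is_unique_fragment(fragment: str, background: list) -> bool:
    """True if neither the fragment nor its complement strand occurs in any
    background (non-family) sequence."""
    frag = fragment.upper()
    rc = reverse_complement(frag)
    for seq in background:
        s = seq.upper()
        if frag in s or rc in s:
            return False
    return True


def unique_fragments(target: str, background: list, length: int = 18) -> list:
    """Slide a window of the given length along the target and return
    (offset, fragment) pairs for windows absent from the background."""
    target = target.upper()
    hits = []
    for i in range(len(target) - length + 1):
        frag = target[i:i + length]
        if is_unique_fragment(frag, background):
            hits.append((i, frag))
    return hits
```

The default window of 18 nucleotides follows the length at which the text states that virtually any segment will be unique; shorter windows between 12 and 32 nucleotides can be tested by passing a different length.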
As mentioned above, the invention embraces antisense oligonucleotides that selectively bind to a nucleic acid molecule encoding a Gαo+ VNO pheromone receptor polypeptide, to decrease a pheromone receptor activity (e.g., a ligand binding activity, a signal transduction activity). This is desirable in virtually any condition wherein a reduction in pheromone binding or induction of a behavior that is triggered by pheromone binding is desirable, including to control fertility and behavior in vertebrates and invertebrates. The compositions of the invention are particularly useful in, for example, controlling fertility in livestock and controlling reproduction in rodents or insects by interrupting the normal behaviors of rodents or insects that result in reproduction. As used herein, the term "antisense oligonucleotide" or "antisense" describes an oligonucleotide that is an oligoribonucleotide, oligodeoxyribonucleotide, modified oligoribonucleotide, or modified oligodeoxyribonucleotide which hybridizes under physiological conditions to DNA comprising a particular gene or to an mRNA transcript of that gene and, thereby, inhibits the transcription of that gene and/or the translation of that mRNA. The antisense molecules are designed so as to interfere with transcription or translation of a target gene upon hybridization with the target gene or transcript. Those skilled in the art will recognize that the exact length of the antisense oligonucleotide and its degree of complementarity with its target will depend upon the specific target selected, including the sequence of the target and the particular bases which comprise that sequence. It is preferred that the antisense oligonucleotide be constructed and arranged so as to bind selectively with the target under physiological conditions, i.e., to hybridize substantially more to the target sequence than to any other sequence in the target cell under physiological conditions. Based upon the cDNA sequences of Examples 1 and 2 (SEQ ID NOs. 1, 3, 5, 7, 9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29, 31, 33, 35, 37, 39, 41, 43, 45, 47, 49, 51, 53, 54, and 55), or upon allelic or homologous genomic and/or cDNA sequences, one of skill in the art can easily choose and synthesize any of a number of appropriate antisense molecules for use in accordance with the present invention. In order to be sufficiently selective and potent for inhibition, such antisense oligonucleotides should comprise at least 10 and, more preferably, at least 15 consecutive bases which are complementary to the target, although in certain cases modified oligonucleotides as short as 7 bases in length have been used successfully as antisense oligonucleotides (Wagner et al., Nature Biotechnol. 14:840-844, 1996).
Most preferably, the antisense oligonucleotides comprise a complementary sequence of 20-30 bases. Although oligonucleotides may be chosen which are antisense to any region of the gene or mRNA transcripts, in preferred embodiments the antisense oligonucleotides correspond to N-terminal or 5' upstream sites such as translation initiation, transcription initiation or promoter sites. In addition, 3'-untranslated regions may be targeted. Targeting to mRNA splicing sites has also been used in the art but may be less preferred if alternative mRNA splicing occurs. In addition, the antisense is targeted, preferably, to sites in which mRNA secondary structure is not expected (see, e.g., Sainio et al., Cell Mol. Neurobiol. 14(5):439-457, 1994) and at which proteins are not expected to bind. Finally, although Examples 1 and 2 disclose cDNA sequences (SEQ ID NOs. 1, 3, 5, 7, 9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29, 31, 33, 35, 37, 39, 41, 43, 45, 47, 49, 51, 53, 54, and 55), one of ordinary skill in the art may easily derive the genomic DNA corresponding to these cDNAs. Thus, the present invention also provides for antisense oligonucleotides which are complementary to the genomic DNA corresponding to a cDNA sequence selected from the group consisting of SEQ ID NOs. 1, 3, 5, 7, 9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29, 31, 33, 35, 37, 39, 41, 43, 45, 47, 49, 51, 53, 54, and 55. Similarly, antisense to allelic or homologous cDNAs and genomic DNAs are enabled without undue experimentation.
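As a minimal sketch of the sequence relationship described above, an antisense oligonucleotide can be derived as the reverse complement of a chosen window of the sense (cDNA or mRNA) strand. The function names and the example coordinates are hypothetical; the default length of 20 bases and the minimum of 10 bases follow the figures given in the text, and selection of the target site (for example, a translation initiation site free of secondary structure) is assumed to have been made separately.

```python
def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def antisense_oligo(sense: str, start: int, length: int = 20,
                    min_length: int = 10) -> str:
    """Return a DNA antisense oligonucleotide, written 5' to 3', that is
    complementary to sense[start:start + length].

    An mRNA sequence may be supplied; U is read as T.
    """
    sense = sense.upper().replace("U", "T")
    if length < min_length:
        raise ValueError("oligonucleotide shorter than the stated minimum")
    if start < 0 or start + length > len(sense):
        raise ValueError("window falls outside the sense sequence")
    return reverse_complement(sense[start:start + length])
```

Shorter modified oligonucleotides, which the text notes have been used at lengths down to 7 bases, can be modelled by lowering `min_length`; the sketch does not represent backbone or sugar modifications.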
In one set of embodiments, the antisense oligonucleotides of the invention may be composed of "natural" deoxyribonucleotides, ribonucleotides, or any combination thereof. That is, the 5' end of one native nucleotide and the 3' end of another native nucleotide may be covalently linked, as in natural systems, via a phosphodiester internucleoside linkage. These oligonucleotides may be prepared by art recognized methods which may be carried out manually or by an automated synthesizer. They also may be produced recombinantly by vectors.
In preferred embodiments, however, the antisense oligonucleotides of the invention also may include "modified" oligonucleotides. That is, the oligonucleotides may be modified in a number of ways which do not prevent them from hybridizing to their target but which enhance their stability or targeting or which otherwise enhance their therapeutic effectiveness.
The term "modified oligonucleotide" as used herein describes an oligonucleotide in which (1) at least two of its nucleotides are covalently linked via a synthetic internucleoside linkage (i.e., a linkage other than a phosphodiester linkage between the 5' end of one nucleotide and the 3' end of another nucleotide) and/or (2) a chemical group not normally associated with nucleic acids has been covalently attached to the oligonucleotide. Preferred synthetic internucleoside linkages are phosphorothioates, alkylphosphonates, phosphorodithioates, phosphate esters, alkylphosphonothioates, phosphoramidates, carbamates, carbonates, phosphate triesters, acetamidates, carboxymethyl esters and peptides.
The term "modified oligonucleotide" also encompasses oligonucleotides with a covalently modified base and/or sugar. For example, modified oligonucleotides include oligonucleotides having backbone sugars which are covalently attached to low molecular weight organic groups other than a hydroxyl group at the 3' position and other than a phosphate group at the 5' position. Thus modified oligonucleotides may include a 2'-O-alkylated ribose group.
In addition, modified oligonucleotides may include sugars such as arabinose instead of ribose.
The present invention, thus, contemplates pharmaceutical preparations containing modified antisense molecules that are complementary to and hybridizable with, under physiological conditions, nucleic acids encoding pheromone receptor polypeptides, together with pharmaceutically acceptable carriers.
Antisense oligonucleotides may be administered as part of a pharmaceutical composition.
Such a pharmaceutical composition may include the antisense oligonucleotides in combination with any standard physiologically and/or pharmaceutically acceptable carriers which are known in the art. The compositions should be sterile and contain a therapeutically effective amount of the antisense oligonucleotides in a unit of weight or volume suitable for administration to a patient. The term "pharmaceutically acceptable" means a non-toxic material that does not interfere with the effectiveness of the biological activity of the active ingredients. The term "physiologically acceptable" refers to a non-toxic material that is compatible with a biological system such as a cell, cell culture, tissue, or organism. The characteristics of the carrier will depend on the route of administration. Physiologically and pharmaceutically acceptable carriers include diluents, fillers, salts, buffers, stabilizers, solubilizers, and other materials which are well known in the art.
As used herein, a "vector" may be any of a number of nucleic acids into which a desired sequence may be inserted by restriction and ligation for transport between different genetic environments or for expression in a host cell. Vectors are typically composed of DNA although RNA vectors are also available. Vectors include, but are not limited to, plasmids, phagemids and virus genomes. A cloning vector is one which is able to replicate in a host cell, and which is further characterized by one or more endonuclease restriction sites at which the vector may be cut in a determinable fashion and into which a desired DNA sequence may be ligated such that the new recombinant vector retains its ability to replicate in the host cell. In the case of plasmids, replication of the desired sequence may occur many times as the plasmid increases in copy number within the host bacterium or just a single time per host before the host reproduces by mitosis. In the case of phage, replication may occur actively during a lytic phase or passively during a lysogenic phase. An expression vector is one into which a desired DNA sequence may be inserted by restriction and ligation such that it is operably joined to regulatory sequences and may be expressed as an RNA transcript. Vectors may further contain one or more marker sequences suitable for use in the identification of cells which have or have not been transformed or transfected with the vector. Markers include, for example, genes encoding proteins which increase or decrease either resistance or sensitivity to antibiotics or other compounds, genes which encode enzymes whose activities are detectable by standard assays known in the art (e.g., β-galactosidase or alkaline phosphatase), and genes which visibly affect the phenotype of transformed or transfected cells, hosts, colonies or plaques (e.g., green fluorescent protein).
Preferred vectors are those capable of autonomous replication and expression of the structural gene products present in the DNA segments to which they are operably joined.
As used herein, a coding sequence and regulatory sequences are said to be "operably"
joined when they are covalently linked in such a way as to place the expression or transcription of the coding sequence under the influence or control of the regulatory sequences. If it is desired that the coding sequences be translated into a functional protein, two DNA
sequences are said to be operably joined if induction of a promoter in the 5' regulatory sequences results in the transcription of the coding sequence and if the nature of the linkage between the two DNA
sequences does not (1) result in the introduction of a frame-shift mutation, (2) interfere with the ability of the promoter region to direct the transcription of the coding sequences, or (3) interfere with the ability of the corresponding RNA transcript to be translated into a protein. Thus, a promoter region would be operably joined to a coding sequence if the promoter region were capable of effecting transcription of that DNA sequence such that the resulting transcript might be translated into the desired protein or polypeptide.
The precise nature of the regulatory sequences needed for gene expression may vary between species or cell types, but shall in general include, as necessary, 5' non-transcribed and 5' non-translated sequences involved with the initiation of transcription and translation respectively, such as a TATA box, capping sequence, CAAT sequence, and the like. Especially, such 5' non-transcribed regulatory sequences will include a promoter region which includes a promoter sequence for transcriptional control of the operably joined gene.
Regulatory sequences may also include enhancer sequences or upstream activator sequences as desired. The vectors of the invention may optionally include 5' leader or signal sequences. The choice and design of an appropriate vector is within the ability and discretion of one of ordinary skill in the art.
Expression vectors containing all the necessary elements for expression are commercially available and known to those skilled in the art. See, e.g., Sambrook et al., Molecular Cloning:
A Laboratory Manual, Second Edition, Cold Spring Harbor Laboratory Press, 1989. Cells are genetically engineered by the introduction into the cells of heterologous DNA
(RNA) encoding a pheromone receptor polypeptide or fragment or variant thereof. That heterologous DNA (RNA) is placed under operable control of transcriptional elements to permit the expression of the heterologous DNA in the host cell.
Preferred systems for mRNA expression in mammalian cells are those such as pRc/CMV (available from Invitrogen, Carlsbad, CA) that contain a selectable marker such as a gene that confers G418 resistance (which facilitates the selection of stably transfected cell lines) and the human cytomegalovirus (CMV) enhancer-promoter sequences. Additionally, suitable for expression in primate or canine cell lines is the pCEP4 vector (Invitrogen), which contains an Epstein Barr virus (EBV) origin of replication, facilitating the maintenance of plasmid as a multicopy extrachromosomal element. Another expression vector is the pEF-BOS plasmid containing the promoter of polypeptide Elongation Factor 1α, which efficiently stimulates transcription in vitro. The plasmid is described by Mizushima and Nagata (Nuc. Acids Res. 18:5322, 1990), and its use in transfection experiments is disclosed by, for example, Demoulin (Mol. Cell. Biol. 16:4710-4716, 1996). Still another preferred expression vector is an adenovirus, described by Stratford-Perricaudet, which is defective for E1 and E3 proteins (J. Clin. Invest. 90:626-630, 1992). The use of the adenovirus as an Adeno.P1A recombinant is disclosed by Warnier et al., in intradermal injection in mice for immunization against P1A (Int. J. Cancer, 67:303-310, 1996).
The invention also embraces so-called expression kits, which allow the artisan to prepare a desired expression vector or vectors. Such expression kits include at least separate portions of each of the previously discussed coding sequences. Other components may be added, as desired, as long as the previously mentioned sequences, which are required, are included.
The invention also permits the construction of pheromone receptor gene "knock-outs"
in cells and in animals, providing materials for studying certain aspects of pheromone receptor binding, signal transduction activity, or function.
The invention also provides isolated polypeptides, which include a pheromone receptor polypeptide selected from the group consisting of SEQ ID NO. 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 32, 34, 36, 38, 40, 42, 44, 46, 48, 50 and 52 and unique fragments of these pheromone receptor polypeptides. Such polypeptides are useful, for example, alone or as fusion proteins to generate antibodies.
A unique fragment of a pheromone receptor polypeptide, in general, has the features and characteristics of unique fragments as discussed above in connection with nucleic acids. As will be recognized by those skilled in the art, the size of the unique fragment will depend upon factors such as whether the fragment constitutes a portion of a conserved protein domain. Thus, some regions of a pheromone receptor polypeptide selected from the group consisting of SEQ ID NO. 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 32, 34, 36, 38, 40, 42, 44, 46, 48, 50 and 52 will require longer segments to be unique while others will require only short segments, typically between 5 and 12 amino acids (e.g. 5, 6, 7, 8, 9, 10, 11 and 12 amino acids long).
Unique fragments of a polypeptide preferably are those fragments which retain a distinct functional capability of the polypeptide. Functional capabilities which can be retained in a unique fragment of a polypeptide include interaction with antibodies, interaction with other polypeptides (G-proteins) or molecules (e.g., a ligand) or fragments thereof, selective binding of nucleic acids or proteins, and enzymatic activity. Those skilled in the art are well versed in methods for selecting unique amino acid sequences, typically on the basis of the ability of the unique fragment to selectively distinguish the sequence of interest from non-family members.
A comparison of the sequence of the fragment to those on known data bases typically is all that is necessary.
The invention embraces variants of the pheromone receptor polypeptides described above. As used herein, a "variant" of a pheromone receptor polypeptide is a polypeptide which contains one or more modifications to the primary amino acid sequence of a pheromone receptor polypeptide. Modifications which create a pheromone receptor variant can be made to a pheromone receptor polypeptide 1) to reduce or eliminate an activity of a pheromone receptor polypeptide, such as a ligand binding activity or a signal transduction activity; 2) to enhance a property of a pheromone receptor polypeptide, such as protein stability in an expression system or the stability of protein-protein binding; or 3) to provide a novel activity or property to a pheromone receptor polypeptide, such as addition of an antigenic epitope or addition of a detectable moiety. Modifications to a pheromone receptor polypeptide are typically made to the nucleic acid which encodes the pheromone receptor polypeptide, and can include deletions, point mutations, truncations, amino acid substitutions and additions of amino acids or non-amino acid moieties. Alternatively, modifications can be made directly to the polypeptide, such as by cleavage, addition of a linker molecule, addition of a detectable moiety, such as biotin, addition of a fatty acid, and the like. Modifications also embrace fusion proteins comprising all or part of the pheromone receptor amino acid sequence.
In general, variants include pheromone receptor polypeptides which are modified specifically to alter a feature of the polypeptide unrelated to its physiological activity. For example, cysteine residues can be substituted or deleted to prevent unwanted disulfide linkages.
Similarly, certain amino acids can be changed to enhance expression of a pheromone receptor polypeptide by eliminating proteolysis by proteases in an expression system.
Mutations of a nucleic acid which encodes a pheromone receptor polypeptide preferably preserve the amino acid reading frame of the coding sequence, and preferably do not create regions in the nucleic acid which are likely to hybridize to form secondary structures, such as hairpins or loops, which can be deleterious to expression of the variant polypeptide.
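The two preferences stated above can be illustrated with a simple screening sketch. It assumes that reading-frame preservation can be checked by the net length change of the coding sequence being a multiple of three, and that a candidate hairpin can be flagged by an exact inverted repeat separated by a short loop; the stem length, loop length and function names are hypothetical choices, and the inverted-repeat scan is a crude heuristic rather than a secondary-structure prediction.

```python
def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def preserves_reading_frame(original_cds: str, mutated_cds: str) -> bool:
    """True if the net insertion or deletion is a multiple of three bases."""
    return (len(mutated_cds) - len(original_cds)) % 3 == 0


def find_hairpin_stems(seq: str, stem: int = 6, min_loop: int = 3) -> list:
    """Return (left_start, right_start) pairs where a stretch of `stem` bases
    is followed, after at least `min_loop` bases, by its exact reverse
    complement, i.e. a potential stem-loop."""
    seq = seq.upper()
    hits = []
    for i in range(len(seq) - stem + 1):
        left = seq[i:i + stem]
        j = seq.find(reverse_complement(left), i + stem + min_loop)
        if j != -1:
            hits.append((i, j))
    return hits
```

A mutated sequence that fails the first check, or returns hits from the second, would be a candidate for redesign before expression is attempted.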
Mutations can be made by selecting an amino acid substitution, or by random mutagenesis of a selected site in a nucleic acid which encodes the polypeptide. Variant polypeptides are then expressed and tested for one or more activities to determine which mutation provides a variant polypeptide with the desired properties. Further mutations can be made to variants (or to non-variant pheromone receptor polypeptides) which are silent as to the amino acid sequence of the polypeptide, but which provide preferred codons for translation in a particular host. The preferred codons for translation of a nucleic acid in, e.g., E. coli, are well known to those of ordinary skill in the art. Still other mutations can be made to the noncoding sequences of a pheromone receptor gene or cDNA clone to enhance expression of the polypeptide. The activity of variants of pheromone receptor polypeptides can be tested by cloning the gene encoding the variant pheromone receptor polypeptide into a bacterial or mammalian expression vector, introducing the vector into an appropriate host cell, expressing the variant pheromone receptor polypeptide, and testing for a functional capability of the pheromone receptor polypeptides as disclosed herein. For example, the variant pheromone receptor polypeptide can be tested for a ligand binding activity, wherein a ligand to which the receptor binds is contacted with the variant receptor and the amount of ligand binding to the variant receptor is determined using conventional procedures to measure the binding of one molecule to another. Preparation of other variant polypeptides may favor testing of other activities, as will be known to one of ordinary skill in the art.
The skilled artisan will also realize that conservative amino acid substitutions may be made in pheromone receptor polypeptides to provide functionally equivalent variants of the foregoing polypeptides, i.e., the variants retain the functional capabilities of the pheromone receptor polypeptides. As used herein, a "conservative amino acid substitution" refers to an amino acid substitution which does not alter the relative charge or size characteristics of the protein in which the amino acid substitution is made. Variants can be prepared according to methods for altering polypeptide sequence known to one of ordinary skill in the art such as are found in references which compile such methods, e.g. Molecular Cloning: A Laboratory Manual, J. Sambrook, et al., eds., Second Edition, Cold Spring Harbor Laboratory Press, Cold Spring Harbor, New York, 1989, or Current Protocols in Molecular Biology, F.M. Ausubel, et al., eds., John Wiley & Sons, Inc., New York. To a certain extent, the various members of the pheromone receptor family that are illustrated in the Examples represent exemplary functionally equivalent variants of the pheromone receptor polypeptides. Other functionally equivalent variants include conservative amino acid substitutions of the amino acids of a pheromone receptor polypeptide selected from the group consisting of SEQ ID NO. 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 32, 34, 36, 38, 40, 42, 44, 46, 48, 50 and 52. Conservative substitutions of amino acids include substitutions made amongst amino acids within the following groups:
(a) M, I, L, V; (b) F, Y, W; (c) K, R, H; (d) A, G; (e) S, T; (f) Q, N; and (g) E, D.
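The grouping above lends itself to a simple membership test, sketched below for illustration. The groups are copied from the list; the function names are hypothetical, and residues that appear in no group (for example C or P) are treated as having no conservative replacement, which is an assumption of the sketch rather than a statement of the specification.

```python
# Groups (a) through (g) as listed above, in one-letter amino acid code.
CONSERVATIVE_GROUPS = ["MILV", "FYW", "KRH", "AG", "ST", "QN", "ED"]


def is_conservative(original: str, replacement: str) -> bool:
    """True if both residues fall within the same listed group."""
    a, b = original.upper(), replacement.upper()
    return any(a in group and b in group for group in CONSERVATIVE_GROUPS)


def is_conservative_variant(original: str, variant: str) -> bool:
    """True if the variant has the same length as the original and every
    differing position is a conservative substitution."""
    if len(original) != len(variant):
        return False
    for a, b in zip(original.upper(), variant.upper()):
        if a != b and not is_conservative(a, b):
            return False
    return True
```

For example, replacing leucine with valine falls within group (a) and is conservative under this list, whereas replacing lysine with glutamate crosses groups (c) and (g) and is not.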
Conservative amino-acid substitutions in the amino acid sequence of pheromone receptor polypeptides to produce functionally equivalent variants of pheromone receptor polypeptides typically are made by alteration of the nucleic acid encoding pheromone receptor polypeptides.
Such substitutions can be made by a variety of methods known to one of ordinary skill in the art.
For example, amino acid substitutions may be made by PCR-directed mutation, site-directed mutagenesis according to the method described in Proc. Nat. Acad. Sci. U.S.A. 82: 488-492, 1985, or by chemical synthesis of a gene encoding a pheromone receptor polypeptide. Where amino acid substitutions are made to a small unique fragment of a pheromone receptor polypeptide, such as a ligand binding site peptide, the substitutions can be made by directly synthesizing the peptide. The activity of functionally equivalent fragments of pheromone receptor polypeptides can be tested by cloning the gene encoding the altered pheromone receptor polypeptide into a bacterial or mammalian expression vector, introducing the vector into an appropriate host cell, expressing the altered pheromone receptor polypeptide, and testing for a functional capability of the pheromone receptor polypeptides as disclosed herein. Peptides which are chemically synthesized can be tested directly for function, e.g., for binding to a ligand to which the unaltered pheromone receptor is known to bind.
The invention as described herein has a number of uses, some of which are described elsewhere herein. First, the invention permits isolation of the pheromone receptor polypeptides of the Examples. A variety of methodologies well-known to the skilled practitioner can be utilized to obtain isolated pheromone receptor molecules. The polypeptide may be purified from cells which naturally produce the polypeptide by chromatographic means or immunological recognition. Alternatively, an expression vector may be introduced into cells to cause production of the polypeptide. In another method, mRNA transcripts may be microinjected or otherwise introduced into cells to cause production of the encoded polypeptide. Translation of mRNA in cell-free extracts such as the reticulocyte lysate system also may be used to produce polypeptide.
Those skilled in the art also can readily follow known methods for isolating pheromone receptor polypeptides. These include, but are not limited to, immunochromatography, HPLC, size-exclusion chromatography, ion-exchange chromatography and immune-affinity chromatography.
The isolation of the pheromone receptor gene also makes it possible for the artisan to diagnose a disorder characterized by expression of pheromone receptor. These methods involve determining expression of the pheromone receptor gene, and/or pheromone receptor polypeptides derived therefrom. In the former situation, such determinations can be carried out via any standard nucleic acid determination assay, including the polymerase chain reaction as exemplified in the examples below, or assaying with labeled hybridization probes.
The invention also makes it possible to isolate the naturally occurring ligands (pheromones) and other ligands that have a ligand binding domain, namely, by the binding of such molecules to the pheromone receptor polypeptides (or fragments thereof containing a ligand binding site). Binding of the receptors to a ligand can be accomplished by introducing into a biological system in which the proteins bind (e.g., a cell) a molecule that includes a binding domain (putative ligand) in an amount sufficient to detect the binding.
The invention also provides agents such as binding polypeptides which bind to pheromone receptor polypeptides and/or to complexes of pheromone receptor polypeptides and their ligand binding partners. Such binding agents can be used, for example, in screening assays to detect the presence or absence of pheromone receptor polypeptides and complexes of pheromone receptor polypeptides and their ligand binding partners and in purification protocols to isolate pheromone receptor polypeptides and complexes of pheromone receptor polypeptides and their ligand binding partners. Such agents also can be used to inhibit the native activity of the pheromone receptor polypeptides or their ligand binding partners, for example, by binding to such polypeptides, or their binding partners or both.
The invention, therefore, embraces peptide binding agents which, for example, can be antibodies or fragments of antibodies having the ability to selectively bind to pheromone receptor polypeptides. Antibodies include polyclonal and monoclonal antibodies, prepared according to conventional methodology.
Significantly, as is well-known in the art, only a small portion of an antibody molecule, the paratope, is involved in the binding of the antibody to its epitope (see, in general, Clark, W.R. (1986) The Experimental Foundations of Modern Immunology, Wiley & Sons, Inc., New York; Roitt, I. (1991) Essential Immunology, 7th Ed., Blackwell Scientific Publications, Oxford). The pFc' and Fc regions, for example, are effectors of the complement cascade but are not involved in antigen binding. An antibody from which the pFc' region has been enzymatically cleaved, or which has been produced without the pFc' region, designated an F(ab')2 fragment, retains both of the antigen binding sites of an intact antibody. Similarly, an antibody from which the Fc region has been enzymatically cleaved, or which has been produced without the Fc region, designated an Fab fragment, retains one of the antigen binding sites of an intact antibody molecule. Proceeding further, Fab fragments consist of a covalently bound antibody light chain and a portion of the antibody heavy chain denoted Fd. The Fd fragments are the major determinant of antibody specificity (a single Fd fragment may be associated with up to ten different light chains without altering antibody specificity) and Fd fragments retain epitope binding ability in isolation.
Within the antigen-binding portion of an antibody, as is well-known in the art, there are complementarity determining regions (CDRs), which directly interact with the epitope of the antigen, and framework regions (FRs), which maintain the tertiary structure of the paratope (see, in general, Clark, 1986; Roitt, 1991). In both the heavy chain Fd fragment and the light chain of IgG immunoglobulins, there are four framework regions (FR1 through FR4) separated respectively by three complementarity determining regions (CDR1 through CDR3).
The CDRs, and in particular the CDR3 regions, and more particularly the heavy chain CDR3, are largely responsible for antibody specificity.
It is now well-established in the art that the non-CDR regions of a mammalian antibody may be replaced with similar regions of nonspecific or heterospecific antibodies while retaining the epitopic specificity of the original antibody. This is most clearly manifested in the development and use of "humanized" antibodies in which non-human CDRs are covalently joined to human FR and/or Fc/pFc' regions to produce a functional antibody. Thus, for example, PCT International Publication Number WO 92/04381 teaches the production and use of humanized murine RSV antibodies in which at least a portion of the murine FR regions have been replaced by FR regions of human origin. Such antibodies, including fragments of intact antibodies with antigen-binding ability, are often referred to as "chimeric" antibodies.
Thus, as will be apparent to one of ordinary skill in the art, the present invention also provides for F(ab')2, Fab, Fv and Fd fragments; chimeric antibodies in which the Fc and/or FR and/or CDR1 and/or CDR2 and/or light chain CDR3 regions have been replaced by homologous human or non-human sequences; chimeric F(ab')2 fragment antibodies in which the FR and/or CDR1 and/or CDR2 and/or light chain CDR3 regions have been replaced by homologous human or non-human sequences; chimeric Fab fragment antibodies in which the FR and/or CDR1 and/or CDR2 and/or light chain CDR3 regions have been replaced by homologous human or non-human sequences; and chimeric Fd fragment antibodies in which the FR and/or CDR1 and/or CDR2 regions have been replaced by homologous human or non-human sequences. The present invention also includes so-called single chain antibodies.
Thus, the invention involves polypeptides of numerous size and type that bind specifically to pheromone receptor polypeptides, and/or complexes of both pheromone receptor polypeptides and their ligand binding partners. These polypeptides may be derived also from sources other than antibody technology. For example, such polypeptide binding agents can be provided by degenerate peptide libraries which can be readily prepared in solution, in immobilized form or as phage display libraries. Combinatorial libraries also can be synthesized of peptides containing one or more amino acids. Libraries further can be synthesized of peptoids and non-peptide synthetic moieties.
Phage display can be particularly effective in identifying binding peptides useful according to the invention. Briefly, one prepares a phage library (using e.g. M13, fd, or lambda phage), displaying inserts from 4 to about 80 amino acid residues using conventional procedures.
The inserts may represent, for example, a completely degenerate or biased array. One then can select phage-bearing inserts which bind to the pheromone receptor polypeptide.
This process can be repeated through several cycles of reselection of phage that bind to the pheromone receptor polypeptide. Repeated rounds lead to enrichment of phage bearing particular sequences. DNA sequence analysis can be conducted to identify the sequences of the expressed polypeptides. The minimal linear portion of the sequence that binds to the pheromone receptor polypeptide can be determined. One can repeat the procedure using a biased library containing inserts containing part or all of the minimal linear portion plus one or more additional degenerate residues upstream or downstream thereof. Yeast two-hybrid screening methods also may be used to identify polypeptides that bind to the pheromone receptor polypeptides.
Thus, the pheromone receptor polypeptides of the invention, or a fragment thereof, can be used to screen peptide libraries, including phage display libraries, to identify and select peptide binding partners of the pheromone receptor polypeptides of the invention. Such molecules can be used, as described, for screening assays, for purification protocols, for interfering directly with the functioning of pheromone receptor and for other purposes that will be apparent to those of ordinary skill in the art.
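As a rough illustration of the sequence-analysis step in the phage display procedure, the following Python sketch finds the longest stretch of residues shared by every enriched insert. The function name and the example peptide strings are hypothetical, and a shared substring is only a first approximation of the minimal linear binding portion, which would still have to be confirmed experimentally.

```python
def longest_common_stretch(seqs):
    """Return the longest substring present in every sequence ('' if none)."""
    if not seqs:
        return ""
    shortest = min(seqs, key=len)
    # Try the longest candidate windows first so the first hit is the answer.
    for size in range(len(shortest), 0, -1):
        for start in range(len(shortest) - size + 1):
            candidate = shortest[start:start + size]
            if all(candidate in s for s in seqs):
                return candidate
    return ""


# Hypothetical inserts recovered after several rounds of selection.
enriched = ["ACDWHPLKG", "TTWHPLKSE", "GWHPLKAAQ"]
motif = longest_common_stretch(enriched)
```

A shared stretch found this way could then seed the biased library, with degenerate residues added upstream or downstream of it.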
A pheromone receptor polypeptide, or a fragment which contains the ligand binding site, also can be used to isolate naturally-occurring ligands and other binding partners of the receptors of the invention. For example, an isolated pheromone receptor can be used to isolate ligands that bind to the receptor binding site by immobilizing a receptor (or fragment containing the ligand binding site) on a chromatographic medium, such as polystyrene beads, or a filter, and using the immobilized polypeptide to isolate molecules that bind to this affinity matrix in accordance with standard procedures for affinity chromatography.
It will also be recognized that the invention embraces the use of the pheromone receptor cDNA sequences in expression vectors, as well as to transfect host cells and cell lines, be these prokaryotic (e.g., E. coli), or eukaryotic (e.g., CHO cells, COS cells, yeast expression systems and recombinant baculovirus expression in insect cells). Especially useful are oocytes, mammalian cells such as mouse, hamster, pig, goat, primate, etc. They may be of a wide variety of tissue types, and include primary cells and cell lines. The expression vectors require that the pertinent sequence, i.e., those nucleic acids described supra, be operably linked to a promoter.
When administered, the therapeutic compositions of the present invention are administered in pharmaceutically acceptable preparations. Such preparations may routinely contain pharmaceutically acceptable concentrations of salt, buffering agents, preservatives, compatible carriers, supplementary immune potentiating agents such as adjuvants and cytokines and optionally other therapeutic agents.
The therapeutics of the invention can be administered by any conventional route, including injection or by gradual infusion over time. The administration may, for example, be oral, intravenous, intraperitoneal, intramuscular, intracavity, subcutaneous, or transdermal.
When antibodies are used therapeutically, a preferred route of administration is by pulmonary aerosol. Techniques for preparing aerosol delivery systems containing antibodies are well known to those of skill in the art. Generally, such systems should utilize components which will not significantly impair the biological properties of the antibodies, such as the paratope binding capacity (see, for example, Sciarra and Cutie, "Aerosols," in Remington's Pharmaceutical Sciences, 18th edition, 1990, pp 1694-1712; incorporated by reference). Those of skill in the art can readily determine the various parameters and conditions for producing antibody aerosols without resort to undue experimentation. When using antisense preparations of the invention, slow intravenous administration is preferred.
Preparations for parenteral administration include sterile aqueous or non-aqueous solutions, suspensions, and emulsions. Examples of non-aqueous solvents are propylene glycol, polyethylene glycol, vegetable oils such as olive oil, and injectable organic esters such as ethyl oleate. Aqueous carriers include water, alcoholic/aqueous solutions, emulsions or suspensions, including saline and buffered media. Parenteral vehicles include sodium chloride solution, Ringer's dextrose, dextrose and sodium chloride, lactated Ringer's or fixed oils. Intravenous vehicles include fluid and nutrient replenishers, electrolyte replenishers (such as those based on Ringer's dextrose), and the like. Preservatives and other additives may also be present such as, for example, antimicrobials, anti-oxidants, chelating agents, and inert gases and the like.
The preparations of the invention are administered in effective amounts. An effective amount is that amount of a pharmaceutical preparation that alone, or together with further doses, produces the desired response in the condition being treated, e.g., modifying fertility or pheromone-mediated behaviors that are related to reproduction or aggression.
For example, this can involve the use of the compounds of the invention as pesticides to slow or halt insect or rodent behaviors that result in reproduction. Alternatively, this can involve the use of the compounds of the invention as agents for controlling fertility in animals (e.g., livestock, domestic animals), by providing compounds which inhibit or stimulate the behaviors in such animals that result in reproduction or aggression. This can be monitored by routine methods, e.g., observing the behavior in the animal (vertebrate or invertebrate) recipient.
The invention also contemplates gene therapy, e.g., to prepare an animal model for studying the conditions and behaviors (e.g., fertility, aggression) that are pheromone receptor-mediated. The procedure for performing ex vivo gene therapy is outlined in U.S. Patent 5,399,346 and in exhibits submitted in the file history of that patent, all of which are publicly available documents. In general, it involves introduction in vitro of a functional copy of a gene into a cell(s) of a subject which contains a defective copy of the gene, and returning the genetically engineered cell(s) to the subject. The functional copy of the gene is under operable control of regulatory elements which permit expression of the gene in the genetically engineered cell(s). Numerous transfection and transduction techniques as well as appropriate expression vectors are well known to those of ordinary skill in the art, some of which are described in PCT application WO95/00654. In vivo gene therapy using vectors such as adenovirus, retroviruses, herpes virus, and targeted liposomes also is contemplated according to the invention.
The invention further provides efficient methods of identifying pharmacological agents or lead compounds for agents active at the level of a pheromone receptor or pheromone receptor fragment modulatable cellular function. In particular, such functions include ligand binding activity. Generally, the screening methods involve assaying for activation of pheromone receptors or assaying for compounds which interfere with a pheromone receptor activity such as pheromone receptor binding to its cognate ligand. Such methods are adaptable to automated, high throughput screening of compounds. The target therapeutic indications for pharmacological agents detected by the screening methods that block pheromone receptor activity are limited only in that the target cellular function be subject to modulation by alteration of the formation of a complex comprising a pheromone receptor polypeptide or fragment thereof and one or more natural pheromone receptor ligands. Target indications include cellular processes modulated by pheromone receptor signal transduction following receptor-ligand binding.
A wide variety of assays for pharmacological agents are provided, including labeled in vitro protein-protein binding assays, electrophoretic mobility shift assays, immunoassays, cell-based assays such as two- or three-hybrid screens, expression assays, activation of G-proteins, etc. For example, three-hybrid screens are used to rapidly examine the effect of transfected nucleic acids on the intracellular binding of pheromone receptor or pheromone receptor fragments to specific extracellular targets (e.g., ligands in biological samples, such as urine, vaginal fluid, or in combinatorial libraries).
Pheromone receptor fragments used in the methods, when not produced by a transfected nucleic acid, are added to an assay mixture as an isolated polypeptide. The assay can be used to screen putative ligands for their ability to bind to the receptor. Pheromone receptor polypeptides preferably are produced recombinantly, although such polypeptides may be isolated from biological extracts. Recombinantly produced pheromone receptor polypeptides include chimeric proteins comprising a fusion of a pheromone receptor protein with another polypeptide. For example, a polypeptide fused to a pheromone receptor polypeptide or fragment may also provide means of readily detecting the fusion protein, e.g., by immunological recognition or by fluorescent labeling.
In addition to the pheromone receptor, a screening assay mixture includes a binding partner for the receptor, e.g., a naturally occurring ligand that is capable of binding to the pheromone receptor or, alternatively, is comprised of an analog which mimics the pheromone receptor binding properties of the naturally occurring ligand for purposes of the assay. The screening assay mixture also comprises a candidate pharmacological agent (e.g., a putative receptor agonist or antagonist). Typically, a plurality of assay mixtures are run in parallel with different agent concentrations to obtain a different response to the various concentrations.
Typically, one of these concentrations serves as a negative control, i.e., at zero concentration of agent or at a concentration of agent below the limits of assay detection.
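By way of illustration only, a set of parallel assay concentrations with a zero-agent negative control can be laid out as in the following Python sketch. The function name, the top concentration, the tenfold step and the number of points are assumptions chosen for the example; the text does not prescribe any particular series or unit.

```python
def concentration_series(top, fold, points):
    """Return [0.0, top, top/fold, ...] with `points` non-zero concentrations."""
    if top <= 0 or fold <= 1 or points < 1:
        raise ValueError("need top > 0, fold > 1 and at least one point")
    # The zero-agent mixture serves as the negative control.
    return [0.0] + [top / fold ** i for i in range(points)]


# Hypothetical layout: four tenfold steps down from 100 (arbitrary units).
series = concentration_series(100.0, 10, 4)
```

Each entry corresponds to one assay mixture run in parallel, with the first entry being the control that contains no candidate agent.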
Candidate agents encompass numerous chemical classes, although typically they are organic compounds. Preferably, the candidate pharmacological agents are small organic compounds, i.e., those having a molecular weight of more than 50 yet less than about 2500, preferably less than about 1000 and, more preferably, less than about 500. Candidate agents comprise functional chemical groups necessary for structural interactions with polypeptides and/or nucleic acids, and typically include at least an amine, carbonyl, hydroxyl or carboxyl group, preferably at least two of the functional chemical groups and more preferably at least three of the functional chemical groups.
The candidate agents can comprise cyclic carbon or heterocyclic structure and/or aromatic or polyaromatic structures substituted with one or more of the above-identified functional groups.
Candidate agents also can be biomolecules such as peptides, saccharides, fatty acids, sterols, isoprenoids, purines, pyrimidines, derivatives or structural analogs of the above, or combinations thereof and the like. Where the agent is a nucleic acid, the agent typically is a DNA or RNA molecule, although modified nucleic acids as defined herein are also contemplated.
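The size and functional-group preferences stated for small organic candidate agents can be expressed as a simple filter. The Python sketch below is illustrative: the function names and tier labels are hypothetical, the "about" thresholds are treated as hard cutoffs for simplicity, and the group count considers distinct kinds of group only.

```python
FUNCTIONAL_GROUPS = {"amine", "carbonyl", "hydroxyl", "carboxyl"}


def size_tier(mol_weight):
    """Classify a molecular weight against the ranges stated in the text."""
    if mol_weight <= 50 or mol_weight >= 2500:
        return "outside range"
    if mol_weight < 500:
        return "more preferred"
    if mol_weight < 1000:
        return "preferred"
    return "acceptable"


def group_count(groups):
    """Count how many of the four named functional group kinds are present."""
    return len(FUNCTIONAL_GROUPS & set(groups))


# Hypothetical candidate: molecular weight 180.2 with two of the named groups.
tier = size_tier(180.2)
n_groups = group_count(["amine", "hydroxyl", "thiol"])
```

A compound library could be triaged with such a filter before any binding assay is run, although the text leaves the exact cutoffs to the practitioner.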
Candidate agents are obtained from a wide variety of sources including libraries of synthetic or natural compounds. For example, numerous means are available for random and directed synthesis of a wide variety of organic compounds and biomolecules, including expression of randomized oligonucleotides, synthetic organic combinatorial libraries, phage display libraries of random peptides, and the like. Alternatively, libraries of natural compounds in the form of bacterial, fungal, plant and animal extracts are available or readily produced.
Additionally, natural and synthetically produced libraries and compounds can be readily modified through conventional chemical, physical, and biochemical means. Further, known pharmacological agents may be subjected to directed or random chemical modifications such as acylation, alkylation, esterification, amidification, etc. to produce structural analogs of the agents.
A variety of other reagents also can be included in the mixture. These include reagents such as salts, buffers, neutral proteins (e.g., albumin), detergents, etc., which may be used to facilitate optimal protein-protein and/or protein-nucleic acid binding. Such a reagent may also reduce non-specific or background interactions of the reaction components. Other reagents that improve the efficiency of the assay such as protease inhibitors, nuclease inhibitors, antimicrobial agents, and the like may also be used.
The mixture of the foregoing assay materials is incubated under conditions whereby, but for the presence of the candidate pharmacological agent, the pheromone receptor polypeptide specifically binds the cellular binding target, a portion thereof or analog thereof. The order of addition of components, incubation temperature, time of incubation, and other parameters of the assay may be readily determined. Such experimentation merely involves optimization of the assay parameters, not the fundamental composition of the assay. Incubation temperatures typically are between 4°C and 40°C. Incubation times preferably are minimized to facilitate rapid, high throughput screening, and typically are between 0.1 and 10 hours.
After incubation, the presence or absence of specific binding between the pheromone receptor polypeptide and one or more binding targets is detected by any convenient method available to the user. For cell free binding type assays, a separation step is often used to separate bound from unbound components. The separation step may be accomplished in a variety of ways. Conveniently, at least one of the components is immobilized on a solid substrate, from which the unbound components may be easily separated. The solid substrate can be made of a wide variety of materials and in a wide variety of shapes, e.g., microtiter plate, microbead, dipstick, resin particle, etc. The substrate preferably is chosen to maximize signal to noise ratios, primarily to minimize background binding, as well as for ease of separation and cost.
Separation may be effected, for example, by removing a bead or dipstick from a reservoir, emptying or diluting a reservoir such as a microtiter plate well, or rinsing a bead, particle, chromatographic column or filter with a wash solution or solvent. The separation step preferably includes multiple rinses or washes. For example, when the solid substrate is a microtiter plate, the wells may be washed several times with a washing solution, which typically includes those components of the incubation mixture that do not participate in specific binding such as salts, buffer, detergent, non-specific protein, etc. Where the solid substrate is a magnetic bead, the beads may be washed one or more times with a washing solution and isolated using a magnet.
Detection may be effected in any convenient way for cell-based assays such as two- or three-hybrid screens. The transcript resulting from a reporter gene transcription assay of pheromone receptor polypeptide binding to a target molecule typically encodes a directly or indirectly detectable product, e.g., β-galactosidase activity, luciferase activity, and the like. A wide variety of cell based assays for G-protein coupled receptors could also be employed for detection of molecules that stimulate (agonists) pheromone receptors or block (antagonists) that stimulation by natural ligands or agonists. Pheromone receptor polypeptides or chimeric receptors composed only in part of a pheromone receptor could be employed in these assays. The chimeric receptors might, for example, contain part of another G-protein coupled receptor such that binding of a ligand to the pheromone receptor binding domain results in coupling to a particular G-protein where activation could be easily assayed. For cell free binding assays, one of the components usually comprises, or is coupled to, a detectable label. A wide variety of labels can be used, such as those that provide direct detection (e.g., radioactivity, luminescence, optical or electron density, etc.) or indirect detection (e.g., epitope tag such as the FLAG epitope, enzyme tag such as horseradish peroxidase, etc.). The label may be bound to a pheromone receptor binding partner (ligand), or incorporated into the structure of the binding partner.
A variety of methods may be used to detect the label, depending on the nature of the label and other assay components. For example, the label may be detected while bound to the solid substrate or subsequent to separation from the solid substrate. Labels may be directly detected through optical or electron density, radioactive emissions, nonradioactive energy transfers, etc. or indirectly detected with antibody conjugates, streptavidin-biotin conjugates, etc. Methods for detecting the labels are well known in the art.
The invention provides pheromone receptor-specific binding agents, methods of identifying and making such agents, and their use in diagnosis, therapy and pharmaceutical development, including the development of pesticides and other agents for controlling fertility and reproduction (or related behaviors) in animals. For example, pheromone receptor-specific pharmacological agents are useful in a variety of diagnostic and therapeutic applications, especially where disease or disease prognosis is associated with improper utilization of a pathway involving pheromone receptor. Novel pheromone receptor-specific binding agents include pheromone receptor-specific antibodies and other natural intracellular binding agents identified with assays such as two hybrid screens, and non-natural intracellular binding agents identified in screens of chemical libraries and the like.
In general, the specificity of pheromone receptor binding to a binding agent is shown by binding equilibrium constants. Targets which are capable of selectively binding a pheromone receptor polypeptide preferably have binding equilibrium constants of at least about 10^7 M^-1, more preferably at least about 10^8 M^-1, and most preferably at least about 10^9 M^-1. A wide variety of cell based and cell free assays may be used to demonstrate pheromone receptor-specific binding. Cell based assays include one, two and three hybrid screens, assays in which pheromone receptor-mediated transcription is inhibited or increased, activation of G-proteins, etc. Cell free assays include pheromone receptor-protein binding assays, immunoassays, etc.
Other assays useful for screening agents which bind pheromone receptor polypeptides include fluorescence resonance energy transfer (FRET), and electrophoretic mobility shift analysis (EMSA).
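To make the binding-constant thresholds concrete, the following Python sketch converts an association constant into the corresponding dissociation constant and estimates fractional occupancy. It reads the thresholds above as 10^7, 10^8 and 10^9 M^-1 and assumes simple 1:1 equilibrium binding; the function names and tier labels are hypothetical.

```python
def dissociation_constant(k_assoc):
    """Convert an association constant (M^-1) to a dissociation constant (M)."""
    if k_assoc <= 0:
        raise ValueError("association constant must be positive")
    return 1.0 / k_assoc


def fraction_bound(k_assoc, free_ligand_molar):
    """Fractional occupancy for simple 1:1 binding at equilibrium."""
    x = k_assoc * free_ligand_molar
    return x / (1.0 + x)


def preference_tier(k_assoc):
    """Place an association constant in the tiers read from the text."""
    if k_assoc >= 1e9:
        return "most preferred"
    if k_assoc >= 1e8:
        return "more preferred"
    if k_assoc >= 1e7:
        return "preferred"
    return "below stated threshold"


# Example: the lowest stated threshold corresponds to a Kd of 1e-7 M.
kd = dissociation_constant(1e7)
```

Under this reading, the three thresholds correspond to dissociation constants of about 100 nM, 10 nM and 1 nM, and half of the receptor is occupied when the free ligand concentration equals the dissociation constant.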
Various techniques may be employed for introducing nucleic acids of the invention into cells, depending on whether the nucleic acids are introduced in vitro or in vivo in a host. Such techniques include transfection of nucleic acid-CaPO4 precipitates, transfection of nucleic acids associated with DEAE, transfection with a retrovirus including the nucleic acid of interest, liposome mediated transfection, and the like. For certain uses, it is preferred to target the nucleic acid to particular cells. In such instances, a vehicle used for delivering a nucleic acid of the invention into a cell (e.g., a retrovirus, or other virus; a liposome) can have a targeting molecule attached thereto. For example, a molecule such as an antibody specific for a surface membrane protein on the target cell or a ligand for a receptor on the target cell can be bound to or incorporated within the nucleic acid delivery vehicle. For example, where liposomes are employed to deliver the nucleic acids of the invention, proteins which bind to a surface membrane protein associated with endocytosis may be incorporated into the liposome formulation for targeting and/or to facilitate uptake. Such proteins include capsid proteins or fragments thereof tropic for a particular cell type, antibodies for proteins which undergo internalization in cycling, proteins that target intracellular localization and enhance intracellular half life, and the like. Polymeric delivery systems also have been used successfully to deliver nucleic acids into cells, as is known by those skilled in the art. Such systems even permit oral delivery of nucleic acids.
Preparation and analysis of single cell cDNAs
Male mouse (C57BL/6J) VNOs were minced, incubated in Trypsin-EDTA (Gibco-BRL/LTI, Rockville, Maryland), and triturated to obtain dissociated cells. The cells were centrifuged (1000 RPM, 5 min) and resuspended in phosphate buffered saline + 0.1% bovine serum albumin. Individual cells that appeared to be neurons were transferred to separate tubes with a microcapillary pipet.
cDNAs were prepared from each cell and amplified according to Brady and Iscove (Methods in Enzymology, 1993, 225:611-621) with minor modifications. Briefly, cDNAs were prepared from the 3' ends of mRNAs by reverse transcription with an oligo (dT) primer, and a poly dA stretch was added to each cDNA with terminal transferase. The cDNAs were then amplified by PCR with one of two primers, AL1 (ATTGGATCCAGGCCGCTCTGGACAAAATATGAATTC(T)) (SEQ. ID. No. 56) (Dulac and Axel, Cell, 1995, 83:195-206) or (GGCACATGGACGAAATCTTGGTACTCTTCAGAATTC(T)) (SEQ. ID. No. 57), and Taq polymerase [Amplitaq LD ("ALD") or Amplitaq Stoffel Fragment ("ASF") (Perkin Elmer, Norwalk, CT)].
Aliquots of each cDNA sample were electrophoresed on agarose gels and blotted onto nylon membranes (Hybond N+, Amersham, Piscataway, NJ) (Ausubel, F., et al., Current Protocols in Molecular Biology, 1988, John Wiley & Sons NY, NY; Sambrook, J., et al., Molecular Cloning: A Laboratory Manual, Second Edition, Cold Spring Harbor Laboratory Press, 1989). The blots were hybridized at 55° or 70°C in Hyb Buffer (0.5 M sodium phosphate buffer (pH 7.3), 4% SDS, 1% bovine serum albumin (BSA)) with 32P-labeled probes prepared by random priming (Prime-It II, Stratagene, La Jolla, CA).
Construction and screening of single cell cDNA libraries
An aliquot of cDNA sample VN14 was digested with Eco RI and gel-isolated fragments of 0.1-1.5 kb were cloned into λZapII (Ausubel, F., et al., Current Protocols in Molecular Biology, 1988, John Wiley & Sons NY, NY; Sambrook, J., et al., Molecular Cloning: A Laboratory Manual, Second Edition, Cold Spring Harbor Laboratory Press, 1989). Two thousand library clones were plated at low density. Replica filter lifts were hybridized at 75°C (in Hyb Buffer containing 2 µg/ml poly (dT)24 and 1 µg/ml of random dA-dT 20-mers) to 32P-labeled probes (~2.5 x 10^8 CPM/µg; 5 x 10^6 CPM/ml) prepared by PCR of different single cell cDNA samples. Clones that hybridized to only a VN14 probe were isolated, and a probe prepared from the insert of each was hybridized to blots of selected single cell cDNAs. Clones that hybridized to only VN14 cDNAs were sequenced.
Isolation and analysis of VR cDNA clones
sc153, one VN14+VN2- clone from the VN14 library, was used as probe to screen a mouse VNO cDNA library ('λVNO') (Berghard, A., et al., J Neurosci, 1996, 16:909-918) and a mouse genomic DNA library (Stratagene, La Jolla, CA) (70°C, Hyb buffer). Hybridizing clones were found only in the genomic library. A fragment containing 2 kb upstream of sc153 was isolated from one genomic clone (15361) and used to screen λVNO (55°C, Hyb Buffer). The region (D10-TM7) of one clone (D10) that showed homology to TM7 of the CSR (SEQ ID NO. 59) was then used to screen λVNO (55°C, Hyb Buffer), yielding a variety of VR cDNA clones. Additional clones were obtained from λVNO using probes prepared from clones previously isolated, or from PCR products obtained by amplification of mouse genomic DNA or VNO cDNA with degenerate primers (Buck, L., et al., Cell, 1991, 65:175-187) matching conserved motifs in the VRs. Some PCR products were also cloned into pCR2.1 (Invitrogen, Carlsbad, CA) and sequenced.
Analysis of VR mRNAs by RT-PCR
Random-primed cDNA prepared from male or female C57BL/6J mouse VNO RNAs (or VR cDNA clones) was used in PCR reactions with degenerate primers (Buck and Axel, Cell 1991, 65:175-187) matching conserved VR motifs to amplify VR sequences corresponding to amino acids 33-772 in VR1 (SEQ ID NO. 2). Nested PCR was performed with a 1/1000 dilution of the first PCR reaction and primer pairs matching regions of putative exons 1 and 6 in specific VR cDNA clones. Blots prepared from size-fractionated, nested PCR products were hybridized (70°C, Hyb buffer containing 100 µg/ml herring sperm DNA (Sigma, St Louis, MO)) to probes prepared from the PCR products of the cDNA clones.
Northern and Southern blots and genomic library screens
Northern Blots: One µg of PolyA+ RNA prepared from mouse VNO and OE, or purchased from Clontech (other tissue RNAs), was size fractionated on formaldehyde gels, and blotted (see above) (Berghard and Buck, J Neurosci, 1996, 16:909-918). The blot was hybridized (70°C, Hyb Buffer) with a 32P-labeled probe prepared from the regions of cDNAs VR1, VR2, VR4, and VR15 corresponding to that encoding amino acids 33-772 in VR1 (SEQ ID NO. 1).
Southern Blots: 5 µg of genomic DNA prepared from C57BL/6J mouse liver was digested with Eco RI or Hind III, size fractionated, and blotted (Ressler et al, Cell, 1993, 73:597-609). The blots were hybridized (70°C, Hyb buffer containing sperm DNA (see above)) to probes prepared from 3' untranslated segments of different VR cDNA clones [VR2 (nt. 2607-2961 of SEQ ID NO. 3), VR3 (nt. 2505-2907 of SEQ ID NO. 5), and VR15 (nt. 3239-3689 of SEQ ID NO. 29)]. A VR4 probe was also used, which gave the same results as the highly related VR15 probe.
Genomic library screens to determine VR gene number: A mouse genomic library was screened separately at 70°C or 55°C (see above) with different 32P-labeled probes. Probe 1: a mix of segments of cDNAs VR1 (SEQ ID NO. 1), VR2 (SEQ ID NO. 3), VR4 (SEQ ID NO. 7), and VR15 (SEQ ID NO. 29) encoding the region corresponding to amino acids 619-772 of VR1 (SEQ ID NO. 2). Probes 2-6: Segments of VR genes obtained from mouse genomic DNA by PCR with degenerate primers matching conserved VR sequence motifs. The PCR segments corresponded to the following amino acid stretches in VR1 (SEQ ID NO. 2): amino acids 191-397, 565-825, 637-825, 637-804, and 619-784. For example, degenerate oligonucleotide primer pairs used included:
for amino acids 191-397:
5' primer = (GCT)TI(CT)A(CT)CA(AG)(AG)TIGCI(AC)IAA(AG)GA(CT)AC (SEQ ID NO. 60),
3' primer = G(CT)(AG)T(GT)IGCI(AG)(CT)I(AG)C(AG)T(AG)IACI(AG)C(AG)TT (SEQ ID NO. 61);
for amino acids 565-825:
5' primer = (AC)(AG)ITG(CT)CCI(GT)AIIA(CT)(AC)A(AG)TA(CT)GCIAA (SEQ ID NO. 62),
3' primer = GIC(GT)IA(CT)IA(AG)IATIA(CT)(AG)TAI(AC)(AT)(CT)TTIGGIAC (SEQ ID NO. 63);
for amino acids 637-825:
5' primer = ATI(AT)(GC)I(CT)TI(AG)TITT(CT)TG(CT)TT(CT)(CT)TITG (SEQ ID NO. 64),
3' primer = GIC(GT)IA(CT)IA(AG)IATIA(CT)(AG)TAI(AC)(AT)(CT)TTIGGIAC (SEQ ID NO. 63);
for amino acids 637-804:
5' primer = ATI(AT)(GC)I(CT)TI(AG)TITT(CT)TG(CT)TT(CT)(CT)TITG (SEQ ID NO. 64),
3' primer = (AG)IATI(GC)(AT)(AG)AAIA(CT)(CT)TCIACI(AG)CIACCAT (SEQ ID NO. 65);
and for amino acids 619-784:
5' primer = GA(CT)ACICCIATIGTIAA(AG)GCIAA(CT)AA (SEQ ID NO. 66),
3' primer = AAIGTIA(CT)CCAIACI(GC)(AT)(AG)CA(AG)AAIAC (SEQ ID NO. 67),
wherein all primers are written in the 5' to 3' direction and I = inosine.
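The parenthesized notation used for the degenerate primers above can be read mechanically. The following is a minimal sketch, not part of the original disclosure, that splits a primer into positions and counts how many distinct sequences the mixture contains. The function names are hypothetical, and treating inosine (I) as a single universal position that adds no degeneracy is an assumption of this sketch. The example uses the 5' primer of SEQ ID NO. 66, read with the closing brace corrected to a parenthesis.

```python
import re

def parse_degenerate(primer: str) -> list[str]:
    """Return one string of allowed bases per primer position.

    A parenthesized group such as (CT) is one position with alternative
    bases; a bare letter (A, C, G, T, or I for inosine) is one position.
    """
    tokens = re.findall(r"\(([ACGT]+)\)|([ACGTI])", primer.replace(" ", ""))
    return [group or single for group, single in tokens]

def degeneracy(primer: str) -> int:
    """Number of distinct oligonucleotides in the mixture.

    Assumption: inosine counts as one species because it is a single
    base that pairs broadly, so it does not multiply the mixture.
    """
    total = 1
    for allowed in parse_degenerate(primer):
        total *= len(allowed)
    return total

# 5' primer for amino acids 619-784 (SEQ ID NO. 66)
SEQ66_5P = "GA(CT)ACICCIATIGTIAA(AG)GCIAA(CT)AA"
print(len(parse_degenerate(SEQ66_5P)), degeneracy(SEQ66_5P))  # 26 8
```

Under these assumptions the primer is 26 positions long and represents a mixture of 8 sequences, with inosine occupying the positions that would otherwise be 3- or 4-fold degenerate.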
In situ hybridization
In situ hybridization was performed according to Schaeren-Wiemers and Gerfin-Moser (Histochemistry, 1993, 100:431-440) with sequential 16 micron sections of male or female VNOs. Digoxigenin-labeled cRNA probes were prepared from the same 3' untranslated regions of VR cDNAs as used for the genomic Southern blots. Sections were counter-stained with Hoechst 33258, which labels nuclei. The numbers of Gαo- or Gαi2-labeled cells (or cells labeled with VR probes) were determined by counting the number of nuclei in labeled regions. The total number of cells was considered to be the sum of Gαo+ and Gαi2+ cells in adjacent sections.
Chromosome mapping of VR genes
Southern blots of genomic DNA from C57BL/6J and Mus spretus (Jackson Labs) digested with different restriction enzymes were prepared and probed with specific VR cDNA probes as described above. Southern blots of Eco RI-digested, size-fractionated genomic DNAs from 94 different backcross mice (M. spretus x (M. spretus x C57BL/6J)) were purchased from Jackson Labs. These blots were hybridized to probes prepared from 3' untranslated segments of the VR2 or VR4 (see above) cDNA at 70°C and washed (see above). Polymorphic bands were typed as either M. spretus or M. spretus/C57BL/6J. The data were sent to the Jackson Laboratory Backcross DNA Mapping Panel Resource for determination of the chromosomal locations of the polymorphic fragments. Additional information was obtained via the Internet from Jackson Laboratory Mouse Genome Informatics.
Cloning of a gene differentially expressed in Gαo+ VNs
Different members of the OR and VNR families are expressed in different neurons in the OE and the Gαi2+ zone of the VNO, respectively. It therefore appeared likely that the same would be true of sensory receptors expressed by Gαo+ VNs. The differential screening of cDNA libraries with cDNA probes prepared from a few neurons can be used to identify genes expressed in one neuron, but not another (Buck, L., et al., Annu. Rev. Neurosci., 1996, 19:517-544). Using PCR, this can be accomplished with single cells (Brady, G., et al., Methods in Enzymology, 1993, 225:611-621; Dulac, C., et al., Cell, 1995, 83:195-206).
To search for genes encoding receptors expressed by Gαo+ VNs, we looked for genes expressed in one Gαo+ VN, but not another, using the PCR-based differential screening approach.
In initial experiments, we isolated a series of mouse VNs, prepared cDNAs from the 3' ends of mRNAs present in each, and amplified the single-cell cDNA fragments by PCR. Many of the amplified, single-cell cDNA samples hybridized to an OMP probe, confirming their derivation from VNs (Berghard et al., Proc. Natl. Acad. Sci. USA, 1996, 93:2365-2369). With one exception, Gαo and Gαi2 probes hybridized to different OMP+ samples, allowing us to identify samples that were derived from Gαo+ VNs.
We next prepared a library from one of the Gαo+ single-cell cDNA samples (VN14), and isolated clones that hybridized to a probe prepared from VN14, but not to a probe prepared from another Gαo+ sample (VN2). We identified 3 VN14+VN2- clones, which differed in size, but were otherwise identical in sequence. None contained an open reading frame, which was not surprising since, in the method used, the amplified cDNAs are only 400-800 bp long, and are derived from the 3' ends of mRNAs (Brady and Iscove, Methods in Enzymology, 1993, 225:611-621).
We next hybridized one of the VN14+VN2- clones (sc153) to the original panel of single-cell cDNAs. sc153 hybridized to VN14, but not to any of the other cDNA samples. Consistent with this result, sc153 hybridized to only a small percentage (~0.3%) of VNs in VNO tissue sections.
Using sc153 as probe, we were able to isolate a sc153+ clone from a mouse genomic library which contained ~2 kb of DNA 5' to the sc153 sequence. Using this 2 kb fragment as probe, we isolated a matching clone (D10) from the VNO cDNA library. Sequence analysis showed that sc153 and D10 were derived from the same gene, but that the D10 cDNA was truncated at the 3' end and did not contain the final 685 bp of sequence present in sc153. Like sc153, D10 hybridized to only a small percentage of VNs in VNO tissue sections.
The 5' end of the D10 cDNA contained a short open reading frame, which encoded a protein fragment with homology to transmembrane domain 7 (TM7) of the calcium sensing receptor (CSR), a G protein-coupled receptor (GPCR) (Brown et al., Nature, 1993, 366:575-580). When the TM7-related region of D10 (D10-TM7) was hybridized at reduced stringency (55°C) to the original panel of single-cell cDNAs, it labeled many of the Gαo+ samples, but none of the Gαi2+ ones (except the one that was also Gαo+, and was probably derived from two cells). Since D10 labeled only a small percentage of VNs in tissue sections under high stringency conditions, this suggested that many Gαo+ neurons express a gene related to D10, but not identical to it.
A novel multigene family encoding VNO receptors
Hybridization of D10-TM7 to the VNO cDNA library at reduced stringency yielded a number of related cDNA clones (e.g., VR1-VR3, SEQ ID NOs. 1-6). Additional related cDNAs were obtained by RT-PCR with degenerate primers (e.g., VR6-VR7, SEQ ID NOs. 11-14), or by screening the VNO cDNA library with a PCR product obtained from genomic DNA (e.g., VR4, VR5, SEQ ID NOs. 7-10).
These cDNAs encode a novel family of proteins, which are members of the G protein-coupled receptor (GPCR) superfamily (Figure 1). Like other GPCRs, these VNO receptors (VRs) have 7 hydrophobic stretches that may serve as membrane spanning domains. Only 287 of 850 residues are identical in all of the molecules shown in Figure 1, indicating that the family is diverse. The VRs are related to two other types of GPCR, the calcium sensing receptor (CSR) and the metabotropic glutamate receptors (mGluRs) (Tanabe, Y., et al., Neuron, 1992, 8:169-179; Brown, E., et al., Nature, 1993, 366:575-580). The most highly related molecule is the CSR; for example, VR1 is 31% identical to rat CSR (Riccardi et al., Proc. Natl. Acad. Sci. USA, 1995, 92:131-135), with the highest homology residing in the TM1-TM7 region (44%) (Figure 1). However, the VRs comprise a distinct family of receptors, which share novel sequence motifs, and are more related to one another than they are to other receptors. For example, two divergent VRs, VR1 (SEQ ID NOs. 1, 2) and VR4 (SEQ ID NOs. 7, 8), are 70% identical in TM1-TM7, and 48% identical overall.
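The identity figures quoted above are simple ratios over aligned positions. As a minimal sketch, assuming a pairwise alignment in which gaps are written as "-" and gapped columns are excluded from the denominator (the text does not state which convention was used), percent identity can be computed as follows. The function name and the two short aligned strings are hypothetical and serve only as an illustration.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identical residues over the ungapped columns of an alignment."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    # Keep only columns where neither sequence has a gap (assumed convention).
    columns = [(a, b) for a, b in zip(aligned_a, aligned_b)
               if a != "-" and b != "-"]
    identical = sum(a == b for a, b in columns)
    return 100.0 * identical / len(columns)

# Hypothetical toy alignment: 4 identical residues in 5 ungapped columns.
print(percent_identity("ACDE-F", "ACDQGF"))  # 80.0

# Fraction of residues identical in all VRs of Figure 1 (287 of 850).
print(round(100.0 * 287 / 850, 1))  # 33.8
```

By the same arithmetic, the 287 invariant residues out of 850 correspond to roughly one third of all positions.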
The VRs are unusual among GPCRs in having an extremely long N-terminal extracellular domain (Figures 1 and 2). This feature is shared by the CSR and mGluRs, and by an unrelated class of GPCRs that includes several receptors for glycoprotein hormones (Segaloff, D., et al., Oxf. Rev. Reprod. Biol., 1992, 14:141-168). Importantly, the VRs are very different from both ORs and VNRs, which are also GPCRs (Buck, L., et al., Cell, 1991, 51:127-133; Dulac, C., et al., Cell, 1995, 83:195-206). VRs share none of the characteristic sequence motifs of ORs or VNRs. In addition, the size of the N-terminal extracellular domain of VRs (557-565 amino acids) far exceeds that of ORs and VNRs (~12-28 amino acids) (Figure 2). The VRs are most variable in the N-terminal domain (25% identical residues compared to 57% in TM1-TM7). In the structurally-related mGluRs, the ligand binding site is thought to reside in the large N-terminal domain (O'Hara et al., Neuron, 1993, 11:41-52; Takahashi et al., J. Biol. Chem., 1993, 268:19341-19345). If this is also true of VRs, the accentuated diversity of the N-terminal domain may reflect an ability to recognize diverse pheromonal ligands.
Most of the VR cDNAs that we analyzed appeared to belong to one of three subfamilies of highly related molecules. For example, VR1 (SEQ ID NOs. 1, 2), VR2 (SEQ ID NOs. 3, 4), and VR3 (SEQ ID NOs. 5, 6) are very similar, as are VR4 (SEQ ID NOs. 7, 8) and VR5 (SEQ ID NOs. 9, 10), and VR6 (SEQ ID NOs. 11, 12) and VR7 (SEQ ID NOs. 13, 14) (Figure 1). Nonetheless, our results indicate that all of these cDNAs were derived from different genes. First, all cDNAs were sequenced on both strands to rule out sequencing errors. Second, the RNA used for library construction and PCR came from an inbred mouse strain (C57BL/6J), so they cannot be allelic variants. Third, the error rates of reverse transcriptase (or Taq polymerase) cannot account for the extent to which the cDNAs differ. For example, VR4 (SEQ ID NOs. 7, 8) and VR5 (SEQ ID NOs. 9, 10) cDNAs are 99% identical in nucleotide sequence, but the reverse transcriptase used to prepare them has an error rate of only 3.6 x 10^-5/bp (Ji, J., et al., Biochemistry, 1992, 31:954-958).
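The third argument can be made concrete with a short calculation. This sketch reads the reverse transcriptase error rate as 3.6 x 10^-5 per bp and assumes, purely for illustration, a cDNA length of 3000 bp (the lengths of VR4 and VR5 are not given in this passage). The constant names are hypothetical.

```python
RT_ERROR_RATE_PER_BP = 3.6e-5   # error rate cited in the text (Ji et al., 1992)
NUCLEOTIDE_IDENTITY = 0.99      # VR4 vs VR5, from the text
CDNA_LENGTH_BP = 3000           # assumed length, for illustration only

# Differences expected from reverse transcriptase errors alone.
expected_rt_errors = RT_ERROR_RATE_PER_BP * CDNA_LENGTH_BP

# Differences implied by 99% nucleotide identity over the same length.
observed_differences = (1 - NUCLEOTIDE_IDENTITY) * CDNA_LENGTH_BP

# The ratio does not depend on the assumed length.
fold_excess = observed_differences / expected_rt_errors

print(f"{expected_rt_errors:.3f} expected vs {observed_differences:.0f} observed "
      f"(~{fold_excess:.0f}-fold excess)")
```

Under these assumptions a 1% difference is on the order of several hundred times more than copying errors would produce, whatever the cDNA length.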
Variant forms of VR mRNA
Many of the VRs we characterized lacked a segment of the N-terminal domain present in other VRs. Invariably, the missing segment corresponded to a region of the human CSR encoded by a single exon, or pair of exons (Pollak, M., et al., Cell, 1993, 73:1297-1303). We also found several different VR cDNAs that contained a stretch of noncoding sequence at a site corresponding to a CSR exon-intron boundary (e.g., VR15). This suggested that the exon-intron structure of VR genes resembles that of the CSR gene, and that variant forms of VR mRNAs might be generated by differential RNA splicing.
Variant VR mRNAs could derive either from different genes, or from the same gene by alternative RNA splicing. Consistent with the latter possibility, two pairs of cDNAs that we sequenced, VR8 (SEQ ID NOs. 15, 16) and VR9 (SEQ ID NOs. 17, 18), and VR10 (SEQ ID NOs. 19, 20) and VR11 (SEQ ID NOs. 21, 22), were identical in nucleotide sequence, but were missing different segments. However, when we used RT-PCR to amplify VNO mRNA sequences encoding 5 different VRs, we obtained one major PCR product in each case, regardless of whether the RNA used was from male or female mice. In 4 cases, the size of the major product corresponded to a complete VR, even though one of the cDNAs (but not the PCR product) contained an intron (#5). In one case, in which the cDNA lacked one exon (#2), the major PCR product was even smaller, and was found to lack two exons. Although PCR products of a smaller size were also seen in these experiments, they were much less abundant. These results suggest that different VR forms derive from different genes. Thus many VR genes may be expressed pseudogenes, which either lack one or more exons, or have mutations that prevent proper RNA splicing. We cannot exclude the possibility that some variant VRs are functional, however. For example, some truncated VRs that lack transmembrane domains could conceivably be secreted pheromone-binding proteins.
Differential expression of VR genes in VNO neurons
To investigate the tissue distribution of VR gene expression, we conducted Northern blot analyses in which size fractionated polyA+ RNAs from different mouse tissues were hybridized to a mix of radiolabeled VR cDNAs. The mixed probe hybridized to VNO RNAs of ~1.9-3.7 kb, with intense hybridization to RNAs of 2.8-3.5 kb. It did not hybridize to RNAs from a variety of other tissues, including olfactory epithelium and brain. This suggested that VR genes may be expressed exclusively in the VNO.
We found two partial cDNAs that were highly related to VR cDNAs in the NCBI dbEST database, one from spleen and the other from 2-cell stage mouse embryos. However, when we hybridized the most highly related VR cDNAs (VR6 and VR7) to spleen sections, only one questionably-labeled cell was seen out of ~1.4 x 10^6 cells with one VR probe, and none was seen with the other. The EST clones might be DNA contaminants, or be due to the widespread, but low level, misexpression of tissue specific genes (Sarkar, G., et al., Science, 1989, 244:331-334); nonetheless, we cannot exclude the possibility that VR genes are expressed at a low frequency in some other tissues.
To examine the patterns of expression of different VR genes in the VNO, we conducted in situ hybridization experiments. Labeled segments of the 3' untranslated regions of three VR cDNAs were hybridized separately, or in combination, to sequential sections through the VNO. Probes prepared from Gαo and Gαi2 cDNAs were hybridized to adjacent sections to delineate the Gαo+ and Gαi2+ zones of the VNO neuroepithelium.
The Gαo and Gαi2 probes gave patterns of hybridization similar to those we had previously seen (Berghard, A., et al., J. Neurosci., 1996, 16:909-918). The Gαo probe hybridized to a wavy stripe of VNO neurons in the basal (lower) region of the VNO neuroepithelium, whereas the Gαi2 probe hybridized to an adjacent stripe of neurons in the apical (upper) part of the neuroepithelium. The waviness of the two zones appears to be caused by the periodic presence of blood vessels near the base of the epithelium (Berghard, A., et al., J. Neurosci., 1996, 16:909-918). Approximately 57% of VNs were labeled by the Gαi2 probe and 43% were labeled by the Gαo probe. The single layer of supporting cells located just beneath the epithelial surface was not labeled by either probe.
Each of the VR probes hybridized to a small percentage (2.4-5.7%) of VNs that appeared to be restricted to the basal, Gαo+ zone of the VNO neuroepithelium. Labeled neurons were scattered throughout the anterior-posterior and dorsal-ventral extent of the Gαo+ zone. Small clusters of labeled cells were sometimes seen, particularly with the VR2 probe. The mixed probe labeled a larger percentage of VNs (10.6%) that was almost equal to the sum of the percentages labeled by its individual components (10.8%). Thus different Gαo+ neurons must express different VRs.
No differences were seen in the patterns of hybridization obtained using VNOs from male and female mice, and no hybridization was observed in the nasal olfactory epithelium using either the mix of VR probes or a full-length VR cDNA probe (not shown).
Subsequent analyses of the size of the VR gene family, and the number of VR genes recognized by the VR in situ hybridization probes, allowed us to estimate the number of VR genes expressed by individual neurons (see below).
The size of the VR multigene family
To investigate the size of the VR gene family, we hybridized several different mixed VR gene probes to a mouse genomic library, using high (70°C) or low (55°C) stringency conditions. A probe prepared from the membrane spanning regions (putative exon 6) of several different cDNA clones hybridized to 59 and 98 clones per haploid genome equivalent, at high and low stringency, respectively. To obtain probes that were potentially more diverse, we amplified internal segments of putative exon 3 or 6 from genomic DNA by PCR with degenerate primers. At high stringency, these probes hybridized to 60-140 clones per haploid equivalent. These results indicate that there are as many as 140 VR genes in the mouse genome.
The VR probes that we used for in situ hybridization each labeled a small percentage of neurons. To determine how many VR genes each probe recognized, we hybridized probes prepared from the same VR cDNA segments to Southern blots of C57BL/6J mouse genomic DNA which had been digested with Eco RI or Hind III. Each probe hybridized to a small number of restriction fragments. Given the small size of the probes (350-450 bp), most of these fragments should represent at least one gene, provided that there are no introns in the region probed. Consistent with this assumption, the VR2 (SEQ ID NO. 3) probe hybridized to 7 different restriction fragments, as many as five of which could be accounted for by characterized VR cDNAs that were 91-98% identical to VR2 (SEQ ID NO. 3) in the region probed.
Given the number of genes recognized by each VR probe and the percentage of Gαo+ neurons that hybridized to each, we estimate that each VR gene may be expressed in only ~1.1-1.9% of Gαo+ VNs. Since there appear to be 60-140 VR genes in the mouse genome, this suggests that each Gαo+ VNO neuron may express only one, or at most a few, VR genes.
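The step from these two estimates to "one, or at most a few" genes per neuron can be sketched numerically. If each of N genes is expressed in a fraction f of the neurons, and all genes are expressed at similar frequencies (an assumption of this sketch), the average number of genes expressed per neuron is approximately N x f. The function name below is hypothetical; the input values are the ranges given in the text.

```python
def genes_per_neuron(n_genes: int, fraction_of_neurons_per_gene: float) -> float:
    """Average number of genes expressed per neuron, assuming every gene
    is expressed in the same fraction of neurons."""
    return n_genes * fraction_of_neurons_per_gene

# Lower bound: 60 genes, each in ~1.1% of neurons.
low = genes_per_neuron(60, 0.011)
# Upper bound: 140 genes, each in ~1.9% of neurons.
high = genes_per_neuron(140, 0.019)

print(f"{low:.2f} to {high:.2f} VR genes per neuron")  # 0.66 to 2.66
```

The resulting range, roughly 0.7 to 2.7, is consistent with each neuron expressing one or a small number of VR genes.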
Linkage of chromosomal clusters of VR and OR genes
We previously found that there are clusters of OR genes at multiple chromosomal sites in the mouse genome (Sullivan, S., et al., Proc. Natl. Acad. Sci., 1996, 93:884-888). To investigate the chromosomal locations of VR genes, we used the Jackson Laboratory Backcross DNA Mapping Panel, which allows the mapping of mouse genes using interspecies mouse crosses.
Probes prepared from the 3' untranslated regions of VR2 (SEQ ID NO. 3) or VR4 cDNAs were first hybridized to Southern blots of genomic DNAs from two mouse species, C57BL/6J and Mus spretus, which had been digested with different restriction enzymes. Eco RI digests showed a number of restriction length polymorphisms with both VR probes. The VR probes were then hybridized to Eco RI-digested DNAs from a large panel of different backcross mice ((C57BL/6J x M. spretus) x M. spretus).
The patterns of inheritance of the polymorphic fragments recognized by the two VR probes allowed us to assign chromosomal locations to approximately 9 VR genes. Using the VR4 (SEQ ID NO. 7) probe, we could follow the inheritance of 4 polymorphic restriction fragments. All of these cosegregated in the backcrosses, and mapped to the proximal end of chromosome 7 (near D7Bir5). Five restriction fragments were followed for the VR2 (SEQ ID NO. 3) probe. Again, all of the restriction fragments cosegregated, allowing us to map the VR2 (SEQ ID NO. 3) fragments to the distal end of chromosome 4 (near D4Bir1). Given the resolution of the genetic mapping, the cosegregating fragments can be no more than 3.8 cM from one another. These results indicate that VR genes are located near the ends of at least two different mouse chromosomes. They also indicate that highly related VR genes are clustered at the same chromosomal locus, as previously seen in our studies and others (Ben-Arie et al., Human Molecular Genetics, 1994, 3:229-235).
The VR4 gene subfamily appears to be closely linked to one OR gene locus (olfR5) (Sullivan, S., et al., Proc. Natl. Acad. Sci., 1996, 93:884-888). Although the VRs and ORs were mapped in different mouse crosses, the synaptotagmin-3 gene (Syt3) was mapped in both crosses, allowing an estimate of their relative positions. The OR locus mapped 15.05 cM proximal to Syt3, while the VR4 gene cluster mapped 14.89 cM proximal to Syt3 (Jackson Laboratory Mouse Genome Informatics), suggesting a close linkage between VR and OR genes at the proximal end of chromosome 7. Our previous studies indicate that multiple OR gene loci arose via a series of duplications of very large chromosomal domains that maintained linkages between OR genes and members of other gene families. These results therefore suggest that VR genes and OR genes might have been linked in a primitive ancestor. They also suggest the possibility that additional clusters of VR genes might be linked to other OR gene loci.
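The inferred proximity of the two loci follows from their distances to the shared marker Syt3. A minimal sketch of that arithmetic is given below; it simply subtracts the two map distances, which assumes that distances measured in two different crosses are directly comparable (a simplification, since recombination frequencies can differ between crosses). The constant names are hypothetical.

```python
# Map distances proximal to Syt3, as quoted in the text (in cM).
OR_LOCUS_TO_SYT3_CM = 15.05
VR4_CLUSTER_TO_SYT3_CM = 14.89

# Naive estimate of the separation between the OR locus and the VR4 cluster.
separation_cm = round(OR_LOCUS_TO_SYT3_CM - VR4_CLUSTER_TO_SYT3_CM, 2)

print(separation_cm)  # 0.16
```

An estimated separation of about 0.16 cM is well below the 3.8 cM resolution quoted for the mapping panel, so it should be read as "closely linked" rather than as a precise distance.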
Preparation of cDNA Libraries from Isolated VNO Neurons
VNOs were dissected from adult (7- to 8-week-old) male Lewis rats (Sprague-Dawley). Single-cell cDNA synthesis and amplification were performed and checked according to Dulac and Axel (Cell, 1995, 83:195-206). Southern blot analysis of single-cell cDNA was used to detect expression of tubulin, OMP, Go, and Gi2a (Dulac and Axel, Cell, 1995, 83:195-206). Eighteen cDNAs showed strong hybridization with tubulin and OMP probes, indicating that they originated from mature neurons, and were selected for further study. Cells VN3 and VN13 exhibited high levels of Go expression, whereas VN10 showed presence of Gi2a, indicating the origin of these cells from two distinct regions of the VNO neuroepithelium. A VN13 single-cell cDNA library was prepared according to Dulac and Axel (Cell, 1995, 83:195-206).
Differential Screening of Single-Cell Library
Plaque-forming units (12 x 10^3) from the VN13 library were plated at low density, and duplicate filters (Hybond N+, Amersham) were hybridized with probes generated from VN10 and VN13 single-cell cDNAs, following the procedure described in Dulac and Axel, Cell, 1995, 83:195-206. Ten phage plaques were detected that showed a positive signal unique to the VN13 probe. These plaques were purified, and the corresponding phage inserts were amplified by PCR, run on a 1.5% agarose gel, blotted onto nylon filter, and hybridized with the VN10, VN3, and VN13 single-cell cDNA probes.
Isolation and Analysis of Full-Length cDNA Clones
A 425 bp clone, Go-VN13A, present at the frequency of 0.1% in the VN13 single-cell cDNA library, was selected and in vivo excised to generate the pBlueScriptSK(-) phagemid. High stringency (65°C) screening of a cDNA library prepared from female rat VNO (Dulac and Axel, Cell, 1995, 83:195-206) with the Go-VN13A cDNA probe led to the isolation of Go-VN13B (SEQ ID NO. 49), presenting 90% sequence homology with Go-VN13A. Phages (7.2 x 10^5) of the female rat VNO library were further screened with the Go-VN13B (SEQ ID NO. 49) cDNA probe under low stringency conditions: hybridization was carried out at 55°C for 24 hr, and the filters were washed three times at 55°C for 30 min in 0.5x SSC and 0.5% SDS. A total of 75 positive phages were identified and the corresponding inserts were amplified by PCR and analyzed by Southern blot using the Go-VN13B (SEQ ID NO. 49) probe at both high (65°C) and low (55°C) stringency. This led to the identification of 22 cDNA clones with insert sizes longer than 3 kb. Among those, six distinct subfamilies were defined by absence of cross-hybridization under stringent conditions of hybridization and washing. Full-length clones (Go-VN1 to Go-VN6, SEQ ID NOs. 33, 35, 37, 39, 41, 43), each representative of a subfamily, were selected for in vivo excision and sequenced. Go-VN13C (SEQ ID NO. 47) and Go-VN13B (SEQ ID NO. 49) are identical sequences differing by a 150 bp deletion in Go-VN13C (SEQ ID NO. 47). This sequence encodes NMDQCANCPEYQYANTEKNKCIQKGVIVLSYEDPLGMALALIAFCFSAFTV (SEQ ID NO. 58) in Go-VN13B (SEQ ID NO. 49) and is replaced by an M at position 552 in Go-VN13C (SEQ ID NO. 48).
DNA Sequencing and Sequence Analysis
DNA sequencing was performed using ABI Prism dye terminator cycle ready reaction (Perkin Elmer, Norwalk, CT) according to the manufacturer's protocol. Samples were run on an ABI Prism 310 Genetic Analyzer (Perkin Elmer, Norwalk, CT). Sequence homologies were determined using the BLAST system (NIH network service). Pairwise and ClustalW alignments (BLOSUM30 matrix setting) as well as Kyte-Doolittle hydropathic analysis were obtained with the MacVector sequence analysis software (Oxford Molecular Group).
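For readers unfamiliar with Kyte-Doolittle hydropathic analysis, the following is a minimal sketch of the sliding-window calculation. It is not the MacVector implementation used in this work; the 19-residue window and the function name are assumptions of this sketch, and the scale values are the standard published Kyte-Doolittle hydropathy indices. The example input is the 51-residue segment quoted above as SEQ ID NO. 58.

```python
# Standard Kyte-Doolittle hydropathy index for the 20 amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

def hydropathy_profile(sequence: str, window: int = 19) -> list[float]:
    """Mean hydropathy of every full window along the sequence.

    A window of 19 is an assumed value commonly used when looking for
    membrane-spanning segments; it is not taken from the source text.
    """
    scores = [KYTE_DOOLITTLE[residue] for residue in sequence]
    return [sum(scores[i:i + window]) / window
            for i in range(len(scores) - window + 1)]

# Segment present in Go-VN13B and absent from Go-VN13C (SEQ ID NO. 58).
SEGMENT = "NMDQCANCPEYQYANTEKNKCIQKGVIVLSYEDPLGMALALIAFCFSAFTV"

profile = hydropathy_profile(SEGMENT)
print(f"first window: {profile[0]:.2f}, last window: {profile[-1]:.2f}")
```

In this sketch the first window of the segment scores as hydrophilic and the last as strongly hydrophobic, the kind of transition a hydropathy plot uses to suggest where a membrane-spanning stretch may begin.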
In Situ Hybridization Analysis
In situ hybridization was performed as described elsewhere (Schaeren-Wiemers, N., et al., Histochemistry, 1993, 100:431-440). VNOs were dissected from adult male (8- to 9-week-old), adult female (9- to 11-week-old), and young (1-week-old) rats. Tissues were embedded in Tissue-Tek OCT. Antisense and sense digoxigenin-labeled probes were generated from the full-length cDNAs encoding Go, Gi2a, Go-VN13B (SEQ ID NO. 49), and Go-VN1 to Go-VN6 (SEQ ID NOs. 33, 35, 37, 39, 41, 43), as well as from the 3' untranslated regions of the Go-VN1 to Go-VN6 clones.
Imaging Processing and Statistical Analysis
Digital photographs were captured with a Leitz DMRB microscope (Leica) coupled to a ProgRes3012 digital camera (Kontron Electronic) and further processed with the Photoshop (Adobe System) and Canvas (Deneba) software for Macintosh. The relative positions of cells exhibiting a positive signal by in situ hybridization were measured along the basal-apical axis using the NIH Image analysis software. The number of cells in hemiconcentric sections of 10% along this axis from the basal (value = 0) to the apical (value = 100) boundaries was determined. Average data for Go-VN1 and Go-VN3 to Go-VN6 were obtained from six to eight VNO sections, corresponding to four individuals analyzed in two independent experiments. For Go-VN2, 14 VNO sections, corresponding to ten individuals and four independent experiments, were analyzed for each sex.
Southern Blot Analysis of Rat Genomic DNA and Screening of Rat and Human Genomic Libraries
Genomic DNA, prepared from Lewis rat (Sprague-Dawley) liver, was digested with the restriction enzymes EcoRI and BamHI, size fractionated on 0.8% agarose gels, and blotted onto nylon membrane (Sambrook, J., et al., Molecular Cloning: A Laboratory Manual, Second Edition, Cold Spring Harbor Laboratory Press, 1989). Membranes were cross-linked under UV light, hybridized overnight at both high (68°C) and low (55°C) stringency in hybridization buffer, and washed as described above. 32P-labeled probes were generated by random priming, using the following DNA templates: EcoRI-EcoRV, NotI-NsiI, EcoRI-SalI, PstI-NdeI, XbaI-HincII, and EcoRI-NsiI fragments of Go-VN1 to Go-VN6 (SEQ ID NOs. 33, 35, 37, 39, 41, 43), respectively; a full-length (425 bp) insert of Go-VN13A; and a cDNA fragment including the seven transmembrane domains of Go-VN13B (SEQ ID NO. 49). Plaque-forming units (3 x 10^5) from rat and human genomic libraries (Stratagene, La Jolla, CA) were screened at low stringency (55°C) using a mix of 32P-labeled probes prepared from fragments of Go-VN1 to Go-VN6 (SEQ ID NOs. 33, 35, 37, 39, 41, 43) encompassing the transmembrane domains 2 to 7.
The VNO Neuroepithelium Expresses Two Independent Families of Pheromone Receptors
We hypothesized the existence of two distinct families of genes encoding pheromone receptors that are selectively colocalized with either the Go protein in the basal half of the vomeronasal neuroepithelium or with the Gi2a protein in the apical region. For simplicity of nomenclature, and with the understanding that the cosegregation of distinct G-protein subunits with independent families of pheromone receptors is consistent with, but does not demonstrate, a functional link, the family of genes encoding putative pheromone receptors that we have previously identified and that colocalize with Gi2a will be named Gi2a-VN, whereas the novel family of receptors coexpressed with Go and described in this study will be named Go-VN. In the absence of information concerning the nature of the Go-VN receptor molecules, we reiterated the cloning strategy that allowed us to identify a family of putative pheromone receptor genes expressed by Gi2a+ neurons (Dulac and Axel, Cell, 1995, 83:195-206). This strategy was based on the assumption that individual neurons within the VNO are likely to express only one pheromone receptor gene and that transcripts encoding a given receptor represent between 0.1% and 1% of a single-cell mRNA. Differential screening of cDNA libraries constructed from single VNO neurons takes advantage of the fact that different cells express different receptors and thus provides an experimental solution to the problem of detecting a specific transcript in a heterogeneous population of neurons. In this attempt, we expected that differential screening of a cDNA library prepared from an isolated Go+, Gi2a- VNO neuron would permit the isolation of a class of pheromone receptor genes distinct from the Gi2a-VN family of receptor genes.
A cDNA library prepared from a Go+ neuron (VN13) was differentially hybridized with 32P-labeled probes prepared from VN13 and from a second VNO neuron cDNA (VN10). A 425 bp cDNA (Go-VN13A) present at a frequency of 0.1% in the VN13 cDNA library showed selective hybridization with the VN13 cell probe. Two cDNAs of longer size, Go-VN13B (SEQ ID NO. 49) and Go-VN13C (SEQ ID NO. 47), were subsequently isolated from a cDNA library prepared from dissected adult VNOs and showed 90% sequence similarity with Go-VN13A. Hybridization to VNO cross-sections with digoxigenin-labeled antisense RNA probe showed that expression of these transcripts is restricted to a small subpopulation of VNO neurons in a location consistent with the region of Go expression of the neuroepithelium. The sequence of Go-VN13B (SEQ ID NO. 49) reveals a partial open reading frame that includes seven hydrophobic stretches of 20 amino acids in length. The Go-VN13B (SEQ ID NO. 49) sequence does not share any resemblance with the odorant receptor genes nor with the family of putative pheromone receptor genes previously identified (see below). In addition, hybridization of the Go-VN13B DNA probe to genomic DNA identified two discrete bands at high stringency and 13 or more at lower stringency, revealing the existence of a family of closely related genes in the rat genome.
Taken together, these data indicate that we have isolated a novel multigene family encoding seven transmembrane domain receptors and expressed by subsets of VNO neurons from the basal half of the neuroepithelium.
Sequences of a New Family of VNO Receptors
Recombinant phages from a VNO cDNA library were screened at low stringency with the Go-VN13B (SEQ ID NO. 49) DNA probe. Six distinct gene subfamilies were isolated that showed no cross-hybridization under stringent conditions of hybridization and washing. cDNAs Go-VN1 to Go-VN6, each representative of a subfamily, were fully sequenced (SEQ ID NOs. 33, 35, 37, 39, 41 and 43).
In Go-VN1 to Go-VN5 cDNAs (SEQ ID NOs. 33, 35, 37, 39 and 41), the first methionine of the open reading frame was tentatively chosen as a start for protein translation, revealing large open reading frames ranging from 548 to 866 amino acids. A frame shift in the Go-VN6 (SEQ ID NO. 44) sequence (amino acid 532; indicated by slash bar in Fig. 3) indicated that this transcript is unable to generate a functional protein.
Deduced Amino Acid Sequences of cDNAs from the Go-VN Family of Pheromone Receptors
The deduced amino acid sequences of eight cDNAs belonging to the Go-VN family of putative pheromone receptors are shown in Figure 3. The predicted positions of the seven transmembrane domains are also indicated (I-VII). Amino acids common to at least five cDNAs are shaded. Amino acids common to the rat mGluR1 and Ca2+-sensing receptors are indicated by a star.
Hydropathy analysis of the predicted Go-VN proteins with the Kyte-Doolittle algorithm identified a large hydrophilic N-terminal domain that ranges in size from 274 amino acids in Go-VN1 (SEQ ID NO. 34) to 595 in Go-VN4 (SEQ ID NO. 40). This is preceded in cDNAs Go-VN4 (SEQ ID NO. 40), Go-VN7 (SEQ ID NO. 46), and Go-VN13C (SEQ ID NO. 50) by an initial hydrophobic 21 amino acid segment characteristic of eukaryotic signal sequences. A cluster of seven hydrophobic regions representing potential membrane-spanning helices and typical of the G protein-coupled receptor superfamily is followed by a short hydrophilic sequence that indicates a potential intracytoplasmic C-terminal domain. A database search indicated the presence of sequence motifs common to Ca2+-sensing and metabotropic glutamate (mGluR) receptors (Houamed, K., et al., Science, 1991, 252:1318-1321; Masu, M., et al., Nature, 1991, 349:760-765; Brown, E., et al., Nature, 1993, 366:575-580; Pollak, M., et al., Cell, 1993, 75:1297-1303). Pairwise sequence alignments reveal 18% to 23% sequence identity between the rat Ca2+-sensing receptor and the most distant (Go-VN3, SEQ ID NOs. 37, 38) and the closest (Go-VN1, SEQ ID NOs. 33, 34) Go-VN sequences, respectively. Sequences of rat mGluR1 and Go-VN cDNAs appear more distantly related. Several localized regions showed a more pronounced degree of similarity, including a cysteine-rich sequence just preceding the first transmembrane domain (amino acids 206 to 260 in Go-VN1, SEQ ID NO. 34), the predicted transmembrane domains 2 to 7 with surrounding cytoplasmic and extracellular loops, and the relative position of 20 cysteines. The N-terminal and first transmembrane domains show little homology. In mGluR and Ca2+-sensing receptors, the second intracellular loop is involved in providing specificity for G-protein coupling (Gomeza, J., et al., J. Biol. Chem., 1996, 271:2199-2205), enabling different classes of mGluR receptors to activate phospholipase C or to inhibit adenylyl cyclase. In Go-VN, this domain is rich in basic residues, as expected for potential G-protein coupling, and shows closer resemblance to the class II and III mGluRs that were shown to couple to Go and Gi subunits. Overall, the six Go-VN sequences share between 42% and 75% sequence identity. Regions of Go-VN proteins downstream of transmembrane domain 2 are nearly identical in all VNO receptor sequences.
In contrast, N-terminal extracellular regions and first transmembrane domains are quite divergent.
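The identity percentages quoted above come from pairwise alignments. As a minimal sketch of how such a figure can be computed from two already-aligned sequences, the Python function below counts matching residues over the columns where neither sequence has a gap. The function name, the gap character, and the choice of denominator are assumptions made for illustration (the specification does not say which alignment program or identity definition was used), and the two fragments are made-up toy strings, not receptor sequences.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity over aligned columns in which neither sequence has a gap.

    Both inputs must already be aligned (equal length, gaps marked with `gap`).
    Other definitions divide by the full alignment length or by the shorter
    sequence; the denominator used here is an assumption for this sketch.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for res_a, res_b in zip(aligned_a, aligned_b):
        if res_a == gap or res_b == gap:
            continue  # skip columns that contain a gap
        compared += 1
        if res_a == res_b:
            matches += 1
    if compared == 0:
        return 0.0
    return 100.0 * matches / compared


# Hypothetical aligned fragments, for illustration only.
toy_a = "MKQLC-AFTISLL"
toy_b = "MKQLCTTFTI-LL"
print(f"{percent_identity(toy_a, toy_b):.1f}% identity")
```

Because the result depends on the alignment and on the denominator, values computed this way are only comparable when the same conventions are used for every pair.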
Anomalies in Go-VN cDNA Sequences: Two unusual features were observed in the sequence of some Go-VN cDNAs. In Go-VN1 (SEQ ID NO. 33) and Go-VN3 (SEQ ID NO. 37) cDNAs, stretches of open reading frame can be found in the 5' extremity of the cDNAs that generate polypeptide sequences of 310 and 152 amino acids, respectively, which are interrupted by a frameshift in Go-VN1 and by an insertion of 500 nucleotides in Go-VN3. The prospective receptor protein sequences indicated for Go-VN1 (SEQ ID NO. 33) and Go-VN3 (SEQ ID NO. 37) (Fig. 3) start at the next available methionine and are therefore significantly shorter than those of other receptor cDNAs.
Go-VN7 (SEQ ID NO. 45) and Go-VN13C (SEQ ID NO. 47) cDNAs show a similar deletion of 150 bp located at the exact same position in the sequence. Strikingly, the 150 bp deletion does not alter the open reading frame but generates a gap that encompasses 34 amino acids upstream of the first transmembrane domain and most of the first transmembrane domain itself.
Hydropathy analysis of Go-VN7 (SEQ ID NO. 46) and Go-VN13C (SEQ ID NO. 48) protein sequences detects only a seven to eight amino acid long hydrophobic stretch that might not be long enough to replace the deleted transmembrane domain 1 and allow the appropriate folding of the protein. Except for the 150 bp gap, sequences of Go-VN13B (SEQ ID NO. 50) and Go-VN13C (SEQ ID NO. 48) are identical. This raises the question as to whether both transcripts might originate from alternative splicing of the same gene. Alternatively, they might be transcribed from independent genes that evolved from recent duplication and deletion events.
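The hydropathy analyses described above rely on the Kyte-Doolittle scale. The following Python sketch shows the general form of such an analysis: each residue is assigned its Kyte-Doolittle hydropathy value, the values are averaged over a sliding window, and runs of windows whose mean exceeds a cutoff are reported as candidate membrane-spanning regions. The 19-residue window and 1.6 cutoff are commonly used settings for this scale and are assumptions here, since the specification does not state the parameters that were used; the function names and the toy sequence are likewise illustrative.

```python
# Kyte-Doolittle hydropathy values, keyed by one-letter amino acid code.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(seq: str, window: int = 19) -> list[float]:
    """Mean hydropathy of every full window along the sequence."""
    # Unknown symbols (for example X) are scored 0.0; this is a sketch choice.
    scores = [KD.get(res, 0.0) for res in seq.upper()]
    if window < 1 or window > len(scores):
        return []
    return [
        sum(scores[i:i + window]) / window
        for i in range(len(scores) - window + 1)
    ]


def hydrophobic_regions(seq: str, window: int = 19, cutoff: float = 1.6):
    """Runs of window centers (0-based, inclusive) whose mean is >= cutoff."""
    half = window // 2
    regions = []
    run_start = None
    last_center = None
    for i, mean in enumerate(hydropathy_profile(seq, window)):
        center = i + half
        if mean >= cutoff:
            if run_start is None:
                run_start = center
            last_center = center
        elif run_start is not None:
            regions.append((run_start, last_center))
            run_start = None
    if run_start is not None:
        regions.append((run_start, last_center))
    return regions


# Made-up toy sequence: a 20-residue leucine stretch flanked by aspartates.
toy = "D" * 10 + "L" * 20 + "D" * 10
print(hydrophobic_regions(toy))
```

With these settings a hydrophobic stretch of only seven or eight residues, such as the one reported for Go-VN7 and Go-VN13C, would generally not produce a window mean above the cutoff, which is consistent with the concern raised in the text about replacing transmembrane domain 1.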
Size of the Go-VN Family of Genes
We investigated the size of the Go-VN family of receptors by hybridizing 32P-labeled cDNA probes prepared from regions spanning the most divergent N-terminal half of the receptor protein to rat genomic DNA. Individual probes identify two to four discrete bands under stringent conditions of hybridization and washing. Under conditions of reduced stringency, each of the individual probes generates a unique pattern of 12 to 20 bands, providing a direct illustration of the existence of a very large family of related genes.
A direct estimate of the size of the Go-VN receptor gene family was obtained by low stringency screening of a rat genomic library. PCR amplification on genomic DNA had indicated that receptor genes are devoid of introns in the region encompassing transmembrane domains 2 to 7, enabling us to deduce directly the number of genes present in the rat genome. A mix of 32P-labeled DNA probes prepared from the six Go-VN cDNA fragments identified 110 positive clones per haploid genome, indicating that the family of Go-VN receptors may consist of 100 genes.
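A count of positive clones "per haploid genome" is obtained by normalizing the raw number of positives to the number of genome equivalents represented in the screened library. The Python sketch below illustrates that normalization. All input values are hypothetical placeholders chosen only to make the arithmetic easy to follow; the specification reports the normalized result (110) but not the number of clones screened, the insert size, or the raw count, and the function name is an assumption.

```python
def positives_per_haploid_genome(
    positive_clones: int,
    clones_screened: int,
    mean_insert_bp: int,
    haploid_genome_bp: int,
) -> float:
    """Normalize a raw count of positive clones to one haploid genome."""
    # Genome equivalents covered by the screened portion of the library.
    genome_equivalents = clones_screened * mean_insert_bp / haploid_genome_bp
    return positive_clones / genome_equivalents


# Hypothetical inputs (NOT taken from the specification).
estimate = positives_per_haploid_genome(
    positive_clones=330,              # placeholder raw count
    clones_screened=600_000,          # placeholder library size
    mean_insert_bp=15_000,            # placeholder mean insert length
    haploid_genome_bp=3_000_000_000,  # placeholder genome size
)
print(f"{estimate:.0f} positive clones per haploid genome")
```

Because the region covered by the probes lacks introns, each gene is expected to yield a contiguous hybridizing segment, so the normalized clone count can be read as an approximate gene count.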
Expression Pattern of Go-VN Receptors
The pattern of expression of the Go-VN receptor genes was examined by in situ hybridization with digoxigenin-labeled RNA antisense probes. No signal was observed after hybridizing the mix of Go-VN1 to Go-VN6 (SEQ ID NOs. 33, 35, 37, 39, 41 and 43) receptor probes to sections of muscle, testis, brain, or whole head. The adult olfactory epithelium was also consistently negative, although rare positive cells (one to three cells per section) were observed in the olfactory neuroepithelium of E19 rat embryo. In contrast, strong signals were observed when antisense receptor RNA probes were hybridized to VNO neuroepithelium. In adults, each one of the Go-VN probes detects small subsets of VNO sensory neurons. When hybridization and washing were performed at lower temperature, the number of faintly labeled neurons increased, revealing cross-hybridization to more distant receptor genes.
Under high stringency conditions, cDNA clones Go-VN1 to Go-VN6 label 1.9%, 3.6%, 6.1%, 0.4%, 3.5%, and 1.3% of the VNO sensory neurons, respectively. Under the same experimental conditions, the mix of all six Go-VN RNA probes labels 19% of the cells. This number is similar to the sum of labeled neurons detected with the six individual Go-VN probes (17%), indicating that probes representing the six receptor subfamilies recognize distinct populations of VNO sensory neurons.
Spatial Distribution of Go-VN Receptor Transcripts
Positive neurons identified with each of the Go-VN probes were randomly distributed along the anteroposterior and dorso-ventral axes of the VNO neuroepithelium. Most RNA
probes recognize cells that are preferentially localized in the most basal two-thirds of the neuroepithelium corresponding to the zone of Go expression. However, careful examination of adjacent cross-sections of vomeronasal neuroepithelium labeled with each of the Go-VN
probes reveals a well-organized spatial distribution of receptor expression. Different receptors appear preferentially localized in radial zones that define a series of hemiconcentric rings of distinct diameters. This pattern is observed along the entire length of the VNO and is conserved in all animals analyzed. The Go-VN3 (SEQ ID NO. 37) probe, for example, recognizes a subset of neurons that are confined to the most basal third of the VNO neuroepithelium. In contrast, the Go-VN1 (SEQ ID NO. 33), Go-VN4 (SEQ ID NO. 39), and Go-VN5 (SEQ ID NO. 41) RNA probes identify cells restricted to a hemiconcentric zone immediately apical to the area of Go-VN3 expression, whereas Go-VN2 identifies cells apposed to the apical layer of supporting cells. Go-VN6 in turn is found only in sparse cells immediately apposed to the basal membrane.
This is best seen in a statistical representation of Go-VN receptor localization collected from VNO sections and multiple animals that shows a striking conservation of these patterns. Thus, transcription of Go-VN cDNAs appears restricted to one of three circumscribed areas of the VNO
neuroepithelium in a manner quite reminiscent of the odorant receptor gene expression in four zones of the MOE (Ressler, K., et al., Cell, 1993, 73:597-609 ; Vassar, R., et al., Cell, 1993, 74:309-318). Although Go-VN3 (SEQ ID NO. 37) and Go-VN6 (SEQ ID NO. 43) transcripts show a clear segregation in the most basal region of the VNO neuroepithelium, the sequence anomalies found in both transcripts leave the functionality of this area of the neuroepithelium as an open question.
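The argument made earlier in this section, that the six probes label distinct neuronal populations, rests on simple arithmetic: if the populations did not overlap, the fraction labeled by the probe mix should be close to the sum of the individual fractions. The short Python sketch below reproduces that comparison using the percentages reported in the text; the variable names are illustrative.

```python
# Percent of VNO sensory neurons labeled by each probe (values from the text).
labeled_percent = {
    "Go-VN1": 1.9,
    "Go-VN2": 3.6,
    "Go-VN3": 6.1,
    "Go-VN4": 0.4,
    "Go-VN5": 3.5,
    "Go-VN6": 1.3,
}
mix_percent = 19.0  # percent labeled by the mix of all six probes

sum_individual = sum(labeled_percent.values())
print(f"sum of individual probes: {sum_individual:.1f}%")
print(f"mix of six probes:        {mix_percent:.1f}%")
print(f"difference:               {mix_percent - sum_individual:.1f} points")
```

Substantial overlap between the populations would make the mix label noticeably fewer cells than the sum of the individual probes; here the two figures differ by only about two percentage points.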
Sexual Dimorphism in Receptor Distribution and Age-Related Changes
To identify potential sexual dimorphism in Go-VN receptor expression, we systematically hybridized each probe to sections originating from adult male and female rat VNOs. All receptors were equally distributed in males and females with the striking exception of Go-VN2 (SEQ ID NO. 35). In females, Go-VN2 appears expressed in a large and centrally located region comprising one-third of the neuroepithelium. In sharp contrast, the same probe recognizes in males a cohort of cells in the most apical side of the neuroepithelium, closely apposed to the VNO lumen, and most likely intermingled with Gi2a VNO sensory neurons. Such a difference in the Go-VN2 expression pattern in males and females might result from the expression of the same receptor gene in a different zone of the VNO epithelium or from a differential expression of two distinct but closely related genes of the Go-VN2 subfamily. In females, Go-VN2 generates a very intense hybridization signal on most positive neurons and a fainter staining on a second set of labeled cells. The population of faintly labeled cells was never detected in males, indicating the existence of a female-specific neuronal subpopulation expressing either a lower level of the Go-VN2 transcript or a female-specific receptor significantly different but still cross-hybridizing to the Go-VN2 probe. We followed the emergence of receptor expression and of the VNO zonal organization during development and postnatal stages preceding puberty.
Go-VN receptor expression is first detected in the VNO of E14 embryos. No significant difference is observed in the onset of expression of Gi2a VN and Go-VN classes of receptor genes. In agreement with the mouse data of Berghard and Buck (1996), segregation of Gi2a and Go expression in the apical and basal areas of VNO neuroepithelium, respectively, is not apparent in the embryo and in 1-week-old animals. Instead, Gi2a+ cells appear randomly distributed in large clusters over the whole thickness of the neuroepithelium, intermingled with Go cells. At 4 weeks after birth, however, Gi2a cells appear clearly localized in the apex of the epithelium. Similarly, in situ hybridization experiments with mixes of Go-VN and Gi2a VN receptor probes on sections of the VNOs dissected from late embryos and 1-week-old animals show that the two cell populations are still intermingled at early postnatal stages. We observed that the zonal distribution of the two families of receptors slowly emerges during sexual maturation to reach the spatial distribution observed in adults. Preliminary data indicate that the sexually dimorphic expression pattern of Go-VN2 is undetectable at 6 weeks after birth. Thus, in contrast to the zones of olfactory receptor gene expression, which are already present in the olfactory epithelium at the earliest stages of receptor gene expression in the embryo (Sullivan, S., et al., Neuron, 1995, 15:779-789), the spatial organization of the VNO
neuroepithelium as detected by G-protein and receptor gene expression emerges only in a late postnatal period and reaches its definitive pattern at sexual maturity.
Expression of Go-VN Receptors Is Restricted to Go+ VNO Neurons
The expression of some of the Go-VN receptors in neurons lining the VNO lumen in an area mainly occupied by Gi2a+ cells raises the obvious question as to whether the expression of this family of genes is strictly restricted to Go+ VNO neurons. Single-cell cDNA prepared from 23 individual VNO neurons was analyzed by Southern blots with probes representing the six divergent subfamilies of Go-VN receptors and was PCR amplified with degenerate primers based on conserved motifs between Go-VN receptor sequences. Both approaches confirmed that none of the 19 cell cDNAs prepared from Gi2a+ neurons contained any sequence of the Go-VN receptor family. In contrast, all four cDNAs generated from Go+ cells contained a sequence related to the Go-VN receptors. PCR products generated with the same degenerate primers and obtained from the four Go+ cells were subcloned and sequenced. For each single-cell cDNA, the insert sequences from ten independent colonies were found to be identical. This set of data strongly suggests that Go-VN receptor genes are not expressed by Gi2a+ neurons and constitutes preliminary evidence for the expression of only one Go-VN receptor gene per neuron.
Those skilled in the art will recognize, or be able to ascertain using no more than routine experimentation, many equivalents to the specific embodiments of the invention described herein. Such equivalents are intended to be encompassed by the following claims. All references disclosed herein are incorporated by reference in their entirety.
A Sequence Listing is presented below and is followed by what is claimed.
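The feature tables in the listing give the coding-sequence coordinates of each cDNA, and the paired protein entries give the polypeptide length, so the two can be cross-checked. The Python sketch below does this for the first three cDNA/protein pairs (VR1, VR2, VR3), using the coordinates and lengths stated in the listing. It assumes the coordinates are 1-based and inclusive; the function and variable names are illustrative.

```python
def cds_codons(start: int, end: int) -> int:
    """Number of codons in a coding region given 1-based inclusive coordinates."""
    length = end - start + 1
    if length % 3 != 0:
        raise ValueError(f"CDS length {length} is not a multiple of 3")
    return length // 3


# name: (CDS start, CDS end, stated protein length), as given in the listing
# for SEQ ID NOs 1/2, 3/4 and 5/6.
features = {
    "VR1": (57, 2606, 850),
    "VR2": (86, 2509, 808),
    "VR3": (1, 2409, 803),
}

for name, (start, end, protein_len) in features.items():
    codons = cds_codons(start, end)
    status = "consistent" if codons == protein_len else "MISMATCH"
    print(f"{name}: {codons} codons vs {protein_len} residues -> {status}")
```

Under that assumption the codon count equals the stated protein length in all three cases, which suggests that the listed coding locations do not include the stop codon.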
SEQUENCE LISTING
(1) GENERAL INFORMATION
(i) APPLICANT: PRESIDENT AND FELLOWS OF HARVARD COLLEGE
(ii) TITLE OF THE INVENTION: NOVEL PHEROMONE RECEPTORS
(iii) NUMBER OF SEQUENCES: 92
(iv) CORRESPONDENCE ADDRESS:
(A) ADDRESSEE: Wolf, Greenfield & Sacks, P.C.
(B) STREET: 600 Atlantic Avenue
(C) CITY: Boston
(D) STATE: MA
(E) COUNTRY: U.S.A.
(F) ZIP: 02210-2211
(v) COMPUTER READABLE FORM:
(A) MEDIUM TYPE: Diskette
(B) COMPUTER: IBM Compatible
(C) OPERATING SYSTEM: DOS
(D) SOFTWARE: FastSEQ for Windows Version 2.0
(vi) CURRENT APPLICATION DATA:
(A) APPLICATION NUMBER:
(B) FILING DATE:
(C) CLASSIFICATION:
(vii) PRIOR APPLICATION DATA:
(A) APPLICATION NUMBER: 60/051,284
(B) FILING DATE: 30-JUN-1997
(viii) ATTORNEY/AGENT INFORMATION:
(A) NAME: Plumer, Elizabeth R.
(B) REGISTRATION NUMBER: 36,637
(C) REFERENCE/DOCKET NUMBER: H0498/7074
(ix) TELECOMMUNICATION INFORMATION:
(A) TELEPHONE: 617-720-3500
(B) TELEFAX: 617-720-2441
(C) TELEX
(2) INFORMATION FOR SEQ ID NO:1:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 3080 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: cDNA
(ix) FEATURE:
(A) NAME/KEY: Coding Sequence
(B) LOCATION: 57...2606
(D) OTHER INFORMATION: VR1
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:1:
Met Lys Gln Leu Cys Ala Phe Thr Ile Ser Leu Leu Phe Leu Lys Phe Ser Leu Ile Leu Cys Cys Leu Thr Glu Pro Ser Cys Phe Trp Arg Ile Arg Asn Ser Glu Asp Ser Asp Gly Asp Leu Gln Arg Glu Cys His Phe Tyr Leu Trp Lys Thr Asp Glu Pro Ile Glu Asp Ser Phe Tyr Asn Tyr Asp Leu Ser Phe Arg Ile Ala Ala Ser Glu Tyr Glu Phe Leu Leu Val Met Phe Phe Ala Ile Asp Glu Ile Asn Arg Asn Pro Tyr Leu Leu Pro Asn Ile Thr Leu Met Phe Ser Phe Ile Gly Gly Asn Cys Gln Asp Leu Leu ', Arg Val Met Asp Gln Ala Tyr Thr Gln Ile Asn Gly His Met Asn Phe TAT TGT TTA TGT
GCC
ATA
GGT
Val Asn Phe Tyr Asp AspSerCys IleGly Leu Thr Tyr Cys Leu Ala TCA AAA TCC ATG
Gly Pro Trp Thr Leu LysLeuAla HisSer Ser Met Ser Lys Ser Met ', 150 155 160 GTT TTT CCA CTA
Pro Leu Phe Gly Phe AsnProAsn ArgAsp His Asp Val Phe Pro Leu ', CGG CTG CAT CAT GTA GCCCCCAAG ACACAT TTG TCC 635 CCC GTC CAG GAC
Arg Leu His His Val AlaProLys ThrHis Leu Ser Pro Val Gln Asp ATG TCC ATG TGG
His Gly Val Leu Phe HisPheArg ThrTrp Ile Gly Met Ser Met Trp ATC GAT GAC TTT
Leu Val Ser Asp Gln GlyIleGln LeuSer Asp Leu Ile Asp Asp Phe GAA CAA CAT GCT
Arg Glu Ser Arg Gly IleCysLeu PheVal Asn Met Glu Gln His Ala GAA ATG ATA GCT
Ile Pro Asn Gln Tyr MetThrArg ThrIle Tyr Asp.
Glu Met Ile Ala ATG AAG GTT TAT
Lys His Ile Thr Ser Ser Ala Val Ile Ile Gly Glu Met Lys Val Tyr ACT TTT AGA GAG
Met Asn Ser Leu Glu Ala Ser Arg Trp Glu Leu Gly Thr Phe Arg Glu ATC TCA CAA ATC
Ala Arg Arg Trp Ile Thr Thr Trp Asp Val Thr Asn Ile Ser Gln Ile TTC TTC CAT ACT
Lys Lys Asp Thr Leu Asn Leu Gly Ile Ile Phe Glu Phe Phe His Thr TTT TTA AAT CAA
His His Arg Glu Ile Pro Lys Lys Phe Met Thr Met Phe Leu Asn Gln AAA ATT TCT TTG
Asn Thr Ala Tyr Pro Val Asp His Thr Ile Glu Trp Lys Ile Ser Leu AAT AAG AAC ATG
Asn Tyr Phe Cys Ser Ile Ser Ser Ile Arg His His Asn Lys Asn Met AAC TGG ACA AAC
Ile Thr Phe Asn Thr Leu Glu Ser Leu His Tyr Asp Asn Trp Thr Asn AGT AAT TTG GTT
Val Ala Met Asp Glu Gly Tyr Tyr Asn Ala Tyr Ala Ser Asn Leu Val ACC ATT TTT GAG
Val Ala His Tyr His Glu Tyr Gln Gln Val Ser Gln Thr Ile Phe Glu AAA TTC ACT CAG
Lys Lys Ala Pro Lys Arg Tyr Ala Cys Gln Val Ser Lys Phe Thr Gln AAA ACG AAC GAA
Ser Leu Met Thr Arg Val Phe Pro Val Gly Leu Val Lys Thr Asn Glu CAT TGT ACA ATT
Asn Met Lys Arg Glu Asn Gln Glu Tyr Asp Phe Ile His Cys Thr Ile TTT GGA TTA ATA
Ile Trp Asn Pro Gln Gly Leu Lys Val Lys Gly Ser Phe Gly Leu Ile TGT CAA AAA TCT
Tyr Leu Pro Phe Pro Gln Arg Leu His Ile Asp Asp Cys Gln Lys Ser GCC TCA CCT TCC
Leu Glu Trp Lys Gly Gly Thr Gln Val Pro Ser Val Ala Ser Pro Ser Cys Ser Val Ala Cys Thr Ala Gly Phe Arg Lys Ile Tyr Gln Lys Glu Thr Ala Asp Cys Cys Phe Asp Cys Val Gln Cys Pro Glu Asn Glu Ile ~ TCC AAC GAA ACA GAT ATG GAA CAG TGT GTG AGG TGT CCA GAT GAT AAG 1739 _ Ser Asn Glu Thr Asp Met Glu Gln Cys Val Arg Cys Pro Asp Asp Lys Tyr AlaAsn IleGluGln ThrHisCysLeu SerArgAla ValSer Phe Leu AlaTyr GluAspSer LeuGlyMetAla LeuGlyCys MetAla Leu Ser PheSer AlaIleThr IleLeuIleLeu ValThrPhe ValLys Tyr Lys AspThr ProThrVal LysAlaAsnAsn ArgIleLeu SerTyr Ile Leu LeuIle SerLeuVal PheCysPheLeu CysSerLeu LeuPhe Ile Gly ProPro AspGlnVal ThrCysIlePhe GlnGlnThr ThrPhe Gly Val LeuPhe ThrValSer ValSerThrVal LeuAlaLys ThrIle Thr Val ValMet AlaPheLys LeuThrThrPro GlyArgArg MetArg Gly Met MetMet ThrGlyAla ProLysLeuVal IleProIle CysThr Leu Ile GlnLeu ValLeuCys GlyIleTrpLeu ValThrSer ProPro Phe Ile AspArg AspIleGln SerGluHisGly LysIleVal IleLeu Cys Asn LysGly SerValIle AlaPheHisVal ValLeuGly TyrLeu Gly Ser LeuAla LeuGlySer PheThrLeuAla PheLeuAla ArgAsn Leu .
ProAsp ThrPheAsn Glu Lys PheLeuThr PheSerMet LeuVal Ala PheCys SerValTrp IleThrPhe LeuProVal TyrHisSer ThrArg GlyArg ValMetVal ValValGlu ValPheSer IleLeuAla SerSer AlaGly LeuLeuMet CysIlePhe ValProLys CysTyrVal IleLeu IleArg ProAspSer AsnPheIle LysAsnHis LysGlyLys LeuLeu TATTGAAACTTTC GATATTCAAC TTATCTTATT
ATGGTATGAA CTTCAT
AATGTTAGAT
Tyr
(2) INFORMATION FOR SEQ ID NO:2:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 850 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: protein
(v) FRAGMENT TYPE: internal
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:2:
Met Lys Gln Leu Cys Ala Phe Thr Ile Ser Leu Leu Phe Leu Lys Phe Ser Leu Ile Leu Cys Cys Leu Thr Glu Pro Ser Cys Phe Trp Arg Ile Arg Asn Ser Glu Asp Ser Asp Gly Asp Leu Gln Arg Glu Cys His Phe Tyr Leu Trp Lys Thr Asp Glu Pro Ile Glu Asp Ser Phe Tyr Asn Tyr Asp Leu Ser Phe Arg Ile Ala Ala Ser Glu Tyr Glu Phe Leu Leu Val Met Phe Phe Ala Ile Asp Glu Ile Asn Arg Asn Pro Tyr Leu Leu Pro Asn Ile Thr Leu Met Phe Ser Phe Ile Gly Gly Asn Cys Gln Asp Leu Leu Arg Val Met Asp Gln Ala Tyr Thr Gln Ile Asn Gly His Met Asn Phe Val Asn Tyr Phe Cys Tyr Leu Asp Asp Ser Cys Ala Ile Gly Leu Thr Gly Pro Ser Trp Lys Thr Ser Leu Lys Leu Ala Met His Ser Ser Met Pro Leu Val Phe Phe Gly Pro Phe Asn Pro Asn Leu Arg Asp His Asp Arg Leu Pro His Val His Gln Val Ala Pro Lys Asp Thr His Leu Ser His Gly Met Val Ser Leu Met Phe His Phe Arg Trp Thr Trp Ile Gly Leu Val Ile Ser Asp Asp Asp Gln Gly Ile Gln Phe Leu Ser Asp Leu Arg Glu Glu Ser Gln Arg His Gly Ile Cys Leu Ala Phe Val Asn Met Ile Pro Glu Asn Met Gln Ile Tyr Met Thr Arg Ala Thr Ile Tyr Asp Lys His Ile Met Thr Ser Ser Ala Lys Val Val Ile Ile Tyr Gly Glu Met Asn Ser Thr Leu Glu Ala Ser Phe Arg Arg Trp Glu Glu Leu Gly Ala Arg Arg Ile Trp Ile Thr Thr Ser Gln Trp Asp Val Ile Thr Asn Lys Lys Asp Phe Thr Leu Asn Leu Phe His Gly Ile Ile Thr Phe Glu His His Arg Phe Glu Ile Pro Lys Leu Asn Lys Phe Met Gln Thr Met Asn Thr Ala Lys Tyr Pro Val Asp Ile Ser His Thr Ile Leu Glu Trp Asn Tyr Phe Asn Cys Ser Ile Ser Lys Asn Ser Ile Arg Met His His Ile Thr Phe Asn Asn Thr Leu Glu Trp Thr Ser Leu His Asn Tyr Asp Val Ala Met Ser Asp Glu Gly Tyr Asn Leu Tyr Asn Ala Val Tyr Ala Val Ala His Thr Tyr His GIu Tyr Ile Phe Gln Gln Val Glu Ser Gln Lys Lys Ala Lys Pro Lys Arg Tyr Phe Thr Ala Cys Gln Gln Val Ser Ser Leu Met Lys Thr Arg Val Phe Thr Asn Pro Val Gly Glu Leu Val Asn Met Lys His Arg Glu Asn Gln Cys Thr Glu Tyr Asp Ile Phe Ile Ile Trp Asn Phe Pro Gln Gly Leu Gly Leu Lys Val Lys Ile Gly Ser Tyr Leu Pro Cys Phe Pro Gln Arg Gln Lys Leu His Ile Ser Asp Asp Leu Glu Trp 
Ala Lys Gly Gly Thr Ser Pro Gln Val Pro Ser Ser Val Cys Ser Val Ala Cys Thr Ala Gly Phe Arg Lys Ile Tyr Gln Lys Glu Thr Ala Asp Cys Cys Phe Asp Cys Val Gln Cys Pro Glu Asn Glu Ile Ser Asn Glu Thr Asp Met Glu Gln Cys Val Arg Cys Pro Asp Asp Lys Tyr Ala Asn Ile Glu Gln Thr His Cys Leu Ser Arg Ala Val Ser Phe Leu Ala Tyr Glu Asp Ser Leu Gly Met Ala Leu Gly Cys Met Ala Leu Ser Phe Ser Ala Ile Thr Ile Leu Ile Leu Val Thr Phe Val Lys Tyr Lys Asp Thr Pro Thr Val Lys Ala Asn Asn Arg Ile Leu Ser Tyr Ile Leu Leu Ile Ser Leu Val Phe Cys Phe Leu Cys Ser Leu Leu Phe Ile Gly Pro Pro Asp Gln Val Thr Cys Ile Phe Gln Gln Thr Thr Phe Gly Val Leu Phe Thr Val Ser Val Ser Thr Val Leu Ala Lys Thr Ile Thr Val Val Met Ala Phe Lys Leu Thr Thr Pro Gly Arg Arg Met Arg Gly Met Met Met Thr Gly Ala Pro Lys Leu Val Ile Pro Ile Cys Thr Leu Ile Gln Leu Val Leu Cys Gly Ile Trp Leu Val Thr Ser Pro Pro Phe Ile Asp Arg Asp Ile Gln Ser Glu His Gly Lys Ile Val Ile Leu Cys Asn Lys Gly Ser Val Ile Ala Phe His Val Val Leu Gly Tyr Leu Gly Ser Leu Ala Leu Gly Ser Phe Thr Leu Ala Phe Leu Ala Arg Asn Leu Pro Asp Thr Phe Asn Glu Ala Lys Phe Leu Thr Phe Ser Met Leu Val Phe Cys Ser Val Trp Ile Thr Phe Leu Pro Val Tyr His Ser Thr Arg Gly Arg Val Met Val Val Val Glu Val Phe Ser Ile Leu Ala Ser Ser Ala Gly Leu Leu Met Cys Ile Phe Val Pro Lys Cys Tyr Val Ile Leu Ile Arg Pro Asp Ser Asn Phe Ile Lys Asn His Lys Gly Lys Leu Leu Tyr (2) INFORMATION FOR SEQ ID N0:3:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 2961 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: cDNA
(ix) FEATURE:
(A) NAME/KEY: Coding Sequence
(B) LOCATION: 86...2509
(D) OTHER INFORMATION: VR2
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:3:
GTGCAACTGT
GTGTGTGATG
TTTTTCTGCA
TCAGAAACGG
ATTTCACAGC
CAGATCCTAG
CAGAC
MetLysGln LeuCysThr PheThrIle TTG AAG
Ser LeuPhe Leu Phe SerLeuIle LeuCysCys TrpSerGlu Leu Lys AGC AGG
Pro CysPhe Trp Ile LysLysSer GluAspAsn AspGlyAsp Ser Arg CAA CAT
Leu ArgGlu Cys Phe TyrLeuTrp LysThrAsp GluProIle Gln His GAT AAT
Glu SerPhe Tyr Tyr AspLeuSer PheArgIle AlaGlySer Asp Asn TAT CTG
Glu GluLeu Leu Val MetPhePhe AlaThrAsp GluIleAsn Tyr Leu Lys Asn Pro Tyr Leu Leu Pro Asn Met Ser Leu Met Phe Ser Ile Ile Gly Gly Asn Cys His Asp Leu Leu Arg Ser Leu Asp Gln Glu Tyr A1a Gln Ile Asp Gly His Met Asn Phe Val Asn Tyr Phe Cys Tyr Leu Asp ' 125 130 135 Asp Ser Cys Ala Thr Gly Leu Thr Gly Pro Ser Trp Lys Thr Ser Leu Lys Leu Ala Met His Ser Ser Met Pro Leu Val Phe Phe Gly Pro Phe CGC CGG CAT GTC GTA
', Asn ProAsn LeuArgAsp His Asp Leu Pro Val HisGlnVal Arg His j GCC CCCAAG GACACACAT TTG TCC GGC ATG TCC TTGATGTTT 688 CAT GTC
Ala ProLys AspThrHis Leu Ser Gly Met Ser LeuMetPhe His Val CTG TCA
His PheArg TrpThrTrp Ile Gly Val Ile Asp AspAspGln Leu Ser ', 205 210 215 AGA AGC
Gly IleGln PheLeuSer Asp Leu Glu Glu Gln ArgHisGly Arg Ser ', 220 225 230 ATC AAC
Ile CysLeu AlaPheVal Asn Met Pro Glu Met GlnIleTyr Ile Asn I
ACA ATG
Met ThrArg AlaThrIle Tyr Asp Gln Ile Thr SerSerAla Thr Met ATG ACT
', Lys ValVal IleIleTyr Gly Asp Asn Ser Leu GluAlaSer Met Thr GCT ATC
Phe ArgArg TrpGluGlu Leu Gly Arg Arg Trp IleThrThr Ala Ile Thr Gln Trp Asp Val Ile Thr Asn Lys Lys Asp Phe Thr Leu Asn Leu Phe His Gly Thr Ile Thr Phe Ala His His Lys Asp Glu Ile Pro Lys Phe Arg Asn Phe Met Gln Thr Lys Lys Thr Ala Lys Tyr Leu Val Asp I, ATT TCT CAT ACT ATT TTG GAG TGG AAT TAT TTT AAT TGT TCA ATC TCT 1168 IleSer ThrIle Leu Trp TyrPhe CysSer IleSer His Glu Asn Asn TTT AAC
LysAsn SerSerLys MetGlyHis ThrPhe AsnThr LeuGln Phe Asn ATG AGC
TrpThr AlaLeuHis AsnTyrAsp AlaLeu AspGlu GlyTyr Met Ser GTG ACC
AsnLeu TyrAsnAla ValTyrAla AlaHis TyrHis GluTyr Val Thr AAA AAA
IleLeu GlnGlnVal GluSerGln LysAla ProLys ArgTyr Lys Lys TCC AAA
PheThr AlaCysGln GlnValSer LeuMet ThrArg ValPhe Ser Lys AAC CAT
MetAsn ProValGly GluLeuVal MetLys ArgGlu AsnGln Asn His ATT TTT
CysThr GluTyrAsp IlePheIle TrpAsn ProGln GlyLeu Ile Phe TAT TGC
GlyLeu LysValLys ValGlySer LeuPro PhePro LysSer Tyr Cys TTG GCC
GlnGln LeuHisIle AlaAspAsp GluTrp MetGly GlyThr Leu Ala AGA GAT
SerVal AspMetGlu GlnCysVal CysPro AsnLys TyrAla Arg Asp CAA GTG
AsnLeu GluGlnThr HisCysLeu ArgThr SerPhe LeuAla Gln Val CTA ATG
TyrGlu AspProLeu GlyMetAla GlyCys AlaLeu SerPhe Leu Met GTC GTG
SerAla IleThrIle LeuValLeu ThrPhe LysTyr LysAsp Val Val CGC AGC
ThrPro IleValLys AlaAsnAsn IleLeu TyrIle LeuLeu Arg Ser TGT CTC
IleSer LeuValPhe CysPheLeu SerLeu PheIle GlyHis Cys Leu CAG ACA
ProAsp GlnValThr CysIleLeu GlnThr PheGly ValLeu Gln Thr .
TCT GTG AAA ATA
Phe Thr Val Ser Val Thr LeuAla Thr ThrValVal Ser Val Lys Ile ACT CCA AGG AGA
Met Ala Phe Lys Leu Thr GlyArg Met GlyMetMet Thr Pro Arg Arg AAG GTC ATT ACC
Met Thr Gly Ala Pro Leu IlePro Cys LeuIleGln Lys Val Ile Thr ATC TTG TCT CCC
Leu Val Leu Cys Gly Trp ValThr Pro PheIleAsp Ile Leu Ser Pro GAA GGG GTC CTT
Arg Asp Ile Gln Ser His LysIle Ile CysAsnLys Glu Gly Val Leu TTC GTC GGA TTG
Gly Ser Val Val Ala His ValLeu Tyr GlySerLeu Phe Val Gly Leu 700 . 705 710 ACT GCT GCT AAC
Ala Leu Gly Ser Phe Leu PheLeu Arg LeuProAsp Thr Ala Ala Asn AAG CTA AGC CTG
Thr Phe Asn Glu Ala Phe ThrPhe Met ValPheCys Lys Leu Ser Leu TTC CCT CAC ACC
Ser Val Trp Ile Thr Leu ValTyr Ser ArgGlyLys Phe Pro His Thr GAG TTC TTG TCT
Val Met Val Val Val Val SerIle Ala SerAlaGly Glu Phe Leu Ser TTT CCA TAT ATT
Leu Leu Met Cys Ile Val LysCys Val LeuIleArg Phe Pro Tyr Ile ATA AAC GGT TTG
Pro Asp Ser Asn Phe Gln HisLys Lys LeuTyr Ile Asn Gly Leu TAGATGATAT TCTTAATAAA
TCAACTTATC
AAAATAAAGT CAAACTGGAC
AATATACAGA
GAACTGGGAT CCAATATTTT
TCTCAATTGA
AGCCATGTAC GGTTACCCTA
TTAATTAATG
CTCTAGGCAT AAGGGTACTG
GCTGTCCTTG
CCAGTAATCA ATGGAGTTCT
ACATTATTCC
GACTTTATTC GAATAAATAA
AATGTTCTAT
AAAAAAA
(2) INFORMATION FOR SEQ ID NO:4:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 808 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: protein
(v) FRAGMENT TYPE: internal
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:4:
Met Lys Gln Leu Cys Thr Phe Thr Ile Ser Leu Leu Phe Leu Lys Phe Ser Leu Ile Leu Cys Cys Trp Ser Glu Pro Ser Cys Phe Trp Arg Ile Lys Lys Ser Glu Asp Asn Asp Gly Asp Leu Gln Arg Glu Cys His Phe Tyr Leu Trp Lys Thr Asp Glu Pro Ile Glu Asp Ser Phe Tyr Asn Tyr Asp Leu Ser Phe Arg Ile Ala Gly Ser Glu Tyr Glu Leu Leu Leu Val Met Phe Phe Ala Thr Asp Glu Ile Asn Lys Asn Pro Tyr Leu Leu Pro Asn Met Ser Leu Met Phe Ser Ile Ile Gly Gly Asn Cys His Asp Leu Leu Arg Ser Leu Asp Gln Glu Tyr Ala Gln Ile Asp Gly His Met Asn Phe Val Asn Tyr Phe Cys Tyr Leu Asp Asp Ser Cys Ala Thr Gly Leu Thr Gly Pro Ser Trp Lys Thr Ser Leu Lys Leu Ala Met His Ser Ser Met Pro Leu Val Phe Phe Gly Pro Phe Asn Pro Asn Leu Arg Asp His Asp Arg Leu Pro His Val His Gln Val Ala Pro Lys Asp Thr His Leu Ser His Gly Met Val Ser Leu Met Phe His Phe Arg Trp Thr Trp Ile Gly Leu Val Ile Ser Asp Asp Asp Gln Gly Ile Gln Phe Leu Ser Asp Leu Arg Glu Glu Ser Gln Arg His Gly Ile Cys Leu Ala Phe Val Asn Met Ile Pro Glu Asn Met Gln Ile Tyr Met Thr Arg Ala Thr Ile Tyr Asp Thr Gln Ile Met Thr Ser Ser Ala Lys Val Val Ile Ile Tyr Gly Asp Met Asn Ser Thr Leu Glu Ala Ser Phe Arg Arg Trp Glu Glu Leu Gly Ala Arg Arg Ile Trp Ile Thr Thr Thr Gln Trp Asp Val Ile Thr Asn Lys Lys Asp Phe Thr Leu Asn Leu Phe His Gly Thr Ile Thr Phe Ala His His Lys Asp Glu Ile Pro Lys Phe Arg Asn Phe Met Gln Thr Lys Lys Thr Ala Lys Tyr Leu Val Asp Ile Ser His Thr Ile Leu Glu Trp Asn Tyr Phe Asn Cys Ser Ile Ser Lys Asn Ser Ser Lys Met Gly His Phe Thr Phe Asn Asn Thr Leu Gln Trp Thr Ala Leu His Asn Tyr Asp Met Ala Leu Ser Asp Glu Gly Tyr Asn Leu Tyr Asn Ala Val Tyr Ala Val Ala His Thr Tyr His Glu Tyr Ile Leu Gln Gln Val Glu Ser Gln Lys Lys Ala Lys Pro Lys Arg Tyr Phe Thr Ala Cys Gln Gln Val Ser Ser Leu Met Lys Thr Arg Val Phe Met Asn Pro Val Gly Glu Leu Val Asn Met Lys His Arg Glu Asn Gln Cys Thr Glu Tyr Asp Ile Phe _ 73 _ Ile Ile Trp Asn Phe Pro Gln Gly Leu Gly Leu Lys Val Lys Val Gly Ser Tyr Leu Pro Cys Phe Pro Lys Ser Gln Gln Leu His Ile Ala Asp Asp Leu 
Glu Trp Ala Met Gly Gly Thr Ser Val Asp Met Glu Gln Cys Val Arg Cys Pro Asp Asn Lys Tyr Ala Asn Leu Glu Gln Thr His Cys Leu Gln Arg Thr Val Ser Phe Leu Ala Tyr Glu Asp Pro Leu Gly Met Ala Leu Gly Cys Met Ala Leu Ser Phe Ser Ala Ile Thr Ile Leu Val Leu Val Thr Phe Val Lys Tyr Lys Asp Thr Pro Ile Val Lys Ala Asn Asn Arg Ile Leu Ser Tyr Ile Leu Leu Ile Ser Leu Val Phe Cys Phe Leu Cys Ser Leu Leu Phe Ile Gly His Pro Asp Gln Val Thr Cys Ile Leu Gln Gln Thr Thr Phe Gly Val Leu Phe Thr Val Ser Val Ser Thr Val Leu Ala Lys Thr Ile Thr Val Val Met Ala Phe Lys Leu Thr Thr Pro Gly Arg Arg Met Arg Gly Met Met Met Thr Gly Ala Pro Lys Leu Val Ile Pro Ile Cys Thr Leu Ile Gln Leu Val Leu Cys Gly Ile Trp Leu Val Thr Ser Pro Pro Phe Ile Asp Arg Asp Ile Gln Ser Glu His Gly Lys Ile Val Ile Leu Cys Asn Lys Gly Ser Val Val Ala Phe His Val Val Leu Gly Tyr Leu Gly Ser Leu Ala Leu Gly Ser Phe Thr Leu Ala Phe Leu Ala Arg Asn Leu Pro Asp Thr Phe Asn Glu Ala Lys Phe Leu Thr Phe Ser Met Leu Val Phe Cys Ser Val Trp Ile Thr Phe Leu Pro Val Tyr His Ser Thr Arg Gly Lys Val Met Val Val Val Glu Val Phe Ser Ile Leu Ala Ser Ser Ala Gly Leu Leu Met Cys Ile Phe Val Pro Lys Cys Tyr Val Ile Leu Ile Arg Pro Asp Ser Asn Phe Ile Gln Asn His Lys Gly Lys Leu Leu Tyr (2) INFORMATION FOR SEQ ID N0:5:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 2907 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: cDNA
(ix) FEATURE:
(A) NAME/KEY: Coding Sequence
(B) LOCATION: 1...2409
(D) OTHER INFORMATION: VR3
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:5:
His Phe Tyr Leu Gly Ala Val Asp Lys Pro Ile Glu Asp Asn Phe Tyr.
TCA TTA AGA GCA GAG
Asn Leu LysPhe Ile Ala Ser GluTyr PheLeu Ser Leu Arg Ala Glu GTA TTT ACT ATC CCT
Leu Met PheAla Asp Glu Asn LysAsn TyrLeu Val Phe Thr Ile Pro CCC ATA ATG ATC AAC
Leu Asn ThrLeu Phe Ser Ile GlyGly CysHis Pro Ile Met Ile Asn TTA AGA GAT TAT AAT
Asp Leu GlyLeu Gln Ala Thr GlnIle GlyHis Leu Arg Asp Tyr Asn AAT GTT TTC TTA TGT
Met Phe AsnTyr Cys Tyr Asp AspSer AlaIle Asn Val Phe Leu Cys CTT GGA TGG TCC GCA
Gly Thr ProSer Lys Thr Leu AsnLeu MetHis Leu Gly Trp Ser Ala TCA CCA TTC TCA AAC
Ser Met LeuVal Phe Gly Phe AsnPro LeuHis Ser Pro Phe Ser Asn CAT CGG CAT CAA AAG
Asp Asp LeuHis Val His Val AlaThr AspThr His Arg His Gln Lys TTG CAT GTC ATG AGA
His Ser GlyIle Ser Leu Phe HisPhe TrpThr Leu His Val Met Arg ATA CTG TCA GAC CAG
Trp Gly ValIle Asp Asp Lys GlyIle PheLeu Ile Leu Ser Asp Gln GAT AGA AGC CAT TTA
Ser Leu GluGlu Gln Arg Gly IleCys AlaPhe Asp Arg Ser His Leu AAT ATC AAC ATA AGG
Val Met ProGlu Met Gln Tyr MetThr AlaThr Asn Ile Asn Ile Arg TAT AAA ATG TTA GTT
Ile Asp GlnIle Thr Ser Ala LysVal IleIle Tyr Lys Met Leu Val GGT ATG ACA GTA AGA
Tyr Glu AsnSer Leu Glu Ser PheArg TrpGlu Gly Met Thr Val Arg TTA GCT ATC ACA TGG
Asn Gly ArgArg Trp Ile Thr SerGln AspVal Leu Ala Ile Thr Trp ACA AAA TTC AAT GGG
Ile Asn LysGlu Thr Leu Leu PheHis ThrIle Thr Lys Phe Asn Gly Thr Phe Ala His Arg Arg Phe Glu Ile Pro Lys Phe Lys Lys Phe Met Gln Thr Met Asn Thr Ala Lys Tyr Pro Val Asp Ile Ser His Thr Ile Leu Glu Trp Asn Tyr Phe Asn Cys Ser Ile Ser Lys Asn Ser Ser Lys Met Asp His IleThr Asn Thr GluTrp Leu His Phe Asn Leu Thr Ala ATG GAT GGT TTG
Asn Tyr AspMetVal Ser Glu TyrAsn TyrAsn Ala Met Asp Gly Leu CAC TAC GAA TTT
Val Tyr AlaValAla Thr His HisIle GlnGln Val His Tyr Glu Phe GCA CCC AGA ACT
Glu Ser GlnLysLys Lys Lys PhePhe ValCys Gln Ala Pro Arg Thr ATG ACC GTA AAC
Gln Val SerSerLeu Lys Arg PheThr ProVal Gly Met Thr Val Asn AAG AGG AAT ACA
Glu Leu ValAsnMet His Glu GlnCys GluTyr Asp Lys Arg Asn Thr AAC CCA GGC TTA
Ile Phe LeuIleTrp Phe Gln LeuGly LysVal Lys Asn Pro Gly Leu CCT TTT CAG GAA
Ile Gly SerTyrLeu Cys Pro ArgGln LeuHis Ile Pro Phe Gln Glu TGG ATG GGA GTG
Ser Asp AspLeuGlu Ala Gly ThrSer ValPro Ser Trp Met Gly Val GCA ACT GGA AAA
Ser Val CysSerVal Cys Ala PheArg IleHis Gln Ala Thr Gly Lys TGC TTT TGT TGC
Lys Glu ThrAlaAsp Cys Asp ValGln ProGlu Asn Cys Phe Cys Cys ACA ATG CAG AAG
Glu Val SerAsnGlu Asp Glu CysVal CysPro Tyr Thr Met Gln Lys ATA AAA CAC TCA
Asp Lys TyrAlaAsn Glu Thr CysLeu ArgAla Val Ile Lys His Ser GAA CCA GGG CTA
SerPhe Leu Tyr Glu Asp LeuGlyIle LeuGlyCys Ile Ala Pro Ala ACA
AlaLeu SerPheSer Ala Ile IleLeuValLeu IleThrPhe Leu Thr GTG
LysTyr LysAspThr Pro Ile LysAlaAsnAsn ArgIleLeu Ser Val GTC
TyrIle LeuLeuIle Ser Leu PheCysPheLeu CysSerLeu Leu Val GTC
PheIle GlyHisPro Asn Gln SerCysValLeu GlnGlnThr Thr Val TCT
PheGly ValPhePhe Thr Val ValSerThrVal LeuAlaLys Thr Ser AAG
IleThr ValValMet Ala Phe LeuThrThrPro GlyArgArg Met Lys GCA
ArgGlu MetLeuVal Thr Gly ProLysLeuVal IleProIle Cys Ala TGT
ThrLeu IleGlnPhe Val Leu GlyIleTrpLeu IleThrSer Pro Cys CAA
ProPhe IleAspArg Asp Ile SerGluHisGly LysIleVal Ile Gln ATT
LeuCys AsnLysGly Ser Val AlaPheHisVal ValLeuGly Tyr Ile AGC
LeuGly SerLeuAla Leu Gly PheThrLeuAla PheLeuAla Arg Ser GAA
AsnLeu ProAspThr Phe Asn AlaLysPheLeu ThrPheSer Met Glu ATC
LeuVal PheCysSer Val Trp ThrPheLeuPro ValTyrHis Ser Ile GTT
ThrArg GlyLysVal Met Val ValGluValPhe SerIleLeu Ala Val TGT
SerSer AlaGlyLeu Leu Met IlePheValPro LysCysTyr Val Cys AAT
IleLeu ValArgPro Asp Ser PheIleArgLys TyrLysAsp Lys Asn
Phe Arg Tyr ATAAAAATTT AAATAATATA CAAATTTGAA
(2) INFORMATION FOR SEQ ID NO:6:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 803 amino acids (B) TYPE: amino acid (C) STRANDEDNESS: single (D) TOPOLOGY: linear (ii) MOLECULE TYPE: protein (v) FRAGMENT TYPE: internal (xi) SEQUENCE DESCRIPTION: SEQ ID NO:6:
His Phe Tyr Leu Gly Ala Val Asp Lys Pro Ile Glu Asp Asn Phe Tyr Asn Ser Leu Leu Lys Phe Arg Ile Ala Ala Ser Glu Tyr Glu Phe Leu Leu Val Met Phe Phe Ala Thr Asp Glu Ile Asn Lys Asn Pro Tyr Leu Leu Pro Asn Ile Thr Leu Met Phe Ser Ile Ile Gly Gly Asn Cys His Asp Leu Leu Arg Gly Leu Asp Gln Ala Tyr Thr Gln Ile Asn Gly His Met Asn Phe Val Asn Tyr Phe Cys Tyr Leu Asp Asp Ser Cys Ala Ile Gly Leu Thr Gly Pro Ser Trp Lys Thr Ser Leu Asn Leu Ala Met His Ser Ser Met Pro Leu Val Phe Phe Gly Ser Phe Asn Pro Asn Leu His Asp His Asp Arg Leu His His Val His Gln Val Ala Thr Lys Asp Thr His Leu Ser His Gly Ile Val Ser Leu Met Phe His Phe Arg Trp Thr Trp Ile Gly Leu Val Ile Ser Asp Asp Asp Lys Gly Ile Gln Phe Leu Ser Asp Leu Arg Glu Glu Ser Gln Arg His Gly Ile Cys Leu Ala Phe Val Asn Met Ile Pro Glu Asn Met Gln Ile Tyr Met Thr Arg Ala Thr Ile Tyr Asp Lys Gln Ile Met Thr Ser Leu Ala Lys Val Val Ile Ile Tyr Gly Glu Met Asn Ser Thr Leu Glu Val Ser Phe Arg Arg Trp Glu Asn Leu Gly Ala Arg Arg Ile Trp Ile Thr Thr Ser Gln Trp Asp Val Ile Thr Asn Lys Lys Glu Phe Thr Leu Asn Leu Phe His Gly Thr Ile Thr Phe Ala His Arg Arg Phe Glu Ile Pro Lys Phe Lys Lys Phe Met Gln Thr Met Asn Thr Ala Lys Tyr Pro Val Asp Ile Ser His Thr Ile .
_78_ Leu Glu Trp Asn Tyr Phe Asn Cys Ser Ile Ser Lys Asn Ser Ser Lys Met Asp His Ile Thr Phe Asn Asn Thr Leu Glu Trp Thr Ala Leu His Asn Tyr Asp Met Val Met Ser Asp Glu Gly Tyr Asn Leu Tyr Asn Ala Val Tyr Ala Val Ala His Thr Tyr His Glu His Ile Phe Gln Gln Val Glu Ser Gln Lys Lys Ala Lys Pro Lys Arg Phe Phe Thr Val Cys Gln Gln Val Ser Ser Leu Met Lys Thr Arg Val Phe Thr Asn Pro Val Gly Glu Leu Val Asn Met Lys His Arg Glu Asn Gln Cys Thr Glu Tyr Asp Ile Phe Leu Ile Trp Asn Phe Pro Gln Gly Leu Gly Leu Lys Val Lys Ile Gly Ser Tyr Leu Pro Cys Phe Pro Gln Arg Gln Glu Leu His Ile Ser Asp Asp Leu Glu Trp Ala Met Gly Gly Thr Ser Val Val Pro Ser Ser Val Cys Ser Val Ala Cys Thr Ala Gly Phe Arg Lys Ile His Gln Lys Glu Thr Ala Asp Cys Cys Phe Asp Cys Val Gln Cys Pro Glu Asn Glu Val Ser Asn Glu Thr Asp Met Glu Gln Cys Val Lys Cys Pro Tyr Asp Lys Tyr Ala Asn Ile Glu Lys Thr His Cys Leu Ser Arg Ala Val Ser Phe Leu Ala Tyr Glu Asp Pro Leu Gly Ile Ala Leu Gly Cys Ile Ala Leu Ser Phe Ser Ala Ile Thr Ile Leu Val Leu Ile Thr Phe Leu Lys Tyr Lys Asp Thr Pro Ile Val Lys Ala Asn Asn Arg Ile Leu Ser Tyr Ile Leu Leu Ile Ser Leu Val Phe Cys Phe Leu Cys Ser Leu Leu Phe Ile Gly His Pro Asn Gln Val Ser Cys Val Leu Gln Gln Thr Thr Phe Gly Val Phe Phe Thr Val Ser Val Ser Thr Val Leu Ala Lys Thr Ile Thr Val Val Met Ala Phe Lys Leu Thr Thr Pro Gly Arg Arg Met Arg Glu Met Leu Val Thr Gly Ala Pro Lys Leu Val Ile Pro Ile Cys Thr Leu Ile Gln Phe Val Leu Cys Gly Ile Trp Leu Ile Thr Ser Pro Pro Phe Ile Asp Arg Asp Ile Gln Ser Glu His Gly Lys Ile Val Ile Leu Cys Asn Lys Gly Ser Val Ile Ala Phe His Val Val Leu Gly Tyr Leu Gly Ser Leu Ala Leu Gly Ser Phe Thr Leu Ala Phe Leu Ala Arg Asn Leu Pro Asp Thr Phe Asn Glu Ala Lys Phe Leu Thr Phe Ser Met Leu Val Phe Cys Ser Val Trp Ile Thr Phe Leu Pro Val Tyr His Ser Thr Arg Gly Lys Val Met Val Val Val Glu Val Phe Ser Ile Leu Ala Ser Ser Ala Gly Leu Leu Met Cys Ile Phe Val Pro Lys Cys Tyr Val Ile Leu Val Arg Pro Asp Ser Asn Phe Ile Arg Lys Tyr Lys Asp Lys Phe Arg 
Tyr
(2) INFORMATION FOR SEQ ID NO:7:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 3625 base pairs (B) TYPE: nucleic acid (C) STRANDEDNESS: single (D) TOPOLOGY: linear (ii) MOLECULE TYPE: cDNA
(ix) FEATURE:
(A) NAME/KEY: Coding Sequence (B) LOCATION: 117...2672 (D) OTHER INFORMATION: VR4 (xi) SEQUENCE DESCRIPTION: SEQ ID NO:7:
TGAATATGCA 60 ATAAACCTCA
CATTTGCACA
AAGAAATAAA
AGCTGGTAGA
AATCTGATGT
GCTGATATGC ATGGCACTTC TTAAGGCAGG
ACAATCCGCA AAAAAG
CTGCCCAGGT ATG
Met TTC AAT ACA
Phe IlePhe Met Gly Val Phe LeuLeu Ile Leu Leu Met Phe Asn Thr GCC AATTTC ATT GAT CCC AGG TTTTGG ATA TTG GAT GAA 215 TGC AGA AAT
Ala AsnPhe Ile Asp Pro Arg PheTrp Ile Leu Asp Glu Cys Arg Asn TTA GCT ATC
Ile ThrAsp Glu Tyr Leu Gly SerCys Phe Leu Ala Ala Leu Ala Ile GAT AAC ACT
Val GlnThr Pro Ile Glu Lys TyrPhe Thr Leu Asn Phe Asp Asn Thr 50 55 60 65 AAA TTG TTG
Leu LysThr Thr Lys Asn His TyrAla Ala Val Phe Ala Lys Leu Leu CCT TTA AAT
Met AspGlu Ile Asn Arg Tyr AspLeu Pro Met Ser Leu Pro Leu Asn Ile Ile Arg Tyr Ser Leu Gly His Cys Asp Gly Lys Thr Val Thr Pro Thr Pro Tyr Leu Phe His Arg Lys Lys Gln Ser Pro Ile Pro Asn Tyr Phe Cys Asn Glu Glu Ser Met Cys Ser Phe Leu Leu Ser Gly Pro Asn Trp Asp Glu Ser Leu Ser Phe Trp Lys Tyr Leu Asp Ser Phe Leu Ser 150 155 160 Pro Arg Ile Leu Gln Leu Ser Tyr Gly Ser Phe Ser Ser Ile Phe Ser
CCC TAT GCC GAC
AspAsp Glu Gln Tyr Tyr Leu Gln Met Pro Lys Thr Pro Tyr Ala Asp ATG TTC TAT TGG
SerLeu Ala Leu Ala Val Ser Ile Leu Leu Lys Asn Met Phe Tyr Trp ATC GAT GGA TTT
TrpIle Gly Leu Val Pro Asp Asp Gln Asn Gln Leu Ile Asp Gly Phe CAG AAC ATT GCC
LeuGlu Leu Lys Lys Ser Glu Lys Glu Cys Phe Phe Gln Asn Ile Ala GTT GTT CCA ACT
ValLys Met Ile Ser Asp Glu Ser Phe Gln Lys Glu Val Val Pro Thr ATT TCA AAT ATC
IleAsn Tyr Lys Gln Val Lys Leu Thr Val Ile Ile Ile Ser Asn Ile AAT GAT TTC TGG
TyrGly Glu Thr Tyr Phe Ile Leu Ile Arg Met Glu Asn Asp Phe Trp AGA ATC AAA AAT
ProPro Ile Leu Gln Ile Trp Thr Thr Gln Leu Phe Arg Ile Lys Asn GAC CAT TTC TCA
ProThr Ser Lys Thr Ile Ser Asp Thr Tyr Gly Leu Asp His Phe Ser CAT ATT TTT TTT
ThrPhe Leu Pro His Gly Glu Ser Gly Lys Asn Val His Ile Phe Phe CTC ACA TGT ATG
GlnThr Trp Phe His Arg Asn Asp Leu Leu Val Pro Leu Thr Cys Met AAC GAC TCT AAA
GluTrp Lys Tyr Ile Ser Glu Ser Ala Asn Cys Ile Asn Asp Ser Lys TCT TCA TGG GAA
LeuLys Asn Ser Ser Asp Ala Phe Asp Leu Met Glu Ser Ser Trp Glu TTT AAT AAC AAT
LysLeu Asp Met Ala Ser Glu Ser His Ile Tyr Ala Phe Asn Asn Asn CAT CAT AAT CAG
ValHis Ala Ile Ala Ala Leu Glu Met Leu Gln Ala His His Asn Gln GAT AAA AGT TGC
AspAsn Gln Ala Ile Asn Gly Gly Ala Ser His Leu Asp Lys Ser Cys Lys Val Asn Ser Phe Leu Arg Arg Thr Tyr Phe Thr Asn Pro Leu Gly Asp Lys Val Phe Met Lys Gln Arg Val Ile Met Gln Asp Glu Tyr Asp Ile Val His Phe Ala Asn Leu Ser Gln His Leu Gly Ile Lys Met Lys Leu Gly Lys Phe Ser Pro Tyr Leu Pro His Gly Arg His Ser His Leu Tyr Val Asp Met Ile Glu Leu Ala Thr Gly Arg Arg Lys Met Pro Ser 500 505 510 Ser Val Cys Ser Ala Asp Cys Ser Pro Gly Phe Arg Arg Leu Trp Lys 515 520 525 Glu Gly Met Ala Ala Cys Cys Phe Val Cys Ser Pro Cys Pro Glu Asn Glu Ile Ser Asn Glu Thr Asn Met Asp Gln Cys Val Asn Cys Pro Glu Tyr Gln Tyr Ala Asn Thr Glu Gln Asn Lys Cys Ile Gln Lys Gly Val Thr Phe Leu Ser Tyr Glu Asp Pro Leu Gly Met Ala Leu Ala Leu Met Ala Phe CysPhe SerAlaPheThr Val LeuCys PheVal Ala Val Val AAT AGC
Lys His HisAsp ThrProIleVal LysAla AsnArg LeuSer Asn Ser TTT TCC
Tyr Leu LeuLeu MetSerLeuMet PheCys LeuCys PhePhe Phe Ser GTC CAA
Phe Ile GlyLeu ProAsnLysVal IleCys LeuGln IleThr Val Gln ACA GCC
Phe Gly IleVal PheThrValAla ValSer ValLeu LysThr Thr Ala GTC AGA
Val Thr ValVal LeuAlaPheLys ValThr ProGly ArgLeu Val Arg AGA TAC TTC CTT GTA TCA GGG ACA CTA AAC TAC ATT ATT CCT ATA TGT 2231 Arg Tyr Phe Leu Val Ser Gly Thr Leu Asn Tyr Ile Ile Pro Ile Cys GTC TCT CCT
Ser Leu Leu Gln Cys Val Leu Cys Ala Ile Trp Leu Ala Val Ser Pro ATC ATC ATT
Pro Phe Val Asp Ile Asp Glu His Ser Gln His Gly His Ile Ile Ile CTT GGA TAC
Val Cys Asn Lys Gly Ser Val Thr Ala Phe Tyr Cys Val Leu Gly Tyr TTG GCC AAG
Leu Ala Cys Leu Ala Leu Gly Ser Phe Thr Leu Ala Phe Leu Ala Lys TTC AGC ATG
Asn Leu Pro Asp Ala Phe Asn Glu Ala Lys Phe Leu Thr Phe Ser Met TAC CAT AGC
Leu Val Phe Cys Ser Val Trp Val Thr Phe Leu Pro Val Tyr His Ser ATC TTG GCA
Thr Lys Gly Lys His Met Val Ala Val Glu Ile Phe Ser Ile Leu Ala ATT TAT ATC
Ser Ser Ala Gly Met Leu Gly Cys Ile Phe Val Pro Lys Ile Tyr Ile AGA GAA AAA
Ile Leu Met Arg Pro Glu Arg Asn Ser Thr Gln Lys Ile Arg Glu Lys ACATAACCA
Ser Tyr Phe GGTTGCTCTA
AATCTTGCAC
CAATTTTATT
GTTGATAAGG
GGTTACACAT
ATAATCAGCA
AGAAAATACT
GAAATGTTCC
CAGGGATTCT
ATTCTCAACA
TACACAAGCT
CAGTGGGAGA
GCATTGGGGA
GTCAGTGGGG
AATAAATTAA
AAAA
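As a consistency check on the listing, the coding-sequence ranges given in the FEATURE entries can be compared with the lengths stated for the corresponding protein records. The following Python sketch is illustrative only and is not part of the original listing. The function name codon_count and the RECORDS table are hypothetical, and the coordinates are assumed to be 1-based and inclusive. The numbers are copied from the LOCATION and LENGTH fields of this listing.

```python
def codon_count(start: int, end: int) -> int:
    """Return the number of complete codons in a 1-based, inclusive range."""
    length = end - start + 1
    if length <= 0 or length % 3 != 0:
        raise ValueError(f"range {start}...{end} is not a whole number of codons")
    return length // 3


# (CDS start, CDS end, stated protein length) as given in the listing.
RECORDS = {
    "SEQ ID NO:5 / NO:6 (VR3)": (1, 2409, 803),
    "SEQ ID NO:7 / NO:8 (VR4)": (117, 2672, 852),
    "SEQ ID NO:9 / NO:10 (VR5)": (1, 2169, 723),
    "SEQ ID NO:15 / NO:16 (VR8)": (80, 349, 90),
    "SEQ ID NO:17 / NO:18 (VR9)": (80, 1387, 436),
}

for name, (start, end, stated_length) in RECORDS.items():
    codons = codon_count(start, end)
    status = "consistent" if codons == stated_length else "MISMATCH"
    print(f"{name}: {codons} codons vs {stated_length} residues -> {status}")
```

Each range spans exactly three nucleotides per listed residue, which suggests that the stated ranges end at the last sense codon rather than at a stop codon.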
(2) INFORMATION FOR SEQ ID NO:8:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 852 amino acids (B) TYPE: amino acid (C) STRANDEDNESS: single (D) TOPOLOGY: linear (ii) MOLECULE TYPE: protein (v) FRAGMENT TYPE: internal (xi) SEQUENCE DESCRIPTION: SEQ ID NO:8:
Met Phe Ile Phe Met Gly Val Phe Phe Leu Leu Asn Ile Thr Leu Leu Met Ala Asn Phe Ile Asp Pro Arg Cys Phe Trp Arg Ile Asn Leu Asp Glu Ile Thr Asp Glu Tyr Leu Gly Leu Ser Cys Ala Phe Ile Leu Ala Ala Val Gln Thr Pro Ile Glu Lys Asp Tyr Phe Asn Thr Thr Leu Asn Phe Leu Lys Thr Thr Lys Asn His Lys Tyr Ala Leu Ala Leu Val Phe Ala Met Asp Glu Ile Asn Arg Tyr Pro Asp Leu Leu Pro Asn Met Ser Leu Ile Ile Arg Tyr Ser Leu Gly His Cys Asp Gly Lys Thr Val Thr Pro Thr Pro Tyr Leu Phe His Arg Lys Lys Gln Ser Pro Ile Pro Asn Tyr Phe Cys Asn Glu Glu Ser Met Cys Ser Phe Leu Leu Ser Gly Pro Asn Trp Asp Glu Ser Leu Ser Phe Trp Lys Tyr Leu Asp Ser Phe Leu Ser Pro Arg Ile Leu Gln Leu Ser Tyr Gly Ser Phe Ser Ser Ile Phe Ser Asp Asp Glu Gln Tyr Pro Tyr Leu Tyr Gln Met Ala Pro Lys Asp Thr Ser Leu Ala Leu Ala Met Val Ser Phe Ile Leu Tyr Leu Lys Trp Asn Trp Ile Gly Leu Val Ile Pro Asp Asp Asp Gln Gly Asn Gln Phe Leu Leu Glu Leu Lys Lys Gln Ser Glu Asn Lys Glu Ile Cys Phe Ala Phe Val Lys Met Ile Ser Val Asp Glu Val Ser Phe Pro Gln Lys Thr Glu Ile Asn Tyr Lys Gln Ile Val Lys Ser Leu Thr Asn Val Ile Ile Ile Tyr Gly Glu Thr Tyr Asn Phe Ile Asp Leu Ile Phe Arg Met Trp Glu Pro Pro Ile Leu Gln Arg Ile Trp Ile Thr Thr Lys Gln Leu Asn Phe Pro Thr Ser Lys Thr Asp Ile Ser His Asp Thr Phe Tyr Gly Ser Leu Thr Phe Leu Pro His His Gly Glu Ile Ser Gly Phe Lys Asn Phe Val Gln Thr Trp Phe His Leu Arg Asn Thr Asp Leu Cys Leu Val Met Pro Glu Trp Lys Tyr Ile Asn Ser Glu Asp Ser Ala Ser Asn Cys Lys Ile Leu Lys Asn Ser Ser Ser Asp Ala Ser Phe Asp Trp Leu Met Glu Glu Lys Leu Asp Met Ala Phe Ser Glu Asn Ser His Asn Ile Tyr Asn _ 385 390 395 400 Ala Val His Ala Ile Ala His Ala Leu His Glu Met Asn Leu Gln Gln Ala Asp Asn Gln Ala Ile Asp Asn Gly Lys Gly Ala Ser Ser His Cys Leu Lys Val Asn Ser Phe Leu Arg Arg Thr Tyr Phe Thr Asn Pro Leu Gly Asp Lys Val Phe Met Lys Gln Arg Val Ile Met Gln Asp Glu Tyr Asp Ile Val His Phe Ala Asn Leu Ser Gln His Leu Gly Ile Lys Met Lys Leu Gly Lys Phe Ser Pro Tyr Leu Pro His Gly Arg His Ser 
His Leu Tyr Val Asp Met Ile Glu Leu Ala Thr Gly Arg Arg Lys Met Pro Ser Ser Val Cys Ser Ala Asp Cys Ser Pro Gly Phe Arg Arg Leu Trp Lys Glu Gly Met Ala Ala Cys Cys Phe Val Cys Ser Pro Cys Pro Glu Asn Glu Ile Ser Asn Glu Thr Asn Met Asp Gln Cys Val Asn Cys Pro Glu Tyr Gln Tyr Ala Asn Thr Glu Gln Asn Lys Cys Ile Gln Lys Gly Val Thr Phe Leu Ser Tyr Glu Asp Pro Leu Gly Met Ala Leu Ala Leu Met Ala Phe Cys Phe Ser Ala Phe Thr Ala Val Val Leu Cys Val Phe Val Lys His His Asp Thr Pro Ile Val Lys Ala Asn Asn Arg Ser Leu Ser Tyr Leu Leu Leu Met Ser Leu Met Phe Cys Phe Leu Cys Ser Phe Phe Phe Ile Gly Leu Pro Asn Lys Val Ile Cys Val Leu Gln Gln Ile Thr Phe Gly Ile Val Phe Thr Val Ala Val Ser Thr Val Leu Ala Lys Thr Val Thr Val Val Leu Ala Phe Lys Val Thr Val Pro Gly Arg Arg Leu Arg Tyr Phe Leu Val Ser Gly Thr Leu Asn Tyr Ile Ile Pro Ile Cys Ser Leu Leu Gln Cys Val Leu Cys Ala Ile Trp Leu Ala Val Ser Pro Pro Phe Val Asp Ile Asp Glu His Ser Gln His Gly His Ile Ile Ile Val Cys Asn Lys Gly Ser Val Thr Ala Phe Tyr Cys Val Leu Gly Tyr Leu Ala Cys Leu Ala Leu Gly Ser Phe Thr Leu Ala Phe Leu Ala Lys Asn Leu Pro Asp Ala Phe Asn Glu Ala Lys Phe Leu Thr Phe Ser Met Leu Val Phe Cys Ser Val Trp Val Thr Phe Leu Pro Val Tyr His Ser Thr Lys Gly Lys His Met Val Ala Val Glu Ile Phe Ser Ile Leu Ala Ser Ser Ala Gly Met Leu Gly Cys Ile Phe Val Pro Lys Ile Tyr Ile Ile Leu Met Arg Pro Glu Arg Asn Ser Thr Gln Lys Ile Arg Glu Lys Ser Tyr Phe (2) INFORMATION FOR SEQ ID N0:9:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 3125 base pairs (B) TYPE: nucleic acid (C) STRANDEDNESS: single (D) TOPOLOGY: linear (ii) MOLECULE TYPE: cDNA
(ix) FEATURE:
(A) NAME/KEY: Coding Sequence (B) LOCATION: 1...2169 (D) OTHER INFORMATION: VR5 (xi) SEQUENCE DESCRIPTION: SEQ ID NO:9:
AGT TCA CTT GGA
Ile Cys Asn Glu Glu Met Cys PheLeu Ser Pro Asn Ser Ser Leu Gly AGT AAG GAC TTC
Trp Asp Glu Ser Leu Phe Trp TyrLeu Ser Leu Ser Ser Lys Asp Phe CTT GGA AGT ATC
Pro His Ile Leu Gln Ser Tyr SerPhe Ser Phe Ser Leu Gly Ser Ile CCC TAT GCC AAG
Asp Asp Glu Gln Tyr Tyr Leu GlnMet Pro Asp Thr Pro Tyr Ala Lys ATG TTC TAT AAA
Ser Leu Ala Leu Ala Val Ser IleLeu Leu Trp Asn Met Phe Tyr Lys ATC GAC GGA CAA
Trp Ile Gly Leu Val Pro Asp AspGln Asn Phe Leu Ile Asp Gly Gln 85 90 95 CAG AAC ATT TTT
Leu Glu Leu Lys Lys Ser Glu LysGlu Cys Ala Phe Gln Asn Ile Phe Val Lys Met Ile Ser Val Asp Glu Val Ser Phe Pro Gln Lys Thr Glu Ile Tyr Tyr Lys Gln Ile Val Lys Ser Leu Thr Asn Val Ile Ile Ile 130 135 140 Tyr Gly Glu Thr Tyr Asn Phe Ile Asp Leu Ile Phe Arg Met Trp Glu ATT CAG
AGA
ATA
TGG
ATC
ACC
ACA
AAA
CAA
TTG
AAT
Pro Pro Leu Arg Ile Trp ThrThrLys Gln Leu Phe Ile Gln Ile Asn AGT ACA CAT TCA
Pro Thr Lys Asp Ile Ser AspThrPhe Tyr Gly Leu Ser Thr His Ser CTA CAC ATT TTT
Thr Phe Pro His Gly Glu SerGlyPhe Lys Asn Val Leu His Ile Phe TGG CAT ACA ATG
Gln Thr Phe Leu Arg Asn AspLeuTyr Leu Val Pro Trp His Thr Met AAA ATT GAC AAA
Glu Trp Tyr Asn Ser Glu SerAlaSer Asn Cys Ile Lys Ile Asp Lys AAG TCA TGG CTA ATG
AAC
Leu LysAsnSer Ser Asp Ala Ser Phe Asp Met Glu Gln Ser Trp Leu GCC AAC ATA
Lys LeuAspMet Phe Ser Asp Asn Ser His Tyr Asn Val Ala Asn Ile GCC AAT CTG
Val HisAlaIle His Ala Leu His Glu Met Gln Gln Ala Ala Asn Leu ATA AGT TCT
Asp AsnGlnAla Asp Asn Gly Lys Gly Ala His Cys Leu Ile Ser Ser TTT ACT AAT
Lys ValAsnSer Leu Arg Arg Thr Tyr Phe Pro Leu Gly Phe Thr Asn ATG CAG GAT
Asp LysValPhe Lys Gln Arg Val Ile Met Glu Tyr Asp Met Gln Asp GCG GGG ATT
Ile ValHisPhe Asn Leu Ser Gln His Leu Lys Met Lys Ala Gly Ile AGC CGA CAC
Leu GlyLysPhe Pro Tyr Leu Pro His Gly Ser His Leu Ser Arg His ATT AGA AAG
Tyr ValAspMet Glu Leu Ala Thr Gly Arg Met Pro Ser Ile Arg Lys GCA AGA AGA
Ser ValCysSer Asp Cys Ser Pro Gly Phe Leu Trp Lys Ala Arg Arg GCC CCC TGC
Glu GlyMetAla Cys Cys Phe Val Cys Ser Pro Glu Asn Ala Pro Cys GAG GTG AAT
Glu IleSerAsn Thr Asn Met Asp Gln Cys Cys Pro Glu Glu Val Asn AAC ATT CAG
Tyr GlnTyrAla Thr Glu Gln Asn Lys Cys Lys Gly Val Asn Ile Gln TAT GCA CTT
Thr PheLeuSer Glu Asp Pro Leu Gly Met Ala Leu Met Tyr Ala Leu TCT CTT TGT
Ala PheCysPhe Ala Phe Thr Ala Val Val Val Phe Val Ser Leu Cys ACT AAC AGA
Lys HisHisAsp Pro Ile Val Lys Ala Asn Ser Leu Ser Thr Asn Arg TAT CTATTACTC TCA CTC ATG TTC TGT TTT TCC TTT TTC.1536 ATG CTG TGC
Tyr Leu Leu Leu Met Ser Leu Met Phe Cys Phe Leu Cys Ser Phe Phe Phe Ile Gly Leu Pro Asn Lys Val Ile Cys Val Leu Gln Gln Ile Thr Phe Gly Ile Val Phe Thr Val Ala Val Ser Thr Val Leu Ala Lys Thr Val Thr Val Val Leu Ala Phe Lys Val Thr Asp Pro Gly Arg Arg Leu Arg Tyr Phe Leu Val Ser Gly Thr Leu Asn Tyr Ile Ile Pro Ile Cys 565 570 575 Ser Leu Leu Gln Cys Val Leu Cys Ala Ile Trp Leu Ala Val Ser Pro Pro Phe Val Asp Ile Asp Glu His Ser Gln His Gly His Ile Ile Ile Val Cys Asn Lys Gly Ser Val Thr Ala Phe Tyr Cys Val Leu Gly Tyr Leu Ala Cys Leu Ala Leu Gly Ser Phe Thr Leu Ala Phe Leu Ala Lys Asn Leu Pro Asp Ala Phe Asn Glu Ala Lys Phe Leu Thr Phe Ser Met Leu Val Phe Cys Ser Val Trp Val Thr Phe Leu Pro Val Tyr His Ser Thr Lys Gly Lys His Met Val Ala Val Glu Ile Phe Ser Ile Leu Ala Ser Ser Ala Gly Met Leu Glu Cys Ile Phe Val Pro Lys Ile Tyr Ile Ile Leu Met Arg Pro Glu Arg Asn Ser Thr Gln Lys Ile Arg Glu Lys 705 710 715 720 Ser Tyr Phe
CTTCGTTTTG ATTTCATGGA GATTGCCCTC TGGTAACTTC CAAAAACCGT TGATAAGGCA 2458
AAACCATCTA CCAAATCAAA TAATCAATGA GAAACACAGA CTAACTAAAT AATCAGCAAA 2578
(2) INFORMATION FOR SEQ ID NO:10:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 723 amino acids (B) TYPE: amino acid (C) STRANDEDNESS: single (D) TOPOLOGY: linear (ii) MOLECULE TYPE: protein (v) FRAGMENT TYPE: internal (xi) SEQUENCE DESCRIPTION: SEQ ID NO:10:
Ile Cys Asn Glu Glu Ser Met Cys Ser Phe Leu Leu Ser Gly Pro Asn Trp Asp Glu Ser Leu Ser Phe Trp Lys Tyr Leu Asp Ser Phe Leu Ser Pro His Ile Leu Gln Leu Ser Tyr Gly Ser Phe Ser Ser Ile Phe Ser Asp Asp Glu Gln Tyr Pro Tyr Leu Tyr Gln Met Ala Pro Lys Asp Thr Ser Leu Ala Leu Ala Met Val Ser Phe Ile Leu Tyr Leu Lys Trp Asn Trp Ile Gly Leu Val Ile Pro Asp Asp Asp Gln Gly Asn Gln Phe Leu Leu Glu Leu Lys Lys Gln Ser Glu Asn Lys Glu Ile Cys Phe Ala Phe Val Lys Met Ile Ser Val Asp Glu Val Ser Phe Pro Gln Lys Thr Glu Ile Tyr Tyr Lys Gln Ile Val Lys Ser Leu Thr Asn Val Ile Ile Ile Tyr Gly Glu Thr Tyr Asn Phe Ile Asp Leu Ile Phe Arg Met Trp Glu Pro Pro Ile Leu Gln Arg Ile Trp Ile Thr Thr Lys Gln Leu Asn Phe Pro Thr Ser Lys Thr Asp Ile Ser His Asp Thr Phe Tyr Gly Ser Leu Thr Phe Leu Pro His His Gly Glu Ile Ser Gly Phe Lys Asn Phe Val Gln Thr Trp Phe His Leu Arg Asn Thr Asp Leu Tyr Leu Val Met Pro Glu Trp Lys Tyr Ile Asn Ser Glu Asp Ser Ala Ser Asn Cys Lys Ile Leu Lys Asn Ser Ser Ser Asp Ala Ser Phe Asp Trp Leu Met Glu Gln Lys Leu Asp Met Ala Phe Ser Asp Asn Ser His Asn Ile Tyr Asn Val Val His Ala Ile Ala His Ala Leu His Glu Met Asn Leu Gln Gln Ala Asp Asn Gln Ala Ile Asp Asn Gly Lys Gly Ala Ser Ser His Cys Leu Lys Val Asn Ser Phe Leu Arg Arg Thr Tyr Phe Thr Asn Pro Leu Gly Asp Lys Val Phe Met Lys Gln Arg Val Ile Met Gln Asp Glu Tyr Asp _ _ _89_ Ile Val His Phe Ala Asn Leu Ser Gln His Leu Gly Ile Lys Met Lys Leu Gly Lys Phe Ser Pro Tyr Leu Pro His Gly Arg His Ser His Leu Tyr Val Asp Met Ile Glu Leu Ala Thr Gly Arg Arg Lys Met Pro Ser Ser Val Cys Ser Ala Asp Cys Ser Pro Gly Phe Arg Arg Leu Trp Lys Glu Gly Met Ala Ala Cys Cys Phe Val Cys Ser Pro Cys Pro Glu Asn Glu Ile Ser Asn Glu Thr Asn Met Asp Gln Cys Val Asn Cys Pro Glu Tyr Gln Tyr Ala Asn Thr Glu Gln Asn Lys Cys Ile Gln Lys Gly Val Thr Phe Leu Ser Tyr Glu Asp Pro Leu Gly Met Ala Leu Ala Leu Met Ala Phe Cys Phe Ser Ala Phe Thr Ala Val Val Leu Cys Val Phe Val Lys His His Asp Thr Pro Ile Val Lys Ala Asn Asn Arg Ser Leu Ser Tyr 
Leu Leu Leu Met Ser Leu Met Phe Cys Phe Leu Cys Ser Phe Phe Phe Ile Gly Leu Pro Asn Lys Val Ile Cys Val Leu Gln Gln Ile Thr Phe Gly Ile Val Phe Thr Val Ala Val Ser Thr Val Leu Ala Lys Thr Val Thr Val Val Leu Ala Phe Lys Val Thr Asp Pro Gly Arg Arg Leu Arg Tyr Phe Leu Val Ser Gly Thr Leu Asn Tyr Ile Ile Pro Ile Cys Ser Leu Leu Gln Cys Val Leu Cys Ala Ile Trp Leu Ala Val Ser Pro Pro Phe Val Asp Ile Asp Glu His Ser Gln His Gly His Ile Ile Ile Val Cys Asn Lys Gly Ser Val Thr Ala Phe Tyr Cys Val Leu Gly Tyr Leu Ala Cys Leu Ala Leu Gly Ser Phe Thr Leu Ala Phe Leu Ala Lys Asn Leu Pro Asp Ala Phe Asn Glu Ala Lys Phe Leu Thr Phe Ser Met Leu Val Phe Cys Ser Val Trp Val Thr Phe Leu Pro Val Tyr His Ser Thr Lys Gly Lys His Met Val Ala Val Glu Ile Phe Ser Ile Leu Ala Ser Ser Ala Gly Met Leu Glu Cys Ile Phe Val Pro Lys Ile Tyr Ile Ile Leu Met Arg Pro Glu Arg Asn Ser Thr Gln Lys Ile Arg Glu Lys Ser Tyr Phe (2) INFORMATION FOR SEQ ID NO:11:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 1889 base pairs (B) TYPE: nucleic acid (C) STRANDEDNESS: single (D) TOPOLOGY: linear (ii) MOLECULE TYPE: cDNA
(ix) FEATURE:
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:11:
GCCATTGTTT
GACCACAAAG
TGTACGGCTT
GAAAATATGG
CTAATGCGAA
CCCCATATTA
TTTAAGCACA
AAAAAATACC
TTTTCTGATA
TTACCTAGTC
GTGTACGCTG
TGTGAAAATG
ATTGAGGTGA
CTTAACCTCT
GCAAATGCTC
ATATTTTCAG
GTAACCCTGG
ATTTCTAATG
ACAGAGAAGA
GGGATGGCTC
ATATTTGTGA
ACTTTGCTCA
AACACAGTTG
GCCACTGTGT
AGAATGGTAA
CTGATCCAAC
GATGCTCATA
TTCCACTCTG
TTGTCAAGAA
GTATTCTTCT
ATGGTCGCCG
(2) INFORMATION FOR SEQ ID NO:12:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 604 amino acids (B) TYPE: amino acid (C) STRANDEDNESS: single (D) TOPOLOGY: linear (ii) MOLECULE TYPE: peptide (xi) SEQUENCE DESCRIPTION: SEQ ID NO:12:
Ser Leu Ser Leu Ala Ile Val Ser Leu Met Val His Phe Arg Trp Ser Trp Val Gly Leu Ile Leu Pro Asp Asp His Lys Gly Asn Lys Ile Leu Ser Asp Phe Arg Lys Glu Met Glu Arg Lys Arg Ile Cys Thr Ala Phe Val Lys Met Ile Pro Ala Thr Trp Thr Ser Ser Phe Val Lys Phe Trp Glu Asn Met Asp Asp Thr Asn Ile Ile Ile Ile Tyr Gly Asp Ile Asp Ser Leu Glu Gly Leu Met Arg Asn Ile Gly Gln Arg Leu Leu Thr Trp His Val Trp Val Met Asn Ile Glu Pro His Ile Ile Glu Tyr Asp Asn Tyr Phe Met Leu Asp Ser Phe His Gly Ser Leu Ile Phe Lys His Asn Tyr Arg Glu Asn Phe Glu Phe Thr Lys Phe Ile Arg Thr Val Asn Pro Lys Lys Tyr Pro Glu Asp Ile Tyr Leu Pro Lys Met Trp Tyr Leu Phe Phe Met Cys Ser Phe Ser Asp Ile Asn Cys Gln Val Leu Asp Ser Cys Gln Thr Asn Ala Ser Leu Asp Met Leu Pro Ser Gln Ile Phe Asp Val Val Met Ser Glu Glu Ser Thr Ser Ile Tyr Asn Ala Val Tyr Ala Val Ala His Ser Leu His Glu Met Arg Leu Gln Gln Leu Gln Thr Gln Pro Cys Glu Asn Glu Glu Gly Met Glu Phe Phe Pro Trp Gln Leu Asn Thr Phe Leu Lys Asp Ile Glu Val Arg Val Asn Ser Leu Asp Trp Arg Gln Arg Ile Asp Ala Glu Tyr Asp Ile Leu Asn Leu Trp Asn Leu Pro Lys Gly Leu Gly Leu Lys Val Lys Ile Gly Asn Phe Tyr Ala Asn Ala Pro Gln Gly Gln Gln Leu Ser Leu Ser Glu Gln Met Ile Gln Trp Pro Glu Ile Phe Ser Glu Ile Pro Gln Ser Val Cys Ser Glu Ser Cys Gly Pro Gly Phe Arg Lys Val Thr Leu Glu Asn Lys Ala Ile Cys Cys Tyr Asn Cys Thr Pro Cys Ala Asp Asn Glu Ile Ser Asn Glu Thr Asp Val Asp Gln Cys Val Lys Cys Pro Glu Ser His Tyr Ala Asn Thr Glu Lys Ser Asn Cys Tyr Gln Lys Ser Val Ser Phe Leu Gly Tyr Glu Asp Pro Leu Gly Met Ala Leu Ala Ser Ile Ala Leu Cys Leu Ser Ala Leu Thr Ala Phe Val Ile Gly Ile Phe Val Lys His Lys Asp Thr Pro Ile Val Lys Ala Asn Asn Gln Ala Leu Ser Tyr Thr Leu Leu Ile Thr Leu Lys Phe Cys Phe Leu Cys Ser Leu Asn Phe Ile Gly Gln Pro Asn Thr Val Ala Cys Ile Leu Gln Gln Thr Thr Phe Ala Val Ala Phe Thr Met Ala Leu Ala Thr Val Leu Ala Lys Ala Ile Thr Val Val Leu Ala Phe Lys Val Ser Phe Pro Gly Arg Met Val Arg Trp Leu Met Ile Ser Arg Gly Pro Asn Tyr Ile Ile
Pro Ile Cys Thr Leu Ile Gln Leu Leu Leu Cys Gly Ile Trp Met Ala Ile Ser Pro Pro Tyr Ile Asp Gln Asp Ala His Ile Glu His Gly His Ile Ile Ile Leu Cys Asn Lys Gly Ser Ala Val Ala Phe His Ser Val Leu Gly Tyr Leu Cys Phe Leu Ala Leu Gly Ser Tyr Thr Met Ala Phe Leu Ser Arg Asn Leu Pro Asp Thr Phe Asn Glu Ser Lys Phe Ile Ser Leu Ser Met Leu Val Phe Phe Cys Val Trp Ile Thr Phe Leu Pro Val Tyr His Ser Thr Lys Gly Lys Val
(2) INFORMATION FOR SEQ ID NO:13:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 1889 base pairs (B) TYPE: nucleic acid (C) STRANDEDNESS: single (D) TOPOLOGY: linear (ii) MOLECULE TYPE: cDNA
(ix) FEATURE:
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:13:
ATGGCGACGA
AGGACACATC
TCGAAGTCTTCTGCATCCAAGCCGAATTC 1889
(2) INFORMATION FOR SEQ ID NO:14:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 604 amino acids (B) TYPE: amino acid (C) STRANDEDNESS: single (D) TOPOLOGY: linear (ii) MOLECULE TYPE: peptide (v) FRAGMENT TYPE: internal (xi) SEQUENCE DESCRIPTION: SEQ ID NO:14:
Ser Leu Ser Leu Ala Ile Val Ser Leu Met Val His Phe Arg Trp Ser Trp Val Gly Leu Ile Leu Pro Asp Asp His Lys Gly Asn Lys Ile Leu Ser Asp Phe Arg Lys Glu Met Glu Arg Lys Arg Ile Cys Thr Ala Phe Val Lys Met Ile Pro Ala Thr Trp Thr Ser Ser Phe Val Lys Phe Trp Glu Asn Met Asp Asp Thr Asn Ile Ile Ile Ile Tyr Gly Asp Ile Asp Ser Leu Glu Gly Pro Met Arg Asn Ile Gly Gln Arg Leu Leu Thr Trp His Val Trp Val Met Asn Ile Glu Pro His Ile Ile Glu Tyr Asp Asn . 100 105 110 Tyr Phe Met Leu Asp Ser Phe His Gly Ser Leu Ile Phe Lys His Asn Tyr Arg Glu Asn Phe Glu Phe Thr Lys Phe Ile Arg Thr Val Asn Pro Lys Lys Tyr Pro Glu Asp Ile Tyr Leu Pro Lys Met Trp Tyr Leu Phe Phe Met Cys Ser Phe Ser Asp Ile Asn Cys Gln Val Leu Asp Ser Cys Gln Thr Asn Ala Ser Leu Asp Met Leu Pro Ser Gln Ile Phe Asp Val Val Met Ser Glu Glu Ser Thr Ser Ile Tyr Asn Ala Val Tyr Ala Val Ala His Ser Leu His Glu Met Arg Leu Gln Gln Leu Gln Thr Gln Pro Cys Glu Asn Glu Glu Gly Met Glu Phe Phe Pro Trp Gln Leu Asn Thr Phe Leu Lys Asp Ile Glu Val Arg Val Asn Ser Leu Asp Trp Arg Gln Arg Ile Asp Ala Glu Tyr Asp Ile Leu Asn Leu Trp Asn Leu Pro Lys Gly Leu Gly Leu Lys Val Lys Ile Gly Asn Phe Tyr Ala Asn Ala Pro Gln Gly Gln Gln Leu Ser Leu Ser Glu Gln Met Ile Gln Trp Pro Glu Ile Phe Ser Glu Val Pro Gln Ser Val Cys Ser Glu Ser Cys Arg Pro Gly Phe Arg Lys Val Ser Leu Asp Asp Lys Ala Ile Cys Cys Tyr Lys Cys Thr Pro Cys Ala Asp Asn Glu Ile Ser Asn Glu Thr Asp Val Asp Gln Cys Val Lys Cys Pro Glu Ser His Tyr Ala Asn Thr Glu Lys Ser Asn Cys Phe Pro Lys Ser Val Ser Phe Leu Ala Tyr Glu Asp Pro Leu Gly Met Ala Leu Ala Ser Ile Ala Leu Cys Leu Ser Ala Leu Thr Val Phe Val Ile Gly Ile Phe Val Lys Asn Arg Asp Thr Pro Ile Val Lys Ala Asn Asn Arg Thr Leu Ser Tyr Ile Leu Leu Ile Thr Leu Thr Phe Cys Phe Leu Cys Ser Leu Asn Phe Ile Gly Gln Pro Asn Thr Ala Ala Cys Ile Leu Gln Gln Thr Thr Phe Ala Val Ala Phe Thr Met Ala Leu Ala Thr Val Leu Ala Lys Ala Ile Thr Val Val Leu Ala Phe Lys Ile Ser Phe Pro Gly Arg Met Leu Arg Trp Leu Met Ile Ser Arg Gly Pro 
Arg Tyr Ile Ile Pro Ile Cys Thr Leu Ile Gln Leu Leu Leu Cys Gly Ile Trp Met Ala Thr Ser Pro Pro Phe Ile Asp Gln Asp Val Asn Thr Glu Asp Gly Tyr Ile Ile Leu Leu Cys Asn Lys Gly Ser Ala Val Ala Phe His Ser Val Leu Gly Tyr Leu Cys Phe Leu Ala Leu Gly Ser Tyr Thr Met Ala Phe Leu Ser Arg Asn Leu Pro Asp Thr Phe Asn Glu Ser Lys Phe Leu Ser Phe Ser Met Leu Val Phe Phe Cys Val Trp Val Thr Phe Leu Pro Val Tyr His Ser Thr Lys Gly Lys Val
(2) INFORMATION FOR SEQ ID NO:15:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 2561 base pairs (B) TYPE: nucleic acid (C) STRANDEDNESS: single (D) TOPOLOGY: linear (ii) MOLECULE TYPE: cDNA
(ix) FEATURE:
(A) NAME/KEY: Coding Sequence (B) LOCATION: 80...349 (D) OTHER INFORMATION: VR8 (xi) SEQUENCE DESCRIPTION: SEQ ID NO:15:
TACATCAGAA
CTC TGT GCT TTC ACG ATT TCA TTG
Met Lys Lys Leu Cys Ala Phe Thr Ile Ser Leu TGC TGT AGT
Leu Phe Leu Lys Phe Ser Leu Ile Leu Trp Ser Glu Pro Cys Cys Ser GAT AAT CAA
Cys Phe Trp Arg Ile Lys Asn Ser Asp Asp Gly Asp Leu Asp Asn Gln GCT GAT GAT
Arg Glu Cys His Phe Tyr Leu Gly Ala Thr Pro Val Glu Ala Asp Asp AGG TTT TTA
Asn Phe Tyr Ser Ser Leu Leu Lys Phe Ser Leu Asp His Arg Phe Leu
Ile Leu Thr Tyr Ala Thr Met Thr Gly Met Ser Ile Arg Cys Pro GTCTCCTTGA
CAGGGTATTC
GCTTTTGTTA
GATCAACAAA
ACTCTAGAAG
ACCTCACAAT
ACTATCACTT
ATGAACACTG
AATTGTTCAA
GAATGGACAT
AATGCTGTTT
CAGAAAAAGG
TCCGTGTGTA
GACTGCTGCT
GAACAGTGTG
TCAAGAGCTG
GCACTGTCCT
ACTCCCATTG
TTCTGCTTTC
CAGCAGACCA
ATAACTGTGG
ATGACAGGGG
GGAATCTGGT
AAGATTGTCA
(2) INFORMATION FOR SEQ ID NO:16:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 90 amino acids (B) TYPE: amino acid (C) STRANDEDNESS: single (D) TOPOLOGY: linear (ii) MOLECULE TYPE: protein (v) FRAGMENT TYPE: internal (xi) SEQUENCE DESCRIPTION: SEQ ID NO:16:
Met Lys Lys Leu Cys Ala Phe Thr Ile Ser Leu Leu Phe Leu Lys Phe Ser Leu Ile Leu Cys Cys Trp Ser Glu Pro Ser Cys Phe Trp Arg Ile Lys Asn Ser Asp Asp Asn Asp Gly Asp Leu Gln Arg Glu Cys His Phe Tyr Leu Gly Ala Ala Asp Thr Pro Val Glu Asp Asn Phe Tyr Ser Ser Leu Leu Lys Phe Arg Phe Ser Leu Asp His Leu Ile Leu Thr Tyr Ala Thr Met Thr Gly Cys Pro Met Ser Ile Arg
(2) INFORMATION FOR SEQ ID NO:17:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 2734 base pairs (B) TYPE: nucleic acid (C) STRANDEDNESS: single (D) TOPOLOGY: linear (ii) MOLECULE TYPE: cDNA
(ix) FEATURE:
(A) NAME/KEY: Coding Sequence (B) LOCATION: 80...1387 (D) OTHER INFORMATION: VR9 (xi) SEQUENCE DESCRIPTION: SEQ ID NO:17:
Met Lys Lys Leu Cys Ala Phe Thr Ile Ser Leu Leu Phe Leu Lys Phe Ser Leu Ile Leu Cys Cys Trp Ser Glu Pro Ser
AAG GAT GAT
CysPhe TrpArgIle AsnSer AspAsnAsp Gly Leu Gln Lys Asp Asp TAC GCA GTT
ArgGlu CysHisPhe LeuGly AlaAspThr Pro Glu Asp Tyr Ala Val CTT TTT AGT
AsnPhe TyrSerSer LeuLys ArgIleAla Ala Glu Tyr Leu Phe Ser ATG GCT AAC
GluPhe LeuLeuVal PhePhe IleAspGlu Ile Arg Asn Met Ala Asn AAC TTG ATT
ProTyr LeuLeuPro IleThr MetPheSer Phe Gly Gly Asn Leu Ile TTG ATG ACA
AsnCys GlnAspLeu ArgVal AspGlnAla Tyr Gln Ile Leu Met Thr TTT TAT GAT
AsnGly HisMetAsn ValAsn PheCysTyr Leu Asp Ser Phe Tyr Asp ACA TCA TTA
CysAla IleGlyLeu GlyPro TrpLysThr Ser Lys Leu Thr Ser Leu ATG GTT TTT
AlaMet HisSerSer ProLeu PhePheGly Pro Asn Pro Met Val Phe GAC CCC GTA
AsnLeu ArgAspHis ArgLeu HisValHis Gln Ala Pro Asp Pro Val TCC ATG TTT
LysAsp ThrHisLeu HisGly ValSerLeu Met His Phe Ser Met Phe GGA ATC CAG
ArgTrp ThrTrpIle MetVal SerAspAsp Asp Gly Ile Gly Ile Gln TTA GAA GGG
GlnPhe LeuSerAsp ArgGlu SerGlnArg His Ile Cys Leu Glu Gly ATG GAA TAC
LeuAla PheValAsn IlePro AsnMetGln Ile Met Thr Met Glu Tyr GAT ATT GCA
ArgAla ThrIleTyr GlnGln MetThrSer Ser Lys Val Asp Ile Ala GAA TCT AGC
ValIle IleTyrGly MetAsn ThrLeuGlu Val Phe Arg Glu Ser Ser TGG GCT ATC ACA ACC
GAA TCA CAA
ArgTrp Glu Leu Gly Arg Arg Trp Ile Thr Gln Glu Ala Ile Thr Ser GTC AAA TTC AAT TTC
TrpAsp Ile Thr Asn Lys Asp Thr Leu Leu His Val Lys Phe Asn Phe ATC CAC GTT CCT TTA
GlyThr Thr Phe Ala His Arg Glu Ile Lys Asn Ile His Val Pro Leu ATG AAC AAA GTA ATT
LysPhe Gln Thr Met Thr Ala Tyr Pro Asp Ser Met Asn Lys Val Ile ATA AAT AAT ATA AAG
HisThr Leu Glu Trp Tyr Phe Cys Ser Ser Asn Ile Asn Asn Ile Lys AGA ATT AAC TTG TGG
SerIle Met His His Thr Phe Asn Thr Glu Thr Arg Ile Asn Leu Trp CAC ATG AGT GGT AGT
SerLeu Asn Tyr Asp Ala Met Asp Glu Tyr Leu His Met Ser Gly Ser GCT GTG ACC GAA ATT
TyrAsn Val Tyr Ala Ala His Tyr His Tyr Phe Ala Val Thr Glu Ile GTA AAA AAA AGA TTC
GlnGln Glu Ser Gln Lys Ala Pro Lys Tyr Thr Val Lys Lys Arg Phe CAG AAC TGAGGTGTCC
AGATGATAAG
TATGCCA
AlaCys Gln Ile Trp Ser Val Gln Asn
TCACATTTGT GAAACACAAC GATACTCCCA TTGTGAAGGC CAATAACCGC ATTCTCAGCT 1594
ACATCCTGCT CATCTCTCTC GTCTTCTGC~ TTCTCTGCTC CCTGCTCTTC ATTGGACCTC 1654
GAGATATACA ATCTGAGCAT GGGAAGATTG TCATTCTTTG CAATAAAGGC TCAGTCATTG 1954
TCATGGTGGT TGTGGAGGTT TTCTCCATCT TGGCTTCTAG TGCAGGGTTG CTAATGTGTA 2194
(2) INFORMATION FOR SEQ ID NO:18:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 436 amino acids (B) TYPE: amino acid (C) STRANDEDNESS: single (D) TOPOLOGY: linear (ii) MOLECULE TYPE: protein (v) FRAGMENT TYPE: internal (xi) SEQUENCE DESCRIPTION: SEQ ID NO:18:
Met Lys Lys Leu Cys Ala Phe Thr Ile Ser Leu Leu Phe Leu Lys Phe Ser Leu Ile Leu Cys Cys Trp Ser Glu Pro Ser Cys Phe Trp Arg Ile Lys Asn Ser Asp Asp Asn Asp Gly Asp Leu Gln Arg Glu Cys His Phe Tyr Leu Gly Ala Ala Asp Thr Pro Val Glu Asp Asn Phe Tyr Ser Ser Leu Leu Lys Phe Arg Ile Ala Ala Ser Glu Tyr Glu Phe Leu Leu Val Met Phe Phe Ala Ile Asp Glu Ile Asn Arg Asn Pro Tyr Leu Leu Pro Asn Ile Thr Leu Met Phe Ser Phe Ile Gly Gly Asn Cys Gln Asp Leu Leu Arg Val Met Asp Gln Ala Tyr Thr Gln Ile Asn Gly His Met Asn Phe Val Asn Tyr Phe Cys Tyr Leu Asp Asp Ser Cys Ala Ile Gly Leu Thr Gly Pro Ser Trp Lys Thr Ser Leu Lys Leu Ala Met His Ser Ser Met Pro Leu Val Phe Phe Gly Pro Phe Asn Pro Asn Leu Arg Asp His Asp Arg Leu Pro His Val His Gln Val Ala Pro Lys Asp Thr His Leu Ser His Gly Met Val Ser Leu Met Phe His Phe Arg Trp Thr Trp Ile Gly Met Val Ile Ser Asp Asp Asp Gln Gly Ile Gln Phe Leu Ser Asp Leu Arg Glu Glu Ser Gln Arg His Gly Ile Cys Leu Ala Phe Val Asn Met Ile Pro Glu Asn Met Gln Ile Tyr Met Thr Arg Ala Thr Ile Tyr Asp Gln Gln Ile Met Thr Ser Ser Ala Lys Val Val Ile Ile Tyr Gly Glu Met Asn Ser Thr Leu Glu Val Ser Phe Arg Arg Trp Glu Glu Leu Gly Ala Arg Arg Ile Trp Ile Thr Thr Ser Gln Trp Asp Val Ile Thr Asn Lys Lys Asp Phe Thr Leu Asn Leu Phe His Gly Thr Ile Thr Phe Ala His His Arg Val Glu Ile Pro Lys Leu Asn Lys Phe Met Gln Thr Met Asn Thr Ala Lys Tyr Pro Val Asp Ile Ser His Thr Ile Leu Glu Trp Asn Tyr Phe Asn Cys Ser Ile Ser Lys Asn Ser Ile Arg Met His His Ile Thr Phe Asn Asn Thr Leu Glu Trp Thr Ser Leu His Asn Tyr Asp Met Ala Met Ser Asp Glu Gly Tyr Ser Leu Tyr Asn Ala Val Tyr Ala Val Ala His Thr Tyr His Glu Tyr Ile Phe Gln Gln Val Glu Ser Gln Lys Lys Ala Lys Pro Lys Arg Tyr Phe Thr Ala Cys Gln Gln Ile Trp Asn Ser Val i 435 (2) INFORMATION FOR SEQ ID N0:19:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 2732 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: cDNA
(ix) FEATURE:
(A) NAME/KEY: Coding Sequence
(B) LOCATION: 80...1375
(D) OTHER INFORMATION: VR10
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:19:
Met Lys Lys Leu Cys Ala Phe Thr Ile Ser Phe Leu Ser Leu Lys Phe Ser Leu Ile Leu Cys Cys Leu Thr Glu Ala Ser Cys Phe Trp Arg Ile Lys Asn Ser Glu Asp Ser Asp Gly Asp Leu Gln Arg Glu Cys His Phe Tyr Leu Trp Val Ile Asp Lys Pro Ile Glu Asp Asn Phe Tyr Asn Ser Val Leu Asn Phe Arg Ile Ser Ala Ser Glu Tyr Glu Phe Leu Leu Val Met Phe Phe Ala Thr Asp Glu Ile Asn Lys Asn ', CCT TAT TTA AAC GTT GGT 400 CTT CCC ATA GGT
ACT
TTG
ATA
TTC
AGC
ATC
Pro TyrLeuLeu ProAsn ThrLeuIlePhe SerIleVal Gly Ile Gly j 95 100 105 ', CAC TGTCATGAT TTATTG GGTCTGGATCAA TCATATACA ATA 448 AGA CAA
I His CysHisAsp LeuLeu GlyLeuAspGln SerTyrThr Ile Arg Gln GTT GAT
Asn GlyArgVal AsnPhe AsnTyrPheCys TyrLeuAsp Ser Val Asp . 125 130 135 GGA AAA
Cys AsnIleGly LeuThr ProSerTrpLys LysSerLeu Leu Gly Lys ', 140 145 150 155 Ala Met Asp Ser Ser Ile Pro Met Val Phe Phe Gly Pro Phe Asn Pro CGC CAT CCC
Asn Leu AspHis Asp Arg Leu Pro Val His Gln Val Ala Arg His Pro ACA GTC TTT
Lys Asp HisLeu Ser His Gly Met Ser Leu Met Phe His Thr Val Phe ACT TCA ATT
Arg Trp TrpIle Gly Leu Val Ile Asp Asp Asp Gln Gly Thr Ser Ile CTC AGC TGT
Gln Phe SerAsp Leu Arg Glu Glu Gln Arg His Gly Ile Leu Ser Cys TTT AAC ACA
Leu Ala ValAsn Met Ile Pro Glu Met Gln Ile Tyr Met Phe Asn Thr ACA ATG GTT
Arg Ala IleTyr Asp Lys Gln Ile Thr Ser Ser Ala Lys Thr Met Val ATT ACT AGA
Val Ile TyrGly Glu Met Asn Ser Leu Glu Val Ser Phe Ile Thr Arg GAA ATC CAA
Arg Trp AspLeu Gly Ala Arg Arg Trp Ile Thr Thr Ser Glu Ile Gln ATC TTC CAT
Trp Asp IleLeu Asn Lys Lys Glu Thr Leu Asn Leu Phe Ile Phe His ATC GTT AGG
Gly Pro ThrPhe Ala His His Lys Glu Ile Pro Lys Leu Ile Val Arg ATG AAA TCT
Asn Phe GlnThr Met Asn Thr Ala Tyr Pro Val Asp Ile Met Lys Ser ATA AAT AAC
His Thr LeuGlu Trp Asn Tyr Phe Cys Ser Ile Ser Lys Ile Asn Asn AAA AAC ACA
Ser Ser MetAsp Leu Phe Thr Ser Asn Thr Leu Glu Trp Lys Asn Thr CAC AGT TTG
Ala Leu AsnTyr Asp Met Ala Met Asp Glu Gly Tyr Asn His Ser Leu GCT ACC CTT
Tyr Asn ValTyr Val Ala Ala His Tyr His Glu His Ile Ala Thr Leu GTA GAA ACT
Gln Gln GluSer Gln Lys Lys Val His Asn Arg Tyr Phe Val Glu Thr CAG CCAGATGATA AGTATGCCAA
.C
Val Cys Gln Gln Ile GGGATGGCTC TAGGCTGCAT GGCACTATCC TTCTCGGCCATCACAATTCTAGTACTAGTC 1536 TCTACAGTGT TGGCCARAAC AATAACTGTG GTCATGGCTTTCAAGTTCACTACTCCAGGA 1776 TATGACAAAG GTACATAAAT AAATAAACAC TTTCCCCACC1?~WAAAAAAAAAA,AAA 2732
(2) INFORMATION FOR SEQ ID NO:20:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 432 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: protein
(v) FRAGMENT TYPE: internal
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:20:
Met Lys Lys Leu Cys Ala Phe Thr Ile Ser Phe Leu Ser Leu Lys Phe Ser Leu Ile Leu Cys Cys Leu Thr Glu Ala Ser Cys Phe Trp Arg Ile Lys Asn Ser Glu Asp Ser Asp Gly Asp Leu Gln Arg Glu Cys His Phe Tyr Leu Trp Val Ile Asp Lys Pro Ile Glu Asp Asn Phe Tyr Asn Ser Val Leu Asn Phe Arg Ile Ser Ala Ser Glu Tyr Glu Phe Leu Leu Val Met Phe Phe Ala Thr Asp Glu Ile Asn Lys Asn Pro Tyr Leu Leu Pro Asn Ile Thr Leu Ile Phe Ser Ile Val Gly Gly His Cys His Asp Leu Leu Arg Gly Leu Asp Gln Ser Tyr Thr Gln Ile Asn Gly Arg Val Asn Phe Val Asn Tyr Phe Cys Tyr Leu Asp Asp Ser Cys Asn Ile Gly Leu Thr Gly Pro Ser Trp Lys Lys Ser Leu Lys Leu Ala Met Asp Ser Ser Ile Pro Met Val Phe Phe Gly Pro Phe Asn Pro Asn Leu Arg Asp His Asp Arg Leu Pro His Val His Gln Val Ala Pro Lys Asp Thr His Leu Ser His Gly Met Val Ser Leu Met Phe His Phe Arg Trp Thr Trp Ile Gly Leu Val Ile Ser Asp Asp Asp Gln Gly Ile Gln Phe Leu Ser Asp Leu Arg Glu Glu Ser Gln Arg His Gly Ile Cys Leu Ala Phe Val Asn Met Ile Pro Glu Asn Met Gln Ile Tyr Met Thr Arg Ala Thr Ile Tyr Asp Lys Gln Ile Met Thr Ser Ser Ala Lys Val Val Ile Ile Tyr Gly Glu Met Asn Ser Thr Leu Glu Val Ser Phe Arg Arg Trp Glu Asp Leu Gly Ala Arg Arg Ile Trp Ile Thr Thr Ser Gln Trp Asp Ile Ile Leu Asn Lys Lys Glu Phe Thr Leu Asn Leu Phe His Gly Pro Ile Thr Phe Ala His His Lys Val Glu Ile Pro Lys Leu Arg Asn Phe Met Gln Thr Met Asn Thr Ala Lys Tyr Pro Val Asp Ile Ser His Thr Ile Leu Glu Trp Asn Tyr Phe Asn Cys Ser Ile Ser Lys Asn Ser Ser Lys Met Asp Leu Phe Thr Ser Asn Asn Thr Leu Glu Trp Thr Ala Leu His Asn Tyr Asp Met Ala Met Ser Asp Glu Gly Tyr Asn Leu Tyr Asn Ala Val Tyr Val Ala Ala His Thr Tyr His Glu His Ile Leu Gln Gln Val Glu Ser Gln Lys Lys Val Glu His Asn Arg Tyr Phe Thr Val Cys Gln Gln Ile
(2) INFORMATION FOR SEQ ID NO:21:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 2962 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: cDNA
(ix) FEATURE:
(A) NAME/KEY: Coding Sequence
(B) LOCATION: 81...1601
(D) OTHER INFORMATION: VR11
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:21:
AATGTGTGTG
TGATGTTTTT
CTACATCAGA
AACGGATTTC
ACAACAACTC
ATG
AAG
AAG
CTC
TGT
GCT
TTC
ACT
ATT
TCA
Met Lys Lys Leu Cys Ala Phe Thr Ile Ser TTG AAG TGC GCA
Phe SerLeu Phe Ser Leu Ile Leu Cys Leu Thr Glu Leu Lys Cys Ala TGC AGG GAT TTG
Ser PheTrp Ile Lys Asn Ser Glu Ser Asp Gly Asp Cys Arg Asp Leu AGA CAT ATT GAA
Gln GluCys Phe Tyr Leu Trp Val Asp Lys Pro Ile Arg His Ile Glu AAT AAT AGA GAA .
Asp Asn Phe Tyr Asn Ser Val Leu Asn Phe Arg Ile Ser Ala Ser Glu Tyr Glu Phe Leu Leu Val Met Phe Phe Ala Thr Asp Glu Ile Asn Lys Asn Pro Tyr Leu Leu Pro Asn Ile Thr Leu Ile Phe Ser Ile Val Gly Gly His Cys His Asp Leu Leu Arg Gly Leu Asp Gln Ser Tyr Thr Gln Ile Asn Gly Arg Val Asn Phe Val Asn Tyr Phe Cys Tyr Leu Asp Asp Ser Cys Asn Ile Gly Leu Thr Gly Pro Ser Trp Lys Lys Ser Leu Lys Leu Ala Met Asp Ser Ser Ile Pro Met Val Phe Phe Gly Pro Phe Asn Pro Asn Leu Arg Asp His Asp Arg Leu Pro His Val His Gln Val Ala
CCC AAG GAC ACA CAT TTA TCC CAT GGC ATG GTC TCC TTG ATG TTT CAT 686
Pro Lys Asp Thr His Leu Ser His Gly Met Val Ser Leu Met Phe His
190 195 200
Phe Arg Trp Thr Trp Ile Gly Leu Val Ile Ser Asp Asp Asp Gln Gly Ile Gln Phe Leu Ser Asp Leu Arg Glu Glu Ser Gln Arg His Gly Ile Cys Leu Ala Phe Val Asn Met Ile Pro Glu Asn Met Gln Ile Tyr Met Thr Arg Ala Thr Ile Tyr Asp Lys Gln Ile Met Thr Ser Ser Ala Lys Val Val Ile Ile Tyr Gly Glu Met Asn Ser Thr Leu Glu Val Ser Phe Arg Arg Trp Glu Asp Leu Gly Ala Arg Arg Ile Trp Ile Thr Thr Ser Gln Trp Asp Ile Ile Leu Asn Lys Lys Glu Phe Thr Leu Asn Leu Phe His Gly Pro Ile Thr Phe Ala His His Lys Val Glu Ile Pro Lys Leu
AAT CAA GCC TAC GAT
Arg Phe Met Thr MetAsnThr Lys Pro Val Ile Asn Gln Ala Tyr Asp CAT CTG TTT TGT TCT
Ser Thr Ile Glu TrpAsnTyr Asn Ser Ile Lys His Leu Phe Cys Ser AGC ATG TCC AAC GAA
Asn Ser Lys Asp LeuPheThr Asn Thr Leu Trp Ser Met Ser Asn Glu GCA AAC ATG GAT TAC
Thr Leu His Tyr AspMetAla Ser Glu Gly Asn Ala Asn Met Asp Tyr TAT GTT CAC TAC CAC
Leu Asn Ala Tyr ValAlaAla Thr His Glu Ile Tyr Val His Tyr His CAA GAG GTA CAC TAT
Leu Gln Val Ser GlnLysLys Glu Asn Arg Phe Gln Glu Val His Tyr GTT CAG ATG ACC TTT
Thr Cys Gln Val SerSerLeu Lys Arg Val Thr Val Gln Met Thr Phe CCG GAA AAG AGG CAG
Asn Val Gly Leu ValAsnMet His Glu Asn Cys Pro Glu Lys Arg Gln GAG ATT AAT CCA CTT
Thr Tyr Asp Phe IleIleTrp Phe Gln Gly Gly Glu Ile Asn Pro Leu AAA ATA CCT TTT AGT
Leu Leu Lys Gly SerTyrIle Cys Pro Lys Gln Lys Ile Pro Phe Ser CTT TCT TGG ATG ACA
Gln His Ile Asp AspLeuGlu Ala Gly Gly Ser Leu Ser Trp Met Thr TAGAACAGTG ACCTAC
TGTGAAATGT
CCAGATGATA
AGTATGCCAA
Ile AATTAAGTAA TATACAGATT
TAAATAAATA AACACTTTCC CCACAAAAAAAAAAAAP.AAA AAAAA 2962
(2) INFORMATION FOR SEQ ID NO:22:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 507 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: protein
(v) FRAGMENT TYPE: internal
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:22:
Met Lys Lys Leu Cys Ala Phe Thr Ile Ser Phe Leu Ser Leu Lys Phe Ser Leu Ile Leu Cys Cys Leu Thr Glu Ala Ser Cys Phe Trp Arg Ile Lys Asn Ser Glu Asp Ser Asp Gly Asp Leu Gln Arg Glu Cys His Phe Tyr Leu Trp Val Ile Asp Lys Pro Ile Glu Asp Asn Phe Tyr Asn Ser Val Leu Asn Phe Arg Ile Ser Ala Ser Glu Tyr Glu Phe Leu Leu Val Met Phe Phe Ala Thr Asp Glu Ile Asn Lys Asn Pro Tyr Leu Leu Pro Asn Ile Thr Leu Ile Phe Ser Ile Val Gly Gly His Cys His Asp Leu Leu Arg Gly Leu Asp Gln Ser Tyr Thr Gln Ile Asn Gly Arg Val Asn Phe Val Asn Tyr Phe Cys Tyr Leu Asp Asp Ser Cys Asn Ile Gly Leu Thr Gly Pro Ser Trp Lys Lys Ser Leu Lys Leu Ala Met Asp Ser Ser Ile Pro Met Val Phe Phe Gly Pro Phe Asn Pro Asn Leu Arg Asp His Asp Arg Leu Pro His Val His Gln Val Ala Pro Lys Asp Thr His Leu Ser His Gly Met Val Ser Leu Met Phe His Phe Arg Trp Thr Trp Ile Gly Leu Val Ile Ser Asp Asp Asp Gln Gly Ile Gln Phe Leu Ser Asp Leu Arg Glu Glu Ser Gln Arg His Gly Ile Cys Leu Ala Phe Val Asn Met Ile Pro Glu Asn Met Gln Ile Tyr Met Thr Arg Ala Thr Ile Tyr Asp Lys Gln Ile Met Thr Ser Ser Ala Lys Val Val Ile Ile Tyr Gly Glu Met Asn Ser Thr Leu Glu Val Ser Phe Arg Arg Trp Glu Asp Leu Gly Ala Arg Arg Ile Trp Ile Thr Thr Ser Gln Trp Asp Ile Ile Leu Asn Lys Lys Glu Phe Thr Leu Asn Leu Phe His Gly Pro Ile Thr Phe Ala His His Lys Val Glu Ile Pro Lys Leu Arg Asn Phe Met Gln Thr Met Asn Thr Ala Lys Tyr Pro Val Asp Ile Ser His Thr Ile Leu Glu Trp Asn Tyr Phe Asn Cys Ser Ile Ser Lys Asn Ser Ser Lys Met Asp Leu Phe Thr Ser Asn Asn Thr Leu Glu Trp Thr Ala Leu His Asn Tyr Asp Met Ala Met Ser Asp Glu Gly Tyr Asn Leu Tyr Asn Ala Val Tyr Val Ala Ala His Thr Tyr His Glu His Ile Leu Gln Gln Val Glu Ser Gln Lys Lys Val Glu His Asn Arg Tyr Phe Thr Val Cys Gln Gln Val Ser Ser Leu Met Lys Thr Arg Val Phe Thr Asn Pro Val Gly Glu Leu Val Asn Met Lys His Arg Glu Asn Gln Cys Thr Glu Tyr Asp Ile Phe Ile Ile Trp Asn Phe Pro Gln Gly Leu Gly Leu Lys Leu Lys Ile Gly Ser Tyr Ile Pro Cys Phe Pro Lys Ser Gln Gln Leu His Ile Ser Asp Asp Leu Glu Trp 
Ala Met Gly Gly Thr Ser Ile
(2) INFORMATION FOR SEQ ID NO:23:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 2821 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: cDNA
(ix) FEATURE:
(A) NAME/KEY: Coding Sequence
(B) LOCATION: 60...992
(D) OTHER INFORMATION: VR12
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:23:
Met Lys Gln Leu Cys Thr Phe Thr Ile Ser Leu Leu Phe Leu Lys Phe Ser Leu Ile Leu Cys Cys Trp Ser Glu Pro Ser Cys Phe Trp Arg Ile Lys Lys Ser Glu Asp Asn Asp Gly Asp Leu Gln Arg Glu Cys His Phe Tyr Leu Trp Lys Thr Asp Glu Pro Ile Glu Asp Ser Phe Tyr Asn Tyr Asp Leu Ser Phe Arg Ile Ala Gly Ser Glu Tyr Glu Leu Leu Leu Val Met Phe Phe Ala Thr Asp Glu Ile Asn Lys Asn Pro Tyr Leu Leu Pro Asn
ACA TGA GTT TGA TGT TCT CCA TCA TTG GTG GAA ACT GTC ATG ATT TAT 396
Met Ser Leu Met Phe Ser Ile Ile Gly Gly Asn Cys His Asp Leu Leu Arg Ser Leu Asp Gln Glu Tyr Ala Gln Ile Asp Gly His Met Asn Phe Val Asn Tyr Phe Cys Tyr Leu Asp Asp Ser Cys Ala Thr Gly Leu Thr Gly Pro Ser Trp Lys Thr Ser Leu Lys Leu Ala Met His Ser Ser Met Pro Leu Val Phe Phe Gly Pro Phe Asn Pro Asn Leu Arg Asp His Asp Arg Leu Pro His Val His Gln Val Ala Pro Lys Asp Thr His Leu Ser His Gly Met Val Ser Leu Met Phe His Phe Arg Trp Thr Trp Ile Gly Leu Val Ile Ser Asp Asp Asp Gln Gly Ile Gln Phe Leu Ser Asp Leu Arg Glu Glu Ser Gln Arg His Gly Ile Cys Leu Ala Phe Val Asn Met Ile Pro Glu Asn Met Gln Ile Tyr Met Thr Arg Ala Thr Ile Tyr Asp Thr Gln Ile Met Thr Ser Ser Ala Lys Val Val Ile Ile Tyr Gly Asp Met Asn Ser Thr Leu Glu Ala Ser Phe Arg Arg Trp Glu Glu Leu Gly Ala Arg Arg Ile Trp Ile Thr Thr Thr Gln Trp Asp Val Ile Thr Asn Lys Lys Arg Leu His Pro
TGAGGTTTCC
AATGAAACAG
ATATGGAACA
(2) INFORMATION FOR SEQ ID NO:24:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 311 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: protein
(v) FRAGMENT TYPE: internal
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:24:
Met Lys Gln Leu Cys Thr Phe Thr Ile Ser Leu Leu Phe Leu Lys Phe Ser Leu Ile Leu Cys Cys Trp Ser Glu Pro Ser Cys Phe Trp Arg Ile Lys Lys Ser Glu Asp Asn Asp Gly Asp Leu Gln Arg Glu Cys His Phe Tyr Leu Trp Lys Thr Asp Glu Pro Ile Glu Asp Ser Phe Tyr Asn Tyr Asp Leu Ser Phe Arg Ile Ala Gly Ser Glu Tyr Glu Leu Leu Leu Val Met Phe Phe Ala Thr Asp Glu Ile Asn Lys Asn Pro Tyr Leu Leu Pro Asn Met Ser Leu Met Phe Ser Ile Ile Gly Gly Asn Cys His Asp Leu Leu Arg Ser Leu Asp Gln Glu Tyr Ala Gln Ile Asp Gly His Met Asn Phe Val Asn Tyr Phe Cys Tyr Leu Asp Asp Ser Cys Ala Thr Gly Leu Thr Gly Pro Ser Trp Lys Thr Ser Leu Lys Leu Ala Met His Ser Ser Met Pro Leu Val Phe Phe Gly Pro Phe Asn Pro Asn Leu Arg Asp His Asp Arg Leu Pro His Val His Gln Val Ala Pro Lys Asp Thr His Leu Ser His Gly Met Val Ser Leu Met Phe His Phe Arg Trp Thr Trp Ile Gly Leu Val Ile Ser Asp Asp Asp Gln Gly Ile Gln Phe Leu Ser Asp Leu Arg Glu Glu Ser Gln Arg His Gly Ile Cys Leu Ala Phe Val Asn Met Ile Pro Glu Asn Met Gln Ile Tyr Met Thr Arg Ala Thr Ile Tyr Asp Thr Gln Ile Met Thr Ser Ser Ala Lys Val Val Ile Ile Tyr Gly Asp Met Asn Ser Thr Leu Glu Ala Ser Phe Arg Arg Trp Glu Glu Leu Gly Ala Arg Arg Ile Trp Ile Thr Thr Thr Gln Trp Asp Val Ile Thr Asn Lys Lys Arg Leu His Pro
(2) INFORMATION FOR SEQ ID NO:25:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 2773 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: cDNA
(ix) FEATURE:
(A) NAME/KEY: Coding Sequence
(B) LOCATION: 3...1238
(D) OTHER INFORMATION: VR13
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:25:
CGG ATA AAG AAT AGT GAA
GAT AAT GAT GGA
Ala Ser Cys Phe Trp Arg Ile Lys Asn Ser Glu Asp Asn Asp Gly AAA CCA
Asp Leu Gln Arg Glu Cys His Phe Tyr Leu Gly Ala Val Asp Lys Pro GCA GCA
Ile Glu Asp Asn Phe Tyr Asn Ser Leu Leu Lys Phe Arg Ile Ala Ala GAG ATC
Ser Glu Tyr Glu Phe Leu Leu Val Met Phe Phe Ala Thr Asp Glu Ile TCC ATC
Asn Lys Asn Pro Tyr Leu Leu Pro Asn Ile Thr Leu Met Phe Ser Ile GCA TAT
Ile Gly Gly Asn Cys His Asp Leu Leu Arg Gly Leu Asp Gln Ala Tyr TAT TTA
Thr Gln Ile Asn Gly His Met Asn Phe Val Asn Tyr Phe Cys Tyr Leu Asp Asp Ser Cys Ala Ile Gly Leu Thr Gly Pro Ser Trp Lys Thr Ser Leu Lys Leu Ala Met His Ser Ser Met Pro Leu Val Phe Phe Gly Ser .
CGG
PheAsn ProAsnLeu HisAspHis Asp Leu HisHisValHis Gln Arg CAT
ValAla ThrLysAsp ThrHisLeu Ser Gly IleValSerLeu Met His CTG
PheHis PheArgTrp ThrTrpIle Gly Val IleSerAspAsp Asp Leu AGA
LysGly IleGlnPhe LeuSerAsp Leu Glu GluSerGlnArg His Arg ATC
GlyIle CysLeuAla PheValAsn Met Pro GluAsnMetGln Ile Ile AAA
TyrMet ThrArgAla ThrIleTyr Asp Gln IleMetThrSer Leu Lys ATG
AlaLys ValValIle IleTyrGly Glu Asn SerThrLeuGlu Val Met GCT
SerPhe ArgArgTrp GluAsnLeu Gly Arg ArgIleTrpIle Thr Ala AAA
ThrSer GlnTrpAsp ValIleThr Asn Lys GluPheThrLeu Asn Lys CAC
LeuPhe HisGlyThr IleThrPhe Ala Arg ArgPheGluIle Pro His AAC
LysPhe LysLysPhe MetGlnThr Met Thr AlaLysTyrPro Val Asn AAT
AspIle SerHisThr IleLeuGlu Trp Tyr PheAsnCysSer Ile Asn ATT
SerLys AsnSerSer LysMetAsp His Thr PheAsnAsnThr Leu Ile ATG
GluTrp ThrAlaLeu HisAsnTyr Asp Val MetSerAspGlu Gly Met GTG
TyrAsn LeuTyrAsn AlaValTyr Ala Ala HisThrTyrHis Glu Val AAA
HisIle PheGlnGln ValGluSer Gln Lys AlaLysProLys Arg Lys Phe Phe Thr Val Cys Gln Gln Gln Ile Trp Asn Ser Val TTTTCACTGTGTCTGTTTCTACAGTGTTGGCCAAAACAAT AACTGTGGTC ATGGCTTTCA 1610
(2) INFORMATION FOR SEQ ID NO:26:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 412 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: protein
(v) FRAGMENT TYPE: internal
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:26:
Ala Ser Cys Phe Trp Arg Ile Lys Asn Ser Glu Asp Asn Asp Gly Asp Leu Gln Arg Glu Cys His Phe Tyr Leu Gly Ala Val Asp Lys Pro Ile Glu Asp Asn Phe Tyr Asn Ser Leu Leu Lys Phe Arg Ile Ala Ala Ser Glu Tyr Glu Phe Leu Leu Val Met Phe Phe Ala Thr Asp Glu Ile Asn Lys Asn Pro Tyr Leu Leu Pro Asn Ile Thr Leu Met Phe Ser Ile Ile Gly Gly Asn Cys His Asp Leu Leu Arg Gly Leu Asp Gln Ala Tyr Thr Gln Ile Asn Gly His Met Asn Phe Val Asn Tyr Phe Cys Tyr Leu Asp Asp Ser Cys Ala Ile Gly Leu Thr Gly Pro Ser Trp Lys Thr Ser Leu Lys Leu Ala Met His Ser Ser Met Pro Leu Val Phe Phe Gly Ser Phe Asn Pro Asn Leu His Asp His Asp Arg Leu His His Val His Gln Val Ala Thr Lys Asp Thr His Leu Ser His Gly Ile Val Ser Leu Met Phe His Phe Arg Trp Thr Trp Ile Gly Leu Val Ile Ser Asp Asp Asp Lys Gly Ile Gln Phe Leu Ser Asp Leu Arg Glu Glu Ser Gln Arg His Gly Ile Cys Leu Ala Phe Val Asn Met Ile Pro Glu Asn Met Gln Ile Tyr Met Thr Arg Ala Thr Ile Tyr Asp Lys Gln Ile Met Thr Ser Leu Ala Lys Val Val Ile Ile Tyr Gly Glu Met Asn Ser Thr Leu Glu Val Ser Phe Arg Arg Trp Glu Asn Leu Gly Ala Arg Arg Ile Trp Ile Thr Thr Ser Gln Trp Asp Val Ile Thr Asn Lys Lys Glu Phe Thr Leu Asn Leu Phe His Gly Thr Ile Thr Phe Ala His Arg Arg Phe Glu Ile Pro Lys Phe Lys Lys Phe Met Gln Thr Met Asn Thr Ala Lys Tyr Pro Val Asp Ile Ser His Thr Ile Leu Glu Trp Asn Tyr Phe Asn Cys Ser Ile Ser Lys Asn Ser Ser Lys Met Asp His Ile Thr Phe Asn Asn Thr Leu Glu Trp Thr Ala Leu His Asn Tyr Asp Met Val Met Ser Asp Glu Gly Tyr Asn Leu Tyr Asn Ala Val Tyr Ala Val Ala His Thr Tyr His Glu His Ile Phe Gln Gln Val Glu Ser Gln Lys Lys Ala Lys Pro Lys Arg Phe Phe Thr Val Cys Gln Gln Gln Ile Trp Asn Ser Val
(2) INFORMATION FOR SEQ ID NO:27:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 3108 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: cDNA
(ix) FEATURE:
(A) NAME/KEY: Coding Sequence
(B) LOCATION: 116...2527
(D) OTHER INFORMATION: VR14
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:27:
TAAACATCTC
CTTTGCCTAA
AGAAATAAAA
GCTGGTAGAA
ATCTGATGTG
TGGCACTTCA AAAAG
CAATCCACAC ATG
TGCCCAGGTT
Met TTC GTC TTC AAT ACA
Phe Ile Met Glu Phe Leu Leu Ile LeuLeu Met Phe Val Phe Asn Thr TTC CCC TGC AGA AAT
Ala Asn Ile Asp Arg Phe Trp Ile LeuAsp Glu Phe Pro Cys Arg Asn GAT TTG TTA GCT ATC
Ile Met Glu Tyr Gly Ser Cys Phe LeuAla Ala Asp Leu Leu Ala Ile , !, GTT CAG ACA CCC GAA GATTAT TTCAACAAGACT CTTAAT GTT 310 ATT AAT
Val Gln Thr Pro Glu AspTyr PheAsnLysThr LeuAsn Val Ile Asn AAA CAC
Leu Lys Thr Thr Asn LysTyr AlaLeuAlaLeu ValPhe Ala Lys His AAC AAT
Met Asp Glu Ile Arg ProAsp LeuLeuProAsn MetSer Leu Asn Asn ACT GGC
Ile Ile Arg Tyr Leu ArgCys AspGlyLysThr ValIle Pro Thr Gly TTT AAA
', Thr Pro Tyr Leu Arg LysLys GluSerProIle ProAsn Tyr Phe Lys i ', TTC TGT AAT GAA ACT TGTTCC TATCTGCTTACA GGACCC CAT 550 GAG ATG
Phe Cys Asn Glu Thr CysSer TyrLeuLeuThr GlyPro His Glu Met TTA TTC
Trp Glu Val Ser Gly TrpLys HisMetAsnSer PheLeu Ser Leu Phe CAG ACC
Pro Arg Ile Leu Leu TyrGly ProPheHisSer IlePhe Ser Gln Thr TAT TAT
Asp Asp Glu Gln Pro LeuTyr GlnMetAlaPro LysAsp Thr Tyr Tyr Ser Leu Ala Leu Ala Met Val Ser Phe Ile Leu Tyr Phe Ser Trp Asn GTC GAT CAA TTT
ATT
Trp Ile GlyLeuVal Pro Asp Asp Gly Asn Gln Leu Ile Asp Gln Phe CAG AAC GAA GCC
Leu Glu LeuLysLys Ser Glu Lys Ile Cys Phe Phe Gln Asn Glu Ala GTT GTT TTT ACT
Val Lys MetIleSer Asp Asp Ser Pro Gln Asn Glu Val Val Phe Thr ATT TCA ACA ATC
Met Tyr TyrAsnGln Val Met Ser Asn Val Ile Ile Ile Ser Thr Ile AAT GAT ATC TGG
Tyr Gly GluThrTyr Phe Ile Leu Phe Arg Met Glu Asn Asp Ile Trp AGA ATC ACA AAT
', Pro Pro IleLeuGln Ile Trp Thr Lys Gln Leu Phe Arg Ile Thr Asn .
ACC ACA TTC TCA
AGG
AAA
AAA
GAC
ATA
AGT
ProThrArgLys Asp SerHis Gly PheTyr GlySerLeu Lys Ile Thr CAC GGT GGT
ThrPheLeuPro His ValIle Ser PheLys AsnPheVal His Gly Gly CAT AGA TTA
GlnThrTrpPhe Leu AsnThr Asp TyrLeu ValMetGln His Arg Leu TTT TAT GCA
GluTrpLysTyr Asn GluAsp Ser SerThr CysLysIle Phe Tyr Ala TCA AAT GAT
LeuLysAsnAsn Ser AlaSer Phe TrpLeu MetGluGln Ser Asn Asp ACC AGT CAT
LysPheAspMet Phe GluAsn Ser AsnIle TyrAsnAla Thr Ser His GCC GCC ATG
ValHisAlaIle His LeuHis Glu AsnLeu GlnGlnAla Ala Ala Met ATA AAT GAG
AspAsnGlnAla Asp GlyLys Lys ProSer SerSerHis Ile Asn Glu AAC TTT ATT
CysLeuLysVal Ser LeuArg Arg TyrPhe ThrAsnPro Asn Phe Ile GTG ATG GTA
ProGlyAspLys Phe LysGln Arg IleMet HisAspGlu Val Met Val CAC GTG CAA
TyrAspIleVal Phe AsnLeu Ser HisLeu GlyIleLys His Val Gln AAG AGC CCA
MetLysLeuGly Phe ProTyr Leu HisGly ArgHisSer Lys Ser Pro GAC ATT ACA
HisLeuTyrVal Arg GluLeu Ala GlyArg ArgLysMet Asp Ile Thr TGC GCT CCT
ProSerSerVal Ser AspCys Ser GlyPhe ArgArgLeu Cys Ala Pro ATG GCC GTT
TrpLysGluGly Ala CysCys Phe CysSer ProCysPro Met Ala Val TCT GAG GTA
GluAsnGluIle Asn ThrThr Val LeuCys ValPheVal Ser Glu Val ACT ATT AAT .
Lys His His Asp Thr Pro Ile Val Lys Ala Asn Asn Arg Ser Leu Ser Tyr Leu Leu Ser Leu CysPheLeuCys SerPhe Phe Leu Met Met Ser CCA ATC
Phe IleGly Leu Asn ArgAla CysValLeuGln GlnIle Thr Pro Ile TTC GTT
Phe GlyIle Val Thr MetAla SerThrValLeu AlaLys Thr Phe Val CTG GTC
Val ThrVal Val Ala PheLys ThrAspProGly ArgArg Leu Leu Val GTA CCC
Arg AsnPhe Leu Ser GlyThr AsnTyrIleIle ProIle Cys Val Pro TGT GCA
Ser LeuLeu Gln Val LeuCys IleTrpLeuAla ValSer Pro Cys Ala ATT ACT
Pro PheVal Asp Asp GluHis LeuHisGlyHis IleIle Ile Ile Thr GGC GCA
Val CysAsn Lys Ser ValThr PheTyrCysIle LeuGly Tyr Gly Ala ', 690 695 700 705 GCA TTC
Leu AlaCys Leu Leu GlyAsn SerValAlaPhe LeuAla Lys Ala Phe ', AAT CTGCCT GAC TTC AATGAA AAGTTCTTGACC TTCAGC ATG 2326 ACA GCC
Asn LeuPro Asp Phe AsnGlu LysPheLeuThr PheSer Met Thr Ala AGT ACC
Leu ValPhe Cys Val TrpVal PheLeuProVal TyrHis Ser Ser Thr CAC GTG
', Thr LysGly Lys Met ValAla GluIlePheSer IleLeu Ala His Val ', TCC AGTGCT GGG CTT GGATGT TTTGTACCCAAG ATTTAT ATC 2470 ATC ATA
Ser SerAla Gly Leu GlyCys PheValProLys IleTyr Ile Ile Ile CCA TCG
Ile LeuMet Arg Glu ArgAsn ThrGlnLysIle ArgGlu Lys Pro Ser Ser Tyr Phe ACTAAACTCT
CTAATTATTA
CAATTTTATT
CTGTCAAATAAAAATATATTATATCCAAAAp,~~iAAAAAAAA p~AAAAAAAp,A 3108 AA
(2) INFORMATION FOR SEQ ID NO:28:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 804 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: protein
(v) FRAGMENT TYPE: internal
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:28:
Met Phe Ile Phe Met Glu Val Phe Phe Leu Leu Asn Ile Thr Leu Leu Met Ala Asn Phe Ile Asp Pro Arg Cys Phe Trp Arg Ile Asn Leu Asp Glu Ile Met Asp Glu Tyr Leu Gly Leu Ser Cys Ala Phe Ile Leu Ala Ala Val Gln Thr Pro Ile Glu Asn Asp Tyr Phe Asn Lys Thr Leu Asn Val Leu Lys Thr Thr Lys Asn His Lys Tyr Ala Leu Ala Leu Val Phe Ala Met Asp Glu Ile Asn Arg Asn Pro Asp Leu Leu Pro Asn Met Ser Leu Ile Ile Arg Tyr Thr Leu Gly Arg Cys Asp Gly Lys Thr Val Ile Pro Thr Pro Tyr Leu Phe Arg Lys Lys Lys Glu Ser Pro Ile Pro Asn Tyr Phe Cys Asn Glu Glu Thr Met Cys Ser Tyr Leu Leu Thr Gly Pro His Trp Glu Val Ser Leu Gly Phe Trp Lys His Met Asn Ser Phe Leu Ser Pro Arg Ile Leu Gln Leu Thr Tyr Gly Pro Phe His Ser Ile Phe Ser Asp Asp Glu Gln Tyr Pro Tyr Leu Tyr Gln Met Ala Pro Lys Asp Thr Ser Leu Ala Leu Ala Met Val Ser Phe Ile Leu Tyr Phe Ser Trp Asn Trp Ile Gly Leu Val Ile Pro Asp Asp Asp Gln Gly Asn Gln Phe Leu Leu Glu Leu Lys Lys Gln Ser Glu Asn Lys Glu Ile Cys Phe Ala Phe Val Lys Met Ile Ser Val Asp Asp Val Ser Phe Pro Gln Asn Thr Glu Met Tyr Tyr Asn Gln Ile Val Met Ser Ser Thr Asn Val Ile Ile Ile Tyr Gly Glu Thr Tyr Asn Phe Ile Asp Leu Ile Phe Arg Met Trp Glu Pro Pro Ile Leu Gln Arg Ile Trp Ile Thr Thr Lys Gln Leu Asn Phe Pro Thr Arg Lys Lys Asp Ile Ser His Gly Thr Phe Tyr Gly Ser Leu Thr Phe Leu Pro His His Gly Val Ile Ser Gly Phe Lys Asn Phe Val Gln Thr Trp Phe His Leu Arg Asn Thr Asp Leu Tyr Leu Val Met Gln Glu Trp Lys Tyr Phe Asn Tyr Glu Asp Ser Ala Ser Thr Cys Lys Ile Leu Lys Asn Asn Ser Ser Asn Ala Ser Phe Asp Trp Leu Met Glu Gln Lys Phe Asp Met Thr Phe Ser Glu Asn Ser His Asn Ile Tyr Asn Ala Val His Ala Ile Ala His Ala Leu His Glu Met Asn Leu Gln Gln Ala Asp Asn Gln Ala Ile Asp Asn Gly Lys Lys Glu Pro Ser Ser Ser His Cys Leu Lys Val Asn Ser Phe Leu Arg Arg Ile Tyr Phe Thr Asn Pro Pro Gly Asp Lys Val Phe Met Lys Gln Arg Val Ile Met His Asp Glu Tyr Asp Ile Val His Phe Val Asn Leu Ser Gln His Leu Gly Ile Lys Met Lys Leu Gly Lys Phe Ser Pro Tyr Leu Pro His Gly Arg His Ser His Leu Tyr 
Val Asp Arg Ile Glu Leu Ala Thr Gly Arg Arg Lys Met Pro Ser Ser Val Cys Ser Ala Asp Cys Ser Pro Gly Phe Arg Arg Leu Trp Lys Glu Gly Met Ala Ala Cys Cys Phe Val Cys Ser Pro Cys Pro Glu Asn Glu Ile Ser Asn Glu Thr Thr Val Val Leu Cys Val Phe Val Lys His His Asp Thr Pro Ile Val Lys Ala Asn Asn Arg Ser Leu Ser Tyr Leu Leu Leu Met Ser Leu Met Ser Cys Phe Leu Cys Ser Phe Phe Phe Ile Gly Leu Pro Asn Arg Ala Ile Cys Val Leu Gln Gln Ile Thr Phe Gly Ile Val Phe Thr Met Ala Val Ser Thr Val Leu Ala Lys Thr Val Thr Val Val Leu Ala Phe Lys Val Thr Asp Pro Gly Arg Arg Leu Arg Asn Phe Leu Val Ser Gly Thr Pro Asn Tyr Ile Ile Pro Ile Cys Ser Leu Leu Gln Cys Val Leu Cys Ala Ile Trp Leu Ala Val Ser Pro Pro Phe Val Asp Ile Asp Glu His Thr Leu His Gly His Ile Ile Ile Val Cys Asn Lys Gly Ser Val Thr Ala Phe Tyr Cys Ile Leu Gly Tyr Leu Ala Cys Leu Ala Leu Gly Asn Phe Ser Val Ala Phe Leu Ala Lys Asn Leu Pro Asp Thr Phe Asn Glu Ala Lys Phe Leu Thr Phe Ser Met Leu Val Phe Cys Ser Val Trp Val Thr Phe Leu Pro Val Tyr His Ser Thr Lys Gly Lys His Met Val Ala Val Glu Ile Phe Ser Ile Leu Ala Ser Ser Ala Gly Ile Leu Gly Cys Ile Phe Val Pro Lys Ile Tyr Ile Ile Leu Met Arg Pro Glu Arg Asn Ser Thr Gln Lys Ile Arg Glu Lys Ser Tyr Phe
(2) INFORMATION FOR SEQ ID NO:29:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 3689 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: cDNA
(ix) FEATURE:
(A) NAME/KEY: Coding Sequence
(B) LOCATION: 39...419
(D) OTHER INFORMATION: VR15
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:29:
GGAAAAAT ATG TTC ATT TTC ATG GGA
Met Phe Ile Phe Met Gly CTC ATG AAT
Val Phe Phe Leu Leu Asn Ile Thr Leu Ala Asn Phe Ile Leu Met Asn GAT GAA TAT
Pro Arg Cys Phe Trp Arg Ile Asn Leu Ile Thr Asp Glu Asp Glu Tyr GCG GCA ACT
Leu Gly Leu Ser Cys Thr Phe Ile Leu Val Gln Thr Pro Ala Ala Thr AAT GTT AAA
Glu Lys Asp Tyr Phe Asn Lys Thr Leu Leu Lys Thr Thr Asn Val Lys TTT GCA AAC
Asn His Lys Tyr Ala Leu Ala Leu Val Met Asp Glu Ile Phe Ala Asn TCT TTG ACT
Arg Asn Pro Asp Leu Leu Pro Asn Met Ile Ile Arg Tyr Ser Leu Thr ACA CCT TTT
Leu Gly Leu Cys Asp Gly Lys Thr Val Thr Pro Tyr Leu Thr Pro Phe TAATTATTTC TGTAATGAAG AGACTAT
His Lys Lys Lys Thr Lys Pro Tyr Pro GGATGTATCT
GCTTACCTAT
TCAGATGGCC
GAAATGGAAC
AGAGTTGAAG
TGTTGATGAT
ATCCACAAAT
AATGTGGGAA
TACCAGGAAG
CCATGGTGAG
AGATTTATAT
TTGTAAAATA
GTTTGACATG
CCATGCCCTC
AGGAGCCAGT
TCCTCTTGGG
TATTCACTTT
GACACTCTCA
CCTCTGTGTG
CAGCCTGCTG
CCTCTCCATT
GGATGGGAAT
GTTAACACTA
GCATTTTGGT
GCTGGTCCTC
TAACATCATA
TCATTTAAAT
CATTACATAT
TACCTTGACA
TGGATCAATG
TTCAGAAAGG
TGGCCTTCTG
ACACTCCTAT
TGTTCTGTTT
TACAGCAAAT
CAGTCACTGT
TGGTATCAGG
GTGCAATCTG
GCCATATCAT
ATTTGGCCTG
ACACATTCAA
TCACCTTCCT
TCTCCATCTT
TCATTTTAAT
GAACAAATAT
TTATAGTGCA
AGTATCATAT
TTCATTTTCT
TCATGGAGAT
CTTTGTGTAG
TCAAATAATC
TATTTTCTGA
CTTCAATCTA
ATATATTATA
(2) INFORMATION FOR SEQ ID N0:30:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 127 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: protein
(v) FRAGMENT TYPE: internal
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:30:
Met Phe Ile Phe Met Gly Val Phe Phe Leu Leu Asn Ile Thr Leu Leu Met Ala Asn Phe Ile Asn Pro Arg Cys Phe Trp Arg Ile Asn Leu Asp Glu Ile Thr Asp Glu Tyr Leu Gly Leu Ser Cys Thr Phe Ile Leu Ala Ala Val Gln Thr Pro Thr Glu Lys Asp Tyr Phe Asn Lys Thr Leu Asn Val Leu Lys Thr Thr Lys Asn His Lys Tyr Ala Leu Ala Leu Val Phe Ala Met Asp Glu Ile Asn Arg Asn Pro Asp Leu Leu Pro Asn Met Ser Leu Ile Ile Arg Tyr Thr Leu Gly Leu Cys Asp Gly Lys Thr Val Thr Pro Thr Pro Tyr Leu Phe His Lys Lys Lys Thr Lys Pro Tyr Pro
(2) INFORMATION FOR SEQ ID NO:31:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 3896 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: cDNA
(ix) FEATURE:
(A) NAME/KEY: Coding Sequence
(B) LOCATION: 36...263
(D) OTHER INFORMATION:
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:31:
TGT GTT
Met Lys Asn Leu Cys Val TTG TGC CAT
Phe Thr Leu Ser Phe Phe Leu Leu Glu Phe Ser Leu Ile Leu Cys His GAA GAT AAT
Leu Thr Glu Pro Ile Cys Phe Trp Arg Ile Asn Asn Asn Glu Asp Asn GCA GTT GAG
Asp Gly Asp Leu Arg Ser Asp Cys Gly Phe Phe Leu Ala Ala Val Glu TTT TCT TTG
Gly Pro Thr Asp Asp Ser Tyr Asn Ile Ser Asp Leu Arg Phe Ser Leu ATCAGGTA
Asp His Leu Ile Leu Ser TTTTACATGG
ATCAGACATG
CCCAGAAGAC
ATCAACAGCA
TAGAAGGTGG
TATCACAAAT
CCATGTAGGT
CACAGTAAAC
GAACAGCAAT
CAAATATGAC
GGCCCACACC
CAAAGGAACA
TAACCCTGTT
TATTTTCATC
TTTGCCTTGT
AGGAGGATCA
AATTCATCAG
AGTTTCCAAT
AAAGGCTCAG
(2) INFORMATION FOR SEQ ID N0:32:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 76 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: protein
(v) FRAGMENT TYPE: internal
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:32:
Met Lys Asn Leu Cys Val Phe Thr Leu Ser Phe Phe Leu Leu Glu Phe Ser Leu Ile Leu Cys His Leu Thr Glu Pro Ile Cys Phe Trp Arg Ile Asn Asn Asn Glu Asp Asn Asp Gly Asp Leu Arg Ser Asp Cys Gly Phe Phe Leu Ala Ala Val Glu Gly Pro Thr Asp Asp Ser Tyr Asn Ile Ser Asp Leu Arg Phe Ser Leu Asp His Leu Ile Leu Ser
(2) INFORMATION FOR SEQ ID NO:33:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 2811 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: cDNA
(ix) FEATURE:
(A) NAME/KEY: Coding Sequence
(B) LOCATION: 962...2605
(D) OTHER INFORMATION: GoVN1
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:33:
ACAGATGCCA
CCTCGTACAC
GCCTATAAGT
GATAAATGCA
TGTGAAGTCC
TTTTGAACGG
TCCTAATTAC
GAAAACATCT
TGGGCCGTGT
CCCCAAAGAC
CTGGGTGGGA
AAAGGAGCTG
GGAATCATTG
TGTGATTATA
AAAGTATGAA
AACAATATAC
CAT CAT GGG
Met Leu Glu Leu Ala His Gly Thr Leu Thr Phe Ser Pro His His Gly CCT ATC AAG
Glu Ile Ser Asp Phe Thr Asn Phe Met Gln Glu Val Thr Pro Ile Lys TAT TTC AAT
Tyr Pro Glu Asp Ile Phe Leu His Ile Leu Trp Asn Gln Tyr Phe Asn TGT ATA CCC
Cys Pro Leu Leu His Ser Glu Cys Lys Ile Phe Glu Asn Cys Ile Pro CTG GTC ATG
Asn Ala Ser Leu Glu Leu Leu Pro Gly Gly Val Phe Glu Leu Val Met GTG GCC CAC
Thr Glu Glu Ser Tyr Asn Val Tyr Asn Ala Val Tyr Ala Val Ala His CCA CAG GAT
Ser Leu His Glu Lys Ala Leu His Gln Val Glu Ile Gln Pro Gln Asp CCT TTT CTG
Asn Lys Asp Arg Thr Ile Leu Phe Pro Trp Gln Leu His Pro Phe Leu Lys AsnIle Gln Ile AsnSer Gly Arg Val Leu Leu Val Asp Ile Asp ACG ATT
Trp LysLys Lys Asp ThrGluTyrAsp IleSer Asn TrpAsn Thr Ile CTT TTT
Phe ProThr Gly Ser LeuLeuValLys ValGly Thr AlaPro Leu Phe GGG ACA
Ser AlaPro Lys Glu GlnLeuSerIle SerGlu His IleAsn Gly Thr TTT AGT
Trp ProIle Gly Thr GluIleProLys SerVal Cys GluSer Phe Ser CAC CCT
Cys SerPro Gly Arg LysValIleLeu GluSer Lys AlaCys His Pro ACT AAC
Cys PheAsp Cys Pro CysProAspLys GluIle Ser GluThr Thr Asn TGT GCA
Asp ValGly Gln Val LysCysProGlu SerHis Tyr AsnThr Cys Ala TGC GAT
Glu LysSer His Leu LysLysThrMet ThrPhe Leu TyrAsn Cys Asp ACG TTC
Asp SerLeu Gly Gly LeuThrLeuMet SerLeu Gly PheVal Thr Phe GTT AAC
Val ThrGly Leu Ile GlyValPheIle IleHis Arg ThrPro Val Asn AAT CTC
Ile ValLys Ala Asn ArgSerLeuSer TyrIle Leu IleThr Asn Leu TTC CTT
Leu ThrLeu Cys Leu CysProLeuLeu PheIle Gly ProAsn Phe Leu ATC CTC
Thr AlaThr Cys Leu GlnGlnAsnLeu PheGly Leu PheThr Ile Leu _ GTG GCTCTA TCC GTG TTGGCCAAAACT ATCACT GTA ATGGCA 2065 ACA GTT
Val AlaLeu Ser Val LeuAlaLysThr IleThr Val MetAla Thr Val GCT CTG
Phe LysIle Thr Pro GlyArgLysThr ArgTrp Leu IleLeu Ala Leu AGA GCC CCT CAG TTC ATC ATT CCA CTT TGT GCC CTG ATG CAA ATC CTT . 2161 ArgAla Gln Ile Ile Pro Leu AlaLeu Met IleLeu Pro Phe Cys Gln GGG TGG CCT GAC
PheSer Ile Leu Gly Thr Ser ProPhe Val MetAsp Gly Trp Pro Asp TCT CAT ATT AAG
AlaHis Glu Gly His Ile Ile LeuCys Asn GlySer Ser His Ile Lys GGC TAC TAC ATG
AlaIle Phe Cys Thr Leu Ala LeuGly Val AlaPhe Gly Tyr Tyr Met TAC TTG AGG GAC
GlySer Leu Ala Phe Met Ser AsnLeu Pro ThrPhe Tyr Leu Arg Asp TCC GCC ATG TGC
AsnGlu Lys Leu Ala Phe Ser LeuMet Phe SerVal Ser Ala Met Cys ACA CTC AGC AAG
TrpVal Phe Pro Val Tyr His ThrThr Gly ValArg Thr Leu Ser Lys ATG ATG GCT AGC
ValAla Glu Phe Ser Ile Leu SerSer Ala IleLeu Met Met Ala Ser ATC GTC ATT AGA
ThrLeu Phe Pro Lys Cys Tyr ValLeu Phe ProGlu Ile Val Ile Arg ATA CCT AAA AGG
ArgAsn Leu Leu Asn Arg Glu ArgGln His SerLys Ile Pro Lys Arg GAA TAGCAGTCAA
GACAAACATT
GGCCTAGCAC
AAAATGTCTG
AsnSer Thr Glu CCTGCTATAT GATCACATGA
AAACAATTAG
TCCTTTGACT
TGATATTGCT TATTGACCAA
TCAAATTATG
TAAAATATGT
GTTCTTGTAT
GAAAAAAAAA
AAAAAAA
(2) INFORMATION FOR SEQ ID NO:34:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 548 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: protein
(v) FRAGMENT TYPE: internal
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:34:
Met Leu Leu His Gly Thr Leu Phe Ser Pro His Gly Glu Ala Thr His Glu Ile Asp Thr Asn Phe Met Glu Val Thr Ile Lys Ser Phe Gln Pro Tyr Pro Asp Phe Leu His Ile Trp Asn Gln Phe Asn Glu Ile Leu Tyr
Cys Pro Leu Leu His Ser Glu Cys Lys Ile Phe Glu Asn Cys Ile Pro Asn Ala Ser Leu Glu Leu Leu Pro Gly Gly Val Phe Glu Leu Val Met Thr Glu Glu Ser Tyr Asn Val Tyr Asn Ala Val Tyr Ala Val Ala His Ser Leu His Glu Lys Ala Leu His Gln Val Glu Ile Gln Pro Gln Asp Asn Lys Asp Arg Thr Ile Leu Phe Pro Trp Gln Leu His Pro Phe Leu Lys Asn Ile Gln Leu Ile Asn Ser Val Gly Asp Arg Val Ile Leu Asp Trp Lys Lys Lys Thr Asp Thr Glu Tyr Asp Ile Ser Asn Ile Trp Asn Phe Pro Thr Gly Leu Ser Leu Leu Val Lys Val Gly Thr Phe Ala Pro Ser Ala Pro Lys Gly Glu Gln Leu Ser Ile Ser Glu His Thr Ile Asn Trp Pro Ile Gly Phe Thr Glu Ile Pro Lys Ser Val Cys Ser Glu Ser Cys Ser Pro Gly His Arg Lys Val Ile Leu Glu Ser Lys Pro Ala Cys Cys Phe Asp Cys Thr Pro Cys Pro Asp Lys Glu Ile Ser Asn Glu Thr Asp Val Gly Gln Cys Val Lys Cys Pro Glu Ser His Tyr Ala Asn Thr Glu Lys Ser His Cys Leu Lys Lys Thr Met Thr Phe Leu Asp Tyr Asn Asp Ser Leu Gly Thr Gly Leu Thr Leu Met Ser Leu Gly Phe Phe Val Val Thr Gly Leu Val Ile Gly Val Phe Ile Ile His Arg Asn Thr Pro Ile Val Lys Ala Asn Asn Arg Ser Leu Ser Tyr Ile Leu Leu Ile Thr Leu Thr Leu Cys Phe Leu Cys Pro Leu Leu Phe Ile Gly Leu Pro Asn Thr Ala Thr Cys Ile Leu Gln Gln Asn Leu Phe Gly Leu Leu Phe Thr Val Ala Leu Ser Thr Val Leu Ala Lys Thr Ile Thr Val Val Met Ala Phe Lys Ile Thr Ala Pro Gly Arg Lys Thr Arg Trp Leu Leu Ile Leu Arg Ala Pro Gln Phe Ile Ile Pro Leu Cys Ala Leu Met Gln Ile Leu Phe Ser Gly Ile Trp Leu Gly Thr Ser Pro Pro Phe Val Asp Met Asp Ala His Ser Glu His Gly His Ile Ile Ile Leu Cys Asn Lys Gly Ser Ala Ile Gly Phe Tyr Cys Thr Leu Ala Tyr Leu Gly Val Met Ala Phe Gly Ser Tyr Leu Leu Ala Phe Met Ser Arg Asn Leu Pro Asp Thr Phe Asn Glu Ser Lys Ala Leu Ala Phe Ser Met Leu Met Phe Cys Ser Val Trp Val Thr Phe Leu Pro Val Tyr His Ser Thr Thr Gly Lys Val Arg Val Ala Met Glu Met Phe Ser Ile Leu Ala Ser Ser Ala Ser Ile Leu Thr Leu Ile Phe Val Pro Lys Cys Tyr Ile Val Leu Phe Arg Pro Glu Arg Asn Ile Leu Pro Leu Asn Arg Glu Lys Arg Gln His Arg Ser Lys Asn Ser Glu Thr 
(2) INFORMATION FOR SEQ ID NO:35:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 3584 base pairs (B) TYPE: nucleic acid (C) STRANDEDNESS: single (D) TOPOLOGY: linear (ii) MOLECULE TYPE: cDNA
(ix) FEATURE:
(A) NAME/KEY: Coding Sequence (B) LOCATION: 273...2576 (D) OTHER INFORMATION: GoVN2 (xi) SEQUENCE DESCRIPTION: SEQ ID NO:35:
CAGAAAGAAT ATTTTTCCTT
ATGTTCATTT
CTCCACCATC CACTTCTCAT GGTGCTTTTG GAGAACAAAT
GGCAAATTTC
ATCGATCCCT
TTGAATGAAG TCAAGGAAAA CCTTCATCCT TGGAGCAGTT
AAACTTGGAT
ATAAATTGTG
TATTTCAATG ACAACTAAAA
AGACTTTGAA
TTAGCCTTTT
CA ATG
Met Glu GluIleAsnArg Asn AAT TCT GTT
ProAsp LeuLeu Pro Met Leu Ile LysHisThrLeu Ser Asn Ser Val ACT GAC ATA
TyrCys AspGly Asn Ala His Phe LysGluLysPhe Tyr Thr Asp Ile TAT TGT GAA
LysPro LeuPro Asn Val Asn Glu ThrMetCysSer Phe Tyr Cys Glu AAT GTA TCT
MetLeu IleGly Leu Trp Leu Leu ThrLeuPheLys Asp Asn Val Ser 60 65 7p TTT CGT CTT
LeuAsp IlePhe Ser Pro Phe Gln IleSerTyrGly Pro Phe Arg Leu AGT AAT CAA
PheHis SerIle Phe Asp Glu Phe ProTyrLeuTyr Gln Ser Asn Gln ACA CTA TTG
MetThr ProLys Asp Ser Ala Ala IleValSerPhe Leu Thr Leu Leu AAC GTT CTT
LeuTyr PheAsn Trp Trp Gly Val IleSerAspAsn Asp Asn Val Leu CTC GAG AAA
GluGly AsnGln Phe Ser Leu Lys GluThrGlnAsn Lys Leu Glu Lys TTT AAC ATG
GluIle CysPhe Ala Val Met Ser IleHisGluHis Ser Phe Asn Met _ --Ser Tyr GlnLysThrGlu MetTyrTyr AsnGlnIle ValMetSer Ser Thr Asn IleIleIleIle TyrGlyLys ThrAsnSer IleIleGlu Leu Ser Phe ArgMetTrpVal SerProVal IleGlnArg IleTrpVal Thr Asn Ser GluLeuAspPhe ProThrSer MetArgAsp PheThrHis Gly Thr Phe TyrGlyThrLeu ThrPheLeu HisHisHis GlyGluIle Ser Gly Phe ThrAsnPhePhe GluThrTrp AspHisLeu ArgSerArg Asp Leu Asn LeuLeuIlePro GluTrpLys TyrPheSer TyrAspAla Ser Gly Ser AsnCysLysIle LeuArgAsn TyrSerSer AsnAlaSer Leu Glu Trp IleThrGluGln LysPheHis MetAlaPhe AsnAspTyr Ser His Ser IleTyrAsnAla ValTyrAla MetAlaHis AlaLeuHis Glu Thr Asn LeuGlnGluVal AspAsnLys GluIleArg AsnGlyLys Gly Ala Ser ThrHisCysLeu LysValAsn SerPheLeu ArgLysThr His Phe Thr AsnSerHisGly GluArgVal IleMetLys GlnArgVal Arg Val Gln GluAspTyrAsp IleValHis IleGlnAsn PheSerGln His Leu Arg IleLysMetLys IleGlyLys PheSerPro TyrPheThr His Gly Gly ProPheHisLeu TyrGluAsp MetIleGln LeuAlaThr Gly .
SerArgLys MetProSer SerValCys SerAlaAsp CysSerPro Gly PheArgLys SerTrpLys GluGlyMet AlaProCys CysPheIle Cys SerLeuCys ProGluAsn GluIleSer AsnGluThr AsnMetAsp Gln CysValAsn CysProGlu TyrGlnTyr AlaAsnThr GluLysAsn Lys CysIleGln LysAspVal IlePheLeu SerTyrGlu AspProLeu Gly MetAlaLeu AlaLeuIle AlaPheCys LeuSerAla PheThrAla Val ValLeuTrp ValPheVal LysHisHis AspThrPro IleValLys Ala AsnAsnArg IleLeuSer TyrIleLeu IleMetSer LeuMetPhe Cys PheLeuCys SerPhePhe PheIleGly HisProAsn ArgGlyThr Cys IleLeuGln GlnIleThr PheGlyIle ValPheThr ValAlaVal Ser ThrValLeu AlaLysThr IleThrVal IleLeuAla PheLysLeu Arg AspProGly ArgSerLeu ArgAsnPhe LeuValSer GlyAlaPro Asn TyrIleIle ProIleCys SerLeuLeu GlnCysIle LeuCysAla Ile TrpLeuAla ValSerPro ProPheVal AspIleAsp GluHisSer Glu HisGlyHis IleMetIle ValCysAsn LysGlySer IleMetAla Phe TyrCysVal LeuGlyTyr LeuAlaCys LeuAlaLeu GlySerPhe Thr ThrAlaPhe LeuAlaLys AsnLeuPro AspThrPhe AsnGluAla Lys ', TTC TTG ACC TTC AGC ATG CTA GTG TTC TGC AGT GTC TGG GTC ACC TTT 2405 Phe Leu Thr Phe Ser Met Leu Val Phe Cys Ser Val Trp Val Thr Phe Leu Pro Val Tyr His Ser Thr Arg Gly Arg Val Met Val Ala Val Glu Ile Phe Ser Ile Leu Ala Ser Ser Ala Gly Met Phe Gly Cys Ile Phe Ala Pro Lys Ile Tyr Ile Ile Leu Met Lys Pro Glu Arg Asn Ser Ile Gln Lys Phe Arg Glu Lys Ser Tyr Phe (2) INFORMATION FOR SEQ ID N0:36:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 768 amino acids (B) TYPE: amino acid (C) STRANDEDNESS: single (D) TOPOLOGY: linear (ii) MOLECULE TYPE: protein (v) FRAGMENT TYPE: internal (xi) SEQUENCE DESCRIPTION: SEQ ID NO:36:
Met Glu Glu Ile Asn Arg Asn Pro Asp Leu Leu Pro Asn Met Ser Leu Val Ile Lys His Thr Leu Ser Tyr Cys Asp Gly Asn Thr Ala Asp His Ile Phe Lys Glu Lys Phe Tyr Lys Pro Leu Pro Asn Tyr Val Cys Asn Glu Glu Thr Met Cys Ser Phe Met Leu Ile Gly Leu Asn Trp Val Leu Ser Leu Thr Leu Phe Lys Asp Leu Asp Ile Phe Ser Phe Pro Arg Phe Leu Gln Ile Ser Tyr Gly Pro Phe His Ser Ile Phe Ser Asp Asn Glu Gln Phe Pro Tyr Leu Tyr Gln Met Thr Pro Lys Asp Thr Ser Leu Ala Leu Ala Ile Val Ser Phe Leu Leu Tyr Phe Asn Trp Asn Trp Val Gly Leu Val Ile Ser Asp Asn Asp Glu Gly Asn Gln Phe Leu Ser Glu Leu Lys Lys Glu Thr Gln Asn Lys Glu Ile Cys Phe Ala Phe Val Asn Met Met Ser Ile His Glu His Ser Ser Tyr Gln Lys Thr Glu Met Tyr Tyr Asn Gln Ile Val Met Ser Ser Thr Asn Ile Ile Ile Ile Tyr Gly Lys Thr Asn Ser Ile Ile Glu Leu Ser Phe Arg Met Trp Val Ser Pro Val Ile Gln Arg Ile Trp Val Thr Asn Ser Glu Leu Asp Phe Pro Thr Ser Met Arg Asp Phe Thr His Gly Thr Phe Tyr Gly Thr Leu Thr Phe Leu His His His Gly Glu Ile Ser Gly Phe Thr Asn Phe Phe Glu Thr Trp Asp His Leu Arg Ser Arg Asp Leu Asn Leu Leu Ile Pro Glu Trp Lys Tyr Phe Ser Tyr Asp Ala Ser Gly Ser Asn Cys Lys Ile Leu Arg Asn Tyr Ser Ser Asn Ala Ser Leu Glu Trp Ile Thr Glu Gln Lys Phe His Met Ala Phe Asn Asp Tyr Ser His Ser Ile Tyr Asn Ala Val Tyr Ala Met Ala His Ala Leu His Glu Thr Asn Leu Gln Glu Val Asp Asn Lys Glu Ile Arg Asn Gly Lys Gly Ala Ser Thr His Cys Leu Lys Val Asn Ser Phe Leu Arg Lys Thr His Phe Thr Asn Ser His Gly Glu Arg Val Ile Met Lys Gln Arg Val Arg Val Gln Glu Asp Tyr Asp Ile Val His Ile Gln Asn Phe Ser Gln His Leu Arg Ile Lys Met Lys Ile Gly Lys Phe Ser Pro Tyr Phe Thr His Gly Gly Pro Phe His Leu Tyr Glu Asp Met Ile Gln Leu Ala Thr Gly Ser Arg Lys Met Pro Ser Ser Val Cys Ser Ala Asp Cys Ser Pro Gly Phe Arg Lys Ser Trp Lys Glu Gly Met Ala Pro Cys Cys Phe Ile Cys Ser Leu Cys Pro Glu Asn Glu Ile Ser Asn Glu Thr Asn Met Asp Gln Cys Val Asn Cys Pro Glu Tyr Gln Tyr Ala Asn Thr Glu Lys Asn Lys Cys Ile Gln Lys Asp Val Ile Phe Leu Ser Tyr Glu Asp 
Pro Leu Gly Met Ala Leu Ala Leu Ile Ala Phe Cys Leu Ser Ala Phe Thr Ala Val Val Leu Trp Val Phe Val Lys His His Asp Thr Pro Ile Val Lys Ala Asn Asn Arg Ile Leu Ser Tyr Ile Leu Ile Met Ser Leu Met Phe Cys Phe Leu Cys Ser Phe Phe Phe Ile Gly His Pro Asn Arg Gly Thr Cys Ile Leu Gln Gln Ile Thr Phe Gly Ile Val Phe Thr Val Ala Val Ser Thr Val Leu Ala Lys Thr Ile Thr Val Ile Leu Ala Phe Lys Leu Arg Asp Pro Gly Arg Ser Leu Arg Asn Phe Leu Val Ser Gly Ala Pro Asn Tyr Ile Ile Pro Ile Cys Ser Leu Leu Gln Cys Ile Leu Cys Ala Ile Trp Leu Ala Val Ser Pro Pro Phe Val Asp Ile Asp Glu His Ser Glu His Gly His Ile Met Ile Val Cys Asn Lys Gly Ser Ile Met Ala Phe Tyr Cys Val Leu Gly Tyr Leu Ala Cys Leu Ala Leu Gly Ser Phe Thr Thr Ala Phe Leu Ala Lys Asn Leu Pro Asp Thr Phe Asn Glu Ala Lys Phe Leu Thr Phe Ser Met Leu Val Phe Cys Ser Val Trp Val Thr Phe Leu Pro Val Tyr His Ser Thr Arg Gly Arg Val Met Val Ala Val Glu Ile Phe Ser Ile Leu Ala Ser Ser Ala Gly Met Phe Gly Cys Ile Phe Ala Pro Lys Ile Tyr Ile Ile Leu Met Lys Pro Glu Arg Asn Ser Ile Gln Lys Phe Arg Glu Lys Ser Tyr Phe
(2) INFORMATION FOR SEQ ID NO:37:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 3578 base pairs (B) TYPE: nucleic acid (C) STRANDEDNESS: single (D) TOPOLOGY: linear (ii) MOLECULE TYPE: cDNA
(ix) FEATURE:
(A) NAME/KEY: Coding Sequence (B) LOCATION: 1181...3181 (D) OTHER INFORMATION: GoVN3 (xi) SEQUENCE DESCRIPTION: SEQ ID NO:37:
AATTATTTTT
TGAAGCTTTC
CCAAGACATT
TTGGTCATGT
CTATCCACTT
TTCTACCTAA
TAAACAGTTT
GAAGATATTT
TCCTATACAT
CCCTAAAGTC
ACAATCACTG
CTACTGATGC
TCTTTGCACT
TTAGAACTCA
ATCACACCTG
AACTAAGTGA
GATATTTTTT
TAAAATTCCT
CATTTCACCC
CCT AAG GAC
Met Ala Pro Lys Asp Thr Ser Leu Ala Leu Ala Met Val Ser Leu Phe Val His Phe Ser Trp GCT GTT GAC GGT
TCA TAT
Asn TrpValGly ValVal AspAsp ProGly Glu Phe Ala Ser Asp Tyr AGA ATG AAC TGT
Ile LeuGluLeu ArgGlu GlnArg AsnPhe Leu Ala Arg Met Asn Cys ATT GAT TTA AAA
Phe ValSerIle ValSer AspAsn PheLeu Arg Tyr Ile Asp Leu Lys AAC AAG TCA GTT
Asn IleTyrTyr GlnIle MetSer AlaLys Val Ile Asn Lys Ser Val 70 75 8p g5 AAA CCT GTG AGA
Ile TyrGlyAsp AspSer LeuGln AsnPhe Leu Trp Lys Pro Val Arg ATC ATC ACT CAG
Asn LeuPheAsp GlnArg TrpVal ThrSer Trp Asp Ile Ile Thr Gln AAT TTC AAT TAT
Met IleIleAsn GlyLys LeuLeu SerPhe Gly Thr Asn Phe Asn Tyr CAT TCT TCT AAA
Leu SerPheSer HisTyr GluLeu GlyPhe Thr Phe His Ser Ser Lys TAC AAC GAT TCT
Ile GlnThrAla ProSer TyrSer AspPhe Leu Gly Tyr Asn Asp Ser GTG AAT TTG TCT
Ile LeuTrpTrp TyrPhe CysSer SerLeu Glu Cys Val Asn Leu Ser AAT AAG ATA TGG
Lys AsnLeuGln CysPro GluAsn PheArg Leu Tyr Asn Lys Ile Trp GAA TTG ACT GAC
Arg HisHisPhe MetSer SerAsp ThrTyr Leu Tyr Glu Leu Thr Asp GCT TAC CAA CTT
Asn SerMetTyr ValAla ThrLeu GlnMet Leu Lys Ala Tyr Gln Leu TGG GAT AAA GAA
Gln AlaAspThr GlnIle AspGly GluPro Phe Asp Trp Asp Lys Glu CTC CTG ATC ATA
Ser TrpGlnMet SerPhe ArgAsn GlnPhe Asn Pro Leu Leu Ile Ile GTG AAT GAA GAT
Val GlyAspLys AsnLeu HisGlu LysLeu Thr Lys Val Asn Glu Asp CAG ACT CCA GTA .
Tyr GluIle HisGln Thr ThrPhe Pro Asn ValPheLys Leu Leu Pro TCC TTA GGT
Leu LysIle GlyThr Phe GlnAsn Ser His ArgGlnLeu Ser Leu Gly ATA AAC CAC
Tyr MetLeu LysGlu Met GluTrp Thr Gly GlnGlnSer Ile Asn His ATT AGT TTC
Pro ThrSer ValCys Ser ProCys Pro Gly ArgLysSer Ile Ser Phe GTT TTT ACA
Pro GlnLeu GlyLys Pro CysCys Asp Cys ProCysPro Val Phe Thr ' 345 350 355 ATG ATG TGT
Glu AsnGlu IleSer Asn ThrAsn Asn Gln IleLysCys Met Met Cys Leu Asn Asp Gln Tyr Ala Asn Pro Gly Gly Thr Arg Cys Leu Lys Lys Val Ile Val Phe Leu Gly Tyr Glu Asp Pro Leu Gly Met Ser Leu Ala Ile Leu Ala Leu Cys Phe Ser Ala Leu Thr Ala Phe Val Leu Ser Ile Phe Leu Lys His Gln Glu Thr Pro Thr Val Lys Ala Asn Asn Arg Thr Leu Ser Tyr Val Leu Leu Ile Ser Leu Ile Ser Cys Phe Leu Cys Ser Leu Leu Phe Ile Gly His Pro Ser Phe Thr Thr Cys Ile Met Gln Gln ' 455 460 465 Thr Thr Phe Ala Val Val Phe Thr Val Ala Ala Ser Thr Val Leu Ala Lys Thr Ile Ile Val Ile Leu Ala Phe Lys Val Thr Asn Thr Ser Arg _ Lys Met Arg Trp Leu Leu Val Ser Gly Ala Pro Lys Phe Ile Ile Pro Ile Cys Thr Met Ile Gln Leu Ile Leu Cys Gly Ile Trp Leu Gly Thr Ser Pro Pro Phe Val Asp Ala Asp Gly His Val Glu Lys Gly His Ile.
CTT GCT TGT
Leu Ile Phe Cys Asn Lys Gly Ser Ile Phe Tyr ValLeu Leu Ala Cys AGT TTC GCA
Gly Tyr Leu Val Ser Ile Ala Ile Ala Thr Leu PhePhe Ser Phe Ala GAA GCC CTA
Ala Arg Asn Leu Pro Asp Thr Phe Asn Lys Phe ThrPhe Glu Ala Leu GTC ACC CCT
Ser Met Leu Val Phe Cys Ser Val Trp Phe Leu ValTyr Val Thr Pro GCT GTG TTC
His Ser Thr Lys Gly Lys Ser Met Val Glu Val CysIle Ala Val Phe TGC ATC CCA
Leu Ala Ser Ser Ala Gly Leu Leu Phe Phe Ala LysCys Cys Ile Pro AAA TCT AAG
Phe Ile Ile Leu Leu Arg Pro Glu Lys Phe Gln PheGln Lys Ser Lys ATTAAATTTT TCTGACACAC
Asn Ile His Ser Lys Ile TAGATCCAAA
AAAACACGTC
CTGTTTGCTG
GTTCTGAGTT
TGTTGTTGTG
CTCTATAATA AATAATTATG AGATAAATGC P~~i~AAAAAAAp~~e~AAAAAAAAAAAAAAA 3578 A
(2) INFORMATION FOR SEQ ID N0:38:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 667 amino acids (B) TYPE: amino acid (C) STRANDEDNESS: single (D) TOPOLOGY: linear (ii) MOLECULE TYPE: protein (v) FRAGMENT TYPE: internal (xi) SEQUENCE DESCRIPTION: SEQ ID NO:38:
Met Ala Pro Lys Asp Thr Ser Leu Ala Leu Ala Met Val Ser Leu Phe Val His Phe Ser Trp Asn Trp Val Gly Ala Val Val Ser Asp Asp Asp Pro Gly Tyr Glu Phe Ile Leu Glu Leu Arg Arg Glu Met Gln Arg Asn Asn Phe Cys Leu Ala Phe Val Ser Ile Ile Val Ser Asp Asp Asn Leu Phe Leu Lys Arg Tyr Asn Ile Tyr Tyr Asn Gln Ile Lys Met Ser Ser Ala Lys Val Val Ile Ile Tyr Gly Asp Lys Asp Ser Pro Leu Gln Val Asn Phe Arg Leu Trp Asn Leu Phe Asp Ile Gln Arg Ile Trp Val Thr Thr Ser Gln Trp Asp Met Ile Ile Asn Asn Gly Lys Phe Leu Leu Asn Ser Phe Tyr Gly Thr Leu Ser Phe Ser His His Tyr Ser Glu Leu Ser Gly Phe Lys Thr Phe Ile Gln Thr Ala Tyr Pro Ser Asn Tyr Ser Asp Asp Phe Ser Leu Gly Ile Leu Trp Trp Val Tyr Phe Asn Cys Ser Leu Ser Leu Ser Glu Cys Lys Asn Leu Gln Asn Cys Pro Lys Glu Asn Ile Phe Arg Trp Leu Tyr Arg His His Phe Glu Met Ser Leu Ser Asp Thr Thr Tyr Asp Leu Tyr Asn Ser Met Tyr Ala Val Ala Tyr Thr Leu Gln Gln Met Leu Leu Lys Gln Ala Asp Thr Trp Gln Ile Asp Asp Gly Lys Glu Pro Glu Phe Asp Ser Trp Gln Met Leu Ser Phe Leu Arg Asn Ile Gln Phe Ile Asn Pro Val Gly Asp Lys Val Asn Leu Asn His Glu Glu Lys Leu Asp Thr Lys Tyr Glu Ile His Gln Thr Leu Thr Phe Leu Pro Asn Pro Val Phe Lys Leu Lys Ile Gly Thr Phe Ser Gln Asn Leu Ser His Gly Arg Gln Leu Tyr Met Leu Lys Glu Met Ile Glu Trp Asn Thr Gly His Gln Gln Ser Pro Thr Ser Val Cys Ser Ile Pro Cys Ser Pro Gly Phe Arg Lys Ser Pro Gln Leu Gly Lys Pro Val Cys Cys Phe Asp Cys Thr Pro Cys Pro Glu Asn Glu Ile Ser Asn Met Thr Asn Met Asn Gln Cys Ile Lys Cys Leu Asn Asp Gln Tyr Ala Asn Pro Gly Gly Thr Arg Cys Leu Lys Lys Val Ile Val Phe Leu Gly Tyr Glu Asp Pro Leu Gly Met Ser Leu Ala Ile Leu Ala Leu Cys Phe Ser Ala Leu Thr Ala Phe Val Leu Ser Ile Phe Leu Lys His Gln Glu Thr Pro Thr Val Lys Ala Asn Asn Arg Thr Leu Ser Tyr Val Leu Leu Ile Ser Leu Ile Ser Cys Phe Leu Cys Ser Leu Leu Phe Ile Gly His Pro Ser Phe Thr Thr Cys Ile Met Gln Gln Thr Thr Phe Ala Val Val Phe Thr Val Ala Ala Ser Thr Val Leu Ala Lys Thr Ile Ile Val Ile Leu Ala Phe Lys Val Thr Asn
Thr Ser Arg Lys Met Arg Trp Leu Leu Val Ser Gly Ala Pro Lys Phe Ile Ile Pro Ile Cys Thr Met Ile Gln Leu Ile Leu Cys Gly Ile Trp Leu Gly Thr Ser Pro Pro Phe Val Asp Ala Asp Gly His Val Glu Lys Gly His Ile Leu Ile Phe Cys Asn Lys Gly Ser Ile Leu Ala Phe Tyr Cys Val Leu Gly Tyr Leu Val Ser Ile Ala Ile Ala Ser Phe Thr Leu Ala Phe Phe Ala Arg Asn Leu Pro Asp Thr Phe Asn Glu Ala Lys Phe Leu Thr Phe Ser Met Leu Val Phe Cys Ser Val Trp Val Thr Phe Leu Val Tyr His Ser Thr Lys Gly Lys Ser Met Val Pro Ala Val Glu Val Cys Ile Leu Ala Ser Ser Ala Gly Leu Leu Phe Phe Cys Ile Phe Ala Lys Cys Phe Ile Ile Leu Leu Arg Pro Glu Lys Pro Lys Ser Phe Gln Phe Gln Asn Ile His Ser Lys Ile Lys
(2) INFORMATION FOR SEQ ID NO:39:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 4467 base pairs (B) TYPE: nucleic acid (C) STRANDEDNESS: single (D) TOPOLOGY: linear (ii) MOLECULE TYPE: cDNA
(ix) FEATURE:
(A) NAME/KEY: Coding Sequence (B) LOCATION: 126...2723 (D) OTHER INFORMATION: GoVN4 (xi) SEQUENCE DESCRIPTION: SEQ ID NO:39:
GAAACACCTG
TAGAAAAGGA
AACCTGAATA
CAGGTATAGC
ATCTTCTTGG
AGATGGGGAT
AATTGCTACC
TGTTTGCTGA
TCTGTGCAGC
AATTAACTAC
TCC AGG
CTC AGA
GCA GGA
AAA AAT
ATG CTC
ACC TTC
ATT TTA
Met Ser Arg Leu Arg Ala Gly Lys Asn Met Leu Thr Phe IIe Leu TTT ATT TAT
Leu Phe Leu Leu Asn Ile Pro Leu Phe Val Pro Ser Phe Phe Ile Tyr TGC AGA AAC
Pro Arg Phe Trp Ser Met Lys Lys Asn Glu Tyr Gln Asp Cys Arg Asn ACA CCT ATG
Leu Gly Gly Cys Met Phe Phe Ile Leu Ala Val Gln Gln Thr Pro Met GAG ACT GAA
Glu Lys Tyr Phe Ser His Ile Ser Asn Ile Gln Thr Pro Glu Thr Glu AAG ATC AAC
Asn Gln Tyr Pro Leu Thr Leu Ala Phe Ser Met Asn Glu Lys Ile Asn CCT TTC TCA
Asn Asn Asp Leu Leu Pro Asn Met Ser Leu Ala Phe Thr Pro Phe Ser AGT AAT TTT
Glu Tyr Cys Tyr Leu Glu Ser His His Lys Arg Leu Phe Ser Asn Phe AAA AAA GAC
Ser Leu Asn His Glu Ile Leu Pro Asn Phe Ile Cys Thr Lys Lys Asp WO 99/00422 PCT/US98/13b80 GGA CTT AGT TTG ACT
Ile LysCys Gly Val Val Leu Thr SerLeuVal Thr Val Gly Leu Thr TTC ATA CGT
_ Thr LeuHis Ile Ile Leu Asn Asn PheGlnGln Phe Gln Phe Ile Arg GCT CTG AAT
Leu ThrTyr Gly His Phe His Pro CysAspHis Glu Phe Ala Leu Asn GAT GAT CTT
Pro HisLeu Tyr Gln Met Ala Ser ThrSerLeu Ala Ala Asp Asp Leu AGT TGG TTG
Leu ValSer Phe Ile Ile His Phe AsnTrpIle Gly Ala Ser Trp Leu CAT TTT AGA
Ile SerAsp Asn Asp Gln Gly Ile LeuSerTyr Leu Arg His Phe Arg TTT GCC ATT
Glu MetGlu Lys Asn Thr Val Cys PheValAsn Ile Pro Phe Ala Ile ', 240 245 250 255 AGA GCT AGC
Val AsnMet Asn Leu Tyr Met Ser GluValTyr Tyr Gln Arg Ala Ser GTT ATC ACA
Val MetThr Ser Ser Ala Asn Val IleTyrGly Asp Gly Val Ile Thr ATG TGG ATA
Asn ThrLeu Ala Val Ser Phe Arg AspSerLeu Gly Gln Met Trp Ile TGG GAT AAG
Arg LeuTrp Val Thr Thr Ser Gln ValThrPro Phe Lys Trp Asp Lys GGA ACT CAC
Asp PheThr Phe Asp Asn Gly Tyr PheGlyPhe Gly Arg Gly Thr His TAT TTT AAC
His SerGlu Ile Ser Gly Phe Lys ValGlnThr Leu Pro Tyr Phe Asn GTA AAG TAT
Phe LysTyr Ser Asp Glu Tyr Leu LeuGluTrp Met Val Val Lys Tyr TGT AAG TGC
Asn CysLys Ile Leu Glu Tyr Asn SerLeuLys Asn Ser Cys Lys Cys Phe Asn His Ser Leu Glu Trp Leu Met Thr His Thr Phe Asp Met Ala ATT ATT GAA GGG AGT TAT GAA ATA TAC AAT GCT GTG TAT GCT TTT GCC . 1370 Ile Ile GluGlySer TyrGluIle TyrAsnAla ValTyrAla PheAla His Ala LeuHisGlu MetThrLeu GlnAsnVal AspAsnVal LeuLeu Pro Asn TyrGluGlu GlnAsnTyr AsnCysLys MetValTyr SerPhe Leu Ser LysThrGln PheThrAsn ProValGly AspThrVal AsnMet Asn Gln ArgAsnLys LeuLysGlu GluTyrAsp IlePheTyr AsnTrp Asn Phe ProGlnGly LeuGlyPhe LysValLys IleGlyIle PheSer Pro Tyr PheProLys GlyGlnGln LeuHisLeu SerGluAsn LeuIle Glu Trp SerThrGly ArgIleGln MetProThr SerValCys SerAla Asp Cys GlyProGly PheArgLys ValTrpLys AsnGlyMet ProAla Cys Cys PheAspCys SerProCys ProGluAsn GluIleSer AsnGlu Thr Asn ValGluLeu CysValGln CysProGlu AspGlnTyr AlaAsn Gln Glu GlnAsnHis CysIleHis LysAlaArg IlePheLeu SerTyr Asp Glu ProLeuGly MetAlaLeu SerLeuMet AlaLeuCys LeuAla Ala Leu ThrValVal ValLeuGly ValPheVal LysHisHis ArgThr Pro Ile ValLysAla AsnAsnCys ThrLeuThr TyrIleLeu LeuIle Ala Leu IlePheCys PheLeuCys ProLeuPhe PheIleGly HisPro Asn Ser AlaThrCys IleLeuGln GlnIleThr PheGlyVal ValPhe.
Thr Val Ala Ile Ser Thr Val Leu Ala Lys Thr Thr Thr Val Ile Leu GTT
Ala Phe Arg Val Thr Ala Pro His Arg Met Met Lys Tyr Phe Leu Val w 690 695 700 ATT
Ser Arg Ala Ser Asn Tyr Ile Ile Pro Ile Cys Thr Leu Ile Gln Ile ATT
Ile Val Cys Ala Ile Trp Leu Gly Ala Ser Pro Pro Ser Val Asp Ile GGT
Asp Ala Gln Ser Glu His Gly His Ile Ile Ile Ala Cys Asn Lys Gly GCC
Ser Val Thr Ala Phe Tyr Cys Val Leu Gly Tyr Leu Ala Cys Leu Ala ACC
Phe Val Ser Phe Thr Leu Ala Phe Leu Ser Arg Asn Leu Pro Val Thr AGT
Phe Asn Glu Ala Lys Ser Met Thr Phe Ser Met Leu Val Phe Cys Ser GTT
Val Trp Val Thr Phe Leu Pro Val Tyr His Gly Thr Lys Gly Lys Val ATG
Met Val Ala Val Glu Ile Phe Ser Thr Leu Ala Ser Ser Ala Gly Met CCA
Leu Gly Cys Ile Phe Ala Pro Lys Cys Tyr Thr Ile Leu Phe Arg Pro ACT
Asp Arg Asn Ser Leu Gln Met Ile Arg Glu Lys Ser Ser Ser His Thr TCATAATCAC CAAATATTC
His Ile Leu CCCTCAATTT TAAGTGTATC ATAAAAGACA CAGTTGTGAA ATTTTCAAGG ACAGCACTAC 3432
(2) INFORMATION FOR SEQ ID NO:40:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 866 amino acids (B) TYPE: amino acid (C) STRANDEDNESS: single (D) TOPOLOGY: linear (ii) MOLECULE TYPE: protein (v) FRAGMENT TYPE: internal (xi) SEQUENCE DESCRIPTION: SEQ ID NO:40:
Met Ser Arg Leu Arg Ala Gly Lys Asn Met Leu Thr Phe Ile Leu Leu Phe Phe Leu Leu Asn Ile Pro Leu Phe Val Pro Ser Phe Ile Tyr Pro Arg Cys Phe Trp Ser Met Lys Lys Asn Glu Tyr Gln Asp Arg Asn Leu Gly Thr Gly Cys Met Phe Phe Ile Leu Ala Val Gln Gln Pro Met Glu Lys Glu Tyr Phe Ser His Ile Ser Asn Ile Gln Thr Pro Thr Glu Asn Gln Lys Tyr Pro Leu Thr Leu Ala Phe Ser Met Asn Glu Ile Asn Asn Asn Pro Asp Leu Leu Pro Asn Met Ser Leu Ala Phe Thr Phe Ser Glu Tyr Ser Cys Tyr Leu Glu Ser His His Lys Arg Leu Phe Asn Phe Ser Leu Lys Asn His Glu Ile Leu Pro Asn Phe Ile Cys Thr Lys Asp Ile Lys Cys Gly Val Val Leu Thr Gly Leu Ser Leu Val Thr Thr Val Thr Leu His Ile Ile Leu Asn Asn Phe Ile Phe Gln Gln Phe Arg Gln Leu Thr Tyr Gly His Phe His Pro Ala Leu Cys Asp His Glu Asn Phe Pro His Leu Tyr Gln Met Ala Ser Asp Asp Thr Ser Leu Ala Leu Ala Leu Val Ser Phe Ile Ile His Phe Ser Trp Asn Trp Ile Gly Leu Ala Ile Ser Asp Asn Asp Gln Gly Ile His Phe Leu Ser Tyr Leu Arg Arg Glu Met Glu Lys Asn Thr Val Cys Phe Ala Phe Val Asn Ile Ile Pro Val Asn Met Asn Leu Tyr Met Ser Arg Ala Glu Val Tyr Tyr Ser Gln Val Met Thr Ser Ser Ala Asn Val Val Ile Ile Tyr Gly Asp Thr Gly Asn Thr Leu Ala Val Ser Phe Arg Met Trp Asp Ser Leu Gly Ile Gln Arg Leu Trp Val Thr Thr Ser Gln Trp Asp Val Thr Pro Phe Lys Lys Asp Phe Thr Phe Asp Asn Gly Tyr Gly Thr Phe Gly Phe Gly His Arg His Ser Glu Ile Ser Gly Phe Lys Tyr Phe Val Gln Thr Leu Asn Pro Phe Lys Tyr Ser Asp Glu Tyr Leu Val Lys Leu Glu Trp Met Tyr Val Asn Cys Lys Ile Leu Glu Tyr Asn Cys Lys Ser Leu Lys Asn Cys Ser Phe Asn His Ser Leu Glu Trp Leu Met Thr His Thr Phe Asp Met Ala Ile Ile Glu Gly Ser Tyr Glu Ile Tyr Asn Ala Val Tyr Ala Phe Ala His Ala Leu His Glu Met Thr Leu Gln Asn Val Asp Asn Val Leu Leu Pro Asn Tyr Glu Glu Gln Asn Tyr Asn Cys Lys Met Val Tyr Ser Phe Leu Ser Lys Thr Gln Phe Thr Asn Pro Val Gly Asp Thr Val Asn Met Asn Gln Arg Asn Lys Leu Lys Glu Glu Tyr Asp Ile Phe Tyr Asn Trp Asn Phe Pro Gln Gly Leu Gly Phe Lys Val Lys Ile Gly Ile Phe Ser Pro Tyr Phe Pro Lys 
Gly Gln Gln Leu His Leu Ser Glu Asn Leu Ile Glu Trp Ser Thr Gly Arg Ile Gln Met Pro Thr Ser Val Cys Ser Ala Asp Cys Gly Pro Gly Phe Arg Lys Val Trp Lys Asn Gly Met Pro Ala Cys Cys Phe Asp Cys Ser Pro Cys Pro Glu Asn Glu Ile Ser Asn Glu Thr Asn Val Glu Leu Cys Val Gln Cys Pro Glu Asp Gln Tyr Ala Asn Gln Glu Gln Asn His Cys Ile His Lys Ala Arg Ile Phe Leu Ser Tyr Asp Glu Pro Leu Gly Met Ala Leu Ser Leu Met Ala Leu Cys Leu Ala Ala Leu Thr Val Val Val Leu Gly Val Phe Val Lys His His Arg Thr Pro Ile Val Lys Ala Asn Asn Cys Thr Leu Thr Tyr Ile Leu Leu Ile Ala Leu Ile Phe Cys Phe Leu Cys Pro Leu Phe Phe Ile Gly His Pro Asn Ser Ala Thr Cys Ile Leu Gln Gln Ile Thr Phe Gly Val Val Phe Thr Val Ala Ile Ser Thr Val Leu Ala Lys Thr Thr Thr Val Ile Leu Ala Phe Arg Val Thr Ala Pro His Arg Met Met Lys Tyr Phe Leu Val Ser Arg Ala Ser Asn Tyr Ile Ile Pro Ile Cys Thr Leu Ile Gln Ile Ile Val Cys Ala Ile Trp Leu Gly Ala Ser Pro Pro Ser Val Asp Ile Asp Ala Gln Ser Glu His Gly His Ile Ile Ile Ala Cys Asn Lys Gly Ser Val Thr Ala Phe Tyr Cys Val Leu Gly Tyr Leu Ala Cys Leu Ala Phe Val Ser Phe Thr Leu Ala Phe Leu Ser Arg Asn Leu Pro Val Thr Phe Asn Glu Ala Lys Ser Met Thr Phe Ser Met Leu Val Phe Cys Ser Val Trp Val Thr Phe Leu Pro Val Tyr His Gly Thr Lys Gly Lys Val Met Val Ala Val Glu Ile Phe Ser Thr Leu Ala Ser Ser Ala Gly Met Leu Gly Cys Ile Phe Ala Pro Lys Cys Tyr Thr Ile Leu Phe Arg Pro Asp Arg Asn Ser Leu Gln Met Ile Arg Glu Lys Ser Ser Ser His Thr His Ile Leu
(2) INFORMATION FOR SEQ ID NO:41:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 2916 base pairs (B) TYPE: nucleic acid (C) STRANDEDNESS: single (D) TOPOLOGY: linear (ii) MOLECULE TYPE: cDNA
(ix) FEATURE:
(A) NAME/KEY: Coding Sequence (B) LOCATION: 299...2635 (D) OTHER INFORMATION: GoVN5 (xi) SEQUENCE DESCRIPTION: SEQ ID NO:41:
TATGCTCTTC
CTACAGAAGC
GGTGATTGTT
GTGGATGCTT
ATTTGGCC AT
Met ACC AAA
Arg Phe Ala Ile Glu Glu Ile Asn Ser Asn Pro His Leu Leu Pro Asn GAG GTA
Thr Ser Leu Gly Phe Glu Ile Asn Asn Val Pro His Gly Gln Arg Tyr TGA CAT
Thr Leu Val Lys Leu Phe Ser Ser Leu Ser Gly Ser Asn Tyr Asp Ile ACT TAC
Pro Asn Tyr Ile Ser Ala Ser Glu Ser Asn Ser Ala Ala Val Leu Thr GGA TCT
Gly Pro Ser Trp Thr Ile Ser Glu Cys Val Gly Thr Leu Leu Asp Leu CCT GAG
Tyr Lys Phe Pro Gln Leu Thr Phe Gly Pro Phe Asp Ser Leu Leu Ser AGA TAC
Glu Gln Arg Arg Phe Ser Ser Leu Tyr Gln Val Ala Pro Lys Asp Thr.
loo los llo Phe Leu Thr Pro Gly Ile Val Ser Leu Met Leu His Phe His Trp Asn Trp Val Gly Leu Phe Ile -Ile Asp Asp Asp Lys Gly Ala Gln Thr Leu Ser Asp Leu Arg Asn Glu Met Asp Lys Asn Gly Val Cys Thr Ala Phe Val Glu Met Ile Pro Val Ile Lys Gly Ser Phe Phe Thr Lys Ser Trp Lys Asn His Val Gln Ile Leu Glu Ser Ser Ser Asn Val Ile Ile Ile Tyr Gly Asp Ser Asp Ser Leu Leu Ser Leu Ile Val Asn Ile Lys Gln Lys Leu Leu Thr Trp Lys Val Trp Val Leu Ile Ser Gln Trp Asp Val Ser Lys Phe Asp Asp Tyr Phe Met Val Asp Ser Leu His Gly Ala Leu Ile Phe Ser His His Arg Giu Glu Ile Pro Asn Phe Thr Asp Phe Met Gln Lys Tyr Asn Pro Ser Lys Tyr Pro Glu Asp Thr Tyr Leu His Val Leu Trp His Met Tyr Phe Asn Cys Ser Phe VaI Lys Lys Asp Cys Lys Ile Val His Asn Cys Leu Pro Asn Ala Ser Leu Gly Phe Leu Pro Gly Asn Ile Phe Asp Met Ala Met Ser Glu Glu Ser Tyr Asn Val Tyr Asn Ala Val Tyr Ala Val Ala His Ser Leu His Glu Met Ile Leu Asn Gln Val Gln Phe Gln Thr His Glu Lys Gly Lys Lys Met Val Phe Phe Pro Trp Gln Leu His Pro Phe Leu Arg Glu Arg Gln Leu Ile Asn Gln Asn TTG GTC GTA
Gly Ala Asn Glu Asp Leu Asp Thr Arg Lys His Val Glu Cys Ser Tyr TTT TCT TGT
Asp Ile Leu Asn Phe Trp Asn Pro Lys Gly Gly Leu Asn Phe Leu Val AAG GGA GTC
Lys Val Gly Thr Phe Ser Pro Ala Pro Lys Gln Lys Leu Ser Glu Ser GTG GTC TCC
Ile Ser Ser Asn Met Ile Gln Ala Thr Gly Thr Glu Ile Trp Ser Pro CTG ATT CCA
Gln Ser Val Cys Ser Glu Ser His Pro Gly Arg Lys Thr Cys Phe His TTG CAT AGA
Gln Glu Gly Arg Val Ala Cys Phe Asp Cys Pro Cys Pro Cys Ile Glu AGA GTG TCC
Asn Glu Ile Ser Asn Glu Thr Val Asp Gln Val Lys Cys Asp Cys Pro AGA CTG AAC
Glu Thr His Tyr Ala Asn Ile Lys Ile His Leu Gln Lys Glu Cys Thr TGA GAA GAC ACT
TTG CTT
Val Thr Phe Leu Tyr Tyr Asp Pro Leu Gly Thr Leu Cys Asp Lys Phe ACT TGT GTT
Met Ser Leu Gly Phe Ser Ser Thr Ala Ala Leu Val Val Leu Val Phe CAT CAA TAA CCT
GGC TCT
Leu Lys Asn Arg Asp Thr Pro Val Lys Ala Asn Leu Ala Ile Asn Leu TTT TTT CTT
Ser Tyr Thr Leu Leu Ile Thr Met Leu Cys Leu Cys Pro Leu Phe Leu CAC TAT AAA
Leu Phe Ile Gly Arg Pro Ser Ala Ser Cys Leu Gln Gln Thr Ile Asn TGT CAC CAA
Ile Phe Gly Leu Leu Phe Thr Ala Leu Ser Val Leu Ala Val Thr Lys CTT TTC AAT
Thr Ile Thr Val Val Ile Ala Lys Ile Thr Pro Gly Arg Phe Ser Ile AAG TTT CTT
Arg Arg Trp Leu Leu Ile Ser Ala Pro Asn Ile Ile Pro Arg Phe Leu TCT TTG CTC
Cys Thr Leu Leu Gln Val Phe Leu Ser Gly Ile Trp Leu Thr Thr Ser Pro Pro Phe Ile Asp Lys Asp Ala His Ser Glu His Gly His Ile Ile Ile Ile Cys Asn Lys Gly Ser Ala Val Ala Phe His Cys Asn Leu Gly Tyr Leu Gly Ala Leu Ala Leu Val Ser Tyr Phe Met Ala Phe Leu Ser ', Arg Asn Leu Pro Asp Thr Phe Asn Glu Ala Lys Phe Leu Ala Phe Ser Met Leu Val Phe Cys Ser Val Trp Val Thr Phe Leu Pro Val Tyr His ' Ser Thr Lys Gly Lys Asn Met Val Ala Met Glu Val Phe Ser Ile Leu Ala Ser Ser Thr Ser Leu Leu Gly Ile Ile Phe Ala Pro Lys Cys Tyr 740 745 75p Leu Ile Leu Leu Arg Pro Glu Arg Asn Ser Leu Ser Tyr Ile Arg Asp T
Lys Thr Tyr Ala Lys Ser Ile Lys Pro Ser
(2) INFORMATION FOR SEQ ID NO:42:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 779 amino acids (B) TYPE: amino acid (C) STRANDEDNESS: single (D) TOPOLOGY: linear (ii) MOLECULE TYPE: protein (v) FRAGMENT TYPE: internal (xi) SEQUENCE DESCRIPTION: SEQ ID NO:42:
Met Arg Phe Ala Ile Glu Glu Ile Asn Ser Asn Pro His Leu Leu Pro Asn Thr Ser Leu Gly Phe Glu Ile Asn Asn Val Pro His Gly Gln Arg Tyr Thr Leu Val Lys Leu Phe Ser Ser Leu Ser Gly Ser Asn Tyr Asp Ile Pro Asn Tyr Ile Ser Ala Ser Glu Ser Asn Ser Ala Ala Val Leu
Thr Gly Pro Ser Trp Thr Ile Ser Glu Cys Val Gly Thr Leu Leu Asp Leu Tyr Lys Phe Pro Gln Leu Thr Phe Gly Pro Phe Asp Ser Leu Leu Ser Glu Gln Arg Arg Phe Ser Ser Leu Tyr Gln Val Ala Pro Lys Asp Thr Phe Leu Thr Pro Gly Ile Val Ser Leu Met Leu His Phe His Trp Asn Trp Val Gly Leu Phe Ile Ile Asp Asp Asp Lys Gly Ala Gln Thr Leu Ser Asp Leu Arg Asn Glu Met Asp Lys Asn Gly Val Cys Thr Ala Phe Val Glu Met Ile Pro Val Ile Lys Gly Ser Phe Phe Thr Lys Ser Trp Lys Asn His Val Gln Ile Leu Glu Ser Ser Ser Asn Val Ile Ile Ile Tyr Gly Asp Ser Asp Ser Leu Leu Ser Leu Ile Val Asn Ile Lys Gln Lys Leu Leu Thr Trp Lys Val Trp Val Leu Ile Ser Gln Trp Asp Val Ser Lys Phe Asp Asp Tyr Phe Met Val Asp Ser Leu His Gly Ala Leu Ile Phe Ser His His Arg Glu Glu Ile Pro Asn Phe Thr Asp Phe Met Gln Lys Tyr Asn Pro Ser Lys Tyr Pro Glu Asp Thr Tyr Leu His Val Leu Trp His Met Tyr Phe Asn Cys Ser Phe Val Lys Lys Asp Cys Lys Ile Val His Asn Cys Leu Pro Asn Ala Ser Leu Gly Phe Leu Pro Gly Asn Ile Phe Asp Met Ala Met Ser Glu Glu Ser Tyr Asn Val Tyr Asn Ala Val Tyr Ala Val Ala His Ser Leu His Glu Met Ile Leu Asn Gln Val Gln Phe Gln Thr His Glu Lys Gly Lys Lys Met Val Phe Phe Pro Trp Gln Leu His Pro Phe Leu Arg Glu Arg Gln Leu Ile Asn Gln Asn Gly Ala Asn Glu Asp Leu Asp Cys Thr Arg Lys Ser His Val Glu Tyr Asp Ile Leu Asn Phe Trp Asn Phe Pro Lys Gly Leu Gly Leu Asn Val Lys Val Gly Thr Phe Ser Pro Ser Ala Pro Lys Glu Gln Lys Leu Ser Ile Ser Ser Asn Met Ile Gln Trp Ala Thr Gly Ser Thr Glu Ile Pro Gln Ser Val Cys Ser Glu Ser Cys His Pro Gly Phe Arg Lys Thr His Gln Glu Gly Arg Val Ala Cys Cys Phe Asp Cys Ile Pro Cys Pro Glu Asn Glu Ile Ser Asn Glu Thr Asp Val Asp Gln Cys Val Lys Cys Pro Glu Thr His Tyr Ala Asn Ile Glu Lys Ile His Cys Leu Gln Lys Thr Val Thr Phe Leu Tyr Tyr Asp Asp Pro Leu Gly Lys Thr Leu Cys Phe Met Ser Leu Gly Phe Ser Ser Leu Thr Ala Ala Val Leu Val Val Phe Leu Lys Asn Arg Asp Thr Pro Ile Val Lys Ala Asn Asn Leu Ala Leu Ser Tyr Thr Leu Leu Ile Thr Leu Met Leu Cys Phe Leu Cys Pro Leu Leu Phe Ile 
Gly Arg Pro Ser Thr Ala Ser Cys Ile Leu Gln Gln Asn Ile Phe Gly Leu Leu Phe Thr Val Ala Leu Ser Thr Val Leu Ala Lys Thr Ile Thr Val Val Ile Ala Phe Lys Ile Thr Ser Pro Gly Arg Ile Arg Arg Trp Leu Leu Ile Ser Arg Ala Pro Asn Phe Ile Ile Pro Leu Cys Thr Leu Leu Gln Val Phe Leu Ser Gly Ile Trp Leu Thr Thr Ser Pro Pro Phe Ile Asp Lys Asp Ala His Ser Glu His Gly His Ile Ile Ile Ile Cys Asn Lys Gly Ser Ala Val Ala Phe His Cys Asn Leu Gly Tyr Leu Gly Ala Leu Ala Leu Val Ser Tyr Phe Met Ala Phe Leu Ser Arg Asn Leu Pro Asp Thr Phe Asn Glu Ala Lys Phe Leu Ala Phe Ser Met Leu Val Phe Cys Ser Val Trp Val Thr Phe Leu Pro Val Tyr His Ser Thr Lys Gly Lys Asn Met Val Ala Met Glu Val Phe Ser Ile Leu Ala Ser Ser Thr Ser Leu Leu Gly Ile Ile Phe Ala Pro Lys Cys Tyr Leu Ile Leu Leu Arg Pro Glu Arg Asn Ser Leu Ser Tyr Ile Arg Asp Lys Thr Tyr Ala Lys Ser Ile Lys Pro Ser
(2) INFORMATION FOR SEQ ID NO:43:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 3307 base pairs (B) TYPE: nucleic acid (C) STRANDEDNESS: single (D) TOPOLOGY: linear (ii) MOLECULE TYPE: cDNA
(ix) FEATURE:
(A) NAME/KEY: Coding Sequence (B) LOCATION: 112...1761 (D) OTHER INFORMATION: GoVN6 (xi) SEQUENCE DESCRIPTION: SEQ ID NO:43:
TAAGGCAGGA AAAAATGTTC ATTTTGATGG AAGTCTTCTT CTTCTTCCTT AACATTCCAC 60
Met Lys Leu Arg Asp Lys Asp Leu Ser Ile Thr Cys Ser Phe Ile Leu Glu Ala Val Gln Met Pro Thr Glu Asn Asp Tyr Phe Asn Gln Thr Leu Asn Ile Leu Lys Thr Thr Lys Asn His Lys Tyr Ala Leu Ala Leu Ala Phe Ser Ile Asp Glu Ile Asn Arg Asn Pro Asp Leu Leu Pro Asn Met Ser Leu AAA ACT
TAC
IleIleLys TyrProLeu GlyLeuCys AspGlyGln ThrThrLeuPro ThrProTyr LeuPheAsn GluIleTyr PheArgPro IleProAsnTyr PheCysAsn GluGluThr MetCysThr PheLeuLeu ThrGlyProHis TrpIleThr SerTyrSer PheTrpIle HisLeuAsn IlePheLeuSer ProSerMet AsnProLys AspThrSer LeuAlaLeu AlaMetValSer PheLeuLeu TyrPheLys TrpAsnTrp ValGlyLeu ValIleSerAsp AspAspGln GlyAsnGln PheLeuSer GluLeuLys LysGluSerLys IleLysGlu IleCysPhe AlaPheVal SerMetLeu AlaIleAspGlu IleSerPhe TyrHisLys ThrGluMet TyrTyrAsn GlnIleValMet SerSerThr AsnValIle IleIleTyr GlyLysThr GluSerIleIle GluLeuSer PheArgMet TrpGluSer ProValIle GlnArgIleTrp ValThrThr LysGluMet AsnPhePro ThrSerLys ArgAspLeuThr HisAspThr PheTyrGly ThrLeuThr PheLeuHis SerHisGlyGlu IleSerGly PheLysAsn PheValGln ThrTrpTyr HisLeuArgIle ThrAspLeu HisLeuVal MetProGlu TrpLysTyr PheAsnTyrGlu AlaSerAla SerAsnCys LysIleLeu LysAsnTyr SerSerSerAla Ser Leu Glu Leu Glu Gln Thr Phe Asp Val Phe Ser Trp Met Met Asp GAT TAT ATG CTC
Gly Ser Arg Ile Asn Ala Val Asn Ala Ala His Ala Asp Tyr Met Leu AAT CAC GCA GGG
His Glu Met Leu Leu Val Asp Asn Gln Ile Asp Asn Asn His Ala Gly AGT CAC TCC AAG
Lys Gly Ala Ser Cys Phe Lys Ile Asn Phe Leu Arg Ser His Ser Lys ACT CCT ATT AGA
Thr His Phe Asn Leu Gly Asp Arg Val Met Lys Glu Thr Pro Ile Arg CAA GAC ACT TCT
Glu Ile Leu Glu Tyr Asn Ile Phe His Trp Asn Phe Gln Asp Thr Ser GGT AAG TTC TTT
Gln His Ile Phe Val Lys Ile Gly Lys Ser Pro Tyr Gly Lys Phe Phe AGG TTT ATG GCT
Pro His Gly His His Leu Tyr Val Asp Ile Glu Leu Arg Phe Met Ala AGA ATG ACT AGT
Thr Gly Ser Lys Pro Ser Ser Val Cys Glu Asp Cys Arg Met Thr Ser AGA TTC GCA TTT
Pro Gly Tyr Arg Trp Lys Glu Gly Met Ala Cys Cys Arg Phe Ala Phe CCC CCT AAT ATG
Val Cys Ser Cys Glu Asn Ala Ile Ser Glu Thr Asn Pro Pro Asn Met GTG TGT GCC CGG
Asp Gln Cys Asn Pro Glu Tyr Gln Tyr Asn Thr Lys Val Cys Ala Arg GAC AAA TGC CAG AAT GTG ATG TTT CTA TAC AAA GAC 1701 ATT AAA AGC CCC
Asp Lys Cys Gln Asn Val Met Phe Leu Tyr Lys Asp Ile Lys Ser Pro GAC TGC TTT AAC
Leu Gly Asp Ser Leu His Ser Leu Leu Leu Cys Ile Asp Cys Phe Asn ACT GTGAAGCACC
ATGACACTCC
TATTGTGAAG
GCCAA
Ser Cys Cys Thr TATTAATCAC GTCTCTCTTG
TTCTGTTTTC TCTGCTCATT
ACAGAGCAAC CTGCATCTTA
CAGCAAATCA CATTTGGAAT
CTACAATTTT GGCAAAAACA
ATCACTGTGG TTCTGGCTTT
GAAGGTTGAG AAACTTCCTA
GTATTGGGTA CACTCAACTA
TGTTTCAATG TATTCTGTGT
GCAATCTGGC TAGCAGTTTC
ATGAACACAC TGAGTATGGC
CACATCATCA TTGTGTGCAA
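The translations in this listing are written in three-letter residue codes. As a reading aid, the following is a minimal Python sketch that converts such a string to the standard one-letter code. The function name `to_one_letter` and the dictionary name are illustrative assumptions and are not part of the listing; the mapping itself is the standard IUPAC table. The example input is the first ten residues of the GoVN6 translation given under SEQ ID NO:43.

```python
# Standard IUPAC three-letter to one-letter amino acid codes.
THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
}


def to_one_letter(three_letter_seq: str) -> str:
    """Convert a whitespace-separated three-letter sequence to one-letter code.

    Raises ValueError on any token that is not a standard residue name,
    which helps flag scanning damage such as a misread residue.
    """
    out = []
    for token in three_letter_seq.split():
        if token not in THREE_TO_ONE:
            raise ValueError(f"unrecognized residue token: {token!r}")
        out.append(THREE_TO_ONE[token])
    return "".join(out)


# First ten residues of the GoVN6 translation (hypothetical usage).
print(to_one_letter("Met Lys Leu Arg Asp Lys Asp Leu Ser Ile"))  # MKLRDKDLSI
```

Because the function rejects unknown tokens instead of skipping them, it can also serve as a rough screen for transcription errors in the residue names.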
(2) INFORMATION FOR SEQ ID NO:44:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 550 amino acids (B) TYPE: amino acid (C) STRANDEDNESS: single (D) TOPOLOGY: linear (ii) MOLECULE TYPE: protein (v) FRAGMENT TYPE: internal
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:44:
Met Lys Leu Arg Asp Lys Asp Leu Ser Ile Thr Cys Ser Phe Ile Leu Glu Ala Val Gln Met Pro Thr Glu Asn Asp Tyr Phe Asn Gln Thr Leu Asn Ile Leu Lys Thr Thr Lys Asn His Lys Tyr Ala Leu Ala Leu Ala Phe Ser Ile Asp Glu Ile Asn Arg Asn Pro Asp Leu Leu Pro Asn Met Ser Leu Ile Ile Lys Tyr Pro Leu Gly Leu Cys Asp Gly Gln Thr Thr Leu Pro Thr Pro Tyr Leu Phe Asn Glu Ile Tyr Phe Arg Pro Ile Pro Asn Tyr Phe Cys Asn Glu Glu Thr Met Cys Thr Phe Leu Leu Thr Gly Pro His Trp Ile Thr Ser Tyr Ser Phe Trp Ile His Leu Asn Ile Phe Leu Ser Pro Ser Met Asn Pro Lys Asp Thr Ser Leu Ala Leu Ala Met Val Ser Phe Leu Leu Tyr Phe Lys Trp Asn Trp Val Gly Leu Val Ile Ser Asp Asp Asp Gln Gly Asn Gln Phe Leu Ser Glu Leu Lys Lys Glu Ser Lys Ile Lys Glu Ile Cys Phe Ala Phe Val Ser Met Leu Ala Ile Asp Glu Ile Ser Phe Tyr His Lys Thr Glu Met Tyr Tyr Asn Gln Ile Val Met Ser Ser Thr Asn Val Ile Ile Ile Tyr Gly Lys Thr Glu Ser Ile Ile Glu Leu Ser Phe Arg Met Trp Glu Ser Pro Val Ile Gln Arg Ile Trp Val Thr Thr Lys Glu Met Asn Phe Pro Thr Ser Lys Arg Asp Leu Thr His Asp Thr Phe Tyr Gly Thr Leu Thr Phe Leu His Ser His Gly Glu Ile Ser Gly Phe Lys Asn Phe Val Gln Thr Trp Tyr His Leu Arg Ile Thr Asp Leu His Leu Val Met Pro Glu Trp Lys Tyr Phe Asn Tyr Glu Ala Ser Ala Ser Asn Cys Lys Ile Leu Lys Asn Tyr Ser Ser Ser Ala Ser Leu Glu Trp Leu Met Glu Gln Thr Phe Asp Met Val Phe Ser Asp Gly Ser Arg Asp Ile Tyr Asn Ala Val Asn Ala Met Ala His Ala Leu His Glu Met Asn Leu His Leu Val Asp Asn Gln Ala Ile Asp Asn Gly Lys Gly Ala Ser Ser His Cys Phe Lys Ile Asn Ser Phe Leu Arg Lys Thr His Phe Thr Asn Pro Leu Gly Asp Arg Val Ile Met Lys Glu Arg Glu Ile Leu Gln Glu Asp Tyr Asn Ile Phe His Thr Trp Asn Phe Ser Gln His Ile Gly Phe Lys Val Lys Ile Gly Lys Phe Ser Pro Tyr Phe Pro His Gly Arg His Phe His Leu Tyr Val Asp Met Ile Glu Leu Ala Thr Gly Ser Arg Lys Met Pro Ser Ser Val Cys Thr Glu Asp Cys Ser Pro Gly Tyr Arg Arg Phe Trp Lys Glu Gly Met Ala Ala Cys Cys Phe Val Cys Ser Pro Cys Pro Glu Asn Ala Ile Ser Asn Glu
Thr Asn Met Asp Gln Cys Val Asn Cys Pro Glu Tyr Gln Tyr Ala Asn Thr Lys Arg Asp Lys Cys Ile Gln Lys Asn Val Met Phe Leu Ser Tyr Lys Asp Pro Leu Gly Asp Asp Ser Cys Leu His Ser Leu Leu Phe Leu Cys Ile Asn Ser Cys Cys Thr
(2) INFORMATION FOR SEQ ID NO:45:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 3938 base pairs (B) TYPE: nucleic acid (C) STRANDEDNESS: single (D) TOPOLOGY: linear (ii) MOLECULE TYPE: cDNA
(ix) FEATURE:
(A) NAME/KEY: Coding Sequence (B) LOCATION: 46...2424 (D) OTHER INFORMATION: GoVN7
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:45:
Met Ile Val Phe Phe Leu Leu Asn Ile Pro Leu Leu Met Ala Asn Ser Val Asp Pro Arg AAA GTC ATA GAT
ATA GAT
CysPhe TrpLysIle AsnLeuAsn Glu LysAsp AspLeu Asp Val Ile GTT CCT
ThrSer CysTyrPhe IleLeuGlu Ala GlnLeu MetGlu Lys Val Pro CTA ACC
AspTyr PheAsnGln ThrLeuAsn Val LysThr LysTyr Asn Leu Thr ATG ATA
ArgTyr AlaLeuAla LeuAlaPhe Thr AspGlu AsnArg Asn Met Ile ATT CAT
ProHis IleLeuPro AsnMetSer Leu IleLys ThrLeu Gly Ile His TTA CAA
HisCys AspGlyAsn IleProLeu Arg LeuAsn IlePhe Tyr Leu Gln GAA ATG
MetPro PheProAsn TyrGlyCys Asn GluThr CysSer Phe Glu Met TCT TTT
MetLeu MetGlyPro AsnLeuTrp Pro ValAsp PheIle His Ser Phe CAG TTC
LeuAsn IleLeuPhe ProHisPhe Leu IleSer GlyPro Phe Gln Phe TTT ATC
HisSer IlePheSer AspAsnGlu Gln ProTyr TyrGln Met Phe Ile GCA TCT
ThrPro LysAspThr SerLeuAla Leu MetVal PheIle Leu Ala Ser GTC GAT
TyrPhe AsnTrpAsn TrpValGly Leu LeuSer AsnAsp Glu Val Asp AAA CAC
GlyAsn GlnPheLeu ThrGluLeu Lys GluThr AsnThr Glu Lys His GCA GAG
IleCys PheAlaPhe ValAsnMet Met IleAsn AsnSer Ser Ala Glu CAA ATG
MetLys LysThrAsp MetTyrTyr Asn IleVal SerThr Ala Gln Met CCC ATT
AsnVal IleIleIle TyrGlyGlu Arg SerIle GluLeu Cys Pro Ile TTC AGA ACA TGG ACA TCT CCA GTC ATA CAG AGG ATA TGG GTT ACC AAA 921 Phe Arg Thr Trp Thr Ser Pro Val Ile Gln Arg Ile Trp Val Thr Lys Ser Glu Leu Tyr Phe Pro Thr Ser Lys Arg Asp Leu Ser His Gly Thr Phe Tyr Gly Thr Leu Ala Phe Gln Gln His His Asp Val Ile Ser Gly Phe Lys Asn Phe Val Gln Thr Tyr His Leu Lys Ser Asp Leu Trp Met TAT TTA TTA AAG CCA GAG TGG TTC TTT GAA TAT GAA TCA GCA 1113 GGT ACC
Tyr Leu Leu Lys Pro Glu Trp Phe Phe Glu Tyr Glu Ser Ala Gly Thr AGT TCA
Ser Tyr Cys Lys Ile Leu Met Asn Ser Ser Asn Val Leu Glu Ser Ser GAC AAT
Trp Leu Met Glu Gln Lys Phe Ile Ala Phe Asn Asp Ser His Asp Asn GCC CAT
Ser Ile Tyr Asn Ala Val Tyr Met Ala His Ala Leu Glu Lys Ala His CAG AAA
Asn Leu Lys Gln Ile Asp Asn Glu Ile Ser Tyr Gly Gly Ala Gln Lys CAC ATC
Ser Thr His Cys Leu Lys Leu Ser Phe Leu Arg Thr His Phe His Ile GTG GTA
Thr Asn Pro Phe Gly Glu Arg Ile Met Lys Glu Arg Arg Val Val Val CAC CAA
Gln Glu Asp Tyr Asp Ile Val Leu Gln Asn Cys Ser His Leu His Gln Arg Ile Lys Val Lys Ile Gly Gln Phe Ser Pro Tyr Phe Pro His Gly Gly Gln Phe His Leu Tyr Glu Asp Met Ile Asp Leu Ala Thr Gly Ser Arg Lys Met Pro Leu Ser Met Cys Ser Ala Asp Cys Arg Pro Gly Tyr Arg Lys Phe Trp Lys Glu Gly Met Ala Ala Cys Cys Phe Val Cys Ser Pro Cys Pro Asp Asn Glu Ile Ser Asn Glu Thr Thr Val Val Leu Trp ACT ATT AAT
ValPheVal LysHisHis Asp Pro Val LysAla Asn Arg Thr Ile Asn ATG CTC TTT
IleLeuSer TyrIleLeu Ile Ser Met PheCys Leu Cys Met Leu Phe CCT AGA ATC
SerPhePhe PheIleGly His Asn Gly ThrCys Leu Gln Pro Arg Ile TTC GTG ACA
GlnIleThr PheGlyIle Val Thr Ala ValSer Val Leu Phe Val Thr CTG TTT GAC
AlaLysThr IleThrVal Leu Ala Gln ValThr Thr Gly Leu Phe Asp GTA GGG TAC
ArgLysLeu ArgAsnPhe Leu Ser Thr ProAsn Ile Ile Val Gly Tyr TGC CTG TGG
ProIleCys SerLeuLeu Gln Thr Cys AlaIle Leu Ala Cys Leu Trp ATC GAA CAT
ValSerPro ProPheVal Asp Asp His SerGlu Gly His Ile Glu His GGA GTT TAC
IleIleIle ValCysAsn Lys Ser Met AlaPhe Cys Val Gly Val Tyr GCC GGA ATG
LeuGlyTyr LeuAlaPhe Leu Leu Ser PheThr Ala Phe Ala Gly Met ACA AAT TTC
LeuAlaLys AsnLeuPro Asp Phe Glu AlaLys Leu Thr Thr Asn Phe AGT TGG CTT
PheSerMet LeuValPhe Cys Val Ile ThrPhe Pro Val Ser Trp Leu GTC GTT ATT
TyrHisSer ThrLysGly Arg Met Ala ValGlu Phe Ser Val Val Ile ATG GGA GCA
IleLeuThr SerSerAla Gly Leu Cys ValPhe Pro Lys Met Gly Ala CCA AGA AAA
IleTyrIle IleLeuMet Lys Glu Ile LeuSer Arg Gln Pro Arg Lys TTTTAGAAAT
TCTGTCAAAT
GTACAGTTGT
T
GluLysSer ArgPhe CTCACTAGTT
CCATAAAATC
TGTAGTATTA
CAAGTACATT
ACAGGATTAC
GAATCAACAA
CAGAATACTG
GTAGAAGTTT
GAGCACCCTG
GAATACCAGC
ATACATAAGC
TCAGTGGAAG
GTGATGGTTT
GAGGAATTTG
TCAGTCTGTT
CCTGCCTGGA
GAACATGTAA
GTCTGTACAT
GATTTCCTCT
CACCGTAAAA
TATTAACATG
TGTACCTAAT
CACAAAATTC
ATAAATTTTC
(2) INFORMATION FOR SEQ ID NO:46:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 793 amino acids (B) TYPE: amino acid (C) STRANDEDNESS: single (D) TOPOLOGY: linear (ii) MOLECULE TYPE: protein (v) FRAGMENT TYPE: internal
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:46:
Met Ile Val Phe Phe Leu Leu Asn Ile Pro Leu Leu Met Ala Asn Ser Val Asp Pro Arg Cys Phe Trp Lys Ile Asn Leu Asn Glu Val Lys Asp Ile Asp Leu Asp Thr Ser Cys Tyr Phe Ile Leu Glu Ala Val Gln Leu Pro Met Glu Lys Asp Tyr Phe Asn Gln Thr Leu Asn Val Leu Lys Thr Thr Lys Tyr Asn Arg Tyr Ala Leu Ala Leu Ala Phe Thr Met Asp Glu Ile Asn Arg Asn Pro His Ile Leu Pro Asn Met Ser Leu Ile Ile Lys His Thr Leu Gly His Cys Asp Gly Asn Ile Pro Leu Arg Leu Leu Asn Gln Ile Phe Tyr Met Pro Phe Pro Asn Tyr Gly Cys Asn Glu Glu Thr Met Cys Ser Phe Met Leu Met Gly Pro Asn Leu Trp Pro Ser Val Asp Phe Phe Ile His Leu Asn Ile Leu Phe Pro His Phe Leu Gln Ile Ser Phe Gly Pro Phe His Ser Ile Phe Ser Asp Asn Glu Gln Phe Pro Tyr Ile Tyr Gln Met Thr Pro Lys Asp Thr Ser Leu Ala Leu Ala Met Val Ser Phe Ile Leu Tyr Phe Asn Trp Asn Trp Val Gly Leu Val Leu Ser Asp Asn Asp Glu Gly Asn Gln Phe Leu Thr Glu Leu Lys Lys Glu Thr His Asn Thr Glu Ile Cys Phe Ala Phe Val Asn Met Met Ala Ile Asn Glu Asn Ser Ser Met Lys Lys Thr Asp Met Tyr Tyr Asn Gln Ile Val Met Ser Thr Ala Asn Val Ile Ile Ile Tyr Gly Glu Arg Pro Ser Ile Ile Glu Leu Cys Phe Arg Thr Trp Thr Ser Pro Val Ile Gln Arg Ile Trp Val Thr Lys Ser Glu Leu Tyr Phe Pro Thr Ser Lys Arg Asp Leu Ser His Gly Thr Phe Tyr Gly Thr Leu Ala Phe Gln Gln His His Asp Val Ile Ser Gly Phe Lys Asn Phe Val Gln Thr Trp Tyr His Leu Lys Ser Met Asp Leu Tyr Leu Leu Lys Pro Glu Trp Gly Phe Phe Glu Tyr Glu Thr Ser Ala Ser Tyr Cys Lys Ile Leu Met Ser Asn Ser Ser Asn Val Ser Leu Glu Trp Leu Met Glu Gln Lys Phe Asp Ile Ala Phe Asn Asp Asn Ser His Ser Ile Tyr Asn Ala Val Tyr Ala Met Ala His Ala Leu His Glu Lys Asn Leu Lys Gln Ile Asp Asn Gln Glu Ile Ser Tyr Gly Lys Gly Ala Ser Thr His Cys Leu Lys Leu His Ser Phe Leu Arg Thr Ile His Phe Thr Asn Pro Phe Gly Glu Arg Val Ile Met Lys Glu Arg Val Arg Val Gln Glu Asp Tyr Asp Ile Val His Leu Gln Asn Cys Ser Gln His Leu Arg Ile Lys Val Lys Ile Gly Gln Phe Ser Pro Tyr Phe Pro His Gly Gly Gln Phe His Leu Tyr Glu Asp Met Ile Asp Leu Ala Thr Gly
Ser Arg Lys Met Pro Leu Ser Met Cys Ser Ala Asp Cys Arg Pro Gly Tyr Arg Lys Phe Trp Lys Glu Gly Met Ala Ala Cys Cys Phe Val Cys Ser Pro Cys Pro Asp Asn Glu Ile Ser Asn Glu Thr Thr Val Val Leu Trp Val Phe Val Lys His His Asp Thr Pro Ile Val Lys Ala Asn Asn Arg Ile Leu Ser Tyr Ile Leu Ile Met Ser Leu Met Phe Cys Phe Leu Cys Ser Phe Phe Phe Ile Gly His Pro Asn Arg Gly Thr Cys Ile Leu Gln Gln Ile Thr Phe Gly Ile Val Phe Thr Val Ala Val Ser Thr Val Leu Ala Lys Thr Ile Thr Val Leu Leu Ala Phe Gln Val Thr Asp Thr Gly Arg Lys Leu Arg Asn Phe Leu Val Ser Gly Thr Pro Asn Tyr Ile Ile Pro Ile Cys Ser Leu Leu Gln Cys Thr Leu Cys Ala Ile Trp Leu Ala Val Ser Pro Pro Phe Val Asp Ile Asp Glu His Ser Glu His Gly His Ile Ile Ile Val Cys Asn Lys Gly Ser Val Met Ala Phe Tyr Cys Val Leu Gly Tyr Leu Ala Phe Leu Ala Leu Gly Ser Phe Thr Met Ala Phe Leu Ala Lys Asn Leu Pro Asp Thr Phe Asn Glu Ala Lys Phe Leu Thr Phe Ser Met Leu Val Phe Cys Ser Val Trp Ile Thr
Phe Leu Pro Val Tyr His Ser Thr Lys Gly Arg Val Met Val Ala Val Glu Ile Phe Ser Ile Leu Thr Ser Ser Ala Gly Met Leu Gly Cys Val Phe Ala Pro Lys Ile Tyr Ile Ile Leu Met Lys Pro Glu Arg Ile Leu Ser Lys Arg Gln Glu Lys Ser Arg Phe
(2) INFORMATION FOR SEQ ID NO:47:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 3359 base pairs (B) TYPE: nucleic acid (C) STRANDEDNESS: single (D) TOPOLOGY: linear (ii) MOLECULE TYPE: cDNA
(ix) FEATURE:
(A) NAME/KEY: Coding Sequence (B) LOCATION: 59...2452 (D) OTHER INFORMATION: GoVN13C
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:47:
Met Val Ile Phe Phe Leu Leu Asn Ile Pro Phe Leu Leu Ala Asn Phe Met Asp Pro Arg Cys Phe Trp Lys Ile Asn Leu Asn Glu Ile Lys Asp Glu Val Leu Gly Met Thr Cys Ser Phe Ile Leu Glu Thr Val Gln Lys Thr Met Asp Lys Asp Tyr Phe Asn Gln Thr Leu Asn Val Leu Asn Thr Thr Thr Asn His Lys Tyr Ala Leu Ala Leu Ala Phe Thr Val Asp Glu Ile Asn Arg Asn Pro Asp Leu Leu Pro Asn Met Ser Leu Ile Ile Lys Tyr Asn Leu Gly His Cys Asp Gly Lys Thr Val Thr Thr Leu Ser Asp Leu Phe Asn Pro Asn Asn His Leu His Phe Pro Asn Tyr Leu Cys Asn Glu AGG GAT TAT GTG TTT GGT TCT GCT TAC AGG ACC ACA TTG GAG AGC ATC 492 Gly Ile Met Cys Leu Val Leu Leu Thr Gly Pro His Trp Arg Ala Ser TTT CCT
Leu Tyr Leu Trp Ile Ser Val Tyr Val Tyr Leu Ser Pro His Phe Leu TGA ACA
Gln Leu Ser Tyr Gly Pro Phe Tyr Ser Ile Phe Ser Asp Asn Glu Gln AGC ATT
Tyr Pro Tyr Leu Tyr Gln Met Gly Pro Lys Asp Ser Ser Leu Ala Leu TGG GCT
Ala Met Val Ser Phe Ile Ile Tyr Phe Lys Trp Asn Trp Val Gly Leu GTT GAA
Phe Ile Ser Asp Asp Asp Gln Gly Asn Gln Phe Leu Ser Glu Leu Lys CAT GAT
Lys Glu Ser Gln Thr Lys Asp Ile Cys Phe Ala Phe Val Asn Met Ile CTA CAA
Ser Val Ser Asp Val Ser Tyr Tyr His Lys Thr Glu Met Tyr Tyr Asn GGA AAC
Gln Ile Val Met Ser Ser Thr Lys Val Ile Ile Ile Tyr Gly Glu Thr AGT TAA
Asn Ser Ile Ile Glu Leu Ser Phe Arg Met Trp Ser Ser Pro Val Lys CAG TAA
Gln Arg Ile Trp Val Thr Thr Lys Gln Phe Asp Cys Pro Thr Ser Lys TCT ACA
Arg Asp Leu Thr His Gly Thr Phe Tyr Gly Thr Leu Thr Phe Leu His ACG GTA
His Tyr Gly Glu Ile Ser Gly Phe Lys Asn Phe Val Gln Thr Arg Tyr GAA ATA
Asn Leu Arg Ser Thr Asp Leu Tyr Leu Val Met Pro Glu Trp Lys Tyr AAA CTA
Phe Asn Tyr Glu Ala Ser Ala Ser Asn Cys Lys Ile Leu Arg Asn Tyr TGA CAT
Leu Ser Asn Ile Ser Leu Glu Trp Leu Met Glu Gln Lys Phe Asp Met TGC CAT
Ser Phe Ser Asp Tyr Ser His Asn Ile Tyr Asn Ala Val Tyr Ala Ile
Ala His Ala Leu His Glu Lys Asn Leu Gln Glu Val Glu Asn Gln Ala Ile Asn Asn Ala Lys Gly Glu Asn Thr His Cys Leu Lys Leu Asn Ser Phe Leu Arg Lys Thr His Phe Thr Asn Ser Leu Gly Asn Arg Val Ile Met Lys Gln Arg Glu Val Val His Gly Asp Tyr Asn Ile Val His Met Trp Asn Phe Ser Gln Arg Leu Gly Ile Lys Val Lys Ile Gly Gln Phe Ser Pro His Phe Pro Gln Gly Gln Gln Leu His Leu Tyr Val Asp Met Thr Glu Leu Ala Thr Gly Ser Arg Lys Met Pro Ser Ser Val Cys Ser Ala Asp Cys His Pro Gly Phe Arg Arg Ile Trp Lys Glu Glu Met Ala Ala Cys Cys Phe Val Cys Asn Pro Cys Pro Glu Asn Glu Ile Ser Asn Glu Thr Met Val Val Phe Trp Val Phe Val Lys His His Asp Thr Pro Ile Val Lys Ala Asn Asn Arg Ile Leu Ser Tyr Leu Leu Ile Val Ser Leu Met Phe Cys Phe Leu Cys Ser Phe Phe Phe Ile Gly Tyr Pro Asn Arg Ala Thr Cys Ile Leu Gln Gln Ile Thr Phe Gly Ile Phe Phe Thr Val Ala Ile Ser Thr Val Leu Ala Lys Thr Ile Thr Val Val Leu Ala Phe Lys Val Thr Asp Pro Gly Arg Gln Leu Arg Ile Phe Leu Val Ser Gly Thr Pro Asn Tyr Ile Ile Pro Ile Cys Ser Leu Leu Gln Cys Ile TAT TGA
Leu Cys Ala Ile Trp Leu Ala Val Ser Pro Pro Phe Val Asp Ile Asp GGG CTC
Glu His Ser Glu His Gly His Ile Ile Ile Val Cys Asn Lys Gly Ser GGC CTT
Ile Thr Ala Phe Tyr Cys Val Leu Gly Tyr Leu Ala Cys Leu Ala Phe CAC ATT
Gly Ser Phe Thr Ile Ala Phe Leu Ala Lys Asn Leu Pro Asp Thr Phe CGC TGT
Asn Glu Ala Lys Phe Leu Thr Phe Ser Met Leu Val Phe Cys Ala Val GGT CAT
Trp Val Thr Phe Leu Pro Val Tyr His Ser Thr Lys Gly Lys Val Met GAT GCT
Val Ala Val Glu Ile Phe Ser Ile Leu Ala Ser Ser Ala Gly Met Leu ACC AGA
Gly Cys Ile Phe Ala Pro Lys Val Tyr Ile Ile Leu Met Arg Pro Asp TGAAAAGGTA
Arg Asn Ser Ile His Lys Ile Arg Glu Lys Ser Tyr Phe AGTTACAGAG
GAAGTACCAT
GCTTAGTATC
TTTGCTTTCA
ATATAACCTT
TAACTAAAAA
TACTTGACAG
GCTGAAAATG
ATCTGAGAAC
CCATAGGAAT
CCACTAACAA
ATTGCCTGGT
GGATTGGGGA
TATTGGATGA
AAAAAAA
(2) INFORMATION FOR SEQ ID NO:48:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 798 amino acids (B) TYPE: amino acid (C) STRANDEDNESS: single (D) TOPOLOGY: linear (ii) MOLECULE TYPE: protein (v) FRAGMENT TYPE: internal
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:48:
Met Val Ile Phe Phe Leu Leu Asn Ile Pro Phe Leu Leu Ala Asn Phe Met Asp Pro Arg Cys Phe Trp Lys Ile Asn Leu Asn Glu Ile Lys Asp Glu Val Leu Gly Met Thr Cys Ser Phe Ile Leu Glu Thr Val Gln Lys Thr Met Asp Lys Asp Tyr Phe Asn Gln Thr Leu Asn Val Leu Asn Thr Thr Thr Asn His Lys Tyr Ala Leu Ala Leu Ala Phe Thr Val Asp Glu Ile Asn Arg Asn Pro Asp Leu Leu Pro Asn Met Ser Leu Ile Ile Lys Tyr Asn Leu Gly His Cys Asp Gly Lys Thr Val Thr Thr Leu Ser Asp Leu Phe Asn Pro Asn Asn His Leu His Phe Pro Asn Tyr Leu Cys Asn Glu Gly Ile Met Cys Leu Val Leu Leu Thr Gly Pro His Trp Arg Ala Ser Leu Tyr Leu Trp Ile Ser Val Tyr Val Tyr Leu Ser Pro His Phe Leu Gln Leu Ser Tyr Gly Pro Phe Tyr Ser Ile Phe Ser Asp Asn Glu Gln Tyr Pro Tyr Leu Tyr Gln Met Gly Pro Lys Asp Ser Ser Leu Ala Leu Ala Met Val Ser Phe Ile Ile Tyr Phe Lys Trp Asn Trp Val Gly Leu Phe Ile Ser Asp Asp Asp Gln Gly Asn Gln Phe Leu Ser Glu Leu Lys Lys Glu Ser Gln Thr Lys Asp Ile Cys Phe Ala Phe Val Asn Met Ile Ser Val Ser Asp Val Ser Tyr Tyr His Lys Thr Glu Met Tyr Tyr Asn Gln Ile Val Met Ser Ser Thr Lys Val Ile Ile Ile Tyr Gly Glu Thr Asn Ser Ile Ile Glu Leu Ser Phe Arg Met Trp Ser Ser Pro Val Lys Gln Arg Ile Trp Val Thr Thr Lys Gln Phe Asp Cys Pro Thr Ser Lys Arg Asp Leu Thr His Gly Thr Phe Tyr Gly Thr Leu Thr Phe Leu His His Tyr Gly Glu Ile Ser Gly Phe Lys Asn Phe Val Gln Thr Arg Tyr Asn Leu Arg Ser Thr Asp Leu Tyr Leu Val Met Pro Glu Trp Lys Tyr Phe Asn Tyr Glu Ala Ser Ala Ser Asn Cys Lys Ile Leu Arg Asn Tyr Leu Ser Asn Ile Ser Leu Glu Trp Leu Met Glu Gln Lys Phe Asp Met Ser Phe Ser Asp Tyr Ser His Asn Ile Tyr Asn Ala Val Tyr Ala Ile Ala His Ala Leu His Glu Lys Asn Leu Gln Glu Val Glu Asn Gln Ala Ile Asn Asn Ala Lys Gly Glu Asn Thr His Cys Leu Lys Leu Asn Ser Phe Leu Arg Lys Thr His Phe Thr Asn Ser Leu Gly Asn Arg Val Ile Met Lys Gln Arg Glu Val Val His Gly Asp Tyr Asn Ile Val His Met Trp Asn Phe Ser Gln Arg Leu Gly Ile Lys Val Lys Ile Gly Gln Phe Ser Pro His Phe Pro Gln Gly Gln Gln Leu His Leu Tyr Val Asp Met Thr Glu Leu 
Ala Thr Gly Ser Arg Lys Met Pro Ser Ser Val Cys Ser Ala Asp Cys His Pro Gly Phe Arg Arg Ile Trp Lys Glu Glu Met
Ala Ala Cys Cys Phe Val Cys Asn Pro Cys Pro Glu Asn Glu Ile Ser Asn Glu Thr Met Val Val Phe Trp Val Phe Val Lys His His Asp Thr Pro Ile Val Lys Ala Asn Asn Arg Ile Leu Ser Tyr Leu Leu Ile Val Ser Leu Met Phe Cys Phe Leu Cys Ser Phe Phe Phe Ile Gly Tyr Pro Asn Arg Ala Thr Cys Ile Leu Gln Gln Ile Thr Phe Gly Ile Phe Phe Thr Val Ala Ile Ser Thr Val Leu Ala Lys Thr Ile Thr Val Val Leu Ala Phe Lys Val Thr Asp Pro Gly Arg Gln Leu Arg Ile Phe Leu Val Ser Gly Thr Pro Asn Tyr Ile Ile Pro Ile Cys Ser Leu Leu Gln Cys Ile Leu Cys Ala Ile Trp Leu Ala Val Ser Pro Pro Phe Val Asp Ile Asp Glu His Ser Glu His Gly His Ile Ile Ile Val Cys Asn Lys Gly Ser Ile Thr Ala Phe Tyr Cys Val Leu Gly Tyr Leu Ala Cys Leu Ala Phe Gly Ser Phe Thr Ile Ala Phe Leu Ala Lys Asn Leu Pro Asp Thr Phe Asn Glu Ala Lys Phe Leu Thr Phe Ser Met Leu Val Phe Cys Ala Val Trp Val Thr Phe Leu Pro Val Tyr His Ser Thr Lys Gly Lys Val Met Val Ala Val Glu Ile Phe Ser Ile Leu Ala Ser Ser Ala Gly Met Leu Gly Cys Ile Phe Ala Pro Lys Val Tyr Ile Ile Leu Met Arg Pro Asp Arg Asn Ser Ile His Lys Ile Arg Glu Lys Ser Tyr Phe
(2) INFORMATION FOR SEQ ID NO:49:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 3012 base pairs (B) TYPE: nucleic acid (C) STRANDEDNESS: single (D) TOPOLOGY: linear (ii) MOLECULE TYPE: cDNA
(ix) FEATURE:
(A) NAME/KEY: Coding Sequence (B) LOCATION: 3...2087 (D) OTHER INFORMATION: GoVN13B
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:49:
Val Tyr Leu Ser Pro His Phe Leu Gln Leu Ser Tyr Gly Pro Phe Tyr Ser Ile Phe Ser Asp Asn Glu Gln Tyr Pro Tyr Leu Tyr Gln Met Gly Pro Lys Asp Ser Ser Leu Ala Leu Ala Met Val Ser Phe Ile Ile AAG GAT CAA
Tyr Phe Trp Asn Trp Val Gly Leu Phe Ile Ser Asp Asp Lys Asp Gln CAA AAG GAT
Gly Asn Phe Leu Ser Glu Leu Lys Lys Glu Ser Gln Thr Gln Lys Asp ATT TGC GCC TTT GTG AAC ATG ATA TCA GTC AGT GAT GTT 287 TTT TCA TAC
Ile Cys Ala Phe Val Asn Met Ile Ser Val Ser Asp Val Phe Ser Tyr AAA TCC ACA
Tyr His Thr Glu Met Tyr Tyr Asn Gln Ile Val Met Ser Lys Ser Thr ATT TTG AGC
Lys Val Ile Ile Tyr Gly Glu Thr Asn Ser Ile Ile Glu Ile Leu Ser ATG ACC ACA
Phe Arg Trp Ser Ser Pro Val Lys Gln Arg Ile Trp Val Met Thr Thr TTT GGC ACA
Lys Gln Asp Cys Pro Thr Ser Lys Arg Asp Leu Thr His Phe Gly Thr GGG TCT GGC
Phe Tyr Thr Leu Thr Phe Leu His His Tyr Gly Glu Ile Gly Ser Gly AAT GAT TTA
Phe Lys Phe Val Gln Thr Arg Tyr Asn Leu Arg Ser Thr Asn Asp Leu GTA TCA GCA
Tyr Leu Met Pro Glu Trp Lys Tyr Phe Asn Tyr Glu Ala Val Ser Ala TGT CTG GAA
Ser Asn Lys Ile Leu Arg Asn Tyr Leu Ser Asn Ile Ser Cys Leu Glu ATG AGT CAC
Trp Leu Glu Gln Lys Phe Asp Met Ser Phe Ser Asp Tyr Met Ser His TAC GAG AAA
Asn Ile Asn Ala Val Tyr Ala Ile Ala His Ala Leu His Tyr Glu Lys CAA GGA GAA
Asp Leu Glu Phe Glu Asn Gln Ala Ile Asn Asn Ala Lys Gln Gly Glu CAC CAC TTC
Asn Thr Cys Leu Lys Leu Asn Ser Phe Leu Arg Lys Thr His His Phe TCT GTA GTG
Thr Asn Leu Gly Asn Arg Val Ile Met Lys Gln Arg Glu Ser Val Val GAC CGC CTT
HisGly Tyr IleValHis MetTrpAsn Phe Gln Leu Asp Asn Ser Arg TTT
GlyIleLysVal LysIleGlyGln PheSerPro His ProGlnGly Phe GCT
GlnGlnLeuHis LeuTyrValAsp MetThrGlu Leu ThrGlySer Ala CAT
ArgLysMetPro SerSerValCys SerAlaAsp Cys ProGlyPhe His TTT
ArgArgIleTrp LysGluGluMet AlaAlaCys Cys ValCysAsn Phe ATG
ProCysProGlu AsnGluIleSer AsnGluThr Asn AspGlnCys Met AAG
AlaAsnCysPro GluTyrGlnTyr AlaAsnThr Glu AsnLysCys Lys CCC
IleGlnLysGly ValIleValLeu SerTyrGlu Asp LeuGlyMet Pro ACA
AlaLeuAlaLeu IleAlaPheCys PheSerAla Phe ValValVal Thr GTG
PheTrpValPhe ValLysHisHis AspThrPro Ile LysAlaAsn Val ATG
AsnArgIleLeu SerTyrLeuLeu IleValSer Leu PheCysPhe Met GCA
LeuCysSerPhe PhePheIleGly TyrProAsn Arg ThrCysIle Ala GCT
LeuGlnGlnIle ThrPheGlyIle PhePheThr Val IleSerThr Ala AAA
ValLeuAlaLys ThrIleThrVal ValLeuAla Phe ValThrAsp Lys ACA
ProGlyArgGln LeuArgIlePhe LeuValSer Gly ProAsnTyr Thr TGT
IleIleProIle CysSerLeuLeu GlnCysIle Leu AlaIleTrp Cys CAC
LeuAlaValSer ProProPheVal AspIleAsp Glu SerGluHis
His Gly His Ile Ile Ile Val Cys Asn Lys Gly Ser Ile Thr Ala Phe Tyr Cys Val Leu Gly Tyr Leu Ala Cys Leu Ala Phe Gly Ser Phe Thr Ile Ala Phe Leu Ala Lys Asn Leu Pro Asp Thr Phe Asn Glu Ala Lys Phe Leu Thr Phe Ser Met Leu Val Phe Cys Ala Val Trp Val Thr Phe Leu Pro Val Tyr His Ser Thr Lys Gly Lys Val Met Val Ala Val Glu Ile Phe Ser Ile Leu Ala Ser Ser Ala Gly Met Leu Gly Cys Ile Phe Ala Pro Lys Val Tyr Ile Ile Leu Met Arg Pro Asp Arg Asn Ser Ile His Lys Ile Arg Glu Lys Ser Tyr Phe CATATATCTA
ATGTACTGAT
TTCTGGAAGA
TGAGGATTTC
ACATAGAAAG
TCAACAAAGA
AATACTGTCT
TTCTTTATCT
TCCTGTGGTT
TACAAAGCAG
AGTCAGCCTA
TCCAGCCTCA
AGGTCTGGGG
AGAATGAATC
19~AAAAAAAAA
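The Coding Sequence LOCATION fields in SEQ ID NOs: 43, 45, 47, and 49 can be cross-checked against the stated lengths of the corresponding protein entries (SEQ ID NOs: 44, 46, 48, and 50, read here as 550, 793, 798, and 695 amino acids). The Python sketch below is only an illustrative consistency check; it assumes the LOCATION values are 1-based and inclusive, and the helper name `codon_count` is an assumption, not part of the listing.

```python
def codon_count(start: int, end: int) -> int:
    """Number of complete codons in a 1-based, inclusive coding span."""
    length = end - start + 1
    if length % 3 != 0:
        raise ValueError(f"span {start}...{end} is not a whole number of codons")
    return length // 3


# (label, (start, end) from the LOCATION field, stated protein length)
ENTRIES = [
    ("SEQ ID NO:43 -> NO:44", (112, 1761), 550),
    ("SEQ ID NO:45 -> NO:46", (46, 2424), 793),
    ("SEQ ID NO:47 -> NO:48", (59, 2452), 798),
    ("SEQ ID NO:49 -> NO:50", (3, 2087), 695),
]


def check(entries):
    """Return (label, codons in span, stated residues) for each entry."""
    return [(label, codon_count(s, e), stated) for label, (s, e), stated in entries]


for label, codons, stated in check(ENTRIES):
    status = "consistent" if codons == stated else "MISMATCH"
    print(f"{label}: {codons} codons vs {stated} residues ({status})")
```

Under these assumptions each span divides evenly into codons and matches the stated protein length, which suggests the listed spans count only sense codons.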
(2) INFORMATION FOR SEQ ID NO:50:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 695 amino acids (B) TYPE: amino acid (C) STRANDEDNESS: single (D) TOPOLOGY: linear (ii) MOLECULE TYPE: protein (v) FRAGMENT TYPE: internal
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:50:
Val Tyr Leu Ser Pro His Phe Leu Gln Leu Ser Tyr Gly Pro Phe Tyr Ser Ile Phe Ser Asp Asn Glu Gln Tyr Pro Tyr Leu Tyr Gln Met Gly Pro Lys Asp Ser Ser Leu Ala Leu Ala Met Val Ser Phe Ile Ile Tyr Phe Lys Trp Asn Trp Val Gly Leu Phe Ile Ser Asp Asp Asp Gln Gly Asn Gln Phe Leu Ser Glu Leu Lys Lys Glu Ser Gln Thr Lys Asp Ile Cys Phe Ala Phe Val Asn Met Ile Ser Val Ser Asp Val Ser Tyr Tyr His Lys Thr Glu Met Tyr Tyr Asn Gln Ile Val Met Ser Ser Thr Lys Val Ile Ile Ile Tyr Gly Glu Thr Asn Ser Ile Ile Glu Leu Ser Phe Arg Met Trp Ser Ser Pro Val Lys Gln Arg Ile Trp Val Thr Thr Lys Gln Phe Asp Cys Pro Thr Ser Lys Arg Asp Leu Thr His Gly Thr Phe Tyr Gly Thr Leu Thr Phe Leu His His Tyr Gly Glu Ile Ser Gly Phe Lys Asn Phe Val Gln Thr Arg Tyr Asn Leu Arg Ser Thr Asp Leu Tyr Leu Val Met Pro Glu Trp Lys Tyr Phe Asn Tyr Glu Ala Ser Ala Ser Asn Cys Lys Ile Leu Arg Asn Tyr Leu Ser Asn Ile Ser Leu Glu Trp Leu Met Glu Gln Lys Phe Asp Met Ser Phe Ser Asp Tyr Ser His Asn Ile Tyr Asn Ala Val Tyr Ala Ile Ala His Ala Leu His Glu Lys Asp Leu Gln Glu Phe Glu Asn Gln Ala Ile Asn Asn Ala Lys Gly Glu Asn Thr His Cys Leu Lys Leu Asn Ser Phe Leu Arg Lys Thr His Phe Thr Asn Ser Leu Gly Asn Arg Val Ile Met Lys Gln Arg Glu Val Val His Gly Asp Tyr Asn Ile Val His Met Trp Asn Phe Ser Gln Arg Leu Gly Ile Lys Val Lys Ile Gly Gln Phe Ser Pro His Phe Pro Gln Gly Gln Gln Leu His Leu Tyr Val Asp Met Thr Glu Leu Ala Thr Gly Ser Arg Lys Met Pro Ser Ser Val Cys Ser Ala Asp Cys His Pro Gly Phe Arg Arg Ile Trp Lys Glu Glu Met Ala Ala Cys Cys Phe Val Cys Asn Pro Cys Pro Glu Asn Glu Ile Ser Asn Glu Thr Asn Met Asp Gln Cys Ala Asn Cys Pro Glu Tyr Gln Tyr Ala Asn Thr Glu Lys Asn Lys Cys Ile Gln Lys Gly Val Ile Val Leu Ser Tyr Glu Asp Pro Leu Gly Met Ala Leu Ala Leu Ile Ala Phe Cys Phe Ser Ala Phe Thr Val Val Val Phe Trp Val Phe Val Lys His His Asp Thr Pro Ile Val Lys Ala Asn Asn Arg Ile Leu Ser Tyr Leu Leu Ile Val Ser Leu Met Phe Cys Phe Leu Cys Ser Phe Phe Phe Ile Gly Tyr Pro Asn Arg Ala Thr Cys Ile Leu Gln Gln Ile Thr 
Phe Gly Ile Phe Phe Thr Val Ala Ile Ser Thr Val Leu Ala Lys Thr Ile Thr Val Val Leu Ala Phe Lys Val Thr Asp Pro Gly Arg Gln Leu Arg Ile Phe Leu Val Ser Gly Thr Pro Asn Tyr Ile
Ile Pro Ile Cys Ser Leu Leu Gln Cys Ile Leu Cys Ala Ile Trp Leu Ala Val Ser Pro Pro Phe Val Asp Ile Asp Glu His Ser Glu His Gly His Ile Ile Ile Val Cys Asn Lys Gly Ser Ile Thr Ala Phe Tyr Cys Val Leu Gly Tyr Leu Ala Cys Leu Ala Phe Gly Ser Phe Thr Ile Ala Phe Leu Ala Lys Asn Leu Pro Asp Thr Phe Asn Glu Ala Lys Phe Leu Thr Phe Ser Met Leu Val Phe Cys Ala Val Trp Val Thr Phe Leu Pro Val Tyr His Ser Thr Lys Gly Lys Val Met Val Ala Val Glu Ile Phe Ser Ile Leu Ala Ser Ser Ala Gly Met Leu Gly Cys Ile Phe Ala Pro Lys Val Tyr Ile Ile Leu Met Arg Pro Asp Arg Asn Ser Ile His Lys Ile Arg Glu Lys Ser Tyr Phe
(2) INFORMATION FOR SEQ ID NO:51:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 435 base pairs (B) TYPE: nucleic acid (C) STRANDEDNESS: single (D) TOPOLOGY: linear (ii) MOLECULE TYPE: cDNA
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:51:
(2) INFORMATION FOR SEQ ID NO:52:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 145 amino acids (B) TYPE: amino acid (C) STRANDEDNESS: single (D) TOPOLOGY: linear (ii) MOLECULE TYPE: peptide
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:52:
Gln Thr Leu Ser Tyr Thr Leu Leu Val Ser Leu Thr Leu Cys Phe Leu Ser Ser Ser Leu Phe Ile Gly Arg Pro Ser Pro Ala Thr Cys Leu Leu Ser Gln Thr Thr Phe Ala Ala Val Phe Thr Val Ala Val Phe Phe Cys Arg Ala Phe Gln Ala Ile Arg Pro Glu Ser Arg Ile Arg Lys Trp Met Gly Pro Gln Lys Thr Asn Ser Val Val Phe Leu Cys Ser Phe Thr Gln Val Thr Leu Cys Gly Ile Trp Leu Gly Thr Glu Pro Pro Phe Val Asn Lys Asp Pro Gln Phe Met Pro Gly Tyr Ile Ile Ile Gln Cys Asn Glu Gly Ser Val Thr Ala Phe Tyr Ser Val Leu Gly Tyr Leu Gly Phe Leu Val Leu Gly Ser Leu Ala Val Ala Phe Leu Ala Arg Asn Leu Pro Asp Ala
(2) INFORMATION FOR SEQ ID NO:53:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 474 base pairs (B) TYPE: nucleic acid (C) STRANDEDNESS: single (D) TOPOLOGY: linear (ii) MOLECULE TYPE: cDNA
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:53:
(2) INFORMATION FOR SEQ ID NO:54:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 338 base pairs (B) TYPE: nucleic acid (C) STRANDEDNESS: single (D) TOPOLOGY: linear (ii) MOLECULE TYPE: cDNA
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:54:
(2) INFORMATION FOR SEQ ID NO:55:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 182 base pairs (B) TYPE: nucleic acid (C) STRANDEDNESS: single (D) TOPOLOGY: linear
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:55:
(2) INFORMATION FOR SEQ ID NO:56:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 37 base pairs (B) TYPE: nucleic acid (C) STRANDEDNESS: single (D) TOPOLOGY: linear
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:56:
GACAAAATAT GAATTCT
(2) INFORMATION FOR SEQ ID NO:57:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 37 base pairs (B) TYPE: nucleic acid (C) STRANDEDNESS: single (D) TOPOLOGY: linear
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:57:
GTACTCTTCA GAATTCT
(2) INFORMATION FOR SEQ ID NO:58:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 51 amino acids (B) TYPE: amino acid (C) STRANDEDNESS: single (D) TOPOLOGY: linear (ii) MOLECULE TYPE: peptide (v) FRAGMENT TYPE: internal
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:58:
Asn Met Asp Gln Cys Ala Asn Cys Pro Glu Tyr Gln Tyr Ala Asn Thr Glu Lys Asn Lys Cys Ile Gln Lys Gly Val Ile Val Leu Ser Tyr Glu Asp Pro Leu Gly Met Ala Leu Ala Leu Ile Ala Phe Cys Phe Ser Ala Phe Thr Val
(2) INFORMATION FOR SEQ ID NO:59:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 1079 amino acids (B) TYPE: amino acid (C) STRANDEDNESS: single (D) TOPOLOGY: linear (ii) MOLECULE TYPE: protein
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:59:
Met Ala Ser Tyr Ser Cys Cys Leu Ala Leu Leu Ala Leu Ala Trp His Ser Ser Ala Tyr Gly Pro Asp Gln Arg Ala Gln Lys Lys Gly Asp Ile
Ile Leu Gly Gly Leu Phe Pro Ile His Phe Gly Val Ala Ala Lys Asp Gln Asp Leu Lys Ser Arg Pro Glu Ser Val Glu Cys Ile Arg Tyr Asn Phe Arg Gly Phe Arg Trp Leu Gln Ala Met Ile Phe Ala Ile Glu Glu Ile Asn Ser Ser Pro Ser Leu Leu Pro Asn Met Thr Leu Gly Tyr Arg Ile Phe Asp Thr Cys Asn Thr Val Ser Lys Ala Leu Glu Ala Thr Leu Ser Phe Val Ala Gln Asn Lys Ile Asp Ser Leu Asn Leu Asp Glu Phe Cys Asn Cys Ser Glu His Ile Pro Ser Thr Ile Ala Val Val Gly Ala Thr Gly Ser Gly Val Ser Thr Ala Val Ala Asn Leu Leu Gly Leu Phe Tyr Ile Pro Gln Val Ser Tyr Ala Ser Ser Ser Arg Leu Leu Ser Asn Lys Asn Gln Tyr Lys Ser Phe Leu Arg Thr Ile Pro Asn Asp Glu His Gln Ala Thr Ala Met Ala Asp Ile Ile Glu Tyr Phe Arg Trp Asn Trp Val Gly Thr Ile Ala Ala Asp Asp Asp Tyr Gly Arg Pro Gly Ile Glu Lys Phe Arg Glu Glu Ala Glu Glu Arg Asp Ile Cys Ile Asp Phe Ser Glu Leu Ile Ser Gln Tyr Ser Asp Glu Glu Glu Ile Gln Gln Val Val Glu Val Ile Gln Asn Ser Thr Ala Lys Val Ile Val Val Phe Ser Ser Gly Pro Asp Leu Glu Pro Leu Ile Lys Glu Ile Val Arg Arg Asn Ile Thr Gly Arg Ile Trp Leu Ala Ser Glu Ala Trp Ala Ser Ser Ser Leu Ile Ala Met Pro Glu Tyr Phe His Val Val Gly Gly Thr Ile Gly Phe Gly Leu Lys Ala Gly Gln Ile Pro Gly Phe Arg Glu Phe Leu Gln Lys Val His Pro Arg Lys Ser Val His Asn Gly Phe Ala Lys Glu Phe Trp Glu Glu Thr Phe Asn Cys His Leu Gln Glu Gly Ala Lys Gly Pro Leu Pro Val Asp Thr Phe Val Arg Ser His Glu Glu Gly Gly Asn Arg Leu Leu Asn Ser Ser Thr Ala Phe Arg Pro Leu Cys Thr Gly Asp Glu Asn Ile Asn Ser Val Glu Thr Pro Tyr Met Asp Tyr Glu His Leu Arg Ile Ser Tyr Asn Val Tyr Leu Ala Val Tyr Ser Ile Ala His Ala Leu Gln Asp Ile Tyr Thr Cys Leu Pro Gly Arg Gly Leu Phe Thr Asn Gly Ser Cys Ala Asp Ile Lys Lys Val Glu Ala Trp Gln Val Leu Lys His Leu Arg His Leu Asn Phe Thr Asn Asn Met Gly Glu Gln Val Thr Phe Asp Glu Cys Gly Asp Leu Val Gly Asn Tyr Ser Ile Ile Asn Trp His Leu Ser Pro Glu Asp Gly Ser Ile Val Phe Lys Glu Val Gly Tyr Tyr Asn Val Tyr Ala Lys Lys Gly Glu Arg Leu Phe Ile Asn Glu Glu Lys Ile Leu Trp Ser Gly
Phe Ser Arg Glu Val Pro Phe Ser Asn Cys Ser Arg Asp Cys Gln Ala Gly Thr Arg Lys Gly Ile Ile Glu Gly Glu Pro Thr Cys Cys Phe Glu Cys Val Glu Cys Pro Asp Gly Glu Tyr Ser Gly Glu Thr Asp Ala Ser Ala Cys Asp Lys Cys Pro Asp Asp Phe Trp Ser Asn Glu Asn His Thr Ser Cys Ile Ala Lys Glu Ile Glu Phe Leu Ala Trp Thr Glu Pro Phe Gly Ile Ala Leu Thr Leu Phe Ala Val Leu Gly Ile Phe Leu Thr Ala Phe Val Leu Gly Val Phe Ile Lys Phe Arg Asn Thr Pro Ile Val Lys Ala Thr Asn Arg Glu Leu Ser Tyr Leu Leu Leu Phe Ser Leu Leu Cys Cys Phe Ser Ser Ser Leu Phe Phe Ile Gly Glu Pro Gln Asp Trp Thr Cys Arg Leu Arg Gln Pro Ala Phe Gly Ile Ser Phe Val Leu Cys Ile Ser Cys Ile Leu Val Lys Thr Asn Arg Val Leu Leu Val Phe Glu Ala Lys Ile Pro Thr Ser Phe His Arg Lys Trp Trp Gly Leu Asn Leu Gln Phe Leu Leu Val Phe Leu Cys Thr Phe Met Gln Ile Leu Ile Cys Ile Ile Trp Leu Tyr Thr Ala Pro Pro Ser Ser Tyr Arg Asn His Glu Leu Glu Asp Glu Ile Ile Phe Ile Thr Cys His Glu Gly Ser Leu Met Ala Leu Gly Ser Leu Ile Gly Tyr Thr Cys Leu Leu Ala Ala Ile Cys Phe Phe Phe Ala Phe Lys Ser Arg Lys Leu Pro Glu Asn Phe Asn Glu Ala Lys Phe Ile Thr Phe Ser Met Leu Ile Phe Phe Ile Val Trp Ile Ser Phe Ile Pro Ala Tyr Ala Ser Thr Tyr Gly Lys Phe Val Ser Ala Val Glu Val Ile Ala Ile Leu Ala Ala Ser Phe Gly Leu Leu Ala Cys Ile Phe Phe Asn Lys Val Tyr Ile Ile Leu Phe Lys Pro Ser Arg Asn Thr Ile Glu Glu Val Arg Ser Ser Thr Ala Ala His Ala Phe Lys Val Ala Ala Arg Ala Thr Leu Arg Arg Pro Asn Ile Ser Arg Lys Arg Ser Ser Ser Leu Gly Gly Ser Thr Gly Ser Ile Pro Ser Ser Ser Ile Ser Ser Lys Ser Asn Ser Glu Asp Arg Phe Pro Gln Pro Glu Arg Gln Lys Gln Gln Gln Pro Leu Ser Leu Thr Gln Gln Glu Gln Gln Gln Gln Pro Leu Thr Leu His Pro Gln Gln Gln Gln Gln Pro Gln Gln Pro Arg Cys Lys Gln Lys Val Ile Phe Gly Ser Gly Thr Val Thr Phe Ser Leu Ser Phe Asp Glu Pro Gln Lys Asn Ala Met Ala His Arg Asn Ser Met Arg Gln Asn Ser Leu Glu Ala Gln Arg Ser Asn Asp Thr Leu Gly Arg His Gln Ala Leu Leu Pro Leu Gln Cys Ala Asp Ala Asp Ser Glu
Met Thr Ile Gln Glu Thr Gly Leu Gln Gly Pro Met Val Gly Asp His Gln Pro Glu Met Glu Ser Ser Asp Glu Met Ser Pro Ala Leu Val Met Ser Thr Ser Arg Ser Phe Val Ile Ser Gly Gly Gly Ser Ser Val
Thr Glu Asn Val Leu His Ser
(2) INFORMATION FOR SEQ ID NO:60:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 26 base pairs (B) TYPE: nucleic acid (C) STRANDEDNESS: single (D) TOPOLOGY: linear (ix) FEATURE:
(A) NAME/KEY: Modified Base (B) LOCATION: 3...3 (D) OTHER INFORMATION: Inosine (A) NAME/KEY: Modified Base (B) LOCATION: 12...12 (D) OTHER INFORMATION: Inosine (A) NAME/KEY: Modified Base (B) LOCATION: 15...15 (D) OTHER INFORMATION: Inosine (A) NAME/KEY: Modified Base (B) LOCATION: 18...18 (D) OTHER INFORMATION: Inosine (xi) SEQUENCE DESCRIPTION: SEQ ID N0:60:
(2) INFORMATION FOR SEQ ID N0:61:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 26 base pairs (B) TYPE: nucleic acid (C) STRANDEDNESS: single (D) TOPOLOGY: linear (ix) FEATURE:
(A) NAME/KEY: Modified Base (B) LOCATION: 6...6 (D) OTHER INFORMATION: Inosine (A) NAME/KEY: Modified Base (B) LOCATION: 9...9 (D) OTHER INFORMATION: Inosine (A) NAME/KEY: Modified Base (B) LOCATION: 12...12 (D) OTHER INFORMATION: Inosine (A) NAME/KEY: Modified Base (B) LOCATION: 18...18 (D) OTHER INFORMATION: Inosine (A) NAME/KEY: Modified Base (B) LOCATION: 21...21 (D) OTHER INFORMATION: Inosine (xi) SEQUENCE DESCRIPTION: SEQ ID NO:61:
(2) INFORMATION FOR SEQ ID N0:62:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 26 base pairs (B) TYPE: nucleic acid (C) STRANDEDNESS: single (D) TOPOLOGY: linear (ix) FEATURE:
(A) NAME/KEY: Modified Base (B) LOCATION: 3...3 (D) OTHER INFORMATION: Inosine (A) NAME/KEY: Modified Base (B) LOCATION: 9...9 (D) OTHER INFORMATION: Inosine (A) NAME/KEY: Modified Base (B) LOCATION: 12...12 (D) OTHER INFORMATION: Inosine (A) NAME/KEY: Modified Base (B) LOCATION: 13...13 (D) OTHER INFORMATION: Inosine (A) NAME/KEY: Modified Base (B) LOCATION: 24...24 (D) OTHER INFORMATION: Inosine (xi) SEQUENCE DESCRIPTION: SEQ ID NO:62:
(2) INFORMATION FOR SEQ ID N0:63:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 31 base pairs (B) TYPE: nucleic acid (C) STRANDEDNESS: single (D) TOPOLOGY: linear (ix) FEATURE:
(A) NAME/KEY: Modified Base (B) LOCATION: 2...2 (D) OTHER INFORMATION: Inosine (A) NAME/KEY: Modified Base (B) LOCATION: 5...5 (D) OTHER INFORMATION: Inosine (A) NAME/KEY: Modified Base (B) LOCATION: 8...8 (D) OTHER INFORMATION: Inosine (A) NAME/KEY: Modified Base (B) LOCATION: 11...11 (D) OTHER INFORMATION: Inosine (A) NAME/KEY: Modified Base (B) LOCATION: 14...14 (D) OTHER INFORMATION: Inosine (A) NAME/KEY: Modified Base (B) LOCATION: 20...20 (D) OTHER INFORMATION: Inosine (A) NAME/KEY: Modified Base (B) LOCATION: 26...26 (D) OTHER INFORMATION: Inosine (A) NAME/KEY: Modified Base (B) LOCATION: 29...29 (D) OTHER INFORMATION: Inosine (xi) SEQUENCE DESCRIPTION: SEQ ID N0:63:
(2) INFORMATION FOR SEQ ID N0:64:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 26 base pairs (B) TYPE: nucleic acid (C) STRANDEDNESS: single (D) TOPOLOGY: linear (ix) FEATURE:
(A) NAME/KEY: Modified Base (B) LOCATION: 3...3 (D) OTHER INFORMATION: Inosine (A) NAME/KEY: Modified Base (B) LOCATION: 6...6 (D) OTHER INFORMATION: Inosine (A) NAME/KEY: Modified Base (B) LOCATION: 9...9 (D) OTHER INFORMATION: Inosine (A) NAME/KEY: Modified Base (B) LOCATION: 12...12 (D) OTHER INFORMATION: Inosine (A) NAME/KEY: Modified Base (B) LOCATION: 16...16 (D) OTHER INFORMATION: Inosine (A) NAME/KEY: Modified Base (B) LOCATION: 24...24 (D) OTHER INFORMATION: Inosine (xi) SEQUENCE DESCRIPTION: SEQ ID NO:64:
A~SNYTNR TNTTYNGYTT YYTNTG 26
(2) INFORMATION FOR SEQ ID NO:65:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 28 base pairs (B) TYPE: nucleic acid (C) STRANDEDNESS: single (D) TOPOLOGY: linear (ix) FEATURE:
(A) NAME/KEY: Modified Base (B) LOCATION: 2...2 (D) OTHER INFORMATION: Inosine (A) NAME/KEY: Modified Base (B) LOCATION: 5...5 (D) OTHER INFORMATION: Inosine (A) NAME/KEY: Modified Base (B) LOCATION: 11...11 (D) OTHER INFORMATION: Inosine (A) NAME/KEY: Modified Base (B) LOCATION: 17...17 (D) OTHER INFORMATION: Inosine (A) NAME/KEY: Modified Base (B) LOCATION: 20...20 (D) OTHER INFORMATION: Inosine (A) NAME/KEY: Modified Base (B) LOCATION: 23...23 (D) OTHER INFORMATION: Inosine (xi) SEQUENCE DESCRIPTION: SEQ ID NO:65:
RNATNSWRAA NAYYTCNACN RCNACCAT 28
(2) INFORMATION FOR SEQ ID NO:66:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 26 base pairs (B) TYPE: nucleic acid (C) STRANDEDNESS: single (D) TOPOLOGY: linear (ix) FEATURE:
(A) NAME/KEY: Modified Base (B) LOCATION: 6...6 (D) OTHER INFORMATION: Inosine (A) NAME/KEY: Modified Base (B) LOCATION: 9...9 (D) OTHER INFORMATION: Inosine (A) NAME/KEY: Modified Base (B) LOCATION: 12...12 (D) OTHER INFORMATION: Inosine (A) NAME/KEY: Modified Base (B) LOCATION: 15...15 (D) OTHER INFORMATION: Inosine (A) NAME/KEY: Modified Base (B) LOCATION: 21...21 (D) OTHER INFORMATION: Inosine (xi) SEQUENCE DESCRIPTION: SEQ ID N0:66:
(2) INFORMATION FOR SEQ ID N0:67:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 26 base pairs (B) TYPE: nucleic acid (C) STRANDEDNESS: single (D) TOPOLOGY: linear (ix) FEATURE:
(A) NAME/KEY: Modified Base (B) LOCATION: 3...3 (D) OTHER INFORMATION: Inosine (A) NAME/KEY: Modified Base (B) LOCATION: 6...6 (D) OTHER INFORMATION: Inosine (A) NAME/KEY: Modified Base (B) LOCATION: 12...12 (D) OTHER INFORMATION: Inosine (A) NAME/KEY: Modified Base (B) LOCATION: 15...15 (D) OTHER INFORMATION: Inosine (A) NAME/KEY: Modified Base (B) LOCATION: 24...24 (D) OTHER INFORMATION: Inosine (xi) SEQUENCE DESCRIPTION: SEQ ID N0:67:
(2) INFORMATION FOR SEQ ID N0:68:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 2550 base pairs (B) TYPE: nucleic acid (C) STRANDEDNESS: single (D) TOPOLOGY: linear (ii) MOLECULE TYPE: cDNA
(xi) SEQUENCE DESCRIPTION: SEQ ID N0:68:
TGAAGTTTTC
ATAGTGAAGA
ATGAACCTAT
AATATGAGTT
TTTTACCCAA
GAGTTATGGA
GTTATTTAGA
AACTGGCAAT
GCGACCATGA
ATGGCATGGT
ATGATGACCA
TCTGTTTAGC
CAATATATGA
TGAACTCTAC
GGATCACAAC
TCCATGGGAT
TGCAAACAAT
ATTATTTTAA
ACACCTTGGA
ATTTGTACAA
TAGAGTCTCA
CCTTGATGAA
GGGAAAATCA
GATTAAAAGT
TATCTGATGA
GTAGTGTGGC
GCTTTGATTG
GTGTGAGGTG
CTGTATCATT
CCTTCTCAGC
CTGTGAAGGC
TTCTCTGCTC
CCACATTTGG
TGGTCATGGC
GGGCACCTAA
GGTTGGTCAC
TCATTCTTTG
CCTTGGCTCT
ATGAAGCCAA
TCCCTGTCTA
TGGCTTCTAG
TTAGACCAGA
(2) INFORMATION FOR SEQ ID N0:69:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 2424 base pairs (B) TYPE: nucleic acid (C) STRANDEDNESS: single (D) TOPOLOGY: linear (ii) MOLECULE TYPE: cDNA
(xi) SEQUENCE DESCRIPTION: SEQ ID N0:69:
(2) INFORMATION FOR SEQ ID N0:70:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 2409 base pairs (B) TYPE: nucleic acid (C) STRANDEDNESS: single (D) TOPOLOGY: linear (ii) MOLECULE TYPE: cDNA
(xi) SEQUENCE DESCRIPTION: SEQ ID N0:70:
TTCACTTTTA
TGCTACTGAT
CATCATTGGT
AAATGGACAT
TCTTACAGGA
GGTTTTCTTT
TCAAGTAGCC
TAGATGGACT
AGATTTAAGA
AGAAAACATG
TTTAGCAAAA
AAGATGGGAA
CACAAATAAA
CAGATTTGAG
AGTAGATATT
CAGCAGTAAA
CTATGATATG
CCACACCTAC
AAGATTTTTC
CCCTGTTGGA
TTTCCTCATT
ACCTTGTTTT
AGGAACATCA
AATTCATCAG
GGTTTCCAAT
CATAGAGAAA
GGGGATAGCT
CACATTTTTG
CATCCTGCTC
AAACCAGGTC
TTCTACAGTG
AAGAAGAATG
CCTAATCCAA
AGATATACAA
CTTCCATGTT
CTTGGCTAGG
GGTGTTCTGC
CATGGTGGTT
CTTTGTCCCA
CAAAGATAAA
(2) INFORMATION FOR SEQ ID N0:71:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 2556 base pairs (B) TYPE: nucleic acid (C) STRANDEDNESS: single (D) TOPOLOGY: linear (ii) MOLECULE TYPE: cDNA
(xi) SEQUENCE DESCRIPTION: SEQ ID N0:71:
CACTTCTCAT
TAACGGATGA
TTGAAAAAGA
ATGCTTTGGC
ATATGTCTTT
CACCATATTT
AGAGTATGTG
AGTACCTGGA
CCATCTTCAG
CTCTAGCATT
TCATCCCAGA
ACAAAGAAAT
AAAAAACTGA
ATGGAGAAAC
AGAGAATATG
ATGACACATT
AAAATTTTGT
AGTGGAAATA
CATCTGATGC
ATAGTCATAA
TGCAACAGGC
AGGTAAACTC
TGAAGCAAAG
AACACCTTGG
ACTCTCACTT
CTGTGTGCAG
CCTGCTGTTT
ATCAATGCGT
AGAAAGGTGT
CCTTCTGCTT
CTCCTATTGT
TCTGTTTTCT
AGCAAATCAC
TCACTGTGGT
TATCAGGGAC
CAATCTGGCT
ACATCATCAT
TGGCCTGCCT
CATTCAATGA
CCTTCCTCCC
CTATCTTGGC
TTTTAATGAG
(2) INFORMATION FOR SEQ ID N0:72:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 2169 base pairs (B) TYPE: nucleic acid (C) STRANDEDNESS: single (D) TOPOLOGY: linear (ii) MOLECULE TYPE: cDNA
(xi) SEQUENCE DESCRIPTION: SEQ ID N0:72:
GGATGAATCT
GCTTTCCTAT
TCAGATGGCC
GAAATGGAAT
AGAGTTGAAG
TGTTGATGAA
ATTAACAAAT
AATGTGGGAA
TACCAGTAAG
CCATGGTGAG
AGATTTATAT
TTGTAAAATA
GCTTGACATG
CCATGCCCTC
AGGAGCCAGT
TCCTCTTGGG
TGTTCACTTT
CCCATATTTA
AGGAAGAAGA
ATTATGGAAG
AATTTCTAAT
CACAGAACAG
GGGGATGGCA
TGTCTTTGTG
TCTATTACTC
AAACAAAGTC
TTCCACAGTT
AAGAAGATTG
CCTACTCCAA
TGATGAACAC
ATTCTACTGT
CTTGGCCAAG
AGTGTTCTGC
CATGGTTGCT
TTTTGTACCC
CAGGGAAAAA
(2) INFORMATION FOR SEQ ID N0:73:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 1889 base pairs (B) TYPE: nucleic acid (C) STRANDEDNESS: single (D) TOPOLOGY: linear (ii) MOLECULE TYPE: cDNA
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:73:
GCCATTGTTT
GACCACAAAG
TGTACGGCTT
GAAAATATGG
CTAATGCGAA
CCCCATATTA
TTTAAGCACA
AAAAAATACC
TTTTCTGATA
TTACCTAGTC
GTGTACGCTG
TGTGAAAATG
ATTGAGGTGA
CTTAACCTCT
GCAAATGCTC
ATATTTTCAG
GTAACCCTGG
ATTTCTAATG
ACAGAGAAGA
GGGATGGCTC
ATATTTGTGA
ACTTTGCTCA
AACACAGTTG
GCCACTGTGT
AGAATGGTAA
CTGATCCAAC
GATGCTCATA
TTCCACTCTG
TTGTCAAGAA
GTATTCTTCT
ATGGTCGCCG
(2) INFORMATION FOR SEQ ID N0:74:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 1889 base pairs (B) TYPE: nucleic acid (C) STRANDEDNESS: single (D) TOPOLOGY: linear (ii) MOLECULE TYPE: cDNA
(xi) SEQUENCE DESCRIPTION: SEQ ID N0:74:
ATGGCGACGA AGGACACATC
TCTTTCACTT GCCATTGTTT
(2) INFORMATION FOR SEQ ID N0:75:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 270 base pairs (B) TYPE: nucleic acid (C) STRANDEDNESS: single (D) TOPOLOGY: linear (ii) MOLECULE TYPE: cDNA
(xi) SEQUENCE DESCRIPTION: SEQ ID N0:75:
(2) INFORMATION FOR SEQ ID N0:76:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 1308 base pairs (B) TYPE: nucleic acid (C) STRANDEDNESS: single (D) TOPOLOGY: linear (ii) MOLECULE TYPE: cDNA
(xi) SEQUENCE DESCRIPTION: SEQ ID N0:76:
TCTCATCTTG
TAATGACGGA
TGAAGATAAT
TCTTCTCGTA
CATAACTTTG
CCAAGCATAT
TGATTCATGT
GCACTCTTCG
CCGGCTGCCC
CTCCTTGATG
TTTCACTTTA GATGGACTTG.GATAGGAATG GTCATCTCAG ATGATGACCA 660 GGi3TATTCAG
TTTTGTTAAT
TCAACAAATT
TCTAGAAGTA
CTCACAATGG
TATCACTTTT
GAACACTGCC
TTGTTCAATA
ATGGACATCA
TGCTGTTTAT
GAAAAAGGCA
(2) INFORMATION FOR SEQ ID N0:77:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 1296 base pairs (B) TYPE: nucleic acid (C) STRANDEDNESS: single (D) TOPOLOGY: linear (ii) MOLECULE TYPE: cDNA
(xi) SEQUENCE DESCRIPTION: SEQ ID N0:77:
TCTCATCTTG
TAGTGATGGA
TGAAGATAAT
TCTTCTGGTA
CATAACTTTG
TCAATCATAT
TGATTCATGT
AACATAGGCC TTACAGGACC ATCATGGAAA AAATCCiTAA AACTGGCAAT 480 GGATTCTTCA
CCGGCTGCCC
CTCCTTGATG
GGGTATTCAG
TTTTGTTAAT
TAAACAAATT
TCTAGAAGTA
CTCACAATGG
TATCACTTTT
GAACACTGCC
TTGTTCAATC
TCTAAGAACA GCAGTAAAAT GGATCTT'!'TT ACATCCAACA ACACATTGGA1140 ATGGACAGCA
CTGCACAACT ATGATATGGC CATGAGTGAT GAAiGGTTACA ATTTGTATAA1200 TGCTGTTTAT
GAAAAAGGTA
GAACACAACA GATATTTCAC TGTTTGTCAG CAGATA 1296
(2) INFORMATION FOR SEQ ID NO:78:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 1521 base pairs (B) TYPE: nucleic acid (C) STRANDEDNESS: single (D) TOPOLOGY: linear (ii) MOLECULE TYPE: cDNA
(xi) SEQUENCE DESCRIPTION: SEQ ID N0:78:
TCTCATCTTG
TAGTGATGGA
TGAAGATAAT
TCTTCTGGTA
CATAACTTTG
TCAATCATAT
TGATTCATGT
GGATTCTTCA
CCGGCTGCCC
CTCCTTGATG
GGGTATTCAG
TTTTGTTAAT
TAAACAAATT
TCTAGAAGTA
CTCACAATGG
TATCACTTTT
GAACACTGCC
TTGTTCAATC
ATGGACAGCA
TGCTGTTTAT
GAAAAAGGTA
AACCAGGGTA
GTGTACAGAG
GAAAATAGGA
TTTGGAATGG
(2) INFORMATION FOR SEQ ID N0:79:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 933 base pairs (B) TYPE: nucleic acid (C) STRANDEDNESS: single (D) TOPOLOGY: linear (ii) MOLECULE TYPE: cDNA
(xi) SEQUENCE DESCRIPTION: SEQ ID N0:79:
TCTCATCTTG
TAATGATGGA
TGAAGATAGT
TCTTCTGGTA
CATGAGTTTG
TCAAGAATAT
TGATTCATGT
GCATTCTTCA
CCGGCTGCCC
CTCCTTGATG
GGGTATTCAG
TTTTGTTAAT
TACACAAATT
TCTAGAAGCA
(2) INFORMATION FOR SEQ ID N0:80:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 1236 base pairs (B) TYPE: nucleic acid (C) STRANDEDNESS: single (D) TOPOLOGY: linear (ii) MOLECULE TYPE: cDNA
(xi) SEQUENCE DESCRIPTION: SEQ ID N0:80:
GCAAAGGGAA
TAATTCACTT
TTTTGCTACT
CTCCATCATT
AATAAATGGA
AGGTCTTACA
ACTGGTTTTC
CCATCAAGTA
TTTTAGATGG
CTCAGATTTA
CCCAGAAAAC
GTCTTTAGCA
TAGAAGATGG
CATCACAAAT
CCGCAGATTT
CCCAGTAGAT
GAACAGCAGT
CAACTATGAT
GGCCCACACC
CAAAAGATTT
(2) INFORMATION FOR SEQ ID N0:81:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 2412 base pairs (B) TYPE: nucleic acid (C) STRANDEDNESS: single (D) TOPOLOGY: linear (ii) MOLECULE TYPE: cDNA
(xi) SEQUENCE DESCRIPTION: SEQ ID N0:81:
GGCCAATTTC
ATATTTGGGA
TTATTTCAAC
ATTGGTGTTT
GATTATAAGA
TACACTTTGG GCCGTTGTGA Z'GGAAAAACT GTAATACCTA CACCATATTT 360 ATTTCGTAAA
TTCCTATCTG
CAGCTTCTTA
TGATGATGAA
GGCAATGGTC
TGATGACCAA
GGAAACCAAT TTCTTTTAGA GTT'GAAGAAA CAGAGTGAAA ACAAGGAAAT 720 TTGCTTTGCC
AATGTACTAC
ATACAATTTC
GATCACCACA
CTATGGATCA
(2) INFORMATION FOR SEQ ID N0:82:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 381 base pairs (B) TYPE: nucleic acid (C) STRANDEDNESS: single (D) TOPOLOGY: linear (ii) MOLECULE TYPE: cDNA
(xi) SEQUENCE DESCRIPTION: SEQ ID N0:82:
(2) INFORMATION FOR SEQ ID N0:83:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 228 base pairs (B) TYPE: nucleic acid (C) STRANDEDNESS: single (D) TOPOLOGY: linear (ii) MOLECULE TYPE: cDNA
(xi) SEQUENCE DESCRIPTION: SEQ ID N0:83:
(2) INFORMATION FOR SEQ ID N0:84:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 1644 base pairs (B) TYPE: nucleic acid (C) STRANDEDNESS: single (D) TOPOLOGY: linear (ii) MOLECULE TYPE: cDNA
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:84:
ATGTTAGAAT TGGCCCATGG CACTCTaACT TTCTCACCCC ATCATGGGGA 60 GATTTCTGAT
TTTTCTTCAC
AATCTTTGAA
GCTGGTCATG
TCTCCATGAG
TATATTATTT
TGGTGATCGT
TATTTGGAAT
TGCTCCCAAG
TACAGAGATT
CCTGGAGAGC
AAGCCTGCCT GTTaCTTTGA CTGCACTCCT TGCCCAGATA AAGAGATTTC 720 CAACGAGACA
GAAGAGTCAC
GGGACTCACA
TATAATCCAC
GCTCATCACT
AGCCACATGT
AGTGTTGGCC
GACAAGATGG
GCAAATCCTT
TCACTCTGAA
CTGTACTCTG
CAGGAATCTT
CTGCAGTGTC
GGCTATGGAA
CCCTAAGTGC
AAAAAGACAG
(2) INFORMATION FOR SEQ ID N0:85:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 2304 base pairs (B) TYPE: nucleic acid (C) STRANDEDNESS: single (D) TOPOLOGY: linear (ii) MOLECULE TYPE: cDNA
(xi) SEQUENCE DESCRIPTION: SEQ ID N0:85:
TATAAAACAT
ATTTTATAAG
TATAGGGCTG
TCCACGTTTC
ATTTCCATAT
CTTCTTACTT
CAATCAATTT
CTCTCAGAGT TG~?~AAAAAGA GACCCAAAAC AAGGAAATTT GCTTTGCCTT480 TGTTAACATG
TCAAATAGTG
ATGTCATCAA CAAATATTAT TATCATTTAT GGGAX~AACAA ACAGTATCAT600 TGAATTGAGC
AGAGTTGGAT
GACATTTCTA
CCATCTCAGA
TGCCTCAGGA
TCCAATGCCT
AGTCATAGTA
CAAGAGGTTG
GTAAACTCAT
AAACAGAGAG
CACCTTCGGA
TTTCACTTAT
GTGTGCAGTG
TGCTGTTTTA
CAATGTGTGA
AAAGACGTGA
TTCTGTTTGT
CCTATTGTGA
TGTTTTCTCT
CAAATCACAT
ACTGTCATTC
TCTGGTGCAC
ATTTGGCTAG
ATCATGATTG
GCCTGCCTGG
TTCAACGAAG
TTTCTCCCTG
ATCTTGGCAT
TTAATGAAAC
(2) INFORMATION FOR SEQ ID N0:86:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 2001 base pairs (B) TYPE: nucleic acid (C) STRANDEDNESS: single (D) TOPOLOGY: linear (ii) MOLECULE TYPE: cDNA
(xi) SEQUENCE DESCRIPTION: SEQ ID N0:86:
CCATTTCAGC
TATCTTGGAA
CATTGTTAGT
GATGTCATCA
CTTTAGACTA
TATGATCATA
ACATCACTAT
CTACAGTGAT
ATTATCTGAA
CAGGCACCAT
TGCTGTGGCT
TGATGGAAAA
ATTTATAAAC
GTATGAGATT
AACATTTTCC
GTGGAACACA
ATTCAGAAAA
AGAAAATGAA
GTATGCCAAT
AGATCCATTG
TGTACTTAGT
TCTCAGCTAT
TGGTCATCCC
TGTAGCTGCA
TAATACAAGT
AATTTGCACA
TGTTGATGCT
ATTTTCTGTA
(2) INFORMATION FOR SEQ ID N0:87:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 2598 base pairs (B) TYPE: nucleic acid (C) STRANDEDNESS: single (D) TOPOLOGY: linear (ii) MOLECULE TYPE: cDNA
(xi) SEQUENCE DESCRIPTION: SEQ ID N0:87:
CTTTCTCCTG
TATGAAGAAG
AGCAGTGCAA
CAAAAGTATC CTCTCACCTT GGCTTTI"TCC ATGAATGAAA TCAACAACAA 300 CCCTGATCTT
TTGCCAAATA TGTCTTTAGC AT't'TACATTC TCAGAATATA GTTaTTATTT360 GGAATCCCAC
TTTTATCTGT
AACTGTGACA
TTATGGACAC
GGCCTCTGAT
GATACATCTC TAGCCG'TTGC TCTCGTCTCC TTCATAATTC ATTTCAGTTG 660 GAGAAGAGAG
TATGAATTTA
AAATGTTGTT
GGACTCTCTA
TAAGAAAGAC
TTCACATTTG ATAATGGATA TGGAACTTTT GGTTTTC',GAC ACCGCCACAG1020 TGAGATTTCT
ATATTTGGTA
GTCACTGAAG
CATGGCCATT
ACTCCATGAG
ATGACTC1'TC AAAATGTTGA TAATGTTCTC CTTCCCAATT ATGAAGAACA 1320 AAATTATAAT
TGCAAGATGG TTTATTCCTT Tt'TGAGCAAG ACTCAATTCA CAAFrTCCTGT1380 TGGAGACACT
GTGAATATGA ATCAAAGAAA CAAACTGAAG GAAGAGTACG ACATTrTCTA 1440 CAATTGGAAT
TTTTCCAAAA
TATACAGATG
GAAC'aAATQGA
TAATGAGACA
GCAGAATCAC
GGCTCTTTCC
TGTGAAACAT
GCTCATCGCA
AGCTACCTGC
TGTGTTGGCC
GATGAAGTAC
TCAAATTATT
ACAGTCTGAG
CTGTGTCCTG
CAGAAACCTG
CTGCAGTGTC
GGCTGTTGAG
TCCAAAATGC
GAAGTCATCT
(2) INFORMATION FOR SEQ ID NO:88:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 2337 base pairs (B) TYPE: nucleic acid (C) STRANDEDNESS: single (D) TOPOLOGY: linear (ii) MOLECULE TYPE: cDNA
(xi) SEQUENCE DESCRIPTION: SEQ ID N0:88:
CACATCCCTG
ACTTTTTAGC
GAGCAATTCT
ACTCCTGGAT
TGAACAAAGA
TGGCATTGTA
TGATGACAAA
CTGCACAGCA
GAAAAATCAT
TGATTCTCTA
GGTACTGATC
GCATGGAGCT
GCAGAAGTAC
GTACTTCAAT
TGCCTCCCTG
CAATGTATAC
AGTACAATTT
CCCCTTTCTA
AGGGAAAGAC AACTCATCAA TCAGAATGGA GCGAATGA,AG ATCTGGATTG1140 TACCAGGAAG
TGGGCTAAAT
CATATCTTCT
CAGTGAGAGC
CTTTGACTGC
TGTGAAGTGT
TGTGACATTT
TTTCTCCTCA
TGTCAAGGCC
TCTCTGTCCC
CATTTTTGGG
GGTTATAGCC
GGCCCCTAAT
GCTGACAACC
CATCATTTGC
ACTAGCCCTA
TGAAGCCAAG
CCCTGTCTAC
GGCTTCCAGT
AAGACCAGAA
ACCTTCT
(2) INFORMATION FOR SEQ ID N0:89:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 1650 base pairs (B) TYPE: nucleic acid (C) STRANDEDNESS: single (D) TOPOLOGY: linear (ii) MOLECULE TYPE: cDNA
(xi) SEQUENCE DESCRIPTION: SEQ ID N0:89:
AACAAAAAAC
TCCTGATCTT
ACAAACTACA
TTATTTCTGT
AAT'Cv4AGAGA CTATGTGTAC ATTTCTACTT ACAGGACCGC ATTGGATAAC360 ATCTTATAGT
CACATCCCTA
CCTTGTCATC
CAAAATCAAG
TTATCATAAA
CATTTATGGG
AAAAC14,GAGA GTATTATTGA GTTGAGCTTC AGAATGTGGG AATCTCCAGT720 TATCCAGAGA
AACTCATGAC
CTTTAAAAAT
GCCAGAGTGG
CTATTCATCC
TGATGGAAGT
GAATCTGCAC
CTTTAAGATA
GATTATGAAA
TTCTCAGCAC
CAGGCACTTT
ATCCTCTGTG
GGCAGCCTGC
TATGGATCAG
CATTCAGAAA
TCATAGCCTT
(2) INFORMATION FOR SEQ ID N0:90:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 2379 base pairs (B) TYPE: nucleic acid (C) STRANDEDNESS: single (D) TOPOLOGY: linear (ii) MOLECULE TYPE: cDNA
(xi) SEQUENCE DESCRIPTION: SEQ ID N0:90:
TGATCCCAGG
AAGTTGTTAC
GACTCTGAAT
AATGGATGAA
TACATTGGGC
CACTGTGATG GAAATATCCC ACTCL'G3CTTA CTTAATCAAA TATTTTATAT360 GCCTTTTCCT
GAATTTGTGG
TCAGATTTCC
CTATCAGATG
CTTCAACTGG
CACAGAGTTG
GGCAATCAAT
GTCAACCGCA
CAGAACATGG
CCCAACAAGT
ACACCATGAT
CATGGATTTA
TATTTATTAA AGCCAGAGT'G GGGTTTCTTT GAATATGAAA CCTCAGCATC1080 TTACTGTAAA
GAAGTTTGAC
GGCCCATGCT
CAAAGGAGCA
CAATCCTTTT
CATTGTTCAC
CAGCCCATAT
CACAGGAAGT
AAAATTCTGG
TGAAATTTCT
TATTGTGAAG
CTTTCTGTGC
AATCACATTT
TGTGCTTCTG
GGGGACACCC
TTGGCTAGCA
CATAATTGTG
CTTCCTGGCC
CAATGAAGCC
CCTTCCTGTC
TTTGACATCC
AATGAAACCA
GAGAGAATTC TATCCAAP~AG ACAGGAGAAA TCACGTTTC 2379
(2) INFORMATION FOR SEQ ID NO:91:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 2394 base pairs (B) TYPE: nucleic acid (C) STRANDEDNESS: single (D) TOPOLOGY: linear (ii) MOLECULE TYPE: cDNA
(xi) SEQUENCE DESCRIPTION: SEQ ID N0:91:
GGATCCCAGA
GACTTGTTCC
GACTCTGAAT
AGTGGATGAA
CAATTTGGGT
TAATCATCTC
TACAGGACCA
TCCACATTTC
ATATCCTTAT
CTTCATAATT
CAATCAATTT
TGTGAACATG
CCAAATTGTG
TGAATTGAGC
ACAATTTGAT
TACATTTCTA
CAATCTCAGA
AGCCTCAGCA
GCTAATGGAA
TGTATATGCC
AATAAACAAT
GACCCACTTC
TGGAGACTAT
GATAGGACAA
GACTGAGTTG
TCCTGGATTC
CTGCCCTGAA
CCATGACACT
ACTCATGTTC
TATCTTACAG
CAAAACAATC
CTTTTTGGTA
TCTGTGTGCA
GCATGGCCAC
GGGATACTTG
GCCTGACACA
CTGGGTCACC
(2) INFORMATION FOR SEQ ID N0:92:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 2085 base pairs (B) TYPE: nucleic acid (C) STRANDEDNESS: single (D) TOPOLOGY: linear (ii) MOLECULE TYPE: cDNA
(xi) SEQUENCE DESCRIPTION: SEQ ID N0:92:
CATCTTCAGT
ACTAGCATTG
TATCTCAGAT
CAAGGATATT
TAAAACTGAA
TGGGGAAACA
GAGAATATGG
TGGCACATTC
AAATTTTGTA
GTGGAAATAT
ATCCAATATC
TAGTCACAAC
GCAAGAATTT
GCTAAACTCA
GAAACAGAGA
ACGCCTTGGG
GTTACACTTA
AGTGTGCAGT
GCAGATTGCC ATCCTGGATT CAGAAC,AATC TGGAA('~GAGG AAATGGCAGC1140 CTGCTGTTTT
TCAGTGTGCG
GAAAGGTGTG
ATTCTGTTTC
TCCTATTGTG
CTGTTTTCTG
GCAAATCACA
CACTGTGGTT
ATCGGGGACA
AATCTGGCTA
CATCATCATT
GGCCTGCCTG
GCCTTTGGAA GCTTCACTAT AGCTT1'CTTG GCAAAGAACC TGCCTGACAC1860 ATTCAACGAA
CTTCCTCCCT
CATCTTGGCA
TTTAATGAGA
WC C181I11:
The invention further provides representative nucleic acids and encoded polypeptides in this multigene family. The representative polypeptides are expressed in the murine and rat vomeronasal organ (VNO). Agents which bind the nucleic acids or polypeptides also are provided. The invention further relates to methods of using such nucleic acids and polypeptides in the diagnosis and/or treatment of disease, including the use of these molecules in controlling fertility and behavior in vertebrates and invertebrates.
Background of the Invention
Pheromones are intraspecific chemical signals found throughout the animal kingdom. They regulate populations of animals by inducing innate behaviors and stereotyped changes in physiology (Karlson and Luscher, Nature, 1959, 183:55-56; Wilson, Sci. Am., 1963, 208:100-114; Sorensen, Chem. Sens., 1996, 21:245-256). Pheromones can serve as cues for overcrowding, impending danger, reproductive status, gender, or dominance. In rodents, a variety of pheromone effects have been reported. These include effects on estrus and the onset of puberty as well as the induction of mating and aggressive behaviors (Singer, A.G., J. Steroid. Biochem. Molec. Biol., 1991, 39:627-632; Halpern, M., Ann. Rev. Neurosci., 1987, 10:325-362; Wysocki, C.J., et al., In the Neurobiology of Taste and Smell, 1987, 125-150; Novotny et al., Chemical signals in Vertebrates, 1990, Vol. 5, eds. D.W. Macdonald et al., Oxford University Press).
The detection of pheromones is mediated by the olfactory system. However, sensory neurons that detect pheromones are typically segregated from those that detect volatile odorants (Keverne, E.B., Trends Neurosci., 1983, 6:381-384; Halpern, M., Ann. Rev. Neurosci., 1987, 10:325-362; Wysocki, C.J., et al., In the Neurobiology of Taste and Smell, 1987, 125-150; Hildebrand, J.G., et al., Brain Res., 1997, 677:157-161). In mammals, sensory neurons in the nasal olfactory epithelium (OE) detect volatile odorants and some pheromones, while those in an accessory olfactory organ, called the vomeronasal organ (VNO), are thought to be specialized to detect pheromones. The VNO is a tubular structure at the base of the nasal septum, which is connected to the nasal cavity by a small duct. Signals from the OE are relayed through the olfactory bulb (OB) to the olfactory cortex, and then to multiple brain regions, including those involved in conscious perception. In contrast, signals from the VNO are conveyed through the accessory olfactory bulb (AOB) to the amygdala and hypothalamus, areas associated with the endocrine and behavioral responses induced by pheromones.
Volatile odorants are detected in the OE by as many as 1000 different types of odorant receptors (ORs), which are differentially expressed by olfactory sensory neurons (Buck and Axel, Cell, 1991, 65:175-187; Levy, N.S., et al., J. Steroid Biochem. Mol. Biol., 1991, 39:633-637; Nef, P., et al., Proc. Natl. Acad. Sci., 1992, 89:8948-8952; Strotman, J., et al., Neuroreport, 1992, 3:1053-1056; Ngai, J., et al., Cell, 1993, 72:667-680; Ressler, K.J., et al., Cell, 1993, 73:597-609; Vassar, R., et al., Cell, 1993, 74:309-318). The ORs are thought to couple to the G protein α subunit, Gαolf, thereby initiating a cascade of transduction events which culminate in the generation of action potentials in the sensory axons (reviewed in Firestein, S., Curr. Opin. in Neurobiology, 1992, 2:444-448; Reed, R., Neuron, 1992, 8:205-209; Ronnett, G., et al., Trends Neurosci., 1992, 15:508-513). Current evidence suggests that each OR may recognize a particular molecular feature that can be shared by many odorants (Ressler, K., et al., Cell, 1994, 79:1245-1255; Vassar, R., et al., Cell, 1994, 79:981-991; Axel, R., Sci. Am., 1995, 273:154-159; Buck, L., Annu. Rev. Neurosci., 1996, 19:517-544). This is consistent with a combinatorial coding model in which the identities of different odorants are encoded by different combinations of receptors, but each receptor serves as one component of the codes for many odorants. By contrast, very little is known about how pheromones are detected or encoded in the VNO. Although VNO neurons (VNs) resemble olfactory sensory neurons in the nose, only a rare VN expresses an OR gene. VNs also lack a number of other olfactory sensory transduction molecules, including the G protein α subunit, Gαolf (Reed, R., Neuron, 1992, 8:205-209), which is highly expressed in olfactory neurons (Dulac and Axel, Cell, 1995, 83:195-206; Berghard, A., et al., Proc. Natl. Acad. Sci. USA, 1996, 93:2365-2369; Wu, Y., et al., Biochem. Biophys. Res. Com., 1996, 220:900-904). Instead, VNs express high levels of two other G protein α subunits, Gαo and Gαi2 (Dulac and Axel, Cell, 1995, 83:195-206; Halpern, M., Brain Res., 1995, 677:157-161; Berghard, A., et al., Proc. Natl. Acad. Sci. USA, 1996, 93:2365-2369). Gαo and Gαi2 are expressed in spatially segregated subsets of VNs that form longitudinal zones in the VNO neuroepithelium. Interestingly, Dulac and Axel have identified a family of 100 candidate pheromone receptors ("VNRs") which appear to be expressed exclusively in the Gαi2 subset (Dulac and Axel, Cell, 1995, 83:195-206).
This invention differs from the state of the art in providing a novel family of mammalian pheromone receptors. Accordingly, the objects of the invention relate to providing compositions containing these novel receptors and their binding partners and methods for using such compositions to modulate pheromone receptor activity.
The invention involves the discovery of a multigene family of mammalian pheromone receptors. In particular, the invention involves the cDNA cloning of multiple pheromone receptors from a murine VNO cDNA library and from a rat VNO cDNA library.
Partial sequences of human homologs of these pheromone receptors also are provided.
In general, the invention provides isolated nucleic acid molecules encoding the novel pheromone receptors, unique fragments of the isolated nucleic acid molecules, expression vectors containing the foregoing, and host cells transfected with the foregoing. The invention also provides isolated pheromone receptor polypeptides and agents which bind such polypeptides, including antibodies. The foregoing can be used in the diagnosis or treatment of conditions, including the control of fertility, that are characterized by the expression of a pheromone receptor polypeptide. Methods for identifying pharmacological agents useful in the diagnosis or treatment of such conditions and methods for identifying additional members of this multigene family also are provided.
Applicants have discovered that the pheromone receptors disclosed herein are expressed in the vomeronasal organ (VNO), particularly in Gαo protein-expressing neurons. This is in contrast to the prior art VNO pheromone receptors, which are expressed in neurons that express a different G protein (Gαi2-expressing neurons). Thus, the novel pheromone receptors disclosed herein are distinct from, and expressly exclude, the prior art VNO pheromone receptors, which differ in primary structure as well as in cell localization. Although Applicants do not intend the invention to be limited to a particular theory or mechanism, the amino acid sequence homology and structural organization that the pheromone receptor polypeptides share with other well-known G-protein coupled receptors suggest that the pheromone receptors disclosed herein also are G-protein coupled. Thus, it is anticipated that the binding to the pheromone receptor of its cognate ligand (pheromone) will be accompanied by G-protein signal transduction, an event which can be measured using conventional screening assays, such as assays that measure changes in the intracellular concentrations of calcium and/or cyclic nucleotides (see, e.g., PCT publication no. WO 94/18959, entitled "Calcium Receptor-Active Molecules", inventors E. Nemeth et al.).
According to one aspect of the invention, a family of pheromone receptor polypeptides is provided. Each polypeptide of the family shares amino acid sequence homology and structural organization with a pheromone receptor polypeptide selected from the group consisting of SEQ ID NO. 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 32, 34, 36, 38, 40, 42, 44, 46, 48, 50 and 52. Each polypeptide member of the receptor family contains, from amino terminus to carboxyl terminus, the following domains: (a) an amino-terminal extracellular domain containing from 30 to 600 amino acids; (b) a transmembrane region comprising: (i) seven non-contiguous transmembrane domains designated TM1, TM2, TM3, TM4, TM5, TM6 and TM7, (ii) three non-contiguous extracellular domains designated EC2, EC3 and EC4, and (iii) three non-contiguous intracellular domains designated IC1, IC2, and IC3, wherein the transmembrane domains, the extracellular domains and the intracellular domains are attached to one another from amino terminus to carboxyl terminus in the order TM1-IC1-TM2-EC2-TM3-IC2-TM4-EC3-TM5-IC3-TM6-EC4-TM7, and wherein the transmembrane region has at least about 35% homology and a length approximately equal to a transmembrane region of a polypeptide selected from the group consisting of SEQ ID NO. 2, 4, 6, 8, 10, 12, 14, 34, 36, 38, 40, 42, 44, 46, 48, and 50; and (c) a carboxyl-terminal intracellular domain containing from 5 to 200 amino acids.
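The domain layout described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, and the sketch assumes that the seven transmembrane domains alternate strictly with intracellular and extracellular loops, beginning with IC1 after TM1. The length bounds are the ones recited for the amino-terminal (30 to 600 residues) and carboxyl-terminal (5 to 200 residues) domains.

```python
def transmembrane_order(n_tm: int = 7) -> list[str]:
    """Build the TM/loop order, assuming loops alternate IC, EC, IC, ...

    Hypothetical helper: loop numbering follows the text, in which the
    extracellular loops are numbered EC2..EC4 and the intracellular loops
    IC1..IC3.
    """
    order = []
    for k in range(1, n_tm + 1):
        order.append(f"TM{k}")
        if k < n_tm:
            if k % 2 == 1:
                order.append(f"IC{(k + 1) // 2}")   # after TM1, TM3, TM5
            else:
                order.append(f"EC{k // 2 + 1}")     # after TM2, TM4, TM6
    return order


def terminal_lengths_ok(ntd_len: int, ctd_len: int) -> bool:
    """Check the recited length ranges for the two terminal domains."""
    return 30 <= ntd_len <= 600 and 5 <= ctd_len <= 200


if __name__ == "__main__":
    print("-".join(transmembrane_order()))
    print(terminal_lengths_ok(ntd_len=500, ctd_len=20))
```

Generating the order from the alternation rule, rather than hard-coding it, makes it easy to see that seven transmembrane domains imply exactly six intervening loops.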
Each polypeptide member of the family is expressed in a Gαo protein-expressing vomeronasal organ neuron or is expressed in another olfactory organ neuron in an animal which does not possess a vomeronasal organ. One skilled in the art can readily identify olfactory organs in animals which do not possess a vomeronasal organ.
In general, the amino-terminal extracellular domains (NTDs) of the receptor family members share sequence homology to a pheromone receptor polypeptide selected from the group consisting of SEQ ID NO. 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 32, 34, 36, 38, 40, 42, 44, 46, 48, and 50 to a lesser extent than that observed for the transmembrane region. The length of the extracellular domain can vary among members of the family. Accordingly, certain embodiments of the invention have extracellular domains that contain at least 50, 100, 200, 300, 400 or 500 amino acids. Preferably, the transmembrane region has greater than 40% homology with the corresponding region of a pheromone receptor polypeptide selected from the group consisting of SEQ ID NO. 2, 4, 6, 8, 10, 12, 14, 34, 36, 38, 40, 42, 44, 46, 48, and 50, and more preferably, has even greater sequence homology (e.g., more than 50%, 60%, 70%, 80% or 90% homology). The length of the carboxyl-terminal intracellular domain can vary among members of the family. Accordingly, certain embodiments of the invention have carboxyl-terminal intracellular domains that contain between 5 and 50 amino acids. More preferably, carboxyl-terminal intracellular domains contain between 15 and 25 amino acids.
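As a rough illustration of how a percentage threshold such as those above might be checked, the following Python sketch computes percent identity over two pre-aligned sequences. It rests on assumptions not stated in the text: that "homology" is approximated by simple residue identity, that the alignment has already been produced by some other tool, and that gap columns are excluded from the count. The function name and the toy peptide strings are hypothetical and are not taken from the sequence listing.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over aligned columns, skipping any column with a gap.

    Assumes both inputs come from the same pairwise alignment and so have
    equal length; '-' marks a gap.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for x, y in zip(aligned_a, aligned_b):
        if x == "-" or y == "-":
            continue  # gap column: not counted in this simplified measure
        compared += 1
        if x == y:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


if __name__ == "__main__":
    # Toy one-letter peptide strings, for illustration only.
    identity = percent_identity("LFAVLGIF", "LFSVLGLF")
    print(f"{identity:.1f}% identity; meets 35% threshold: {identity >= 35.0}")
```

A similarity-based score (counting conservative substitutions) would give higher values than identity, so the choice of measure matters when comparing against a fixed cutoff.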
According to another aspect of the invention, a method for identifying a nucleic acid encoding a pheromone receptor is provided. The method involves contacting a mixture of nucleic acid molecules (genomic library, cDNA library, genomic DNA, RNA, etc.) with at least one nucleic acid probe of a nucleic acid selected from the group consisting of: (a) a nucleic acid molecule selected from the group consisting of SEQ ID NO. 1, 3, 5, 7, 9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29, 31, 33, 35, 37, 39, 41, 43, 45, 47, 49, 51, 53, 54, and 55 that encodes a pheromone receptor polypeptide; (b) a unique fragment of (a); (c) a human homolog of (a) or (b); and (d) a set of degenerate primers of any of (a), (b) or (c); and identifying the sequences within the mixture that hybridize to the probe. Selected fragments of human homologs of a pheromone receptor are selected from the group consisting of SEQ ID NO. 51, 53, 54 and 55. In certain embodiments, the nucleic acid probe further includes a detectable label to facilitate identification of the sequence in the library which hybridizes to the probe. In certain embodiments, the probe is represented by a pair of degenerate polymerase chain reaction ("PCR") primers that amplify a unique fragment of a nucleic acid molecule selected from the group consisting of SEQ ID NO. 1, 3, 5, 7, 9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29, 31, 33, 35, 37, 39, 41, 43, 45, 47, 49, 51, 53, 54, and 55. The meaning of "unique fragment" in reference to a nucleic acid is provided below.
By "degenerate PCR primers that amplify a unique fragment" is meant degenerate primers which 2s result in the amplification of a unique fragment following a polymerase chain reaction.
According to this embodiment, the method for identifying a nucleic acid encoding a pheromone receptor polypeptide further involves subjecting a mixture of nucleic acids and the degenerate PCR primers to amplification conditions prior to identifying the sequences of the mixture that hybridize to the probe and that form part of the amplification reaction products. In some embodiments the pair of degenerate polymerase chain reaction primers is selected from a conserved sequence motif of a pheromone receptor polypeptide. A "conserved sequence motif" can be determined using the side-by-side comparison of the amino acid sequences of the different pheromone receptor polypeptides of the invention. Exemplary conserved sequence motifs include regions selected from the group consisting of amino acids 191-397, amino acids 565-825, amino acids 637-825, amino acids 637-804, and amino acids 619-784 of the polypeptide of, for example, SEQ ID NO. 2 (VR1). In preferred embodiments, the pair of degenerate polymerase chain reaction primers is selected from the group consisting of SEQ ID NOs. 60 and 61, SEQ ID NOs. 62 and 63, SEQ ID NOs. 64 and 63, SEQ ID NOs. 64 and 65, and SEQ ID NOs. 66 and 67.
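To illustrate what is meant by a degenerate primer, the following sketch expands a primer written with standard IUPAC nucleotide ambiguity codes into the set of concrete oligonucleotides it represents. The function names and the example primer strings are hypothetical; they are not the primers of SEQ ID NOs. 60-67, and bases outside the standard IUPAC set (for example, inosine) are not handled.

```python
from itertools import product

# Standard IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT",
    "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG",
    "N": "ACGT",
}


def degeneracy(primer: str) -> int:
    """Number of distinct sequences encoded by a degenerate primer."""
    count = 1
    for base in primer.upper():
        count *= len(IUPAC[base])  # KeyError for non-IUPAC symbols
    return count


def expand(primer: str) -> list[str]:
    """List every concrete sequence encoded by a degenerate primer."""
    choices = [IUPAC[base] for base in primer.upper()]
    return ["".join(combo) for combo in product(*choices)]


# Hypothetical primers, for illustration only.
print(expand("ATR"))
print(degeneracy("GAYN"))
```

Because the number of encoded sequences grows multiplicatively with each ambiguous position, checking `degeneracy` before calling `expand` avoids building very large lists.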
According to yet another aspect of the invention, an isolated nucleic acid molecule is provided. The isolated nucleic acid molecule hybridizes under high or low stringency conditions to a molecule consisting of a nucleic acid sequence selected from the group consisting of SEQ ID NO. 1, 3, 5, 7, 9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29, 31, 33, 35, 37, 39, 41, 43, 45, 47, 49, 51, 53, 54, and 55. The invention further embraces nucleic acid molecules that differ from the foregoing isolated nucleic acid molecules in codon sequence due to the degeneracy of the genetic code. The invention also embraces complements of the foregoing nucleic acids.
The pheromone receptors of the invention are expressed in the vomeronasal organ or, in an animal which lacks such an organ, are expressed in another olfactory organ. More particularly, the receptors of the invention are expressed in a Gαo protein-expressing vomeronasal organ neuron. Although not intending to be bound to a particular mechanism, it is believed that the receptors of the invention are G-protein coupled receptors. This is supported by Applicants' discovery that the receptors of the invention are expressed in Gαo protein-expressing vomeronasal organ neurons.
The pheromone receptors of the invention bind to ligands (pheromones) which induce certain changes in receptor conformation. Methods for identifying ligands which bind to the pheromone receptors of the invention are provided below, e.g., by forming an affinity matrix containing immobilized receptor and using the matrix to isolate a cognate ligand from a complex mixture. The particular ligand bound by a particular receptor is dictated by the primary and secondary structure of the receptor. In certain embodiments, the immobilized pheromone receptor polypeptide is a pheromone receptor polypeptide selected from the group consisting of SEQ ID NO. 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 32, 34, 36, 38, 40, 42, 44, 46, 48, 50 and 52.
According to another aspect of the invention, an isolated nucleic acid molecule that is a unique fragment of any of the foregoing isolated nucleic acid molecules is provided. In general, the isolated nucleic acid molecule consists of a unique fragment between 12 and 4000 nucleotides in length, and complements thereof, of any cDNA (SEQ ID NOs. 1, 3, 5, 7, 9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29, 31, 33, 35, 37, 39, 41, 43, 45, 47, 49, 51, 53, 54, and 55) encoding a pheromone receptor polypeptide selected from the group consisting of SEQ ID NO. 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 32, 34, 36, 38, 40, 42, 44, 46, 48, 50 and 52.
Depending upon its intended use (e.g., probe, primer), the unique fragment can be between 12 and 2000, 1000, 500, 250, 100, 50 or 25 nucleotides in length. Preferably, the isolated nucleic acid molecule consists of between 12 and 35 contiguous nucleotides of the foregoing cDNAs encoding the pheromone receptor polypeptides, or complements of such nucleic acid molecules.
More preferably, the unique fragment is at least 14, 15, 16, 17, 18, 20 or 22 contiguous nucleotides of the nucleic acid sequence of the foregoing cDNAs encoding the pheromone receptor polypeptides, or complements thereof. Particularly preferred isolated nucleic acid molecules are isolated fragments of the foregoing cDNAs which encode one or more of the following pheromone receptor polypeptide domains, alone or in combination (e.g., as fusion proteins): an amino-terminal extracellular domain, a transmembrane region, and a carboxy-terminal intracellular domain. In certain embodiments, the unique fragments are a pheromone receptor extracellular domain or a pheromone receptor intracellular domain coupled to at least one (e.g., 1, 2, 3, 4, 5, 6, or 7) transmembrane domain.
According to yet another aspect of the invention, an isolated nucleic acid molecule comprising a molecule having a sequence selected from the group consisting of SEQ ID NO. 51, 53, 54, 55, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, and 92, that encodes a pheromone receptor polypeptide is provided. This aspect of the invention further embraces nucleic acid molecules that differ from these nucleic acid molecules in codon sequence due to the degeneracy of the genetic code, and diversity among pheromone receptors and complements of the foregoing.
According to still other aspects of the invention, an expression vector comprising any of the foregoing isolated nucleic acid molecules operably linked to a promoter and host cells transformed or transfected with the same also are provided.
According to another aspect of the invention, an isolated polypeptide encoded by any of the above-described isolated nucleic acid molecules is provided. Preferably, the isolated polypeptide is a pheromone receptor polypeptide that has a pheromone receptor activity or an antigenic fragment thereof. As used herein, a pheromone receptor activity refers to the ability of the pheromone receptor to selectively bind to its cognate ligand (pheromone) and, optionally, upon binding, to induce signal transduction in a cell that expresses the pheromone receptor. In preferred embodiments, the isolated polypeptide comprises a pheromone receptor polypeptide having a sequence selected from the group consisting of SEQ ID NO. 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 32, 34, 36, 38, 40, 42, 44, 46, 48, 50 and 52.
According to yet other embodiments, the isolated polypeptide comprises a polypeptide encoded by a nucleic acid which hybridizes under high or low stringency conditions to the extracellular domain, transmembrane region and/or intracellular domain of a cDNA sequence selected from the group consisting of SEQ ID NO. 1, 3, 5, 7, 9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29, 31, 33, 35, 37, 39, 41, 43, 45, 47, 49, 51, 53, 54, and 55 that encodes a pheromone receptor polypeptide or fragment thereof. Thus, the invention embraces portions of a pheromone receptor polypeptide that may include, for example, an amino-terminal extracellular domain or a carboxy-terminal intracellular domain coupled to 1, 2, 3, 4, 5, 6, or 7 transmembrane domains.
Preferably, such polypeptides or fragments thereof are unique fragments and can function as, for example, antigens for making antibodies specific for pheromone receptor family members.
Accordingly, the polypeptides of the invention can be used to isolate additional members of the pheromone receptor family or, alternatively, can be used to induce in vivo an immune response to a pheromone receptor, i.e., can be incorporated into a vaccine preparation.
Such vaccine compositions are useful for controlling fertility or behavior in an animal by administering to the animal an effective amount of the vaccine to elicit an immune response to the pheromone receptor. Thus, the invention embraces fragments or variants of the foregoing pheromone receptors which exhibit certain detectable activities, e.g., a ligand binding activity or an antigenic activity. In certain embodiments, the isolated polypeptide is encoded by a cDNA
selected from the group consisting of SEQ ID NO. 51, 53, 54, 55, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, and 92, that encodes a pheromone receptor polypeptide or one or more of its domains.
According to another aspect of the invention, there are provided isolated binding polypeptides which selectively bind a unique amino acid sequence of a pheromone receptor polypeptide or fragment thereof. The isolated binding polypeptide in certain embodiments binds to a polypeptide comprising the extracellular domain and/or 1, 2, 3, 4, 5, 6, or 7 transmembrane domains of a pheromone receptor polypeptide selected from the group consisting of SEQ ID
NO. 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 32, 34, 36, 38, 40, 42, 44, 46, 48, 50 and 52.
The isolated polypeptide preferably binds to a polypeptide consisting of the amino-terminal extracellular domain and/or one or more portions of the transmembrane region of a pheromone receptor polypeptide sequence selected from the group consisting of SEQ ID NO. 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 32, 34, 36, 38, 40, 42, 44, 46, 48, 50 and 52.
In preferred embodiments, isolated binding polypeptides include antibodies and fragments of antibodies (e.g., Fab, F(ab)2, Fd and antibody fragments which include a CDR3 region which binds selectively to the unique sequences of the polypeptides of the invention). In the preferred embodiments, the isolated binding peptides do not bind to pheromone receptors that are expressed in vomeronasal organ neurons other than Gαo protein-expressing neurons.
The invention provides in yet other aspects, isolated nucleic acids or polypeptides of the invention that are: (a) immobilized to an insoluble support (an affinity matrix containing immobilized pheromone receptor polypeptide or a unique fragment thereof); (b) associated with, covalently coupled to, or encapsulated in a drug delivery device (e.g., a microsphere) to effect controlled release of the isolated nucleic acid or polypeptide in vivo or in vitro; (c) covalently coupled to another isolated nucleic acid or protein to form a chimeric molecule; and/or (d) labeled with a detectable agent (e.g., a radiolabel, a fluorescent label).
Thus, the invention provides chimeric molecules containing at least one first structural domain of one pheromone receptor polypeptide (e.g., an extracellular domain) coupled to a second structural domain (e.g., a transmembrane domain, such as TM1, TM2, etc.) of a different pheromone receptor polypeptide. The invention also provides a method for isolating a pheromone receptor by (1) contacting a composition containing a putative pheromone receptor of the above-described family with an affinity matrix containing immobilized binding polypeptide under conditions to permit the pheromone receptor to selectively bind to the immobilized binding polypeptide, and (2) isolating the polypeptides that bind to the affinity matrix.
According to still another aspect of the invention, pharmaceutical compositions containing any of the foregoing compounds of the invention in a pharmaceutically acceptable carrier and methods of producing same by placing the compositions in the carrier also are provided.
According to still another aspect of the invention, methods for modulating a pheromone receptor activity (e.g., a ligand binding activity, a signal transduction activity) in a cell (vertebrate or invertebrate) are provided. The cell can be located in vivo or in vitro and the methods can be used to down regulate (inhibit) or up regulate (stimulate) the pheromone receptor activity. For example, to inhibit a ligand binding activity, the cell is contacted with an inhibitor that can be an isolated binding polypeptide that binds to an extracellular portion of the receptor and, thereby, inhibits receptor binding to its cognate ligand. Such binding also can induce conformational changes in the receptor that alter the signal transduction activity of the receptor.
The inhibitor can be an isolated antibody (or functional equivalent thereof) which binds to an epitope located on an extracellular portion (such as EC2, EC3, EC4) of the pheromone receptor polypeptide, e.g., an amino-terminal extracellular domain or an "extracellular transmembrane region domain", i.e., an extracellular portion of the transmembrane region located between one or more transmembrane domains. Alternatively, the inhibitor can be an agent (e.g., an isolated competitive binding polypeptide) that inhibits receptor-ligand binding.
For example, the inhibitor can be an isolated fragment of a pheromone receptor (preferably, a soluble fragment), which fragment contains a ligand (pheromone) binding site. Other inhibitors can be identified in screening assays which test the ability of a putative inhibitor to inhibit pheromone receptor-mediated signal transduction or which test the ability of the putative inhibitor to inhibit binding of a pheromone receptor to its known cognate ligand. Similarly, such screening assays can be used to identify molecules which stimulate pheromone receptor-mediated signal transduction.
Exemplary molecules which stimulate transduction include the naturally-occurring ligands (e.g., isolated from a biological source such as urine or vaginal fluid), as well as synthetic ligands obtained from a non-biological source (e.g., a combinatorial library).
According to still another aspect of the invention, methods for inhibiting the binding of a pheromone having a binding domain to a pheromone receptor polypeptide having a ligand binding site that selectively binds to the binding domain are provided. The method involves contacting (in vivo or in vitro) the pheromone receptor polypeptide with an agent which binds to the ligand binding site under conditions to permit binding of the agent to the receptor. For example, the agent can be an isolated binding polypeptide that binds to the ligand binding site of the pheromone receptor. Thus, the agent can be an isolated antibody (or functionally equivalent fragment thereof) which selectively binds to the ligand binding site of the receptor.
Alternatively, the agent can be a pheromone receptor antagonist, e.g., a molecule that mimics the structure of the naturally-occurring ligand but that does not mimic the function (stimulating the receptor) of the naturally-occurring ligand. Agents which inhibit ligand binding can be identified in screening assays which test the ability of a putative binding inhibitor to inhibit binding of a pheromone receptor to its cognate ligand (e.g., pheromone). Such molecules can be isolated from a biological source or from a non-biological source.
According to another aspect of the invention, methods for modulating pheromone receptor-mediated signal transduction in a subject are provided. The methods involve administering to a subject in need of such treatment an agent that selectively binds to any of the above-described isolated nucleic acid molecules which encode a pheromone receptor or unique fragment thereof, or an expression product thereof, in an amount effective to modulate (down regulate or up regulate) pheromone receptor-mediated signal transduction in the subject.
Exemplary agents include antisense nucleic acid molecules and binding polypeptides.
Thus, according to yet another aspect of the invention, methods are provided for identifying lead compounds for a pharmacological agent useful in the diagnosis or treatment of a condition associated with pheromone receptor signal transduction activity or otherwise generally associated with binding of the receptor to its cognate ligand.
Preferably, cells expressing intact pheromone receptor polypeptides or portions thereof are used in the screening assays for identifying lead compounds which modulate pheromone receptor-mediated ligand binding or signal transduction activity. Cells expressing these polypeptides, isolated pheromone receptor polypeptides and fragments of these polypeptides which contain the ligand binding site can be used in the screening assays for identifying lead compounds which modulate binding of the receptor to a known ligand.
The screening methods involve forming a mixture of a pheromone receptor polypeptide (as noted above) or fragment thereof containing a ligand binding site; a molecule which is known to (1) interact with the foregoing receptor to effect pheromone receptor-mediated signal transduction or (2) bind to the ligand binding site of the receptor; and a candidate pharmacological agent. The mixture is incubated under conditions which, in the absence of the candidate pharmacological agent, permit a first amount of pheromone receptor-ligand binding or receptor-mediated signal transduction by the known ligand. A test amount of the selective binding of the ligand by receptor or of the specific activation of signal transduction is determined. Detection of an increase in the foregoing activities in the presence of the candidate pharmacological agent indicates that the candidate pharmacological agent is a lead compound for a pharmacological agent which increases specific activation of pheromone receptor-mediated signal transduction or selective binding of the ligand by the ligand binding site of the receptor.
Detection of a decrease in the foregoing activities in the presence of the candidate pharmacological agent indicates that the candidate pharmacological agent is a lead compound for a pharmacological agent which decreases specific activation of pheromone receptor-mediated signal transduction or selective binding of the ligand by the ligand binding site of the receptor.
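The comparison underlying the screening methods can be sketched as follows. This is only an illustration of the decision rule described above (a first amount measured without the candidate agent versus a test amount measured with it); the function name, the numeric readouts and the `tolerance` parameter are hypothetical, and no particular assay format or statistical test is implied.

```python
def classify_candidate(first_amount: float, test_amount: float,
                       tolerance: float = 0.0) -> str:
    """Compare the test amount (with candidate agent) to the first amount
    (without it) and report the direction of any change.

    tolerance is a hypothetical allowance for measurement noise, in the
    same units as the readouts; differences within it are not called.
    """
    if tolerance < 0:
        raise ValueError("tolerance must be non-negative")
    delta = test_amount - first_amount
    if delta > tolerance:
        return "increases"
    if delta < -tolerance:
        return "decreases"
    return "no detectable change"


# Illustrative readouts in arbitrary units (not experimental data).
print(classify_candidate(100.0, 150.0))
print(classify_candidate(100.0, 40.0, tolerance=10.0))
print(classify_candidate(100.0, 105.0, tolerance=10.0))
```

A candidate classified as "increases" or "decreases" would correspond to a lead compound of the respective kind described above.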
Pheromone receptor polypeptides that are useful in the screening assays, preferably, are those selected from the group consisting of SEQ ID NO. 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 32, 34, 36, 38, 40, 42, 44, 46, 48, 50 and 52. Extracellular domains or portions thereof and portions of the transmembrane region, alone or coupled to one another, of these pheromone receptor polypeptides (indicated in the Examples) can be tested for their ability to inhibit receptor-ligand binding.
These and other objects of the invention will be described in further detail in connection with the detailed description of the invention.
All patents, patent publications, references and other information identified in this document are incorporated in their entirety herein by reference.
Brief Description of the Drawings
Figure 1 depicts a comparison of the deduced protein sequences encoded by VR cDNA clones.
Figure 2 is a schematic comparison of ORs, VNRs, and VRs.
Figure 3 depicts a comparison of the deduced protein sequences encoded by the Go-VN cDNA clones.
Brief Description of the Sequences
SEQ ID NO. 1 is the nucleotide sequence of the mouse pheromone receptor VR1 cDNA (GenBank Accession No. AF011411).
SEQ ID NO. 2 is the predicted amino acid sequence of the polypeptide encoded by the mouse pheromone receptor VR1 cDNA (GenBank Accession No. AF011411).
SEQ ID NO. 3 is the nucleotide sequence of the mouse pheromone receptor VR2 cDNA (GenBank Accession No. AF011412).
SEQ ID NO. 4 is the predicted amino acid sequence of the polypeptide encoded by the mouse pheromone receptor VR2 cDNA (GenBank Accession No. AF011412).
SEQ ID NO. 5 is the nucleotide sequence of the mouse pheromone receptor VR3 cDNA (GenBank Accession No. AF011413).
SEQ ID NO. 6 is the predicted amino acid sequence of the polypeptide encoded by the mouse pheromone receptor VR3 cDNA (GenBank Accession No. AF011413).
SEQ ID NO. 7 is the nucleotide sequence of the mouse pheromone receptor VR4 cDNA (GenBank Accession No. AF011414).
SEQ ID NO. 8 is the predicted amino acid sequence of the polypeptide encoded by the mouse pheromone receptor VR4 cDNA (GenBank Accession No. AF011414).
SEQ ID NO. 9 is the nucleotide sequence of the mouse pheromone receptor VR5 cDNA (GenBank Accession No. AF011415).
SEQ ID NO. 10 is the predicted amino acid sequence of the polypeptide encoded by the mouse pheromone receptor VR5 cDNA (GenBank Accession No. AF011415).
SEQ ID NO. 11 is the nucleotide sequence of the mouse pheromone receptor VR6 cDNA (GenBank Accession No. AF011416).
SEQ ID NO. 12 is the predicted amino acid sequence of the polypeptide encoded by the mouse pheromone receptor VR6 cDNA (GenBank Accession No. AF011416).
SEQ ID NO. 13 is the nucleotide sequence of the mouse pheromone receptor VR7 cDNA (GenBank Accession No. AF011417).
SEQ ID NO. 14 is the predicted amino acid sequence of the polypeptide encoded by the mouse pheromone receptor VR7 cDNA (GenBank Accession No. AF011417).
SEQ ID NO. 15 is the nucleotide sequence of the mouse pheromone receptor VR8 cDNA (GenBank Accession No. AF011418).
SEQ ID NO. 16 is the predicted amino acid sequence of the polypeptide encoded by the mouse pheromone receptor VR8 cDNA (GenBank Accession No. AF011418).
SEQ ID NO. 17 is the nucleotide sequence of the mouse pheromone receptor VR9 cDNA (GenBank Accession No. AF011419).
SEQ ID NO. 18 is the predicted amino acid sequence of the polypeptide encoded by the mouse pheromone receptor VR9 cDNA (GenBank Accession No. AF011419).
SEQ ID NO. 19 is the nucleotide sequence of the mouse pheromone receptor VR10 cDNA (GenBank Accession No. AF011420).
SEQ ID NO. 20 is the predicted amino acid sequence of the polypeptide encoded by the mouse pheromone receptor VR10 cDNA (GenBank Accession No. AF011420).
SEQ ID NO. 21 is the nucleotide sequence of the mouse pheromone receptor VR11 cDNA (GenBank Accession No. AF011421).
SEQ ID NO. 22 is the predicted amino acid sequence of the polypeptide encoded by the mouse pheromone receptor VR11 cDNA (GenBank Accession No. AF011421).
SEQ ID NO. 23 is the nucleotide sequence of the mouse pheromone receptor VR12 cDNA (GenBank Accession No. AF011422).
SEQ ID NO. 24 is the predicted amino acid sequence of the polypeptide encoded by the mouse pheromone receptor VR12 cDNA (GenBank Accession No. AF011422).
SEQ ID NO. 25 is the nucleotide sequence of the mouse pheromone receptor VR13 cDNA (GenBank Accession No. AF011423).
SEQ ID NO. 26 is the predicted amino acid sequence of the polypeptide encoded by the mouse pheromone receptor VR13 cDNA (GenBank Accession No. AF011423).
SEQ ID NO. 27 is the nucleotide sequence of the mouse pheromone receptor VR14 cDNA (GenBank Accession No. AF011424).
SEQ ID NO. 28 is the predicted amino acid sequence of the polypeptide encoded by the mouse pheromone receptor VR14 cDNA (GenBank Accession No. AF011424).
SEQ ID NO. 29 is the nucleotide sequence of the mouse pheromone receptor VR15 cDNA (GenBank Accession No. AF011425).
SEQ ID NO. 30 is the predicted amino acid sequence of the polypeptide encoded by the mouse pheromone receptor VR15 cDNA (GenBank Accession No. AF011425).
SEQ ID NO. 31 is the nucleotide sequence of the mouse pheromone receptor VR16 cDNA (GenBank Accession No. AF011426).
SEQ ID NO. 32 is the predicted amino acid sequence of the polypeptide encoded by the mouse pheromone receptor VR16 cDNA (GenBank Accession No. AF011426).
SEQ ID NO. 33 is the nucleotide sequence of the rat pheromone receptor Go-VN1 cDNA (GenBank Accession No. AF016178).
SEQ ID NO. 34 is the predicted amino acid sequence of the polypeptide encoded by the rat pheromone receptor Go-VN1 cDNA (GenBank Accession No. AF016178).
SEQ ID NO. 35 is the nucleotide sequence of the rat pheromone receptor Go-VN2 cDNA (GenBank Accession No. AF016179).
SEQ ID NO. 36 is the predicted amino acid sequence of the polypeptide encoded by the rat pheromone receptor Go-VN2 cDNA (GenBank Accession No. AF016179).
SEQ ID NO. 37 is the nucleotide sequence of the rat pheromone receptor Go-VN3 cDNA (GenBank Accession No. AF016180).
SEQ ID NO. 38 is the predicted amino acid sequence of the polypeptide encoded by the rat pheromone receptor Go-VN3 cDNA (GenBank Accession No. AF016180).
SEQ ID NO. 39 is the nucleotide sequence of the rat pheromone receptor Go-VN4 cDNA (GenBank Accession No. AF016181).
SEQ ID NO. 40 is the predicted amino acid sequence of the polypeptide encoded by the rat pheromone receptor Go-VN4 cDNA (GenBank Accession No. AF016181).
SEQ ID NO. 41 is the nucleotide sequence of the rat pheromone receptor Go-VN5 cDNA (GenBank Accession No. AF016182).
SEQ ID NO. 42 is the predicted amino acid sequence of the polypeptide encoded by the rat pheromone receptor Go-VN5 cDNA (GenBank Accession No. AF016182).
SEQ ID NO. 43 is the nucleotide sequence of the rat pheromone receptor Go-VN6 cDNA (GenBank Accession No. AF016183).
SEQ ID NO. 44 is the predicted amino acid sequence of the polypeptide encoded by the rat pheromone receptor Go-VN6 cDNA (GenBank Accession No. AF016183).
SEQ ID NO. 45 is the nucleotide sequence of the rat pheromone receptor Go-VN7 cDNA (GenBank Accession No. AF016184).
SEQ ID NO. 46 is the predicted amino acid sequence of the polypeptide encoded by the rat pheromone receptor Go-VN7 cDNA (GenBank Accession No. AF016184).
SEQ ID NO. 47 is the nucleotide sequence of the rat pheromone receptor Go-VN13C cDNA (GenBank Accession No. AF016185).
SEQ ID NO. 48 is the predicted amino acid sequence of the polypeptide encoded by the rat pheromone receptor Go-VN13C cDNA (GenBank Accession No. AF016185).
SEQ ID NO. 49 is the nucleotide sequence of the rat pheromone receptor Go-VN13B cDNA (GenBank Accession No. AF016186).
SEQ ID NO. 50 is the predicted amino acid sequence of the polypeptide encoded by the rat pheromone receptor Go-VN13B cDNA (GenBank Accession No. AF016186).
SEQ ID NO. 51 is a partial nucleotide sequence of the human pheromone receptor hVR1.
SEQ ID NO. 52 is the predicted amino acid sequence of the polypeptide encoded by the partial sequence of the human pheromone receptor hVR1.
SEQ ID NO. 53 is a partial nucleotide sequence of the human pheromone receptor hVN01.
SEQ ID NO. 54 is a partial nucleotide sequence of the human pheromone receptor hVN02.
SEQ ID NO. 55 is a partial nucleotide sequence of the human pheromone receptor hVN03.
SEQ ID NO. 56 is the nucleotide sequence of primer AL1.
SEQ ID NO. 57 is the nucleotide sequence of primer AL3.
SEQ ID NO. 58 is a fifty amino acid sequence of Go-VN13B (SEQ ID NO. 50) that is absent from Go-VN13C (SEQ ID NO. 48).
SEQ ID NO. 59 is the amino acid sequence of a rat kidney extracellular calcium/polyvalent cation-sensing receptor.
SEQ ID NO. 60 is a degenerate oligonucleotide primer from a conserved VR domain.
SEQ ID NO. 61 is a degenerate oligonucleotide primer from a conserved VR domain.
SEQ ID NO. 62 is a degenerate oligonucleotide primer from a conserved VR domain.
SEQ ID NO. 63 is a degenerate oligonucleotide primer from a conserved VR domain.
SEQ ID NO. 64 is a degenerate oligonucleotide primer from a conserved VR domain.
SEQ ID NO. 65 is a degenerate oligonucleotide primer from a conserved VR domain.
SEQ ID NO. 66 is a degenerate oligonucleotide primer from a conserved VR domain.
SEQ ID NO. 67 is a degenerate oligonucleotide primer from a conserved VR domain.
SEQ ID NO. 68 is the nucleotide sequence of the coding region of the mouse pheromone receptor VR1.
SEQ ID NO. 69 is the nucleotide sequence of the coding region of the mouse pheromone receptor VR2.
SEQ ID NO. 70 is the nucleotide sequence of the coding region of the mouse pheromone receptor VR3.
SEQ ID NO. 71 is the nucleotide sequence of the coding region of the mouse pheromone receptor VR4.
SEQ ID NO. 72 is the nucleotide sequence of the coding region of the mouse pheromone receptor VR5.
SEQ ID NO. 73 is the nucleotide sequence of the coding region of the mouse pheromone receptor VR6.
SEQ ID NO. 74 is the nucleotide sequence of the coding region of the mouse pheromone receptor VR7.
SEQ ID NO. 75 is the nucleotide sequence of the coding region of the mouse pheromone receptor VR8.
SEQ ID NO. 76 is the nucleotide sequence of the coding region of the mouse pheromone receptor VR9.
SEQ ID NO. 77 is the nucleotide sequence of the coding region of the mouse pheromone receptor VR10.
SEQ ID NO. 78 is the nucleotide sequence of the coding region of the mouse pheromone receptor VR11.
SEQ ID NO. 79 is the nucleotide sequence of the coding region of the mouse pheromone receptor VR12.
SEQ ID NO. 80 is the nucleotide sequence of the coding region of the mouse pheromone receptor VR13.
SEQ ID NO. 81 is the nucleotide sequence of the coding region of the mouse pheromone receptor VR14.
SEQ ID NO. 82 is the nucleotide sequence of the coding region of the mouse pheromone receptor VR15.
SEQ ID NO. 83 is the nucleotide sequence of the coding region of the mouse pheromone receptor VR16.
SEQ ID NO. 84 is the nucleotide sequence of the coding region of the rat pheromone receptor GoVN1.
SEQ ID NO. 85 is the nucleotide sequence of the coding region of the rat pheromone receptor GoVN2.
SEQ ID NO. 86 is the nucleotide sequence of the coding region of the rat pheromone receptor GoVN3.
SEQ ID NO. 87 is the nucleotide sequence of the coding region of the rat pheromone receptor GoVN4.
SEQ ID NO. 88 is the nucleotide sequence of the coding region of the rat pheromone receptor GoVN5.
SEQ ID NO. 89 is the nucleotide sequence of the coding region of the rat pheromone receptor GoVN6.
SEQ ID NO. 90 is the nucleotide sequence of the coding region of the rat pheromone receptor GoVN7.
SEQ ID NO. 91 is the nucleotide sequence of the coding region of the rat pheromone receptor GoVN13C.
SEQ ID NO. 92 is the nucleotide sequence of the coding region of the rat pheromone receptor GoVN13B.
Detailed Description of the Invention
The present invention in one aspect involves the cloning of cDNAs encoding several members of a multigene family of pheromone receptors. Complete cDNA sequences for selected murine and rat pheromone receptors are provided. Partial sequences of the human gene also are provided. The present invention also relates to the discovery that this family of pheromone receptors is expressed in Gαo protein-expressing vomeronasal organ neurons ("Gαo+ VNO") or in another olfactory organ neuron in an animal (preferably, a mammal and more preferably, a human) which lacks a vomeronasal organ. Throughout this description, the pheromone receptors of the invention alternatively are referred to as "pheromone receptors", "Gαo+ VNO pheromone receptors" or, simply, "Gαo+ VNO receptors".
Analysis of the sequence homology between members of the receptor family by comparison to nucleic acid and protein databases established that the pheromone receptor family has several domains. These include, from amino terminus to carboxyl terminus:
(a) an amino-terminal extracellular domain containing from 30 to 600 amino acids; (b) a transmembrane region comprising: (i) seven non-contiguous transmembrane domains designated TM1, TM2, TM3, TM4, TM5, TM6 and TM7, (ii) three non-contiguous extracellular domains designated EC2, EC3 and EC4, and (iii) three non-contiguous intracellular domains designated IC1, IC2, and IC3, wherein the transmembrane domains, the extracellular domains and the intracellular domains are attached to one another from amino terminus to carboxyl terminus in the order TM1-IC1-TM2-EC2-TM3-IC2-TM4-EC3-TM5-IC3-TM6-EC4-TM7, and wherein the transmembrane region has at least about 35% homology and a length approximately equal to a transmembrane region of a polypeptide selected from the group consisting of SEQ ID NO. 2, 4, 6, 8, 10, 12, 14, 34, 36, 38, 40, 42, 44, 46, 48, and 50; and (c) a carboxyl-terminal intracellular domain containing from 5 to 200 amino acids. Each polypeptide member of the family is expressed in a Gαo protein-expressing vomeronasal organ neuron or is expressed in another olfactory organ neuron in an animal which does not possess a vomeronasal organ. One skilled in the art can readily identify olfactory organs in animals which do not possess a vomeronasal organ. The homology can be calculated using various publicly available software tools developed by NCBI (Bethesda, Maryland) that can be obtained through the Internet (ftp://ncbi.nlm.nih.gov/pub/). Exemplary tools include the BLAST system.
Pairwise and ClustalW alignments (BLOSUM30 matrix setting) as well as Kyte-Doolittle hydropathic analysis can be obtained using the MacVector sequence analysis software (Oxford Molecular Group).
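By way of illustration only, the domain arrangement recited above can be written down as a simple data structure. The following is a minimal sketch; the names TM_REGION_ORDER, full_topology and terminal_lengths_in_range are hypothetical and do not appear in the specification, and the sketch checks only the stated length ranges of the terminal domains, not homology.

```python
# Order of the transmembrane region from amino terminus to carboxyl
# terminus, copied from the description (TM = transmembrane,
# IC = intracellular loop, EC = extracellular loop).
TM_REGION_ORDER = [
    "TM1", "IC1", "TM2", "EC2", "TM3", "IC2", "TM4",
    "EC3", "TM5", "IC3", "TM6", "EC4", "TM7",
]


def full_topology():
    """Return the complete domain order, including both terminal domains."""
    return (["N-terminal extracellular domain"]
            + TM_REGION_ORDER
            + ["C-terminal intracellular domain"])


def terminal_lengths_in_range(n_term_len, c_term_len):
    """Check the stated ranges: 30-600 residues (N-terminal domain)
    and 5-200 residues (C-terminal domain)."""
    return 30 <= n_term_len <= 600 and 5 <= c_term_len <= 200


if __name__ == "__main__":
    print("-".join(TM_REGION_ORDER))
    print(terminal_lengths_in_range(560, 50))
```

The list form makes it easy to confirm that the seven transmembrane domains alternate with loops that switch sides of the membrane, which is the arrangement the text describes.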
The structure of the Gαo+ VNO pheromone receptors suggests that these receptors are members of the large G protein-coupled receptor (GPCR) superfamily. Like other GPCRs, the Gαo+ VNO pheromone receptors exhibit seven hydrophobic stretches ("hydrophobic domains") and are similar in structure to other types of GPCRs, the calcium sensing receptor (CSR, Seq. ID No. 59) and the metabotropic glutamate receptors (mGluRs). The CSR and mGluRs are unusual among the GPCRs in that they have extremely long N-terminal extracellular domains (e.g., 557-565 amino acids), a feature that is shared by the pheromone receptors of the invention. Despite this similarity, the receptors of the invention do not share substantial primary structure homology with the CSR and mGluRs. The receptors of the invention also are very different structurally from two other G-protein coupled receptors, the odorant receptors and Gαi2+ vomeronasal receptors, which share none of the characteristic sequence motifs of the receptors of the invention and, moreover, which have very small (~12-28 amino acids) N-terminal extracellular domains.
The receptors of the invention differ somewhat in amino acid sequence, with regions of relatively high sequence homology. Refer to Examples 1 and 2 for a discussion and illustration of the amino acid sequence homology for the murine and rat Gαo+ VNO receptors, respectively.
Other features of these members of the Gαo+ VNO receptor family also are discussed and illustrated in the Examples. For example, signal sequences have been identified for several of the Gαo+ VNO receptors disclosed in the Examples.
Homologs and alleles of the pheromone receptor nucleic acids of the invention can be identified by conventional techniques. Thus, an aspect of the invention is those nucleic acid sequences (SEQ ID NOs. 1, 3, 5, 7, 9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29, 31, 33, 35, 37, 39, 41, 43, 45, 47, 49, 51, 53, 54, and 55) which code for Gαo+ VNO pheromone receptors and which hybridize to a nucleic acid molecule consisting of the coding region of any one Gαo+ VNO pheromone receptor selected from the group consisting of SEQ ID NO. 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 32, 34, 36, 38, 40, 42, 44, 46, 48, 50 and 52, under high or low stringency conditions. The term "high or low stringency conditions" as used herein refers to parameters with which the art is familiar. Nucleic acid hybridization parameters may be found in references which compile such methods, e.g. Molecular Cloning: A Laboratory Manual, J. Sambrook, et al., eds., Second Edition, Cold Spring Harbor Laboratory Press, Cold Spring Harbor, New York, 1989, or Current Protocols in Molecular Biology, F.M. Ausubel, et al., eds., John Wiley & Sons, Inc., New York. More specifically, high stringency conditions, as used herein, refers, for example, to hybridization at 65°C in hybridization buffer (3.5 x SSC, 0.02% Ficoll, 0.02% polyvinyl pyrrolidone, 0.02% Bovine Serum Albumin, 2.5 mM NaH2PO4 (pH 7), 0.5% SDS, 2 mM EDTA). SSC is 0.15 M sodium chloride/0.15 M sodium citrate, pH 7; SDS is sodium dodecyl sulphate; and EDTA is ethylenediaminetetraacetic acid. Low stringency conditions would be the same, but with a lower temperature (e.g., 55°C). After hybridization, the membrane upon which the DNA is transferred is washed at 2 x SSC at room temperature and then at 0.2 x SSC/0.5% SDS at temperatures of up to 65°C. Additional conditions of varying stringency are provided in the Examples.
There are other conditions, reagents, and so forth which can be used, which result in a similar degree of stringency. The skilled artisan will be familiar with such conditions, and thus they are not given here. It will be understood, however, that the skilled artisan will be able to manipulate the conditions in a manner to permit the clear identification of homologs and alleles of the Gαo+ VNO pheromone receptor nucleic acids of the invention. The skilled artisan also is familiar with the methodology for screening cells and libraries for expression of such molecules which then are routinely isolated, followed by isolation of the pertinent nucleic acid molecule and sequencing.
In general, homologs and alleles typically will share at least 35% nucleotide identity and/or at least 50% amino acid identity to the cDNAs encoding a Gαo+ VNO pheromone receptor polypeptide selected from the group consisting of SEQ ID NO. 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 32, 34, 36, 38, 40, 42, 44, 46, 48, 50 and 52, in some instances will share at least 50% nucleotide identity and/or at least 65% amino acid identity and in still other instances will share at least 60% nucleotide identity and/or at least 75% amino acid identity. Watson-Crick complements of the foregoing nucleic acids also are embraced by the invention.
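For illustration, the amino acid identity tiers recited above (50%, 65% and 75%) can be applied to a pair of sequences that have already been aligned. The sketch below assumes one common convention, namely that identity is counted over aligned columns in which neither sequence has a gap; other conventions exist, and the specification does not prescribe one. The function names and the example sequences are hypothetical.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity over columns where neither aligned sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [(x, y) for x, y in zip(aligned_a, aligned_b)
               if x != gap and y != gap]
    if not columns:
        return 0.0
    matches = sum(1 for x, y in columns if x == y)
    return 100.0 * matches / len(columns)


def amino_acid_identity_tier(pct):
    """Return the highest recited amino acid threshold met (75, 65 or 50),
    or None if the sequences fall below all three."""
    for threshold in (75, 65, 50):
        if pct >= threshold:
            return threshold
    return None


if __name__ == "__main__":
    # Made-up aligned peptides, for demonstration only.
    pct = percent_identity("MKLV-A", "MKIVGA")
    print(pct, amino_acid_identity_tier(pct))
```

In practice the alignment itself would come from a tool such as BLAST or ClustalW, as noted elsewhere in the description; the sketch covers only the counting step.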
As discussed above in the Summary of the invention, certain domains within the pheromone receptors may share even greater sequence homology to a pheromone receptor polypeptide selected from the group consisting of SEQ ID NO. 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 32, 34, 36, 38, 40, 42, 44, 46, 48, 50 and 52.
In screening for Gαo+ VNO pheromone receptor polypeptides, a Southern blot may be performed using the foregoing conditions, together with a radioactive probe.
After washing the membrane to which the DNA is finally transferred, the membrane can be placed against X-ray film to detect the radioactive signal.
The invention also includes degenerate nucleic acids which include alternative codons to those present in the native materials. For example, serine residues are encoded by the codons TCA, AGT, TCC, TCG, TCT and AGC. Each of the six codons is equivalent for the purposes of encoding a serine residue. Thus, it will be apparent to one of ordinary skill in the art that any of the serine-encoding nucleotide triplets may be employed to direct the protein synthesis apparatus, in vitro or in vivo, to incorporate a serine residue into an elongating Gαo+ VNO pheromone receptor polypeptide. Similarly, nucleotide sequence triplets which encode other amino acid residues include, but are not limited to: CCA, CCC, CCG and CCT (proline codons); CGA, CGC, CGG, CGT, AGA and AGG (arginine codons); ACA, ACC, ACG and ACT (threonine codons); AAC and AAT (asparagine codons); and ATA, ATC and ATT (isoleucine codons). Other amino acid residues may be encoded similarly by multiple nucleotide sequences. Thus, the invention embraces degenerate nucleic acids that differ from the biologically isolated nucleic acids in codon sequence due to the degeneracy of the genetic code.
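The degeneracy described above can be made concrete with a short sketch. The codon table below contains only the six amino acids enumerated in the preceding paragraph; it is not a complete genetic code, and the function names are hypothetical.

```python
from itertools import product

# Codons listed in the text for six amino acids (one-letter codes).
# This is deliberately a partial table, not the full genetic code.
CODONS = {
    "S": ["TCA", "AGT", "TCC", "TCG", "TCT", "AGC"],  # serine
    "P": ["CCA", "CCC", "CCG", "CCT"],                # proline
    "R": ["CGA", "CGC", "CGG", "CGT", "AGA", "AGG"],  # arginine
    "T": ["ACA", "ACC", "ACG", "ACT"],                # threonine
    "N": ["AAC", "AAT"],                              # asparagine
    "I": ["ATA", "ATC", "ATT"],                       # isoleucine
}


def count_encodings(peptide):
    """Number of distinct nucleotide sequences that encode the peptide."""
    total = 1
    for residue in peptide:
        total *= len(CODONS[residue])
    return total


def enumerate_encodings(peptide):
    """List every degenerate nucleotide sequence encoding the peptide."""
    choices = (CODONS[residue] for residue in peptide)
    return ["".join(codons) for codons in product(*choices)]


if __name__ == "__main__":
    print(count_encodings("SPR"))
    print(enumerate_encodings("NI"))
```

Because the count is the product of the per-residue codon counts, even a short peptide has many equivalent coding sequences, which is why the degenerate nucleic acids are claimed as a class.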
In addition, areas of high similarity among pheromone receptors may differ in amino acid sequences such that they share many, but not all, amino acids. Their nucleotide sequences all differ accordingly.
The invention also provides isolated unique fragments of the cDNAs encoding a Gαo+ VNO polypeptide selected from the group consisting of SEQ ID NO. 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 32, 34, 36, 38, 40, 42, 44, 46, 48, 50 and 52, or complements of these sequences. A unique fragment is one that is a 'signature' for the larger nucleic acid. It, for example, is long enough to assure that its precise sequence is not found in molecules outside of the Gαo+ VNO pheromone receptor nucleic acids defined above. Unique fragments can be used as probes in Southern blot assays to identify such nucleic acids, or can be used as primers in amplification assays such as those employing PCR. As known to those skilled in the art, large probes such as 200 nucleotides or more are preferred for certain uses such as Southern blots, while smaller fragments will be preferred for uses such as PCR. Unique fragments also can be used to produce fusion proteins for generating antibodies or determining binding of the polypeptide fragments, as demonstrated in the Examples, or for generating immunoassay components. Likewise, unique fragments can be employed to produce nonfused fragments of the Gαo+ VNO pheromone receptor polypeptides, useful, for example, in the preparation of antibodies, in immunoassays, and as a competitive binding partner of the pheromones and/or other ligands which bind to the Gαo+ VNO pheromone receptor polypeptides, for example, in therapeutic applications. Unique fragments further can be used as antisense molecules to inhibit the expression of Gαo+ VNO pheromone receptor nucleic acids and polypeptides, particularly for the insecticide and other fertility control purposes as described in greater detail below.
As will be recognized by those skilled in the art, the size of the unique fragment will depend upon its conservancy in the genetic code. Thus, some regions of a cDNA selected from the group consisting of SEQ ID NO. 51, 53, 54, 55, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, and 92, that encodes a Gαo+ VNO polypeptide, and its complement will require longer segments to be unique while others will require only short segments, typically between 12 and 32 nucleotides (e.g. 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31 and 32 bases long). Virtually any segment of the region of the cDNAs encoding the full length Gαo+ VNO polypeptide or their complements, that is 18 or more nucleotides in length will be unique. Those skilled in the art are well versed in methods for selecting such sequences, typically on the basis of the ability of the unique fragment to selectively distinguish the sequence of interest from non-Gαo+ VNO pheromone receptor nucleic acids. A comparison of the sequence of the fragment to those on known data bases typically is all that is necessary, although in vitro confirmatory hybridization and sequencing analysis may be performed.
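The database comparison mentioned above can be sketched, in highly simplified form, as an exact-match screen of fixed-length fragments against a set of background sequences. This is an assumption-laden illustration: a real screen would use a tool such as BLAST and would tolerate mismatches, whereas the sketch below rejects a fragment only if its exact sequence occurs on either strand of a background sequence. The function names and the toy sequences are hypothetical.

```python
def reverse_complement(seq):
    """Reverse complement of a DNA sequence written with A, C, G and T."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def unique_fragments(target, background, k=18):
    """Return the k-nucleotide fragments of `target` that do not occur,
    on either strand, in any of the `background` sequences."""
    seen = set()
    for seq in background:
        for strand in (seq, reverse_complement(seq)):
            for i in range(len(strand) - k + 1):
                seen.add(strand[i:i + k])
    fragments = (target[i:i + k] for i in range(len(target) - k + 1))
    return [frag for frag in fragments if frag not in seen]


if __name__ == "__main__":
    # Toy sequences and a short k so the result is easy to inspect.
    print(unique_fragments("ACGTTGCA", ["ACGTT"], k=4))
```

The default of k=18 mirrors the statement that segments of 18 or more nucleotides generally will be unique; shorter values correspond to the 12 to 32 nucleotide range discussed above.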
As mentioned above, the invention embraces antisense oligonucleotides that selectively bind to a nucleic acid molecule encoding a Gαo+ VNO pheromone receptor polypeptide, to decrease a pheromone receptor activity (e.g., a ligand binding activity, a signal transduction activity). This is desirable in virtually any condition wherein a reduction in pheromone binding or induction of a behavior that is triggered by pheromone binding is desirable, including to control fertility and behavior in vertebrates and invertebrates. The compositions of the invention are particularly useful in, for example, controlling fertility in livestock and controlling reproduction in rodents or insects by interrupting the normal behaviors of rodents or insects that result in reproduction. As used herein, the term "antisense oligonucleotide" or "antisense" describes an oligonucleotide that is an oligoribonucleotide, oligodeoxyribonucleotide, modified oligoribonucleotide, or modified oligodeoxyribonucleotide which hybridizes under physiological conditions to DNA comprising a particular gene or to an mRNA transcript of that gene and, thereby, inhibits the transcription of that gene and/or the translation of that mRNA. The antisense molecules are designed so as to interfere with transcription or translation of a target gene upon hybridization with the target gene or transcript. Those skilled in the art will recognize that the exact length of the antisense oligonucleotide and its degree of complementarity with its target will depend upon the specific target selected, including the sequence of the target and the particular bases which comprise that sequence. It is preferred that the antisense oligonucleotide be constructed and arranged so as to bind selectively with the target under physiological conditions, i.e., to hybridize substantially more to the target sequence than to any other sequence in the target cell under physiological conditions. Based upon the cDNA sequences of Examples 1 and 2 (SEQ ID NOs. 1, 3, 5, 7, 9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29, 31, 33, 35, 37, 39, 41, 43, 45, 47, 49, 51, 53, 54, and 55), or upon allelic or homologous genomic and/or cDNA sequences, one of skill in the art can easily choose and synthesize any of a number of appropriate antisense molecules for use in accordance with the present invention. In order to be sufficiently selective and potent for inhibition, such antisense oligonucleotides should comprise at least 10 and, more preferably, at least 15 consecutive bases which are complementary to the target, although in certain cases modified oligonucleotides as short as 7 bases in length have been used successfully as antisense oligonucleotides (Wagner et al., Nature Biotechnol. 14:840-844, 1996).
Most preferably, the antisense oligonucleotides comprise a complementary sequence of 20-30 bases. Although oligonucleotides may be chosen which are antisense to any region of the gene or mRNA transcripts, in preferred embodiments the antisense oligonucleotides correspond to N-terminal or 5' upstream sites such as translation initiation, transcription initiation or promoter sites. In addition, 3'-untranslated regions may be targeted. Targeting to mRNA splicing sites has also been used in the art but may be less preferred if alternative mRNA splicing occurs. In addition, the antisense is targeted, preferably, to sites in which mRNA secondary structure is not expected (see, e.g., Sainio et al., Cell Mol. Neurobiol. 14(5):439-457, 1994) and at which proteins are not expected to bind. Finally, although Examples 1 and 2 disclose cDNA sequences (SEQ ID NOs. 1, 3, 5, 7, 9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29, 31, 33, 35, 37, 39, 41, 43, 45, 47, 49, 51, 53, 54, and 55), one of ordinary skill in the art may easily derive the genomic DNA corresponding to these cDNAs. Thus, the present invention also provides for antisense oligonucleotides which are complementary to the genomic DNA corresponding to a cDNA sequence selected from the group consisting of SEQ ID NOs. 1, 3, 5, 7, 9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29, 31, 33, 35, 37, 39, 41, 43, 45, 47, 49, 51, 53, 54, and 55. Similarly, antisense to allelic or homologous cDNAs and genomic DNAs are enabled without undue experimentation.
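As a minimal illustration of the sequence relationship described above, an antisense oligodeoxynucleotide for a chosen target window is the reverse complement of the sense-strand cDNA over that window. The sketch below enforces only the stated minimum of 10 complementary bases and defaults to 20 bases, the lower end of the most preferred range; it makes no attempt to assess secondary structure, protein binding sites or selectivity. The function name and the example sequence are hypothetical and are not taken from the sequence listing.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def antisense_oligo(cdna, start, length=20):
    """Return the antisense DNA oligo (written 5' to 3') complementary to
    cdna[start:start + length], where `cdna` is the sense strand."""
    if length < 10:
        raise ValueError("at least 10 complementary bases are recited")
    target = cdna[start:start + length]
    if len(target) < length:
        raise ValueError("target window runs past the end of the sequence")
    return target.translate(_COMPLEMENT)[::-1]


if __name__ == "__main__":
    # Made-up sense-strand fragment with an ATG start codon at index 6.
    cdna = "GGCACCATGGCTAGCTTGACCGTAGGCTAA"
    start = cdna.find("ATG")
    print(antisense_oligo(cdna, start, length=10))
```

Anchoring the window at the start codon corresponds to the preference stated above for translation initiation sites; other windows, such as the 3'-untranslated region, would be selected by changing the start index.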
In one set of embodiments, the antisense oligonucleotides of the invention may be composed of "natural" deoxyribonucleotides, ribonucleotides, or any combination thereof. That is, the 5' end of one native nucleotide and the 3' end of another native nucleotide may be covalently linked, as in natural systems, via a phosphodiester internucleoside linkage. These oligonucleotides may be prepared by art recognized methods which may be carried out manually or by an automated synthesizer. They also may be produced recombinantly by vectors.
In preferred embodiments, however, the antisense oligonucleotides of the invention also may include "modified" oligonucleotides. That is, the oligonucleotides may be modified in a number of ways which do not prevent them from hybridizing to their target but which enhance their stability or targeting or which otherwise enhance their therapeutic effectiveness.
The term "modified oligonucleotide" as used herein describes an oligonucleotide in which (1) at least two of its nucleotides are covalently linked via a synthetic internucleoside linkage (i.e., a linkage other than a phosphodiester linkage between the 5' end of one nucleotide and the 3' end of another nucleotide) and/or (2) a chemical group not normally associated with nucleic acids has been covalently attached to the oligonucleotide. Preferred synthetic internucleoside linkages are phosphorothioates, alkylphosphonates, phosphorodithioates, phosphate esters, alkylphosphonothioates, phosphoramidates, carbamates, carbonates, phosphate triesters, acetamidates, carboxymethyl esters and peptides.
The term "modified oligonucleotide" also encompasses oligonucleotides with a covalently modified base and/or sugar. For example, modified oligonucleotides include oligonucleotides having backbone sugars which are covalently attached to low molecular weight organic groups other than a hydroxyl group at the 3' position and other than a phosphate group at the 5' position. Thus modified oligonucleotides may include a 2'-O-alkylated ribose group.
In addition, modified oligonucleotides may include sugars such as arabinose instead of ribose.
The present invention, thus, contemplates pharmaceutical preparations containing modified antisense molecules that are complementary to and hybridizable with, under physiological conditions, nucleic acids encoding pheromone receptor polypeptides, together with pharmaceutically acceptable carriers.
Antisense oligonucleotides may be administered as part of a pharmaceutical composition.
Such a pharmaceutical composition may include the antisense oligonucleotides in combination with any standard physiologically and/or pharmaceutically acceptable carriers which are known in the art. The compositions should be sterile and contain a therapeutically effective amount of the antisense oligonucleotides in a unit of weight or volume suitable for administration to a patient. The term "pharmaceutically acceptable" means a non-toxic material that does not interfere with the effectiveness of the biological activity of the active ingredients. The term "physiologically acceptable" refers to a non-toxic material that is compatible with a biological system such as a cell, cell culture, tissue, or organism. The characteristics of the carrier will depend on the route of administration. Physiologically and pharmaceutically acceptable carriers include diluents, fillers, salts, buffers, stabilizers, solubilizers, and other materials which are well known in the art.
As used herein, a "vector" may be any of a number of nucleic acids into which a desired sequence may be inserted by restriction and ligation for transport between different genetic environments or for expression in a host cell. Vectors are typically composed of DNA although RNA vectors are also available. Vectors include, but are not limited to, plasmids, phagemids and virus genomes. A cloning vector is one which is able to replicate in a host cell, and which is further characterized by one or more endonuclease restriction sites at which the vector may be cut in a determinable fashion and into which a desired DNA sequence may be ligated such that the new recombinant vector retains its ability to replicate in the host cell. In the case of plasmids, replication of the desired sequence may occur many times as the plasmid increases in copy number within the host bacterium or just a single time per host before the host reproduces by mitosis. In the case of phage, replication may occur actively during a lytic phase or passively during a lysogenic phase. An expression vector is one into which a desired DNA sequence may be inserted by restriction and ligation such that it is operably joined to regulatory sequences and may be expressed as an RNA transcript. Vectors may further contain one or more marker sequences suitable for use in the identification of cells which have or have not been transformed or transfected with the vector. Markers include, for example, genes encoding proteins which increase or decrease either resistance or sensitivity to antibiotics or other compounds, genes which encode enzymes whose activities are detectable by standard assays known in the art (e.g., β-galactosidase or alkaline phosphatase), and genes which visibly affect the phenotype of transformed or transfected cells, hosts, colonies or plaques (e.g., green fluorescent protein).
Preferred vectors are those capable of autonomous replication and expression of the structural gene products present in the DNA segments to which they are operably joined.
As used herein, a coding sequence and regulatory sequences are said to be "operably"
joined when they are covalently linked in such a way as to place the expression or transcription of the coding sequence under the influence or control of the regulatory sequences. If it is desired that the coding sequences be translated into a functional protein, two DNA
sequences are said to be operably joined if induction of a promoter in the 5' regulatory sequences results in the transcription of the coding sequence and if the nature of the linkage between the two DNA
sequences does not (1) result in the introduction of a frame-shift mutation, (2) interfere with the ability of the promoter region to direct the transcription of the coding sequences, or (3) interfere with the ability of the corresponding RNA transcript to be translated into a protein. Thus, a promoter region would be operably joined to a coding sequence if the promoter region were capable of effecting transcription of that DNA sequence such that the resulting transcript might be translated into the desired protein or polypeptide.
The precise nature of the regulatory sequences needed for gene expression may vary between species or cell types, but shall in general include, as necessary, 5' non-transcribed and 5' non-translated sequences involved with the initiation of transcription and translation respectively, such as a TATA box, capping sequence, CAAT sequence, and the like. Especially, such 5' non-transcribed regulatory sequences will include a promoter region which includes a promoter sequence for transcriptional control of the operably joined gene.
Regulatory sequences may also include enhancer sequences or upstream activator sequences as desired. The vectors of the invention may optionally include 5' leader or signal sequences. The choice and design of an appropriate vector is within the ability and discretion of one of ordinary skill in the art.
Expression vectors containing all the necessary elements for expression are commercially available and known to those skilled in the art. See, e.g., Sambrook et al., Molecular Cloning:
A Laboratory Manual, Second Edition, Cold Spring Harbor Laboratory Press, 1989. Cells are genetically engineered by the introduction into the cells of heterologous DNA
(RNA) encoding a pheromone receptor polypeptide or fragment or variant thereof. That heterologous DNA (RNA) is placed under operable control of transcriptional elements to permit the expression of the heterologous DNA in the host cell.
Preferred systems for mRNA expression in mammalian cells are those such as pRc/CMV (available from Invitrogen, Carlsbad, CA) that contain a selectable marker such as a gene that confers G418 resistance (which facilitates the selection of stably transfected cell lines) and the human cytomegalovirus (CMV) enhancer-promoter sequences. Additionally, suitable for expression in primate or canine cell lines is the pCEP4 vector (Invitrogen), which contains an Epstein Barr virus (EBV) origin of replication, facilitating the maintenance of plasmid as a multicopy extrachromosomal element. Another expression vector is the pEF-BOS plasmid containing the promoter of polypeptide Elongation Factor 1α, which efficiently stimulates transcription in vitro. The plasmid is described by Mizushima and Nagata (Nuc. Acids Res. 18:5322, 1990), and its use in transfection experiments is disclosed by, for example, Demoulin (Mol. Cell. Biol. 16:4710-4716, 1996). Still another preferred expression vector is an adenovirus, described by Stratford-Perricaudet, which is defective for E1 and E3 proteins (J. Clin. Invest. 90:626-630, 1992). The use of the adenovirus as an Adeno.P1A recombinant is disclosed by Warnier et al., in intradermal injection in mice for immunization against P1A (Int. J. Cancer, 67:303-310, 1996).
The invention also embraces so-called expression kits, which allow the artisan to prepare a desired expression vector or vectors. Such expression kits include at least separate portions of each of the previously discussed coding sequences. Other components may be added, as desired, as long as the previously mentioned sequences, which are required, are included.
The invention also permits the construction of pheromone receptor gene "knock-outs" in cells and in animals, providing materials for studying certain aspects of pheromone receptor binding, signal transduction activity, or function.
The invention also provides isolated polypeptides, which include a pheromone receptor polypeptide selected from the group consisting of SEQ ID NO. 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 32, 34, 36, 38, 40, 42, 44, 46, 48, 50 and 52 and unique fragments of these pheromone receptor polypeptides. Such polypeptides are useful, for example, alone or as fusion proteins to generate antibodies.
A unique fragment of a pheromone receptor polypeptide, in general, has the features and characteristics of unique fragments as discussed above in connection with nucleic acids. As will be recognized by those skilled in the art, the size of the unique fragment will depend upon factors such as whether the fragment constitutes a portion of a conserved protein domain. Thus, some regions of a pheromone receptor polypeptide selected from the group consisting of SEQ ID NO. 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 32, 34, 36, 38, 40, 42, 44, 46, 48, 50 and 52 will require longer segments to be unique while others will require only short segments, typically between 5 and 12 amino acids (e.g. 5, 6, 7, 8, 9, 10, 11 and 12 amino acids long).
Unique fragments of a polypeptide preferably are those fragments which retain a distinct functional capability of the polypeptide. Functional capabilities which can be retained in a unique fragment of a polypeptide include interaction with antibodies, interaction with other polypeptides (G-proteins) or molecules (e.g., a ligand) or fragments thereof, selective binding of nucleic acids or proteins, and enzymatic activity. Those skilled in the art are well versed in methods for selecting unique amino acid sequences, typically on the basis of the ability of the unique fragment to selectively distinguish the sequence of interest from non-family members.
A comparison of the sequence of the fragment to those on known data bases typically is all that is necessary.
The invention embraces variants of the pheromone receptor polypeptides described above. As used herein, a "variant" of a pheromone receptor polypeptide is a polypeptide which contains one or more modifications to the primary amino acid sequence of a pheromone receptor polypeptide. Modifications which create a pheromone receptor variant can be made to a pheromone receptor polypeptide 1) to reduce or eliminate an activity of a pheromone receptor polypeptide, such as a ligand binding activity or a signal transduction activity; 2) to enhance a property of a pheromone receptor polypeptide, such as protein stability in an expression system or the stability of protein-protein binding; or 3) to provide a novel activity or property to a pheromone receptor polypeptide, such as addition of an antigenic epitope or addition of a detectable moiety. Modifications to a pheromone receptor polypeptide are typically made to the nucleic acid which encodes the pheromone receptor polypeptide, and can include deletions, point mutations, truncations, amino acid substitutions and additions of amino acids or non-amino acid moieties. Alternatively, modifications can be made directly to the polypeptide, such as by cleavage, addition of a linker molecule, addition of a detectable moiety, such as biotin, addition of a fatty acid, and the like. Modifications also embrace fusion proteins comprising all or part of the pheromone receptor amino acid sequence.
In general, variants include pheromone receptor polypeptides which are modified specifically to alter a feature of the polypeptide unrelated to its physiological activity. For example, cysteine residues can be substituted or deleted to prevent unwanted disulfide linkages.
Similarly, certain amino acids can be changed to enhance expression of a pheromone receptor polypeptide by eliminating proteolysis by proteases in an expression system.
Mutations of a nucleic acid which encodes a pheromone receptor polypeptide preferably preserve the amino acid reading frame of the coding sequence, and preferably do not create regions in the nucleic acid which are likely to hybridize to form secondary structures, such as hairpins or loops, which can be deleterious to expression of the variant polypeptide.
Mutations can be made by selecting an amino acid substitution, or by random mutagenesis of a selected site in a nucleic acid which encodes the polypeptide. Variant polypeptides are then expressed and tested for one or more activities to determine which mutation provides a variant polypeptide with the desired properties. Further mutations can be made to variants (or to non-variant pheromone receptor polypeptides) which are silent as to the amino acid sequence of the polypeptide, but which provide preferred codons for translation in a particular host. The preferred codons for translation of a nucleic acid in, e.g., E. coli, are well known to those of ordinary skill in the art. Still other mutations can be made to the noncoding sequences of a pheromone receptor gene or cDNA clone to enhance expression of the polypeptide. The activity of variants of pheromone receptor polypeptides can be tested by cloning the gene encoding the variant pheromone receptor polypeptide into a bacterial or mammalian expression vector, introducing the vector into an appropriate host cell, expressing the variant pheromone receptor polypeptide, and testing for a functional capability of the pheromone receptor polypeptides as disclosed herein. For example, the variant pheromone receptor polypeptide can be tested for a ligand binding activity, wherein a ligand to which the receptor binds is contacted with the variant receptor and the amount of ligand binding to the variant receptor is determined using conventional procedures to measure the binding of one molecule to another. Preparation of other variant polypeptides may favor testing of other activities, as will be known to one of ordinary skill in the art.
The skilled artisan will also realize that conservative amino acid substitutions may be made in pheromone receptor polypeptides to provide functionally equivalent variants of the foregoing polypeptides, i.e., the variants retain the functional capabilities of the pheromone receptor polypeptides. As used herein, a "conservative amino acid substitution" refers to an amino acid substitution which does not alter the relative charge or size characteristics of the protein in which the amino acid substitution is made. Variants can be prepared according to methods for altering polypeptide sequence known to one of ordinary skill in the art such as are found in references which compile such methods, e.g. Molecular Cloning: A Laboratory Manual, J. Sambrook, et al., eds., Second Edition, Cold Spring Harbor Laboratory Press, Cold Spring Harbor, New York, 1989, or Current Protocols in Molecular Biology, F.M. Ausubel, et al., eds., John Wiley & Sons, Inc., New York. To a certain extent, the various members of the pheromone receptor family that are illustrated in the Examples represent exemplary functionally equivalent variants of the pheromone receptor polypeptides. Other functionally equivalent variants include conservative amino acid substitutions of the amino acids of a pheromone receptor polypeptide selected from the group consisting of SEQ ID NO. 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 32, 34, 36, 38, 40, 42, 44, 46, 48, 50 and 52. Conservative substitutions of amino acids include substitutions made amongst amino acids within the following groups:
(a) M, I, L, V; (b) F, Y, W; (c) K, R, H; (d) A, G; (e) S, T; (f) Q, N; and (g) E, D.
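The seven groups listed above lend themselves to a simple lookup. The following sketch, offered for illustration only, classifies each difference between two equal-length peptides as conservative or not according to those groups; residues that appear in none of the groups (for example cysteine or proline) are treated as having no conservative partner. The function names and example peptides are hypothetical.

```python
# Conservative substitution groups (a) through (g), as listed in the text.
GROUPS = ["MILV", "FYW", "KRH", "AG", "ST", "QN", "ED"]


def is_conservative(original, replacement):
    """True if the two different residues belong to the same listed group."""
    if original == replacement:
        return False  # not a substitution at all
    return any(original in group and replacement in group for group in GROUPS)


def classify_substitutions(reference, variant):
    """Compare two equal-length peptides and return a list of
    (1-based position, reference residue, variant residue, conservative?)."""
    if len(reference) != len(variant):
        raise ValueError("peptides must have equal length")
    return [
        (pos, ref, var, is_conservative(ref, var))
        for pos, (ref, var) in enumerate(zip(reference, variant), start=1)
        if ref != var
    ]


if __name__ == "__main__":
    # Made-up peptides, for demonstration only.
    for row in classify_substitutions("MKSE", "LKTA"):
        print(row)
```

A variant in which every reported difference is flagged as conservative would be a candidate functionally equivalent variant in the sense used here, subject to the functional testing described in the surrounding text.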
Conservative amino-acid substitutions in the amino acid sequence of pheromone receptor polypeptides to produce functionally equivalent variants of pheromone receptor polypeptides typically are made by alteration of the nucleic acid encoding pheromone receptor polypeptides.
Such substitutions can be made by a variety of methods known to one of ordinary skill in the art.
For example, amino acid substitutions may be made by PCR-directed mutation, site-directed 15 mutagenesis according to the method described in Proc. Nat. Acad. Sci.
U.S.A. 82: 488-492, 1985, or by chemical synthesis of a gene encoding a pheromone receptor polypeptide. Where amino acid substitutions are made to a small unique fragment of a pheromone receptor polypeptide, such as a ligand binding site peptide, the substitutions can be made by directly synthesizing the peptide. The activity of functionally equivalent fragments of pheromone 2o receptor polypeptides can be tested by cloning the gene encoding the altered pheromone receptor polypeptide into a bacterial or mammalian expression vector, introducing the vector into an appropriate host cell, expressing the altered pheromone receptor polypeptide, and testing for a functional capability of the pheromone receptor polypeptides as disclosed herein. Peptides which are chemically synthesized can be tested directly for function, e.g., for binding to a ligand to 25 which the unaltered pheromone receptor is known to bind.
The invention as described herein has a number of uses, some of which are described elsewhere herein. First, the invention permits isolation of the pheromone receptor polypeptides of the Examples. A variety of methodologies well-known to the skilled practitioner can be utilized to obtain isolated pheromone receptor molecules. The polypeptide may be purified from cells which naturally produce the polypeptide by chromatographic means or immunological recognition. Alternatively, an expression vector may be introduced into cells to cause production of the polypeptide. In another method, mRNA transcripts may be microinjected or otherwise introduced into cells to cause production of the encoded polypeptide. Translation of mRNA in cell-free extracts such as the reticulocyte lysate system also may be used to produce polypeptide.
Those skilled in the art also can readily follow known methods for isolating pheromone receptor polypeptides. These include, but are not limited to, immunochromatography, HPLC, size-exclusion chromatography, ion-exchange chromatography and immunoaffinity chromatography.
The isolation of the pheromone receptor gene also makes it possible for the artisan to diagnose a disorder characterized by expression of pheromone receptor. These methods involve determining expression of the pheromone receptor gene, and/or pheromone receptor polypeptides derived therefrom. In the former situation, such determinations can be carried out via any standard nucleic acid determination assay, including the polymerase chain reaction as exemplified in the examples below, or assaying with labeled hybridization probes.
The invention also makes it possible to isolate the naturally occurring ligands (pheromones) and other ligands that have a ligand binding domain, namely, by the binding of such molecules to the pheromone receptor polypeptides (or fragments thereof containing a ligand binding site). Binding of the receptors to a ligand can be accomplished by introducing into a biological system in which the proteins bind (e.g., a cell) a molecule that includes a binding domain (putative ligand) in an amount sufficient to detect the binding.
The invention also provides agents such as binding polypeptides which bind to pheromone receptor polypeptides and/or to complexes of pheromone receptor polypeptides and their ligand binding partners. Such binding agents can be used, for example, in screening assays to detect the presence or absence of pheromone receptor polypeptides and complexes of pheromone receptor polypeptides and their ligand binding partners and in purification protocols to isolate pheromone receptor polypeptides and complexes of pheromone receptor polypeptides and their ligand binding partners. Such agents also can be used to inhibit the native activity of the pheromone receptor polypeptides or their ligand binding partners, for example, by binding to such polypeptides, or their binding partners or both.
The invention, therefore, embraces peptide binding agents which, for example, can be antibodies or fragments of antibodies having the ability to selectively bind to pheromone receptor polypeptides. Antibodies include polyclonal and monoclonal antibodies, prepared according to conventional methodology.
Significantly, as is well-known in the art, only a small portion of an antibody molecule, the paratope, is involved in the binding of the antibody to its epitope (see, in general, Clark, W.R. (1986) The Experimental Foundations of Modern Immunology, Wiley & Sons, Inc., New York; Roitt, I. (1991) Essential Immunology, 7th Ed., Blackwell Scientific Publications, Oxford). The pFc' and Fc regions, for example, are effectors of the complement cascade but are not involved in antigen binding. An antibody from which the pFc' region has been enzymatically cleaved, or which has been produced without the pFc' region, designated an F(ab')2 fragment, retains both of the antigen binding sites of an intact antibody. Similarly, an antibody from which the Fc region has been enzymatically cleaved, or which has been produced without the Fc region, designated an Fab fragment, retains one of the antigen binding sites of an intact antibody molecule. Proceeding further, Fab fragments consist of a covalently bound antibody light chain and a portion of the antibody heavy chain denoted Fd. The Fd fragments are the major determinant of antibody specificity (a single Fd fragment may be associated with up to ten different light chains without altering antibody specificity) and Fd fragments retain epitope binding ability in isolation.
Within the antigen-binding portion of an antibody, as is well-known in the art, there are complementarity determining regions (CDRs), which directly interact with the epitope of the antigen, and framework regions (FRs), which maintain the tertiary structure of the paratope (see, in general, Clark, 1986; Roitt, 1991). In both the heavy chain Fd fragment and the light chain of IgG immunoglobulins, there are four framework regions (FR1 through FR4) separated respectively by three complementarity determining regions (CDR1 through CDR3).
The CDRs, and in particular the CDR3 regions, and more particularly the heavy chain CDR3, are largely responsible for antibody specificity.
It is now well-established in the art that the non-CDR regions of a mammalian antibody may be replaced with similar regions of nonspecific or heterospecific antibodies while retaining the epitopic specificity of the original antibody. This is most clearly manifested in the development and use of "humanized" antibodies in which non-human CDRs are covalently joined to human FR and/or Fc/pFc' regions to produce a functional antibody.
Thus, for example, PCT International Publication Number WO 92/04381 teaches the production and use of humanized murine RSV antibodies in which at least a portion of the murine FR regions have been replaced by FR regions of human origin. Such antibodies, including fragments of intact antibodies with antigen-binding ability, are often referred to as "chimeric" antibodies.
Thus, as will be apparent to one of ordinary skill in the art, the present invention also provides for F(ab')2, Fab, Fv and Fd fragments; chimeric antibodies in which the Fc and/or FR and/or CDR1 and/or CDR2 and/or light chain CDR3 regions have been replaced by homologous human or non-human sequences; chimeric F(ab')2 fragment antibodies in which the FR and/or CDR1 and/or CDR2 and/or light chain CDR3 regions have been replaced by homologous human or non-human sequences; chimeric Fab fragment antibodies in which the FR and/or CDR1 and/or CDR2 and/or light chain CDR3 regions have been replaced by homologous human or non-human sequences; and chimeric Fd fragment antibodies in which the FR and/or CDR1 and/or CDR2 regions have been replaced by homologous human or non-human sequences. The present invention also includes so-called single chain antibodies.
Thus, the invention involves polypeptides of numerous sizes and types that bind specifically to pheromone receptor polypeptides, and/or complexes of both pheromone receptor polypeptides and their ligand binding partners. These polypeptides may be derived also from sources other than antibody technology. For example, such polypeptide binding agents can be provided by degenerate peptide libraries which can be readily prepared in solution, in immobilized form or as phage display libraries. Combinatorial libraries also can be synthesized of peptides containing one or more amino acids. Libraries further can be synthesized of peptoids and non-peptide synthetic moieties.
Phage display can be particularly effective in identifying binding peptides useful according to the invention. Briefly, one prepares a phage library (using e.g. M13, fd, or lambda phage), displaying inserts from 4 to about 80 amino acid residues using conventional procedures. The inserts may represent, for example, a completely degenerate or biased array. One then can select phage-bearing inserts which bind to the pheromone receptor polypeptide. This process can be repeated through several cycles of reselection of phage that bind to the pheromone receptor polypeptide. Repeated rounds lead to enrichment of phage bearing particular sequences. DNA sequence analysis can be conducted to identify the sequences of the expressed polypeptides. The minimal linear portion of the sequence that binds to the pheromone receptor polypeptide can be determined. One can repeat the procedure using a biased library containing inserts containing part or all of the minimal linear portion plus one or more additional degenerate residues upstream or downstream thereof. Yeast two-hybrid screening methods also may be used to identify polypeptides that bind to the pheromone receptor polypeptides.
Thus, the pheromone receptor polypeptides of the invention, or a fragment thereof, can be used to screen peptide libraries, including phage display libraries, to identify and select peptide binding partners of the pheromone receptor polypeptides of the invention. Such molecules can be used, as described, for screening assays, for purification protocols, for interfering directly with the functioning of pheromone receptor and for other purposes that will be apparent to those of ordinary skill in the art.
A pheromone receptor polypeptide, or a fragment which contains the ligand binding site, also can be used to isolate naturally-occurring ligands and other binding partners of the receptors of the invention. For example, an isolated pheromone receptor can be used to isolate ligands that bind to the receptor binding site by immobilizing a receptor (or fragment containing the ligand binding site) on a chromatographic medium, such as polystyrene beads, or a filter, and using the immobilized polypeptide to isolate molecules that bind to this affinity matrix in accordance with standard procedures for affinity chromatography.
It will also be recognized that the invention embraces the use of the pheromone receptor cDNA sequences in expression vectors, as well as to transfect host cells and cell lines, be these prokaryotic (e.g., E. coli), or eukaryotic (e.g., CHO cells, COS cells, yeast expression systems and recombinant baculovirus expression in insect cells). Especially useful are oocytes and mammalian cells such as mouse, hamster, pig, goat, primate, etc. They may be of a wide variety of tissue types, and include primary cells and cell lines. The expression vectors require that the pertinent sequence, i.e., those nucleic acids described supra, be operably linked to a promoter.
When administered, the therapeutic compositions of the present invention are administered in pharmaceutically acceptable preparations. Such preparations may routinely contain pharmaceutically acceptable concentrations of salt, buffering agents, preservatives, compatible carriers, supplementary immune potentiating agents such as adjuvants and cytokines and optionally other therapeutic agents.
The therapeutics of the invention can be administered by any conventional route, including injection or by gradual infusion over time. The administration may, for example, be oral, intravenous, intraperitoneal, intramuscular, intracavity, subcutaneous, or transdermal.
When antibodies are used therapeutically, a preferred route of administration is by pulmonary aerosol. Techniques for preparing aerosol delivery systems containing antibodies are well known to those of skill in the art. Generally, such systems should utilize components which will not significantly impair the biological properties of the antibodies, such as the paratope binding capacity (see, for example, Sciarra and Cutie, "Aerosols," in Remington's Pharmaceutical Sciences, 18th edition, 1990, pp 1694-1712; incorporated by reference). Those of skill in the art can readily determine the various parameters and conditions for producing antibody aerosols without resort to undue experimentation. When using antisense preparations of the invention, slow intravenous administration is preferred.
Preparations for parenteral administration include sterile aqueous or non-aqueous solutions, suspensions, and emulsions. Examples of non-aqueous solvents are propylene glycol, polyethylene glycol, vegetable oils such as olive oil, and injectable organic esters such as ethyl oleate. Aqueous carriers include water, alcoholic/aqueous solutions, emulsions or suspensions, including saline and buffered media. Parenteral vehicles include sodium chloride solution, Ringer's dextrose, dextrose and sodium chloride, lactated Ringer's or fixed oils. Intravenous vehicles include fluid and nutrient replenishers, electrolyte replenishers (such as those based on Ringer's dextrose), and the like. Preservatives and other additives may also be present such as, for example, antimicrobials, anti-oxidants, chelating agents, and inert gases and the like.
The preparations of the invention are administered in effective amounts. An effective amount is that amount of a pharmaceutical preparation that alone, or together with further doses, produces the desired response in the condition being treated, e.g., modifying fertility or pheromone-mediated behaviors that are related to reproduction or aggression. For example, this can involve the use of the compounds of the invention as pesticides to slow or halt insect or rodent behaviors that result in reproduction. Alternatively, this can involve the use of the compounds of the invention as agents for controlling fertility in animals (e.g., livestock, domestic animals), by providing compounds which inhibit or stimulate the behaviors in such animals that result in reproduction or aggression. This can be monitored by routine methods, e.g., observing the behavior in the animal (vertebrate or invertebrate) recipient.
The invention also contemplates gene therapy, e.g., to prepare an animal model for studying the conditions and behaviors (e.g., fertility, aggression) that are pheromone receptor-mediated. The procedure for performing ex vivo gene therapy is outlined in U.S. Patent 5,399,346 and in exhibits submitted in the file history of that patent, all of which are publicly available documents. In general, it involves introduction in vitro of a functional copy of a gene into a cell(s) of a subject which contains a defective copy of the gene, and returning the genetically engineered cell(s) to the subject. The functional copy of the gene is under operable control of regulatory elements which permit expression of the gene in the genetically engineered cell(s). Numerous transfection and transduction techniques as well as appropriate expression vectors are well known to those of ordinary skill in the art, some of which are described in PCT application WO 95/00654. In vivo gene therapy using vectors such as adenovirus, retroviruses, herpes virus, and targeted liposomes also is contemplated according to the invention.
The invention further provides efficient methods of identifying pharmacological agents or lead compounds for agents active at the level of a pheromone receptor or pheromone receptor fragment modulatable cellular function. In particular, such functions include ligand binding activity. Generally, the screening methods involve assaying for activation of pheromone receptors or assaying for compounds which interfere with a pheromone receptor activity such as pheromone receptor binding to its cognate ligand. Such methods are adaptable to automated, high throughput screening of compounds. The target therapeutic indications for pharmacological agents detected by the screening methods that block pheromone receptor activity are limited only in that the target cellular function be subject to modulation by alteration of the formation of a complex comprising a pheromone receptor polypeptide or fragment thereof and one or more natural pheromone receptor ligands. Target indications include cellular processes modulated by pheromone receptor signal transduction following receptor-ligand binding.
A wide variety of assays for pharmacological agents are provided, including labeled in vitro protein-protein binding assays, electrophoretic mobility shift assays, immunoassays, cell-based assays such as two- or three-hybrid screens, expression assays, activation of G-proteins, etc. For example, three-hybrid screens are used to rapidly examine the effect of transfected nucleic acids on the intracellular binding of pheromone receptor or pheromone receptor fragments to specific extracellular targets (e.g., ligands in biological samples, such as urine or vaginal fluid, or in combinatorial libraries).
Pheromone receptor fragments used in the methods, when not produced by a transfected nucleic acid, are added to an assay mixture as an isolated polypeptide. The assay can be used to screen putative ligands for their ability to bind to the receptor. Pheromone receptor polypeptides preferably are produced recombinantly, although such polypeptides may be isolated from biological extracts. Recombinantly produced pheromone receptor polypeptides include chimeric proteins comprising a fusion of a pheromone receptor protein with another polypeptide. For example, a polypeptide fused to a pheromone receptor polypeptide or fragment may also provide means of readily detecting the fusion protein, e.g., by immunological recognition or by fluorescent labeling.
In addition to the pheromone receptor, a screening assay mixture includes a binding partner for the receptor, e.g., a naturally occurring ligand that is capable of binding to the pheromone receptor or, alternatively, is comprised of an analog which mimics the pheromone receptor binding properties of the naturally occurring ligand for purposes of the assay. The screening assay mixture also comprises a candidate pharmacological agent (e.g., a putative receptor agonist or antagonist). Typically, a plurality of assay mixtures are run in parallel with different agent concentrations to obtain a different response to the various concentrations.
Typically, one of these concentrations serves as a negative control, i.e., at zero concentration of agent or at a concentration of agent below the limits of assay detection.
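The parallel design described above, with one mixture at zero agent concentration serving as the negative control, can be sketched as follows. This is a minimal Python illustration under stated assumptions: the ten-fold dilution scheme, the function names and the signal values in the usage note are all hypothetical and are not taken from the disclosure.

```python
def concentration_series(top, fold, n_points):
    """Return a zero-agent control followed by n_points serial dilutions.

    top      -- highest agent concentration (any consistent unit)
    fold     -- dilution factor between adjacent points (e.g. 10)
    n_points -- number of non-zero concentrations
    The result is sorted in ascending order, control first.
    """
    if top <= 0 or fold <= 1 or n_points < 1:
        raise ValueError("need top > 0, fold > 1 and n_points >= 1")
    dilutions = [top / fold ** i for i in range(n_points)]
    return [0.0] + sorted(dilutions)


def percent_of_control(signals):
    """Express each binding signal as a percentage of the zero-agent control.

    signals[0] must be the reading from the negative-control mixture.
    """
    control = signals[0]
    if control <= 0:
        raise ValueError("control signal must be positive")
    return [100.0 * s / control for s in signals]
```

For instance, concentration_series(100.0, 10, 3) gives the four mixtures 0, 1, 10 and 100 (in whatever unit was chosen), and hypothetical readings of 200, 150, 100 and 50 would normalize to 100%, 75%, 50% and 25% of control.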
Candidate agents encompass numerous chemical classes, although typically they are organic compounds. Preferably, the candidate pharmacological agents are small organic compounds, i.e., those having a molecular weight of more than 50 yet less than about 2500, preferably less than about 1000 and, more preferably, less than about 500. Candidate agents comprise functional chemical groups necessary for structural interactions with polypeptides and/or nucleic acids, and typically include at least an amine, carbonyl, hydroxyl or carboxyl group, preferably at least two of the functional chemical groups and more preferably at least three of the functional chemical groups. The candidate agents can comprise cyclic carbon or heterocyclic structures and/or aromatic or polyaromatic structures substituted with one or more of the above-identified functional groups. Candidate agents also can be biomolecules such as peptides, saccharides, fatty acids, sterols, isoprenoids, purines, pyrimidines, derivatives or structural analogs of the above, or combinations thereof and the like. Where the agent is a nucleic acid, the agent typically is a DNA or RNA molecule, although modified nucleic acids as defined herein are also contemplated.
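The molecular weight ranges and functional-group preferences stated above lend themselves to a simple pre-filter for a compound library. The Python sketch below is illustrative only: the Candidate record, the tier labels and the treatment of the "about" limits as hard cutoffs are assumptions made for the example.

```python
from dataclasses import dataclass, field

# Functional groups named in the text.
CORE_GROUPS = {"amine", "carbonyl", "hydroxyl", "carboxyl"}


@dataclass
class Candidate:
    """Hypothetical record for one library compound."""
    name: str
    mol_weight: float
    functional_groups: set = field(default_factory=set)


def weight_tier(mol_weight):
    """Map a molecular weight onto the ranges given in the text.

    ASSUMPTION: the 'about' limits are applied as exact cutoffs.
    Returns None when the weight is outside the 50 to 2500 window.
    """
    if not 50 < mol_weight < 2500:
        return None
    if mol_weight < 500:
        return "more preferred"
    if mol_weight < 1000:
        return "preferred"
    return "acceptable"


def core_group_count(candidate):
    """Number of distinct named functional groups the candidate carries."""
    return len(candidate.functional_groups & CORE_GROUPS)


def passes_filter(candidate, min_groups=1):
    """True if the weight is in range and enough named groups are present."""
    return (weight_tier(candidate.mol_weight) is not None
            and core_group_count(candidate) >= min_groups)
```

Raising min_groups to 2 or 3 corresponds to the "preferably at least two" and "more preferably at least three" functional groups of the text.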
Candidate agents are obtained from a wide variety of sources including libraries of synthetic or natural compounds. For example, numerous means are available for random and directed synthesis of a wide variety of organic compounds and biomolecules, including expression of randomized oligonucleotides, synthetic organic combinatorial libraries, phage display libraries of random peptides, and the like. Alternatively, libraries of natural compounds in the form of bacterial, fungal, plant and animal extracts are available or readily produced. Additionally, natural and synthetically produced libraries and compounds can readily be modified through conventional chemical, physical, and biochemical means. Further, known pharmacological agents may be subjected to directed or random chemical modifications such as acylation, alkylation, esterification, amidification, etc. to produce structural analogs of the agents.
A variety of other reagents also can be included in the mixture. These include reagents such as salts, buffers, neutral proteins (e.g., albumin), detergents, etc. which may be used to facilitate optimal protein-protein and/or protein-nucleic acid binding. Such a reagent may also reduce non-specific or background interactions of the reaction components. Other reagents that improve the efficiency of the assay such as protease inhibitors, nuclease inhibitors, antimicrobial agents, and the like may also be used.
The mixture of the foregoing assay materials is incubated under conditions whereby, but for the presence of the candidate pharmacological agent, the pheromone receptor polypeptide specifically binds the cellular binding target, a portion thereof or analog thereof. The order of addition of components, incubation temperature, time of incubation, and other parameters of the assay may be readily determined. Such experimentation merely involves optimization of the assay parameters, not the fundamental composition of the assay. Incubation temperatures typically are between 4°C and 40°C. Incubation times preferably are minimized to facilitate rapid, high throughput screening, and typically are between 0.1 and 10 hours.
After incubation, the presence or absence of specific binding between the pheromone receptor polypeptide and one or more binding targets is detected by any convenient method available to the user. For cell free binding type assays, a separation step is often used to separate bound from unbound components. The separation step may be accomplished in a variety of ways. Conveniently, at least one of the components is immobilized on a solid substrate, from which the unbound components may be easily separated. The solid substrate can be made of a wide variety of materials and in a wide variety of shapes, e.g., microtiter plate, microbead, dipstick, resin particle, etc. The substrate preferably is chosen to maximize signal to noise ratios, primarily to minimize background binding, as well as for ease of separation and cost.
Separation may be effected, for example, by removing a bead or dipstick from a reservoir, emptying or diluting a reservoir such as a microtiter plate well, or rinsing a bead, particle, chromatographic column or filter with a wash solution or solvent. The separation step preferably includes multiple rinses or washes. For example, when the solid substrate is a microtiter plate, the wells may be washed several times with a washing solution, which typically includes those components of the incubation mixture that do not participate in specific binding, such as salts, buffer, detergent, non-specific protein, etc. Where the solid substrate is a magnetic bead, the beads may be washed one or more times with a washing solution and isolated using a magnet.
Detection may be effected in any convenient way for cell-based assays such as two- or three-hybrid screens. The transcript resulting from a reporter gene transcription assay of pheromone receptor polypeptide binding to a target molecule typically encodes a directly or indirectly detectable product, e.g., β-galactosidase activity, luciferase activity, and the like. A wide variety of cell based assays for G-protein coupled receptors could also be employed for detection of molecules that stimulate (agonists) pheromone receptors or block (antagonists) that stimulation by natural ligands or agonists. Pheromone receptor polypeptides or chimeric receptors composed only in part of a pheromone receptor could be employed in these assays. The chimeric receptors might, for example, contain part of another G-protein coupled receptor such that binding of a ligand to the pheromone receptor binding domain results in coupling to a particular G-protein where activation could be easily assayed. For cell free binding assays, one of the components usually comprises, or is coupled to, a detectable label. A wide variety of labels can be used, such as those that provide direct detection (e.g., radioactivity, luminescence, optical or electron density, etc.) or indirect detection (e.g., epitope tag such as the FLAG epitope, enzyme tag such as horseradish peroxidase, etc.). The label may be bound to a pheromone receptor binding partner (ligand), or incorporated into the structure of the binding partner.
A variety of methods may be used to detect the label, depending on the nature of the label and other assay components. For example, the label may be detected while bound to the solid substrate or subsequent to separation from the solid substrate. Labels may be directly detected through optical or electron density, radioactive emissions, nonradioactive energy transfers, etc. or indirectly detected with antibody conjugates, streptavidin-biotin conjugates, etc. Methods for detecting the labels are well known in the art.
The invention provides pheromone receptor-specific binding agents, methods of identifying and making such agents, and their use in diagnosis, therapy and pharmaceutical development, including the development of pesticides and other agents for controlling fertility and reproduction (or related behaviors) in animals. For example, pheromone receptor-specific pharmacological agents are useful in a variety of diagnostic and therapeutic applications, especially where disease or disease prognosis is associated with improper utilization of a pathway involving pheromone receptor. Novel pheromone receptor-specific binding agents include pheromone receptor-specific antibodies and other natural intracellular binding agents identified with assays such as two hybrid screens, and non-natural intracellular binding agents identified in screens of chemical libraries and the like.
In general, the specificity of pheromone receptor binding to a binding agent is shown by binding equilibrium constants. Targets which are capable of selectively binding a pheromone receptor polypeptide preferably have binding equilibrium constants of at least about 10⁷ M⁻¹, more preferably at least about 10⁸ M⁻¹, and most preferably at least about 10⁹ M⁻¹. A wide variety of cell based and cell free assays may be used to demonstrate pheromone receptor-specific binding. Cell based assays include one, two and three hybrid screens, assays in which pheromone receptor-mediated transcription is inhibited or increased, activation of G-proteins, etc. Cell free assays include pheromone receptor-protein binding assays, immunoassays, etc. Other assays useful for screening agents which bind pheromone receptor polypeptides include fluorescence resonance energy transfer (FRET), and electrophoretic mobility shift analysis (EMSA).
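To make the binding equilibrium constants above concrete, the following Python sketch reads them as association constants in the range of about 10^7 to 10^9 per molar, converts them to dissociation constants, and estimates receptor occupancy. It assumes simple reversible 1:1 binding with the free ligand concentration known; that model, the function names and the tier labels are illustrative assumptions rather than part of the disclosure.

```python
def kd_from_ka(ka_per_molar):
    """Dissociation constant (M) as the reciprocal of the association constant."""
    if ka_per_molar <= 0:
        raise ValueError("association constant must be positive")
    return 1.0 / ka_per_molar


def fraction_bound(ka_per_molar, free_ligand_molar):
    """Fractional receptor occupancy for simple 1:1 binding.

    occupancy = Ka*[L] / (1 + Ka*[L]), with [L] the free ligand concentration.
    """
    x = ka_per_molar * free_ligand_molar
    return x / (1.0 + x)


def specificity_tier(ka_per_molar):
    """Label a constant using the three thresholds given in the text."""
    if ka_per_molar >= 1e9:
        return "most preferred"
    if ka_per_molar >= 1e8:
        return "more preferred"
    if ka_per_molar >= 1e7:
        return "preferred"
    return None
```

Under this reading, an association constant of 10^7 per molar corresponds to a dissociation constant of 100 nM, and a receptor exposed to 100 nM free ligand would be half occupied.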
Various techniques may be employed for introducing nucleic acids of the invention into cells, depending on whether the nucleic acids are introduced in vitro or in vivo in a host. Such techniques include transfection of nucleic acid-CaPO4 precipitates, transfection of nucleic acids associated with DEAE, transfection with a retrovirus including the nucleic acid of interest, liposome mediated transfection, and the like. For certain uses, it is preferred to target the nucleic acid to particular cells. In such instances, a vehicle used for delivering a nucleic acid of the invention into a cell (e.g., a retrovirus, or other virus; a liposome) can have a targeting molecule attached thereto. For example, a molecule such as an antibody specific for a surface membrane protein on the target cell or a ligand for a receptor on the target cell can be bound to or incorporated within the nucleic acid delivery vehicle. For example, where liposomes are employed to deliver the nucleic acids of the invention, proteins which bind to a surface membrane protein associated with endocytosis may be incorporated into the liposome formulation for targeting and/or to facilitate uptake. Such proteins include capsid proteins or fragments thereof tropic for a particular cell type, antibodies for proteins which undergo internalization in cycling, proteins that target intracellular localization and enhance intracellular half life, and the like. Polymeric delivery systems also have been used successfully to deliver nucleic acids into cells, as is known by those skilled in the art. Such systems even permit oral delivery of nucleic acids.
Preparation and analysis of single cell cDNAs
Male mouse (C57BL/6J) VNOs were minced, incubated in Trypsin-EDTA (Gibco-BRL/LTI, Rockville, Maryland), and triturated to obtain dissociated cells. The cells were centrifuged (1000 RPM, 5 min) and resuspended in phosphate buffered saline + 0.1% bovine serum albumin. Individual cells that appeared to be neurons were transferred to separate tubes with a microcapillary pipet.
cDNAs were prepared from each cell and amplified according to Brady and Iscove (Methods in Enzymology, 1993, 225:611-621) with minor modifications. Briefly, cDNAs were prepared from the 3' ends of mRNAs by reverse transcription with an oligo (dT) primer, and a poly dA stretch was added to each cDNA with terminal transferase. The cDNAs were then amplified by PCR with one of two primers, AL1 (ATTGGATCCAGGCCGCTCTGGACAAAATATGAATTC(T)) (SEQ ID NO. 56) (Dulac and Axel, Cell, 1995, 83:195-206) or (GGCACATGGACGAAATCTTGGTACTCTTCAGAATTC(T)) (SEQ ID NO. 57), and Taq polymerase [Amplitaq LD ("ALD") or Amplitaq Stoffel Fragment ("ASF") (Perkin Elmer, Norwalk, CT)].
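Because the amplified cDNA is later digested with Eco RI before cloning, it can be useful to check where restriction sites fall within the two amplification primers. The Python sketch below does this for the primer sequences as printed above, read with the line-wrap spaces removed and without the trailing (T) tail; that reading of the sequences, the label "primer 2" and the function name are assumptions of the example. The recognition sequences GAATTC (EcoRI) and GGATCC (BamHI) are the standard ones.

```python
# Primer sequences as printed in the text, oligo(dT) tail omitted.
PRIMERS = {
    "AL1": "ATTGGATCCAGGCCGCTCTGGACAAAATATGAATTC",
    "primer 2": "GGCACATGGACGAAATCTTGGTACTCTTCAGAATTC",  # hypothetical label
}

# Standard recognition sequences for two common restriction enzymes.
SITES = {"EcoRI": "GAATTC", "BamHI": "GGATCC"}


def find_sites(sequence):
    """Return {enzyme: [0-based start positions]} for each site in SITES."""
    hits = {}
    for enzyme, site in SITES.items():
        positions = []
        start = sequence.find(site)
        while start != -1:
            positions.append(start)
            start = sequence.find(site, start + 1)  # allow overlapping hits
        hits[enzyme] = positions
    return hits


if __name__ == "__main__":
    for name, seq in PRIMERS.items():
        print(name, len(seq), find_sites(seq))
```

Under this reading, both primers end in an EcoRI site immediately before the (T) tail, which is consistent with the later Eco RI digestion of the amplified cDNA.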
Aliquots of each cDNA sample were electrophoresed on agarose gels and blotted onto nylon membranes (Hybond N+, Amersham, Piscataway, NJ) (Ausubel, F., et al., Current Protocols in Molecular Biology, 1988, John Wiley & Sons NY, NY; Sambrook, J., et al., Molecular Cloning: A Laboratory Manual, Second Edition, Cold Spring Harbor Laboratory Press, 1989). The blots were hybridized at 55° or 70°C in Hyb Buffer (0.5 M sodium phosphate buffer (pH 7.3), 4% SDS, 1% bovine serum albumin (BSA)) with 32P-labeled probes prepared by random priming (Prime-It II, Stratagene, La Jolla, CA).
Construction and screening of single cell cDNA libraries
An aliquot of cDNA sample VN14 was digested with Eco RI and gel-isolated fragments of 0.1-1.5 kb were cloned into λZapII (Ausubel, F., et al., Current Protocols in Molecular Biology, 1988, John Wiley & Sons NY, NY; Sambrook, J., et al., Molecular Cloning: A Laboratory Manual, Second Edition, Cold Spring Harbor Laboratory Press, 1989). Two thousand library clones were plated at low density. Replica filter lifts were hybridized at 75°C (in Hyb Buffer containing 2 µg/ml poly (dT)24 and 1 µg/ml of random dA-dT 20-mers) to 32P-labeled probes (~2.5 x 10⁸ CPM/µg; 5 x 10⁶ CPM/ml) prepared by PCR of different single cell cDNA samples. Clones that hybridized to only a VN14 probe were isolated, and a probe prepared from the insert of each was hybridized to blots of selected single cell cDNAs. Clones that hybridized to only VN14 cDNAs were sequenced.
Isolation and analysis of VR cDNA clones
sc153, one VN14+VN2- clone from the VN14 library, was used as probe to screen a mouse VNO cDNA library ('λVNO') (Berghard, A., et al., J Neurosci, 1996, 16:909-918) and a mouse genomic DNA library (Stratagene, La Jolla, CA) (70°C, Hyb buffer). Hybridizing clones were found only in the genomic library. A fragment containing 2 kb upstream of sc153 was isolated from one genomic clone (15361) and used to screen λVNO (55°C, Hyb Buffer). The region (D10-TM7) of one clone (D10) that showed homology to TM7 of the CSR (SEQ ID NO. 59) was then used to screen λVNO (55°C, Hyb Buffer), yielding a variety of VR cDNA clones.
Additional clones were obtained from λVNO using probes prepared from clones previously isolated, or from PCR products obtained by amplification of mouse genomic DNA or VNO cDNA with degenerate primers (Buck, L., et al., Cell, 1991, 65:175-187) matching conserved motifs in the VRs. Some PCR products were also cloned into pCR2.1 (Invitrogen, Carlsbad, CA) and sequenced.
Analysis of VR mRNAs by RT-PCR
Random-primed cDNAs prepared from male or female C57BL/6J mouse VNO RNAs (or VR cDNA clones) were used in PCR reactions with degenerate primers (Buck and Axel, Cell, 1991, 65:175-187) matching conserved VR motifs to amplify VR sequences corresponding to amino acids 33-772 in VR1 (SEQ ID NO. 2). Nested PCR was performed with a 1/1000 dilution of the first PCR reaction and primer pairs matching regions of putative exons 1 and 6 in specific VR cDNA clones. Blots prepared from size-fractionated, nested PCR products were hybridized (70°C, Hyb buffer containing 100 µg/ml herring sperm DNA (Sigma, St Louis, MO)) to probes prepared from the PCR products of the cDNA clones.
Northern and Southern blots and genomic library screens
Northern Blots: One µg of PolyA+ RNA prepared from mouse VNO and OE, or purchased from Clontech (other tissue RNAs), was size fractionated on formaldehyde gels, and blotted (see above) (Berghard and Buck, J Neurosci, 1996, 16:909-918). The blot was hybridized (70°C, Hyb Buffer) with a 32P-labeled probe prepared from the regions of cDNAs VR1, VR2, VR4, and VR15 corresponding to that encoding amino acids 33-772 in VR1 (SEQ ID NO. 1).
Southern Blots: 5 µg of genomic DNA prepared from C57BL/6J mouse liver was digested with Eco RI or Hind III, size fractionated, and blotted (Ressler et al, Cell, 1993, 73:597-609). The blots were hybridized (70°C, Hyb buffer containing sperm DNA (see above)) to probes prepared from 3' untranslated segments of different VR cDNA clones [VR2 (nt. 2607-2961 of SEQ ID NO. 3), VR3 (nt. 2505-2907 of SEQ ID NO. 5), and VR15 (nt. 3239-3689 of SEQ ID NO. 29)]. A VR4 probe was also used, which gave the same results as the highly related VR15 probe.
Genomic library screens to determine VR gene number: A mouse genomic library was screened separately at 70°C or 55°C (see above) with different 32P-labeled probes. Probe 1: a mix of segments of cDNAs VR1 (SEQ ID NO. 1), VR2 (SEQ ID NO. 3), VR4 (SEQ ID NO. 7), and VR15 (SEQ ID NO. 29) encoding the region corresponding to amino acids 619-772 of VR1 (SEQ ID NO. 2). Probes 2-6: Segments of VR genes obtained from mouse genomic DNA by PCR with degenerate primers matching conserved VR sequence motifs. The PCR segments corresponded to the following amino acid stretches in VR1 (SEQ ID NO. 2): amino acids 191-397, 565-825, 637-825, 637-804, and 619-784. For example, degenerate oligonucleotide primer pairs used included:
for amino acids 191-397:
5' primer = (GCT)TI(CT)A(CT)CA(AG)(AG)TIGCI(AC)IAA(AG)GA(CT)AC (SEQ ID NO. 60),
3' primer = G(CT)(AG)T(GT)IGCI(AG)(CT)I(AG)C(AG)T(AG)IACI(AG)C(AG)TT (SEQ ID NO. 61);
for amino acids 565-825:
5' primer = (AC)(AG)ITG(CT)CCI(GT)AIIA(CT)(AC)A(AG)TA(CT)GCIAA (SEQ ID NO. 62),
3' primer = GIC(GT)IA(CT)IA(AG)IATIA(CT)(AG)TAI(AC)(AT)(CT)TTIGGIAC (SEQ ID NO. 63);
for amino acids 637-825:
5' primer = ATI(AT)(GC)I(CT)TI(AG)TITT(CT)TG(CT)TT(CT)(CT)TITG (SEQ ID NO. 64),
3' primer = GIC(GT)IA(CT)IA(AG)IATIA(CT)(AG)TAI(AC)(AT)(CT)TTIGGIAC (SEQ ID NO. 63);
for amino acids 637-804:
5' primer = ATI(AT)(GC)I(CT)TI(AG)TITT(CT)TG(CT)TT(CT)(CT)TITG (SEQ ID NO. 64),
3' primer = (AG)IATI(GC)(AT)(AG)AAIA(CT)(CT)TCIACI(AG)CIACCAT (SEQ ID NO. 65);
and for amino acids 619-784:
5' primer = GA(CT)ACICCIATIGTIAA(AG)GCIAA(CT)AA (SEQ ID NO. 66),
3' primer = AAIGTIA(CT)CCAIACI(GC)(AT)(AG)CA(AG)AAIAC (SEQ ID NO. 67),
wherein all primers are written in the 5' to 3' direction and I = inosine.
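The primer notation above, with alternative bases in parentheses and I for inosine, can be read mechanically. The following is a minimal sketch, not part of the described method: the function names are hypothetical, and treating inosine as a single non-degenerate position is an assumption made only for counting. It parses the 3' primer of SEQ ID NO. 67 and reports its length and the number of distinct sequences in the mix.

```python
import re
from itertools import product
from math import prod

# 3' primer for amino acids 619-784 (SEQ ID NO. 67), as printed in the text.
PRIMER_SEQ_ID_67 = "AAIGTIA(CT)CCAIACI(GC)(AT)(AG)CA(AG)AAIAC"


def parse_primer(primer: str) -> list[str]:
    """Return the allowed symbols at each position, e.g. 'A(CT)I' -> ['A', 'CT', 'I']."""
    tokens = re.findall(r"\(([ACGT]+)\)|([ACGTI])", primer)
    # Rebuild the string from the tokens to reject input the pattern skipped over.
    rebuilt = "".join(f"({group})" if group else single for group, single in tokens)
    if rebuilt != primer:
        raise ValueError(f"unparseable primer: {primer!r}")
    return [group or single for group, single in tokens]


def degeneracy(positions: list[str]) -> int:
    """Number of distinct sequences in the mix (inosine counted as one symbol)."""
    return prod(len(choices) for choices in positions)


def expand(positions: list[str]) -> list[str]:
    """Enumerate every sequence in the degenerate mix."""
    return ["".join(combo) for combo in product(*positions)]


positions = parse_primer(PRIMER_SEQ_ID_67)
print(len(positions), degeneracy(positions), positions.count("I"))
```

Under these assumptions the primer is a 26-mer with five two-fold degenerate positions and five inosines, so the mix contains 32 sequences.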
In situ hybridization
In situ hybridization was performed according to Schaeren-Wiemers and Gerfin-Moser (Histochemistry, 1993, 100:431-440) with sequential 16 micron sections of male or female VNOs. Digoxigenin-labeled cRNA probes were prepared from the same 3' untranslated regions of VR cDNAs as used for the genomic Southern blots. Sections were counter-stained with Hoechst 33258, which labels nuclei. The numbers of Gαo- or Gαi2-labeled cells (or cells labeled with VR probes) were determined by counting the number of nuclei in labeled regions. The total number of cells was considered to be the sum of Gαo+ and Gαi2+ cells in adjacent sections.
Chromosome mapping of VR genes
Southern blots of genomic DNA from C57BL/6J and Mus spretus (Jackson Labs) digested with different restriction enzymes were prepared and probed with specific VR cDNA probes as described above. Southern blots of Eco RI-digested, size fractionated genomic DNAs from 94 different backcross mice (M. spretus x (M. spretus x C57BL/6J)) were purchased from Jackson Labs. These blots were hybridized to probes prepared from 3' untranslated segments of the VR2 or VR4 (see above) cDNA at 70°C and washed (see above). Polymorphic bands were typed as either M. spretus or M. spretus/C57BL/6J. The data were sent to the Jackson Laboratory Backcross DNA Mapping Panel Resource for determination of the chromosomal locations of the polymorphic fragments. Additional information was obtained via Internet from Jackson Laboratory Mouse Genome Informatics.
Cloning of a gene differentially expressed in Gαo+ VNs
Different members of the OR and VNR families are expressed in different neurons in the OE and Gαi2+ zone of the VNO, respectively. It therefore appeared likely that the same would be true of sensory receptors expressed by Gαo+ VNs. The differential screening of cDNA libraries with cDNA probes prepared from a few neurons can be used to identify genes expressed in one neuron, but not another (Buck, L., et al, Annu. Rev. Neurosci., 1996, 19:517-544). Using PCR, this can be accomplished with single cells (Brady, G., et al., Methods in Enzymology, 1993, 225:611-621; Dulac, C., et al., Cell, 1995, 83:195-206).
To search for genes encoding receptors expressed by Gαo+ VNs, we looked for genes expressed in one Gαo+ VN, but not another, using the PCR-based differential screening approach.
In initial experiments, we isolated a series of mouse VNs, prepared cDNAs from the 3' ends of mRNAs present in each, and amplified the single-cell cDNA fragments by PCR. Many of the amplified, single-cell cDNA samples hybridized to an OMP probe, confirming their derivation from VNs (Berghard et al, Proc. Natl. Acad. Sci. USA, 1996, 93:2365-2369).
With one exception, Gαo and Gαi2 probes hybridized to different OMP+ samples, allowing us to identify samples that were derived from Gαo+ VNs.
We next prepared a library from one of the Gαo+ single-cell cDNA samples (VN14), and isolated clones that hybridized to a probe prepared from VN14, but not to a probe prepared from another Gαo+ sample (VN2). We identified 3 VN14+VN2- clones, which differed in size, but were otherwise identical in sequence. None contained an open reading frame, which was not surprising since, in the method used, the amplified cDNAs are only 400-800 bp long, and are derived from the 3' ends of mRNAs (Brady and Iscove, Methods in Enzymology, 1993, 225:611-621).
We next hybridized one of the VN14+VN2- clones (sc153) to the original panel of single-cell cDNAs. sc153 hybridized to VN14, but not to any of the other cDNA samples.
Consistent with this result, sc153 hybridized to only a small percentage (~0.3%) of VNs in VNO tissue sections.
Using sc153 as probe, we were able to isolate a sc153+ clone from a mouse genomic library which contained ~2 kb of DNA 5' to the sc153 sequence. Using this 2 kb fragment as probe, we isolated a matching clone (D10) from the VNO cDNA library. Sequence analysis showed that sc153 and D10 were derived from the same gene, but that the D10 cDNA was truncated at the 3' end and did not contain the final 685 bp of sequence present in sc153. Like sc153, D10 hybridized to only a small percentage of VNs in VNO tissue sections.
The 5' end of the D10 cDNA contained a short open reading frame, which encoded a protein fragment with homology to transmembrane domain 7 (TM7) of the calcium sensing receptor (CSR), a G protein-coupled receptor (GPCR) (Brown et al, Nature, 1993, 366:575-580).
When the TM7-related region of D10 (D10-TM7) was hybridized at reduced stringency (55°C) to the original panel of single-cell cDNAs, it labeled many of the Gαo+ samples, but none of the Gαi2+ ones (except the one that was also Gαo+, and was probably derived from two cells). Since D10 labeled only a small percentage of VNs in tissue sections under high stringency conditions, this suggested that many Gαo+ neurons express a gene related to D10, but not identical to it.
A novel multigene family encoding VNO receptors
Hybridization of D10-TM7 to the VNO cDNA library at reduced stringency yielded a number of related cDNA clones (e.g. VR1-VR3, SEQ ID NOs. 1-6). Additional related cDNAs were obtained by RT-PCR with degenerate primers (e.g. VR6-VR7, SEQ ID NOs. 11-14), or by screening the VNO cDNA library with a PCR product obtained from genomic DNA (e.g., VR4, VR5, SEQ ID NOs. 7-10).
These cDNAs encode a novel family of proteins, which are members of the G protein-coupled receptor (GPCR) superfamily (Figure 1). Like other GPCRs, these VNO receptors (VRs) have 7 hydrophobic stretches that may serve as membrane spanning domains. Only 287 of 850 residues are identical in all of the molecules shown in Figure 1, indicating that the family is diverse. The VRs are related to two other types of GPCR, the calcium sensing receptor (CSR) and the metabotropic glutamate receptors (mGluRs) (Tanabe, Y., et al., Neuron, 1992, 8:169-179; Brown, E., et al., Nature, 1993, 366:575-580). The most highly related molecule is the CSR; for example, VR1 is 31% identical to rat CSR (Riccardi et al., Proc. Natl. Acad. Sci. USA, 1995, 92:131-135), with the highest homology residing in the TM1-TM7 region (44%) (Figure 1). However, the VRs comprise a distinct family of receptors, which share novel sequence motifs, and are more related to one another than they are to other receptors. For example, two divergent VRs, VR1 (SEQ ID NO. 1, 2) and VR4 (SEQ ID NO. 7, 8), are 70% identical in TM1-TM7, and 48% identical overall.
The VRs are unusual among GPCRs in having an extremely long N-terminal extracellular domain (Figures 1 and 2). This feature is shared by the CSR and mGluRs, and by an unrelated class of GPCRs that includes several receptors for glycoprotein hormones (Segaloff, D., et al., Oxf. Rev. Reprod. Biol., 1992, 14:141-168). Importantly, the VRs are very different from both ORs and VNRs, which are also GPCRs (Buck, L., et al., Cell, 1991, 51:127-133; Dulac, C., et al., Cell, 1995, 83:195-206). VRs share none of the characteristic sequence motifs of ORs or VNRs. In addition, the size of the N-terminal extracellular domain of VRs (557-565 amino acids) far exceeds that of ORs and VNRs (~12-28 amino acids) (Figure 2). The VRs are most variable in the N-terminal domain (25% identical residues compared to 57% in TM1-TM7). In the structurally-related mGluRs, the ligand binding site is thought to reside in the large N-terminal domain (O'Hara et al., Neuron, 1993, 11:41-52; Takahashi et al, J. Biol. Chem., 1993, 268:19341-19345). If this is also true of VRs, the accentuated diversity of the N-terminal domain may reflect an ability to recognize diverse pheromonal ligands.
Most of the VR cDNAs that we analyzed appeared to belong to one of three subfamilies of highly related molecules. For example, VR1 (SEQ ID NOs. 1, 2), VR2 (SEQ ID NOs. 3, 4), and VR3 (SEQ ID NOs. 5, 6) are very similar, as are VR4 (SEQ ID NOs. 7, 8) and VR5 (SEQ ID NOs. 9, 10), and VR6 (SEQ ID NOs. 11, 12) and VR7 (SEQ ID NOs. 13, 14) (Figure 1).
Nonetheless, our results indicate that all of these cDNAs were derived from different genes.
First, all cDNAs were sequenced on both strands to rule out sequencing errors. Second, the RNA used for library construction and PCR came from an inbred mouse strain (C57BL/6J), so the cDNAs cannot be allelic variants. Third, the error rates of reverse transcriptase (or Taq polymerase) cannot account for the extent to which the cDNAs differ. For example, VR4 (SEQ ID NOs. 7, 8) and VR5 (SEQ ID NOs. 9, 10) cDNAs are 99% identical in nucleotide sequence, but the reverse transcriptase used to prepare them has an error rate of only 3.6 x 10^-5/bp (Ji, J., et al., Biochemistry, 1992, 31:954-958).
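The third argument can be made concrete with a short calculation. This is a rough sketch under stated assumptions: the error rate is taken as 3.6 x 10^-5 per base pair, errors are assumed to arise independently in each of the two cDNAs, and Taq errors are ignored. The variable names are illustrative only.

```python
# Values taken from the text.
RT_ERROR_RATE_PER_BP = 3.6e-5   # reverse transcriptase error rate (assumed exponent -5)
NUCLEOTIDE_IDENTITY = 0.99      # VR4 vs VR5 cDNA identity

# Observed differences per 1000 aligned base pairs.
observed_diffs_per_kb = (1 - NUCLEOTIDE_IDENTITY) * 1000

# Expected differences per 1000 bp if both cDNAs were copies of one mRNA
# and each copy independently accumulated reverse transcriptase errors.
expected_diffs_per_kb = 2 * RT_ERROR_RATE_PER_BP * 1000

fold_excess = observed_diffs_per_kb / expected_diffs_per_kb
print(f"observed: {observed_diffs_per_kb:.1f} per kb")
print(f"expected from RT errors: {expected_diffs_per_kb:.3f} per kb")
print(f"fold excess: {fold_excess:.0f}")
```

On these assumptions the observed divergence (about 10 differences per kb) exceeds what copying errors would produce (well under 0.1 per kb) by more than a hundredfold.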
Variant forms of VR mRNA
Many of the VRs we characterized lacked a segment of the N-terminal domain present in other VRs. Invariably, the missing segment corresponded to a region of the human CSR encoded by a single exon, or pair of exons (Pollak, M., et al., Cell, 1993, 73:1297-1303). We also found several different VR cDNAs that contained a stretch of noncoding sequence at a site corresponding to a CSR exon-intron boundary (e.g. VR15). This suggested that the exon-intron structure of VR genes resembles that of the CSR gene, and that variant forms of VR mRNAs might be generated by differential RNA splicing.
Variant VR mRNAs could derive either from different genes, or from the same gene by alternative RNA splicing. Consistent with the latter possibility, two pairs of cDNAs that we sequenced, VR8 (SEQ ID NOs. 15, 16) and VR9 (SEQ ID NOs. 17, 18), and VR10 (SEQ ID NOs. 19, 20) and VR11 (SEQ ID NOs. 21, 22), were identical in nucleotide sequence, but were missing different segments. However, when we used RT-PCR to amplify VNO mRNA sequences encoding 5 different VRs, we obtained one major PCR product in each case, regardless of whether the RNA used was from male or female mice. In 4 cases, the size of the major product corresponded to a complete VR, even though one of the cDNAs (but not the PCR product) contained an intron (#5). In one case, in which the cDNA lacked one exon (#2), the major PCR product was even smaller, and was found to lack two exons. Although PCR products of a smaller size were also seen in these experiments, they were much less abundant.
These results suggest that different VR forms derive from different genes. Thus many VR genes may be expressed pseudogenes, which either lack one or more exons, or have mutations that prevent proper RNA splicing. We cannot exclude the possibility that some variant VRs are functional, however. For example, some truncated VRs that lack transmembrane domains could conceivably be secreted pheromone-binding proteins.
Differential expression of VR genes in VNO neurons
To investigate the tissue distribution of VR gene expression, we conducted Northern blot analyses in which size fractionated polyA+ RNAs from different mouse tissues were hybridized to a mix of radiolabeled VR cDNAs. The mixed probe hybridized to VNO RNAs of ~1.9-3.7 kb, with intense hybridization to RNAs of 2.8-3.5 kb. It did not hybridize to RNAs from a variety of other tissues, including olfactory epithelium and brain. This suggested that VR genes may be expressed exclusively in the VNO.
We found two partial cDNAs that were highly related to VR cDNAs in the NCBI dbEST database, one from spleen and the other from 2-cell stage mouse embryos. However, when we hybridized the most highly related VR cDNAs (VR6 and VR7) to spleen sections, only one questionably-labeled cell was seen out of ~1.4 x 10^6 cells with one VR probe, and none was seen with the other. The EST clones might be DNA contaminants, or be due to the widespread, but low level, misexpression of tissue specific genes (Sarkar, G., et al., Science, 1989, 244:331-334); nonetheless, we cannot exclude the possibility that VR genes are expressed at a low frequency in some other tissues.
To examine the patterns of expression of different VR genes in the VNO, we conducted in situ hybridization experiments. Labeled segments of the 3' untranslated regions of three VR cDNAs were hybridized separately, or in combination, to sequential sections through the VNO.
Probes prepared from Gαo and Gαi2 cDNAs were hybridized to adjacent sections to delineate the Gαo+ and Gαi2+ zones of the VNO neuroepithelium.
The Gαo and Gαi2 probes gave patterns of hybridization similar to those we had previously seen (Berghard, A., et al, J. Neurosci., 1996, 16:909-918). The Gαo probe hybridized to a wavy stripe of VNO neurons in the basal (lower) region of the VNO neuroepithelium, whereas the Gαi2 probe hybridized to an adjacent stripe of neurons in the apical (upper) part of the neuroepithelium. The waviness of the two zones appears to be caused by the periodic presence of blood vessels near the base of the epithelium (Berghard, A., et al, J. Neurosci., 1996, 16:909-918). Approximately 57% of VNs were labeled by the Gαi2 probe and 43% were labeled by the Gαo probe. The single layer of supporting cells located just beneath the epithelial surface was not labeled by either probe.
Each of the VR probes hybridized to a small percentage (2.4-5.7%) of VNs that appeared to be restricted to the basal, Gαo+ zone of the VNO neuroepithelium. Labeled neurons were scattered throughout the anterior-posterior and dorsal-ventral extent of the Gαo+ zone. Small clusters of labeled cells were sometimes seen, particularly with the VR2 probe. The mixed probe labeled a larger percentage of VNs (10.6%) that was almost equal to the sum of the percentages labeled by its individual components (10.8%). Thus different Gαo+ neurons must express different VRs.
No differences were seen in the patterns of hybridization obtained using VNOs from male and female mice, and no hybridization was observed in the nasal olfactory epithelium using either the mix of VR probes or a full-length VR cDNA probe (not shown).
Subsequent analyses of the size of the VR gene family, and the number of VR genes recognized by the VR in situ hybridization probes, allowed us to estimate the number of VR genes expressed by individual neurons (see below).
The size of the VR multigene family
To investigate the size of the VR gene family, we hybridized several different mixed VR gene probes to a mouse genomic library, using high (70°C) or low (55°C) stringency conditions.
A probe prepared from the membrane spanning regions (putative exon 6) of several different cDNA clones hybridized to 59 and 98 clones per haploid genome equivalent, at high and low stringency, respectively. To obtain probes that were potentially more diverse, we amplified internal segments of putative exon 3 or 6 from genomic DNA by PCR with degenerate primers.
At high stringency, these probes hybridized to 60-140 clones per haploid equivalent. These results indicate that there are as many as 140 VR genes in the mouse genome.
The VR probes that we used for in situ hybridization each labeled a small percentage of neurons. To determine how many VR genes each probe recognized, we hybridized probes prepared from the same VR cDNA segments to Southern blots of C57BL/6J mouse genomic DNA which had been digested with Eco RI or Hind III. Each probe hybridized to a small number of restriction fragments. Given the small size of the probes (350-450 bp), most of these fragments should represent at least one gene, provided that there are no introns in the region probed. Consistent with this assumption, the VR2 (SEQ ID NO. 3) probe hybridized to 7 different restriction fragments, as many as five of which could be accounted for by characterized VR cDNAs that were 91-98% identical to VR2 (SEQ ID NO. 3) in the region probed.
Given the number of genes recognized by each VR probe and the percentage of Gαo+ neurons that hybridized to each, we estimate that each VR gene may be expressed in only ~1.1-1.9% of Gαo+ VNs. Since there appear to be 60-140 VR genes in the mouse genome, this suggests that each Gαo+ VNO neuron may express only one, or at most a few, VR genes.
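The last inference follows from simple arithmetic. The sketch below is illustrative only and assumes, as a simplification not stated in the text, that every VR gene is expressed in the same fraction of Gαo+ neurons; the function name is hypothetical.

```python
def mean_genes_per_neuron(n_genes: int, fraction_of_neurons_per_gene: float) -> float:
    """Average number of VR genes expressed per neuron.

    If each of n_genes is expressed in the given fraction of neurons, the
    total number of (gene, neuron) expression events per neuron is the product.
    """
    return n_genes * fraction_of_neurons_per_gene


# Bounds from the text: 60-140 genes, each in ~1.1-1.9% of Gαo+ VNs.
low = mean_genes_per_neuron(60, 0.011)
high = mean_genes_per_neuron(140, 0.019)
print(f"{low:.2f} to {high:.2f} VR genes per neuron")
```

The extremes bracket a range of roughly 0.7 to 2.7 genes per neuron, which is consistent with one, or at most a few, VR genes per cell.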
Linkage of chromosomal clusters of VR and OR genes
We previously found that there are clusters of OR genes at multiple chromosomal sites in the mouse genome (Sullivan, S., et al., Proc. Natl. Acad. Sci., 1996, 93:884-888). To investigate the chromosomal locations of VR genes, we used the Jackson Laboratory Backcross DNA Mapping Panel, which allows the mapping of mouse genes using interspecies mouse crosses.
Probes prepared from the 3' untranslated regions of VR2 (SEQ ID NO. 3) or VR4 cDNAs were first hybridized to Southern blots of genomic DNAs from two mouse species, C57BL/6J and Mus spretus, which had been digested with different restriction enzymes.
Eco RI digests showed a number of restriction length polymorphisms with both VR probes. The VR probes were then hybridized to Eco RI-digested DNAs from a large panel of different backcross mice ((C57BL/6J x M. spretus) x M. spretus).
The patterns of inheritance of the polymorphic fragments recognized by the two VR probes allowed us to assign chromosomal locations to approximately 9 VR genes.
Using the VR4 (SEQ ID NO. 7) probe, we could follow the inheritance of 4 polymorphic restriction fragments. All of these cosegregated in the backcrosses, and mapped to the proximal end of chromosome 7 (near D7Bir5). Five restriction fragments were followed for the VR2 (SEQ ID NO. 3) probe. Again, all of the restriction fragments cosegregated, allowing us to map the VR2 (SEQ ID NO. 3) fragments to the distal end of chromosome 4 (near D4Bir1).
Given the resolution of the genetic mapping, the cosegregating fragments can be no more than 3.8 cM from one another. These results indicate that VR genes are located near the ends of at least two different mouse chromosomes. They also indicate that highly related VR genes are clustered at the same chromosomal locus, as previously seen in our studies and others (Ben-Arie et al, Human Molecular Genetics, 1994, 3:229-235).
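The text does not say how the 3.8 cM bound was obtained. One calculation that lands close to it, offered here only as an assumption, is the exact binomial upper limit on the recombination fraction when no recombinants are seen among the 94 backcross animals, using a two-sided 95% interval and equating percent recombination with centimorgans at short distances. The function name is hypothetical.

```python
def zero_recombinant_upper_bound_cm(
    n_animals: int, confidence: float = 0.95, two_sided: bool = True
) -> float:
    """Upper confidence limit (in cM) on map distance given zero recombinants.

    With recombination fraction r, the chance of seeing no recombinants in
    n animals is (1 - r) ** n. The limit is the r at which that chance falls
    to the tail probability of the chosen interval.
    """
    alpha = 1.0 - confidence
    tail = alpha / 2 if two_sided else alpha
    r_upper = 1.0 - tail ** (1.0 / n_animals)
    return 100.0 * r_upper  # percent recombination, read as cM for small r


print(f"{zero_recombinant_upper_bound_cm(94):.2f} cM (two-sided 95%)")
print(f"{zero_recombinant_upper_bound_cm(94, two_sided=False):.2f} cM (one-sided 95%)")
```

The two-sided limit comes out just under 3.9 cM and the one-sided limit near 3.1 cM, so the stated figure is at least compatible with the former convention.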
The VR4 gene subfamily appears to be closely linked to one OR gene locus (Olfr5) (Sullivan, S., et al., Proc. Natl. Acad. Sci., 1996, 93:884-888). Although the VRs and ORs were mapped in different mouse crosses, the synaptotagmin-3 gene (Syt3) was mapped in both crosses, allowing an estimate of their relative positions. The OR locus mapped 15.05 cM proximal to Syt3 while the VR4 gene cluster mapped 14.89 cM proximal to Syt3 (Jackson Laboratory Mouse Genome Informatics), suggesting a close linkage between VR and OR genes at the proximal end of chromosome 7. Our previous studies indicate that multiple OR gene loci arose via a series of duplications of very large chromosomal domains that maintained linkages between OR genes and members of other gene families. These results therefore suggest that VR genes and OR genes might have been linked in a primitive ancestor. They also suggest the possibility that additional clusters of VR genes might be linked to other OR gene loci.
Preparation of cDNA Libraries from Isolated VNO Neurons
VNOs were dissected from adult (7- to 8-week-old) male Lewis rats (Sprague-Dawley).
Single-cell cDNA synthesis and amplification were performed and checked according to Dulac and Axel (Cell, 1995, 83:195-206). Southern blot analysis of single-cell cDNA was used to detect expression of tubulin, OMP, Go, and Gi2a (Dulac and Axel, Cell, 1995, 83:195-206).
Eighteen cDNAs showed strong hybridization with tubulin and OMP probes, indicating that they originated from mature neurons, and were selected for further study. Cells VN3 and VN13 exhibited high levels of Go expression, whereas VN10 showed presence of Gi2a, indicating the origin of these cells from two distinct regions of the VNO neuroepithelium. A VN13 single-cell cDNA library was prepared according to Dulac and Axel (Cell, 1995, 83:195-206).
Differential Screening of Single-Cell Library
Plaque-forming units (12 x 10^3) from the VN13 library were plated at low density, and duplicate filters (Hybond N+, Amersham) were hybridized with probes generated from VN10 and VN13 single-cell cDNAs, following the procedure described in Dulac and Axel, Cell, 1995, 83:195-206. Ten phage plaques were detected that showed a positive signal unique to the VN13 probe. These plaques were purified, and the corresponding phage inserts were amplified by PCR, run on 1.5% agarose gel, blotted onto nylon filter, and hybridized with the VN10, VN3, and VN13 single-cell cDNA probes.
Isolation and Analysis of Full-Length cDNA Clones
A 425 bp clone, Go-VN13A, present at the frequency of 0.1% in the VN13 single-cell cDNA library, was selected and in vivo excised to generate the pBlueScriptSK(-) phagemid.
High stringency (65°C) screening of a cDNA library prepared from female rat VNO (Dulac and Axel, Cell, 1995, 83:195-206) with the Go-VN13A cDNA probe led to the isolation of Go-VN13B (SEQ ID NO. 49), presenting 90% sequence homology with Go-VN13A.
Phages (7.2 x 10^5) of the female rat VNO library were further screened with the Go-VN13B (SEQ ID NO. 49) cDNA probe under low stringency conditions: hybridization was carried out at 55°C for 24 hr, and the filters were washed three times at 55°C for 30 min in 0.5x SSC and 0.5% SDS.
A total of 75 positive phages were identified and the corresponding inserts were amplified by PCR and analyzed by Southern blot using the Go-VN13B (SEQ ID NO. 49) probe at both high (65°C) and low (55°C) stringency. This led to the identification of 22 cDNA clones with insert sizes longer than 3 kb. Among those, six distinct subfamilies were defined by absence of cross-hybridization under stringent conditions of hybridization and washing. Full-length clones (Go-VN1 to Go-VN6, SEQ ID NOs. 33, 35, 37, 39, 41, 43), each representative of a subfamily, were selected for in vivo excision and sequenced. Go-VN13C (SEQ ID NO. 47) and Go-VN13B (SEQ ID NO. 49) are identical in sequence except for a 150 bp deletion in Go-VN13C (SEQ ID NO. 47). The deleted sequence encodes NMDQCANCPEYQYANTEKNKCIQKGVIVLSYEDPLGMALALIAFCFSAFTV (SEQ ID NO. 58) in Go-VN13B (SEQ ID NO. 49) and is replaced by an M at position 552 in Go-VN13C (SEQ ID NO. 48).
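The stated deletion size can be cross-checked against the peptide given as SEQ ID NO. 58. The sketch below is only a consistency check: it assumes the deletion is in frame and that the single M in Go-VN13C stands in place of the whole listed stretch. Variable names are illustrative.

```python
# Peptide present in Go-VN13B and absent from Go-VN13C (SEQ ID NO. 58), from the text.
DELETED_STRETCH = "NMDQCANCPEYQYANTEKNKCIQKGVIVLSYEDPLGMALALIAFCFSAFTV"
REPLACEMENT = "M"            # residue found at position 552 in Go-VN13C
STATED_DELETION_BP = 150     # deletion size reported in the text

# Net residues lost when the stretch is replaced by the single residue.
net_residues_lost = len(DELETED_STRETCH) - len(REPLACEMENT)
implied_deletion_bp = 3 * net_residues_lost  # three nucleotides per codon

print(len(DELETED_STRETCH), net_residues_lost, implied_deletion_bp)
print("consistent" if implied_deletion_bp == STATED_DELETION_BP else "inconsistent")
```

Replacing 51 residues with one removes 50 codons, which matches the 150 bp deletion reported for Go-VN13C.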
DNA Sequencing and Sequence Analysis
DNA sequencing was performed using ABI Prism dye terminator cycle ready reaction (Perkin Elmer, Norwalk, CT) according to manufacturer's protocol. Samples were run on an ABI Prism 310 Genetic Analyzer (Perkin Elmer, Norwalk, CT). Sequence homologies were determined using the BLAST system (NIH network service). Pairwise and ClustalW alignments (BLOSUM30 matrix setting) as well as Kyte-Doolittle hydropathic analysis were obtained with the MacVector sequence analysis software (Oxford Molecular Group).
In Situ Hybridization Analysis
In situ hybridization was performed as described elsewhere (Schaeren-Wiemers, N., et al., Histochemistry, 1993, 100:431-440). VNOs were dissected from adult male (8- to 9-week-old), adult female (9- to 11-week-old), and young (1-week-old) rats.
Tissues were embedded in Tissue-Tek OCT. Antisense and sense digoxigenin-labeled probes were generated from the full-length cDNAs encoding Go, Gi2a, Go-VN13B (SEQ ID NO. 49), and Go-VN1 to Go-VN6 (SEQ ID NOs. 33, 35, 37, 39, 41, 43), as well as from the 3' untranslated regions of the Go-VN1 to Go-VN6 clones.
Imaging Processing and Statistical Analysis
Digital photographs were captured with a Leitz DMRB microscope (Leica) coupled to a ProgRes3012 digital camera (Kontron Electronic) and further processed with the Photoshop (Adobe System) and Canvas (Deneba) software for Macintosh. The relative positions of cells exhibiting a positive signal by in situ hybridization were measured along the basal-apical axis using the NIH Image analysis software. The number of cells in hemiconcentric sections of 10% along this axis from the basal (value = 0) to the apical (value = 100) boundaries was determined.
Average data for Go-VN1 and Go-VN3 to Go-VN6 were obtained from six to eight VNO sections, corresponding to four individuals analyzed in two independent experiments. For Go-VN2, 14 VNO sections, corresponding to ten individuals and four independent experiments, were analyzed for each sex.
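The binning step described above can be sketched as follows. This is a hypothetical illustration rather than the procedure run in NIH Image: the function name and the example positions are invented, and a cell lying exactly on the apical boundary (value 100) is assigned to the last bin by assumption.

```python
def decile_counts(relative_positions: list[float]) -> list[int]:
    """Count cells in ten 10% bins along the basal (0) to apical (100) axis."""
    counts = [0] * 10
    for position in relative_positions:
        if not 0 <= position <= 100:
            raise ValueError(f"position outside 0-100: {position}")
        # Positions of exactly 100 fall into the last (90-100) bin.
        bin_index = min(int(position // 10), 9)
        counts[bin_index] += 1
    return counts


# Hypothetical relative positions of labeled cells in one section.
example_positions = [3.0, 12.5, 18.0, 25.0, 47.9, 100.0]
print(decile_counts(example_positions))
```

Counts from several sections could then be averaged bin by bin to give the per-probe distributions described in the text.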
Southern Blot Analysis of Rat Genomic DNA and Screening of Rat and Human Genomic Libraries
Genomic DNA, prepared from Lewis rat (Sprague-Dawley) liver, was digested with the restriction enzymes EcoRI and BamHI, size fractionated on 0.8% agarose gels, and blotted onto nylon membrane (Sambrook, J., et al., Molecular Cloning: A Laboratory Manual, Second Edition, Cold Spring Harbor Laboratory Press, 1989). Membranes were cross-linked under UV light, hybridized overnight at both high (68°C) and low (55°C) stringency in hybridization buffer, and washed as described above. 32P-labeled probes were generated by random priming, using the following DNA templates: EcoRI-EcoRV, NotI-NsiI, EcoRI-SalI, PstI-NdeI, XbaI-HincII, and EcoRI-NsiI fragments of Go-VN1 to Go-VN6 (SEQ ID NOs. 33, 35, 37, 39, 41, 43), respectively; a full-length (425 bp) insert of Go-VN13A; and a cDNA fragment including the seven transmembrane domains of Go-VN13B (SEQ ID NO. 49). Plaque-forming units (3 x 10^5) from rat and human genomic libraries (Stratagene, La Jolla, CA) were screened at low stringency (55°C) using a mix of 32P-labeled probes prepared from fragments of Go-VN1 to Go-VN6 (SEQ ID NOs. 33, 35, 37, 39, 41, 43) encompassing the transmembrane domains 2 to 7.
The VNO Neuroepithelium Expresses Two Independent Families of Pheromone Receptors
We hypothesized the existence of two distinct families of genes encoding pheromone receptors that are selectively colocalized with either the Go protein in the basal half of the vomeronasal neuroepithelium or with the Gi2a protein in the apical region. For simplicity of nomenclature, and with the understanding that the cosegregation of distinct G-protein subunits with independent families of pheromone receptors is consistent with, but does not demonstrate, a functional link, the family of genes encoding putative pheromone receptors that we have previously identified and that colocalize with Gi2a will be named Gi2a-VN, whereas the novel family of receptors coexpressed with Go and described in this study will be named Go-VN. In the absence of information concerning the nature of the Go-VN receptor molecules, we reiterated the cloning strategy that allowed us to identify a family of putative pheromone receptor genes expressed by Gi2a+ neurons (Dulac and Axel, Cell, 1995, 83:195-206). This strategy was based on the assumption that individual neurons within the VNO are likely to express only one pheromone receptor gene and that transcripts encoding a given receptor represent between 1 and 0.1% of a single-cell mRNA. Differential screening of cDNA libraries constructed from single VNO neurons takes advantage of the fact that different cells express different receptors and thus provides an experimental solution to the problem of detecting a specific transcript in a heterogeneous population of neurons. In this attempt, we expected that differential screening of a cDNA library prepared from an isolated Go+, Gi2a- VNO neuron would permit the isolation of a class of pheromone receptor genes distinct from the Gi2a-VN family of receptor genes.
A cDNA library prepared from a Go+ neuron (VN13) was differentially hybridized with 32P-labeled probes prepared from VN13 and from a second VNO neuron cDNA (VN10). A 425 bp cDNA (Go-VN13A) present at a frequency of 0.1% in the VN13 cDNA library showed selective hybridization with the VN13 cell probe. Two longer cDNAs, Go-VN13B (SEQ ID NO. 49) and Go-VN13C (SEQ ID NO. 47), were subsequently isolated from a cDNA library prepared from dissected adult VNOs and showed 90% sequence similarity with Go-VN13A. Hybridization to VNO cross-sections with a digoxigenin-labeled antisense RNA probe showed that expression of these transcripts is restricted to a small subpopulation of VNO neurons in a location consistent with the region of Go expression of the neuroepithelium.
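The screening logic above depends on how often a clone at a given frequency turns up in a library. The following is a minimal sketch of that arithmetic, assuming clones are sampled independently. The numbers of clones screened are hypothetical and do not come from the experiments described here; only the 0.1% and 1% frequencies are taken from the text.

```python
def expected_positives(frequency: float, clones_screened: int) -> float:
    """Expected number of positive clones for a transcript at `frequency`."""
    return frequency * clones_screened


def prob_at_least_one(frequency: float, clones_screened: int) -> float:
    """Probability of seeing at least one positive clone, assuming each
    clone is an independent draw from the library."""
    return 1.0 - (1.0 - frequency) ** clones_screened


if __name__ == "__main__":
    # 0.1% and 1% are the transcript frequencies quoted in the text;
    # the library sizes below are hypothetical illustration values.
    for freq in (0.001, 0.01):
        for n in (500, 1000, 5000):
            print(f"f={freq:.3f} n={n}: "
                  f"expected={expected_positives(freq, n):.1f}, "
                  f"P(>=1)={prob_at_least_one(freq, n):.3f}")
```

Under these assumptions, a transcript at 0.1% needs a few thousand clones to be recovered reliably, whereas one at 1% is likely to be seen in a few hundred.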
The sequence of Go-VN13B (SEQ ID NO. 49) reveals a partial open reading frame that includes seven hydrophobic stretches of 20 amino acids in length. The Go-VN13B (SEQ ID NO. 49) sequence does not share any resemblance with the odorant receptor genes nor with the family of putative pheromone receptor genes previously identified (see below). In addition, hybridization of a Go-VN13B DNA probe to genomic DNA identified two discrete bands at high stringency and 13 or more at lower stringency, revealing the existence of a family of closely related genes in the rat genome.
Taken together, these data indicate that we have isolated a novel multigene family encoding seven transmembrane domain receptors and expressed by subsets of VNO
neurons from the basal half of the neuroepithelium.
Sequences of a New Family of VNO Receptors
Recombinant phages from a VNO cDNA library were screened at low stringency with the Go-VN13B (SEQ ID NO. 49) DNA probe. Six distinct gene subfamilies were isolated that showed no cross-hybridization under stringent conditions of hybridization and washing. cDNAs Go-VN1 to Go-VN6, each representative of a subfamily, were fully sequenced (SEQ ID NOs. 33, 35, 37, 39, 41 and 43).
In Go-VN1 to Go-VN5 cDNAs (SEQ ID Nos 33, 35, 37, 39 and 41), the first methionine of the open reading frame was tentatively chosen as a start for protein translation, revealing large open reading frames ranging from 548 to 866 amino acids. A frame shift in the Go-VN6 (SEQ
ID NO. 44) sequence (amino acid 532; indicated by slash bar in Fig. 3) indicated that this transcript is unable to generate a functional protein.
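The rule used above, taking the first methionine as the tentative translation start, can be sketched as follows. This is a simplified illustration and not the procedure actually used: it takes the first ATG in the sequence and translates with the standard genetic code until a stop codon. The two short nucleotide strings are hypothetical toy inputs, not sequences from the Sequence Listing. The second one lacks a single base, to show how a frameshift changes every downstream residue.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate_from_first_met(cdna: str) -> str:
    """Translate from the first ATG to the first in-frame stop codon.

    Returns one-letter amino acid codes; unknown codons become 'X'.
    """
    seq = cdna.upper()
    start = seq.find("ATG")
    if start == -1:
        return ""
    protein = []
    for i in range(start, len(seq) - 2, 3):
        residue = CODON_TABLE.get(seq[i:i + 3], "X")
        if residue == "*":  # stop codon ends the open reading frame
            break
        protein.append(residue)
    return "".join(protein)


# Hypothetical toy cDNA and a copy with one base deleted after the start codon.
toy = "GGCATGAAACAGCTGTGCTAAGG"
toy_frameshift = "GGCATGAACAGCTGTGCTAAGG"

if __name__ == "__main__":
    print(translate_from_first_met(toy))
    print(translate_from_first_met(toy_frameshift))
```

In the toy pair, the single-base deletion also removes the in-frame stop codon, which is one way a frameshift can yield a product unrelated to the intended protein.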
Deduced Amino Acid Sequences of cDNAs from the Go-VN Family of Pheromone Receptors
The deduced amino acid sequences of eight cDNAs belonging to the Go-VN family of putative pheromone receptors are shown in Figure 3. The predicted positions of the seven transmembrane domains are also indicated (I-VII). Amino acids common to at least five cDNAs are shaded. Amino acids common to the rat mGluR1 and Ca2+-sensing receptors are indicated by a star.
Hydropathy analysis of the predicted Go-VN proteins with the Kyte-Doolittle algorithm identified a large hydrophilic N-terminal domain that ranges in size from 274 amino acids in Go-VN1 (SEQ ID NO. 34) to 595 in Go-VN4 (SEQ ID NO. 40). This is preceded in cDNAs Go-VN4 (SEQ ID NO. 40), Go-VN7 (SEQ ID NO. 46), and Go-VN13C (SEQ ID NO. 50) by an initial hydrophobic 21 amino acid segment characteristic of eukaryotic signal sequences. A cluster of seven hydrophobic regions representing potential membrane-spanning helices and typical of the G protein-coupled receptor superfamily is followed by a short hydrophilic sequence that indicates a potential intracytoplasmic C-terminal domain. A database search indicated the presence of sequence motifs common to Ca2+-sensing and metabotropic glutamate (mGluR) receptors (Houamed, K., et al., Science, 1991, 252:1318-1321; Masu, M., et al., Nature, 1991, 349:760-765; Brown, E., et al., Nature, 1993, 366:575-580; Pollak, M., et al., Cell, 1993, 75:1297-1303). Pairwise sequence alignments reveal 18% to 23% sequence identity between the rat Ca2+-sensing receptor and the most distant (Go-VN3, SEQ ID NOs. 37, 38) and the closest (Go-VN1, SEQ ID NOs. 33, 34) Go-VN sequences, respectively. Sequences of rat mGluR1 and Go-VN cDNAs appear more distantly related. Several localized regions showed a more pronounced degree of similarity, including a cysteine-rich sequence just preceding the first transmembrane domain (amino acids 206 to 260 in Go-VN1, SEQ ID NO. 34), the predicted transmembrane domains 2 to 7 with surrounding cytoplasmic and extracellular loops, and the relative position of 20 cysteines. The N-terminal and first transmembrane domains show little homology. In mGluR and Ca2+-sensing receptors, the second intracellular loop is involved in providing specificity for G-protein coupling (Gomeza, J., et al., J. Biol. Chem., 1996, 271:2199-2205), enabling different classes of mGluR receptors to activate phospholipase C or to inhibit adenylyl cyclase. In Go-VN, this domain is rich in basic residues, as expected for potential G-protein coupling, and shows closer resemblance to the class II and III mGluRs that were shown to couple to Go and Gi subunits. Overall, the six Go-VN sequences share between 42% and 75% sequence identity. Regions of Go-VN proteins downstream of transmembrane domain 2 are nearly identical in all VNO receptor sequences. In contrast, N-terminal extracellular regions and first transmembrane domains are quite divergent.
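The Kyte-Doolittle hydropathy analysis mentioned above can be sketched as a sliding-window average over the published per-residue hydropathy values. The window length of 19 and the cutoff of 1.6 are commonly cited heuristics for membrane-spanning segments and are assumptions here, since the text does not state the parameters used. The example input is the first 26 residues of SEQ ID NO:2 as printed in the Sequence Listing, chosen only because it is short.

```python
# Kyte-Doolittle hydropathy values (Kyte and Doolittle, 1982).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(protein: str, window: int = 19) -> list[float]:
    """Mean Kyte-Doolittle score for every full window along the protein."""
    scores = [KD[residue] for residue in protein.upper()]
    return [
        sum(scores[i:i + window]) / window
        for i in range(len(scores) - window + 1)
    ]


def hydrophobic_windows(protein: str, window: int = 19, cutoff: float = 1.6):
    """Start positions (0-based) of windows whose mean exceeds the cutoff.

    The window and cutoff defaults are heuristic assumptions.
    """
    profile = hydropathy_profile(protein, window)
    return [i for i, score in enumerate(profile) if score > cutoff]


# First 26 residues of SEQ ID NO:2, converted to one-letter code.
n_terminus = "MKQLCAFTISLLFLKFSLILCCLTEP"

if __name__ == "__main__":
    for start, score in enumerate(hydropathy_profile(n_terminus)):
        print(start, round(score, 2))
```

A run of consecutive high-scoring windows marks a candidate hydrophobic segment such as a signal sequence or a transmembrane helix. Assigning the seven domains would still require inspection of the full-length profile.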
Anomalies in Go-VN cDNA Sequences: Two unusual features were observed in the sequence of some Go-VN cDNAs. In Go-VN1 (SEQ ID NO. 33) and Go-VN3 (SEQ ID NO. 37) cDNAs, stretches of open reading frame can be found in the 5' extremity of the cDNAs that generate polypeptide sequences of 310 and 152 amino acids, respectively, which are interrupted by a frameshift in Go-VN1 and by an insertion of 500 nucleotides in Go-VN3. The prospective receptor protein sequences indicated for Go-VN1 (SEQ ID NO. 33) and Go-VN3 (SEQ ID NO. 37) (Fig. 3) start at the next available methionine and are therefore significantly shorter than those of other receptor cDNAs.
Go-VN7 (SEQ ID NO. 45) and Go-VN13C (SEQ ID NO. 47) cDNAs show a similar deletion of 150 bp located at exactly the same position in the sequence. Strikingly, the 150 bp deletion does not alter the open reading frame but generates a gap that encompasses 34 amino acids upstream of the first transmembrane domain and most of the first transmembrane domain itself.
Hydropathy analysis of Go-VN7 (SEQ ID NO. 46) and Go-VN13C (SEQ ID NO. 48) protein sequences detects only a seven to eight amino acid long hydrophobic stretch that might not be long enough to replace the deleted transmembrane domain 1 and allow the appropriate folding of the protein. Except for the 150 bp gap, sequences of Go-VN13B (SEQ ID NO. 50) and Go-VN13C (SEQ ID NO. 48) are identical. This raises the question as to whether both transcripts might originate from alternative splicing of the same gene. Alternatively, they might be transcribed from independent genes that evolved from recent duplication and deletion events.
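That a 150 bp deletion leaves the reading frame intact follows from 150 being a multiple of three. A minimal sketch of the check is given below; the function name is illustrative, and the single-base case is included only for contrast.

```python
def deletion_effect(deleted_bp: int) -> dict:
    """Report whether a deletion preserves the reading frame and how many
    whole codons it removes."""
    codons_removed, remainder = divmod(deleted_bp, 3)
    return {"in_frame": remainder == 0, "codons_removed": codons_removed}


if __name__ == "__main__":
    print(deletion_effect(150))  # the deletion described for Go-VN7 and Go-VN13C
    print(deletion_effect(1))    # a single-base deletion shifts the frame
```

By this arithmetic the gap corresponds to 50 codons in total, which covers the 34 residues upstream of the first transmembrane domain together with part of that domain.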
Size of the Go-VN Family of Genes
We investigated the size of the Go-VN family of receptors by hybridizing 32P-labeled cDNA probes prepared from regions spanning the most divergent N-terminal half of the receptor protein to rat genomic DNA. Individual probes identify two to four discrete bands under stringent conditions of hybridization and washing. Under conditions of reduced stringency, each of the individual probes generates a unique pattern of 12 to 20 bands, providing a direct illustration of the existence of a very large family of related genes.
A direct estimate of the size of the Go-VN receptor gene family was obtained by low stringency screening of a rat genomic library. PCR amplification on genomic DNA had indicated that receptor genes are devoid of introns in the region encompassing transmembrane domains 2 to 7, enabling us to deduce directly the number of genes present in the rat genome. A mix of 32P-labeled DNA probes prepared from the six Go-VN cDNA fragments identified 110 positive clones per haploid genome, indicating that the family of Go-VN receptors may consist of approximately 100 genes.
Expression Pattern of Go-VN Receptors
The pattern of expression of the Go-VN receptor genes was examined by in situ hybridization with digoxigenin-labeled RNA antisense probes. No signal was observed after hybridizing the mix of Go-VN1 to Go-VN6 (SEQ ID NOs. 33, 35, 37, 39, 41 and 43) receptor probes to sections of muscle, testis, brain, or whole head. The adult olfactory epithelium was also consistently negative, although rare positive cells (one to three cells per section) were observed in the olfactory neuroepithelium of the E19 rat embryo. In contrast, strong signals were observed when antisense receptor RNA probes were hybridized to VNO neuroepithelium. In adults, each one of the Go-VN probes detects small subsets of VNO sensory neurons. When hybridization and washing were performed at lower temperature, the number of faintly labeled neurons increased, revealing cross-hybridization to more distant receptor genes.
Under high stringency conditions, cDNA clones Go-VN1 to Go-VN6 label 1.9%, 3.6%, 6.1%, 0.4%, 3.5%, and 1.3% of the VNO sensory neurons, respectively. Under the same experimental conditions, the mix of all six Go-VN RNA probes labels 19% of the cells. This number is similar to the sum of labeled neurons detected with the six individual Go-VN probes (17%), indicating that probes representing the six receptor subfamilies recognize distinct populations of VNO sensory neurons.
Spatial Distribution of Go-VN Receptor Transcripts
Positive neurons identified with each of the Go-VN probes were randomly distributed along the anteroposterior and dorso-ventral axes of the VNO neuroepithelium. Most RNA probes recognize cells that are preferentially localized in the most basal two-thirds of the neuroepithelium, corresponding to the zone of Go expression. However, careful examination of adjacent cross-sections of vomeronasal neuroepithelium labeled with each of the Go-VN probes reveals a well-organized spatial distribution of receptor expression. Different receptors appear preferentially localized in radial zones that define a series of hemiconcentric rings of distinct diameters. This pattern is observed along the entire length of the VNO and is conserved in all animals analyzed. The Go-VN3 (SEQ ID NO. 37) probe, for example, recognizes a subset of neurons that are confined to the most basal third of the VNO neuroepithelium. In contrast, the Go-VN1 (SEQ ID NO. 33), Go-VN4 (SEQ ID NO. 39), and Go-VN5 (SEQ ID NO. 41) RNA probes identify cells restricted to a hemiconcentric zone immediately apical to the area of Go-VN3 expression, whereas Go-VN2 identifies cells apposed to the apical layer of supporting cells. Go-VN6 in turn is found only in sparse cells immediately apposed to the basal membrane.
This is best seen in a statistical representation of Go-VN receptor localization collected from VNO sections and multiple animals that shows a striking conservation of these patterns. Thus, transcription of Go-VN cDNAs appears restricted to one of three circumscribed areas of the VNO
neuroepithelium in a manner quite reminiscent of the odorant receptor gene expression in four zones of the MOE (Ressler, K., et al., Cell, 1993, 73:597-609 ; Vassar, R., et al., Cell, 1993, 74:309-318). Although Go-VN3 (SEQ ID NO. 37) and Go-VN6 (SEQ ID NO. 43) transcripts show a clear segregation in the most basal region of the VNO neuroepithelium, the sequence anomalies found in both transcripts leave the functionality of this area of the neuroepithelium as an open question.
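The comparison made earlier between the six individual probes and the probe mix under high stringency is simple additivity: if the probes label non-overlapping neurons, the individual fractions should sum to roughly the fraction labeled by the mix. A minimal sketch using only the percentages quoted in the text:

```python
# Percent of VNO sensory neurons labeled by each probe at high stringency,
# as reported in the text.
labeled_percent = {
    "Go-VN1": 1.9,
    "Go-VN2": 3.6,
    "Go-VN3": 6.1,
    "Go-VN4": 0.4,
    "Go-VN5": 3.5,
    "Go-VN6": 1.3,
}
mix_percent = 19.0  # all six probes hybridized together

sum_individual = sum(labeled_percent.values())
difference = mix_percent - sum_individual

if __name__ == "__main__":
    print(f"sum of individual probes: {sum_individual:.1f}%")
    print(f"probe mix:                {mix_percent:.1f}%")
    print(f"difference:               {difference:.1f} percentage points")
```

The individual values sum to 16.8%, which the text rounds to 17%. A sum well above the mix value would instead have suggested that several probes label the same cells.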
Sexual Dimorphism in Receptor Distribution and Age-Related Changes
To identify potential sexual dimorphism in Go-VN receptor expression, we systematically hybridized each probe to sections originating from adult male and female rat VNOs. All receptors were equally distributed in males and females with the striking exception of Go-VN2 (SEQ ID NO. 35). In females, Go-VN2 appears expressed in a large and centrally located region comprising one-third of the neuroepithelium. In sharp contrast, the same probe recognizes in males a cohort of cells in the most apical side of the neuroepithelium, closely apposed to the VNO lumen, and most likely intermingled with Gi2a+ VNO sensory neurons. Such a difference in the Go-VN2 expression pattern in males and females might result from the expression of the same receptor gene in a different zone of the VNO epithelium or from a differential expression of two distinct but closely related genes of the Go-VN2 subfamily. In females, Go-VN2 generates a very intense hybridization signal on most positive neurons and a fainter staining on a second set of labeled cells. The population of faintly labeled cells was never detected in males, indicating the existence of a female-specific neuronal subpopulation expressing either a lower level of the Go-VN2 transcript or a female-specific receptor significantly different from, but still cross-hybridizing to, the Go-VN2 probe. We followed the emergence of receptor expression and of the VNO zonal organization during development and postnatal stages preceding puberty. Go-VN receptor expression is first detected in the VNO of E14 embryos. No significant difference is observed in the onset of expression of the Gi2a-VN and Go-VN classes of receptor genes. In agreement with the data of Berghard and Buck (1996) in mouse, segregation of Gi2a and Go expression in the apical and basal areas of the VNO neuroepithelium, respectively, is not apparent in the embryo and in 1-week-old animals. Instead, Gi2a+ cells appear randomly distributed in large clusters over the whole thickness of the neuroepithelium, intermingled with Go+ cells. At 4 weeks after birth, however, Gi2a+ cells appear clearly localized in the apex of the epithelium. Similarly, in situ hybridization experiments with mixes of Go-VN and Gi2a-VN receptor probes on sections of VNOs dissected from late embryos and 1-week-old animals show that the two cell populations are still intermingled at early postnatal stages. We observed that the zonal distribution of the two families of receptors slowly emerges during sexual maturation to reach the spatial distribution observed in adults. Preliminary data indicate that the sexually dimorphic expression pattern of Go-VN2 is undetectable at 6 weeks after birth. Thus, in contrast to the zones of olfactory receptor gene expression, which are already present in the olfactory epithelium at the earliest stages of receptor gene expression in the embryo (Sullivan, S., et al., Neuron, 1995, 15:779-789), the spatial organization of the VNO neuroepithelium as detected by G-protein and receptor gene expression emerges only in a late postnatal period and reaches its definitive pattern at sexual maturity.
Expression of Go-VN Receptors Is Restricted to Go+ VNO Neurons
The expression of some of the Go-VN receptors in neurons lining the VNO lumen in an area mainly occupied by Gi2a+ cells raises the obvious question as to whether the expression of this family of genes is strictly restricted to Go+ VNO neurons. Single-cell cDNA prepared from 23 individual VNO neurons was analyzed by Southern blots with probes representing the six divergent subfamilies of Go-VN receptors and was PCR amplified with degenerate primers based on conserved motifs between Go-VN receptor sequences. Both approaches confirmed that none of the 19 cell cDNAs prepared from Gi2a+ neurons contained any sequence of the Go-VN receptor family. In contrast, all four cDNAs generated from Gi2a- cells contained a sequence related to the Go-VN receptors. PCR products generated with degenerate primers based on conserved motifs between Go-VN receptor sequences and obtained from the four Go+ cells were subcloned and sequenced. For each single-cell cDNA, the insert sequences from ten independent colonies were found to be identical. This set of data strongly suggests that Go-VN receptor genes are not expressed by Gi2a+ neurons and constitutes preliminary evidence for the expression of only one Go-VN receptor gene per neuron.
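As an illustration only, the strength of the single-cell result (0 of 19 Gi2a+ cells and 4 of 4 Go+ cells containing a Go-VN sequence) can be gauged with a one-sided Fisher exact test. No such test is reported in the text, so the sketch below is an added assumption about how one might quantify the association, and it treats the 23 cells as independent samples.

```python
from math import comb


def hypergeom_pmf(k: int, group_size: int, total_positive: int, total: int) -> float:
    """Probability that exactly k of the `group_size` cells are positive when
    `total_positive` positives are spread at random among `total` cells."""
    return (
        comb(total_positive, k)
        * comb(total - total_positive, group_size - k)
        / comb(total, group_size)
    )


def fisher_one_sided(k_observed: int, group_size: int,
                     total_positive: int, total: int) -> float:
    """One-sided p-value: chance of at least k_observed positives in the group."""
    upper = min(group_size, total_positive)
    return sum(
        hypergeom_pmf(k, group_size, total_positive, total)
        for k in range(k_observed, upper + 1)
    )


# Counts from the text: 23 cells in total, 4 of them Go+, and 4 cells
# positive for a Go-VN sequence, all of which are the Go+ cells.
p_value = fisher_one_sided(k_observed=4, group_size=4, total_positive=4, total=23)

if __name__ == "__main__":
    print(f"one-sided p = {p_value:.2e}")
```

With these counts only one of the 8855 ways of choosing four cells out of 23 matches the observation, so under the stated assumptions the segregation is unlikely to be a sampling accident.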
Those skilled in the art will recognize, or be able to ascertain using no more than routine experimentation, many equivalents to the specific embodiments of the invention described herein. Such equivalents are intended to be encompassed by the following claims. All references disclosed herein are incorporated by reference in their entirety.
A Sequence Listing is presented below and is followed by what is claimed.
SEQUENCE LISTING
(1) GENERAL INFORMATION
(i) APPLICANT: PRESIDENT AND FELLOWS OF HARVARD COLLEGE
(ii) TITLE OF THE INVENTION: NOVEL PHEROMONE RECEPTORS
(iii) NUMBER OF SEQUENCES: 92
(iv) CORRESPONDENCE ADDRESS:
(A) ADDRESSEE: Wolf, Greenfield & Sacks, P.C.
(B) STREET: 600 Atlantic Avenue
(C) CITY: Boston
(D) STATE: MA
(E) COUNTRY: U.S.A.
(F) ZIP: 02210-2211
(v) COMPUTER READABLE FORM:
(A) MEDIUM TYPE: Diskette
(B) COMPUTER: IBM Compatible
(C) OPERATING SYSTEM: DOS
(D) SOFTWARE: FastSEQ for Windows Version 2.0
(vi) CURRENT APPLICATION DATA:
(A) APPLICATION NUMBER:
(B) FILING DATE:
(C) CLASSIFICATION:
(vii) PRIOR APPLICATION DATA:
(A) APPLICATION NUMBER: 60/051,284
(B) FILING DATE: 30-JUN-1997
(viii) ATTORNEY/AGENT INFORMATION:
(A) NAME: Plumer, Elizabeth R.
(B) REGISTRATION NUMBER: 36,637
(C) REFERENCE/DOCKET NUMBER: H0498/7074
(ix) TELECOMMUNICATION INFORMATION:
(A) TELEPHONE: 617-720-3500
(B) TELEFAX: 617-720-2441
(C) TELEX:
(2) INFORMATION FOR SEQ ID NO:1:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 3080 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: cDNA
(ix) FEATURE:
(A) NAME/KEY: Coding Sequence
(B) LOCATION: 57...2606
(D) OTHER INFORMATION: VR1
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:1:
Met Lys Gln Leu Cys Ala Phe Thr Ile Ser Leu Leu Phe Leu Lys Phe Ser Leu Ile Leu Cys Cys Leu Thr Glu Pro Ser Cys Phe Trp Arg Ile Arg Asn Ser Glu Asp Ser Asp Gly Asp Leu Gln Arg Glu Cys His Phe Tyr Leu Trp Lys Thr Asp Glu Pro Ile Glu Asp Ser Phe Tyr Asn Tyr Asp Leu Ser Phe Arg Ile Ala Ala Ser Glu Tyr Glu Phe Leu Leu Val Met Phe Phe Ala Ile Asp Glu Ile Asn Arg Asn Pro Tyr Leu Leu Pro Asn Ile Thr Leu Met Phe Ser Phe Ile Gly Gly Asn Cys Gln Asp Leu Leu ', Arg Val Met Asp Gln Ala Tyr Thr Gln Ile Asn Gly His Met Asn Phe TAT TGT TTA TGT
GCC
ATA
GGT
Val Asn Phe Tyr Asp AspSerCys IleGly Leu Thr Tyr Cys Leu Ala TCA AAA TCC ATG
Gly Pro Trp Thr Leu LysLeuAla HisSer Ser Met Ser Lys Ser Met ', 150 155 160 GTT TTT CCA CTA
Pro Leu Phe Gly Phe AsnProAsn ArgAsp His Asp Val Phe Pro Leu ', CGG CTG CAT CAT GTA GCCCCCAAG ACACAT TTG TCC 635 CCC GTC CAG GAC
Arg Leu His His Val AlaProLys ThrHis Leu Ser Pro Val Gln Asp ATG TCC ATG TGG
His Gly Val Leu Phe HisPheArg ThrTrp Ile Gly Met Ser Met Trp ATC GAT GAC TTT
Leu Val Ser Asp Gln GlyIleGln LeuSer Asp Leu Ile Asp Asp Phe GAA CAA CAT GCT
Arg Glu Ser Arg Gly IleCysLeu PheVal Asn Met Glu Gln His Ala GAA ATG ATA GCT
Ile Pro Asn Gln Tyr MetThrArg ThrIle Tyr Asp.
Glu Met Ile Ala ATG AAG GTT TAT
Lys His Ile Thr Ser Ser Ala Val Ile Ile Gly Glu Met Lys Val Tyr ACT TTT AGA GAG
Met Asn Ser Leu Glu Ala Ser Arg Trp Glu Leu Gly Thr Phe Arg Glu ATC TCA CAA ATC
Ala Arg Arg Trp Ile Thr Thr Trp Asp Val Thr Asn Ile Ser Gln Ile TTC TTC CAT ACT
Lys Lys Asp Thr Leu Asn Leu Gly Ile Ile Phe Glu Phe Phe His Thr TTT TTA AAT CAA
His His Arg Glu Ile Pro Lys Lys Phe Met Thr Met Phe Leu Asn Gln AAA ATT TCT TTG
Asn Thr Ala Tyr Pro Val Asp His Thr Ile Glu Trp Lys Ile Ser Leu AAT AAG AAC ATG
Asn Tyr Phe Cys Ser Ile Ser Ser Ile Arg His His Asn Lys Asn Met AAC TGG ACA AAC
Ile Thr Phe Asn Thr Leu Glu Ser Leu His Tyr Asp Asn Trp Thr Asn AGT AAT TTG GTT
Val Ala Met Asp Glu Gly Tyr Tyr Asn Ala Tyr Ala Ser Asn Leu Val ACC ATT TTT GAG
Val Ala His Tyr His Glu Tyr Gln Gln Val Ser Gln Thr Ile Phe Glu AAA TTC ACT CAG
Lys Lys Ala Pro Lys Arg Tyr Ala Cys Gln Val Ser Lys Phe Thr Gln AAA ACG AAC GAA
Ser Leu Met Thr Arg Val Phe Pro Val Gly Leu Val Lys Thr Asn Glu CAT TGT ACA ATT
Asn Met Lys Arg Glu Asn Gln Glu Tyr Asp Phe Ile His Cys Thr Ile TTT GGA TTA ATA
Ile Trp Asn Pro Gln Gly Leu Lys Val Lys Gly Ser Phe Gly Leu Ile TGT CAA AAA TCT
Tyr Leu Pro Phe Pro Gln Arg Leu His Ile Asp Asp Cys Gln Lys Ser GCC TCA CCT TCC
Leu Glu Trp Lys Gly Gly Thr Gln Val Pro Ser Val Ala Ser Pro Ser Cys Ser Val Ala Cys Thr Ala Gly Phe Arg Lys Ile Tyr Gln Lys Glu Thr Ala Asp Cys Cys Phe Asp Cys Val Gln Cys Pro Glu Asn Glu Ile ~ TCC AAC GAA ACA GAT ATG GAA CAG TGT GTG AGG TGT CCA GAT GAT AAG 1739 _ Ser Asn Glu Thr Asp Met Glu Gln Cys Val Arg Cys Pro Asp Asp Lys Tyr AlaAsn IleGluGln ThrHisCysLeu SerArgAla ValSer Phe Leu AlaTyr GluAspSer LeuGlyMetAla LeuGlyCys MetAla Leu Ser PheSer AlaIleThr IleLeuIleLeu ValThrPhe ValLys Tyr Lys AspThr ProThrVal LysAlaAsnAsn ArgIleLeu SerTyr Ile Leu LeuIle SerLeuVal PheCysPheLeu CysSerLeu LeuPhe Ile Gly ProPro AspGlnVal ThrCysIlePhe GlnGlnThr ThrPhe Gly Val LeuPhe ThrValSer ValSerThrVal LeuAlaLys ThrIle Thr Val ValMet AlaPheLys LeuThrThrPro GlyArgArg MetArg Gly Met MetMet ThrGlyAla ProLysLeuVal IleProIle CysThr Leu Ile GlnLeu ValLeuCys GlyIleTrpLeu ValThrSer ProPro Phe Ile AspArg AspIleGln SerGluHisGly LysIleVal IleLeu Cys Asn LysGly SerValIle AlaPheHisVal ValLeuGly TyrLeu Gly Ser LeuAla LeuGlySer PheThrLeuAla PheLeuAla ArgAsn Leu .
ProAsp ThrPheAsn Glu Lys PheLeuThr PheSerMet LeuVal Ala PheCys SerValTrp IleThrPhe LeuProVal TyrHisSer ThrArg GlyArg ValMetVal ValValGlu ValPheSer IleLeuAla SerSer AlaGly LeuLeuMet CysIlePhe ValProLys CysTyrVal IleLeu IleArg ProAspSer AsnPheIle LysAsnHis LysGlyLys LeuLeu TATTGAAACTTTC GATATTCAAC TTATCTTATT
ATGGTATGAA CTTCAT
AATGTTAGAT
Tyr
(2) INFORMATION FOR SEQ ID NO:2:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 850 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: protein
(v) FRAGMENT TYPE: internal
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:2:
Met Lys Gln Leu Cys Ala Phe Thr Ile Ser Leu Leu Phe Leu Lys Phe Ser Leu Ile Leu Cys Cys Leu Thr Glu Pro Ser Cys Phe Trp Arg Ile Arg Asn Ser Glu Asp Ser Asp Gly Asp Leu Gln Arg Glu Cys His Phe Tyr Leu Trp Lys Thr Asp Glu Pro Ile Glu Asp Ser Phe Tyr Asn Tyr Asp Leu Ser Phe Arg Ile Ala Ala Ser Glu Tyr Glu Phe Leu Leu Val Met Phe Phe Ala Ile Asp Glu Ile Asn Arg Asn Pro Tyr Leu Leu Pro Asn Ile Thr Leu Met Phe Ser Phe Ile Gly Gly Asn Cys Gln Asp Leu Leu Arg Val Met Asp Gln Ala Tyr Thr Gln Ile Asn Gly His Met Asn Phe Val Asn Tyr Phe Cys Tyr Leu Asp Asp Ser Cys Ala Ile Gly Leu Thr Gly Pro Ser Trp Lys Thr Ser Leu Lys Leu Ala Met His Ser Ser Met Pro Leu Val Phe Phe Gly Pro Phe Asn Pro Asn Leu Arg Asp His Asp Arg Leu Pro His Val His Gln Val Ala Pro Lys Asp Thr His Leu Ser His Gly Met Val Ser Leu Met Phe His Phe Arg Trp Thr Trp Ile Gly Leu Val Ile Ser Asp Asp Asp Gln Gly Ile Gln Phe Leu Ser Asp Leu Arg Glu Glu Ser Gln Arg His Gly Ile Cys Leu Ala Phe Val Asn Met Ile Pro Glu Asn Met Gln Ile Tyr Met Thr Arg Ala Thr Ile Tyr Asp Lys His Ile Met Thr Ser Ser Ala Lys Val Val Ile Ile Tyr Gly Glu Met Asn Ser Thr Leu Glu Ala Ser Phe Arg Arg Trp Glu Glu Leu Gly Ala Arg Arg Ile Trp Ile Thr Thr Ser Gln Trp Asp Val Ile Thr Asn Lys Lys Asp Phe Thr Leu Asn Leu Phe His Gly Ile Ile Thr Phe Glu His His Arg Phe Glu Ile Pro Lys Leu Asn Lys Phe Met Gln Thr Met Asn Thr Ala Lys Tyr Pro Val Asp Ile Ser His Thr Ile Leu Glu Trp Asn Tyr Phe Asn Cys Ser Ile Ser Lys Asn Ser Ile Arg Met His His Ile Thr Phe Asn Asn Thr Leu Glu Trp Thr Ser Leu His Asn Tyr Asp Val Ala Met Ser Asp Glu Gly Tyr Asn Leu Tyr Asn Ala Val Tyr Ala Val Ala His Thr Tyr His GIu Tyr Ile Phe Gln Gln Val Glu Ser Gln Lys Lys Ala Lys Pro Lys Arg Tyr Phe Thr Ala Cys Gln Gln Val Ser Ser Leu Met Lys Thr Arg Val Phe Thr Asn Pro Val Gly Glu Leu Val Asn Met Lys His Arg Glu Asn Gln Cys Thr Glu Tyr Asp Ile Phe Ile Ile Trp Asn Phe Pro Gln Gly Leu Gly Leu Lys Val Lys Ile Gly Ser Tyr Leu Pro Cys Phe Pro Gln Arg Gln Lys Leu His Ile Ser Asp Asp Leu Glu Trp 
Ala Lys Gly Gly Thr Ser Pro Gln Val Pro Ser Ser Val Cys Ser Val Ala Cys Thr Ala Gly Phe Arg Lys Ile Tyr Gln Lys Glu Thr Ala Asp Cys Cys Phe Asp Cys Val Gln Cys Pro Glu Asn Glu Ile Ser Asn Glu Thr Asp Met Glu Gln Cys Val Arg Cys Pro Asp Asp Lys Tyr Ala Asn Ile Glu Gln Thr His Cys Leu Ser Arg Ala Val Ser Phe Leu Ala Tyr Glu Asp Ser Leu Gly Met Ala Leu Gly Cys Met Ala Leu Ser Phe Ser Ala Ile Thr Ile Leu Ile Leu Val Thr Phe Val Lys Tyr Lys Asp Thr Pro Thr Val Lys Ala Asn Asn Arg Ile Leu Ser Tyr Ile Leu Leu Ile Ser Leu Val Phe Cys Phe Leu Cys Ser Leu Leu Phe Ile Gly Pro Pro Asp Gln Val Thr Cys Ile Phe Gln Gln Thr Thr Phe Gly Val Leu Phe Thr Val Ser Val Ser Thr Val Leu Ala Lys Thr Ile Thr Val Val Met Ala Phe Lys Leu Thr Thr Pro Gly Arg Arg Met Arg Gly Met Met Met Thr Gly Ala Pro Lys Leu Val Ile Pro Ile Cys Thr Leu Ile Gln Leu Val Leu Cys Gly Ile Trp Leu Val Thr Ser Pro Pro Phe Ile Asp Arg Asp Ile Gln Ser Glu His Gly Lys Ile Val Ile Leu Cys Asn Lys Gly Ser Val Ile Ala Phe His Val Val Leu Gly Tyr Leu Gly Ser Leu Ala Leu Gly Ser Phe Thr Leu Ala Phe Leu Ala Arg Asn Leu Pro Asp Thr Phe Asn Glu Ala Lys Phe Leu Thr Phe Ser Met Leu Val Phe Cys Ser Val Trp Ile Thr Phe Leu Pro Val Tyr His Ser Thr Arg Gly Arg Val Met Val Val Val Glu Val Phe Ser Ile Leu Ala Ser Ser Ala Gly Leu Leu Met Cys Ile Phe Val Pro Lys Cys Tyr Val Ile Leu Ile Arg Pro Asp Ser Asn Phe Ile Lys Asn His Lys Gly Lys Leu Leu Tyr (2) INFORMATION FOR SEQ ID N0:3:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 2961 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: cDNA
(ix) FEATURE:
(A) NAME/KEY: Coding Sequence
(B) LOCATION: 86...2509
(D) OTHER INFORMATION: VR2
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:3:
GTGCAACTGT
GTGTGTGATG
TTTTTCTGCA
TCAGAAACGG
ATTTCACAGC
CAGATCCTAG
CAGAC
MetLysGln LeuCysThr PheThrIle TTG AAG
Ser LeuPhe Leu Phe SerLeuIle LeuCysCys TrpSerGlu Leu Lys AGC AGG
Pro CysPhe Trp Ile LysLysSer GluAspAsn AspGlyAsp Ser Arg CAA CAT
Leu ArgGlu Cys Phe TyrLeuTrp LysThrAsp GluProIle Gln His GAT AAT
Glu SerPhe Tyr Tyr AspLeuSer PheArgIle AlaGlySer Asp Asn TAT CTG
Glu GluLeu Leu Val MetPhePhe AlaThrAsp GluIleAsn Tyr Leu Lys Asn Pro Tyr Leu Leu Pro Asn Met Ser Leu Met Phe Ser Ile Ile Gly Gly Asn Cys His Asp Leu Leu Arg Ser Leu Asp Gln Glu Tyr A1a Gln Ile Asp Gly His Met Asn Phe Val Asn Tyr Phe Cys Tyr Leu Asp ' 125 130 135 Asp Ser Cys Ala Thr Gly Leu Thr Gly Pro Ser Trp Lys Thr Ser Leu Lys Leu Ala Met His Ser Ser Met Pro Leu Val Phe Phe Gly Pro Phe CGC CGG CAT GTC GTA
', Asn ProAsn LeuArgAsp His Asp Leu Pro Val HisGlnVal Arg His j GCC CCCAAG GACACACAT TTG TCC GGC ATG TCC TTGATGTTT 688 CAT GTC
Ala ProLys AspThrHis Leu Ser Gly Met Ser LeuMetPhe His Val CTG TCA
His PheArg TrpThrTrp Ile Gly Val Ile Asp AspAspGln Leu Ser ', 205 210 215 AGA AGC
Gly IleGln PheLeuSer Asp Leu Glu Glu Gln ArgHisGly Arg Ser ', 220 225 230 ATC AAC
Ile CysLeu AlaPheVal Asn Met Pro Glu Met GlnIleTyr Ile Asn I
ACA ATG
Met ThrArg AlaThrIle Tyr Asp Gln Ile Thr SerSerAla Thr Met ATG ACT
', Lys ValVal IleIleTyr Gly Asp Asn Ser Leu GluAlaSer Met Thr GCT ATC
Phe ArgArg TrpGluGlu Leu Gly Arg Arg Trp IleThrThr Ala Ile Thr Gln Trp Asp Val Ile Thr Asn Lys Lys Asp Phe Thr Leu Asn Leu Phe His Gly Thr Ile Thr Phe Ala His His Lys Asp Glu Ile Pro Lys Phe Arg Asn Phe Met Gln Thr Lys Lys Thr Ala Lys Tyr Leu Val Asp I, ATT TCT CAT ACT ATT TTG GAG TGG AAT TAT TTT AAT TGT TCA ATC TCT 1168 IleSer ThrIle Leu Trp TyrPhe CysSer IleSer His Glu Asn Asn TTT AAC
LysAsn SerSerLys MetGlyHis ThrPhe AsnThr LeuGln Phe Asn ATG AGC
TrpThr AlaLeuHis AsnTyrAsp AlaLeu AspGlu GlyTyr Met Ser GTG ACC
AsnLeu TyrAsnAla ValTyrAla AlaHis TyrHis GluTyr Val Thr AAA AAA
IleLeu GlnGlnVal GluSerGln LysAla ProLys ArgTyr Lys Lys TCC AAA
PheThr AlaCysGln GlnValSer LeuMet ThrArg ValPhe Ser Lys AAC CAT
MetAsn ProValGly GluLeuVal MetLys ArgGlu AsnGln Asn His ATT TTT
CysThr GluTyrAsp IlePheIle TrpAsn ProGln GlyLeu Ile Phe TAT TGC
GlyLeu LysValLys ValGlySer LeuPro PhePro LysSer Tyr Cys TTG GCC
GlnGln LeuHisIle AlaAspAsp GluTrp MetGly GlyThr Leu Ala AGA GAT
SerVal AspMetGlu GlnCysVal CysPro AsnLys TyrAla Arg Asp CAA GTG
AsnLeu GluGlnThr HisCysLeu ArgThr SerPhe LeuAla Gln Val CTA ATG
TyrGlu AspProLeu GlyMetAla GlyCys AlaLeu SerPhe Leu Met GTC GTG
SerAla IleThrIle LeuValLeu ThrPhe LysTyr LysAsp Val Val CGC AGC
ThrPro IleValLys AlaAsnAsn IleLeu TyrIle LeuLeu Arg Ser TGT CTC
IleSer LeuValPhe CysPheLeu SerLeu PheIle GlyHis Cys Leu CAG ACA
ProAsp GlnValThr CysIleLeu GlnThr PheGly ValLeu Gln Thr .
TCT GTG AAA ATA
Phe Thr Val Ser Val Thr LeuAla Thr ThrValVal Ser Val Lys Ile ACT CCA AGG AGA
Met Ala Phe Lys Leu Thr GlyArg Met GlyMetMet Thr Pro Arg Arg AAG GTC ATT ACC
Met Thr Gly Ala Pro Leu IlePro Cys LeuIleGln Lys Val Ile Thr ATC TTG TCT CCC
Leu Val Leu Cys Gly Trp ValThr Pro PheIleAsp Ile Leu Ser Pro GAA GGG GTC CTT
Arg Asp Ile Gln Ser His LysIle Ile CysAsnLys Glu Gly Val Leu TTC GTC GGA TTG
Gly Ser Val Val Ala His ValLeu Tyr GlySerLeu Phe Val Gly Leu 700 . 705 710 ACT GCT GCT AAC
Ala Leu Gly Ser Phe Leu PheLeu Arg LeuProAsp Thr Ala Ala Asn AAG CTA AGC CTG
Thr Phe Asn Glu Ala Phe ThrPhe Met ValPheCys Lys Leu Ser Leu TTC CCT CAC ACC
Ser Val Trp Ile Thr Leu ValTyr Ser ArgGlyLys Phe Pro His Thr GAG TTC TTG TCT
Val Met Val Val Val Val SerIle Ala SerAlaGly Glu Phe Leu Ser TTT CCA TAT ATT
Leu Leu Met Cys Ile Val LysCys Val LeuIleArg Phe Pro Tyr Ile ATA AAC GGT TTG
Pro Asp Ser Asn Phe Gln HisLys Lys LeuTyr Ile Asn Gly Leu TAGATGATAT TCTTAATAAA
TCAACTTATC
AAAATAAAGT CAAACTGGAC
AATATACAGA
GAACTGGGAT CCAATATTTT
TCTCAATTGA
AGCCATGTAC GGTTACCCTA
TTAATTAATG
CTCTAGGCAT AAGGGTACTG
GCTGTCCTTG
CCAGTAATCA ATGGAGTTCT
ACATTATTCC
GACTTTATTC GAATAAATAA
AATGTTCTAT
AAAAAAA
(2) INFORMATION FOR SEQ ID NO:4:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 808 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: protein
(v) FRAGMENT TYPE: internal
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:4:
Met Lys Gln Leu Cys Thr Phe Thr Ile Ser Leu Leu Phe Leu Lys Phe Ser Leu Ile Leu Cys Cys Trp Ser Glu Pro Ser Cys Phe Trp Arg Ile Lys Lys Ser Glu Asp Asn Asp Gly Asp Leu Gln Arg Glu Cys His Phe Tyr Leu Trp Lys Thr Asp Glu Pro Ile Glu Asp Ser Phe Tyr Asn Tyr Asp Leu Ser Phe Arg Ile Ala Gly Ser Glu Tyr Glu Leu Leu Leu Val Met Phe Phe Ala Thr Asp Glu Ile Asn Lys Asn Pro Tyr Leu Leu Pro Asn Met Ser Leu Met Phe Ser Ile Ile Gly Gly Asn Cys His Asp Leu Leu Arg Ser Leu Asp Gln Glu Tyr Ala Gln Ile Asp Gly His Met Asn Phe Val Asn Tyr Phe Cys Tyr Leu Asp Asp Ser Cys Ala Thr Gly Leu Thr Gly Pro Ser Trp Lys Thr Ser Leu Lys Leu Ala Met His Ser Ser Met Pro Leu Val Phe Phe Gly Pro Phe Asn Pro Asn Leu Arg Asp His Asp Arg Leu Pro His Val His Gln Val Ala Pro Lys Asp Thr His Leu Ser His Gly Met Val Ser Leu Met Phe His Phe Arg Trp Thr Trp Ile Gly Leu Val Ile Ser Asp Asp Asp Gln Gly Ile Gln Phe Leu Ser Asp Leu Arg Glu Glu Ser Gln Arg His Gly Ile Cys Leu Ala Phe Val Asn Met Ile Pro Glu Asn Met Gln Ile Tyr Met Thr Arg Ala Thr Ile Tyr Asp Thr Gln Ile Met Thr Ser Ser Ala Lys Val Val Ile Ile Tyr Gly Asp Met Asn Ser Thr Leu Glu Ala Ser Phe Arg Arg Trp Glu Glu Leu Gly Ala Arg Arg Ile Trp Ile Thr Thr Thr Gln Trp Asp Val Ile Thr Asn Lys Lys Asp Phe Thr Leu Asn Leu Phe His Gly Thr Ile Thr Phe Ala His His Lys Asp Glu Ile Pro Lys Phe Arg Asn Phe Met Gln Thr Lys Lys Thr Ala Lys Tyr Leu Val Asp Ile Ser His Thr Ile Leu Glu Trp Asn Tyr Phe Asn Cys Ser Ile Ser Lys Asn Ser Ser Lys Met Gly His Phe Thr Phe Asn Asn Thr Leu Gln Trp Thr Ala Leu His Asn Tyr Asp Met Ala Leu Ser Asp Glu Gly Tyr Asn Leu Tyr Asn Ala Val Tyr Ala Val Ala His Thr Tyr His Glu Tyr Ile Leu Gln Gln Val Glu Ser Gln Lys Lys Ala Lys Pro Lys Arg Tyr Phe Thr Ala Cys Gln Gln Val Ser Ser Leu Met Lys Thr Arg Val Phe Met Asn Pro Val Gly Glu Leu Val Asn Met Lys His Arg Glu Asn Gln Cys Thr Glu Tyr Asp Ile Phe _ 73 _ Ile Ile Trp Asn Phe Pro Gln Gly Leu Gly Leu Lys Val Lys Val Gly Ser Tyr Leu Pro Cys Phe Pro Lys Ser Gln Gln Leu His Ile Ala Asp Asp Leu 
Glu Trp Ala Met Gly Gly Thr Ser Val Asp Met Glu Gln Cys Val Arg Cys Pro Asp Asn Lys Tyr Ala Asn Leu Glu Gln Thr His Cys Leu Gln Arg Thr Val Ser Phe Leu Ala Tyr Glu Asp Pro Leu Gly Met Ala Leu Gly Cys Met Ala Leu Ser Phe Ser Ala Ile Thr Ile Leu Val Leu Val Thr Phe Val Lys Tyr Lys Asp Thr Pro Ile Val Lys Ala Asn Asn Arg Ile Leu Ser Tyr Ile Leu Leu Ile Ser Leu Val Phe Cys Phe Leu Cys Ser Leu Leu Phe Ile Gly His Pro Asp Gln Val Thr Cys Ile Leu Gln Gln Thr Thr Phe Gly Val Leu Phe Thr Val Ser Val Ser Thr Val Leu Ala Lys Thr Ile Thr Val Val Met Ala Phe Lys Leu Thr Thr Pro Gly Arg Arg Met Arg Gly Met Met Met Thr Gly Ala Pro Lys Leu Val Ile Pro Ile Cys Thr Leu Ile Gln Leu Val Leu Cys Gly Ile Trp Leu Val Thr Ser Pro Pro Phe Ile Asp Arg Asp Ile Gln Ser Glu His Gly Lys Ile Val Ile Leu Cys Asn Lys Gly Ser Val Val Ala Phe His Val Val Leu Gly Tyr Leu Gly Ser Leu Ala Leu Gly Ser Phe Thr Leu Ala Phe Leu Ala Arg Asn Leu Pro Asp Thr Phe Asn Glu Ala Lys Phe Leu Thr Phe Ser Met Leu Val Phe Cys Ser Val Trp Ile Thr Phe Leu Pro Val Tyr His Ser Thr Arg Gly Lys Val Met Val Val Val Glu Val Phe Ser Ile Leu Ala Ser Ser Ala Gly Leu Leu Met Cys Ile Phe Val Pro Lys Cys Tyr Val Ile Leu Ile Arg Pro Asp Ser Asn Phe Ile Gln Asn His Lys Gly Lys Leu Leu Tyr (2) INFORMATION FOR SEQ ID N0:5:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 2907 base pairs (B) TYPE: nucleic acid (C) STRANDEDNESS: single (D) TOPOLOGY: linear (ii) MOLECULE TYPE: cDNA
(ix) FEATURE:
(A) NAME/KEY: Coding Sequence (B) LOCATION: 1...2409 (D) OTHER INFORMATION: VR3 (xi) SEQUENCE DESCRIPTION: SEQ ID NO:5:
His Phe Tyr Leu Gly Ala Val Asp Lys Pro Ile Glu Asp Asn Phe Tyr
TCA TTA AGA GCA GAG
Asn Leu LysPhe Ile Ala Ser GluTyr PheLeu Ser Leu Arg Ala Glu GTA TTT ACT ATC CCT
Leu Met PheAla Asp Glu Asn LysAsn TyrLeu Val Phe Thr Ile Pro CCC ATA ATG ATC AAC
Leu Asn ThrLeu Phe Ser Ile GlyGly CysHis Pro Ile Met Ile Asn TTA AGA GAT TAT AAT
Asp Leu GlyLeu Gln Ala Thr GlnIle GlyHis Leu Arg Asp Tyr Asn AAT GTT TTC TTA TGT
Met Phe AsnTyr Cys Tyr Asp AspSer AlaIle Asn Val Phe Leu Cys CTT GGA TGG TCC GCA
Gly Thr ProSer Lys Thr Leu AsnLeu MetHis Leu Gly Trp Ser Ala TCA CCA TTC TCA AAC
Ser Met LeuVal Phe Gly Phe AsnPro LeuHis Ser Pro Phe Ser Asn CAT CGG CAT CAA AAG
Asp Asp LeuHis Val His Val AlaThr AspThr His Arg His Gln Lys TTG CAT GTC ATG AGA
His Ser GlyIle Ser Leu Phe HisPhe TrpThr Leu His Val Met Arg ATA CTG TCA GAC CAG
Trp Gly ValIle Asp Asp Lys GlyIle PheLeu Ile Leu Ser Asp Gln GAT AGA AGC CAT TTA
Ser Leu GluGlu Gln Arg Gly IleCys AlaPhe Asp Arg Ser His Leu AAT ATC AAC ATA AGG
Val Met ProGlu Met Gln Tyr MetThr AlaThr Asn Ile Asn Ile Arg TAT AAA ATG TTA GTT
Ile Asp GlnIle Thr Ser Ala LysVal IleIle Tyr Lys Met Leu Val GGT ATG ACA GTA AGA
Tyr Glu AsnSer Leu Glu Ser PheArg TrpGlu Gly Met Thr Val Arg TTA GCT ATC ACA TGG
Asn Gly ArgArg Trp Ile Thr SerGln AspVal Leu Ala Ile Thr Trp ACA AAA TTC AAT GGG
Ile Asn LysGlu Thr Leu Leu PheHis ThrIle Thr Lys Phe Asn Gly Thr Phe Ala His Arg Arg Phe Glu Ile Pro Lys Phe Lys Lys Phe Met Gln Thr Met Asn Thr Ala Lys Tyr Pro Val Asp Ile Ser His Thr Ile Leu Glu Trp Asn Tyr Phe Asn Cys Ser Ile Ser Lys Asn Ser Ser Lys Met Asp His IleThr Asn Thr GluTrp Leu His Phe Asn Leu Thr Ala ATG GAT GGT TTG
Asn Tyr AspMetVal Ser Glu TyrAsn TyrAsn Ala Met Asp Gly Leu CAC TAC GAA TTT
Val Tyr AlaValAla Thr His HisIle GlnGln Val His Tyr Glu Phe GCA CCC AGA ACT
Glu Ser GlnLysLys Lys Lys PhePhe ValCys Gln Ala Pro Arg Thr ATG ACC GTA AAC
Gln Val SerSerLeu Lys Arg PheThr ProVal Gly Met Thr Val Asn AAG AGG AAT ACA
Glu Leu ValAsnMet His Glu GlnCys GluTyr Asp Lys Arg Asn Thr AAC CCA GGC TTA
Ile Phe LeuIleTrp Phe Gln LeuGly LysVal Lys Asn Pro Gly Leu CCT TTT CAG GAA
Ile Gly SerTyrLeu Cys Pro ArgGln LeuHis Ile Pro Phe Gln Glu TGG ATG GGA GTG
Ser Asp AspLeuGlu Ala Gly ThrSer ValPro Ser Trp Met Gly Val GCA ACT GGA AAA
Ser Val CysSerVal Cys Ala PheArg IleHis Gln Ala Thr Gly Lys TGC TTT TGT TGC
Lys Glu ThrAlaAsp Cys Asp ValGln ProGlu Asn Cys Phe Cys Cys ACA ATG CAG AAG
Glu Val SerAsnGlu Asp Glu CysVal CysPro Tyr Thr Met Gln Lys ATA AAA CAC TCA
Asp Lys TyrAlaAsn Glu Thr CysLeu ArgAla Val Ile Lys His Ser GAA CCA GGG CTA
SerPhe Leu Tyr Glu Asp LeuGlyIle LeuGlyCys Ile Ala Pro Ala ACA
AlaLeu SerPheSer Ala Ile IleLeuValLeu IleThrPhe Leu Thr GTG
LysTyr LysAspThr Pro Ile LysAlaAsnAsn ArgIleLeu Ser Val GTC
TyrIle LeuLeuIle Ser Leu PheCysPheLeu CysSerLeu Leu Val GTC
PheIle GlyHisPro Asn Gln SerCysValLeu GlnGlnThr Thr Val TCT
PheGly ValPhePhe Thr Val ValSerThrVal LeuAlaLys Thr Ser AAG
IleThr ValValMet Ala Phe LeuThrThrPro GlyArgArg Met Lys GCA
ArgGlu MetLeuVal Thr Gly ProLysLeuVal IleProIle Cys Ala TGT
ThrLeu IleGlnPhe Val Leu GlyIleTrpLeu IleThrSer Pro Cys CAA
ProPhe IleAspArg Asp Ile SerGluHisGly LysIleVal Ile Gln ATT
LeuCys AsnLysGly Ser Val AlaPheHisVal ValLeuGly Tyr Ile AGC
LeuGly SerLeuAla Leu Gly PheThrLeuAla PheLeuAla Arg Ser GAA
AsnLeu ProAspThr Phe Asn AlaLysPheLeu ThrPheSer Met Glu ATC
LeuVal PheCysSer Val Trp ThrPheLeuPro ValTyrHis Ser Ile GTT
ThrArg GlyLysVal Met Val ValGluValPhe SerIleLeu Ala Val TGT
SerSer AlaGlyLeu Leu Met IlePheValPro LysCysTyr Val Cys AAT
IleLeu ValArgPro Asp Ser PheIleArgLys TyrLysAsp Lys Asn
Phe Arg Tyr ATAAAAATTT AAATAATATA CAAATTTGAA
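As a consistency check on the entries in this listing, the inclusive range given under LOCATION can be compared with the stated protein LENGTH: a range start...end spans end - start + 1 bases, and dividing by three gives the codon count. The sketch below is only an illustration and is not part of the filed listing. The function name `codon_count` is hypothetical, the figures are copied from the SEQUENCE CHARACTERISTICS and FEATURE entries for VR3, VR4, VR5, VR8 and VR9, and the check assumes each stated range contains sense codons only (no stop codon).

```python
def codon_count(start: int, end: int) -> int:
    """Number of codons in an inclusive coding range start...end."""
    length = end - start + 1  # both endpoints are part of the range
    if length % 3 != 0:
        raise ValueError(f"range {start}...{end} is not a whole number of codons")
    return length // 3


# (LOCATION start, LOCATION end, stated protein LENGTH), copied from the listing.
LISTING = {
    "VR3": (1, 2409, 803),    # SEQ ID NO:5 / NO:6
    "VR4": (117, 2672, 852),  # SEQ ID NO:7 / NO:8
    "VR5": (1, 2169, 723),    # SEQ ID NO:9 / NO:10
    "VR8": (80, 349, 90),     # SEQ ID NO:15 / NO:16
    "VR9": (80, 1387, 436),   # SEQ ID NO:17 / NO:18
}

for name, (start, end, stated) in LISTING.items():
    status = "consistent" if codon_count(start, end) == stated else "MISMATCH"
    print(f"{name}: {codon_count(start, end)} codons, {stated} residues stated: {status}")
```

Under that assumption, all five coding ranges agree with the protein lengths stated in the paired amino acid records.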
(2) INFORMATION FOR SEQ ID NO:6:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 803 amino acids (B) TYPE: amino acid (C) STRANDEDNESS: single (D) TOPOLOGY: linear (ii) MOLECULE TYPE: protein (v) FRAGMENT TYPE: internal (xi) SEQUENCE DESCRIPTION: SEQ ID NO:6:
His Phe Tyr Leu Gly Ala Val Asp Lys Pro Ile Glu Asp Asn Phe Tyr Asn Ser Leu Leu Lys Phe Arg Ile Ala Ala Ser Glu Tyr Glu Phe Leu Leu Val Met Phe Phe Ala Thr Asp Glu Ile Asn Lys Asn Pro Tyr Leu Leu Pro Asn Ile Thr Leu Met Phe Ser Ile Ile Gly Gly Asn Cys His Asp Leu Leu Arg Gly Leu Asp Gln Ala Tyr Thr Gln Ile Asn Gly His Met Asn Phe Val Asn Tyr Phe Cys Tyr Leu Asp Asp Ser Cys Ala Ile Gly Leu Thr Gly Pro Ser Trp Lys Thr Ser Leu Asn Leu Ala Met His Ser Ser Met Pro Leu Val Phe Phe Gly Ser Phe Asn Pro Asn Leu His Asp His Asp Arg Leu His His Val His Gln Val Ala Thr Lys Asp Thr His Leu Ser His Gly Ile Val Ser Leu Met Phe His Phe Arg Trp Thr Trp Ile Gly Leu Val Ile Ser Asp Asp Asp Lys Gly Ile Gln Phe Leu Ser Asp Leu Arg Glu Glu Ser Gln Arg His Gly Ile Cys Leu Ala Phe Val Asn Met Ile Pro Glu Asn Met Gln Ile Tyr Met Thr Arg Ala Thr Ile Tyr Asp Lys Gln Ile Met Thr Ser Leu Ala Lys Val Val Ile Ile Tyr Gly Glu Met Asn Ser Thr Leu Glu Val Ser Phe Arg Arg Trp Glu Asn Leu Gly Ala Arg Arg Ile Trp Ile Thr Thr Ser Gln Trp Asp Val Ile Thr Asn Lys Lys Glu Phe Thr Leu Asn Leu Phe His Gly Thr Ile Thr Phe Ala His Arg Arg Phe Glu Ile Pro Lys Phe Lys Lys Phe Met Gln Thr Met Asn Thr Ala Lys Tyr Pro Val Asp Ile Ser His Thr Ile .
Leu Glu Trp Asn Tyr Phe Asn Cys Ser Ile Ser Lys Asn Ser Ser Lys Met Asp His Ile Thr Phe Asn Asn Thr Leu Glu Trp Thr Ala Leu His Asn Tyr Asp Met Val Met Ser Asp Glu Gly Tyr Asn Leu Tyr Asn Ala Val Tyr Ala Val Ala His Thr Tyr His Glu His Ile Phe Gln Gln Val Glu Ser Gln Lys Lys Ala Lys Pro Lys Arg Phe Phe Thr Val Cys Gln Gln Val Ser Ser Leu Met Lys Thr Arg Val Phe Thr Asn Pro Val Gly Glu Leu Val Asn Met Lys His Arg Glu Asn Gln Cys Thr Glu Tyr Asp Ile Phe Leu Ile Trp Asn Phe Pro Gln Gly Leu Gly Leu Lys Val Lys Ile Gly Ser Tyr Leu Pro Cys Phe Pro Gln Arg Gln Glu Leu His Ile Ser Asp Asp Leu Glu Trp Ala Met Gly Gly Thr Ser Val Val Pro Ser Ser Val Cys Ser Val Ala Cys Thr Ala Gly Phe Arg Lys Ile His Gln Lys Glu Thr Ala Asp Cys Cys Phe Asp Cys Val Gln Cys Pro Glu Asn Glu Val Ser Asn Glu Thr Asp Met Glu Gln Cys Val Lys Cys Pro Tyr Asp Lys Tyr Ala Asn Ile Glu Lys Thr His Cys Leu Ser Arg Ala Val Ser Phe Leu Ala Tyr Glu Asp Pro Leu Gly Ile Ala Leu Gly Cys Ile Ala Leu Ser Phe Ser Ala Ile Thr Ile Leu Val Leu Ile Thr Phe Leu Lys Tyr Lys Asp Thr Pro Ile Val Lys Ala Asn Asn Arg Ile Leu Ser Tyr Ile Leu Leu Ile Ser Leu Val Phe Cys Phe Leu Cys Ser Leu Leu Phe Ile Gly His Pro Asn Gln Val Ser Cys Val Leu Gln Gln Thr Thr Phe Gly Val Phe Phe Thr Val Ser Val Ser Thr Val Leu Ala Lys Thr Ile Thr Val Val Met Ala Phe Lys Leu Thr Thr Pro Gly Arg Arg Met Arg Glu Met Leu Val Thr Gly Ala Pro Lys Leu Val Ile Pro Ile Cys Thr Leu Ile Gln Phe Val Leu Cys Gly Ile Trp Leu Ile Thr Ser Pro Pro Phe Ile Asp Arg Asp Ile Gln Ser Glu His Gly Lys Ile Val Ile Leu Cys Asn Lys Gly Ser Val Ile Ala Phe His Val Val Leu Gly Tyr Leu Gly Ser Leu Ala Leu Gly Ser Phe Thr Leu Ala Phe Leu Ala Arg Asn Leu Pro Asp Thr Phe Asn Glu Ala Lys Phe Leu Thr Phe Ser Met Leu Val Phe Cys Ser Val Trp Ile Thr Phe Leu Pro Val Tyr His Ser Thr Arg Gly Lys Val Met Val Val Val Glu Val Phe Ser Ile Leu Ala Ser Ser Ala Gly Leu Leu Met Cys Ile Phe Val Pro Lys Cys Tyr Val Ile Leu Val Arg Pro Asp Ser Asn Phe Ile Arg Lys Tyr Lys Asp Lys Phe Arg
Tyr
(2) INFORMATION FOR SEQ ID NO:7:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 3625 base pairs (B) TYPE: nucleic acid (C) STRANDEDNESS: single (D) TOPOLOGY: linear (ii) MOLECULE TYPE: cDNA
(ix) FEATURE:
(A) NAME/KEY: Coding Sequence (B) LOCATION: 117...2672 (D) OTHER INFORMATION: VR4 (xi) SEQUENCE DESCRIPTION: SEQ ID NO:7:
TGAATATGCA 60 ATAAACCTCA
CATTTGCACA
AAGAAATAAA
AGCTGGTAGA
AATCTGATGT
GCTGATATGC ATGGCACTTC TTAAGGCAGG
ACAATCCGCA AAAAAG
CTGCCCAGGT ATG
Met TTC AAT ACA
Phe IlePhe Met Gly Val Phe LeuLeu Ile Leu Leu Met Phe Asn Thr GCC AAT TTC ATT GAT CCC AGG TTT TGG ATA TTG GAT GAA 215 TGC AGA AAT
Ala AsnPhe Ile Asp Pro Arg PheTrp Ile Leu Asp Glu Cys Arg Asn TTA GCT ATC
Ile ThrAsp Glu Tyr Leu Gly SerCys Phe Leu Ala Ala Leu Ala Ile GAT AAC ACT
Val GlnThr Pro Ile Glu Lys TyrPhe Thr Leu Asn Phe Asp Asn Thr AAA TTG TTG
Leu LysThr Thr Lys Asn His TyrAla Ala Val Phe Ala Lys Leu Leu CCT TTA AAT
Met AspGlu Ile Asn Arg Tyr AspLeu Pro Met Ser Leu Pro Leu Asn Ile Ile Arg Tyr Ser Leu Gly His Cys Asp Gly Lys Thr Val Thr Pro Thr Pro Tyr Leu Phe His Arg Lys Lys Gln Ser Pro Ile Pro Asn Tyr Phe Cys Asn Glu Glu Ser Met Cys Ser Phe Leu Leu Ser Gly Pro Asn Trp Asp Glu Ser Leu Ser Phe Trp Lys Tyr Leu Asp Ser Phe Leu Ser Pro Arg Ile Leu Gln Leu Ser Tyr Gly Ser Phe Ser Ser Ile Phe Ser
CCC TAT GCC GAC
AspAsp Glu Gln Tyr Tyr Leu Gln Met Pro Lys Thr Pro Tyr Ala Asp ATG TTC TAT TGG
SerLeu Ala Leu Ala Val Ser Ile Leu Leu Lys Asn Met Phe Tyr Trp ATC GAT GGA TTT
TrpIle Gly Leu Val Pro Asp Asp Gln Asn Gln Leu Ile Asp Gly Phe CAG AAC ATT GCC
LeuGlu Leu Lys Lys Ser Glu Lys Glu Cys Phe Phe Gln Asn Ile Ala GTT GTT CCA ACT
ValLys Met Ile Ser Asp Glu Ser Phe Gln Lys Glu Val Val Pro Thr ATT TCA AAT ATC
IleAsn Tyr Lys Gln Val Lys Leu Thr Val Ile Ile Ile Ser Asn Ile AAT GAT TTC TGG
TyrGly Glu Thr Tyr Phe Ile Leu Ile Arg Met Glu Asn Asp Phe Trp AGA ATC AAA AAT
ProPro Ile Leu Gln Ile Trp Thr Thr Gln Leu Phe Arg Ile Lys Asn GAC CAT TTC TCA
ProThr Ser Lys Thr Ile Ser Asp Thr Tyr Gly Leu Asp His Phe Ser CAT ATT TTT TTT
ThrPhe Leu Pro His Gly Glu Ser Gly Lys Asn Val His Ile Phe Phe CTC ACA TGT ATG
GlnThr Trp Phe His Arg Asn Asp Leu Leu Val Pro Leu Thr Cys Met AAC GAC TCT AAA
GluTrp Lys Tyr Ile Ser Glu Ser Ala Asn Cys Ile Asn Asp Ser Lys TCT TCA TGG GAA
LeuLys Asn Ser Ser Asp Ala Phe Asp Leu Met Glu Ser Ser Trp Glu TTT AAT AAC AAT
LysLeu Asp Met Ala Ser Glu Ser His Ile Tyr Ala Phe Asn Asn Asn CAT CAT AAT CAG
ValHis Ala Ile Ala Ala Leu Glu Met Leu Gln Ala His His Asn Gln GAT AAA AGT TGC
AspAsn Gln Ala Ile Asn Gly Gly Ala Ser His Leu Asp Lys Ser Cys
WO 99/00422 PCT/US98/13680
Lys Val Asn Ser Phe Leu Arg Arg Thr Tyr Phe Thr Asn Pro Leu Gly Asp Lys Val Phe Met Lys Gln Arg Val Ile Met Gln Asp Glu Tyr Asp Ile Val His Phe Ala Asn Leu Ser Gln His Leu Gly Ile Lys Met Lys Leu Gly Lys Phe Ser Pro Tyr Leu Pro His Gly Arg His Ser His Leu Tyr Val Asp Met Ile Glu Leu Ala Thr Gly Arg Arg Lys Met Pro Ser Ser Val Cys Ser Ala Asp Cys Ser Pro Gly Phe Arg Arg Leu Trp Lys Glu Gly Met Ala Ala Cys Cys Phe Val Cys Ser Pro Cys Pro Glu Asn Glu Ile Ser Asn Glu Thr Asn Met Asp Gln Cys Val Asn Cys Pro Glu Tyr Gln Tyr Ala Asn Thr Glu Gln Asn Lys Cys Ile Gln Lys Gly Val Thr Phe Leu Ser Tyr Glu Asp Pro Leu Gly Met Ala Leu Ala Leu Met Ala Phe CysPhe SerAlaPheThr Val LeuCys PheVal Ala Val Val AAT AGC
Lys His HisAsp ThrProIleVal LysAla AsnArg LeuSer Asn Ser TTT TCC
Tyr Leu LeuLeu MetSerLeuMet PheCys LeuCys PhePhe Phe Ser GTC CAA
Phe Ile GlyLeu ProAsnLysVal IleCys LeuGln IleThr Val Gln ACA GCC
Phe Gly IleVal PheThrValAla ValSer ValLeu LysThr Thr Ala GTC AGA
Val Thr ValVal LeuAlaPheLys ValThr ProGly ArgLeu Val Arg AGA TAC TTC CTT GTA TCA GGG ACA CTA AAC TAC ATT ATT CCT ATA TGT 2231 Arg Tyr Phe Leu Val Ser Gly Thr Leu Asn Tyr Ile Ile Pro Ile Cys GTC TCT CCT
Ser Leu Leu Gln Cys Val Leu Cys Ala Ile Trp Leu Ala Val Ser Pro ATC ATC ATT
Pro Phe Val Asp Ile Asp Glu His Ser Gln His Gly His Ile Ile Ile CTT GGA TAC
Val Cys Asn Lys Gly Ser Val Thr Ala Phe Tyr Cys Val Leu Gly Tyr TTG GCC AAG
Leu Ala Cys Leu Ala Leu Gly Ser Phe Thr Leu Ala Phe Leu Ala Lys TTC AGC ATG
Asn Leu Pro Asp Ala Phe Asn Glu Ala Lys Phe Leu Thr Phe Ser Met TAC CAT AGC
Leu Val Phe Cys Ser Val Trp Val Thr Phe Leu Pro Val Tyr His Ser ATC TTG GCA
Thr Lys Gly Lys His Met Val Ala Val Glu Ile Phe Ser Ile Leu Ala ATT TAT ATC
Ser Ser Ala Gly Met Leu Gly Cys Ile Phe Val Pro Lys Ile Tyr Ile AGA GAA AAA
Ile Leu Met Arg Pro Glu Arg Asn Ser Thr Gln Lys Ile Arg Glu Lys ACATAACCA
Ser Tyr Phe GGTTGCTCTA
AATCTTGCAC
CAATTTTATT
GTTGATAAGG
GGTTACACAT
ATAATCAGCA
AGAAAATACT
GAAATGTTCC
CAGGGATTCT
ATTCTCAACA
TACACAAGCT
CAGTGGGAGA
GCATTGGGGA
GTCAGTGGGG
AATAAATTAA
AAAA
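The codon rows and residue rows printed together in the nucleic acid records can be checked against each other. The sketch below is an illustration only and is not part of the filed listing. It assumes the standard genetic code, the helper name `translate` is hypothetical, and the input is the one intact codon row printed with SEQ ID NO:7 above (AGA TAC ... TGT), which the listing pairs with the residues Arg Tyr Phe Leu Val Ser Gly Thr Leu Asn Tyr Ile Ile Pro Ile Cys.

```python
from itertools import product

# Standard genetic code in the conventional T, C, A, G ordering (an assumption:
# the listing itself does not name the translation table used).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)
}


def translate(codons: str) -> str:
    """Translate whitespace-separated DNA codons into one-letter residues."""
    return "".join(CODON_TABLE[c] for c in codons.upper().split())


row = "AGA TAC TTC CTT GTA TCA GGG ACA CTA AAC TAC ATT ATT CCT ATA TGT"
print(translate(row))  # RYFLVSGTLNYIIPIC
```

The one-letter result corresponds residue for residue to the three-letter row printed beneath those codons in the listing.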
(2) INFORMATION FOR SEQ ID NO:8:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 852 amino acids (B) TYPE: amino acid (C) STRANDEDNESS: single (D) TOPOLOGY: linear (ii) MOLECULE TYPE: protein (v) FRAGMENT TYPE: internal (xi) SEQUENCE DESCRIPTION: SEQ ID NO:8:
Met Phe Ile Phe Met Gly Val Phe Phe Leu Leu Asn Ile Thr Leu Leu Met Ala Asn Phe Ile Asp Pro Arg Cys Phe Trp Arg Ile Asn Leu Asp Glu Ile Thr Asp Glu Tyr Leu Gly Leu Ser Cys Ala Phe Ile Leu Ala Ala Val Gln Thr Pro Ile Glu Lys Asp Tyr Phe Asn Thr Thr Leu Asn Phe Leu Lys Thr Thr Lys Asn His Lys Tyr Ala Leu Ala Leu Val Phe Ala Met Asp Glu Ile Asn Arg Tyr Pro Asp Leu Leu Pro Asn Met Ser Leu Ile Ile Arg Tyr Ser Leu Gly His Cys Asp Gly Lys Thr Val Thr Pro Thr Pro Tyr Leu Phe His Arg Lys Lys Gln Ser Pro Ile Pro Asn Tyr Phe Cys Asn Glu Glu Ser Met Cys Ser Phe Leu Leu Ser Gly Pro Asn Trp Asp Glu Ser Leu Ser Phe Trp Lys Tyr Leu Asp Ser Phe Leu Ser Pro Arg Ile Leu Gln Leu Ser Tyr Gly Ser Phe Ser Ser Ile Phe Ser Asp Asp Glu Gln Tyr Pro Tyr Leu Tyr Gln Met Ala Pro Lys Asp Thr Ser Leu Ala Leu Ala Met Val Ser Phe Ile Leu Tyr Leu Lys Trp Asn Trp Ile Gly Leu Val Ile Pro Asp Asp Asp Gln Gly Asn Gln Phe Leu Leu Glu Leu Lys Lys Gln Ser Glu Asn Lys Glu Ile Cys Phe Ala Phe Val Lys Met Ile Ser Val Asp Glu Val Ser Phe Pro Gln Lys Thr Glu Ile Asn Tyr Lys Gln Ile Val Lys Ser Leu Thr Asn Val Ile Ile Ile Tyr Gly Glu Thr Tyr Asn Phe Ile Asp Leu Ile Phe Arg Met Trp Glu Pro Pro Ile Leu Gln Arg Ile Trp Ile Thr Thr Lys Gln Leu Asn Phe Pro Thr Ser Lys Thr Asp Ile Ser His Asp Thr Phe Tyr Gly Ser Leu Thr Phe Leu Pro His His Gly Glu Ile Ser Gly Phe Lys Asn Phe Val Gln Thr Trp Phe His Leu Arg Asn Thr Asp Leu Cys Leu Val Met Pro Glu Trp Lys Tyr Ile Asn Ser Glu Asp Ser Ala Ser Asn Cys Lys Ile Leu Lys Asn Ser Ser Ser Asp Ala Ser Phe Asp Trp Leu Met Glu Glu Lys Leu Asp Met Ala Phe Ser Glu Asn Ser His Asn Ile Tyr Asn Ala Val His Ala Ile Ala His Ala Leu His Glu Met Asn Leu Gln Gln Ala Asp Asn Gln Ala Ile Asp Asn Gly Lys Gly Ala Ser Ser His Cys Leu Lys Val Asn Ser Phe Leu Arg Arg Thr Tyr Phe Thr Asn Pro Leu Gly Asp Lys Val Phe Met Lys Gln Arg Val Ile Met Gln Asp Glu Tyr Asp Ile Val His Phe Ala Asn Leu Ser Gln His Leu Gly Ile Lys Met Lys Leu Gly Lys Phe Ser Pro Tyr Leu Pro His Gly Arg His Ser
His Leu Tyr Val Asp Met Ile Glu Leu Ala Thr Gly Arg Arg Lys Met Pro Ser Ser Val Cys Ser Ala Asp Cys Ser Pro Gly Phe Arg Arg Leu Trp Lys Glu Gly Met Ala Ala Cys Cys Phe Val Cys Ser Pro Cys Pro Glu Asn Glu Ile Ser Asn Glu Thr Asn Met Asp Gln Cys Val Asn Cys Pro Glu Tyr Gln Tyr Ala Asn Thr Glu Gln Asn Lys Cys Ile Gln Lys Gly Val Thr Phe Leu Ser Tyr Glu Asp Pro Leu Gly Met Ala Leu Ala Leu Met Ala Phe Cys Phe Ser Ala Phe Thr Ala Val Val Leu Cys Val Phe Val Lys His His Asp Thr Pro Ile Val Lys Ala Asn Asn Arg Ser Leu Ser Tyr Leu Leu Leu Met Ser Leu Met Phe Cys Phe Leu Cys Ser Phe Phe Phe Ile Gly Leu Pro Asn Lys Val Ile Cys Val Leu Gln Gln Ile Thr Phe Gly Ile Val Phe Thr Val Ala Val Ser Thr Val Leu Ala Lys Thr Val Thr Val Val Leu Ala Phe Lys Val Thr Val Pro Gly Arg Arg Leu Arg Tyr Phe Leu Val Ser Gly Thr Leu Asn Tyr Ile Ile Pro Ile Cys Ser Leu Leu Gln Cys Val Leu Cys Ala Ile Trp Leu Ala Val Ser Pro Pro Phe Val Asp Ile Asp Glu His Ser Gln His Gly His Ile Ile Ile Val Cys Asn Lys Gly Ser Val Thr Ala Phe Tyr Cys Val Leu Gly Tyr Leu Ala Cys Leu Ala Leu Gly Ser Phe Thr Leu Ala Phe Leu Ala Lys Asn Leu Pro Asp Ala Phe Asn Glu Ala Lys Phe Leu Thr Phe Ser Met Leu Val Phe Cys Ser Val Trp Val Thr Phe Leu Pro Val Tyr His Ser Thr Lys Gly Lys His Met Val Ala Val Glu Ile Phe Ser Ile Leu Ala Ser Ser Ala Gly Met Leu Gly Cys Ile Phe Val Pro Lys Ile Tyr Ile Ile Leu Met Arg Pro Glu Arg Asn Ser Thr Gln Lys Ile Arg Glu Lys Ser Tyr Phe
(2) INFORMATION FOR SEQ ID NO:9:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 3125 base pairs (B) TYPE: nucleic acid (C) STRANDEDNESS: single (D) TOPOLOGY: linear (ii) MOLECULE TYPE: cDNA
(ix) FEATURE:
(A) NAME/KEY: Coding Sequence (B) LOCATION: 1...2169 (D) OTHER INFORMATION: VR5 (xi) SEQUENCE DESCRIPTION: SEQ ID NO:9:
AGT TCA CTT GGA
Ile Cys Asn Glu Glu Met Cys PheLeu Ser Pro Asn Ser Ser Leu Gly AGT AAG GAC TTC
Trp Asp Glu Ser Leu Phe Trp TyrLeu Ser Leu Ser Ser Lys Asp Phe CTT GGA AGT ATC
Pro His Ile Leu Gln Ser Tyr SerPhe Ser Phe Ser Leu Gly Ser Ile CCC TAT GCC AAG
Asp Asp Glu Gln Tyr Tyr Leu GlnMet Pro Asp Thr Pro Tyr Ala Lys ATG TTC TAT AAA
Ser Leu Ala Leu Ala Val Ser IleLeu Leu Trp Asn Met Phe Tyr Lys ATC GAC GGA CAA
Trp Ile Gly Leu Val Pro Asp AspGln Asn Phe Leu Ile Asp Gly Gln CAG AAC ATT TTT
Leu Glu Leu Lys Lys Ser Glu LysGlu Cys Ala Phe Gln Asn Ile Phe Val Lys Met Ile Ser Val Asp Glu Val Ser Phe Pro Gln Lys Thr Glu Ile Tyr Tyr Lys Gln Ile Val Lys Ser Leu Thr Asn Val Ile Ile Ile Tyr Gly Glu Thr Tyr Asn Phe Ile Asp Leu Ile Phe Arg Met Trp Glu ATT CAG AGA ATA TGG ATC ACC ACA AAA CAA TTG AAT
Pro Pro Leu Arg Ile Trp ThrThrLys Gln Leu Phe Ile Gln Ile Asn AGT ACA CAT TCA
Pro Thr Lys Asp Ile Ser AspThrPhe Tyr Gly Leu Ser Thr His Ser CTA CAC ATT TTT
Thr Phe Pro His Gly Glu SerGlyPhe Lys Asn Val Leu His Ile Phe TGG CAT ACA ATG
Gln Thr Phe Leu Arg Asn AspLeuTyr Leu Val Pro Trp His Thr Met AAA ATT GAC AAA
Glu Trp Tyr Asn Ser Glu SerAlaSer Asn Cys Ile Lys Ile Asp Lys AAG TCA TGG CTA ATG
AAC
Leu LysAsnSer Ser Asp Ala Ser Phe Asp Met Glu Gln Ser Trp Leu GCC AAC ATA
Lys LeuAspMet Phe Ser Asp Asn Ser His Tyr Asn Val Ala Asn Ile GCC AAT CTG
Val HisAlaIle His Ala Leu His Glu Met Gln Gln Ala Ala Asn Leu ATA AGT TCT
Asp AsnGlnAla Asp Asn Gly Lys Gly Ala His Cys Leu Ile Ser Ser TTT ACT AAT
Lys ValAsnSer Leu Arg Arg Thr Tyr Phe Pro Leu Gly Phe Thr Asn ATG CAG GAT
Asp LysValPhe Lys Gln Arg Val Ile Met Glu Tyr Asp Met Gln Asp GCG GGG ATT
Ile ValHisPhe Asn Leu Ser Gln His Leu Lys Met Lys Ala Gly Ile AGC CGA CAC
Leu GlyLysPhe Pro Tyr Leu Pro His Gly Ser His Leu Ser Arg His ATT AGA AAG
Tyr ValAspMet Glu Leu Ala Thr Gly Arg Met Pro Ser Ile Arg Lys GCA AGA AGA
Ser ValCysSer Asp Cys Ser Pro Gly Phe Leu Trp Lys Ala Arg Arg GCC CCC TGC
Glu GlyMetAla Cys Cys Phe Val Cys Ser Pro Glu Asn Ala Pro Cys GAG GTG AAT
Glu IleSerAsn Thr Asn Met Asp Gln Cys Cys Pro Glu Glu Val Asn AAC ATT CAG
Tyr GlnTyrAla Thr Glu Gln Asn Lys Cys Lys Gly Val Asn Ile Gln TAT GCA CTT
Thr PheLeuSer Glu Asp Pro Leu Gly Met Ala Leu Met Tyr Ala Leu TCT CTT TGT
Ala PheCysPhe Ala Phe Thr Ala Val Val Val Phe Val Ser Leu Cys ACT AAC AGA
Lys HisHisAsp Pro Ile Val Lys Ala Asn Ser Leu Ser Thr Asn Arg TAT CTATTACTC TCA CTC ATG TTC TGT TTT TCC TTT TTC.1536 ATG CTG TGC
Tyr Leu Leu Leu Met Ser Leu Met Phe Cys Phe Leu Cys Ser Phe Phe Phe Ile Gly Leu Pro Asn Lys Val Ile Cys Val Leu Gln Gln Ile Thr Phe Gly Ile Val Phe Thr Val Ala Val Ser Thr Val Leu Ala Lys Thr Val Thr Val Val Leu Ala Phe Lys Val Thr Asp Pro Gly Arg Arg Leu Arg Tyr Phe Leu Val Ser Gly Thr Leu Asn Tyr Ile Ile Pro Ile Cys Ser Leu Leu Gln Cys Val Leu Cys Ala Ile Trp Leu Ala Val Ser Pro Pro Phe Val Asp Ile Asp Glu His Ser Gln His Gly His Ile Ile Ile Val Cys Asn Lys Gly Ser Val Thr Ala Phe Tyr Cys Val Leu Gly Tyr Leu Ala Cys Leu Ala Leu Gly Ser Phe Thr Leu Ala Phe Leu Ala Lys Asn Leu Pro Asp Ala Phe Asn Glu Ala Lys Phe Leu Thr Phe Ser Met Leu Val Phe Cys Ser Val Trp Val Thr Phe Leu Pro Val Tyr His Ser Thr Lys Gly Lys His Met Val Ala Val Glu Ile Phe Ser Ile Leu Ala Ser Ser Ala Gly Met Leu Glu Cys Ile Phe Val Pro Lys Ile Tyr Ile Ile Leu Met Arg Pro Glu Arg Asn Ser Thr Gln Lys Ile Arg Glu Lys Ser Tyr Phe
CTTCGTTTTG ATTTCATGGA GATTGCCCTC TGGTAACTTC CAAAAACCGT TGATAAGGCA 2458
AAACCATCTA CCAAATCAAA TAATCAATGA GAAACACAGA CTAACTAAAT AATCAGCAAA 2578
(2) INFORMATION FOR SEQ ID NO:10:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 723 amino acids (B) TYPE: amino acid (C) STRANDEDNESS: single (D) TOPOLOGY: linear (ii) MOLECULE TYPE: protein (v) FRAGMENT TYPE: internal (xi) SEQUENCE DESCRIPTION: SEQ ID NO:10:
Ile Cys Asn Glu Glu Ser Met Cys Ser Phe Leu Leu Ser Gly Pro Asn Trp Asp Glu Ser Leu Ser Phe Trp Lys Tyr Leu Asp Ser Phe Leu Ser Pro His Ile Leu Gln Leu Ser Tyr Gly Ser Phe Ser Ser Ile Phe Ser Asp Asp Glu Gln Tyr Pro Tyr Leu Tyr Gln Met Ala Pro Lys Asp Thr Ser Leu Ala Leu Ala Met Val Ser Phe Ile Leu Tyr Leu Lys Trp Asn Trp Ile Gly Leu Val Ile Pro Asp Asp Asp Gln Gly Asn Gln Phe Leu Leu Glu Leu Lys Lys Gln Ser Glu Asn Lys Glu Ile Cys Phe Ala Phe Val Lys Met Ile Ser Val Asp Glu Val Ser Phe Pro Gln Lys Thr Glu Ile Tyr Tyr Lys Gln Ile Val Lys Ser Leu Thr Asn Val Ile Ile Ile Tyr Gly Glu Thr Tyr Asn Phe Ile Asp Leu Ile Phe Arg Met Trp Glu Pro Pro Ile Leu Gln Arg Ile Trp Ile Thr Thr Lys Gln Leu Asn Phe Pro Thr Ser Lys Thr Asp Ile Ser His Asp Thr Phe Tyr Gly Ser Leu Thr Phe Leu Pro His His Gly Glu Ile Ser Gly Phe Lys Asn Phe Val Gln Thr Trp Phe His Leu Arg Asn Thr Asp Leu Tyr Leu Val Met Pro Glu Trp Lys Tyr Ile Asn Ser Glu Asp Ser Ala Ser Asn Cys Lys Ile Leu Lys Asn Ser Ser Ser Asp Ala Ser Phe Asp Trp Leu Met Glu Gln Lys Leu Asp Met Ala Phe Ser Asp Asn Ser His Asn Ile Tyr Asn Val Val His Ala Ile Ala His Ala Leu His Glu Met Asn Leu Gln Gln Ala Asp Asn Gln Ala Ile Asp Asn Gly Lys Gly Ala Ser Ser His Cys Leu Lys Val Asn Ser Phe Leu Arg Arg Thr Tyr Phe Thr Asn Pro Leu Gly Asp Lys Val Phe Met Lys Gln Arg Val Ile Met Gln Asp Glu Tyr Asp Ile Val His Phe Ala Asn Leu Ser Gln His Leu Gly Ile Lys Met Lys Leu Gly Lys Phe Ser Pro Tyr Leu Pro His Gly Arg His Ser His Leu Tyr Val Asp Met Ile Glu Leu Ala Thr Gly Arg Arg Lys Met Pro Ser Ser Val Cys Ser Ala Asp Cys Ser Pro Gly Phe Arg Arg Leu Trp Lys Glu Gly Met Ala Ala Cys Cys Phe Val Cys Ser Pro Cys Pro Glu Asn Glu Ile Ser Asn Glu Thr Asn Met Asp Gln Cys Val Asn Cys Pro Glu Tyr Gln Tyr Ala Asn Thr Glu Gln Asn Lys Cys Ile Gln Lys Gly Val Thr Phe Leu Ser Tyr Glu Asp Pro Leu Gly Met Ala Leu Ala Leu Met Ala Phe Cys Phe Ser Ala Phe Thr Ala Val Val Leu Cys Val Phe Val Lys His His Asp Thr Pro Ile Val Lys Ala Asn Asn Arg Ser Leu Ser Tyr
Leu Leu Leu Met Ser Leu Met Phe Cys Phe Leu Cys Ser Phe Phe Phe Ile Gly Leu Pro Asn Lys Val Ile Cys Val Leu Gln Gln Ile Thr Phe Gly Ile Val Phe Thr Val Ala Val Ser Thr Val Leu Ala Lys Thr Val Thr Val Val Leu Ala Phe Lys Val Thr Asp Pro Gly Arg Arg Leu Arg Tyr Phe Leu Val Ser Gly Thr Leu Asn Tyr Ile Ile Pro Ile Cys Ser Leu Leu Gln Cys Val Leu Cys Ala Ile Trp Leu Ala Val Ser Pro Pro Phe Val Asp Ile Asp Glu His Ser Gln His Gly His Ile Ile Ile Val Cys Asn Lys Gly Ser Val Thr Ala Phe Tyr Cys Val Leu Gly Tyr Leu Ala Cys Leu Ala Leu Gly Ser Phe Thr Leu Ala Phe Leu Ala Lys Asn Leu Pro Asp Ala Phe Asn Glu Ala Lys Phe Leu Thr Phe Ser Met Leu Val Phe Cys Ser Val Trp Val Thr Phe Leu Pro Val Tyr His Ser Thr Lys Gly Lys His Met Val Ala Val Glu Ile Phe Ser Ile Leu Ala Ser Ser Ala Gly Met Leu Glu Cys Ile Phe Val Pro Lys Ile Tyr Ile Ile Leu Met Arg Pro Glu Arg Asn Ser Thr Gln Lys Ile Arg Glu Lys Ser Tyr Phe
(2) INFORMATION FOR SEQ ID NO:11:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 1889 base pairs (B) TYPE: nucleic acid (C) STRANDEDNESS: single (D) TOPOLOGY: linear (ii) MOLECULE TYPE: cDNA
(ix) FEATURE:
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:11:
GCCATTGTTT
GACCACAAAG
TGTACGGCTT
GAAAATATGG
CTAATGCGAA
CCCCATATTA
TTTAAGCACA
AAAAAATACC
TTTTCTGATA
TTACCTAGTC
GTGTACGCTG
TGTGAAAATG
ATTGAGGTGA
CTTAACCTCT
GCAAATGCTC
ATATTTTCAG
GTAACCCTGG
ATTTCTAATG
ACAGAGAAGA
GGGATGGCTC
ATATTTGTGA
ACTTTGCTCA
AACACAGTTG
GCCACTGTGT
AGAATGGTAA
CTGATCCAAC
GATGCTCATA
TTCCACTCTG
TTGTCAAGAA
GTATTCTTCT
ATGGTCGCCG
(2) INFORMATION FOR SEQ ID NO:12:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 604 amino acids (B) TYPE: amino acid (C) STRANDEDNESS: single (D) TOPOLOGY: linear (ii) MOLECULE TYPE: peptide (xi) SEQUENCE DESCRIPTION: SEQ ID NO:12:
Ser Leu Ser Leu Ala Ile Val Ser Leu Met Val His Phe Arg Trp Ser Trp Val Gly Leu Ile Leu Pro Asp Asp His Lys Gly Asn Lys Ile Leu Ser Asp Phe Arg Lys Glu Met Glu Arg Lys Arg Ile Cys Thr Ala Phe Val Lys Met Ile Pro Ala Thr Trp Thr Ser Ser Phe Val Lys Phe Trp Glu Asn Met Asp Asp Thr Asn Ile Ile Ile Ile Tyr Gly Asp Ile Asp Ser Leu Glu Gly Leu Met Arg Asn Ile Gly Gln Arg Leu Leu Thr Trp His Val Trp Val Met Asn Ile Glu Pro His Ile Ile Glu Tyr Asp Asn Tyr Phe Met Leu Asp Ser Phe His Gly Ser Leu Ile Phe Lys His Asn Tyr Arg Glu Asn Phe Glu Phe Thr Lys Phe Ile Arg Thr Val Asn Pro Lys Lys Tyr Pro Glu Asp Ile Tyr Leu Pro Lys Met Trp Tyr Leu Phe Phe Met Cys Ser Phe Ser Asp Ile Asn Cys Gln Val Leu Asp Ser Cys Gln Thr Asn Ala Ser Leu Asp Met Leu Pro Ser Gln Ile Phe Asp Val Val Met Ser Glu Glu Ser Thr Ser Ile Tyr Asn Ala Val Tyr Ala Val Ala His Ser Leu His Glu Met Arg Leu Gln Gln Leu Gln Thr Gln Pro Cys Glu Asn Glu Glu Gly Met Glu Phe Phe Pro Trp Gln Leu Asn Thr Phe Leu Lys Asp Ile Glu Val Arg Val Asn Ser Leu Asp Trp Arg Gln Arg Ile Asp Ala Glu Tyr Asp Ile Leu Asn Leu Trp Asn Leu Pro Lys Gly Leu Gly Leu Lys Val Lys Ile Gly Asn Phe Tyr Ala Asn Ala Pro Gln Gly Gln Gln Leu Ser Leu Ser Glu Gln Met Ile Gln Trp Pro Glu Ile Phe Ser Glu Ile Pro Gln Ser Val Cys Ser Glu Ser Cys Gly Pro Gly Phe Arg Lys Val Thr Leu Glu Asn Lys Ala Ile Cys Cys Tyr Asn Cys Thr Pro Cys Ala Asp Asn Glu Ile Ser Asn Glu Thr Asp Val Asp Gln Cys Val Lys Cys Pro Glu Ser His Tyr Ala Asn Thr Glu Lys Ser Asn Cys Tyr Gln Lys Ser Val Ser Phe Leu Gly Tyr Glu Asp Pro Leu Gly Met Ala Leu Ala Ser Ile Ala Leu Cys Leu Ser Ala Leu Thr Ala Phe Val Ile Gly Ile Phe Val Lys His Lys Asp Thr Pro Ile Val Lys Ala Asn Asn Gln Ala Leu Ser Tyr Thr Leu Leu Ile Thr Leu Lys Phe Cys Phe Leu Cys Ser Leu Asn Phe Ile Gly Gln Pro Asn Thr Val Ala Cys Ile Leu Gln Gln Thr Thr Phe Ala Val Ala Phe Thr Met Ala Leu Ala Thr Val Leu Ala Lys Ala Ile Thr Val Val Leu Ala Phe Lys Val Ser Phe Pro Gly Arg Met Val Arg Trp Leu Met Ile Ser Arg Gly Pro Asn Tyr Ile Ile
Pro Ile Cys Thr Leu Ile Gln Leu Leu Leu Cys Gly Ile Trp Met Ala Ile Ser Pro Pro Tyr Ile Asp Gln Asp Ala His Ile Glu His Gly His Ile Ile Ile Leu Cys Asn Lys Gly Ser Ala Val Ala Phe His Ser Val Leu Gly Tyr Leu Cys Phe Leu Ala Leu Gly Ser Tyr Thr Met Ala Phe Leu Ser Arg Asn Leu Pro Asp Thr Phe Asn Glu Ser Lys Phe Ile Ser Leu Ser Met Leu Val Phe Phe Cys Val Trp Ile Thr Phe Leu Pro Val Tyr His Ser Thr Lys Gly Lys Val
(2) INFORMATION FOR SEQ ID NO:13:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 1889 base pairs (B) TYPE: nucleic acid (C) STRANDEDNESS: single (D) TOPOLOGY: linear (ii) MOLECULE TYPE: cDNA
(ix) FEATURE:
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:13:
ATGGCGACGA
AGGACACATC
TCGAAGTCTTCTGCATCCAAGCCGAATTC 1889
(2) INFORMATION FOR SEQ ID NO:14:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 604 amino acids (B) TYPE: amino acid (C) STRANDEDNESS: single (D) TOPOLOGY: linear (ii) MOLECULE TYPE: peptide (v) FRAGMENT TYPE: internal (xi) SEQUENCE DESCRIPTION: SEQ ID NO:14:
Ser Leu Ser Leu Ala Ile Val Ser Leu Met Val His Phe Arg Trp Ser Trp Val Gly Leu Ile Leu Pro Asp Asp His Lys Gly Asn Lys Ile Leu Ser Asp Phe Arg Lys Glu Met Glu Arg Lys Arg Ile Cys Thr Ala Phe Val Lys Met Ile Pro Ala Thr Trp Thr Ser Ser Phe Val Lys Phe Trp Glu Asn Met Asp Asp Thr Asn Ile Ile Ile Ile Tyr Gly Asp Ile Asp Ser Leu Glu Gly Pro Met Arg Asn Ile Gly Gln Arg Leu Leu Thr Trp His Val Trp Val Met Asn Ile Glu Pro His Ile Ile Glu Tyr Asp Asn Tyr Phe Met Leu Asp Ser Phe His Gly Ser Leu Ile Phe Lys His Asn Tyr Arg Glu Asn Phe Glu Phe Thr Lys Phe Ile Arg Thr Val Asn Pro Lys Lys Tyr Pro Glu Asp Ile Tyr Leu Pro Lys Met Trp Tyr Leu Phe Phe Met Cys Ser Phe Ser Asp Ile Asn Cys Gln Val Leu Asp Ser Cys Gln Thr Asn Ala Ser Leu Asp Met Leu Pro Ser Gln Ile Phe Asp Val Val Met Ser Glu Glu Ser Thr Ser Ile Tyr Asn Ala Val Tyr Ala Val Ala His Ser Leu His Glu Met Arg Leu Gln Gln Leu Gln Thr Gln Pro Cys Glu Asn Glu Glu Gly Met Glu Phe Phe Pro Trp Gln Leu Asn Thr Phe Leu Lys Asp Ile Glu Val Arg Val Asn Ser Leu Asp Trp Arg Gln Arg Ile Asp Ala Glu Tyr Asp Ile Leu Asn Leu Trp Asn Leu Pro Lys Gly Leu Gly Leu Lys Val Lys Ile Gly Asn Phe Tyr Ala Asn Ala Pro Gln Gly Gln Gln Leu Ser Leu Ser Glu Gln Met Ile Gln Trp Pro Glu Ile Phe Ser Glu Val Pro Gln Ser Val Cys Ser Glu Ser Cys Arg Pro Gly Phe Arg Lys Val Ser Leu Asp Asp Lys Ala Ile Cys Cys Tyr Lys Cys Thr Pro Cys Ala Asp Asn Glu Ile Ser Asn Glu Thr Asp Val Asp Gln Cys Val Lys Cys Pro Glu Ser His Tyr Ala Asn Thr Glu Lys Ser Asn Cys Phe Pro Lys Ser Val Ser Phe Leu Ala Tyr Glu Asp Pro Leu Gly Met Ala Leu Ala Ser Ile Ala Leu Cys Leu Ser Ala Leu Thr Val Phe Val Ile Gly Ile Phe Val Lys Asn Arg Asp Thr Pro Ile Val Lys Ala Asn Asn Arg Thr Leu Ser Tyr Ile Leu Leu Ile Thr Leu Thr Phe Cys Phe Leu Cys Ser Leu Asn Phe Ile Gly Gln Pro Asn Thr Ala Ala Cys Ile Leu Gln Gln Thr Thr Phe Ala Val Ala Phe Thr Met Ala Leu Ala Thr Val Leu Ala Lys Ala Ile Thr Val Val Leu Ala Phe Lys Ile Ser Phe Pro Gly Arg Met Leu Arg Trp Leu Met Ile Ser Arg Gly Pro
Arg Tyr Ile Ile Pro Ile Cys Thr Leu Ile Gln Leu Leu Leu Cys Gly Ile Trp Met Ala Thr Ser Pro Pro Phe Ile Asp Gln Asp Val Asn Thr Glu Asp Gly Tyr Ile Ile Leu Leu Cys Asn Lys Gly Ser Ala Val Ala Phe His Ser Val Leu Gly Tyr Leu Cys Phe Leu Ala Leu Gly Ser Tyr Thr Met Ala Phe Leu Ser Arg Asn Leu Pro Asp Thr Phe Asn Glu Ser Lys Phe Leu Ser Phe Ser Met Leu Val Phe Phe Cys Val Trp Val Thr Phe Leu Pro Val Tyr His Ser Thr Lys Gly Lys Val
(2) INFORMATION FOR SEQ ID NO:15:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 2561 base pairs (B) TYPE: nucleic acid (C) STRANDEDNESS: single (D) TOPOLOGY: linear (ii) MOLECULE TYPE: cDNA
(ix) FEATURE:
(A) NAME/KEY: Coding Sequence (B) LOCATION: 80...349 (D) OTHER INFORMATION: VR8 (xi) SEQUENCE DESCRIPTION: SEQ ID NO:15:
TACATCAGAA
CTC TGT GCT TTC ACG ATT TCA TTG
Met Lys Lys Leu Cys Ala Phe Thr Ile Ser Leu TGC TGT AGT
Leu Phe Leu Lys Phe Ser Leu Ile Leu Trp Ser Glu Pro Cys Cys Ser GAT AAT CAA
Cys Phe Trp Arg Ile Lys Asn Ser Asp Asp Gly Asp Leu Asp Asn Gln GCT GAT GAT
Arg Glu Cys His Phe Tyr Leu Gly Ala Thr Pro Val Glu Ala Asp Asp AGG TTT TTA
Asn Phe Tyr Ser Ser Leu Leu Lys Phe Ser Leu Asp His Arg Phe Leu
Ile Leu Thr Tyr Ala Thr Met Thr Gly Met Ser Ile Arg Cys Pro GTCTCCTTGA
CAGGGTATTC
GCTTTTGTTA
GATCAACAAA
ACTCTAGAAG
ACCTCACAAT
ACTATCACTT
ATGAACACTG
AATTGTTCAA
GAATGGACAT
AATGCTGTTT
CAGAAAAAGG
TCCGTGTGTA
GACTGCTGCT
GAACAGTGTG
TCAAGAGCTG
GCACTGTCCT
ACTCCCATTG
TTCTGCTTTC
CAGCAGACCA
ATAACTGTGG
ATGACAGGGG
GGAATCTGGT
AAGATTGTCA
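The listing writes residues with three-letter codes, while most sequence search and alignment tools expect one-letter codes. The sketch below is an illustration only and is not part of the filed listing. It uses the standard IUPAC three-letter to one-letter mapping, the helper name `to_one_letter` is hypothetical, and the input is the first residue row printed with SEQ ID NO:15 above.

```python
# Standard IUPAC three-letter to one-letter amino acid codes.
THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
}


def to_one_letter(residues: str) -> str:
    """Convert space-separated three-letter residues to a one-letter string.

    Raises KeyError on an unrecognised token, which is a convenient way to
    surface transcription errors in a scanned listing.
    """
    return "".join(THREE_TO_ONE[token] for token in residues.split())


print(to_one_letter("Met Lys Lys Leu Cys Ala Phe Thr Ile Ser Leu"))  # MKKLCAFTISL
```

Because unknown tokens raise an error instead of being skipped, running a whole record through such a helper also flags misread codes that are not valid residue names.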
(2) INFORMATION FOR SEQ ID NO:16:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 90 amino acids (B) TYPE: amino acid (C) STRANDEDNESS: single (D) TOPOLOGY: linear (ii) MOLECULE TYPE: protein (v) FRAGMENT TYPE: internal (xi) SEQUENCE DESCRIPTION: SEQ ID NO:16:
Met Lys Lys Leu Cys Ala Phe Thr Ile Ser Leu Leu Phe Leu Lys Phe Ser Leu Ile Leu Cys Cys Trp Ser Glu Pro Ser Cys Phe Trp Arg Ile Lys Asn Ser Asp Asp Asn Asp Gly Asp Leu Gln Arg Glu Cys His Phe Tyr Leu Gly Ala Ala Asp Thr Pro Val Glu Asp Asn Phe Tyr Ser Ser Leu Leu Lys Phe Arg Phe Ser Leu Asp His Leu Ile Leu Thr Tyr Ala Thr Met Thr Gly Cys Pro Met Ser Ile Arg
(2) INFORMATION FOR SEQ ID NO:17:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 2734 base pairs (B) TYPE: nucleic acid (C) STRANDEDNESS: single (D) TOPOLOGY: linear (ii) MOLECULE TYPE: cDNA
(ix) FEATURE:
(A) NAME/KEY: Coding Sequence (B) LOCATION: 80...1387 (D) OTHER INFORMATION: VR9 (xi) SEQUENCE DESCRIPTION: SEQ ID NO:17:
Met Lys Lys Leu Cys Ala Phe Thr Ile Ser Leu Leu Phe Leu Lys Phe Ser Leu Ile Leu Cys Cys Trp Ser Glu Pro Ser
AAG GAT GAT
CysPhe TrpArgIle AsnSer AspAsnAsp Gly Leu Gln Lys Asp Asp TAC GCA GTT
ArgGlu CysHisPhe LeuGly AlaAspThr Pro Glu Asp Tyr Ala Val CTT TTT AGT
AsnPhe TyrSerSer LeuLys ArgIleAla Ala Glu Tyr Leu Phe Ser ATG GCT AAC
GluPhe LeuLeuVal PhePhe IleAspGlu Ile Arg Asn Met Ala Asn AAC TTG ATT
ProTyr LeuLeuPro IleThr MetPheSer Phe Gly Gly Asn Leu Ile TTG ATG ACA
AsnCys GlnAspLeu ArgVal AspGlnAla Tyr Gln Ile Leu Met Thr TTT TAT GAT
AsnGly HisMetAsn ValAsn PheCysTyr Leu Asp Ser Phe Tyr Asp ACA TCA TTA
CysAla IleGlyLeu GlyPro TrpLysThr Ser Lys Leu Thr Ser Leu ATG GTT TTT
AlaMet HisSerSer ProLeu PhePheGly Pro Asn Pro Met Val Phe GAC CCC GTA
AsnLeu ArgAspHis ArgLeu HisValHis Gln Ala Pro Asp Pro Val TCC ATG TTT
LysAsp ThrHisLeu HisGly ValSerLeu Met His Phe Ser Met Phe GGA ATC CAG
ArgTrp ThrTrpIle MetVal SerAspAsp Asp Gly Ile Gly Ile Gln TTA GAA GGG
GlnPhe LeuSerAsp ArgGlu SerGlnArg His Ile Cys Leu Glu Gly ATG GAA TAC
LeuAla PheValAsn IlePro AsnMetGln Ile Met Thr Met Glu Tyr GAT ATT GCA
ArgAla ThrIleTyr GlnGln MetThrSer Ser Lys Val Asp Ile Ala GAA TCT AGC
ValIle IleTyrGly MetAsn ThrLeuGlu Val Phe Arg Glu Ser Ser TGG GCT ATC ACA ACC
GAA TCA CAA
ArgTrp Glu Leu Gly Arg Arg Trp Ile Thr Gln Glu Ala Ile Thr Ser GTC AAA TTC AAT TTC
TrpAsp Ile Thr Asn Lys Asp Thr Leu Leu His Val Lys Phe Asn Phe ATC CAC GTT CCT TTA
GlyThr Thr Phe Ala His Arg Glu Ile Lys Asn Ile His Val Pro Leu ATG AAC AAA GTA ATT
LysPhe Gln Thr Met Thr Ala Tyr Pro Asp Ser Met Asn Lys Val Ile ATA AAT AAT ATA AAG
HisThr Leu Glu Trp Tyr Phe Cys Ser Ser Asn Ile Asn Asn Ile Lys AGA ATT AAC TTG TGG
SerIle Met His His Thr Phe Asn Thr Glu Thr Arg Ile Asn Leu Trp CAC ATG AGT GGT AGT
SerLeu Asn Tyr Asp Ala Met Asp Glu Tyr Leu His Met Ser Gly Ser GCT GTG ACC GAA ATT
TyrAsn Val Tyr Ala Ala His Tyr His Tyr Phe Ala Val Thr Glu Ile GTA AAA AAA AGA TTC
GlnGln Glu Ser Gln Lys Ala Pro Lys Tyr Thr Val Lys Lys Arg Phe CAG AAC TGAGGTGTCC
AGATGATAAG
TATGCCA
AlaCys Gln Ile Trp Ser Val Gln Asn ', TCACATTTGT GAAACACAAC GATACTCCCA TTGTGAAGGC CAATAACCGCATTCTCAGCT1594 ACATCCTGCT CATCTCTCTC GTCTTCTGC~ TTCTCTGCTC CCTGCTCTTC ATTGGACCTC1654 ', GAGATATACA ATCTGAGCAT GGGAAGATTG TCATTCTTTG CAATAAAGGCTCAGTCATTG1954 _ TCATGGTGGT TGTGGAGGTT TTCTCCATCT TGGCTTCTAG TGCAGGGTTG CTAATGTGTA2194 (2) INFORMATION FOR SEQ ID N0:18:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 436 amino acids (B) TYPE: amino acid (C) STRANDEDNESS: single (D) TOPOLOGY: linear (ii) MOLECULE TYPE: protein (v) FRAGMENT TYPE: internal (xi) SEQUENCE DESCRIPTION: SEQ ID NO:18:
Met Lys Lys Leu Cys Ala Phe Thr Ile Ser Leu Leu Phe Leu Lys Phe Ser Leu Ile Leu Cys Cys Trp Ser Glu Pro Ser Cys Phe Trp Arg Ile Lys Asn Ser Asp Asp Asn Asp Gly Asp Leu Gln Arg Glu Cys His Phe Tyr Leu Gly Ala Ala Asp Thr Pro Val Glu Asp Asn Phe Tyr Ser Ser Leu Leu Lys Phe Arg Ile Ala Ala Ser Glu Tyr Glu Phe Leu Leu Val Met Phe Phe Ala Ile Asp Glu Ile Asn Arg Asn Pro Tyr Leu Leu Pro Asn Ile Thr Leu Met Phe Ser Phe Ile Gly Gly Asn Cys Gln Asp Leu Leu Arg Val Met Asp Gln Ala Tyr Thr Gln Ile Asn Gly His Met Asn Phe Val Asn Tyr Phe Cys Tyr Leu Asp Asp Ser Cys Ala Ile Gly Leu Thr Gly Pro Ser Trp Lys Thr Ser Leu Lys Leu Ala Met His Ser Ser Met Pro Leu Val Phe Phe Gly Pro Phe Asn Pro Asn Leu Arg Asp His Asp Arg Leu Pro His Val His Gln Val Ala Pro Lys Asp Thr His Leu Ser His Gly Met Val Ser Leu Met Phe His Phe Arg Trp Thr Trp Ile Gly Met Val Ile Ser Asp Asp Asp Gln Gly Ile Gln Phe Leu Ser Asp Leu Arg Glu Glu Ser Gln Arg His Gly Ile Cys Leu Ala Phe Val Asn Met Ile Pro Glu Asn Met Gln Ile Tyr Met Thr Arg Ala Thr Ile Tyr Asp Gln Gln Ile Met Thr Ser Ser Ala Lys Val Val Ile Ile Tyr Gly Glu Met Asn Ser Thr Leu Glu Val Ser Phe Arg Arg Trp Glu Glu Leu Gly Ala Arg Arg Ile Trp Ile Thr Thr Ser Gln Trp Asp Val Ile Thr Asn Lys Lys Asp Phe Thr Leu Asn Leu Phe His Gly Thr Ile Thr Phe Ala His His Arg Val Glu Ile Pro Lys Leu Asn Lys Phe Met Gln Thr Met Asn Thr Ala Lys Tyr Pro Val Asp Ile Ser His Thr Ile Leu Glu Trp Asn Tyr Phe Asn Cys Ser Ile Ser Lys Asn Ser Ile Arg Met His His Ile Thr Phe Asn Asn Thr Leu Glu Trp Thr Ser Leu His Asn Tyr Asp Met Ala Met Ser Asp Glu Gly Tyr Ser Leu Tyr Asn Ala Val Tyr Ala Val Ala His Thr Tyr His Glu Tyr Ile Phe Gln Gln Val Glu Ser Gln Lys Lys Ala Lys Pro Lys Arg Tyr Phe Thr Ala Cys Gln Gln Ile Trp Asn Ser Val
(2) INFORMATION FOR SEQ ID NO:19:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 2732 base pairs (B) TYPE: nucleic acid (C) STRANDEDNESS: single (D) TOPOLOGY: linear (ii) MOLECULE TYPE: cDNA
(ix) FEATURE:
(A) NAME/KEY: Coding Sequence (B) LOCATION: 80...1375 (D) OTHER INFORMATION: VR10 (xi) SEQUENCE DESCRIPTION: SEQ ID NO:19:
Met Lys Lys Leu Cys Ala Phe Thr Ile Ser Phe Leu Ser Leu Lys Phe Ser Leu Ile Leu Cys Cys Leu Thr Glu Ala Ser Cys Phe Trp Arg Ile Lys Asn Ser Glu Asp Ser Asp Gly Asp Leu Gln Arg Glu Cys His Phe Tyr Leu Trp Val Ile Asp Lys Pro Ile Glu Asp Asn Phe Tyr Asn Ser Val Leu Asn Phe Arg Ile Ser Ala Ser Glu Tyr Glu Phe Leu Leu Val Met Phe Phe Ala Thr Asp Glu Ile Asn Lys Asn ', CCT TAT TTA AAC GTT GGT 400 CTT CCC ATA GGT
ACT
TTG
ATA
TTC
AGC
ATC
Pro TyrLeuLeu ProAsn ThrLeuIlePhe SerIleVal Gly Ile Gly j 95 100 105 ', CAC TGTCATGAT TTATTG GGTCTGGATCAA TCATATACA ATA 448 AGA CAA
I His CysHisAsp LeuLeu GlyLeuAspGln SerTyrThr Ile Arg Gln GTT GAT
Asn GlyArgVal AsnPhe AsnTyrPheCys TyrLeuAsp Ser Val Asp . 125 130 135 GGA AAA
Cys AsnIleGly LeuThr ProSerTrpLys LysSerLeu Leu Gly Lys ', 140 145 150 155 Ala Met Asp Ser Ser Ile Pro Met Val Phe Phe Gly Pro Phe Asn Pro CGC CAT CCC
Asn Leu AspHis Asp Arg Leu Pro Val His Gln Val Ala Arg His Pro ACA GTC TTT
Lys Asp HisLeu Ser His Gly Met Ser Leu Met Phe His Thr Val Phe ACT TCA ATT
Arg Trp TrpIle Gly Leu Val Ile Asp Asp Asp Gln Gly Thr Ser Ile CTC AGC TGT
Gln Phe SerAsp Leu Arg Glu Glu Gln Arg His Gly Ile Leu Ser Cys TTT AAC ACA
Leu Ala ValAsn Met Ile Pro Glu Met Gln Ile Tyr Met Phe Asn Thr ACA ATG GTT
Arg Ala IleTyr Asp Lys Gln Ile Thr Ser Ser Ala Lys Thr Met Val ATT ACT AGA
Val Ile TyrGly Glu Met Asn Ser Leu Glu Val Ser Phe Ile Thr Arg GAA ATC CAA
Arg Trp AspLeu Gly Ala Arg Arg Trp Ile Thr Thr Ser Glu Ile Gln ATC TTC CAT
Trp Asp IleLeu Asn Lys Lys Glu Thr Leu Asn Leu Phe Ile Phe His ATC GTT AGG
Gly Pro ThrPhe Ala His His Lys Glu Ile Pro Lys Leu Ile Val Arg ATG AAA TCT
Asn Phe GlnThr Met Asn Thr Ala Tyr Pro Val Asp Ile Met Lys Ser ATA AAT AAC
His Thr LeuGlu Trp Asn Tyr Phe Cys Ser Ile Ser Lys Ile Asn Asn AAA AAC ACA
Ser Ser MetAsp Leu Phe Thr Ser Asn Thr Leu Glu Trp Lys Asn Thr CAC AGT TTG
Ala Leu AsnTyr Asp Met Ala Met Asp Glu Gly Tyr Asn His Ser Leu GCT ACC CTT
Tyr Asn ValTyr Val Ala Ala His Tyr His Glu His Ile Ala Thr Leu GTA GAA ACT
Gln Gln GluSer Gln Lys Lys Val His Asn Arg Tyr Phe Val Glu Thr CAG CCAGATGATA AGTATGCCAA
.C
Val Cys Gln Gln Ile ,. GGGATGGCTC TAGGCTGCAT GGCACTATCC TTCTCGGCCATCACAATTCTAGTACTAGTC1536 ' TCTACAGTGT TGGCCARAAC AATAACTGTG GTCATGGCTTTCAAGTTCACTACTCCAGGA1776 TATGACAAAG GTACATAAAT AAATAAACAC TTTCCCCACC1?~WAAAAAAAAAA,AAA 2732 (2) INFORMATION FOR SEQ ID N0:20:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 432 amino acids (B) TYPE: amino acid (C) STRANDEDNESS: single (D) TOPOLOGY: linear (ii) MOLECULE TYPE: protein (v) FRAGMENT TYPE: internal (xi) SEQUENCE DESCRIPTION: SEQ ID NO:20:
Met Lys Lys Leu Cys Ala Phe Thr Ile Ser Phe Leu Ser Leu Lys Phe Ser Leu Ile Leu Cys Cys Leu Thr Glu Ala Ser Cys Phe Trp Arg Ile Lys Asn Ser Glu Asp Ser Asp Gly Asp Leu Gln Arg Glu Cys His Phe Tyr Leu Trp Val Ile Asp Lys Pro Ile Glu Asp Asn Phe Tyr Asn Ser Val Leu Asn Phe Arg Ile Ser Ala Ser Glu Tyr Glu Phe Leu Leu Val Met Phe Phe Ala Thr Asp Glu Ile Asn Lys Asn Pro Tyr Leu Leu Pro Asn Ile Thr Leu Ile Phe Ser Ile Val Gly Gly His Cys His Asp Leu Leu Arg Gly Leu Asp Gln Ser Tyr Thr Gln Ile Asn Gly Arg Val Asn Phe Val Asn Tyr Phe Cys Tyr Leu Asp Asp Ser Cys Asn Ile Gly Leu Thr Gly Pro Ser Trp Lys Lys Ser Leu Lys Leu Ala Met Asp Ser Ser Ile Pro Met Val Phe Phe Gly Pro Phe Asn Pro Asn Leu Arg Asp His Asp Arg Leu Pro His Val His Gln Val Ala Pro Lys Asp Thr His Leu Ser His Gly Met Val Ser Leu Met Phe His Phe Arg Trp Thr Trp Ile Gly Leu Val Ile Ser Asp Asp Asp Gln Gly Ile Gln Phe Leu Ser Asp Leu Arg Glu Glu Ser Gln Arg His Gly Ile Cys Leu Ala Phe Val Asn Met Ile Pro Glu Asn Met Gln Ile Tyr Met Thr Arg Ala Thr Ile Tyr Asp Lys Gln Ile Met Thr Ser Ser Ala Lys Val Val Ile Ile Tyr Gly Glu Met Asn Ser Thr Leu Glu Val Ser Phe Arg Arg Trp Glu Asp Leu Gly Ala Arg Arg Ile Trp Ile Thr Thr Ser Gln Trp Asp Ile Ile Leu Asn Lys Lys Glu Phe Thr Leu Asn Leu Phe His Gly Pro Ile Thr Phe Ala His His Lys Val Glu Ile Pro Lys Leu Arg Asn Phe Met Gln Thr Met Asn Thr Ala Lys Tyr Pro Val Asp Ile Ser His Thr Ile Leu Glu Trp Asn Tyr Phe Asn Cys Ser Ile Ser Lys Asn Ser Ser Lys Met Asp Leu Phe Thr Ser Asn Asn Thr Leu Glu Trp Thr Ala Leu His Asn Tyr Asp Met Ala Met Ser Asp Glu Gly Tyr Asn Leu Tyr Asn Ala Val Tyr Val Ala Ala His Thr Tyr His Glu His Ile Leu Gln Gln Val Glu Ser Gln Lys Lys Val Glu His Asn Arg Tyr Phe Thr Val Cys Gln Gln Ile
(2) INFORMATION FOR SEQ ID NO:21:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 2962 base pairs (B) TYPE: nucleic acid (C) STRANDEDNESS: single (D) TOPOLOGY: linear (ii) MOLECULE TYPE: cDNA
(ix) FEATURE:
(A) NAME/KEY: Coding Sequence (B) LOCATION: 81...1601 (D) OTHER INFORMATION: VR11 (xi) SEQUENCE DESCRIPTION: SEQ ID NO:21:
AATGTGTGTG
TGATGTTTTT
CTACATCAGA
AACGGATTTC
ACAACAACTC
ATG
AAG
AAG
CTC
TGT
GCT
TTC
ACT
ATT
TCA
Met Lys Lys Leu Cys Ala Phe Thr Ile Ser TTG AAG TGC GCA
Phe SerLeu Phe Ser Leu Ile Leu Cys Leu Thr Glu Leu Lys Cys Ala TGC AGG GAT TTG
Ser PheTrp Ile Lys Asn Ser Glu Ser Asp Gly Asp Cys Arg Asp Leu AGA CAT ATT GAA
Gln GluCys Phe Tyr Leu Trp Val Asp Lys Pro Ile Arg His Ile Glu AAT AAT AGA GAA .
Asp Asn Phe Tyr Asn Ser Val Leu Asn Phe Arg Ile Ser Ala Ser Glu Tyr Glu Phe Leu Leu Val Met Phe Phe Ala Thr Asp Glu Ile Asn Lys Asn Pro Tyr Leu Leu Pro Asn Ile Thr Leu Ile Phe Ser Ile Val Gly Gly His Cys His Asp Leu Leu Arg Gly Leu Asp Gln Ser Tyr Thr Gln Ile Asn Gly Arg Val Asn Phe Val Asn Tyr Phe Cys Tyr Leu Asp Asp Ser Cys Asn Ile Gly Leu Thr Gly Pro Ser Trp Lys Lys Ser Leu Lys Leu Ala Met Asp Ser Ser Ile Pro Met Val Phe Phe Gly Pro Phe Asn Pro Asn Leu Arg Asp His Asp Arg Leu Pro His Val His Gln Val Ala
CCC AAG GAC ACA CAT TTA TCC CAT GGC ATG GTC TCC TTG ATG TTT CAT 686
Pro Lys Asp Thr His Leu Ser His Gly Met Val Ser Leu Met Phe His Phe Arg Trp Thr Trp Ile Gly Leu Val Ile Ser Asp Asp Asp Gln Gly Ile Gln Phe Leu Ser Asp Leu Arg Glu Glu Ser Gln Arg His Gly Ile Cys Leu Ala Phe Val Asn Met Ile Pro Glu Asn Met Gln Ile Tyr Met Thr Arg Ala Thr Ile Tyr Asp Lys Gln Ile Met Thr Ser Ser Ala Lys Val Val Ile Ile Tyr Gly Glu Met Asn Ser Thr Leu Glu Val Ser Phe Arg Arg Trp Glu Asp Leu Gly Ala Arg Arg Ile Trp Ile Thr Thr Ser Gln Trp Asp Ile Ile Leu Asn Lys Lys Glu Phe Thr Leu Asn Leu Phe His Gly Pro Ile Thr Phe Ala His His Lys Val Glu Ile Pro Lys Leu
AAT CAA GCC TAC GAT
Arg Phe Met Thr MetAsnThr Lys Pro Val Ile Asn Gln Ala Tyr Asp CAT CTG TTT TGT TCT
Ser Thr Ile Glu TrpAsnTyr Asn Ser Ile Lys His Leu Phe Cys Ser AGC ATG TCC AAC GAA
Asn Ser Lys Asp LeuPheThr Asn Thr Leu Trp Ser Met Ser Asn Glu GCA AAC ATG GAT TAC
Thr Leu His Tyr AspMetAla Ser Glu Gly Asn Ala Asn Met Asp Tyr TAT GTT CAC TAC CAC
Leu Asn Ala Tyr ValAlaAla Thr His Glu Ile Tyr Val His Tyr His CAA GAG GTA CAC TAT
Leu Gln Val Ser GlnLysLys Glu Asn Arg Phe Gln Glu Val His Tyr GTT CAG ATG ACC TTT
Thr Cys Gln Val SerSerLeu Lys Arg Val Thr Val Gln Met Thr Phe CCG GAA AAG AGG CAG
Asn Val Gly Leu ValAsnMet His Glu Asn Cys Pro Glu Lys Arg Gln GAG ATT AAT CCA CTT
Thr Tyr Asp Phe IleIleTrp Phe Gln Gly Gly Glu Ile Asn Pro Leu AAA ATA CCT TTT AGT
Leu Leu Lys Gly SerTyrIle Cys Pro Lys Gln Lys Ile Pro Phe Ser CTT TCT TGG ATG ACA
Gln His Ile Asp AspLeuGlu Ala Gly Gly Ser Leu Ser Trp Met Thr TAGAACAGTG ACCTAC
TGTGAAATGT
CCAGATGATA
AGTATGCCAA
Ile AATTAAGTAA TATACAGATT
TAAATAAATA AACACTTTCC CCACAAAAAAAAAAAAP.AAA AAAAA 2962
(2) INFORMATION FOR SEQ ID NO:22:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 507 amino acids (B) TYPE: amino acid (C) STRANDEDNESS: single (D) TOPOLOGY: linear (ii) MOLECULE TYPE: protein (v) FRAGMENT TYPE: internal (xi) SEQUENCE DESCRIPTION: SEQ ID NO:22:
Met Lys Lys Leu Cys Ala Phe Thr Ile Ser Phe Leu Ser Leu Lys Phe Ser Leu Ile Leu Cys Cys Leu Thr Glu Ala Ser Cys Phe Trp Arg Ile Lys Asn Ser Glu Asp Ser Asp Gly Asp Leu Gln Arg Glu Cys His Phe Tyr Leu Trp Val Ile Asp Lys Pro Ile Glu Asp Asn Phe Tyr Asn Ser Val Leu Asn Phe Arg Ile Ser Ala Ser Glu Tyr Glu Phe Leu Leu Val Met Phe Phe Ala Thr Asp Glu Ile Asn Lys Asn Pro Tyr Leu Leu Pro Asn Ile Thr Leu Ile Phe Ser Ile Val Gly Gly His Cys His Asp Leu Leu Arg Gly Leu Asp Gln Ser Tyr Thr Gln Ile Asn Gly Arg Val Asn Phe Val Asn Tyr Phe Cys Tyr Leu Asp Asp Ser Cys Asn Ile Gly Leu Thr Gly Pro Ser Trp Lys Lys Ser Leu Lys Leu Ala Met Asp Ser Ser Ile Pro Met Val Phe Phe Gly Pro Phe Asn Pro Asn Leu Arg Asp His Asp Arg Leu Pro His Val His Gln Val Ala Pro Lys Asp Thr His Leu Ser His Gly Met Val Ser Leu Met Phe His Phe Arg Trp Thr Trp Ile Gly Leu Val Ile Ser Asp Asp Asp Gln Gly Ile Gln Phe Leu Ser Asp Leu Arg Glu Glu Ser Gln Arg His Gly Ile Cys Leu Ala Phe Val Asn Met Ile Pro Glu Asn Met Gln Ile Tyr Met Thr Arg Ala Thr Ile Tyr Asp Lys Gln Ile Met Thr Ser Ser Ala Lys Val Val Ile Ile Tyr Gly Glu Met Asn Ser Thr Leu Glu Val Ser Phe Arg Arg Trp Glu Asp Leu Gly Ala Arg Arg Ile Trp Ile Thr Thr Ser Gln Trp Asp Ile Ile Leu Asn Lys Lys Glu Phe Thr Leu Asn Leu Phe His Gly Pro Ile Thr Phe Ala His His Lys Val Glu Ile Pro Lys Leu Arg Asn Phe Met Gln Thr Met Asn Thr Ala Lys Tyr Pro Val Asp Ile Ser His Thr Ile Leu Glu Trp Asn Tyr Phe Asn Cys Ser Ile Ser Lys Asn Ser Ser Lys Met Asp Leu Phe Thr Ser Asn Asn Thr Leu Glu Trp Thr Ala Leu His Asn Tyr Asp Met Ala Met Ser Asp Glu Gly Tyr Asn Leu Tyr Asn Ala Val Tyr Val Ala Ala His Thr Tyr His Glu His Ile Leu Gln Gln Val Glu Ser Gln Lys Lys Val Glu His Asn Arg Tyr Phe Thr Val Cys Gln Gln Val Ser Ser Leu Met Lys Thr Arg Val Phe Thr Asn Pro Val Gly Glu Leu Val Asn Met Lys His Arg Glu Asn Gln Cys Thr Glu Tyr Asp Ile Phe Ile Ile Trp Asn Phe Pro Gln Gly Leu Gly Leu Lys Leu Lys Ile Gly Ser Tyr Ile Pro Cys Phe Pro Lys Ser Gln Gln Leu His Ile Ser Asp Asp Leu Glu Trp 
Ala Met Gly Gly Thr Ser Ile
(2) INFORMATION FOR SEQ ID NO:23:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 2821 base pairs (B) TYPE: nucleic acid (C) STRANDEDNESS: single (D) TOPOLOGY: linear (ii) MOLECULE TYPE: cDNA
(ix) FEATURE:
(A) NAME/KEY: Coding Sequence (B) LOCATION: 60...992 (D) OTHER INFORMATION: VR12 (xi) SEQUENCE DESCRIPTION: SEQ ID NO:23:
Met Lys Gln Leu Cys Thr Phe Thr Ile Ser Leu Leu Phe Leu Lys Phe Ser Leu Ile Leu Cys Cys Trp Ser Glu Pro Ser Cys Phe Trp Arg Ile Lys Lys Ser Glu Asp Asn Asp Gly Asp Leu Gln Arg Glu Cys His Phe Tyr Leu Trp Lys Thr Asp Glu Pro Ile Glu Asp Ser Phe Tyr Asn Tyr Asp Leu Ser Phe Arg Ile Ala Gly Ser Glu Tyr Glu Leu Leu Leu Val Met Phe Phe Ala Thr Asp Glu Ile Asn Lys Asn Pro Tyr Leu Leu Pro Asn
ACA TGA GTT TGA TGT TCT CCA TCA TTG GTG GAA ACT GTC ATG ATT TAT 396
Met Ser Leu Met Phe Ser Ile Ile Gly Gly Asn Cys His Asp Leu Leu Arg Ser Leu Asp Gln Glu Tyr Ala Gln Ile Asp Gly His Met Asn Phe Val Asn Tyr Phe Cys Tyr Leu Asp Asp Ser Cys Ala Thr Gly Leu Thr Gly Pro Ser Trp Lys Thr Ser Leu Lys Leu Ala Met His Ser Ser Met Pro Leu Val Phe Phe Gly Pro Phe Asn Pro Asn Leu Arg Asp His Asp Arg Leu Pro His Val His Gln Val Ala Pro Lys Asp Thr His Leu Ser His Gly Met Val Ser Leu Met Phe His Phe Arg Trp Thr Trp Ile Gly Leu Val Ile Ser Asp Asp Asp Gln Gly Ile Gln Phe Leu Ser Asp Leu Arg Glu Glu Ser Gln Arg His Gly Ile Cys Leu Ala Phe Val Asn Met Ile Pro Glu Asn Met Gln Ile Tyr Met Thr Arg Ala Thr Ile Tyr Asp Thr Gln Ile Met Thr Ser Ser Ala Lys Val Val Ile Ile Tyr Gly Asp Met Asn Ser Thr Leu Glu Ala Ser Phe Arg Arg Trp Glu Glu Leu Gly Ala Arg Arg Ile Trp Ile Thr Thr Thr Gln Trp Asp Val Ile Thr Asn Lys Lys Arg Leu His Pro TGAGGTTTCC
AATGAAACAG
ATATGGAACA
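The coding-sequence spans and protein lengths stated in this listing are linked by simple arithmetic: a span of N nucleotides should encode N/3 residues. The sketch below checks that relationship for the entries whose LOCATION and LENGTH fields appear in this part of the listing. It assumes that LOCATION values are 1-based and inclusive and that the span excludes the stop codon; both assumptions are inferred from the numbers themselves, not stated in the source. The function and variable names are illustrative only.

```python
def cds_residue_count(start: int, end: int) -> int:
    """Return the number of codons in a 1-based, inclusive CDS span."""
    span = end - start + 1
    if span <= 0 or span % 3 != 0:
        raise ValueError(f"CDS {start}...{end} is not a whole number of codons")
    return span // 3


# (CDS start, CDS end, stated protein length) as given in the listing.
ENTRIES = {
    "VR9": (80, 1387, 436),
    "VR10": (80, 1375, 432),
    "VR11": (81, 1601, 507),
    "VR12": (60, 992, 311),
    "VR13": (3, 1238, 412),
    "VR14": (116, 2527, 804),
    "VR15": (39, 419, 127),
}


def check_listing(entries):
    """Map each entry name to True when the CDS span matches the stated length."""
    return {
        name: cds_residue_count(start, end) == stated
        for name, (start, end, stated) in entries.items()
    }


if __name__ == "__main__":
    for name, ok in check_listing(ENTRIES).items():
        print(name, "consistent" if ok else "MISMATCH")
```

A mismatch would point to a transcription error in either the LOCATION field or the LENGTH field of the corresponding record.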
(2) INFORMATION FOR SEQ ID NO:24:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 311 amino acids (B) TYPE: amino acid (C) STRANDEDNESS: single (D) TOPOLOGY: linear (ii) MOLECULE TYPE: protein (v) FRAGMENT TYPE: internal (xi) SEQUENCE DESCRIPTION: SEQ ID NO:24:
Met Lys Gln Leu Cys Thr Phe Thr Ile Ser Leu Leu Phe Leu Lys Phe Ser Leu Ile Leu Cys Cys Trp Ser Glu Pro Ser Cys Phe Trp Arg Ile Lys Lys Ser Glu Asp Asn Asp Gly Asp Leu Gln Arg Glu Cys His Phe Tyr Leu Trp Lys Thr Asp Glu Pro Ile Glu Asp Ser Phe Tyr Asn Tyr Asp Leu Ser Phe Arg Ile Ala Gly Ser Glu Tyr Glu Leu Leu Leu Val Met Phe Phe Ala Thr Asp Glu Ile Asn Lys Asn Pro Tyr Leu Leu Pro Asn Met Ser Leu Met Phe Ser Ile Ile Gly Gly Asn Cys His Asp Leu Leu Arg Ser Leu Asp Gln Glu Tyr Ala Gln Ile Asp Gly His Met Asn Phe Val Asn Tyr Phe Cys Tyr Leu Asp Asp Ser Cys Ala Thr Gly Leu Thr Gly Pro Ser Trp Lys Thr Ser Leu Lys Leu Ala Met His Ser Ser Met Pro Leu Val Phe Phe Gly Pro Phe Asn Pro Asn Leu Arg Asp His Asp Arg Leu Pro His Val His Gln Val Ala Pro Lys Asp Thr His Leu Ser His Gly Met Val Ser Leu Met Phe His Phe Arg Trp Thr Trp Ile Gly Leu Val Ile Ser Asp Asp Asp Gln Gly Ile Gln Phe Leu Ser Asp Leu Arg Glu Glu Ser Gln Arg His Gly Ile Cys Leu Ala Phe Val Asn Met Ile Pro Glu Asn Met Gln Ile Tyr Met Thr Arg Ala Thr Ile Tyr Asp Thr Gln Ile Met Thr Ser Ser Ala Lys Val Val Ile Ile Tyr Gly Asp Met Asn Ser Thr Leu Glu Ala Ser Phe Arg Arg Trp Glu Glu Leu Gly Ala Arg Arg Ile Trp Ile Thr Thr Thr Gln Trp Asp Val Ile Thr Asn Lys Lys Arg Leu His Pro
(2) INFORMATION FOR SEQ ID NO:25:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 2773 base pairs (B) TYPE: nucleic acid (C) STRANDEDNESS: single (D) TOPOLOGY: linear (ii) MOLECULE TYPE: cDNA
(ix) FEATURE:
(A) NAME/KEY: Coding Sequence (B) LOCATION: 3...1238 (D) OTHER INFORMATION: VR13 (xi) SEQUENCE DESCRIPTION: SEQ ID NO:25:
CGG ATA AAG AAT AGT GAA
GAT AAT GAT GGA
Ala Ser Cys Phe Trp Arg Ile Lys Asn Ser Glu Asp Asn Asp Gly AAA CCA
Asp Leu Gln Arg Glu Cys His Phe Tyr Leu Gly Ala Val Asp Lys Pro GCA GCA
Ile Glu Asp Asn Phe Tyr Asn Ser Leu Leu Lys Phe Arg Ile Ala Ala GAG ATC
Ser Glu Tyr Glu Phe Leu Leu Val Met Phe Phe Ala Thr Asp Glu Ile TCC ATC
Asn Lys Asn Pro Tyr Leu Leu Pro Asn Ile Thr Leu Met Phe Ser Ile GCA TAT
Ile Gly Gly Asn Cys His Asp Leu Leu Arg Gly Leu Asp Gln Ala Tyr TAT TTA
Thr Gln Ile Asn Gly His Met Asn Phe Val Asn Tyr Phe Cys Tyr Leu Asp Asp Ser Cys Ala Ile Gly Leu Thr Gly Pro Ser Trp Lys Thr Ser Leu Lys Leu Ala Met His Ser Ser Met Pro Leu Val Phe Phe Gly Ser .
CGG
PheAsn ProAsnLeu HisAspHis Asp Leu HisHisValHis Gln Arg CAT
ValAla ThrLysAsp ThrHisLeu Ser Gly IleValSerLeu Met His CTG
PheHis PheArgTrp ThrTrpIle Gly Val IleSerAspAsp Asp Leu AGA
LysGly IleGlnPhe LeuSerAsp Leu Glu GluSerGlnArg His Arg ATC
GlyIle CysLeuAla PheValAsn Met Pro GluAsnMetGln Ile Ile AAA
TyrMet ThrArgAla ThrIleTyr Asp Gln IleMetThrSer Leu Lys ATG
AlaLys ValValIle IleTyrGly Glu Asn SerThrLeuGlu Val Met GCT
SerPhe ArgArgTrp GluAsnLeu Gly Arg ArgIleTrpIle Thr Ala AAA
ThrSer GlnTrpAsp ValIleThr Asn Lys GluPheThrLeu Asn Lys CAC
LeuPhe HisGlyThr IleThrPhe Ala Arg ArgPheGluIle Pro His AAC
LysPhe LysLysPhe MetGlnThr Met Thr AlaLysTyrPro Val Asn AAT
AspIle SerHisThr IleLeuGlu Trp Tyr PheAsnCysSer Ile Asn ATT
SerLys AsnSerSer LysMetAsp His Thr PheAsnAsnThr Leu Ile ATG
GluTrp ThrAlaLeu HisAsnTyr Asp Val MetSerAspGlu Gly Met GTG
TyrAsn LeuTyrAsn AlaValTyr Ala Ala HisThrTyrHis Glu Val AAA
HisIle PheGlnGln ValGluSer Gln Lys AlaLysProLys Arg Lys Phe Phe Thr Val Cys Gln Gln Gln Ile Trp Asn Ser Val TTTTCACTGTGTCTGTTTCTACAGTGTTGGCCAAAACAAT AACTGTGGTC ATGGCTTTCA1'610 (2) INFORMATION FOR SEQ ID N0:26:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 412 amino acids (B) TYPE: amino acid (C) STRANDEDNESS: single (D) TOPOLOGY: linear (ii) MOLECULE TYPE: protein (v) FRAGMENT TYPE: internal (xi) SEQUENCE DESCRIPTION: SEQ ID NO:26:
Ala Ser Cys Phe Trp Arg Ile Lys Asn Ser Glu Asp Asn Asp Gly Asp Leu Gln Arg Glu Cys His Phe Tyr Leu Gly Ala Val Asp Lys Pro Ile Glu Asp Asn Phe Tyr Asn Ser Leu Leu Lys Phe Arg Ile Ala Ala Ser Glu Tyr Glu Phe Leu Leu Val Met Phe Phe Ala Thr Asp Glu Ile Asn Lys Asn Pro Tyr Leu Leu Pro Asn Ile Thr Leu Met Phe Ser Ile Ile Gly Gly Asn Cys His Asp Leu Leu Arg Gly Leu Asp Gln Ala Tyr Thr Gln Ile Asn Gly His Met Asn Phe Val Asn Tyr Phe Cys Tyr Leu Asp Asp Ser Cys Ala Ile Gly Leu Thr Gly Pro Ser Trp Lys Thr Ser Leu Lys Leu Ala Met His Ser Ser Met Pro Leu Val Phe Phe Gly Ser Phe Asn Pro Asn Leu His Asp His Asp Arg Leu His His Val His Gln Val Ala Thr Lys Asp Thr His Leu Ser His Gly Ile Val Ser Leu Met Phe His Phe Arg Trp Thr Trp Ile Gly Leu Val Ile Ser Asp Asp Asp Lys Gly Ile Gln Phe Leu Ser Asp Leu Arg Glu Glu Ser Gln Arg His Gly Ile Cys Leu Ala Phe Val Asn Met Ile Pro Glu Asn Met Gln Ile Tyr Met Thr Arg Ala Thr Ile Tyr Asp Lys Gln Ile Met Thr Ser Leu Ala Lys Val Val Ile Ile Tyr Gly Glu Met Asn Ser Thr Leu Glu Val Ser Phe Arg Arg Trp Glu Asn Leu Gly Ala Arg Arg Ile Trp Ile Thr Thr Ser Gln Trp Asp Val Ile Thr Asn Lys Lys Glu Phe Thr Leu Asn Leu Phe His Gly Thr Ile Thr Phe Ala His Arg Arg Phe Glu Ile Pro Lys Phe Lys Lys Phe Met Gln Thr Met Asn Thr Ala Lys Tyr Pro Val Asp Ile Ser His Thr Ile Leu Glu Trp Asn Tyr Phe Asn Cys Ser Ile Ser Lys Asn Ser Ser Lys Met Asp His Ile Thr Phe Asn Asn Thr Leu Glu Trp Thr Ala Leu His Asn Tyr Asp Met Val Met Ser Asp Glu Gly Tyr Asn Leu Tyr Asn Ala Val Tyr Ala Val Ala His Thr Tyr His Glu His Ile Phe Gln Gln Val Glu Ser Gln Lys Lys Ala Lys Pro Lys Arg Phe Phe Thr Val Cys Gln Gln Gln Ile Trp Asn Ser Val
(2) INFORMATION FOR SEQ ID NO:27:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 3108 base pairs (B) TYPE: nucleic acid (C) STRANDEDNESS: single (D) TOPOLOGY: linear (ii) MOLECULE TYPE: cDNA
(ix) FEATURE:
(A) NAME/KEY: Coding Sequence (B) LOCATION: 116...2527 (D) OTHER INFORMATION: VR14 (xi) SEQUENCE DESCRIPTION: SEQ ID NO:27:
TAAACATCTC
CTTTGCCTAA
AGAAATAAAA
GCTGGTAGAA
ATCTGATGTG
TGGCACTTCA AAAAG
CAATCCACAC ATG
TGCCCAGGTT
Met TTC GTC TTC AAT ACA
Phe Ile Met Glu Phe Leu Leu Ile LeuLeu Met Phe Val Phe Asn Thr TTC CCC TGC AGA AAT
Ala Asn Ile Asp Arg Phe Trp Ile LeuAsp Glu Phe Pro Cys Arg Asn GAT TTG TTA GCT ATC
Ile Met Glu Tyr Gly Ser Cys Phe LeuAla Ala Asp Leu Leu Ala Ile , !, GTT CAG ACA CCC GAA GATTAT TTCAACAAGACT CTTAAT GTT 310 ATT AAT
Val Gln Thr Pro Glu AspTyr PheAsnLysThr LeuAsn Val Ile Asn AAA CAC
Leu Lys Thr Thr Asn LysTyr AlaLeuAlaLeu ValPhe Ala Lys His AAC AAT
Met Asp Glu Ile Arg ProAsp LeuLeuProAsn MetSer Leu Asn Asn ACT GGC
Ile Ile Arg Tyr Leu ArgCys AspGlyLysThr ValIle Pro Thr Gly TTT AAA
', Thr Pro Tyr Leu Arg LysLys GluSerProIle ProAsn Tyr Phe Lys i ', TTC TGT AAT GAA ACT TGTTCC TATCTGCTTACA GGACCC CAT 550 GAG ATG
Phe Cys Asn Glu Thr CysSer TyrLeuLeuThr GlyPro His Glu Met TTA TTC
Trp Glu Val Ser Gly TrpLys HisMetAsnSer PheLeu Ser Leu Phe CAG ACC
Pro Arg Ile Leu Leu TyrGly ProPheHisSer IlePhe Ser Gln Thr TAT TAT
Asp Asp Glu Gln Pro LeuTyr GlnMetAlaPro LysAsp Thr Tyr Tyr Ser Leu Ala Leu Ala Met Val Ser Phe Ile Leu Tyr Phe Ser Trp Asn GTC GAT CAA TTT
ATT
Trp Ile GlyLeuVal Pro Asp Asp Gly Asn Gln Leu Ile Asp Gln Phe CAG AAC GAA GCC
Leu Glu LeuLysLys Ser Glu Lys Ile Cys Phe Phe Gln Asn Glu Ala GTT GTT TTT ACT
Val Lys MetIleSer Asp Asp Ser Pro Gln Asn Glu Val Val Phe Thr ATT TCA ACA ATC
Met Tyr TyrAsnGln Val Met Ser Asn Val Ile Ile Ile Ser Thr Ile AAT GAT ATC TGG
Tyr Gly GluThrTyr Phe Ile Leu Phe Arg Met Glu Asn Asp Ile Trp AGA ATC ACA AAT
', Pro Pro IleLeuGln Ile Trp Thr Lys Gln Leu Phe Arg Ile Thr Asn .
ACC ACA TTC TCA
AGG
AAA
AAA
GAC
ATA
AGT
ProThrArgLys Asp SerHis Gly PheTyr GlySerLeu Lys Ile Thr CAC GGT GGT
ThrPheLeuPro His ValIle Ser PheLys AsnPheVal His Gly Gly CAT AGA TTA
GlnThrTrpPhe Leu AsnThr Asp TyrLeu ValMetGln His Arg Leu TTT TAT GCA
GluTrpLysTyr Asn GluAsp Ser SerThr CysLysIle Phe Tyr Ala TCA AAT GAT
LeuLysAsnAsn Ser AlaSer Phe TrpLeu MetGluGln Ser Asn Asp ACC AGT CAT
LysPheAspMet Phe GluAsn Ser AsnIle TyrAsnAla Thr Ser His GCC GCC ATG
ValHisAlaIle His LeuHis Glu AsnLeu GlnGlnAla Ala Ala Met ATA AAT GAG
AspAsnGlnAla Asp GlyLys Lys ProSer SerSerHis Ile Asn Glu AAC TTT ATT
CysLeuLysVal Ser LeuArg Arg TyrPhe ThrAsnPro Asn Phe Ile GTG ATG GTA
ProGlyAspLys Phe LysGln Arg IleMet HisAspGlu Val Met Val CAC GTG CAA
TyrAspIleVal Phe AsnLeu Ser HisLeu GlyIleLys His Val Gln AAG AGC CCA
MetLysLeuGly Phe ProTyr Leu HisGly ArgHisSer Lys Ser Pro GAC ATT ACA
HisLeuTyrVal Arg GluLeu Ala GlyArg ArgLysMet Asp Ile Thr TGC GCT CCT
ProSerSerVal Ser AspCys Ser GlyPhe ArgArgLeu Cys Ala Pro ATG GCC GTT
TrpLysGluGly Ala CysCys Phe CysSer ProCysPro Met Ala Val TCT GAG GTA
GluAsnGluIle Asn ThrThr Val LeuCys ValPheVal Ser Glu Val ACT ATT AAT .
Lys His His Asp Thr Pro Ile Val Lys Ala Asn Asn Arg Ser Leu Ser Tyr Leu Leu Ser Leu CysPheLeuCys SerPhe Phe Leu Met Met Ser CCA ATC
Phe IleGly Leu Asn ArgAla CysValLeuGln GlnIle Thr Pro Ile TTC GTT
Phe GlyIle Val Thr MetAla SerThrValLeu AlaLys Thr Phe Val CTG GTC
Val ThrVal Val Ala PheLys ThrAspProGly ArgArg Leu Leu Val GTA CCC
Arg AsnPhe Leu Ser GlyThr AsnTyrIleIle ProIle Cys Val Pro TGT GCA
Ser LeuLeu Gln Val LeuCys IleTrpLeuAla ValSer Pro Cys Ala ATT ACT
Pro PheVal Asp Asp GluHis LeuHisGlyHis IleIle Ile Ile Thr GGC GCA
Val CysAsn Lys Ser ValThr PheTyrCysIle LeuGly Tyr Gly Ala ', 690 695 700 705 GCA TTC
Leu AlaCys Leu Leu GlyAsn SerValAlaPhe LeuAla Lys Ala Phe ', AAT CTGCCT GAC TTC AATGAA AAGTTCTTGACC TTCAGC ATG 2326 ACA GCC
Asn LeuPro Asp Phe AsnGlu LysPheLeuThr PheSer Met Thr Ala AGT ACC
Leu ValPhe Cys Val TrpVal PheLeuProVal TyrHis Ser Ser Thr CAC GTG
', Thr LysGly Lys Met ValAla GluIlePheSer IleLeu Ala His Val ', TCC AGTGCT GGG CTT GGATGT TTTGTACCCAAG ATTTAT ATC 2470 ATC ATA
Ser SerAla Gly Leu GlyCys PheValProLys IleTyr Ile Ile Ile CCA TCG
Ile LeuMet Arg Glu ArgAsn ThrGlnLysIle ArgGlu Lys Pro Ser Ser Tyr Phe ACTAAACTCT
CTAATTATTA
CAATTTTATT
CTGTCAAATAAAAATATATTATATCCAAAAp,~~iAAAAAAAA p~AAAAAAAp,A 3108 AA
(2) INFORMATION FOR SEQ ID NO:28:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 804 amino acids (B) TYPE: amino acid (C) STRANDEDNESS: single (D) TOPOLOGY: linear (ii) MOLECULE TYPE: protein (v) FRAGMENT TYPE: internal (xi) SEQUENCE DESCRIPTION: SEQ ID NO:28:
Met Phe Ile Phe Met Glu Val Phe Phe Leu Leu Asn Ile Thr Leu Leu Met Ala Asn Phe Ile Asp Pro Arg Cys Phe Trp Arg Ile Asn Leu Asp Glu Ile Met Asp Glu Tyr Leu Gly Leu Ser Cys Ala Phe Ile Leu Ala Ala Val Gln Thr Pro Ile Glu Asn Asp Tyr Phe Asn Lys Thr Leu Asn Val Leu Lys Thr Thr Lys Asn His Lys Tyr Ala Leu Ala Leu Val Phe Ala Met Asp Glu Ile Asn Arg Asn Pro Asp Leu Leu Pro Asn Met Ser Leu Ile Ile Arg Tyr Thr Leu Gly Arg Cys Asp Gly Lys Thr Val Ile Pro Thr Pro Tyr Leu Phe Arg Lys Lys Lys Glu Ser Pro Ile Pro Asn Tyr Phe Cys Asn Glu Glu Thr Met Cys Ser Tyr Leu Leu Thr Gly Pro His Trp Glu Val Ser Leu Gly Phe Trp Lys His Met Asn Ser Phe Leu Ser Pro Arg Ile Leu Gln Leu Thr Tyr Gly Pro Phe His Ser Ile Phe Ser Asp Asp Glu Gln Tyr Pro Tyr Leu Tyr Gln Met Ala Pro Lys Asp Thr Ser Leu Ala Leu Ala Met Val Ser Phe Ile Leu Tyr Phe Ser Trp Asn Trp Ile Gly Leu Val Ile Pro Asp Asp Asp Gln Gly Asn Gln Phe Leu Leu Glu Leu Lys Lys Gln Ser Glu Asn Lys Glu Ile Cys Phe Ala Phe Val Lys Met Ile Ser Val Asp Asp Val Ser Phe Pro Gln Asn Thr Glu Met Tyr Tyr Asn Gln Ile Val Met Ser Ser Thr Asn Val Ile Ile Ile Tyr Gly Glu Thr Tyr Asn Phe Ile Asp Leu Ile Phe Arg Met Trp Glu Pro Pro Ile Leu Gln Arg Ile Trp Ile Thr Thr Lys Gln Leu Asn Phe Pro Thr Arg Lys Lys Asp Ile Ser His Gly Thr Phe Tyr Gly Ser Leu Thr Phe Leu Pro His His Gly Val Ile Ser Gly Phe Lys Asn Phe Val Gln Thr Trp Phe His Leu Arg Asn Thr Asp Leu Tyr Leu Val Met Gln Glu Trp Lys Tyr Phe Asn Tyr Glu Asp Ser Ala Ser Thr Cys Lys Ile Leu Lys Asn Asn Ser Ser Asn Ala Ser Phe Asp Trp Leu Met Glu Gln Lys Phe Asp Met Thr Phe Ser Glu Asn Ser His Asn Ile Tyr Asn Ala Val His Ala Ile Ala His Ala Leu His Glu Met Asn Leu Gln Gln Ala Asp Asn Gln Ala Ile Asp Asn Gly Lys Lys Glu Pro Ser Ser Ser His Cys Leu Lys Val Asn Ser Phe Leu Arg Arg Ile Tyr Phe Thr Asn Pro Pro Gly Asp Lys Val Phe Met Lys Gln Arg Val Ile Met His Asp Glu Tyr Asp Ile Val His Phe Val Asn Leu Ser Gln His Leu Gly Ile Lys Met Lys Leu Gly Lys Phe Ser Pro Tyr Leu Pro His Gly Arg His Ser His Leu Tyr 
Val Asp Arg Ile Glu Leu Ala Thr Gly Arg Arg Lys Met Pro Ser Ser Val Cys Ser Ala Asp Cys Ser Pro Gly Phe Arg Arg Leu Trp Lys Glu Gly Met Ala Ala Cys Cys Phe Val Cys Ser Pro Cys Pro Glu Asn Glu Ile Ser Asn Glu Thr Thr Val Val Leu Cys Val Phe Val Lys His His Asp Thr Pro Ile Val Lys Ala Asn Asn Arg Ser Leu Ser Tyr Leu Leu Leu Met Ser Leu Met Ser Cys Phe Leu Cys Ser Phe Phe Phe Ile Gly Leu Pro Asn Arg Ala Ile Cys Val Leu Gln Gln Ile Thr Phe Gly Ile Val Phe Thr Met Ala Val Ser Thr Val Leu Ala Lys Thr Val Thr Val Val Leu Ala Phe Lys Val Thr Asp Pro Gly Arg Arg Leu Arg Asn Phe Leu Val Ser Gly Thr Pro Asn Tyr Ile Ile Pro Ile Cys Ser Leu Leu Gln Cys Val Leu Cys Ala Ile Trp Leu Ala Val Ser Pro Pro Phe Val Asp Ile Asp Glu His Thr Leu His Gly His Ile Ile Ile Val Cys Asn Lys Gly Ser Val Thr Ala Phe Tyr Cys Ile Leu Gly Tyr Leu Ala Cys Leu Ala Leu Gly Asn Phe Ser Val Ala Phe Leu Ala Lys Asn Leu Pro Asp Thr Phe Asn Glu Ala Lys Phe Leu Thr Phe Ser Met Leu Val Phe Cys Ser Val Trp Val Thr Phe Leu Pro Val Tyr His Ser Thr Lys Gly Lys His Met Val Ala Val Glu Ile Phe Ser Ile Leu Ala Ser Ser Ala Gly Ile Leu Gly Cys Ile Phe Val Pro Lys Ile Tyr Ile Ile Leu Met Arg Pro Glu Arg Asn Ser Thr Gln Lys Ile Arg Glu Lys Ser Tyr Phe
(2) INFORMATION FOR SEQ ID NO:29:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 3689 base pairs (B) TYPE: nucleic acid (C) STRANDEDNESS: single (D) TOPOLOGY: linear (ii) MOLECULE TYPE: cDNA
(ix) FEATURE:
(A) NAME/KEY: Coding Sequence (B) LOCATION: 39...419 (D) OTHER INFORMATION: VR15 (xi) SEQUENCE DESCRIPTION: SEQ ID NO:29:
GGAAAAAT ATG TTC ATT TTC ATG GGA
Met Phe Ile Phe Met Gly CTC ATG AAT
Val Phe Phe Leu Leu Asn Ile Thr Leu Ala Asn Phe Ile Leu Met Asn GAT GAA TAT
Pro Arg Cys Phe Trp Arg Ile Asn Leu Ile Thr Asp Glu Asp Glu Tyr GCG GCA ACT
Leu Gly Leu Ser Cys Thr Phe Ile Leu Val Gln Thr Pro Ala Ala Thr AAT GTT AAA
Glu Lys Asp Tyr Phe Asn Lys Thr Leu Leu Lys Thr Thr Asn Val Lys TTT GCA AAC
Asn His Lys Tyr Ala Leu Ala Leu Val Met Asp Glu Ile Phe Ala Asn TCT TTG ACT
Arg Asn Pro Asp Leu Leu Pro Asn Met Ile Ile Arg Tyr Ser Leu Thr ACA CCT TTT
Leu Gly Leu Cys Asp Gly Lys Thr Val Thr Pro Tyr Leu Thr Pro Phe TAATTATTTC TGTAATGAAG AGACTAT
His Lys Lys Lys Thr Lys Pro Tyr Pro GGATGTATCT
GCTTACCTAT
TCAGATGGCC
GAAATGGAAC
AGAGTTGAAG
TGTTGATGAT
ATCCACAAAT
AATGTGGGAA
TACCAGGAAG
CCATGGTGAG
AGATTTATAT
TTGTAAAATA
GTTTGACATG
CCATGCCCTC
AGGAGCCAGT
TCCTCTTGGG
TATTCACTTT
GACACTCTCA
CCTCTGTGTG
CAGCCTGCTG
CCTCTCCATT
GGATGGGAAT
GTTAACACTA
GCATTTTGGT
GCTGGTCCTC
TAACATCATA
TCATTTAAAT
CATTACATAT
TACCTTGACA
TGGATCAATG
TTCAGAAAGG
TGGCCTTCTG
ACACTCCTAT
TGTTCTGTTT
TACAGCAAAT
CAGTCACTGT
TGGTATCAGG
GTGCAATCTG
GCCATATCAT
ATTTGGCCTG
ACACATTCAA
TCACCTTCCT
TCTCCATCTT
TCATTTTAAT
GAACAAATAT
TTATAGTGCA
AGTATCATAT
TTCATTTTCT
TCATGGAGAT
CTTTGTGTAG
TCAAATAATC
TATTTTCTGA
CTTCAATCTA
ATATATTATA
(2) INFORMATION FOR SEQ ID NO:30:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 127 amino acids (B) TYPE: amino acid (C) STRANDEDNESS: single (D) TOPOLOGY: linear (ii) MOLECULE TYPE: protein (v) FRAGMENT TYPE: internal (xi) SEQUENCE DESCRIPTION: SEQ ID NO:30:
Met Phe Ile Phe Met Gly Val Phe Phe Leu Leu Asn Ile Thr Leu Leu Met Ala Asn Phe Ile Asn Pro Arg Cys Phe Trp Arg Ile Asn Leu Asp Glu Ile Thr Asp Glu Tyr Leu Gly Leu Ser Cys Thr Phe Ile Leu Ala Ala Val Gln Thr Pro Thr Glu Lys Asp Tyr Phe Asn Lys Thr Leu Asn Val Leu Lys Thr Thr Lys Asn His Lys Tyr Ala Leu Ala Leu Val Phe Ala Met Asp Glu Ile Asn Arg Asn Pro Asp Leu Leu Pro Asn Met Ser Leu Ile Ile Arg Tyr Thr Leu Gly Leu Cys Asp Gly Lys Thr Val Thr Pro Thr Pro Tyr Leu Phe His Lys Lys Lys Thr Lys Pro Tyr Pro
(2) INFORMATION FOR SEQ ID NO:31:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 3896 base pairs (B) TYPE: nucleic acid (C) STRANDEDNESS: single (D) TOPOLOGY: linear (ii) MOLECULE TYPE: cDNA
(ix) FEATURE:
(A) NAME/KEY: Coding Sequence (B) LOCATION: 36...263 (D) OTHER INFORMATION:
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:31:
TGT GTT
Met Lys Asn Leu Cys Val TTG TGC CAT
Phe Thr Leu Ser Phe Phe Leu Leu Glu Phe Ser Leu Ile Leu Cys His GAA GAT AAT
Leu Thr Glu Pro Ile Cys Phe Trp Arg Ile Asn Asn Asn Glu Asp Asn GCA GTT GAG
Asp Gly Asp Leu Arg Ser Asp Cys Gly Phe Phe Leu Ala Ala Val Glu TTT TCT TTG
Gly Pro Thr Asp Asp Ser Tyr Asn Ile Ser Asp Leu Arg Phe Ser Leu ATCAGGTA
Asp His Leu Ile Leu Ser TTTTACATGG
ATCAGACATG
CCCAGAAGAC
ATCAACAGCA
TAGAAGGTGG
TATCACAAAT
CCATGTAGGT
CACAGTAAAC
GAACAGCAAT
CAAATATGAC
GGCCCACACC
CAAAGGAACA
TAACCCTGTT
TATTTTCATC
TTTGCCTTGT
AGGAGGATCA
AATTCATCAG
AGTTTCCAAT
AAAGGCTCAG
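The protein entries in this listing use three-letter residue codes. For comparison against databases that use one-letter codes, a conversion along the following lines may be convenient. This is a minimal sketch: the mapping is the standard set of 20 amino-acid abbreviations, the function name is hypothetical, and unknown tokens are rejected on purpose so that transcription errors in a residue code are surfaced instead of silently passed through.

```python
# Standard three-letter to one-letter amino-acid codes (20 common residues).
THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
}


def to_one_letter(residues: str) -> str:
    """Convert whitespace-separated three-letter codes to a one-letter string.

    Raises ValueError on any token that is not a recognised residue code.
    """
    letters = []
    for position, token in enumerate(residues.split(), start=1):
        if token not in THREE_TO_ONE:
            raise ValueError(f"unrecognised residue {token!r} at position {position}")
        letters.append(THREE_TO_ONE[token])
    return "".join(letters)


if __name__ == "__main__":
    # First ten residues of the translation shown for SEQ ID NO:31.
    print(to_one_letter("Met Lys Asn Leu Cys Val Phe Thr Leu Ser"))
```

Because the lookup is strict, a malformed code stops the conversion with its position, which makes it easy to locate in the listing.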
(2) INFORMATION FOR SEQ ID NO:32:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 76 amino acids (B) TYPE: amino acid (C) STRANDEDNESS: single (D) TOPOLOGY: linear (ii) MOLECULE TYPE: protein (v) FRAGMENT TYPE: internal (xi) SEQUENCE DESCRIPTION: SEQ ID NO:32:
Met Lys Asn Leu Cys Val Phe Thr Leu Ser Phe Phe Leu Leu Glu Phe Ser Leu Ile Leu Cys His Leu Thr Glu Pro Ile Cys Phe Trp Arg Ile Asn Asn Asn Glu Asp Asn Asp Gly Asp Leu Arg Ser Asp Cys Gly Phe Phe Leu Ala Ala Val Glu Gly Pro Thr Asp Asp Ser Tyr Asn Ile Ser Asp Leu Arg Phe Ser Leu Asp His Leu Ile Leu Ser
(2) INFORMATION FOR SEQ ID NO:33:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 2811 base pairs (B) TYPE: nucleic acid (C) STRANDEDNESS: single (D) TOPOLOGY: linear (ii) MOLECULE TYPE: cDNA
(ix) FEATURE:
(A) NAME/KEY: Coding Sequence (B) LOCATION: 962...2605 (D) OTHER INFORMATION: GoVN1 (xi) SEQUENCE DESCRIPTION: SEQ ID NO:33:
ACAGATGCCA
CCTCGTACAC
GCCTATAAGT
GATAAATGCA
TGTGAAGTCC
TTTTGAACGG
TCCTAATTAC
GAAAACATCT
TGGGCCGTGT
CCCCAAAGAC
CTGGGTGGGA
AAAGGAGCTG
GGAATCATTG
TGTGATTATA
AAAGTATGAA
AACAATATAC
CAT CAT GGG
Met Leu Glu Leu Ala His Gly Thr Leu Thr Phe Ser Pro His His Gly CCT ATC AAG
Glu Ile Ser Asp Phe Thr Asn Phe Met Gln Glu Val Thr Pro Ile Lys TAT TTC AAT
Tyr Pro Glu Asp Ile Phe Leu His Ile Leu Trp Asn Gln Tyr Phe Asn TGT ATA CCC
Cys Pro Leu Leu His Ser Glu Cys Lys Ile Phe Glu Asn Cys Ile Pro CTG GTC ATG
Asn Ala Ser Leu Glu Leu Leu Pro Gly Gly Val Phe Glu Leu Val Met GTG GCC CAC
Thr Glu Glu Ser Tyr Asn Val Tyr Asn Ala Val Tyr Ala Val Ala His CCA CAG GAT
Ser Leu His Glu Lys Ala Leu His Gln Val Glu Ile Gln Pro Gln Asp CCT TTT CTG
Asn Lys Asp Arg Thr Ile Leu Phe Pro Trp Gln Leu His Pro Phe Leu Lys AsnIle Gln Ile AsnSer Gly Arg Val Leu Leu Val Asp Ile Asp ACG ATT
Trp LysLys Lys Asp ThrGluTyrAsp IleSer Asn TrpAsn Thr Ile CTT TTT
Phe ProThr Gly Ser LeuLeuValLys ValGly Thr AlaPro Leu Phe GGG ACA
Ser AlaPro Lys Glu GlnLeuSerIle SerGlu His IleAsn Gly Thr TTT AGT
Trp ProIle Gly Thr GluIleProLys SerVal Cys GluSer Phe Ser CAC CCT
Cys SerPro Gly Arg LysValIleLeu GluSer Lys AlaCys His Pro ACT AAC
Cys PheAsp Cys Pro CysProAspLys GluIle Ser GluThr Thr Asn TGT GCA
Asp ValGly Gln Val LysCysProGlu SerHis Tyr AsnThr Cys Ala TGC GAT
Glu LysSer His Leu LysLysThrMet ThrPhe Leu TyrAsn Cys Asp ACG TTC
Asp SerLeu Gly Gly LeuThrLeuMet SerLeu Gly PheVal Thr Phe GTT AAC
Val ThrGly Leu Ile GlyValPheIle IleHis Arg ThrPro Val Asn AAT CTC
Ile ValLys Ala Asn ArgSerLeuSer TyrIle Leu IleThr Asn Leu TTC CTT
Leu ThrLeu Cys Leu CysProLeuLeu PheIle Gly ProAsn Phe Leu ATC CTC
Thr AlaThr Cys Leu GlnGlnAsnLeu PheGly Leu PheThr Ile Leu _ GTG GCTCTA TCC GTG TTGGCCAAAACT ATCACT GTA ATGGCA 2065 ACA GTT
Val AlaLeu Ser Val LeuAlaLysThr IleThr Val MetAla Thr Val GCT CTG
Phe LysIle Thr Pro GlyArgLysThr ArgTrp Leu IleLeu Ala Leu AGA GCC CCT CAG TTC ATC ATT CCA CTT TGT GCC CTG ATG CAA ATC CTT . 2161 ArgAla Gln Ile Ile Pro Leu AlaLeu Met IleLeu Pro Phe Cys Gln GGG TGG CCT GAC
PheSer Ile Leu Gly Thr Ser ProPhe Val MetAsp Gly Trp Pro Asp TCT CAT ATT AAG
AlaHis Glu Gly His Ile Ile LeuCys Asn GlySer Ser His Ile Lys GGC TAC TAC ATG
AlaIle Phe Cys Thr Leu Ala LeuGly Val AlaPhe Gly Tyr Tyr Met TAC TTG AGG GAC
GlySer Leu Ala Phe Met Ser AsnLeu Pro ThrPhe Tyr Leu Arg Asp TCC GCC ATG TGC
AsnGlu Lys Leu Ala Phe Ser LeuMet Phe SerVal Ser Ala Met Cys ACA CTC AGC AAG
TrpVal Phe Pro Val Tyr His ThrThr Gly ValArg Thr Leu Ser Lys ATG ATG GCT AGC
ValAla Glu Phe Ser Ile Leu SerSer Ala IleLeu Met Met Ala Ser ATC GTC ATT AGA
ThrLeu Phe Pro Lys Cys Tyr ValLeu Phe ProGlu Ile Val Ile Arg ATA CCT AAA AGG
ArgAsn Leu Leu Asn Arg Glu ArgGln His SerLys Ile Pro Lys Arg GAA TAGCAGTCAA
GACAAACATT
GGCCTAGCAC
AAAATGTCTG
AsnSer Thr Glu CCTGCTATAT GATCACATGA
AAACAATTAG
TCCTTTGACT
TGATATTGCT TATTGACCAA
TCAAATTATG
TAAAATATGT
GTTCTTGTAT
GAAAAAAAAA
AAAAAAA
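The coding-sequence LOCATION ranges in the nucleic acid entries can be cross-checked against the LENGTH stated for the corresponding protein entries. The following is a minimal sketch in Python, assuming the ranges are inclusive and exclude the stop codon (an assumption that appears to hold for the entries shown); the helper name `cds_codon_count` and the `ENTRIES` table are illustrative and are not part of the listing.

```python
import re


def cds_codon_count(location: str) -> int:
    """Return the number of codons spanned by an inclusive 'start...end' range."""
    match = re.fullmatch(r"\s*(\d+)\.{2,3}(\d+)\s*", location)
    if match is None:
        raise ValueError(f"unrecognised location: {location!r}")
    start, end = int(match.group(1)), int(match.group(2))
    span = end - start + 1  # both endpoints are counted
    if span <= 0 or span % 3 != 0:
        raise ValueError(f"range {location!r} is not a whole number of codons")
    return span // 3


# (LOCATION, stated protein LENGTH) pairs copied from the listing.
ENTRIES = {
    "SEQ ID NO:31 / NO:32": ("36...263", 76),
    "SEQ ID NO:33 / NO:34": ("962...2605", 548),
}

for name, (location, stated_length) in ENTRIES.items():
    codons = cds_codon_count(location)
    status = "consistent" if codons == stated_length else "MISMATCH"
    print(f"{name}: {codons} codons vs {stated_length} amino acids ({status})")
```

The same check can be applied to any other pair of entries by adding its LOCATION string and stated amino acid LENGTH to the table.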
(2) INFORMATION FOR SEQ ID NO:34:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 548 amino acids (B) TYPE: amino acid (C) STRANDEDNESS: single (D) TOPOLOGY: linear (ii) MOLECULE TYPE: protein (v) FRAGMENT TYPE: internal (xi) SEQUENCE DESCRIPTION: SEQ ID NO:34:
Met Leu Glu Leu Ala His Gly Thr Leu Thr Phe Ser Pro His His Gly Glu Ile Ser Asp Phe Thr Asn Phe Met Gln Glu Val Thr Pro Ile Lys Tyr Pro Glu Asp Ile Phe Leu His Ile Leu Trp Asn Gln Tyr Phe Asn
Cys Pro Leu Leu His Ser Glu Cys Lys Ile Phe Glu Asn Cys Ile Pro Asn Ala Ser Leu Glu Leu Leu Pro Gly Gly Val Phe Glu Leu Val Met Thr Glu Glu Ser Tyr Asn Val Tyr Asn Ala Val Tyr Ala Val Ala His Ser Leu His Glu Lys Ala Leu His Gln Val Glu Ile Gln Pro Gln Asp Asn Lys Asp Arg Thr Ile Leu Phe Pro Trp Gln Leu His Pro Phe Leu Lys Asn Ile Gln Leu Ile Asn Ser Val Gly Asp Arg Val Ile Leu Asp Trp Lys Lys Lys Thr Asp Thr Glu Tyr Asp Ile Ser Asn Ile Trp Asn Phe Pro Thr Gly Leu Ser Leu Leu Val Lys Val Gly Thr Phe Ala Pro Ser Ala Pro Lys Gly Glu Gln Leu Ser Ile Ser Glu His Thr Ile Asn Trp Pro Ile Gly Phe Thr Glu Ile Pro Lys Ser Val Cys Ser Glu Ser Cys Ser Pro Gly His Arg Lys Val Ile Leu Glu Ser Lys Pro Ala Cys Cys Phe Asp Cys Thr Pro Cys Pro Asp Lys Glu Ile Ser Asn Glu Thr Asp Val Gly Gln Cys Val Lys Cys Pro Glu Ser His Tyr Ala Asn Thr Glu Lys Ser His Cys Leu Lys Lys Thr Met Thr Phe Leu Asp Tyr Asn Asp Ser Leu Gly Thr Gly Leu Thr Leu Met Ser Leu Gly Phe Phe Val Val Thr Gly Leu Val Ile Gly Val Phe Ile Ile His Arg Asn Thr Pro Ile Val Lys Ala Asn Asn Arg Ser Leu Ser Tyr Ile Leu Leu Ile Thr Leu Thr Leu Cys Phe Leu Cys Pro Leu Leu Phe Ile Gly Leu Pro Asn Thr Ala Thr Cys Ile Leu Gln Gln Asn Leu Phe Gly Leu Leu Phe Thr Val Ala Leu Ser Thr Val Leu Ala Lys Thr Ile Thr Val Val Met Ala Phe Lys Ile Thr Ala Pro Gly Arg Lys Thr Arg Trp Leu Leu Ile Leu Arg Ala Pro Gln Phe Ile Ile Pro Leu Cys Ala Leu Met Gln Ile Leu Phe Ser Gly Ile Trp Leu Gly Thr Ser Pro Pro Phe Val Asp Met Asp Ala His Ser Glu His Gly His Ile Ile Ile Leu Cys Asn Lys Gly Ser Ala Ile Gly Phe Tyr Cys Thr Leu Ala Tyr Leu Gly Val Met Ala Phe Gly Ser Tyr Leu Leu Ala Phe Met Ser Arg Asn Leu Pro Asp Thr Phe Asn Glu Ser Lys Ala Leu Ala Phe Ser Met Leu Met Phe Cys Ser Val Trp Val Thr Phe Leu Pro Val Tyr His Ser Thr Thr Gly Lys Val Arg Val Ala Met Glu Met Phe Ser Ile Leu Ala Ser Ser Ala Ser Ile Leu Thr Leu Ile Phe Val Pro Lys Cys Tyr Ile Val Leu Phe Arg Pro Glu Arg Asn Ile Leu Pro Leu Asn Arg Glu Lys Arg Gln His Arg Ser Lys Asn Ser Glu Thr 
(2) INFORMATION FOR SEQ ID NO:35:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 3584 base pairs (B) TYPE: nucleic acid (C) STRANDEDNESS: single (D) TOPOLOGY: linear (ii) MOLECULE TYPE: cDNA
(ix) FEATURE:
(A) NAME/KEY: Coding Sequence (B) LOCATION: 273...2576 (D) OTHER INFORMATION: GoVN2 (xi) SEQUENCE DESCRIPTION: SEQ ID NO:35:
CAGAAAGAAT ATTTTTCCTT
ATGTTCATTT
CTCCACCATC CACTTCTCAT GGTGCTTTTG GAGAACAAAT
GGCAAATTTC
ATCGATCCCT
TTGAATGAAG TCAAGGAAAA CCTTCATCCT TGGAGCAGTT
AAACTTGGAT
ATAAATTGTG
TATTTCAATG ACAACTAAAA
AGACTTTGAA
TTAGCCTTTT
CA ATG
Met Glu GluIleAsnArg Asn AAT TCT GTT
ProAsp LeuLeu Pro Met Leu Ile LysHisThrLeu Ser Asn Ser Val ACT GAC ATA
TyrCys AspGly Asn Ala His Phe LysGluLysPhe Tyr Thr Asp Ile TAT TGT GAA
LysPro LeuPro Asn Val Asn Glu ThrMetCysSer Phe Tyr Cys Glu AAT GTA TCT
MetLeu IleGly Leu Trp Leu Leu ThrLeuPheLys Asp Asn Val Ser 60 65 7p TTT CGT CTT
LeuAsp IlePhe Ser Pro Phe Gln IleSerTyrGly Pro Phe Arg Leu AGT AAT CAA
PheHis SerIle Phe Asp Glu Phe ProTyrLeuTyr Gln Ser Asn Gln ACA CTA TTG
MetThr ProLys Asp Ser Ala Ala IleValSerPhe Leu Thr Leu Leu AAC GTT CTT
LeuTyr PheAsn Trp Trp Gly Val IleSerAspAsn Asp Asn Val Leu CTC GAG AAA
GluGly AsnGln Phe Ser Leu Lys GluThrGlnAsn Lys Leu Glu Lys TTT AAC ATG
GluIle CysPhe Ala Val Met Ser IleHisGluHis Ser Phe Asn Met _ --Ser Tyr GlnLysThrGlu MetTyrTyr AsnGlnIle ValMetSer Ser Thr Asn IleIleIleIle TyrGlyLys ThrAsnSer IleIleGlu Leu Ser Phe ArgMetTrpVal SerProVal IleGlnArg IleTrpVal Thr Asn Ser GluLeuAspPhe ProThrSer MetArgAsp PheThrHis Gly Thr Phe TyrGlyThrLeu ThrPheLeu HisHisHis GlyGluIle Ser Gly Phe ThrAsnPhePhe GluThrTrp AspHisLeu ArgSerArg Asp Leu Asn LeuLeuIlePro GluTrpLys TyrPheSer TyrAspAla Ser Gly Ser AsnCysLysIle LeuArgAsn TyrSerSer AsnAlaSer Leu Glu Trp IleThrGluGln LysPheHis MetAlaPhe AsnAspTyr Ser His Ser IleTyrAsnAla ValTyrAla MetAlaHis AlaLeuHis Glu Thr Asn LeuGlnGluVal AspAsnLys GluIleArg AsnGlyLys Gly Ala Ser ThrHisCysLeu LysValAsn SerPheLeu ArgLysThr His Phe Thr AsnSerHisGly GluArgVal IleMetLys GlnArgVal Arg Val Gln GluAspTyrAsp IleValHis IleGlnAsn PheSerGln His Leu Arg IleLysMetLys IleGlyLys PheSerPro TyrPheThr His Gly Gly ProPheHisLeu TyrGluAsp MetIleGln LeuAlaThr Gly .
SerArgLys MetProSer SerValCys SerAlaAsp CysSerPro Gly PheArgLys SerTrpLys GluGlyMet AlaProCys CysPheIle Cys SerLeuCys ProGluAsn GluIleSer AsnGluThr AsnMetAsp Gln CysValAsn CysProGlu TyrGlnTyr AlaAsnThr GluLysAsn Lys CysIleGln LysAspVal IlePheLeu SerTyrGlu AspProLeu Gly MetAlaLeu AlaLeuIle AlaPheCys LeuSerAla PheThrAla Val ValLeuTrp ValPheVal LysHisHis AspThrPro IleValLys Ala AsnAsnArg IleLeuSer TyrIleLeu IleMetSer LeuMetPhe Cys PheLeuCys SerPhePhe PheIleGly HisProAsn ArgGlyThr Cys IleLeuGln GlnIleThr PheGlyIle ValPheThr ValAlaVal Ser ThrValLeu AlaLysThr IleThrVal IleLeuAla PheLysLeu Arg AspProGly ArgSerLeu ArgAsnPhe LeuValSer GlyAlaPro Asn TyrIleIle ProIleCys SerLeuLeu GlnCysIle LeuCysAla Ile TrpLeuAla ValSerPro ProPheVal AspIleAsp GluHisSer Glu HisGlyHis IleMetIle ValCysAsn LysGlySer IleMetAla Phe TyrCysVal LeuGlyTyr LeuAlaCys LeuAlaLeu GlySerPhe Thr ThrAlaPhe LeuAlaLys AsnLeuPro AspThrPhe AsnGluAla Lys ', TTC TTG ACC TTC AGC ATG CTA GTG TTC TGC AGT GTC TGG GTC ACC TTT 2405 Phe Leu Thr Phe Ser Met Leu Val Phe Cys Ser Val Trp Val Thr Phe Leu Pro Val Tyr His Ser Thr Arg Gly Arg Val Met Val Ala Val Glu Ile Phe Ser Ile Leu Ala Ser Ser Ala Gly Met Phe Gly Cys Ile Phe Ala Pro Lys Ile Tyr Ile Ile Leu Met Lys Pro Glu Arg Asn Ser Ile Gln Lys Phe Arg Glu Lys Ser Tyr Phe (2) INFORMATION FOR SEQ ID N0:36:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 768 amino acids (B) TYPE: amino acid (C) STRANDEDNESS: single (D) TOPOLOGY: linear (ii) MOLECULE TYPE: protein (v) FRAGMENT TYPE: internal (xi) SEQUENCE DESCRIPTION: SEQ ID NO:36:
Met Glu Glu Ile Asn Arg Asn Pro Asp Leu Leu Pro Asn Met Ser Leu Val Ile Lys His Thr Leu Ser Tyr Cys Asp Gly Asn Thr Ala Asp His Ile Phe Lys Glu Lys Phe Tyr Lys Pro Leu Pro Asn Tyr Val Cys Asn Glu Glu Thr Met Cys Ser Phe Met Leu Ile Gly Leu Asn Trp Val Leu Ser Leu Thr Leu Phe Lys Asp Leu Asp Ile Phe Ser Phe Pro Arg Phe Leu Gln Ile Ser Tyr Gly Pro Phe His Ser Ile Phe Ser Asp Asn Glu Gln Phe Pro Tyr Leu Tyr Gln Met Thr Pro Lys Asp Thr Ser Leu Ala Leu Ala Ile Val Ser Phe Leu Leu Tyr Phe Asn Trp Asn Trp Val Gly Leu Val Ile Ser Asp Asn Asp Glu Gly Asn Gln Phe Leu Ser Glu Leu Lys Lys Glu Thr Gln Asn Lys Glu Ile Cys Phe Ala Phe Val Asn Met Met Ser Ile His Glu His Ser Ser Tyr Gln Lys Thr Glu Met Tyr Tyr Asn Gln Ile Val Met Ser Ser Thr Asn Ile Ile Ile Ile Tyr Gly Lys Thr Asn Ser Ile Ile Glu Leu Ser Phe Arg Met Trp Val Ser Pro Val Ile Gln Arg Ile Trp Val Thr Asn Ser Glu Leu Asp Phe Pro Thr Ser Met Arg Asp Phe Thr His Gly Thr Phe Tyr Gly Thr Leu Thr Phe Leu His His His Gly Glu Ile Ser Gly Phe Thr Asn Phe Phe Glu Thr Trp Asp His Leu Arg Ser Arg Asp Leu Asn Leu Leu Ile Pro Glu Trp Lys Tyr Phe Ser Tyr Asp Ala Ser Gly Ser Asn Cys Lys Ile Leu Arg Asn Tyr Ser Ser Asn Ala Ser Leu Glu Trp Ile Thr Glu Gln Lys Phe His Met Ala Phe Asn Asp Tyr Ser His Ser Ile Tyr Asn Ala Val Tyr Ala Met Ala His Ala Leu His Glu Thr Asn Leu Gln Glu Val Asp Asn Lys Glu Ile Arg Asn Gly Lys Gly Ala Ser Thr His Cys Leu Lys Val Asn Ser Phe Leu Arg Lys Thr His Phe Thr Asn Ser His Gly Glu Arg Val Ile Met Lys Gln Arg Val Arg Val Gln Glu Asp Tyr Asp Ile Val His Ile Gln Asn Phe Ser Gln His Leu Arg Ile Lys Met Lys Ile Gly Lys Phe Ser Pro Tyr Phe Thr His Gly Gly Pro Phe His Leu Tyr Glu Asp Met Ile Gln Leu Ala Thr Gly Ser Arg Lys Met Pro Ser Ser Val Cys Ser Ala Asp Cys Ser Pro Gly Phe Arg Lys Ser Trp Lys Glu Gly Met Ala Pro Cys Cys Phe Ile Cys Ser Leu Cys Pro Glu Asn Glu Ile Ser Asn Glu Thr Asn Met Asp Gln Cys Val Asn Cys Pro Glu Tyr Gln Tyr Ala Asn Thr Glu Lys Asn Lys Cys Ile Gln Lys Asp Val Ile Phe Leu Ser Tyr Glu Asp 
Pro Leu Gly Met Ala Leu Ala Leu Ile Ala Phe Cys Leu Ser Ala Phe Thr Ala Val Val Leu Trp Val Phe Val Lys His His Asp Thr Pro Ile Val Lys Ala Asn Asn Arg Ile Leu Ser Tyr Ile Leu Ile Met Ser Leu Met Phe Cys Phe Leu Cys Ser Phe Phe Phe Ile Gly His Pro Asn Arg Gly Thr Cys Ile Leu Gln Gln Ile Thr Phe Gly Ile Val Phe Thr Val Ala Val Ser Thr Val Leu Ala Lys Thr Ile Thr Val Ile Leu Ala Phe Lys Leu Arg Asp Pro Gly Arg Ser Leu Arg Asn Phe Leu Val Ser Gly Ala Pro Asn Tyr Ile Ile Pro Ile Cys Ser Leu Leu Gln Cys Ile Leu Cys Ala Ile Trp Leu Ala Val Ser Pro Pro Phe Val Asp Ile Asp Glu His Ser Glu His Gly His Ile Met Ile Val Cys Asn Lys Gly Ser Ile Met Ala Phe Tyr Cys Val Leu Gly Tyr Leu Ala Cys Leu Ala Leu Gly Ser Phe Thr Thr Ala Phe Leu Ala Lys Asn Leu Pro Asp Thr Phe Asn Glu Ala Lys Phe Leu Thr Phe Ser Met Leu Val Phe Cys Ser Val Trp Val Thr Phe Leu Pro Val Tyr His Ser Thr Arg Gly Arg Val Met Val Ala Val Glu Ile Phe Ser Ile Leu Ala Ser Ser Ala Gly Met Phe Gly Cys Ile Phe Ala Pro Lys Ile Tyr Ile Ile Leu Met Lys Pro Glu Arg Asn Ser Ile Gln Lys Phe Arg Glu Lys Ser Tyr Phe (2) INFORMATION FOR SEQ ID N0:37:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 3578 base pairs (B) TYPE: nucleic acid (C) STRANDEDNESS: single (D) TOPOLOGY: linear (ii) MOLECULE TYPE: cDNA
(ix) FEATURE:
(A) NAME/KEY: Coding Sequence (B) LOCATION: 1181...3181 (D) OTHER INFORMATION: GoVN3 (xi) SEQUENCE DESCRIPTION: SEQ ID NO:37:
AATTATTTTT
TGAAGCTTTC
CCAAGACATT
TTGGTCATGT
CTATCCACTT
TTCTACCTAA
TAAACAGTTT
GAAGATATTT
TCCTATACAT
CCCTAAAGTC
ACAATCACTG
CTACTGATGC
TCTTTGCACT
TTAGAACTCA
ATCACACCTG
AACTAAGTGA
GATATTTTTT
TAAAATTCCT
CATTTCACCC
CCT AAG GAC
Met Ala Pro Lys Asp Thr Ser Leu Ala Leu Ala Met Val Ser Leu Phe Val His Phe Ser Trp GCT GTT GAC GGT
TCA TAT
Asn TrpValGly ValVal AspAsp ProGly Glu Phe Ala Ser Asp Tyr AGA ATG AAC TGT
Ile LeuGluLeu ArgGlu GlnArg AsnPhe Leu Ala Arg Met Asn Cys ATT GAT TTA AAA
Phe ValSerIle ValSer AspAsn PheLeu Arg Tyr Ile Asp Leu Lys AAC AAG TCA GTT
Asn IleTyrTyr GlnIle MetSer AlaLys Val Ile Asn Lys Ser Val 70 75 8p g5 AAA CCT GTG AGA
Ile TyrGlyAsp AspSer LeuGln AsnPhe Leu Trp Lys Pro Val Arg ATC ATC ACT CAG
Asn LeuPheAsp GlnArg TrpVal ThrSer Trp Asp Ile Ile Thr Gln AAT TTC AAT TAT
Met IleIleAsn GlyLys LeuLeu SerPhe Gly Thr Asn Phe Asn Tyr CAT TCT TCT AAA
Leu SerPheSer HisTyr GluLeu GlyPhe Thr Phe His Ser Ser Lys TAC AAC GAT TCT
Ile GlnThrAla ProSer TyrSer AspPhe Leu Gly Tyr Asn Asp Ser GTG AAT TTG TCT
Ile LeuTrpTrp TyrPhe CysSer SerLeu Glu Cys Val Asn Leu Ser AAT AAG ATA TGG
Lys AsnLeuGln CysPro GluAsn PheArg Leu Tyr Asn Lys Ile Trp GAA TTG ACT GAC
Arg HisHisPhe MetSer SerAsp ThrTyr Leu Tyr Glu Leu Thr Asp GCT TAC CAA CTT
Asn SerMetTyr ValAla ThrLeu GlnMet Leu Lys Ala Tyr Gln Leu TGG GAT AAA GAA
Gln AlaAspThr GlnIle AspGly GluPro Phe Asp Trp Asp Lys Glu CTC CTG ATC ATA
Ser TrpGlnMet SerPhe ArgAsn GlnPhe Asn Pro Leu Leu Ile Ile GTG AAT GAA GAT
Val GlyAspLys AsnLeu HisGlu LysLeu Thr Lys Val Asn Glu Asp CAG ACT CCA GTA .
Tyr GluIle HisGln Thr ThrPhe Pro Asn ValPheLys Leu Leu Pro TCC TTA GGT
Leu LysIle GlyThr Phe GlnAsn Ser His ArgGlnLeu Ser Leu Gly ATA AAC CAC
Tyr MetLeu LysGlu Met GluTrp Thr Gly GlnGlnSer Ile Asn His ATT AGT TTC
Pro ThrSer ValCys Ser ProCys Pro Gly ArgLysSer Ile Ser Phe GTT TTT ACA
Pro GlnLeu GlyLys Pro CysCys Asp Cys ProCysPro Val Phe Thr ' 345 350 355 ATG ATG TGT
Glu AsnGlu IleSer Asn ThrAsn Asn Gln IleLysCys Met Met Cys Leu Asn Asp Gln Tyr Ala Asn Pro Gly Gly Thr Arg Cys Leu Lys Lys Val Ile Val Phe Leu Gly Tyr Glu Asp Pro Leu Gly Met Ser Leu Ala Ile Leu Ala Leu Cys Phe Ser Ala Leu Thr Ala Phe Val Leu Ser Ile Phe Leu Lys His Gln Glu Thr Pro Thr Val Lys Ala Asn Asn Arg Thr Leu Ser Tyr Val Leu Leu Ile Ser Leu Ile Ser Cys Phe Leu Cys Ser Leu Leu Phe Ile Gly His Pro Ser Phe Thr Thr Cys Ile Met Gln Gln ' 455 460 465 Thr Thr Phe Ala Val Val Phe Thr Val Ala Ala Ser Thr Val Leu Ala Lys Thr Ile Ile Val Ile Leu Ala Phe Lys Val Thr Asn Thr Ser Arg _ Lys Met Arg Trp Leu Leu Val Ser Gly Ala Pro Lys Phe Ile Ile Pro Ile Cys Thr Met Ile Gln Leu Ile Leu Cys Gly Ile Trp Leu Gly Thr Ser Pro Pro Phe Val Asp Ala Asp Gly His Val Glu Lys Gly His Ile.
CTT GCT TGT
Leu Ile Phe Cys Asn Lys Gly Ser Ile Phe Tyr ValLeu Leu Ala Cys AGT TTC GCA
Gly Tyr Leu Val Ser Ile Ala Ile Ala Thr Leu PhePhe Ser Phe Ala GAA GCC CTA
Ala Arg Asn Leu Pro Asp Thr Phe Asn Lys Phe ThrPhe Glu Ala Leu GTC ACC CCT
Ser Met Leu Val Phe Cys Ser Val Trp Phe Leu ValTyr Val Thr Pro GCT GTG TTC
His Ser Thr Lys Gly Lys Ser Met Val Glu Val CysIle Ala Val Phe TGC ATC CCA
Leu Ala Ser Ser Ala Gly Leu Leu Phe Phe Ala LysCys Cys Ile Pro AAA TCT AAG
Phe Ile Ile Leu Leu Arg Pro Glu Lys Phe Gln PheGln Lys Ser Lys ATTAAATTTT TCTGACACAC
Asn Ile His Ser Lys Ile TAGATCCAAA
AAAACACGTC
CTGTTTGCTG
GTTCTGAGTT
TGTTGTTGTG
CTCTATAATA AATAATTATG AGATAAATGC P~~i~AAAAAAAp~~e~AAAAAAAAAAAAAAA 3578 A
(2) INFORMATION FOR SEQ ID NO:38:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 667 amino acids (B) TYPE: amino acid (C) STRANDEDNESS: single (D) TOPOLOGY: linear (ii) MOLECULE TYPE: protein (v) FRAGMENT TYPE: internal (xi) SEQUENCE DESCRIPTION: SEQ ID NO:38:
Met Ala Pro Lys Asp Thr Ser Leu Ala Leu Ala Met Val Ser Leu Phe Val His Phe Ser Trp Asn Trp Val Gly Ala Val Val Ser Asp Asp Asp Pro Gly Tyr Glu Phe Ile Leu Glu Leu Arg Arg Glu Met Gln Arg Asn Asn Phe Cys Leu Ala Phe Val Ser Ile Ile Val Ser Asp Asp Asn Leu Phe Leu Lys Arg Tyr Asn Ile Tyr Tyr Asn Gln Ile Lys Met Ser Ser Ala Lys Val Val Ile Ile Tyr Gly Asp Lys Asp Ser Pro Leu Gln Val Asn Phe Arg Leu Trp Asn Leu Phe Asp Ile Gln Arg Ile Trp Val Thr Thr Ser Gln Trp Asp Met Ile Ile Asn Asn Gly Lys Phe Leu Leu Asn Ser Phe Tyr Gly Thr Leu Ser Phe Ser His His Tyr Ser Glu Leu Ser Gly Phe Lys Thr Phe Ile Gln Thr Ala Tyr Pro Ser Asn Tyr Ser Asp Asp Phe Ser Leu Gly Ile Leu Trp Trp Val Tyr Phe Asn Cys Ser Leu Ser Leu Ser Glu Cys Lys Asn Leu Gln Asn Cys Pro Lys Glu Asn Ile Phe Arg Trp Leu Tyr Arg His His Phe Glu Met Ser Leu Ser Asp Thr Thr Tyr Asp Leu Tyr Asn Ser Met Tyr Ala Val Ala Tyr Thr Leu Gln Gln Met Leu Leu Lys Gln Ala Asp Thr Trp Gln Ile Asp Asp Gly Lys Glu Pro Glu Phe Asp Ser Trp Gln Met Leu Ser Phe Leu Arg Asn Ile Gln Phe Ile Asn Pro Val Gly Asp Lys Val Asn Leu Asn His Glu Glu Lys Leu Asp Thr Lys Tyr Glu Ile His Gln Thr Leu Thr Phe Leu Pro Asn Pro Val Phe Lys Leu Lys Ile Gly Thr Phe Ser Gln Asn Leu Ser His Gly Arg Gln Leu Tyr Met Leu Lys Glu Met Ile Glu Trp Asn Thr Gly His Gln Gln Ser Pro Thr Ser Val Cys Ser Ile Pro Cys Ser Pro Gly Phe Arg Lys Ser Pro Gln Leu Gly Lys Pro Val Cys Cys Phe Asp Cys Thr Pro Cys Pro Glu Asn Glu Ile Ser Asn Met Thr Asn Met Asn Gln Cys Ile Lys Cys Leu Asn Asp Gln Tyr Ala Asn Pro Gly Gly Thr Arg Cys Leu Lys Lys Val Ile Val Phe Leu Gly Tyr Glu Asp Pro Leu Gly Met Ser Leu Ala Ile Leu Ala Leu Cys Phe Ser Ala Leu Thr Ala Phe Val Leu Ser Ile Phe Leu Lys His Gln Glu Thr Pro Thr Val Lys Ala Asn Asn Arg Thr Leu Ser Tyr Val Leu Leu Ile Ser Leu Ile Ser Cys Phe Leu Cys Ser Leu Leu Phe Ile Gly His Pro Ser Phe Thr Thr Cys Ile Met Gln Gln Thr Thr Phe Ala Val Val Phe Thr Val Ala Ala Ser Thr Val Leu Ala Lys Thr Ile Ile Val Ile Leu Ala Phe Lys Val Thr Asn
Thr Ser Arg Lys Met Arg Trp Leu Leu Val Ser Gly Ala Pro Lys Phe Ile Ile Pro Ile Cys Thr Met Ile Gln Leu Ile Leu Cys Gly Ile Trp Leu Gly Thr Ser Pro Pro Phe Val Asp Ala Asp Gly His Val Glu Lys Gly His Ile Leu Ile Phe Cys Asn Lys Gly Ser Ile Leu Ala Phe Tyr Cys Val Leu Gly Tyr Leu Val Ser Ile Ala Ile Ala Ser Phe Thr Leu Ala Phe Phe Ala Arg Asn Leu Pro Asp Thr Phe Asn Glu Ala Lys Phe Leu Thr Phe Ser Met Leu Val Phe Cys Ser Val Trp Val Thr Phe Leu Val Tyr His Ser Thr Lys Gly Lys Ser Met Val Pro Ala Val Glu Val Cys Ile Leu Ala Ser Ser Ala Gly Leu Leu Phe Phe Cys Ile Phe Ala Lys Cys Phe Ile Ile Leu Leu Arg Pro Glu Lys Pro Lys Ser Phe Gln Phe Gln Asn Ile His Ser Lys Ile Lys (2) INFORMATION FOR SEQ ID N0:39:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 4467 base pairs (B) TYPE: nucleic acid (C) STRANDEDNESS: single (D) TOPOLOGY: linear (ii) MOLECULE TYPE: cDNA
(ix) FEATURE:
(A) NAME/KEY: Coding Sequence (B) LOCATION: 126...2723 (D) OTHER INFORMATION: GoVN4 (xi) SEQUENCE DESCRIPTION: SEQ ID NO:39:
GAAACACCTG
TAGAAAAGGA
AACCTGAATA
CAGGTATAGC
ATCTTCTTGG
AGATGGGGAT
AATTGCTACC
TGTTTGCTGA
TCTGTGCAGC
AATTAACTAC
TCC AGG
CTC AGA
GCA GGA
AAA AAT
ATG CTC
ACC TTC
ATT TTA
Met Ser Arg Leu Arg Ala Gly Lys Asn Met Leu Thr Phe Ile Leu TTT ATT TAT
Leu Phe Leu Leu Asn Ile Pro Leu Phe Val Pro Ser Phe Phe Ile Tyr TGC AGA AAC
Pro Arg Phe Trp Ser Met Lys Lys Asn Glu Tyr Gln Asp Cys Arg Asn ACA CCT ATG
Leu Gly Gly Cys Met Phe Phe Ile Leu Ala Val Gln Gln Thr Pro Met GAG ACT GAA
Glu Lys Tyr Phe Ser His Ile Ser Asn Ile Gln Thr Pro Glu Thr Glu AAG ATC AAC
Asn Gln Tyr Pro Leu Thr Leu Ala Phe Ser Met Asn Glu Lys Ile Asn CCT TTC TCA
Asn Asn Asp Leu Leu Pro Asn Met Ser Leu Ala Phe Thr Pro Phe Ser AGT AAT TTT
Glu Tyr Cys Tyr Leu Glu Ser His His Lys Arg Leu Phe Ser Asn Phe AAA AAA GAC
Ser Leu Asn His Glu Ile Leu Pro Asn Phe Ile Cys Thr Lys Lys Asp GGA CTT AGT TTG ACT
Ile LysCys Gly Val Val Leu Thr SerLeuVal Thr Val Gly Leu Thr TTC ATA CGT
_ Thr LeuHis Ile Ile Leu Asn Asn PheGlnGln Phe Gln Phe Ile Arg GCT CTG AAT
Leu ThrTyr Gly His Phe His Pro CysAspHis Glu Phe Ala Leu Asn GAT GAT CTT
Pro HisLeu Tyr Gln Met Ala Ser ThrSerLeu Ala Ala Asp Asp Leu AGT TGG TTG
Leu ValSer Phe Ile Ile His Phe AsnTrpIle Gly Ala Ser Trp Leu CAT TTT AGA
Ile SerAsp Asn Asp Gln Gly Ile LeuSerTyr Leu Arg His Phe Arg TTT GCC ATT
Glu MetGlu Lys Asn Thr Val Cys PheValAsn Ile Pro Phe Ala Ile ', 240 245 250 255 AGA GCT AGC
Val AsnMet Asn Leu Tyr Met Ser GluValTyr Tyr Gln Arg Ala Ser GTT ATC ACA
Val MetThr Ser Ser Ala Asn Val IleTyrGly Asp Gly Val Ile Thr ATG TGG ATA
Asn ThrLeu Ala Val Ser Phe Arg AspSerLeu Gly Gln Met Trp Ile TGG GAT AAG
Arg LeuTrp Val Thr Thr Ser Gln ValThrPro Phe Lys Trp Asp Lys GGA ACT CAC
Asp PheThr Phe Asp Asn Gly Tyr PheGlyPhe Gly Arg Gly Thr His TAT TTT AAC
His SerGlu Ile Ser Gly Phe Lys ValGlnThr Leu Pro Tyr Phe Asn GTA AAG TAT
Phe LysTyr Ser Asp Glu Tyr Leu LeuGluTrp Met Val Val Lys Tyr TGT AAG TGC
Asn CysLys Ile Leu Glu Tyr Asn SerLeuLys Asn Ser Cys Lys Cys Phe Asn His Ser Leu Glu Trp Leu Met Thr His Thr Phe Asp Met Ala ATT ATT GAA GGG AGT TAT GAA ATA TAC AAT GCT GTG TAT GCT TTT GCC . 1370 Ile Ile GluGlySer TyrGluIle TyrAsnAla ValTyrAla PheAla His Ala LeuHisGlu MetThrLeu GlnAsnVal AspAsnVal LeuLeu Pro Asn TyrGluGlu GlnAsnTyr AsnCysLys MetValTyr SerPhe Leu Ser LysThrGln PheThrAsn ProValGly AspThrVal AsnMet Asn Gln ArgAsnLys LeuLysGlu GluTyrAsp IlePheTyr AsnTrp Asn Phe ProGlnGly LeuGlyPhe LysValLys IleGlyIle PheSer Pro Tyr PheProLys GlyGlnGln LeuHisLeu SerGluAsn LeuIle Glu Trp SerThrGly ArgIleGln MetProThr SerValCys SerAla Asp Cys GlyProGly PheArgLys ValTrpLys AsnGlyMet ProAla Cys Cys PheAspCys SerProCys ProGluAsn GluIleSer AsnGlu Thr Asn ValGluLeu CysValGln CysProGlu AspGlnTyr AlaAsn Gln Glu GlnAsnHis CysIleHis LysAlaArg IlePheLeu SerTyr Asp Glu ProLeuGly MetAlaLeu SerLeuMet AlaLeuCys LeuAla Ala Leu ThrValVal ValLeuGly ValPheVal LysHisHis ArgThr Pro Ile ValLysAla AsnAsnCys ThrLeuThr TyrIleLeu LeuIle Ala Leu IlePheCys PheLeuCys ProLeuPhe PheIleGly HisPro Asn Ser AlaThrCys IleLeuGln GlnIleThr PheGlyVal ValPhe.
Thr Val Ala Ile Ser Thr Val Leu Ala Lys Thr Thr Thr Val Ile Leu GTT
Ala Phe Arg Val Thr Ala Pro His Arg Met Met Lys Tyr Phe Leu Val w 690 695 700 ATT
Ser Arg Ala Ser Asn Tyr Ile Ile Pro Ile Cys Thr Leu Ile Gln Ile ATT
Ile Val Cys Ala Ile Trp Leu Gly Ala Ser Pro Pro Ser Val Asp Ile GGT
Asp Ala Gln Ser Glu His Gly His Ile Ile Ile Ala Cys Asn Lys Gly GCC
Ser Val Thr Ala Phe Tyr Cys Val Leu Gly Tyr Leu Ala Cys Leu Ala ACC
Phe Val Ser Phe Thr Leu Ala Phe Leu Ser Arg Asn Leu Pro Val Thr AGT
Phe Asn Glu Ala Lys Ser Met Thr Phe Ser Met Leu Val Phe Cys Ser GTT
Val Trp Val Thr Phe Leu Pro Val Tyr His Gly Thr Lys Gly Lys Val ATG
Met Val Ala Val Glu Ile Phe Ser Thr Leu Ala Ser Ser Ala Gly Met CCA
Leu Gly Cys Ile Phe Ala Pro Lys Cys Tyr Thr Ile Leu Phe Arg Pro ACT
Asp Arg Asn Ser Leu Gln Met Ile Arg Glu Lys Ser Ser Ser His Thr TCATAATCAC CAAATATTC
His Ile Leu CCCTCAATTT TAAGTGTATC ATAAAAGACA CAGTTGTGAA ATTTTCAAGG ACAGCACTAC 3432
(2) INFORMATION FOR SEQ ID NO:40:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 866 amino acids (B) TYPE: amino acid (C) STRANDEDNESS: single (D) TOPOLOGY: linear (ii) MOLECULE TYPE: protein (v) FRAGMENT TYPE: internal (xi) SEQUENCE DESCRIPTION: SEQ ID NO:40:
Met Ser Arg Leu Arg Ala Gly Lys Asn Met Leu Thr Phe Ile Leu Leu Phe Phe Leu Leu Asn Ile Pro Leu Phe Val Pro Ser Phe Ile Tyr Pro Arg Cys Phe Trp Ser Met Lys Lys Asn Glu Tyr Gln Asp Arg Asn Leu Gly Thr Gly Cys Met Phe Phe Ile Leu Ala Val Gln Gln Pro Met Glu Lys Glu Tyr Phe Ser His Ile Ser Asn Ile Gln Thr Pro Thr Glu Asn Gln Lys Tyr Pro Leu Thr Leu Ala Phe Ser Met Asn Glu Ile Asn Asn Asn Pro Asp Leu Leu Pro Asn Met Ser Leu Ala Phe Thr Phe Ser Glu Tyr Ser Cys Tyr Leu Glu Ser His His Lys Arg Leu Phe Asn Phe Ser Leu Lys Asn His Glu Ile Leu Pro Asn Phe Ile Cys Thr Lys Asp Ile Lys Cys Gly Val Val Leu Thr Gly Leu Ser Leu Val Thr Thr Val Thr Leu His Ile Ile Leu Asn Asn Phe Ile Phe Gln Gln Phe Arg Gln Leu Thr Tyr Gly His Phe His Pro Ala Leu Cys Asp His Glu Asn Phe Pro His Leu Tyr Gln Met Ala Ser Asp Asp Thr Ser Leu Ala Leu Ala Leu Val Ser Phe Ile Ile His Phe Ser Trp Asn Trp Ile Gly Leu Ala Ile Ser Asp Asn Asp Gln Gly Ile His Phe Leu Ser Tyr Leu Arg Arg Glu Met Glu Lys Asn Thr Val Cys Phe Ala Phe Val Asn Ile Ile Pro Val Asn Met Asn Leu Tyr Met Ser Arg Ala Glu Val Tyr Tyr Ser Gln Val Met Thr Ser Ser Ala Asn Val Val Ile Ile Tyr Gly Asp Thr Gly Asn Thr Leu Ala Val Ser Phe Arg Met Trp Asp Ser Leu Gly Ile Gln Arg Leu Trp Val Thr Thr Ser Gln Trp Asp Val Thr Pro Phe Lys Lys Asp Phe Thr Phe Asp Asn Gly Tyr Gly Thr Phe Gly Phe Gly His Arg His Ser Glu Ile Ser Gly Phe Lys Tyr Phe Val Gln Thr Leu Asn Pro Phe Lys Tyr Ser Asp Glu Tyr Leu Val Lys Leu Glu Trp Met Tyr Val Asn Cys Lys Ile Leu Glu Tyr Asn Cys Lys Ser Leu Lys Asn Cys Ser Phe Asn His Ser Leu Glu Trp Leu Met Thr His Thr Phe Asp Met Ala Ile Ile Glu Gly Ser Tyr Glu Ile Tyr Asn Ala Val Tyr Ala Phe Ala His Ala Leu His Glu Met Thr Leu Gln Asn Val Asp Asn Val Leu Leu Pro Asn Tyr Glu Glu Gln Asn Tyr Asn Cys Lys Met Val Tyr Ser Phe Leu Ser Lys Thr Gln Phe Thr Asn Pro Val Gly Asp Thr Val Asn Met Asn Gln Arg Asn Lys Leu Lys Glu Glu Tyr Asp Ile Phe Tyr Asn Trp Asn Phe Pro Gln Gly Leu Gly Phe Lys Val Lys Ile Gly Ile Phe Ser Pro Tyr Phe Pro Lys 
Gly Gln Gln Leu His Leu Ser Glu Asn Leu Ile Glu Trp Ser Thr Gly Arg Ile Gln Met Pro Thr Ser Val Cys Ser Ala Asp Cys Gly Pro Gly Phe Arg Lys Val Trp Lys Asn Gly Met Pro Ala Cys Cys Phe Asp Cys Ser Pro Cys Pro Glu Asn Glu Ile Ser Asn Glu Thr Asn Val Glu Leu Cys Val Gln Cys Pro Glu Asp Gln Tyr Ala Asn Gln Glu Gln Asn His Cys Ile His Lys Ala Arg Ile Phe Leu Ser Tyr Asp Glu Pro Leu Gly Met Ala Leu Ser Leu Met Ala Leu Cys Leu Ala Ala Leu Thr Val Val Val Leu Gly Val Phe Val Lys His His Arg Thr Pro Ile Val Lys Ala Asn Asn Cys Thr Leu Thr Tyr Ile Leu Leu Ile Ala Leu Ile Phe Cys Phe Leu Cys Pro Leu Phe Phe Ile Gly His Pro Asn Ser Ala Thr Cys Ile Leu Gln Gln Ile Thr Phe Gly Val Val Phe Thr Val Ala Ile Ser Thr Val Leu Ala Lys Thr Thr Thr Val Ile Leu Ala Phe Arg Val Thr Ala Pro His Arg Met Met Lys Tyr Phe Leu Val Ser Arg Ala Ser Asn Tyr Ile Ile Pro Ile Cys Thr Leu Ile Gln Ile Ile Val Cys Ala Ile Trp Leu Gly Ala Ser Pro Pro Ser Val Asp Ile Asp Ala Gln Ser Glu His Gly His Ile Ile Ile Ala Cys Asn Lys Gly Ser Val Thr Ala Phe Tyr Cys Val Leu Gly Tyr Leu Ala Cys Leu Ala Phe Val Ser Phe Thr Leu Ala Phe Leu Ser Arg Asn Leu Pro Val Thr Phe Asn Glu Ala Lys Ser Met Thr Phe Ser Met Leu Val Phe Cys Ser Val Trp Val Thr Phe Leu Pro Val Tyr His Gly Thr Lys Gly Lys Val Met Val Ala Val Glu Ile Phe Ser Thr Leu Ala Ser Ser Ala Gly Met Leu Gly Cys Ile Phe Ala Pro Lys Cys Tyr Thr Ile Leu Phe Arg Pro Asp Arg Asn Ser Leu Gln Met Ile Arg Glu Lys Ser Ser Ser His Thr His Ile Leu
(2) INFORMATION FOR SEQ ID NO:41:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 2916 base pairs (B) TYPE: nucleic acid (C) STRANDEDNESS: single (D) TOPOLOGY: linear (ii) MOLECULE TYPE: cDNA
(ix) FEATURE:
(A) NAME/KEY: Coding Sequence (B) LOCATION: 299...2635 (D) OTHER INFORMATION: GoVN5 (xi) SEQUENCE DESCRIPTION: SEQ ID NO:41:
TATGCTCTTC
CTACAGAAGC
GGTGATTGTT
GTGGATGCTT
ATTTGGCC AT
Met ACC AAA
Arg Phe Ala Ile Glu Glu Ile Asn Ser Asn Pro His Leu Leu Pro Asn GAG GTA
Thr Ser Leu Gly Phe Glu Ile Asn Asn Val Pro His Gly Gln Arg Tyr TGA CAT
Thr Leu Val Lys Leu Phe Ser Ser Leu Ser Gly Ser Asn Tyr Asp Ile ACT TAC
Pro Asn Tyr Ile Ser Ala Ser Glu Ser Asn Ser Ala Ala Val Leu Thr GGA TCT
Gly Pro Ser Trp Thr Ile Ser Glu Cys Val Gly Thr Leu Leu Asp Leu CCT GAG
Tyr Lys Phe Pro Gln Leu Thr Phe Gly Pro Phe Asp Ser Leu Leu Ser AGA TAC
Glu Gln Arg Arg Phe Ser Ser Leu Tyr Gln Val Ala Pro Lys Asp Thr
100 105 110 Phe Leu Thr Pro Gly Ile Val Ser Leu Met Leu His Phe His Trp Asn Trp Val Gly Leu Phe Ile Ile Asp Asp Asp Lys Gly Ala Gln Thr Leu Ser Asp Leu Arg Asn Glu Met Asp Lys Asn Gly Val Cys Thr Ala Phe Val Glu Met Ile Pro Val Ile Lys Gly Ser Phe Phe Thr Lys Ser Trp Lys Asn His Val Gln Ile Leu Glu Ser Ser Ser Asn Val Ile Ile Ile Tyr Gly Asp Ser Asp Ser Leu Leu Ser Leu Ile Val Asn Ile Lys Gln Lys Leu Leu Thr Trp Lys Val Trp Val Leu Ile Ser Gln Trp Asp Val Ser Lys Phe Asp Asp Tyr Phe Met Val Asp Ser Leu His Gly Ala Leu Ile Phe Ser His His Arg Glu Glu Ile Pro Asn Phe Thr Asp Phe Met Gln Lys Tyr Asn Pro Ser Lys Tyr Pro Glu Asp Thr Tyr Leu His Val Leu Trp His Met Tyr Phe Asn Cys Ser Phe Val Lys Lys Asp Cys Lys Ile Val His Asn Cys Leu Pro Asn Ala Ser Leu Gly Phe Leu Pro Gly Asn Ile Phe Asp Met Ala Met Ser Glu Glu Ser Tyr Asn Val Tyr Asn Ala Val Tyr Ala Val Ala His Ser Leu His Glu Met Ile Leu Asn Gln Val Gln Phe Gln Thr His Glu Lys Gly Lys Lys Met Val Phe Phe Pro Trp Gln Leu His Pro Phe Leu Arg Glu Arg Gln Leu Ile Asn Gln Asn TTG GTC GTA
Gly Ala Asn Glu Asp Leu Asp Thr Arg Lys His Val Glu Cys Ser Tyr TTT TCT TGT
Asp Ile Leu Asn Phe Trp Asn Pro Lys Gly Gly Leu Asn Phe Leu Val AAG GGA GTC
Lys Val Gly Thr Phe Ser Pro Ala Pro Lys Gln Lys Leu Ser Glu Ser GTG GTC TCC
Ile Ser Ser Asn Met Ile Gln Ala Thr Gly Thr Glu Ile Trp Ser Pro CTG ATT CCA
Gln Ser Val Cys Ser Glu Ser His Pro Gly Arg Lys Thr Cys Phe His TTG CAT AGA
Gln Glu Gly Arg Val Ala Cys Phe Asp Cys Pro Cys Pro Cys Ile Glu AGA GTG TCC
Asn Glu Ile Ser Asn Glu Thr Val Asp Gln Val Lys Cys Asp Cys Pro AGA CTG AAC
Glu Thr His Tyr Ala Asn Ile Lys Ile His Leu Gln Lys Glu Cys Thr TGA GAA GAC ACT
TTG CTT
Val Thr Phe Leu Tyr Tyr Asp Pro Leu Gly Thr Leu Cys Asp Lys Phe ACT TGT GTT
Met Ser Leu Gly Phe Ser Ser Thr Ala Ala Leu Val Val Leu Val Phe CAT CAA TAA CCT
GGC TCT
Leu Lys Asn Arg Asp Thr Pro Val Lys Ala Asn Leu Ala Ile Asn Leu TTT TTT CTT
Ser Tyr Thr Leu Leu Ile Thr Met Leu Cys Leu Cys Pro Leu Phe Leu CAC TAT AAA
Leu Phe Ile Gly Arg Pro Ser Ala Ser Cys Leu Gln Gln Thr Ile Asn TGT CAC CAA
Ile Phe Gly Leu Leu Phe Thr Ala Leu Ser Val Leu Ala Val Thr Lys CTT TTC AAT
Thr Ile Thr Val Val Ile Ala Lys Ile Thr Pro Gly Arg Phe Ser Ile AAG TTT CTT
Arg Arg Trp Leu Leu Ile Ser Ala Pro Asn Ile Ile Pro Arg Phe Leu TCT TTG CTC
Cys Thr Leu Leu Gln Val Phe Leu Ser Gly Ile Trp Leu Thr Thr Ser Pro Pro Phe Ile Asp Lys Asp Ala His Ser Glu His Gly His Ile Ile Ile Ile Cys Asn Lys Gly Ser Ala Val Ala Phe His Cys Asn Leu Gly Tyr Leu Gly Ala Leu Ala Leu Val Ser Tyr Phe Met Ala Phe Leu Ser ', Arg Asn Leu Pro Asp Thr Phe Asn Glu Ala Lys Phe Leu Ala Phe Ser Met Leu Val Phe Cys Ser Val Trp Val Thr Phe Leu Pro Val Tyr His ' Ser Thr Lys Gly Lys Asn Met Val Ala Met Glu Val Phe Ser Ile Leu Ala Ser Ser Thr Ser Leu Leu Gly Ile Ile Phe Ala Pro Lys Cys Tyr 740 745 75p Leu Ile Leu Leu Arg Pro Glu Arg Asn Ser Leu Ser Tyr Ile Arg Asp T
Lys Thr Tyr Ala Lys Ser Ile Lys Pro Ser
(2) INFORMATION FOR SEQ ID NO:42:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 779 amino acids (B) TYPE: amino acid (C) STRANDEDNESS: single (D) TOPOLOGY: linear (ii) MOLECULE TYPE: protein (v) FRAGMENT TYPE: internal (xi) SEQUENCE DESCRIPTION: SEQ ID NO:42:
Met Arg Phe Ala Ile Glu Glu Ile Asn Ser Asn Pro His Leu Leu Pro Asn Thr Ser Leu Gly Phe Glu Ile Asn Asn Val Pro His Gly Gln Arg Tyr Thr Leu Val Lys Leu Phe Ser Ser Leu Ser Gly Ser Asn Tyr Asp Ile Pro Asn Tyr Ile Ser Ala Ser Glu Ser Asn Ser Ala Ala Val Leu
Thr Gly Pro Ser Trp Thr Ile Ser Glu Cys Val Gly Thr Leu Leu Asp Leu Tyr Lys Phe Pro Gln Leu Thr Phe Gly Pro Phe Asp Ser Leu Leu Ser Glu Gln Arg Arg Phe Ser Ser Leu Tyr Gln Val Ala Pro Lys Asp Thr Phe Leu Thr Pro Gly Ile Val Ser Leu Met Leu His Phe His Trp Asn Trp Val Gly Leu Phe Ile Ile Asp Asp Asp Lys Gly Ala Gln Thr Leu Ser Asp Leu Arg Asn Glu Met Asp Lys Asn Gly Val Cys Thr Ala Phe Val Glu Met Ile Pro Val Ile Lys Gly Ser Phe Phe Thr Lys Ser Trp Lys Asn His Val Gln Ile Leu Glu Ser Ser Ser Asn Val Ile Ile Ile Tyr Gly Asp Ser Asp Ser Leu Leu Ser Leu Ile Val Asn Ile Lys Gln Lys Leu Leu Thr Trp Lys Val Trp Val Leu Ile Ser Gln Trp Asp Val Ser Lys Phe Asp Asp Tyr Phe Met Val Asp Ser Leu His Gly Ala Leu Ile Phe Ser His His Arg Glu Glu Ile Pro Asn Phe Thr Asp Phe Met Gln Lys Tyr Asn Pro Ser Lys Tyr Pro Glu Asp Thr Tyr Leu His Val Leu Trp His Met Tyr Phe Asn Cys Ser Phe Val Lys Lys Asp Cys Lys Ile Val His Asn Cys Leu Pro Asn Ala Ser Leu Gly Phe Leu Pro Gly Asn Ile Phe Asp Met Ala Met Ser Glu Glu Ser Tyr Asn Val Tyr Asn Ala Val Tyr Ala Val Ala His Ser Leu His Glu Met Ile Leu Asn Gln Val Gln Phe Gln Thr His Glu Lys Gly Lys Lys Met Val Phe Phe Pro Trp Gln Leu His Pro Phe Leu Arg Glu Arg Gln Leu Ile Asn Gln Asn Gly Ala Asn Glu Asp Leu Asp Cys Thr Arg Lys Ser His Val Glu Tyr Asp Ile Leu Asn Phe Trp Asn Phe Pro Lys Gly Leu Gly Leu Asn Val Lys Val Gly Thr Phe Ser Pro Ser Ala Pro Lys Glu Gln Lys Leu Ser Ile Ser Ser Asn Met Ile Gln Trp Ala Thr Gly Ser Thr Glu Ile Pro Gln Ser Val Cys Ser Glu Ser Cys His Pro Gly Phe Arg Lys Thr His Gln Glu Gly Arg Val Ala Cys Cys Phe Asp Cys Ile Pro Cys Pro Glu Asn Glu Ile Ser Asn Glu Thr Asp Val Asp Gln Cys Val Lys Cys Pro Glu Thr His Tyr Ala Asn Ile Glu Lys Ile His Cys Leu Gln Lys Thr Val Thr Phe Leu Tyr Tyr Asp Asp Pro Leu Gly Lys Thr Leu Cys Phe Met Ser Leu Gly Phe Ser Ser Leu Thr Ala Ala Val Leu Val Val Phe Leu Lys Asn Arg Asp Thr Pro Ile Val Lys Ala Asn Asn Leu Ala Leu Ser Tyr Thr Leu Leu Ile Thr Leu Met Leu Cys Phe Leu Cys Pro Leu Leu Phe Ile 
Gly Arg Pro Ser Thr Ala Ser Cys Ile Leu Gln Gln Asn Ile Phe Gly Leu Leu Phe Thr Val Ala Leu Ser Thr Val Leu Ala Lys Thr Ile Thr Val Val Ile Ala Phe Lys Ile Thr Ser Pro Gly Arg Ile Arg Arg Trp Leu Leu Ile Ser Arg Ala Pro Asn Phe Ile Ile Pro Leu Cys Thr Leu Leu Gln Val Phe Leu Ser Gly Ile Trp Leu Thr Thr Ser Pro Pro Phe Ile Asp Lys Asp Ala His Ser Glu His Gly His Ile Ile Ile Ile Cys Asn Lys Gly Ser Ala Val Ala Phe His Cys Asn Leu Gly Tyr Leu Gly Ala Leu Ala Leu Val Ser Tyr Phe Met Ala Phe Leu Ser Arg Asn Leu Pro Asp Thr Phe Asn Glu Ala Lys Phe Leu Ala Phe Ser Met Leu Val Phe Cys Ser Val Trp Val Thr Phe Leu Pro Val Tyr His Ser Thr Lys Gly Lys Asn Met Val Ala Met Glu Val Phe Ser Ile Leu Ala Ser Ser Thr Ser Leu Leu Gly Ile Ile Phe Ala Pro Lys Cys Tyr Leu Ile Leu Leu Arg Pro Glu Arg Asn Ser Leu Ser Tyr Ile Arg Asp Lys Thr Tyr Ala Lys Ser Ile Lys Pro Ser
(2) INFORMATION FOR SEQ ID NO:43:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 3307 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: cDNA
(ix) FEATURE:
(A) NAME/KEY: Coding Sequence
(B) LOCATION: 112...1761
(D) OTHER INFORMATION: GoVN6
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:43:
TAAGGCAGGA AAAAATGTTC ATTTTGATGG AAGTCTTCTT CTTCTTCCTT AACATTCCAC 60 Met Lys Leu Arg Asp Lys Asp Leu Ser Ile Thr Cys Ser Phe Ile Leu Glu Ala Val Gln Met Pro Thr Glu Asn Asp Tyr Phe Asn Gln Thr Leu Asn Ile Leu Lys Thr Thr Lys Asn His Lys Tyr Ala Leu Ala Leu Ala Phe Ser Ile Asp Glu Ile Asn Arg Asn Pro Asp Leu Leu Pro Asn Met Ser Leu AAA ACT
TAC
Ile Ile Lys Tyr Pro Leu Gly Leu Cys Asp Gly Gln Thr Thr Leu Pro Thr Pro Tyr Leu Phe Asn Glu Ile Tyr Phe Arg Pro Ile Pro Asn Tyr Phe Cys Asn Glu Glu Thr Met Cys Thr Phe Leu Leu Thr Gly Pro His Trp Ile Thr Ser Tyr Ser Phe Trp Ile His Leu Asn Ile Phe Leu Ser Pro Ser Met Asn Pro Lys Asp Thr Ser Leu Ala Leu Ala Met Val Ser Phe Leu Leu Tyr Phe Lys Trp Asn Trp Val Gly Leu Val Ile Ser Asp Asp Asp Gln Gly Asn Gln Phe Leu Ser Glu Leu Lys Lys Glu Ser Lys Ile Lys Glu Ile Cys Phe Ala Phe Val Ser Met Leu Ala Ile Asp Glu Ile Ser Phe Tyr His Lys Thr Glu Met Tyr Tyr Asn Gln Ile Val Met Ser Ser Thr Asn Val Ile Ile Ile Tyr Gly Lys Thr Glu Ser Ile Ile Glu Leu Ser Phe Arg Met Trp Glu Ser Pro Val Ile Gln Arg Ile Trp Val Thr Thr Lys Glu Met Asn Phe Pro Thr Ser Lys Arg Asp Leu Thr His Asp Thr Phe Tyr Gly Thr Leu Thr Phe Leu His Ser His Gly Glu Ile Ser Gly Phe Lys Asn Phe Val Gln Thr Trp Tyr His Leu Arg Ile Thr Asp Leu His Leu Val Met Pro Glu Trp Lys Tyr Phe Asn Tyr Glu Ala Ser Ala Ser Asn Cys Lys Ile Leu Lys Asn Tyr Ser Ser Ser Ala Ser Leu Glu Leu Glu Gln Thr Phe Asp Val Phe Ser Trp Met Met Asp GAT TAT ATG CTC
Gly Ser Arg Ile Asn Ala Val Asn Ala Ala His Ala Asp Tyr Met Leu AAT CAC GCA GGG
His Glu Met Leu Leu Val Asp Asn Gln Ile Asp Asn Asn His Ala Gly AGT CAC TCC AAG
Lys Gly Ala Ser Cys Phe Lys Ile Asn Phe Leu Arg Ser His Ser Lys ACT CCT ATT AGA
Thr His Phe Asn Leu Gly Asp Arg Val Met Lys Glu Thr Pro Ile Arg CAA GAC ACT TCT
Glu Ile Leu Glu Tyr Asn Ile Phe His Trp Asn Phe Gln Asp Thr Ser GGT AAG TTC TTT
Gln His Ile Phe Val Lys Ile Gly Lys Ser Pro Tyr Gly Lys Phe Phe AGG TTT ATG GCT
Pro His Gly His His Leu Tyr Val Asp Ile Glu Leu Arg Phe Met Ala AGA ATG ACT AGT
Thr Gly Ser Lys Pro Ser Ser Val Cys Glu Asp Cys Arg Met Thr Ser AGA TTC GCA TTT
Pro Gly Tyr Arg Trp Lys Glu Gly Met Ala Cys Cys Arg Phe Ala Phe CCC CCT AAT ATG
Val Cys Ser Cys Glu Asn Ala Ile Ser Glu Thr Asn Pro Pro Asn Met GTG TGT GCC CGG
Asp Gln Cys Asn Pro Glu Tyr Gln Tyr Asn Thr Lys Val Cys Ala Arg GAC AAA TGC CAG AAT GTG ATG TTT CTA TAC AAA GAC 1701 ATT AAA AGC CCC
Asp Lys Cys Gln Asn Val Met Phe Leu Tyr Lys Asp Ile Lys Ser Pro GAC TGC TTT AAC
Leu Gly Asp Ser Leu His Ser Leu Leu Leu Cys Ile Asp Cys Phe Asn ACT GTGAAGCACC
ATGACACTCC
TATTGTGAAG
GCCAA
Ser Cys Cys Thr TATTAATCAC GTCTCTCTTG
TTCTGTTTTC TCTGCTCATT
ACAGAGCAAC CTGCATCTTA
CAGCAAATCA CATTTGGAAT
CTACAATTTT GGCAAAAACA
ATCACTGTGG TTCTGGCTTT
GAAGGTTGAG AAACTTCCTA
GTATTGGGTA CACTCAACTA
TGTTTCAATG TATTCTGTGT
GCAATCTGGC TAGCAGTTTC
ATGAACACAC TGAGTATGGC
CACATCATCA TTGTGTGCAA
(2) INFORMATION FOR SEQ ID NO:44:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 550 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: protein
(v) FRAGMENT TYPE: internal
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:44:
Met Lys Leu Arg Asp Lys Asp Leu Ser Ile Thr Cys Ser Phe Ile Leu Glu Ala Val Gln Met Pro Thr Glu Asn Asp Tyr Phe Asn Gln Thr Leu Asn Ile Leu Lys Thr Thr Lys Asn His Lys Tyr Ala Leu Ala Leu Ala Phe Ser Ile Asp Glu Ile Asn Arg Asn Pro Asp Leu Leu Pro Asn Met Ser Leu Ile Ile Lys Tyr Pro Leu Gly Leu Cys Asp Gly Gln Thr Thr Leu Pro Thr Pro Tyr Leu Phe Asn Glu Ile Tyr Phe Arg Pro Ile Pro Asn Tyr Phe Cys Asn Glu Glu Thr Met Cys Thr Phe Leu Leu Thr Gly Pro His Trp Ile Thr Ser Tyr Ser Phe Trp Ile His Leu Asn Ile Phe Leu Ser Pro Ser Met Asn Pro Lys Asp Thr Ser Leu Ala Leu Ala Met Val Ser Phe Leu Leu Tyr Phe Lys Trp Asn Trp Val Gly Leu Val Ile Ser Asp Asp Asp Gln Gly Asn Gln Phe Leu Ser Glu Leu Lys Lys Glu Ser Lys Ile Lys Glu Ile Cys Phe Ala Phe Val Ser Met Leu Ala Ile Asp Glu Ile Ser Phe Tyr His Lys Thr Glu Met Tyr Tyr Asn Gln Ile Val Met Ser Ser Thr Asn Val Ile Ile Ile Tyr Gly Lys Thr Glu Ser Ile Ile Glu Leu Ser Phe Arg Met Trp Glu Ser Pro Val Ile Gln Arg Ile Trp Val Thr Thr Lys Glu Met Asn Phe Pro Thr Ser Lys Arg Asp Leu Thr His Asp Thr Phe Tyr Gly Thr Leu Thr Phe Leu His Ser His Gly Glu Ile Ser Gly Phe Lys Asn Phe Val Gln Thr Trp Tyr His Leu Arg Ile Thr Asp Leu His Leu Val Met Pro Glu Trp Lys Tyr Phe Asn Tyr Glu Ala Ser Ala Ser Asn Cys Lys Ile Leu Lys Asn Tyr Ser Ser Ser Ala Ser Leu Glu Trp Leu Met Glu Gln Thr Phe Asp Met Val Phe Ser Asp Gly Ser Arg Asp Ile Tyr Asn Ala Val Asn Ala Met Ala His Ala Leu His Glu Met Asn Leu His Leu Val Asp Asn Gln Ala Ile Asp Asn Gly Lys Gly Ala Ser Ser His Cys Phe Lys Ile Asn Ser Phe Leu Arg Lys Thr His Phe Thr Asn Pro Leu Gly Asp Arg Val Ile Met Lys Glu Arg Glu Ile Leu Gln Glu Asp Tyr Asn Ile Phe His Thr Trp Asn Phe Ser Gln His Ile Gly Phe Lys Val Lys Ile Gly Lys Phe Ser Pro Tyr Phe Pro His Gly Arg His Phe His Leu Tyr Val Asp Met Ile Glu Leu Ala Thr Gly Ser Arg Lys Met Pro Ser Ser Val Cys Thr Glu Asp Cys Ser Pro Gly Tyr Arg Arg Phe Trp Lys Glu Gly Met Ala Ala Cys Cys Phe Val Cys Ser Pro Cys Pro Glu Asn Ala Ile Ser Asn Glu
Thr Asn Met Asp Gln Cys Val Asn Cys Pro Glu Tyr Gln Tyr Ala Asn Thr Lys Arg Asp Lys Cys Ile Gln Lys Asn Val Met Phe Leu Ser Tyr Lys Asp Pro Leu Gly Asp Asp Ser Cys Leu His Ser Leu Leu Phe Leu Cys Ile Asn Ser Cys Cys Thr
(2) INFORMATION FOR SEQ ID NO:45:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 3938 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: cDNA
(ix) FEATURE:
(A) NAME/KEY: Coding Sequence
(B) LOCATION: 46...2424
(D) OTHER INFORMATION: GoVN7
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:45:
Met Ile Val Phe Phe Leu Leu Asn Ile Pro Leu Leu Met Ala Asn Ser Val Asp Pro Arg AAA GTC ATA GAT
ATA GAT
CysPhe TrpLysIle AsnLeuAsn Glu LysAsp AspLeu Asp Val Ile GTT CCT
ThrSer CysTyrPhe IleLeuGlu Ala GlnLeu MetGlu Lys Val Pro CTA ACC
AspTyr PheAsnGln ThrLeuAsn Val LysThr LysTyr Asn Leu Thr ATG ATA
ArgTyr AlaLeuAla LeuAlaPhe Thr AspGlu AsnArg Asn Met Ile ATT CAT
ProHis IleLeuPro AsnMetSer Leu IleLys ThrLeu Gly Ile His TTA CAA
HisCys AspGlyAsn IleProLeu Arg LeuAsn IlePhe Tyr Leu Gln GAA ATG
MetPro PheProAsn TyrGlyCys Asn GluThr CysSer Phe Glu Met TCT TTT
MetLeu MetGlyPro AsnLeuTrp Pro ValAsp PheIle His Ser Phe CAG TTC
LeuAsn IleLeuPhe ProHisPhe Leu IleSer GlyPro Phe Gln Phe TTT ATC
HisSer IlePheSer AspAsnGlu Gln ProTyr TyrGln Met Phe Ile GCA TCT
ThrPro LysAspThr SerLeuAla Leu MetVal PheIle Leu Ala Ser GTC GAT
TyrPhe AsnTrpAsn TrpValGly Leu LeuSer AsnAsp Glu Val Asp AAA CAC
GlyAsn GlnPheLeu ThrGluLeu Lys GluThr AsnThr Glu Lys His GCA GAG
IleCys PheAlaPhe ValAsnMet Met IleAsn AsnSer Ser Ala Glu CAA ATG
MetLys LysThrAsp MetTyrTyr Asn IleVal SerThr Ala Gln Met CCC ATT
AsnVal IleIleIle TyrGlyGlu Arg SerIle GluLeu Cys Pro Ile TTC AGA ACA TGG ACA TCT CCA GTC ATA CAG AGG ATA TGG GTT ACC AAA . 921 Phe Arg Thr Trp Thr Ser Pro Val Ile Gln Arg Ile Trp Val Thr Lys Ser Glu Leu Tyr Phe Pro Thr Ser Lys Arg Asp Leu Ser His Gly Thr Phe Tyr Gly Thr Leu Ala Phe Gln Gln His His Asp Val Ile Ser Gly Phe Lys Asn PheVal Gln Thr Tyr His Leu Lys Ser Asp Leu Trp Met ', TAT TTA TTA AAGCCA GAG TGG TTC TTT GAA TAT GAA TCA GCA 1113 GGT ACC
Tyr Leu Leu LysPro Glu Trp Phe Phe Glu Tyr Glu Ser Ala Gly Thr AGT TCA
Ser Tyr Cys LysIle Leu Met Asn Ser Ser Asn Val Leu Glu Ser Ser GAC AAT
Trp Leu Met GluGln Lys Phe Ile Ala Phe Asn Asp Ser His Asp Asn GCC CAT
Ser Ile Tyr AsnAla Val Tyr Met Ala His Ala Leu Glu Lys Ala His CAG AAA
Asn Leu Lys GlnIle Asp Asn Glu Ile Ser Tyr Gly Gly Ala Gln Lys CAC ATC
Ser Thr His CysLeu Lys Leu Ser Phe Leu Arg Thr His Phe His Ile GTG GTA
Thr Asn Pro PheGly Glu Arg Ile Met Lys Glu Arg Arg Val Val Val CAC CAA
Gln Glu Asp TyrAsp Ile Val Leu Gln Asn Cys Ser His Leu His Gln Arg Ile Lys Val Lys Ile Gly Gln Phe Ser Pro Tyr Phe Pro His Gly Gly Gln Phe His Leu Tyr Glu Asp Met Ile Asp Leu Ala Thr Gly Ser Arg Lys Met Pro Leu Ser Met Cys Ser Ala Asp Cys Arg Pro Gly Tyr Arg Lys Phe Trp Lys Glu Gly Met Ala Ala Cys Cys Phe Val Cys Ser Pro Cys Pro Asp Asn Glu Ile Ser Asn Glu Thr Thr Val Val Leu Trp ACT ATT AAT
ValPheVal LysHisHis Asp Pro Val LysAla Asn Arg Thr Ile Asn ATG CTC TTT
IleLeuSer TyrIleLeu Ile Ser Met PheCys Leu Cys Met Leu Phe CCT AGA ATC
SerPhePhe PheIleGly His Asn Gly ThrCys Leu Gln Pro Arg Ile TTC GTG ACA
GlnIleThr PheGlyIle Val Thr Ala ValSer Val Leu Phe Val Thr CTG TTT GAC
AlaLysThr IleThrVal Leu Ala Gln ValThr Thr Gly Leu Phe Asp GTA GGG TAC
ArgLysLeu ArgAsnPhe Leu Ser Thr ProAsn Ile Ile Val Gly Tyr TGC CTG TGG
ProIleCys SerLeuLeu Gln Thr Cys AlaIle Leu Ala Cys Leu Trp ATC GAA CAT
ValSerPro ProPheVal Asp Asp His SerGlu Gly His Ile Glu His GGA GTT TAC
IleIleIle ValCysAsn Lys Ser Met AlaPhe Cys Val Gly Val Tyr GCC GGA ATG
LeuGlyTyr LeuAlaPhe Leu Leu Ser PheThr Ala Phe Ala Gly Met ACA AAT TTC
LeuAlaLys AsnLeuPro Asp Phe Glu AlaLys Leu Thr Thr Asn Phe AGT TGG CTT
PheSerMet LeuValPhe Cys Val Ile ThrPhe Pro Val Ser Trp Leu GTC GTT ATT
TyrHisSer ThrLysGly Arg Met Ala ValGlu Phe Ser Val Val Ile ATG GGA GCA
IleLeuThr SerSerAla Gly Leu Cys ValPhe Pro Lys Met Gly Ala CCA AGA AAA
IleTyrIle IleLeuMet Lys Glu Ile LeuSer Arg Gln Pro Arg Lys TTTTAGAAAT
TCTGTCAAAT
GTACAGTTGT
T
GluLysSer ArgPhe CTCACTAGTT
CCATAAAATC
TGTAGTATTA
CAAGTACATT
ACAGGATTAC
GAATCAACAA
CAGAATACTG
GTAGAAGTTT
GAGCACCCTG
GAATACCAGC
ATACATAAGC
TCAGTGGAAG
GTGATGGTTT
GAGGAATTTG
TCAGTCTGTT
CCTGCCTGGA
GAACATGTAA
GTCTGTACAT
GATTTCCTCT
CACCGTAAAA
TATTAACATG
TGTACCTAAT
CACAAAATTC
ATAAATTTTC
(2) INFORMATION FOR SEQ ID NO:46:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 793 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: protein
(v) FRAGMENT TYPE: internal
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:46:
Met Ile Val Phe Phe Leu Leu Asn Ile Pro Leu Leu Met Ala Asn Ser Val Asp Pro Arg Cys Phe Trp Lys Ile Asn Leu Asn Glu Val Lys Asp Ile Asp Leu Asp Thr Ser Cys Tyr Phe Ile Leu Glu Ala Val Gln Leu Pro Met Glu Lys Asp Tyr Phe Asn Gln Thr Leu Asn Val Leu Lys Thr Thr Lys Tyr Asn Arg Tyr Ala Leu Ala Leu Ala Phe Thr Met Asp Glu Ile Asn Arg Asn Pro His Ile Leu Pro Asn Met Ser Leu Ile Ile Lys His Thr Leu Gly His Cys Asp Gly Asn Ile Pro Leu Arg Leu Leu Asn Gln Ile Phe Tyr Met Pro Phe Pro Asn Tyr Gly Cys Asn Glu Glu Thr Met Cys Ser Phe Met Leu Met Gly Pro Asn Leu Trp Pro Ser Val Asp Phe Phe Ile His Leu Asn Ile Leu Phe Pro His Phe Leu Gln Ile Ser Phe Gly Pro Phe His Ser Ile Phe Ser Asp Asn Glu Gln Phe Pro Tyr Ile Tyr Gln Met Thr Pro Lys Asp Thr Ser Leu Ala Leu Ala Met Val Ser Phe Ile Leu Tyr Phe Asn Trp Asn Trp Val Gly Leu Val Leu Ser Asp Asn Asp Glu Gly Asn Gln Phe Leu Thr Glu Leu Lys Lys Glu Thr His Asn Thr Glu Ile Cys Phe Ala Phe Val Asn Met Met Ala Ile Asn Glu Asn Ser Ser Met Lys Lys Thr Asp Met Tyr Tyr Asn Gln Ile Val Met Ser Thr Ala Asn Val Ile Ile Ile Tyr Gly Glu Arg Pro Ser Ile Ile Glu Leu Cys Phe Arg Thr Trp Thr Ser Pro Val Ile Gln Arg Ile Trp Val Thr Lys Ser Glu Leu Tyr Phe Pro Thr Ser Lys Arg Asp Leu Ser His Gly Thr Phe Tyr Gly Thr Leu Ala Phe Gln Gln His His Asp Val Ile Ser Gly Phe Lys Asn Phe Val Gln Thr Trp Tyr His Leu Lys Ser Met Asp Leu Tyr Leu Leu Lys Pro Glu Trp Gly Phe Phe Glu Tyr Glu Thr Ser Ala Ser Tyr Cys Lys Ile Leu Met Ser Asn Ser Ser Asn Val Ser Leu Glu Trp Leu Met Glu Gln Lys Phe Asp Ile Ala Phe Asn Asp Asn Ser His Ser Ile Tyr Asn Ala Val Tyr Ala Met Ala His Ala Leu His Glu Lys Asn Leu Lys Gln Ile Asp Asn Gln Glu Ile Ser Tyr Gly Lys Gly Ala Ser Thr His Cys Leu Lys Leu His Ser Phe Leu Arg Thr Ile His Phe Thr Asn Pro Phe Gly Glu Arg Val Ile Met Lys Glu Arg Val Arg Val Gln Glu Asp Tyr Asp Ile Val His Leu Gln Asn Cys Ser Gln His Leu Arg Ile Lys Val Lys Ile Gly Gln Phe Ser Pro Tyr Phe Pro His Gly Gly Gln Phe His Leu Tyr Glu Asp Met Ile Asp Leu Ala Thr Gly
Ser Arg Lys Met Pro Leu Ser Met Cys Ser Ala Asp Cys Arg Pro Gly Tyr Arg Lys Phe Trp Lys Glu Gly Met Ala Ala Cys Cys Phe Val Cys Ser Pro Cys Pro Asp Asn Glu Ile Ser Asn Glu Thr Thr Val Val Leu Trp Val Phe Val Lys His His Asp Thr Pro Ile Val Lys Ala Asn Asn Arg Ile Leu Ser Tyr Ile Leu Ile Met Ser Leu Met Phe Cys Phe Leu Cys Ser Phe Phe Phe Ile Gly His Pro Asn Arg Gly Thr Cys Ile Leu Gln Gln Ile Thr Phe Gly Ile Val Phe Thr Val Ala Val Ser Thr Val Leu Ala Lys Thr Ile Thr Val Leu Leu Ala Phe Gln Val Thr Asp Thr Gly Arg Lys Leu Arg Asn Phe Leu Val Ser Gly Thr Pro Asn Tyr Ile Ile Pro Ile Cys Ser Leu Leu Gln Cys Thr Leu Cys Ala Ile Trp Leu Ala Val Ser Pro Pro Phe Val Asp Ile Asp Glu His Ser Glu His Gly His Ile Ile Ile Val Cys Asn Lys Gly Ser Val Met Ala Phe Tyr Cys Val Leu Gly Tyr Leu Ala Phe Leu Ala Leu Gly Ser Phe Thr Met Ala Phe Leu Ala Lys Asn Leu Pro Asp Thr Phe Asn Glu Ala Lys Phe Leu Thr Phe Ser Met Leu Val Phe Cys Ser Val Trp Ile Thr
Phe Leu Pro Val Tyr His Ser Thr Lys Gly Arg Val Met Val Ala Val Glu Ile Phe Ser Ile Leu Thr Ser Ser Ala Gly Met Leu Gly Cys Val Phe Ala Pro Lys Ile Tyr Ile Ile Leu Met Lys Pro Glu Arg Ile Leu Ser Lys Arg Gln Glu Lys Ser Arg Phe
(2) INFORMATION FOR SEQ ID NO:47:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 3359 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: cDNA
(ix) FEATURE:
(A) NAME/KEY: Coding Sequence
(B) LOCATION: 59...2452
(D) OTHER INFORMATION: GoVN13C
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:47:
Met Val Ile Phe Phe Leu Leu Asn Ile Pro Phe Leu Leu Ala Asn Phe Met Asp Pro Arg Cys Phe Trp Lys Ile Asn Leu Asn Glu Ile Lys Asp Glu Val Leu Gly Met Thr Cys Ser Phe Ile Leu Glu Thr Val Gln Lys Thr Met Asp Lys Asp Tyr Phe Asn Gln Thr Leu Asn Val Leu Asn Thr Thr Thr Asn His Lys Tyr Ala Leu Ala Leu Ala Phe Thr Val Asp Glu Ile Asn Arg Asn Pro Asp Leu Leu Pro Asn Met Ser Leu Ile Ile Lys Tyr Asn Leu Gly His Cys Asp Gly Lys Thr Val Thr Thr Leu Ser Asp Leu Phe Asn Pro Asn Asn His Leu His Phe Pro Asn Tyr Leu Cys Asn Glu AGG GAT TAT GTG TTT GGT TCT GCT TAC AGG ACC ACA TTG GAG AGC ATC 492 Gly Ile Met Cys Leu Val Leu Leu Thr Gly Pro His Trp Arg Ala Ser TTT CCT
Leu Tyr Leu Trp Ile Ser Val Tyr Val Tyr Leu Ser Pro His Phe Leu TGA ACA
Gln Leu Ser Tyr Gly Pro Phe Tyr Ser Ile Phe Ser Asp Asn Glu Gln AGC ATT
Tyr Pro Tyr Leu Tyr Gln Met Gly Pro Lys Asp Ser Ser Leu Ala Leu TGG GCT
Ala Met Val Ser Phe Ile Ile Tyr Phe Lys Trp Asn Trp Val Gly Leu GTT GAA
Phe Ile Ser Asp Asp Asp Gln Gly Asn Gln Phe Leu Ser Glu Leu Lys CAT GAT
Lys Glu Ser Gln Thr Lys Asp Ile Cys Phe Ala Phe Val Asn Met Ile CTA CAA
Ser Val Ser Asp Val Ser Tyr Tyr His Lys Thr Glu Met Tyr Tyr Asn GGA AAC
Gln Ile Val Met Ser Ser Thr Lys Val Ile Ile Ile Tyr Gly Glu Thr AGT TAA
Asn Ser Ile Ile Glu Leu Ser Phe Arg Met Trp Ser Ser Pro Val Lys CAG TAA
Gln Arg Ile Trp Val Thr Thr Lys Gln Phe Asp Cys Pro Thr Ser Lys TCT ACA
Arg Asp Leu Thr His Gly Thr Phe Tyr Gly Thr Leu Thr Phe Leu His ACG GTA
His Tyr Gly Glu Ile Ser Gly Phe Lys Asn Phe Val Gln Thr Arg Tyr GAA ATA
Asn Leu Arg Ser Thr Asp Leu Tyr Leu Val Met Pro Glu Trp Lys Tyr AAA CTA
Phe Asn Tyr Glu Ala Ser Ala Ser Asn Cys Lys Ile Leu Arg Asn Tyr TGA CAT
Leu Ser Asn Ile Ser Leu Glu Trp Leu Met Glu Gln Lys Phe Asp Met TGC CAT
Ser Phe Ser Asp Tyr Ser His Asn Ile Tyr Asn Ala Val Tyr Ala Ile
Ala His Ala Leu His Glu Lys Asn Leu Gln Glu Val Glu Asn Gln Ala Ile Asn Asn Ala Lys Gly Glu Asn Thr His Cys Leu Lys Leu Asn Ser Phe Leu Arg Lys Thr His Phe Thr Asn Ser Leu Gly Asn Arg Val Ile Met Lys Gln Arg Glu Val Val His Gly Asp Tyr Asn Ile Val His Met Trp Asn Phe Ser Gln Arg Leu Gly Ile Lys Val Lys Ile Gly Gln Phe Ser Pro His Phe Pro Gln Gly Gln Gln Leu His Leu Tyr Val Asp Met Thr Glu Leu Ala Thr Gly Ser Arg Lys Met Pro Ser Ser Val Cys Ser Ala Asp Cys His Pro Gly Phe Arg Arg Ile Trp Lys Glu Glu Met Ala Ala Cys Cys Phe Val Cys Asn Pro Cys Pro Glu Asn Glu Ile Ser Asn Glu Thr Met Val Val Phe Trp Val Phe Val Lys His His Asp Thr Pro Ile Val Lys Ala Asn Asn Arg Ile Leu Ser Tyr Leu Leu Ile Val Ser Leu Met Phe Cys Phe Leu Cys Ser Phe Phe Phe Ile Gly Tyr Pro Asn Arg Ala Thr Cys Ile Leu Gln Gln Ile Thr Phe Gly Ile Phe Phe Thr Val Ala Ile Ser Thr Val Leu Ala Lys Thr Ile Thr Val Val Leu Ala Phe Lys Val Thr Asp Pro Gly Arg Gln Leu Arg Ile Phe Leu Val Ser Gly Thr Pro Asn Tyr Ile Ile Pro Ile Cys Ser Leu Leu Gln Cys Ile TAT TGA
Leu Cys Ala Ile Trp Leu Ala Val Ser Pro Pro Phe Val Asp Ile Asp GGG CTC
Glu His Ser Glu His Gly His Ile Ile Ile Val Cys Asn Lys Gly Ser GGC CTT
Ile Thr Ala Phe Tyr Cys Val Leu Gly Tyr Leu Ala Cys Leu Ala Phe CAC ATT
Gly Ser Phe Thr Ile Ala Phe Leu Ala Lys Asn Leu Pro Asp Thr Phe CGC TGT
Asn Glu Ala Lys Phe Leu Thr Phe Ser Met Leu Val Phe Cys Ala Val GGT CAT
Trp Val Thr Phe Leu Pro Val Tyr His Ser Thr Lys Gly Lys Val Met GAT GCT
Val Ala Val Glu Ile Phe Ser Ile Leu Ala Ser Ser Ala Gly Met Leu ACC AGA
Gly Cys Ile Phe Ala Pro Lys Val Tyr Ile Ile Leu Met Arg Pro Asp TGAAAAGGTA
Arg Asn Ser Ile His Lys Ile Arg Glu Lys Ser Tyr Phe AGTTACAGAG
GAAGTACCAT
GCTTAGTATC
TTTGCTTTCA
ATATAACCTT
TAACTAAAAA
TACTTGACAG
GCTGAAAATG
ATCTGAGAAC
CCATAGGAAT
CCACTAACAA
ATTGCCTGGT
GGATTGGGGA
TATTGGATGA
AAAAAAA
(2) INFORMATION FOR SEQ ID NO:48:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 798 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: protein
(v) FRAGMENT TYPE: internal
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:48:
Met Val Ile Phe Phe Leu Leu Asn Ile Pro Phe Leu Leu Ala Asn Phe Met Asp Pro Arg Cys Phe Trp Lys Ile Asn Leu Asn Glu Ile Lys Asp Glu Val Leu Gly Met Thr Cys Ser Phe Ile Leu Glu Thr Val Gln Lys Thr Met Asp Lys Asp Tyr Phe Asn Gln Thr Leu Asn Val Leu Asn Thr Thr Thr Asn His Lys Tyr Ala Leu Ala Leu Ala Phe Thr Val Asp Glu Ile Asn Arg Asn Pro Asp Leu Leu Pro Asn Met Ser Leu Ile Ile Lys Tyr Asn Leu Gly His Cys Asp Gly Lys Thr Val Thr Thr Leu Ser Asp Leu Phe Asn Pro Asn Asn His Leu His Phe Pro Asn Tyr Leu Cys Asn Glu Gly Ile Met Cys Leu Val Leu Leu Thr Gly Pro His Trp Arg Ala Ser Leu Tyr Leu Trp Ile Ser Val Tyr Val Tyr Leu Ser Pro His Phe Leu Gln Leu Ser Tyr Gly Pro Phe Tyr Ser Ile Phe Ser Asp Asn Glu Gln Tyr Pro Tyr Leu Tyr Gln Met Gly Pro Lys Asp Ser Ser Leu Ala Leu Ala Met Val Ser Phe Ile Ile Tyr Phe Lys Trp Asn Trp Val Gly Leu Phe Ile Ser Asp Asp Asp Gln Gly Asn Gln Phe Leu Ser Glu Leu Lys Lys Glu Ser Gln Thr Lys Asp Ile Cys Phe Ala Phe Val Asn Met Ile Ser Val Ser Asp Val Ser Tyr Tyr His Lys Thr Glu Met Tyr Tyr Asn Gln Ile Val Met Ser Ser Thr Lys Val Ile Ile Ile Tyr Gly Glu Thr Asn Ser Ile Ile Glu Leu Ser Phe Arg Met Trp Ser Ser Pro Val Lys Gln Arg Ile Trp Val Thr Thr Lys Gln Phe Asp Cys Pro Thr Ser Lys Arg Asp Leu Thr His Gly Thr Phe Tyr Gly Thr Leu Thr Phe Leu His His Tyr Gly Glu Ile Ser Gly Phe Lys Asn Phe Val Gln Thr Arg Tyr Asn Leu Arg Ser Thr Asp Leu Tyr Leu Val Met Pro Glu Trp Lys Tyr Phe Asn Tyr Glu Ala Ser Ala Ser Asn Cys Lys Ile Leu Arg Asn Tyr Leu Ser Asn Ile Ser Leu Glu Trp Leu Met Glu Gln Lys Phe Asp Met Ser Phe Ser Asp Tyr Ser His Asn Ile Tyr Asn Ala Val Tyr Ala Ile Ala His Ala Leu His Glu Lys Asn Leu Gln Glu Val Glu Asn Gln Ala Ile Asn Asn Ala Lys Gly Glu Asn Thr His Cys Leu Lys Leu Asn Ser Phe Leu Arg Lys Thr His Phe Thr Asn Ser Leu Gly Asn Arg Val Ile Met Lys Gln Arg Glu Val Val His Gly Asp Tyr Asn Ile Val His Met Trp Asn Phe Ser Gln Arg Leu Gly Ile Lys Val Lys Ile Gly Gln Phe Ser Pro His Phe Pro Gln Gly Gln Gln Leu His Leu Tyr Val Asp Met Thr Glu Leu 
Ala Thr Gly Ser Arg Lys Met Pro Ser Ser Val Cys Ser Ala Asp Cys His Pro Gly Phe Arg Arg Ile Trp Lys Glu Glu Met
Ala Ala Cys Cys Phe Val Cys Asn Pro Cys Pro Glu Asn Glu Ile Ser Asn Glu Thr Met Val Val Phe Trp Val Phe Val Lys His His Asp Thr Pro Ile Val Lys Ala Asn Asn Arg Ile Leu Ser Tyr Leu Leu Ile Val Ser Leu Met Phe Cys Phe Leu Cys Ser Phe Phe Phe Ile Gly Tyr Pro Asn Arg Ala Thr Cys Ile Leu Gln Gln Ile Thr Phe Gly Ile Phe Phe Thr Val Ala Ile Ser Thr Val Leu Ala Lys Thr Ile Thr Val Val Leu Ala Phe Lys Val Thr Asp Pro Gly Arg Gln Leu Arg Ile Phe Leu Val Ser Gly Thr Pro Asn Tyr Ile Ile Pro Ile Cys Ser Leu Leu Gln Cys Ile Leu Cys Ala Ile Trp Leu Ala Val Ser Pro Pro Phe Val Asp Ile Asp Glu His Ser Glu His Gly His Ile Ile Ile Val Cys Asn Lys Gly Ser Ile Thr Ala Phe Tyr Cys Val Leu Gly Tyr Leu Ala Cys Leu Ala Phe Gly Ser Phe Thr Ile Ala Phe Leu Ala Lys Asn Leu Pro Asp Thr Phe Asn Glu Ala Lys Phe Leu Thr Phe Ser Met Leu Val Phe Cys Ala Val Trp Val Thr Phe Leu Pro Val Tyr His Ser Thr Lys Gly Lys Val Met Val Ala Val Glu Ile Phe Ser Ile Leu Ala Ser Ser Ala Gly Met Leu Gly Cys Ile Phe Ala Pro Lys Val Tyr Ile Ile Leu Met Arg Pro Asp Arg Asn Ser Ile His Lys Ile Arg Glu Lys Ser Tyr Phe
(2) INFORMATION FOR SEQ ID NO:49:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 3012 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: cDNA
(ix) FEATURE:
(A) NAME/KEY: Coding Sequence
(B) LOCATION: 3...2087
(D) OTHER INFORMATION: GoVN13B
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:49:
Val Tyr Leu Ser Pro His Phe Leu Gln Leu Ser Tyr Gly Pro Phe Tyr Ser Ile Phe Ser Asp Asn Glu Gln Tyr Pro Tyr Leu Tyr Gln Met Gly Pro Lys Asp Ser Ser Leu Ala Leu Ala Met Val Ser Phe Ile Ile AAG GAT CAA
Tyr Phe Trp Asn Trp Val Gly Leu Phe Ile Ser Asp Asp Lys Asp Gln CAA AAG GAT
Gly Asn Phe Leu Ser Glu Leu Lys Lys Glu Ser Gln Thr Gln Lys Asp ATT TGC GCC TTT GTG AAC ATG ATA TCA GTC AGT GAT GTT 287 TTT TCA TAC
Ile Cys Ala Phe Val Asn Met Ile Ser Val Ser Asp Val Phe Ser Tyr AAA TCC ACA
Tyr His Thr Glu Met Tyr Tyr Asn Gln Ile Val Met Ser Lys Ser Thr ATT TTG AGC
Lys Val Ile Ile Tyr Gly Glu Thr Asn Ser Ile Ile Glu Ile Leu Ser ATG ACC ACA
Phe Arg Trp Ser Ser Pro Val Lys Gln Arg Ile Trp Val Met Thr Thr TTT GGC ACA
Lys Gln Asp Cys Pro Thr Ser Lys Arg Asp Leu Thr His Phe Gly Thr GGG TCT GGC
Phe Tyr Thr Leu Thr Phe Leu His His Tyr Gly Glu Ile Gly Ser Gly AAT GAT TTA
Phe Lys Phe Val Gln Thr Arg Tyr Asn Leu Arg Ser Thr Asn Asp Leu GTA TCA GCA
Tyr Leu Met Pro Glu Trp Lys Tyr Phe Asn Tyr Glu Ala Val Ser Ala TGT CTG GAA
Ser Asn Lys Ile Leu Arg Asn Tyr Leu Ser Asn Ile Ser Cys Leu Glu ATG AGT CAC
Trp Leu Glu Gln Lys Phe Asp Met Ser Phe Ser Asp Tyr Met Ser His TAC GAG AAA
Asn Ile Asn Ala Val Tyr Ala Ile Ala His Ala Leu His Tyr Glu Lys CAA GGA GAA
Asp Leu Glu Phe Glu Asn Gln Ala Ile Asn Asn Ala Lys Gln Gly Glu CAC CAC TTC
Asn Thr Cys Leu Lys Leu Asn Ser Phe Leu Arg Lys Thr His His Phe TCT GTA GTG
Thr Asn Leu Gly Asn Arg Val Ile Met Lys Gln Arg Glu Ser Val Val GAC CGC CTT
HisGly Tyr IleValHis MetTrpAsn Phe Gln Leu Asp Asn Ser Arg TTT
GlyIleLysVal LysIleGlyGln PheSerPro His ProGlnGly Phe GCT
GlnGlnLeuHis LeuTyrValAsp MetThrGlu Leu ThrGlySer Ala CAT
ArgLysMetPro SerSerValCys SerAlaAsp Cys ProGlyPhe His TTT
ArgArgIleTrp LysGluGluMet AlaAlaCys Cys ValCysAsn Phe ATG
ProCysProGlu AsnGluIleSer AsnGluThr Asn AspGlnCys Met AAG
AlaAsnCysPro GluTyrGlnTyr AlaAsnThr Glu AsnLysCys Lys CCC
IleGlnLysGly ValIleValLeu SerTyrGlu Asp LeuGlyMet Pro ACA
AlaLeuAlaLeu IleAlaPheCys PheSerAla Phe ValValVal Thr GTG
PheTrpValPhe ValLysHisHis AspThrPro Ile LysAlaAsn Val ATG
AsnArgIleLeu SerTyrLeuLeu IleValSer Leu PheCysPhe Met GCA
LeuCysSerPhe PhePheIleGly TyrProAsn Arg ThrCysIle Ala GCT
LeuGlnGlnIle ThrPheGlyIle PhePheThr Val IleSerThr Ala AAA
ValLeuAlaLys ThrIleThrVal ValLeuAla Phe ValThrAsp Lys ACA
ProGlyArgGln LeuArgIlePhe LeuValSer Gly ProAsnTyr Thr TGT
IleIleProIle CysSerLeuLeu GlnCysIle Leu AlaIleTrp Cys CAC
LeuAlaValSer ProProPheVal AspIleAsp Glu SerGluHis
His Gly His Ile Ile Ile Val Cys Asn Lys Gly Ser Ile Thr Ala Phe Tyr Cys Val Leu Gly Tyr Leu Ala Cys Leu Ala Phe Gly Ser Phe Thr Ile Ala Phe Leu Ala Lys Asn Leu Pro Asp Thr Phe Asn Glu Ala Lys Phe Leu Thr Phe Ser Met Leu Val Phe Cys Ala Val Trp Val Thr Phe Leu Pro Val Tyr His Ser Thr Lys Gly Lys Val Met Val Ala Val Glu Ile Phe Ser Ile Leu Ala Ser Ser Ala Gly Met Leu Gly Cys Ile Phe Ala Pro Lys Val Tyr Ile Ile Leu Met Arg Pro Asp Arg Asn Ser Ile His Lys Ile Arg Glu Lys Ser Tyr Phe CATATATCTA
ATGTACTGAT
TTCTGGAAGA
TGAGGATTTC
ACATAGAAAG
TCAACAAAGA
AATACTGTCT
TTCTTTATCT
TCCTGTGGTT
TACAAAGCAG
AGTCAGCCTA
TCCAGCCTCA
AGGTCTGGGG
AGAATGAATC
19~AAAAAAAAA
(2) INFORMATION FOR SEQ ID NO:50:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 695 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: protein
(v) FRAGMENT TYPE: internal
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:50:
Val Tyr Leu Ser Pro His Phe Leu Gln Leu Ser Tyr Gly Pro Phe Tyr Ser Ile Phe Ser Asp Asn Glu Gln Tyr Pro Tyr Leu Tyr Gln Met Gly Pro Lys Asp Ser Ser Leu Ala Leu Ala Met Val Ser Phe Ile Ile Tyr Phe Lys Trp Asn Trp Val Gly Leu Phe Ile Ser Asp Asp Asp Gln Gly Asn Gln Phe Leu Ser Glu Leu Lys Lys Glu Ser Gln Thr Lys Asp Ile Cys Phe Ala Phe Val Asn Met Ile Ser Val Ser Asp Val Ser Tyr Tyr His Lys Thr Glu Met Tyr Tyr Asn Gln Ile Val Met Ser Ser Thr Lys Val Ile Ile Ile Tyr Gly Glu Thr Asn Ser Ile Ile Glu Leu Ser Phe Arg Met Trp Ser Ser Pro Val Lys Gln Arg Ile Trp Val Thr Thr Lys Gln Phe Asp Cys Pro Thr Ser Lys Arg Asp Leu Thr His Gly Thr Phe Tyr Gly Thr Leu Thr Phe Leu His His Tyr Gly Glu Ile Ser Gly Phe Lys Asn Phe Val Gln Thr Arg Tyr Asn Leu Arg Ser Thr Asp Leu Tyr Leu Val Met Pro Glu Trp Lys Tyr Phe Asn Tyr Glu Ala Ser Ala Ser Asn Cys Lys Ile Leu Arg Asn Tyr Leu Ser Asn Ile Ser Leu Glu Trp Leu Met Glu Gln Lys Phe Asp Met Ser Phe Ser Asp Tyr Ser His Asn Ile Tyr Asn Ala Val Tyr Ala Ile Ala His Ala Leu His Glu Lys Asp Leu Gln Glu Phe Glu Asn Gln Ala Ile Asn Asn Ala Lys Gly Glu Asn Thr His Cys Leu Lys Leu Asn Ser Phe Leu Arg Lys Thr His Phe Thr Asn Ser Leu Gly Asn Arg Val Ile Met Lys Gln Arg Glu Val Val His Gly Asp Tyr Asn Ile Val His Met Trp Asn Phe Ser Gln Arg Leu Gly Ile Lys Val Lys Ile Gly Gln Phe Ser Pro His Phe Pro Gln Gly Gln Gln Leu His Leu Tyr Val Asp Met Thr Glu Leu Ala Thr Gly Ser Arg Lys Met Pro Ser Ser Val Cys Ser Ala Asp Cys His Pro Gly Phe Arg Arg Ile Trp Lys Glu Glu Met Ala Ala Cys Cys Phe Val Cys Asn Pro Cys Pro Glu Asn Glu Ile Ser Asn Glu Thr Asn Met Asp Gln Cys Ala Asn Cys Pro Glu Tyr Gln Tyr Ala Asn Thr Glu Lys Asn Lys Cys Ile Gln Lys Gly Val Ile Val Leu Ser Tyr Glu Asp Pro Leu Gly Met Ala Leu Ala Leu Ile Ala Phe Cys Phe Ser Ala Phe Thr Val Val Val Phe Trp Val Phe Val Lys His His Asp Thr Pro Ile Val Lys Ala Asn Asn Arg Ile Leu Ser Tyr Leu Leu Ile Val Ser Leu Met Phe Cys Phe Leu Cys Ser Phe Phe Phe Ile Gly Tyr Pro Asn Arg Ala Thr Cys Ile Leu Gln Gln Ile Thr 
Phe Gly Ile Phe Phe Thr Val Ala Ile Ser Thr Val Leu Ala Lys Thr Ile Thr Val Val Leu Ala Phe Lys Val Thr Asp Pro Gly Arg Gln Leu Arg Ile Phe Leu Val Ser Gly Thr Pro Asn Tyr Ile
Ile Pro Ile Cys Ser Leu Leu Gln Cys Ile Leu Cys Ala Ile Trp Leu Ala Val Ser Pro Pro Phe Val Asp Ile Asp Glu His Ser Glu His Gly His Ile Ile Ile Val Cys Asn Lys Gly Ser Ile Thr Ala Phe Tyr Cys Val Leu Gly Tyr Leu Ala Cys Leu Ala Phe Gly Ser Phe Thr Ile Ala Phe Leu Ala Lys Asn Leu Pro Asp Thr Phe Asn Glu Ala Lys Phe Leu Thr Phe Ser Met Leu Val Phe Cys Ala Val Trp Val Thr Phe Leu Pro Val Tyr His Ser Thr Lys Gly Lys Val Met Val Ala Val Glu Ile Phe Ser Ile Leu Ala Ser Ser Ala Gly Met Leu Gly Cys Ile Phe Ala Pro Lys Val Tyr Ile Ile Leu Met Arg Pro Asp Arg Asn Ser Ile His Lys Ile Arg Glu Lys Ser Tyr Phe
(2) INFORMATION FOR SEQ ID NO:51:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 435 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: cDNA
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:51:
(2) INFORMATION FOR SEQ ID NO:52:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 145 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:52:
Gln Thr Leu Ser Tyr Thr Leu Leu Val Ser Leu Thr Leu Cys Phe Leu Ser Ser Ser Leu Phe Ile Gly Arg Pro Ser Pro Ala Thr Cys Leu Leu Ser Gln Thr Thr Phe Ala Ala Val Phe Thr Val Ala Val Phe Phe Cys Arg Ala Phe Gln Ala Ile Arg Pro Glu Ser Arg Ile Arg Lys Trp Met Gly Pro Gln Lys Thr Asn Ser Val Val Phe Leu Cys Ser Phe Thr Gln Val Thr Leu Cys Gly Ile Trp Leu Gly Thr Glu Pro Pro Phe Val Asn Lys Asp Pro Gln Phe Met Pro Gly Tyr Ile Ile Ile Gln Cys Asn Glu Gly Ser Val Thr Ala Phe Tyr Ser Val Leu Gly Tyr Leu Gly Phe Leu Val Leu Gly Ser Leu Ala Val Ala Phe Leu Ala Arg Asn Leu Pro Asp Ala
(2) INFORMATION FOR SEQ ID NO:53:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 474 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: cDNA
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:53:
(2) INFORMATION FOR SEQ ID NO:54:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 338 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: cDNA
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:54:
(2) INFORMATION FOR SEQ ID NO:55:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 182 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:55:
(2) INFORMATION FOR SEQ ID NO:56:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 37 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:56:
GACAAAATAT GAATTCT
(2) INFORMATION FOR SEQ ID NO:57:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 37 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:57:
GTACTCTTCA GAATTCT
(2) INFORMATION FOR SEQ ID NO:58:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 51 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(v) FRAGMENT TYPE: internal
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:58:
Asn Met Asp Gln Cys Ala Asn Cys Pro Glu Tyr Gln Tyr Ala Asn Thr Glu Lys Asn Lys Cys Ile Gln Lys Gly Val Ile Val Leu Ser Tyr Glu Asp Pro Leu Gly Met Ala Leu Ala Leu Ile Ala Phe Cys Phe Ser Ala Phe Thr Val
(2) INFORMATION FOR SEQ ID NO:59:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 1079 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: protein
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:59:
Met Ala Ser Tyr Ser Cys Cys Leu Ala Leu Leu Ala Leu Ala Trp His Ser Ser Ala Tyr Gly Pro Asp Gln Arg Ala Gln Lys Lys Gly Asp Ile
Ile Leu Gly Gly Leu Phe Pro Ile His Phe Gly Val Ala Ala Lys Asp Gln Asp Leu Lys Ser Arg Pro Glu Ser Val Glu Cys Ile Arg Tyr Asn Phe Arg Gly Phe Arg Trp Leu Gln Ala Met Ile Phe Ala Ile Glu Glu Ile Asn Ser Ser Pro Ser Leu Leu Pro Asn Met Thr Leu Gly Tyr Arg Ile Phe Asp Thr Cys Asn Thr Val Ser Lys Ala Leu Glu Ala Thr Leu Ser Phe Val Ala Gln Asn Lys Ile Asp Ser Leu Asn Leu Asp Glu Phe Cys Asn Cys Ser Glu His Ile Pro Ser Thr Ile Ala Val Val Gly Ala Thr Gly Ser Gly Val Ser Thr Ala Val Ala Asn Leu Leu Gly Leu Phe Tyr Ile Pro Gln Val Ser Tyr Ala Ser Ser Ser Arg Leu Leu Ser Asn Lys Asn Gln Tyr Lys Ser Phe Leu Arg Thr Ile Pro Asn Asp Glu His Gln Ala Thr Ala Met Ala Asp Ile Ile Glu Tyr Phe Arg Trp Asn Trp Val Gly Thr Ile Ala Ala Asp Asp Asp Tyr Gly Arg Pro Gly Ile Glu Lys Phe Arg Glu Glu Ala Glu Glu Arg Asp Ile Cys Ile Asp Phe Ser Glu Leu Ile Ser Gln Tyr Ser Asp Glu Glu Glu Ile Gln Gln Val Val Glu Val Ile Gln Asn Ser Thr Ala Lys Val Ile Val Val Phe Ser Ser Gly Pro Asp Leu Glu Pro Leu Ile Lys Glu Ile Val Arg Arg Asn Ile Thr Gly Arg Ile Trp Leu Ala Ser Glu Ala Trp Ala Ser Ser Ser Leu Ile Ala Met Pro Glu Tyr Phe His Val Val Gly Gly Thr Ile Gly Phe Gly Leu Lys Ala Gly Gln Ile Pro Gly Phe Arg Glu Phe Leu Gln Lys Val His Pro Arg Lys Ser Val His Asn Gly Phe Ala Lys Glu Phe Trp Glu Glu Thr Phe Asn Cys His Leu Gln Glu Gly Ala Lys Gly Pro Leu Pro Val Asp Thr Phe Val Arg Ser His Glu Glu Gly Gly Asn Arg Leu Leu Asn Ser Ser Thr Ala Phe Arg Pro Leu Cys Thr Gly Asp Glu Asn Ile Asn Ser Val Glu Thr Pro Tyr Met Asp Tyr Glu His Leu Arg Ile Ser Tyr Asn Val Tyr Leu Ala Val Tyr Ser Ile Ala His Ala Leu Gln Asp Ile Tyr Thr Cys Leu Pro Gly Arg Gly Leu Phe Thr Asn Gly Ser Cys Ala Asp Ile Lys Lys Val Glu Ala Trp Gln Val Leu Lys His Leu Arg His Leu Asn Phe Thr Asn Asn Met Gly Glu Gln Val Thr Phe Asp Glu Cys Gly Asp Leu Val Gly Asn Tyr Ser Ile Ile Asn Trp His Leu Ser Pro Glu Asp Gly Ser Ile Val Phe Lys Glu Val Gly Tyr Tyr Asn Val Tyr Ala Lys Lys Gly Glu Arg Leu Phe Ile Asn Glu Glu Lys Ile Leu Trp Ser Gly
Phe Ser Arg Glu Val Pro Phe Ser Asn Cys Ser Arg Asp Cys Gln Ala Gly Thr Arg Lys Gly Ile Ile Glu Gly Glu Pro Thr Cys Cys Phe Glu Cys Val Glu Cys Pro Asp Gly Glu Tyr Ser Gly Glu Thr Asp Ala Ser Ala Cys Asp Lys Cys Pro Asp Asp Phe Trp Ser Asn Glu Asn His Thr Ser Cys Ile Ala Lys Glu Ile Glu Phe Leu Ala Trp Thr Glu Pro Phe Gly Ile Ala Leu Thr Leu Phe Ala Val Leu Gly Ile Phe Leu Thr Ala Phe Val Leu Gly Val Phe Ile Lys Phe Arg Asn Thr Pro Ile Val Lys Ala Thr Asn Arg Glu Leu Ser Tyr Leu Leu Leu Phe Ser Leu Leu Cys Cys Phe Ser Ser Ser Leu Phe Phe Ile Gly Glu Pro Gln Asp Trp Thr Cys Arg Leu Arg Gln Pro Ala Phe Gly Ile Ser Phe Val Leu Cys Ile Ser Cys Ile Leu Val Lys Thr Asn Arg Val Leu Leu Val Phe Glu Ala Lys Ile Pro Thr Ser Phe His Arg Lys Trp Trp Gly Leu Asn Leu Gln Phe Leu Leu Val Phe Leu Cys Thr Phe Met Gln Ile Leu Ile Cys Ile Ile Trp Leu Tyr Thr Ala Pro Pro Ser Ser Tyr Arg Asn His Glu Leu Glu Asp Glu Ile Ile Phe Ile Thr Cys His Glu Gly Ser Leu Met Ala Leu Gly Ser Leu Ile Gly Tyr Thr Cys Leu Leu Ala Ala Ile Cys Phe Phe Phe Ala Phe Lys Ser Arg Lys Leu Pro Glu Asn Phe Asn Glu Ala Lys Phe Ile Thr Phe Ser Met Leu Ile Phe Phe Ile Val Trp Ile Ser Phe Ile Pro Ala Tyr Ala Ser Thr Tyr Gly Lys Phe Val Ser Ala Val Glu Val Ile Ala Ile Leu Ala Ala Ser Phe Gly Leu Leu Ala Cys Ile Phe Phe Asn Lys Val Tyr Ile Ile Leu Phe Lys Pro Ser Arg Asn Thr Ile Glu Glu Val Arg Ser Ser Thr Ala Ala His Ala Phe Lys Val Ala Ala Arg Ala Thr Leu Arg Arg Pro Asn Ile Ser Arg Lys Arg Ser Ser Ser Leu Gly Gly Ser Thr Gly Ser Ile Pro Ser Ser Ser Ile Ser Ser Lys Ser Asn Ser Glu Asp Arg Phe Pro Gln Pro Glu Arg Gln Lys Gln Gln Gln Pro Leu Ser Leu Thr Gln Gln Glu Gln Gln Gln Gln Pro Leu Thr Leu His Pro Gln Gln Gln Gln Gln Pro Gln Gln Pro Arg Cys Lys Gln Lys Val Ile Phe Gly Ser Gly Thr Val Thr Phe Ser Leu Ser Phe Asp Glu Pro Gln Lys Asn Ala Met Ala His Arg Asn Ser Met Arg Gln Asn Ser Leu Glu Ala Gln Arg Ser Asn Asp Thr Leu Gly Arg His Gln Ala Leu Leu Pro Leu Gln Cys Ala Asp Ala Asp Ser Glu
Met Thr Ile Gln Glu Thr Gly Leu Gln Gly Pro Met Val Gly Asp His Gln Pro Glu Met Glu Ser Ser Asp Glu Met Ser Pro Ala Leu Val Met Ser Thr Ser Arg Ser Phe Val Ile Ser Gly Gly Gly Ser Ser Val
Thr Glu Asn Val Leu His Ser
(2) INFORMATION FOR SEQ ID NO:60:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 26 base pairs (B) TYPE: nucleic acid (C) STRANDEDNESS: single (D) TOPOLOGY: linear (ix) FEATURE:
(A) NAME/KEY: Modified Base (B) LOCATION: 3...3 (D) OTHER INFORMATION: Inosine (A) NAME/KEY: Modified Base (B) LOCATION: 12...12 (D) OTHER INFORMATION: Inosine (A) NAME/KEY: Modified Base (B) LOCATION: 15...15 (D) OTHER INFORMATION: Inosine (A) NAME/KEY: Modified Base (B) LOCATION: 18...18 (D) OTHER INFORMATION: Inosine
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:60:
(2) INFORMATION FOR SEQ ID NO:61:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 26 base pairs (B) TYPE: nucleic acid (C) STRANDEDNESS: single (D) TOPOLOGY: linear (ix) FEATURE:
(A) NAME/KEY: Modified Base (B) LOCATION: 6...6 (D) OTHER INFORMATION: Inosine (A) NAME/KEY: Modified Base (B) LOCATION: 9...9 (D) OTHER INFORMATION: Inosine (A) NAME/KEY: Modified Base (B) LOCATION: 12...12 (D) OTHER INFORMATION: Inosine (A) NAME/KEY: Modified Base (B) LOCATION: 18...18 (D) OTHER INFORMATION: Inosine (A) NAME/KEY: Modified Base (B) LOCATION: 21...21 (D) OTHER INFORMATION: Inosine
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:61:
(2) INFORMATION FOR SEQ ID NO:62:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 26 base pairs (B) TYPE: nucleic acid (C) STRANDEDNESS: single (D) TOPOLOGY: linear (ix) FEATURE:
(A) NAME/KEY: Modified Base (B) LOCATION: 3...3 (D) OTHER INFORMATION: Inosine (A) NAME/KEY: Modified Base (B) LOCATION: 9...9 (D) OTHER INFORMATION: Inosine (A) NAME/KEY: Modified Base (B) LOCATION: 12...12 (D) OTHER INFORMATION: Inosine (A) NAME/KEY: Modified Base (B) LOCATION: 13...13 (D) OTHER INFORMATION: Inosine (A) NAME/KEY: Modified Base (B) LOCATION: 24...24 (D) OTHER INFORMATION: Inosine
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:62:
(2) INFORMATION FOR SEQ ID NO:63:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 31 base pairs (B) TYPE: nucleic acid (C) STRANDEDNESS: single (D) TOPOLOGY: linear (ix) FEATURE:
(A) NAME/KEY: Modified Base (B) LOCATION: 2...2 (D) OTHER INFORMATION: Inosine (A) NAME/KEY: Modified Base (B) LOCATION: 5...5 (D) OTHER INFORMATION: Inosine (A) NAME/KEY: Modified Base (B) LOCATION: 8...8 (D) OTHER INFORMATION: Inosine (A) NAME/KEY: Modified Base (B) LOCATION: 11...11 (D) OTHER INFORMATION: Inosine (A) NAME/KEY: Modified Base (B) LOCATION: 14...14 (D) OTHER INFORMATION: Inosine (A) NAME/KEY: Modified Base (B) LOCATION: 20...20 (D) OTHER INFORMATION: Inosine (A) NAME/KEY: Modified Base (B) LOCATION: 26...26 (D) OTHER INFORMATION: Inosine (A) NAME/KEY: Modified Base (B) LOCATION: 29...29 (D) OTHER INFORMATION: Inosine
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:63:
(2) INFORMATION FOR SEQ ID NO:64:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 26 base pairs (B) TYPE: nucleic acid (C) STRANDEDNESS: single (D) TOPOLOGY: linear (ix) FEATURE:
(A) NAME/KEY: Modified Base (B) LOCATION: 3...3 (D) OTHER INFORMATION: Inosine (A) NAME/KEY: Modified Base (B) LOCATION: 6...6 (D) OTHER INFORMATION: Inosine (A) NAME/KEY: Modified Base (B) LOCATION: 9...9 (D) OTHER INFORMATION: Inosine (A) NAME/KEY: Modified Base (B) LOCATION: 12...12 (D) OTHER INFORMATION: Inosine (A) NAME/KEY: Modified Base (B) LOCATION: 16...16 (D) OTHER INFORMATION: Inosine (A) NAME/KEY: Modified Base (B) LOCATION: 24...24 (D) OTHER INFORMATION: Inosine
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:64:
A~SNYTNR TNTTYNGYTT YYTNTG 26
(2) INFORMATION FOR SEQ ID NO:65:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 28 base pairs (B) TYPE: nucleic acid (C) STRANDEDNESS: single (D) TOPOLOGY: linear (ix) FEATURE:
(A) NAME/KEY: Modified Base (B) LOCATION: 2...2 (D) OTHER INFORMATION: Inosine (A) NAME/KEY: Modified Base (B) LOCATION: 5...5 (D) OTHER INFORMATION: Inosine (A) NAME/KEY: Modified Base (B) LOCATION: 11...11 (D) OTHER INFORMATION: Inosine (A) NAME/KEY: Modified Base (B) LOCATION: 17...17 (D) OTHER INFORMATION: Inosine (A) NAME/KEY: Modified Base (B) LOCATION: 20...20 (D) OTHER INFORMATION: Inosine (A) NAME/KEY: Modified Base (B) LOCATION: 23...23 (D) OTHER INFORMATION: Inosine
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:65:
RNATNSWRAA NAYYTCNACN RCNACCAT 28
(2) INFORMATION FOR SEQ ID NO:66:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 26 base pairs (B) TYPE: nucleic acid (C) STRANDEDNESS: single (D) TOPOLOGY: linear (ix) FEATURE:
(A) NAME/KEY: Modified Base (B) LOCATION: 6...6 (D) OTHER INFORMATION: Inosine (A) NAME/KEY: Modified Base (B) LOCATION: 9...9 (D) OTHER INFORMATION: Inosine (A) NAME/KEY: Modified Base (B) LOCATION: 12...12 (D) OTHER INFORMATION: Inosine (A) NAME/KEY: Modified Base (B) LOCATION: 15...15 (D) OTHER INFORMATION: Inosine (A) NAME/KEY: Modified Base (B) LOCATION: 21...21 (D) OTHER INFORMATION: Inosine
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:66:
(2) INFORMATION FOR SEQ ID NO:67:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 26 base pairs (B) TYPE: nucleic acid (C) STRANDEDNESS: single (D) TOPOLOGY: linear (ix) FEATURE:
(A) NAME/KEY: Modified Base (B) LOCATION: 3...3 (D) OTHER INFORMATION: Inosine (A) NAME/KEY: Modified Base (B) LOCATION: 6...6 (D) OTHER INFORMATION: Inosine (A) NAME/KEY: Modified Base (B) LOCATION: 12...12 (D) OTHER INFORMATION: Inosine (A) NAME/KEY: Modified Base (B) LOCATION: 15...15 (D) OTHER INFORMATION: Inosine (A) NAME/KEY: Modified Base (B) LOCATION: 24...24 (D) OTHER INFORMATION: Inosine
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:67:
(2) INFORMATION FOR SEQ ID NO:68:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 2550 base pairs (B) TYPE: nucleic acid (C) STRANDEDNESS: single (D) TOPOLOGY: linear (ii) MOLECULE TYPE: cDNA
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:68:
TGAAGTTTTC
ATAGTGAAGA
ATGAACCTAT
AATATGAGTT
TTTTACCCAA
GAGTTATGGA
GTTATTTAGA
AACTGGCAAT
GCGACCATGA
ATGGCATGGT
ATGATGACCA
TCTGTTTAGC
CAATATATGA
TGAACTCTAC
GGATCACAAC
TCCATGGGAT
TGCAAACAAT
ATTATTTTAA
ACACCTTGGA
ATTTGTACAA
TAGAGTCTCA
CCTTGATGAA
GGGAAAATCA
GATTAAAAGT
TATCTGATGA
GTAGTGTGGC
GCTTTGATTG
GTGTGAGGTG
CTGTATCATT
CCTTCTCAGC
CTGTGAAGGC
TTCTCTGCTC
CCACATTTGG
TGGTCATGGC
GGGCACCTAA
GGTTGGTCAC
TCATTCTTTG
CCTTGGCTCT
ATGAAGCCAA
TCCCTGTCTA
TGGCTTCTAG
TTAGACCAGA
(2) INFORMATION FOR SEQ ID NO:69:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 2424 base pairs (B) TYPE: nucleic acid (C) STRANDEDNESS: single (D) TOPOLOGY: linear (ii) MOLECULE TYPE: cDNA
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:69:
(2) INFORMATION FOR SEQ ID NO:70:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 2409 base pairs (B) TYPE: nucleic acid (C) STRANDEDNESS: single (D) TOPOLOGY: linear (ii) MOLECULE TYPE: cDNA
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:70:
TTCACTTTTA
TGCTACTGAT
CATCATTGGT
AAATGGACAT
TCTTACAGGA
GGTTTTCTTT
TCAAGTAGCC
TAGATGGACT
AGATTTAAGA
AGAAAACATG
TTTAGCAAAA
AAGATGGGAA
CACAAATAAA
CAGATTTGAG
AGTAGATATT
CAGCAGTAAA
CTATGATATG
CCACACCTAC
AAGATTTTTC
CCCTGTTGGA
TTTCCTCATT
ACCTTGTTTT
AGGAACATCA
AATTCATCAG
GGTTTCCAAT
CATAGAGAAA
GGGGATAGCT
CACATTTTTG
CATCCTGCTC
AAACCAGGTC
TTCTACAGTG
AAGAAGAATG
CCTAATCCAA
AGATATACAA
CTTCCATGTT
CTTGGCTAGG
GGTGTTCTGC
CATGGTGGTT
CTTTGTCCCA
CAAAGATAAA
(2) INFORMATION FOR SEQ ID NO:71:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 2556 base pairs (B) TYPE: nucleic acid (C) STRANDEDNESS: single (D) TOPOLOGY: linear (ii) MOLECULE TYPE: cDNA
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:71:
CACTTCTCAT
TAACGGATGA
TTGAAAAAGA
ATGCTTTGGC
ATATGTCTTT
CACCATATTT
AGAGTATGTG
AGTACCTGGA
CCATCTTCAG
CTCTAGCATT
TCATCCCAGA
ACAAAGAAAT
AAAAAACTGA
ATGGAGAAAC
AGAGAATATG
ATGACACATT
AAAATTTTGT
AGTGGAAATA
CATCTGATGC
ATAGTCATAA
TGCAACAGGC
AGGTAAACTC
TGAAGCAAAG
AACACCTTGG
ACTCTCACTT
CTGTGTGCAG
CCTGCTGTTT
ATCAATGCGT
AGAAAGGTGT
CCTTCTGCTT
CTCCTATTGT
TCTGTTTTCT
AGCAAATCAC
TCACTGTGGT
TATCAGGGAC
CAATCTGGCT
ACATCATCAT
TGGCCTGCCT
CATTCAATGA
CCTTCCTCCC
CTATCTTGGC
TTTTAATGAG
(2) INFORMATION FOR SEQ ID NO:72:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 2169 base pairs (B) TYPE: nucleic acid (C) STRANDEDNESS: single (D) TOPOLOGY: linear (ii) MOLECULE TYPE: cDNA
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:72:
GGATGAATCT
GCTTTCCTAT
TCAGATGGCC
GAAATGGAAT
AGAGTTGAAG
TGTTGATGAA
ATTAACAAAT
AATGTGGGAA
TACCAGTAAG
CCATGGTGAG
AGATTTATAT
TTGTAAAATA
GCTTGACATG
CCATGCCCTC
AGGAGCCAGT
TCCTCTTGGG
TGTTCACTTT
CCCATATTTA
AGGAAGAAGA
ATTATGGAAG
AATTTCTAAT
CACAGAACAG
GGGGATGGCA
TGTCTTTGTG
TCTATTACTC
AAACAAAGTC
TTCCACAGTT
AAGAAGATTG
CCTACTCCAA
TGATGAACAC
ATTCTACTGT
CTTGGCCAAG
AGTGTTCTGC
CATGGTTGCT
TTTTGTACCC
CAGGGAAAAA
(2) INFORMATION FOR SEQ ID NO:73:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 1889 base pairs (B) TYPE: nucleic acid (C) STRANDEDNESS: single (D) TOPOLOGY: linear (ii) MOLECULE TYPE: cDNA
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:73:
GCCATTGTTT
GACCACAAAG
TGTACGGCTT
GAAAATATGG
CTAATGCGAA
CCCCATATTA
TTTAAGCACA
AAAAAATACC
TTTTCTGATA
TTACCTAGTC
GTGTACGCTG
TGTGAAAATG
ATTGAGGTGA
CTTAACCTCT
GCAAATGCTC
ATATTTTCAG
GTAACCCTGG
ATTTCTAATG
ACAGAGAAGA
GGGATGGCTC
ATATTTGTGA
ACTTTGCTCA
AACACAGTTG
GCCACTGTGT
AGAATGGTAA
CTGATCCAAC
GATGCTCATA
TTCCACTCTG
TTGTCAAGAA
GTATTCTTCT
ATGGTCGCCG
(2) INFORMATION FOR SEQ ID NO:74:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 1889 base pairs (B) TYPE: nucleic acid (C) STRANDEDNESS: single (D) TOPOLOGY: linear (ii) MOLECULE TYPE: cDNA
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:74:
ATGGCGACGA AGGACACATC
TCTTTCACTT GCCATTGTTT
(2) INFORMATION FOR SEQ ID NO:75:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 270 base pairs (B) TYPE: nucleic acid (C) STRANDEDNESS: single (D) TOPOLOGY: linear (ii) MOLECULE TYPE: cDNA
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:75:
(2) INFORMATION FOR SEQ ID NO:76:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 1308 base pairs (B) TYPE: nucleic acid (C) STRANDEDNESS: single (D) TOPOLOGY: linear (ii) MOLECULE TYPE: cDNA
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:76:
TCTCATCTTG
TAATGACGGA
TGAAGATAAT
TCTTCTCGTA
CATAACTTTG
CCAAGCATAT
TGATTCATGT
GCACTCTTCG
CCGGCTGCCC
CTCCTTGATG
TTTCACTTTA GATGGACTTG.GATAGGAATG GTCATCTCAG ATGATGACCA 660 GGi3TATTCAG
TTTTGTTAAT
TCAACAAATT
TCTAGAAGTA
CTCACAATGG
TATCACTTTT
GAACACTGCC
TTGTTCAATA
ATGGACATCA
TGCTGTTTAT
GAAAAAGGCA
(2) INFORMATION FOR SEQ ID NO:77:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 1296 base pairs (B) TYPE: nucleic acid (C) STRANDEDNESS: single (D) TOPOLOGY: linear (ii) MOLECULE TYPE: cDNA
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:77:
TCTCATCTTG
TAGTGATGGA
TGAAGATAAT
TCTTCTGGTA
CATAACTTTG
TCAATCATAT
TGATTCATGT
AACATAGGCC TTACAGGACC ATCATGGAAA AAATCCiTAA AACTGGCAAT 480 GGATTCTTCA
CCGGCTGCCC
CTCCTTGATG
GGGTATTCAG
TTTTGTTAAT
TAAACAAATT
TCTAGAAGTA
CTCACAATGG
TATCACTTTT
GAACACTGCC
TTGTTCAATC
TCTAAGAACA GCAGTAAAAT GGATCTT'!'TT ACATCCAACA ACACATTGGA1140 ATGGACAGCA
CTGCACAACT ATGATATGGC CATGAGTGAT GAAiGGTTACA ATTTGTATAA1200 TGCTGTTTAT
GAAAAAGGTA
GAACACAACA GATATTTCAC TGTTTGTCAG CAGATA 1296
(2) INFORMATION FOR SEQ ID NO:78:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 1521 base pairs (B) TYPE: nucleic acid (C) STRANDEDNESS: single (D) TOPOLOGY: linear (ii) MOLECULE TYPE: cDNA
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:78:
TCTCATCTTG
TAGTGATGGA
TGAAGATAAT
TCTTCTGGTA
CATAACTTTG
TCAATCATAT
TGATTCATGT
GGATTCTTCA
CCGGCTGCCC
CTCCTTGATG
GGGTATTCAG
TTTTGTTAAT
TAAACAAATT
TCTAGAAGTA
CTCACAATGG
TATCACTTTT
GAACACTGCC
TTGTTCAATC
ATGGACAGCA
TGCTGTTTAT
GAAAAAGGTA
AACCAGGGTA
GTGTACAGAG
GAAAATAGGA
TTTGGAATGG
(2) INFORMATION FOR SEQ ID NO:79:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 933 base pairs (B) TYPE: nucleic acid (C) STRANDEDNESS: single (D) TOPOLOGY: linear (ii) MOLECULE TYPE: cDNA
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:79:
TCTCATCTTG
TAATGATGGA
TGAAGATAGT
TCTTCTGGTA
CATGAGTTTG
TCAAGAATAT
TGATTCATGT
GCATTCTTCA
CCGGCTGCCC
CTCCTTGATG
GGGTATTCAG
TTTTGTTAAT
TACACAAATT
TCTAGAAGCA
(2) INFORMATION FOR SEQ ID NO:80:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 1236 base pairs (B) TYPE: nucleic acid (C) STRANDEDNESS: single (D) TOPOLOGY: linear (ii) MOLECULE TYPE: cDNA
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:80:
GCAAAGGGAA
TAATTCACTT
TTTTGCTACT
CTCCATCATT
AATAAATGGA
AGGTCTTACA
ACTGGTTTTC
CCATCAAGTA
TTTTAGATGG
CTCAGATTTA
CCCAGAAAAC
GTCTTTAGCA
TAGAAGATGG
CATCACAAAT
CCGCAGATTT
CCCAGTAGAT
GAACAGCAGT
CAACTATGAT
GGCCCACACC
CAAAAGATTT
(2) INFORMATION FOR SEQ ID NO:81:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 2412 base pairs (B) TYPE: nucleic acid (C) STRANDEDNESS: single (D) TOPOLOGY: linear (ii) MOLECULE TYPE: cDNA
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:81:
GGCCAATTTC
ATATTTGGGA
TTATTTCAAC
ATTGGTGTTT
GATTATAAGA
TACACTTTGG GCCGTTGTGA Z'GGAAAAACT GTAATACCTA CACCATATTT 360 ATTTCGTAAA
TTCCTATCTG
CAGCTTCTTA
TGATGATGAA
GGCAATGGTC
TGATGACCAA
GGAAACCAAT TTCTTTTAGA GTT'GAAGAAA CAGAGTGAAA ACAAGGAAAT 720 TTGCTTTGCC
AATGTACTAC
ATACAATTTC
GATCACCACA
CTATGGATCA
(2) INFORMATION FOR SEQ ID NO:82:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 381 base pairs (B) TYPE: nucleic acid (C) STRANDEDNESS: single (D) TOPOLOGY: linear (ii) MOLECULE TYPE: cDNA
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:82:
(2) INFORMATION FOR SEQ ID NO:83:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 228 base pairs (B) TYPE: nucleic acid (C) STRANDEDNESS: single (D) TOPOLOGY: linear (ii) MOLECULE TYPE: cDNA
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:83:
(2) INFORMATION FOR SEQ ID NO:84:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 1644 base pairs (B) TYPE: nucleic acid (C) STRANDEDNESS: single (D) TOPOLOGY: linear (ii) MOLECULE TYPE: cDNA
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:84:
ATGTTAGAAT TGGCCCATGG CACTCTaACT TTCTCACCCC ATCATGGGGA 60 GATTTCTGAT
TTTTCTTCAC
AATCTTTGAA
GCTGGTCATG
TCTCCATGAG
TATATTATTT
TGGTGATCGT
TATTTGGAAT
TGCTCCCAAG
TACAGAGATT
CCTGGAGAGC
AAGCCTGCCT GTTaCTTTGA CTGCACTCCT TGCCCAGATA AAGAGATTTC 720 CAACGAGACA
GAAGAGTCAC
GGGACTCACA
TATAATCCAC
GCTCATCACT
AGCCACATGT
AGTGTTGGCC
GACAAGATGG
GCAAATCCTT
TCACTCTGAA
CTGTACTCTG
CAGGAATCTT
CTGCAGTGTC
GGCTATGGAA
CCCTAAGTGC
AAAAAGACAG
(2) INFORMATION FOR SEQ ID NO:85:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 2304 base pairs (B) TYPE: nucleic acid (C) STRANDEDNESS: single (D) TOPOLOGY: linear (ii) MOLECULE TYPE: cDNA
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:85:
TATAAAACAT
ATTTTATAAG
TATAGGGCTG
TCCACGTTTC
ATTTCCATAT
CTTCTTACTT
CAATCAATTT
CTCTCAGAGT TG~?~AAAAAGA GACCCAAAAC AAGGAAATTT GCTTTGCCTT480 TGTTAACATG
TCAAATAGTG
ATGTCATCAA CAAATATTAT TATCATTTAT GGGAX~AACAA ACAGTATCAT600 TGAATTGAGC
AGAGTTGGAT
GACATTTCTA
CCATCTCAGA
TGCCTCAGGA
TCCAATGCCT
AGTCATAGTA
CAAGAGGTTG
GTAAACTCAT
AAACAGAGAG
CACCTTCGGA
TTTCACTTAT
GTGTGCAGTG
TGCTGTTTTA
CAATGTGTGA
AAAGACGTGA
TTCTGTTTGT
CCTATTGTGA
TGTTTTCTCT
CAAATCACAT
ACTGTCATTC
TCTGGTGCAC
ATTTGGCTAG
ATCATGATTG
GCCTGCCTGG
TTCAACGAAG
TTTCTCCCTG
ATCTTGGCAT
TTAATGAAAC
(2) INFORMATION FOR SEQ ID NO:86:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 2001 base pairs (B) TYPE: nucleic acid (C) STRANDEDNESS: single (D) TOPOLOGY: linear (ii) MOLECULE TYPE: cDNA
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:86:
CCATTTCAGC
TATCTTGGAA
CATTGTTAGT
GATGTCATCA
CTTTAGACTA
TATGATCATA
ACATCACTAT
CTACAGTGAT
ATTATCTGAA
CAGGCACCAT
TGCTGTGGCT
TGATGGAAAA
ATTTATAAAC
GTATGAGATT
AACATTTTCC
GTGGAACACA
ATTCAGAAAA
AGAAAATGAA
GTATGCCAAT
AGATCCATTG
TGTACTTAGT
TCTCAGCTAT
TGGTCATCCC
TGTAGCTGCA
TAATACAAGT
AATTTGCACA
TGTTGATGCT
ATTTTCTGTA
(2) INFORMATION FOR SEQ ID NO:87:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 2598 base pairs (B) TYPE: nucleic acid (C) STRANDEDNESS: single (D) TOPOLOGY: linear (ii) MOLECULE TYPE: cDNA
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:87:
CTTTCTCCTG
TATGAAGAAG
AGCAGTGCAA
CAAAAGTATC CTCTCACCTT GGCTTTI"TCC ATGAATGAAA TCAACAACAA 300 CCCTGATCTT
TTGCCAAATA TGTCTTTAGC AT't'TACATTC TCAGAATATA GTTaTTATTT360 GGAATCCCAC
TTTTATCTGT
AACTGTGACA
TTATGGACAC
GGCCTCTGAT
GATACATCTC TAGCCG'TTGC TCTCGTCTCC TTCATAATTC ATTTCAGTTG 660 GAGAAGAGAG
TATGAATTTA
AAATGTTGTT
GGACTCTCTA
TAAGAAAGAC
TTCACATTTG ATAATGGATA TGGAACTTTT GGTTTTC',GAC ACCGCCACAG1020 TGAGATTTCT
ATATTTGGTA
GTCACTGAAG
CATGGCCATT
ACTCCATGAG
ATGACTC1'TC AAAATGTTGA TAATGTTCTC CTTCCCAATT ATGAAGAACA 1320 AAATTATAAT
TGCAAGATGG TTTATTCCTT Tt'TGAGCAAG ACTCAATTCA CAAFrTCCTGT1380 TGGAGACACT
GTGAATATGA ATCAAAGAAA CAAACTGAAG GAAGAGTACG ACATTrTCTA 1440 CAATTGGAAT
TTTTCCAAAA
TATACAGATG
GAAC'aAATQGA
TAATGAGACA
GCAGAATCAC
GGCTCTTTCC
TGTGAAACAT
GCTCATCGCA
AGCTACCTGC
TGTGTTGGCC
GATGAAGTAC
TCAAATTATT
ACAGTCTGAG
CTGTGTCCTG
CAGAAACCTG
CTGCAGTGTC
GGCTGTTGAG
TCCAAAATGC
GAAGTCATCT
(2) INFORMATION FOR SEQ ID NO:88:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 2337 base pairs (B) TYPE: nucleic acid (C) STRANDEDNESS: single (D) TOPOLOGY: linear (ii) MOLECULE TYPE: cDNA
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:88:
CACATCCCTG
ACTTTTTAGC
GAGCAATTCT
ACTCCTGGAT
TGAACAAAGA
TGGCATTGTA
TGATGACAAA
CTGCACAGCA
GAAAAATCAT
TGATTCTCTA
GGTACTGATC
GCATGGAGCT
GCAGAAGTAC
GTACTTCAAT
TGCCTCCCTG
CAATGTATAC
AGTACAATTT
CCCCTTTCTA
AGGGAAAGAC AACTCATCAA TCAGAATGGA GCGAATGA,AG ATCTGGATTG1140 TACCAGGAAG
TGGGCTAAAT
CATATCTTCT
CAGTGAGAGC
CTTTGACTGC
TGTGAAGTGT
TGTGACATTT
TTTCTCCTCA
TGTCAAGGCC
TCTCTGTCCC
CATTTTTGGG
GGTTATAGCC
GGCCCCTAAT
GCTGACAACC
CATCATTTGC
ACTAGCCCTA
TGAAGCCAAG
CCCTGTCTAC
GGCTTCCAGT
AAGACCAGAA
ACCTTCT
(2) INFORMATION FOR SEQ ID NO:89:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 1650 base pairs (B) TYPE: nucleic acid (C) STRANDEDNESS: single (D) TOPOLOGY: linear (ii) MOLECULE TYPE: cDNA
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:89:
AACAAAAAAC
TCCTGATCTT
ACAAACTACA
TTATTTCTGT
AAT'Cv4AGAGA CTATGTGTAC ATTTCTACTT ACAGGACCGC ATTGGATAAC360 ATCTTATAGT
CACATCCCTA
CCTTGTCATC
CAAAATCAAG
TTATCATAAA
CATTTATGGG
AAAAC14,GAGA GTATTATTGA GTTGAGCTTC AGAATGTGGG AATCTCCAGT720 TATCCAGAGA
AACTCATGAC
CTTTAAAAAT
GCCAGAGTGG
CTATTCATCC
TGATGGAAGT
GAATCTGCAC
CTTTAAGATA
GATTATGAAA
TTCTCAGCAC
CAGGCACTTT
ATCCTCTGTG
GGCAGCCTGC
TATGGATCAG
CATTCAGAAA
TCATAGCCTT
(2) INFORMATION FOR SEQ ID NO:90:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 2379 base pairs (B) TYPE: nucleic acid (C) STRANDEDNESS: single (D) TOPOLOGY: linear (ii) MOLECULE TYPE: cDNA
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:90:
TGATCCCAGG
AAGTTGTTAC
GACTCTGAAT
AATGGATGAA
TACATTGGGC
CACTGTGATG GAAATATCCC ACTCL'G3CTTA CTTAATCAAA TATTTTATAT360 GCCTTTTCCT
GAATTTGTGG
TCAGATTTCC
CTATCAGATG
CTTCAACTGG
CACAGAGTTG
GGCAATCAAT
GTCAACCGCA
CAGAACATGG
CCCAACAAGT
ACACCATGAT
CATGGATTTA
TATTTATTAA AGCCAGAGT'G GGGTTTCTTT GAATATGAAA CCTCAGCATC1080 TTACTGTAAA
GAAGTTTGAC
GGCCCATGCT
CAAAGGAGCA
CAATCCTTTT
CATTGTTCAC
CAGCCCATAT
CACAGGAAGT
AAAATTCTGG
TGAAATTTCT
TATTGTGAAG
CTTTCTGTGC
AATCACATTT
TGTGCTTCTG
GGGGACACCC
TTGGCTAGCA
CATAATTGTG
CTTCCTGGCC
CAATGAAGCC
CCTTCCTGTC
TTTGACATCC
AATGAAACCA
GAGAGAATTC TATCCAAP~AG ACAGGAGAAA TCACGTTTC 2379
(2) INFORMATION FOR SEQ ID NO:91:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 2394 base pairs (B) TYPE: nucleic acid (C) STRANDEDNESS: single (D) TOPOLOGY: linear (ii) MOLECULE TYPE: cDNA
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:91:
GGATCCCAGA
GACTTGTTCC
GACTCTGAAT
AGTGGATGAA
CAATTTGGGT
TAATCATCTC
TACAGGACCA
TCCACATTTC
ATATCCTTAT
CTTCATAATT
CAATCAATTT
TGTGAACATG
CCAAATTGTG
TGAATTGAGC
ACAATTTGAT
TACATTTCTA
CAATCTCAGA
AGCCTCAGCA
GCTAATGGAA
TGTATATGCC
AATAAACAAT
GACCCACTTC
TGGAGACTAT
GATAGGACAA
GACTGAGTTG
TCCTGGATTC
CTGCCCTGAA
CCATGACACT
ACTCATGTTC
TATCTTACAG
CAAAACAATC
CTTTTTGGTA
TCTGTGTGCA
GCATGGCCAC
GGGATACTTG
GCCTGACACA
CTGGGTCACC
(2) INFORMATION FOR SEQ ID NO:92:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 2085 base pairs (B) TYPE: nucleic acid (C) STRANDEDNESS: single (D) TOPOLOGY: linear (ii) MOLECULE TYPE: cDNA
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:92:
CATCTTCAGT
ACTAGCATTG
TATCTCAGAT
CAAGGATATT
TAAAACTGAA
TGGGGAAACA
GAGAATATGG
TGGCACATTC
AAATTTTGTA
GTGGAAATAT
ATCCAATATC
TAGTCACAAC
GCAAGAATTT
GCTAAACTCA
GAAACAGAGA
ACGCCTTGGG
GTTACACTTA
AGTGTGCAGT
GCAGATTGCC ATCCTGGATT CAGAAC,AATC TGGAA('~GAGG AAATGGCAGC1140 CTGCTGTTTT
TCAGTGTGCG
GAAAGGTGTG
ATTCTGTTTC
TCCTATTGTG
CTGTTTTCTG
GCAAATCACA
CACTGTGGTT
ATCGGGGACA
AATCTGGCTA
CATCATCATT
GGCCTGCCTG
GCCTTTGGAA GCTTCACTAT AGCTT1'CTTG GCAAAGAACC TGCCTGACAC1860 ATTCAACGAA
CTTCCTCCCT
CATCTTGGCA
TTTAATGAGA
We claim:
Claims (61)
1. A family of isolated pheromone receptor polypeptides, each of said isolated polypeptides comprising from amino terminus to carboxyl terminus:
(a) an amino-terminal extracellular domain containing from 30 to 600 amino acids;
(b) a transmembrane region comprising:
(i) seven non-contiguous transmembrane domains designated TM1, TM2, TM3, TM4, TM5, TM6 and TM7,
(ii) three non-contiguous extracellular domains designated EC2, EC3 and EC4, and
(iii) three non-contiguous intracellular domains designated IC1, IC2, and IC3,
wherein the transmembrane domains, the extracellular domains and the intracellular domains are attached to one another from amino terminus to carboxyl terminus in the order TM1-IC1-TM2-EC2-TM3-IC2-TM4-EC3-TM5-IC3-TM6-EC4-TM7, and wherein the transmembrane region has at least about 35% homology and a length approximately equal to a transmembrane region of a polypeptide selected from the group consisting of SEQ ID NO. 2, 4, 6, 8, 10, 12, 14, 34, 36, 38, 40, 42, 44, 46, 48, and 50; and
(c) a carboxyl-terminal intracellular domain containing from 5 to 200 amino acids;
wherein the pheromone receptor polypeptides are expressed in a Gαo protein-expressing vomeronasal organ neuron or are expressed in another olfactory organ neuron in an animal which does not possess a vomeronasal organ.
2. The polypeptides of claim 1, wherein the transmembrane region of each of said polypeptides has at least between about 60% and about 90% homology to the transdomain region of a pheromone receptor polypeptide selected from the group consisting of SEQ ID NO. 2, 4, 6, 8, 10, 12, 14, 34, 36, 38, 40, 42, 44, 46, 48, and 50.
3. The polypeptides of claims 1 or 2, wherein the non-contiguous intracellular domains of each of said polypeptides has at least between about 60% and about 90% homology to the non-contiguous intracellular domains of a pheromone receptor polypeptide selected from the group consisting of SEQ ID NO. 2, 4, 6, 8, 10, 34, 36, 38, 40, 42, 44, 46, 48, and 50.
4. The polypeptides of claim 1, wherein the extracellular domain of each of said polypeptides has at least between about 50% and about 90% homology to the extracellular domain of a pheromone receptor polypeptide selected from the group consisting of SEQ ID NO. 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 32, 34, 36, 38, 40, 42, 44, 46, 48, and 50.
5. The polypeptides of claim 2, wherein the extracellular domain of each of said polypeptides has at least between about 50% and about 90% homology to the extracellular domain of a pheromone receptor polypeptide selected from the group consisting of SEQ ID NO. 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 32, 34, 36, 38, 40, 42, 44, 46, 48, and 50.
6. The polypeptides of claim 3, wherein the extracellular domain of each of said polypeptides has at least between about 50% and about 90% homology to the extracellular domain of a pheromone receptor polypeptide selected from the group consisting of SEQ ID NO. 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 32, 34, 36, 38, 40, 42, 44, 46, 48, and 50.
7. The polypeptides of claims 1 or 2, wherein the extracellular domain contains at least between about 50 and about 500 amino acids.
8. The polypeptides of claim 3, wherein the extracellular domain contains at least between about 50 and about 500 amino acids.
9. The polypeptides of claims 4, 5 or 6, further comprising a signal sequence attached to the amino terminus of the extracellular domain.
10. The polypeptides of claim 9, wherein the signal sequence is selected from the group of signal sequences of a pheromone receptor polypeptide of SEQ ID NO. 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 32, 34, 36, 38, 40, 42, 44, 46, 48, 50 and 52.
11. A method for identifying a nucleic acid encoding a pheromone receptor polypeptide, comprising:
(1) contacting a mixture of nucleic acid molecules with at least one nucleic acid probe of a nucleic acid selected from the group consisting of
(a) a nucleic acid molecule selected from the group consisting of SEQ ID NO. 1, 3, 5, 7, 9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29, 31, 33, 35, 37, 39, 41, 43, 45, 47, 49, 51, 53, 54, and 55 that encodes a pheromone receptor polypeptide;
(b) a unique fragment of (a);
(c) a human homolog of (a) or (b); and
(d) a set of degenerate primers of any of (a), (b) or (c); and
(2) identifying the sequences within the mixture that hybridize to the probe.
12. The method of claim 11, wherein the mixture is a genomic library.
13. The method of claim 11, wherein the mixture is a cDNA library.
14. The method of claim 11, wherein the nucleic acid probe contains a detectable label.
15. The method of claim 11, wherein the at least one nucleic acid probe is a pair of degenerate polymerase chain reaction primers that amplify a unique fragment of a nucleic acid molecule selected from the group consisting of SEQ ID NO. 1, 3, 5, 7, 9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29, 31, 33, 35, 37, 39, 41, 43, 45, 47, 49, 51, 53, 54, and 55, the method further comprising the step of subjecting the mixture to a polymerase chain reaction amplification reaction prior to selecting a member of the mixture which hybridizes to the nucleic acid probe.
16. The method of claim 15, wherein the pair of degenerate polymerase chain reaction primers is selected from the group consisting of SEQ ID NOs. 60 and 61, SEQ ID NOs. 62 and 63, SEQ ID NOs. 64 and 63, SEQ ID NOs. 64 and 65, and SEQ ID NOs. 66 and 67.
17. The method of claim 16, wherein the pair of polymerase chain reaction primers is selected from the group consisting of SEQ ID NOs. 60 and 61, SEQ ID NOs. 62 and 63, and SEQ ID NOs. 64 and 63.
18. An isolated nucleic acid molecule (a) which hybridizes under high or low stringency conditions to a molecule consisting of a nucleic acid sequence selected from the group consisting of SEQ ID NO. 1, 3, 5, 7, 9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29, 31, 33, 35, 37, 39, 41, 43, 45, 47, 49, 51, 53, 54, and 55, and which codes for a pheromone receptor, (b) nucleic acid molecules that differ from the nucleic acid molecules of (a) in codon sequence due to the degeneracy of the genetic code, and (c) complements of (a) and (b).
19. The nucleic acid molecule of claim 18, wherein the pheromone receptor is expressed in the vomeronasal organ or is expressed in another olfactory organ in an animal which does not possess a vomeronasal organ.
20. The nucleic acid molecule of claim 18, wherein the pheromone receptor is expressed in a Gαo protein-expressing vomeronasal organ neuron.
21. The nucleic acid molecule of claim 18, wherein the pheromone receptor is a G-protein coupled receptor.
22. The isolated nucleic acid molecule of claim 18, wherein the pheromone receptor has an amino acid sequence selected from the group consisting of SEQ ID NO. 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 32, 34, 36, 38, 40, 42, 44, 46, 48, 50 and 52.
23. The isolated nucleic acid molecule of claim 18, wherein the isolated nucleic acid molecule is selected from the group consisting of SEQ ID NO. 51, 53, 54, 55, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, and 92, that encodes a pheromone receptor polypeptide.
24. The isolated nucleic acid molecule of claim 18, wherein the isolated molecule comprises a molecule having a sequence which encodes a pheromone receptor unique fragment, wherein said unique fragment is selected from the group consisting of a pheromone receptor extracellular domain, a pheromone receptor transmembrane domain, a pheromone receptor intracellular domain, a pheromone receptor extracellular domain coupled to at least one transmembrane domain, and at least one pheromone receptor transmembrane domain coupled to a pheromone receptor intracellular domain.
25. The isolated nucleic acid molecule of claim 18, wherein the pheromone receptor extracellular domain, the pheromone receptor transmembrane domain and the pheromone receptor intracellular domain have amino acid sequences selected from the group of sequences identified as these domains in SEQ ID NO. 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 32, 34, 36, 38, 40, 42, 44, 46, 48, 50 and 52.
26. The isolated nucleic acid molecule of claim 18, wherein the unique fragment is selected from the group consisting of between 12 and 4000, between 12 and 2000, between 12 and 1000, between 12 and 500, between 12 and 250, between 12 and 100, between 12 and 50, and between 12 and 25, nucleotides in length.
27. An isolated nucleic acid molecule, comprising:
(a) a molecule having a sequence selected from the group consisting of SEQ ID NO. 51, 53, 54, 55, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, and 92, and which codes for a pheromone receptor;
(b) nucleic acid molecules that differ from the nucleic acid molecules of (a) in codon sequence due to the degeneracy of the genetic code, and
(c) complements of (a) and (b).
28. An expression vector comprising the isolated nucleic acid molecule of claims 18-27 operably linked to a promoter.
29. A host cell transformed or transfected with the isolated nucleic acid molecule of claims 18-27.
30. A host cell transformed or transfected with the isolated nucleic acid molecule of the expression vector of claim 28.
31. An isolated polypeptide encoded by the isolated nucleic acid molecule of claims 18-27.
32. The isolated polypeptide of claim 31, wherein the isolated polypeptide has a pheromone receptor activity.
33. The isolated polypeptide of claim 31, wherein the isolated polypeptide comprises a polypeptide selected from the group consisting of SEQ ID NO. 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 32, 34, 36, 38, 40, 42, 44, 46, 48, 50 and 52.
34. The isolated polypeptide of claim 33, wherein the isolated polypeptide is a fragment of a peptide selected from the group consisting of an extracellular domain, a transmembrane domain and an intracellular domain, wherein the foregoing domains have amino acid sequences selected from the group of sequences identified as these domains of a pheromone receptor polypeptide selected from the group consisting of SEQ ID NO. 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 32, 34, 36, 38, 40, 42, 44, 46, 48, 50 and 52.
35. A vaccine containing an isolated polypeptide selected from the group consisting of the isolated polypeptides of claim 31, 32, 33, and 34.
36. A method for controlling fertility in an animal, comprising:
administering to an animal in need of such treatment, an effective amount of the vaccine of claim 35 to elicit an immune response to the isolated polypeptide.
37. An isolated binding polypeptide which binds selectively to a polypeptide of claim 1, 2, 4, 5, 6, 8, 10, 31, 32, 33, and 34, provided that the isolated binding polypeptide does not bind to a G-protein coupled receptor other than a Gαo+-coupled pheromone receptor.
38. The isolated binding polypeptide of claim 37, wherein the binding polypeptide binds to a polypeptide selected from the group consisting of SEQ ID NO. 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 32, 34, 36, 38, 40, 42, 44, 46, 48, 50 and 52.
39. The isolated binding polypeptide of claim 37, wherein the binding polypeptide is an antibody fragment selected from the group consisting of a Fab fragment, a F(ab)2 fragment or a fragment including a CDR3 region selective for a pheromone receptor polypeptide.
40. The isolated binding polypeptide of claim 38, wherein the binding polypeptide is an antibody fragment selected from the group consisting of a Fab fragment, a F(ab)2 fragment or a fragment including a CDR3 region selective for a pheromone receptor polypeptide.
41. An affinity matrix comprising:
a solid support to which is coupled an isolated binding polypeptide selected from the group consisting of the binding polypeptides of any of claims 37-40.
42. A method for isolating a pheromone receptor, comprising:
contacting a composition containing a putative pheromone receptor with the affinity matrix of claim 41 under conditions to permit the pheromone receptor to selectively bind to the binding polypeptides coupled to the solid support; and isolating the polypeptides that bind to the affinity matrix.
43. A composition comprising:
the polypeptide of claim 1, 2, 4, 5, 6, 8, 10, 31, 32, 33, or 34; and a pharmaceutically acceptable carrier.
44. A composition comprising:
the nucleic acid molecule of any of claims 18-28; and a pharmaceutically acceptable carrier.
45. A composition comprising:
the binding polypeptide of claim 37; and a pharmaceutically acceptable carrier.
46. A composition comprising:
the binding polypeptide of claims 38, 39 or 40; and a pharmaceutically acceptable carrier.
47. A method for modulating a pheromone receptor activity in a cell, comprising:
administering to the cell an amount of the isolated binding polypeptide of claim 37 effective to modulate pheromone receptor activity in the cell.
48. A method for modulating a pheromone receptor activity in a cell, comprising:
administering to the cell an amount of the isolated binding polypeptide of claim 38, 39, or 40 effective to modulate pheromone receptor activity in the cell.
49. The method of claim 47, wherein modulating a pheromone receptor activity comprises reducing the pheromone receptor activity.
50. The method of claim 48, wherein modulating a pheromone receptor activity comprises reducing the pheromone receptor activity.
51. The method of claim 47, wherein the pheromone receptor activity is selected from the group consisting of a signal transduction activity and a ligand binding activity.
52. The method of claim 48, wherein the pheromone receptor activity is selected from the group consisting of a signal transduction activity and a ligand binding activity.
53. The method of claim 47, wherein the cell is a vertebrate cell, preferably a mammalian cell.
54. The method of claim 48, wherein the cell is a vertebrate cell, preferably a mammalian cell.
55. The method of claim 47, wherein the cell is an invertebrate cell, preferably an insect cell.
56. The method of claim 48, wherein the cell is an invertebrate cell, preferably an insect cell.
57. A method for reducing the binding of a pheromone having a binding domain to a pheromone receptor having a ligand binding site that selectively binds to the binding domain of the pheromone, comprising:
contacting the pheromone receptor with an agent which binds to the binding domain for a time effective to reduce binding of the pheromone to the ligand binding site of the pheromone receptor.
58. The method of claim 57, wherein the agent is an antibody which binds to the binding domain.
59. A method for decreasing pheromone receptor mediated signal transduction activity in a subject comprising:
administering to a subject in need of such treatment an agent that selectively binds to an isolated nucleic acid molecule of claim 1 or an expression product thereof, in an amount effective to decrease pheromone receptor mediated signal transduction activity in the subject.
60. The method of claim 59, wherein the agent is selected from the group consisting of an antisense nucleic acid and a binding polypeptide.
61. A method for identifying lead compounds for a pharmacological agent useful in the diagnosis or treatment of disease associated with pheromone binding to a pheromone receptor polypeptide containing a ligand binding site that selectively binds to a binding domain of the pheromone, comprising forming a mixture comprising a pheromone receptor polypeptide or unique fragment thereof containing a ligand binding site, a molecule protein containing a binding domain which selectively binds the pheromone receptor ligand binding site, and a candidate pharmacological agent, incubating the mixture under conditions which, in the absence of the candidate pharmacological agent, permit a first amount of selective binding of the molecule containing a ligand binding domain by the pheromone receptor ligand binding site, and detecting a test amount of selective binding of the molecule containing the binding domain by the pheromone receptor ligand binding site, wherein reduction of the test amount of selective binding relative to the first amount of selective binding indicates that the candidate pharmacological agent is a lead compound for a pharmacological agent which disrupts selective binding of a molecule containing a binding domain by a pheromone receptor containing a ligand binding site and wherein increase of the test amount of selective binding relative to the first amount of selective binding indicates that the candidate pharmacological agent is a lead compound for a pharmacological agent which enhances selective binding of a molecule containing a binding domain by a pheromone receptor polypeptide containing a ligand binding site.
Applications Claiming Priority (3)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US5128497P | 1997-06-30 | 1997-06-30 | |
US60/051,284 | 1997-06-30 | ||
PCT/US1998/013680 WO1999000422A1 (en) | 1997-06-30 | 1998-06-30 | Novel family of pheromone receptors |
Publications (1)
Publication Number | Publication Date |
---|---|
CA2294473A1 (en) | 1999-01-07 |
Family
ID=21970359
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CA002294473A Abandoned CA2294473A1 (en) | 1997-06-30 | 1998-06-30 | Novel family of pheromone receptors |
Country Status (4)
Country | Link |
---|---|
EP (1) | EP0996635A4 (en) |
JP (1) | JP2002511871A (en) |
CA (1) | CA2294473A1 (en) |
WO (1) | WO1999000422A1 (en) |
Families Citing this family (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2001025431A1 (en) * | 1999-10-01 | 2001-04-12 | The Rockefeller University | Primate, particularly human, vomeronasal-like receptor |
US20020155444A1 (en) * | 2000-02-17 | 2002-10-24 | Herman Ronald C. | Human VNO cDNA libraries |
ATE432350T1 (en) * | 2000-06-16 | 2009-06-15 | Incyte Corp | G-PROTEIN COUPLED RECEPTORS |
AU2001287111A1 (en) * | 2000-09-07 | 2002-03-22 | Zymogenetics Inc. | Human vomeronasal receptor-3 |
JP2004508843A (en) * | 2000-09-22 | 2004-03-25 | ケムコム エス.エー. | Olfactory and pheromone G protein-coupled receptors |
EP2008691A1 (en) * | 2007-06-29 | 2008-12-31 | Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. | Vaginal odorants |
JP2023184465A (en) * | 2022-06-16 | 2023-12-28 | 花王株式会社 | Analysis method for G protein-coupled receptors |
Family Cites Families (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5030722A (en) * | 1988-03-30 | 1991-07-09 | The Johns Hopkins University | Odorant-binding protein from rat |
DE69634134D1 (en) * | 1995-10-19 | 2005-02-03 | Univ Columbia | CLONING OF PHEROMONE RECEPTORS FROM VERTEBRATES AND THEIR USES |
-
1998
- 1998-06-30 JP JP50590499A patent/JP2002511871A/en active Pending
- 1998-06-30 CA CA002294473A patent/CA2294473A1/en not_active Abandoned
- 1998-06-30 WO PCT/US1998/013680 patent/WO1999000422A1/en active Search and Examination
- 1998-06-30 EP EP98933045A patent/EP0996635A4/en not_active Withdrawn
Also Published As
Publication number | Publication date |
---|---|
WO1999000422A9 (en) | 1999-04-15 |
EP0996635A1 (en) | 2000-05-03 |
WO1999000422A1 (en) | 1999-01-07 |
EP0996635A4 (en) | 2003-08-27 |
JP2002511871A (en) | 2002-04-16 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US5494806A (en) | DNA and vectors encoding the parathyroid hormone receptor, transformed cells, and recombinant production of PTHR proteins and peptides | |
CA2217668C (en) | Genetic markers for breast and ovarian cancer | |
US6262333B1 (en) | Human genes and gene expression products | |
CA2128208C (en) | Novel seven transmembrane receptors | |
CA2348479A1 (en) | Novel members of the capsaicin/vanilloid receptor family of proteins and uses thereof | |
US20020106655A1 (en) | Human GPCR proteins | |
CA2311572A1 (en) | Extended cdnas for secreted proteins | |
CA2292339A1 (en) | Smad6 and uses thereof | |
JP2002531091A5 (en) | ||
CA2281895C (en) | Ikb kinases | |
CA2386509A1 (en) | G protein-coupled receptors expressed in human brain | |
CA2290783A1 (en) | Modulators of tissue regeneration | |
JPH11503012A (en) | Human G protein-coupled receptor | |
CA2288430A1 (en) | Crsp protein (cysteine-rich secreted proteins), nucleic acid molecules encoding them and uses therefor | |
CA2366062A1 (en) | Human dickkopf-related protein and nucleic acid molecules and uses therefor | |
KR20070099564A (en) | Methods for assessing patients with acute myeloid leukemia | |
CA2380000A1 (en) | Odorant receptors | |
CA2145866C (en) | Human calcitonin receptor | |
CA2321194A1 (en) | Human potassium channel genes | |
CA2407219A1 (en) | Pain signaling molecules | |
US20030180739A1 (en) | Reagents and methods for identifying gene targets for treating cancer | |
CA2294473A1 (en) | Novel family of pheromone receptors | |
CA2386029A1 (en) | P-glycoproteins from macaca fascicularis and uses thereof | |
CA2421865A1 (en) | Olfactory and pheromones g-protein coupled receptors | |
US20040014169A1 (en) | Novel G protein-coupled receptors |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
FZDE | Discontinued |