CA2200583A1 - Human galactokinase gene - Google Patents
Human galactokinase gene
- Publication number
- CA2200583A1
- Authority
- CA
- Canada
- Prior art keywords
- seq
- sequence
- dna
- nucleic acid
- human
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Abandoned
Concepts (machine-extracted)
- 101001024874 Homo sapiens Galactokinase Proteins 0.000 title claims abstract description 39
- 108090000623 proteins and genes Proteins 0.000 claims abstract description 117
- 102000004169 proteins and genes Human genes 0.000 claims abstract description 65
- 150000007523 nucleic acids Chemical class 0.000 claims abstract description 60
- 102000039446 nucleic acids Human genes 0.000 claims abstract description 56
- 108020004707 nucleic acids Proteins 0.000 claims abstract description 56
- 108700023157 Galactokinases Proteins 0.000 claims abstract description 33
- 102000048120 Galactokinases Human genes 0.000 claims abstract description 30
- 230000035772 mutation Effects 0.000 claims abstract description 27
- 108020004414 DNA Proteins 0.000 claims description 74
- 238000000034 method Methods 0.000 claims description 48
- 241000282414 Homo sapiens Species 0.000 claims description 45
- 239000000523 sample Substances 0.000 claims description 40
- 239000013598 vector Substances 0.000 claims description 30
- 108091028043 Nucleic acid sequence Proteins 0.000 claims description 29
- 230000014509 gene expression Effects 0.000 claims description 22
- 239000002773 nucleotide Substances 0.000 claims description 20
- 125000003729 nucleotide group Chemical group 0.000 claims description 20
- 239000012634 fragment Substances 0.000 claims description 17
- 208000027472 Galactosemias Diseases 0.000 claims description 13
- 108091032973 (ribonucleotides)n+m Proteins 0.000 claims description 12
- 201000007412 galactokinase deficiency Diseases 0.000 claims description 11
- 238000009396 hybridization Methods 0.000 claims description 8
- 210000002966 serum Anatomy 0.000 claims description 7
- 108091027305 Heteroduplex Proteins 0.000 claims description 6
- 102000008394 Immunoglobulin Fragments Human genes 0.000 claims description 6
- 108010021625 Immunoglobulin Fragments Proteins 0.000 claims description 6
- 238000003556 assay Methods 0.000 claims description 6
- 238000001502 gel electrophoresis Methods 0.000 claims description 6
- 239000013612 plasmid Substances 0.000 claims description 6
- 230000009261 transgenic effect Effects 0.000 claims description 6
- 238000001712 DNA sequencing Methods 0.000 claims description 5
- 230000029087 digestion Effects 0.000 claims description 5
- 230000002068 genetic effect Effects 0.000 claims description 5
- 238000001962 electrophoresis Methods 0.000 claims description 4
- 238000007901 in situ hybridization Methods 0.000 claims description 4
- 238000004519 manufacturing process Methods 0.000 claims description 4
- 108091008146 restriction endonucleases Proteins 0.000 claims description 4
- 239000000499 gel Substances 0.000 claims description 3
- 238000011084 recovery Methods 0.000 claims description 3
- 108020004705 Codon Proteins 0.000 claims description 2
- 238000012258 culturing Methods 0.000 claims description 2
- 229920002401 polyacrylamide Polymers 0.000 claims description 2
- 230000001737 promoting effect Effects 0.000 claims description 2
- 238000007894 restriction fragment length polymorphism technique Methods 0.000 claims 1
- 230000001225 therapeutic effect Effects 0.000 abstract description 9
- 210000004027 cell Anatomy 0.000 description 78
- 150000001413 amino acids Chemical class 0.000 description 30
- 108091026890 Coding region Proteins 0.000 description 24
- 230000000694 effects Effects 0.000 description 20
- NUEHQDHDLDXCRU-GUBZILKMSA-N Ser-Pro-Arg Chemical compound OC[C@H](N)C(=O)N1CCC[C@H]1C(=O)N[C@@H](CCCN=C(N)N)C(O)=O NUEHQDHDLDXCRU-GUBZILKMSA-N 0.000 description 19
- 238000003752 polymerase chain reaction Methods 0.000 description 19
- 230000002950 deficient Effects 0.000 description 15
- 239000002299 complementary DNA Substances 0.000 description 14
- 238000001415 gene therapy Methods 0.000 description 14
- 230000001105 regulatory effect Effects 0.000 description 13
- 230000007812 deficiency Effects 0.000 description 12
- 208000037265 diseases, disorders, signs and symptoms Diseases 0.000 description 11
- 108090000765 processed proteins & peptides Proteins 0.000 description 11
- 239000000047 product Substances 0.000 description 11
- 150000001875 compounds Chemical class 0.000 description 10
- 241000588724 Escherichia coli Species 0.000 description 9
- 108091029865 Exogenous DNA Proteins 0.000 description 8
- 241000700605 Viruses Species 0.000 description 8
- 239000003153 chemical reaction reagent Substances 0.000 description 8
- 201000010099 disease Diseases 0.000 description 8
- 229930182830 galactose Natural products 0.000 description 8
- 238000003780 insertion Methods 0.000 description 8
- 230000037431 insertion Effects 0.000 description 8
- 239000000203 mixture Substances 0.000 description 8
- 102000004196 processed proteins & peptides Human genes 0.000 description 8
- 230000010076 replication Effects 0.000 description 8
- 239000000243 solution Substances 0.000 description 8
- YBAFDPFAUTYYRW-UHFFFAOYSA-N N-L-alpha-glutamyl-L-leucine Natural products CC(C)CC(C(O)=O)NC(=O)C(N)CCC(O)=O YBAFDPFAUTYYRW-UHFFFAOYSA-N 0.000 description 7
- 108020004485 Nonsense Codon Proteins 0.000 description 7
- 239000000427 antigen Substances 0.000 description 7
- 108091007433 antigens Proteins 0.000 description 7
- 102000036639 antigens Human genes 0.000 description 7
- 101150046414 ce gene Proteins 0.000 description 7
- 230000008859 change Effects 0.000 description 7
- 238000012217 deletion Methods 0.000 description 7
- 230000037430 deletion Effects 0.000 description 7
- 108020004999 messenger RNA Proteins 0.000 description 7
- 239000008194 pharmaceutical composition Substances 0.000 description 7
- 229920001184 polypeptide Polymers 0.000 description 7
- 210000001519 tissue Anatomy 0.000 description 7
- BYXHQQCXAJARLQ-ZLUOBGJFSA-N Ala-Ala-Ala Chemical compound C[C@H](N)C(=O)N[C@@H](C)C(=O)N[C@@H](C)C(O)=O BYXHQQCXAJARLQ-ZLUOBGJFSA-N 0.000 description 6
- 230000006870 function Effects 0.000 description 6
- 239000013615 primer Substances 0.000 description 6
- 108700028369 Alleles Proteins 0.000 description 5
- 241000894006 Bacteria Species 0.000 description 5
- 102000004163 DNA-directed RNA polymerases Human genes 0.000 description 5
- 108090000626 DNA-directed RNA polymerases Proteins 0.000 description 5
- 102000004190 Enzymes Human genes 0.000 description 5
- 108090000790 Enzymes Proteins 0.000 description 5
- MIIVFRCYJABHTQ-ONGXEEELSA-N Gly-Leu-Val Chemical compound [H]NCC(=O)N[C@@H](CC(C)C)C(=O)N[C@@H](C(C)C)C(O)=O MIIVFRCYJABHTQ-ONGXEEELSA-N 0.000 description 5
- XHQYFGPIRUHQIB-PBCZWWQYSA-N His-Thr-Asp Chemical compound OC(=O)C[C@@H](C(O)=O)NC(=O)[C@H]([C@H](O)C)NC(=O)[C@@H](N)CC1=CN=CN1 XHQYFGPIRUHQIB-PBCZWWQYSA-N 0.000 description 5
- LPFBXFILACZHIB-LAEOZQHASA-N Ile-Gly-Glu Chemical compound CC[C@H](C)[C@@H](C(=O)NCC(=O)N[C@@H](CCC(=O)O)C(=O)O)N LPFBXFILACZHIB-LAEOZQHASA-N 0.000 description 5
- QONKWXNJRRNTBV-AVGNSLFASA-N Leu-Pro-Met Chemical compound CC(C)C[C@@H](C(=O)N1CCC[C@H]1C(=O)N[C@@H](CCSC)C(=O)O)N QONKWXNJRRNTBV-AVGNSLFASA-N 0.000 description 5
- AAKRWBIIGKPOKQ-ONGXEEELSA-N Leu-Val-Gly Chemical compound CC(C)C[C@H](N)C(=O)N[C@@H](C(C)C)C(=O)NCC(O)=O AAKRWBIIGKPOKQ-ONGXEEELSA-N 0.000 description 5
- 241001465754 Metazoa Species 0.000 description 5
- 108091034117 Oligonucleotide Proteins 0.000 description 5
- MBFJIHUHHCJBSN-AVGNSLFASA-N Tyr-Asn-Gln Chemical compound [H]N[C@@H](CC1=CC=C(O)C=C1)C(=O)N[C@@H](CC(N)=O)C(=O)N[C@@H](CCC(N)=O)C(O)=O MBFJIHUHHCJBSN-AVGNSLFASA-N 0.000 description 5
- 230000027455 binding Effects 0.000 description 5
- 239000003795 chemical substances by application Substances 0.000 description 5
- 210000000349 chromosome Anatomy 0.000 description 5
- 238000010367 cloning Methods 0.000 description 5
- 101150045500 galK gene Proteins 0.000 description 5
- 108010078144 glutaminyl-glycine Proteins 0.000 description 5
- VPZXBVLAVMBEQI-UHFFFAOYSA-N glycyl-DL-alpha-alanine Natural products OC(=O)C(C)NC(=O)CN VPZXBVLAVMBEQI-UHFFFAOYSA-N 0.000 description 5
- 230000037434 nonsense mutation Effects 0.000 description 5
- 239000000126 substance Substances 0.000 description 5
- 108010033670 threonyl-aspartyl-tyrosine Proteins 0.000 description 5
- 238000001890 transfection Methods 0.000 description 5
- 108020004635 Complementary DNA Proteins 0.000 description 4
- ZMVCLTGPGWJAEE-JYJNAYRXSA-N Glu-His-Tyr Chemical compound C1=CC(=CC=C1C[C@@H](C(=O)O)NC(=O)[C@H](CC2=CN=CN2)NC(=O)[C@H](CCC(=O)O)N)O ZMVCLTGPGWJAEE-JYJNAYRXSA-N 0.000 description 4
- UQJNXZSSGQIPIQ-FBCQKBJTSA-N Gly-Gly-Thr Chemical compound C[C@@H](O)[C@@H](C(O)=O)NC(=O)CNC(=O)CN UQJNXZSSGQIPIQ-FBCQKBJTSA-N 0.000 description 4
- WJGSTIMGSIWHJX-HVTMNAMFSA-N His-Ile-Gln Chemical compound CC[C@H](C)[C@@H](C(=O)N[C@@H](CCC(=O)N)C(=O)O)NC(=O)[C@H](CC1=CN=CN1)N WJGSTIMGSIWHJX-HVTMNAMFSA-N 0.000 description 4
- 240000004808 Saccharomyces cerevisiae Species 0.000 description 4
- 108010044940 alanylglutamine Proteins 0.000 description 4
- 238000001514 detection method Methods 0.000 description 4
- 108010054813 diprotin B Proteins 0.000 description 4
- 239000013604 expression vector Substances 0.000 description 4
- 210000002950 fibroblast Anatomy 0.000 description 4
- XKUKSGPZAADMRA-UHFFFAOYSA-N glycyl-glycyl-glycine Chemical compound NCC(=O)NCC(=O)NCC(O)=O XKUKSGPZAADMRA-UHFFFAOYSA-N 0.000 description 4
- 238000001727 in vivo Methods 0.000 description 4
- 210000003734 kidney Anatomy 0.000 description 4
- 239000002502 liposome Substances 0.000 description 4
- 210000004185 liver Anatomy 0.000 description 4
- 108010029020 prolylglycine Proteins 0.000 description 4
- 238000000746 purification Methods 0.000 description 4
- 238000012163 sequencing technique Methods 0.000 description 4
- 238000006467 substitution reaction Methods 0.000 description 4
- 108010017949 tyrosyl-glycyl-glycine Proteins 0.000 description 4
- KIUYPHAMDKDICO-WHFBIAKZSA-N Ala-Asp-Gly Chemical compound C[C@H](N)C(=O)N[C@@H](CC(O)=O)C(=O)NCC(O)=O KIUYPHAMDKDICO-WHFBIAKZSA-N 0.000 description 3
- PEIBBAXIKUAYGN-UBHSHLNASA-N Ala-Phe-Arg Chemical compound NC(N)=NCCC[C@@H](C(O)=O)NC(=O)[C@@H](NC(=O)[C@@H](N)C)CC1=CC=CC=C1 PEIBBAXIKUAYGN-UBHSHLNASA-N 0.000 description 3
- PSUXEQYPYZLNER-QXEWZRGKSA-N Arg-Val-Asn Chemical compound [H]N[C@@H](CCCNC(N)=N)C(=O)N[C@@H](C(C)C)C(=O)N[C@@H](CC(N)=O)C(O)=O PSUXEQYPYZLNER-QXEWZRGKSA-N 0.000 description 3
- 208000002177 Cataract Diseases 0.000 description 3
- 241000255581 Drosophila <fruit fly, genus> Species 0.000 description 3
- 102100031780 Endonuclease Human genes 0.000 description 3
- YJIUYQKQBBQYHZ-ACZMJKKPSA-N Gln-Ala-Ala Chemical compound [H]N[C@@H](CCC(N)=O)C(=O)N[C@@H](C)C(=O)N[C@@H](C)C(O)=O YJIUYQKQBBQYHZ-ACZMJKKPSA-N 0.000 description 3
- YLJHCWNDBKKOEB-IHRRRGAJSA-N Glu-Glu-Phe Chemical compound [H]N[C@@H](CCC(O)=O)C(=O)N[C@@H](CCC(O)=O)C(=O)N[C@@H](CC1=CC=CC=C1)C(O)=O YLJHCWNDBKKOEB-IHRRRGAJSA-N 0.000 description 3
- MFVQGXGQRIXBPK-WDSKDSINSA-N Gly-Ala-Glu Chemical compound NCC(=O)N[C@@H](C)C(=O)N[C@@H](CCC(O)=O)C(O)=O MFVQGXGQRIXBPK-WDSKDSINSA-N 0.000 description 3
- UGTHTQWIQKEDEH-BQBZGAKWSA-N L-alanyl-L-prolylglycine zwitterion Chemical compound C[C@H](N)C(=O)N1CCC[C@H]1C(=O)NCC(O)=O UGTHTQWIQKEDEH-BQBZGAKWSA-N 0.000 description 3
- CQQGCWPXDHTTNF-GUBZILKMSA-N Leu-Ala-Glu Chemical compound CC(C)C[C@H](N)C(=O)N[C@@H](C)C(=O)N[C@H](C(O)=O)CCC(O)=O CQQGCWPXDHTTNF-GUBZILKMSA-N 0.000 description 3
- GRZSCTXVCDUIPO-SRVKXCTJSA-N Leu-Arg-Gln Chemical compound [H]N[C@@H](CC(C)C)C(=O)N[C@@H](CCCNC(N)=N)C(=O)N[C@@H](CCC(N)=O)C(O)=O GRZSCTXVCDUIPO-SRVKXCTJSA-N 0.000 description 3
- QVFGXCVIXXBFHO-AVGNSLFASA-N Leu-Glu-Leu Chemical compound CC(C)C[C@H](N)C(=O)N[C@@H](CCC(O)=O)C(=O)N[C@@H](CC(C)C)C(O)=O QVFGXCVIXXBFHO-AVGNSLFASA-N 0.000 description 3
- FLNPJLDPGMLWAU-UWVGGRQHSA-N Leu-Met-Gly Chemical compound OC(=O)CNC(=O)[C@H](CCSC)NC(=O)[C@@H](N)CC(C)C FLNPJLDPGMLWAU-UWVGGRQHSA-N 0.000 description 3
- GZRABTMNWJXFMH-UVOCVTCTSA-N Leu-Thr-Thr Chemical compound [H]N[C@@H](CC(C)C)C(=O)N[C@@H]([C@@H](C)O)C(=O)N[C@@H]([C@@H](C)O)C(O)=O GZRABTMNWJXFMH-UVOCVTCTSA-N 0.000 description 3
- AAORVPFVUIHEAB-YUMQZZPRSA-N Lys-Asp-Gly Chemical compound [H]N[C@@H](CCCCN)C(=O)N[C@@H](CC(O)=O)C(=O)NCC(O)=O AAORVPFVUIHEAB-YUMQZZPRSA-N 0.000 description 3
- YRAWWKUTNBILNT-FXQIFTODSA-N Met-Ala-Ala Chemical compound CSCC[C@H](N)C(=O)N[C@@H](C)C(=O)N[C@@H](C)C(O)=O YRAWWKUTNBILNT-FXQIFTODSA-N 0.000 description 3
- JQECLVNLAZGHRQ-CIUDSAMLSA-N Met-Asp-Gln Chemical compound CSCC[C@H](N)C(=O)N[C@@H](CC(O)=O)C(=O)N[C@H](C(O)=O)CCC(N)=O JQECLVNLAZGHRQ-CIUDSAMLSA-N 0.000 description 3
- ZBLSZPYQQRIHQU-RCWTZXSCSA-N Met-Thr-Val Chemical compound CSCC[C@H](N)C(=O)N[C@@H]([C@@H](C)O)C(=O)N[C@@H](C(C)C)C(O)=O ZBLSZPYQQRIHQU-RCWTZXSCSA-N 0.000 description 3
- 238000000636 Northern blotting Methods 0.000 description 3
- CWFGECHCRMGPPT-MXAVVETBSA-N Phe-Ile-Ser Chemical compound [H]N[C@@H](CC1=CC=CC=C1)C(=O)N[C@@H]([C@@H](C)CC)C(=O)N[C@@H](CO)C(O)=O CWFGECHCRMGPPT-MXAVVETBSA-N 0.000 description 3
- SZZBUDVXWZZPDH-BQBZGAKWSA-N Pro-Cys-Gly Chemical compound OC(=O)CNC(=O)[C@H](CS)NC(=O)[C@@H]1CCCN1 SZZBUDVXWZZPDH-BQBZGAKWSA-N 0.000 description 3
- NXEYSLRNNPWCRN-SRVKXCTJSA-N Pro-Glu-Leu Chemical compound [H]N1CCC[C@H]1C(=O)N[C@@H](CCC(O)=O)C(=O)N[C@@H](CC(C)C)C(O)=O NXEYSLRNNPWCRN-SRVKXCTJSA-N 0.000 description 3
- 108010003201 RGH 0205 Proteins 0.000 description 3
- YFOCMOVJBQDBCE-NRPADANISA-N Val-Ala-Glu Chemical compound C[C@@H](C(=O)N[C@@H](CCC(=O)O)C(=O)O)NC(=O)[C@H](C(C)C)N YFOCMOVJBQDBCE-NRPADANISA-N 0.000 description 3
- VHIZXDZMTDVFGX-DCAQKATOSA-N Val-Ser-Leu Chemical compound CC(C)C[C@@H](C(=O)O)NC(=O)[C@H](CO)NC(=O)[C@H](C(C)C)N VHIZXDZMTDVFGX-DCAQKATOSA-N 0.000 description 3
- 239000002253 acid Substances 0.000 description 3
- 108010069020 alanyl-prolyl-glycine Proteins 0.000 description 3
- 238000004458 analytical method Methods 0.000 description 3
- 230000001580 bacterial effect Effects 0.000 description 3
- 238000006243 chemical reaction Methods 0.000 description 3
- 230000003247 decreasing effect Effects 0.000 description 3
- 208000035475 disorder Diseases 0.000 description 3
- 239000003814 drug Substances 0.000 description 3
- 238000005516 engineering process Methods 0.000 description 3
- 108010050848 glycylleucine Proteins 0.000 description 3
- 230000012010 growth Effects 0.000 description 3
- 108010018006 histidylserine Proteins 0.000 description 3
- 210000004408 hybridoma Anatomy 0.000 description 3
- 238000002955 isolation Methods 0.000 description 3
- 210000004072 lung Anatomy 0.000 description 3
- 238000002844 melting Methods 0.000 description 3
- 230000008018 melting Effects 0.000 description 3
- 230000037230 mobility Effects 0.000 description 3
- 210000002826 placenta Anatomy 0.000 description 3
- 210000005059 placental tissue Anatomy 0.000 description 3
- 230000004044 response Effects 0.000 description 3
- 208000024891 symptom Diseases 0.000 description 3
- 238000003786 synthesis reaction Methods 0.000 description 3
- 238000013518 transcription Methods 0.000 description 3
- 230000035897 transcription Effects 0.000 description 3
- 238000013519 translation Methods 0.000 description 3
- XLYOFNOQVPJJNP-UHFFFAOYSA-N water Substances O XLYOFNOQVPJJNP-UHFFFAOYSA-N 0.000 description 3
- HHGYNJRJIINWAK-FXQIFTODSA-N Ala-Ala-Arg Chemical compound C[C@H](N)C(=O)N[C@@H](C)C(=O)N[C@H](C(O)=O)CCCN=C(N)N HHGYNJRJIINWAK-FXQIFTODSA-N 0.000 description 2
- ODWSTKXGQGYHSH-FXQIFTODSA-N Ala-Arg-Ala Chemical compound C[C@H](N)C(=O)N[C@@H](CCCNC(N)=N)C(=O)N[C@@H](C)C(O)=O ODWSTKXGQGYHSH-FXQIFTODSA-N 0.000 description 2
- KVWLTGNCJYDJET-LSJOCFKGSA-N Ala-Arg-His Chemical compound C[C@@H](C(=O)N[C@@H](CCCN=C(N)N)C(=O)N[C@@H](CC1=CN=CN1)C(=O)O)N KVWLTGNCJYDJET-LSJOCFKGSA-N 0.000 description 2
- WDIYWDJLXOCGRW-ACZMJKKPSA-N Ala-Asp-Glu Chemical compound [H]N[C@@H](C)C(=O)N[C@@H](CC(O)=O)C(=O)N[C@@H](CCC(O)=O)C(O)=O WDIYWDJLXOCGRW-ACZMJKKPSA-N 0.000 description 2
- CXQODNIBUNQWAS-CIUDSAMLSA-N Ala-Gln-Arg Chemical compound C[C@H](N)C(=O)N[C@@H](CCC(N)=O)C(=O)N[C@H](C(O)=O)CCCN=C(N)N CXQODNIBUNQWAS-CIUDSAMLSA-N 0.000 description 2
- CWEAKSWWKHGTRJ-BQBZGAKWSA-N Ala-Gly-Met Chemical compound [H]N[C@@H](C)C(=O)NCC(=O)N[C@@H](CCSC)C(O)=O CWEAKSWWKHGTRJ-BQBZGAKWSA-N 0.000 description 2
- YHKANGMVQWRMAP-DCAQKATOSA-N Ala-Leu-Arg Chemical compound C[C@H](N)C(=O)N[C@@H](CC(C)C)C(=O)N[C@H](C(O)=O)CCCN=C(N)N YHKANGMVQWRMAP-DCAQKATOSA-N 0.000 description 2
- CCDFBRZVTDDJNM-GUBZILKMSA-N Ala-Leu-Glu Chemical compound [H]N[C@@H](C)C(=O)N[C@@H](CC(C)C)C(=O)N[C@@H](CCC(O)=O)C(O)=O CCDFBRZVTDDJNM-GUBZILKMSA-N 0.000 description 2
- MNZHHDPWDWQJCQ-YUMQZZPRSA-N Ala-Leu-Gly Chemical compound C[C@H](N)C(=O)N[C@@H](CC(C)C)C(=O)NCC(O)=O MNZHHDPWDWQJCQ-YUMQZZPRSA-N 0.000 description 2
- MDNAVFBZPROEHO-DCAQKATOSA-N Ala-Lys-Val Chemical compound [H]N[C@@H](C)C(=O)N[C@@H](CCCCN)C(=O)N[C@@H](C(C)C)C(O)=O MDNAVFBZPROEHO-DCAQKATOSA-N 0.000 description 2
- NLOMBWNGESDVJU-GUBZILKMSA-N Ala-Met-Arg Chemical compound [H]N[C@@H](C)C(=O)N[C@@H](CCSC)C(=O)N[C@@H](CCCNC(N)=N)C(O)=O NLOMBWNGESDVJU-GUBZILKMSA-N 0.000 description 2
- XAXHGSOBFPIRFG-LSJOCFKGSA-N Ala-Pro-His Chemical compound C[C@H](N)C(=O)N1CCC[C@H]1C(=O)N[C@@H](Cc1cnc[nH]1)C(O)=O XAXHGSOBFPIRFG-LSJOCFKGSA-N 0.000 description 2
- HOVPGJUNRLMIOZ-CIUDSAMLSA-N Ala-Ser-Leu Chemical compound CC(C)C[C@@H](C(O)=O)NC(=O)[C@H](CO)NC(=O)[C@H](C)N HOVPGJUNRLMIOZ-CIUDSAMLSA-N 0.000 description 2
- JJHBEVZAZXZREW-LFSVMHDDSA-N Ala-Thr-Phe Chemical compound C[C@@H](O)[C@H](NC(=O)[C@H](C)N)C(=O)N[C@@H](Cc1ccccc1)C(O)=O JJHBEVZAZXZREW-LFSVMHDDSA-N 0.000 description 2
- NLYYHIKRBRMAJV-AEJSXWLSSA-N Ala-Val-Pro Chemical compound C[C@@H](C(=O)N[C@@H](C(C)C)C(=O)N1CCC[C@@H]1C(=O)O)N NLYYHIKRBRMAJV-AEJSXWLSSA-N 0.000 description 2
- REWSWYIDQIELBE-FXQIFTODSA-N Ala-Val-Ser Chemical compound [H]N[C@@H](C)C(=O)N[C@@H](C(C)C)C(=O)N[C@@H](CO)C(O)=O REWSWYIDQIELBE-FXQIFTODSA-N 0.000 description 2
- MUXONAMCEUBVGA-DCAQKATOSA-N Arg-Arg-Gln Chemical compound NC(N)=NCCC[C@H](N)C(=O)N[C@@H](CCCN=C(N)N)C(=O)N[C@@H](CCC(N)=O)C(O)=O MUXONAMCEUBVGA-DCAQKATOSA-N 0.000 description 2
- IYMAXBFPHPZYIK-BQBZGAKWSA-N Arg-Gly-Asp Chemical compound NC(N)=NCCC[C@H](N)C(=O)NCC(=O)N[C@@H](CC(O)=O)C(O)=O IYMAXBFPHPZYIK-BQBZGAKWSA-N 0.000 description 2
- KMFPQTITXUKJOV-DCAQKATOSA-N Arg-Ser-Leu Chemical compound [H]N[C@@H](CCCNC(N)=N)C(=O)N[C@@H](CO)C(=O)N[C@@H](CC(C)C)C(O)=O KMFPQTITXUKJOV-DCAQKATOSA-N 0.000 description 2
- ASQKVGRCKOFKIU-KZVJFYERSA-N Arg-Thr-Ala Chemical compound C[C@H]([C@@H](C(=O)N[C@@H](C)C(=O)O)NC(=O)[C@H](CCCN=C(N)N)N)O ASQKVGRCKOFKIU-KZVJFYERSA-N 0.000 description 2
- DRDWXKWUSIKKOB-PJODQICGSA-N Arg-Trp-Ala Chemical compound [H]N[C@@H](CCCNC(N)=N)C(=O)N[C@@H](CC1=CNC2=C1C=CC=C2)C(=O)N[C@@H](C)C(O)=O DRDWXKWUSIKKOB-PJODQICGSA-N 0.000 description 2
- REQUGIWGOGSOEZ-ZLUOBGJFSA-N Asn-Ser-Asn Chemical compound C([C@@H](C(=O)N[C@@H](CO)C(=O)N[C@@H](CC(=O)N)C(=O)O)N)C(=O)N REQUGIWGOGSOEZ-ZLUOBGJFSA-N 0.000 description 2
- VZNOVQKGJQJOCS-SRVKXCTJSA-N Asp-Asp-Tyr Chemical compound [H]N[C@@H](CC(O)=O)C(=O)N[C@@H](CC(O)=O)C(=O)N[C@@H](CC1=CC=C(O)C=C1)C(O)=O VZNOVQKGJQJOCS-SRVKXCTJSA-N 0.000 description 2
- CZECQDPEMSVPDH-MNXVOIDGSA-N Asp-Leu-Val-Ser Chemical compound OC(=O)C[C@H](N)C(=O)N[C@@H](CC(C)C)C(=O)N[C@@H](C(C)C)C(=O)N[C@@H](CO)C(O)=O CZECQDPEMSVPDH-MNXVOIDGSA-N 0.000 description 2
- KGHLGJAXYSVNJP-WHFBIAKZSA-N Asp-Ser-Gly Chemical compound OC(=O)C[C@H](N)C(=O)N[C@@H](CO)C(=O)NCC(O)=O KGHLGJAXYSVNJP-WHFBIAKZSA-N 0.000 description 2
- 241000283707 Capra Species 0.000 description 2
- 108091035707 Consensus sequence Proteins 0.000 description 2
- UDPSLLFHOLGXBY-FXQIFTODSA-N Cys-Glu-Glu Chemical compound [H]N[C@@H](CS)C(=O)N[C@@H](CCC(O)=O)C(=O)N[C@@H](CCC(O)=O)C(O)=O UDPSLLFHOLGXBY-FXQIFTODSA-N 0.000 description 2
- MBRWOKXNHTUJMB-CIUDSAMLSA-N Cys-Pro-Glu Chemical compound [H]N[C@@H](CS)C(=O)N1CCC[C@H]1C(=O)N[C@@H](CCC(O)=O)C(O)=O MBRWOKXNHTUJMB-CIUDSAMLSA-N 0.000 description 2
- KFYPRIGJTICABD-XGEHTFHBSA-N Cys-Thr-Val Chemical compound C[C@H]([C@@H](C(=O)N[C@@H](C(C)C)C(=O)O)NC(=O)[C@H](CS)N)O KFYPRIGJTICABD-XGEHTFHBSA-N 0.000 description 2
- 102000053602 DNA Human genes 0.000 description 2
- 238000002965 ELISA Methods 0.000 description 2
- 241000701959 Escherichia virus Lambda Species 0.000 description 2
- 108700024394 Exon Proteins 0.000 description 2
- 108091060211 Expressed sequence tag Proteins 0.000 description 2
- ZHNUHDYFZUAESO-UHFFFAOYSA-N Formamide Chemical compound NC=O ZHNUHDYFZUAESO-UHFFFAOYSA-N 0.000 description 2
- RZSLYUUFFVHFRQ-FXQIFTODSA-N Gln-Ala-Glu Chemical compound [H]N[C@@H](CCC(N)=O)C(=O)N[C@@H](C)C(=O)N[C@@H](CCC(O)=O)C(O)=O RZSLYUUFFVHFRQ-FXQIFTODSA-N 0.000 description 2
- SXGMGNZEHFORAV-IUCAKERBSA-N Gln-Lys-Gly Chemical compound C(CCN)C[C@@H](C(=O)NCC(=O)O)NC(=O)[C@H](CCC(=O)N)N SXGMGNZEHFORAV-IUCAKERBSA-N 0.000 description 2
- OZEQPCDLCDRCGY-SOUVJXGZSA-N Gln-Phe-Pro Chemical compound C1C[C@@H](N(C1)C(=O)[C@H](CC2=CC=CC=C2)NC(=O)[C@H](CCC(=O)N)N)C(=O)O OZEQPCDLCDRCGY-SOUVJXGZSA-N 0.000 description 2
- MXOODARRORARSU-ACZMJKKPSA-N Glu-Ala-Ser Chemical compound C[C@@H](C(=O)N[C@@H](CO)C(=O)O)NC(=O)[C@H](CCC(=O)O)N MXOODARRORARSU-ACZMJKKPSA-N 0.000 description 2
- MUSGDMDGNGXULI-DCAQKATOSA-N Glu-Glu-Leu Chemical compound CC(C)C[C@@H](C(O)=O)NC(=O)[C@H](CCC(O)=O)NC(=O)[C@@H](N)CCC(O)=O MUSGDMDGNGXULI-DCAQKATOSA-N 0.000 description 2
- QIQABBIDHGQXGA-ZPFDUUQYSA-N Glu-Ile-Arg Chemical compound OC(=O)CC[C@H](N)C(=O)N[C@@H]([C@@H](C)CC)C(=O)N[C@@H](CCCN=C(N)N)C(O)=O QIQABBIDHGQXGA-ZPFDUUQYSA-N 0.000 description 2
- ZAPFAWQHBOHWLL-GUBZILKMSA-N Glu-Ser-His Chemical compound C1=C(NC=N1)C[C@@H](C(=O)O)NC(=O)[C@H](CO)NC(=O)[C@H](CCC(=O)O)N ZAPFAWQHBOHWLL-GUBZILKMSA-N 0.000 description 2
- MXJYXYDREQWUMS-XKBZYTNZSA-N Glu-Thr-Ser Chemical compound [H]N[C@@H](CCC(O)=O)C(=O)N[C@@H]([C@@H](C)O)C(=O)N[C@@H](CO)C(O)=O MXJYXYDREQWUMS-XKBZYTNZSA-N 0.000 description 2
- KIEICAOUSNYOLM-NRPADANISA-N Glu-Val-Ala Chemical compound [H]N[C@@H](CCC(O)=O)C(=O)N[C@@H](C(C)C)C(=O)N[C@@H](C)C(O)=O KIEICAOUSNYOLM-NRPADANISA-N 0.000 description 2
- QXUPRMQJDWJDFR-NRPADANISA-N Glu-Val-Ser Chemical compound CC(C)[C@H](NC(=O)[C@@H](N)CCC(O)=O)C(=O)N[C@@H](CO)C(O)=O QXUPRMQJDWJDFR-NRPADANISA-N 0.000 description 2
- XPJBQTCXPJNIFE-ZETCQYMHSA-N Gly-Gly-Leu Chemical compound CC(C)C[C@@H](C(O)=O)NC(=O)CNC(=O)CN XPJBQTCXPJNIFE-ZETCQYMHSA-N 0.000 description 2
- GGAPHLIUUTVYMX-QWRGUYRKSA-N Gly-Phe-Ser Chemical compound OC[C@@H](C([O-])=O)NC(=O)[C@@H](NC(=O)C[NH3+])CC1=CC=CC=C1 GGAPHLIUUTVYMX-QWRGUYRKSA-N 0.000 description 2
- YOBGUCWZPXJHTN-BQBZGAKWSA-N Gly-Ser-Arg Chemical compound NCC(=O)N[C@@H](CO)C(=O)N[C@H](C(O)=O)CCCN=C(N)N YOBGUCWZPXJHTN-BQBZGAKWSA-N 0.000 description 2
- IZVICCORZOSGPT-JSGCOSHPSA-N Gly-Val-Tyr Chemical compound [H]NCC(=O)N[C@@H](C(C)C)C(=O)N[C@@H](CC1=CC=C(O)C=C1)C(O)=O IZVICCORZOSGPT-JSGCOSHPSA-N 0.000 description 2
- DHMQDGOQFOQNFH-UHFFFAOYSA-N Glycine Chemical compound NCC(O)=O DHMQDGOQFOQNFH-UHFFFAOYSA-N 0.000 description 2
- JBCLFWXMTIKCCB-UHFFFAOYSA-N H-Gly-Phe-OH Natural products NCC(=O)NC(C(O)=O)CC1=CC=CC=C1 JBCLFWXMTIKCCB-UHFFFAOYSA-N 0.000 description 2
- VSLXGYMEHVAJBH-DLOVCJGASA-N His-Ala-Leu Chemical compound [H]N[C@@H](CC1=CNC=N1)C(=O)N[C@@H](C)C(=O)N[C@@H](CC(C)C)C(O)=O VSLXGYMEHVAJBH-DLOVCJGASA-N 0.000 description 2
- IAYPZSHNZQHQNO-KKUMJFAQSA-N His-Ser-Phe Chemical compound C1=CC=C(C=C1)C[C@@H](C(=O)O)NC(=O)[C@H](CO)NC(=O)[C@H](CC2=CN=CN2)N IAYPZSHNZQHQNO-KKUMJFAQSA-N 0.000 description 2
- 101000829171 Hypocrea virens (strain Gv29-8 / FGSC 10586) Effector TSP1 Proteins 0.000 description 2
- XLDYDEDTGMHUCZ-GHCJXIJMSA-N Ile-Asp-Cys Chemical compound CC[C@H](C)[C@@H](C(=O)N[C@@H](CC(=O)O)C(=O)N[C@@H](CS)C(=O)O)N XLDYDEDTGMHUCZ-GHCJXIJMSA-N 0.000 description 2
- HTDRTKMNJRRYOJ-SIUGBPQLSA-N Ile-Gln-Tyr Chemical compound CC[C@H](C)[C@H](N)C(=O)N[C@@H](CCC(N)=O)C(=O)N[C@H](C(O)=O)CC1=CC=C(O)C=C1 HTDRTKMNJRRYOJ-SIUGBPQLSA-N 0.000 description 2
- HASRFYOMVPJRPU-SRVKXCTJSA-N Leu-Arg-Glu Chemical compound CC(C)C[C@H](N)C(=O)N[C@@H](CCCN=C(N)N)C(=O)N[C@@H](CCC(O)=O)C(O)=O HASRFYOMVPJRPU-SRVKXCTJSA-N 0.000 description 2
- PJYSOYLLTJKZHC-GUBZILKMSA-N Leu-Asp-Gln Chemical compound CC(C)C[C@H](N)C(=O)N[C@@H](CC(O)=O)C(=O)N[C@H](C(O)=O)CCC(N)=O PJYSOYLLTJKZHC-GUBZILKMSA-N 0.000 description 2
- PPBKJAQJAUHZKX-SRVKXCTJSA-N Leu-Cys-Leu Chemical compound CC(C)C[C@H](N)C(=O)N[C@@H](CS)C(=O)N[C@H](C(O)=O)CC(C)C PPBKJAQJAUHZKX-SRVKXCTJSA-N 0.000 description 2
- HUEBCHPSXSQUGN-GARJFASQSA-N Leu-Cys-Pro Chemical compound CC(C)C[C@@H](C(=O)N[C@@H](CS)C(=O)N1CCC[C@@H]1C(=O)O)N HUEBCHPSXSQUGN-GARJFASQSA-N 0.000 description 2
- KAFOIVJDVSZUMD-DCAQKATOSA-N Leu-Gln-Gln Chemical compound [H]N[C@@H](CC(C)C)C(=O)N[C@@H](CCC(N)=O)C(=O)N[C@@H](CCC(N)=O)C(O)=O KAFOIVJDVSZUMD-DCAQKATOSA-N 0.000 description 2
- KAFOIVJDVSZUMD-UHFFFAOYSA-N Leu-Gln-Gln Natural products CC(C)CC(N)C(=O)NC(CCC(N)=O)C(=O)NC(CCC(N)=O)C(O)=O KAFOIVJDVSZUMD-UHFFFAOYSA-N 0.000 description 2
- LIINDKYIGYTDLG-PPCPHDFISA-N Leu-Ile-Thr Chemical compound [H]N[C@@H](CC(C)C)C(=O)N[C@@H]([C@@H](C)CC)C(=O)N[C@@H]([C@@H](C)O)C(O)=O LIINDKYIGYTDLG-PPCPHDFISA-N 0.000 description 2
- JVTYXRRFZCEPPK-RHYQMDGZSA-N Leu-Met-Thr Chemical compound C[C@H]([C@@H](C(=O)O)NC(=O)[C@H](CCSC)NC(=O)[C@H](CC(C)C)N)O JVTYXRRFZCEPPK-RHYQMDGZSA-N 0.000 description 2
- NJMXCOOEFLMZSR-AVGNSLFASA-N Leu-Met-Val Chemical compound [H]N[C@@H](CC(C)C)C(=O)N[C@@H](CCSC)C(=O)N[C@@H](C(C)C)C(O)=O NJMXCOOEFLMZSR-AVGNSLFASA-N 0.000 description 2
- PWPBLZXWFXJFHE-RHYQMDGZSA-N Leu-Pro-Thr Chemical compound CC(C)C[C@H](N)C(=O)N1CCC[C@H]1C(=O)N[C@@H]([C@@H](C)O)C(O)=O PWPBLZXWFXJFHE-RHYQMDGZSA-N 0.000 description 2
- IZPVWNSAVUQBGP-CIUDSAMLSA-N Leu-Ser-Asp Chemical compound [H]N[C@@H](CC(C)C)C(=O)N[C@@H](CO)C(=O)N[C@@H](CC(O)=O)C(O)=O IZPVWNSAVUQBGP-CIUDSAMLSA-N 0.000 description 2
- AKVBOOKXVAMKSS-GUBZILKMSA-N Leu-Ser-Gln Chemical compound [H]N[C@@H](CC(C)C)C(=O)N[C@@H](CO)C(=O)N[C@@H](CCC(N)=O)C(O)=O AKVBOOKXVAMKSS-GUBZILKMSA-N 0.000 description 2
- FDBTVENULFNTAL-XQQFMLRXSA-N Leu-Val-Pro Chemical compound CC(C)C[C@@H](C(=O)N[C@@H](C(C)C)C(=O)N1CCC[C@@H]1C(=O)O)N FDBTVENULFNTAL-XQQFMLRXSA-N 0.000 description 2
- GCMWRRQAKQXDED-IUCAKERBSA-N Lys-Glu-Gly Chemical compound [NH3+]CCCC[C@H]([NH3+])C(=O)N[C@@H](CCC([O-])=O)C(=O)NCC([O-])=O GCMWRRQAKQXDED-IUCAKERBSA-N 0.000 description 2
- VEGLGAOVLFODGC-GUBZILKMSA-N Lys-Glu-Ser Chemical compound [H]N[C@@H](CCCCN)C(=O)N[C@@H](CCC(O)=O)C(=O)N[C@@H](CO)C(O)=O VEGLGAOVLFODGC-GUBZILKMSA-N 0.000 description 2
- HAUUXTXKJNVIFY-ONGXEEELSA-N Lys-Gly-Val Chemical compound [H]N[C@@H](CCCCN)C(=O)NCC(=O)N[C@@H](C(C)C)C(O)=O HAUUXTXKJNVIFY-ONGXEEELSA-N 0.000 description 2
- MYZMQWHPDAYKIE-SRVKXCTJSA-N Lys-Leu-Ala Chemical compound [H]N[C@@H](CCCCN)C(=O)N[C@@H](CC(C)C)C(=O)N[C@@H](C)C(O)=O MYZMQWHPDAYKIE-SRVKXCTJSA-N 0.000 description 2
- 241001529936 Murinae Species 0.000 description 2
- SITLTJHOQZFJGG-UHFFFAOYSA-N N-L-alpha-glutamyl-L-valine Natural products CC(C)C(C(O)=O)NC(=O)C(N)CCC(O)=O SITLTJHOQZFJGG-UHFFFAOYSA-N 0.000 description 2
- WYBVBIHNJWOLCJ-UHFFFAOYSA-N N-L-arginyl-L-leucine Natural products CC(C)CC(C(O)=O)NC(=O)C(N)CCCN=C(N)N WYBVBIHNJWOLCJ-UHFFFAOYSA-N 0.000 description 2
- 108010079364 N-glycylalanine Proteins 0.000 description 2
- 108010002311 N-glycylglutamic acid Proteins 0.000 description 2
- 241000283973 Oryctolagus cuniculus Species 0.000 description 2
- LZDIENNKWVXJMX-JYJNAYRXSA-N Phe-Arg-Arg Chemical compound NC(N)=NCCC[C@@H](C(O)=O)NC(=O)[C@H](CCCN=C(N)N)NC(=O)[C@@H](N)CC1=CC=CC=C1 LZDIENNKWVXJMX-JYJNAYRXSA-N 0.000 description 2
- RFEXGCASCQGGHZ-STQMWFEESA-N Phe-Gly-Arg Chemical compound [H]N[C@@H](CC1=CC=CC=C1)C(=O)NCC(=O)N[C@@H](CCCNC(N)=N)C(O)=O RFEXGCASCQGGHZ-STQMWFEESA-N 0.000 description 2
- NAXPHWZXEXNDIW-JTQLQIEISA-N Phe-Gly-Gly Chemical compound OC(=O)CNC(=O)CNC(=O)[C@@H](N)CC1=CC=CC=C1 NAXPHWZXEXNDIW-JTQLQIEISA-N 0.000 description 2
- LSIWVWRUTKPXDS-DCAQKATOSA-N Pro-Gln-Arg Chemical compound [H]N1CCC[C@H]1C(=O)N[C@@H](CCC(N)=O)C(=O)N[C@@H](CCCNC(N)=N)C(O)=O LSIWVWRUTKPXDS-DCAQKATOSA-N 0.000 description 2
- AFXCXDQNRXTSBD-FJXKBIBVSA-N Pro-Gly-Thr Chemical compound [H]N1CCC[C@H]1C(=O)NCC(=O)N[C@@H]([C@@H](C)O)C(O)=O AFXCXDQNRXTSBD-FJXKBIBVSA-N 0.000 description 2
- MCWHYUWXVNRXFV-RWMBFGLXSA-N Pro-Leu-Pro Chemical compound CC(C)C[C@@H](C(=O)N1CCC[C@@H]1C(=O)O)NC(=O)[C@@H]2CCCN2 MCWHYUWXVNRXFV-RWMBFGLXSA-N 0.000 description 2
- 108010076504 Protein Sorting Signals Proteins 0.000 description 2
- 108010092799 RNA-directed DNA polymerase Proteins 0.000 description 2
- 108020004511 Recombinant DNA Proteins 0.000 description 2
- 241000235070 Saccharomyces Species 0.000 description 2
- YRBGKVIWMNEVCZ-WDSKDSINSA-N Ser-Glu-Gly Chemical compound OC[C@H](N)C(=O)N[C@@H](CCC(O)=O)C(=O)NCC(O)=O YRBGKVIWMNEVCZ-WDSKDSINSA-N 0.000 description 2
- FUMGHWDRRFCKEP-CIUDSAMLSA-N Ser-Leu-Ala Chemical compound [H]N[C@@H](CO)C(=O)N[C@@H](CC(C)C)C(=O)N[C@@H](C)C(O)=O FUMGHWDRRFCKEP-CIUDSAMLSA-N 0.000 description 2
- QYSFWUIXDFJUDW-DCAQKATOSA-N Ser-Leu-Arg Chemical compound [H]N[C@@H](CO)C(=O)N[C@@H](CC(C)C)C(=O)N[C@@H](CCCNC(N)=N)C(O)=O QYSFWUIXDFJUDW-DCAQKATOSA-N 0.000 description 2
- ZIFYDQAFEMIZII-GUBZILKMSA-N Ser-Leu-Glu Chemical compound [H]N[C@@H](CO)C(=O)N[C@@H](CC(C)C)C(=O)N[C@@H](CCC(O)=O)C(O)=O ZIFYDQAFEMIZII-GUBZILKMSA-N 0.000 description 2
- GYDFRTRSSXOZCR-ACZMJKKPSA-N Ser-Ser-Glu Chemical compound OC[C@H](N)C(=O)N[C@@H](CO)C(=O)N[C@H](C(O)=O)CCC(O)=O GYDFRTRSSXOZCR-ACZMJKKPSA-N 0.000 description 2
Classifications
-
- A—HUMAN NECESSITIES
- A01—AGRICULTURE; FORESTRY; ANIMAL HUSBANDRY; HUNTING; TRAPPING; FISHING
- A01K—ANIMAL HUSBANDRY; AVICULTURE; APICULTURE; PISCICULTURE; FISHING; REARING OR BREEDING ANIMALS, NOT OTHERWISE PROVIDED FOR; NEW BREEDS OF ANIMALS
- A01K67/00—Rearing or breeding animals, not otherwise provided for; New or modified breeds of animals
- A01K67/027—New or modified breeds of vertebrates
-
- C—CHEMISTRY; METALLURGY
- C07—ORGANIC CHEMISTRY
- C07K—PEPTIDES
- C07K16/00—Immunoglobulins [IGs], e.g. monoclonal or polyclonal antibodies
- C07K16/40—Immunoglobulins [IGs], e.g. monoclonal or polyclonal antibodies against enzymes
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12N—MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
- C12N9/00—Enzymes; Proenzymes; Compositions thereof; Processes for preparing, activating, inhibiting, separating or purifying enzymes
- C12N9/10—Transferases (2.)
- C12N9/12—Transferases (2.) transferring phosphorus containing groups, e.g. kinases (2.7)
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12N—MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
- C12N9/00—Enzymes; Proenzymes; Compositions thereof; Processes for preparing, activating, inhibiting, separating or purifying enzymes
- C12N9/10—Transferases (2.)
- C12N9/12—Transferases (2.) transferring phosphorus containing groups, e.g. kinases (2.7)
- C12N9/1205—Phosphotransferases with an alcohol group as acceptor (2.7.1), e.g. protein kinases
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12Q—MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
- C12Q1/00—Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
- C12Q1/68—Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
-
- G—PHYSICS
- G01—MEASURING; TESTING
- G01N—INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
- G01N33/00—Investigating or analysing materials by specific methods not covered by groups G01N1/00 - G01N31/00
- G01N33/48—Biological material, e.g. blood, urine; Haemocytometers
- G01N33/50—Chemical analysis of biological material, e.g. blood, urine; Testing involving biospecific ligand binding methods; Immunological testing
- G01N33/53—Immunoassay; Biospecific binding assay; Materials therefor
- G01N33/573—Immunoassay; Biospecific binding assay; Materials therefor for enzymes or isoenzymes
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12Q—MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
- C12Q2600/00—Oligonucleotides characterized by their use
- C12Q2600/156—Polymorphic or mutational markers
Landscapes
- Life Sciences & Earth Sciences (AREA)
- Health & Medical Sciences (AREA)
- Chemical & Material Sciences (AREA)
- Organic Chemistry (AREA)
- Engineering & Computer Science (AREA)
- Molecular Biology (AREA)
- Zoology (AREA)
- Genetics & Genomics (AREA)
- Wood Science & Technology (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Biochemistry (AREA)
- General Health & Medical Sciences (AREA)
- Immunology (AREA)
- Biotechnology (AREA)
- Microbiology (AREA)
- Biomedical Technology (AREA)
- Medicinal Chemistry (AREA)
- Proteomics, Peptides & Aminoacids (AREA)
- General Engineering & Computer Science (AREA)
- Analytical Chemistry (AREA)
- Urology & Nephrology (AREA)
- Biophysics (AREA)
- Physics & Mathematics (AREA)
- Environmental Sciences (AREA)
- Hematology (AREA)
- Cell Biology (AREA)
- Animal Behavior & Ethology (AREA)
- Food Science & Technology (AREA)
- Biodiversity & Conservation Biology (AREA)
- Animal Husbandry (AREA)
- General Physics & Mathematics (AREA)
- Pathology (AREA)
- Peptides Or Proteins (AREA)
- Micro-Organisms Or Cultivation Processes Thereof (AREA)
Abstract
This invention relates to human galactokinase and the identification of galactokinase mutations (a missense and a nonsense mutation), as well as to isolated nucleic acids encoding same, recombinant host cells transformed with DNA encoding such proteins, and uses of the expressed proteins and nucleic acid sequences in therapeutic and diagnostic applications.
Description
Human Galactokinase Gene
This invention was made in part with government support under EY-09404 awarded by the National Institutes of Health. The U.S. Government has certain rights in the invention.
Cross-Reference to Related Applications:
This application is a continuation in part of Serial No. PCT/US94/10825, filed 23 September 1994.
Field of the Invention:
This invention relates to human galactokinase and the identification of galactokinase mutations (a missense and a nonsense mutation), as well as to isolated nucleic acids encoding same, recombinant host cells transformed with DNA encoding such proteins, and uses of the expressed proteins and nucleic acid sequences in therapeutic and diagnostic applications.
Background of the Invention:
There are numerous inherited human metabolic disorders, most of which are recessive. Many have devastating effects that may include a combination of clinical features, such as severe mental retardation, impairment of the peripheral nervous system, blindness, hearing deficiency and organomegaly. Although most of these disorders are rare, the majority of them cannot be treated with drugs.
Galactokinase deficiency is one of three known forms of galactosemia. The other forms are galactose-1-phosphate uridyltransferase deficiency and UDP-galactose-4-epimerase deficiency. All three enzymes are involved in galactose metabolism, i.e., the conversion of galactose to glucose in the body. Galactokinase deficiency is inherited as an autosomal recessive trait with a heterozygote frequency estimated to be 0.2% in the general population (see, e.g., Levy et al., J. Pediatr. 871-877 (1978)). Patients with homozygous galactokinase deficiency usually become symptomatic in the early infantile period, showing galactosemia, galactosuria, increased galactitol levels, cataracts and, in a few cases, mental retardation (Segal et al., J. Pediatr. 750-752 (1979)). These symptoms usually improve dramatically with the administration of a galactose-free diet.
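As a rough illustration of what a 0.2% heterozygote frequency implies for the number of affected (homozygous) individuals, the short sketch below solves 2q(1 - q) = 0.002 for the recessive allele frequency q and squares it. It assumes Hardy-Weinberg equilibrium at a single locus with random mating; the resulting estimate (on the order of one affected birth per million) is derived arithmetic, not a figure reported in this specification. The function names are illustrative only.

```python
import math


def recessive_allele_freq(carrier_freq: float) -> float:
    """Solve 2q(1 - q) = carrier_freq for q, the recessive allele frequency.

    Assumes Hardy-Weinberg equilibrium; the smaller root of
    2q^2 - 2q + h = 0 is the meaningful one for a rare allele.
    """
    if not 0.0 <= carrier_freq <= 0.5:
        raise ValueError("heterozygote frequency must lie in [0, 0.5]")
    return (1.0 - math.sqrt(1.0 - 2.0 * carrier_freq)) / 2.0


def expected_homozygote_freq(carrier_freq: float) -> float:
    """Expected frequency of affected homozygotes, q^2, under the same assumption."""
    q = recessive_allele_freq(carrier_freq)
    return q * q


if __name__ == "__main__":
    carriers = 0.002  # 0.2% heterozygote frequency quoted in the text
    q = recessive_allele_freq(carriers)
    affected = expected_homozygote_freq(carriers)
    print(f"q ~ {q:.6f}; expected homozygotes ~ 1 in {1 / affected:,.0f}")
```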
Heterozygotes for galactokinase deficiency are prone to presenile cataracts, with onset between 20 and 50 years of age (Stambolian et al., Invest. Ophthalmol. Vis. Sci. 429-433 (1986)).
Galactokinase activity has been found in a variety of mammalian tissues, including liver, kidney, brain, lens, placenta, erythrocytes and leukocytes. While the protein has been purified from E. coli, purification of the protein from mammalian tissues has proven difficult due to its low cellular concentration. In addition, the molecular basis of galactokinase deficiency is unknown.
This invention provides a human galactokinase gene. The DNAs of this invention, such as the specific sequences disclosed herein, are useful in that they encode the genetic information required for expression of this protein. Additionally, the sequences may be used as probes to isolate and identify additional members of the family, type and/or subtype, as well as mutations that may form the basis of galactokinase deficiency, which may be characterized by site-specific mutations or by atypical expression of the galactokinase gene. The galactokinase gene is also useful as a diagnostic agent to identify mutant galactokinase proteins or as a therapeutic agent via gene therapy.
The first clinical trials of gene therapy began in 1990. Since then, more than 70 clinical trial protocols have been reviewed and approved by regulatory authorities such as the NIH's Recombinant DNA Advisory Committee (RAC) (see, e.g., Anderson, W. F., Human Gene Therapy 5:281-282 (1994)). Therapeutic treatment of diseases and disorders by gene therapy involves the transfer and stable insertion of new genetic information into cells. Correction of a genetic defect by reintroduction of the normal allele of a gene has demonstrated that this concept is clinically feasible (see, e.g., Rosenberg et al., N. Engl. J. Med. 570 (1990)).
These and additional uses for the reagents described herein will become apparent to those of ordinary skill in the art upon reading this specification.
Summary of the Invention:
This invention provides isolated nucleic acid molecules encoding human galactokinase, as well as nucleic acid molecules encoding missense and nonsense mutations, including mRNAs, DNAs (e.g., cDNA, genomic DNA, etc.), antisense analogs thereof, and diagnostically or therapeutically useful fragments thereof.
This invention also provides recombinant vectors, such as cloning and expression plasmids useful as reagents in the recombinant production of human galactokinase proteins, as well as recombinant prokaryotic and/or eukaryotic host cells comprising a human galactokinase nucleic acid sequence.
This invention also provides a process for preparing human galactokinase proteins which comprises culturing recombinant prokaryotic and/or eukaryotic host cells containing a human galactokinase nucleic acid sequence under conditions promoting expression of said protein, and subsequently recovering said protein. Another related aspect of this invention is isolated human galactokinase proteins produced by said method. In yet another aspect, this invention also provides antibodies that are directed to (i.e., bind) human galactokinase proteins.
This invention also provides isolated human galactokinase proteins having a missense or nonsense mutation, and antibodies (monoclonal or polyclonal) that are specifically reactive with said proteins.
This invention also provides nucleic acid probes and PCR primers comprising nucleic acid molecules of sufficient length to specifically hybridize to human galactokinase sequences.
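Whether a probe or primer is "of sufficient length" is ultimately judged experimentally. As a rough, illustrative pre-screen only, the Python sketch below checks a candidate oligonucleotide against the 15-nucleotide minimum mentioned in this specification and estimates its melting temperature with the Wallace rule of thumb, Tm ≈ 2(A+T) + 4(G+C) °C, which is generally applied only to short oligonucleotides. The example oligonucleotide is hypothetical (it is not taken from SEQ ID NO:4 or Table 1), and the function names are illustrative assumptions.

```python
def gc_content(seq: str) -> float:
    """Fraction of G and C bases in an oligonucleotide."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def wallace_tm(seq: str) -> int:
    """Rough melting temperature in deg C: Tm = 2*(A+T) + 4*(G+C).

    A rule of thumb for short oligonucleotides only, not a thermodynamic model.
    """
    seq = seq.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2 * at + 4 * gc


def is_candidate_probe(seq: str, min_len: int = 15) -> bool:
    """Hypothetical screen: at least `min_len` nt (15, 18 or 21 in the text)
    and only unambiguous A/C/G/T bases."""
    return len(seq) >= min_len and set(seq.upper()) <= set("ACGT")


if __name__ == "__main__":
    oligo = "ACGTTGCAGGTCAATCGA"  # hypothetical 18-mer
    print(len(oligo), gc_content(oligo), wallace_tm(oligo), is_candidate_probe(oligo))
```

For this 18-mer the script reports a GC fraction of 0.5 and a Wallace-rule estimate of 54 °C. In practice, nearest-neighbour thermodynamic models give better estimates, and hybridization specificity still has to be confirmed under the conditions actually used.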
This invention also provides a method to diagnose human galactokinase deficiency which comprises isolating a nucleic acid sample from an individual, assaying the sequence of said nucleic acid sample against the reference gene of the invention, and comparing differences between said sample and the nucleic acid of the instant invention, wherein said differences indicate mutations in the human galactokinase gene isolated from the individual. The sample can be assayed by direct sequence comparison (i.e., DNA sequencing), wherein the sample nucleic acid is compared to the reference galactokinase gene; by hybridization (e.g., mobility shift assays such as heteroduplex gel electrophoresis or SSCP, or other techniques such as Northern or Southern blotting which are based upon the length of the nucleic acid sequence); or by other known gel electrophoresis methods such as RFLP (for example, by restriction endonuclease digestion of a sample amplified by PCR (for DNA) or RT-PCR (for RNA)). Alternatively, the diagnostic method comprises isolating cells containing genomic DNA from an individual and assaying said sample (e.g., cellular RNA) by in situ hybridization using the DNA sequence of the invention, or at least one exon, or a fragment containing at least 15, preferably 18, and more preferably 21 contiguous base pairs as a probe. This invention also provides an antisense oligonucleotide having a sequence capable of binding with mRNAs encoding human galactokinase so as to identify mutant galactokinase genes.
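The comparison step of the diagnostic method, and the RFLP variant in which a mutation creates or destroys a restriction site, can be sketched in Python. This is an illustrative sketch under stated assumptions: the sequences are hypothetical toy examples rather than galactokinase sequences, the two sequences are assumed to be pre-aligned with no insertions or deletions (real workflows use an alignment tool), and EcoRI (recognition site GAATTC, cut G^AATTC) is chosen only as a familiar example enzyme.

```python
def find_substitutions(reference: str, sample: str) -> list[tuple[int, str, str]]:
    """Return (1-based position, reference base, sample base) for every mismatch.

    Assumes the sequences are already aligned and of equal length (no indels).
    """
    if len(reference) != len(sample):
        raise ValueError("sequences must be pre-aligned to equal length")
    return [
        (i + 1, ref_base, sample_base)
        for i, (ref_base, sample_base) in enumerate(zip(reference.upper(), sample.upper()))
        if ref_base != sample_base
    ]


def digest(seq: str, site: str, cut_offset: int) -> list[int]:
    """Return fragment lengths after cutting at every occurrence of a recognition site.

    cut_offset is the position within the site after which the top strand is cut
    (EcoRI recognizes GAATTC and cuts G^AATTC, so its offset is 1).
    """
    seq, site = seq.upper(), site.upper()
    cuts = []
    start = seq.find(site)
    while start != -1:
        cuts.append(start + cut_offset)
        start = seq.find(site, start + 1)
    bounds = [0] + cuts + [len(seq)]
    return [end - begin for begin, end in zip(bounds, bounds[1:])]


if __name__ == "__main__":
    reference = "AAAA" + "GAATTC" + "A" * 10  # toy amplicon with one EcoRI site
    sample = "AAAA" + "GAGTTC" + "A" * 10     # one substitution destroys the site
    print(find_substitutions(reference, sample))
    print(digest(reference, "GAATTC", 1), digest(sample, "GAATTC", 1))
```

Here the single A-to-G change at position 7 is reported by the direct comparison, and the same change is visible as an altered fragment pattern (5 + 15 nt versus one uncut 20-nt fragment), which is the principle behind detecting a mutation by RFLP.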
This invention also provides yet another method to diagnose human galactokinase deficiency which comprises obtaining a serum or tissue sample; allowing such sample to come in contact with an antibody or antibody fragment which specifically binds to a mutant human galactokinase protein of the invention under conditions such that an antigen-antibody complex is formed between said antibody (or antibody fragment) and said mutant galactokinase protein; and detecting the presence or absence of said complex.
This invention also provides transgenic non-human animals comprising a nucleic acid molecule encoding human galactokinase. Also provided are methods for use of said transgenic animals as models for disease states, mutation and SAR.
This invention also provides a method for treating conditions which are related to insufficient human galactokinase activity which comprises administering to a patient in need thereof a pharmaceutical composition containing the galactokinase protein of the invention in an amount effective to supplement the patient's endogenous galactokinase and thereby alleviate said condition.
This invention also provides a method for treating conditions which are related to insufficient human galactokinase activity via gene therapy. An additional, or reference, gene comprising the non-mutant galactokinase gene of the instant invention is inserted into a patient's cells either in vivo or ex vivo. The reference gene is expressed in transfected cells and, as a result, the protein encoded by the reference gene corrects the defect (i.e., galactokinase deficiency), thus permitting the transfected cells to function normally and alleviating disease conditions (or symptoms).
Brief Description of the Drawings:
Figure 1 depicts the intron/exon organization of the human galactokinase gene.
Figure 2 is the genomic DNA sequence (and single letter amino acid abbreviations) for human galactokinase [SEQ ID NO: 7]. The bolded DNA
sequence corresponds to the exon regions whereas the normal or unbolded type corresponds to the intron regions of human galactokinase.
Detailed Description of the Invention:
This invention relates to human galactokinase (amino acid and nucleotide sequences) and its use as a diagnostic and therapeutic. The particular cDNA and amino acid sequence of human galactokinase is identified by SEQ ID NO:4 as described more fully below. This invention also relates to the genomic DNA
sequence for human galactokinase [SEQ ID NO: 7] and also to mutant human galactokinase genes and amino acid sequences [SEQ ID NO:5 and 6] and their use for diagnostic purposes.
In further describing the present invention, the following additional terms will be employed, and are intended to be defined as indicated below.
An "antigen" refers to a molecule containing one or more epitopes that will stimulate a host's immune system to make a humoral and/or cellular antigen-specific response. The term is also used herein interchangeably with "immunogen."
The term "epitope" refers to the site on an antigen or hapten to which a specific antibody molecule binds. The term is also used herein interchangeably with "antigenic determinant" or "antigenic determinant site."
A coding sequence is "operably linked to" another coding sequence when RNA polymerase will transcribe the two coding sequences into a single mRNA, which is then translated into a single polypeptide having amino acids derived from both coding sequences. The coding sequences need not be contiguous to one another so long as the expressed sequence is ultimately processed to produce the desired protein.
"Recombinant" polypeptides refer to polypeptides produced by recombinant DNA techniques; i.e., produced from cells transformed by an exogenous DNA construct encoding the desired polypeptide. "Synthetic" polypeptides are those prepared by chemical synthesis.
A "replicon" is any genetic element (e.g., plasmid, chromosome, virus) that functions as an autonomous unit of DNA replication in vivo; i.e., capable of replication under its own control.
A "vector" is a replicon, such as a plasmid, phage, or cosmid, to which another DNA segment may be attached so as to bring about the replication of the attached segment.
A "replication-deficient virus" is a virus in which the excision and/or replication functions have been altered such that, after transfection into a host cell, the virus is not able to reproduce and/or infect additional cells.
A "reference" gene refers to the galactokinase sequence of the invention and is understood to include the various sequence polymorphisms that exist, i.e., nucleotide substitutions in the gene sequence that do not affect the essential function of the gene product.
A "mutant" gene refers to galactokinase sequences different from the reference gene wherein nucleotide substitutions and/or deletions and/or insertions result in impairment of the essential function of the gene product such that the levels of galactose in an individual (or patient) are atypically elevated. For example, the G to A substitution at position 122 of human galactokinase [SEQ ID NO: 5] is a missense mutation associated with patients who are galactokinase deficient. Another mutation, a T for G substitution, produces an in-frame nonsense codon at amino acid position 80 of the mature protein. The result is a truncated protein consisting of the first 79 amino acids of human galactokinase.
A DNA "coding sequence of" or a "nucleotide sequence encoding" a particular protein is a DNA sequence which is transcribed and translated into a polypeptide when placed under the control of appropriate regulatory sequences.
A "promoter sequence" is a DNA regulatory region capable of binding RNA polymerase in a cell and initiating transcription of a downstream (3' direction) coding sequence. For purposes of defining the present invention, the promoter sequence is bounded at its 3' terminus by a translation start codon (e.g., ATG) of a coding sequence and extends upstream (5' direction) to include the minimum number of bases or elements necessary to initiate transcription at levels detectable above background. Within the promoter sequence will be found a transcription initiation site (conveniently defined by mapping with nuclease S1), as well as protein binding domains (consensus sequences) responsible for the binding of RNA polymerase. Eukaryotic promoters will often, but not always, contain "TATA" boxes and "CAAT" boxes. Prokaryotic promoters contain Shine-Dalgarno sequences in addition to the -10 and -35 consensus sequences.
DNA "control sequences" refers collectively to promoter sequences, ribosome binding sites, polyadenylation signals, transcription termination sequences, upstream regulatory domains, enhancers, and the like, which collectively provide for the expression (i.e., the transcription and translation) of a coding sequence in a host cell.
A control sequence "directs the expression" of a coding sequence in a cell when RNA polymerase will bind the promoter sequence and transcribe the coding sequence into mRNA, which is then translated into the polypeptide encoded by the coding sequence.
A "host cell" is a cell which has been transformed or transfected, or is capable of transformation or transfection by an exogenous DNA sequence.
A cell has been "transformed" by exogenous DNA when such exogenous DNA has been introduced inside the cell membrane. Exogenous DNA may or may not be integrated (covalently linked) into chromosomal DNA making up the genome of the cell. In prokaryotes and yeasts, for example, the exogenous DNA may be maintained on an episomal element, such as a plasmid. With respect to eukaryotic cells, a stably transformed or transfected cell is one in which the exogenous DNA has become integrated into the chromosome so that it is inherited by daughter cells through chromosome replication. This stability is demonstrated by the ability of the eukaryotic cell to establish cell lines or clones comprised of a population of daughter cells containing the exogenous DNA.
"Transfection" or "transfected" refers to a process by which cells take up foreign DNA and integrate that foreign DNA into their chromosome. Transfection can be accomplished, for example, by various techniques in which cells take up DNA (e.g., calcium phosphate precipitation, electroporation, assimilation of liposomes, etc.), or by infection, in which viruses are used to transfer DNA into cells.
A "target cell" is a cell(s) that is selectively transfected over other cell types (or cell lines).
A "clone" is a population of cells derived from a single cell or common ancestor by mitosis. A "cell line" is a clone of a primary cell that is capable of stable growth in vitro for many generations.
A "heterologous" region of a DNA construct is an identifiable segment of DNA within or attached to another DNA molecule that is not found in association with the other molecule in nature. Thus, when the heterologous region encodes a gene, the gene will usually be flanked by DNA that does not flank the gene in the genome of the source animal. Another example of a heterologous coding sequence is a construct where the coding sequence itself is not found in nature (e.g., synthetic sequences having codons different from the native gene). Allelic variation or naturally occurring mutational events do not give rise to a heterologous region of DNA, as used herein.
"Conditions which are related to insufficient human galactokinase activity" or a "deficiency in galactokinase activity" means mutations of the galactokinase protein which affect galactokinase activity, or which may affect expression of galactokinase, or both, such that the levels of galactose in a patient are atypically elevated. In addition, this definition is intended to cover atypically low levels of galactokinase expression in a patient due to defective control sequences for the reference galactokinase protein.
This invention provides an isolated nucleic acid molecule encoding a human galactokinase protein and substantially similar sequences. Isolated nucleic acid sequences are "substantially similar" if: (i) they are approximately the same length (i.e., at least 80% of the coding region of SEQ ID NO:4); (ii) they encode a protein with the same (i.e., within an order of magnitude) galactokinase activity as the protein encoded by SEQ ID NO:4; and (iii) they are capable of hybridizing under moderately stringent conditions to SEQ ID NO:4; or if they are DNA sequences which are degenerate to SEQ ID NO:4. Degenerate DNA sequences encode the same amino acid sequence as SEQ ID NO:4, but have variation(s) in the nucleotide coding sequences. Hybridization under moderately stringent conditions is outlined below.
Hybridization under moderately stringent conditions can be performed as follows. Nitrocellulose filters are prehybridized at 65°C in a solution containing 6X SSPE, 5X Denhardt's solution (10 g Ficoll, 10 g BSA and 10 g polyvinylpyrrolidone per liter solution), 0.05% SDS and 100 micrograms tRNA. Hybridization probes are labeled, preferably radiolabeled (e.g., using the Bios TAG-IT kit). Hybridization is then carried out for approximately 18 hours at 65°C. The filters are then washed in a solution of 2X SSC and 0.5% SDS at room temperature for 15 minutes (repeated once). Subsequently, the filters are washed at 58°C, air-dried and exposed to X-ray film overnight at -70°C with an intensifying screen.
Alternatively, sequences are "substantially similar" when about 66% (preferably about 75%, and most preferably about 90%) of the nucleotides or amino acids match over a defined length (i.e., at least 80% of the coding region of SEQ ID NO:4) of the molecule and the protein encoded by such sequence has the same (i.e., within an order of magnitude) galactokinase activity as the protein encoded by SEQ ID NO:4. As used herein, substantially similar refers to sequences having similar identity to the sequences of the instant invention. Thus, nucleotide sequences that are substantially the same can be identified by hybridization or by sequence comparison. Protein sequences that are substantially the same can be identified by one or more of the following: proteolytic digestion, gel electrophoresis and/or microsequencing.
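The sequence-comparison part of this definition (a match rate of about 66%, 75% or 90% over at least 80% of the coding region) can be expressed as a small Python check. This is a sketch under explicit assumptions: the compared windows are taken to be already aligned, equal in length and free of gaps (real comparisons would use an alignment program), the toy sequences are hypothetical, and the function names are illustrative. The activity requirement is an experimental criterion and is not modelled.

```python
def percent_identity(a: str, b: str) -> float:
    """Percent of identical positions between two pre-aligned, equal-length, ungapped sequences."""
    if not a or len(a) != len(b):
        raise ValueError("expects two non-empty sequences of equal length")
    matches = sum(x == y for x, y in zip(a.upper(), b.upper()))
    return 100.0 * matches / len(a)


def meets_sequence_criterion(ref_window: str, query_window: str, coding_len: int,
                             min_identity: float = 66.0, min_coverage: float = 0.80) -> bool:
    """Hypothetical check: the compared window must span at least 80% of the
    reference coding region (coding_len) and reach the identity threshold.
    The galactokinase-activity requirement is not modelled here."""
    long_enough = len(ref_window) >= min_coverage * coding_len
    return long_enough and percent_identity(ref_window, query_window) >= min_identity


if __name__ == "__main__":
    ref, query = "ACGTACGTAC", "ACGTTCGAAG"  # toy sequences: 7 of 10 positions match
    print(percent_identity(ref, query))
    for threshold in (66.0, 75.0, 90.0):
        print(threshold, meets_sequence_criterion(ref, query, coding_len=12, min_identity=threshold))
```

With 70% identity over a window covering 10 of 12 coding positions, the toy pair satisfies the broadest (66%) threshold but not the preferred 75% or 90% thresholds.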
This invention also provides isolated nucleic acid molecules encoding a missense mutation (SEQ ID NO:5) or a nonsense mutation (SEQ ID NO:6) of the human galactokinase protein, and DNA sequences which are degenerate to SEQ ID NO:5 or 6. Degenerate DNA sequences encode the same amino acid (or termination site) sequence as SEQ ID NO:5 or 6, but have variation(s) in the nucleotide coding sequences.
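Because degeneracy is defined here purely by translation (same amino acid and termination-site sequence, different nucleotides), it can be tested mechanically. The Python sketch below is illustrative only: it uses the standard genetic code, keeps stop codons as "*" so that termination sites are compared as well, and applies it to short hypothetical sequences rather than SEQ ID NO:4, 5 or 6. The function names are assumptions.

```python
from itertools import product

# Standard genetic code in TCAG order; "*" marks a termination codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate_with_stops(cds: str) -> str:
    """Translate every codon, keeping stops as '*' so termination sites are compared too."""
    cds = cds.upper()
    if len(cds) % 3:
        raise ValueError("coding sequence length must be a multiple of 3")
    return "".join(CODON_TABLE[cds[i:i + 3]] for i in range(0, len(cds), 3))


def is_degenerate(seq_a: str, seq_b: str) -> bool:
    """True if the nucleotide sequences differ but encode the same amino acid
    (and termination site) sequence, i.e. differ only by synonymous codons."""
    return (seq_a.upper() != seq_b.upper()
            and translate_with_stops(seq_a) == translate_with_stops(seq_b))


if __name__ == "__main__":
    print(is_degenerate("ATGCTGTAA", "ATGTTATAG"))  # CTG/TTA both Leu; TAA/TAG both stop
    print(is_degenerate("ATGCTGTAA", "ATGCCGTAA"))  # CCG is Pro, so not degenerate
```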
One means for isolating a nucleic acid molecule encoding a human galactokinase is to probe a human genomic or cDNA library with a natural or artificially designed probe using art-recognized procedures (see, for example, "Current Protocols in Molecular Biology", Ausubel, F.M., et al. (eds.), Greene Publishing Assoc. and John Wiley Interscience, New York, 1989, 1992). It will be appreciated by one skilled in the art that SEQ ID NO:4, or fragments thereof (comprising at least 15 contiguous nucleotides), is a particularly useful probe.
Several particularly useful probes for this purpose are set forth in Table 1; hybridizable fragments thereof (i.e., comprising at least 15 contiguous nucleotides) may also be used.
It is also appreciated that such probes can be, and preferably are, labeled with an analytically detectable reagent to facilitate identification of the probe. Useful reagents include but are not limited to radioactivity, fluorescent dyes or enzymes capable of catalyzing the formation of a detectable product. The probes are thus useful to isolate complementary copies of genomic DNA, cDNA or RNA from human, mammalian or other animal sources, or to screen such sources for related sequences (e.g., additional members of the family, type and/or subtype), including transcriptional regulatory and control elements defined above as well as other stability, processing, translation and tissue specificity-determining regions from 5' and/or 3' regions relative to the coding sequences disclosed herein.
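Probes made of at least 15 contiguous nucleotides, and the antisense oligonucleotides mentioned earlier in this specification, can be derived from a target sequence in a few lines. The Python sketch below is a hypothetical illustration: it enumerates overlapping fragments of a chosen length (15, 18 or 21 nt) and computes their reverse complements. The target sequence and function names are assumptions, and labeling, specificity and hybridization behaviour are not modelled.

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the complementary (antisense) strand, written 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


def tile_probes(seq: str, length: int = 15, step: int = 1) -> list[str]:
    """Return every contiguous fragment of `length` nucleotides, advancing by `step`."""
    if length <= 0 or step <= 0:
        raise ValueError("length and step must be positive")
    return [seq[i:i + length] for i in range(0, len(seq) - length + 1, step)]


if __name__ == "__main__":
    target = "ACGTTGCAGGTCAATCGATC"         # hypothetical 20-nt target
    sense_probes = tile_probes(target, 15)  # 6 overlapping 15-mers
    antisense_probes = [reverse_complement(p) for p in sense_probes]
    print(len(sense_probes), antisense_probes[0])
```

A 20-nucleotide target yields 20 - 15 + 1 = 6 overlapping 15-mers; increasing `step` gives sparser tiling when fewer probes are wanted.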
This invention also provides for gene therapy. "Gene therapy" means gene supplementation. That is, an additional (i.e., reference) copy of the gene of interest is inserted into a patient's cells. As a result, the protein encoded by the reference gene corrects the defect (i.e., galactokinase deficiency) and permits the cells to function normally, thus alleviating disease symptoms.
Gene therapy of the present invention can occur in vivo or ex vivo. Ex vivo gene therapy requires the isolation and purification of patient cells, the introduction of a therapeutic gene, and introduction of the genetically altered cells back into the patient. A replication-deficient virus such as a modified retrovirus can be used to introduce the therapeutic gene (galactokinase) into such cells. For example, Moloney murine leukemia virus (MMLV) is a well-known vector in clinical gene therapy trials (see, e.g., Boris-Lawrie et al., Curr. Opin. Genet. Dev., 3:102-109 (1993)).
In contrast, in vivo gene therapy does not require isolation and purification of patients' cells. The therapeutic gene is typically "packaged" for administration to a patient, such as in liposomes or in a replication-deficient virus such as adenovirus (see, e.g., Berkner, K.L., Curr. Top. Microbiol. Immunol., 158:39-66 (1992)) or adeno-associated virus (AAV) vectors (see, e.g., Muzyczka, N., Curr. Top. Microbiol. Immunol., 158:97-129 (1992) and U.S. Patent 5,252,479 "Safe Vector for Gene Therapy"). Another approach is administration of so-called "naked DNA", in which the therapeutic gene is directly injected into the bloodstream or muscle tissue.
Cell types useful for gene therapy of the present invention include hepatocytes, fibroblasts, lymphocytes, any cell of the eye (e.g., retina), and epithelial and endothelial cells. Preferably the cells are hepatocytes, any cell of the eye, or respiratory (or pulmonary) epithelial cells. Transfection of (pulmonary) epithelial cells can occur via inhalation of a nebulized preparation of DNA vectors in liposomes, DNA-protein complexes or replication-deficient adenoviruses (see, e.g., U.S. Patent 5,240,846 "Gene Therapy Vector for Cystic Fibrosis").
This invention also provides for a process to prepare human galactokinase proteins. Non-mutant proteins are defined with reference to the amino acid sequence listed in SEQ ID NO:4 and include variants with a substantially similar amino acid sequence that have the same galactokinase activity. Additional proteins of this invention include mutant human galactokinase proteins as set forth in SEQ ID NO: 5 or 6. The proteins of this invention are preferably made by recombinant genetic engineering techniques. The isolated nucleic acids, particularly the DNAs, can be introduced into expression vectors by operatively linking the DNA to the necessary expression control regions (e.g., regulatory regions) required for gene expression.
The vectors can be introduced into the appropriate host cells such as prokaryotic (e.g., bacterial) or eukaryotic (e.g., yeast or mammalian) cells by methods well known in the art (Ausubel et al., supra). The coding sequences for the desired proteins, having been prepared or isolated, can be cloned into any suitable vector or replicon. Numerous cloning vectors are known to those of skill in the art, and the selection of an appropriate cloning vector is a matter of choice. Examples of recombinant DNA vectors for cloning and host cells which they can transform include, but are not limited to, bacteriophage λ (E. coli), pBR322 (E. coli), pACYC177 (E. coli), pKT230 (gram-negative bacteria), pGV1106 (gram-negative bacteria), pLAFR1 (gram-negative bacteria), pME290 (non-E. coli gram-negative bacteria), pHV14 (E. coli and Bacillus subtilis), pBD9 (Bacillus), pIJ61 (Streptomyces), pUC6 (Streptomyces), YIp5 (Saccharomyces), a baculovirus insect cell system, a Drosophila insect system, and YCp19 (Saccharomyces). See, generally, "DNA Cloning": Vols. I & II, Glover et al. (eds.), IRL Press, Oxford (1985) (1987); and T. Maniatis et al., "Molecular Cloning", Cold Spring Harbor Laboratory (1982).
The gene can be placed under the control of a promoter, ribosome binding site (for bacterial expression) and, optionally, an operator (collectively referred to herein as "control" elements), so that the DNA sequence encoding the desired protein is transcribed into RNA in the host cell transformed by a vector containing this expression construction. The coding sequence may or may not contain a signal peptide or leader sequence. The subunit antigens of the present invention can be expressed using, for example, the E. coli tac promoter or the protein A gene (spa) promoter and signal sequence. Leader sequences can be removed by the bacterial host in post-translational processing. See, e.g., U.S. Patent Nos. 4,431,739; 4,425,437; 4,338,397.
In addition to control sequences, it may be desirable to add regulatory sequences which allow for regulation of the expression of the protein sequences relative to the growth of the host cell. Regulatory sequences are known to those of skill in the art, and examples include those which cause the expression of a gene to be turned on or off in response to a chemical or physical stimulus, including the presence of a regulatory compound. Other types of regulatory elements may also be present in the vector, for example, enhancer sequences.
An expression vector is constructed so that the particular coding sequence is located in the vector with the appropriate regulatory sequences, the positioning and orientation of the coding sequence with respect to the control sequences being such that the coding sequence is transcribed under the "control" of the control sequences (i.e., RNA polymerase which binds to the DNA molecule at the control sequences transcribes the coding sequence). Modification of the sequences encoding the particular antigen of interest may be desirable to achieve this end. For example, in some cases it may be necessary to modify the sequence so that it may be attached to the control sequences with the appropriate orientation, i.e., to maintain the reading frame. The control sequences and other regulatory sequences may be ligated to the coding sequence prior to insertion into a vector, such as the cloning vectors described above. Alternatively, the coding sequence can be cloned directly into an expression vector which already contains the control sequences and an appropriate restriction site.
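The requirement to "maintain the reading frame" when joining a coding sequence to control sequences can be checked on paper before cloning. The Python sketch below is a hypothetical illustration, not a construct-design tool: it flags a missing ATG start, a length that is not a whole number of codons, a missing terminal stop, or an internal in-frame stop, and it tests whether an upstream translated leader or signal sequence contributes whole codons. Promoter, ribosome binding site and restriction-site considerations are not modelled, and all names and sequences are assumptions.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def orf_problems(cds: str) -> list[str]:
    """List problems that would prevent an inserted coding sequence from being read in frame."""
    cds = cds.upper()
    problems = []
    if not cds.startswith("ATG"):
        problems.append("no ATG start codon")
    if len(cds) % 3:
        problems.append("length is not a multiple of 3")
    codons = [cds[i:i + 3] for i in range(0, len(cds) - 2, 3)]  # complete codons only
    if not codons or codons[-1] not in STOP_CODONS:
        problems.append("no terminal stop codon")
    if any(codon in STOP_CODONS for codon in codons[:-1]):
        problems.append("internal in-frame stop codon")
    return problems


def fusion_stays_in_frame(upstream_len: int) -> bool:
    """A coding sequence joined downstream of a translated leader or signal sequence
    stays in frame only if the upstream segment contributes whole codons."""
    return upstream_len % 3 == 0


if __name__ == "__main__":
    print(orf_problems("ATGGAAGAATAA"))  # a clean toy ORF
    print(orf_problems("ATGTAAGAATAA"))  # stop codon at codon 2
    print(fusion_stays_in_frame(21), fusion_stays_in_frame(20))
```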
In some cases, it may be desirable to produce other mutants or analogs of the galactokinase protein. Mutants or analogs may be prepared by the deletion of a portion of the sequence encoding the protein, by insertion of a sequence, and/or by substitution of one or more nucleotides within the sequence. Techniques for modifying nucleotide sequences, such as site-directed mutagenesis, are well known to those skilled in the art. See, e.g., T. Maniatis et al., supra; DNA Cloning, Vols. I and II, supra; Nucleic Acid Hybridization, supra.
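The three kinds of change named above (deletion, insertion and substitution) can be represented uniformly as "replace `ref` with `alt` at a position", which also makes it easy to see which changes shift the reading frame. The Python sketch below is a hypothetical illustration on a toy open reading frame; it models only the resulting sequence, not any particular mutagenesis protocol, and its names are assumptions.

```python
def apply_mutation(seq: str, pos: int, ref: str, alt: str) -> str:
    """Replace `ref` at 1-based position `pos` with `alt`.

    Substitution: len(ref) == len(alt); deletion: alt == ""; insertion: ref == ""
    (the new bases are inserted immediately before position `pos`).
    """
    i = pos - 1
    if seq[i:i + len(ref)] != ref:
        raise ValueError(f"expected {ref!r} at position {pos}")
    return seq[:i] + alt + seq[i + len(ref):]


def preserves_reading_frame(ref: str, alt: str) -> bool:
    """Indels whose net length change is not a multiple of 3 shift the downstream frame."""
    return (len(alt) - len(ref)) % 3 == 0


if __name__ == "__main__":
    orf = "ATGGAAGAATAA"  # toy ORF: Met-Glu-Glu-stop
    edits = [("G", "T"), ("GAA", ""), ("G", ""), ("", "CCC")]
    for ref, alt in edits:
        mutant = apply_mutation(orf, 4, ref, alt)
        print(f"{ref or '-'}>{alt or '-'}", mutant, preserves_reading_frame(ref, alt))
```

Deleting or inserting a whole codon keeps the downstream frame, whereas deleting a single base does not; a substitution never shifts the frame, although, as in the first edit here (GAA to TAA), it can still introduce a stop codon.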
A number of prokaryotic expression vectors are known in the art. See, e.g., U.S. Patent Nos. 4,578,355; 4,440,859; 4,436,815; 4,431,740; 4,431,739; 4,428,941; 4,425,437; 4,418,149; 4,411,994; 4,366,246; 4,342,832; see also U.K. Patent Applications GB 2,121,054; GB 2,008,123; GB 2,007,675; and European Patent Application 103,395. Yeast expression vectors are also known in the art. See, e.g., U.S. Patent Nos. 4,446,235; 4,443,539; 4,430,428; see also European Patent Applications 103,409; 100,561; 96,491. Mammalian expression vectors include pSV2neo (as described in J. Mol. Appl. Genet. 1:327-341), which uses the SV40 late promoter to drive expression in mammalian cells, and pCDNA1neo, a vector derived from pCDNA1 (Mol. Cell Biol. 7:4125-29), which uses the CMV promoter to drive expression. Both of these vectors can be employed for transient or stable (using G418 resistance) expression in mammalian cells. Insect cell expression systems, e.g., Drosophila, are also useful; see, for example, PCT applications WO 90/06358 and WO 92/06212 as well as EP 290,261-B1.
Depending on the expression system and host selected, the proteins of the present invention are produced by growing host cells transformed by an expression vector described above under conditions whereby the protein of interest is expressed. Preferred host cells include human embryonic kidney (HEK-293) cells, monkey kidney fibroblast (COS) cells, Chinese hamster ovary (CHO) cells, Drosophila cells or murine L-cells. If the expression system secretes the protein into growth media, the protein can be purified directly from the media. If the protein is not secreted, it is isolated from cell lysates or recovered from the cell membrane fraction. The selection of the appropriate growth conditions and recovery methods is within the skill of the art.
An alternative method to identify proteins of the present invention is by constructing gene libraries, using the resulting clones to transform E. coli, and pooling and screening individual colonies using polyclonal serum or monoclonal antibodies to galactokinase.
The proteins of the present invention may also be produced by chemical synthesis, such as solid phase peptide synthesis, using known amino acid sequences or amino acid sequences derived from the DNA sequence of the genes of interest. Such methods are known to those skilled in the art. Chemical synthesis of peptides is not particularly preferred.
The proteins of the present invention or their fragments containing at least one epitope can be used to produce antibodies, both polyclonal and monoclonal.
If polyclonal antibodies are desired, a selected mammal (e.g., mouse, rabbit, goat, horse, etc.) is immunized with the protein of the present invention, or a fragment thereof, capable of eliciting an immune response (i.e., having at least one epitope). Serum from the immunized animal is collected and treated according to known procedures. If serum containing polyclonal antibodies is used, the polyclonal antibodies can be purified by immunoaffinity chromatography or other known procedures.
Monoclonal antibodies to the proteins of the present invention, and to the fragments thereof, can also be readily produced by one skilled in the art. The general methodology for making monoclonal antibodies by using hybridoma technology is well known. Immortal antibody-producing cell lines can be created by cell fusion, and also by other techniques such as direct transformation of B lymphocytes with oncogenic DNA, or transfection with Epstein-Barr virus. See, e.g., M. Schreier et al., "Hybridoma Techniques" (1980); Hammerling et al., "Monoclonal Antibodies and T-cell Hybridomas" (1981); Kennett et al., "Monoclonal Antibodies" (1980); see also U.S. Patent Nos. 4,341,761; 4,399,121; 4,427,783; 4,444,887; 4,452,570; 4,466,917; 4,472,500; 4,491,632; and 4,493,890. Panels of monoclonal antibodies produced against the antigen of interest, or fragment thereof, can be screened for various properties, i.e., for isotype, epitope, affinity, etc. Hence one skilled in the art can produce monoclonal antibodies specifically reactive with mutant galactokinase proteins, e.g., the missense mutation of SEQ ID NO:5 or the nonsense mutation of SEQ ID NO:6. Monoclonal antibodies are useful in purification, using immunoaffinity techniques, of the individual antigens against which they are directed.
Alternatively, genes encoding the monoclonals of interest may be isolated from the hybridomas by PCR techniques known in the art and cloned and expressed in the appropriate vectors. The antibodies of this invention, whether polyclonal or monoclonal, have additional utility in that they may be employed as reagents in immunoassays, RIA, ELISA, and the like. As used herein, "monoclonal antibody" is understood to include antibodies derived from one species (e.g., murine, rabbit, goat, rat, human, etc.) as well as antibodies derived from two (or perhaps more) species (e.g., chimeric and humanized antibodies).
Chimeric antibodies, in which non-human variable regions are joined or fused to human constant regions (see, e.g., Liu et al., Proc. Natl Acad. Sci. USA, 84:3439 (1987)), may also be used in assays or therapeutically. Preferably, a therapeutic monoclonal antibody would be "humanized" as described in Jones et al., Nature, 321:522 (1986); Verhoeyen et al., Science, 239:1534 (1988); Kabat et al., J. Immunol., 147:1709 (1991); Queen et al., Proc. Natl Acad. Sci. USA, 86:10029 (1989); Gorman et al., Proc. Natl Acad. Sci. USA, 88:4181 (1991); and Hodgson et al., Bio/Technology, 9:421 (1991). Therefore, this invention also contemplates antibodies, polyclonal or monoclonal (including chimeric and "humanized"), directed to epitopes corresponding to amino acid sequences disclosed herein from human galactokinase. Methods for the production of polyclonal and monoclonal antibodies are well known; see, for example, Chap. 11 of Ausubel et al. (supra).
When the antibody is labeled with an analytically detectable reagent such as radioactivity, fluorescence, or an enzyme, the antibody can be used to detect the presence or absence of human galactokinase and/or its quantitative level. In addition, antibodies (polyclonal or monoclonal) specific for the missense and nonsense mutations of the present invention are useful for diagnostic purposes. A serum or tissue sample (e.g., liver, lung, etc.) is obtained and allowed to come in contact with an antibody or antibody fragment which specifically binds to a mutant human galactokinase protein of the invention under conditions such that an antigen-antibody complex is formed between said antibody (or antibody fragment) and said mutant galactokinase protein. Methods for detecting the presence or absence of said complex are within the skill of the art (e.g., ELISA, RIA, Western blotting, optical biosensors such as BIAcore (Pharmacia Biosensor, Uppsala, Sweden)) and do not limit this invention.
This invention also contemplates pharmaceutical compositions comprising an effective amount of the galactokinase protein of the invention and a pharmaceutically acceptable carrier. Pharmaceutical compositions of proteinaceous drugs of this invention are particularly useful for parenteral administration, i.e., subcutaneously, intramuscularly or intravenously. Optionally, the galactokinase protein is surrounded by a membrane-bound vesicle, such as a liposome.
The compositions for parenteral administration will commonly comprise a solution of the compounds of the invention, or a cocktail thereof, dissolved in an acceptable carrier, preferably an aqueous carrier. A variety of aqueous carriers may be employed, e.g., water, buffered water, 0.4% saline, 0.3% glycine, and the like.
These solutions are sterile and generally free of particulate matter. These solutions may be sterilized by conventional, well-known sterilization techniques. The compositions may contain pharmaceutically acceptable auxiliary substances as required to approximate physiological conditions, such as pH adjusting and buffering agents, etc. The concentration of the compound of the invention in such pharmaceutical formulations can vary widely, i.e., from less than about 0.5%, usually at or at least about 1%, to as much as 15 or 20% by weight, and will be selected primarily based on fluid volumes, viscosities, etc., according to the particular mode of administration selected.
Thus, a pharmaceutical composition of the invention for intramuscular injection could be prepared to contain 1 mL sterile buffered water and 50 mg of a compound of the invention. Similarly, a pharmaceutical composition of the invention for intravenous infusion could be made up to contain 250 mL of sterile Ringer's solution and 150 mg of a compound of the invention. Actual methods for preparing parenterally administrable compositions are well known or will be apparent to those skilled in the art and are described in more detail in, for example, Remington's Pharmaceutical Sciences, 15th ed., Mack Publishing Company, Easton, Pennsylvania.

The compounds described herein can be lyophilized for storage and reconstituted in a suitable carrier prior to use. This technique has been shown to be effective with conventional proteins, and art-known lyophilization and reconstitution techniques can be employed.
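To relate the two example formulations to the concentration range stated above, the following minimal Python sketch converts milligrams of compound per millilitre of carrier into percent weight/volume. It assumes, as a simplification not stated in the text, that the aqueous carrier has a density of about 1 g/mL, so that % w/v approximates "% by weight"; the function name `percent_w_v` is illustrative only.

```python
def percent_w_v(mass_mg: float, volume_ml: float) -> float:
    """Concentration in % weight/volume, i.e. grams of solute per 100 mL.

    Assumes (not stated in the text) a carrier density of about 1 g/mL,
    so the result approximates percent by weight.
    """
    if volume_ml <= 0:
        raise ValueError("volume must be positive")
    grams = mass_mg / 1000.0
    return grams / volume_ml * 100.0


# The two example formulations described in the text.
intramuscular = percent_w_v(50, 1)    # 50 mg in 1 mL sterile buffered water
intravenous = percent_w_v(150, 250)   # 150 mg in 250 mL sterile Ringer's solution
print(f"IM {intramuscular:.2f}% w/v, IV {intravenous:.2f}% w/v")
```

Under that assumption the intramuscular example works out to about 5% w/v and the intravenous example to about 0.06% w/v, the latter at the "less than about 0.5%" end of the stated range.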
The physician will determine the dosage of the present therapeutic agents which will be most suitable, and it will vary with the form of administration and the particular compound chosen, and furthermore, it will vary with the particular patient under treatment. He will generally wish to initiate treatment with small dosages substantially less than the optimum dose of the compound and increase the dosage by small increments until the optimum effect under the circumstances is reached. It will generally be found that when the composition is administered orally, larger quantities of the active agent will be required to produce the same effect as a smaller quantity given parenterally. The therapeutic dosage will generally be from 1 to 10 milligrams per day and higher, although it may be administered in several different dosage units.
Depending on the patient's condition, the pharmaceutical composition of the invention can be administered for prophylactic and/or therapeutic treatments. In therapeutic applications, compositions are administered to a patient already suffering from a disease in an amount sufficient to cure or at least partially arrest the disease and its complications. In prophylactic applications, compositions containing the present compounds or a cocktail thereof are administered to a patient not already in a disease state to enhance the patient's resistance.
Single or multiple administrations of the pharmaceutical compositions can be carried out with dose levels and patterns being selected by the treating physician. In any event, the pharmaceutical composition of the invention should provide a quantity of the compounds of the invention sufficient to effectively treat the patient.
This invention also contemplates use of the galactokinase genes of the instant invention as a diagnostic. For example, some diseases result from inherited defective genes. These genes can be detected by comparing the sequence of the defective gene with that of a normal one. Subsequently, one can verify that a "mutant" gene is associated with galactokinase deficiency by measurement of galactose; that is, a mutant gene would be associated with (atypically) elevated levels of galactose in a patient. In addition, one can insert mutant galactokinase genes into a suitable vector for expression in a functional assay system (e.g., a colorimetric assay, expression on MacConkey plates, or complementation experiments, e.g., in a galactokinase-deficient strain of yeast or E. coli) as yet another means to verify or identify galactokinase mutations. As an example, RNA from an individual can be transcribed with reverse transcriptase to cDNA, which can then be amplified by polymerase chain reaction (PCR), cloned into an E. coli expression vector, and transformed into a galactokinase-deficient strain of E. coli. When grown on MacConkey indicator plates, galactokinase-deficient cells will produce colonies that are white in color, whereas cells that have been transformed/complemented with a functional galactokinase gene will be red (see, e.g., Examples section). If most to all of the colonies from an individual are red, then the individual is considered to be normal with respect to galactokinase activity. If approximately 50% of the colonies are red (the other 50% white), then that individual is likely to be a carrier for galactokinase deficiency. If most to all of the colonies are white, then that individual is likely to be galactokinase deficient. Once "mutant" genes have been identified, one can then screen the population for carriers of the "mutant" galactokinase gene. (A carrier is a person in apparent health whose chromosomes contain a "mutant" galactokinase gene that may be transmitted to that person's offspring.) In addition, monoclonal antibodies that are specific for the mutant galactokinase proteins can be used for diagnostic purposes as described above.
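The colony-color interpretation above can be sketched as a small decision function. This is a minimal Python illustration, not a validated protocol: the text only says "most to all" red or white colonies and "approximately 50%" red, so the numeric cut-offs (0.9, 0.4 to 0.6, 0.1) and the example colony counts are hypothetical assumptions chosen for illustration.

```python
def classify_complementation(red: int, white: int) -> str:
    """Interpret a MacConkey complementation plate from colony counts.

    The thresholds are illustrative assumptions; the text only describes
    "most to all" red/white colonies or "approximately 50%" red.
    """
    total = red + white
    if total <= 0:
        raise ValueError("no colonies were scored")
    red_fraction = red / total
    if red_fraction >= 0.9:           # most to all colonies red
        return "normal"
    if 0.4 <= red_fraction <= 0.6:    # roughly half red, half white
        return "likely carrier"
    if red_fraction <= 0.1:           # most to all colonies white
        return "likely galactokinase deficient"
    return "indeterminate; re-plate or score more colonies"


# Hypothetical colony counts (red, white), not data from the text.
for counts in [(96, 4), (51, 49), (3, 97), (75, 25)]:
    print(counts, classify_complementation(*counts))
```

Returning an explicit "indeterminate" result for intermediate ratios avoids forcing an ambiguous plate into one of the three categories described in the text.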
Individuals carrying mutations in the human galactokinase gene may be detected at the DNA level by a variety of techniques. Nucleic acids used for diagnosis (genomic DNA, mRNA, etc.) may be obtained from a patient's cells, such as from blood, urine, saliva, tissue biopsy (e.g., chorionic villus sampling or removal of amniotic fluid cells), and autopsy material. The genomic DNA may be used directly for detection or may be amplified enzymatically by using PCR, ligase chain reaction (LCR), strand displacement amplification (SDA), etc. (see, e.g., Saiki et al., Nature, 324:163-166 (1986); Bej et al., Crit. Rev. Biochem. Molec. Biol., 26:301-334 (1991); Birkenmeyer et al., J. Virol. Meth., 35:117-126 (1991); Van Brunt, J., Bio/Technology, 8:291-294 (1990)) prior to analysis. RNA may also be used for the same purpose. The RNA can be reverse-transcribed and amplified at one time with PCR-RT (polymerase chain reaction - reverse transcriptase) or reverse-transcribed to an unamplified cDNA. As an example, PCR primers complementary to the nucleic acid of the instant invention can be used to identify and analyze galactokinase mutations. For example, deletions and insertions can be detected by a change in size of the amplified product in comparison to the normal galactokinase genotype. Point mutations can be identified by hybridizing amplified DNA to radiolabeled galactokinase RNA (of the invention) or, alternatively, radiolabeled galactokinase antisense DNA sequences (of the invention). Perfectly matched sequences can be distinguished from mismatched duplexes by RNase A digestion or by differences in melting temperatures (Tm). Such a diagnostic would be particularly useful for prenatal and even neonatal testing.
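The size-based detection of deletions and insertions described above reduces to comparing an observed PCR product size with the size expected for the normal genotype. The minimal Python sketch below assumes a gel-sizing tolerance of 2 bp (a hypothetical value, not from the text) and borrows the 346 bp exon 1 amplicon from the Examples purely as an illustrative expected size.

```python
def interpret_amplicon_size(observed_bp: int, expected_bp: int,
                            tolerance_bp: int = 2) -> str:
    """Compare a PCR product size with the normal-genotype product size.

    tolerance_bp is an assumed gel-sizing resolution, not a value from the text.
    A product of normal size does not exclude point mutations.
    """
    delta = observed_bp - expected_bp
    if abs(delta) <= tolerance_bp:
        return "no size change (point mutations are not excluded)"
    if delta < 0:
        return f"possible deletion of about {-delta} bp"
    return f"possible insertion of about {delta} bp"


# Illustrative sizes only; 346 bp is used here as a stand-in expected size.
for observed in (346, 330, 352):
    print(observed, interpret_amplicon_size(observed, 346))
```

As the text notes, point mutations need a different readout (hybridization, RNase A digestion, or Tm differences), which is why a normal-sized product is reported as inconclusive rather than normal.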
In addition, point mutations and other sequence differences between the reference gene and "mutant" genes can be identified by yet other well-known techniques, e.g., direct DNA sequencing or single-strand conformational polymorphism (SSCP; Orita et al., Genomics, 5:874-879 (1989)). For example, a sequencing primer is used with double-stranded PCR product or a single-stranded template molecule generated by a modified PCR. The sequence determination is performed by conventional procedures with radiolabeled nucleotides or by automatic sequencing procedures with fluorescent tags. Cloned DNA segments may also be used as probes to detect specific DNA segments. The sensitivity of this method is greatly enhanced when combined with PCR. The presence of nucleotide repeats may correlate to a change in galactokinase activity (causative change) or serve as a marker for various polymorphisms.
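Once a patient sequence has been read and aligned to the reference, point differences can be listed position by position. The minimal Python sketch below assumes the two sequences are already aligned and of equal length (real sequencing data would first need base calling and alignment, which is not shown). The example uses codons 25 to 33 of SEQ ID NO:4 and SEQ ID NO:5, whose first base is cDNA position 101, and recovers the G to A change at position 122 reported in the Examples; the function name is illustrative.

```python
def substitutions(reference: str, sample: str, start: int = 1):
    """List (position, reference base, sample base) for every mismatch.

    Assumes the sequences are already aligned, ungapped and of equal length;
    `start` is the 1-based coordinate of the first base in `reference`.
    """
    if len(reference) != len(sample):
        raise ValueError("sequences must be aligned and of equal length")
    return [(start + i, r, s)
            for i, (r, s) in enumerate(zip(reference, sample)) if r != s]


# Codons 25-33 (Gly...Ala-Val/Met-Ser) from SEQ ID NO:4 and SEQ ID NO:5;
# codon 25 begins at cDNA position 29 + 24 * 3 = 101.
normal = "GGGGCCGAGCCCGAGCTGGCCGTGTCA"
patient = "GGGGCCGAGCCCGAGCTGGCCATGTCA"
print(substitutions(normal, patient, start=101))
```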
Genetic testing based on DNA sequence differences may be achieved by detection of alterations in electrophoretic mobility of DNA fragments in gels with or without denaturing agents. Small sequence deletions and insertions can be visualized by high-resolution gel electrophoresis. DNA fragments of different sequences may be distinguished on denaturing formamide gradient gels, in which the mobilities of different DNA fragments are retarded in the gel at different positions according to their specific melting or partial melting temperatures (see, e.g., Myers et al., Science, 230:1242 (1985)). In addition, sequence alterations, in particular small deletions, may be detected as changes in the migration pattern of DNA heteroduplexes in non-denaturing gel electrophoresis (i.e., heteroduplex electrophoresis) (see, e.g., Nagamine et al., Am. J. Hum. Genet., 45:337-339 (1989)).

Sequence changes at specific locations may also be revealed by nuclease protection assays, such as RNase and S1 protection, or the chemical cleavage method (e.g., Cotton et al., Proc. Natl. Acad. Sci. USA, 85:4397-4401 (1988)).
Thus, the detection of a specific DNA sequence may be achieved by methods such as hybridization (e.g., heteroduplex electrophoresis; see White et al., Genomics, 12:301-306 (1992)), RNase protection (e.g., Myers et al., Science, 230:1242 (1985)), chemical cleavage (e.g., Cotton et al., Proc. Natl. Acad. Sci. USA, 85:4397-4401 (1988)), direct DNA sequencing, or the use of restriction enzymes (e.g., restriction fragment length polymorphisms (RFLP), in which variations in the number and size of restriction fragments can indicate insertions, deletions, the presence of nucleotide repeats and any other mutation which creates or destroys an endonuclease restriction sequence). Southern blotting of genomic DNA may also be used to identify large (i.e., greater than 100 base pair) deletions and insertions.
In addition to more conventional gel electrophoresis and DNA sequencing, mutations (e.g., microdeletions, aneuploidies, translocations, inversions) can also be detected by in situ analysis (see, e.g., Keller et al., DNA Probes, 2nd Ed., Stockton Press, New York, N.Y., USA (1993)). That is, DNA (or RNA) sequences in cells can be analyzed for mutations without isolation and/or immobilization onto a membrane. Fluorescence in situ hybridization (FISH) is presently the most commonly applied method, and numerous reviews of FISH have appeared. See, e.g., Tkachuk et al., Science, 250:559-562 (1990), and Trask et al., Trends Genet., 7:149-154 (1991), which are incorporated herein by reference for background purposes. Hence, by using nucleic acids based on the structure of specific genes, e.g., galactokinase, one can develop diagnostic tests for galactokinase deficiency.
In addition, some diseases are a result of, or are characterized by, changes in gene expression which can be detected by changes in the mRNA. Alternatively, the galactokinase gene can be used as a reference to identify individuals expressing a decreased level of galactokinase, e.g., by Northern blotting or in situ hybridization.
Defining appropriate hybridization conditions is within the skill of the art. See, e.g., "Current Protocols in Mol. Biol.," Vol. I & II, Wiley Interscience, Ausubel et al. (ed.) (1992). Probing technology is well known in the art, and it is appreciated that the size of the probes can vary widely, but it is preferred that the probe be at least 15 nucleotides in length. It is also appreciated that such probes can be, and preferably are, labeled with an analytically detectable reagent to facilitate identification of the probe. Useful reagents include but are not limited to radioactivity, fluorescent dyes or enzymes capable of catalyzing the formation of a detectable product. As a general rule, the more stringent the hybridization conditions, the more closely related the recovered genes will be.
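As a rough starting point for choosing hybridization stringency for a short probe, the following Python sketch applies the Wallace rule, Tm ≈ 2 °C × (A + T) + 4 °C × (G + C). This rule of thumb is a swapped-in, widely used approximation for short oligonucleotides and is not a method stated in the text; it ignores salt, formamide and mismatches, so actual conditions must still be determined empirically. The 15-nucleotide probe (the first 15 bases of the SEQ ID NO:4 coding sequence) is chosen only for illustration.

```python
def wallace_tm(probe: str) -> int:
    """Rough melting temperature (deg C) of a short DNA oligonucleotide.

    Wallace rule: Tm = 2 * (A + T) + 4 * (G + C). Only a rule of thumb for
    short probes; it does not account for salt, formamide or mismatches.
    """
    probe = probe.upper()
    if set(probe) - set("ACGT"):
        raise ValueError("probe must contain only A, C, G and T")
    at = probe.count("A") + probe.count("T")
    gc = probe.count("G") + probe.count("C")
    return 2 * at + 4 * gc


MIN_PROBE_LENGTH = 15  # preferred minimum probe length stated in the text

probe = "ATGGCTGCTTTGAGA"  # illustrative: first 15 nt of the SEQ ID NO:4 CDS
assert len(probe) >= MIN_PROBE_LENGTH
print(wallace_tm(probe))
```

For this probe (8 A/T and 7 G/C) the estimate is 44 °C; raising hybridization or wash temperature toward such an estimate increases stringency, consistent with the general rule stated above.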
Also within the scope of this invention are antisense oligonucleotides predicated upon the sequences disclosed herein for human galactokinase. Synthetic oligonucleotides or related antisense chemical structural analogs are designed to recognize and specifically bind to a target nucleic acid encoding galactokinase and galactokinase mutations. The general field of antisense technology is illustrated by the following disclosures, which are incorporated herein by reference for purposes of background: Cohen, J.S., Trends in Pharm. Sci., 10:435 (1989) and Weintraub, H.M., Scientific American, Jan. (1990) at page 40.
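An antisense oligonucleotide that binds a sense-strand target by base pairing is the reverse complement of that target. The minimal Python sketch below derives one for a hypothetical target, codons 1 to 6 of SEQ ID NO:4 spanning the start codon; this region is chosen only for illustration and is not an oligonucleotide disclosed in the text. Chemical modifications of antisense analogs are outside the scope of the sketch.

```python
# Translation table mapping each base to its Watson-Crick complement.
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence, written 5' to 3'."""
    return seq.translate(_COMPLEMENT)[::-1]


# Hypothetical target: codons 1-6 of SEQ ID NO:4 (Met-Ala-Ala-Leu-Arg-Gln).
sense_target = "ATGGCTGCTTTGAGACAG"
antisense = reverse_complement(sense_target)
print(antisense)
```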
Transgenic, non-human animals may be obtained by transfecting appropriate fertilized eggs or embryos of a host with nucleic acids encoding human galactokinase disclosed herein; see, for example, U.S. Patents 4,736,866; 5,175,385; 5,175,384 and 5,175,386. The resultant transgenic animal may be used as a model for the study of galactokinase. Particularly useful transgenic animals are those which display a detectable phenotype associated with the expression of human galactokinase. Drugs may then be screened for their ability to reverse or exacerbate the relevant phenotype. This invention also contemplates operatively linking the galactokinase coding gene to regulatory elements which are differentially responsive to various temperature or metabolic conditions, thereby effectively turning on or off the phenotypic expression in response to those conditions.
Although not necessarily limiting of this invention, the following are some experimental data illustrative of this invention.
EXAMPLE I
Purification of Human Galactokinase from Placental Tissue

Galactokinase (galK) was obtained from human placenta as described by Stambolian et al. (Biochim. Biophys. Acta, 831:306-312 (1985)), which is incorporated by reference in its entirety. In essence, human placental tissue (obtained within 1 hour of parturition) was homogenized and centrifuged, and the resulting supernatant was adsorbed onto DEAE-Sephacel®. The material was eluted, precipitated with ammonium sulfate, and then run through a sizing column (Sephadex G-100 SF®).
Pooled active fractions were concentrated. Purified protein was obtained following separation by SDS polyacrylamide gel electrophoresis and then Western blotted using standard techniques (see Laemmli, Nature, 227:680-685 (1970), or LeGendre et al., Biotechniques, 6:154 (1988)). Minute amounts (micrograms) of galactokinase were isolated from multiple rounds of protein purification. After a trypsin peptide digest, 7 peptide sequences were eventually isolated and identified. The three longest fragments are presented below:
[SEQ ID NO:1]
Val Asn Leu Ile Gly Glu His Thr Asp Tyr Asn Gln Gly Leu Val Leu Pro Met Ala Leu Glu Leu Met Thr Val Leu Val Gly Ser Pro Arg

[SEQ ID NO:2]
His Ile Gln Glu His Tyr Gly Gly Thr Ala Thr Phe Tyr Leu Ser Gln Ala Ala Asp Gly Ala Lys

[SEQ ID NO:3]
Ala Gln Val Cys Gln Gln Ala Glu His Ser Phe Ala Gly Met Pro Cys Gly Ile Met Asp Gln Phe Ile Ser Leu Met Gly Gln Lys

The fragments were compared with peptide sequences encoded by partially sequenced cDNAs. The cDNAs (also known as expressed sequence tags or ESTs) were obtained from Human Genome Sciences, Inc.
(Rockville, MD, USA). The best alignments occurred with an EST sequence from a human osteoclastoma stromal cell library (SEQ ID NO:1 showed 100% identity over 18 contiguous amino acids) and an EST sequence from a human pituitary library (SEQ ID NO:2 showed 95.5% identity over 22 contiguous amino acids). A full-length cDNA from the human osteoclastoma stromal cell library was identified and sequenced (SEQ ID NO:4) in its entirety on an automated ABI 373A Sequencer. Sequencing was confirmed on both strands. The corresponding amino acid sequence (SEQ ID NO:4) was compared against the peptide fragments identified above. SEQ ID NO:1 corresponds to amino acids 38-68 of the full-length human galactokinase protein. Similarly, SEQ ID NOs: 2 and 3 correspond to amino acids 367-388 and 167-195, respectively, of human galactokinase.
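Mapping a sequenced peptide onto the full-length protein amounts to finding the peptide as an exact substring and reporting 1-based residue numbers. The minimal Python sketch below uses residues 1 to 72 of the SEQ ID NO:4 translation, converted by hand from the three-letter listing into one-letter code, and reproduces the 38-68 assignment for SEQ ID NO:1. It handles exact matches only; peptide sequencing data containing ambiguous or erroneous residues would need a tolerant search, which is not shown.

```python
# Residues 1-72 of human galactokinase, converted by hand from the
# three-letter translation of SEQ ID NO:4 into one-letter code.
PROTEIN_1_72 = (
    "MAALRQPQ"
    "VAELLAEARRAFREEF"
    "GAEPELAVSAPGRVNL"
    "IGEHTDYNQGLVLPMA"
    "LELMTVLVGSPRKDGL"
)
SEQ_ID_NO_1 = "VNLIGEHTDYNQGLVLPMALELMTVLVGSPR"  # 31-residue tryptic peptide


def locate_peptide(protein: str, peptide: str):
    """Return 1-based (first, last) residue numbers of an exact match, or None."""
    index = protein.find(peptide)
    if index == -1:
        return None
    return index + 1, index + len(peptide)


print(locate_peptide(PROTEIN_1_72, SEQ_ID_NO_1))
```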
Analysis of the Human Galactokinase Gene:
A comparison of the amino acid sequence for human galactokinase with that of E. coli galactokinase (Debouck et al., Nucleic Acids Res., 13:1841-1853 (1985)) shows 61% similarity and 44.5% identity. Further comparison with another isolated human galactokinase gene (GK2) (Lee et al., Proc. Natl. Acad. Sci. USA, 89:10887-10891 (1992)) shows 54% similarity and 34.6% identity at the amino acid level.
Furthermore, the GK2 gene maps to human chromosome 15, in contrast to the gene of the present invention, which maps to human chromosome 17, position q24, as determined by fluorescence in situ hybridization (FISH) analysis.
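Percent identity, as used in this section, is the fraction of aligned positions carrying the same residue. The minimal Python sketch below computes it for two equal-length, ungapped stretches, the situation behind the "95.5% identity over 22 contiguous amino acids" reported for SEQ ID NO:2 (21 of 22 positions). The one-residue variant in the example is invented for illustration and is not the EST sequence. Whole-protein comparisons such as those with E. coli galactokinase and GK2 would first require a gapped alignment, and "similarity" additionally depends on a substitution-scoring scheme; neither is implemented here.

```python
def percent_identity(a: str, b: str) -> float:
    """Percent identical positions between two equal-length, ungapped sequences."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)


seq_id_no_2 = "HIQEHYGGTATFYLSQAADGAK"   # SEQ ID NO:2 in one-letter code
hypothetical = seq_id_no_2[:-1] + "R"     # invented one-residue variant, NOT the EST
print(round(percent_identity(seq_id_no_2, hypothetical), 1))
```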
SEQ ID NO:4 was hybridized against a Northern blot containing human messenger RNA from placenta, brain, skeletal muscle, kidney, intestine, heart, lung and liver according to standard procedures (see, e.g., Sambrook et al., Molecular Cloning: A Laboratory Manual, 2nd Ed., Cold Spring Harbor Laboratory Press, 1989). Hybridization was strongest with human liver and lung tissue.
Galactokinase Complementation:
SEQ ID NO:4 was subcloned into an E. coli vector, plasmid pBluescript [Stratagene]. When transformed into C600K-, a galactokinase-deficient strain, the transformed E. coli grew on MacConkey agar plates containing 1% galactose (and ampicillin at 50 µg/ml for plasmid selection) and produced brick-red colonies, indicating sugar fermentation. Specifically, the red color is due to the action of acids, produced by galactose fermentation, upon the bile salts and the indicator (neutral red) in MacConkey medium.

Expression in Mammalian Cells:
SEQ ID NO:4 was also subcloned into COS-1 cells [ATCC CRL 1650]. The cells were transfected and grown, and cell lysates were prepared. The lysates were assayed by a 14C galactokinase assay as described by Stambolian et al. (Exp. Eye Res., ~:231-237 (1984)), which is hereby incorporated by reference in its entirety. When expressed in transiently transfected COS cells, galactokinase activity was tenfold higher than control levels (6600 vs. 640 counts per minute; repeated three times). These results definitively confirm that SEQ ID NO:4 encodes a full-length, biologically active, human galactokinase gene.
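The "tenfold" figure follows directly from the reported counts. This short Python sketch simply divides the transfected-cell signal by the control signal; it uses the counts per minute exactly as reported and, as an assumption of the sketch, applies no background subtraction.

```python
def fold_over_control(sample: float, control: float) -> float:
    """Ratio of sample signal to control signal (e.g., counts per minute)."""
    if control <= 0:
        raise ValueError("control signal must be positive")
    return sample / control


fold = fold_over_control(6600, 640)  # counts per minute reported in the text
print(f"{fold:.1f}-fold")
```

The ratio is about 10.3, consistent with the approximately tenfold increase stated above.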
The nucleic acid molecule of the invention can also be subcloned into an expression vector to produce high levels of human galactokinase (either fused to another protein, e.g., operatively linked at the 5' end with another coding sequence, or unfused) in transfected cells. For mammalian cells, the expression vector would optionally encode a neomycin resistance gene, to select for transfectants on the basis of ability to grow in G418, and a dihydrofolate reductase gene, which permits amplification of the transfected gene in DHFR- cells. The plasmid can then be introduced into host cell lines, e.g., CHO ACC98, a nonadherent, DHFR- cell line adapted to grow in serum-free medium, and human embryonic kidney 293 cells (ATCC CRL 1573), and transfected cell lines can be selected by G418 resistance.
Human Galactokinase Gene - Genomic Sequence:
A full-length galactokinase genomic gene coding region was identified from a lambda phage (Lambda FIX II) human genomic library (made from human placental tissue) using the galK cDNA as a probe. One isolate, designated clone 17, was deposited on 3 May 1995 with the American Type Culture Collection (ATCC), Rockville, MD, USA, under accession number ATCC 97135, and has been accepted as a patent deposit, in accordance with the Budapest Treaty of 1977 governing the deposit of microorganisms for the purposes of patent procedure.
The genomic gene coding region is divided into at least 8 exons isolated from 4 DNA fragments. The arrangement is depicted in Figure 1. The DNA sequence was determined by using multiple oligonucleotide PCR primers corresponding to the galK cDNA sequence (i.e., corresponding to galK genomic exons) as well as oligonucleotide PCR primers subsequently designed to correspond to non-coding regions (i.e., galK genomic introns). The structure of the galactokinase genomic gene is summarized in Table 1 below (see also Figure 2 and [SEQ ID NO:7]):
Table 1
Genomic Galactokinase Gene

Exon #   Amino Acids Encoded   PCR Primer # / [SEQ ID NO]
1        1-55                  3333/[8], 3334/[9], 3598/[10], 3599/[11]
2        56-118                1888/[12], 3332/[13], 3604/[14], 3605/[15]
3        119-158               3331/[16], 3606/[17]
4        159-204               1657/[18], 3034/[19]
5        205-264               3330/[20], 3607/[21]
6        265-315               1539/[22], 2665/[23]
7        316-369               1891/[24], 2665/[25]
8        370-392               2665/[26], 2666/[27], 2667/[28]
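The exon boundaries in Table 1 can be used to place any amino acid residue in its exon, for example to decide which exon to amplify when screening a mutation. The minimal Python sketch below encodes the table and performs a binary search over exon start positions; it places Val32 in exon 1 (consistent with the MscI screen described below) and residue 80, the site of the nonsense mutation described below, in exon 2. The names `EXONS` and `exon_for_residue` are illustrative.

```python
import bisect

# (exon number, first residue, last residue), transcribed from Table 1.
EXONS = [(1, 1, 55), (2, 56, 118), (3, 119, 158), (4, 159, 204),
         (5, 205, 264), (6, 265, 315), (7, 316, 369), (8, 370, 392)]
_STARTS = [first for _, first, _ in EXONS]


def exon_for_residue(residue: int) -> int:
    """Return the exon encoding a given 1-based amino acid residue."""
    if not 1 <= residue <= EXONS[-1][2]:
        raise ValueError("residue outside the 392-residue protein")
    # Rightmost exon whose first residue is <= the query residue.
    index = bisect.bisect_right(_STARTS, residue) - 1
    return EXONS[index][0]


print(exon_for_residue(32), exon_for_residue(80))
```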
Galactokinase Deficiency Marker/Gene:
A fibroblast cell line (GM00334), derived from a patient with galactokinase deficiency, was obtained from the Coriell Institute for Medical Research, 401 Haddon Ave., Camden, New Jersey, 08103. Total RNA was isolated from the cultured cells using the RNAZOL kit for isolation of RNA (Biotecx, Houston, TX). Cytoplasmic RNA (1 µg) was reverse transcribed with oligonucleotide primers 1823 [SEQ ID NO: 29] and 1825 [SEQ ID NO: 30]. The sample was amplified by 35 cycles at 94°C for 1 min., 60°C for 1 min. and 72°C for 7 min. The DNA product was purified electrophoretically, ligated to the TA cloning vector (Invitrogen) and sequenced. Twelve cDNAs in total were sequenced (representing cloned PCR products of multiple independent PCR reactions). This procedure was also repeated with cultured fibroblasts from normal controls (i.e., persons not exhibiting galactokinase deficiency).
A comparison with normal controls identified a single base substitution of A for G at position 122 of the "normal" human galactokinase gene [SEQ ID NO: 4]. The result is a missense mutation in amino acid 32 from Val to Met [SEQ ID NO: 5].
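The link between nucleotide 122 and amino acid 32 is simple coordinate arithmetic: the SEQ ID NO:4 coding sequence starts at position 29, so a cDNA position maps to codon ⌊(position − 29) / 3⌋ + 1. The minimal Python sketch below performs that mapping and labels the change using a two-entry codon table that covers only the codons involved (it is not a full genetic code); the function and constant names are illustrative.

```python
from typing import Tuple

CDS_START = 29  # first base of the ATG start codon in SEQ ID NO:4 (CDS 29..1204)


def codon_for_position(nt_position: int, cds_start: int = CDS_START) -> Tuple[int, int]:
    """Map a 1-based cDNA position to (codon number, base within codon), both 1-based."""
    offset = nt_position - cds_start
    if offset < 0:
        raise ValueError("position lies upstream of the coding sequence")
    return offset // 3 + 1, offset % 3 + 1


# Only the two codons needed here; not a full genetic-code table.
CODON_TO_AA = {"GTG": "Val", "ATG": "Met"}

codon_number, base_in_codon = codon_for_position(122)
normal_codon = "GTG"                              # codon 32 in SEQ ID NO:4
mutant_codon = "A" + normal_codon[1:]             # G to A at the first base
print(f"{CODON_TO_AA[normal_codon]}{codon_number}{CODON_TO_AA[mutant_codon]}")
```

Position 122 is the first base of codon 32, so the G to A change turns GTG (Val) into ATG (Met), matching the missense mutation described above.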
The G to A base change creates an MscI endonuclease restriction site (i.e., TGG↓CCA) on the mutant allele. This restriction site was then used to rapidly screen for the mutant allele in the parents of the patient with galactokinase deficiency. In essence, the exon encoding galactokinase residues 1 to 55 (i.e., exon 1, see Table 1) was cloned from a genomic lambda phage library, and its DNA sequence was determined, including a portion of the flanking intron sequences. Oligonucleotide primers (X2-5OUT [SEQ ID NO: 31] and X2-3OUT [SEQ ID NO: 32]) were designed to hybridize to intron sequences for the amplification of a 346 bp DNA fragment of the genomic DNA. The PCR product was analyzed for the point mutation via RFLP, that is, for the presence of a newly created MscI site, as detected by electrophoresis on a 1.5% agarose gel. A "normal" allele remains uncut with the enzyme MscI, and thus migrates as a 346 bp fragment on an agarose gel. The PCR product from the patient with galactokinase deficiency (i.e., the G to A base change) is cleaved with MscI, resulting in two fragments of 193 and 153 bp, respectively. The absence of the 346 bp fragment indicates that the patient was homozygous for this allele.
In contrast, PCR products from the parents of this patient, followed by MscI digestion, resulted in three fragments (346, 193 and 153 bp), which is consistent with a heterozygous pattern for the G to A base change. That is, the parents were both carriers of the same mutation.
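The RFLP screen above can be modeled as an in-silico MscI digest followed by a genotype call from the observed band set. In the minimal Python sketch below, the recognition site TGGCCA and the blunt cut after TGG are standard properties of MscI; the local check uses codons 30 to 32 from SEQ ID NO:4 and SEQ ID NO:5. The real 346 bp amplicon sequence is not given in this section, so the two 346 bp alleles are synthetic poly-A stand-ins with the site placed arbitrarily to yield the reported 193 and 153 bp fragments; which fragment lies upstream is an assumption.

```python
MSCI_SITE = "TGGCCA"
MSCI_CUT_OFFSET = 3  # MscI cleaves between TGG and CCA, leaving blunt ends


def msci_fragments(seq: str) -> list:
    """Fragment lengths after complete MscI digestion of a linear DNA sequence."""
    seq = seq.upper()
    cuts = []
    hit = seq.find(MSCI_SITE)
    while hit != -1:
        cuts.append(hit + MSCI_CUT_OFFSET)
        hit = seq.find(MSCI_SITE, hit + 1)
    bounds = [0] + cuts + [len(seq)]
    return [end - start for start, end in zip(bounds, bounds[1:])]


def call_genotype(bands: set) -> str:
    """Interpret the set of band sizes (bp) seen on the gel."""
    if bands == {346}:
        return "homozygous normal"
    if bands == {193, 153}:
        return "homozygous for the G to A change"
    if bands == {346, 193, 153}:
        return "heterozygous carrier"
    return "uninterpretable"


# Local context, codons 30-32 (Leu-Ala-Val / Leu-Ala-Met), from SEQ ID NOs 4 and 5.
assert MSCI_SITE not in "CTGGCCGTG"   # normal allele: no MscI site
assert MSCI_SITE in "CTGGCCATG"       # G to A creates TGGCCA

# Synthetic 346 bp stand-ins (poly-A padding, NOT the real amplicon sequence).
normal_allele = "A" * 346
mutant_allele = "A" * 190 + MSCI_SITE + "A" * 150

patient = set(msci_fragments(mutant_allele))                 # two mutant alleles
parent = set(msci_fragments(normal_allele)) | set(msci_fragments(mutant_allele))
print(call_genotype(patient), "/", call_genotype(parent))
```

Modeling a heterozygote as the union of both alleles' fragments mirrors the three-band pattern (346, 193 and 153 bp) observed for the parents.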
To determine whether the missense mutation resulted in decreased enzymatic activity, a cDNA clone containing the G to A base change was subcloned into COS cells and assayed for galactokinase activity as previously described. COS cells transfected with cDNA encoding the missense mutation had the same level of galactokinase activity as the host COS cells, namely 0.02 units/µg protein. In contrast, COS cells transfected with the non-mutant galactokinase cDNA [SEQ ID NO:4] had a fifty-fold higher activity compared to the host COS cells (i.e., control). This result supports the Val32 to Met32 substitution as the cause of the decreased enzymatic activity.
Another mutation was discovered in an unrelated patient having cataracts and diagnosed as galactokinase deficient (galactokinase activity was found to be close to zero). Genomic DNA was isolated from lymphoblastoid cell lines and sequenced by automated sequencing on an ABI 373A sequencer. A single base substitution of T for G resulted in an in-frame nonsense codon (i.e., TAG) at amino acid position 80 [SEQ ID NO:6]. This mutation causes premature termination of human galactokinase, resulting in a truncated protein of 79 amino acids that would be expected to be non-functional. (The genomic DNA of the parents of this patient was heterozygous for this mutation, and hence the parents were not galactokinase deficient.)

The above description and examples fully disclose the invention, including preferred embodiments thereof. Those skilled in the art will recognize, or be able to ascertain using no more than routine experimentation, many equivalents to the specific embodiments herein. Such equivalents are intended to be within the scope of the following claims.
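For the nonsense mutation at codon 80 described in the last example, the following minimal Python sketch shows why a single T for G change yields a stop and a 79-residue product. Codon 80 of SEQ ID NO:4 is GAG (Glu); it contains G at its first and third bases, and only replacing the first G with T gives TAG (replacing the third gives GAT, Asp), so the sketch assumes the substitution is at the first base. The stop-codon set is the standard genetic code; the helper names are illustrative.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code


def substitute_base(codon: str, position: int, new_base: str) -> str:
    """Return the codon with the base at 1-based `position` replaced."""
    i = position - 1
    return codon[:i] + new_base + codon[i + 1:]


def residues_before_stop(stop_codon_number: int) -> int:
    """Length of the polypeptide when translation ends at `stop_codon_number`."""
    return stop_codon_number - 1


codon_80 = "GAG"                                  # Glu80 in SEQ ID NO:4
mutant_80 = substitute_base(codon_80, 1, "T")     # assumed first-base G to T
print(mutant_80, mutant_80 in STOP_CODONS, residues_before_stop(80))
```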
SEQUENCE LISTING
(1) GENERAL INFORMATION:
(i) APPLICANT: Bergsma, Derk J.
Stambolian, Dwight
(ii) TITLE OF INVENTION: Human Galactokinase Gene
(iii) NUMBER OF SEQUENCES: 32
(iv) CORRESPONDENCE ADDRESS:
(A) ADDRESSEE: SmithKline Beecham Corp./Corporate Intellectual Property
(B) STREET: 709 Swedeland Road/UW2220
(C) CITY: King of Prussia
(D) STATE: Pennsylvania
(E) COUNTRY: USA
(F) ZIP: 19406-0939
(v) COMPUTER READABLE FORM:
(A) MEDIUM TYPE: Floppy disk
(B) COMPUTER: IBM PC compatible
(C) OPERATING SYSTEM: PC-DOS/MS-DOS
(D) SOFTWARE: PatentIn Release #1.0, Version #1.30
(vi) CURRENT APPLICATION DATA:
(A) APPLICATION NUMBER:
(B) FILING DATE:
(C) CLASSIFICATION:
(vii) PRIOR APPLICATION DATA:
(A) APPLICATION NUMBER: PCT/US94/10825
(B) FILING DATE: 23-SEP-1994
(viii) ATTORNEY/AGENT INFORMATION:
(A) NAME: Sutton, Jeffrey A.
(B) REGISTRATION NUMBER: 34,028
(C) REFERENCE/DOCKET NUMBER: P50268-1
(ix) TELECOMMUNICATION INFORMATION:
(A) TELEPHONE: 610-270-5024
(B) TELEFAX: 610-270-5090

(2) INFORMATION FOR SEQ ID NO:1:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 31 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: protein
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:1:
Val Asn Leu Ile Gly Glu His Thr Asp Tyr Asn Gln Gly Leu Val Leu
Pro Met Ala Leu Glu Leu Met Thr Val Leu Val Gly Ser Pro Arg

(2) INFORMATION FOR SEQ ID NO:2:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 22 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: protein
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:2:
His Ile Gln Glu His Tyr Gly Gly Thr Ala Thr Phe Tyr Leu Ser Gln
Ala Ala Asp Gly Ala Lys

(2) INFORMATION FOR SEQ ID NO:3:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 29 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: protein
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:3:
Ala Gln Val Cys Gln Gln Ala Glu His Ser Phe Ala Gly Met Pro Cys
Gly Ile Met Asp Gln Phe Ile Ser Leu Met Gly Gln Lys

(2) INFORMATION FOR SEQ ID NO:4:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 1349 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: double
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: cDNA
(ix) FEATURE:
(A) NAME/KEY: CDS
(B) LOCATION: 29..1204
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:4:
GAATTCGGCA CGAGTGCAGG CGCGCGTC ATG GCT GCT TTG AGA CAG CCC CAG
Met Ala Ala Leu Arg Gln Pro Gln

GTC GCG GAG CTG CTG GCC GAG GCC CGG CGA GCC TTC CGG GAG GAG TTC
Val Ala Glu Leu Leu Ala Glu Ala Arg Arg Ala Phe Arg Glu Glu Phe

GGG GCC GAG CCC GAG CTG GCC GTG TCA GCG CCG GGC CGC GTC AAC CTC
Gly Ala Glu Pro Glu Leu Ala Val Ser Ala Pro Gly Arg Val Asn Leu

ATC GGG GAA CAC ACG GAC TAC AAC CAG GGC CTG GTG CTG CCT ATG GCT
Ile Gly Glu His Thr Asp Tyr Asn Gln Gly Leu Val Leu Pro Met Ala

CTG GAG CTC ATG ACG GTG CTG GTG GGC AGC CCC CGC AAG GAT GGG CTG
Leu Glu Leu Met Thr Val Leu Val Gly Ser Pro Arg Lys Asp Gly Leu

GTG TCT CTC CTC ACC ACC TCT GAG GGT GCC GAT GAG CCC CAG CGG CTG
Val Ser Leu Leu Thr Thr Ser Glu Gly Ala Asp Glu Pro Gln Arg Leu

CAG TTT CCA CTG CCC ACA GCC CAG CGC TCG CTG GAG CCT GGG ACT CCT
Gln Phe Pro Leu Pro Thr Ala Gln Arg Ser Leu Glu Pro Gly Thr Pro

CGG TGG GCC AAC TAT GTC AAG GGA GTG ATT CAG TAC TAC CCA GCT GCC
Arg Trp Ala Asn Tyr Val Lys Gly Val Ile Gln Tyr Tyr Pro Ala Ala

CCC CTC CCT GGC TTC AGT GCA GTG GTG GTC AGC TCA GTG CCC CTG GGG
Pro Leu Pro Gly Phe Ser Ala Val Val Val Ser Ser Val Pro Leu Gly

GGT GGC CTG TCC AGC TCA GCA TCC TTG GAA GTG GCC ACG TAC ACC TTC
Gly Gly Leu Ser Ser Ser Ala Ser Leu Glu Val Ala Thr Tyr Thr Phe

CTC CAG CAG CTC TGT CCA GAC TCG GGC ACA ATA GCT GCC CGC GCC CAG
Leu Gln Gln Leu Cys Pro Asp Ser Gly Thr Ile Ala Ala Arg Ala Gln

GTG TGT CAG CAG GCC GAG CAC AGC TTC GCA GGG ATG CCC TGT GGC ATC
Val Cys Gln Gln Ala Glu His Ser Phe Ala Gly Met Pro Cys Gly Ile

ATG GAC CAG TTC ATC TCA CTT ATG GGA CAG AAA GGC CAC GCG CTG CTC
Met Asp Gln Phe Ile Ser Leu Met Gly Gln Lys Gly His Ala Leu Leu

ATT GAC TGC AGG TCC TTG GAG ACC AGC CTG GTG CCA CTC TCG GAC CCC
Ile Asp Cys Arg Ser Leu Glu Thr Ser Leu Val Pro Leu Ser Asp Pro

AAG CTG GCC GTG CTC ATC ACC AAC TCT AAT GTC CGC CAC TCC CTG GCC
Lys Leu Ala Val Leu Ile Thr Asn Ser Asn Val Arg His Ser Leu Ala

TCC AGC GAG TAC CCT GTG CGG CGG CGC CAA TGT GAA GAA GTG GCC CGG
Ser Ser Glu Tyr Pro Val Arg Arg Arg Gln Cys Glu Glu Val Ala Arg

GCG CTG GGC AAG GAA AGC CTC CGG GAG GTA CAA CTG GAA GAG CTA GAG
Ala Leu Gly Lys Glu Ser Leu Arg Glu Val Gln Leu Glu Glu Leu Glu

GCT GCC AGG GAC CTG GTG AGC AAA GAG GGC TTC CGG CGG GCC CGG CAC
Ala Ala Arg Asp Leu Val Ser Lys Glu Gly Phe Arg Arg Ala Arg His

GTG GTG GGG GAG ATT CGG CGC ACG GCC CAG GCA GCG GCC GCC CTG AGA
Val Val Gly Glu Ile Arg Arg Thr Ala Gln Ala Ala Ala Ala Leu Arg

CGT GGC GAC TAC AGA GCC TTT GGC CGC CTC ATG GTG GAG AGC CAC CGC
Arg Gly Asp Tyr Arg Ala Phe Gly Arg Leu Met Val Glu Ser His Arg

TCA CTC AGA GAC GAC TAT GAG GTG AGC TGC CCA GAG CTG GAC CAG CTG
Ser Leu Arg Asp Asp Tyr Glu Val Ser Cys Pro Glu Leu Asp Gln Leu

GTG GAG GCT GCG CTT GCT GTG CCT GGG GTT TAT GGC AGC CGC ATG ACG
Val Glu Ala Ala Leu Ala Val Pro Gly Val Tyr Gly Ser Arg Met Thr

GGC GGT GGC TTC GGT GGC TGC ACG GTG ACA CTG CTG GAG GCC TCC GCT
Gly Gly Gly Phe Gly Gly Cys Thr Val Thr Leu Leu Glu Ala Ser Ala

GCT CCC CAC GCC ATG CGG CAC ATC CAG GAG CAC TAC GGC GGG ACT GCC
Ala Pro His Ala Met Arg His Ile Gln Glu His Tyr Gly Gly Thr Ala

ACC TTC TAC CTC TCT CAA GCA GCC GAT GGA GCC AAG GTG CTG TGC TTG
Thr Phe Tyr Leu Ser Gln Ala Ala Asp Gly Ala Lys Val Leu Cys Leu

TGAGGCACCC CCAGGACAGC ACACGGTGAG GGTGCGGGGC CTGCAGGCCA GTCCCACGGC
AAAAAAAAAA Ai~UU~AAAC TCGAG
(2) INFORMATION FOR SEQ ID NO:5:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 1349 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: double
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: cDNA
(ix) FEATURE:
(A) NAME/KEY: CDS
(B) LOCATION: 29..1204
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:5:
GAATTCGGCA CGAGTGCAGG CGCGCGTC ATG GCT GCT TTG AGA CAG CCC CAG
Met Ala Ala Leu Arg Gln Pro Gln
GTC GCG GAG CTG CTG GCC GAG GCC CGG CGA GCC TTC CGG GAG GAG TTC
Val Ala Glu Leu Leu Ala Glu Ala Arg Arg Ala Phe Arg Glu Glu Phe
GGG GCC GAG CCC GAG CTG GCC ATG TCA GCG CCG GGC CGC GTC AAC CTC
Gly Ala Glu Pro Glu Leu Ala Met Ser Ala Pro Gly Arg Val Asn Leu
ATC GGG GAA CAC ACG GAC TAC AAC CAG GGC CTG GTG CTG CCT ATG GCT
Ile Gly Glu His Thr Asp Tyr Asn Gln Gly Leu Val Leu Pro Met Ala
CTG GAG CTC ATG ACG GTG CTG GTG GGC AGC CCC CGC AAG GAT GGG CTG
Leu Glu Leu Met Thr Val Leu Val Gly Ser Pro Arg Lys Asp Gly Leu
GTG TCT CTC CTC ACC ACC TCT GAG GGT GCC GAT GAG CCC CAG CGG CTG
Val Ser Leu Leu Thr Thr Ser Glu Gly Ala Asp Glu Pro Gln Arg Leu
CAG TTT CCA CTG CCC ACA GCC CAG CGC TCG CTG GAG CCT GGG ACT CCT
Gln Phe Pro Leu Pro Thr Ala Gln Arg Ser Leu Glu Pro Gly Thr Pro
CGG TGG GCC AAC TAT GTC AAG GGA GTG ATT CAG TAC TAC CCA GCT GCC
Arg Trp Ala Asn Tyr Val Lys Gly Val Ile Gln Tyr Tyr Pro Ala Ala
CCC CTC CCT GGC TTC AGT GCA GTG GTG GTC AGC TCA GTG CCC CTG GGG
Pro Leu Pro Gly Phe Ser Ala Val Val Val Ser Ser Val Pro Leu Gly
GGT GGC CTG TCC AGC TCA GCA TCC TTG GAA GTG GCC ACG TAC ACC TTC
Gly Gly Leu Ser Ser Ser Ala Ser Leu Glu Val Ala Thr Tyr Thr Phe
CTC CAG CAG CTC TGT CCA GAC TCG GGC ACA ATA GCT GCC CGC GCC CAG
Leu Gln Gln Leu Cys Pro Asp Ser Gly Thr Ile Ala Ala Arg Ala Gln
GTG TGT CAG CAG GCC GAG CAC AGC TTC GCA GGG ATG CCC TGT GGC ATC
Val Cys Gln Gln Ala Glu His Ser Phe Ala Gly Met Pro Cys Gly Ile
ATG GAC CAG TTC ATC TCA CTT ATG GGA CAG AAA GGC CAC GCG CTG CTC
Met Asp Gln Phe Ile Ser Leu Met Gly Gln Lys Gly His Ala Leu Leu
ATT GAC TGC AGG TCC TTG GAG ACC AGC CTG GTG CCA CTC TCG GAC CCC
Ile Asp Cys Arg Ser Leu Glu Thr Ser Leu Val Pro Leu Ser Asp Pro
AAG CTG GCC GTG CTC ATC ACC AAC TCT AAT GTC CGC CAC TCC CTG GCC
Lys Leu Ala Val Leu Ile Thr Asn Ser Asn Val Arg His Ser Leu Ala
TCC AGC GAG TAC CCT GTG CGG CGG CGC CAA TGT GAA GAA GTG GCC CGG
Ser Ser Glu Tyr Pro Val Arg Arg Arg Gln Cys Glu Glu Val Ala Arg
GCG CTG GGC AAG GAA AGC CTC CGG GAG GTA CAA CTG GAA GAG CTA GAG
Ala Leu Gly Lys Glu Ser Leu Arg Glu Val Gln Leu Glu Glu Leu Glu
GCT GCC AGG GAC CTG GTG AGC AAA GAG GGC TTC CGG CGG GCC CGG CAC
Ala Ala Arg Asp Leu Val Ser Lys Glu Gly Phe Arg Arg Ala Arg His
GTG GTG GGG GAG ATT CGG CGC ACG GCC CAG GCA GCG GCC GCC CTG AGA
Val Val Gly Glu Ile Arg Arg Thr Ala Gln Ala Ala Ala Ala Leu Arg
CGT GGC GAC TAC AGA GCC TTT GGC CGC CTC ATG GTG GAG AGC CAC CGC
Arg Gly Asp Tyr Arg Ala Phe Gly Arg Leu Met Val Glu Ser His Arg
TCA CTC AGA GAC GAC TAT GAG GTG AGC TGC CCA GAG CTG GAC CAG CTG
Ser Leu Arg Asp Asp Tyr Glu Val Ser Cys Pro Glu Leu Asp Gln Leu
GTG GAG GCT GCG CTT GCT GTG CCT GGG GTT TAT GGC AGC CGC ATG ACG
Val Glu Ala Ala Leu Ala Val Pro Gly Val Tyr Gly Ser Arg Met Thr
GGC GGT GGC TTC GGT GGC TGC ACG GTG ACA CTG CTG GAG GCC TCC GCT
Gly Gly Gly Phe Gly Gly Cys Thr Val Thr Leu Leu Glu Ala Ser Ala
GCT CCC CAC GCC ATG CGG CAC ATC CAG GAG CAC TAC GGC GGG ACT GCC
Ala Pro His Ala Met Arg His Ile Gln Glu His Tyr Gly Gly Thr Ala
ACC TTC TAC CTC TCT CAA GCA GCC GAT GGA GCC AAG GTG CTG TGC TTG
Thr Phe Tyr Leu Ser Gln Ala Ala Asp Gly Ala Lys Val Leu Cys Leu
TGAGGCACCC CCAGGACAGC ACACGGTGAG GGTGCGGGGC CTGCAGGCCA GTCCCACGGC
TCTGTGCCCG GTGCCATCTT CCATATCCGG GTGCTCAATA AACTTGTGCC TCCAATGTGG
AAAAAAAAAA AAAAAAAAAC TCGAG
(2) INFORMATION FOR SEQ ID NO:6:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 1349 base pairs (B) TYPE: nucleic acid (C) STRANDEDNESS: double (D) TOPOLOGY: linear (ii) MOLECULE TYPE: cDNA
(ix) FEATURE:
(A) NAME/KEY: CDS
(B) LOCATION: 29..265
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:6:
GAATTCGGCA CGAGTGCAGG CGCGCGTC ATG GCT GCT TTG AGA CAG CCC CAG
Met Ala Ala Leu Arg Gln Pro Gln
GTC GCG GAG CTG CTG GCC GAG GCC CGG CGA GCC TTC CGG GAG GAG TTC
Val Ala Glu Leu Leu Ala Glu Ala Arg Arg Ala Phe Arg Glu Glu Phe
Gly Ala Glu Pro Glu Leu Ala Val Ser Ala Pro Gly Arg Val Asn Leu
ATC GGG GAA CAC ACG GAC TAC AAC CAG GGC CTG GTG CTG CCT ATG GCT
Ile Gly Glu His Thr Asp Tyr Asn Gln Gly Leu Val Leu Pro Met Ala
CTG GAG CTC ATG ACG GTG CTG GTG GGC AGC CCC CGC AAG GAT GGG CTG
Leu Glu Leu Met Thr Val Leu Val Gly Ser Pro Arg Lys Asp Gly Leu
GTG TCT CTC CTC ACC ACC TCT TAGGGTGCCG ATGAGCCCCA GCGGCTGCAG
Val Ser Leu Leu Thr Thr Ser
TTTCCACTGC CCACAGCCCA GCGCTCGCTG GAGCCTGGGA CTCCTCGGTG GGCCAACTAT
GTCAAGGGAG TGATTCAGTA CTACCCAGCT GCCCCCCTCC CTGGCTTCAG TGCAGTGGTG
GTCAGCTCAG TGCCCCTGGG GGGTGGCCTG TCCAGCTCAG CATCCTTGGA AGTGGCCACG
TACACCTTCC TCCAGCAGCT CTGTCCAGAC TCGGGCACAA TAGCTGCCCG CGCCCAGGTG
TGTCAGCAGG CCGAGCACAG CTTCGCAGGG ATGCCCTGTG GCATCATGGA CCAGTTCATC
TCACTTATGG GACAGAAAGG CCACGCGCTG CTCATTGACT GCAGGTCCTT GGAGACCAGC
CTGGTGCCAC TCTCGGACCC CAAGCTGGCC GTGCTCATCA CCAACTCTAA TGTCCGCCAC
TCCCTGGCCT CCAGCGAGTA CCCTGTGCGG CGGCGCCAAT GTGAAGAAGT GGCCCGGGCG
CTGGGCAAGG AAAGCCTCCG GGAGGTACAA CTGGAAGAGC TAGAGGCTGC CAGGGACCTG
GTGAGCAAAG AGGGCTTCCG GCGGGCCCGG CACGTGGTGG GGGAGATTCG GCGCACGGCC
CAGGCAGCGG CCGCCCTGAG ACGTGGCGAC TACAGAGCCT TTGGCCGCCT CATGGTGGAG
AGCCACCGCT CACTCAGAGA CGACTATGAG GTGAGCTGCC CAGAGCTGGA CCAGCTGGTG
GAGGCTGCGC TTGCTGTGCC TGGGGTTTAT GGCAGCCGCA TGACGGGCGG TGGCTTCGGT
GGCTGCACGG TGACACTGCT GGAGGCCTCC GCTGCTCCCC ACGCCATGCG GCACATCCAG
GAGCACTACG GCGGGACTGC CACCTTCTAC CTCTCTCAAG CAGCCGATGG AGCCAAGGTG
CTGTGCTTGT GAGGCACCCC CAGGACAGCA CACGGTGAGG GTGCGGGGCC TGCAGGCCAG
TCCCACGGCT CTGTGCCCGG TGCCATCTTC CATATCCGGG TGCTCAATAA ACTTGTGCCT
CCAATGTGGA AAAAAAAAAA AAAAAAAACT CGAG
(2) INFORMATION FOR SEQ ID NO:7:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 7676 base pairs (B) TYPE: nucleic acid (C) STRANDEDNESS: double (D) TOPOLOGY: linear (ii) MOLECULE TYPE: DNA (genomic) (xi) SEQUENCE DESCRIPTION: SEQ ID NO:7:
CCGAGCATCC CGCGCCGACG GGTCTGTGCC GGAGCAGCTG TGCAGAGCTG CAGGCGCGCG
TCATGGCTGC TTTGAGACAG CCCCAGGTCG CGGAGCTGCT GGCCGAGGCC CGGCGAGCCT
TCCGGGAGGA GTTCGGGGCC GAGCCCGAGC TGGCCGTGTC AGCGCCGGGC CGCGTCAACC
TCATCGGGGA ACACACGGAC TACAACCAGG GCCTGGTGCT GCCTATGGTG AGGGGCTGCA
CGGGGAGCCC CTAGCCCGCC GCCGCCTGTC CCGGTCGCCG AGGAGGGCGG GCCTCGGGGA
CGCTGGGGGC GACl~lTCC CGCGGGAGAT GTGGGGCGGG CAGCTGCGCC TGGAGCACCG
GTGCACGGAA GAGTCCCCGG GACAGGCTGT TCCCCACGTT GGAAGGGAGG AAGCGAAGAA
GTGGTCCCCA GAGGGTGCGC GGCCGCCTCT TGGCTCAAGC CCGCCCTCTG GGGGCTGGGG
ClCC~CGC~l TCAACCTGGG AGCATGTTCC CCTTAAACTG TGAGGCCCTG TGTGCCACGC
AGAAGGGGAC ACTCCGCGCC TCCGGCCACC GTGGGGCCCC AACCGCAGAC CTGGGCGAAC
GTAGCCTTCT GGCCCAGCCC GTTCAATTTA CAGAGGAGGA AACTGAGGCC TAGAGAGGCC
CAGTGAACTG CTGGAGGTCA CACAGCAGGT TCTTGGCGGG GCTGCGACTT GGGAGTGAGG
ACTCCCAGCT TTCAGCGGGG GGCGCTTTCC GCCCCATCTG CAGCTTGGGG AGTGCACAGG
TACAGGATGT CCAGAGCCAC CCAAAATGTA AAGGCTTTGG AGCTCCAGTG Al~lG1l-lC
CCTTTGGGCT AAGCTCTCCC CCCTTGCCCC ACAGCTCAGG GCAGAGTCCA GGTCTGTGCT
CCAGCTGCAG CCGCCCCGCC CCTGAAGACC TAAGGGGGCA GGGCTCAAGC CCCCAAGGTC
AGCTGGCCCT CAGGATCTTC CCTGCGACGC TGAACCTGGA GGTTCAGAAC CTGATGACTG
TGGAGGCATC AGAACCTCGG CTGGAGGCAG TGTCATTGGA GAGGCTTACT CCAGCTGGCG
GAAGCCTCAC GTACTGCTTG TCTCTCCTGC CAGGCTCTGG AGCTCATGAC GGTGCTGGTG
GGCAGCCCCC GCAAGGATGG GCTGGTGTCT CTCCTCACCA CCTCTGAGGG TGCCGATGAG
CCCCAGCGGC TGCAGTTTCC ACTGCCCACA GCCCAGCGCT CGCTGGAGCC TGGGACTCCT
CGGTGGGCCA ACTATGTCAA GGGAGTGATT CAGTACTACC CAGGTATGGG GCCCAGGCCT
GAGCCAAGTC CTCACTGATA CTAGGAGTGC CACCTCACAG CCACAGAGCC CATTCATTTG
TCTGATACAC ~iGGGGAAG GCTTGTAGAG TGGAGCATCC CATTGTACAG ATGAGGAAAC
TGATGCCCCC AGAAGGTCGG GAACTTGCCC TGGGTTTCCC GTGACCTGAT TGGAGGAGCC
AGGATTTGAA CCCCAGCCTT TTTTCCCTCC AGAGCCCTAA ACCAGGAGGA CAATTAGAAG
TGTCCCAGCA ACCTCAGAGG GTGGGAAAAT GGAGGGGAGT GGGTCCCTTG GGCCAGCAGG
TTGGTGGGGT TCTTGACAAT TGAGACACAC ACCTAGAAAC AGTTGCTAGG CCGTTGCTGC
CCTTCCCGCC AGGACACCTG CCCTTCCTGT CCAATCCTCC CAGGCAGCCT CTCTTACCAT
CAC~lGi~Ci TTCCCCCTGC AGCTGCCCCC CTCCCTGGCT TCAGTGCAGT GGTGGTCAGC
TCAGTGCCCC TGGGGGGTGG CCTGTCCAGC TCAGCATCCT TGGAAGTGGC CACGTACACC
TTCCTCCAGC AGCTCTGTCC AGGTACCAGC TAGGCCCCAG CCCTGACCCA GCCCTCCTTC
CCTGAGGTCT CCAGGTGGTC CCAGCTTCTA CTATGCCTTA TGGAGGGGGT GGCAGGGAAT
CTCCCTGGAG TGTCATTGAA GCCACTGCTG CTTCCACCAG CCCTAGCCTC CCCACCTCAC
CCTGTACTGC AGACTCGGGC ACAATAGCTG CCCGCGCCCA GGTGTGTCAG CAGGCCGAGC
ACAGCTTCGC AGGGATGCCC TGTGGCATCA TGGACCAGTT CATCTCACTT ATGGGACAGA
AAGGCCACGC GCTGCTCATT GACTGCAGGT TGGGCTCGCT CCCCTCGTCC CCTCCCGCCC
TGCACTCAGC AGCTCCTGGG TGGAGTGTGC CCACTGCCTG GCGCAGCAAG CACACGCTTG
GCClCClCAT C-CCCCCATT GTAACTCCAC CCCAGGTCCT TGGAGACCAG CCTGGTGCCA
CTCTCGGACC CCAAGCTGGC CGTGCTCATC ACCAACTCTA ATGTCCGCCA CTCCCTGGCC
TCCAGCGAGT ACCCTGTGCG GCGGCGCCAA TGTGAAGAAG TGGCCCGGGC GCTGGGCAAG
GAAAGCCTCC GGGAGGTACA ACTGGAAGAG CTAGAGGGTG AGAACTGCCA GGClGCl~lA
TCCTGGAGGC GG~lGlGClC CCTGCTGGCG CCTCAGTGTG GCCTTGACCC TGCCTGGGAC
CCCGATCTCC AGGGGCTTCT GCCATGCTCT CCCCAGTCCC TTCAAACACT GCGCACCCAG
GGTTCCAATC TCAGCAGGGG TGCTTGAAAT CCTAAAATGG TCTTATCTAA T~AG~AAAAT
CATGTTTCCA TTGTGGAAAA TGTAGAAAAG TACAAAGTAG AAAATAATAA GCTATAAGGG
CACTACCCAG AGATAGGCAC TGCTGACATT TTCACGTTTC CTTTCAGTAT TTTTCCACAT
~lGlC.lCAA AGCTGAGTAT ATGTAATATA TCATCACTTT CCCCCCCCAC CCCC
TTAAGAGGCA GGGlClCATT CTGTTGCCCA AGCTGGAGTG TAGTGGTGTG ATCATAGCTT
ACTGCAAACT TGAACTCTTG AGCTCAAGGG ATCCTCCCAG CTCAGCCTTC CAAGTAGCTG
AGATTACAGG TGTGCCACCA TGCCCGGCTA ATTTTTATCT TCGTAAAGAC GGCCTTGTAG
TGTTGCCCAG GATGATCCTG AACTCTGGCC TCAAGAGGTC CTCCTGCCTT GGGCTCCCAA
AGTGTTGGGA TTATAGGCAT GAGCCACTGC GGCCAGCCCA TTTGCCGTGT lllllllllG
GACACA~AGT TTCGGTCTTG TCACCCATGC TGGAGTGCAA TGGTGCGATC TCAGCTCACT
GTAACCTCTG CCTCCCGGGT TCAAGTGATT CTCCTGCCTC AGCCTCCCGA GTAGCTGGGA
CTACAGGCGC CCGCCACTAC GCCTGGCACA TTTTTTATAG TTCTAGTAGA GACTGGGGTT
TCACCATGTT GGCCAGGCTG GTCTCAAACG CCTGACCTCA GGTGATCCTC CCGCCTCAGC
CTTCCAAAGT GCTGGGATTA CAGGCGTGAG CCATAGTGCC GGTCTCTTTT
TTAAACTAAA CATAATCTCA GAACCCAGAA CCCTATCTTA TCTTATGCCA TGAAAGGCAT
ATCTCGGCGT GGCICl~ lllll CIllTllIll GGGCGAGGTG GAGGCTTGCC
~ lGCCCA GGCTGGAGTG CAGCGGCGCA ATCTCGGTTC ACTGCATCCT CCACCICClG
GGTCCAAATG ATCClCCrGC CTTAGCTTCC TGAGTAGGTG GGATTACTGG AACCCACCAC
CACGCCCAGC CAATTTTTAT ATTTTTAGTA GAGACGGGGT TTCATGTTGG CCAGGCTGGC
CTCGAACTCC TGACCTCGTG ATCTGCCCGC CTCAGCCTCC CAATGTGCTA GGATTACATG
TGTGAGCCAC TGCACCTGGC CTCCGTGTGG CTCTTTAAAG CTCCACAATA TTTTAGCATT
CAGGTGCTCT GTCATTTACT TAACTATTTT CTGATACACC TCACACTGCG ATTAACTTTC
CTTATTTATC TTTTTTATTA TTTATTTATT TATTTATTTG AGACAGAGTC TTGCTCTGTC
ACCCAGGCTG GAGTGCAGTG GCACGATCTC GGCTCACTGC AACCTCTGCC TCCCAGGTTC
AAGTGATTCT CCTGCCTCAG CCTCCTGAGT AGCTAGGATT AGAGGCATGT GCCACCACAC
CTGGCTAATC TTCGTATTTT TAGCAGAGAT GAGGTTTTAC CATGTTGGTC GGGCTGGTCG
TGAACCACTG TGCCTGGCCA lClIl,rlAT TTTTTAAAGA GATGGGTTCT GCTAAGTTGC
CCAGGCTGGA CCTGAACTCT TGGGCTCAAG TAATCTTCTC ACCTAGTCTC CTGGGTAGCT
GCAACCAAAG GCACCCGGTT TATCTGCATT ClC~ ll TCTTTGAGAC TGAG~CllGC
TCTGTAGCCC AGGCTGGAGC GCAGTGGCGT GATCTCGGCT CACTGCAACC TCCGTCTTCA
GGGTTCAAGC AAll~lCC-G CCTCAGCCTC TGGAGTGGCT GGGACTACAG GCGTGTGCCA
CCAGAGCGAG TTAATTTTTT .llllllllG TATTTTTAGT GGACACTGGG TTTCACTATA
TTGGCCAGGC TGG1Cll'GGA CTCCTGACCT CAAGTGATCC GCCTGCCTTG GCCTCCCAAA
ClGClGGGAT TACAGGCACA GGCGTGAGCC ACTACACCTG GCCTATCTGC A'll~lCllAA
TA6111C'llA GAAATGGATT CTTAGGAGTA GGATTACAGA GTCAAGA~.AC ACAAGTTTTG
TAGGCTGGGT GCGGlGGClC ACGTCTGTGC CTGTAATCCC AGTACTTTAG GAGGCCAAGG
TGGGCAGATT CATTGAGCTC AGGAATTCGA GACCAGCCTG GGCAACATGG CAAAACCCCA
TCTCTAAAGA AATACAAAAA TTAGCCAGGT GTGGTGGTGT GTGCCTGTAG TCCTAGCTAC
TTAGGAGGCT GGGGTGGGAG GATCAATTGA GCCCAGGAGG TTGAGACTGC AGTGAGCTGT
GATTGCACCA TGGCACTCCA GCCTGGGCCT CAAAGTGAGA TCCTGTCTCC AAAA~.~AAAA
AGATACAAGT ATCCTTAAGG CTCCTGCTAC ACATGGCCAG GAAGGTAGTC TATTGGACAG
TTTTAAGGTC ATTATCAATA TTAGCTCATT TAATTCCCTC CAAAACTCTG TAAAGCACAT
TCTGCTACCA TAGTTGTCAT ATTTTTGATG GGGGAATCTA CAGTGAGAGG CAGTGCTGGG
ATCTGAACCC CATCTGGACA GATTAGCTCC AGGGCCCATG CTCTTGACTG GCTGGCCGCG
CTGCCCACAC TGAGTTGTTC CTTCCTGGCA GGGTAGGTGT GCCTATCTCA GGGACACTAG
ACAGCTCCGA GGGACCTCCC TGTCClll~C ClllGlGAAC TGTGTCACGT TCTCCAGAGC
AGGGCTCAGA CCTGCCCTGC CTGCl~lGlG CAGATGCCCT TGGCCAAGGT TTTCACACTG
CAAGTTG GTCCCTCCTC CCCACCCCAG CCTGTCCTTG GCCCTCCTCC AGGlC1C~l L
CTG~ATAGGA GCAGCTCACC CTGCCTCCTC CAGAGTCCTG CCCTAGAAGC GCAATCCCTC
TCCTTCCATC CCCTGCCTGG CTGCCTGGCT CCTTCCCTCA GCCTCCAAGA CATGCTCAGT
TTTCTTCCCT CCTAA~ACAC CACCCACTGT CTCATTTCCA TTCATTTCTT lC111~11iC
111C11illl llllllGAGA GGGAGCCTCA CTCTGTCACC CAGGCTGAAG TGCAGTGGCA
TGATCTCCAC TCACTGCAAC CTCCGCCTCC CAGGTTCAAG CAATTCTCCT GCCTCAGCCT
CCTGAGTAGC TGGGATTACA GGCGCCTGCC ACGATGCCCG GCTAACTTTT GTATTTTTAG
TAGA~ACGGG G1l~CGCCAT GTTGGCCAGG CTGGTCTCGA GCTCCTGACC TCAGGCAATC
TGCCTGCCTC AGCTTCCCAA AGTGCTGGGA TTACAGGTGT GAGCCACCGC GCCCACCCAT
TCATTTCTCA GlCCl~GAA TCTACTTGCC CCTCCATCCC GCCATGCCAC CTACCCTAAC
AACCTTCCCC CTTAAACCTG CGGG~llGGC CGGGCGCAGT ACACTGAGTC AGTACTGGTA
CTGACCCAGG TACCCCTCCA GCCTCAGCTC CAGTCAGATG GGACAGCCTG ClGGICCClG
GClGCll~lG CCCCClCllC TGGAGCCCCA GCCCTGGAGG CTCCATGTGG CTCAGCAGAA
Cll~ CC TCCTGCTCTG TGGTGGCCTC TTGAGGGCAG CACTCACCTT GGAAAGCATG
GA~l~lllCA ACCCTCACTG CTCCCTGAAG GACCAAGGTG TCCCATTTTA CAGTCGGGGG
AGGAGGCACT GTGATAAAGG GGCTCTTCAG ACCCACGTCT GAGAGAGCCA GGCTGCGCCG
CCCCCGCGGC CTTCCACCCT TCACCGTCCA GCCAGGGCCA CTGCCATCAC CGCCTGCTGG
TCCTCACAGG CGTCGGGGCC CCAGGCAGTG AGAAGGCGGC TGCTGACTCC TCTTTCCTCC
CCAGCTGCCA GGGACCTGGT GAGCAAAGAG GGCTTCCGGC GGGCCCGGCA CGTGGTGGGG
GAGATTCGGC GCACGGCCCA GGCAGCGGCC GCCCTGAGAC GTGGCGACTA CAGAGCCTTT
GGCCGCCTCA TGGTGGAGAG CCACCGCTCA CTCAGGTGAG GCCCTCTGGG CGCCCCGCTC
CTGCCGGGCA CAGGCCGGCC CAGGCCCACC CCTTCAATAT CCTCTCTGCA GAGACGACTA
TGAGGTGAGC TGCCCAGAGC TGGACCAGCT GGTGGAGGCT GCGCTTGCTG TGCCTGGGGT
TTATGGCAGC CGCATGACGG GCGGTGGCTT CGGTGGCTGC ACGGTGACAC TGCTGGAGGC
CTCCGCTGCT CCCCACGCCA TGCGGCACAT CCAGGTGGGC GGGCACCAGG GCCTGGGCGG
GCAGGAGCGG CAGCTTCCCG GGGCCCTGCC ACTCACCCCC AGCCCGCCTC TTACAGGAGC
ACTACGGCGG GACTGCCACC TTCTACCTCT CTCAAGCAGC CGATGGAGCC AAGGTGCTGT
GCTTGTGAGG CACCCCCAGG ACAGCACACG GTGAGGGTGC GGGGCCTGCA GGCCAGTCCC
ACGGCTCTGT GCCCGGTGCC ATCTTCCATA TCCGGGTGCT CAATAAACTT GTGCCTCCAA
TGTGGTACCT GCCTCCTCTA GAGGTGGGTG TATGCTTGGG TGTCAGAGAA TGGGGGATGT
CAGAACCGCT CCCCTACCCT AGGGGAGCAC CTCTCAGGCC CCAGAAGAAT GGGCAAGGCA
GGGCCTAGCA GTAGCAAAAC CATTTATTAA GTGCAGAACA AAGGCTGGGT CCTTGTGCTG
CTCCCAGCTC TTTGGTTACA AATAGGTTTG GGCCCACAGA GGACGGACCT TGCCCCCTTC
ATGCCTCCCA GGAGACACCT AGCCCCTGCT CTGTGCATGC GGGTGGGCTG GGCCCCCAGG
GGTGCAAGGA TGGAGTAGCT GAGGAGGCTC CGGGAGAGGA GTCGGGAGGA CGCCTAGTGG
GACATTGCGG GGGTGGCGCA GGGTGCGGTC AAGTTTGGAA GAAACTGTTG GGTCCA
(2) INFORMATION FOR SEQ ID NO:8:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 21 base pairs (B) TYPE: nucleic acid (C) STRANDEDNESS: single (D) TOPOLOGY: linear (ii) MOLECULE TYPE: DNA (genomic) (xi) SEQUENCE DESCRIPTION: SEQ ID NO:8:
AGCCTTCCGG GAGGAGTTCG G
(2) INFORMATION FOR SEQ ID NO:9:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 21 base pairs (B) TYPE: nucleic acid (C) STRANDEDNESS: single (D) TOPOLOGY: linear (ii) MOLECULE TYPE: DNA (genomic) (xi) SEQUENCE DESCRIPTION: SEQ ID NO:9:
CTGGTTGTAG TCCGTGTGTT C
(2) INFORMATION FOR SEQ ID NO:10:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 21 base pairs (B) TYPE: nucleic acid (C) STRANDEDNESS: single (D) TOPOLOGY: linear (ii) MOLECULE TYPE: DNA (genomic) (xi) SEQUENCE DESCRIPTION: SEQ ID NO:10:
GCCAGCAGCT CCGCGACCTG G
(2) INFORMATION FOR SEQ ID NO:11:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 21 base pairs (B) TYPE: nucleic acid (C) STRANDEDNESS: single (D) TOPOLOGY: linear (ii) MOLECULE TYPE: DNA (genomic) (xi) SEQUENCE DESCRIPTION: SEQ ID NO:11:
GCTTCCTCCC TTCCAACGTG G
(2) INFORMATION FOR SEQ ID NO:12:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 21 base pairs (B) TYPE: nucleic acid (C) STRANDEDNESS: single (D) TOPOLOGY: linear (ii) MOLECULE TYPE: DNA (genomic) (xi) SEQUENCE DESCRIPTION: SEQ ID NO:12:
CCCAGGCTCC AGCGAGCGCT G
(2) INFORMATION FOR SEQ ID NO:13:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 21 base pairs (B) TYPE: nucleic acid (C) STRANDEDNESS: single (D) TOPOLOGY: linear (ii) MOLECULE TYPE: DNA (genomic) (xi) SEQUENCE DESCRIPTION: SEQ ID NO:13:
ACCTCTGAGG GTGCCGATGA G
(2) INFORMATION FOR SEQ ID NO:14:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 21 base pairs (B) TYPE: nucleic acid (C) STRANDEDNESS: single (D) TOPOLOGY: linear (ii) MOLECULE TYPE: DNA (genomic) (xi) SEQUENCE DESCRIPTION: SEQ ID NO:14:
CCCACAGCTC AGGGCAGAGT C
(2) INFORMATION FOR SEQ ID NO:15:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 21 base pairs (B) TYPE: nucleic acid (C) STRANDEDNESS: single (D) TOPOLOGY: linear (ii) MOLECULE TYPE: DNA (genomic) (xi) SEQUENCE DESCRIPTION: SEQ ID NO:15:
(2) INFORMATION FOR SEQ ID NO:16:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 21 base pairs (B) TYPE: nucleic acid (C) STRANDEDNESS: single (D) TOPOLOGY: linear (ii) MOLECULE TYPE: DNA (genomic) (xi) SEQUENCE DESCRIPTION: SEQ ID NO:16:
GATGAACTGG TCCATGATGC C
(2) INFORMATION FOR SEQ ID NO:17:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 21 base pairs (B) TYPE: nucleic acid (C) STRANDEDNESS: single (D) TOPOLOGY: linear (ii) MOLECULE TYPE: DNA (genomic) (xi) SEQUENCE DESCRIPTION: SEQ ID NO:17:
AGGGGCACTG AGCTGACCAC C
(2) INFORMATION FOR SEQ ID NO:18:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 21 base pairs (B) TYPE: nucleic acid (C) STRANDEDNESS: single (D) TOPOLOGY: linear (ii) MOLECULE TYPE: DNA (genomic) (xi) SEQUENCE DESCRIPTION: SEQ ID NO:18:
CACTTCTACA CATTGGCGCC G
(2) INFORMATION FOR SEQ ID NO:19:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 21 base pairs (B) TYPE: nucleic acid (C) STRANDEDNESS: single (D) TOPOLOGY: linear (ii) MOLECULE TYPE: DNA (genomic) (xi) SEQUENCE DESCRIPTION: SEQ ID NO:19:
CTTCGCAGGG ATGCCCTGTG G
(2) INFORMATION FOR SEQ ID NO:20:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 21 base pairs (B) TYPE: nucleic acid (C) STRANDEDNESS: single (D) TOPOLOGY: linear (ii) MOLECULE TYPE: DNA (genomic) (xi) SEQUENCE DESCRIPTION: SEQ ID NO:20:
TCATCACCAA CTCTAATGTC C
(2) INFORMATION FOR SEQ ID NO:21:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 21 base pairs (B) TYPE: nucleic acid (C) STRANDEDNESS: single (D) TOPOLOGY: linear (ii) MOLECULE TYPE: DNA (genomic) (xi) SEQUENCE DESCRIPTION: SEQ ID NO:21:
TGTCAGCAGT GCCTATCTCT G
(2) INFORMATION FOR SEQ ID NO:22:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 21 base pairs (B) TYPE: nucleic acid (C) STRANDEDNESS: single (D) TOPOLOGY: linear (ii) MOLECULE TYPE: DNA (genomic) (xi) SEQUENCE DESCRIPTION: SEQ ID NO:22:
AGCAGCGGAG GCCTCCAGCA G
(2) INFORMATION FOR SEQ ID NO:23:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 21 base pairs (B) TYPE: nucleic acid (C) STRANDEDNESS: single (D) TOPOLOGY: linear (ii) MOLECULE TYPE: DNA (genomic) (xi) SEQUENCE DESCRIPTION: SEQ ID NO:23:
CCTCACCGTG TGCTGTCCTG G
(2) INFORMATION FOR SEQ ID NO:24:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 21 base pairs (B) TYPE: nucleic acid (C) STRANDEDNESS: single (D) TOPOLOGY: linear (ii) MOLECULE TYPE: DNA (genomic) (xi) SEQUENCE DESCRIPTION: SEQ ID NO:24:
GGCTGCGCTT GCTGTGCCTG G
(2) INFORMATION FOR SEQ ID NO:25:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 21 base pairs (B) TYPE: nucleic acid (C) STRANDEDNESS: single (D) TOPOLOGY: linear (ii) MOLECULE TYPE: DNA (genomic) (xi) SEQUENCE DESCRIPTION: SEQ ID NO:25:
CCTCACCGTG TGCTGTCCTG G
(2) INFORMATION FOR SEQ ID NO:26:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 21 base pairs (B) TYPE: nucleic acid (C) STRANDEDNESS: single (D) TOPOLOGY: linear (ii) MOLECULE TYPE: DNA (genomic) (xi) SEQUENCE DESCRIPTION: SEQ ID NO:26:
CCTCACCGTG TGCTGTCCTG G
(2) INFORMATION FOR SEQ ID NO:27:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 21 base pairs (B) TYPE: nucleic acid (C) STRANDEDNESS: single (D) TOPOLOGY: linear (ii) MOLECULE TYPE: DNA (genomic) (xi) SEQUENCE DESCRIPTION: SEQ ID NO:27:
GCGGGACTGC CACCTTCTAC C
(2) INFORMATION FOR SEQ ID NO:28:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 21 base pairs (B) TYPE: nucleic acid (C) STRANDEDNESS: single (D) TOPOLOGY: linear (ii) MOLECULE TYPE: DNA (genomic) (xi) SEQUENCE DESCRIPTION: SEQ ID NO:28:
CTCAATAAAC TTGTGCCTCC A
(2) INFORMATION FOR SEQ ID NO:29:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 23 base pairs (B) TYPE: nucleic acid (C) STRANDEDNESS: single (D) TOPOLOGY: linear (ii) MOLECULE TYPE: DNA (genomic) (xi) SEQUENCE DESCRIPTION: SEQ ID NO:29:
CGGATATGGA AGATGGCACC GGG
(2) INFORMATION FOR SEQ ID NO:30:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 22 base pairs (B) TYPE: nucleic acid (C) STRANDEDNESS: single (D) TOPOLOGY: linear (ii) MOLECULE TYPE: DNA (genomic) (xi) SEQUENCE DESCRIPTION: SEQ ID NO:30:
AGAGCTGCAG GCGCGCGTCA TG
(2) INFORMATION FOR SEQ ID NO:31:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 19 base pairs (B) TYPE: nucleic acid (C) STRANDEDNESS: single (D) TOPOLOGY: linear (ii) MOLECULE TYPE: DNA (genomic) (xi) SEQUENCE DESCRIPTION: SEQ ID NO:31:
CCGAGCATCC CGCGCCGAC
(2) INFORMATION FOR SEQ ID NO:32:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 20 base pairs (B) TYPE: nucleic acid (C) STRANDEDNESS: single (D) TOPOLOGY: linear (ii) MOLECULE TYPE: DNA (genomic)
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:32:
CAGCTGCCCG CCCCACATCT
Cross-Reference to Related Applications:
This application is a continuation-in-part of Serial No. PCT/US94/10825, filed 23 September 1994.
Field of the Invention:
This invention relates to human galactokinase and to the identification of galactokinase mutations, one missense and one nonsense, as well as to isolated nucleic acids encoding them, recombinant host cells transformed with DNA encoding such proteins, and uses of the expressed proteins and nucleic acid sequences in therapeutic and diagnostic applications.
Background of the Invention:
There are numerous inherited human metabolic disorders, most of which are recessive. Many have devastating effects that may include a combination of several clinical features, such as severe mental retardation, impairment of the peripheral nervous system, blindness, hearing deficiency and organomegaly. Most of these disorders are rare, and the majority cannot be treated by drugs.
Galactokinase deficiency is one of three known forms of galactosemia. The other forms are galactose-1-phosphate uridyltransferase deficiency and UDP-galactose-4-epimerase deficiency. All three enzymes are involved in galactose metabolism, i.e., the conversion of galactose to glucose in the body. Galactokinase deficiency is inherited as an autosomal recessive trait with a heterozygote frequency estimated to be 0.2% in the general population (see, e.g., Levy et al., J. Pediatr. ~2:871-877 (1978)). Patients with homozygous galactokinase deficiency usually become symptomatic in the early infantile period, showing galactosemia, galactosuria, increased galactitol levels, cataracts and, in a few cases, mental retardation (Segal et al., J. Pediatr. ~:750-752 (1979)). These symptoms usually improve dramatically with the administration of a galactose-free diet.
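As a rough worked example, the quoted 0.2% heterozygote frequency can be converted into an expected frequency of affected homozygotes. This assumes Hardy-Weinberg equilibrium (random mating, no selection, a single pooled mutant allele), which the text does not state. Under that assumption the mutant allele frequency q solves 2q(1 - q) = 0.002. The function name is hypothetical.

```python
import math

def allele_frequency_from_carriers(carrier_freq: float) -> float:
    """Smaller root of 2q(1 - q) = carrier_freq (Hardy-Weinberg heterozygote frequency)."""
    return (1 - math.sqrt(1 - 2 * carrier_freq)) / 2

q = allele_frequency_from_carriers(0.002)  # 0.2% heterozygotes, as quoted above
homozygote = q * q                          # expected frequency of affected homozygotes
print(f"mutant allele frequency q = {q:.6f}")
print(f"expected homozygote frequency = {homozygote:.3e}")
```

Under these assumptions q is about 0.001, so roughly one individual in a million would be expected to be homozygous. Non-random mating or population-specific alleles would shift this estimate.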
Heterozygotes for galactokinase deficiency are prone to presenile cataracts, with onset between 20 and 50 years of age (Stambolian et al., Invest. Ophthal. Vis. Sci. ~7:429-433 (1986)).
Galactokinase activity has been found in a variety of mammalian tissues, including liver, kidney, brain, lens, placenta, erythrocytes and leukocytes. While the protein has been purified from E. coli, purification of the protein from mammalian tissues has proven difficult due to its low cellular concentration. In addition, the molecular basis of galactokinase deficiency has remained unknown.
This invention provides a human galactokinase gene. The DNAs of this invention, such as the specific sequences disclosed herein, are useful in that they encode the genetic information required for expression of this protein. Additionally, the sequences may be used as probes to isolate and identify additional members of the family, type and/or subtype, as well as mutations that may form the basis of galactokinase deficiency, which may be characterized by site-specific mutations or by atypical expression of the galactokinase gene. The galactokinase gene is also useful as a diagnostic agent to identify mutant galactokinase proteins, or as a therapeutic agent via gene therapy.
The first clinical trials of gene therapy began in 1990. Since that time, more than 70 clinical trial protocols have been reviewed and approved by a regulatory authority such as the NIH's Recombinant DNA Advisory Committee (RAC) (see, e.g., Anderson, W. F., Human Gene Therapy 5:281-282 (1994)). The therapeutic treatment of diseases and disorders by gene therapy involves the transfer and stable insertion of new genetic information into cells. The correction of a genetic defect by re-introduction of the normal allele of a gene has demonstrated that this concept is clinically feasible (see, e.g., Rosenberg et al., New Eng. J. Med. ~:570 (1990)).
These and additional uses for the reagents described herein will become apparent to those of ordinary skill in the art upon reading this specification.
Summary of the Invention:
This invention provides isolated nucleic acid molecules encoding human galactokinase, as well as nucleic acid molecules encoding missense and nonsense mutations, including mRNAs and DNAs (e.g., cDNA, genomic DNA, etc.), antisense analogs thereof, and diagnostically or therapeutically useful fragments thereof.
This invention also provides recombinant vectors, such as cloning and expression plasmids, useful as reagents in the recombinant production of human galactokinase proteins, as well as recombinant prokaryotic and/or eukaryotic host cells comprising a human galactokinase nucleic acid sequence.
This invention also provides a process for preparing human galactokinase proteins, which comprises culturing recombinant prokaryotic and/or eukaryotic host cells containing a human galactokinase nucleic acid sequence under conditions promoting expression of said protein, and subsequently recovering said protein. Another related aspect of this invention is isolated human galactokinase proteins produced by said method. In yet another aspect, this invention provides antibodies that are directed to (i.e., bind) human galactokinase proteins.
This invention also provides isolated human galactokinase proteins having a missense or nonsense mutation, and antibodies (monoclonal or polyclonal) that are specifically reactive with said proteins.
This invention also provides nucleic acid probes and PCR primers comprising nucleic acid molecules of sufficient length to specifically hybridize to human galactokinase sequences.
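The sketch below illustrates how a 21-base primer such as SEQ ID NO:8 or NO:9 specifically anneals to galactokinase sequence. It finds exact binding sites on a short stretch at the start of the coding region, copied from SEQ ID NO:7.

Treat it as an assumption-laden illustration:
- Pairing SEQ ID NO:8 (sense) with SEQ ID NO:9 (antisense) is assumed here; the text does not state which primers are used together.
- Real primer design also weighs mismatches, melting temperature and secondary structure.
- The helper names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq: str) -> str:
    """Reverse complement of an uppercase DNA string."""
    return seq.upper().translate(COMPLEMENT)[::-1]

def predicted_amplicon(template: str, forward: str, reverse: str):
    """Exact-match PCR product between a sense primer and an antisense primer.

    The antisense primer anneals to the opposite strand, so its binding site on
    the template strand is its reverse complement. Returns None if either site
    is missing or the reverse site lies upstream of the forward site.
    """
    template = template.upper()
    start = template.find(forward.upper())
    if start < 0:
        return None
    rev_site = reverse_complement(reverse)
    end = template.find(rev_site, start)
    if end < 0:
        return None
    return template[start:end + len(rev_site)]

# Three consecutive 60-base lines of SEQ ID NO:7, beginning just upstream of the ATG start codon.
template = "".join([
    "TCATGGCTGC TTTGAGACAG CCCCAGGTCG CGGAGCTGCT GGCCGAGGCC CGGCGAGCCT",
    "TCCGGGAGGA GTTCGGGGCC GAGCCCGAGC TGGCCGTGTC AGCGCCGGGC CGCGTCAACC",
    "TCATCGGGGA ACACACGGAC TACAACCAGG GCCTGGTGCT GCCTATGGTG AGGGGCTGCA",
]).replace(" ", "")

forward = "AGCCTTCCGGGAGGAGTTCGG"   # SEQ ID NO:8
reverse = "CTGGTTGTAGTCCGTGTGTTC"   # SEQ ID NO:9
amplicon = predicted_amplicon(template, forward, reverse)
print(len(amplicon), amplicon)
```

Only the antisense primer is reverse-complemented before the search. The sense primer already matches the strand as listed.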
This invention also provides a method to diagnose human galactokinase deficiency, which comprises isolating a nucleic acid sample from an individual, assaying the sequence of said nucleic acid sample against the reference gene of the invention, and comparing differences between said sample and the nucleic acid of the instant invention, wherein said differences indicate mutations in the human galactokinase gene isolated from the individual. The sample can be assayed by direct sequence comparison (i.e., DNA sequencing), wherein the sample nucleic acid is compared to the reference galactokinase gene; by hybridization (e.g., mobility shift assays such as heteroduplex gel electrophoresis or SSCP, or other techniques such as Northern or Southern blotting, which are based upon the length of the nucleic acid sequence); or by other known gel electrophoresis methods such as RFLP (for example, by restriction endonuclease digestion of a sample amplified by PCR (for DNA) or RT-PCR (for RNA)). Alternatively, the diagnostic method comprises isolating cells containing genomic DNA from an individual and assaying said sample (e.g., cellular RNA) by in situ hybridization using, as a probe, the DNA sequence of the invention, at least one exon, or a fragment containing at least 15, preferably 18, and more preferably 21 contiguous base pairs. This invention also provides an antisense oligonucleotide having a sequence capable of binding with mRNAs encoding human galactokinase so as to identify mutant galactokinase genes.
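Direct sequence comparison and RFLP screening, as described above, can be sketched in a few lines.

This is a minimal illustration under stated assumptions:
- The two sequences are already aligned, with no insertions or deletions.
- The fragments are copied from SEQ ID NO:7 (reference) and SEQ ID NO:5 (mutant) around codon 32.
- The motif CATG (the NlaIII recognition sequence) is only an example. It was picked because the G-to-A change happens to create that motif within this short fragment; the source does not name a restriction enzyme.
- The function names are hypothetical.

```python
import re

def find_substitutions(reference: str, sample: str):
    """Return (1-based position, reference base, sample base) for each mismatch.

    Assumes both sequences are aligned to the same length (no indels).
    """
    if len(reference) != len(sample):
        raise ValueError("sequences must be aligned to the same length")
    return [
        (i + 1, r, s)
        for i, (r, s) in enumerate(zip(reference.upper(), sample.upper()))
        if r != s
    ]

def site_positions(seq: str, site: str):
    """1-based start positions of a recognition site; a lookahead allows overlaps."""
    pattern = "(?=" + re.escape(site.upper()) + ")"
    return [m.start() + 1 for m in re.finditer(pattern, seq.upper())]

reference = "GAGCTGGCCGTGTCAGCG"  # from SEQ ID NO:7 (GTG, Val, at codon 32)
mutant = "GAGCTGGCCATGTCAGCG"     # from SEQ ID NO:5 (ATG, Met, at codon 32)

print(find_substitutions(reference, mutant))  # [(10, 'G', 'A')]
print(site_positions(reference, "CATG"))      # []
print(site_positions(mutant, "CATG"))         # [9]
```

In practice an RFLP assay compares fragment sizes across the whole PCR product, because the same motif may also occur elsewhere in the amplicon.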
This invention also provides yet another method to diagnose human galactokinase deficiency, which comprises obtaining a serum or tissue sample;
allowing such sample to come in contact with an antibody or antibody fragment which specifically binds to a mutant human galactokinase protein of the invention under conditions such that an antigen-antibody complex is formed between said antibody (or antibody fragment) and said mutant galactokinase protein; and detecting the presence or absence of said complex.
This invention also provides transgenic non-human animals comprising a nucleic acid molecule encoding human galactokinase. Also provided are methods for using said transgenic animals as models for disease states, mutations and structure-activity relationship (SAR) studies.
This invention also provides a method for treating conditions related to insufficient human galactokinase activity, which comprises administering to a patient in need thereof a pharmaceutical composition containing the galactokinase protein of the invention in an amount effective to supplement the patient's endogenous galactokinase, thereby alleviating said condition.
This invention also provides a method for treating conditions related to insufficient human galactokinase activity via gene therapy. An additional, or reference, gene comprising the non-mutant galactokinase gene of the instant invention is inserted into a patient's cells either in vivo or ex vivo. The reference gene is expressed in the transfected cells and, as a result, the protein encoded by the reference gene corrects the defect (i.e., galactokinase deficiency), thus permitting the transfected cells to function normally and alleviating disease conditions (or symptoms).
Brief Description of the Drawings:
Figure 1 depicts the intron/exon organization of the human galactokinase gene.
Figure 2 is the genomic DNA sequence (and single letter amino acid abbreviations) for human galactokinase [SEQ ID NO: 7]. The bolded DNA
sequence corresponds to the exon regions whereas the normal or unbolded type corresponds to the intron regions of human galactokinase.
Detailed Description of the Invention:
This invention relates to human galactokinase (amino acid and nucleotide sequences) and its use as a diagnostic and therapeutic agent. The particular cDNA and amino acid sequences of human galactokinase are identified by SEQ ID NO:4, as described more fully below. This invention also relates to the genomic DNA
sequence for human galactokinase [SEQ ID NO: 7], to mutant human galactokinase genes and amino acid sequences [SEQ ID NO:5 and 6], and to their use for diagnostic purposes.
In further describing the present invention, the following additional terms will be employed and are intended to be defined as indicated below.
An "antigen" refers to a molecule containing one or more epitopes that will stimulate a host's immune system to make a humoral and/or cellular antigen-specific response. The term is also used herein interchangeably with "immunogen."
The term "epitope" refers to the site on an antigen or hapten to which a specific antibody molecule binds. The term is also used herein interchangeably with "antigenic determinant" or "antigenic determinant site."
A coding sequence is "operably linked to" another coding sequence when RNA polymerase will transcribe the two coding sequences into a single mRNA, which is then translated into a single polypeptide having amino acids derived from both coding sequences. The coding sequences need not be contiguous to one another so long as the expressed sequence is ultimately processed to produce the desired protein.
"Recombinant" polypeptides refer to polypeptides produced by recombinant DNA techniques, i.e., produced from cells transformed by an exogenous DNA construct encoding the desired polypeptide. "Synthetic" polypeptides are those prepared by chemical synthesis.
A "replicon" is any genetic element (e.g., plasmid, chromosome, virus) that functions as an autonomous unit of DNA replication in vivo; i.e., capable of replication under its own control.
A "vector" is a replicon, such as a plasmid, phage, or cosmid, to which another DNA segment may be attached so as to bring about the replication of the attached segment.
A "replication-deficient virus" is a virus in which the excision and/or replication functions have been altered such that after transfection into a host cell, the virus is not able to reproduce and/or infect additional cells.
A "reference" gene refers to the galactokinase sequence of the invention and is understood to include the various sequence polymorphisms that exist, i.e., nucleotide substitutions in the gene sequence that do not affect the essential function of the gene product.
A "mutant" gene refers to galactokinase sequences different from the reference gene wherein nucleotide substitutions and/or deletions and/or insertions result in impairment of the essential function of the gene product such that the levels of galactose in an individual (or patient) are atypically elevated. For example, the G to A substitution at position 122 of human galactokinase [SEQ ID NO:5] is a missense mutation associated with patients who are galactokinase deficient. Another substitution, T for G, produces an in-frame nonsense codon at amino acid position 80 of the mature protein. The result is a truncated protein consisting of the first 79 amino acids of human galactokinase.

A DNA "coding sequence of" or a "nucleotide sequence encoding" a particular protein is a DNA sequence which is transcribed and translated into a polypeptide when placed under the control of appropriate regulatory sequences.
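The missense and nonsense substitutions described above can be illustrated with a minimal Python sketch that translates a coding sequence with the standard genetic code and stops at the first in-frame stop codon. The 15-nt open reading frame is a made-up toy sequence, not SEQ ID NO:4, 5 or 6, and the helper names `translate` and `substitute` are hypothetical.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (first, second, third base); '*' = stop.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(cds: str) -> str:
    """Translate an in-frame coding sequence, stopping at the first stop codon."""
    protein = []
    for i in range(0, len(cds) - len(cds) % 3, 3):
        amino_acid = CODON_TABLE[cds[i:i + 3]]
        if amino_acid == "*":
            break  # nonsense codon: translation terminates and the product is truncated
        protein.append(amino_acid)
    return "".join(protein)


def substitute(cds: str, position: int, new_base: str) -> str:
    """Return cds with the base at 1-based `position` replaced by `new_base`."""
    return cds[:position - 1] + new_base + cds[position:]


reference = "ATGGGAGAATGGTAA"  # toy ORF: ATG GGA GAA TGG TAA
print(translate(reference))                      # MGEW
print(translate(substitute(reference, 5, "A")))  # MEEW (missense: GGA -> GAA, Gly -> Glu)
print(translate(substitute(reference, 7, "T")))  # MG (nonsense: GAA -> TAA, truncated)
```

As with the nonsense mutation described in the text, the truncated product retains every residue upstream of the new stop codon.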
A "promoter sequence" is a DNA regulatory region capable of binding RNA polymerase in a cell and initiating transcription of a downstream (3' direction) coding sequence. For purposes of defining the present invention, the promoter sequence is bounded at its 3' terminus by a translation start codon (e.g., ATG) of a coding sequence and extends upstream (5' direction) to include the minimum number of bases or elements necessary to initiate transcription at levels detectable above background. Within the promoter sequence will be found a transcription initiation site (conveniently defined by mapping with nuclease S1), as well as protein binding domains (consensus sequences) responsible for the binding of RNA polymerase. Eukaryotic promoters will often, but not always, contain "TATA" boxes and "CAAT" boxes. Prokaryotic promoters contain Shine-Dalgarno sequences in addition to the -10 and -35 consensus sequences.
DNA "control sequences" refers collectively to promoter sequences, ribosome binding sites, polyadenylation signals, transcription termination sequences, upstream regulatory domains, enhancers, and the like, which collectively provide for the expression (i.e., the transcription and translation) of a coding sequence in a host cell.
A control sequence "directs the expression" of a coding sequence in a cell when RNA polymerase will bind the promoter sequence and transcribe the coding sequence into mRNA, which is then translated into the polypeptide encoded by the coding sequence.
A "host cell" is a cell which has been transformed or transfected, or is capable of transformation or transfection by an exogenous DNA sequence.
A cell has been "transformed" by exogenous DNA when such exogenous DNA has been introduced inside the cell membrane. Exogenous DNA may or may not be integrated (covalently linked) into chromosomal DNA making up the genome of the cell. In prokaryotes and yeasts, for example, the exogenous DNA may be maintained on an episomal element, such as a plasmid. With respect to eukaryotic cells, a stably transformed or transfected cell is one in which the exogenous DNA has become integrated into the chromosome so that it is inherited by daughter cells through chromosome replication. This stability is demonstrated by the ability of the eukaryotic cell to establish cell lines or clones comprised of a population of daughter cells containing the exogenous DNA.
"Transfection" or "transfected" refers to a process by which cells take up foreign DNA and integrate that foreign DNA into their chromosome.
Transfection can be accomplished, for example, by various techniques in which cells take up DNA (e.g., calcium phosphate precipitation, electroporation, assimilation of liposomes, etc.), or by infection, in which viruses are used to transfer DNA into cells.
A "target cell" is a cell (or cells) selectively transfected over other cell types (or cell lines).
A "clone" is a population of cells derived from a single cell or common ancestor by mitosis. A "cell line" is a clone of a primary cell that is capable of stable growth in vitro for many generations.
A "heterologous" region of a DNA construct is an identifiable segment of DNA within or attached to another DNA molecule that is not found in association with the other molecule in nature. Thus, when the heterologous region encodes a gene, the gene will usually be flanked by DNA that does not flank the gene in the genome of the source animal. Another example of a heterologous coding sequence is a construct where the coding sequence itself is not found in nature (e.g., synthetic sequences having codons different from the native gene). Allelic variation or naturally occurring mutational events do not give rise to a heterologous region of DNA, as used herein.
"Conditions which are related to insufficient human galactokinase activity" or a "deficiency in galactokinase activity" means mutations of the galactokinase protein which affect galactokinase activity, or which may affect expression of galactokinase, or both, such that the levels of galactose in a patient are atypically elevated. In addition, this definition is intended to cover atypically low levels of galactokinase expression in a patient due to defective control sequences for the reference galactokinase protein.
This invention provides an isolated nucleic acid molecule encoding a human galactokinase protein and substantially similar sequences. Isolated nucleic acid sequences are "substantially similar" if: (i) they are approximately the same length (i.e., at least 80% of the coding region of SEQ ID NO:4); (ii) they encode a protein with the same (i.e., within an order of magnitude) galactokinase activity as the protein encoded by SEQ ID NO:4; and (iii) they are capable of hybridizing under moderately stringent conditions to SEQ ID NO:4; or if they are DNA sequences which are degenerate to SEQ ID NO:4. Degenerate DNA sequences encode the same amino acid sequence as SEQ ID NO:4, but have variation(s) in the nucleotide coding sequences. Hybridization under moderately stringent conditions is outlined below.
Hybridization under moderately stringent conditions can be performed as follows. Nitrocellulose filters are prehybridized at 65 °C in a solution containing 6X SSPE, 5X Denhardt's solution (10 g Ficoll, 10 g BSA and 10 g polyvinylpyrrolidone per liter solution), 0.05% SDS and 100 micrograms tRNA. Hybridization probes are labeled, preferably radiolabeled (e.g., using the Bios TAG-IT® kit). Hybridization is then carried out for approximately 18 hours at 65 °C. The filters are then washed in a solution of 2X SSC and 0.5% SDS at room temperature for 15 minutes (repeated once). Subsequently, the filters are washed at 58 °C, air-dried and exposed to X-ray film overnight at -70 °C with an intensifying screen.
Alternatively, "substantially similar" sequences are substantially the same when about 66% (preferably about 75%, and most preferably about 90%) of the nucleotides or amino acids match over a defined length (i.e., at least 80% of the coding region of SEQ ID NO:4) of the molecule and the protein encoded by such sequence has the same (i.e., within an order of magnitude) galactokinase activity as the protein encoded by SEQ ID NO:4. As used herein, substantially similar refers to sequences having similar identity to the sequences of the instant invention. Thus, nucleotide sequences that are substantially the same can be identified by hybridization or by sequence comparison. Protein sequences that are substantially the same can be identified by one or more of the following: proteolytic digestion, gel electrophoresis and/or microsequencing.
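The identity criterion can be expressed as a short calculation. The Python sketch below assumes the two sequences have already been aligned (gaps written as '-') by some external alignment tool, counts identical non-gap positions over the aligned window, and checks that the window spans at least 80% of the reference coding region. The function names and the gap-as-mismatch convention are assumptions; the activity requirement (same galactokinase activity within an order of magnitude) still has to be tested experimentally.

```python
def percent_identity(aligned_ref: str, aligned_query: str) -> float:
    """Percent of aligned columns where both sequences carry the same (non-gap) residue."""
    if len(aligned_ref) != len(aligned_query) or not aligned_ref:
        raise ValueError("aligned sequences must be non-empty and of equal length")
    matches = sum(1 for r, q in zip(aligned_ref, aligned_query) if r == q and r != "-")
    return 100.0 * matches / len(aligned_ref)


def meets_identity_tier(aligned_ref: str, aligned_query: str,
                        reference_cds_length: int, threshold: float = 66.0) -> bool:
    """Check identity >= threshold over a window covering >= 80% of the reference coding region."""
    covered = sum(1 for r in aligned_ref if r != "-")  # reference residues inside the window
    if covered < 0.8 * reference_cds_length:
        return False
    return percent_identity(aligned_ref, aligned_query) >= threshold


ref, query = "ATGGCTAAAGGT", "ATGGCAAAAGGC"  # toy aligned window, 10 of 12 positions identical
print(round(percent_identity(ref, query), 1))               # 83.3
print(meets_identity_tier(ref, query, 12))                  # True (about 66% tier)
print(meets_identity_tier(ref, query, 12, threshold=90.0))  # False (about 90% tier)
```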
This invention also provides isolated nucleic acid molecules encoding a missense mutation (SEQ ID NO:5) or a nonsense mutation (SEQ ID NO:6) of the human galactokinase protein, and DNA sequences which are degenerate to SEQ ID NO:5 or 6. Degenerate DNA sequences encode the same amino acid (or termination site) sequence as SEQ ID NO:5 or 6, but have variation(s) in the nucleotide coding sequences.
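Whether two DNA sequences are degenerate in this sense can be checked mechanically: they must differ in nucleotide sequence yet translate to the same amino acid (and termination) sequence. A minimal Python sketch, assuming both inputs are in-frame coding sequences that start at the same codon; `is_degenerate` is a hypothetical helper and the sequences are toy examples.

```python
from itertools import product

# Standard genetic code in TCAG order; '*' marks a stop (termination) codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate_full(cds: str) -> str:
    """Translate every codon, keeping stop codons as '*' so termination sites are compared too."""
    return "".join(CODON_TABLE[cds[i:i + 3]] for i in range(0, len(cds), 3))


def is_degenerate(seq_a: str, seq_b: str) -> bool:
    """True if the sequences differ but encode the same amino acid (and stop) sequence."""
    if len(seq_a) != len(seq_b) or len(seq_a) % 3 != 0:
        return False
    return seq_a != seq_b and translate_full(seq_a) == translate_full(seq_b)


print(is_degenerate("ATGCTGAAATAA", "ATGTTAAAGTGA"))  # True: Met-Leu-Lys-stop in both
print(is_degenerate("ATGCTGAAATAA", "ATGCTGAACTAA"))  # False: AAA (Lys) vs AAC (Asn)
```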
One means for isolating a nucleic acid molecule encoding a human galactokinase is to probe a human genomic or cDNA library with a natural or artificially designed probe using art-recognized procedures (see, for example, "Current Protocols in Molecular Biology", Ausubel, F.M., et al. (eds.), Greene Publishing Assoc. and John Wiley Interscience, New York, 1989, 1992). It will be appreciated by one skilled in the art that SEQ ID NO:4, or fragments thereof (comprising at least 15 contiguous nucleotides), is a particularly useful probe.
Particularly useful probes for this purpose include those set forth in Table 1 and hybridizable fragments thereof (i.e., comprising at least 15 contiguous nucleotides).
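Candidate probes of the minimum length named above can be enumerated directly from a sequence. This Python sketch slides a 15-nucleotide window along a toy sequence (not SEQ ID NO:4) and reports each sense fragment with its reverse complement, which would serve as the antisense probe. `probe_windows` is a hypothetical helper; practical probe design would additionally weigh GC content, repeats and uniqueness.

```python
from typing import Iterator, Tuple

# Watson-Crick complement for unambiguous DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement (antisense strand, read 5' to 3')."""
    return seq.translate(COMPLEMENT)[::-1]


def probe_windows(seq: str, length: int = 15) -> Iterator[Tuple[int, str, str]]:
    """Yield (1-based start, sense fragment, antisense fragment) for every window of `length` nt."""
    for i in range(len(seq) - length + 1):
        fragment = seq[i:i + length]
        yield i + 1, fragment, reverse_complement(fragment)


toy = "ATGGCTAAAGGTCTGCA"  # 17 nt, so there are 17 - 15 + 1 = 3 windows
for start, sense, antisense in probe_windows(toy):
    print(start, sense, antisense)
# first line printed: 1 ATGGCTAAAGGTCTG CAGACCTTTAGCCAT
```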
It is also appreciated that such probes can be, and preferably are, labeled with an analytically detectable reagent to facilitate identification of the probe. Useful reagents include but are not limited to radioactivity, fluorescent dyes or enzymes capable of catalyzing the formation of a detectable product. The probes are thus useful to isolate complementary copies of genomic DNA, cDNA or RNA from human, mammalian or other animal sources, or to screen such sources for related sequences (e.g., additional members of the family, type and/or subtype), including transcriptional regulatory and control elements defined above as well as other stability-, processing-, translation- and tissue specificity-determining regions from 5' and/or 3' regions relative to the coding sequences disclosed herein.
This invention also provides for gene therapy. "Gene therapy" means gene supplementation. That is, an additional (i.e., reference) copy of the gene of interest is inserted into a patient's cells. As a result, the protein encoded by the reference gene corrects the defect (i.e., galactokinase deficiency) and permits the cells to function normally, thus alleviating disease symptoms.
Gene therapy of the present invention can occur in vivo or ex vivo. Ex vivo gene therapy requires the isolation and purification of patient cells, the introduction of a therapeutic gene, and introduction of the genetically altered cells back into the patient. A replication-deficient virus such as a modified retrovirus can be used to introduce the therapeutic gene (galactokinase) into such cells. For example, Moloney murine leukemia virus (MMLV) is a well-known vector in clinical gene therapy trials (see, e.g., Boris-Lawrie et al., Curr. Opin. Genet. Dev., 3:102-109 (1993)).
In contrast, in vivo gene therapy does not require isolation and purification of patients' cells. The therapeutic gene is typically "packaged" for administration to a patient, such as in liposomes or in a replication-deficient virus such as adenovirus (see, e.g., Berkner, K.L., Curr. Top. Microbiol. Immunol., 158:39-66 (1992)) or adeno-associated virus (AAV) vectors (see, e.g., Muzyczka, N., Curr. Top. Microbiol. Immunol., 158:97-129 (1992) and U.S. Patent 5,252,479 "Safe Vector for Gene Therapy"). Another approach is administration of so-called "naked DNA", in which the therapeutic gene is directly injected into the bloodstream or muscle tissue.
Cell types useful for gene therapy of the present invention include hepatocytes, fibroblasts, lymphocytes, any cell of the eye (e.g., retina), epithelial and endothelial cells. Preferably the cells are hepatocytes, any cell of the eye or respiratory (or pulmonary) epithelial cells. Transfection of (pulmonary) epithelial cells can occur via inhalation of a nebulized preparation of DNA vectors in liposomes, DNA-protein complexes or replication-deficient adenoviruses (see, e.g., U.S. Patent 5,240,846 "Gene Therapy Vector for Cystic Fibrosis").
This invention also provides for a process to prepare human galactokinase proteins. Non-mutant proteins are defined with reference to the amino acid sequence listed in SEQ ID NO:4 and include variants with a substantially similar amino acid sequence that have the same galactokinase activity. Additional proteins of this invention include mutant human galactokinase proteins as set forth in SEQ ID NO:5 or 6. The proteins of this invention are preferably made by recombinant genetic engineering techniques. The isolated nucleic acids, particularly the DNAs, can be introduced into expression vectors by operatively linking the DNA to the necessary expression control regions (e.g., regulatory regions) required for gene expression.
The vectors can be introduced into the appropriate host cells, such as prokaryotic (e.g., bacterial) or eukaryotic (e.g., yeast or mammalian) cells, by methods well known in the art (Ausubel et al., supra). The coding sequences for the desired proteins, having been prepared or isolated, can be cloned into any suitable vector or replicon. Numerous cloning vectors are known to those of skill in the art, and the selection of an appropriate cloning vector is a matter of choice. Examples of recombinant DNA vectors for cloning, and host cells which they can transform, include, but are not limited to, the bacteriophage λ (E. coli), pBR322 (E. coli), pACYC177 (E. coli), pKT230 (gram-negative bacteria), pGV1106 (gram-negative bacteria), pLAFR1 (gram-negative bacteria), pME290 (non-E. coli gram-negative bacteria), pHV14 (E. coli and Bacillus subtilis), pBD9 (Bacillus), pIJ61 (Streptomyces), pUC6 (Streptomyces), YIp5 (Saccharomyces), a baculovirus insect cell system, a Drosophila insect system, and YCp19 (Saccharomyces). See generally "DNA Cloning", Vols. I & II, Glover et al. ed., IRL Press, Oxford (1985) (1987); and T. Maniatis et al., "Molecular Cloning", Cold Spring Harbor Laboratory (1982).
The gene can be placed under the control of a promoter, ribosome binding site (for bacterial expression) and, optionally, an operator (collectively referred to herein as "control" elements), so that the DNA sequence encoding the desired protein is transcribed into RNA in the host cell transformed by a vector containing this expression construction. The coding sequence may or may not contain a signal peptide or leader sequence. The subunit antigens of the present invention can be expressed using, for example, the E. coli tac promoter or the protein A gene (spa) promoter and signal sequence. Leader sequences can be removed by the bacterial host in post-translational processing. See, e.g., U.S. Patent Nos. 4,431,739; 4,425,437; 4,338,397.
In addition to control sequences, it may be desirable to add regulatory sequences which allow for regulation of the expression of the protein sequences relative to the growth of the host cell. Regulatory sequences are known to those of skill in the art, and examples include those which cause the expression of a gene to be turned on or off in response to a chemical or physical stimulus, including the presence of a regulatory compound. Other types of regulatory elements may also be present in the vector, for example, enhancer sequences.
An expression vector is constructed so that the particular coding sequence is located in the vector with the appropriate regulatory sequences, the positioning and orientation of the coding sequence with respect to the control sequences being such that the coding sequence is transcribed under the "control" of the control sequences (i.e., RNA polymerase which binds to the DNA molecule at the control sequences transcribes the coding sequence). Modification of the sequences encoding the particular antigen of interest may be desirable to achieve this end. For example, in some cases it may be necessary to modify the sequence so that it may be attached to the control sequences with the appropriate orientation, i.e., to maintain the reading frame. The control sequences and other regulatory sequences may be ligated to the coding sequence prior to insertion into a vector, such as the cloning vectors described above. Alternatively, the coding sequence can be cloned directly into an expression vector which already contains the control sequences and an appropriate restriction site.
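The reading-frame requirement for joining a coding sequence to upstream leader and control sequences can be checked before cloning. A minimal Python sketch, assuming the upstream segment begins at the ATG start codon and the coding sequence is fused immediately downstream of it; `check_fusion` is a hypothetical helper, not a published tool.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def codons(seq: str) -> list:
    """Split a sequence into complete codons, reading from the first base."""
    return [seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3)]


def check_fusion(leader: str, insert: str) -> tuple:
    """Report whether `insert` will be read in frame when joined to the end of `leader`."""
    if not leader.startswith("ATG"):
        return False, "leader does not begin with an ATG start codon"
    if len(leader) % 3 != 0:
        # the insert's first base would land in the 2nd or 3rd codon position
        return False, f"frameshift: leader is {len(leader)} nt, not a multiple of 3"
    if any(codon in STOP_CODONS for codon in codons(leader)):
        return False, "in-frame stop codon in leader would truncate the product"
    return True, "insert is in frame"


print(check_fusion("ATGGCT", "GGTAAA"))  # (True, 'insert is in frame')
print(check_fusion("ATGGC", "GGTAAA"))   # (False, 'frameshift: leader is 5 nt, not a multiple of 3')
print(check_fusion("ATGTAA", "GGTAAA"))  # (False, 'in-frame stop codon in leader would truncate the product')
```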
In some cases, it may be desirable to produce other mutants or analogs of the galactokinase protein. Mutants or analogs may be prepared by the deletion of a portion of the sequence encoding the protein, by insertion of a sequence, and/or by substitution of one or more nucleotides within the sequence. Techniques for modifying nucleotide sequences, such as site-directed mutagenesis, are well known to those skilled in the art. See, e.g., T. Maniatis et al., supra; DNA Cloning, Vols. I and II, supra; Nucleic Acid Hybridization, supra.
A number of prokaryotic expression vectors are known in the art. See, e.g., U.S. Patent Nos. 4,578,355; 4,440,859; 4,436,815; 4,431,740; 4,431,739; 4,428,941; 4,425,437; 4,418,149; 4,411,994; 4,366,246; 4,342,832; see also U.K. Patent Applications GB 2,121,054; GB 2,008,123; GB 2,007,675; and European Patent Application 103,395. Yeast expression vectors are also known in the art. See, e.g., U.S. Patent Nos. 4,446,235; 4,443,539; 4,430,428; see also European Patent Applications 103,409; 100,561; 96,491. Suitable mammalian expression vectors include pSV2neo (as described in J. Mol. Appl. Genet. 1:327-341), which uses the SV40 late promoter to drive expression in mammalian cells, or pcDNA1neo, a vector derived from pcDNA1 (Mol. Cell Biol. 7:4125-29), which uses the CMV promoter to drive expression. Both of these vectors can be employed for transient or stable (using G418 resistance) expression in mammalian cells. Insect cell expression systems, e.g., Drosophila, are also useful; see, for example, PCT applications WO 90/06358 and WO 92/06212 as well as EP 290,261-B1.
Depending on the expression system and host selected, the proteins of the present invention are produced by growing host cells transformed by an expression vector described above under conditions whereby the protein of interest is expressed. Preferred host cells include human embryonic kidney (HEK-293) cells, monkey kidney fibroblast (COS) cells, Chinese hamster ovary (CHO) cells, Drosophila cells or murine L-cells. If the expression system secretes the protein into growth media, the protein can be purified directly from the media. If the protein is not secreted, it is isolated from cell lysates or recovered from the cell membrane fraction. The selection of the appropriate growth conditions and recovery methods is within the skill of the art.
An alternative method to identify proteins of the present invention is by constructing gene libraries, using the resulting clones to transform E. coli, and pooling and screening individual colonies using polyclonal serum or monoclonal antibodies to galactokinase.
The proteins of the present invention may also be produced by chemical synthesis, such as solid phase peptide synthesis, using known amino acid sequences or amino acid sequences derived from the DNA sequence of the genes of interest. Such methods are known to those skilled in the art. Chemical synthesis of peptides is not particularly preferred.
The proteins of the present invention or their fragments containing at least one epitope can be used to produce antibodies, both polyclonal and monoclonal.
If polyclonal antibodies are desired, a selected mammal (e.g., mouse, rabbit, goat, horse, etc.) is immunized with the protein of the present invention, or a fragment thereof, capable of eliciting an immune response (i.e., having at least one epitope).
Serum from the immunized animal is collected and treated according to known procedures. If serum containing polyclonal antibodies is used, the polyclonal antibodies can be purified by immunoaffinity chromatography or other known procedures.
Monoclonal antibodies to the proteins of the present invention, and to the fragments thereof, can also be readily produced by one skilled in the art. The general methodology for making monoclonal antibodies by using hybridoma technology is well known. Immortal antibody-producing cell lines can be created by cell fusion, and also by other techniques such as direct transformation of B lymphocytes with oncogenic DNA, or transfection with Epstein-Barr virus. See, e.g., M. Schreier et al., "Hybridoma Techniques" (1980); Hammerling et al., "Monoclonal Antibodies and T-cell Hybridomas" (1981); Kennett et al., "Monoclonal Antibodies" (1980); see also U.S. Patent Nos. 4,341,761; 4,399,121; 4,427,783; 4,444,887; 4,452,570; 4,466,917; 4,472,500; 4,491,632; and 4,493,890. Panels of monoclonal antibodies produced against the antigen of interest, or fragment thereof, can be screened for various properties, i.e., for isotype, epitope, affinity, etc. Hence one skilled in the art can produce monoclonal antibodies specifically reactive with mutant galactokinase proteins, e.g., the missense mutation of SEQ ID NO:5 or the nonsense mutation of SEQ ID NO:6. Monoclonal antibodies are useful in purification, using immunoaffinity techniques, of the individual antigens against which they are directed.
Alternatively, genes encoding the monoclonals of interest may be isolated from the hybridomas by PCR techniques known in the art and cloned and expressed in the appropriate vectors. The antibodies of this invention, whether polyclonal or monoclonal, have additional utility in that they may be employed as reagents in immunoassays, RIA, ELISA, and the like. As used herein, "monoclonal antibody" is understood to include antibodies derived from one species (e.g., murine, rabbit, goat, rat, human, etc.) as well as antibodies derived from two (or perhaps more) species (e.g., chimeric and humanized antibodies).
Chimeric antibodies, in which non-human variable regions are joined or fused to human constant regions (see, e.g., Liu et al., Proc. Natl. Acad. Sci. USA, 84:3439 (1987)), may also be used in assays or therapeutically. Preferably, a therapeutic monoclonal antibody would be "humanized" as described in Jones et al., Nature, 321:522 (1986); Verhoeyen et al., Science, 239:1534 (1988); Kabat et al., J. Immunol., 147:1709 (1991); Queen et al., Proc. Natl. Acad. Sci. USA, 86:10029 (1989); Gorman et al., Proc. Natl. Acad. Sci. USA, 88:4181 (1991); and Hodgson et al., Bio/Technology, 9:421 (1991). Therefore, this invention also contemplates antibodies, polyclonal or monoclonal (including chimeric and "humanized"), directed to epitopes corresponding to amino acid sequences disclosed herein from human galactokinase. Methods for the production of polyclonal and monoclonal antibodies are well known; see, for example, Chap. 11 of Ausubel et al. (supra).
When the antibody is labeled with an analytically detectable reagent such as radioactivity, fluorescence, or an enzyme, the antibody can be used to detect the presence or absence of human galactokinase and/or its quantitative level. In addition, antibodies (polyclonal or monoclonal) specific for the missense and nonsense mutations of the present invention are useful for diagnostic purposes. A serum or tissue sample (e.g., liver, lung, etc.) is obtained and allowed to come in contact with an antibody or antibody fragment which specifically binds to a mutant human galactokinase protein of the invention under conditions such that an antigen-antibody complex is formed between said antibody (or antibody fragment) and said mutant galactokinase protein. Methods for detecting the presence or absence of said complex are within the skill of the art (e.g., ELISA, RIA, Western blotting, optical biosensor (e.g., BIAcore, Pharmacia Biosensor, Uppsala, Sweden)) and do not limit this invention.
This invention also contemplates pharmaceutical compositions comprising an effective amount of the galactokinase protein of the invention and a pharmaceutically acceptable carrier. Pharmaceutical compositions of proteinaceous drugs of this invention are particularly useful for parenteral administration, i.e., subcutaneously, intramuscularly or intravenously. Optionally, the galactokinase protein is surrounded by a membrane-bound vesicle, such as a liposome.
The compositions for parenteral administration will commonly comprise a solution of the compounds of the invention or a cocktail thereof dissolved in an acceptable carrier, preferably an aqueous carrier. A variety of aqueous carriers may be employed, e.g., water, buffered water, 0.4% saline, 0.3% glycine, and the like.
These solutions are sterile and generally free of particulate matter. These solutions may be sterilized by conventional, well-known sterilization techniques. The compositions may contain pharmaceutically acceptable auxiliary substances as required to approximate physiological conditions, such as pH adjusting and buffering agents, etc. The concentration of the compound of the invention in such pharmaceutical formulations can vary widely, e.g., from less than about 0.5%, usually at or at least about 1%, to as much as 15 or 20% by weight, and will be selected primarily based on fluid volumes, viscosities, etc., according to the particular mode of administration selected.
Thus, a pharmaceutical composition of the invention for intramuscular injection could be prepared to contain 1 mL sterile buffered water and 50 mg of a compound of the invention. Similarly, a pharmaceutical composition of the invention for intravenous infusion could be made up to contain 250 mL of sterile Ringer's solution and 150 mg of a compound of the invention. Actual methods for preparing parenterally administrable compositions are well known or will be apparent to those skilled in the art and are described in more detail in, for example, Remington's Pharmaceutical Sciences, 15th ed., Mack Publishing Company, Easton, Pennsylvania.
The compounds described herein can be lyophilized for storage and reconstituted in a suitable carrier prior to use. This technique has been shown to be effective with conventional proteins, and art-known lyophilization and reconstitution techniques can be employed.
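The two example formulations can be related to the concentration range given above with a one-line calculation. The Python sketch treats the percentages as weight/volume (grams per 100 mL), which approximates percent by weight only for dilute aqueous solutions with a density near 1 g/mL; that equivalence is an assumption, not a statement from the formulation text.

```python
def percent_w_v(mass_mg: float, volume_ml: float) -> float:
    """Concentration in % (w/v), i.e. grams of solute per 100 mL of solution."""
    grams = mass_mg / 1000.0
    return grams / volume_ml * 100.0


im_dose = percent_w_v(50, 1)     # 50 mg in 1 mL sterile buffered water
iv_dose = percent_w_v(150, 250)  # 150 mg in 250 mL sterile Ringer's solution
print(f"intramuscular: {im_dose:.2f}%")  # intramuscular: 5.00%
print(f"intravenous:   {iv_dose:.2f}%")  # intravenous:   0.06%
```

On this approximation the intramuscular example sits within the 1% to 20% range, while the dilute infusion falls at the "less than about 0.5%" end.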
The physician will determine the dosage of the present therapeutic agents which will be most suitable; it will vary with the form of administration and the particular compound chosen, and furthermore, it will vary with the particular patient under treatment. The physician will generally wish to initiate treatment with small dosages substantially less than the optimum dose of the compound and increase the dosage by small increments until the optimum effect under the circumstances is reached. It will generally be found that when the composition is administered orally, larger quantities of the active agent will be required to produce the same effect as a smaller quantity given parenterally. The therapeutic dosage will generally be from 1 to 10 milligrams per day and higher, although it may be administered in several different dosage units.
Depending on the patient condition, the pharmaceutical composition of the invention can be administered for prophylactic and/or therapeutic treatments. In therapeutic application, compositions are administered to a patient already suffering from a disease in an amount sufficient to cure or at least partially arrest the disease and its complications. In prophylactic applications, compositions containing the present compounds or a cocktail thereof are administered to a patient not already in a disease state to enhance the patient's resistance.
Single or multiple administrations of the pharmaceutical compositions can be carried out with dose levels and pattern being selected by the treating physician. In any event, the pharmaceutical composition of the invention should provide a quantity of the compounds of the invention sufficient to effectively treat the patient.
This invention also contemplates use of the galactokinase genes of the instant invention as a diagnostic. For example, some diseases result from inherited defective genes. These genes can be detected by comparing the sequence of the defective gene with that of a normal one. Subsequently, one can verify that a "mutant" gene is associated with galactokinase deficiency by measurement of galactose. That is, a mutant gene would be associated with (atypically) elevated levels of galactose in a patient. In addition, one can insert mutant galactokinase genes into a suitable vector for expression in a functional assay system (e.g., colorimetric assay, expression on MacConkey plates, complementation experiments, e.g., in a galactokinase-deficient strain of yeast or E. coli) as yet another means to verify or identify galactokinase mutations.

As an example, RNA from an individual can be transcribed with reverse transcriptase to cDNA, which can then be amplified by polymerase chain reaction (PCR), cloned into an E. coli expression vector, and transformed into a galactokinase-deficient strain of E. coli. When grown on MacConkey indicator plates, galactokinase-deficient cells will produce colonies that are white in color, whereas cells that have been transformed/complemented with a functional galactokinase gene will be red (see, e.g., Examples section). If most to all of the colonies from an individual are red, then the individual is considered to be normal with respect to galactokinase activity. If approximately 50% of the colonies are red (the other 50% white), then that individual is likely to be a carrier for galactokinase deficiency. If most to all of the colonies are white, then that individual is likely to be galactokinase deficient. Once "mutant" genes have been identified, one can then screen the population for carriers of the "mutant" galactokinase gene. (A carrier is a person in apparent health whose chromosomes contain a "mutant" galactokinase gene that may be transmitted to that person's offspring.) In addition, monoclonal antibodies that are specific for the mutant galactokinase proteins can be used for diagnostic purposes as described above.
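The colony-counting interpretation above can be written as a simple decision rule. In this Python sketch the cut-offs (at least 80% red for normal, at most 20% red for deficient, 50% ± 15% red for a likely carrier) are illustrative assumptions chosen to encode "most to all" and "approximately 50%"; the text gives no numeric thresholds, and any real cut-offs would need validation.

```python
def classify_complementation(red: int, white: int,
                             normal_min: float = 0.8,
                             deficient_max: float = 0.2,
                             carrier_band: float = 0.15) -> str:
    """Interpret MacConkey colony colours from one individual's cloned galactokinase cDNAs.

    Red colonies: the clone complements the E. coli galactokinase defect (functional gene).
    White colonies: the clone fails to complement (non-functional gene).
    Threshold values are illustrative assumptions only.
    """
    total = red + white
    if total == 0:
        raise ValueError("no colonies counted")
    fraction_red = red / total
    if fraction_red >= normal_min:
        return "normal"
    if fraction_red <= deficient_max:
        return "likely galactokinase deficient"
    if abs(fraction_red - 0.5) <= carrier_band:
        return "likely carrier"
    return "indeterminate: count more colonies or retest"


print(classify_complementation(95, 5))   # normal
print(classify_complementation(48, 52))  # likely carrier
print(classify_complementation(3, 97))   # likely galactokinase deficient
```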
Individuals carrying mutations in the human galactokinase gene may be detected at the DNA level by a variety of techniques. Nucleic acids used for diagnosis (genomic DNA, mRNA, etc.) may be obtained from a patient's cells, such as from blood, urine, saliva, tissue biopsy (e.g., chorionic villi sampling or removal of amniotic fluid cells), and autopsy material. The genomic DNA may be used directly for detection or may be amplified enzymatically by using PCR, ligase chain reaction (LCR), strand displacement amplification (SDA), etc. (see, e.g., Saiki et al., Nature, 324:163-166 (1986); Bej et al., Crit. Rev. Biochem. Molec. Biol., 26:301-334 (1991); Birkenmeyer et al., J. Virol. Meth., 35:117-126 (1991); Van Brunt, J., Bio/Technology, 8:291-294 (1990)) prior to analysis. RNA may also be used for the same purpose. The RNA can be reverse-transcribed and amplified at one time with RT-PCR (reverse transcriptase-polymerase chain reaction) or reverse-transcribed to an unamplified cDNA. As an example, PCR primers complementary to the nucleic acid of the instant invention can be used to identify and analyze galactokinase mutations. For example, deletions and insertions can be detected by a change in size of the amplified product in comparison to the normal galactokinase genotype. Point mutations can be identified by hybridizing amplified DNA to radiolabeled galactokinase RNA (of the invention) or, alternatively, radiolabeled galactokinase antisense DNA sequences (of the invention). Perfectly matched sequences can be distinguished from mismatched duplexes by RNase A digestion or by differences in melting temperatures (Tm). Such a diagnostic would be particularly useful for prenatal and even neonatal testing.
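Detecting a deletion or insertion from the size of the amplified product amounts to comparing the observed fragment length with the length expected from the reference sequence. A minimal Python sketch, assuming fragment sizes have been estimated from a gel or capillary run with a sizing tolerance of ±2 bp (a hypothetical value); changes smaller than the tolerance go undetected, which is why high-resolution methods are used for small changes. It also flags size changes that are not a multiple of 3, which would shift the reading frame if the change lies within a coding exon.

```python
def compare_amplicon(observed_bp: int, expected_bp: int, tolerance_bp: int = 2) -> dict:
    """Classify a PCR product against the size expected from the reference genotype."""
    delta = observed_bp - expected_bp
    if abs(delta) <= tolerance_bp:
        return {"change": "none detected", "delta_bp": delta, "frameshift_if_coding": False}
    return {
        "change": "insertion" if delta > 0 else "deletion",
        "delta_bp": delta,
        # a net change that is not a multiple of 3 shifts the frame of downstream codons
        "frameshift_if_coding": delta % 3 != 0,
    }


print(compare_amplicon(300, 300))  # no size change detected
print(compare_amplicon(297, 300))  # 3-bp deletion, reading frame preserved
print(compare_amplicon(304, 300))  # 4-bp insertion, frameshift if within an exon
```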
WO 96/09374 PCT/US95/06743

In addition, point mutations and other sequence differences between the reference gene and "mutant" genes can be identified by yet other well-known techniques, e.g., direct DNA sequencing or single-strand conformational polymorphism (SSCP; Orita et al., Genomics, 5:874-879 (1989)). For example, a sequencing primer is used with double-stranded PCR product or with a single-stranded template molecule generated by a modified PCR. The sequence determination is performed by conventional procedures with radiolabeled nucleotides or by sequencing procedures with fluorescent tags. Cloned DNA segments may also be used as probes to detect specific DNA segments. The sensitivity of this method is greatly enhanced when combined with PCR. The presence of nucleotide repeats may correlate with a change in galactokinase activity (causative change) or serve as a marker for various polymorphisms.
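Once a patient sequence has been obtained by direct sequencing and aligned to the reference, point mutations can be read off as positions where the two differ. The sketch below assumes an already-aligned, gap-free pair of sequences; the function name is hypothetical. The example uses codons 25-40 of SEQ ID NO:4 and SEQ ID NO:5 from the sequence listing, whose first base is position 101 when the coding sequence starts at base 29.

```python
def find_substitutions(reference: str, sample: str, start: int = 1) -> list[tuple[int, str, str]]:
    """List (position, reference_base, sample_base) for every mismatch
    between two equal-length, already-aligned DNA sequences.
    `start` is the reference coordinate of the first base."""
    if len(reference) != len(sample):
        raise ValueError("sequences must be aligned to the same length")
    return [
        (start + i, ref, alt)
        for i, (ref, alt) in enumerate(zip(reference.upper(), sample.upper()))
        if ref != alt
    ]


# Codons 25-40: SEQ ID NO:4 (normal) versus SEQ ID NO:5 (mutant).
normal = "GGGGCCGAGCCCGAGCTGGCCGTGTCAGCGCCGGGCCGCGTCAACCTC"
mutant = "GGGGCCGAGCCCGAGCTGGCCATGTCAGCGCCGGGCCGCGTCAACCTC"
print(find_substitutions(normal, mutant, start=101))
```

The single reported difference is the G to A change at position 122 discussed in the examples below. Insertions and deletions would first require a gapped alignment, which this sketch does not attempt.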
Genetic testing based on DNA sequence differences may be achieved by detecting alterations in the electrophoretic mobility of DNA fragments in gels with or without denaturing agents. Small sequence deletions and insertions can be visualized by high-resolution gel electrophoresis. DNA fragments of different sequences may be distinguished on denaturing formamide gradient gels, in which the mobilities of different DNA fragments are retarded at different positions in the gel according to their specific melting or partial melting temperatures (see, e.g., Myers et al., Science, 230:1242 (1985)). In addition, sequence alterations, in particular small deletions, may be detected as changes in the migration pattern of DNA heteroduplexes in non-denaturing gel electrophoresis (i.e., heteroduplex electrophoresis) (see, e.g., Nagamine et al., Am. J. Hum. Genet., 45:337-339 (1989)).
Sequence changes at specific locations may also be revealed by nuclease protection assays, such as RNase and S1 protection, or by the chemical cleavage method (e.g., Cotton et al., Proc. Natl. Acad. Sci. USA, 85:4397-4401 (1985)).
Thus, the detection of a specific DNA sequence may be achieved by methods such as hybridization (e.g., heteroduplex electrophoresis; see White et al., Genomics, 12:301-306 (1992)), RNase protection (e.g., Myers et al., Science, 230:1242 (1985)), chemical cleavage (e.g., Cotton et al., Proc. Natl. Acad. Sci. USA, 85:4397-4401 (1985)), direct DNA sequencing, or the use of restriction enzymes (e.g., restriction fragment length polymorphisms (RFLP), in which variations in the number and size of restriction fragments can indicate insertions, deletions, the presence of nucleotide repeats, and any other mutation that creates or destroys an endonuclease restriction site). Southern blotting of genomic DNA may also be used to identify large (i.e., greater than 100 base pair) deletions and insertions.
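An RFLP arises when a mutation creates or destroys a recognition site, which changes the number and lengths of the digestion fragments. The following minimal sketch simulates a digest of a linear sequence; the function and constant names are hypothetical. It uses the MscI site (TGG↓CCA, a blunt cutter) and the codon 25-40 region of SEQ ID NO:4 and SEQ ID NO:5 from the sequence listing.

```python
def digest(seq: str, site: str, cut_offset: int) -> list[int]:
    """Return fragment lengths after cutting a linear sequence at every
    occurrence of `site`, with the cut `cut_offset` bases into the site.
    Only the top strand is scanned, which suffices for palindromic sites."""
    seq, site = seq.upper(), site.upper()
    cuts = []
    pos = seq.find(site)
    while pos != -1:
        cuts.append(pos + cut_offset)
        pos = seq.find(site, pos + 1)
    bounds = [0] + cuts + [len(seq)]
    return [end - begin for begin, end in zip(bounds, bounds[1:])]


MSCI_SITE, MSCI_CUT = "TGGCCA", 3  # MscI: TGG^CCA, palindromic, blunt ends

normal = "GGGGCCGAGCCCGAGCTGGCCGTGTCAGCGCCGGGCCGCGTCAACCTC"  # SEQ ID NO:4, codons 25-40
mutant = "GGGGCCGAGCCCGAGCTGGCCATGTCAGCGCCGGGCCGCGTCAACCTC"  # SEQ ID NO:5, codons 25-40
print(digest(normal, MSCI_SITE, MSCI_CUT))  # [48]: no site, uncut
print(digest(mutant, MSCI_SITE, MSCI_CUT))  # [19, 29]: the G->A change creates TGGCCA
```

The same function applies to any enzyme with a known site and cut offset; non-palindromic sites would additionally require scanning the reverse complement.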
In addition to more conventional gel electrophoresis and DNA sequencing, mutations (e.g., microdeletions, aneuploidies, translocations, inversions) can also be detected by in situ analysis (see, e.g., Keller et al., DNA Probes, 2nd Ed., Stockton Press, New York, N.Y., USA (1993)). That is, DNA (or RNA) sequences in cells can be analyzed for mutations without isolation and/or immobilization onto a membrane. Fluorescence in situ hybridization (FISH) is presently the most commonly applied method, and numerous reviews of FISH have appeared; see, e.g., Tkachuk et al., Science, 250:559-562 (1990), and Trask et al., Trends Genet., 7:149-154 (1991), which are incorporated herein by reference for background purposes. Hence, by using nucleic acids based on the structure of specific genes, e.g., galactokinase, one can develop diagnostic tests for galactokinase deficiency.
In addition, some diseases are a result of, or are characterized by, changes in gene expression that can be detected as changes in the mRNA. Alternatively, the galactokinase gene can be used as a reference to identify individuals expressing a decreased level of galactokinase, e.g., by Northern blotting or in situ hybridization.
Defining appropriate hybridization conditions is within the skill of the art. See, e.g., Ausubel et al. (eds.), Current Protocols in Molecular Biology, Vols. I & II, Wiley Interscience (1992). Probing technology is well known in the art, and it is appreciated that the size of the probes can vary widely, but it is preferred that the probe be at least 15 nucleotides in length. It is also appreciated that such probes can be, and preferably are, labeled with an analytically detectable reagent to facilitate identification of the probe. Useful reagents include, but are not limited to, radioactivity, fluorescent dyes, or enzymes capable of catalyzing the formation of a detectable product. As a general rule, the more stringent the hybridization conditions, the more closely related the recovered genes will be.
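Stringency is usually set relative to the probe's melting temperature. As a rough illustration only, the sketch below checks the preferred 15-nucleotide minimum and estimates Tm with the Wallace rule, Tm = 2(A+T) + 4(G+C) °C, a rule of thumb for short oligonucleotides under high-salt conditions; the choice of formula, the function names, and the example probe (codons 41-46 of SEQ ID NO:4) are assumptions, not conditions given in the text.

```python
MIN_PROBE_LENGTH = 15  # preferred minimum probe length stated in the text


def wallace_tm(probe: str) -> int:
    """Rough Tm in deg C by the Wallace rule: 2*(A+T) + 4*(G+C).
    Only a rule of thumb for short oligonucleotides in high salt;
    nearest-neighbor models are more accurate."""
    p = probe.upper()
    if set(p) - set("ACGT"):
        raise ValueError("probe must contain only A, C, G and T")
    at = p.count("A") + p.count("T")
    gc = p.count("G") + p.count("C")
    return 2 * at + 4 * gc


probe = "ATCGGGGAACACACGGAC"  # illustrative 18-mer from SEQ ID NO:4
print(len(probe) >= MIN_PROBE_LENGTH, wallace_tm(probe))  # True 58
```

Hybridizing or washing closer to the estimated Tm raises stringency, so fewer mismatched (less closely related) sequences remain bound.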
Also within the scope of this invention are antisense oligonucleotides predicated upon the sequences disclosed herein for human galactokinase. Synthetic oligonucleotides or related antisense chemical structural analogs are designed to recognize and specifically bind to a target nucleic acid encoding galactokinase or galactokinase mutations. The general field of antisense technology is illustrated by the following disclosures, which are incorporated herein by reference for purposes of background: Cohen, J.S., Trends Pharm. Sci., 10:435 (1989), and Weintraub, H.M., Scientific American, Jan. (1990) at page 40.
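At the sequence level, an antisense oligonucleotide is the reverse complement of its sense-strand target, which lets it base-pair antiparallel with the mRNA. The sketch below shows only that step; the 20-nt target (the first 20 coding bases of SEQ ID NO:4, starting at the ATG) is an arbitrary illustrative choice. Selecting an effective antisense site, and the backbone chemistry, involve considerations well beyond this example.

```python
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement (written 5'->3') of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


target = "ATGGCTGCTTTGAGACAGCC"  # sense strand, SEQ ID NO:4 bases 29-48 (illustrative)
antisense = reverse_complement(target)
print(antisense)  # GGCTGTCTCAAAGCAGCCAT
```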
Transgenic non-human animals may be obtained by transfecting appropriate fertilized eggs or embryos of a host with nucleic acids encoding the human galactokinase disclosed herein; see, for example, U.S. Patents 4,736,866; 5,175,385; 5,175,384; and 5,175,386. The resultant transgenic animal may be used as a model for the study of galactokinase. Particularly useful transgenic animals are those which display a detectable phenotype associated with the expression of galactokinase. Drugs may then be screened for their ability to reverse or exacerbate the relevant phenotype. This invention also contemplates operatively linking the galactokinase coding gene to regulatory elements which are differentially responsive to various temperature or metabolic conditions, thereby effectively turning the phenotypic expression on or off in response to those conditions.
Although not necessarily limiting of this invention, the following are some experimental data illustrative of this invention.
EXAMPLE I
Purification of Human Galactokinase from Placental Tissue
Galactokinase (galK) was obtained from human placenta as described by Stambolian et al. (Biochim. Biophys. Acta, 831:306-312 (1985)), which is incorporated by reference in its entirety. In essence, human placental tissue (obtained within 1 hour of parturition) was homogenized and centrifuged, and the resulting supernatant was adsorbed onto DEAE-Sephacel®. The material was eluted, precipitated with ammonium sulfate, and then run through a sizing column (Sephadex® G-100 SF).
Pooled active fractions were concentrated. Purified protein was obtained following separation by SDS-polyacrylamide gel electrophoresis and was then Western blotted using standard techniques (see Laemmli, Nature, 227:680-685 (1970), or LeGendre et al., BioTechniques, 6:154 (1988)). Only microgram amounts of galactokinase were isolated from multiple rounds of protein purification. After trypsin digestion, 7 peptide sequences were eventually isolated and identified. The three longest fragments are presented below:
[SEQ ID NO:1]
Val Asn Leu Ile Gly Glu His Thr Asp Tyr Asn Gln Gly Leu Val Leu Pro Met Ala Leu Glu Leu Met Thr Val Leu Val Gly Ser Pro Arg
[SEQ ID NO:2]
His Ile Gln Glu His Tyr Gly Gly Thr Ala Thr Phe Tyr Leu Ser Gln Ala Ala Asp Gly Ala Lys
[SEQ ID NO:3]
Ala Gln Val Cys Gln Gln Ala Glu His Ser Phe Ala Gly Met Pro Cys Gly Ile Met Asp Gln Phe Ile Ser Leu Met Gly Gln Lys
The fragments were compared with the peptide sequences encoded by partially sequenced cDNAs. These cDNAs (also known as expressed sequence tags, or ESTs) were obtained from Human Genome Sciences, Inc.
(Rockville, MD, USA). The best alignments occurred with an EST sequence from a human osteoclastoma stromal cell library (SEQ ID NO:1 showed 100% identity over 18 contiguous amino acids) and an EST sequence from a human pituitary library (SEQ ID NO:2 showed 95.5% identity over 22 contiguous amino acids). A full-length cDNA from the human osteoclastoma stromal cell library was identified and sequenced (SEQ ID NO:4) in its entirety on an automated ABI 373A Sequencer. Sequencing was confirmed on both strands. The corresponding amino acid sequence (SEQ ID NO:4) was compared against the peptide fragments identified above. SEQ ID NO:1 corresponds to amino acids 38-68 of the full-length human galactokinase protein. Similarly, SEQ ID NOs: 2 and 3 correspond to amino acids 367-388 and 167-195, respectively, of human galactokinase.
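Mapping a tryptic peptide onto the deduced protein, and expressing agreement as percent identity, can be sketched as follows. This is a minimal illustration, not the alignment software used in the work: it assumes exact, ungapped matching, and the helper names are hypothetical. The protein fragment is residues 1-72 translated from SEQ ID NO:4 in the sequence listing.

```python
THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
}


def to_one_letter(three_letter: str) -> str:
    """Convert space-separated three-letter residues to one-letter code."""
    return "".join(THREE_TO_ONE[res] for res in three_letter.split())


def locate(peptide: str, protein: str) -> tuple[int, int]:
    """Return the 1-based inclusive (start, end) of an exact peptide match."""
    i = protein.find(peptide)
    if i == -1:
        raise ValueError("peptide not found")
    return i + 1, i + len(peptide)


def percent_identity(a: str, b: str) -> float:
    """Identity over an ungapped alignment of two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be the same length")
    return 100.0 * sum(x == y for x, y in zip(a, b)) / len(a)


# Residues 1-72 of human galactokinase, translated from SEQ ID NO:4.
protein_1_72 = ("MAALRQPQVAELLAEARRAFREEFGAEPELAVSAPGRVNL"
                "IGEHTDYNQGLVLPMALELMTVLVGSPRKDGL")
seq1 = to_one_letter("Val Asn Leu Ile Gly Glu His Thr Asp Tyr Asn Gln Gly Leu "
                     "Val Leu Pro Met Ala Leu Glu Leu Met Thr Val Leu Val Gly "
                     "Ser Pro Arg")
print(locate(seq1, protein_1_72))  # (38, 68)
```

For scale, a single mismatch in a 22-residue window gives 21/22, or about 95.5% identity, the figure reported above for SEQ ID NO:2.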
Analysis of the Human Galactokinase Gene:
A comparison of the amino acid sequence of human galactokinase with that of E. coli galactokinase (Debouck et al., Nucleic Acids Res., 13:1841-1853 (1985)) shows 61% similarity and 44.5% identity. A further comparison with another human galactokinase gene (GK2) (Lee et al., Proc. Natl. Acad. Sci. USA, 89:10887-10891 (1992)) shows 54% similarity and 34.6% identity at the amino acid level.
Furthermore, the GK2 gene maps to human chromosome 15, in contrast to the gene of the present invention, which maps to human chromosome 17, position q24, as determined by fluorescence in situ hybridization (FISH) analysis.
SEQ ID NO:4 was hybridized against a Northern blot containing human messenger RNA from placenta, brain, skeletal muscle, kidney, intestine, heart, lung and liver according to standard procedures (see, e.g., Sambrook et al., Molecular Cloning: A Laboratory Manual, 2nd Ed., Cold Spring Harbor Laboratory Press, 1989). Hybridization was strongest with human liver and lung tissue.
Galactokinase Complementation:
SEQ ID NO:4 was subcloned into an E. coli vector, plasmid pBluescript [Stratagene]. When transformed into C600K-, a galactokinase-deficient strain, the transformed E. coli grew on MacConkey agar plates containing 1% galactose (and ampicillin at 50 µg/ml for plasmid selection) and produced brick-red colonies, indicating sugar fermentation. Specifically, the red color is due to the action of acids produced by galactose fermentation upon the bile salts and the indicator (neutral red) in MacConkey medium.
Expression in Mammalian Cells:
SEQ ID NO:4 was also introduced into COS-1 cells [ATCC CRL 1650]. The cells were transfected and grown, and cell lysates were prepared. The lysates were assayed by a 14C galactokinase assay as described by Stambolian et al. (Exp. Eye Res., ~:231-237 (1984)), which is hereby incorporated by reference in its entirety. When expressed in transiently transfected COS cells, galactokinase activity was tenfold higher than control levels (6600 vs. 640 counts per minute; repeated three times).
These results definitively confirm that SEQ ID NO:4 encodes a full-length, biologically active human galactokinase gene.
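The "tenfold" figure follows directly from the reported counts. The sketch below is plain arithmetic on the two numbers given in the text; it assumes that raw counts per minute are comparable between lysates, with no background subtraction or normalization to protein content, which the text does not specify.

```python
def fold_change(sample_cpm: float, control_cpm: float) -> float:
    """Ratio of assay signal from transfected cells to the control signal."""
    if control_cpm <= 0:
        raise ValueError("control signal must be positive")
    return sample_cpm / control_cpm


print(round(fold_change(6600, 640), 1))  # 10.3
```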
The nucleic acid molecule of the invention can also be subcloned into an expression vector to produce high levels of human galactokinase (either fused to another protein, e.g., operatively linked at the 5' end with another coding sequence, or unfused) in transfected cells. For mammalian cells, the expression vector would optionally encode a neomycin resistance gene, to select for transfectants on the basis of their ability to grow in G418, and a dihydrofolate reductase gene, which permits amplification of the transfected gene in DHFR- cells. The plasmid can then be introduced into host cell lines, e.g., CHO ACC98 (a nonadherent, DHFR- cell line adapted to grow in serum-free medium) and human embryonic kidney 293 cells (ATCC CRL 1573), and transfected cell lines can be selected by G418 resistance.
Human Galactokinase Gene - Genomic Sequence:
A full-length galactokinase genomic coding region was identified from a lambda phage (λ FIX II) human genomic library (made from human placental tissue) using the galK cDNA as a probe. One isolate, designated clone 17, was deposited on 3 May 1995 with the American Type Culture Collection (ATCC), Rockville, MD, USA, under accession number ATCC 97135, and has been accepted as a patent deposit in accordance with the Budapest Treaty of 1977 governing the deposit of microorganisms for the purposes of patent procedure.
The genomic coding region is divided into at least 8 exons isolated from 4 DNA fragments. The arrangement is depicted in Figure 1. The DNA sequence was determined by using multiple oligonucleotide PCR primers corresponding to the galK cDNA sequence (i.e., corresponding to galK genomic exons), as well as oligonucleotide PCR primers subsequently designed to correspond to non-coding regions (i.e., galK genomic introns). The structure of the galactokinase genomic gene is thus summarized in Table 1 below (see also Figure 2 and SEQ ID NO:7):
Table 1
Genomic Galactokinase Gene

Exon #   Amino Acids Encoded   PCR Primer #/[SEQ ID NO]
1        1-55                  3333/[8], 3334/[9], 3598/[10], 3599/[11]
2        56-118                1888/[12], 3332/[13], 3604/[14], 3605/[15]
3        119-158               3331/[16], 3606/[17]
4        159-204               1657/[18], 3034/[19]
5        205-264               3330/[20], 3607/[21]
6        265-315               1539/[22], 2665/[23]
7        316-369               1891/[24], 2665/[25]
8        370-392               2665/[26], 2666/[27], 2667/[28]
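Table 1 can serve as a lookup for deciding which exon to amplify when screening a given residue, as is done below for the Val32 mutation. This minimal sketch uses only the residue ranges in the table; the names are hypothetical, and it does not model codons that may straddle an exon-intron boundary.

```python
from bisect import bisect_left

# (exon, first residue, last residue), transcribed from Table 1.
EXONS = [(1, 1, 55), (2, 56, 118), (3, 119, 158), (4, 159, 204),
         (5, 205, 264), (6, 265, 315), (7, 316, 369), (8, 370, 392)]
LAST_RESIDUES = [last for _, _, last in EXONS]


def exon_for_residue(residue: int) -> int:
    """Return the exon whose Table 1 residue range contains `residue`."""
    if not 1 <= residue <= LAST_RESIDUES[-1]:
        raise ValueError(f"residue {residue} is outside 1-{LAST_RESIDUES[-1]}")
    return EXONS[bisect_left(LAST_RESIDUES, residue)][0]


print(exon_for_residue(32), exon_for_residue(80))  # 1 2
```

Because the ranges are contiguous and sorted, a binary search over the last residue of each exon finds the containing exon directly.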
Galactokinase Deficiency Marker/Gene:
A fibroblast cell line (GM00334), derived from a patient with galactokinase deficiency, was obtained from the Coriell Institute for Medical Research, 401 Haddon Ave., Camden, New Jersey 08103. Total RNA was isolated from the cultured cells using the RNAZOL kit for isolation of RNA (Biotecx, Houston, TX). Cytoplasmic RNA (1 µg) was reverse transcribed with oligonucleotide primers 1823 [SEQ ID NO: 29] and 1825 [SEQ ID NO: 30]. The sample was amplified by 35 cycles at 94°C for 1 min., 60°C for 1 min. and 72°C for 7 min. The DNA product was purified electrophoretically, ligated to the TA cloning vector (Invitrogen) and sequenced.
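The cycling program above can be written down as data, which makes the run length easy to check. This sketch is a hypothetical representation (the `Step` class and step labels are illustrative, and the denature/anneal/extend roles are the conventional reading of 94°C, 60°C and 72°C). It sums programmed hold times only and ignores temperature ramps and any initial or final holds, none of which are stated.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    temp_c: int
    minutes: float


def total_hold_minutes(steps: list[Step], cycles: int) -> float:
    """Total programmed hold time; ramping between temperatures is ignored."""
    return cycles * sum(step.minutes for step in steps)


# 94 C denature, 60 C anneal, 72 C extend, as stated in the text.
PROGRAM = [Step(94, 1), Step(60, 1), Step(72, 7)]
print(total_hold_minutes(PROGRAM, cycles=35))  # 315
```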
Twelve cDNAs in total were sequenced (representing cloned PCR products of multiple independent PCR reactions). This procedure was also repeated with cultured fibroblasts from normal controls (i.e., persons not exhibiting galactokinase deficiency).
A comparison with normal controls identified a single base substitution of A for G at position 122 of the "normal" human galactokinase gene [SEQ ID NO: 4]. The result is a missense mutation at amino acid 32, from Val to Met [SEQ ID NO: 5].
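Position 122 maps to amino acid 32 because the coding sequence of SEQ ID NO:4 begins at base 29 (CDS 29..1204 in the sequence listing): (122 - 29) / 3 = 31 full codons precede it, so base 122 is the first base of codon 32. The sketch below encodes that arithmetic together with the standard genetic code; the function and constant names are hypothetical.

```python
BASES = "TCAG"
# Standard genetic code, ordered by first, second, third base in TCAG order.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

CDS_START = 29  # first base of the ATG start codon in SEQ ID NO:4


def locate_in_codon(position: int, cds_start: int = CDS_START) -> tuple[int, int]:
    """Map a nucleotide position to (codon number, base within codon), 1-based."""
    offset = position - cds_start
    if offset < 0:
        raise ValueError("position lies upstream of the coding sequence")
    return offset // 3 + 1, offset % 3 + 1


print(locate_in_codon(122))                    # (32, 1)
print(CODON_TABLE["GTG"], CODON_TABLE["ATG"])  # V M: GTG (Val) becomes ATG (Met)
```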
The G to A base change creates an MscI endonuclease restriction site (i.e., TGG↓CCA) on the mutant allele. This restriction site was then used to rapidly screen for the mutant allele in the parents of the patient with galactokinase deficiency. In essence, the exon encoding galactokinase residues 1 to 55 (i.e., exon 1, see Table 1) was cloned from a genomic lambda phage library, and its DNA sequence was determined, including a portion of the flanking intron sequences. Oligonucleotide primers (X2-5OUT [SEQ ID NO: 31] and X2-3OUT [SEQ ID NO: 32]) were designed to hybridize to intron sequences for the amplification of a 346-bp fragment of the genomic DNA. The PCR product was analyzed for the point mutation via RFLP, that is, the presence of a newly created MscI site as detected by electrophoresis on a 1.5% agarose gel. A "normal" allele remains uncut by MscI and thus migrates as a 346-bp fragment on an agarose gel. The PCR product from the patient with galactokinase deficiency (i.e., carrying the G to A base change) is cleaved by MscI, resulting in two fragments of 193 and 153 bp, respectively. The absence of the 346-bp fragment indicates that the patient was homozygous for this allele.
In contrast, MscI digestion of the PCR products from the parents of this patient yielded three fragments (346, 193 and 153 bp), which is consistent with a heterozygous pattern for the G to A base change. That is, both parents were carriers of the same mutation.
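The interpretation of the MscI band patterns can be written as a small decision rule. In this sketch the band sizes come from the text, while the function names and the ±5-bp sizing tolerance are hypothetical. One caveat the rule cannot resolve is incomplete digestion: a partially digested homozygous-mutant sample also shows all three bands, so a real assay would need a digestion control.

```python
UNCUT_BP = 346        # normal allele: MscI leaves the amplicon intact
CUT_BP = (193, 153)   # G->A allele: MscI cuts the amplicon in two


def _present(target_bp: int, bands_bp: set[int], tolerance_bp: int) -> bool:
    """True if any observed band lies within tolerance of the target size."""
    return any(abs(band - target_bp) <= tolerance_bp for band in bands_bp)


def call_genotype(bands_bp: set[int], tolerance_bp: int = 5) -> str:
    """Interpret the MscI RFLP band pattern of the 346-bp exon 1 amplicon."""
    uncut = _present(UNCUT_BP, bands_bp, tolerance_bp)
    cut = [_present(size, bands_bp, tolerance_bp) for size in CUT_BP]
    if all(cut):
        return "heterozygous (carrier)" if uncut else "homozygous mutant"
    if uncut and not any(cut):
        return "homozygous normal"
    return "uninterpretable"


print(call_genotype({346}))            # homozygous normal
print(call_genotype({193, 153}))       # homozygous mutant (the patient)
print(call_genotype({346, 193, 153}))  # heterozygous (carrier) (each parent)
```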
To determine whether the missense mutation resulted in decreased enzymatic activity, a cDNA clone containing the G to A base change was introduced into COS cells and assayed for galactokinase activity as previously described. COS cells transfected with cDNA encoding the missense mutation had the same level of galactokinase activity as the host COS cells, namely 0.02 units/µg protein. In contrast, COS cells transfected with the non-mutant galactokinase cDNA [SEQ ID NO:4] had fifty-fold higher activity than the host COS cells (i.e., control). This result supports the Val32 to Met32 substitution as the cause of the decreased enzymatic activity.
Another mutation was discovered in an unrelated patient having cataracts and diagnosed as galactokinase deficient (galactokinase activity was found to be close to zero). Genomic DNA was isolated from lymphoblastoid cell lines and sequenced by automated sequencing on an ABI 373A sequencer. A single base substitution of T for G resulted in an in-frame nonsense codon (i.e., TAG) at amino acid position 80 [SEQ ID NO:6]. This mutation causes premature termination of human galactokinase, resulting in a truncated protein of 79 amino acids that would be expected to be nonfunctional. (The parents of this patient were heterozygous for this mutation in their genomic DNA, and hence not galactokinase deficient.)
The above description and examples fully disclose the invention, including preferred embodiments thereof. Those skilled in the art will recognize, or be able to ascertain using no more than routine experimentation, many equivalents to the specific embodiments herein. Such equivalents are intended to be within the scope of the following claims.
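The 79-residue length of the truncated protein in the second patient follows from translating the coding sequence until the first stop codon. This sketch uses codons 1-80 of SEQ ID NO:4, copied from the sequence listing, and applies the G to T change at the first base of codon 80 (GAG to TAG, as in SEQ ID NO:6). The function name is hypothetical, and codons containing ambiguous bases are not handled.

```python
BASES = "TCAG"
# Standard genetic code, ordered by first, second, third base in TCAG order.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate_until_stop(cds: str) -> str:
    """Translate in frame from the first base, stopping at the first stop codon."""
    protein = []
    for i in range(0, len(cds) - len(cds) % 3, 3):
        aa = CODON_TABLE[cds[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


# Codons 1-80 of SEQ ID NO:4; codon 80 is GAG (Glu).
normal = ("ATGGCTGCTTTGAGACAGCCCCAG"
          "GTCGCGGAGCTGCTGGCCGAGGCCCGGCGAGCCTTCCGGGAGGAGTTC"
          "GGGGCCGAGCCCGAGCTGGCCGTGTCAGCGCCGGGCCGCGTCAACCTC"
          "ATCGGGGAACACACGGACTACAACCAGGGCCTGGTGCTGCCTATGGCT"
          "CTGGAGCTCATGACGGTGCTGGTGGGCAGCCCCCGCAAGGATGGGCTG"
          "GTGTCTCTCCTCACCACCTCTGAG")
mutant = normal[:-3] + "TAG"  # G->T at the first base of codon 80 (SEQ ID NO:6)

print(len(translate_until_stop(normal)), len(translate_until_stop(mutant)))  # 80 79
```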
SEQUENCE LISTING
(1) GENERAL INFORMATION:
(i) APPLICANT: Bergsma, Derk J.
Stambolian, Dwight (ii) TITLE OF INVENTION: Human Galactokinase Gene (iii) NUMBER OF SEQUENCES: 32 (iv) CORRESPONDENCE ADDRESS:
(A) ADDRESSEE: SmithKline Beecham Corp./Corporate Intellectual Property (B) STREET: 709 Swedeland Road/UW2220 (C) CITY: King of Prussia (D) STATE: Pennsylvania (E) COUNTRY: USA
(F) ZIP: 19406-0939 (v) COMPUTER READABLE FORM:
(A) MEDIUM TYPE: Floppy disk (B) COMPUTER: IBM PC compatible (C) OPERATING SYSTEM: PC-DOS/MS-DOS
(D) SOFTWARE: PatentIn Release #1.0, Version #1.30 (vi) CURRENT APPLICATION DATA:
(A) APPLICATION NUMBER:
(B) FILING DATE:
(C) CLASSIFICATION:
(vii) PRIOR APPLICATION DATA:
(A) APPLICATION NUMBER: PCT/US94/10825 (B) FILING DATE: 23-SEP-1994 (viii) ATTORNEY/AGENT INFORMATION:
(A) NAME: Sutton, Jeffrey A.
(B) REGISTRATION NUMBER: 34,028 (C) REFERENCE/DOCKET NUMBER: P50268-1 (ix) TELECOMMUNICATION INFORMATION:
(A) TELEPHONE: 610-270-5024 (B) TELEFAX: 610-270-5090 (2) INFORMATION FOR SEQ ID NO:1:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 31 amino acids (B) TYPE: amino acid (C) STRANDEDNESS: single (D) TOPOLOGY: linear (ii) MOLECULE TYPE: protein (xi) SEQUENCE DESCRIPTION: SEQ ID NO:1:
Val Asn Leu Ile Gly Glu His Thr Asp Tyr Asn Gln Gly Leu Val Leu Pro Met Ala Leu Glu Leu Met Thr Val Leu Val Gly Ser Pro Arg (2) INFORMATION FOR SEQ ID NO:2:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 22 amino acids (B) TYPE: amino acid (C) STRANDEDNESS: single (D) TOPOLOGY: linear (ii) MOLECULE TYPE: protein (xi) SEQUENCE DESCRIPTION: SEQ ID NO:2:
His Ile Gln Glu His Tyr Gly Gly Thr Ala Thr Phe Tyr Leu Ser Gln Ala Ala Asp Gly Ala Lys (2) INFORMATION FOR SEQ ID NO:3:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 29 amino acids (B) TYPE: amino acid (C) STRANDEDNESS: single (D) TOPOLOGY: linear (ii) MOLECULE TYPE: protein (xi) SEQUENCE DESCRIPTION: SEQ ID NO:3:
Ala Gln Val Cys Gln Gln Ala Glu His Ser Phe Ala Gly Met Pro Cys Gly Ile Met Asp Gln Phe Ile Ser Leu Met Gly Gln Lys (2) INFORMATION FOR SEQ ID NO:4:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 1349 base pairs (B) TYPE: nucleic acid (C) STRANDEDNESS: double (D) TOPOLOGY: linear (ii) MOLECULE TYPE: cDNA
(ix) FEATURE:
(A) NAME/KEY: CDS
(B) LOCATION: 29..1204 (xi) SEQUENCE DESCRIPTION: SEQ ID NO:4:
GAATTCGGCA CGAGTGCAGG CGCGCGTC ATG GCT GCT TTG AGA CAG CCC CAG
Met Ala Ala Leu Arg Gln Pro Gln GTC GCG GAG CTG CTG GCC GAG GCC CGG CGA GCC TTC CGG GAG GAG TTC
Val Ala Glu Leu Leu Ala Glu Ala Arg Arg Ala Phe Arg Glu Glu Phe GGG GCC GAG CCC GAG CTG GCC GTG TCA GCG CCG GGC CGC GTC AAC CTC
Gly Ala Glu Pro Glu Leu Ala Val Ser Ala Pro Gly Arg Val Asn Leu ATC GGG GAA CAC ACG GAC TAC AAC CAG GGC CTG GTG CTG CCT ATG GCT
Ile Gly Glu His Thr Asp Tyr Asn Gln Gly Leu Val Leu Pro Met Ala CTG GAG CTC ATG ACG GTG CTG GTG GGC AGC CCC CGC AAG GAT GGG CTG
Leu Glu Leu Met Thr Val Leu Val Gly Ser Pro Arg Lys Asp Gly Leu GTG TCT CTC CTC ACC ACC TCT GAG GGT GCC GAT GAG CCC CAG CGG CTG
Val Ser Leu Leu Thr Thr Ser Glu Gly Ala Asp Glu Pro Gln Arg Leu CAG TTT CCA CTG CCC ACA GCC CAG CGC TCG CTG GAG CCT GGG ACT CCT
Gln Phe Pro Leu Pro Thr Ala Gln Arg Ser Leu Glu Pro Gly Thr Pro CGG TGG GCC AAC TAT GTC AAG GGA GTG ATT CAG TAC TAC CCA GCT GCC
Arg Trp Ala Asn Tyr Val Lys Gly Val Ile Gln Tyr Tyr Pro Ala Ala CCC CTC CCT GGC TTC AGT GCA GTG GTG GTC AGC TCA GTG CCC CTG GGG
Pro Leu Pro Gly Phe Ser Ala Val Val Val Ser Ser Val Pro Leu Gly GGT GGC CTG TCC AGC TCA GCA TCC TTG GAA GTG GCC ACG TAC ACC TTC
Gly Gly Leu Ser Ser Ser Ala Ser Leu Glu Val Ala Thr Tyr Thr Phe CTC CAG CAG CTC TGT CCA GAC TCG GGC ACA ATA GCT GCC CGC GCC CAG
Leu Gln Gln Leu Cys Pro Asp Ser Gly Thr Ile Ala Ala Arg Ala Gln GTG TGT CAG CAG GCC GAG CAC AGC TTC GCA GGG ATG CCC TGT GGC ATC
Val Cys Gln Gln Ala Glu His Ser Phe Ala Gly Met Pro Cys Gly Ile ATG GAC CAG TTC ATC TCA CTT ATG GGA CAG AAA GGC CAC GCG CTG CTC
Met Asp Gln Phe Ile Ser Leu Met Gly Gln Lys Gly His Ala Leu Leu ATT GAC TGC AGG TCC TTG GAG ACC AGC CTG GTG CCA CTC TCG GAC CCC
Ile Asp Cys Arg Ser Leu Glu Thr Ser Leu Val Pro Leu Ser Asp Pro AAG CTG GCC GTG CTC ATC ACC AAC TCT AAT GTC CGC CAC TCC CTG GCC
Lys Leu Ala Val Leu Ile Thr Asn Ser Asn Val Arg His Ser Leu Ala TCC AGC GAG TAC CCT GTG CGG CGG CGC CAA TGT GAA GAA GTG GCC CGG
Ser Ser Glu Tyr Pro Val Arg Arg Arg Gln Cys Glu Glu Val Ala Arg GCG CTG GGC AAG GAA AGC CTC CGG GAG GTA CAA CTG GAA GAG CTA GAG
Ala Leu Gly Lys Glu Ser Leu Arg Glu Val Gln Leu Glu Glu Leu Glu GCT GCC AGG GAC CTG GTG AGC AAA GAG GGC TTC CGG CGG GCC CGG CAC
Ala Ala Arg Asp Leu Val Ser Lys Glu Gly Phe Arg Arg Ala Arg His GTG GTG GGG GAG ATT CGG CGC ACG GCC CAG GCA GCG GCC GCC CTG AGA
Val Val Gly Glu Ile Arg Arg Thr Ala Gln Ala Ala Ala Ala Leu Arg CGT GGC GAC TAC AGA GCC TTT GGC CGC CTC ATG GTG GAG AGC CAC CGC
Arg Gly Asp Tyr Arg Ala Phe Gly Arg Leu Met Val Glu Ser His Arg TCA CTC AGA GAC GAC TAT GAG GTG AGC TGC CCA GAG CTG GAC CAG CTG
Ser Leu Arg Asp Asp Tyr Glu Val Ser Cys Pro Glu Leu Asp Gln Leu GTG GAG GCT GCG CTT GCT GTG CCT GGG GTT TAT GGC AGC CGC ATG ACG
Val Glu Ala Ala Leu Ala Val Pro Gly Val Tyr Gly Ser Arg Met Thr GGC GGT GGC TTC GGT GGC TGC ACG GTG ACA CTG CTG GAG GCC TCC GCT
Gly Gly Gly Phe Gly Gly Cys Thr Val Thr Leu Leu Glu Ala Ser Ala GCT CCC CAC GCC ATG CGG CAC ATC CAG GAG CAC TAC GGC GGG ACT GCC
Ala Pro His Ala Met Arg His Ile Gln Glu His Tyr Gly Gly Thr Ala ACC TTC TAC CTC TCT CAA GCA GCC GAT GGA GCC AAG GTG CTG TGC TTG
Thr Phe Tyr Leu Ser Gln Ala Ala Asp Gly Ala Lys Val Leu Cys Leu TGAGGCACCC CCAGGACAGC ACACGGTGAG GGTGCGGGGC CTGCAGGCCA GTCCCACGGC
TCTGTGCCCG GTGCCATCTT CCATATCCGG GTGCTCAATA AACTTGTGCC TCCAATGTGG
AAAAAAAAAA AAAAAAAAAC TCGAG
(2) INFORMATION FOR SEQ ID NO:5:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 1349 base pairs (B) TYPE: nucleic acid (C) STRANDEDNESS: double (D) TOPOLOGY: linear (ii) MOLECULE TYPE: cDNA
(ix) FEATURE:
(A) NAME/KEY: CDS
(B) LOCATION: 29..1204 (xi) SEQUENCE DESCRIPTION: SEQ ID NO:5:
GAATTCGGCA CGAGTGCAGG CGCGCGTC ATG GCT GCT TTG AGA CAG CCC CAG
Met Ala Ala Leu Arg Gln Pro Gln GTC GCG GAG CTG CTG GCC GAG GCC CGG CGA GCC TTC CGG GAG GAG TTC
Val Ala Glu Leu Leu Ala Glu Ala Arg Arg Ala Phe Arg Glu Glu Phe GGG GCC GAG CCC GAG CTG GCC ATG TCA GCG CCG GGC CGC GTC AAC CTC
Gly Ala Glu Pro Glu Leu Ala Met Ser Ala Pro Gly Arg Val Asn Leu ATC GGG GAA CAC ACG GAC TAC AAC CAG GGC CTG GTG CTG CCT ATG GCT
Ile Gly Glu His Thr Asp Tyr Asn Gln Gly Leu Val Leu Pro Met Ala CTG GAG CTC ATG ACG GTG CTG GTG GGC AGC CCC CGC AAG GAT GGG CTG
Leu Glu Leu Met Thr Val Leu Val Gly Ser Pro Arg Lys Asp Gly Leu GTG TCT CTC CTC ACC ACC TCT GAG GGT GCC GAT GAG CCC CAG CGG CTG
Val Ser Leu Leu Thr Thr Ser Glu Gly Ala Asp Glu Pro Gln Arg Leu CAG TTT CCA CTG CCC ACA GCC CAG CGC TCG CTG GAG CCT GGG ACT CCT
Gln Phe Pro Leu Pro Thr Ala Gln Arg Ser Leu Glu Pro Gly Thr Pro CGG TGG GCC AAC TAT GTC AAG GGA GTG ATT CAG TAC TAC CCA GCT GCC
Arg Trp Ala Asn Tyr Val Lys Gly Val Ile Gln Tyr Tyr Pro Ala Ala CCC CTC CCT GGC TTC AGT GCA GTG GTG GTC AGC TCA GTG CCC CTG GGG
Pro Leu Pro Gly Phe Ser Ala Val Val Val Ser Ser Val Pro Leu Gly GGT GGC CTG TCC AGC TCA GCA TCC TTG GAA GTG GCC ACG TAC ACC TTC
Gly Gly Leu Ser Ser Ser Ala Ser Leu Glu Val Ala Thr Tyr Thr Phe CTC CAG CAG CTC TGT CCA GAC TCG GGC ACA ATA GCT GCC CGC GCC CAG
Leu Gln Gln Leu Cys Pro Asp Ser Gly Thr Ile Ala Ala Arg Ala Gln GTG TGT CAG CAG GCC GAG CAC AGC TTC GCA GGG ATG CCC TGT GGC ATC
Val Cys Gln Gln Ala Glu His Ser Phe Ala Gly Met Pro Cys Gly Ile ATG GAC CAG TTC ATC TCA CTT ATG GGA CAG AAA GGC CAC GCG CTG CTC
Met Asp Gln Phe Ile Ser Leu Met Gly Gln Lys Gly His Ala Leu Leu ATT GAC TGC AGG TCC TTG GAG ACC AGC CTG GTG CCA CTC TCG GAC CCC
Ile Asp Cys Arg Ser Leu Glu Thr Ser Leu Val Pro Leu Ser Asp Pro AAG CTG GCC GTG CTC ATC ACC AAC TCT AAT GTC CGC CAC TCC CTG GCC
Lys Leu Ala Val Leu Ile Thr Asn Ser Asn Val Arg His Ser Leu Ala TCC AGC GAG TAC CCT GTG CGG CGG CGC CAA TGT GAA GAA GTG GCC CGG
Ser Ser Glu Tyr Pro Val Arg Arg Arg Gln Cys Glu Glu Val Ala Arg GCG CTG GGC AAG GAA AGC CTC CGG GAG GTA CAA CTG GAA GAG CTA GAG
Ala Leu Gly Lys Glu Ser Leu Arg Glu Val Gln Leu Glu Glu Leu Glu GCT GCC AGG GAC CTG GTG AGC AAA GAG GGC TTC CGG CGG GCC CGG CAC
Ala Ala Arg Asp Leu Val Ser Lys Glu Gly Phe Arg Arg Ala Arg His GTG GTG GGG GAG ATT CGG CGC ACG GCC CAG GCA GCG GCC GCC CTG AGA
Val Val Gly Glu Ile Arg Arg Thr Ala Gln Ala Ala Ala Ala Leu Arg CGT GGC GAC TAC AGA GCC TTT GGC CGC CTC ATG GTG GAG AGC CAC CGC
Arg Gly Asp Tyr Arg Ala Phe Gly Arg Leu Met Val Glu Ser His Arg TCA CTC AGA GAC GAC TAT GAG GTG AGC TGC CCA GAG CTG GAC CAG CTG
Ser Leu Arg Asp Asp Tyr Glu Val Ser Cys Pro Glu Leu Asp Gln Leu GTG GAG GCT GCG CTT GCT GTG CCT GGG GTT TAT GGC AGC CGC ATG ACG
Val Glu Ala Ala Leu Ala Val Pro Gly Val Tyr Gly Ser Arg Met Thr GGC GGT GGC TTC GGT GGC TGC ACG GTG ACA CTG CTG GAG GCC TCC GCT
Gly Gly Gly Phe Gly Gly Cys Thr Val Thr Leu Leu Glu Ala Ser Ala GCT CCC CAC GCC ATG CGG CAC ATC CAG GAG CAC TAC GGC GGG ACT GCC
Ala Pro His Ala Met Arg His Ile Gln Glu His Tyr Gly Gly Thr Ala ACC TTC TAC CTC TCT CAA GCA GCC GAT GGA GCC AAG GTG CTG TGC TTG
Thr Phe Tyr Leu Ser Gln Ala Ala Asp Gly Ala Lys Val Leu Cys Leu TGAGGCACCC CCAGGACAGC ACACGGTGAG GGTGCGGGGC CTGCAGGCCA GTCCCACGGC
TCTGTGCCCG GTGCCATCTT CCATATCCGG GTGCTCAATA AACTTGTGCC TCCAATGTGG
AAAAAAAAAA AAAAAAAAAC TCGAG
(2) INFORMATION FOR SEQ ID NO:6:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 1349 base pairs (B) TYPE: nucleic acid (C) STRANDEDNESS: double (D) TOPOLOGY: linear (ii) MOLECULE TYPE: cDNA
(ix) FEATURE:
(A) NAME/KEY: CDS
(B) LOCATION: 29..265 (xi) SEQUENCE DESCRIPTION: SEQ ID NO:6:
GAATTCGGCA CGAGTGCAGG CGCGCGTC ATG GCT GCT TTG AGA CAG CCC CAG
Met Ala Ala Leu Arg Gln Pro Gln GTC GCG GAG CTG CTG GCC GAG GCC CGG CGA GCC TTC CGG GAG GAG TTC
Val Ala Glu Leu Leu Ala Glu Ala Arg Arg Ala Phe Arg Glu Glu Phe GGG GCC GAG CCC GAG CTG GCC GTG TCA GCG CCG GGC CGC GTC AAC CTC
Gly Ala Glu Pro Glu Leu Ala Val Ser Ala Pro Gly Arg Val Asn Leu ATC GGG GAA CAC ACG GAC TAC AAC CAG GGC CTG GTG CTG CCT ATG GCT
Ile Gly Glu His Thr Asp Tyr Asn Gln Gly Leu Val Leu Pro Met Ala CTG GAG CTC ATG ACG GTG CTG GTG GGC AGC CCC CGC AAG GAT GGG CTG
Leu Glu Leu Met Thr Val Leu Val Gly Ser Pro Arg Lys Asp Gly Leu GTG TCT CTC CTC ACC ACC TCT TAGGGTGCCG ATGAGCCCCA GCGGCTGCAG
Val Ser Leu Leu Thr Thr Ser TTTCCACTGC CCACAGCCCA GCGCTCGCTG GAGCCTGGGA CTCCTCGGTG GGCCAACTAT
GTCAAGGGAG TGATTCAGTA CTACCCAGCT GCCCCCCTCC CTGGCTTCAG TGCAGTGGTG
GTCAGCTCAG TGCCCCTGGG GGGTGGCCTG TCCAGCTCAG CATCCTTGGA AGTGGCCACG
TACACCTTCC TCCAGCAGCT CTGTCCAGAC TCGGGCACAA TAGCTGCCCG CGCCCAGGTG
TGTCAGCAGG CCGAGCACAG CTTCGCAGGG ATGCCCTGTG GCATCATGGA CCAGTTCATC
TCACTTATGG GACAGAAAGG CCACGCGCTG CTCATTGACT GCAGGTCCTT GGAGACCAGC
CTGGTGCCAC TCTCGGACCC CAAGCTGGCC GTGCTCATCA CCAACTCTAA TGTCCGCCAC
TCCCTGGCCT CCAGCGAGTA CCCTGTGCGG CGGCGCCAAT GTGAAGAAGT GGCCCGGGCG
CTGGGCAAGG AAAGCCTCCG GGAGGTACAA CTGGAAGAGC TAGAGGCTGC CAGGGACCTG
GTGAGCAAAG AGGGCTTCCG GCGGGCCCGG CACGTGGTGG GGGAGATTCG GCGCACGGCC
CAGGCAGCGG CCGCCCTGAG ACGTGGCGAC TACAGAGCCT TTGGCCGCCT CATGGTGGAG
AGCCACCGCT CACTCAGAGA CGACTATGAG GTGAGCTGCC CAGAGCTGGA CCAGCTGGTG
GAGGCTGCGC TTGCTGTGCC TGGGGTTTAT GGCAGCCGCA TGACGGGCGG TGGCTTCGGT
GGCTGCACGG TGACACTGCT GGAGGCCTCC GCTGCTCCCC ACGCCATGCG GCACATCCAG
GAGCACTACG GCGGGACTGC CACCTTCTAC CTCTCTCAAG CAGCCGATGG AGCCAAGGTG
CTGTGCTTGT GAGGCACCCC CAGGACAGCA CACGGTGAGG GTGCGGGGCC TGCAGGCCAG
TCCCACGGCT CTGTGCCCGG TGCCATCTTC CATATCCGGG TGCTCAATAA ACTTGTGCCT
CCAATGTGGA AAAAAAAAAA AAAAAAAACT CGAG
(2) INFORMATION FOR SEQ ID NO:7:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 7676 base pairs (B) TYPE: nucleic acid (C) STRANDEDNESS: double (D) TOPOLOGY: linear (ii) MOLECULE TYPE: DNA (genomic) (xi) SEQUENCE DESCRIPTION: SEQ ID NO:7:
CCGAGCATCC CGCGCCGACG GGTCTGTGCC GGAGCAGCTG TGCAGAGCTG CAGGCGCGCG
TCATGGCTGC TTTGAGACAG CCCCAGGTCG CGGAGCTGCT GGCCGAGGCC CGGCGAGCCT
TCCGGGAGGA GTTCGGGGCC GAGCCCGAGC TGGCCGTGTC AGCGCCGGGC CGCGTCAACC
TCATCGGGGA ACACACGGAC TACAACCAGG GCCTGGTGCT GCCTATGGTG AGGGGCTGCA
CGGGGAGCCC CTAGCCCGCC GCCGCCTGTC CCGGTCGCCG AGGAGGGCGG GCCTCGGGGA
CGCTGGGGGC GACl~lTCC CGCGGGAGAT GTGGGGCGGG CAGCTGCGCC TGGAGCACCG
GTGCACGGAA GAGTCCCCGG GACAGGCTGT TCCCCACGTT GGAAGGGAGG AAGCGAAGAA
GTGGTCCCCA GAGGGTGCGC GGCCGCCTCT TGGCTCAAGC CCGCCCTCTG GGGGCTGGGG
ClCC~CGC~l TCAACCTGGG AGCATGTTCC CCTTAAACTG TGAGGCCCTG TGTGCCACGC
AGAAGGGGAC ACTCCGCGCC TCCGGCCACC GTGGGGCCCC AACCGCAGAC CTGGGCGAAC
GTAGCCTTCT GGCCCAGCCC GTTCAATTTA CAGAGGAGGA AACTGAGGCC TAGAGAGGCC
CAGTGAACTG CTGGAGGTCA CACAGCAGGT TCTTGGCGGG GCTGCGACTT GGGAGTGAGG
ACTCCCAGCT TTCAGCGGGG GGCGCTTTCC GCCCCATCTG CAGCTTGGGG AGTGCACAGG
TACAGGATGT CCAGAGCCAC CCAAAATGTA AAGGCTTTGG AGCTCCAGTG Al~lG1l-lC
CCTTTGGGCT AAGCTCTCCC CCCTTGCCCC ACAGCTCAGG GCAGAGTCCA GGTCTGTGCT
CCAGCTGCAG CCGCCCCGCC CCTGAAGACC TAAGGGGGCA GGGCTCAAGC CCCCAAGGTC
AGCTGGCCCT CAGGATCTTC CCTGCGACGC TGAACCTGGA GGTTCAGAAC CTGATGACTG
TGGAGGCATC AGAACCTCGG CTGGAGGCAG TGTCATTGGA GAGGCTTACT CCAGCTGGCG
GAAGCCTCAC GTACTGCTTG TCTCTCCTGC CAGGCTCTGG AGCTCATGAC GGTGCTGGTG
GGCAGCCCCC GCAAGGATGG GCTGGTGTCT CTCCTCACCA CCTCTGAGGG TGCCGATGAG
CCCCAGCGGC TGCAGTTTCC ACTGCCCACA GCCCAGCGCT CGCTGGAGCC TGGGACTCCT
CGGTGGGCCA ACTATGTCAA GGGAGTGATT CAGTACTACC CAGGTATGGG GCCCAGGCCT
GAGCCAAGTC CTCACTGATA CTAGGAGTGC CACCTCACAG CCACAGAGCC CATTCATTTG
TCTGATACAC ~iGGGGAAG GCTTGTAGAG TGGAGCATCC CATTGTACAG ATGAGGAAAC
TGATGCCCCC AGAAGGTCGG GAACTTGCCC TGGGTTTCCC GTGACCTGAT TGGAGGAGCC
AGGATTTGAA CCCCAGCCTT TTTTCCCTCC AGAGCCCTAA ACCAGGAGGA CAATTAGAAG
TGTCCCAGCA ACCTCAGAGG GTGGGAAAAT GGAGGGGAGT GGGTCCCTTG GGCCAGCAGG
TTGGTGGGGT TCTTGACAAT TGAGACACAC ACCTAGAAAC AGTTGCTAGG CCGTTGCTGC
CCTTCCCGCC AGGACACCTG CCCTTCCTGT CCAATCCTCC CAGGCAGCCT CTCTTACCAT
CAC~lGi~Ci TTCCCCCTGC AGCTGCCCCC CTCCCTGGCT TCAGTGCAGT GGTGGTCAGC
TCAGTGCCCC TGGGGGGTGG CCTGTCCAGC TCAGCATCCT TGGAAGTGGC CACGTACACC
TTCCTCCAGC AGCTCTGTCC AGGTACCAGC TAGGCCCCAG CCCTGACCCA GCCCTCCTTC
CCTGAGGTCT CCAGGTGGTC CCAGCTTCTA CTATGCCTTA TGGAGGGGGT GGCAGGGAAT
CTCCCTGGAG TGTCATTGAA GCCACTGCTG CTTCCACCAG CCCTAGCCTC CCCACCTCAC
CCTGTACTGC AGACTCGGGC ACAATAGCTG CCCGCGCCCA GGTGTGTCAG CAGGCCGAGC
ACAGCTTCGC AGGGATGCCC TGTGGCATCA TGGACCAGTT CATCTCACTT ATGGGACAGA
AAGGCCACGC GClGC~CATT GACTGCAGGT TGGGCTCGCT CCCCTCGTCC CCTCCCGCCC
TGCACTCAGC AGCTCCTGGG TGGAGTGTGC CCACTGCCTG GCGCAGCAAG CACACGCTTG
GCClCClCAT C-CCCCCATT GTAACTCCAC CCCAGGTCCT TGGAGACCAG CCTGGTGCCA
CTCTCGGACC CCAAGCTGGC CGTGCTCATC ACCAACTCTA ATGTCCGCCA CTCCCTGGCC
TCCAGCGAGT ACCCl~lGCG GCGGCGCCAA TGTGAAGAAG TGGCCCGGGC GCTGGGCAAG
GAAAGCCTCC GGGAGGTACA ACTGGAAGAG CTAGAGGGTG AGAACTGCCA GGClGCl~lA
TCCTGGAGGC GG~lGlGClC CCTGCTGGCG CCTCAGTGTG GCCTTGACCC TGCCTGGGAC
CCCGATCTCC AGGGGCTTCT GCCATGCTCT CCCCAGTCCC TTCAAACACT GCGCACCCAG
GGTTCCAATC TCAGCAGGGG TGCTTGAAAT CCTAAAATGG TCTTATCTAA T~AG~AAAAT
CATGTTTCCA TTGTGGAAAA TGTAGAAAAG TACAAAGTAG AAAATAATAA GCTATAAGGG
CACTACCCAG AGATAGGCAC TGCTGACATT TTCACGTTTC CTTTCAGTAT TTTTCCACAT
~lGlC.lCAA AGCTGAGTAT ATGTAATATA TCATCACTTT CCCCCCCCAC CCCC
TTAAGAGGCA GGGlClCATT CTGTTGCCCA AGCTGGAGTG TAGTGGTGTG ATCATAGCTT
ACTGCAAACT TGAACTCTTG AGCTCAAGGG ATCCTCCCAG CTCAGCCTTC CAAGTAGCTG
AGATTACAGG TGTGCCACCA TGCCCGGCTA ATTTTTATCT TCGTAAAGAC GGCCTTGTAG
TGTTGCCCAG GATGATCCTG AACTCTGGCC TCAAGAGGTC CTCCTGCCTT GGGCTCCCAA
AGTGTTGGGA TTATAGGCAT GAGCCACTGC GGCCAGCCCA TTTGCCGTGT lllllllllG
GACACA~AGT TTCGGTCTTG TCACCCATGC TGGAGTGCAA TGGTGCGATC TCAGCTCACT
GTAACCTCTG CCTCCCGGGT TCAAGTGATT CTCCTGCCTC AGCCTCCCGA GTAGCTGGGA
CTACAGGCGC CCGCCACTAC GCCTGGCACA TTTTTTATAG TTCTAGTAGA GACTGGGGTT
TCACCATGTT GGCCAGGCTG GTCTCAAACG CCTGACCTCA GGTGATCCTC CCGCCTCAGC
CTTCCAAAGT GCTGGGATTA CAGGCGTGAG CCATAGTGCC GGTCTCTTTT
TTAAACTAAA CATAATCTCA GAACCCAGAA CCCTATCTTA TCTTATGCCA TGAAAGGCAT
ATCTCGGCGT GGCICl~ lllll CIllTllIll GGGCGAGGTG GAGGCTTGCC
~ lGCCCA GGCTGGAGTG CAGCGGCGCA ATCTCGGTTC ACTGCATCCT CCACCICClG
GGTCCAAATG ATCClCCrGC CTTAGCTTCC TGAGTAGGTG GGATTACTGG AACCCACCAC
CACGCCCAGC CAATTTTTAT ATTTTTAGTA GAGACGGGGT TTCATGTTGG CCAGGCTGGC
CTCGAACTCC TGACCTCGTG ATCTGCCCGC CTCAGCCTCC CAATGTGCTA GGATTACATG
TGTGAGCCAC TGCACCTGGC CTCCGTGTGG CTCTTTAAAG CTCCACAATA TTTTAGCATT
CAGGTGCTCT GTCATTTACT TAACTATTTT CTGATACACC TCACACTGCG ATTAACTTTC
CTTATTTATC TTTTTTATTA TTTATTTATT TATTTATTTG AGACAGAGTC TTGCTCTGTC
ACCCAGGCTG GAGTGCAGTG GCACGATCTC GGCTCACTGC AACCTCTGCC TCCCAGGTTC
AAGTGATTCT CCTGCCTCAG CCTCCTGAGT AGCTAGGATT AGAGGCATGT GCCACCACAC
CTGGCTAATC TTCGTATTTT TAGCAGAGAT GAGGTTTTAC CATGTTGGTC GGGCTGGTCG
TGAACCACTG TGCCTGGCCA lClIl,rlAT TTTTTAAAGA GATGGGTTCT GCTAAGTTGC
CCAGGCTGGA CCTGAACTCT TGGGCTCAAG TAATCTTCTC ACCTAGTCTC CTGGGTAGCT
GCAACCAAAG GCACCCGGTT TATCTGCATT ClC~ ll TCTTTGAGAC TGAG~CllGC
TCTGTAGCCC AGGCTGGAGC GCAGTGGCGT GATCTCGGCT CACTGCAACC TCCGTCTTCA
GGGTTCAAGC AAll~lCC-G CCTCAGCCTC TGGAGTGGCT GGGACTACAG GCGTGTGCCA
CCAGAGCGAG TTAATTTTTT .llllllllG TATTTTTAGT GGACACTGGG TTTCACTATA
TTGGCCAGGC TGG1Cll'GGA CTCCTGACCT CAAGTGATCC GCCTGCCTTG GCCTCCCAAA
ClGClGGGAT TACAGGCACA GGCGTGAGCC ACTACACCTG GCCTATCTGC A'll~lCllAA
TA6111C'llA GAAATGGATT CTTAGGAGTA GGATTACAGA GTCAAGA~.AC ACAAGTTTTG
TAGGCTGGGT GCGGlGGClC ACGTCTGTGC CTGTAATCCC AGTACTTTAG GAGGCCAAGG
TGGGCAGATT CATTGAGCTC AGGAATTCGA GACCAGCCTG GGCAACATGG CAAAACCCCA
TCTCTAAAGA AATACAAAAA TTAGCCAGGT GTGGTGGTGT GTGCCTGTAG TCCTAGCTAC
TTAGGAGGCT GGGGTGGGAG GATCAATTGA GCCCAGGAGG TTGAGACTGC AGTGAGCTGT
GATTGCACCA TGGCACTCCA GCCTGGGCCT CAAAGTGAGA TCCTGTCTCC AAAA~.~AAAA
AGATACAAGT ATCCTTAAGG CTCCTGCTAC ACATGGCCAG GAAGGTAGTC TATTGGACAG
TTTTAAGGTC ATTATCAATA TTAGCTCATT TAATTCCCTC CAAAACTCTG TAAAGCACAT
TCTGCTACCA TAGTTGTCAT ATTTTTGATG GGGGAATCTA CAGTGAGAGG CAGTGCTGGG
ATCTGAACCC CATCTGGACA GATTAGCTCC AGGGCCCATG CTCTTGACTG GCTGGCCGCG
CTGCCCACAC TGAGTTGTTC CTTCCTGGCA GGGTAGGTGT GCCTATCTCA GGGACACTAG
ACAGCTCCGA GGGACCTCCC TGTCClll~C ClllGlGAAC TGTGTCACGT TCTCCAGAGC
AGGGCTCAGA CCTGCCCTGC CTGCl~lGlG CAGATGCCCT TGGCCAAGGT TTTCACACTG
CAAGTTG GTCCCTCCTC CCCACCCCAG CCTGTCCTTG GCCCTCCTCC AGGlC1C~l L
CTG~ATAGGA GCAGCTCACC CTGCCTCCTC CAGAGTCCTG CCCTAGAAGC GCAATCCCTC
TCCTTCCATC CCCTGCCTGG CTGCCTGGCT CCTTCCCTCA GCCTCCAAGA CATGCTCAGT
TTTCTTCCCT CCTAA~ACAC CACCCACTGT CTCATTTCCA TTCATTTCTT lC111~11iC
111C11illl llllllGAGA GGGAGCCTCA CTCTGTCACC CAGGCTGAAG TGCAGTGGCA
TGATCTCCAC TCACTGCAAC CTCCGCCTCC CAGGTTCAAG CAATTCTCCT GCCTCAGCCT
CCTGAGTAGC TGGGATTACA GGCGCCTGCC ACGATGCCCG GCTAACTTTT GTATTTTTAG
TAGA~ACGGG G1l~CGCCAT GTTGGCCAGG CTGGTCTCGA GCTCCTGACC TCAGGCAATC
TGCCTGCCTC AGCTTCCCAA AGTGCTGGGA TTACAGGTGT GAGCCACCGC GCCCACCCAT
TCATTTCTCA GlCCl~GAA TCTACTTGCC CCTCCATCCC GCCATGCCAC CTACCCTAAC
AACCTTCCCC CTTAAACCTG CGGG~llGGC CGGGCGCAGT ACACTGAGTC AGTACTGGTA
CTGACCCAGG TACCCCTCCA GCCTCAGCTC CAGTCAGATG GGACAGCCTG ClGGICCClG
GClGCll~lG CCCCClCllC TGGAGCCCCA GCCCTGGAGG CTCCATGTGG CTCAGCAGAA
Cll~ CC TCCTGCTCTG TGGTGGCCTC TTGAGGGCAG CACTCACCTT GGAAAGCATG
GA~l~lllCA ACCCTCACTG CTCCCTGAAG GACCAAGGTG TCCCATTTTA CAGTCGGGGG
AGGAGGCACT GTGATAAAGG GGCTCTTCAG ACCCACGTCT GAGAGAGCCA GGCTGCGCCG
CCCCCGCGGC CTTCCACCCT TCACCGTCCA GCCAGGGCCA CTGCCATCAC CGCCTGCTGG
TCCTCACAGG CGTCGGGGCC CCAGGCAGTG AGAAGGCGGC TGCTGACTCC TCTTTCCTCC
CCAGCTGCCA GGGACCTGGT GAGCAAAGAG GGCTTCCGGC GGGCCCGGCA CGTGGTGGGG
GAGATTCGGC GCACGGCCCA GGCAGCGGCC GCCCTGAGAC GTGGCGACTA CAGAGCCTTT
GGCCGCCTCA TGGTGGAGAG CCACCGCTCA CTCAGGTGAG GCCCTCTGGG CGCCCCGCTC
CTGCCGGGCA CAGGCCGGCC CAGGCCCACC CCTTCAATAT CCTCTCTGCA G~CGACTA
TGAGGTGAGC TGCCCAGAGC TGGACCAGCT GGTGGAGGCT GCGCTTGCTG TGCCTGGGGT
TTATGGCAGC CGCATGACGG GCGGTGGCTT CGGTGGCTGC ACGGTGACAC TGCTGGAGGC
CTCCGCTGCT CCCCACGCCA TGCGGCACAT CCAGGTGGGC GGGCACCAGG GCCTGGGCGG
GCAGGAGCGG CAGCTTCCCG GGGCCCTGCC ACTCACCCCC AGCCCGCCTC TTACAGGAGC
ACTACGGCGG GACTGCCACC TTCTACCTCT CTCAAGCAGC CGATGGAGCC AAGGTGCTGT
GCTTGTGAGG CACCCCCAGG ACAGCACACG GTGAGGGTGC GGGGCCTGCA GGCCAGTCCC
ACGGCTCTGT GCCCGGTGCC ATCTTCCATA TCCGGGTGCT CAATAAACTT GTGCCTCCAA
TGTGGTACCT GCCTCCTCTA GAGGTGGGTG TATGCTTGGG TGTCAGAGAA TGGGGGATGT
CAGAACCGCT CCCCTACCCT AGGGGAGCAC CTCTCAGGCC CCAGAAGAAT GGGCAAGGCA
GGGCCTAGCA GTAGCAAAAC CATTTATTAA GTGCAGAACA AAGGCTGGGT CCTTGTGCTG
CTCCCAGCTC TTTGGTTACA AATAGGTTTG GGCCCACAGA GGACGGACCT TGCCCCCTTC
ATGCCTCCCA GGAGACACCT AGCCCCTGCT CTGTGCATGC GGGTGGGCTG GGCCCCCAGG
GGTGCAAGGA TGGAGTAGCT GAGGAGGCTC CGGGAGAGGA GTCGGGAGGA CGCCTAGTGG
GACATTGCGG GGGTGGCGCA GGGTGCGGTC AAGTTTGGAA GAAACTGTTG GGTCCA
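The genomic listing above carries no exon annotation in this excerpt. As a hedged illustration of how a reading frame can be checked against it, the sketch below assumes that the first ATG in the line beginning `TCATGGCTGC` is the translation start and that the first exon ends where the `GTGAGG` motif (a GT splice-donor-like sequence) begins two lines later. Both are assumptions for illustration, not statements from the patent. The helper names `clean` and `translate` are hypothetical; the code uses the standard genetic code and refuses OCR debris such as `~` or `l` rather than guessing the intended base.

```python
from itertools import product

# Standard genetic code; codons enumerated in TCAG order (first base varies slowest).
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product("TCAG", repeat=3), AMINO)}

# Three consecutive lines copied from the genomic listing above.
EXCERPT = (
    "TCATGGCTGC TTTGAGACAG CCCCAGGTCG CGGAGCTGCT GGCCGAGGCC CGGCGAGCCT"
    "TCCGGGAGGA GTTCGGGGCC GAGCCCGAGC TGGCCGTGTC AGCGCCGGGC CGCGTCAACC"
    "TCATCGGGGA ACACACGGAC TACAACCAGG GCCTGGTGCT GCCTATGGTG AGGGGCTGCA"
)


def clean(seq: str) -> str:
    """Remove whitespace, upper-case, and refuse OCR debris instead of guessing bases."""
    s = "".join(seq.split()).upper()
    bad = set(s) - set("ACGT")
    if bad:
        raise ValueError(f"non-nucleotide characters: {sorted(bad)}")
    return s


def translate(cds: str) -> str:
    """Translate an in-frame coding sequence codon by codon (stop codons shown as '*')."""
    if len(cds) % 3:
        raise ValueError("coding sequence length is not a multiple of 3")
    return "".join(CODON_TABLE[cds[i:i + 3]] for i in range(0, len(cds), 3))


dna = clean(EXCERPT)
start = dna.find("ATG")             # assumption: first ATG in the excerpt is the start codon
donor = dna.find("GTGAGG", start)   # assumption: exon ends where this donor-like motif begins
if start < 0 or donor < 0:
    raise ValueError("start codon or donor motif not found")
exon1_cds = dna[start:donor]
print(len(exon1_cds), translate(exon1_cds))
```

Under these assumptions the frame is 165 nt (55 codons) with no internal stop codon and translates to `MAALRQPQVAELLAEARRAFREEFGAEPELAVSAPGRVNLIGEHTDYNQGLVLPM`. Rejecting non-ACGT characters matters here because many listing lines contain OCR damage that should not be silently read as bases.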
(2) INFORMATION FOR SEQ ID NO:8:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 21 base pairs (B) TYPE: nucleic acid (C) STRANDEDNESS: single (D) TOPOLOGY: linear (ii) MOLECULE TYPE: DNA (genomic) (xi) SEQUENCE DESCRIPTION: SEQ ID NO:8:
AGCCTTCCGG GAGGAGTTCG G
(2) INFORMATION FOR SEQ ID NO:9:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 21 base pairs (B) TYPE: nucleic acid (C) STRANDEDNESS: single (D) TOPOLOGY: linear (ii) MOLECULE TYPE: DNA (genomic) (xi) SEQUENCE DESCRIPTION: SEQ ID NO:9:
CTGGTTGTAG TCCGTGTGTT C
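SEQ ID NO:8 occurs verbatim in the genomic listing, spanning the lines that begin `TCATGGCTGC` and `TCCGGGAGGA`. The reverse complement of SEQ ID NO:9 occurs in the line beginning `TCATCGGGGA`. This placement is consistent with the two 21-mers acting as a forward/reverse PCR pair, but the listing itself does not state how the primers are paired. The following minimal sketch assumes that pairing. It locates both primers by exact matching on the listed strand and reports the expected product size. The helpers `revcomp` and `amplicon` are hypothetical names.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of an upper-case A/C/G/T string."""
    return seq.translate(COMPLEMENT)[::-1]


def amplicon(reference: str, fwd: str, rev: str) -> tuple[int, int, int]:
    """Return (start, end_exclusive, length) of the predicted product on `reference`.

    The forward primer is matched as written; the reverse primer is matched
    through its reverse complement. Exact matches only (no mismatch tolerance).
    """
    start = reference.find(fwd)
    rc_pos = reference.find(revcomp(rev))
    if start < 0 or rc_pos < 0:
        raise ValueError("primer not found on reference")
    end = rc_pos + len(rev)
    if end <= start:
        raise ValueError("primers do not face each other")
    return start, end, end - start


# Three consecutive lines copied from the genomic listing above.
REFERENCE = "".join((
    "TCATGGCTGC TTTGAGACAG CCCCAGGTCG CGGAGCTGCT GGCCGAGGCC CGGCGAGCCT"
    "TCCGGGAGGA GTTCGGGGCC GAGCCCGAGC TGGCCGTGTC AGCGCCGGGC CGCGTCAACC"
    "TCATCGGGGA ACACACGGAC TACAACCAGG GCCTGGTGCT GCCTATGGTG AGGGGCTGCA"
).split())
FWD = "".join("AGCCTTCCGG GAGGAGTTCG G".split())  # SEQ ID NO:8
REV = "".join("CTGGTTGTAG TCCGTGTGTT C".split())  # SEQ ID NO:9

print(amplicon(REFERENCE, FWD, REV))
```

For this excerpt the function returns `(55, 149, 94)`: a 94 bp product, with coordinates relative to the three copied lines rather than the full listing. Exact string matching keeps the sketch simple. A practical primer check would also tolerate mismatches and search the whole sequence for off-target sites.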
(2) INFORMATION FOR SEQ ID NO:10:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 21 base pairs (B) TYPE: nucleic acid (C) STRANDEDNESS: single (D) TOPOLOGY: linear (ii) MOLECULE TYPE: DNA (genomic) (xi) SEQUENCE DESCRIPTION: SEQ ID NO:10:
GCCAGCAGCT CCGCGACCTG G
(2) INFORMATION FOR SEQ ID NO:11:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 21 base pairs (B) TYPE: nucleic acid (C) STRANDEDNESS: single (D) TOPOLOGY: linear (ii) MOLECULE TYPE: DNA (genomic) (xi) SEQUENCE DESCRIPTION: SEQ ID NO:11:
GCTTCCTCCC TTCCAACGTG G
(2) INFORMATION FOR SEQ ID NO:12:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 21 base pairs (B) TYPE: nucleic acid (C) STRANDEDNESS: single (D) TOPOLOGY: linear (ii) MOLECULE TYPE: DNA (genomic) (xi) SEQUENCE DESCRIPTION: SEQ ID NO:12:
CCCAGGCTCC AGCGAGCGCT G
(2) INFORMATION FOR SEQ ID NO:13:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 21 base pairs (B) TYPE: nucleic acid (C) STRANDEDNESS: single (D) TOPOLOGY: linear (ii) MOLECULE TYPE: DNA (genomic) (xi) SEQUENCE DESCRIPTION: SEQ ID NO:13:
ACCTCTGAGG GTGCCGATGA G
(2) INFORMATION FOR SEQ ID NO:14:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 21 base pairs (B) TYPE: nucleic acid (C) STRANDEDNESS: single (D) TOPOLOGY: linear (ii) MOLECULE TYPE: DNA (genomic) (xi) SEQUENCE DESCRIPTION: SEQ ID NO:14:
CCCACAGCTC AGGGCAGAGT C
(2) INFORMATION FOR SEQ ID NO:15:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 21 base pairs (B) TYPE: nucleic acid (C) STRANDEDNESS: single (D) TOPOLOGY: linear (ii) MOLECULE TYPE: DNA (genomic) (xi) SEQUENCE DESCRIPTION: SEQ ID NO:15:
(2) INFORMATION FOR SEQ ID NO:16:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 21 base pairs (B) TYPE: nucleic acid (C) STRANDEDNESS: single (D) TOPOLOGY: linear (ii) MOLECULE TYPE: DNA (genomic) (xi) SEQUENCE DESCRIPTION: SEQ ID NO:16:
GATGAACTGG TCCATGATGC C
(2) INFORMATION FOR SEQ ID NO:17:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 21 base pairs (B) TYPE: nucleic acid (C) STRANDEDNESS: single (D) TOPOLOGY: linear (ii) MOLECULE TYPE: DNA (genomic) (xi) SEQUENCE DESCRIPTION: SEQ ID NO:17:
AGGGGCACTG AGCTGACCAC C
(2) INFORMATION FOR SEQ ID NO:18:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 21 base pairs (B) TYPE: nucleic acid (C) STRANDEDNESS: single (D) TOPOLOGY: linear (ii) MOLECULE TYPE: DNA (genomic) (xi) SEQUENCE DESCRIPTION: SEQ ID NO:18:
CACTTCTACA CATTGGCGCC G
(2) INFORMATION FOR SEQ ID NO:19:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 21 base pairs (B) TYPE: nucleic acid (C) STRANDEDNESS: single (D) TOPOLOGY: linear (ii) MOLECULE TYPE: DNA (genomic) (xi) SEQUENCE DESCRIPTION: SEQ ID NO:19:
CTTCGCAGGG ATGCCCTGTG G
(2) INFORMATION FOR SEQ ID NO:20:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 21 base pairs (B) TYPE: nucleic acid (C) STRANDEDNESS: single (D) TOPOLOGY: linear (ii) MOLECULE TYPE: DNA (genomic) (xi) SEQUENCE DESCRIPTION: SEQ ID NO:20:
TCATCACCAA CTCTAATGTC C
(2) INFORMATION FOR SEQ ID NO:21:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 21 base pairs (B) TYPE: nucleic acid (C) STRANDEDNESS: single (D) TOPOLOGY: linear (ii) MOLECULE TYPE: DNA (genomic) (xi) SEQUENCE DESCRIPTION: SEQ ID NO:21:
TGTCAGCAGT GCCTATCTCT G
(2) INFORMATION FOR SEQ ID NO:22:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 21 base pairs (B) TYPE: nucleic acid (C) STRANDEDNESS: single (D) TOPOLOGY: linear (ii) MOLECULE TYPE: DNA (genomic) (xi) SEQUENCE DESCRIPTION: SEQ ID NO:22:
AGCAGCGGAG GCCTCCAGCA G
(2) INFORMATION FOR SEQ ID NO:23:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 21 base pairs (B) TYPE: nucleic acid (C) STRANDEDNESS: single (D) TOPOLOGY: linear (ii) MOLECULE TYPE: DNA (genomic) (xi) SEQUENCE DESCRIPTION: SEQ ID NO:23:
CCTCACCGTG TGCTGTCCTG G
(2) INFORMATION FOR SEQ ID NO:24:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 21 base pairs (B) TYPE: nucleic acid (C) STRANDEDNESS: single (D) TOPOLOGY: linear (ii) MOLECULE TYPE: DNA (genomic) (xi) SEQUENCE DESCRIPTION: SEQ ID NO:24:
GGCTGCGCTT GCTGTGCCTG G
(2) INFORMATION FOR SEQ ID NO:25:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 21 base pairs (B) TYPE: nucleic acid (C) STRANDEDNESS: single (D) TOPOLOGY: linear (ii) MOLECULE TYPE: DNA (genomic) (xi) SEQUENCE DESCRIPTION: SEQ ID NO:25:
CCTCACCGTG TGCTGTCCTG G
(2) INFORMATION FOR SEQ ID NO:26:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 21 base pairs (B) TYPE: nucleic acid (C) STRANDEDNESS: single (D) TOPOLOGY: linear (ii) MOLECULE TYPE: DNA (genomic) (xi) SEQUENCE DESCRIPTION: SEQ ID NO:26:
CCTCACCGTG TGCTGTCCTG G
(2) INFORMATION FOR SEQ ID NO:27:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 21 base pairs (B) TYPE: nucleic acid (C) STRANDEDNESS: single (D) TOPOLOGY: linear (ii) MOLECULE TYPE: DNA (genomic) (xi) SEQUENCE DESCRIPTION: SEQ ID NO:27:
GCGGGACTGC CACCTTCTAC C
(2) INFORMATION FOR SEQ ID NO:28:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 21 base pairs (B) TYPE: nucleic acid (C) STRANDEDNESS: single (D) TOPOLOGY: linear (ii) MOLECULE TYPE: DNA (genomic) (xi) SEQUENCE DESCRIPTION: SEQ ID NO:28:
CTCAATAAAC TTGTGCCTCC A
(2) INFORMATION FOR SEQ ID NO:29:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 23 base pairs (B) TYPE: nucleic acid (C) STRANDEDNESS: single (D) TOPOLOGY: linear (ii) MOLECULE TYPE: DNA (genomic) (xi) SEQUENCE DESCRIPTION: SEQ ID NO:29:
CGGATATGGA AGATGGCACC GGG
(2) INFORMATION FOR SEQ ID NO:30:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 22 base pairs (B) TYPE: nucleic acid (C) STRANDEDNESS: single (D) TOPOLOGY: linear (ii) MOLECULE TYPE: DNA (genomic) (xi) SEQUENCE DESCRIPTION: SEQ ID NO:30:
AGAGCTGCAG GCGCGCGTCA TG
(2) INFORMATION FOR SEQ ID NO:31:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 19 base pairs (B) TYPE: nucleic acid (C) STRANDEDNESS: single (D) TOPOLOGY: linear (ii) MOLECULE TYPE: DNA (genomic) (xi) SEQUENCE DESCRIPTION: SEQ ID NO:31:
CCGAGCATCC CGCGCCGAC
(2) INFORMATION FOR SEQ ID NO:32:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 20 base pairs (B) TYPE: nucleic acid (C) STRANDEDNESS: single (D) TOPOLOGY: linear (ii) MOLECULE TYPE: DNA (genomic)
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:32:
CAGCTGCCCG CCCCACATCT
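The listing gives primer lengths but no melting or annealing temperatures. For a quick comparison of the oligonucleotides above, the following sketch computes GC content and a Wallace-rule estimate, Tm ≈ 2(A+T) + 4(G+C) °C. This is a swapped-in rule of thumb, not a method from the patent. It is intended for short oligos and is only rough at 19 to 23 nt; nearest-neighbor thermodynamic models are more accurate. The subset of SEQ IDs and the helper names `normalize`, `gc_fraction` and `wallace_tm` are illustrative choices.

```python
# SEQ ID NO -> sequence exactly as printed in the listing (10-nt blocks).
PRIMERS = {
    8: "AGCCTTCCGG GAGGAGTTCG G",
    9: "CTGGTTGTAG TCCGTGTGTT C",
    28: "CTCAATAAAC TTGTGCCTCC A",
    29: "CGGATATGGA AGATGGCACC GGG",
    32: "CAGCTGCCCG CCCCACATCT",
}


def normalize(seq: str) -> str:
    """Drop the block spacing used in the listing and upper-case the bases."""
    return "".join(seq.split()).upper()


def gc_fraction(seq: str) -> float:
    """Fraction of G + C bases."""
    s = normalize(seq)
    return (s.count("G") + s.count("C")) / len(s)


def wallace_tm(seq: str) -> int:
    """Wallace rule: 2 degrees C per A/T plus 4 degrees C per G/C (rough estimate)."""
    s = normalize(seq)
    return 2 * (s.count("A") + s.count("T")) + 4 * (s.count("G") + s.count("C"))


for seq_id, raw in PRIMERS.items():
    s = normalize(raw)
    print(f"SEQ ID NO:{seq_id}: {len(s)} nt, GC {100 * gc_fraction(s):.1f}%, "
          f"Wallace Tm ~{wallace_tm(s)} C")
```

For example, SEQ ID NO:8 has 14 G/C and 7 A/T bases, which gives about 70 °C. SEQ ID NO:28 has 9 G/C and 12 A/T bases, which gives about 60 °C.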
Claims (21)
1. An isolated nucleic acid molecule encoding human genomic galactokinase, said nucleic acid molecule selected from the group consisting of:
(a) a nucleic acid molecule comprising the sequence as set forth in SEQ ID NO:7; and
(b) a nucleic acid molecule differing from the nucleic acid molecule of (a) in codon sequence due to the degeneracy of the genetic code.
2. A vector comprising the nucleic acid molecule of claim 1.
3. A recombinant host cell comprising the vector of claim 2.
4. An isolated nucleic acid molecule comprising a DNA sequence that encodes nucleotides 29 to 1204 of SEQ ID NO:5 or nucleotides 29 to 265 of SEQ ID NO:6.
5. A vector comprising the nucleic acid molecule of claim 4.
6. The vector according to claim 5 which is a plasmid.
7. A recombinant host cell comprising the vector of claim 5.
8. A process for preparing a human galactokinase protein comprising culturing the recombinant host cell of claim 7 under conditions promoting expression of said protein and recovery thereof.
9. An isolated protein encoded by the DNA sequence of claim 4.
10. A monoclonal antibody that is specifically reactive with the protein of claim 9.
11. A method for diagnosing conditions associated with human galactokinase deficiency which comprises isolating a serum or tissue sample from an individual;
allowing such sample to come in contact with an antibody or antibody fragment which specifically binds to the human galactokinase protein of claim 9 under conditions such that an antigen-antibody complex is formed between said antibody or antibody fragment and said galactokinase protein; and detecting the presence or absence of said complex.
12. A method for diagnosing conditions associated with human galactokinase deficiency which comprises isolating a nucleic acid sample from an individual; assaying said sample and the DNA sequence, or corresponding RNA sequence, that encodes a human galactokinase gene; and comparing differences between said sample and said DNA (or RNA) that encodes nucleotides 29 to 1204 of SEQ ID NO:4, wherein said differences indicate mutations in the human galactokinase gene.
13. The method of claim 12 wherein said sample is RNA which is subsequently amplified by PCR-RT.
14. The method of claim 13 wherein assaying said sample comprises a restriction endonuclease digestion.
15. The method of claim 14 wherein said restriction endonuclease is Msc I.
16. The method of claim 12 wherein assaying said sample comprises a hybridization assay.
17. The method of claim 16 wherein the hybridization assay is heteroduplex electrophoresis which comprises determining differential mobility of heteroduplex products in polyacrylamide gels, said heteroduplex products are the result of hybridization between the nucleic acid sample and the DNA sequence, or corresponding RNA sequence, that encodes nucleotides 29 to 1204 of SEQ ID NO:4.
18. The method of claim 12 wherein assaying said sample comprises gel electrophoresis of restriction fragment length polymorphisms of said nucleic acid sample and the DNA sequence, or corresponding RNA sequence, that encodes nucleotides 29 to 1204 of SEQ ID NO:4.
19. The method of claim 12 wherein assaying said sample comprises DNA sequencing.
20. A method for diagnosing conditions associated with human galactokinase deficiency which comprises isolating cells from an individual containing genomic DNA and assaying said sample by in situ hybridization using the DNA sequence that encodes nucleotides 29 to 1204 of SEQ ID NO:4, nucleotides 29 to 1204 of SEQ ID NO:5, or nucleotides 29 to 265 of SEQ ID NO:6; or a fragment that encodes at least one exon of said sequence; or a fragment containing at least 15 contiguous base pairs of said sequence as a probe.
21. A transgenic non-human mammal capable of expressing in any cell thereof the DNA of claim 4.
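Claims 14, 15 and 18 recite restriction endonuclease digestion with Msc I and comparison of restriction fragment lengths. This section does not identify which MscI site or which mutation is diagnostic. Within that limit, the sketch below predicts blunt-end fragment sizes for a complete MscI digest (recognition site TGG^CCA) of one line copied from the genomic listing. It then compares them with a hypothetical variant in which one base of the site is changed. That substitution is invented for illustration and is not a mutation reported in the patent; the helper name `msci_fragments` is likewise hypothetical.

```python
MSCI_SITE = "TGGCCA"  # MscI recognition sequence
MSCI_CUT = 3          # blunt cut between TGG and CCA


def msci_fragments(seq: str) -> list[int]:
    """Predicted fragment lengths (bp) for complete MscI digestion of a linear sequence.

    TGGCCA is its own reverse complement, so scanning one strand finds every site.
    """
    s = "".join(seq.split()).upper()
    cuts = []
    pos = s.find(MSCI_SITE)
    while pos != -1:
        cuts.append(pos + MSCI_CUT)
        pos = s.find(MSCI_SITE, pos + 1)
    bounds = [0] + cuts + [len(s)]
    return [b - a for a, b in zip(bounds, bounds[1:])]


# One line of the genomic listing above; it contains a single TGGCCA site.
reference = "".join(
    "TCAGTGCCCC TGGGGGGTGG CCTGTCCAGC TCAGCATCCT TGGAAGTGGC CACGTACACC".split()
)
# Hypothetical C->T substitution at index 49 (inside the site); illustration only.
variant = reference[:49] + "T" + reference[50:]

print(msci_fragments(reference))
print(msci_fragments(variant))
```

The reference line yields fragments of 49 and 11 bp, and the hypothetical variant, which lacks the site, stays uncut at 60 bp. A gel would show this difference in the way claim 18 describes. In practice the digest would be run on an amplified region of the sample rather than on a single listing line.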
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CA002200583A CA2200583A1 (en) | 1994-09-23 | 1995-05-26 | Human galactokinase gene |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
USPCT/US94/10825 | 1994-09-23 | ||
CA002200583A CA2200583A1 (en) | 1994-09-23 | 1995-05-26 | Human galactokinase gene |
Publications (1)
Publication Number | Publication Date |
---|---|
CA2200583A1 (en) | 1996-03-28 |
Family
ID=4160211
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CA002200583A Abandoned CA2200583A1 (en) | 1994-09-23 | 1995-05-26 | Human galactokinase gene |
Country Status (1)
Country | Link |
---|---|
CA (1) | CA2200583A1 (en) |
1995
- 1995-05-26: CA002200583A (CA2200583A1), Canada; status: abandoned
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US8394932B2 (en) | Survival motor neurons (SMN) gene: a gene for spinal muscular atrophy | |
JP4285771B2 (en) | DB, leptin receptor, nucleic acid encoding the receptor, and uses thereof | |
EP0828003A2 (en) | Human serine protease | |
CN107602690B (en) | Pulmonary arterial hypertension related PTGIS gene mutation and application thereof | |
JPH10511936A (en) | Human somatostatin-like receptor | |
US5789223A (en) | Human galactokinase gene | |
Joutel et al. | A human homolog of bacterial acetolactate synthase genes maps within the CADASIL critical region | |
CA2388363C (en) | Dna polymerase lambda and uses thereof | |
US5830649A (en) | Human galactokinase gene | |
US20020146772A1 (en) | Methods and materials relating to novel CD39-like polypeptides | |
Ai et al. | Mouse galactokinase: isolation, characterization, and location on chromosome 11. | |
EP0783567A1 (en) | Human galactokinase gene | |
US20050032155A1 (en) | Mutation in the beta2 nicotinic acetycholine receptor subunit associated with nocturnal frontal lobe epilepsy | |
CA2200583A1 (en) | Human galactokinase gene | |
US5721113A (en) | NERF genes | |
MXPA97002205A (en) | Gene of galactocinasa hum | |
US20030022311A1 (en) | Human CIS protein | |
US5916758A (en) | Smooth muscle cell-derived migration factor | |
WO1997044347A1 (en) | Human cis protein | |
CN100478355C (en) | New human protein with mouse NIH/3T3 cell transformation improving function and its code sequence | |
JPH10127296A (en) | Ext2 gene | |
WO1997020573A1 (en) | Growth factor receptor-binding protein 2 homolog | |
Schlüter et al. | An 87 bp deletion in exon 5 of the LDL receptor gene in a mother and her son with familial hypercholesterolemia | |
US7700748B2 (en) | VMGLOM gene and its mutations causing disorders with a vascular component | |
CN1351081A (en) | Human protein with cancer cell growth suppressing function and its coding sequence |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
FZDE | Discontinued |