IDENTIFICATION OF VARIANTS IN HISTONE DEACETYLASE 1 (HDAC1) TO PREDICT DRUG RESPONSE
FIELD OF THE INVENTION
[01] This invention relates generally to the analytical testing of tissue samples in vitro, and more particularly to aspects of mutations and genetic polymorphisms in genes coding for kinase receptors and associated pathways that may predict drug response and connote new drug targets.
BACKGROUND OF THE INVENTION
[02] There are many published reports on the use of denaturing high pressure liquid chromatography (DHPLC) and other methods (for example, sequencing) for screening for mutations in human tumours. Due to the increased sensitivity of the DHPLC approach, many more mutations are being identified than previously reported in the literature.
[03] Based upon available public information, these types of structural DNA changes may directly or indirectly affect activity of the receptor and the downstream pathways, and may alter the receptor drug binding affinity and drug effect. Knowledge of the mutations prior to drug treatment allows for prognostic and predictive biomarker and diagnostic development. The development of these mutations during drug treatment may serve as a clinical biomarker for drug resistance. Since these mutant and polymorphic targets may be functionally different from the wild-type target, they may represent new drug targets. Therefore, mutant and polymorphic forms may be used in compound screening and drug development, including patient selection for drug treatment or drug dosing.
[04] Histone deacetylase 1 (HDAC1) expression and function are related to various types of cancers. For example, HDAC1 is up-regulated in hormone refractory (HR) prostate cancer. Halkidou K et al, Prostate 59(2):177-89 (May 1, 2004). Increased expression of HDAC1 mRNA and protein has been detected in gastric tumour tissues. Kim JH et al, J. Gastroenterol Hepatol. 19(2):218-24 (February 2004); Choi JH et al, Jpn. J. Cancer Res. 92(12):1300-4 (December 2001). BRCA1, which is related to breast and ovarian cancer, interacts with components of the histone deacetylase complex including HDAC1. Yarden RI et al, Proc. Natl Acad. Sci. USA 96(9):4983-8 (April 27, 1999). In hepatocellular carcinoma (HCC) cells, expression of IGFBP-3, which induces apoptosis, is greatly up-regulated by treatment with the HDAC1 inhibitor TSA. Gray SG et al, Int. J. Mol. Med. 5(1):33-41 (January 2000). HDAC1 inhibition partially reverses AML1/ETO-mediated transcriptional repression in t(8;21) acute myeloid leukaemia (AML). Wang J et al, Cancer Res. 59(12):2766-9 (June 15, 1999). However, little information on HDAC1 mutation has been reported in the literature.
[05] Accordingly, there is a need in the art for additional information about the relationship between HDAC1 mutations and cancers.
SUMMARY OF THE INVENTION
[06] Several previously unidentified mutations have been found in the HDAC domain of the HDAC1 genes of cancer patients. Four missense mutations (M51L (SEQ ID NO:6), Q111K (SEQ ID NO:14), T114A (SEQ ID NO:12) and V157G (SEQ ID NO:16)) and a truncation mutation (R34Term (SEQ ID NO:8)) were identified.
[07] Accordingly, the invention provides nucleotides that contain a sequence selected from
SEQ ID NO:6, SEQ ID NO:14, SEQ ID NO:12, SEQ ID NO:16 or SEQ ID NO:8.
[08] In another embodiment, the nucleotides encode HDAC1 polypeptides. In another embodiment, the invention provides the HDAC1 polypeptides themselves. The invention further provides vectors and organisms containing the polynucleotides or polypeptides of the invention. The invention also provides methods of using the polynucleotides, vectors, organisms and polypeptides of the invention.
[09] In still another embodiment, the invention provides for the use of an anti-cancer agent in the manufacture of a medicament for the treatment of a disease associated with a HDAC1 mutation in a selected patient population, wherein the patient population is selected on the basis of the genotype of the patients at a HDAC1 genetic locus indicative of a propensity for having a disease associated with a HDAC1 mutation.
[10] In yet another embodiment, the invention provides a method for treating a cancer using patient stratification. The invention also provides a method for diagnosing cancer based upon mutations in the HDAC1 gene.
[11] The invention also provides clinical assays, kits and reagents. The kits of the invention may contain a written product on or in the kit container. The written product describes how to use the reagents contained in the kit, e.g., to determine whether a patient has a single nucleotide polymorphism (SNP) and/or mutation pattern indicative of the efficacy of anticancer agents in treating the patient's cancer. In several embodiments, the use of the reagents can be according to the methods of the invention. In one embodiment, the reagent is a gene chip for determining the gene expression of relevant genes.
BRIEF DESCRIPTION OF THE DRAWINGS
[12] The drawing figures depict preferred embodiments by way of example, not by way of limitation. In the figures, like reference numerals refer to the same or similar elements.
[13] FIG. 1 is a sequence alignment of the human HDAC1 sequence with the Pfam model of the HDAC domain. Mutated positions are highlighted in red.
[14] FIG. 2 is a sequence alignment of known wild-type HDAC1 sequences from various organisms. For mutated positions (numbered with reference to human HDAC1), residues matching the wild-type human HDAC1 are highlighted in red, and residues matching the mutant human HDAC1 are in blue.
[15] FIG. 3 is a list of protein sequences showing secondary structure prediction of wild-type and mutant human HDAC1.
DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS
[16] In "in silico molecular epidemiology", gene and cancer relationships are identified from biomedical literature. The resulting information is then used for patient stratification in biomarker studies.
[17] A simple scoring is used to evaluate the relationship between a gene and a cancer type. Briefly, abstracts of publications describing a gene of interest are obtained, usually electronically. If a gene symbol or its alias and a cancer type name appear in the same sentence in the abstract, the pair is scored 1, otherwise 0. The procedure is repeated for every gene of interest. In the end, scores for every gene are combined to generate a matrix of gene and cancer correlation, which is subject to clustering analysis. This is the initial step of mining gene/cancer relationships from the literature in a high-throughput manner. More sophisticated scoring schemes are implemented to improve the accuracy and sensitivity.
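The simple scoring of paragraph [17] can be sketched as follows. This is a minimal illustration in Python and not the implementation used in the specification: the alias table, the cancer-type list, the naive sentence splitter and the two example abstracts are hypothetical, and combining per-abstract scores by summation is an assumption, since the text does not state how scores are combined.

```python
import re

# Hypothetical alias table and cancer-type list (illustration only).
GENE_ALIASES = {
    "HDAC1": ["HDAC1", "histone deacetylase 1"],
    "BRCA1": ["BRCA1"],
}
CANCER_TYPES = ["prostate cancer", "breast cancer"]


def split_sentences(abstract):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", abstract.strip()) if s]


def mentions(sentence, term):
    """Case-insensitive match of a whole term (not part of a longer word)."""
    pattern = r"(?<!\w)" + re.escape(term) + r"(?!\w)"
    return re.search(pattern, sentence, re.IGNORECASE) is not None


def score_abstract(abstract, gene, cancer):
    """Return 1 if an alias of `gene` and `cancer` share one sentence, else 0."""
    for sentence in split_sentences(abstract):
        gene_hit = any(mentions(sentence, alias) for alias in GENE_ALIASES[gene])
        if gene_hit and mentions(sentence, cancer):
            return 1
    return 0


def build_matrix(abstracts):
    """Gene x cancer matrix; per-abstract scores are summed (an assumption)."""
    return {
        gene: {
            cancer: sum(score_abstract(a, gene, cancer) for a in abstracts)
            for cancer in CANCER_TYPES
        }
        for gene in GENE_ALIASES
    }


# Hypothetical abstracts, written for this example.
ABSTRACTS = [
    "HDAC1 is up-regulated in hormone refractory prostate cancer. "
    "Further work is needed.",
    "BRCA1 interacts with the histone deacetylase complex. "
    "Breast cancer risk was not assessed.",
]

matrix = build_matrix(ABSTRACTS)
```

In the second example abstract, BRCA1 and breast cancer appear in different sentences, so the pair scores 0 under the same-sentence rule. The resulting matrix could then be passed to any clustering routine.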
[18] In one embodiment, the scoring scheme to evaluate a gene-to-gene or gene-to-cancer relationship is as follows:
Score=1, if two terms appear in the same abstract
Score=2, if two terms appear in the same sentence
Score=3, if two terms appear in the same sentence and with any keyword starting with: interact, associat, correlat, or mutat
Score=4, if two terms appear in the same sentence and with any keyword starting with: inhibit, downregulat, down-regulat, deactivat, de-activat, repress, dephosphorylat, de-phosphorylat, or decrease
Score=5, if two terms appear in the same sentence and with any keyword starting with: stimulat, upregulat, up-regulat, activat, phosphorylat, enhance, increase, induc, overexpress, or over-express
If multiple conditions are valid at one time, the highest possible score is assigned.
[19] DHPLC Mutational Analysis. DNA mutational analysis was performed as follows: Regions of the HDAC1 gene were amplified by PCR, including all HDAC1 exons and splice junctions. This was followed by quality control of PCR amplification by agarose gel electrophoresis. Next, WAVE® DHPLC mutation scanning (Transgenomic, Inc., Omaha, Nebraska, USA) of HDAC1 exons was performed. HDAC1 variants were confirmed by DNA sequencing.
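The graded scoring scheme of paragraph [18] can be sketched as follows. This is a minimal illustration in Python under stated assumptions, not the implementation used in the specification: terms are matched by simple case-insensitive substring search, sentences are split naively on end punctuation, a score of 0 is returned when the two terms do not share an abstract (the scheme does not define this case), and the example sentences in the usage note are hypothetical.

```python
import re

# Keyword prefixes per score level, highest level first, as listed in [18].
LEVELS = [
    (5, ("stimulat", "upregulat", "up-regulat", "activat", "phosphorylat",
         "enhance", "increase", "induc", "overexpress", "over-express")),
    (4, ("inhibit", "downregulat", "down-regulat", "deactivat", "de-activat",
         "repress", "dephosphorylat", "de-phosphorylat", "decrease")),
    (3, ("interact", "associat", "correlat", "mutat")),
]


def split_sentences(abstract):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", abstract.strip()) if s]


def contains(text, term):
    """Simplified term matching: case-insensitive substring search."""
    return term.lower() in text.lower()


def words(sentence):
    """Lower-case word tokens; hyphens are kept so 'up-regulated' is one token."""
    return re.findall(r"[a-z][a-z-]*", sentence.lower())


def score_pair(abstract, term_a, term_b):
    """Score one abstract for a pair of terms; the highest valid score wins."""
    if not (contains(abstract, term_a) and contains(abstract, term_b)):
        return 0  # assumption: the scheme does not define this case
    best = 1  # both terms occur somewhere in the abstract
    for sentence in split_sentences(abstract):
        if contains(sentence, term_a) and contains(sentence, term_b):
            best = max(best, 2)
            tokens = words(sentence)
            for score, prefixes in LEVELS:
                # str.startswith accepts a tuple of candidate prefixes
                if any(token.startswith(prefixes) for token in tokens):
                    best = max(best, score)
                    break  # LEVELS is ordered from highest to lowest
    return best
```

Because keywords are matched at the start of a word, "dephosphorylates" counts toward Score=4 and not toward the Score=5 prefix "phosphorylat". For example, `score_pair("BRCA1 interacts with HDAC1.", "BRCA1", "HDAC1")` evaluates to 3, and a sentence containing both an inhibiting and a stimulating keyword evaluates to 5.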
[20] Identified HDAC1 mutations: Accordingly, the following mutations have been identified.
TABLE 1
Exon | Mutation/SNP | Allelic Fraction | Unmutated Sequence | Mutated Sequence
Exon 1 | G>A 59 bp 5' to ATG start | heterozygous | GTGCTCACCGTCGTAGTAGTAACA (SEQ ID NO:1) | GTGCTCACCGTCATAGTAGTAACA (SEQ ID NO:2)
Exon 1 | C>T 40 bp 5' to exon | heterozygous | GGCGGACGGACCGACTGACGG (SEQ ID NO:3) | GGTGGACGGACCGACTGACGG (SEQ ID NO:4)
Exon 2 | ATG>CTG, M51L | 0.05 | CTACCGAAAAATGGAAATCTA (SEQ ID NO:5) | CTACCGAAAACTGGAAATCTA (SEQ ID NO:6)
Exon 2 | CGA>TGA, R34Term (R34STOP) | 0.07 | TGAAGCCTCACCGAATCCGCAT (SEQ ID NO:7) | TGAAGCCTCACTGAATCCGCAT (SEQ ID NO:8)
Exon 3 | T>C 22 bp 5' intron | heterozygous | ATAACTTGCCCTTTCTCCCTT (SEQ ID NO:9) | ATAACTTGCCCCTTCTCCCTT (SEQ ID NO:10)
Exon 3 | T>C 22 bp 5' intron | heterozygous | ATAACTTGCCCTTTCTCCCTT (SEQ ID NO:) | ATAACTTGCCCCTTCTCCCTT (SEQ ID NO:)
Exon 4 | ACT>GCT, T114A | 0.035 | TCAGTTGTCTACTGGTGGTTC (SEQ ID NO:11) | TCAGTTGTCTGCTGGTGGTTC (SEQ ID NO:12)
Exon 4 | CAG>AAG, Q111K | 0.25 | GAGTTCTGTCTAGTTGTCTAC (SEQ ID NO:13) | GAGTTCTGTATAGTTGTCTAC (SEQ ID NO:14)
Exon 5 | GTC>GGC, V157G | 0.11 | AATGATATCGTCTTGGCCATC (SEQ ID NO:15) | AATGATATCGGCTTGGCCATC (SEQ ID NO:16)
Exon 9 | C>T 16 bp 5' intron | heterozygous | TAACTCAGCACCCCTTCTCCCACCA (SEQ ID NO:17) | TAACTCAGCACCCTTTCTCCCACCA (SEQ ID NO:18)
Exon 9 | C>T 17 bp 5' intron | homozygous | TAACTCAGCACCCCTTCTCCCACCA (SEQ ID NO:17) | TAACTCAGCACCTCTTCTCCCACCA (SEQ ID NO:19)
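The coding entries of Table 1 can be cross-checked with a short script. The following is a minimal sketch in Python under stated assumptions: the codon changes and flanking sequences are transcribed from Table 1 with the wrapped cell contents joined into one string, the codon dictionary holds only the standard genetic code entries needed here, and the function names are hypothetical. It locates the single substituted base in each unmutated/mutated sequence pair and classifies each codon change as missense or nonsense.

```python
# Standard genetic code, restricted to the codons named in Table 1 ("*" = stop).
CODON_TABLE = {
    "ATG": "M", "CTG": "L", "CGA": "R", "TGA": "*", "ACT": "T",
    "GCT": "A", "CAG": "Q", "AAG": "K", "GTC": "V", "GGC": "G",
}

# name: (reference codon, mutated codon, unmutated sequence, mutated sequence)
CODING_VARIANTS = {
    "M51L": ("ATG", "CTG", "CTACCGAAAAATGGAAATCTA", "CTACCGAAAACTGGAAATCTA"),
    "R34Term": ("CGA", "TGA", "TGAAGCCTCACCGAATCCGCAT", "TGAAGCCTCACTGAATCCGCAT"),
    "T114A": ("ACT", "GCT", "TCAGTTGTCTACTGGTGGTTC", "TCAGTTGTCTGCTGGTGGTTC"),
    "Q111K": ("CAG", "AAG", "GAGTTCTGTCTAGTTGTCTAC", "GAGTTCTGTATAGTTGTCTAC"),
    "V157G": ("GTC", "GGC", "AATGATATCGTCTTGGCCATC", "AATGATATCGGCTTGGCCATC"),
}


def single_base_change(ref, alt):
    """Return (1-based position, ref base, alt base) for a one-base substitution."""
    if len(ref) != len(alt):
        raise ValueError("sequences differ in length")
    diffs = [(i + 1, r, a) for i, (r, a) in enumerate(zip(ref, alt)) if r != a]
    if len(diffs) != 1:
        raise ValueError("expected exactly one substituted base")
    return diffs[0]


def classify(ref_codon, alt_codon):
    """Classify a codon change as 'silent', 'missense' or 'nonsense'."""
    ref_aa, alt_aa = CODON_TABLE[ref_codon], CODON_TABLE[alt_codon]
    if alt_aa == "*":
        return "nonsense"
    return "silent" if ref_aa == alt_aa else "missense"


def check_table():
    """Map each variant to (position in flank, ref base, alt base, effect)."""
    results = {}
    for name, (ref_codon, alt_codon, ref_seq, alt_seq) in CODING_VARIANTS.items():
        position, ref_base, alt_base = single_base_change(ref_seq, alt_seq)
        # The base swapped in the flanking sequence should match the codon change.
        _, codon_ref_base, codon_alt_base = single_base_change(ref_codon, alt_codon)
        if (ref_base, alt_base) != (codon_ref_base, codon_alt_base):
            raise ValueError(name + ": sequence and codon change disagree")
        results[name] = (position, ref_base, alt_base, classify(ref_codon, alt_codon))
    return results


results = check_table()
```

Under these assumptions the five coding entries classify as four missense changes and one nonsense change, in agreement with paragraph [06].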
[21] As described further in the EXAMPLE, the mutation site at position 51 is close to a potential tyrosine phosphorylation site at position 54 and next to a potential SUMO-1 modification site at position 50. The mutation M51L (SEQ ID NO:6) may eliminate the phosphorylation at position 54, which may potentially alter the function of HDAC1. Sequence alignment analysis of known HDAC1 sequences from multiple organisms indicates that position 157 is highly conserved while position 114 is not. Mutation at the highly conserved site may have a dramatic influence on the overall function. Analysis of the secondary structure of the mutated proteins implies that T114A (SEQ ID NO:12) potentiates more significant structural changes than the other mutations.
[22] Definitions. The definitions of certain terms as used in this specification are provided below. Definitions of other terms may be found in the glossary provided by the U.S. Department of Energy, Office of Science, Human Genome Project (http://www.ornl.gov/sci/techresources/Human_Genome/glossary/).
[23] As used herein, the term "antibody" includes, but is not limited to, polyclonal antibodies, monoclonal antibodies, humanized or chimaeric antibodies and biologically functional antibody fragments sufficient for binding of the antibody fragment to the protein.
[24] The term "clinical response" means any or all of the following: a quantitative measure of the response, no response, and adverse response (i.e., side effects).
[25] The term "clinical trial" means any research study designed to collect clinical data on responses to a particular treatment, and includes but is not limited to phase I, phase II and phase III clinical trials. Standard methods are used to define the patient population and to enrol subjects.
[26] The phrase "disease associated with a HDAC1 mutation" refers to any disease or disorder arising from a mutation at at least one position in a gene encoding histone deacetylase 1 (HDAC1). The mutation refers to any alteration of the nucleic acid sequence encoding HDAC1 that inactivates the functionality of the protein produced by that gene. Such mutations can include, but are not limited to, an amino acid substitution wherein a native amino acid is replaced with another amino acid residue. In one embodiment, the disease associated with a HDAC1 mutation is cancer. In a more particular embodiment, the cancer is acute myelogenous leukaemia (AML) or breast cancer.
[27] The term "effective amount" of a compound is a quantity sufficient to achieve a desired pharmacodynamic, toxicologic, therapeutic and/or prophylactic effect, for example, an amount which results in the prevention of or a decrease in the symptoms associated with a disease that is being treated, e.g., the diseases associated with HDAC1 mutant polypeptides and HDAC1 mutant polynucleotides identified herein. The amount of compound administered to the subject will depend on the type and severity of the disease and on the characteristics of the individual, such as general health, age, sex, body weight and tolerance to drugs. It will also depend on the degree, severity and type of disease. The skilled artisan will be able to determine appropriate dosages depending on these and other factors. Typically, an effective amount of the compounds of the present invention, sufficient for achieving a therapeutic or prophylactic effect, ranges from about 0.000001 mg per kilogram body weight per day to about 10,000 mg per kilogram body weight per day. Preferably, the dosage ranges are from about 0.0001 mg per kilogram body weight per day to about 100 mg per kilogram body weight per day. The compounds of the present invention can also be administered in combination with each other, or with one or more additional therapeutic compounds.
[28] In several embodiments of the invention, the therapeutic compound is an anti-cancer agent or other compound, such therapeutic compound being useful in conjunction with an anti-cancer therapy. The term "anti-cancer agent" includes the following:
[29] Glivec® (Gleevec®; imatinib) is a medication for chronic myeloid leukaemia (CML) and certain stages of gastrointestinal stromal tumours (GIST). It targets and interferes with the molecular abnormalities that drive the growth of cancer cells. Corless CL et al., J. Clin. Oncol. 22(18):3813-25 (September 15, 2004); Verweij J et al, Lancet 364(9440):1127-34 (September 25, 2004); Kantarjian HM et al, Blood 104(7):1979-88 (October 1, 2004). By inhibiting multiple targets, Glivec® has potential as an anticancer therapy for several types of cancer, including leukaemia and solid tumours.
[30] The aromatase inhibitor FEMARA® is a treatment for advanced breast cancer in postmenopausal women. It blocks the use of oestrogen by certain types of breast cancer that require oestrogen to grow. Janicke F, Breast 13 Suppl 1:S10-8 (December 2004); Mouridsen H et al, Oncologist 9(5):489-96 (2004).
[31] Sandostatin® LAR® is used to treat patients with acromegaly and to control symptoms, such as severe diarrhoea and flushing, in patients with functional gastro-entero-pancreatic (GEP) tumours (e.g., metastatic carcinoid tumours and vasoactive intestinal peptide-secreting tumours [VIPomas]). Oberg K, Chemotherapy 47 Suppl 2:40-53 (2001); Raderer M et al, Oncology 60(2):141-5 (2001); Aparicio T et al, Eur. J. Cancer 37(8):1014-9 (May 2001).
Sandostatin® LAR® regulates hormones in the body to help manage diseases and their symptoms.
[32] ZOMETA® is a treatment for hypercalcaemia of malignancy (HCM) and for the treatment of bone metastases across a broad range of tumour types. These tumours include multiple myeloma, prostate cancer, breast cancer, lung cancer, renal cancer and other solid tumours. Rosen LS et al, Cancer 100(12):2613-21 (June 15, 2004).
[33] Vatalanib (1-[4-chloroanilino]-4-[4-pyridylmethyl] phthalazine succinate) is a multi-VEGF receptor (VEGFR) inhibitor that may block the creation of new blood vessels to prevent tumour growth. This compound inhibits all known VEGF receptor tyrosine kinases, blocking angiogenesis and lymphangiogenesis. Drevs J et al, Cancer Res. 60:4819-4824 (2000); Wood JM et al, Cancer Res. 60:2178-2189 (2000). Vatalanib is being studied in two large, multinational, randomized, phase III, placebo-controlled trials in combination with FOLFOX-4 in first-line and second-line treatment of patients with metastatic colorectal cancer. Thomas A et al, 37th Annual Meeting of the American Society of Clinical Oncology, San Francisco, CA, Abstract 279 (May 12-15, 2001).
[34] The orally bioavailable rapamycin derivative everolimus inhibits oncogenic signalling in tumour cells. By blocking the mammalian target of rapamycin (mTOR)-mediated signalling, everolimus exhibits broad antiproliferative activity in tumour cell lines and animal models of cancer. Boulay A et al, Cancer Res. 64:252-261 (2004). In preclinical studies, everolimus also potently inhibited the proliferation of human umbilical vein endothelial cells directly indicating an involvement in angiogenesis. By blocking tumour cell proliferation and angiogenesis, everolimus may provide a clinical benefit to patients with cancer. Everolimus is being investigated for its antitumour properties in a number of clinical studies in patients with haematological and solid tumours. Huang S & Houghton PJ, Curr. Opin. Investig. Drugs 3:295-304 (2002).
[35] Gimatecan is a novel oral inhibitor of topoisomerase I (topo I). Gimatecan blocks cell division in cells that divide rapidly, such as cancer cells, which activates apoptosis. Preclinical data indicate that gimatecan is not a substrate for multidrug resistance pumps, and that it increases the drug-target interaction. De Cesare M et al, Cancer Res. 61:7189-7195 (2001). Phase I clinical studies indicate that the dose-limiting toxicity of gimatecan is myelosuppression.
[36] Patupilone is a microtubule stabilizer. Altmann K-H, Curr. Opin. Chem. Biol. 5:424-431 (2001); Altmann K-H et al, Biochim Biophys Acta. 470:M79-M91 (2000); O'Neill V et al, 36th Annual Meeting of the American Society of Clinical Oncology; May 19-23, 2000; New Orleans, LA, Abstract 829; Calvert PM et al. Proceedings of the 11th National Cancer Institute-European Organization for Research and Treatment of Cancer/American Association for Cancer Research Symposium on New Drugs in Cancer Therapy; November 7-10, 2000; Amsterdam, The Netherlands, Abstract 575. Patupilone blocked mitosis and induced apoptosis to a greater extent than the frequently used anticancer drug paclitaxel. Also, patupilone retained full activity against human cancer cells that were resistant to paclitaxel and other chemotherapeutic agents.
[37] Midostaurin is an inhibitor of multiple signalling proteins. By targeting specific receptor tyrosine kinases and components of several signal transduction pathways, midostaurin impacts several targets involved in cell growth (e.g., KIT, PDGFR, PKC), leukaemic cell proliferation (e.g., FLT3), and angiogenesis (e.g., VEGFR2). Weisberg E et al. Cancer Cell 1:433-443 (2002); Fabbro D et al, Anticancer Drug Des. 15:17-28 (2000). In preclinical studies, midostaurin showed broad antiproliferative activity against various tumour cell lines, including those that were resistant to several other chemotherapeutic agents.
[38] The somatostatin analogue pasireotide is a stable cyclohexapeptide with broad somatotropin release inhibiting factor (SRIF) receptor binding. Bruns C et al., Eur. J. Endocrinol. 146(5):707-16 (May 2002); Weckbecker G et al, Endocrinology 143(10):4123-30 (October 2002); Oberg K, Chemotherapy 47 Suppl 2:40-53 (2001).
[39] LAQ824, a cinnamyl hydroxamate histone deacetylase inhibitor, is known to induce acetylation and inhibition of heat shock protein 90. See, Remiszewski et al, J. Med. Chem. 46(21):4609-24; Bali P et al, Clinical Cancer Research 10(15):4991-4997 (2004); Brieger A et al, Biochemical Pharmacology 68(1):85-93 (2004); Fuino L et al, Molecular Cancer Therapeutics 2(10):971-984 (2003); Qian DZ et al, Cancer Research 64(18):6626-6634 (2004).
[40] LBH589 is a histone deacetylase (HDAC) inhibitor. By blocking the deacetylase activity of HDAC, HDAC inhibitors activate gene transcription of critical genes that cause apoptosis (programmed cell death). By triggering apoptosis, LBH589 induces growth inhibition and regression in tumour cell lines. LBH589 is being tested in phase I clinical trials as an anticancer agent. See also, George P et al, Blood 105(4):1768-76 (February 15, 2005).
[41] AEE788 inhibits multiple receptor tyrosine kinases including EGFR, HER2, and VEGFR, which stimulate tumour cell growth and angiogenesis. Traxler P et al, Cancer Res. 64:4931-4941 (2004). In preclinical studies, AEE788 showed high target specificity and demonstrated antiproliferative effects against tumour cell lines and in animal models of cancer. AEE788 also exhibited direct antiangiogenic activity. AEE788 is currently in phase I clinical development.
[42] AMN107 is an oral tyrosine kinase inhibitor that targets Bcr-Abl, KIT, and PDGFR. Preclinical studies have shown in cellular assays using Philadelphia chromosome-positive (Ph+) CML cells that AMN107 is highly potent and has high selectivity for Bcr-Abl, KIT, and PDGFR. Weisberg E et al, Cancer Cell 7(2):129-41 (February 2005); O'Hare T et al, Cancer Cell 7(2):117-9 (February 2005). AMN107 also shows activity against mutated variants of Bcr-Abl. AMN107 is currently being studied in phase I clinical trials.
[43] In one embodiment, the relevant anti-cancer agents include all HDAC inhibitors including LBH589 and LAQ824, any demethylator drugs, any epigenetic modifying agents and any drugs that affect acetylation.
[44] The term "HDAC modulating agent" is any compound that alters (e.g., increases or decreases) the expression level or biological activity level of HDAC1 polypeptide compared to the expression level or biological activity level of HDAC1 polypeptide in the absence of the HDAC modulating agent. The HDAC modulating agent can be a small molecule, antibody, polypeptide, carbohydrate, lipid, nucleotide, or combination thereof. The HDAC modulating agent can be an organic compound or an inorganic compound.
[45] In a preferred embodiment, the HDAC modulating agent is a histone deacetylase inhibitor (HDAI). HDAIs are described in, e.g., Monneret C, European Journal of Medicinal Chemistry 40:1-13 (2005), the contents of which are herewith incorporated by reference. HDAIs include, for example, sodium butyrate, phenylacetate, phenylbutyrate, valproic acid, tributyrin, pivaloyloxymethyl butyrate, pivanex®, trichostatin A (TSA), trichostatin C, trapoxins A and B, depudecin, cyclic hydroxamic-acid containing peptide (CHAPs), apicidin or OSI-2040, suberoylanilide hydroxamic acid (SAHA), oxamflatin, depsipeptide, FK228, scriptaid, biarylhydroxamate inhibitor, A-161906, JNJ16241199, PDX 101, MS-275, CI-994. The structure and synthesis of HDAIs are known in the art and can for instance be found in Monneret C, European Journal of Medicinal Chemistry 40:1-13 (2005) and references therein.
[46] HDAI compounds of particular interest are hydroxamate compounds described by the formula I:
R1 is H, halo, or a straight chain C1-C6 alkyl (especially methyl, ethyl or n-propyl, which methyl, ethyl and n-propyl substituents are unsubstituted or substituted by one or more substituents described below for alkyl substituents);
R2 is selected from H, C1-C10 alkyl, (preferably C1-C6 alkyl, e.g. methyl, ethyl or -CH2CH2-OH), C4 - C9 cycloalkyl, C4 - C9 heterocycloalkyl, C4 - C9 heterocycloalkylalkyl, cycloalkylalkyl (e.g., cyclopropylmethyl), aryl, heteroaryl, arylalkyl (e.g. benzyl), heteroarylalkyl (e.g. pyridylmethyl), -(CH2)nC(O)R6, -(CH2)nOC(O)R6, amino acyl, HON-C(O)-CH=C(R1)-aryl-alkyl- and -(CH2)nR7;
R3 and R4 are the same or different and independently H, C1-C6 alkyl, acyl or acylamino, or R3 and R4 together with the carbon to which they are bound represent C=O, C=S, or C=NR8, or R2 together with the nitrogen to which it is bound and R3 together with the carbon to which it is bound can form a C4 - C9 heterocycloalkyl, a heteroaryl, a polyheteroaryl, a non-aromatic polyheterocycle, or a mixed aryl and non-aryl polyheterocycle ring;
R5 is selected from H, C1-C6 alkyl, C4 - C9 cycloalkyl, C4 - C9 heterocycloalkyl, acyl, aryl, heteroaryl, arylalkyl (e.g. benzyl), heteroarylalkyl (e.g. pyridylmethyl), aromatic polycycles, non-aromatic polycycles, mixed aryl and non-aryl polycycles, polyheteroaryl, non-aromatic polyheterocycles, and mixed aryl and non-aryl polyheterocycles;
n, m, n2 and n3 are the same or different and independently selected from 0 - 6, when m is 1-6, each carbon atom can be optionally and independently substituted with R3 and/or R4;
X and Y are the same or different and independently selected from H, halo, C1-C4 alkyl, such as CH3 and CF3, NO2, C(O)R1, OR9, SR9, CN, and NR10R11;
R6 is selected from H, C1-C6 alkyl, C4 - C9 cycloalkyl, C4 - C9 heterocycloalkyl, cycloalkylalkyl (e.g., cyclopropylmethyl), aryl, heteroaryl, arylalkyl (e.g., benzyl, 2-phenylethenyl), heteroarylalkyl (e.g., pyridylmethyl), OR12, and NR13R14;
R7 is selected from OR15, SR15, S(O)R16, SO2R17, NR13R14, and NR12SO2R6;
R8 is selected from H, OR15, NR13R14, C1-C6 alkyl, C4 - C9 cycloalkyl, C4 - C9 heterocycloalkyl, aryl, heteroaryl, arylalkyl (e.g., benzyl), and heteroarylalkyl (e.g., pyridylmethyl);
R9 is selected from C1 - C4 alkyl, for example, CH3 and CF3, C(O)-alkyl, for example C(O)CH3, and C(O)CF3;
R10 and R11 are the same or different and independently selected from H, C1-C4 alkyl, and -C(O)-alkyl;
R12 is selected from H, C1-C6 alkyl, C4 - C9 cycloalkyl, C4 - C9 heterocycloalkyl, C4 - C9 heterocycloalkylalkyl, aryl, mixed aryl and non-aryl polycycle, heteroaryl, arylalkyl (e.g., benzyl), and heteroarylalkyl (e.g., pyridylmethyl);
R13 and R14 are the same or different and independently selected from H, C1-C6 alkyl, C4 - C9 cycloalkyl, C4 - C9 heterocycloalkyl, aryl, heteroaryl, arylalkyl (e.g., benzyl), heteroarylalkyl (e.g., pyridylmethyl), amino acyl, or R13 and R14 together with the nitrogen to which they are bound are C4 - C9 heterocycloalkyl, heteroaryl, polyheteroaryl, non-aromatic polyheterocycle or mixed aryl and non-aryl polyheterocycle;
R15 is selected from H, C1-C6 alkyl, C4 - C9 cycloalkyl, C4 - C9 heterocycloalkyl, aryl, heteroaryl, arylalkyl, heteroarylalkyl and (CH2)mZR12;
R16 is selected from C1-C6 alkyl, C4 - C9 cycloalkyl, C4 - C9 heterocycloalkyl, aryl, heteroaryl, polyheteroaryl, arylalkyl, heteroarylalkyl and (CH2)mZR12;
R17 is selected from C1-C6 alkyl, C4 - C9 cycloalkyl, C4 - C9 heterocycloalkyl, aryl, aromatic polycycles, heteroaryl, arylalkyl, heteroarylalkyl, polyheteroaryl and NR13R14;
m is an integer selected from 0 to 6; and
Z is selected from O, NR13, S and S(O).
[47] As appropriate, unsubstituted means that there is no substituent or that the only substituents are hydrogen.
[48] Halo substituents are selected from fluoro, chloro, bromo and iodo, preferably fluoro or chloro.
[49] Alkyl substituents include straight and branched C1-C6 alkyl, unless otherwise noted. Examples of suitable straight and branched C1-C6 alkyl substituents include methyl, ethyl, n-propyl, 2-propyl, n-butyl, sec-butyl, t-butyl, and the like. Unless otherwise noted, the alkyl substituents include both unsubstituted alkyl groups and alkyl groups that are substituted by one or more suitable substituents, including unsaturation (i.e. there are one or more double or triple C-C bonds), acyl, cycloalkyl, halo, oxyalkyl, alkylamino, aminoalkyl, acylamino and OR15, for example, alkoxy. Preferred substituents for alkyl groups include halo, hydroxy, alkoxy, oxyalkyl, alkylamino, and aminoalkyl.
[50] Cycloalkyl substituents include C3-C9 cycloalkyl groups, such as cyclopropyl, cyclobutyl, cyclopentyl, cyclohexyl and the like, unless otherwise specified. Unless otherwise noted, cycloalkyl substituents include both unsubstituted cycloalkyl groups and cycloalkyl groups that are substituted by one or more suitable substituents, including C1-C6 alkyl, halo, hydroxy, aminoalkyl, oxyalkyl, alkylamino, and OR15, such as alkoxy. Preferred substituents for cycloalkyl groups include halo, hydroxy, alkoxy, oxyalkyl, alkylamino and aminoalkyl.
[51] The above discussion of alkyl and cycloalkyl substituents also applies to the alkyl portions of other substituents, such as, without limitation, alkoxy, alkyl amines, alkyl ketones, arylalkyl, heteroarylalkyl, alkylsulphonyl and alkyl ester substituents and the like.
[52] Heterocycloalkyl substituents include 3 to 9 membered aliphatic rings, such as 4 to 7 membered aliphatic rings, containing from one to three heteroatoms selected from nitrogen, sulphur and oxygen. Examples of suitable heterocycloalkyl substituents include pyrrolidyl, tetrahydrofuryl, tetrahydrothiofuranyl, piperidyl, piperazyl, tetrahydropyranyl, morpholino, 1,3-diazapane, 1,4-diazapane, 1,4-oxazepane, and 1,4-oxathiapane. Unless otherwise noted, the rings are unsubstituted or substituted on the carbon atoms by one or more suitable substituents, including C1-C6 alkyl, C4 - C9 cycloalkyl, aryl, heteroaryl, arylalkyl (e.g., benzyl), and heteroarylalkyl (e.g., pyridylmethyl), halo, amino, alkyl amino and OR15, for example alkoxy. Unless otherwise noted, nitrogen heteroatoms are unsubstituted or substituted by H, C1-C4 alkyl, arylalkyl (e.g., benzyl), and heteroarylalkyl (e.g., pyridylmethyl), acyl, aminoacyl, alkylsulphonyl, and arylsulphonyl.
[53] Cycloalkylalkyl substituents include compounds of the formula -(CH2)n5-cycloalkyl wherein n5 is a number from 1-6. Suitable cycloalkylalkyl substituents include cyclopentylmethyl, cyclopentylethyl, cyclohexylmethyl and the like. Such substituents are unsubstituted or substituted in the alkyl portion or in the cycloalkyl portion by a suitable substituent, including those listed above for alkyl and cycloalkyl.
[54] Aryl substituents include unsubstituted phenyl and phenyl substituted by one or more suitable substituents, including C1-C6 alkyl, cycloalkylalkyl (e.g., cyclopropylmethyl), O(CO)alkyl, oxyalkyl, halo, nitro, amino, alkylamino, aminoalkyl, alkyl ketones, nitrile, carboxyalkyl, alkylsulphonyl, aminosulphonyl, arylsulphonyl, and OR15, such as alkoxy. Preferred substituents include C1-C6 alkyl, cycloalkyl (e.g., cyclopropylmethyl), alkoxy, oxyalkyl, halo, nitro, amino, alkylamino, aminoalkyl, alkyl ketones, nitrile, carboxyalkyl, alkylsulphonyl, arylsulphonyl, and aminosulphonyl. Examples of suitable aryl groups include C1-C4 alkylphenyl, C1-C4 alkoxyphenyl, trifluoromethylphenyl, methoxyphenyl, hydroxyethylphenyl, dimethylaminophenyl, aminopropylphenyl, carbethoxyphenyl, methanesulphonylphenyl and tolylsulphonylphenyl.
[55] Aromatic polycycles include naphthyl, and naphthyl substituted by one or more suitable substituents, including C1-C6 alkyl, alkylcycloalkyl (e.g., cyclopropylmethyl), oxyalkyl, halo, nitro, amino, alkylamino, aminoalkyl, alkyl ketones, nitrile, carboxyalkyl, alkylsulphonyl, arylsulphonyl, aminosulphonyl and OR15, such as alkoxy.
[56] Heteroaryl substituents include compounds with a 5 to 7 member aromatic ring containing one or more heteroatoms, for example from 1 to 4 heteroatoms, selected from N, O and S. Typical heteroaryl substituents include furyl, thienyl, pyrrole, pyrazole, triazole, thiazole, oxazole, pyridine, pyrimidine, isoxazolyl, pyrazine and the like. Unless otherwise noted, heteroaryl substituents are unsubstituted or substituted on a carbon atom by one or more suitable substituents, including alkyl, the alkyl substituents identified above, and another heteroaryl substituent. Nitrogen atoms are unsubstituted or substituted, for example by R13; especially useful N substituents include H, C1 - C4 alkyl, acyl, aminoacyl, and sulphonyl.
[57] Arylalkyl substituents include groups of the formula -(CH2)n5-aryl, -(CH2)n5-1-(CH-aryl)-(CH2)n5-aryl or -(CH2)n5-1CH(aryl)(aryl) wherein aryl and n5 are defined above. Such arylalkyl substituents include benzyl, 2-phenylethyl, 1-phenylethyl, tolyl-3-propyl, 2-phenylpropyl, diphenylmethyl, 2-diphenylethyl, 5,5-dimethyl-3-phenylpentyl and the like. Arylalkyl substituents are unsubstituted or substituted in the alkyl moiety or the aryl moiety or both as described above for alkyl and aryl substituents.
[58] Heteroarylalkyl substituents include groups of the formula -(CH2)n5-heteroaryl wherein heteroaryl and n5 are defined above and the bridging group is linked to a carbon or a nitrogen of the heteroaryl portion, such as 2-, 3- or 4-pyridylmethyl, imidazolylmethyl, quinolylethyl, and pyrrolylbutyl. Heteroaryl substituents are unsubstituted or substituted as discussed above for heteroaryl and alkyl substituents.
[59] Amino acyl substituents include groups of the formula -C(O)-(CH2)n-C(H)(NR13R14)-(CH2)n-R5 wherein n, R13, R14 and R5 are described above. Suitable aminoacyl substituents include natural and non-natural amino acids such as glycinyl, D-tryptophanyl, L-lysinyl, D- or L-homoserinyl, 4-aminobutyric acyl, and ±-3-amino-4-hexenoyl.
[60] Non-aromatic polycycle substituents include bicyclic and tricyclic fused ring systems where each ring can be 4-9 membered and each ring can contain zero, 1 or more double and/or triple bonds. Suitable examples of non-aromatic polycycles include decalin, octahydroindene, perhydrobenzocycloheptene, and perhydrobenzo-[f]-azulene. Such substituents are unsubstituted or substituted as described above for cycloalkyl groups.
[61] Mixed aryl and non-aryl polycycle substituents include bicyclic and tricyclic fused ring systems where each ring can be 4-9 membered and at least one ring is aromatic. Suitable examples of mixed aryl and non-aryl polycycles include methylenedioxyphenyl, bis-methylenedioxyphenyl, 1,2,3,4-tetrahydronaphthalene, dibenzosuberane, dihydroanthracene, and 9H-fluorene. Such substituents are unsubstituted or substituted by nitro or as described above for cycloalkyl groups.
[62] Polyheteroaryl substituents include bicyclic and tricyclic fused ring systems where each ring can independently be 5 or 6 membered and contain one or more heteroatoms, for example, 1, 2, 3, or 4 heteroatoms, chosen from O, N or S such that the fused ring system is aromatic. Suitable examples of polyheteroaryl ring systems include quinoline, isoquinoline, pyridopyrazine, pyrrolopyridine, furopyridine, indole, benzofuran, benzothiofuran, benzindole, benzoxazole, pyrroloquinoline, and the like. Unless otherwise noted, polyheteroaryl substituents are unsubstituted or substituted on a carbon atom by one or more suitable substituents, including alkyl, the alkyl substituents identified above and a substituent of the formula -O-(CH2CH=CH(CH3)(CH2))1-3H. Nitrogen atoms are unsubstituted or substituted, for example by R13; especially useful N substituents include H, C1-C4 alkyl, acyl, aminoacyl, and sulphonyl.
[63] Non-aromatic polyheterocyclic substituents include bicyclic and tricyclic fused ring systems where each ring can be 4-9 membered, contain one or more heteroatoms, for example, 1, 2, 3, or 4 heteroatoms, chosen from O, N or S and contain zero or one or more C-C double or triple bonds. Suitable examples of non-aromatic polyheterocycles include hexitol, cis-perhydro-cyclohepta[b]pyridinyl, decahydro-benzo[f][1,4]oxazepinyl, 2,8-dioxabicyclo[3.3.0]octane, hexahydro-thieno[3,2-b]thiophene, perhydropyrrolo[3,2-b]pyrrole, perhydronaphthyridine, and perhydro-1H-dicyclopenta[b,e]pyran. Unless otherwise noted, non-aromatic polyheterocyclic substituents are unsubstituted or substituted on a carbon atom by one or more substituents, including alkyl and the alkyl substituents identified above. Nitrogen atoms are unsubstituted or substituted, for example, by R13; especially useful N substituents include H, C1-C4 alkyl, acyl, aminoacyl, and sulphonyl.
[64] Mixed aryl and non-aryl polyheterocycle substituents include bicyclic and tricyclic fused ring systems where each ring can be 4-9 membered, contain one or more heteroatoms chosen from O, N or S, and at least one of the rings must be aromatic. Suitable examples of mixed aryl and non-aryl polyheterocycles include 2,3-dihydroindole, 1,2,3,4-tetrahydroquinoline, 5,11-dihydro-10H-dibenz[b,e][1,4]diazepine, 5H-dibenzo[b,e][1,4]diazepine, 1,2-dihydropyrrolo[3,4-b][1,5]benzodiazepine, 1,5-dihydro-pyrido[2,3-b][1,4]diazepin-4-one, and 1,2,3,4,6,11-hexahydro-benzo[b]pyrido[2,3-e][1,4]diazepin-5-one. Unless otherwise noted, mixed aryl and non-aryl polyheterocyclic substituents are unsubstituted or substituted on a carbon atom by one or more suitable substituents, including -N-OH, =N-OH, alkyl and the alkyl substituents identified above. Nitrogen atoms are unsubstituted or substituted, for example, by R13; especially useful N substituents include H, C1-C4 alkyl, acyl, aminoacyl, and sulphonyl.
[65] Amino substituents include primary, secondary and tertiary amines and, in salt form, quaternary amines. Examples of amino substituents include mono- and di-alkylamino, mono- and di-aryl amino, mono- and di-arylalkyl amino, aryl-arylalkylamino, alkyl-arylamino, alkyl-arylalkylamino and the like.
[66] Sulphonyl substituents include alkylsulphonyl and arylsulphonyl, for example methane sulphonyl, benzene sulphonyl, tosyl and the like.
[67] Acyl substituents include groups of formula -C(O)-W, -OC(O)-W, -C(O)-O-W or -C(O)NR13R14, where W is R16, H or cycloalkylalkyl.
[68] Acylamino substituents include substituents of the formula -N(R12)C(O)-W, -N(R12)C(O)-O-W, and -N(R12)C(O)-NHOH, wherein R12 and W are defined above.
[69] The R2 substituent HON-C(O)-CH=C(R1)-aryl-alkyl- is a group of the formula
[70] Preferences for each of the substituents include the following:
R1 is H, halo, or a straight chain C1-C4 alkyl;
R2 is selected from H, C1-C6 alkyl, C4 - C9 cycloalkyl, C4 - C9 heterocycloalkyl, alkylcycloalkyl, aryl, heteroaryl, arylalkyl, heteroarylalkyl, -(CH2)nC(O)R6, amino acyl, and -(CH2)nR7;
R3 and R4 are the same or different and independently selected from H, and C1-C6 alkyl, or R3 and R4 together with the carbon to which they are bound represent C=O, C=S, or C=NR8;
R5 is selected from H, C1-C6 alkyl, C4 - C9 cycloalkyl, C4 - C9 heterocycloalkyl, aryl, heteroaryl, arylalkyl, heteroarylalkyl, an aromatic polycycle, a non-aromatic polycycle, a mixed aryl and non-aryl polycycle, polyheteroaryl, a non-aromatic polyheterocycle, and a mixed aryl and non-aryl polyheterocycle;
n, n1, n2 and n3 are the same or different and independently selected from 0 - 6; when n1 is 1-6, each carbon atom is unsubstituted or independently substituted with R3 and/or R4;
X and Y are the same or different and independently selected from H, halo, C1-C4 alkyl, CF3, NO2, C(O)R1, OR9, SR9, CN, and NR10R11;
R6 is selected from H, C1-C6 alkyl, C4 - C9 cycloalkyl, C4 - C9 heterocycloalkyl, alkylcycloalkyl, aryl, heteroaryl, arylalkyl, heteroarylalkyl, OR12, and NR13R14;
R7 is selected from OR15, SR15, S(O)R16, SO2R17, NR13R14, and NR12SO2R6;
R8 is selected from H, OR15, NR13R14, C1-C6 alkyl, C4 - C9 cycloalkyl, C4 - C9 heterocycloalkyl, aryl, heteroaryl, arylalkyl, and heteroarylalkyl;
R9 is selected from C1 - C4 alkyl and C(O)-alkyl;
R10 and R11 are the same or different and independently selected from H, C1-C4 alkyl, and -C(O)-alkyl;
R12 is selected from H, C1-C6 alkyl, C4 - C9 cycloalkyl, C4 - C9 heterocycloalkyl, aryl, heteroaryl, arylalkyl, and heteroarylalkyl;
R13 and R14 are the same or different and independently selected from H, C1-C6 alkyl, C4 - C9 cycloalkyl, C4 - C9 heterocycloalkyl, aryl, heteroaryl, arylalkyl, heteroarylalkyl and amino acyl;
R15 is selected from H, C1-C6 alkyl, C4 - C9 cycloalkyl, C4 - C9 heterocycloalkyl, aryl, heteroaryl, arylalkyl, heteroarylalkyl and (CH2)mZR12;
R16 is selected from C1-C6 alkyl, C4 - C9 cycloalkyl, C4 - C9 heterocycloalkyl, aryl, heteroaryl, arylalkyl, heteroarylalkyl and (CH2)mZR12;
R17 is selected from C1-C6 alkyl, C4 - C9 cycloalkyl, C4 - C9 heterocycloalkyl, aryl, heteroaryl, arylalkyl, heteroarylalkyl and NR13R14;
m is an integer selected from 0 to 6; and
Z is selected from O, NR13, S, and S(O).
[71] Useful compounds of the formula (I) include those wherein each of R1, X, Y, R3, and R4 is H, including those wherein one of n2 and n3 is zero and the other is 1, especially those wherein R2 is H or -CH2-CH2-OH.
[72] One suitable genus of hydroxamate compounds are those of formula Ia:
R2 is selected from H, C1-C6 alkyl, C4 - C9 cycloalkyl, C4 - C9 heterocycloalkyl, alkylcycloalkyl, aryl, heteroaryl, arylalkyl, heteroarylalkyl, -(CH2)nC(O)R6, amino acyl and -(CH2)nR7;
[73] In one embodiment, R5' is heteroaryl, heteroarylalkyl (e.g., pyridylmethyl), aromatic polycycles, non-aromatic polycycles, mixed aryl and non-aryl polycycles, polyheteroaryl, or mixed aryl and non-aryl polyheterocycles.
[74] In another embodiment, R5' is aryl, arylalkyl, aromatic polycycles, non-aromatic polycycles, and mixed aryl and non-aryl polycycles; especially aryl, such as p-fluorophenyl, p-chlorophenyl, p-O-C1-C4-alkylphenyl, such as p-methoxyphenyl, and p-C1-C4-alkylphenyl; and arylalkyl, such as benzyl, ortho, meta or para-fluorobenzyl, ortho, meta or para-chlorobenzyl, ortho, meta or para-mono, di or tri-O-C1-C4-alkylbenzyl, such as ortho, meta or para-methoxybenzyl, m,p-diethoxybenzyl, o,m,p-trimethoxybenzyl, and ortho, meta or para-mono, di or tri C1-C4-alkylphenyl, such as p-methyl, m,m-diethylphenyl.
[75] As used herein, "expression" includes but is not limited to one or more of the following: transcription of the gene into precursor mRNA; splicing and other processing of the precursor mRNA to produce mature mRNA; mRNA stability; translation of the mature mRNA into protein (including codon usage and tRNA availability); and glycosylation and/or other modifications of the translation product, if required for proper expression and function.
[76] The term "mutation" means any heritable or acquired variation from the wild-type that alters the nucleotide sequence, thereby changing the protein sequence. The terms "mutation" and "mutant" are used interchangeably with the terms "marker", "biomarker", and "target" throughout the specification.
[77] The term "medical condition" includes, but is not limited to, any condition or disease manifested as one or more physical and/or psychological symptoms for which treatment and/or prevention is desirable, and includes previously and newly identified diseases and other disorders.
[78] The term "nucleotide pair" means the two nucleotides bound to each other between the two nucleotide strands.
[79] The term "polymorphism" means any sequence variant present at a frequency of >1% in a population. The sequence variant may be present at a frequency significantly greater than 1% such as 5% or 10% or more. Also, the term may be used to refer to the sequence variation observed in an individual at a polymorphic site. Polymorphisms include nucleotide substitutions, insertions, deletions and microsatellites and may, but need not, result in detectable differences in gene expression or protein function.
[80] The term "small molecule" means a composition that has a molecular weight of less than about 5 kDa and more preferably less than about 2 kDa. Small molecules can be, e.g., nucleic acids, peptides, polypeptides, glycopeptides, peptidomimetics, carbohydrates, lipids, lipopolysaccharides, combinations of these, or other organic or inorganic molecules.
[81] The term "mutant nucleic acid" means a nucleic acid sequence, which comprises a nucleotide that is variable within an otherwise identical nucleotide sequence between individuals or groups of individuals, thus, existing as alleles. Such mutant nucleic acids are preferably from about 15 to about 500 nucleotides in length. The mutant nucleic acids may be part of a chromosome, or they may be an exact copy of a part of a chromosome, e.g., by amplification of such a part of a chromosome through PCR or through cloning. The mutant probes according to the invention are oligonucleotides that are complementary to a mutant nucleic acid.
[82] The term "SNP nucleic acid" means a nucleic acid sequence, which comprises a nucleotide that is variable within an otherwise identical nucleotide sequence between individuals or groups of individuals, thus, existing as alleles. Such SNP nucleic acids are preferably from about 15 to about 500 nucleotides in length. The SNP nucleic acids may be part of a chromosome, or they may be an exact copy of a part of a chromosome, e.g., by amplification of such a part of a chromosome through PCR or through cloning. The SNP nucleic acids are referred to hereafter simply as "SNPs". The SNP probes according to the invention are oligonucleotides that are complementary to a SNP nucleic acid. In a particular embodiment, the SNP is in the HDAC1 gene.
[83] The term "subject" as used herein refers to any living organism capable of eliciting an immune response. The term subject includes, but is not limited to, humans, nonhuman primates such as chimpanzees and other apes and monkey species; farm animals such as cattle, sheep, pigs, goats and horses; domestic mammals such as dogs and cats; laboratory animals including rodents such as mice, rats and guinea pigs, and the like. The term does not denote a particular age or sex. Thus, adult and newborn subjects, as well as fetuses, whether male or female, are intended to be covered.
[84] The administration of an agent or drug to a subject or patient includes self-administration and the administration by another. It is also to be appreciated that the various modes of treatment or prevention of medical conditions as described are intended to mean "substantial", which includes total but also less than total treatment or prevention, and wherein some biologically or medically relevant result is achieved.
[85] The details of one or more embodiments of the invention are set forth in the accompanying description below. Although any methods and materials similar or equivalent to those described herein can be used in the practice or testing of the present invention, the preferred methods and materials are now described. Other features, objects, and advantages of the invention will be apparent from the description and the claims. In the specification and the appended claims, the singular forms include plural referents unless the context clearly dictates otherwise. Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. All references cited herein are incorporated herein by reference in their entireties and for all purposes to the same extent as if each individual publication, patent, or patent application was specifically and individually incorporated by reference in its entirety for all purposes.
[86] Identification and Characterization of Gene Sequence Variation. Sequence variation in the human germline consists primarily of SNPs, the remainder being short tandem repeats (including micro-satellites), long tandem repeats (mini-satellites), and other insertions and deletions. A SNP is the occurrence of nucleotide variability at a single position in the genome, in which two alternative bases occur at appreciable frequency (i.e., >1%) in the human population. A SNP may occur within a gene or within intergenic regions of the genome.
[87] Due to their prevalence and widespread nature, SNPs have the potential to be important tools for locating genes that are involved in human disease conditions. See, e.g., Wang et al., Science 280: 1077-1082 (1998).
[88] Identification and Characterization of SNPs and Mutations. Many different techniques can be used to identify and characterize SNPs and mutations, including single-strand conformation polymorphism (SSCP) analysis, heteroduplex analysis by denaturing high-performance liquid chromatography (DHPLC), direct DNA sequencing and computational methods (Shi et al., Clin Chem 47:164-172 (2001)). There is a wealth of sequence information in public databases; computational tools are useful for identifying SNPs and/or mutations in silico by aligning independently submitted sequences for a given gene (either cDNA or genomic sequences). The most common SNP-typing methods currently include hybridization, primer extension, and cleavage methods. Each of these methods must be connected to an appropriate detection system. Detection technologies include fluorescence polarization (Chan et al., Genome Res. 9:492-499 (1999)), luminometric detection of pyrophosphate release (pyrosequencing) (Ahmadian et al., Anal. Biochem. 280:103-10 (2000)), fluorescence resonance energy transfer (FRET)-based cleavage assays, DHPLC, and mass spectrometry (Shi, Clin Chem 47:164-172 (2001); U.S. Pat. No. 6,300,076 B1). Other methods of detecting and characterizing SNPs and mutations are those disclosed in U.S. Pat. Nos. 6,297,018 B1 and 6,300,063 B1.
[89] In a particularly preferred embodiment, polymorphisms and mutations are detected using INVADER™ technology (available from Third Wave Technologies Inc., Madison, Wisconsin, USA). In this assay, a specific upstream "invader" oligonucleotide and a partially overlapping downstream probe together form a specific structure when bound to complementary DNA template. This structure is recognized and cut at a specific site by the Cleavase enzyme, resulting in the release of the 5' flap of the probe oligonucleotide. This fragment then serves as the "invader" oligonucleotide with respect to synthetic secondary targets and secondary fluorescently labelled signal probes contained in the reaction mixture. This results in specific cleavage of the secondary signal probes by the Cleavase enzyme. Fluorescent signal is generated when this secondary probe (labelled with dye molecules capable of fluorescence resonance energy transfer) is cleaved. Cleavases have stringent requirements relative to the structure formed by the overlapping DNA sequences or flaps and can, therefore, be used to specifically detect single base pair mismatches immediately upstream of the cleavage site on the downstream DNA strand. Ryan D et al., Molecular Diagnosis 4(2): 135-144 (1999) and Lyamichev V et al., Nature Biotechnology 17: 292-296 (1999); see also U.S. Pat. Nos. 5,846,717 and 6,001,567.
[90] The identity of polymorphisms and mutations may also be determined using a mismatch detection technique including, but not limited to, the RNase protection method using riboprobes (Winter et al., Proc. Natl. Acad. Sci. USA 82:7575 (1985); Meyers et al., Science 230:1242 (1985)) and proteins which recognize nucleotide mismatches, such as the E. coli mutS protein (Modrich P, Ann Rev Genet 25:229-253 (1991)). Alternatively, variant alleles can be identified by single strand conformation polymorphism (SSCP) analysis (Orita et al., Genomics 5:874-879 (1989); Humphries et al., in Molecular Diagnosis of Genetic Diseases, Elles R, ed. (1996) pp. 321-340) or denaturing gradient gel electrophoresis (DGGE) (Wartell et al., Nucl. Acids Res. 18:2699-2706 (1990); Sheffield et al., Proc. Natl. Acad. Sci. USA 86: 232-236 (1989)). A polymerase-mediated primer extension method may also be used to identify the polymorphisms/mutations. Several such methods have been described in the patent and scientific literature and include the "Genetic Bit Analysis" method (WO 92/15712) and the ligase/polymerase mediated genetic bit analysis (U.S. Pat. No. 5,679,524). Related methods are disclosed in WO 91/02087, WO 90/09455, WO 95/17676, and U.S. Pat. Nos. 5,302,509 and 5,945,283. Extended primers containing a polymorphism or mutation may be detected by mass spectrometry as described in U.S. Pat. No. 5,605,798. Another primer extension method is allele-specific PCR. Ruaño et al., Nucl. Acids Res. 17: 8392 (1989); Ruaño et al., Nucl. Acids Res. 19: 6877-6882 (1991); WO 93/22456; Turki et al., J. Clin. Invest. 95: 1635-1641 (1995). In addition, multiple polymorphic and/or mutant sites may be investigated by simultaneously amplifying multiple regions of the nucleic acid using sets of allele-specific primers as described in WO 89/10414.
[91] Haplotyping and Genotyping Oligonucleotides. The invention provides methods and compositions for haplotyping and/or genotyping the genetic polymorphisms (and possibly mutations) in an individual. As used herein, the terms "genotype" and "haplotype" mean the genotype or haplotype containing the nucleotide pair or nucleotide, respectively, that is present at one or more of the novel polymorphic (or mutant) sites described herein and may optionally also include the nucleotide pair or nucleotide present at one or more additional polymorphic (or mutant) sites in the gene. The additional polymorphic (and mutant) sites may be currently known polymorphic/mutant sites or sites that are subsequently discovered.
[92] The compositions contain oligonucleotide probes and primers designed to specifically hybridize to one or more target regions containing, or that are adjacent to, a polymorphic or mutant site. Oligonucleotide compositions of the invention are useful in methods for genotyping and/or haplotyping a gene in an individual. The methods and compositions for establishing the genotype or haplotype of an individual at the novel polymorphic/mutant sites described herein are useful for studying the effect of the polymorphisms and mutations in the aetiology of diseases affected by the expression and function of the protein, studying the efficacy of drugs targeting the gene product, predicting individual susceptibility to diseases affected by the expression and function of the protein and predicting individual responsiveness to drugs targeting the gene product.
[93] Some embodiments of the invention contain two or more differently labelled genotyping oligonucleotides, for simultaneously probing the identity of nucleotides at two or more polymorphic or mutant sites. It is also contemplated that primer compositions may contain two or more sets of allele-specific primer pairs to allow simultaneous targeting and amplification of two or more regions containing a polymorphic or mutant site.
[94] Genotyping oligonucleotides of the invention may be immobilized on or synthesized on a solid surface such as a microchip, bead, or glass slide (see, e.g., WO 98/20020 and WO 98/20019). Such immobilized genotyping oligonucleotides may be used in a variety of polymorphism and mutation detection assays, including but not limited to probe hybridization and polymerase extension assays. Immobilized genotyping oligonucleotides of the invention may comprise an ordered array of oligonucleotides designed to rapidly screen a DNA sample for polymorphisms and mutations in multiple genes at the same time.
[95] An allele-specific oligonucleotide primer of the invention has a 3' terminal nucleotide, or preferably a 3' penultimate nucleotide, that is complementary to only one nucleotide of a particular SNP and/or mutation, thereby acting as a primer for polymerase-mediated extension only if the allele containing that nucleotide is present. Allele-specific oligonucleotide (ASO) primers hybridizing to either the coding or noncoding strand are contemplated by the invention. An ASO primer for detecting gene polymorphisms and mutations can be developed using techniques known to those of skill in the art.
[96] Other genotyping oligonucleotides of the invention hybridize to a target region located one to several nucleotides downstream of one of the novel polymorphic or mutant sites identified herein. Such oligonucleotides are useful in polymerase-mediated primer extension methods for detecting one of the novel polymorphisms or mutations described herein and therefore such genotyping oligonucleotides are referred to herein as "primer-extension oligonucleotides". In a preferred embodiment, the 3'-terminus of a primer-extension oligonucleotide is a deoxynucleotide complementary to the nucleotide located immediately adjacent to the polymorphic/mutant site.
[97] Direct Genotyping Method of the Invention. One embodiment of a genotyping method of the invention involves isolating from an individual a nucleic acid mixture comprising at least one copy of the gene of interest and/or a fragment or flanking regions thereof, and determining the identity of the nucleotide pair at one or more of the polymorphic/mutant sites in the nucleic acid mixture. As will be readily understood by the skilled artisan, the two "copies" of a germline gene in an individual may be the same on each allele or may be different on each allele. In a particularly preferred embodiment, the genotyping method comprises determining the identity of the nucleotide pair at each polymorphic and mutant site.
[98] Direct Haplotyping Method of the Invention. One embodiment of the haplotyping method of the invention comprises isolating from an individual a nucleic acid molecule containing only one of the two copies of a gene of interest, or a fragment thereof, and determining the identity of the nucleotide at one or more of the polymorphic or mutant sites in that copy. The nucleic acid may be isolated using any method capable of separating the two copies of the gene or fragment. As will be readily appreciated by those skilled in the art, any individual clone will only provide haplotype information on one of the two gene copies present in an individual. If haplotype information is desired for the individual's other copy, additional clones will need to be examined. Typically, at least five clones should be examined to have more than a 90% probability of haplotyping both copies of the gene in an individual. In a particularly preferred embodiment, the nucleotide at each polymorphic or mutant site is identified.
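The five-clone figure in paragraph [98] can be checked with a short calculation. The sketch below assumes, purely for illustration, that each clone independently carries either of the two gene copies with equal probability; the function name is hypothetical and not part of the specification. Under that assumption, the probability that n clones include both copies is 1 - 2(1/2)^n.

```python
def prob_both_copies(n_clones: int) -> float:
    """Probability that n clones include both gene copies.

    Assumption (not from the specification): each clone independently
    carries copy A or copy B with probability 1/2.
    """
    if n_clones < 2:
        return 0.0  # a single clone can never represent both copies
    # The only failures are "all clones are A" or "all clones are B".
    return 1.0 - 2.0 * 0.5 ** n_clones


if __name__ == "__main__":
    for n in range(2, 7):
        print(n, prob_both_copies(n))
```

Under this simple model, four clones fall short of 90% while five clones exceed it, which is consistent with the guidance given in the text.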
[99] In both the genotyping and haplotyping methods, the identity of a nucleotide (or nucleotide pair) at a polymorphic and/or mutant site may be determined by amplifying a target region containing the polymorphic and/or mutant sites directly from one or both copies of the gene, or fragments thereof, and sequencing the amplified regions by conventional methods. It will be readily appreciated by the skilled artisan that only one nucleotide will be detected at a polymorphic or mutant site in individuals who are homozygous at that site, while two different nucleotides will be detected if the individual is heterozygous for that site. The polymorphism or mutation may be identified directly, known as positive-type identification, or by inference, referred to as negative-type identification. For example, where a SNP and/or mutation is known to be guanine and cytosine in a reference population, a site may be positively determined to be either guanine or cytosine for all individuals homozygous at that site, or both guanine and cytosine, if the individual is heterozygous at that site. Alternatively, the site may be negatively determined to be not guanine (and thus cytosine/cytosine) or not cytosine (and thus guanine/guanine).
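The positive-type and negative-type identification described in paragraph [99] can be illustrated with a minimal sketch. The function names and the default guanine/cytosine allele pair are illustrative assumptions taken from the example in the text, not a prescribed implementation.

```python
def positive_call(detected_bases, known_alleles=("G", "C")):
    """Positive-type identification: report the bases actually detected.

    One detected base means the individual is homozygous at the site;
    two different detected bases mean the individual is heterozygous.
    """
    observed = sorted(set(detected_bases))
    if not observed or any(b not in known_alleles for b in observed):
        raise ValueError("detected bases must be drawn from the known alleles")
    if len(observed) == 1:
        return (observed[0], observed[0])  # homozygous
    return tuple(observed)                 # heterozygous


def negative_call(absent_base, known_alleles=("G", "C")):
    """Negative-type identification for a two-allele site.

    If one known allele is shown to be absent, the site is inferred to be
    homozygous for the other allele.
    """
    if len(known_alleles) != 2 or absent_base not in known_alleles:
        raise ValueError("negative calls need a two-allele site and a known allele")
    other = known_alleles[0] if known_alleles[1] == absent_base else known_alleles[1]
    return (other, other)
```

For instance, `negative_call("G")` infers a cytosine/cytosine genotype, mirroring the "not guanine" case in the paragraph above.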
[100] Indirect Genotyping Method using Polymorphic and Mutation Sites in Linkage Disequilibrium with a Target Polymorphism or Mutation. In addition, the identity of the alleles present at any of the novel polymorphic/mutant sites of the invention may be indirectly determined by genotyping other polymorphic/mutant sites in linkage disequilibrium with those sites of interest. As described supra, two sites are said to be in linkage disequilibrium if the presence of a particular variant (polymorphism or mutation) at one site is indicative of the presence of another variant at a second site. See, Stevens JC, Mol. Diag. 4:309-317 (1999). Polymorphic and mutant sites in linkage disequilibrium with the polymorphic or mutant sites of the invention may be located in regions of the same gene or in other genomic regions. Genotyping of a polymorphic/mutant site in linkage disequilibrium with the novel polymorphic/mutant sites described herein may be performed by, but is not limited to, any of the above-mentioned methods for detecting the identity of the allele at a polymorphic/mutant site.
[101] Amplifying a Target Gene Region. The target regions may be amplified using any oligonucleotide-directed amplification method, including but not limited to polymerase chain reaction (PCR) (U.S. Pat. No. 4,965,188), ligase chain reaction (LCR) (Barany et al., Proc. Natl. Acad. Sci. USA 88:189-193 (1991); published PCT patent application WO 90/01069), and oligonucleotide ligation assay (OLA) (Landegren et al., Science 241: 1077-1080 (1988)). Oligonucleotides useful as primers or probes in such methods should specifically hybridize to a region of the nucleic acid that contains or is adjacent to the polymorphic/mutant site. Typically, the oligonucleotides are between 10 and 35 nucleotides in length and preferably, between 15 and 30 nucleotides in length. Most preferably, the oligonucleotides are 20 to 25 nucleotides long. The exact length of the oligonucleotide will depend on many factors that are routinely considered and practiced by the skilled artisan.
[102] Other known nucleic acid amplification procedures may be used to amplify the target region including transcription-based amplification systems (U.S. Pat. No. 5,130,238; EP 329,822; U.S. Pat. No. 5,169,766, published PCT patent application WO 89/06700) and isothermal methods (Walker et al, Proc. Natl. Acad. Sci. USA 89: 392-396 (1992)).
[103] A polymorphism or mutation in the target region may be assayed before or after amplification using one of several hybridization-based methods known in the art. Typically, allele-specific oligonucleotides are utilized in performing such methods. The allele-specific oligonucleotides may be used as differently labelled probe pairs, with one member of the pair showing a perfect match to one variant of a target sequence and the other member showing a perfect match to a different variant. In some embodiments, more than one polymorphic/mutant site may be detected at once using a set of allele-specific oligonucleotides or oligonucleotide pairs. Preferably, the members of the set have melting temperatures within 5°C, and more preferably within 2°C, of each other when hybridizing to each of the polymorphic or mutant sites being detected.
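The melting-temperature criterion in paragraph [103] can be sketched as a simple screening check. The specification does not say how melting temperatures are to be estimated; the sketch below assumes the Wallace rule of thumb (2°C per A or T, 4°C per G or C), which is only a rough approximation best suited to short oligonucleotides. The probe sequences and function names are hypothetical.

```python
def wallace_tm(seq: str) -> float:
    """Rough melting temperature in degrees C by the Wallace rule.

    Assumption: Tm = 2*(A+T) + 4*(G+C). This is an approximation chosen
    for illustration, not a method given in the specification.
    """
    seq = seq.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    if at + gc != len(seq):
        raise ValueError("sequence may contain only A, C, G and T")
    return 2.0 * at + 4.0 * gc


def tm_spread(probes) -> float:
    """Difference between the highest and lowest estimated Tm in a probe set."""
    tms = [wallace_tm(p) for p in probes]
    return max(tms) - min(tms)


def within_window(probes, window_c: float = 5.0) -> bool:
    """True if all probes in the set lie within `window_c` degrees of each other."""
    return tm_spread(probes) <= window_c


# Hypothetical allele-specific probe pair differing at a single base (G vs A).
probe_pair = ["ACGTTGCAAGGCTTACGGAT", "ACGTTGCAAGACTTACGGAT"]
print(tm_spread(probe_pair), within_window(probe_pair, 5.0))
```

In practice a nearest-neighbour thermodynamic model would normally replace the Wallace rule; the window check itself stays the same.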
[104] Hybridizing Allele-Specific Oligonucleotide to a Target Gene. Hybridization of an allele-specific oligonucleotide to a target polynucleotide may be performed with both entities in solution, or such hybridization may be performed when either the oligonucleotide or the target polynucleotide is covalently or noncovalently affixed to a solid support. Attachment may be mediated, for example, by antibody-antigen interactions, poly-L-Lys, streptavidin or avidin-biotin, salt bridges, hydrophobic interactions, chemical linkages, UV cross-linking, baking, etc. Allele-specific oligonucleotides may be synthesized directly on the solid support or attached to the solid support subsequent to synthesis. Solid supports suitable for use in detection methods of the invention include substrates made of silicon, glass, plastic, paper and the like, which may be formed, for example, into wells (as in 96-well plates), slides, sheets, membranes, fibres, chips, dishes, and beads. The solid support may be treated, coated or derivatised to facilitate the immobilization of the allele-specific oligonucleotide or target nucleic acid.
[105] The genotype or haplotype for the gene of an individual may also be determined by hybridization of a nucleic acid sample containing one or both copies of the gene to nucleic acid arrays and subarrays such as described in WO 95/11995. The arrays would contain a battery of allele-specific oligonucleotides representing each of the polymorphic or mutant sites to be included in the genotype or haplotype.
[106] Determining Population Genotypes and Haplotypes and Correlating them with a Trait. The present invention provides a method for determining the frequency of a genotype or haplotype in a population. The method comprises determining the genotype or the haplotype for a gene present in each member of the population, wherein the genotype or haplotype comprises the nucleotide pair or nucleotide detected at one or more of the polymorphic sites in the gene and mutations identified in the region, and calculating the frequency at which the genotype or haplotype is found in the population. The population may be a reference population, a family population, a same sex population, a population group, or a trait population (e.g., a group of individuals exhibiting a trait of interest such as a medical condition or response to a therapeutic treatment).
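The frequency calculation in paragraph [106] amounts to counting how often each genotype occurs among the members of a population. A minimal sketch follows, assuming each individual's genotype at a single site is recorded as an unordered pair of bases; the data and function name are invented for illustration.

```python
from collections import Counter


def genotype_frequencies(genotypes):
    """Return the relative frequency of each genotype in a population.

    Each genotype is a pair of bases; the pair is sorted so that
    ("G", "C") and ("C", "G") are counted as the same genotype.
    """
    counts = Counter(tuple(sorted(g)) for g in genotypes)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("population is empty")
    return {g: c / total for g, c in counts.items()}


# Hypothetical population of four individuals typed at one site.
population = [("G", "G"), ("G", "C"), ("C", "G"), ("C", "C")]
print(genotype_frequencies(population))
```

The same counting applies to haplotypes if each individual's haplotype is used as the dictionary key instead of a base pair.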
[107] In another aspect of the invention, frequency data for genotypes and/or haplotypes found in a reference population are used in a method for identifying an association between a trait and a genotype or a haplotype. The trait may be any detectable phenotype, including but not limited to cancer, susceptibility to a disease or response to a treatment. The method involves obtaining data on the frequency of the genotypes or haplotypes of interest in a reference population and comparing the data to the frequency of the genotypes or haplotypes in a population exhibiting the trait. Frequency data for one or both of the reference and trait populations may be obtained by genotyping or haplotyping each individual in the populations using one of the methods described above. The haplotypes for the trait population may be determined directly or, alternatively, by the predictive genotype to haplotype approach described above.
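The comparison of trait and reference frequencies in paragraph [107] can be sketched with a standard statistical test. The specification does not name a particular test; the sketch below assumes a Pearson chi-square test on a 2x2 table (carriers versus non-carriers, trait versus reference) with the conventional 5% critical value for one degree of freedom. The counts and function names are hypothetical.

```python
CHI2_CRITICAL_1DF_5PCT = 3.841  # chi-square critical value, 1 d.f., alpha = 0.05


def chi_square_2x2(trait_with, trait_total, ref_with, ref_total):
    """Pearson chi-square statistic for carriers vs non-carriers in two groups."""
    a, b = trait_with, trait_total - trait_with
    c, d = ref_with, ref_total - ref_with
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        raise ValueError("every row and column of the table must be non-empty")
    return n * (a * d - b * c) ** 2 / denom


def more_frequent_in_trait(trait_with, trait_total, ref_with, ref_total):
    """True if the genotype is enriched in the trait population and the
    difference exceeds the assumed 5% significance threshold."""
    enriched = trait_with / trait_total > ref_with / ref_total
    stat = chi_square_2x2(trait_with, trait_total, ref_with, ref_total)
    return enriched and stat > CHI2_CRITICAL_1DF_5PCT


# Hypothetical counts: 30 of 100 trait individuals and 15 of 100 reference
# individuals carry the genotype of interest.
print(chi_square_2x2(30, 100, 15, 100), more_frequent_in_trait(30, 100, 15, 100))
```

For small counts an exact test would usually be preferred over the chi-square approximation, and testing many genotypes at once would call for a multiple-comparison correction.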
[108] In preferred embodiments, the trait is susceptibility to a disease, severity of a disease, the staging of a disease or response to a drug. Such methods have applicability in developing diagnostic tests and therapeutic treatments for all pharmacogenetic applications where there is the potential for an association between a genotype and a treatment outcome, including efficacy measurements, PD measurements, PK measurements and side effect measurements. [109] In another embodiment, the frequency data for the reference and/or trait populations are obtained by accessing previously determined frequency data, which may be in written or electronic form. For example, the frequency data may be present in a database that is accessible by a computer. Once the frequency data are obtained, the frequencies of the genotypes or haplotypes of interest in the reference and trait populations are compared. In a preferred embodiment, the frequencies of all genotypes and/or haplotypes observed in the populations are compared. If a particular genotype or haplotype for the gene is more frequent
in the trait population than in the reference population by a statistically significant amount, then the trait is predicted to be associated with that genotype or haplotype.
[110] In a preferred embodiment, the haplotype frequency data for different ethnogeographic groups are examined to determine whether they are consistent with Hardy-Weinberg equilibrium. Hartl DL et al., Principles of Population Genomics, 3rd Ed. (Sinauer Associates, Sunderland, MA, 1997). Hardy-Weinberg equilibrium postulates that the frequency of finding the haplotype pair $H_1/H_2$ is equal to $P_{H\text{-}W}(H_1/H_2) = 2p(H_1)p(H_2)$ if $H_1 \neq H_2$ and $P_{H\text{-}W}(H_1/H_2) = p(H_1)p(H_2)$ if $H_1 = H_2$. A statistically significant difference between the observed and expected haplotype frequencies could be due to one or more factors including significant inbreeding in the population group, strong selective pressure on the gene, sampling bias, and/or errors in the genotyping process. If large deviations from Hardy-Weinberg equilibrium are observed in an ethnogeographic group, the number of individuals in that group can be increased to see if the deviation is due to a sampling bias. If a larger sample size does not reduce the difference between observed and expected haplotype pair frequencies, then one may wish to consider haplotyping the individual using a direct haplotyping method such as, for example, CLASPER System™ technology (U.S. Pat. No. 5,866,404), SMD, or allele-specific long-range PCR (Michalotos-Beloin et al., Nucl. Acids Res. 24: 4841-4843 (1996)).
[111] In one embodiment of this method for predicting a haplotype pair, the assigning step involves performing the following analysis. First, each of the possible haplotype pairs is compared to the haplotype pairs in the reference population. Generally, only one of the haplotype pairs in the reference population matches a possible haplotype pair and that pair is assigned to the individual.
Occasionally, only one haplotype represented in the reference haplotype pairs is consistent with a possible haplotype pair for an individual, and in such cases the individual is assigned a haplotype pair containing this known haplotype and a new haplotype derived by subtracting the known haplotype from the possible haplotype pair. In rare cases, either no haplotypes in the reference population are consistent with the possible haplotype pairs, or alternatively, multiple reference haplotype pairs are consistent with the possible haplotype pairs. In such cases, the individual is preferably haplotyped using a direct molecular haplotyping method such as, for example, those discussed supra.
[112] In a preferred embodiment, statistical analysis is performed by the use of standard ANOVA tests with a Bonferroni correction and/or a bootstrapping method that simulates the
genotype-phenotype correlation many times and calculates a significance value. When many polymorphisms and/or mutations are being analyzed, a calculation may be performed to correct for a significant association that might be found by chance. For statistical methods useful in the methods of the present invention, see Bailey NTJ, Statistical Methods in Biology, 3rd Edition (Cambridge Univ. Press, Cambridge, 1997); Waterman MS, Introduction to Computational Biology (CRC Press, 2000) and Bioinformatics, Baxevanis AD & Ouellette BFF, eds. (John Wiley & Sons, Inc., 2001).
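Two of the calculations mentioned in paragraphs [110] and [112] can be sketched briefly: the haplotype pair frequencies expected under Hardy-Weinberg equilibrium, and a Bonferroni-adjusted significance threshold. The function names and the example haplotype frequencies are assumptions made for illustration.

```python
from itertools import combinations_with_replacement


def hwe_expected_pair_freqs(hap_freqs):
    """Expected haplotype-pair frequencies under Hardy-Weinberg equilibrium.

    hap_freqs maps each haplotype label to its population frequency.
    Identical pairs get p(H1) * p(H1); distinct pairs get 2 * p(H1) * p(H2).
    """
    expected = {}
    for h1, h2 in combinations_with_replacement(sorted(hap_freqs), 2):
        product = hap_freqs[h1] * hap_freqs[h2]
        expected[(h1, h2)] = product if h1 == h2 else 2 * product
    return expected


def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold after Bonferroni correction."""
    return alpha / n_tests


# Hypothetical haplotype frequencies in one population group.
expected = hwe_expected_pair_freqs({"H1": 0.6, "H2": 0.4})
threshold = bonferroni_threshold(0.05, 10)
```

Observed haplotype pair frequencies could then be compared with `expected`, and any resulting p-values judged against `threshold` when several sites are tested together.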
[113] In a preferred embodiment of the method, the trait of interest is a clinical response exhibited by a patient to some therapeutic treatment, for example, response to a drug targeting the gene product or to a therapeutic treatment for a medical condition.
[114] In another embodiment of the invention, a detectable genotype or haplotype that is in linkage disequilibrium with a genotype or haplotype of interest may be used as a surrogate marker. A genotype that is in linkage disequilibrium with another genotype is indicated where a particular genotype or haplotype for a given gene is more frequent in the population that also demonstrates the potential surrogate marker genotype than in the reference population. If the frequency is statistically significant, then the marker genotype is predictive of that genotype or haplotype, and can be used as a surrogate marker.
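Linkage disequilibrium between a marker site and a site of interest is commonly summarized by the coefficient D and the squared correlation r². The text describes the concept in terms of frequencies rather than these particular statistics, so their use here is an assumption, as are the function name and the example haplotype counts.

```python
def linkage_disequilibrium(n_AB, n_Ab, n_aB, n_ab):
    """Return (D, r_squared) from counts of the four two-site haplotypes.

    A/a are the alleles at the site of interest and B/b the alleles at
    the candidate surrogate marker site.
    """
    total = n_AB + n_Ab + n_aB + n_ab
    p_AB = n_AB / total
    p_A = (n_AB + n_Ab) / total
    p_B = (n_AB + n_aB) / total
    d = p_AB - p_A * p_B
    denom = p_A * (1 - p_A) * p_B * (1 - p_B)
    r_squared = (d * d) / denom if denom > 0 else 0.0
    return d, r_squared


# Hypothetical counts: the marker allele always travels with allele A.
d_full, r2_full = linkage_disequilibrium(50, 0, 0, 50)
# Hypothetical counts: the two sites are independent.
d_none, r2_none = linkage_disequilibrium(25, 25, 25, 25)
```

An r² close to 1 would indicate that the marker genotype is a good surrogate for the genotype of interest, while a value near 0 would indicate that it carries little predictive information.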
[115] Correlating Subject Genotype or Haplotype to Treatment Response. In order to deduce a correlation between a clinical response to a treatment and a genotype or haplotype, data are obtained on the clinical responses exhibited by a population of individuals who received the treatment, hereinafter the "clinical population". These clinical data may be obtained by analyzing the results of a clinical trial that has already been conducted and/or by designing and carrying out one or more new clinical trials.
[116] It is preferred that the individuals included in the clinical population be graded for the existence of the medical condition of interest. This grading of potential patients could employ a standard physical exam or one or more lab tests. Alternatively, grading of patients could use genotyping or haplotyping for situations where there is a strong correlation between haplotype pair and disease susceptibility or severity.
[117] The therapeutic treatment of interest is administered to each individual in the trial population, and each individual's response to the treatment is measured using one or more predetermined criteria. It is contemplated that in many cases, the trial population will exhibit
a range of responses, and that the investigator may choose more than one responder group (e.g., low, medium, high) made up by the various responses. In addition, the gene for each individual in the trial population is genotyped and/or haplotyped, which may be done before or after administering the treatment.
[118] These results are then analyzed to determine if any observed variation in clinical response between polymorphism/mutation groups is statistically significant. Statistical analysis methods that may be used are described in Fisher LD & vanBelle G, Biostatistics: A Methodology for the Health Sciences (Wiley-Interscience, New York, 1993). This analysis may also include a regression calculation of which polymorphic/mutation sites in the gene contribute most significantly to the differences in phenotype.
[119] A second method for finding correlations between genotype and haplotype content and clinical responses uses predictive models based on error-minimizing optimization algorithms, one of which is a genetic algorithm. Judson R, Genetic Algorithms and Their Uses in Chemistry, in Reviews in Computational Chemistry, Vol. 10, Lipkowitz KB & Boyd DB, eds. (VCH Publishers, New York, 1997) pp. 1-73. Simulated annealing (Press et al., Numerical Recipes in C: The Art of Scientific Computing, Ch. 10 (Cambridge University Press, Cambridge, 1992)), neural networks (Rich E & Knight K, Artificial Intelligence, 2nd Edition, Ch. 10 (McGraw-Hill, New York, 1991)), standard gradient descent methods (Press et al., Numerical Recipes in C: The Art of Scientific Computing, Ch. 10 (Cambridge University Press, Cambridge, 1992)), or other global or local optimization approaches (see discussion in Judson, supra) can also be used.
[120] Correlations may also be analyzed using analysis of variance (ANOVA) techniques to determine how much of the variation in the clinical data is explained by different subsets of the polymorphic and mutant sites in the gene. ANOVA is used to test hypotheses about whether a response variable is caused by or correlates with one or more traits or variables that can be measured (Fisher & vanBelle, supra, Ch. 10).
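A one-way ANOVA of this kind can be sketched with the standard library alone. The function below returns the F statistic and the fraction of variation explained by group membership (eta squared); the function name and the response values are hypothetical, and a real analysis would also convert F to a p-value using the F distribution.

```python
from statistics import mean


def one_way_anova(groups):
    """Return (F, eta_squared) for a list of groups of response values.

    Each inner list holds the clinical responses of the individuals in
    one polymorphism/mutation group.
    """
    all_values = [value for group in groups for value in group]
    grand_mean = mean(all_values)
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((value - mean(g)) ** 2 for g in groups for value in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    eta_squared = ss_between / (ss_between + ss_within)
    return f_stat, eta_squared


# Hypothetical responses for three genotype groups.
f_stat, eta_squared = one_way_anova([[1, 2, 3], [2, 3, 4], [5, 6, 7]])
```

Here `eta_squared` corresponds to the "how much of the variation is explained" quantity: it is the between-group sum of squares divided by the total sum of squares.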
[121] After the clinical, mutation and polymorphism data have been obtained, correlations between individual response and genotype or haplotype content are created. Correlations may be produced in several ways. In one method, individuals are grouped by their genotype or haplotype (or haplotype pair) (also referred to as a polymorphism/mutation group), and then
the averages and standard deviations of clinical responses exhibited by the members of each polymorphism/mutation group are calculated.
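The grouping step described above can be sketched as follows. The record layout (a genotype label paired with a numeric response) and the function name are assumptions for illustration; a group with a single member is given a standard deviation of 0.0 because a sample standard deviation is undefined for one value.

```python
from collections import defaultdict
from statistics import mean, stdev


def summarize_by_group(records):
    """Return {group: (mean response, sample standard deviation)}.

    records is an iterable of (genotype_or_haplotype_label, response).
    """
    groups = defaultdict(list)
    for label, response in records:
        groups[label].append(response)
    return {
        label: (mean(values), stdev(values) if len(values) > 1 else 0.0)
        for label, values in groups.items()
    }


# Hypothetical clinical responses keyed by genotype at one site.
summary = summarize_by_group([("A/A", 10.0), ("A/A", 14.0), ("A/G", 20.0)])
```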
[122] From the analyses described above, the skilled artisan may readily construct a mathematical model that predicts clinical response as a function of genotype or haplotype content. The identification of an association between a clinical response and a genotype or haplotype (or haplotype pair) for the gene may be the basis for designing a diagnostic method to determine those individuals who will or will not respond to the treatment, or alternatively, will respond at a lower level and thus may require more treatment (i.e., a greater dose of a drug), or who will suffer an adverse reaction. The diagnostic method may take one of several forms: for example, a direct DNA test (i.e., genotyping or haplotyping one or more of the polymorphic/mutant sites in the gene), a serological test, or a physical exam measurement. The only requirement is that there be a good correlation between the diagnostic test results and the underlying genotype or haplotype. In a preferred embodiment, this diagnostic method uses the predictive genotyping/haplotyping method described above.
[123] Patient Selection for Therapy Based Upon Polymorphisms and/or Mutations. The application of genotypes and/or haplotypes that correlate with efficacious drug responses will be used to select patients for therapy of existing diseases. Genotypes and haplotypes that correlate with adverse consequences will be used either to modify how the drug is administered (e.g., dose, schedule or in combination with other drugs) or to eliminate the drug as an option.
[124] Patient Selection for Prophylactic Therapy Based Upon Polymorphisms and/or Mutations. The application of genotypes and/or haplotypes that correlate with a predisposition for disease will be used to select patients for preventative therapy.
[125] Computer System for Storing or Displaying Polymorphism and Mutation Data. The invention also provides a computer system for storing and displaying polymorphism and mutation data determined for the gene. The computer system comprises a computer processing unit, a display, and a database containing the polymorphism/mutation data. The polymorphism/mutation data includes the polymorphisms, mutations, the genotypes and the haplotypes identified for a given gene in a reference population. In a preferred embodiment, the computer system is capable of producing a display showing haplotypes organized according to their evolutionary relationships. A computer may implement any or all analytical
and mathematical operations involved in practicing the methods of the present invention. In addition, the computer may execute a program that generates views (or screens) displayed on a display device and with which the user can interact to view and analyze large amounts of information relating to the gene and its genomic variation, including chromosome location, gene structure, gene family, gene expression data, polymorphism data, mutation data, genetic sequence data, and clinical population data (e.g., data on ethnogeographic origin, clinical responses, genotypes, and haplotypes for one or more populations). The polymorphism and mutation data described herein may be stored as part of a relational database (e.g., an instance of an Oracle database) or as a set of ASCII flat files. These polymorphism and mutation data may be stored on the computer's hard drive or may, for example, be stored on a CD-ROM or on one or more other storage devices accessible by the computer. For example, the data may be stored on one or more databases in communication with the computer via a network.
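A relational layout for such data might look like the sketch below, which uses an in-memory SQLite database purely for illustration (the text mentions Oracle as one example). The table names, column names, gene label and subject identifiers are all hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- One row per polymorphic or mutant site identified in a gene.
    CREATE TABLE polymorphic_site (
        site_id    INTEGER PRIMARY KEY,
        gene       TEXT    NOT NULL,
        position   INTEGER NOT NULL,
        ref_allele TEXT    NOT NULL,
        alt_allele TEXT    NOT NULL
    );
    -- One row per subject per site, holding the nucleotide pair.
    CREATE TABLE genotype (
        subject_id TEXT    NOT NULL,
        site_id    INTEGER NOT NULL REFERENCES polymorphic_site(site_id),
        allele_1   TEXT    NOT NULL,
        allele_2   TEXT    NOT NULL,
        PRIMARY KEY (subject_id, site_id)
    );
""")
conn.execute(
    "INSERT INTO polymorphic_site VALUES (1, 'GENE_X', 1234, 'A', 'G')"
)
conn.executemany(
    "INSERT INTO genotype VALUES (?, ?, ?, ?)",
    [("S1", 1, "A", "A"), ("S2", 1, "A", "G"), ("S3", 1, "A", "G")],
)

# Count how many subjects carry each genotype at site 1.
rows = conn.execute(
    "SELECT allele_1 || '/' || allele_2 AS gt, COUNT(*) "
    "FROM genotype WHERE site_id = 1 GROUP BY gt ORDER BY gt"
).fetchall()
```

Keeping sites and genotypes in separate tables lets the same query pattern serve a reference population and a clinical population, for example by adding a population column to a subject table.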
[126] Nucleic Acid-based Diagnostics. In another aspect, the invention provides SNP and mutation probes, which are useful in classifying subjects according to their types of genetic variation. The SNP and mutation probes according to the invention are oligonucleotides, which discriminate between SNPs or mutations and the wild-type sequence in conventional allelic discrimination assays. In certain preferred embodiments, the oligonucleotides according to this aspect of the invention are complementary to one allele of the SNP and/or mutation nucleic acid, but not to any other allele of the SNP and/or mutation nucleic acid. Oligonucleotides according to this embodiment of the invention can discriminate between SNPs and mutations in various ways. For example, under stringent hybridization conditions, an oligonucleotide of appropriate length will hybridize to one SNP or mutation, but not to any other. The oligonucleotide may be labelled using a radiolabel or a fluorescent molecular tag. Alternatively, an oligonucleotide of appropriate length can be used as a primer for PCR, wherein the 3' terminal nucleotide is complementary to one allele containing a SNP or mutation, but not to any other allele. In this embodiment, the presence or absence of amplification by PCR determines the haplotype of the SNP or the specific mutation. [127] Genomic and cDNA fragments of the invention comprise at least one novel polymorphic site or mutation identified herein, have a length of at least 10 nucleotides, and may range up to the full length of the gene. Preferably, a fragment according to the present
invention is between 100 and 3000 nucleotides in length, and more preferably between 200 and 2000 nucleotides in length, and most preferably between 500 and 1000 nucleotides in length.
[128] Kits of the Invention. The invention provides nucleic acid and polypeptide detection kits useful for haplotyping and/or genotyping the genes in an individual. Such kits are useful for classifying individuals. Specifically, the invention encompasses kits for detecting the presence of a polypeptide or nucleic acid corresponding to a marker of the invention in a biological sample, e.g., any tissue or bodily fluid including, but not limited to, serum, plasma, lymph, cystic fluid, urine, stool, cerebrospinal fluid, ascites fluid or blood, and including biopsy samples of body tissue. For example, the kit can comprise a labelled compound or agent capable of detecting a polypeptide or an mRNA encoding a polypeptide corresponding to a marker of the invention in a biological sample and means for determining the amount of the polypeptide or mRNA in the sample, e.g., an antibody which binds the polypeptide or an oligonucleotide probe which binds to DNA or mRNA encoding the polypeptide. Kits can also include instructions for interpreting the results obtained using the kit.
[129] In another embodiment, the invention provides a kit comprising at least two genotyping oligonucleotides packaged in separate containers. The kit may also contain other components such as hybridization buffer (where the oligonucleotides are to be used as a probe) packaged in a separate container. Alternatively, where the oligonucleotides are to be used to amplify a target region, the kit may contain, packaged in separate containers, a polymerase and a reaction buffer optimized for primer extension mediated by the polymerase, such as in the case of PCR.
[130] In a preferred embodiment, such kit may further comprise a DNA sample collecting means. In particular, the genotyping primer composition may comprise at least two sets of allele specific primer pairs. Preferably, the two genotyping oligonucleotides are packaged in separate containers.
[131] For antibody-based kits, the kit can comprise, e.g., (1) a first antibody, e.g., attached to a solid support, which binds to a polypeptide corresponding to a marker of the invention; and, optionally, (2) a second, different antibody which binds to either the polypeptide or the first antibody and is conjugated to a detectable label.
[132] For oligonucleotide-based kits, the kit can comprise, e.g., (1) an oligonucleotide, e.g., a detectably-labelled oligonucleotide, which hybridizes to a nucleic acid sequence encoding a polypeptide corresponding to a marker of the invention; or (2) a pair of primers useful for amplifying a nucleic acid molecule corresponding to a marker of the invention.
[133] The kit can also comprise, e.g., a buffering agent, a preservative or a protein-stabilizing agent. The kit can further comprise components necessary for detecting the detectable label, e.g., an enzyme or a substrate. The kit can also contain a control sample or a series of control samples, which can be assayed and compared to the test sample. Each component of the kit can be enclosed within an individual container and all of the various containers can be within a single package, along with instructions for interpreting the results of the assays performed using the kit.
[134] Making Polymorphisms and Mutations of the Invention. Effects of the polymorphisms and mutations identified herein on gene expression may be investigated by preparing recombinant cells and/or organisms, preferably recombinant animals, containing a polymorphic variant and/or mutation of the gene.
[135] In one aspect, the present invention includes one or more polynucleotides encoding mutant or polymorphic polypeptides, including degenerate variants thereof. The invention also encompasses allelic variants of the same, that is, naturally occurring alternative forms of the isolated polynucleotides that encode mutant polypeptides that are identical, homologous or related to those encoded by the polynucleotides. Alternatively, non-naturally occurring variants may be produced by mutagenesis techniques or by direct synthesis techniques well known in the art. Accordingly, nucleic acid sequences capable of hybridizing at low stringency with any nucleic acid sequences encoding a mutant polypeptide of the present invention are considered to be within the scope of the invention. For example, for a nucleic acid sequence of about 20-40 bases, a typical prehybridization, hybridization, and wash protocol is as follows: (1) prehybridization: incubate nitrocellulose filters containing the denatured target DNA for 3-4 hours at 55°C in 5xDenhardt's solution, 6xSSC (20xSSC consists of 175 g NaCl, 88.2 g sodium citrate in 800 ml H2O adjusted to pH 7.0 with 10 N NaOH), 0.1% SDS, and 100 mg/ml denatured salmon sperm DNA; (2) hybridization: incubate filters in prehybridization solution plus probe at 42°C for 14-48 hours; (3) wash: three 15-minute washes in 6xSSC and 0.1% SDS at room temperature, followed by a final 1-1.5
minute wash in 6xSSC and 0.1% SDS at 55°C. Other equivalent procedures, e.g., employing organic solvents such as formamide, are well known in the art. Standard stringency conditions are well characterized in standard molecular biology cloning texts. See, for example, Sambrook, Fritsch, & Maniatis, Molecular Cloning: A Laboratory Manual, 2nd Ed. (Cold Spring Harbor Laboratory Press, Cold Spring Harbor, New York, 1989); Glover DN, DNA Cloning, Volumes I and II (1985); Oligonucleotide Synthesis, Gait MJ, ed. (1984); Nucleic Acid Hybridization, Hames BD & Higgins SJ, eds. (1984).
[136] Recombinant Expression Vectors. Another aspect of the invention includes vectors containing one or more nucleic acid sequences encoding a mutant or polymorphic polypeptide. In practicing the present invention, many conventional techniques in molecular biology, microbiology and recombinant DNA are used. These techniques are well known and are explained in, e.g., Current Protocols in Molecular Biology, Vols. I-III, Ausubel, ed. (1997); Sambrook et al., Molecular Cloning: A Laboratory Manual, 2nd Edition (Cold Spring Harbor Laboratory Press, Cold Spring Harbor, New York, 1989); Glover DN, DNA Cloning: A Practical Approach, Vols. I and II (1985); Oligonucleotide Synthesis, Gait, ed. (1984); Nucleic Acid Hybridization, Hames & Higgins, eds. (1985); Transcription and Translation, Hames & Higgins, eds. (1984); Animal Cell Culture, Freshney, ed. (1986); Immobilized Cells and Enzymes (IRL Press, 1986); Perbal, A Practical Guide to Molecular Cloning; the series Methods in Enzymol. (Academic Press, Inc., 1984); Gene Transfer Vectors for Mammalian Cells, Miller & Calos, eds. (Cold Spring Harbor Press, Cold Spring Harbor Laboratory, New York, 1987); and Methods in Enzymology, Vols. 154 and 155, Wu & Grossman, and Wu, eds., respectively.
[137] For recombinant expression of one or more of the polypeptides of the invention, the nucleic acid containing all or a portion of the nucleotide sequence encoding the polypeptide is inserted into an appropriate cloning vector, or an expression vector (i.e., a vector that contains the necessary elements for the transcription and translation of the inserted polypeptide coding sequence) by recombinant DNA techniques well known in the art and as detailed below.
[138] The term "regulatory sequence" is intended to include promoters, enhancers and other expression control elements (e.g., polyadenylation signals). Such regulatory sequences are described, for example, in Goeddel, Gene Expression Technology: Methods In Enzymology (Academic Press, San Diego, Calif., 1990). Regulatory sequences include those that direct
constitutive expression of a nucleotide sequence in many types of host cell and those that direct expression of the nucleotide sequence only in certain host cells (e.g., tissue specific regulatory sequences). It will be appreciated by those skilled in the art that the design of the expression vector can depend on such factors as the choice of the host cell to be transformed, the level of expression of polypeptide desired, etc. The expression vectors of the invention can be introduced into host cells to thereby produce polypeptides or peptides, including fusion polypeptides, encoded by nucleic acids as described herein (e.g., mutant polypeptides and mutant-derived fusion polypeptides, etc.).
[139] Mutant and Polymorphic Polypeptide-Expressing Host Cells. Another aspect of the invention pertains to mutant and polymorphic polypeptide-expressing host cells, which contain a nucleic acid encoding one or more mutant/polymorphic polypeptides of the invention. To prepare a recombinant cell of the invention, the desired isogene may be introduced into a host cell in a vector such that the isogene remains extrachromosomal. In such a situation, the gene will be expressed by the cell from the extrachromosomal location. In a preferred embodiment, the isogene is introduced into a cell in such a way that it recombines with the endogenous gene present in the cell. Such recombination requires the occurrence of a double recombination event, thereby resulting in the desired gene polymorphism or mutation. Vectors for the introduction of genes both for recombination and for extrachromosomal maintenance are known in the art, and any suitable vector or vector construct may be used in the invention. Methods such as electroporation, particle bombardment, calcium phosphate co-precipitation and viral transduction for introducing DNA into cells are known in the art; therefore, the choice of method may lie with the competence and preference of the skilled practitioner.
[140] The recombinant expression vectors of the invention can be designed for expression of mutant polypeptides in prokaryotic or eukaryotic cells. For example, mutant/polymorphic polypeptides can be expressed in bacterial cells such as Escherichia coli (E. coli), insect cells (using baculovirus expression vectors), fungal cells (e.g., yeast cells) or mammalian cells. Suitable host cells are discussed further in Goeddel, Gene Expression Technology: Methods In Enzymology (Academic Press, San Diego, Calif., 1990). Alternatively, the recombinant expression vector can be transcribed and translated in vitro, for example using T7 promoter regulatory sequences and T7 polymerase. The SMP2 promoter is useful in the
expression of polypeptides in smooth muscle cells. Qian et al., Endocrinology 140(4): 1826 (1999).
[141] Expression of polypeptides in prokaryotes is most often carried out in E. coli with vectors containing constitutive or inducible promoters directing the expression of either fusion or non-fusion polypeptides. Fusion vectors add a number of amino acids to a polypeptide encoded therein, usually to the amino terminus of the recombinant polypeptide. Such fusion vectors typically serve three purposes: (i) to increase expression of recombinant polypeptide; (ii) to increase the solubility of the recombinant polypeptide; and (iii) to aid in the purification of the recombinant polypeptide by acting as a ligand in affinity purification. Often, in fusion expression vectors, a proteolytic cleavage site is introduced at the junction of the fusion moiety and the recombinant polypeptide to enable separation of the recombinant polypeptide from the fusion moiety subsequent to purification of the fusion polypeptide. Such enzymes, and their cognate recognition sequences, include Factor Xa, thrombin and enterokinase. Typical fusion expression vectors include pGEX (Pharmacia Biotech Inc; Smith and Johnson, Gene 67: 31-40 (1988)), pMAL (New England Biolabs, Beverly, Mass.) and pRIT5 (Pharmacia, Piscataway, N.J.) that fuse glutathione S-transferase (GST), maltose E binding polypeptide, or polypeptide A, respectively, to the target recombinant polypeptide.
[142] Examples of suitable inducible non-fusion E. coli expression vectors include pTrc (Amann et al., Gene 69: 301-315 (1988)) and pET 11d (Studier et al., Gene Expression Technology: Methods In Enzymology (Academic Press, San Diego, Calif., 1990) pp. 60-89).
[143] One strategy to maximize recombinant polypeptide expression in E. coli is to express the polypeptide in host bacteria with an impaired capacity to proteolytically cleave the recombinant polypeptide. See, e.g., Gottesman, Gene Expression Technology: Methods In Enzymology (Academic Press, San Diego, Calif., 1990) pp. 119-128.
Another strategy is to alter the nucleic acid sequence of the nucleic acid to be inserted into an expression vector so that the individual codons for each amino acid are those preferentially utilized in the expression host, e.g., E. coli (see, e.g., Wada et al., Nucl. Acids Res. 20: 2111-2118 (1992)). Such alteration of nucleic acid sequences of the invention can be carried out by standard DNA synthesis techniques. In another embodiment, the mutant/polymorphic polypeptide expression vector is a yeast expression vector.
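The codon-replacement strategy described above can be sketched as a synonymous recoding of a coding sequence. The table below covers only four amino acids and one start codon, and the "preferred" codons in it are illustrative assumptions; in practice they would be taken from a codon usage table for the chosen expression host. The function and variable names are hypothetical.

```python
# Synonymous codon families from the standard genetic code (partial).
SYNONYMOUS_CODONS = {
    "L": ["TTA", "TTG", "CTT", "CTC", "CTA", "CTG"],
    "A": ["GCT", "GCC", "GCA", "GCG"],
    "K": ["AAA", "AAG"],
    "E": ["GAA", "GAG"],
}

# Assumed host-preferred codon for each amino acid (illustrative only).
PREFERRED_CODON = {"L": "CTG", "A": "GCG", "K": "AAA", "E": "GAA"}

# Reverse lookup: codon -> amino acid, for the families listed above.
CODON_TO_AA = {
    codon: aa for aa, codons in SYNONYMOUS_CODONS.items() for codon in codons
}


def recode_for_host(coding_sequence):
    """Replace each known codon with the preferred synonymous codon.

    Codons outside the partial table are left unchanged, so the encoded
    polypeptide is never altered.
    """
    if len(coding_sequence) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of 3")
    recoded = []
    for i in range(0, len(coding_sequence), 3):
        codon = coding_sequence[i:i + 3].upper()
        amino_acid = CODON_TO_AA.get(codon)
        recoded.append(PREFERRED_CODON[amino_acid] if amino_acid else codon)
    return "".join(recoded)


# Hypothetical fragment encoding Leu-Ala-Lys-Glu-Met.
optimized = recode_for_host("TTAGCTAAGGAGATG")
```

Because every substitution stays within a synonymous codon family, the amino acid sequence is preserved while the nucleotide sequence changes.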
[144] Examples of vectors for expression in yeast Saccharomyces cerevisiae include pYepSecl (Baldari et al., EMBO J. 6: 229-234 (1987)), pMFa (Kurjan & Herskowitz, Cell 30: 933-943 (1982)), pJRY88 (Schultz et al., Gene 54: 113-123 (1987)), pYES2 (Invitrogen Corporation, San Diego, Calif., USA), and picZ (Invitrogen Corp., San Diego, Calif., USA). Alternatively, mutant polypeptide can be expressed in insect cells using baculovirus expression vectors. Baculovirus vectors available for expression of polypeptides in cultured insect cells (e.g., SF9 cells) include the pAc series (Smith et al., Mol. Cell. Biol. 3: 2156-2165 (1983)) and the pVL series (Lucklow & Summers, Virology 170: 31-39 (1989)).
[145] In yet another embodiment, a nucleic acid of the invention is expressed in mammalian cells using a mammalian expression vector. Examples of mammalian expression vectors include pCDM8 (Seed, Nature 329: 842-846 (1987)) and pMT2PC (Kaufman et al., EMBO J. 6: 187-195 (1987)). When used in mammalian cells, the expression vector's control functions are often provided by viral regulatory elements. For example, commonly used promoters are derived from polyoma, adenovirus 2, cytomegalovirus, and simian virus 40. For other suitable expression systems for both prokaryotic and eukaryotic cells see, e.g., Chapters 16 and 17 of Sambrook et al., Molecular Cloning: A Laboratory Manual, 2nd Ed. (Cold Spring Harbor Laboratory Press, Cold Spring Harbor, New York, 1989).
[146] In another embodiment, the recombinant mammalian expression vector is capable of directing expression of the nucleic acid preferentially in a particular cell type (e.g., tissue specific regulatory elements are used to express the nucleic acid). Tissue specific regulatory elements are known in the art. Nonlimiting examples of suitable tissue specific promoters include the albumin promoter (liver specific; Pinkert et al., Genes Dev. 1: 268-277 (1987)), lymphoid specific promoters (Calame & Eaton, Adv. Immunol. 43: 235-275 (1988)), in particular promoters of T cell receptors (Winoto & Baltimore, EMBO J. 8: 729-733 (1989)) and immunoglobulins (Banerji et al., Cell 33: 729-740 (1983); Queen & Baltimore, Cell 33: 741-748 (1983)), neuron specific promoters (e.g., the neurofilament promoter; Byrne & Ruddle, Proc. Natl. Acad. Sci. USA 86: 5473-5477 (1989)), pancreas specific promoters (Edlund et al., Science 230: 912-916 (1985)), and mammary gland specific promoters (e.g., milk whey promoter; U.S. Pat. No. 4,873,316 and European Application Publication No. 264,166). Developmentally regulated promoters are also encompassed, e.g., the murine hox
promoters (Kessel & Gruss, Science 249: 374-379 (1990)) and the α-fetoprotein promoter (Camper & Tilghman, Genes Dev. 3: 537-546 (1989)).
[147] The invention further provides a recombinant expression vector comprising a DNA molecule of the invention cloned into the expression vector in an antisense orientation. That is, the DNA molecule is operatively linked to a regulatory sequence in a manner that allows for expression (by transcription of the DNA molecule) of an RNA molecule that is antisense to a mutant polypeptide mRNA. Regulatory sequences operatively linked to a nucleic acid cloned in the antisense orientation can be chosen that direct the continuous expression of the antisense RNA molecule in a variety of cell types, for instance viral promoters and/or enhancers, or regulatory sequences can be chosen that direct constitutive, tissue specific or cell type specific expression of antisense RNA. The antisense expression vector can be in the form of a recombinant plasmid, phagemid or attenuated virus in which antisense nucleic acids are produced under the control of a high efficiency regulatory region, the activity of which can be determined by the cell type into which the vector is introduced. For a discussion of the regulation of gene expression using antisense genes see, e.g., Weintraub et al., "Antisense RNA as a molecular tool for genetic analysis," Reviews Trends in Genetics, Vol. 1(1) (1986). [148] Another aspect of the invention pertains to host cells into which a recombinant expression vector of the invention has been introduced. The terms "host cell" and "recombinant host cell" are used interchangeably herein. It is understood that such terms refer not only to the particular subject cell but also to the progeny or potential progeny of such a cell. Because certain modifications may occur in succeeding generations due to either mutation or environmental influences, such progeny may not, in fact, be identical to the parent cell, but are still included within the scope of the term as used herein.
[149] A host cell can be any prokaryotic or eukaryotic cell. For example, mutant polypeptide can be expressed in bacterial cells such as E. coli, insect cells, yeast or mammalian cells (such as Chinese hamster ovary cells (CHO) or COS cells). Other suitable host cells are known to those skilled in the art.
[150] Vector DNA can be introduced into prokaryotic or eukaryotic cells via conventional transformation or transfection techniques. As used herein, the terms "transformation" and "transfection" are intended to refer to a variety of art recognized techniques for introducing foreign nucleic acid (e.g., DNA) into a host cell, including calcium phosphate or calcium
chloride co-precipitation, DEAE-dextran-mediated transfection, lipofection, or electroporation. Suitable methods for transforming or transfecting host cells can be found in Sambrook et al., Molecular Cloning: A Laboratory Manual, 2nd ed. (Cold Spring Harbor Laboratory Press, Cold Spring Harbor, New York, 1989), and other laboratory manuals.
[151] Transgenic Animals. Recombinant organisms, i.e., transgenic animals, expressing a variant gene of the invention are prepared using standard procedures known in the art. Transgenic animals carrying the constructs of the invention can be made by several methods known to those having skill in the art. See, e.g., U.S. Pat. No. 5,610,053 and "The Introduction of Foreign Genes into Mice" and the cited references therein, in: Recombinant DNA, Watson JD, Gilman M, Witkowski J & Zoller M, eds. (W.H. Freeman and Company, New York) pp. 254-272. Transgenic animals stably expressing a human isogene and producing human protein can be used as biological models for studying diseases related to abnormal expression and/or activity, and for screening and assaying various candidate drugs, compounds, and treatment regimens to reduce the symptoms or effects of these diseases.
[152] Characterizing Gene Expression Level. Methods to detect and measure mRNA levels (i.e., gene transcription level) and levels of polypeptide gene expression products (i.e., gene translation level) are well-known in the art and include the use of nucleotide microarrays and polypeptide detection methods involving mass spectrometers, reverse-transcription and amplification and/or antibody detection and quantification techniques. See also Strachan T & Read A, Human Molecular Genetics, 2nd Edition (John Wiley and Sons, Inc. Publication, New York, 1999).
[153] Determination of Target Gene Transcription. The determination of the level of the expression product of the gene in a biological sample, e.g., the tissue or body fluids of an individual, may be performed in a variety of ways. The term "biological sample" is intended to include tissues, cells, biological fluids and isolates thereof, isolated from a subject, as well as tissues, cells and fluids present within a subject. Many expression detection methods use isolated RNA. For in vitro methods, any RNA isolation technique that does not select against the isolation of mRNA can be utilized for the purification of RNA from cells. See, e.g., Ausubel et al., Ed., Curr. Prot. Mol. Biol. (John Wiley & Sons, New York, 1987-1999).
[154] In one embodiment, the level of the mRNA expression product of the target gene is determined. Methods to measure the level of a specific mRNA are well-known in the art and include Northern blot analysis, reverse transcription PCR, real-time quantitative PCR, and hybridization to an oligonucleotide array or microarray. In other more preferred embodiments, the determination of the level of expression may be performed by determination of the level of the protein or polypeptide expression product of the gene in body fluids or tissue samples including but not limited to blood or serum. Large numbers of tissue samples can readily be processed using techniques well-known to those of skill in the art, such as, e.g., the single-step RNA isolation process of U.S. Pat. No. 4,843,155.
[155] The isolated mRNA can be used in hybridization or amplification assays that include, but are not limited to, Southern or Northern analyses, PCR analyses and probe arrays. One preferred diagnostic method for the detection of mRNA levels involves contacting the isolated mRNA with a nucleic acid molecule (probe) that can hybridize to the mRNA encoded by the gene being detected. The nucleic acid probe can be, e.g., a full-length cDNA, or a portion thereof, such as an oligonucleotide of at least 7, 15, 30, 50, 100, 250 or 500 nucleotides in length and sufficient to specifically hybridize under stringent conditions to an mRNA or genomic DNA encoding a marker of the present invention. Other suitable probes for use in the diagnostic assays of the invention are described herein. Hybridization of an mRNA with the probe indicates that the marker in question is being expressed.
[156] In one format, the probes are immobilized on a solid surface and the mRNA is contacted with the probes, for example, in an Affymetrix gene chip array (Affymetrix, Calif. USA). A skilled artisan can readily adapt known mRNA detection methods for use in detecting the level of mRNA encoded by the markers of the present invention.
[157] An alternative method for determining the level of mRNA corresponding to a marker of the present invention in a sample involves the process of nucleic acid amplification, e.g., by RT-PCR (the experimental embodiment set forth in U.S. Pat. No. 4,683,202); ligase chain reaction (Barany et al., Proc. Natl. Acad. Sci. USA 88: 189-193 (1991)); self-sustained sequence replication (Guatelli et al., Proc. Natl. Acad. Sci. USA 87: 1874-1878 (1990)); transcriptional amplification system (Kwoh et al., Proc. Natl. Acad. Sci. USA 86: 1173-1177 (1989)); Q-Beta Replicase (Lizardi et al., Bio/Technology 6: 1197 (1988)); rolling circle replication (U.S. Pat. No. 5,854,033); or any other nucleic acid amplification method, followed by the detection of the amplified molecules using techniques well-known to those of skill in the art. These detection schemes are especially useful for the detection of the nucleic acid molecules if such molecules are present in very low numbers. As used herein, "amplification primers" are defined as being a pair of nucleic acid molecules that can anneal to 5' or 3' regions of a gene (plus and minus strands, respectively, or vice-versa) and contain a short region in between. In general, amplification primers are from about 10-30 nucleotides in length and flank a region from about 50-200 nucleotides in length.
[158] Real-time quantitative PCR (RT-PCR) is one way to assess gene expression levels, e.g., of genes of the invention, e.g., those containing SNPs and mutations of interest. The RT-PCR assay utilizes an RNA reverse transcriptase to catalyze the synthesis of a DNA strand from an RNA strand, including an mRNA strand. The resultant DNA may be specifically detected and quantified and this process may be used to determine the levels of specific species of mRNA. One method for doing this is TAQMAN® (PE Applied Biosystems, Foster City, Calif., USA), which exploits the 5' nuclease activity of AMPLITAQ GOLD™ DNA polymerase to cleave a specific form of probe during a PCR reaction. This is referred to as a TAQMAN™ probe. See Luthra et al., Am. J. Pathol. 153: 63-68 (1998); Kuimelis et al., Nucl. Acids Symp. Ser. 37: 255-256 (1997); and Mullah et al., Nucl. Acids Res. 26(4): 1026-1031 (1998). During the reaction, cleavage of the probe separates a reporter dye and a quencher dye, resulting in increased fluorescence of the reporter. The accumulation of PCR products is detected directly by monitoring the increase in fluorescence of the reporter dye. See Heid et al., Genome Res. 6(6): 986-994 (1996). The higher the starting copy number of nucleic acid target, the sooner a significant increase in fluorescence is observed. See Gibson, Heid & Williams et al., Genome Res. 6: 995-1001 (1996).
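The relationship between starting copy number and the cycle at which the signal rises can be illustrated numerically. The following Python sketch assumes idealized exponential amplification with a fixed per-cycle efficiency and an arbitrary detection threshold; the function name, the efficiency and the threshold value are illustrative assumptions and are not taken from the TAQMAN® assay or from the cited references.

```python
def threshold_cycle(start_copies, efficiency=1.0, threshold=1e10, max_cycles=60):
    """Return the first cycle at which the product count reaches the threshold.

    Idealized model (an assumption): the copy number is multiplied by
    (1 + efficiency) in every cycle. Returns None if the threshold is
    not reached within max_cycles.
    """
    if start_copies <= 0:
        raise ValueError("start_copies must be positive")
    copies = float(start_copies)
    for cycle in range(max_cycles + 1):
        # At this point, copies == start_copies * (1 + efficiency) ** cycle
        if copies >= threshold:
            return cycle
        copies *= 1.0 + efficiency
    return None


if __name__ == "__main__":
    # A sample with more starting template crosses the threshold earlier.
    for n0 in (1e3, 1e5):
        print(f"start copies {n0:.0e}: threshold cycle {threshold_cycle(n0)}")
```

Under this simplified model, a hundred-fold difference in starting template shifts the threshold cycle by roughly log2(100), i.e., about 6 to 7 cycles at perfect doubling.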
[159] Other technologies for measuring the transcriptional state of a cell produce pools of restriction fragments of limited complexity for electrophoretic analysis, such as methods combining double restriction enzyme digestion with phasing primers (see, e.g., EP 0 534858 A1), or methods selecting restriction fragments with sites closest to a defined mRNA end. (See, e.g., Prashar & Weissman, Proc. Natl. Acad. Sci. USA 93(2): 659-663 (1996)).
[160] Other methods statistically sample cDNA pools, such as by sequencing sufficient bases, e.g., 20-50 bases, in each of multiple cDNAs to identify each cDNA, or by sequencing short tags, e.g., 9-10 bases, which are generated at known positions relative to a defined mRNA end pathway pattern. See, e.g., Velculescu, Science 270: 484-487 (1995). The cDNA levels in the samples are quantified and the mean and standard deviation of each cDNA are determined by standard statistical means well-known to those of skill in the art. See Norman T.J. Bailey, Statistical Methods In Biology, 3rd Edition (Cambridge University Press, 1995).
[161] Detection of Polypeptides. Immunological Detection Methods. Expression of the protein encoded by the genes of the invention can be detected by a probe which is detectably labelled, or which can be subsequently labelled. The term "labelled", with regard to the probe or antibody, is intended to encompass direct-labelling of the probe or antibody by coupling, i.e., physically linking, a detectable substance to the probe or antibody, as well as indirect-labelling of the probe or antibody by reactivity with another reagent that is directly-labelled. Examples of indirect labelling include detection of a primary antibody using a fluorescently-labelled secondary antibody and end-labelling of a DNA probe with biotin such that it can be detected with fluorescently-labelled streptavidin. Generally, the probe is an antibody that recognizes the expressed protein. A variety of formats can be employed to determine whether a sample contains a target protein that binds to a given antibody. Immunoassay methods useful in the detection of target polypeptides of the present invention include, but are not limited to, e.g., dot blotting, western blotting, protein chips, competitive and noncompetitive protein binding assays, enzyme-linked immunosorbent assays (ELISA), immunohistochemistry, fluorescence activated cell sorting (FACS), and others commonly used and widely-described in scientific and patent literature, and many employed commercially. A skilled artisan can readily adapt known protein/antibody detection methods for use in determining whether cells express a marker of the present invention and the relative concentration of that specific polypeptide expression product in blood or other body tissues. Proteins from individuals can be isolated using techniques that are well-known to those of skill in the art. The protein isolation methods employed can, e.g., be such as those described in Harlow & Lane, Antibodies: A Laboratory Manual (Cold Spring Harbor Laboratory Press, Cold Spring Harbor, New York, 1988).
[162] For the production of antibodies to a protein encoded by one of the disclosed genes, various host animals may be immunized by injection with the polypeptide, or a portion thereof. Such host animals may include, but are not limited to, rabbits, mice and rats. Various adjuvants may be used to increase the immunological response, depending on the host species including, but not limited to, Freund's (complete and incomplete), mineral gels, such as aluminium hydroxide; surface active substances, such as lysolecithin, pluronic polyols,
polyanions, peptides, oil emulsions, keyhole limpet hemocyanin and dinitrophenol; and potentially useful human adjuvants, such as bacille Calmette-Guérin (BCG) and Corynebacterium parvum.
[163] Monoclonal antibodies (mAbs), which are homogeneous populations of antibodies to a particular antigen, may be obtained by any technique that provides for the production of antibody molecules by continuous cell lines in culture. These include, but are not limited to, the hybridoma technique of Kohler & Milstein, Nature 256: 495-497 (1975); and U.S. Pat. No. 4,376,110; the human B-cell hybridoma technique of Kosbor et al., Immunol. Today 4: 72 (1983); Cole et al., Proc. Natl. Acad. Sci. USA 80: 2026-2030 (1983); and the EBV-hybridoma technique of Cole et al., Monoclonal Antibodies and Cancer Therapy (Alan R. Liss, Inc, 1985) pp. 77-96.
[164] In addition, techniques developed for the production of "chimeric antibodies" (see Morrison et al., Proc. Natl. Acad. Sci. USA 81: 6851-6855 (1984); Neuberger et al., Nature 312: 604-608 (1984); and Takeda et al., Nature 314: 452-454 (1985)), by splicing the genes from a mouse antibody molecule of appropriate antigen specificity together with genes from a human antibody molecule of appropriate biological activity, can be used. A chimeric antibody is a molecule in which different portions are derived from different animal species, such as those having a variable or hypervariable region derived from a murine mAb and a human immunoglobulin constant region.
[165] Alternatively, techniques described for the production of single chain antibodies (U.S. Pat. No. 4,946,778; Bird, Science 242: 423-426 (1988); Huston et al., Proc. Natl. Acad. Sci. USA 85: 5879-5883 (1988); and Ward et al., Nature 334: 544-546 (1989)) can be adapted to produce differentially expressed gene single-chain antibodies.
[166] Techniques useful for the production of "humanized antibodies" can be adapted to produce antibodies to the proteins, fragments or derivatives thereof. Such techniques are disclosed in U.S. Pat. Nos. 5,932,448; 5,693,762; 5,693,761; 5,585,089; 5,530,101; 5,569,825; 5,625,126; 5,633,425; 5,789,650; 5,661,016; and 5,770,429. [167] Antibodies or antibody fragments can be used in methods, such as Western blots or immunofluorescence techniques, to detect the expressed proteins. In such uses, it is generally preferable to immobilize either the antibody or proteins on a solid support. Suitable solid phase supports or carriers include any support capable of binding an antigen or an antibody.
Well-known supports or carriers include glass, polystyrene, polypropylene, polyethylene, dextran, nylon, amylases, natural and modified celluloses, polyacrylamides, gabbros and magnetite.
[168] A useful method, for ease of detection, is the sandwich ELISA, of which a number of variations exist, all of which are intended to be used in the methods and assays of the present invention. As used herein, "sandwich assay" is intended to encompass all variations on the basic two-site technique. Immunofluorescence and EIA techniques are both very well-established in the art. However, other reporter molecules, such as radioisotopes, chemiluminescent or bioluminescent molecules may also be employed. It will be readily apparent to the skilled artisan how to vary the procedure to suit the required use.
[169] Whole genome monitoring of protein, i.e., the "proteome," can be carried out by constructing a microarray in which binding sites comprise immobilized, preferably monoclonal, antibodies specific to a plurality of protein species encoded by the cell genome. Preferably, antibodies are present for a substantial fraction of the encoded proteins, or at least for those proteins relevant to testing or confirming a biological network model of interest. As noted above, methods for making monoclonal antibodies are well-known. See, e.g., Harlow & Lane, Antibodies: A Laboratory Manual (Cold Spring Harbor Laboratory Press, Cold Spring Harbor, New York, 1988). In a preferred embodiment, monoclonal antibodies are raised against synthetic peptide fragments designed based on genomic sequence of the cell. With such an antibody array, proteins from the cell are contacted to the array and their binding is measured with assays known in the art.
[170] Two-dimensional gel electrophoresis is well-known in the art and typically involves isoelectric focusing along a first dimension followed by SDS-PAGE electrophoresis along a second dimension. See, e.g., Hames et al., Gel Electrophoresis of Proteins: A Practical Approach (IRL Press, New York, 1990); Shevchenko et al., Proc. Natl. Acad. Sci. USA 93: 14440-14445 (1996); Sagliocco et al., Yeast 12: 1519-1533 (1996); and Lander, Science 274: 536-539 (1996).
[171] Detection of Polypeptides. Mass Spectroscopy. The identity as well as the expression level of a target polypeptide can be determined using mass spectroscopy (MS) techniques. MS-based analysis methodology is useful for analysis of isolated target polypeptide as well as analysis of target polypeptide in a biological sample. MS formats for use in analyzing a target polypeptide include ionization (I) techniques, such as, but not limited to, matrix assisted laser desorption (MALDI), continuous or pulsed electrospray ionization (ESI) and related methods, such as ionspray or thermospray, and massive cluster impact (MCI). Such ion sources can be matched with detection formats, including linear or non-linear reflectron time of flight (TOF), single or multiple quadrupole, single or multiple magnetic sector, Fourier transform ion cyclotron resonance (FTICR), ion trap and combinations thereof such as ion-trap/TOF. For ionization, numerous matrix/wavelength combinations (e.g., matrix assisted laser desorption (MALDI)) or solvent combinations (e.g., ESI) can be employed.
[172] For mass spectroscopy (MS) analysis, the target polypeptide can be solubilised in an appropriate solution or reagent system. The selection of a solution or reagent system, e.g., an organic or inorganic solvent, will depend on the properties of the target polypeptide and the type of MS performed, and is based on methods well-known in the art. See, e.g., Vorm et al., Anal. Chem. 61: 3281 (1994) for MALDI; and Valaskovic et al., Anal. Chem. 67: 3802 (1995), for ESI. MS of peptides also is described, e.g., in International PCT Application No. WO 93/24834 and U.S. Pat. No. 5,792,664. A solvent is selected that minimizes the risk that the target polypeptide will be decomposed by the energy introduced for the vaporization process. A reduced risk of target polypeptide decomposition can be achieved, e.g., by embedding the sample in a matrix. A suitable matrix can be an organic compound such as a sugar, e.g., a pentose or hexose, or a polysaccharide such as cellulose. Such compounds are decomposed thermolytically into CO2 and H2O such that no residues are formed that can lead to chemical reactions. The matrix also can be an inorganic compound, such as ammonium nitrate, which is decomposed essentially without leaving any residue. Use of these and other solvents is known to those of skill in the art. See, e.g., U.S. Pat. No. 5,062,935. Electrospray MS has been described by Fenn et al., J. Phys. Chem. 88: 4451-4459 (1984); and PCT Application No. WO 90/14148; and current applications are summarized in review articles. See Smith et al., Anal. Chem. 62: 882-89 (1990); and Ardrey, Spectroscopy 4: 10-18 (1992).
[173] The mass of a target polypeptide determined by MS can be compared to the mass of a corresponding known polypeptide. For example, where the target polypeptide is a mutant protein, the corresponding known polypeptide can be the corresponding non-mutant protein, e.g., wild-type protein. With ESI, the determination of molecular weights in femtomole
amounts of sample is very accurate due to the presence of multiple ion peaks, all of which can be used for mass calculation. Sub-attomole levels of protein have been detected, e.g., using ESI MS (Valaskovic et al, Science 273: 1199-1202 (1996)) and MALDI MS (Li et al., J. Am. Chem. Soc. 118: 1662-1663 (1996)).
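The comparison of a mutant mass with the corresponding wild-type mass can be sketched for single-residue substitutions. The following Python sketch uses standard monoisotopic residue masses and the substitution labels of the EXAMPLE below; the helper names are hypothetical, and the calculation ignores post-translational modifications, charge states and instrument resolution.

```python
# Standard monoisotopic residue masses in daltons, limited to the
# residues involved in the substitutions considered here.
MONOISOTOPIC_RESIDUE_MASS = {
    "A": 71.03711,
    "G": 57.02146,
    "K": 128.09496,
    "L": 113.08406,
    "M": 131.04049,
    "Q": 128.05858,
    "T": 101.04768,
    "V": 99.06841,
}


def parse_mutation(label):
    """Split a label such as 'M51L' into (wild-type residue, position, mutant residue)."""
    return label[0], int(label[1:-1]), label[-1]


def substitution_mass_shift(wild_type, mutant):
    """Expected mass difference (mutant minus wild type) for one substitution."""
    return MONOISOTOPIC_RESIDUE_MASS[mutant] - MONOISOTOPIC_RESIDUE_MASS[wild_type]


if __name__ == "__main__":
    for label in ("M51L", "Q111K", "T114A", "V157G"):
        wt, position, mut = parse_mutation(label)
        shift = substitution_mass_shift(wt, mut)
        print(f"{label}: expected shift {shift:+.4f} Da")
```

In this sketch the glutamine-to-lysine substitution changes the mass by only about 0.04 Da, which suggests that some substitutions may be difficult to distinguish from wild type by intact mass alone.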
[174] Matrix Assisted Laser Desorption (MALDI). The level of the target protein in a biological sample, e.g., body fluid or tissue sample, may be measured by means of mass spectrometric (MS) methods including, but not limited to, those techniques known in the art as matrix-assisted laser desorption/ionization, time-of-flight mass spectrometry (MALDI-TOF-MS) and surfaces enhanced for laser desorption/ionization, time-of-flight mass spectrometry (SELDI-TOF-MS) as further detailed below. Methods for performing MALDI are well-known to those of skill in the art. See, e.g., Juhasz et al., Anal. Chem. 68: 941-946 (1996), and see also, e.g., U.S. Pat. Nos. 5,777,325; 5,742,049; 5,654,545; 5,641,959 and 5,760,393 for descriptions of MALDI and delayed extraction protocols. Numerous methods for improving resolution are also known. MALDI-TOF-MS has been described by Hillenkamp et al., Biological Mass Spectrometry, Burlingame & McCloskey, eds. (Elsevier Science Publ., Amsterdam, 1990) pp. 49-60.
[175] A variety of techniques for marker detection using mass spectroscopy can be used. See Bordeaux Mass Spectrometry Conference Report, Hillenkamp, Ed., pp. 354-362 (1988); Bordeaux Mass Spectrometry Conference Report, Karas & Hillenkamp, Eds., pp. 416-417 (1988); Karas & Hillenkamp, Anal. Chem. 60: 2299-2301 (1988); and Karas et al., Biomed. Environ. Mass Spectrom. 18: 841-843 (1989). The use of laser beams in TOF-MS is shown, e.g., in U.S. Patent Nos. 4,694,167; 4,686,366; 4,295,046 and 5,045,694, which are incorporated herein by reference in their entireties. Other MS techniques allow the successful volatilization of high molecular weight biopolymers, without fragmentation, and have enabled a wide variety of biological macromolecules to be analyzed by mass spectrometry.
[176] Surfaces Enhanced for Laser Desorption/ionization (SELDI). Other techniques are used which employ new MS probe element compositions with surfaces that allow the probe element to actively participate in the capture and docking of specific analytes, described as Affinity Mass Spectrometry (AMS). See SELDI patents U.S. Pat. Nos. 5,719,060; 5,894,063; 6,020,208; 6,027,942; 6,124,137; and U.S. Patent Application No. 2003/0003465. Several types of new MS probe elements have been designed with Surfaces Enhanced for Affinity Capture (SEAC). See Hutchens & Yip, Rapid Commun. Mass Spectrom. 7: 576-580 (1993). SEAC probe elements have been used successfully to retrieve and tether different classes of biopolymers, particularly proteins, by exploiting what is known about protein surface structures and biospecific molecular recognition. The immobilized affinity capture devices on the MS probe element surface, i.e., SEAC, determine the location and affinity (specificity) of the analyte for the probe surface; therefore, the subsequent analytical MS process is efficient.
[177] Use of a Pin Tool to Immobilize a Polypeptide. The immobilization of a polypeptide of interest to a solid support using a pin tool can be particularly advantageous. Pin tools include those disclosed herein or otherwise known in the art. See, e.g., U.S. Application Serial Nos. 08/786,988 and 08/787,639; and International PCT Application No. WO 98/20166. [178] Pin tools can be useful for immobilizing polypeptides of interest in spatially addressable manner on an array. Such spatially addressable or pre-addressable arrays are useful in a variety of processes, including, for example, quality control and amino acid sequencing diagnostics. The pin tools described in the U.S. Application Nos. 08/786,988 and 08/787,639 and International PCT Application No. WO 98/20166 are serial and parallel dispensing tools that can be employed to generate multi-element arrays of polypeptides on a surface of the solid support. The array surface can be flat, with beads or geometrically altered to include wells, which can contain beads. In addition, MS geometries can be adapted for accommodating a pin tool apparatus.
[179] Other Aspects of the Biological State. In various embodiments of the invention, aspects of the biological activity state, or mixed aspects can be measured in order to obtain drug and pathway responses. The activities of proteins relevant to the characterization of cell function can be measured, and embodiments of this invention can be based on such measurements. Activity measurements can be performed by any functional, biochemical or physical means appropriate to the particular activity being characterized. Where the activity involves a chemical transformation, the cellular protein can be contacted with natural substrates, and the rate of transformation measured. Where the activity involves association in multimeric units, e.g., association of an activated DNA binding complex with DNA, the amount of associated protein or secondary consequences of the association, such as amounts of mRNA transcribed, can be measured. Also, where only a functional activity is known, e.g., as in cell cycle
control, performance of the function can be observed. However known and measured, the changes in protein activities form the response data analyzed by the methods of this invention. In alternative and non-limiting embodiments, response data may be formed of mixed aspects of the biological state of a cell. Response data can be constructed from, e.g., changes in certain mRNA abundances, changes in certain protein abundances and changes in certain protein activities.
[180] The following EXAMPLE is presented in order to more fully illustrate the preferred embodiments of the invention. This EXAMPLE should in no way be construed as limiting the scope of the invention, as defined by the appended claims.
EXAMPLE
BIOINFORMATICS ANALYSIS OF HDAC1 VARIANTS
[181] In an effort to investigate HDAC1 genetic variants in association with cancers, DHPLC analysis (Lilleberg SL, Curr. Opin. Drug Discov. Devel. 6(2): 237-52 (March 2003)) was conducted on blood samples from 15 AML patients and tumour samples from 30 breast cancer patients. Five mutations, four missense and one nonsense (R34Term), were identified (TABLE 2) in the HDAC1 gene (NP_004955) from AML samples. Computational analyses were designed to evaluate the effect of these mutations on HDAC1 function.
TABLE 2
HDAC1 mutations in AML
NT change    Mutation/SNP    Allele Frac.    SEQ ID NO
ATG>CTG      M51L            0.05            SEQ ID NO: 6
CGA>TGA      R34Term         0.07            SEQ ID NO: 8
ACT>GCT      T114A           0.035           SEQ ID NO: 12
CAG>AAG      Q111K           0.25            SEQ ID NO: 14
GTC>GGC      V157G           0.11            SEQ ID NO: 16
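The codon changes in TABLE 2 can be cross-checked against the stated amino acid changes with the standard genetic code. The following Python sketch is illustrative only; the function name is hypothetical, and the first and last codon pairs are read here as ATG>CTG and GTC>GGC, which is an assumption about the printed entries (a single-base change consistent with M51L and V157G).

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def describe_change(nt_change):
    """Return (reference residue, variant residue, number of differing bases).

    nt_change is a string such as 'CGA>TGA'; '*' denotes a stop codon.
    """
    ref, alt = nt_change.upper().split(">")
    differing = sum(a != b for a, b in zip(ref, alt))
    return CODON_TABLE[ref], CODON_TABLE[alt], differing


if __name__ == "__main__":
    # Codon pairs as read from TABLE 2 (first and last entries assumed, see above).
    table_2 = [
        ("ATG>CTG", "M51L"),
        ("CGA>TGA", "R34Term"),
        ("ACT>GCT", "T114A"),
        ("CAG>AAG", "Q111K"),
        ("GTC>GGC", "V157G"),
    ]
    for nt_change, label in table_2:
        ref_aa, alt_aa, n = describe_change(nt_change)
        print(f"{label}: {nt_change} encodes {ref_aa}>{alt_aa} ({n} base change)")
```

A stop codon in the variant column corresponds to the truncation designated R34Term.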
[182] Known mutations and non-synonymous Single Nucleotide Polymorphisms (SNPs). Six non-synonymous SNPs of HDAC1 have been reported in dbSNP (http://www.ncbi.nlm.nih.gov/SNP/index.html). None of the mutations in TABLE 2 matches any of the known SNPs. In Drosophila, missense mutations (R30C, C98Y and P204S) of HDAC1 have been characterized and associated with adverse phenotype. Mottus R et al., Genetics 154(2): 657-68 (February 2000). However, no mutation of human HDAC1 in cancer has been reported previously.
TABLE 3
Known HDAC1 Non-Synonymous SNPs
cSNP     Allele 1    Allele 2    Ref SNP
H33N     C           A           rs11541185    SEQ ID NO:
L139     G           T           rs11541184    SEQ ID NO:
F150S    T           C           rs11541183    SEQ ID NO:
G378A    G           C           rs1140658     SEQ ID NO:
V379     C           T           rs1140660     SEQ ID NO:
K403R    A           G           rs1140673     SEQ ID NO: 11
[183] Protein domain structure of HDAC1. A search of Pfam (http://pfam.wustl.edu/) locates the histone deacetylase domain of HDAC1 between amino acid positions 10 and 320. Alignment of the human HDAC1 sequence with the Pfam model for HDAC indicates that position 157 is highly conserved while the other three positions (51, 111 and 114) are less conserved. However, the wild type HDAC1 sequence does not match the consensus residue at position 157 ("V" in wild type HDAC1 versus "A" in the Pfam model), while it matches the consensus at positions 51 and 111.
[184] Potential phosphorylation sites. Phosphorylation plays an important role in regulating HDAC1 function. Loss of phosphorylation at Ser(421) and Ser(423) reduces its enzymatic activity and complex formation. Pflum MK et al., J. Biol. Chem. 276(50): 47733-41 (December 14, 2001; E-published October 15, 2001). In addition, phosphorylation at these two positions is important for nuclear translocation of HDAC1. Using the phosphorylation site prediction software NetPhos (http://www.cbs.dtu.dk/services/NetPhos/), additional serine, threonine and tyrosine phosphorylation sites were identified (TABLE 4). One mutation site (position 51) is close to a potential tyrosine phosphorylation site at position 54. This phosphorylation site is not predicted on the mutated sequence with M51L (SEQ ID NO:6).
TABLE 4
HDAC1 phosphorylation sites predicted by NetPhos
Phosphorylation Positions
Serine 69,78,85,148,236,265,267,346,393,406,410,421,423
Threonine 7,65,189,190,304,445,460
Tyrosine 54,67,72,152,172,188,221,237,330
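The proximity of mutation sites to the predicted phosphorylation sites of TABLE 4 can be checked programmatically. The following Python sketch copies the positions from TABLE 4; the function name and the window of three residues are illustrative assumptions, since no distance cut-off is specified for what counts as "close".

```python
# Predicted phosphorylation positions transcribed from TABLE 4.
PREDICTED_SITES = {
    "Ser": [69, 78, 85, 148, 236, 265, 267, 346, 393, 406, 410, 421, 423],
    "Thr": [7, 65, 189, 190, 304, 445, 460],
    "Tyr": [54, 67, 72, 152, 172, 188, 221, 237, 330],
}

# Positions of the missense mutations of TABLE 2.
MUTATION_POSITIONS = [51, 111, 114, 157]


def sites_near(position, window=3):
    """List (residue type, site) pairs within `window` residues of `position`."""
    hits = [
        (residue, site)
        for residue, sites in PREDICTED_SITES.items()
        for site in sites
        if abs(site - position) <= window
    ]
    return sorted(hits, key=lambda hit: hit[1])


if __name__ == "__main__":
    for position in MUTATION_POSITIONS:
        print(position, sites_near(position))
```

With the assumed window of three residues, only position 51 has a predicted site nearby (Tyr 54); widening the window changes which sites are reported.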
[185] Potential Small Ubiquitin-like Modifier (SUMO) modification sites. SUMO-1 modification of HDAC1 at Lys444 and Lys476 modulates its biological activities. David G et al., J. Biol. Chem. 277(26): 23658-63 (Jun 28, 2002; E-published April 17, 2002). Additional SUMO modification sites of wild type HDAC1 are predicted by SUMOplot (http://www.abgent.com/default.php?page=sumoplot). The mutation site, position 51, is close to a potential SUMO modification site, K50, with a low score. K50 SUMO modification is also predicted on the mutated sequence with M51L, which implies that this mutation may have little influence on the SUMO modification pattern.
TABLE 5
SUMO-1 modification sites predicted by SUMOplot
Position Score
K444 0.9278
K476 0.9278
K298 0.5778
K469 0.5000
K450 0.5000
K144 0.4778
K50 0.4444
K451 0.3944
K200 0.2000
[186] Other potential modifications. Other potential sites for protein modification are predicted by PROSITE (http://au.expasy.org/tools/scanprosite/). Positions 111 and 114 are close to a potential N-myristoylation site.
TABLE 6
Potential protein modification sites predicted by PROSITE
Modification                                             Positions
N-glycosylation                                          83-86, 275-278, 349-349, 433-436
Tyrosine sulphation                                      230-244, 323-337, 329-343
cAMP & cGMP dependent protein kinase phosphorylation     403-406, 431-434
PKC phosphorylation                                      7-9, 78-80, 190-192, 277-279, 304-306, 410-412
Casein kinase II phosphorylation                         393-396, 421-424, 423-426, 445-448, 460-463
Tyrosine kinase phosphorylation                          229-237
N-myristoylation                                         115-120, 116-121, 138-143, 182-187, 215-220, 300-305, 429-434
Amidation                                                429-432
Glutamic acid-rich region                                417-487
Lysine-rich region                                       438-480
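The distance from each mutation position to the nearest motif of TABLE 6 can be computed by treating every predicted site as an interval. The following Python sketch transcribes the modification-site ranges of TABLE 6 as printed (the two compositionally biased regions are omitted); the function names are hypothetical and the distance measure, the number of residues between the position and the nearest end of the motif, is an illustrative choice.

```python
# Motif ranges (start, end), inclusive, transcribed from TABLE 6.
PROSITE_SITES = {
    "N-glycosylation": [(83, 86), (275, 278), (349, 349), (433, 436)],
    "Tyrosine sulphation": [(230, 244), (323, 337), (329, 343)],
    "cAMP & cGMP dependent protein kinase phosphorylation": [(403, 406), (431, 434)],
    "PKC phosphorylation": [(7, 9), (78, 80), (190, 192), (277, 279), (304, 306), (410, 412)],
    "Casein kinase II phosphorylation": [(393, 396), (421, 424), (423, 426), (445, 448), (460, 463)],
    "Tyrosine kinase phosphorylation": [(229, 237)],
    "N-myristoylation": [(115, 120), (116, 121), (138, 143), (182, 187), (215, 220), (300, 305), (429, 434)],
    "Amidation": [(429, 432)],
}


def distance_to_interval(position, start, end):
    """Return 0 if position lies inside [start, end], else the gap in residues."""
    if position < start:
        return start - position
    if position > end:
        return position - end
    return 0


def nearest_site(position):
    """Return (modification, (start, end), distance) for the closest motif."""
    candidates = [
        (name, interval, distance_to_interval(position, *interval))
        for name, intervals in PROSITE_SITES.items()
        for interval in intervals
    ]
    return min(candidates, key=lambda item: item[2])


if __name__ == "__main__":
    for position in (51, 111, 114, 157):
        name, interval, distance = nearest_site(position)
        print(f"position {position}: nearest is {name} {interval}, distance {distance}")
```

Under this measure, positions 111 and 114 lie four residues and one residue, respectively, from the N-myristoylation motif at 115-120, whereas positions 51 and 157 are more than ten residues from any listed motif.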
[187] Sequence alignment of known HDAC1. Known HDAC1 sequences of various organisms including human, mouse, rat, chicken, frog, zebrafish, fruit fly, sea urchin and corn, and the sequence of the yeast homolog RPD3, were obtained from GenBank and aligned using ClustalW (http://biobench2.eu.novartis.net/align/alignaa.html). For every position with a reported mutation, the mutated residue was inspected for its occurrence in organisms other than human. It is hypothesized that if the mutated residue is present in the wild type sequence of another species at the corresponding position, the amino acid change may not have any adverse effect on the protein function. At positions 51, 111 and 114 of human HDAC1, the mutated residues were found to be present in non-human species.
TABLE 7
Summary of sequence alignment of HDAC1 sequences from multiple organisms
Mutation    SEQ ID NO        Comment
Q111K       SEQ ID NO: 14    "K" is present in S. cerevisiae RPD3
T114A       SEQ ID NO: 12    "A" is present in HDAC1 of corn, chicken and fly
V157G       SEQ ID NO: 16    No variation in this position found
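The inspection summarized in TABLE 7 amounts to looking up one column of a multiple sequence alignment and asking whether the mutant residue occurs there in any other species. The following Python sketch shows one way to do this on a pre-computed alignment; the function names are hypothetical, and the short sequences are toy data invented for illustration, not HDAC1 or RPD3 sequences.

```python
def column_for_position(aligned_reference, position):
    """Map a 1-based residue position in the ungapped reference to an alignment column."""
    seen = 0
    for column, symbol in enumerate(aligned_reference):
        if symbol != "-":
            seen += 1
            if seen == position:
                return column
    raise ValueError(f"reference has fewer than {position} residues")


def species_with_residue(alignment, reference, position, mutant_residue):
    """Return the non-reference sequences carrying `mutant_residue` at `position`.

    `alignment` maps a sequence name to its gapped, equal-length sequence.
    """
    lengths = {len(sequence) for sequence in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("aligned sequences must have equal length")
    column = column_for_position(alignment[reference], position)
    return sorted(
        name
        for name, sequence in alignment.items()
        if name != reference and sequence[column] == mutant_residue
    )


if __name__ == "__main__":
    # Toy alignment (hypothetical sequences, for illustration only).
    toy_alignment = {
        "reference": "MT-QAV",
        "species_a": "MTKKAV",
        "species_b": "MA-QAV",
    }
    # Reference residue 3 is Q; is K found there in another species?
    print(species_with_residue(toy_alignment, "reference", 3, "K"))
    # Reference residue 5 is V; is G found there in another species?
    print(species_with_residue(toy_alignment, "reference", 5, "G"))
```

An empty result corresponds to the "No variation in this position found" case, while a non-empty result corresponds to a mutant residue that occurs naturally in another species.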
[188] Structure consideration. The change of amino acid property (Valdar WS, Proteins 48(2): 227-41 (August 1, 2002)) by mutation is summarized in TABLE 8. Secondary structure prediction of wild type and mutated HDAC1 sequences was performed by SOPM (self optimized prediction method; http://npsa-pbil.ibcp.fr/cgi-bin/npsa_automat.pl?page=npsa_sopm.html).
TABLE 8
Influence of mutations on protein secondary structure
Mutation | Property change | Secondary structure change
M51L | Hydrophobic -> Aliphatic | Shorten the upstream alpha-helix by 1 amino acid and extend the following extended strand by 1 amino acid
Q111K | Polar -> Positive | Extend the upstream alpha-helix by 2 amino acids and shorten the following extended strand by 2 amino acids
T114A | Small -> Tiny | Extend the upstream alpha-helix by 4 amino acids
V157G | Aliphatic, Small -> Tiny, hydrophobic | No change
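A comparison of the kind summarized in TABLE 8 may be sketched as follows, assuming the predicted secondary structure of the wild-type and mutated sequences is available as per-residue state strings of equal length. The sketch does not implement SOPM. The one-letter encoding (H for alpha-helix, E for extended strand, C for coil) and the example strings are assumptions made for illustration and are not actual prediction output.

```python
from itertools import groupby


def segments(states):
    """Run-length encode a state string into (state, start, end) tuples,
    using 1-based inclusive residue positions."""
    result = []
    position = 1
    for state, run in groupby(states):
        length = len(list(run))
        result.append((state, position, position + length - 1))
        position += length
    return result


def changed_positions(wild_type, mutant):
    """Return (position, wild-type state, mutant state) for each residue
    whose predicted state differs between the two strings."""
    if len(wild_type) != len(mutant):
        raise ValueError("state strings must have the same length")
    return [
        (index + 1, wt_state, mut_state)
        for index, (wt_state, mut_state) in enumerate(zip(wild_type, mutant))
        if wt_state != mut_state
    ]


if __name__ == "__main__":
    # Hypothetical predictions: the helix loses one residue to the strand.
    wild_type = "HHHHEEECC"
    mutant = "HHHEEEECC"
    print(segments(wild_type))
    print(segments(mutant))
    print(changed_positions(wild_type, mutant))
```

In this toy case the helix ends one residue earlier and the following extended strand starts one residue earlier, which is the pattern of change that a statement such as "shorten the alpha-helix by 1 amino acid and extend the following extended strand by 1 amino acid" describes.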
[189] Potential changes to secondary structures are summarized in TABLE 9 as well.
TABLE 9
Evaluation of mutations by sequence features
Mutation  Protein domain  Phosphorylation  SUMO modification  Other modification  AA conservation  AA property  Secondary structure change
M51L  +  +  +  ++  +  +
Q111K  +  -  +  ++  +  ++
T114A  +  -  +  +  +  +++
V157G  +  +++  +
+: the effect of mutation on protein function is low
++: the effect of mutation on protein function is medium
+++: the effect of mutation on protein function is high
[190] Conclusion. In AML patient samples, four missense mutations (M51L, Q111K, T114A and V157G) and a truncation mutation (R34Term) of HDAC1 have been identified. They are all located in the HDAC domain. None of the mutations has been reported previously. The mutation site at position 51 is close to a potential tyrosine phosphorylation site at position 54 and next to a potential SUMO-1 modification site at position 50. The mutation M51L (SEQ ID NO: 6) may eliminate phosphorylation at position 54, which could alter the function of HDAC1. Sequence alignment analysis of known HDAC1 sequences from multiple organisms indicates that position 157 is highly conserved while position 114 is not. A mutation at the highly conserved site may have a dramatic influence on overall function. Analysis of the secondary structure of the mutated proteins implies that T114A (SEQ ID NO: 12) may cause more significant structural changes than the other mutations.
EQUIVALENTS
[191] The present invention is not to be limited in terms of the particular embodiments described in this application, which are intended as single illustrations of individual aspects of the invention. Many modifications and variations of this invention can be made without departing from its spirit and scope, as will be apparent to those skilled in the art. Functionally equivalent methods and apparatuses within the scope of the invention, in addition to those enumerated herein, will be apparent to those skilled in the art from the foregoing descriptions. Such modifications and variations are intended to fall within the scope of the appended claims. The present invention is to be limited only by the terms of the appended claims, along with the full scope of equivalents to which such claims are entitled.