WO2006104812A2 - Biomarkers for pharmacogenetic diagnosis of type 2 diabetes - Google Patents

Biomarkers for pharmacogenetic diagnosis of type 2 diabetes

Info

Publication number
WO2006104812A2
Authority
WO
WIPO (PCT)
Prior art keywords
diabetes
type
predisposition
individual
gene
Prior art date
Application number
PCT/US2006/010464
Other languages
French (fr)
Other versions
WO2006104812A3 (en)
Inventor
Hong Chen
Thomas Edward Hughes
Original Assignee
Novartis Ag
Novartis Pharma Gmbh
Priority date
Filing date
Publication date
Application filed by Novartis Ag, Novartis Pharma Gmbh
Priority to JP2008503147A (patent JP2008538177A)
Priority to EP06739311A (patent EP1869214A2)
Publication of WO2006104812A2
Publication of WO2006104812A3

Classifications

    • C: CHEMISTRY; METALLURGY
    • C12: BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q: MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00: Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68: Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C12Q1/6876: Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes
    • C12Q1/6883: Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes for diseases caused by alterations of genetic material
    • A: HUMAN NECESSITIES
    • A61: MEDICAL OR VETERINARY SCIENCE; HYGIENE
    • A61P: SPECIFIC THERAPEUTIC ACTIVITY OF CHEMICAL COMPOUNDS OR MEDICINAL PREPARATIONS
    • A61P3/00: Drugs for disorders of the metabolism
    • A61P3/08: Drugs for disorders of the metabolism for glucose homeostasis
    • A61P3/10: Drugs for disorders of the metabolism for glucose homeostasis for hyperglycaemia, e.g. antidiabetics
    • C: CHEMISTRY; METALLURGY
    • C12: BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q: MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q2600/00: Oligonucleotides characterized by their use
    • C12Q2600/106: Pharmacogenomics, i.e. genetic variability in individual responses to drugs and drug metabolism
    • C: CHEMISTRY; METALLURGY
    • C12: BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q: MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q2600/00: Oligonucleotides characterized by their use
    • C12Q2600/156: Polymorphic or mutational markers
    • C: CHEMISTRY; METALLURGY
    • C12: BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q: MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q2600/00: Oligonucleotides characterized by their use
    • C12Q2600/158: Expression markers
    • C: CHEMISTRY; METALLURGY
    • C12: BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q: MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q2600/00: Oligonucleotides characterized by their use
    • C12Q2600/172: Haplotypes

Definitions

  • This invention relates generally to the analytical testing of tissue samples in vitro, and more particularly to aspects of genetic polymorphisms indicative of type 2 diabetes mellitus.
  • Type 2 diabetes mellitus (T2DM) is the most common form of diabetes, affecting approximately 16 million Americans alone. T2DM has become an enormous public health concern because of the rapid increase in the patient population and its association with a multitude of age-related disorders and complications. Amos AF et al, Diabet. Med. 14 Suppl 5:S1-85 (1997). T2DM is multifactorial in origin, with both genetic and environmental factors contributing to its development. However, our understanding of the disease and its treatment is limited and unsatisfactory.
  • Therapy-specific diagnostics (a.k.a. theranostics) is an emerging medical technology field that provides tests useful to diagnose a disease, choose the correct treatment regime and monitor a subject's response. That is, theranostics are useful to predict and assess drug response in individual subjects, i.e., individualized medicine. Theranostic tests are also useful to select subjects who are particularly likely to benefit from a treatment, or to provide an early and objective indication of treatment efficacy in individual subjects, so that the treatment can be altered with a minimum of delay.
  • SNPs single nucleotide polymorphisms
  • SNPs genetic variations
  • haplotypes a set of closely linked genetic markers, in this context SNPs, present on one chromosome
  • the invention provides a response to the need in the art. Significant associations were observed between single nucleotide polymorphisms (SNPs) in TCF2, ACADSB, CPT1A, ESRRA, PPARD, PPARGC1A and SCD1 and diabetes. In addition, significant associations were identified between haplotypes in ESRRA, PPARD, and ACACB and diabetes. These SNPs and haplotypes are useful for improving the diagnosis of type 2 diabetes (T2DM) and for designing clinical trials by better patient stratification. Accordingly, the invention provides a method for diagnosing type 2 diabetes in an individual.
  • the genotype of the individual is determined in a gene selected from TCF2, ACADSB, CPT1A, ESRRA, PPARD, PPARGC1A and SCD1. If a SNP is found that is indicative of a predisposition to type 2 diabetes, then the individual is diagnosed as having a predisposition to type 2 diabetes.
  • the invention also provides a method for diagnosing type 2 diabetes in an individual by determining the haplotype of the individual in a gene selected from ESRRA, PPARD, or ACACB. If a haplotype is found that is indicative of a predisposition to type 2 diabetes, then the individual is diagnosed as having a predisposition to type 2 diabetes. The invention further provides a theranostic method of treating type 2 diabetes in an individual. The genotype or haplotype of an individual suspected of having type 2 diabetes is determined. If the genotype or haplotype indicates that the individual has a predisposition for having type 2 diabetes, then the individual is treated with an appropriate anti-diabetic agent or other therapy.
  • the invention provides a method for determining whether an individual is to be included in a study of an anti-diabetic agent.
  • the genotype or haplotype of a candidate for inclusion in the study is determined. If the genotype or haplotype indicates that the candidate has a predisposition for having type 2 diabetes, then the individual is included in the study. If the genotype or haplotype indicates that the candidate does not have a predisposition for having type 2 diabetes, then the individual is either not included in the study or else included as a control.
  • the invention also provides a kit for use in the methods of the invention.
  • SNPs were chosen from the public database or developed internally by direct sequencing of the genomic regions of selected genes. SNPs were spaced at an average interval of at least one per 5 kb spanning the whole genomic region of each gene (ACADSB, ACACB, CPT1A, ESRRA, PPARD, PPARGC1A and SCD1). For TCF2, only a few candidate SNPs were selected from the literature for the analysis.
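The inclusion rule described above can be sketched as a small decision function. This is only an illustration: the function name, the enum and the idea of passing in a set of predisposing variant labels are assumptions, and the variant labels used in the usage note are hypothetical placeholders, not markers taken from the patent.

```python
from enum import Enum


class Enrollment(Enum):
    INCLUDE = "include in study"
    EXCLUDE_OR_CONTROL = "exclude, or include as a control"


def enrollment_decision(carried_variants, predisposing_variants):
    """Apply the stated rule: a candidate carrying any variant that
    indicates predisposition is included; otherwise the candidate is
    excluded or enrolled as a control.

    Both arguments are iterables of variant labels (hypothetical names).
    """
    if set(carried_variants) & set(predisposing_variants):
        return Enrollment.INCLUDE
    return Enrollment.EXCLUDE_OR_CONTROL
```

For instance, `enrollment_decision({"variant_x"}, {"variant_x", "variant_y"})` returns `Enrollment.INCLUDE`, while a candidate carrying neither placeholder variant falls into the second branch.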
  • the various aspects of the invention further relate to diagnostic/theranostic methods and kits that use the genetic variations of the invention to identify individuals predisposed to disease or to classify individuals with regard to drug responsiveness, side effects, or optimal drug dose.
  • the invention provides methods for compound validation and a computer system for storing and analyzing data related to the genetic variations of the invention. Accordingly, various particular embodiments that illustrate these aspects follow.
  • Type 2 diabetes mellitus is a clinically and genetically heterogeneous group of disorders characterized by abnormally high levels of glucose in the blood. T2DM comprises approximately 90% of the diabetes syndrome. It is characterized by insulin resistance in muscle, liver and adipose tissue that probably begins at a preclinical stage. Eventually, defects in insulin secretion fail to compensate for insulin resistance and lead to hyperglycaemia, precipitating the clinical onset of diabetes. Harris MI, Chapter 32: Definition and Classification of Diabetes Mellitus and the New Criteria for Diagnosis, pp. 326-334, in Diabetes Mellitus: a Fundamental and Clinical Text, 2nd Edition, editors: LeRoith D, Taylor SI, Olefsky JM; (Lippincott Williams & Wilkins, Philadelphia, Pennsylvania, 2000).
  • allele means a particular form of a gene or DNA sequence at a specific chromosomal location (locus).
  • the term “antibody” includes, but is not limited to, polyclonal antibodies, monoclonal antibodies, humanized or chimaeric antibodies and biologically functional antibody fragments sufficient for binding of the antibody fragment to the protein.
  • the term “clinical response” means any or all of the following: a quantitative measure of the response, no response, and adverse response (i.e., side effects).
  • the term “clinical trial” means any research study designed to collect clinical data on responses to a particular treatment, and includes but is not limited to phase I, phase II and phase III clinical trials. Standard methods are used to define the patient population and to enrol subjects.
  • the term "effective amount" of a compound is a quantity sufficient to achieve a desired therapeutic and/or prophylactic effect, for example, an amount which results in the prevention of or a decrease in the symptoms associated with a disease that is being treated, e.g., the diseases associated with genetic variations and polypeptides identified herein.
  • the amount of compound administered to the subject will depend on the type and severity of the disease and on the characteristics of the individual, such as general health, age, sex, body weight and tolerance to drugs. It will also depend on the degree, severity and type of disease.
  • an effective amount of the compounds of the present invention ranges from about 0.000001 mg per kilogram body weight per day to about 10,000 mg per kilogram body weight per day.
  • the dosage ranges are from about 0.0001 mg per kilogram body weight per day to about 100 mg per kilogram body weight per day.
  • the compounds of the present invention can also be administered in combination with each other, or with one or more additional therapeutic compounds.
  • expression includes but is not limited to one or more of the following: transcription of the gene into precursor mRNA; splicing and other processing of the precursor mRNA to produce mature mRNA; mRNA stability; translation of the mature mRNA into protein (including codon usage and tRNA availability); and glycosylation and/or other modifications of the translation product, if required for proper expression and function.
  • RNA Ribonucleic acid
  • gene means a segment of DNA that contains all the information for the regulated biosynthesis of an RNA product, including promoters, exons, introns, and other untranslated regions that control expression.
  • genotype means an unphased 5' to 3' sequence of nucleotide pairs found at one or more polymorphic sites in a locus on a pair of homologous chromosomes in an individual.
  • genotype includes a full-genotype and/or a sub-genotype.
  • locus means a location on a chromosome or DNA molecule corresponding to a gene or a physical or phenotypic feature.
  • the term “modulating agent” is any compound that alters (e.g., increases or decreases) the expression level or biological activity level of the polypeptides compared to the expression level or biological activity level of the polypeptides in the absence of the modulating agent.
  • the modulating agent can be a small molecule, polypeptide, carbohydrate, lipid, nucleotide, or combination thereof.
  • the modulating agent may be an organic compound or an inorganic compound.
  • mutant means any heritable variation from the wild-type that is the result of a mutation, e.g., single nucleotide polymorphism.
  • mutant is used interchangeably with the terms “marker”, “biomarker”, and “target” throughout the specification.
  • the term "medical condition” includes, but is not limited to, any condition or disease manifested as one or more physical and/or psychological symptoms for which treatment is desirable, and includes previously and newly identified diseases and other disorders.
  • nucleotide pair means the nucleotides found at a polymorphic site on the two copies of a chromosome from an individual.
  • polymorphic site means a position within a locus at which at least two alternative sequences are found in a population, the most frequent of which has a frequency of no more than 99%.
  • phased means, when applied to a sequence of nucleotide pairs for two or more polymorphic sites in a locus, the combination of nucleotides present at those polymorphic sites on a single copy of the locus is known.
  • polymorphism means any sequence variant present at a frequency of >1% in a population.
  • the sequence variant may be present at a frequency significantly greater than 1% such as 5% or 10 % or more.
  • the term may be used to refer to the sequence variation observed in an individual at a polymorphic site.
  • Polymorphisms include nucleotide substitutions, insertions, deletions and microsatellites and may, but need not, result in detectable differences in gene expression or protein function.
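The frequency thresholds in the definitions above (a polymorphic site is one where the most frequent sequence occurs in no more than 99% of the population) can be checked with a few lines of code. The sketch below is illustrative only; the function names and the representation of a population sample as a flat list of observed alleles are assumptions.

```python
from collections import Counter


def allele_frequencies(alleles):
    """Relative frequency of each allele in a list of observed alleles."""
    counts = Counter(alleles)
    total = sum(counts.values())
    return {allele: n / total for allele, n in counts.items()}


def is_polymorphic_site(alleles, max_major_freq=0.99):
    """True if at least two alternative alleles are observed and the most
    frequent one has a frequency of no more than max_major_freq."""
    freqs = allele_frequencies(alleles)
    return len(freqs) >= 2 and max(freqs.values()) <= max_major_freq
```

A sample of 95 `A` and 5 `G` chromosomes passes the check, whereas 999 `A` and a single `G` does not, because the major allele exceeds 99%.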
  • polynucleotide means any RNA or DNA, which may be unmodified or modified RNA or DNA.
  • Polynucleotides include, without limitation, single- and double-stranded DNA, DNA that is a mixture of single- and double-stranded regions, single- and double-stranded RNA, RNA that is a mixture of single- and double-stranded regions, and hybrid molecules comprising DNA and RNA that may be single-stranded or, more typically, double-stranded or a mixture of single- and double-stranded regions.
  • polynucleotide also refers to triple-stranded regions comprising RNA or DNA or both.
  • polynucleotide also includes DNAs or RNAs containing one or more modified bases and DNAs or RNAs with backbones modified for stability or for other reasons.
  • polypeptide means any polypeptide comprising two or more amino acids joined to each other by peptide bonds or modified peptide bonds, i.e., peptide isosteres.
  • Polypeptide refers to both short chains, commonly referred to as peptides, glycopeptides or oligomers, and to longer chains, generally referred to as proteins.
  • Polypeptides may contain amino acids other than the 20 gene-encoded amino acids.
  • Polypeptides include amino acid sequences modified either by natural processes, such as post- translational processing, or by chemical modification techniques that are well-known in the art. Such modifications are well described in basic texts and in more detailed monographs, as well as in a voluminous research literature.
  • SNP nucleic acid means a nucleic acid sequence, which comprises a nucleotide that is variable within an otherwise identical nucleotide sequence between individuals or groups of individuals, thus existing as alleles. Such SNP nucleic acids are preferably from about 15 to about 500 nucleotides in length.
  • the SNP nucleic acids may be part of a chromosome, or they may be an exact copy of a part of a chromosome, e.g., by amplification of such a part of a chromosome through PCR or through cloning.
  • SNPs The SNP nucleic acids are referred to hereafter simply as "SNPs”.
  • a SNP is the occurrence of nucleotide variability at a single position in the genome, in which two alternative bases occur at appreciable frequency (i.e., >1%) in the human population.
  • a SNP may occur within a gene or within intergenic regions of the genome.
  • SNP probes according to the invention are oligonucleotides that are complementary to a SNP nucleic acid.
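Complementarity between a probe and a SNP nucleic acid can be illustrated by computing a reverse complement. This is a minimal sketch for plain A/C/G/T DNA strings; the function names are assumptions, and ambiguity codes or modified bases are deliberately not handled.

```python
# Translation table for Watson-Crick base pairing (DNA, upper and lower case).
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (5' to 3')."""
    return seq.translate(_COMPLEMENT)[::-1]


def is_complementary_probe(probe, target):
    """True if the probe is the exact reverse complement of the target."""
    return probe.upper() == reverse_complement(target).upper()
```

For the toy target `ATGC`, `reverse_complement` gives `GCAT`, so `is_complementary_probe("GCAT", "ATGC")` holds.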
  • a "haplotype” is a set of closely linked genetic markers, in this context SNPs, present on one chromosome which tends to be inherited together.
  • the term "subject" means that preferably the subject is a mammal, such as a human, but can also be an animal, e.g., domestic animals (e.g., dogs, cats and the like), farm animals (e.g., cows, sheep, pigs, horses and the like) and laboratory animals (e.g., monkeys (e.g., cynomolgus monkeys), rats, mice, guinea pigs and the like).
  • the administration of an agent or drug to a subject or patient includes self-administration and the administration by another.
  • a SNP is said to be "allelic" in that due to the existence of the polymorphism, some members of a species may have an unmutated sequence (i.e., the original allele) whereas other members may have a mutated sequence (i.e., the variant or mutant allele).
  • An association between a SNP and a particular phenotype does not necessarily indicate or require that the SNP is causative of the phenotype. Instead, the association may merely be due to genome proximity between a SNP and those genetic factors actually responsible for a given phenotype, such that the SNP and said genetic factors are closely linked. That is, a SNP may be in linkage disequilibrium ("LD") with the "true” functional variant.
  • LD linkage disequilibrium
  • LD a.k.a., allelic association
  • a SNP may serve as a marker that has value by virtue of its proximity to a mutation that causes a particular phenotype.
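Linkage disequilibrium between a marker SNP and a second site is commonly quantified with the coefficients D, D' and r². The text does not say which measure was used, so the sketch below simply shows the standard textbook definitions for two biallelic sites, given the four haplotype frequencies; the function and parameter names are assumptions.

```python
def ld_measures(p_AB, p_Ab, p_aB, p_ab):
    """Return (D, |D'|, r^2) for two biallelic sites.

    Arguments are the population frequencies of the four haplotypes
    formed by alleles A/a at the first site and B/b at the second.
    """
    total = p_AB + p_Ab + p_aB + p_ab
    if abs(total - 1.0) > 1e-9:
        raise ValueError("haplotype frequencies must sum to 1")

    p_A = p_AB + p_Ab          # frequency of allele A
    p_B = p_AB + p_aB          # frequency of allele B
    d = p_AB - p_A * p_B       # raw disequilibrium coefficient

    # Largest |D| attainable for these allele frequencies.
    if d >= 0:
        d_max = min(p_A * (1 - p_B), (1 - p_A) * p_B)
    else:
        d_max = min(p_A * p_B, (1 - p_A) * (1 - p_B))
    d_prime = abs(d) / d_max if d_max > 0 else 0.0

    denom = p_A * (1 - p_A) * p_B * (1 - p_B)
    r2 = d * d / denom if denom > 0 else 0.0
    return d, d_prime, r2
```

With only the `AB` and `ab` haplotypes present at 0.5 each, the two sites are perfectly correlated (|D'| and r² both equal 1); with all four haplotypes at 0.25 the sites are in equilibrium and D is 0.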
  • the invention also includes single-stranded polynucleotides that are complementary to the sense strand of the genomic variants described herein.
  • SSCP single-strand conformation polymorphism
  • DHPLC denaturing high-performance liquid chromatography
  • Detection technologies include fluorescent polarization (Chan et al., Genome Res.
  • ACACB_A112792T_rs2160602 (tag SNP) | TCCATGCATGCCTGTTGTTG (SEQ ID | AACAAGMGGGACAACAGGG (SEQ ID | GCATGCCTGTTGTTGGGAGTGTG (SEQ ID
  • ACACB_A136178G_rs882355 (tag SNP) | TGGCCTTTTTTCTCAGGGTC (SEQ ID | GTTGGGGTGAGTGCTATGAA (SEQ ID | GGGTCATTCTTTTCCCCTAAC (SEQ ID
  • ACACB_A70941G_rs2268389 (tag SNP) | CCTGCAGCCTCACATAAATG (SEQ ID | GCGGTCTTCTAACTACACTC (SEQ ID | GACAGACTGGGAGATCGAGT (SEQ ID NO: 1)
  • ACACB_A75069G_rs2239607 (tag SNP) | GGCCTCTGATCAAGTTGAAC (SEQ ID | CAAGGCTCAAAAGGAAACCC (SEQ ID | GAGACCCACTTATAGCCTAAC (SEQ ID
  • ACACB_A85859G_rs2300452 (tag SNP) | ATCAGGTGCAGAGAACTCAC (SEQ ID | GAACCAATTACAGTCTCGGG (SEQ ID | TAACAAATGGGACCAAGAAAGT (SEQ ID NO: 1)
  • ACACB_C115848T_rs759560 (tag SNP) | ATCCTGTATGCATACATTGG (SEQ ID | ATTTGCGGAAGGATGTGTAC (SEQ ID | TGCATACATTGGAAACATACA (SEQ ID
  • ACACB_C121771T_rs3742023 (tag SNP) | AGGGTGCCATGATTTCCTTG (SEQ ID | TTCTGTCCACAGCTCTGAAG (SEQ ID | CATGATTTCCTTGAAACTGCC (SEQ ID
  • ACACB_C61722T_rs2284694 (tag SNP) | ACATCTCCCTCCAGGAAGAG (SEQ ID | AAGGCCCTTTAGGGTGTGTG (SEQ ID | CTGCCCCATCTGGTACTTCAGTTC (SEQ ID NO: 1)
  • ACACB_C72398T_rs2284690 (tag SNP) | TGAGACAGGTGGACTCAAGG (SEQ ID | TGTGTTTCTGCACCATCACG (SEQ ID | AGGCTGGTAGCGCTTCTCC (SEQ ID NO: 1)
  • ACACB_C98257T_rs3742027 (tag SNP) | AGAGTTGGGTCTGCAAGCAG (SEQ ID | TTGGTGATGCTGATGGGCAC (SEQ ID | CCCCGACTTGCCATCACC (SEQ ID NO: 1)
  • ACACB_G10107A_rs2430684 (tag SNP) | GTCTTTTGACACCACCTTCC (SEQ ID | GTAATACCTCTTCCCTGGTG (SEQ ID | TTTGACACCACCTTCCATGGCC (SEQ ID
  • ACACB_G124627A_rs2075260 (tag SNP) | GTTGGCAAAGATCATCAGGG (SEQ ID | CCAGACTCAGCCTACAAAAC (SEQ ID | CTCCCGGTTGAAGTCCTTGA (SEQ ID NO: 1)
  • ACACB_G34840A_rs246092 (tag SNP) | AATGCATCAGAGGCTGTGTG (SEQ ID | ATGTCTGTGAAGAGCTTGGG (SEQ ID | AGGCTGTGTGCTGTTCCCA (SEQ ID
  • ACACB_G9282A_rs3858707 (tag SNP) | GCACACATGAGTCTTCTCTG (SEQ ID | CTTCCTCAAGGAACATTCCC (SEQ ID | TCTTCTCTGTCAGAAGCCCCTGAT (SEQ ID
  • ACACB_T2745C_rs1654884 (tag SNP) | CCTTTCAAGATCATCATGTG (SEQ ID | CCTTTGTTCCATTAATGTGGC (SEQ ID | TTAAAGAAAAGTCAGTCAAGGGTG (SEQ ID
  • ACACB_T33519C_rs4766516 (tag SNP) | TTGCGGAACATCTCATAGGC (SEQ ID | GTGCTTATTGCCAACAACGG (SEQ ID | TGGAGCGCATGCACTTCAC (SEQ ID
  • ACACB_T52085C_rs1016331 (tag SNP) | CTCTCTACAATGAGCCAGAC (SEQ ID | CAGGTTTAGAACCCTAGTCC (SEQ ID | AATGAGCCAGACTTCATACTGT (SEQ ID
  • ACACB_T5524C_rs2878960 (tag SNP) | ATGACCAACTTCATCCTGGG (SEQ ID | GTAGACTCACGAGATGAGCC (SEQ ID | CTCTTTTGATGACTACTCCTC (SEQ ID
  • ACACB_T57784C_rs2287221 (tag SNP) | CCTTGAACTCAGAACTCCTG (SEQ ID | TGGCAGTCAGTGAACAGGCTG (SEQ ID | TACACAAGTCAGCATGGATCC (SEQ ID
  • ESRRA_C7886T_gs229601623 | ACAAGGTGCCTACCCATCTC (SEQ ID | GAGGAAGACTTTTCTGGGAG (SEQ ID | AGGAGTCTGCGGATGAC (SEQ ID NO:84)
  • ESRRA_C9200T_rs2286613 (tag SNP) | ACGCGGGCTGTCCTGCACTGA (SEQ ID | CCCCATCCGAGTGGAATTTG (SEQ ID | TCCTGCACTGACTCACG (SEQ ID NO:87)
  • ESRRA_T22947G_rs2079786 (tag SNP) | TTATTTCCTGCCTGCCAGAC (SEQ ID | ACGATTGGCGAGAAAGGTGG (SEQ ID | CTGCCAGACCCCTCCCC (SEQ ID NO:90)
  • PPARD_A74075G_rs2038068 (tag SNP) | AGAGACAATTCCAGGCTAGG (SEQ ID | AGATGCAGTTCTGGACTCTG (SEQ ID | ACTAGAGACCCTGGTCCCAA (SEQ ID NO: 1)
  • PPARD_C91401T_rs2076167 (tag SNP) | TGAGAAGAGGAAGCTGGTGG (SEQ ID | TTGGAGAAGGCCTTCAGGTC (SEQ ID | GCTGGTGGCAGGGCTGACTGCAAA
  • PPARD_G10263T_rs9658060 (tag SNP) | TCACGGCGGCTTCCTGATGC (SEQ ID | AGGGTCAGCGGGGCGCCTAC (SEQ ID | CCGGTCAGCCGTCGTGCG (SEQ ID NO: 1)
  • PPARD_G77738A_rs2267669 (tag SNP) | TGGAGTCTTTCCAAGGTGAC (SEQ ID | TAAGGGTTGGAACTGTCTCC (SEQ ID | CTTACTGGGTGGTGATGCCA (SEQ ID
  • SCD_C25750T_rs7849 | AACCCTCTTTTGCTCTGTGG (SEQ ID NO: 154) | TCTCATGAGGCACAGCCAAG (SEQ ID NO: 155) | CTGGCCCACTGGCTCAAC (SEQ ID NO: 156)
  • TCF2_A10228G_rs11651755 (first run) | AACAGAGGAGAAGGTGACTG (SEQ ID NO: 160) | ATGGGAAGTCCTCTTTTGCC (SEQ ID NO: 161) | GGTACACCTCATCCCTTTCTTC (SEQ ID NO: 162)
  • TCF2_A10228G_rs11651755 (second run) | ATGGGAAGTCCTCTTTTGCC (SEQ ID NO: 163) | AACAGAGGAGAAGGTGACTG (SEQ ID NO: 164) | CCTCTTTTGCCCACTAACCTC (SEQ ID NO: 165)
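The SNP labels in the list above appear to follow the pattern gene, then allele-position-allele, then a database identifier (for example `TCF2_A10228G_rs11651755`). The source does not define this convention, so the parser below is a sketch built on that assumption; the field names (`gene`, `allele_1`, `position`, `allele_2`, `db_id`) are likewise illustrative choices.

```python
import re
from typing import NamedTuple, Optional


class SnpLabel(NamedTuple):
    gene: str
    allele_1: str
    position: int
    allele_2: str
    db_id: str


# Assumed layout: GENE_<allele><position><allele>_<database id>
_LABEL_RE = re.compile(
    r"^(?P<gene>[A-Za-z0-9]+)_"
    r"(?P<a1>[ACGT])(?P<pos>\d+)(?P<a2>[ACGT])_"
    r"(?P<db_id>[a-z]+\d+)$"
)


def parse_snp_label(label: str) -> Optional[SnpLabel]:
    """Split a label such as 'ACACB_A112792T_rs2160602' into its parts.

    Returns None when the label does not match the assumed layout.
    """
    m = _LABEL_RE.match(label.strip())
    if m is None:
        return None
    return SnpLabel(m["gene"], m["a1"], int(m["pos"]), m["a2"], m["db_id"])
```

Under this reading, `parse_snp_label("SCD_C25750T_rs7849")` yields gene `SCD`, alleles `C` and `T`, position 25750 and identifier `rs7849`. What the position is measured relative to is not stated in the source.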
  • Polymorphisms can also be detected using commercially available products, such as INVADER™ technology (available from Third Wave Technologies Inc., Madison, Wisconsin, USA).
  • a specific upstream "invader” oligonucleotide and a partially overlapping downstream probe together form a specific structure when bound to complementary DNA template. This structure is recognized and cut at a specific site by the Cleavase enzyme, resulting in the release of the 5' flap of the probe oligonucleotide. This fragment then serves as the "invader” oligonucleotide with respect to synthetic secondary targets and secondary fluorescently labelled signal probes contained in the reaction mixture.
  • polymorphisms may also be determined using a mismatch detection technique including, but not limited to, the RNase protection method using riboprobes (Winter et al, Proc. Natl. Acad. Sci. USA 82:7575 (1985); Meyers et al, Science 230:1242 (1985)) and proteins which recognize nucleotide mismatches, such as the E. coli mutS protein (Modrich P.
  • variant alleles can be identified by single strand conformation polymorphism (SSCP) analysis (Orita et al, Genomics 5:874-879 (1989); Humphries et al, in Molecular Diagnosis of Genetic Diseases, R. Elles, ed., (1996) pp. 321-340) or denaturing gradient gel electrophoresis (DGGE) (Wartell et al, Nucl. Acids Res. 18:2699-2706 (1990); Sheffield et al, Proc. Natl. Acad. Sci. USA 86:232-236 (1989)).
  • SSCP single strand conformation polymorphism
  • DGGE denaturing gradient gel electrophoresis
  • a polymerase-mediated primer extension method may also be used to identify the polymorphisms.
  • the invention provides methods and compositions for haplotyping and/or genotyping the gene in an individual.
  • the terms "genotype” and “haplotype” mean the genotype or haplotype containing the nucleotide pair or nucleotide, respectively, that is present at one or more of the polymorphic sites described herein and may optionally also include the nucleotide pair or nucleotide present at one or more additional polymorphic sites in the gene.
  • the additional polymorphic sites may be currently known polymorphic sites or sites that are subsequently discovered.
  • compositions of the invention contain oligonucleotide probes and primers designed to specifically hybridize to one or more target regions containing, or that are adjacent to, a polymorphic site.
  • Oligonucleotide compositions of the invention are useful in methods for genotyping and/or haplotyping a gene in an individual.
  • the methods and compositions for establishing the genotype or haplotype of an individual at the polymorphic sites described herein are useful for studying the effect of the polymorphisms in the aetiology of diseases affected by the expression and function of the protein, studying the efficacy of drugs targeting, predicting individual susceptibility to diseases affected by the expression and function of the protein and predicting individual responsiveness to drugs targeting the gene product.
  • Genotyping oligonucleotides of the invention may be immobilized on or synthesized on a solid surface such as a microchip, bead, or glass slide. See, e.g., WO 98/20020 and WO 98/20019.
  • Genotyping oligonucleotides may hybridize to a target region located one to several nucleotides downstream of one of the polymorphic sites identified herein. Such oligonucleotides are useful in polymerase-mediated primer extension methods for detecting one of the polymorphisms described herein and therefore such genotyping oligonucleotides are referred to herein as "primer-extension oligonucleotides”.
  • a genotyping method of the invention may involve isolating from an individual a nucleic acid mixture comprising the two copies of a gene of interest or fragment thereof, and determining the identity of the nucleotide pair at one or more of the polymorphic sites in the two copies.
  • the two "copies" of a gene in an individual may be the same allele or may be different alleles.
  • the genotyping method comprises determining the identity of the nucleotide pair at each polymorphic site.
  • the nucleic acid mixture is isolated from a biological sample taken from the individual, such as a blood sample or tissue sample. Suitable tissue samples include whole blood, semen, saliva, tears, urine, faecal material, sweat, buccal smears, skin and hair.
  • a method of genotyping used in the EXAMPLE below is as follows: Genotyping of all SNPs was performed by single base extension followed by mass spectrometry using Sequenom's MassArray™ technology. Ross et al, Nat. Biotechnol. 16: 1347-1351 (1998). Ascertainment of genotypes on this system is based on matrix-assisted laser desorption/ionization time-of-flight (MALDI-TOF) analysis of homogeneous Mass Extension (hME) reaction products.
  • MALDI-TOF matrix-assisted laser desorption/ionization time-of-flight
  • a haplotyping method of the invention may include isolating from an individual a nucleic acid molecule containing only one of the two copies of a gene of interest, or a fragment thereof, and determining the identity of the nucleotide at one or more of the polymorphic sites in that copy.
  • Direct haplotyping methods include, for example, CLASPER System™ technology (U.S. Pat. No. 5,866,404) or allele-specific long-range PCR (Michalotos-Beloin et al, Nucl. Acids Res. 24: 4841-4843 (1996)).
  • the nucleic acid may be isolated using any method capable of separating the two copies of the gene or fragment.
  • a haplotype pair is determined for an individual by identifying the phased sequence of nucleotides at one or more of the polymorphic sites in each copy of the gene that is present in the individual.
  • the haplotyping method comprises identifying the phased sequence of nucleotides at each polymorphic site in each copy of the gene.
  • the haplotyping method was as follows: All haplotype association analyses were performed using the haplo.score program from the haplo.stats package in R, using the recommended parameters for a binary trait. See Becker R, Chambers J & Wilks A, The new S language: a programming environment for data analysis and graphics (Wadsworth & Brooks/Cole Advanced Books, Pacific Grove, 1988) p. 702; Schaid DJ et al, Am. J. Hum. Genet. 70(2): 425-34 (2002).
  • Haplo.score infers haplotypes from genotype data using the EM algorithm and tests association with a trait both for the individual haplotypes at a locus and for the entire set of haplotypes.
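To make the idea of EM-based haplotype inference concrete, the sketch below estimates haplotype frequencies for two biallelic SNPs from unphased genotypes. It is not the haplo.score implementation and performs no association test; it is a minimal textbook EM that assumes Hardy-Weinberg equilibrium, and all names (`HAPLOTYPES`, `compatible_pairs`, `em_haplotype_frequencies`) and the 0/1/2 genotype coding are assumptions made for illustration.

```python
# Haplotypes over two biallelic SNPs; 0 = reference allele, 1 = variant allele.
HAPLOTYPES = [(0, 0), (0, 1), (1, 0), (1, 1)]


def compatible_pairs(genotype):
    """Unordered haplotype pairs whose variant-allele counts add up to the
    unphased genotype (a tuple of 0/1/2 counts, one per SNP)."""
    pairs = []
    for i, h1 in enumerate(HAPLOTYPES):
        for h2 in HAPLOTYPES[i:]:
            if (h1[0] + h2[0], h1[1] + h2[1]) == tuple(genotype):
                pairs.append((h1, h2))
    return pairs


def em_haplotype_frequencies(genotypes, n_iter=200, tol=1e-10):
    """Estimate haplotype frequencies by expectation-maximization."""
    if not genotypes:
        raise ValueError("at least one genotype is required")
    freqs = {h: 0.25 for h in HAPLOTYPES}  # uniform starting point
    for _ in range(n_iter):
        counts = {h: 0.0 for h in HAPLOTYPES}
        for g in genotypes:
            pairs = compatible_pairs(g)
            if not pairs:
                raise ValueError(f"invalid genotype: {g!r}")
            # E-step: probability of each phase under Hardy-Weinberg.
            weights = [(1 if h1 == h2 else 2) * freqs[h1] * freqs[h2]
                       for h1, h2 in pairs]
            total = sum(weights)
            for (h1, h2), w in zip(pairs, weights):
                counts[h1] += w / total
                counts[h2] += w / total
        # M-step: expected haplotype counts divided by chromosomes sampled.
        new = {h: c / (2 * len(genotypes)) for h, c in counts.items()}
        converged = max(abs(new[h] - freqs[h]) for h in HAPLOTYPES) < tol
        freqs = new
        if converged:
            break
    return freqs
```

Only the double heterozygote `(1, 1)` is ambiguous here: it is consistent with either the `(0,0)/(1,1)` or the `(0,1)/(1,0)` haplotype pair, and the E-step splits it between the two according to the current frequency estimates.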
  • the identity of a nucleotide (or nucleotide pair) at a polymorphic site may be determined by amplifying target regions containing the polymorphic sites directly from one or both copies of the gene, or fragments thereof, and sequencing the amplified regions by conventional methods.
  • the genotype or haplotype for the gene of an individual may also be determined by hybridization of a nucleic acid sample containing one or both copies of the gene to nucleic acid arrays and subarrays such as described in WO 95/11995.
  • the alleles present at polymorphic sites of interest may be determined indirectly by genotyping other polymorphic sites in linkage disequilibrium with those sites. As described above, two sites are said to be in linkage disequilibrium if the presence of a particular variant at one site is indicative of the presence of another variant at a second site. Stevens JC, Mol. Diag. 4: 309-317 (1999). Polymorphic sites in linkage disequilibrium with the polymorphic sites of the invention may be located in regions of the same gene or in other genomic regions.
  • the target regions may be amplified using any oligonucleotide-directed amplification method, including but not limited to polymerase chain reaction (PCR).
  • PCR polymerase chain reaction
  • LCR ligase chain reaction
  • OLA oligonucleotide ligation assay
  • Oligonucleotides useful as primers or probes in such methods should specifically hybridize to a region of the nucleic acid that contains or is adjacent to the polymorphic site.
  • the oligonucleotides are between 10 and 35 nucleotides in length and preferably, between 15 and 30 nucleotides in length. Most preferably, the oligonucleotides are 20 to 25 nucleotides long. The exact length of the oligonucleotide will depend on many factors that are routinely considered and practiced by the skilled artisan.
  • nucleic acid amplification procedures may be used to amplify the target region including transcription-based amplification systems (U.S. Pat. No. 5,130,238; EP 329,822; U.S. Pat. No. 5,169,766, published PCT patent application WO 89/06700) and isothermal methods (Walker et al., Proc. Natl. Acad. Sci. USA 89:392-396 (1992)).
  • Hybridizing Allele-Specific Oligonucleotide to a Target Gene. A polymorphism in the target region may be assayed before or after amplification using one of several hybridization-based methods known in the art.
  • allele-specific oligonucleotides are utilized in performing such methods.
  • the allele-specific oligonucleotides may be used as differently labelled probe pairs, with one member of the pair showing a perfect match to one variant of a target sequence and the other member showing a perfect match to a different variant.
  • more than one polymorphic site may be detected at once using a set of allele-specific oligonucleotides or oligonucleotide pairs.
  • the members of the set have melting temperatures within 5°C, and more preferably within 2°C, of each other when hybridizing to each of the polymorphic sites being detected.
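The melting-temperature criterion for a probe set can be checked with a small script. The sketch below uses the Wallace rule (2 °C per A/T, 4 °C per G/C), which is only a rough approximation and is an assumption of this example. The source does not state how melting temperatures are to be estimated, and the function names are hypothetical.

```python
def wallace_tm(oligo):
    """Rough melting temperature in degrees C by the Wallace rule (an assumption)."""
    s = oligo.upper()
    at = s.count("A") + s.count("T")
    gc = s.count("G") + s.count("C")
    return 2 * at + 4 * gc


def tm_matched(oligos, tolerance=5.0):
    """True if all estimated melting temperatures lie within `tolerance` of each other."""
    tms = [wallace_tm(o) for o in oligos]
    return max(tms) - min(tms) <= tolerance
```

Passing `tolerance=2.0` corresponds to the more preferred 2 °C window. For real probe design a nearest-neighbour thermodynamic model would normally replace the Wallace rule.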
  • Hybridization of an allele-specific oligonucleotide to a target polynucleotide may be performed with both entities in solution, or such hybridization may be performed when either the oligonucleotide or the target polynucleotide is covalently or noncovalently affixed to a solid support. Attachment may be mediated, for example, by antibody-antigen interactions, poly-L-Lys, streptavidin or avidin-biotin, salt bridges, hydrophobic interactions, chemical linkages, UV cross-linking, baking, etc. Allele-specific oligonucleotide may be synthesized directly on the solid support or attached to the solid support subsequent to synthesis.
  • Solid supports suitable for use in detection methods of the invention include substrates made of silicon, glass, plastic, paper and the like, which may be formed, for example, into wells (as in 96-well plates), slides, sheets, membranes, fibres, chips, dishes, and beads.
  • the solid support may be treated, coated or derivatised to facilitate the immobilization of the allele-specific oligonucleotide or target nucleic acid.
  • the method comprises determining the genotype or the haplotype for a gene present in each member of the population, wherein the genotype or haplotype comprises the nucleotide pair or nucleotide detected at one or more of the polymorphic sites in the gene, and calculating the frequency at which the genotype or haplotype is found in the population.
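The frequency calculation described above amounts to counting each genotype and dividing by the population size. A minimal sketch follows, assuming each individual's genotype at one site is given as an unordered pair of alleles. The function name `genotype_frequencies` is hypothetical.

```python
from collections import Counter


def genotype_frequencies(genotypes):
    """Return {genotype: frequency} for a list of unordered allele pairs."""
    # Sort each pair so that ("A", "G") and ("G", "A") count as one genotype.
    counts = Counter(tuple(sorted(g)) for g in genotypes)
    n = len(genotypes)
    return {g: c / n for g, c in counts.items()}
```

For example, a population of four individuals typed A/G, G/A, A/A and G/G yields a frequency of 0.5 for the heterozygous genotype and 0.25 for each homozygous genotype.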
  • the population may be a reference population, a family population, a same sex population, a population group, or a trait population (e.g., a group of individuals exhibiting a trait of interest such as a medical condition or response to a therapeutic treatment).
  • frequency data for genotypes and/or haplotypes found in a reference population are used in a method for identifying an association between a trait and a genotype or a haplotype.
  • the trait may be any detectable phenotype, including but not limited to susceptibility to a disease or response to a treatment.
  • the method involves obtaining data on the frequency of the genotypes or haplotypes of interest in a reference population and comparing the data to the frequency of the genotypes or haplotypes in a population exhibiting the trait.
  • Frequency data for one or both of the reference and trait populations may be obtained by genotyping or haplotyping each individual in the populations using one of the methods described above.
  • the haplotypes for the trait population may be determined directly or, alternatively, by the predictive genotype to haplotype approach described above.
  • the frequency data for the reference and/or trait populations are obtained by accessing previously determined frequency data, which may be in written or electronic form.
  • the frequency data may be present in a database that is accessible by a computer. Once the frequency data are obtained, the frequencies of the genotypes or haplotypes of interest in the reference and trait populations are compared.
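One common way to compare genotype or haplotype frequencies between a reference population and a trait population is a chi-square test on a 2x2 table of carrier counts. The source does not name a specific test, so the choice of Pearson's chi-square, the function name `chi_square_2x2`, and the argument names are assumptions of this sketch.

```python
import math


def chi_square_2x2(trait_with, trait_without, ref_with, ref_without):
    """Pearson chi-square (1 degree of freedom) for a 2x2 table of counts.

    Rows: trait population and reference population.
    Columns: individuals with and without the genotype or haplotype of interest.
    Returns (chi2, p_value).
    """
    a, b, c, d = trait_with, trait_without, ref_with, ref_without
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        raise ValueError("each row and column needs at least one observation")
    chi2 = n * (a * d - b * c) ** 2 / denom
    # Upper tail of a chi-square with 1 degree of freedom, via the normal tail.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value
```

With small counts an exact test would be more appropriate than this large-sample approximation.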
  • the analysis includes an assigning step, as follows: First, each of the possible haplotype pairs is compared to the haplotype pairs in the reference population.
  • haplotype pairs in the reference population matches a possible haplotype pair and that pair is assigned to the individual.
  • only one haplotype represented in the reference haplotype pairs is consistent with a possible haplotype pair for an individual, and in such cases the individual is assigned a haplotype pair containing this known haplotype and a new haplotype derived by subtracting the known haplotype from the possible haplotype pair.
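The assigning step can be sketched as follows: enumerate every haplotype pair consistent with an individual's unphased genotype, look for a match among the reference haplotype pairs, and otherwise fall back to subtracting a single known haplotype from the genotype. Haplotypes are represented here as strings with one character per site and genotypes as a list of two-character strings. These representations and all function names are hypothetical.

```python
from itertools import product


def possible_pairs(genotype):
    """All unordered haplotype pairs consistent with an unphased genotype."""
    pairs = set()
    for choice in product(*[(g[0] + g[1], g[1] + g[0]) for g in genotype]):
        h1 = "".join(c[0] for c in choice)
        h2 = "".join(c[1] for c in choice)
        pairs.add(tuple(sorted((h1, h2))))
    return pairs


def subtract(genotype, known):
    """Partner haplotype left after removing `known`, or None if inconsistent."""
    other = []
    for g, allele in zip(genotype, known):
        if allele == g[0]:
            other.append(g[1])
        elif allele == g[1]:
            other.append(g[0])
        else:
            return None
    return "".join(other)


def assign_pair(genotype, reference_pairs):
    """Assign a haplotype pair using reference pairs (illustrative sketch)."""
    ref = {tuple(sorted(p)) for p in reference_pairs}
    matches = possible_pairs(genotype) & ref
    if len(matches) == 1:
        return matches.pop()              # exactly one reference pair fits
    if not matches:
        known = {h for p in ref for h in p}
        consistent = [(h, subtract(genotype, h)) for h in known]
        consistent = [(h, p) for h, p in consistent if p is not None]
        if len(consistent) == 1:          # one known haplotype plus a new one
            return tuple(sorted(consistent[0]))
    return None                           # ambiguous or unresolved
```

Cases in which several reference pairs match, or in which no known haplotype is consistent, are returned as `None` in this sketch and would need separate handling.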
  • a detectable genotype or haplotype that is in linkage disequilibrium with a genotype or haplotype of interest may be used as a surrogate marker.
  • a genotype that is in linkage disequilibrium with another genotype is indicated where a particular genotype or haplotype for a given gene is more frequent in the population that also demonstrates the potential surrogate marker genotype than in the reference population. If the frequency is statistically significant, then the marker genotype is predictive of that genotype or haplotype, and can be used as a surrogate marker.
  • the trait is susceptibility to a disease, severity of a disease, the staging of a disease or response to a drug.
  • Such methods have applicability in developing diagnostic tests and therapeutic treatments for all pharmacogenetic applications where there is the potential for an association between a genotype and a treatment outcome, including efficacy measurements, pharmacokinetic measurements and side-effect measurements.
  • the trait of interest is a clinical response exhibited by a patient to some therapeutic treatment, for example, response to a drug targeting or to a therapeutic treatment for a medical condition.
  • genotype or haplotype data is obtained on the clinical responses exhibited by a population of individuals who received the treatment, hereinafter the "clinical population".
  • This clinical data may be obtained by analyzing the results of a clinical trial that has already been run and/or by designing and carrying out one or more new clinical trials.
  • the individuals included in the clinical population are usually graded for the existence of the medical condition of interest. This grading of potential patients could employ a standard physical exam or one or more lab tests. Alternatively, grading of patients could use haplotyping for situations where there is a strong correlation between haplotype pair and disease susceptibility or severity.
  • the therapeutic treatment of interest is administered to each individual in the trial population, and each individual's response to the treatment is measured using one or more predetermined criteria. It is contemplated that in many cases, the trial population will exhibit a range of responses and that the investigator will choose the number of responder groups (e.g., low, medium, high) made up by the various responses. In addition, the gene for each individual in the trial population is genotyped and/or haplotyped, which may be done before or after administering the treatment.
  • correlations between individual response and genotype or haplotype content are created. Correlations may be produced in several ways. In one method, individuals are grouped by their genotype or haplotype (or haplotype pair) (also referred to as a polymorphism group), and then the averages and standard deviations of clinical responses exhibited by the members of each polymorphism group are calculated.
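The grouping method described above can be expressed in a few lines: collect the responses of each polymorphism group, then compute the mean and sample standard deviation per group. The record layout (genotype label, numeric response) and the function name `response_by_group` are assumptions for illustration.

```python
from collections import defaultdict
from statistics import mean, stdev


def response_by_group(records):
    """Return {polymorphism_group: (mean, standard deviation)} of responses.

    records: iterable of (group_label, response_value) pairs.
    A group with a single member gets a standard deviation of 0.0.
    """
    groups = defaultdict(list)
    for group, response in records:
        groups[group].append(response)
    return {
        g: (mean(v), stdev(v) if len(v) > 1 else 0.0)
        for g, v in groups.items()
    }
```

The per-group summaries are only a starting point. Whether the differences between groups are meaningful would be assessed with a suitable statistical test.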
  • the identification of an association between a clinical response and a genotype or haplotype (or haplotype pair) for the gene may be the basis for designing a diagnostic method to determine those individuals who will or will not respond to the treatment, or alternatively, will respond at a lower level and thus may require more treatment, i.e., a greater dose of a drug.
  • the diagnostic method may take one of several forms: for example, a direct DNA test (i.e., genotyping or haplotyping one or more of the polymorphic sites in the gene), a serological test, or a physical exam measurement. The only requirement is that there be a good correlation between the diagnostic test results and the underlying genotype or haplotype. In a preferred embodiment, this diagnostic method uses the predictive haplotyping method described above.
  • if the measured level of the gene expression product falls within 1.5 standard deviations of the mean of any of the control groups, then that individual may be assigned to that genotype group. In yet another embodiment, if the measured level of the gene expression product is within 1.0 standard deviation or less of the mean of any of the control group levels, then that individual may be assigned to that genotype group.
  • the standard control levels of the gene expression product would then be compared with the measured level of a gene expression product in a given patient.
  • This gene expression product could be the characteristic mRNA associated with that particular genotype group or the polypeptide gene expression product of that genotype group.
  • the patient could then be classified or assigned to a particular genotype group based on how similar the measured levels were compared to the control levels for a given group.
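The classification by expression level can be sketched as a comparison of the measured level with each control group's mean, in units of that group's standard deviation. The function name `assign_genotype_group`, the dictionary layout, and the tie-breaking rule (choose the closest group when several qualify) are assumptions. The source only states the 1.5 and 1.0 standard deviation windows.

```python
def assign_genotype_group(level, controls, max_sd=1.5):
    """Assign a measured expression level to a genotype group, or None.

    controls: {group_label: (mean, standard_deviation)} from control groups.
    max_sd: window width in standard deviations (1.5 or 1.0 in the text).
    """
    best = None
    for group, (mean, sd) in controls.items():
        if sd <= 0:
            continue                      # cannot scale by a zero spread
        z = abs(level - mean) / sd
        if z <= max_sd and (best is None or z < best[1]):
            best = (group, z)             # keep the closest qualifying group
    return best[0] if best else None
```

A level that falls outside the window of every control group is left unassigned rather than forced into the nearest group.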
  • the invention also provides a computer system for storing and displaying polymorphism data determined for the gene.
  • the computer system comprises a computer processing unit, a display, and a database containing the polymorphism data.
  • the polymorphism data includes the polymorphisms, the genotypes and the haplotypes identified for a given gene in a reference population.
  • the computer system is capable of producing a display showing haplotypes organized according to their evolutionary relationships.
  • a computer may implement any or all analytical and mathematical operations involved in practicing the methods of the present invention.
  • the computer may execute a program that generates views (or screens) displayed on a display device and with which the user can interact to view and analyze large amounts of information relating to the gene and its genomic variation, including chromosome location, gene structure, and gene family, gene expression data, polymorphism data, genetic sequence data, and clinical population data (e.g., data on ethnogeographic origin, clinical responses, genotypes, and haplotypes for one or more populations).
  • the polymorphism data described herein may be stored as part of a relational database (e.g., an instance of an Oracle database or a set of ASCII flat files).
  • polymorphism data may be stored on the computer's hard drive or may, for example, be stored on a CD-ROM or on one or more other storage devices accessible by the computer.
  • the data may be stored on one or more databases in communication with the computer via a network.
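As one possible illustration of relational storage for polymorphism data, the sketch below uses Python's built-in `sqlite3` module with an in-memory database. The table name, column names, and helper functions are hypothetical. The source mentions an Oracle database or flat files, and SQLite is swapped in here only to keep the example self-contained.

```python
import sqlite3


def build_polymorphism_db(rows):
    """Create an in-memory database and load (subject, site, genotype) rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE polymorphism ("
        " subject_id TEXT, site TEXT, genotype TEXT,"
        " PRIMARY KEY (subject_id, site))"
    )
    conn.executemany("INSERT INTO polymorphism VALUES (?, ?, ?)", rows)
    conn.commit()
    return conn


def genotype_counts(conn, site):
    """Return {genotype: number of subjects} for one polymorphic site."""
    cur = conn.execute(
        "SELECT genotype, COUNT(*) FROM polymorphism"
        " WHERE site = ? GROUP BY genotype ORDER BY genotype",
        (site,),
    )
    return dict(cur.fetchall())
```

The primary key on subject and site prevents a second, conflicting genotype from being recorded for the same individual at the same site.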
  • the invention provides SNP probes, which are useful in classifying subjects according to their types of genetic variation.
  • the SNP probes according to the invention are oligonucleotides, which discriminate between SNPs in conventional allelic discrimination assays.
  • the oligonucleotides according to this aspect of the invention are complementary to one allele of the SNP nucleic acid, but not to any other allele of the SNP nucleic acid.
  • Oligonucleotides according to this embodiment of the invention can discriminate between SNPs in various ways. For example, under stringent hybridization conditions, an oligonucleotide of appropriate length will hybridize to one SNP, but not to any other.
  • the oligonucleotide may be labelled using a radiolabel or a fluorescent molecular tag.
  • an oligonucleotide of appropriate length can be used as a primer for PCR, wherein the 3' terminal nucleotide is complementary to one allele containing a SNP, but not to any other allele.
  • the presence or absence of amplification by PCR determines the haplotype of the SNP.
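The allele-specific PCR readout can be sketched as follows: one primer is made per allele, differing only in its 3' terminal base, and the pattern of amplification across the primers indicates which alleles are present. Primers are written in sense-strand orientation, so the 3' base equals the sense-strand allele it detects. The function names and the idealized all-or-nothing amplification are assumptions. In practice a 3' mismatch may still extend at reduced efficiency.

```python
def design_allele_specific_primers(upstream, alleles):
    """Return {allele: primer}, each primer ending in its allele at the 3' end.

    upstream: sense-strand sequence immediately 5' of the polymorphic site.
    """
    return {allele: upstream + allele for allele in alleles}


def call_genotype(amplified):
    """Infer a genotype from {allele: True/False amplification observed}."""
    present = sorted(a for a, ok in amplified.items() if ok)
    if not present:
        return None                       # no product: failed or other allele
    if len(present) == 1:
        return present[0] + "/" + present[0]
    return "/".join(present)
```

For instance, amplification with the A-specific primer only would be read as A/A, while amplification with both primers would be read as A/G.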
  • Genomic and cDNA fragments of the invention comprise at least one polymorphic site identified herein, have a length of at least 10 nucleotides, and may range up to the full length of the gene.
  • a fragment according to the present invention is between 100 and 3000 nucleotides in length, and more preferably between 200 and 2000 nucleotides in length, and most preferably between 500 and 1000 nucleotides in length.
  • Kits of the Invention. The invention provides nucleic acid and polypeptide detection kits useful for haplotyping and/or genotyping the gene in an individual. Such kits are useful for classifying individuals. Specifically, the invention encompasses kits for detecting the presence of a polypeptide or nucleic acid corresponding to a marker of the invention in a biological sample, e.g., any bodily fluid including, but not limited to, serum, plasma, lymph, cystic fluid, urine, stool, cerebrospinal fluid, ascites fluid or blood, and including biopsy samples of body tissue.
  • the kit can comprise a labelled compound or agent capable of detecting a polypeptide or an mRNA encoding a polypeptide corresponding to a marker of the invention in a biological sample and means for determining the amount of the polypeptide or mRNA in the sample, e.g., an antibody which binds the polypeptide or an oligonucleotide probe which binds to DNA or mRNA encoding the polypeptide.
  • Kits can also include instructions for interpreting the results obtained using the kit.
  • the invention provides a kit comprising at least two genotyping oligonucleotides packaged in separate containers.
  • the kit may also contain other components such as hybridization buffer (where the oligonucleotides are to be used as a probe) packaged in a separate container.
  • the kit may contain, packaged in separate containers, a polymerase and a reaction buffer optimized for primer extension mediated by the polymerase, such as in the case of PCR.
  • such kit may further comprise a DNA sample collecting means.
  • the kit can comprise, e.g., (1) a first antibody, e.g., attached to a solid support, which binds to a polypeptide corresponding to a marker of the invention; and, optionally (2) a second, different antibody which binds to either the polypeptide or the first antibody and is conjugated to a detectable label.
  • the kit can comprise, e.g., (1) an oligonucleotide, e.g., a detectably-labelled oligonucleotide, which hybridizes to a nucleic acid sequence encoding a polypeptide corresponding to a marker of the invention; or (2) a pair of primers useful for amplifying a nucleic acid molecule corresponding to a marker of the invention.
  • the kit can also comprise, e.g., a buffering agent, a preservative or a protein-stabilizing agent.
  • the kit can further comprise components necessary for detecting the detectable label, e.g., an enzyme or a substrate.
  • the kit can also contain a control sample or a series of control samples, which can be assayed and compared to the test sample.
  • Each component of the kit can be enclosed within an individual container and all of the various containers can be within a single package, along with instructions for interpreting the results of the assays performed using the kit.
  • Nucleic Acid Sequences of the Invention. The invention comprises one or more isolated polynucleotides.
  • the invention also encompasses allelic variants of the same, that is, naturally occurring alternative forms of the isolated polynucleotides that encode mutant polypeptides that are identical, homologous or related to those encoded by the polynucleotides.
  • non-naturally occurring variants may be produced by mutagenesis techniques or by direct synthesis techniques well-known in the art.
  • nucleic acid sequences capable of hybridizing at low stringency with any nucleic acid sequences encoding mutant polypeptide of the present invention are considered to be within the scope of the invention.
  • Standard stringency conditions are well characterized in standard molecular biology cloning texts. See, for example, Molecular Cloning: A Laboratory Manual, 2nd Ed., Sambrook, Fritsch, & Maniatis, eds. (Cold Spring Harbor Laboratory Press, 1989); DNA Cloning, Volumes I and II, D.N. Glover, ed. (1985); Oligonucleotide Synthesis, M.J. Gait, ed. (1984); Nucleic Acid Hybridization, B.D. Hames & S.J. Higgins, eds. (1984).
  • Recombinant Expression Vectors. Another aspect of the invention comprises vectors containing one or more nucleic acid sequences encoding a mutant polypeptide.
  • many conventional techniques in molecular biology, microbiology and recombinant DNA are used. These techniques are well-known and are explained in, e.g., Current Protocols in Molecular Biology, Vols. I-III, Ausubel, ed. (1997); Sambrook et al., Molecular Cloning: A Laboratory Manual, 2nd Ed. (Cold Spring Harbor Laboratory Press, Cold Spring Harbor, NY, 1989); DNA Cloning: A Practical Approach, Vols. I and II, Glover, Ed. (1985); Oligonucleotide Synthesis, Gait, Ed.
  • the nucleic acid containing all or a portion of the nucleotide sequence encoding the polypeptide is inserted into an appropriate cloning vector, or an expression vector (i.e., a vector that contains the necessary elements for the transcription and translation of the inserted polypeptide coding sequence) by recombinant DNA techniques well-known in the art.
  • "plasmid" and "vector" can be used interchangeably, as the plasmid is the most commonly used form of vector.
  • the invention is intended to include such other forms of expression vectors that are not technically plasmids, such as viral vectors (e.g.
  • "operably linked" is intended to mean that the nucleotide sequence of interest is linked to the regulatory sequences in a manner that allows for expression of the nucleotide sequence (e.g., in an in vitro transcription/translation system or in a host cell when the vector is introduced into the host cell).
  • regulatory sequence is intended to include promoters, enhancers and other expression control elements (e.g., polyadenylation signals). Such regulatory sequences are described, for example, in Goeddel, Gene Expression Technology: Methods In Enzymology 185 (Academic Press, San Diego, Calif., 1990). Regulatory sequences include those that direct constitutive expression of a nucleotide sequence in many types of host cell and those that direct expression of the nucleotide sequence only in certain host cells (e.g., tissue specific regulatory sequences). It will be appreciated by those skilled in the art that the design of the expression vector can depend on such factors as the choice of the host cell to be transformed, the level of expression of polypeptide desired, etc.
  • the expression vectors of the invention can be introduced into host cells to thereby produce polypeptides or peptides, including fusion polypeptides, encoded by nucleic acids as described herein (e.g., mutant polypeptides and mutant-derived fusion polypeptides, etc.).
  • Polypeptide-Expressing Host Cells. Another aspect of the invention pertains to polypeptide-expressing host cells, which contain a nucleic acid encoding one or more mutant polypeptides of the invention.
  • the terms "host cell" and "recombinant host cell" are used interchangeably herein. It is understood that such terms refer not only to the particular subject cell but also to the progeny or potential progeny of such a cell. Because certain modifications may occur in succeeding generations due to either mutation or environmental influences, such progeny may not, in fact, be identical to the parent cell, but are still included within the scope of the term as used herein.
  • a host cell can be any prokaryotic or eukaryotic cell. Sambrook et al., Molecular Cloning: A Laboratory Manual, 2nd ed. (Cold Spring Harbor Laboratory Press, Cold Spring Harbor, New York, 1989).
  • Vector DNA can be introduced into prokaryotic or eukaryotic cells via conventional transformation or transfection techniques.
  • the terms "transformation" and "transfection" are intended to refer to a variety of art-recognized techniques for introducing foreign nucleic acid (e.g., DNA) into a host cell, including calcium phosphate or calcium chloride co-precipitation, DEAE-dextran-mediated transfection, lipofection, or electroporation. Suitable methods for transforming or transfecting host cells can be found in Sambrook et al., Molecular Cloning: A Laboratory Manual, 2nd ed. (Cold Spring Harbor Laboratory Press, Cold Spring Harbor, New York, 1989), and other laboratory manuals.
  • the desired isogene may be introduced into a host cell in a vector such that the isogene remains extrachromosomal. In such a situation, the gene will be expressed by the cell from the extrachromosomal location.
  • the isogene is introduced into a cell in such a way that it recombines with the endogenous gene present in the cell.
  • Vectors for the introduction of genes both for recombination and for extrachromosomal maintenance are known in the art, and any suitable vector or vector construct may be used in the invention.
  • mutant polypeptide can be expressed in bacterial cells such as Escherichia coli (E. coli), insect cells (using baculovirus expression vectors), fungal cells, e.g., yeast, or mammalian cells. Suitable host cells are discussed further in Goeddel, Gene Expression Technology: Methods In Enzymology 185 (Academic Press, San Diego, Calif., 1990).
  • expression of polypeptides in prokaryotes is most often carried out in E. coli with vectors containing constitutive or inducible promoters directing the expression of either fusion or non-fusion polypeptides.
  • Typical fusion expression vectors include pGEX (Pharmacia Biotech Inc; Smith and Johnson, Gene 67: 31-40 (1988)), pMAL (New England Biolabs, Beverly, Mass., USA) and pRIT5 (Pharmacia, Piscataway, N.J., USA) that fuse glutathione S-transferase (GST), maltose E binding polypeptide, or polypeptide A, respectively, to the target recombinant polypeptide.
  • Examples of suitable inducible non-fusion E. coli expression vectors include pTrc (Amann et al., Gene 69: 301-315 (1988)) and pET 11d (Studier et al., Gene Expression Technology: Methods In Enzymology (Academic Press, San Diego, Calif., 1990) pp. 60-89). Other strategies are described by Gottesman, Gene Expression Technology: Methods In Enzymology (Academic Press, San Diego, Calif., 1990) pp. 119-128 and by Wada et al., Nucl. Acids Res. 20: 2111-2118 (1992).
  • the polypeptide expression vector may be a yeast expression vector.
  • examples of vectors for expression in the yeast Saccharomyces cerevisiae include pYepSec1 (Baldari et al., EMBO J. 6: 229-234 (1987)), pMFa (Kurjan & Herskowitz, Cell 30: 933-943 (1982)), pJRY88 (Schultz et al., Gene 54: 113-123 (1987)), pYES2 (InVitrogen Corporation, San Diego, Calif., USA), and picZ (InVitrogen Corp, San Diego, Calif., USA).
  • mutant polypeptide can be expressed in insect cells using baculovirus expression vectors.
  • Baculovirus vectors available for expression of polypeptides in cultured insect cells include the pAc series (Smith et al., Mol. Cell Biol. 3: 2156-2165 (1983)) and the pVL series (Luckow & Summers, Virology 170: 31-39 (1989)).
  • the nucleic acid of the invention may be expressed in mammalian cells using a mammalian expression vector such as pCDM8 (Seed, Nature 329: 842-846 (1987)) or pMT2PC (Kaufman et al., EMBO J. 6: 187-195 (1987)).
  • a host cell that includes a compound of the invention can be used to produce (i.e., express) recombinant mutant polypeptide.
  • Purification of recombinant polypeptides is well-known in the art and includes ion exchange purification techniques, or affinity purification techniques, for example with an antibody to the compound.
  • Transgenic Animals. Recombinant organisms, i.e., transgenic animals, expressing a variant gene of the invention are prepared using standard procedures known in the art. Transgenic animals carrying the constructs of the invention can be made by several methods known to those having skill in the art. See, e.g., U.S. Pat. No. 5,610,053 and "The Introduction of Foreign Genes into Mice" and the cited references therein, in: Recombinant DNA, Eds. J.D. Watson, M. Gilman, J. Witkowski & M. Zoller (W.H. Freeman and Company, New York) pp. 254-272.
  • Transgenic animals stably expressing a human isogene and producing human protein can be used as biological models for studying diseases related to abnormal expression and/or activity, and for screening and assaying various candidate drugs, compounds, and treatment regimens to reduce the symptoms or effects of these diseases.
  • Characterizing Gene Expression Level. Methods to detect and measure mRNA levels (i.e., gene transcription level) and levels of polypeptide gene expression products (i.e., gene translation level) are well-known in the art and include the use of nucleotide microarrays and polypeptide detection methods involving mass spectrometers and/or antibody detection and quantification techniques. See also, Tom Strachan & Andrew Read, Human Molecular Genetics, 2nd Edition (John Wiley and Sons, Inc.
  • any RNA isolation technique that does not select against the isolation of mRNA can be utilized for the purification of RNA from cells. See, e.g., Ausubel et al., Ed., Curr. Prot. Mol. Biol.
  • the level of the mRNA expression product of the target gene is determined.
  • Methods to measure the level of a specific mRNA are well-known in the art and include Northern blot analysis, reverse transcription PCR, real-time quantitative PCR, and hybridization to an oligonucleotide array or microarray.
  • the determination of the level of expression may be performed by determination of the level of the protein or polypeptide expression product of the gene in body fluids or tissue samples including but not limited to blood or serum. Large numbers of tissue samples can readily be processed using techniques well-known to those of skill in the art, such as, e.g., the single-step RNA isolation process of U.S.
  • the isolated mRNA can be used in hybridization or amplification assays that include, but are not limited to, Southern or Northern analyses, PCR analyses and probe arrays.
  • One preferred diagnostic method for the detection of mRNA levels involves contacting the isolated mRNA with a nucleic acid molecule (probe) that can hybridize to the mRNA encoded by the gene being detected.
  • the nucleic acid probe can be, e.g., a full-length cDNA, or a portion thereof, such as an oligonucleotide of at least 7, 15, 30, 50, 100, 250 or 500 nucleotides in length and sufficient to specifically hybridize under stringent conditions to an mRNA or genomic DNA encoding a marker of the present invention.
  • probes for use in the diagnostic assays of the invention are described herein. Hybridization of an mRNA with the probe indicates that the marker in question is being expressed.
  • the probes are immobilized on a solid surface and the mRNA is contacted with the probes, for example, in an Affymetrix gene chip array (Affymetrix, Calif. USA).
  • a skilled artisan can readily adapt known mRNA detection methods for use in detecting the level of mRNA encoded by the markers of the present invention.
  • An alternative method for determining the level of mRNA corresponding to a marker of the present invention in a sample involves the process of nucleic acid amplification, e.g., by RT-PCR (the experimental embodiment set forth in U.S. Pat. No. 4,683,202); ligase chain reaction (Barany et al., Proc. Natl. Acad. Sci. USA 88:189-193 (1991)); self-sustained sequence replication (Guatelli et al., Proc. Natl. Acad. Sci. USA 87: 1874-1878 (1990)); transcriptional amplification system (Kwoh et al., Proc. Natl. Acad. Sci.
  • amplification primers are defined as being a pair of nucleic acid molecules that can anneal to 5' or 3' regions of a gene (plus and minus strands, respectively, or vice versa) and contain a short region in between.
  • amplification primers are from about 10-30 nucleotides in length and flank a region from about 50-200 nucleotides in length.
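The primer definition above can be turned into a simple check: each primer should be roughly 10 to 30 nucleotides long, and the pair should flank a region of roughly 50 to 200 nucleotides on the template. In this sketch the approximate ranges are treated as hard limits, the template is the plus strand, the reverse primer is given 5' to 3' on the minus strand, and only the first occurrence of each site is considered. The function names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an upper-case DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def check_primer_pair(template, forward, reverse,
                      primer_len=(10, 30), flanked_len=(50, 200)):
    """True if both primers fit the length range and flank a region in range."""
    for primer in (forward, reverse):
        if not primer_len[0] <= len(primer) <= primer_len[1]:
            return False
    start = template.find(forward)
    rev_site = template.find(reverse_complement(reverse))
    if start == -1 or rev_site == -1:
        return False                      # a primer has no binding site
    flanked = rev_site - (start + len(forward))
    return flanked_len[0] <= flanked <= flanked_len[1]
```

The flanked length is measured between the two primer binding sites, so the full amplicon is longer by the combined length of the primers.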
  • RT-PCR: real-time quantitative PCR
  • the RT-PCR assay utilizes an RNA reverse transcriptase to catalyze the synthesis of a DNA strand from an RNA strand, including an mRNA strand.
  • the resultant DNA may be specifically detected and quantified and this process may be used to determine the levels of specific species of mRNA.
  • TAQMAN® (PE Applied Biosystems, Foster City, Calif., USA)
  • the TAQMAN® assay exploits the 5' nuclease activity of AMPLITAQ GOLD™ DNA polymerase to cleave a specific form of probe during a PCR reaction.
  • TAQMAN™ probe. See Luthra et al., Am. J. Pathol. 153: 63-68 (1998); Kuimelis et al., Nucl. Acids Symp. Ser. 37: 255-256 (1997); and Mullah et al., Nucl. Acids Res. 26(4): 1026-1031 (1998).
  • cleavage of the probe separates a reporter dye and a quencher dye, resulting in increased fluorescence of the reporter.
  • the accumulation of PCR products is detected directly by monitoring the increase in fluorescence of the reporter dye. Heid et al., Genome Res. 6(6): 986-994 (1996). The higher the starting copy number of nucleic acid target, the sooner a significant increase in fluorescence is observed. See Gibson, Heid & Williams et al., Genome Res. 6: 995-1001 (1996).
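The relationship between starting copy number and the cycle at which fluorescence rises can be illustrated with an idealized exponential model, in which the product after c cycles is N0 × (1 + E)^c for a per-cycle efficiency E. Solving for the cycle at which a fixed detection threshold is reached gives the function below. The model, the function name `threshold_cycle`, and the parameter names are assumptions of this sketch and are not taken from the source.

```python
import math


def threshold_cycle(start_copies, threshold_copies, efficiency=1.0):
    """Cycle at which an idealized exponential PCR reaches a threshold.

    efficiency=1.0 means perfect doubling in every cycle (an idealization).
    """
    if start_copies <= 0 or threshold_copies <= 0 or efficiency <= 0:
        raise ValueError("all arguments must be positive")
    return math.log(threshold_copies / start_copies) / math.log(1 + efficiency)
```

Under this model a tenfold higher starting copy number reaches the threshold about 3.3 cycles earlier at perfect efficiency, consistent with the statement that more starting template gives an earlier increase in fluorescence.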
  • cDNA pools such as by sequencing sufficient bases, e.g., 20-50 bases, in each of multiple cDNAs to identify each cDNA, or by sequencing short tags, e.g., 9-10 bases, which are generated at known positions relative to a defined mRNA end pathway pattern. See, e.g., Velculescu, Science 270: 484-487 (1995).
  • the cDNA levels in the samples are quantified and the mean and standard deviation of each cDNA are determined by standard statistical means well-known to those of skill in the art. Norman T.J. Bailey, Statistical Methods In Biology, 3rd Edition (Cambridge University Press, 1995).
  • Detection of Polypeptides. Polypeptides can be detected by a probe which is detectably labelled, or which can be subsequently labelled.
  • the term "labelled", with regard to the probe or antibody is intended to encompass direct-labelling of the probe or antibody by coupling, i.e., physically linking, a detectable substance to the probe or antibody, as well as indirect-labelling of the probe or antibody by reactivity with another reagent that is directly-labelled. Examples of indirect labelling include detection of a primary antibody using a fluorescently-labelled secondary antibody and end-labelling of a DNA probe with biotin such that it can be detected with fluorescently-labelled streptavidin.
  • the probe is an antibody that recognizes the expressed protein.
  • a variety of formats can be employed to determine whether a sample contains a target protein that binds to a given antibody.
  • Immunoassay methods useful in the detection of target polypeptides of the present invention include, but are not limited to, e.g., dot blotting, western blotting, protein chips, competitive and noncompetitive protein binding assays, enzyme-linked immunosorbent assays (ELISA), immunohistochemistry, fluorescence activated cell sorting (FACS), and others commonly used and widely-described in scientific and patent literature, and many employed commercially.
  • a skilled artisan can readily adapt known protein/antibody detection methods for use in determining whether cells express a marker of the present invention and the relative concentration of that specific polypeptide expression product in blood or other body tissues.
  • Proteins from individuals can be isolated using techniques that are well-known to those of skill in the art. The protein isolation methods employed can be, e.g., those described in Harlow & Lane, Antibodies: A Laboratory Manual (Cold Spring Harbor Laboratory Press, Cold Spring Harbor, NY, 1988).
  • various host animals may be immunized by injection with the polypeptide, or a portion thereof.
  • host animals may include, but are not limited to, rabbits, mice and rats.
  • adjuvants may be used to increase the immunological response, depending on the host species, including, but not limited to, Freund's (complete and incomplete); mineral gels, such as aluminium hydroxide; surface active substances, such as lysolecithin, pluronic polyols, polyanions, peptides, oil emulsions, keyhole limpet haemocyanin and dinitrophenol; and potentially useful human adjuvants, such as bacille Calmette-Guérin (BCG) and Corynebacterium parvum.
  • Monoclonal antibodies, which are homogeneous populations of antibodies to a particular antigen, may be obtained by any technique that provides for the production of antibody molecules by continuous cell lines in culture. These include, but are not limited to, the hybridoma technique of Kohler & Milstein, Nature 256: 495-497 (1975); and U.S. Pat. No. 4,376,110; the human B-cell hybridoma technique of Kosbor et al., Immunol. Today 4: 72 (1983); Cole et al., Proc. Natl. Acad. Sci. USA 80: 2026-2030 (1983); and the EBV-hybridoma technique of Cole et al., Monoclonal Antibodies and Cancer Therapy (Alan R.
  • chimaeric antibodies have portions derived from different animal species, such as those having a variable or hypervariable region derived from a murine mAb and a human immunoglobulin constant region.
  • Antibodies or antibody fragments can be used in methods, such as Western blots or immunofluorescence techniques, to detect the expressed proteins. In such uses, it is generally preferable to immobilize either the antibody or proteins on a solid support.
  • Suitable solid phase supports or carriers include any support capable of binding an antigen or an antibody.
  • Well-known supports or carriers include glass, polystyrene, polypropylene, polyethylene, dextran, nylon, amylases, natural and modified celluloses, polyacrylamides, gabbros and magnetite.
  • a useful method, for ease of detection, is the sandwich ELISA, of which a number of variations exist, all of which are intended to be used in the methods and assays of the present invention.
  • sandwich assay is intended to encompass all variations on the basic two-site technique. Immunofluorescence and EIA techniques are both very well-established in the art. However, other reporter molecules, such as radioisotopes, chemiluminescent or bioluminescent molecules, may also be employed. It will be readily apparent to the skilled artisan how to vary the procedure to suit the required use.
  • Whole genome monitoring of protein, i.e., the "proteome", can be carried out by constructing a microarray in which binding sites comprise immobilized, preferably monoclonal, antibodies specific to a plurality of protein species encoded by the cell genome.
  • antibodies are present for a substantial fraction of the encoded proteins, or at least for those proteins relevant to testing or confirming a biological network model of interest.
  • methods for making monoclonal antibodies are well-known. See, e.g., Harlow & Lane, Antibodies: A Laboratory Manual (Cold Spring Harbor Laboratory Press, Cold Spring Harbor, New York, 1988).
  • monoclonal antibodies are raised against synthetic peptide fragments designed based on genomic sequence of the cell. With such an antibody array, proteins from the cell are contacted to the array and their binding is measured with assays known in the art.
  • Two-Dimensional Gel Electrophoresis. Two-dimensional gel electrophoresis is well-known in the art and typically involves isoelectric focusing along a first dimension followed by SDS-PAGE electrophoresis along a second dimension. See, e.g., Hames et al., Gel Electrophoresis of Proteins: A Practical Approach (IRL Press, New York, 1990); Shevchenko et al., Proc. Natl. Acad. Sci. USA 93: 14440-14445 (1996); Sagliocco et al., Yeast 12: 1519-1533 (1996); and Lander, Science 274: 536-539 (1996).
  • MS-based analysis methodology is useful for analysis of isolated target polypeptide as well as analysis of target polypeptide in a biological sample.
  • MS formats for use in analyzing a target polypeptide include ionization (I) techniques, such as, but not limited to, matrix assisted laser desorption (MALDI), continuous or pulsed electrospray ionization (ESI) and related methods, such as ionspray or thermospray, and massive cluster impact (MCI).
  • Such ion sources can be matched with detection formats, including linear or non-linear reflectron time of flight (TOF), single or multiple quadrupole, single or multiple magnetic sector, Fourier transform ion cyclotron resonance (FTICR), ion trap and combinations thereof such as ion-trap/TOF.
  • numerous matrix/wavelength combinations e.g., matrix assisted laser desorption (MALDI)
  • solvent combinations e.g., ESI
  • the target polypeptide can be solubilised in an appropriate solution or reagent system.
  • a solution or reagent system e.g., an organic or inorganic solvent
  • MS of peptides also is described, e.g., in International PCT Application No. WO 93/24834 and U.S.
  • a solvent is selected that minimizes the risk that the target polypeptide will be decomposed by the energy introduced for the vaporization process.
  • a reduced risk of target polypeptide decomposition can be achieved, e.g., by embedding the sample in a matrix.
  • a suitable matrix can be an organic compound such as a sugar, e.g., a pentose or hexose, or a polysaccharide such as cellulose. Such compounds are decomposed thermolytically into CO2 and H2O such that no residues are formed that can lead to chemical reactions.
  • the matrix also can be an inorganic compound, such as nitrate of ammonium, which is decomposed essentially without leaving any residue.
  • Electrospray MS has been described by Fenn et al, J. Phys. Chem. 88: 4451-4459 (1984); and PCT Application No. WO 90/14148; and current applications are summarized in review articles. See Smith et al, Anal. Chem. 62: 882-89 (1990); and Ardrey, Spectroscopy 4: 10-18 (1992).
  • the mass of a target polypeptide determined by MS can be compared to the mass of a corresponding known polypeptide.
  • the corresponding known polypeptide can be the corresponding non-mutant protein, e.g., wild-type protein.
  • With ESI, the determination of molecular weights in femtomole amounts of sample is very accurate due to the presence of multiple ion peaks, all of which can be used for mass calculation.
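The use of multiple ion peaks for mass calculation can be illustrated with a minimal sketch. It assumes positive-ion mode, protonation as the only charge carrier, and a series of peaks from consecutive charge states; these assumptions, the function names and the example mass are illustrative and are not taken from the source.

```python
PROTON_MASS = 1.007276  # Da; charge carrier assumed to be a proton


def charge_from_adjacent_peaks(mz_low_charge, mz_high_charge):
    """Charge z of the higher-m/z peak, given the adjacent peak at charge z + 1.

    With m/z = M/z + p, two adjacent peaks m1 (charge z) and m2 (charge z + 1)
    satisfy z = (m2 - p) / (m1 - m2).
    """
    return round((mz_high_charge - PROTON_MASS) / (mz_low_charge - mz_high_charge))


def neutral_mass(mz, z):
    """Neutral mass M implied by one peak at m/z with charge z."""
    return z * (mz - PROTON_MASS)


def deconvolute(mz_peaks):
    """Average the neutral mass over a series of consecutive charge states."""
    peaks = sorted(mz_peaks, reverse=True)  # highest m/z has the lowest charge
    z0 = charge_from_adjacent_peaks(peaks[0], peaks[1])
    masses = [neutral_mass(mz, z0 + i) for i, mz in enumerate(peaks)]
    return sum(masses) / len(masses)


# Synthetic peak series for a hypothetical 16951.5 Da polypeptide, charges 10 to 14
example_peaks = [16951.5 / z + PROTON_MASS for z in range(10, 15)]
print(deconvolute(example_peaks))
```

Averaging over all charge states is one simple way to use every peak; real spectra would also need peak picking and noise handling, which this sketch leaves out.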
  • Sub-attomole levels of protein have been detected, e.g., using ESI MS (Valaskovic et al, Science 273: 1199-1202 (1996)) and MALDI MS (Li et al, J. Am. Chem. Soc. 118: 1662-1663 (1996)).
  • MALDI Matrix Assisted Laser Desorption
  • the level of the target protein in a biological sample may be measured by means of mass spectrometric (MS) methods including, but not limited to, those techniques known in the art as matrix-assisted laser desorption/ionization, time-of-flight mass spectrometry (MALDI-TOF-MS) and surfaces enhanced for laser desorption/ionization, time-of-flight mass spectrometry (SELDI-TOF-MS) as further detailed below.
  • Methods for performing MALDI are well-known to those of skill in the art. See, e.g., Juhasz et al, Analysis, Anal. Chem.
  • MALDI-TOF-MS has been described by Hillenkamp et al., Biological Mass Spectrometry, Burlingame & McCloskey, eds. (Elsevier Science Publ., Amsterdam, 1990) pp. 49-60.
  • A variety of techniques for marker detection using mass spectroscopy can be used. See Bordeaux Mass Spectrometry Conference Report, Hillenkamp, Ed., pp.
  • MS techniques allow the successful volatilization of high molecular weight biopolymers, without fragmentation, and have enabled a wide variety of biological macromolecules to be analyzed by mass spectrometry.
  • SELDI Surfaces Enhanced for Laser Desorption/Ionization
  • Other techniques are used which employ new MS probe element compositions with surfaces that allow the probe element to actively participate in the capture and docking of specific analytes, described as Affinity Mass Spectrometry (AMS). See SELDI patents U.S. Pat. Nos. 5,719,060; 5,894,063; 6,020,208; 6,027,942; 6,124,137; and U.S. Patent application No. U.S. 2003/0003465.
  • SEAC probe elements have been designed with Surfaces Enhanced for Affinity Capture (SEAC). See Hutchens & Yip, Rapid Commun. Mass Spectrom. 7: 576-580 (1993).
  • SEAC probe elements have been used successfully to retrieve and tether different classes of biopolymers, particularly proteins, by exploiting what is known about protein surface structures and biospecific molecular recognition.
  • the immobilized affinity capture devices on the MS probe element surface, i.e., SEAC, determine the location and affinity (specificity) of the analyte for the probe surface; therefore, the subsequent analytical MS process is efficient.
  • SEND Surfaces Enhanced for Neat Desorption
  • the probe element surfaces i.e., sample presenting means
  • EAM Energy Absorbing Molecules
  • affinity capture devices to facilitate either the specific or non-specific attachment or adsorption (so-called docking or tethering) of analytes to the probe surface, by a variety of mechanisms (mostly non-covalent).
  • SEPAR Photolabile Attachment and Release
  • the analyte e.g., protein
  • the chemical specificities determining the type and number of the photolabile molecule attachment points between the SEPAR sample presenting means (i.e., probe element surface) and the analyte may involve any one or more of a number of different residues or chemical structures in the analyte (e.g., His, Lys, Arg, Tyr, Phe and Cys residues in the case of proteins and peptides).
  • a polypeptide of interest also can be modified to facilitate conjugation to a solid support.
  • a chemical or physical moiety can be incorporated into the polypeptide at an appropriate position.
  • a polypeptide of interest can be modified by adding an appropriate functional group to the carboxyl terminus or amino terminus of the polypeptide, or to an amino acid in the peptide (e.g., to a reactive side chain, or to the peptide backbone).
  • a naturally-occurring amino acid normally present in the polypeptide also can contain a functional group suitable for conjugating the polypeptide to the solid support.
  • a cysteine residue present in the polypeptide can be used to conjugate the polypeptide to a support containing a sulfhydryl group through a disulfide linkage, e.g., a support having cysteine residues attached thereto.
  • bonds that can be formed between two amino acids include, but are not limited to, e.g., monosulfide bonds between two lanthionine residues, which are non-naturally-occurring amino acids that can be incorporated into a polypeptide; a lactam bond formed by a transamidation reaction between the side chains of an acidic amino acid and a basic amino acid, such as between the γ-carboxyl group of Glu (or alpha carboxyl group of Asp) and the amino group of Lys; or a lactone bond produced, e.g., by a crosslink between the hydroxy group of Ser and the carboxyl group of Glu (or alpha carboxyl group of Asp).
  • a solid support can be modified to contain a desired amino acid residue, e.g., a Glu residue, and a polypeptide having a Ser residue, particularly a Ser residue at the N-terminus or C-terminus, can be conjugated to the solid support through the formation of a lactone bond.
  • the support need not be modified to contain the particular amino acid, e.g., Glu, where it is desired to form a lactone-like bond with a Ser in the polypeptide, but can be modified, instead, to contain an accessible carboxyl group, thus providing a function corresponding to the alpha carboxyl group of Glu.
  • a thiol-reactive functionality is particularly useful for conjugating a polypeptide to a solid support.
  • a thiol-reactive functionality is a chemical group that can rapidly react with a nucleophilic thiol moiety to produce a covalent bond, e.g., a disulfide bond or a thioether bond.
  • thiol-reactive functionalities include, e.g., haloacetyls, such as iodoacetyl; diazoketones; epoxy ketones; alpha- and beta-unsaturated carbonyls, such as alpha-enones and beta-enones; and other reactive Michael acceptors, such as maleimide; acid halides; benzyl halides; and the like. See Greene & Wuts, Protective Groups in Organic Synthesis, 2nd Edition (John Wiley & Sons, 1991).
  • the thiol groups can be blocked with a photocleavable protecting group, which then can be selectively cleaved, e.g., by photolithography, to provide portions of a surface activated for immobilization of a polypeptide of interest.
  • Photocleavable protecting groups are known in the art (see, e.g., published International PCT Application No. WO 92/10092; and McCray et al., Ann. Rev. Biophys. Biophys. Chem. 18: 239-270 (1989)) and can be selectively de-blocked by irradiation of selected areas of the surface using, e.g., a photolithography mask.
  • Linkers. A polypeptide of interest can be attached directly to a support via a linker. Any linkers known to those of skill in the art to be suitable for linking peptides or amino acids to supports, either directly or via a spacer, may be used. For example, the polypeptide can be conjugated to a support, such as a bead, through means of a variable spacer.
  • Linkers include Rink amide linkers (see, e.g., Rink, Tetrahedron Lett. 28: 3787 (1976)); trityl chloride linkers (see, e.g., Leznoff, Acc. Chem. Res.
  • linkers see, e.g., Bodansky et al., Peptide Synthesis, 2nd Edition (Academic Press, New York, 1976)
  • trityl linkers are known. See, e.g., U.S. Pat. Nos. 5,410,068 and 5,612,474.
  • Amino trityl linkers are also known. See, e.g., U.S. Pat. No. 5,198,531.
  • Other linkers include those that can be incorporated into fusion proteins and expressed in a host cell. Such linkers may be selected amino acids, enzyme substrates or any suitable peptide.
  • the linker may be made, e.g., by appropriate selection of primers when isolating the nucleic acid. Alternatively, they may be added by post-translational modification of the protein of interest.
  • Linkers that are suitable for chemically linking peptides to supports include disulfide bonds, thioether bonds, hindered disulfide bonds and covalent bonds between free reactive groups, such as amine and thiol groups.
  • a linker can provide a reversible linkage such that it is cleaved under the select conditions.
  • selectively cleavable linkers, including photocleavable linkers (see U.S. Pat. No. 5,643,722), acid cleavable linkers (see Fattom et al., Infect. Immun. 60: 584-589 (1992)), acid-labile linkers (see Welhöner et al., J. Biol. Chem. 266: 4309-4314 (1991)) and heat sensitive linkers are useful.
  • a linkage can be, e.g., a disulfide bond, which is chemically cleavable by mercaptoethanol or dithioerythritol; a biotin/streptavidin linkage, which can be photocleavable; a heterobifunctional derivative of a trityl ether group, which can be cleaved by exposure to acidic conditions or under conditions of MS (see Köster et al., Tetrahedron Lett.
  • a levulinyl-mediated linkage, which can be cleaved under almost neutral conditions with a hydrazinium/acetate buffer; an arginine-arginine or a lysine-lysine bond, either of which can be cleaved by an endopeptidase, such as trypsin; a pyrophosphate bond, which can be cleaved by a pyrophosphatase; or a ribonucleotide bond, which can be cleaved using a ribonuclease or by exposure to alkaline conditions.
  • a photolabile cross-linker such as 3-amino-(2-nitrophenyl)propionic acid can be employed as a means for cleaving a polypeptide from a solid support.
  • Other linkers include RNA linkers that are cleavable by ribozymes and other RNA enzymes, and linkers, such as the various domains, such as CH1, CH2 and CH3, from the constant region of human IgG1.
  • a linker that is cleavable under MS conditions, such as a silyl linkage or photocleavable linkage, can be combined with a linker, such as an avidin biotin linkage, that is not cleaved under these conditions, but may be cleaved under other conditions.
  • Acid-labile linkers are particularly useful chemically cleavable linkers for mass spectrometry, especially for MALDI-TOF, because the acid labile bond is cleaved during conditioning of the target polypeptide upon addition of a 3-HPA matrix solution.
  • the acid labile bond can be introduced as a separate linker group, e.g., an acid labile trityl group, or can be incorporated in a synthetic linker by introducing one or more silyl bridges using diisopropylsilyl, thereby forming a diisopropylsilyl linkage between the polypeptide and the solid support.
  • the diisopropylsilyl linkage can be cleaved using mildly acidic conditions, such as 1.5% trifluoroacetic acid (TFA) or 3-HPA/1% TFA MALDI-TOF matrix solution.
  • Methods for the preparation of diisopropylsilyl linkages and analogues thereof are well-known in the art. See, e.g., Saha et al., J. Org. Chem. 58: 7827-7831 (1993).
  • Pin tools include those disclosed herein or otherwise known in the art. See, e.g., U.S. Application Serial Nos. 08/786,988 and 08/787,639; and International PCT Application No. WO 98/20166.
  • a pin tool in an array, e.g., a 4 x 4 array, can be applied to wells containing polypeptides of interest.
  • the pin tool has a functional group attached to each pin tip, or a solid support, e.g., functionalized beads or paramagnetic beads are attached to each pin
  • the polypeptides in a well can be captured (1 pmol capacity).
  • the pins can be kept in motion (vertical, 1-2 mm travel) to increase the efficiency of the capture.
  • a reaction such as an in vitro transcription is being performed in the wells
  • movement of the pins can increase efficiency of the reaction.
  • Further immobilization can result by applying an electrical field to the pin tool.
  • the polypeptides are attracted to the anode or the cathode, depending on their net charge.
  • the pin tool (with or without voltage) can be modified to have conjugated thereto a reagent specific for the polypeptide of interest, such that only the polypeptides of interest are bound by the pins.
  • the pins can have nickel ions attached, such that only polypeptides containing a polyhistidine sequence are bound.
  • the pins can have antibodies specific for a target polypeptide attached thereto, or to beads that, in turn, are attached to the pins, such that only the target polypeptides, which contain the epitope recognized by the antibody, are bound by the pins.
  • Captured polypeptides can be analyzed by a variety of means including, e.g., spectrometric techniques such as UV/VIS, IR, fluorescence, chemiluminescence, NMR spectroscopy, MS or other methods known in the art, or combinations thereof. If conditions preclude direct analysis of captured polypeptides, the polypeptides can be released or transferred from the pins, under conditions such that the advantages of sample concentration are not lost. Accordingly, the polypeptides can be removed from the pins using a minimal volume of eluent, and without any loss of sample. Where the polypeptides are bound to the beads attached to the pins, the beads containing the polypeptides can be removed from the pins and measurements made directly from the beads.
  • Pin tools can be useful for immobilizing polypeptides of interest in a spatially addressable manner on an array.
  • Such spatially addressable or pre-addressable arrays are useful in a variety of processes, including, for example, quality control and amino acid sequencing diagnostics.
  • the pin tools described in the U.S. Application Nos. 08/786,988 and 08/787,639 and International PCT Application No. WO 98/20166 are serial and parallel dispensing tools that can be employed to generate multi-element arrays of polypeptides on a surface of the solid support.
  • the array surface can be flat, with beads or geometrically altered to include wells, which can contain beads.
  • MS geometries can be adapted for accommodating a pin tool apparatus.
  • aspects of the biological activity state, or mixed aspects can be measured in order to obtain drug and pathway responses.
  • the activities of proteins relevant to the characterization of cell function can be measured, and embodiments of this invention can be based on such measurements.
  • Activity measurements can be performed by any functional, biochemical or physical means appropriate to the particular activity being characterized. Where the activity involves a chemical transformation, the cellular protein can be contacted with natural substrates, and the rate of transformation measured. Where the activity involves association in multimeric units, e.g., association of an activated DNA binding complex with DNA, the amount of associated protein or secondary consequences of the association, such as amounts of mRNA transcribed, can be measured.
  • response data may be formed of mixed aspects of the biological state of a cell.
  • Response data can be constructed from, e.g., changes in certain mRNA abundances, changes in certain protein abundances and changes in certain protein activities.
  • EXAMPLE. The purpose of this EXAMPLE is to define type 2 diabetes more accurately by providing biomarkers for the identification of the disease. Such biomarkers will help to identify drug targets for better intervention and to treat patients more effectively based on individualized medicine.
  • T2DM polymorphism association analysis. The analysis consisted of over 1000 type 2 diabetes mellitus (T2DM) participants and over 1000 normal controls. Searches were conducted using USA collections only. Participants were limited to Caucasians only, excluding participants from Central and South America.
  • HBAlC body mass index
  • the ages of the control participants were > 30 years.
  • the body mass index (BMI) of the control participants was > 25 and < 40.
  • OMIM, the SNP Consortium, LocusLink and dbSNP.
  • Genotyping of all SNPs was performed by single base extension followed by Mass Spectroscopy using Sequenom's MassARRAY™ Technology. Ross et al., Nat. Biotechnol. 16: 1347-1351 (1998). Ascertainment of genotypes on this system is based on matrix-assisted laser desorption/ionization time-of-flight (MALDI-TOF) analysis of homogeneous Mass Extension (hME) reaction products. During the hME reaction, the primer is extended by a specific number of nucleotides dependent on the SNP allele, and the few bases immediately 3' of the SNP.
  • the extension is terminated by incorporating any one of 3 dideoxynucleotides (ddNTPs) matching a given allele, or continued using a single deoxynucleotide (dNTP), which matches the alternate allele.
  • the hME reaction thus produces allele specific extension products of different masses (Daltons). Based on the mass differences of these hME products, a number of different assays can be run simultaneously (multiplexing), which provides cost-effective and high-throughput genotyping. Each mass-specific peak is called a specific allele based on the known sequence of the two extension products from each SNP in the multiplex. The entire genotyping process is supported by automation, with bar-coded tracking through the process.
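The idea of calling an allele from a mass-specific peak can be sketched as a nearest-expected-mass lookup. This is a simplified illustration, not the vendor's calling algorithm: the expected masses, the tolerance and the function name are hypothetical values chosen for the example.

```python
def call_genotype(peak_masses, expected_masses, tolerance=2.0):
    """Call a genotype from observed peak masses (Daltons).

    expected_masses maps each allele to the expected mass of its extension
    product (hypothetical values). A peak is assigned to the closest expected
    mass if it lies within `tolerance` Daltons; other peaks are ignored.
    Returns e.g. "GG", "AG", or None when no allele peak is found.
    """
    alleles = set()
    for mass in peak_masses:
        allele, expected = min(expected_masses.items(),
                               key=lambda item: abs(item[1] - mass))
        if abs(expected - mass) <= tolerance:
            alleles.add(allele)
    if not alleles:
        return None
    if len(alleles) == 1:
        allele = next(iter(alleles))
        return allele + allele  # homozygous call
    return "".join(sorted(alleles))  # heterozygous call


# Hypothetical expected product masses for one SNP in a multiplex
expected = {"G": 5500.0, "A": 5829.2}
print(call_genotype([5500.4], expected))
print(call_genotype([5500.4, 5829.0], expected))
```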
  • Genotype data is approved and transferred from the platform system to a database for final statistical analysis. Genotype data is considered passing for a given assay if the pass rate of conservative and moderate calls combined is greater than or equal to 95%, and the nominal HWE p-value is 0.01 or higher.
  • A list of all SNPs assayed is described in TABLE 2.
  • TCF2   Hepatocyte nuclear factor 1β   rs11651755   G   A   intron
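The assay pass criteria stated above (combined pass rate of at least 95% and a nominal HWE p-value of 0.01 or higher) can be sketched as follows. The source does not say which HWE test was used; the chi-square goodness-of-fit test with 1 degree of freedom used here is an assumption, as are the function names and the example counts.

```python
import math


def hwe_p_value(n_aa, n_ab, n_bb):
    """Chi-square (1 df) p-value for Hardy-Weinberg equilibrium (assumed test)."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1.0 - p
    observed = [n_aa, n_ab, n_bb]
    expected = [n * p * p, 2 * n * p * q, n * q * q]
    if min(expected) == 0:
        return 1.0  # monomorphic marker: no deviation can be tested
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Survival function of the chi-square distribution with 1 df
    return math.erfc(math.sqrt(chi2 / 2.0))


def assay_passes(n_called, n_attempted, genotype_counts,
                 min_pass_rate=0.95, min_hwe_p=0.01):
    """Apply both pass criteria; n_called counts conservative + moderate calls."""
    pass_rate = n_called / n_attempted
    return pass_rate >= min_pass_rate and hwe_p_value(*genotype_counts) >= min_hwe_p


# Hypothetical assays
print(assay_passes(980, 1000, (250, 500, 250)))
print(assay_passes(980, 1000, (400, 200, 400)))
```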
  • the counts of genotypes in the two affectation groups were compared to the expectation for no association of disease with genetics using a χ² test with 2 df, or the Fisher exact test for contingency tables with at least one sparse margin.
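A minimal sketch of the co-dominant (three-genotype) comparison follows: a Pearson χ² statistic on the 2 x 3 table of genotype counts by affectation status, with the 2 df p-value computed in closed form. The counts are hypothetical, and the Fisher exact alternative for sparse tables is not shown.

```python
import math


def chi_square_statistic(table):
    """Pearson chi-square statistic for a contingency table (list of rows)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    return stat


def codominant_test(case_counts, control_counts):
    """Return (statistic, p-value) for genotype counts (AA, AB, BB) per group."""
    stat = chi_square_statistic([list(case_counts), list(control_counts)])
    # For 2 degrees of freedom the chi-square survival function is exp(-x / 2)
    return stat, math.exp(-stat / 2.0)


# Hypothetical genotype counts
print(codominant_test((60, 30, 10), (30, 40, 30)))
```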
  • the count of heterozygous genotypes was combined with the count of homozygous genotypes for the major or minor alleles, respectively, in constructing contingency tables for the observed data. Again, either a χ² test with 1 df, or a Fisher exact test, was used to assess the deviation from the null hypothesis of no association with affectation status given the specific model of inheritance.
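The collapsing step described above can be sketched as follows: heterozygote counts are merged with one homozygote class, and the resulting 2 x 2 table is tested with a χ² statistic on 1 df. No continuity correction is applied, which is an assumption; the counts and function names are hypothetical.

```python
import math


def collapse(counts, combine_het_with):
    """Collapse (major_hom, het, minor_hom) counts into two classes."""
    major_hom, het, minor_hom = counts
    if combine_het_with == "major":
        return (major_hom + het, minor_hom)
    if combine_het_with == "minor":
        return (major_hom, het + minor_hom)
    raise ValueError("combine_het_with must be 'major' or 'minor'")


def chi_square_2x2(case_pair, control_pair):
    """Return (statistic, p-value) for a 2 x 2 table, 1 df, no correction.

    Assumes no row or column total is zero.
    """
    a, b = case_pair
    c, d = control_pair
    n = a + b + c + d
    stat = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # Survival function of the chi-square distribution with 1 df
    return stat, math.erfc(math.sqrt(stat / 2.0))


# Hypothetical genotype counts (major_hom, het, minor_hom)
cases, controls = (60, 30, 10), (30, 40, 30)
print(chi_square_2x2(collapse(cases, "minor"), collapse(controls, "minor")))
```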
  • Each group of SNPs (i.e., all SNPs in a single gene or all singleton SNPs together) was viewed as a distinct set of multiple hypotheses.
  • the statistic of the best marker was compared with the maximum value distribution derived from the re-sampled data sets to estimate a corrected p-value.
  • corrected p-values were derived from comparison to the estimated maximum value distribution from the re-sampled data of subsets of markers from the original group, containing only those markers that had uncorrected p-values greater than or equal to the test marker p-value.
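The comparison of the best marker against a maximum-value distribution from re-sampled data can be sketched as a label-permutation test. The source does not specify the re-sampling scheme or the test statistic, so the shuffling of case/control labels, the allelic χ² statistic, and all names and data below are assumptions. Only the correction for the best marker is shown, not the subset-based correction for the remaining markers.

```python
import random


def allelic_chi_square(genotypes, status):
    """2 x 2 allelic chi-square; genotypes are minor-allele counts (0, 1, 2)."""
    case_minor = sum(g for g, s in zip(genotypes, status) if s == 1)
    ctrl_minor = sum(g for g, s in zip(genotypes, status) if s == 0)
    n_case_alleles = 2 * sum(status)
    n_ctrl_alleles = 2 * (len(status) - sum(status))
    a, b = case_minor, n_case_alleles - case_minor
    c, d = ctrl_minor, n_ctrl_alleles - ctrl_minor
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0  # uninformative marker
    return (a + b + c + d) * (a * d - b * c) ** 2 / denom


def max_stat_corrected_p(markers, status, n_perm=1000, seed=0):
    """Corrected p-value for the best marker in a group of markers."""
    rng = random.Random(seed)
    observed_best = max(allelic_chi_square(g, status) for g in markers)
    shuffled = list(status)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break any genotype/status association
        perm_max = max(allelic_chi_square(g, shuffled) for g in markers)
        if perm_max >= observed_best:
            exceed += 1
    # Add-one estimate keeps the p-value strictly positive
    return (exceed + 1) / (n_perm + 1)


# Hypothetical data: 20 cases, 20 controls, two markers
status = [1] * 20 + [0] * 20
markers = [[2] * 20 + [0] * 20, [0, 1] * 20]
print(max_stat_corrected_p(markers, status, n_perm=200))
```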
  • the R statistical package (Becker R et al., The new S language: A programming environment for data analysis and graphics (Wadsworth & Brooks/Cole Advanced Books, Pacific Grove, 1988) p. 702) was used in this analysis, as well as Haploview (Barrett JC et al., "Haploview: analysis and visualization of LD and haplotype maps", Bioinformatics (2004)) for analysis of the haplotype block structure.
  • Haplotype Analysis. All haplotype association analyses were performed using the haplo.score program from the haplo.stats package in R, using the recommended parameters for a binary trait. See Becker R, Chambers J & Wilks A, The new S language: a programming environment for data analysis and graphics (Wadsworth & Brooks/Cole Advanced Books, Pacific Grove, 1988) p. 702; Schaid DJ et al., Am. J. Hum. Genet. 70(2): 425-34 (2002).
  • Haplo.score infers haplotypes from genotype data using the EM algorithm and tests association with a trait both for the individual haplotypes at a locus and for the entire set of haplotypes.
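To illustrate how an EM algorithm can infer haplotype frequencies from unphased genotypes, here is a minimal sketch for two biallelic SNPs. It is not the haplo.score implementation: it handles only two loci, requires complete genotypes, and every name and data value is hypothetical.

```python
from itertools import combinations_with_replacement

# Haplotypes over two SNPs; each entry is the allele (0 or 1) at that SNP
HAPLOTYPES = [(0, 0), (0, 1), (1, 0), (1, 1)]


def compatible_pairs(genotype):
    """Unordered haplotype pairs whose allele sums match the genotype."""
    return [(h1, h2)
            for h1, h2 in combinations_with_replacement(HAPLOTYPES, 2)
            if all(h1[k] + h2[k] == genotype[k] for k in range(2))]


def em_haplotype_frequencies(genotypes, n_iter=100, tol=1e-10):
    """Estimate haplotype frequencies; genotypes are (count1, count2) tuples."""
    freqs = {h: 0.25 for h in HAPLOTYPES}  # uniform starting point
    for _ in range(n_iter):
        counts = {h: 0.0 for h in HAPLOTYPES}
        for g in genotypes:
            pairs = compatible_pairs(g)
            # E-step: posterior weight of each compatible pair
            weights = [freqs[h1] * freqs[h2] * (1 if h1 == h2 else 2)
                       for h1, h2 in pairs]
            total = sum(weights)
            for (h1, h2), w in zip(pairs, weights):
                counts[h1] += w / total
                counts[h2] += w / total
        # M-step: re-estimate frequencies from expected haplotype counts
        new_freqs = {h: c / (2 * len(genotypes)) for h, c in counts.items()}
        delta = max(abs(new_freqs[h] - freqs[h]) for h in HAPLOTYPES)
        freqs = new_freqs
        if delta < tol:
            break
    return freqs


# Hypothetical sample: only the double heterozygotes (1, 1) are ambiguous
sample = [(0, 0)] * 10 + [(2, 2)] * 10 + [(1, 1)] * 10
print(em_haplotype_frequencies(sample))
```

In this toy sample the unambiguous homozygotes pull the double heterozygotes toward the 00/11 phase, which shows why the ambiguous genotypes depend on the rest of the data.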
  • haplo.score warns that subjects with missing genotype data will dramatically degrade haplotype inference, and our experience reinforced this caution.
  • haplotype inference with haplo.score and the complete data would not converge unless the locus was divided into two groups (SNPs T27456C-A70941G and A71434G-A136178G), and even then the parameters had to be changed from their default settings.
  • the critical parameter "insert batch size" had to be reduced from its default setting of 6 (see TABLE 3).
  • the analysis was therefore performed not only on the whole data set but also on just the subset of individuals who had complete genotype information for all SNPs and for a reduced number of "tag" SNPs that were expected to capture the major haplotype diversity as determined from the block structure by the program haploview.
  • the "tag" SNP method represents a compromise between the haplotype inference using all of the SNPs including individuals with some missing genotyping data and using all of the SNPs including only individuals with complete data. See TABLE 3 for the number of eligible subjects in the different modes of the analysis.
  • haplotype.score inference

    All subjects, all SNPs
                    ACACB block.1   ACACB block.2   ESRRA   PPARD
    n.snps          24              16              8       22
    insert.batch    2               3               6       3
    n control       1000            1000            992     998
    n case          1001            1001            1000    1001

                    ACACB   ESRRA   PPARD   ACACB   ESRRA   PPARD
    n.snps          38      8       22      23      4       14
    insert.batch    6       6       6       6       6       6
    n control       534     900     791     687     959     840
    n case          568     896     806     698     983     867
  • Stearoyl-CoA desaturase (SCD) Association. Genetic variations in the stearoyl-CoA desaturase (SCD) gene are associated with type 2 diabetes. Three SNPs (rs3870747, rs7849 and rs1393491) of the 14 SNPs genotyped showed statistically significant (P < 0.05) association with the diabetes phenotype by genotype analysis using all three genotypes (co-dominant model).
  • SCD catalyzes a rate-limiting step in the synthesis of unsaturated fatty acids and lipogenesis. The principal product of SCD is oleic acid, which is formed by desaturation of stearic acid.
  • Hepatocyte nuclear factor 1β (HNF1β, HNF2, TCF2) Association.
  • the G allele of SNP rs11651755 is associated with higher incidence of T2DM by both allele-specific analysis (p < 0.01) and genotype analysis using both dominant and recessive models (p < 0.05).
  • Mutations in the homeodomain-containing transcription factor hepatocyte nuclear factor-1β (HNF-1β, HNF2, TCF2) are known to cause a rare subtype of maturity-onset diabetes of the young (MODY5) which is often associated with early-onset progressive non-diabetic renal dysfunction. Horikawa Y et al., (Letter) Nature Genet. 17: 384-385 (1997). We hypothesized that TCF2 could also be a potential candidate gene for the more common form of type 2 diabetes.
  • SNP rs11651755, located within the intronic region of TCF2, was used for this analysis.
  • SEQ ID NO: 166 shows the exact location of the variant within the surrounding sequences:
  • the overall allelic frequency for G is 52.64% in diabetic patients as compared to 48.08% in non-diabetic patients.
  • the G allele is associated with higher incidence of T2DM by both allele-specific analysis (p < 0.01) and genotype analyses using all three genotypes (co-dominant model) and either of the dominant and recessive models (p < 0.05).
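The G allele frequencies reported above (52.64% in diabetic patients, 48.08% in non-diabetic patients) can be turned into an allelic odds ratio as a quick illustration of effect size. The function name is hypothetical, and the result is a point estimate derived only from the two reported frequencies; it is not a figure stated in the source and carries no confidence interval.

```python
def allelic_odds_ratio(freq_cases, freq_controls):
    """Odds ratio for carrying the allele, from its frequency in each group."""
    odds_cases = freq_cases / (1.0 - freq_cases)
    odds_controls = freq_controls / (1.0 - freq_controls)
    return odds_cases / odds_controls


# G allele frequencies reported for rs11651755
print(round(allelic_odds_ratio(0.5264, 0.4808), 2))  # prints 1.2
```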
  • Oestrogen-related receptor α (ESRRA) Association.
  • SNPs gs229601623 and rs11600990
  • T2DM p < 0.05
  • SNP rs2276014 is associated with higher incidence of T2DM (p < 0.05) by genotype analyses using all three genotypes (co-dominant model) and the dominant model.
  • ESRRA is an orphan nuclear receptor transcription factor expressed highly in kidney, heart, and brown adipocytes, all tissues that preferentially metabolize fatty acids. It has been hypothesized to play an important role in regulating mitochondriogenesis and cellular energy balance in vivo. Insulin resistance can develop as a result of an imbalance between triglyceride deposition in skeletal muscle and fatty acid oxidation capacity of the tissue. This capacity is directly dependent on tissue mitochondrial density. Thus, increasing skeletal muscle mitochondrial density through stimulation of mitochondriogenesis is expected to increase fatty acid oxidation capacity, leading to improvement of insulin sensitivity.
  • Agonizing ESRRA is therefore a promising approach to treat obesity, dislipidaemia, insulin resistance and T2DM.
  • ESRRA Haplotype Analysis. In addition, significant association (p < 0.05) was also found with one haplotype (hap 3) within the ESRRA genomic region. Two haplotype association methods were used for this analysis: Global_p and the Max-Stat_p (see TABLE 3 and TABLE 7).
  • Peroxisome proliferator-activated receptor gamma coactivator 1α (PPARGC1A; PGC-1α) Association.
  • Two SNPs (rs2305683 and rs4469064) showed association with T2DM (p < 0.05), by allele-specific analysis and by genotype analysis using all three genotypes or using the dominant model.
  • One additional SNP (rs1532195) is associated with higher incidence of T2DM (p < 0.05) by genotype analysis using all three genotypes (co-dominant model).
  • PGC-1α stimulates mitochondrial biogenesis and respiration in muscle cells in mice through an induction of uncoupling protein-2 (Ucp2) and through regulation of the nuclear respiratory factors, Nrf1 and Nrf2.
  • Table column headings: Ref; ACC; X squared; p.val; df; Odds; ci.hi; ci.low; perm p-val; cor p val
  • SNP rs1053046 showed association with T2DM (p < 0.01) by allele-specific analysis and by genotype analysis using all three genotypes or the dominant model.
  • SNPs rs11571504, rs9296148, rs9658100 and rs3798343
  • rs11571504, rs9296148, rs9658100 and rs3798343 showed association with T2DM by allele-specific analysis (p < 0.05), three of which (rs11571504, rs9296148, rs9658100) also demonstrated association by genotype analysis using all three genotypes and the dominant model (p < 0.05).
  • PPARD belongs to the peroxisome proliferator-activated receptor transcription factor superfamily which includes PPAR-alpha, PPAR-gamma and PPAR-delta.
  • PPAR-alpha and PPAR-gamma have been shown to be activated by a variety of fatty acids and hypolipidaemic compounds, and human PPARD and mouse PPARD are known to be activated by C18 unsaturated fatty acids.
  • PPARs are key mediators of lipid metabolism in the body.
  • TABLE 9 summarizes the PPARD SNPs used in this analysis.
  • SNP rs1053046 showed association with T2DM (p < 0.01) by allele-specific analysis and by genotype analysis using all three genotypes or the dominant model.
  • SNPs rs11571504, rs9296148, rs9658100 and rs3798343
  • T2DM allele-specific analysis
  • rs11571504, rs9296148, rs9658100 also demonstrated association by genotype analysis using all three genotypes and the dominant model (p < 0.05).
  • the TABLE also provides allele frequency of each SNP in controls and diabetic patients, and two association analytical methods. HWE, which is used to measure the population admixture of the samples, is good for these SNPs.
  • Acetyl-CoA carboxylase-beta (ACC-beta) is hypothesized to control fatty acid oxidation by means of the ability of malonyl-CoA to inhibit carnitine palmitoyl transferase I (CPT1A), the rate-limiting step in fatty acid uptake and oxidation by mitochondria.
  • CPT1A carnitine palmitoyl transferase I
  • ACC-beta is expressed primarily in heart and skeletal muscles.
  • Abu-Elheiga et al., Science 291(5513): 2613-6 (2001), generated mice deficient in ACC2 by targeted disruption.
  • Acc2 -/- mutant mice have a normal life span, a higher fatty acid oxidation rate, and lower amounts of fat.
  • Acc2-deficient mice had 10- and 30-fold lower levels of malonyl-CoA in heart and muscle, respectively.
  • the fatty acid oxidation rate in the soleus muscle of the Acc2 -/- mice was 30% higher than that of wildtype mice and was not affected by addition of insulin, while addition of insulin to the wildtype muscle reduced fatty acid oxidation by 45%.
  • the mutant mice accumulated 50% less fat in their adipose tissue than did wildtype mice.
  • ACACB inhibition is therefore a promising approach to treat obesity, insulin resistance, fatty liver disease and T2DM.
  • ACADSB Acyl-CoA dehydrogenase association.
  • ACADs acyl-CoA dehydrogenases
  • S short branched chain acyl-CoA derivative
  • S 2-methylbutyryl-CoA
  • CPT1A Carnitine Palmitoyltransferase IA
  • the CPT1A gene encodes carnitine palmitoyltransferase IA, a liver enzyme involved in fatty acid oxidation. Major control over the fatty acid oxidation process is exerted at the level of CPT I, whose activity in turn is inhibited by high cellular levels of malonyl-CoA. It has been well established that dysregulation of fatty acid and lipid metabolism is of importance in the aetiology of obesity and type 2 diabetes mellitus. [192] Table 13 summarizes the CPT1A SNPs used in this study.
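The association statements above rest on two simple ideas: comparing allele frequencies between patients and controls, and recoding genotypes under the co-dominant, dominant and recessive models. The following Python sketch illustrates both. It is only an illustration, not the analysis used in the study: the function names are hypothetical, the odds ratio is derived solely from the two G-allele frequencies quoted above for the TCF2 SNP (52.64% and 48.08%), and no significance test is performed because the underlying genotype counts are not given here.

```python
def allelic_odds_ratio(freq_cases: float, freq_controls: float) -> float:
    """Odds ratio for an allele, computed from its frequency (0-1) in each group."""
    odds_cases = freq_cases / (1.0 - freq_cases)
    odds_controls = freq_controls / (1.0 - freq_controls)
    return odds_cases / odds_controls


def code_genotype(genotype: str, risk_allele: str, model: str) -> int:
    """Encode a two-letter genotype such as 'AG' under a genetic model."""
    copies = genotype.count(risk_allele)
    if model == "co-dominant":
        return copies            # 0, 1 or 2: all three genotype classes kept apart
    if model == "dominant":
        return int(copies >= 1)  # one copy of the allele is enough
    if model == "recessive":
        return int(copies == 2)  # two copies of the allele are required
    raise ValueError(f"unknown model: {model}")


# G-allele frequencies quoted in the text for diabetic and non-diabetic subjects.
g_odds_ratio = allelic_odds_ratio(0.5264, 0.4808)
print(f"allelic odds ratio for G: {g_odds_ratio:.2f}")

# Hypothetical genotypes, recoded under each model.
for genotype in ("AA", "AG", "GG"):
    codes = {m: code_genotype(genotype, "G", m)
             for m in ("co-dominant", "dominant", "recessive")}
    print(genotype, codes)
```

An odds ratio above 1 computed this way only restates that the allele is more frequent in patients; whether the difference is significant depends on the sample counts and the test applied.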


Abstract

Significant associations between SNPs in TCF2, ACADSB, CPT1A, ESRRA, PPARD, PPARGC1A and SCD1 and diabetes were observed. In addition, significant association was also observed between haplotypes in ESRRA, PPARD, and ACACB and diabetes.

Description

BIOMARKERS FOR PHARMACOGENETIC DIAGNOSIS OF TYPE 2 DIABETES
FIELD OF THE INVENTION
[01] This invention relates generally to the analytical testing of tissue samples in vitro, and more particularly to aspects of genetic polymorphisms indicative of type 2 diabetes mellitus.
BACKGROUND OF THE INVENTION
[02] Type 2 diabetes mellitus (T2DM) is the most common form of diabetes, affecting approximately 16 million people in America alone. T2DM has become an enormous public health concern because of the rapid increase in the patient population and its association with a multitude of age-related disorders and their complications. Amos AF et al., Diabet. Med. 14 Suppl 5:S1-85 (1997). T2DM is multifactorial in origin, with both genetic and environmental factors contributing to its development. However, our understanding of the disease and its treatment is limited and unsatisfactory.
[03] Conventional medical approaches to diagnosis and treatment of disease, including type 2 diabetes mellitus, are based on clinical data alone or made in conjunction with a diagnostic test. Such traditional practices often lead to therapeutic choices that are not optimal for the efficacy of the prescribed drug therapy or for minimizing the likelihood of side effects for an individual subject.
[04] Therapy-specific diagnostics (a.k.a. theranostics) is an emerging medical technology field that provides tests useful to diagnose a disease, choose the correct treatment regime and monitor a subject's response. That is, theranostics are useful to predict and assess drug response in individual subjects, i.e., individualized medicine. Theranostic tests are also useful to select subjects who are particularly likely to benefit from a treatment, or to provide an early and objective indication of treatment efficacy in individual subjects, so that the treatment can be altered with a minimum of delay.
[05] Progress in pharmacogenetics, which establishes correlations between responses to specific drugs and the genetic profile of individual patients, is foundational to the development of new theranostic approaches. As such, there is a need in the art for the evaluation of patient-to-patient variations in gene sequence and gene expression. A common form of genetic profiling relies on the identification of DNA sequence variations called single nucleotide polymorphisms ("SNPs"), which are one type of genetic mutation leading to, or correlated with, patient-to-patient variation in disease susceptibility, disease progression, and individual drug response. It follows that there is a need in the art to identify and characterize genetic variations, such as SNPs, which are useful to identify the genotypes of subjects associated with disease susceptibility, progression, drug responsiveness, side-effects, or optimal dose. In addition, SNPs and haplotypes (a set of closely linked genetic markers, in this context SNPs, present on one chromosome) can be used to design better clinical trials by improving patient stratification.
SUMMARY OF THE INVENTION
[06] The invention provides a response to the need in the art. Significant associations were identified between single nucleotide polymorphisms (SNPs) in TCF2, ACADSB, CPT1A, ESRRA, PPARD, PPARGC1A and SCD1 and diabetes. In addition, significant associations were also identified between haplotypes in ESRRA, PPARD, and ACACB and diabetes. These SNPs and haplotypes are useful for improving the diagnosis of type 2 diabetes (T2DM) and for designing clinical trials by better patient stratification.
[07] Accordingly, the invention provides a method for diagnosing type 2 diabetes in an individual. The genotype of the individual is determined in a gene selected from TCF2, ACADSB, CPT1A, ESRRA, PPARD, PPARGC1A and SCD1. If a SNP is found that is indicative of a predisposition to type 2 diabetes, then the individual is diagnosed as having a predisposition to type 2 diabetes.
[08] The invention also provides a method for diagnosing type 2 diabetes in an individual by determining the haplotype of the individual in a gene selected from ESRRA, PPARD, or ACACB. If a haplotype is found that is indicative of a predisposition to type 2 diabetes, then the individual is diagnosed as having a predisposition to type 2 diabetes.
[09] The invention further provides a theranostic method of treating type 2 diabetes in an individual. The genotype or haplotype of an individual suspected of having type 2 diabetes is determined. If the genotype or haplotype indicates that the individual suspected of having type 2 diabetes has a predisposition for having type 2 diabetes, then the individual is treated with an appropriate anti-diabetic agent or other therapy.
[10] The invention provides a method for determining whether an individual is to be included in a study of an anti-diabetic agent. The genotype or haplotype of a candidate for inclusion in the study is determined. If the genotype or haplotype indicates that the candidate has a predisposition for having type 2 diabetes, then the individual is included in the study. If the genotype or haplotype indicates that the candidate does not have a predisposition for having type 2 diabetes, then the individual is either not included in the study or else included as a control.
[11] The invention also provides a kit for use in the methods of the invention.
DETAILED DESCRIPTION OF THE INVENTION
[12] To investigate the possible roles of various candidate genes in T2DM, an analysis was performed for over 1000 type 2 diabetic participants and over 1000 normal controls matched for ethnic background (white Caucasians), gender, age (+/- 5 years) and body mass index (BMI) (+/- units). SNPs were chosen from the public database or developed internally by direct sequencing of the genomic regions of selected genes. SNPs were spaced at an average interval of at least one per 5 kb spanning the whole genomic region of each gene (ACADSB, ACACB, CPT1A, ESRRA, PPARD, PPARGC1A and SCD1). For TCF2, only a few candidate SNPs were selected from the literature for the analysis. Significant associations between SNPs in TCF2, ACADSB, CPT1A, ESRRA, PPARD, PPARGC1A and SCD1 and diabetes were observed. In addition, significant association was also observed between haplotypes in ESRRA, PPARD and ACACB and diabetes.
[13] It is to be appreciated that certain aspects, modes, embodiments, variations and features of the invention are described below in various levels of detail in order to provide a substantial understanding of the present invention. In general, such disclosure provides genetic variations, including SNPs of TCF2, ACADSB, CPT1A, ESRRA, PPARD, PPARGC1A and SCD1, useful in the diagnosis and treatment of subjects in need thereof. Accordingly, the various aspects of the present invention relate to polynucleotides encoding the genetic variations of the invention, expression vectors encoding the polypeptides of the invention and organisms that express the genetic variations and/or polypeptides of the invention. The various aspects of the invention further relate to diagnostic/theranostic methods and kits that use the genetic variations of the invention to identify individuals predisposed to disease or to classify individuals with regard to drug responsiveness, side effects, or optimal drug dose. In other aspects, the invention provides methods for compound validation and a computer system for storing and analyzing data related to the genetic variations of the invention. Accordingly, various particular embodiments that illustrate these aspects follow.
[14] Definitions. The definitions of certain terms as used in this specification are provided below. Definitions of other terms may be found in the glossary provided by the U.S. Department of Energy, Office of Science, Human Genome Project (http://www.ornl.gov/sci/techresources/Human_Genome/glossary/).
Type 2 diabetes mellitus (T2DM) is a clinically and genetically heterogeneous group of disorders characterized by abnormally high levels of glucose in the blood. T2DM comprises approximately 90% of the diabetes syndrome. It is characterized by insulin resistance in muscle, liver and adipose tissue that probably begins at a preclinical stage. Eventually, defects in insulin secretion fail to compensate for insulin resistance and lead to hyperglycaemia, precipitating the clinical onset of diabetes. Harris MI, Chapter 32: Definition and Classification of Diabetes Mellitus and the New Criteria for Diagnosis, pp. 326-334, in Diabetes Mellitus: a Fundamental and Clinical Text, 2nd Edition, editors: LeRoith D, Taylor SI, Olefsky JM; (Lippincott Williams & Wilkins, Philadelphia, Pennsylvania, 2000).
[15] As used herein, the term "allele" means a particular form of a gene or DNA sequence at a specific chromosomal location (locus).
[16] As used herein, the term "antibody" includes, but is not limited to, polyclonal antibodies, monoclonal antibodies, humanized or chimaeric antibodies and biologically functional antibody fragments sufficient for binding of the antibody fragment to the protein.
[17] As used herein, the term "clinical response" means any or all of the following: a quantitative measure of the response, no response, and adverse response (i.e., side effects).
[18] As used herein, the term "clinical trial" means any research study designed to collect clinical data on responses to a particular treatment, and includes but is not limited to phase I, phase II and phase III clinical trials. Standard methods are used to define the patient population and to enrol subjects.
[19] As used herein, the term "effective amount" of a compound is a quantity sufficient to achieve a desired therapeutic and/or prophylactic effect, for example, an amount which results in the prevention of or a decrease in the symptoms associated with a disease that is being treated, e.g., the diseases associated with genetic variations and polypeptides identified herein. The amount of compound administered to the subject will depend on the type and severity of the disease and on the characteristics of the individual, such as general health, age, sex, body weight and tolerance to drugs. It will also depend on the degree, severity and type of disease. The skilled artisan will be able to determine appropriate dosages depending on these and other factors. Typically, an effective amount of the compounds of the present invention, sufficient for achieving a therapeutic or prophylactic effect, ranges from about 0.000001 mg per kilogram body weight per day to about 10,000 mg per kilogram body weight per day. Preferably, the dosage ranges are from about 0.0001 mg per kilogram body weight per day to about 100 mg per kilogram body weight per day. The compounds of the present invention can also be administered in combination with each other, or with one or more additional therapeutic compounds.
[20] As used herein, "expression" includes but is not limited to one or more of the following: transcription of the gene into precursor mRNA; splicing and other processing of the precursor mRNA to produce mature mRNA; mRNA stability; translation of the mature mRNA into protein (including codon usage and tRNA availability); and glycosylation and/or other modifications of the translation product, if required for proper expression and function.
[21] As used herein, the term "gene" means a segment of DNA that contains all the information for the regulated biosynthesis of an RNA product, including promoters, exons, introns, and other untranslated regions that control expression.
[22] As used herein, the term "genotype" means an unphased 5' to 3' sequence of nucleotide pairs found at one or more polymorphic sites in a locus on a pair of homologous chromosomes in an individual. As used herein, genotype includes a full-genotype and/or a sub-genotype.
[23] As used herein, the term "locus" means a location on a chromosome or DNA molecule corresponding to a gene or a physical or phenotypic feature.
[24] As used herein, the term "modulating agent" is any compound that alters (e.g., increases or decreases) the expression level or biological activity level of the polypeptides compared to the expression level or biological activity level of the polypeptides in the absence of the modulating agent. The modulating agent can be a small molecule, polypeptide, carbohydrate, lipid, nucleotide, or combination thereof. The modulating agent may be an organic compound or an inorganic compound.
[25] As used herein, the term "mutant" means any heritable variation from the wild-type that is the result of a mutation, e.g., single nucleotide polymorphism. The term "mutant" is used interchangeably with the terms "marker", "biomarker", and "target" throughout the specification.
[26] As used herein, the term "medical condition" includes, but is not limited to, any condition or disease manifested as one or more physical and/or psychological symptoms for which treatment is desirable, and includes previously and newly identified diseases and other disorders.
[27] As used herein, the term "nucleotide pair" means the nucleotides found at a polymorphic site on the two copies of a chromosome from an individual.
[28] As used herein, the term "polymorphic site" means a position within a locus at which at least two alternative sequences are found in a population, the most frequent of which has a frequency of no more than 99%.
[29] As used herein, the term "phased" means, when applied to a sequence of nucleotide pairs for two or more polymorphic sites in a locus, the combination of nucleotides present at those polymorphic sites on a single copy of the locus is known.
[30] As used herein, the term "polymorphism" means any sequence variant present at a frequency of >1% in a population. The sequence variant may be present at a frequency significantly greater than 1% such as 5% or 10% or more. Also, the term may be used to refer to the sequence variation observed in an individual at a polymorphic site. Polymorphisms include nucleotide substitutions, insertions, deletions and microsatellites and may, but need not, result in detectable differences in gene expression or protein function.
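The frequency thresholds in the definitions of "polymorphic site" and "polymorphism" can be made concrete with a short sketch. The Python example below is illustrative only: the function names and the sample allele lists are hypothetical, and the 99% cut-off is taken from the definition of a polymorphic site given above (at least two alternative sequences, the most frequent of which has a frequency of no more than 99%).

```python
from collections import Counter


def allele_frequencies(alleles):
    """Return the relative frequency of each allele observed at one site."""
    counts = Counter(alleles)
    total = sum(counts.values())
    return {allele: n / total for allele, n in counts.items()}


def is_polymorphic_site(alleles, max_major_freq=0.99):
    """True if at least two alleles occur and the commonest does not exceed the cut-off."""
    freqs = allele_frequencies(alleles)
    return len(freqs) >= 2 and max(freqs.values()) <= max_major_freq


# Hypothetical samples of 200 chromosomes each.
common_variant = ["A"] * 197 + ["G"] * 3   # major allele frequency 98.5%
rare_variant = ["A"] * 199 + ["G"] * 1     # major allele frequency 99.5%

print(is_polymorphic_site(common_variant))
print(is_polymorphic_site(rare_variant))
```

In practice the estimate depends on the sample size, so a site close to the threshold may be classified differently in different populations.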
[31] As used herein, the term "polynucleotide" means any RNA or DNA, which may be unmodified or modified RNA or DNA. Polynucleotides include, without limitation, single- and double-stranded DNA, DNA that is a mixture of single- and double-stranded regions, single- and double-stranded RNA, RNA that is a mixture of single- and double-stranded regions, and hybrid molecules comprising DNA and RNA that may be single-stranded or, more typically, double-stranded or a mixture of single- and double-stranded regions. In addition, polynucleotide refers to triple-stranded regions comprising RNA or DNA or both RNA and DNA. The term polynucleotide also includes DNAs or RNAs containing one or more modified bases and DNAs or RNAs with backbones modified for stability or for other reasons.
[32] As used herein, the term "polypeptide" means any polypeptide comprising two or more amino acids joined to each other by peptide bonds or modified peptide bonds, i.e., peptide isosteres. Polypeptide refers to both short chains, commonly referred to as peptides, glycopeptides or oligomers, and to longer chains, generally referred to as proteins. Polypeptides may contain amino acids other than the 20 gene-encoded amino acids. Polypeptides include amino acid sequences modified either by natural processes, such as post-translational processing, or by chemical modification techniques that are well-known in the art. Such modifications are well described in basic texts and in more detailed monographs, as well as in a voluminous research literature.
[33] As used herein, the term "SNP nucleic acid" means a nucleic acid sequence, which comprises a nucleotide that is variable within an otherwise identical nucleotide sequence between individuals or groups of individuals, thus existing as alleles. Such SNP nucleic acids are preferably from about 15 to about 500 nucleotides in length. The SNP nucleic acids may be part of a chromosome, or they may be an exact copy of a part of a chromosome, e.g., by amplification of such a part of a chromosome through PCR or through cloning. The SNP nucleic acids are referred to hereafter simply as "SNPs". A SNP is the occurrence of nucleotide variability at a single position in the genome, in which two alternative bases occur at appreciable frequency (i.e., >1%) in the human population. A SNP may occur within a gene or within intergenic regions of the genome. SNP probes according to the invention are oligonucleotides that are complementary to a SNP nucleic acid.
[34] A "haplotype" is a set of closely linked genetic markers, in this context SNPs, present on one chromosome that tend to be inherited together.
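The distinction between a phased haplotype and an unphased genotype, as defined above, can be sketched in a few lines of Python. The data and function names below are hypothetical and serve only to illustrate the definitions: each individual is represented by two haplotype strings (one per chromosome copy) over the same three SNP sites.

```python
from collections import Counter

# Hypothetical phased data: (haplotype on copy 1, haplotype on copy 2).
phased_individuals = [("ACG", "ATG"), ("ACG", "ACG"), ("GTG", "ATG")]


def unphased_genotype(hap1, hap2):
    """Collapse two haplotypes into nucleotide pairs, discarding phase information."""
    return ["".join(sorted(pair)) for pair in zip(hap1, hap2)]


def haplotype_frequencies(individuals):
    """Relative frequency of each haplotype across all chromosome copies."""
    counts = Counter(hap for pair in individuals for hap in pair)
    total = sum(counts.values())
    return {hap: n / total for hap, n in counts.items()}


print(unphased_genotype("ACG", "ATG"))
print(haplotype_frequencies(phased_individuals))
```

Going from haplotypes to genotypes is trivial, as shown; the reverse direction generally requires statistical inference or family data, which is not attempted here.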
[35] As used herein, the term "subject" means that preferably the subject is a mammal, such as a human, but can also be an animal, e.g., domestic animals (e.g., dogs, cats and the like), farm animals (e.g., cows, sheep, pigs, horses and the like) and laboratory animals (e.g., monkey (e.g., cynomolgus monkey), rats, mice, guinea pigs and the like).
[36] As used herein, the administration of an agent or drug to a subject or patient includes self-administration and the administration by another. It is also to be appreciated that the various modes of treatment or prevention of medical conditions as described are intended to mean "substantial", which includes total but also less than total treatment or prevention, and wherein some biologically or medically relevant result is achieved.
[37] Identification and Characterization of Gene Sequence Variation. Due to their prevalence and widespread nature, SNPs have the potential to be important tools for locating genes that are involved in human disease conditions. See e.g., Wang et al., Science 280: 1077-1082 (1998). It is increasingly clear that the risk of developing many common disorders and the metabolism of medications used to treat these conditions are substantially influenced by underlying genomic variations, although the effects of any one variant might be small.
[38] A SNP is said to be "allelic" in that due to the existence of the polymorphism, some members of a species may have an unmutated sequence (i.e., the original allele) whereas other members may have a mutated sequence (i.e., the variant or mutant allele).
[39] An association between a SNP and a particular phenotype does not necessarily indicate or require that the SNP is causative of the phenotype. Instead, the association may merely be due to genome proximity between a SNP and those genetic factors actually responsible for a given phenotype, such that the SNP and said genetic factors are closely linked. That is, a SNP may be in linkage disequilibrium ("LD") with the "true" functional variant. LD (a.k.a. allelic association) exists when alleles at two distinct locations of the genome are more highly associated than expected. Thus, a SNP may serve as a marker that has value by virtue of its proximity to a mutation that causes a particular phenotype.
[40] In describing the polymorphic sites of the invention, reference is made to the sense strand of the gene for convenience. As recognized by the skilled artisan, however, nucleic acid molecules containing the gene may be complementary double stranded molecules and thus reference to a particular site on the sense strand refers as well to the corresponding site on the complementary antisense strand. That is, reference may be made to the same polymorphic site on either strand and an oligonucleotide may be designed to hybridize specifically to either strand at a target region containing the polymorphic site. Thus, the invention also includes single-stranded polynucleotides that are complementary to the sense strand of the genomic variants described herein.
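The correspondence between a site on the sense strand and the matching site on the complementary antisense strand can be illustrated with a reverse-complement helper. The Python sketch below is an illustration only; the function names and the example sequence are hypothetical and are not taken from the sequence listing.

```python
# Translation table mapping each base to its Watson-Crick partner.
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(sequence):
    """Return the complementary strand, read in its own 5' to 3' direction."""
    return sequence.translate(COMPLEMENT)[::-1]


def antisense_position(sense_position, length):
    """0-based index on the antisense strand that pairs with a sense-strand index."""
    return length - 1 - sense_position


sense = "GATTACA"                   # hypothetical sense-strand fragment
antisense = reverse_complement(sense)
snp_index = 2                       # hypothetical polymorphic site on the sense strand

print(antisense)
print(sense[snp_index], antisense[antisense_position(snp_index, len(sense))])
```

The same polymorphic site is therefore addressable on either strand: the base reported on the antisense strand is simply the complement of the sense-strand base.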
[41] Identification and Characterization of SNPs. Many different techniques can be used to identify and characterize SNPs, including single-strand conformation polymorphism (SSCP) analysis, heteroduplex analysis by denaturing high-performance liquid chromatography (DHPLC) and direct DNA sequencing and computational methods. Shi et al., Clin. Chem. 47:164-172 (2001). There is a wealth of sequence information in public databases.
[42] The most common SNP-typing methods currently include hybridization, primer extension, and cleavage methods. Each of these methods must be connected to an appropriate detection system. Detection technologies include fluorescent polarization (Chan et al., Genome Res. 9:492-499 (1999)), luminometric detection of pyrophosphate release (pyrosequencing) (Ahmadian et al., Anal. Biochem. 280:103-10 (2000)), fluorescence resonance energy transfer (FRET)-based cleavage assays, DHPLC, and mass spectrometry (Shi, Clin. Chem. 47:164-172 (2001); U.S. Pat. No. 6,300,076 B1). Other methods of detecting and characterizing SNPs are those disclosed in U.S. Pat. Nos. 6,297,018 and 6,300,063.
TABLE 1A
SNP | Reverse | Forward | Extension
ACACB_A112792T_rs2160602 (tag SNP) | TCCATGCATGCCTGTTGTTG (SEQ ID NO:1) | AACAAGMGGGACAACAGGG (SEQ ID NO:2) | GCATGCCTGTTGTTGGGAGTGTG (SEQ ID NO:3)
ACACB_A136178G_rs882355 (tag SNP) | TGGCCTTTTTTCTCAGGGTC (SEQ ID NO:4) | GTTGGGGTGAGTGCTATGAA (SEQ ID NO:5) | GGGTCATTCTTTTCCCCTAAC (SEQ ID NO:6)
ACACB_A62820G_rs2268393 (tag SNP) | TGCACAGCTCTCTGTCCTAG (SEQ ID NO:7) | AAGGCACCTTCTCTAAGACC (SEQ ID NO:8) | TGTCCTAGGCATGGAAAACACAGC (SEQ ID NO:9)
ACACB_A70941G_rs2268389 (tag SNP) | CCTGCAGCCTCACATAAATG (SEQ ID NO:10) | GCGGTCTTCTAACTACACTC (SEQ ID NO:11) | GACAGACTGGGAGATCGAGT (SEQ ID NO:12)
ACACB_A71434G_rs2268388 (tag SNP) | TGCTGTTCTCCCAGCAGAGA (SEQ ID NO:13) | AGAAAAGCCTGCAGGGCTAG (SEQ ID NO:14) | TTCTCCCAGCAGAGAACACTC (SEQ ID NO:15)
ACACB_A75069G_rs2239607 (tag SNP) | GGCCTCTGATCAAGTTGAAC (SEQ ID NO:16) | CAAGGCTCAAAAGGAAACCC (SEQ ID NO:17) | GAGACCCACTTATAGCCTAAC (SEQ ID NO:18)
ACACB_A85859G_rs2300452 (tag SNP) | ATCAGGTGCAGAGAACTCAC (SEQ ID NO:19) | GAACCAATTACAGTCTCGGG (SEQ ID NO:20) | TAACAAATGGGACCAAGAAAGT (SEQ ID NO:21)
ACACB_C114571G_rs2284685 (tag SNP) | CATCTCTCTGTAGACCGTTG (SEQ ID NO:22) | AGATATCTGCCTGTTCAGGG (SEQ ID NO:23) | GTTGGAGGAATATGGGCCGAGGGT (SEQ ID NO:24)
ACACB_C115848T_rs759560 (tag SNP) | ATCCTGTATGCATACATTGG (SEQ ID NO:25) | ATTTGCGGAAGGATGTGTAC (SEQ ID NO:26) | TGCATACATTGGAAACATACA (SEQ ID NO:27)
ACACB_C121771T_rs3742023 (tag SNP) | AGGGTGCCATGATTTCCTTG (SEQ ID NO:28) | TTCTGTCCACAGCTCTGAAG (SEQ ID NO:29) | CATGATTTCCTTGAAACTGCC (SEQ ID NO:30)
ACACB_C61722T_rs2284694 (tag SNP) | ACATCTCCCTCCAGGAAGAG (SEQ ID NO:31) | AAGGCCCTTTAGGGTGTGTG (SEQ ID NO:32) | CTGCCCCATCTGGTACTTCAGTTC (SEQ ID NO:33)
ACACB_C72398T_rs2284690 (tag SNP) | TGAGACAGGTGGACTCAAGG (SEQ ID NO:34) | TGTGTTTCTGCACCATCACG (SEQ ID NO:35) | AGGCTGGTAGCGCTTCTCC (SEQ ID NO:36)
ACACB_C98257T_rs3742027 (tag SNP) | AGAGTTGGGTCTGCAAGCAG (SEQ ID NO:37) | TTGGTGATGCTGATGGGCAC (SEQ ID NO:38) | CCCCGACTTGCCATCACC (SEQ ID NO:39)
ACACB_G10107A_rs2430684 (tag SNP) | GTCTTTTGACACCACCTTCC (SEQ ID NO:40) | GTAATACCTCTTCCCTGGTG (SEQ ID NO:41) | TTTGACACCACCTTCCATGGCC (SEQ ID NO:42)
ACACB_G124627A_rs2075260 (tag SNP) | GTTGGCAAAGATCATCAGGG (SEQ ID NO:43) | CCAGACTCAGCCTACAAAAC (SEQ ID NO:44) | CTCCCGGTTGAAGTCCTTGA (SEQ ID NO:45)
ACACB_G34840A_rs246092 (tag SNP) | AATGCATCAGAGGCTGTGTG (SEQ ID NO:46) | ATGTCTGTGAAGAGCTTGGG (SEQ ID NO:47) | AGGCTGTGTGCTGTTCCCA (SEQ ID NO:48)
ACACB_G9282A_rs3858707 (tag SNP) | GCACACATGAGTCTTCTCTG (SEQ ID NO:49) | CTTCCTCAAGGAACATTCCC (SEQ ID NO:50) | TCTTCTCTGTCAGAAGCCCCTGAT (SEQ ID NO:51)
TABLE 1B
SNP | Reverse | Forward | Extension
ACACB_T102818C_rs2241220 (tag SNP) | GGCGGTCAGATCGAAGTTAC (SEQ ID NO:52) | TCGCATTTACCGTCACTTGG (SEQ ID NO:53) | TCGAAGTTACGCATCCGGTT (SEQ ID NO:54)
ACACB_T2745C_rs1654884 (tag SNP) | CCTTTCAAGATCATCATGTG (SEQ ID NO:55) | CCTTTGTTCCATTAATGTGGC (SEQ ID NO:56) | TTAAAGAAAAGTCAGTCAAGGGTG (SEQ ID NO:57)
ACACB_T33519C_rs4766516 (tag SNP) | TTGCGGAACATCTCATAGGC (SEQ ID NO:58) | GTGCTTATTGCCAACAACGG (SEQ ID NO:59) | TGGAGCGCATGCACTTCAC (SEQ ID NO:60)
ACACB_T52085C_rs1016331 (tag SNP) | CTCTCTACAATGAGCCAGAC (SEQ ID NO:61) | CAGGTTTAGAACCCTAGTCC (SEQ ID NO:62) | AATGAGCCAGACTTCATACTGT (SEQ ID NO:63)
ACACB_T5524C_rs2878960 (tag SNP) | ATGACCAACTTCATCCTGGG (SEQ ID NO:64) | GTAGACTCACGAGATGAGCC (SEQ ID NO:65) | CTCTTTTGATGACTACTCCTC (SEQ ID NO:66)
ACACB_T57784C_rs2287221 (tag SNP) | CCTTGAACTCAGAACTCCTG (SEQ ID NO:67) | TGGCAGTCAGTGAACAGGCTG (SEQ ID NO:68) | TACACAAGTCAGCATGGATCC (SEQ ID NO:69)
ACADSB_G57981C_rs3763738 | GAGAGATGATGAAAACCCAC (SEQ ID NO:70) | TAGGGCATTTCATCCATGTC (SEQ ID NO:71) | AGATGATGAAAACCCACTCATTCA (SEQ ID NO:72)
CPT1A_G20600A_rs1017641 | TCAAGCTGACCTTTCACCTG (SEQ ID NO:73) | ATGCCCTTTGAGCAAACCTC (SEQ ID NO:74) | GCTGACCTTTCACCTGCTTTCCTC (SEQ ID NO:75)
ESRRA_A15662G_rs731703 (tag SNP) | GGCACAGACCTGTTCTTTGC (SEQ ID NO:76) | AGAAACAGCCCTGCTTCAGC (SEQ ID NO:77) | CTGTTCTTTGCTGTCCTG (SEQ ID NO:78)
ESRRA_C19763T_rs11600990 | TGTCTCCGTAAGGTCTTCAG (SEQ ID NO:79) | TCCCTAAGCCTCAGTTTTCG (SEQ ID NO:80) | AAGGTCTTCAGGTTAACTCAGTGA (SEQ ID NO:81)
ESRRA_C7886T_gs229601623 | ACAAGGTGCCTACCCATCTC (SEQ ID NO:82) | GAGGAAGACTTTTCTGGGAG (SEQ ID NO:83) | AGGAGTCTGCGGATGAC (SEQ ID NO:84)
ESRRA_C9200T_rs2286613 (tag SNP) | ACGCGGGCTGTCCTGCACTGA (SEQ ID NO:85) | CCCCATCCGAGTGGAATTTG (SEQ ID NO:86) | TCCTGCACTGACTCACG (SEQ ID NO:87)
ESRRA_T22947G_rs2079786 (tag SNP) | TTATTTCCTGCCTGCCAGAC (SEQ ID NO:88) | ACGATTGGCGAGAAAGGTGG (SEQ ID NO:89) | CTGCCAGACCCCTCCCC (SEQ ID NO:90)
PPARD_A10363T_rs11571504 | TGAGTGAGTCCGAAATACGG (SEQ ID NO:91) | ACAAGTGGGAGAAGACGAAG (SEQ ID NO:92) | CTCCTTCCACCCTACTCT (SEQ ID NO:93)
PPARD_A37081G_rs9296148 | TATAGTTTTCAAGAATGCAC (SEQ ID NO:94) | GGCAGGACTGATTAAATAAG (SEQ ID NO:95) | GAATGCACTTTTAATAGCAGAGC (SEQ ID NO:96)
PPARD_A95192G_rs1053046 | TTCAAGCCCAGGCTTCCTGG (SEQ ID NO:97) | TGGGCACTTCCACCCAGAGT (SEQ ID NO:98) | CCCTCCCAAGGAGCCATTCT (SEQ ID NO:99)
TABLE 1C
SNP | Reverse | Forward | Extension
PPARD_G56254T_rs9658100 | TTCCACCACCTGGTCTAATC (SEQ ID NO:100) | TTGTGGCCCTCTACAGTGCTG (SEQ ID NO:101) | CACCTGGTCTAATCATTGACTTA (SEQ ID NO:102)
PPARD_G57307C_rs3798343 | TGTGTCATTTCTGAAGGGCC (SEQ ID NO:103) | ACTTAATTTTGAGGCCCGGG (SEQ ID NO:104) | GCCCTGGTGACTTTCTTTGC (SEQ ID NO:105)
PPARD_A67523G_rs1040436 (tag SNP) | TTTCCATGTCTCCTCTCTCC (SEQ ID NO:106) | AATGGGACGTGCAATTGCAG (SEQ ID NO:107) | CTCTCTCCCTGGGAAGGTTGAGA (SEQ ID NO:108)
PPARD_A69105G_rs2267665 (tag SNP) | GGCTATGAAGGACAAATGCC (SEQ ID NO:109) | ACGTAAGGTCCTCAGGAAAG (SEQ ID NO:110) | GAAGGACAAATGCCAAGCAAGGTG (SEQ ID NO:111)
PPARD_A74075G_rs2038068 (tag SNP) | AGAGACAATTCCAGGCTAGG (SEQ ID NO:112) | AGATGCAGTTCTGGACTCTG (SEQ ID NO:113) | ACTAGAGACCCTGGTCCCAA (SEQ ID NO:114)
PPARD_A88093G_rs2076169 (tag SNP) | TGGGTCCTTCCTAACTTCAC (SEQ ID NO:115) | GGTTCTGTGTATTTGTGAGG (SEQ ID NO:116) | CCTTCCTAACTTCACACCCATCA (SEQ ID NO:117)
PPARD_C72138G_rs2267667 (tag SNP) | AAGGGAGACAACTGGACTTG (SEQ ID NO:118) | AGCTCTAGAGAGACCTGGTC (SEQ ID NO:119) | AGGTGGCTTAGTTGCTCT (SEQ ID NO:120)
PPARD_C78392T_rs2016520 (tag SNP) | TCTGGGTCTGAACGCAGATG (SEQ ID NO:121) | TGACCTCTTCCTGTCTTCTC (SEQ ID NO:122) | GCAGATGGACCTCTACAGG (SEQ ID NO:123)
PPARD_C91401T_rs2076167 (tag SNP) | TGAGAAGAGGAAGCTGGTGG (SEQ ID NO:124) | TTGGAGAAGGCCTTCAGGTC (SEQ ID NO:125) | GCTGGTGGCAGGGCTGACTGCAAA (SEQ ID NO:126)
PPARD_C9565T_gs229601276 (tag SNP) | AAGGTACGTGACTTGCAGTG (SEQ ID NO:127) | TAGGCGATATGATCTCCTCC (SEQ ID NO:128) | GACTCTTAACCCAGTGCTA (SEQ ID NO:129)
PPARD_G10263T_rs9658060 (tag SNP) | TCACGGCGGCTTCCTGATGC (SEQ ID NO:130) | AGGGTCAGCGGGGCGCCTAC (SEQ ID NO:131) | CCGGTCAGCCGTCGTGCG (SEQ ID NO:132)
PPARD_G77738A_rs2267669 (tag SNP) | TGGAGTCTTTCCAAGGTGAC (SEQ ID NO:133) | TAAGGGTTGGAACTGTCTCC (SEQ ID NO:134) | CTTACTGGGTGGTGATGCCA (SEQ ID NO:135)
PPARD_G95618T_rs760783 (tag SNP) | CTGGAAGCTGACTCAGTTAC (SEQ ID NO:136) | CTCCAGTACTGGATGTGGAG (SEQ ID NO:137) | TGCATGTTTTTCCTGGGGCT (SEQ ID NO:138)
PPARD_T94624C_rs3734254 (tag SNP) | TCACATCCCCCTGCTCCTTT (SEQ ID NO:139) | TCAGTGCTTATGTGTGTGTG (SEQ ID NO:140) | ACTCCCCCTGAAGCTGCC (SEQ ID NO:141)
PPARGC1_A19200G_rs2305683 | GGGTTCGTTAGAATGATGGC (SEQ ID NO:142) | CAAAGGCAGTCCTCAGAACG (SEQ ID NO:143) | TGGCAAGTGAGAATGCAGTC (SEQ ID NO:144)
PPARGC1_G112016C_rs1532195 | GAGGTTCATTCCTTTCTTCAC (SEQ ID NO:145) | ATGCCAACTCATTCCATGAG (SEQ ID NO:146) | TCCTTTCTTCACAATTCTACAG (SEQ ID NO:147)
PPARGC1_T17903C_ TGCCTCGATAAAGA TCAGAGGGCAAAAG TCGATAAAGAATGG rs4469064 ATGGGC (SEQ ID TGACTG (SEQ ID GCTTTGTGAC (SEQ
NO:148) NO:149) ID NO:150) TABLE lD
SNP | Reverse | Forward | Extension
SCD_C16826T_rs3870747 | CAACCCAATAACAAGGCCAG (SEQ ID NO:151) | AGTCATGAAGAAGCCCAGAG (SEQ ID NO:152) | GGCCAGGCATGCAACTCGT (SEQ ID NO:153)
SCD_C25750T_rs7849 | AACCCTCTTTTGCTCTGTGG (SEQ ID NO:154) | TCTCATGAGGCACAGCCAAG (SEQ ID NO:155) | CTGGCCCACTGGCTCAAC (SEQ ID NO:156)
SCD_C27908A_rs508384 | TTAATGGCAGCAGACTCCTG (SEQ ID NO:157) | TCATACCCCATTAACCTGCC (SEQ ID NO:158) | CAGACTCCTGGCTTCCTGC (SEQ ID NO:159)
TCF2_A10228G_rs11651755 (first run) | AACAGAGGAGAAGGTGACTG (SEQ ID NO:160) | ATGGGAAGTCCTCTTTTGCC (SEQ ID NO:161) | GGTACACCTCATCCCTTTCTTC (SEQ ID NO:162)
TCF2_A10228G_rs11651755 (second run) | ATGGGAAGTCCTCTTTTGCC (SEQ ID NO:163) | AACAGAGGAGAAGGTGACTG (SEQ ID NO:164) | CCTCTTTTGCCCACTAACCTC (SEQ ID NO:165)
[43] Polymorphisms can also be detected using commercially available products, such as INVADER™ technology (available from Third Wave Technologies Inc., Madison, Wisconsin, USA). In this assay, a specific upstream "invader" oligonucleotide and a partially overlapping downstream probe together form a specific structure when bound to complementary DNA template. This structure is recognized and cut at a specific site by the Cleavase enzyme, resulting in the release of the 5' flap of the probe oligonucleotide. This fragment then serves as the "invader" oligonucleotide with respect to synthetic secondary targets and secondary fluorescently labelled signal probes contained in the reaction mixture. See also Ryan D et al., Molecular Diagnosis 4(2): 135-144 (1999) and Lyamichev V et al., Nature Biotechnology 17: 292-296 (1999); see also U.S. Pat. Nos. 5,846,717 and 6,001,567.
[44] The identity of polymorphisms may also be determined using a mismatch detection technique including, but not limited to, the RNase protection method using riboprobes (Winter et al., Proc. Natl. Acad. Sci. USA 82:7575 (1985); Meyers et al., Science 230:1242 (1985)) and proteins which recognize nucleotide mismatches, such as the E. coli mutS protein (Modrich P, Ann. Rev. Genet. 25:229-253 (1991)). Alternatively, variant alleles can be identified by single strand conformation polymorphism (SSCP) analysis (Orita et al., Genomics 5:874-879 (1989); Humphries et al., in Molecular Diagnosis of Genetic Diseases, R. Elles, ed., (1996) pp. 321-340) or denaturing gradient gel electrophoresis (DGGE) (Wartell et al., Nucl. Acids. Res. 18:2699-2706 (1990); Sheffield et al., Proc. Natl. Acad. Sci. USA 86:232-236 (1989)). A polymerase-mediated primer extension method may also be used to identify the polymorphisms. Several such methods have been described in the patent and scientific literature and include the "Genetic Bit Analysis" method (WO 92/15712) and the ligase/polymerase mediated genetic bit analysis (U.S. Pat. No. 5,679,524). Related methods are disclosed in WO 91/02087, WO 90/09455, WO 95/17676, and U.S. Pat. Nos. 5,302,509 and 5,945,283. Extended primers containing a polymorphism may be detected by mass spectrometry as described in U.S. Pat. No. 5,605,798. Another primer extension method is allele-specific PCR (Ruaño et al., Nucl. Acids. Res. 17:8392 (1989); Ruaño et al., Nucl. Acids. Res. 19: 6877-6882 (1991); WO 93/22456; Turki et al., J. Clin. Invest. 95:1635-1641 (1995)). In addition, multiple polymorphic sites may be investigated by simultaneously amplifying multiple regions of the nucleic acid using sets of allele-specific primers as described in PCT patent application WO 89/10414.
[45] Haplotyping and Genotyping Oligonucleotides. The invention provides methods and compositions for haplotyping and/or genotyping the gene in an individual. As used herein, the terms "genotype" and "haplotype" mean the genotype or haplotype containing the nucleotide pair or nucleotide, respectively, that is present at one or more of the polymorphic sites described herein and may optionally also include the nucleotide pair or nucleotide present at one or more additional polymorphic sites in the gene. The additional polymorphic sites may be currently known polymorphic sites or sites that are subsequently discovered.
[46] The compositions of the invention contain oligonucleotide probes and primers designed to specifically hybridize to one or more target regions that contain, or are adjacent to, a polymorphic site. Oligonucleotide compositions of the invention are useful in methods for genotyping and/or haplotyping a gene in an individual. The methods and compositions for establishing the genotype or haplotype of an individual at the polymorphic sites described herein are useful for studying the effect of the polymorphisms in the aetiology of diseases affected by the expression and function of the protein, studying the efficacy of drugs targeting the gene product, predicting individual susceptibility to diseases affected by the expression and function of the protein, and predicting individual responsiveness to drugs targeting the gene product.
[47] Genotyping oligonucleotides of the invention may be immobilized on or synthesized on a solid surface such as a microchip, bead, or glass slide. See, e.g., WO 98/20020 and WO 98/20019.
[48] Genotyping oligonucleotides may hybridize to a target region located one to several nucleotides downstream of one of the polymorphic sites identified herein. Such oligonucleotides are useful in polymerase-mediated primer extension methods for detecting one of the polymorphisms described herein and therefore such genotyping oligonucleotides are referred to herein as "primer-extension oligonucleotides".
[49] Direct Genotyping Method of the Invention. A genotyping method of the invention may involve isolating from an individual a nucleic acid mixture comprising the two copies of a gene of interest or fragment thereof, and determining the identity of the nucleotide pair at one or more of the polymorphic sites in the two copies. As will be readily understood by the skilled artisan, the two "copies" of a gene in an individual may be the same allele or may be different alleles. In a particularly preferred embodiment, the genotyping method comprises determining the identity of the nucleotide pair at each polymorphic site. Typically, the nucleic acid mixture is isolated from a biological sample taken from the individual, such as a blood sample or tissue sample. Suitable tissue samples include whole blood, semen, saliva, tears, urine, faecal material, sweat, buccal smears, skin and hair.
[50] A method of genotyping used in the EXAMPLE below is as follows: Genotyping of all SNPs was performed by single base extension followed by mass spectrometry using Sequenom's MassArray™ Technology. Ross et al., Nat. Biotechnol. 16: 1347-1351 (1998). Ascertainment of genotypes on this system is based on matrix-assisted laser desorption/ionization time-of-flight (MALDI-TOF) analysis of homogeneous Mass Extension (hME) reaction products.
[51] Direct Haplotyping Method of the Invention. A haplotyping method of the invention may include isolating from an individual a nucleic acid molecule containing only one of the two copies of a gene of interest, or a fragment thereof, and determining the identity of the nucleotide at one or more of the polymorphic sites in that copy. Direct haplotyping methods include, for example, CLASPER System™ technology (U.S. Pat. No. 5,866,404) or allele-specific long-range PCR (Michalatos-Beloin et al., Nucl. Acids. Res. 24: 4841-4843 (1996)). The nucleic acid may be isolated using any method capable of separating the two copies of the gene or fragment. As will be readily appreciated by those skilled in the art, any individual clone will only provide haplotype information on one of the two gene copies present in an individual. In one embodiment, a haplotype pair is determined for an individual by identifying the phased sequence of nucleotides at one or more of the polymorphic sites in each copy of the gene that is present in the individual. In a preferred embodiment, the haplotyping method comprises identifying the phased sequence of nucleotides at each polymorphic site in each copy of the gene.
[52] In the EXAMPLE below, the haplotyping method was as follows: All haplotype association analyses were performed using the haplo.score program from the haplo.stats package in R, using the recommended parameters for a binary trait. See Becker R, Chambers J & Wilks A, The New S Language: A Programming Environment for Data Analysis and Graphics (Wadsworth & Brooks/Cole Advanced Books, Pacific Grove, 1988) p. 702; Schaid DJ et al., Am. J. Hum. Genet. 70(2): 425-34 (2002). Haplo.score infers haplotypes from genotype data using the EM algorithm and tests association with a trait both for the individual haplotypes at a locus and for the entire set of haplotypes.
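The EM-based haplotype inference mentioned in paragraph [52] can be illustrated with a minimal sketch. This is not the haplo.stats implementation; it assumes two biallelic sites, genotypes coded as minor-allele counts (0, 1 or 2), random mating, and a fixed number of iterations. The function name `em_haplotype_frequencies` is hypothetical.

```python
from itertools import product

def em_haplotype_frequencies(genotypes, n_iter=100):
    """Estimate two-site haplotype frequencies from unphased genotypes.

    genotypes: list of (g1, g2) tuples, each value the count (0, 1 or 2)
    of the minor allele at that site. Illustrative sketch only.
    """
    haps = [(0, 0), (0, 1), (1, 0), (1, 1)]
    freq = {h: 0.25 for h in haps}  # start from equal frequencies
    for _ in range(n_iter):
        counts = {h: 0.0 for h in haps}
        for g in genotypes:
            # E-step: weight every ordered haplotype pair compatible with g
            pairs = [(a, b) for a, b in product(haps, repeat=2)
                     if a[0] + b[0] == g[0] and a[1] + b[1] == g[1]]
            weights = [freq[a] * freq[b] for a, b in pairs]
            total = sum(weights)
            for (a, b), w in zip(pairs, weights):
                counts[a] += w / total
                counts[b] += w / total
        # M-step: re-estimate frequencies from expected haplotype counts
        n_chromosomes = 2 * len(genotypes)
        freq = {h: c / n_chromosomes for h, c in counts.items()}
    return freq
```

Only the double heterozygote (1, 1) is phase-ambiguous in this two-site case; the other genotypes contribute their haplotypes directly.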
[53] In both the genotyping and haplotyping methods, the identity of a nucleotide (or nucleotide pair) at a polymorphic site may be determined by amplifying a target region containing the polymorphic site directly from one or both copies of the gene, or fragments thereof, and sequencing the amplified region by conventional methods. The genotype or haplotype for the gene of an individual may also be determined by hybridization of a nucleic acid sample containing one or both copies of the gene to nucleic acid arrays and subarrays such as described in WO 95/11995.
[54] Indirect Genotyping Method using Polymorphic Sites in Linkage Disequilibrium with a Target Polymorphism. In addition, the identity of the alleles present at any of the polymorphic sites of the invention may be indirectly determined by genotyping other polymorphic sites in linkage disequilibrium with those sites of interest. As described above, two sites are said to be in linkage disequilibrium if the presence of a particular variant at one site is indicative of the presence of another variant at a second site. Stevens JC, Mol. Diag. 4: 309-317 (1999). Polymorphic sites in linkage disequilibrium with the polymorphic sites of the invention may be located in regions of the same gene or in other genomic regions.
[55] Amplifying a Target Gene Region. The target regions may be amplified using any oligonucleotide-directed amplification method, including but not limited to polymerase chain reaction (PCR) (U.S. Pat. No. 4,965,188), ligase chain reaction (LCR) (Barany et al., Proc. Natl. Acad. Sci. USA 88:189-193 (1991); published PCT patent application WO 90/01069), and oligonucleotide ligation assay (OLA) (Landegren et al., Science 241: 1077-1080 (1988)). Oligonucleotides useful as primers or probes in such methods should specifically hybridize to a region of the nucleic acid that contains or is adjacent to the polymorphic site. Typically, the oligonucleotides are between 10 and 35 nucleotides in length and preferably, between 15 and 30 nucleotides in length. Most preferably, the oligonucleotides are 20 to 25 nucleotides long. The exact length of the oligonucleotide will depend on many factors that are routinely considered and practiced by the skilled artisan.
[56] Other known nucleic acid amplification procedures may be used to amplify the target region including transcription-based amplification systems (U.S. Pat. No. 5,130,238; EP 329,822; U.S. Pat. No. 5,169,766, published PCT patent application WO 89/06700) and isothermal methods (Walker et al., Proc. Natl. Acad. Sci. USA 89:392-396 (1992)).
[57] Hybridizing Allele-Specific Oligonucleotide to a Target Gene. A polymorphism in the target region may be assayed before or after amplification using one of several hybridization-based methods known in the art. Typically, allele-specific oligonucleotides are utilized in performing such methods. The allele-specific oligonucleotides may be used as differently labelled probe pairs, with one member of the pair showing a perfect match to one variant of a target sequence and the other member showing a perfect match to a different variant. In some embodiments, more than one polymorphic site may be detected at once using a set of allele-specific oligonucleotides or oligonucleotide pairs. Preferably, the members of the set have melting temperatures within 5°C, and more preferably within 2°C, of each other when hybridizing to each of the polymorphic sites being detected.
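As a rough illustration of the melting-temperature criterion in paragraph [57], the sketch below checks whether a set of oligonucleotides falls within a chosen Tm spread. It assumes the Wallace rule (2°C per A/T, 4°C per G/C), which is only a coarse approximation for short oligonucleotides and is not a method named in this document; the function names are hypothetical.

```python
def wallace_tm(oligo):
    """Approximate melting temperature in deg C by the Wallace rule.

    Assumption: Tm = 2 * (A + T) + 4 * (G + C); a coarse estimate that
    ignores salt, concentration and nearest-neighbour effects.
    """
    s = oligo.upper()
    at = s.count("A") + s.count("T")
    gc = s.count("G") + s.count("C")
    return 2 * at + 4 * gc

def tm_spread_ok(oligos, max_spread=5.0):
    """Return True if all estimated Tm values lie within max_spread deg C."""
    tms = [wallace_tm(o) for o in oligos]
    return max(tms) - min(tms) <= max_spread
```

Calling `tm_spread_ok(probes, max_spread=2.0)` applies the stricter 2°C preference; a production design would use a nearest-neighbour Tm model instead.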
[58] Hybridization of an allele-specific oligonucleotide to a target polynucleotide may be performed with both entities in solution, or such hybridization may be performed when either the oligonucleotide or the target polynucleotide is covalently or noncovalently affixed to a solid support. Attachment may be mediated, for example, by antibody-antigen interactions, poly-L-Lys, streptavidin or avidin-biotin, salt bridges, hydrophobic interactions, chemical linkages, UV cross-linking, baking, etc. Allele-specific oligonucleotides may be synthesized directly on the solid support or attached to the solid support subsequent to synthesis. Solid supports suitable for use in detection methods of the invention include substrates made of silicon, glass, plastic, paper and the like, which may be formed, for example, into wells (as in 96-well plates), slides, sheets, membranes, fibres, chips, dishes, and beads. The solid support may be treated, coated or derivatised to facilitate the immobilization of the allele-specific oligonucleotide or target nucleic acid.
[59] Determining Population Genotypes and Haplotypes and Correlating Them with a Trait. The invention provides a method for determining the frequency of a genotype or haplotype in a population. The method comprises determining the genotype or the haplotype for a gene present in each member of the population, wherein the genotype or haplotype comprises the nucleotide pair or nucleotide detected at one or more of the polymorphic sites in the gene, and calculating the frequency at which the genotype or haplotype is found in the population. The population may be a reference population, a family population, a same sex population, a population group, or a trait population (e.g., a group of individuals exhibiting a trait of interest such as a medical condition or response to a therapeutic treatment).
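The frequency calculation described in paragraph [59] can be sketched as follows for a single polymorphic site. The input format (two-letter genotype strings such as "AG") and the function names are assumptions made for illustration.

```python
from collections import Counter

def genotype_frequencies(genotypes):
    """Frequency of each unphased genotype at one site.

    genotypes: list of two-letter strings, e.g. ["AA", "AG", "GA"].
    "AG" and "GA" are treated as the same genotype.
    """
    counts = Counter("".join(sorted(g)) for g in genotypes)
    n = sum(counts.values())
    return {g: c / n for g, c in counts.items()}

def allele_frequencies(genotypes):
    """Frequency of each allele, counting two alleles per individual."""
    counts = Counter(allele for g in genotypes for allele in g)
    n = sum(counts.values())
    return {a: c / n for a, c in counts.items()}
```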
[60] In another aspect of the invention, frequency data for genotypes and/or haplotypes found in a reference population are used in a method for identifying an association between a trait and a genotype or a haplotype. The trait may be any detectable phenotype, including but not limited to susceptibility to a disease or response to a treatment. The method involves obtaining data on the frequency of the genotypes or haplotypes of interest in a reference population and comparing the data to the frequency of the genotypes or haplotypes in a population exhibiting the trait. Frequency data for one or both of the reference and trait populations may be obtained by genotyping or haplotyping each individual in the populations using one of the methods described above. The haplotypes for the trait population may be determined directly or, alternatively, by the predictive genotype to haplotype approach described above.
[61] Alternatively, the frequency data for the reference and/or trait populations may be obtained by accessing previously determined frequency data, which may be in written or electronic form. For example, the frequency data may be present in a database that is accessible by a computer. Once the frequency data are obtained, the frequencies of the genotypes or haplotypes of interest in the reference and trait populations are compared.
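One simple way to compare frequencies between the reference and trait populations is a Pearson chi-square test on a 2x2 table of counts. The sketch below is an assumption about how such a comparison could be run, not the test specified in this document; it uses no continuity correction and requires all row and column totals to be non-zero.

```python
import math

def chi_square_2x2(a, b, c, d):
    """Pearson chi-square test (1 d.f.) for the table [[a, b], [c, d]].

    Example layout (hypothetical): rows are trait and reference
    populations, columns are counts of the two alleles.
    Returns (statistic, p_value). All margins must be non-zero.
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    stat = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # Survival function of chi-square with 1 d.f.: erfc(sqrt(x / 2))
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value
```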
[62] When polymorphisms are being analyzed, a calculation may be performed to correct for a significant association that might be found by chance. For statistical methods useful in the methods of the invention, see Statistical Methods in Biology, 3rd edition, Bailey NTJ, (Cambridge Univ. Press, 1997); Waterman MS, Introduction to Computational Biology (CRC Press, 2000) and Bioinformatics, Baxevanis AD & Ouellette BFF editors (John Wiley & Sons, Inc., 2001).
[63] In another embodiment, the haplotype frequency data for different groups are examined to determine whether they are consistent with Hardy-Weinberg equilibrium. D.L. Hartl et al., Principles of Population Genetics, 3rd Ed. (Sinauer Associates, Sunderland, MA, 1997).
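The Hardy-Weinberg check in paragraph [63] can be illustrated for the simplest case. The sketch assumes a single biallelic site with observed genotype counts and a chi-square goodness-of-fit test with one degree of freedom; it does not handle monomorphic sites (expected counts of zero), and the function name is hypothetical.

```python
import math

def hardy_weinberg_test(n_aa, n_ab, n_bb):
    """Chi-square test of Hardy-Weinberg proportions at a biallelic site.

    n_aa, n_ab, n_bb: observed counts of the three genotypes.
    Returns (statistic, p_value) with 1 degree of freedom.
    """
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele a
    q = 1 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value
```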
[64] In another embodiment, statistical analysis is performed by the use of standard ANOVA tests with a Bonferroni correction or a bootstrapping method that simulates the genotype-phenotype correlation many times and calculates a significance value. ANOVA is used to test hypotheses about whether a response variable is caused by or correlates with one or more traits or variables that can be measured. L.D. Fisher & G. van Belle, Biostatistics: A Methodology for the Health Sciences (Wiley-Interscience, New York, 1993) Ch. 10.
[65] In one embodiment for predicting a haplotype pair, the analysis includes an assigning step, as follows: First, each of the possible haplotype pairs is compared to the haplotype pairs in the reference population. Generally, only one of the haplotype pairs in the reference population matches a possible haplotype pair and that pair is assigned to the individual. Occasionally, only one haplotype represented in the reference haplotype pairs is consistent with a possible haplotype pair for an individual, and in such cases the individual is assigned a haplotype pair containing this known haplotype and a new haplotype derived by subtracting the known haplotype from the possible haplotype pair.
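The assigning step of paragraph [65] can be sketched as below. The data layout is an assumption: a genotype is a list of unordered allele pairs, one per site, and haplotypes are strings with one character per site. Returning `None` when the assignment is ambiguous is a choice made for this sketch; the document does not say how such cases are handled.

```python
from itertools import product

def possible_haplotype_pairs(genotype):
    """Enumerate all haplotype pairs consistent with an unphased genotype.

    genotype: list of (allele, allele) tuples, one per polymorphic site.
    Returns a set of sorted (haplotype, haplotype) string tuples.
    """
    pairs = set()
    for choice in product(*[((a, b), (b, a)) for a, b in genotype]):
        h1 = "".join(c[0] for c in choice)
        h2 = "".join(c[1] for c in choice)
        pairs.add(tuple(sorted((h1, h2))))
    return pairs

def assign_haplotype_pair(genotype, reference_pairs):
    """Assign a haplotype pair using pairs seen in a reference population."""
    candidates = possible_haplotype_pairs(genotype)
    reference = {tuple(sorted(p)) for p in reference_pairs}
    matches = candidates & reference
    if len(matches) == 1:
        return next(iter(matches))  # exactly one full pair matches
    if not matches:
        # Fall back to pairs containing one known reference haplotype;
        # the other member is the "new" haplotype left after subtraction.
        known = {h for pair in reference for h in pair}
        partial = [p for p in candidates if p[0] in known or p[1] in known]
        if len(partial) == 1:
            return partial[0]
    return None  # ambiguous or unresolved (sketch-specific behaviour)
```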
[66] In another embodiment, a detectable genotype or haplotype that is in linkage disequilibrium with a genotype or haplotype of interest may be used as a surrogate marker. A genotype that is in linkage disequilibrium with another genotype is indicated where a particular genotype or haplotype for a given gene is more frequent in the population that also demonstrates the potential surrogate marker genotype than in the reference population. If the frequency is statistically significant, then the marker genotype is predictive of that genotype or haplotype, and can be used as a surrogate marker.
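Linkage disequilibrium between a candidate surrogate marker and a site of interest is commonly summarized by D, D' and r². The sketch below computes these from haplotype and allele frequencies; the measures are standard population-genetics quantities rather than ones defined in this document, and the function assumes both sites are polymorphic (allele frequencies strictly between 0 and 1).

```python
def linkage_disequilibrium(p_ab, p_a, p_b):
    """Return (D, D_prime, r_squared) for two biallelic sites.

    p_ab: frequency of the haplotype carrying allele A at site 1 and
          allele B at site 2.
    p_a, p_b: frequencies of allele A and allele B (each in (0, 1)).
    """
    d = p_ab - p_a * p_b
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = d / d_max if d_max > 0 else 0.0
    r_squared = d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))
    return d, d_prime, r_squared
```

An r² near 1 suggests that the marker genotype is highly predictive of the genotype of interest, which is the property a surrogate marker needs.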
[67] Another method for finding correlations between haplotype content and clinical responses uses predictive models based on error-minimizing optimization algorithms, one of which is a genetic algorithm. See R. Judson, "Genetic Algorithms and Their Uses in Chemistry" in Reviews in Computational Chemistry 10: 1-73, K.B. Lipkowitz & D.B. Boyd, eds. (VCH Publishers, New York, 1997). Simulated annealing (Press et al., Numerical Recipes in C: The Art of Scientific Computing, Ch. 10 (Cambridge University Press, Cambridge, 1992)), neural networks (E. Rich & K. Knight, Artificial Intelligence, 2nd Edition, Ch. 10 (McGraw-Hill, New York, 1991)), standard gradient descent methods (Press et al., supra, Ch. 10), or other global or local optimization approaches (see discussion in Judson, supra) can also be used.
[68] Correlating Subject Genotype or Haplotype to Treatment Response. In preferred embodiments, the trait is susceptibility to a disease, severity of a disease, the staging of a disease or response to a drug. Such methods have applicability in developing diagnostic tests and therapeutic treatments for all pharmacogenetic applications where there is the potential for an association between a genotype and a treatment outcome, including efficacy measurements, pharmacokinetic measurements and side-effect measurements.
[69] In another preferred embodiment, the trait of interest is a clinical response exhibited by a patient to some therapeutic treatment, for example, response to a drug targeting the gene product or to a therapeutic treatment for a medical condition.
[70] To deduce a correlation between a clinical response to a treatment and a genotype or haplotype, data are obtained on the clinical responses exhibited by a population of individuals who received the treatment, hereinafter the "clinical population". These clinical data may be obtained by analyzing the results of a clinical trial that has already been run and/or by designing and carrying out one or more new clinical trials.
[71] The individuals included in the clinical population are usually graded for the existence of the medical condition of interest. This grading of potential patients could employ a standard physical exam or one or more lab tests. Alternatively, grading of patients could use haplotyping for situations where there is a strong correlation between haplotype pair and disease susceptibility or severity.
[72] The therapeutic treatment of interest is administered to each individual in the trial population, and each individual's response to the treatment is measured using one or more predetermined criteria. It is contemplated that in many cases, the trial population will exhibit a range of responses and that the investigator will choose the number of responder groups (e.g., low, medium, high) made up by the various responses. In addition, the gene for each individual in the trial population is genotyped and/or haplotyped, which may be done before or after administering the treatment.
[73] These results are then analyzed to determine if any observed variation in clinical response between polymorphism groups is statistically significant. Statistical analysis methods that may be used are described in L.D. Fisher & G. van Belle, Biostatistics: A Methodology for the Health Sciences (Wiley-Interscience, New York, 1993). This analysis may also include a regression calculation of which polymorphic sites in the gene contribute most significantly to the differences in phenotype.
[74] After both the clinical and polymorphism data have been obtained, correlations between individual response and genotype or haplotype content are created. Correlations may be produced in several ways. In one method, individuals are grouped by their genotype or haplotype (or haplotype pair) (also referred to as a polymorphism group), and then the averages and standard deviations of clinical responses exhibited by the members of each polymorphism group are calculated.
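The grouping step in paragraph [74] can be sketched as follows. The record layout, a list of (polymorphism group, clinical response) pairs, and the function name are assumptions for illustration; groups with a single member are given a standard deviation of 0.0 because a sample standard deviation is undefined for one value.

```python
from collections import defaultdict
from statistics import mean, stdev

def summarize_by_group(records):
    """Mean and sample standard deviation of response per polymorphism group.

    records: iterable of (group_label, response_value) pairs.
    Returns {group_label: (mean, standard_deviation)}.
    """
    groups = defaultdict(list)
    for group, response in records:
        groups[group].append(response)
    return {
        group: (mean(values), stdev(values) if len(values) > 1 else 0.0)
        for group, values in groups.items()
    }
```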
[75] From the analyses described above, the skilled artisan may readily construct a mathematical model that predicts clinical response as a function of genotype or haplotype content. The identification of an association between a clinical response and a genotype or haplotype (or haplotype pair) for the gene may be the basis for designing a diagnostic method to determine those individuals who will or will not respond to the treatment, or alternatively, will respond at a lower level and thus may require more treatment, i.e., a greater dose of a drug. The diagnostic method may take one of several forms: for example, a direct DNA test (i.e., genotyping or haplotyping one or more of the polymorphic sites in the gene), a serological test, or a physical exam measurement. The only requirement is that there be a good correlation between the diagnostic test results and the underlying genotype or haplotype. In a preferred embodiment, this diagnostic method uses the predictive haplotyping method described above.
[76] Assigning a Subject to a Genotype Group. As one of skill in the art will understand, there will be a certain degree of uncertainty involved in making this determination. Therefore, the standard deviations of the control group levels would be used to make a probabilistic determination and the methods of this invention would be applicable over a wide range of probability-based genotype group determinations. Thus, for example and not by way of limitation, in one embodiment, if the measured level of the gene expression product falls within 2.5 standard deviations of the mean of any of the control groups, then that individual may be assigned to that genotype group. In another embodiment, if the measured level of the gene expression product falls within 2.0 standard deviations of the mean of any of the control groups, then that individual may be assigned to that genotype group. In still another embodiment, if the measured level of the gene expression product falls within 1.5 standard deviations of the mean of any of the control groups, then that individual may be assigned to that genotype group. In yet another embodiment, if the measured level of the gene expression product falls within 1.0 standard deviation of the mean of any of the control groups, then that individual may be assigned to that genotype group.
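The threshold rule of paragraph [76] can be sketched as below. The control-group summary format and the tie-breaking rule (choose the group whose mean is fewest standard deviations away when several qualify) are assumptions of this sketch, since the document does not state how overlapping groups are resolved.

```python
def assign_genotype_group(level, control_groups, max_sd=2.0):
    """Assign a measured expression level to a genotype group.

    level: measured level of the gene expression product.
    control_groups: {group_name: (mean, standard_deviation)}, with each
        standard deviation greater than zero.
    max_sd: threshold in standard deviations (e.g. 2.5, 2.0, 1.5 or 1.0).
    Returns the closest qualifying group name, or None if none qualifies.
    """
    distances = {
        name: abs(level - mean) / sd
        for name, (mean, sd) in control_groups.items()
    }
    within = {name: z for name, z in distances.items() if z <= max_sd}
    if not within:
        return None
    return min(within, key=within.get)
```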
[77] Thus this process allows determination, with various degrees of probability, of which group a specific subject should be placed in, and such assignment to a genotype group would then determine the risk category into which the individual should be placed.
[78] Correlation between Clinical Response and Genotype or Haplotype. In order to deduce a correlation between clinical response to a treatment and a genotype or haplotype, it is necessary to obtain data on the clinical responses exhibited by a population of individuals who received the treatment, hereinafter the "clinical population." These clinical data may be obtained by analyzing the results of a clinical trial that has already been run and/or by designing and carrying out one or more new clinical trials.
[79] The standard control levels of the gene expression product, thus determined in the different control groups, would then be compared with the measured level of a gene expression product in a given patient. This gene expression product could be the characteristic mRNA associated with that particular genotype group or the polypeptide gene expression product of that genotype group. The patient could then be classified or assigned to a particular genotype group based on how similar the measured levels were to the control levels for a given group.
[80] Computer System for Storing or Displaying Polymorphism Data. The invention also provides a computer system for storing and displaying polymorphism data determined for the gene. The computer system comprises a computer processing unit, a display, and a database containing the polymorphism data. The polymorphism data includes the polymorphisms, the genotypes and the haplotypes identified for a given gene in a reference population. In a preferred embodiment, the computer system is capable of producing a display showing haplotypes organized according to their evolutionary relationships. A computer may implement any or all analytical and mathematical operations involved in practicing the methods of the present invention. In addition, the computer may execute a program that generates views (or screens) displayed on a display device and with which the user can interact to view and analyze large amounts of information relating to the gene and its genomic variation, including chromosome location, gene structure, and gene family, gene expression data, polymorphism data, genetic sequence data, and clinical population data (e.g., data on ethnogeographic origin, clinical responses, genotypes, and haplotypes for one or more populations). The polymorphism data described herein may be stored as part of a relational database (e.g., an instance of an Oracle database or a set of ASCII flat files). These polymorphism data may be stored on the computer's hard drive or may, for example, be stored on a CD-ROM or on one or more other storage devices accessible by the computer. For example, the data may be stored on one or more databases in communication with the computer via a network.
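A relational layout for the polymorphism data of paragraph [80] might look like the sketch below. It uses an in-memory SQLite database purely for illustration (the document mentions Oracle or ASCII flat files); the table names, column names and helper functions are hypothetical.

```python
import sqlite3

def build_polymorphism_db():
    """Create an in-memory database with a minimal, hypothetical schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE polymorphism (
            snp_id  TEXT PRIMARY KEY,   -- e.g. an rs identifier
            gene    TEXT NOT NULL,
            alleles TEXT NOT NULL       -- e.g. 'C/T'
        );
        CREATE TABLE genotype (
            subject_id TEXT NOT NULL,
            snp_id     TEXT NOT NULL REFERENCES polymorphism(snp_id),
            genotype   TEXT NOT NULL,
            PRIMARY KEY (subject_id, snp_id)
        );
        """
    )
    return conn

def genotype_counts(conn, snp_id):
    """Return {genotype: count} for one polymorphic site."""
    cursor = conn.execute(
        "SELECT genotype, COUNT(*) FROM genotype "
        "WHERE snp_id = ? GROUP BY genotype ORDER BY genotype",
        (snp_id,),
    )
    return dict(cursor.fetchall())
```

Haplotype and clinical-response tables could be added in the same way, keyed on the subject identifier.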
[81] Nucleic Acid-based Diagnostics. In another aspect, the invention provides SNP probes, which are useful in classifying subjects according to their types of genetic variation. The SNP probes according to the invention are oligonucleotides, which discriminate between SNPs in conventional allelic discrimination assays. In certain preferred embodiments, the oligonucleotides according to this aspect of the invention are complementary to one allele of the SNP nucleic acid, but not to any other allele of the SNP nucleic acid. Oligonucleotides according to this embodiment of the invention can discriminate between SNPs in various ways. For example, under stringent hybridization conditions, an oligonucleotide of appropriate length will hybridize to one SNP, but not to any other. The oligonucleotide may be labelled using a radiolabel or a fluorescent molecular tag. Alternatively, an oligonucleotide of appropriate length can be used as a primer for PCR, wherein the 3' terminal nucleotide is complementary to one allele containing a SNP, but not to any other allele. In this embodiment, the presence or absence of amplification by PCR determines the haplotype of the SNP.
[82] Genomic and cDNA fragments of the invention comprise at least one polymorphic site identified herein, have a length of at least 10 nucleotides, and may range up to the full length of the gene. Preferably, a fragment according to the present invention is between 100 and 3000 nucleotides in length, and more preferably between 200 and 2000 nucleotides in length, and most preferably between 500 and 1000 nucleotides in length.
[83] Kits of the Invention. The invention provides nucleic acid and polypeptide detection kits useful for haplotyping and/or genotyping the gene in an individual. Such kits are useful for classifying individuals. Specifically, the invention encompasses kits for detecting the presence of a polypeptide or nucleic acid corresponding to a marker of the invention in a biological sample, e.g., any bodily fluid including, but not limited to, serum, plasma, lymph, cystic fluid, urine, stool, cerebrospinal fluid, ascites fluid or blood, and including biopsy samples of body tissue. For example, the kit can comprise a labelled compound or agent capable of detecting a polypeptide or an mRNA encoding a polypeptide corresponding to a marker of the invention in a biological sample and means for determining the amount of the polypeptide or mRNA in the sample, e.g., an antibody which binds the polypeptide or an oligonucleotide probe which binds to DNA or mRNA encoding the polypeptide. Kits can also include instructions for interpreting the results obtained using the kit.
[84] In another embodiment, the invention provides a kit comprising at least two genotyping oligonucleotides packaged in separate containers. The kit may also contain other components such as hybridization buffer (where the oligonucleotides are to be used as a probe) packaged in a separate container. Alternatively, where the oligonucleotides are to be used to amplify a target region, the kit may contain, packaged in separate containers, a polymerase and a reaction buffer optimized for primer extension mediated by the polymerase, such as in the case of PCR. In a preferred embodiment, such kit may further comprise a DNA sample collecting means.
[85] For antibody-based kits, the kit can comprise, e.g., (1) a first antibody, e.g., attached to a solid support, which binds to a polypeptide corresponding to a marker of the invention; and, optionally, (2) a second, different antibody which binds to either the polypeptide or the first antibody and is conjugated to a detectable label.
[86] For oligonucleotide-based kits, the kit can comprise, e.g., (1) an oligonucleotide, e.g., a detectably-labelled oligonucleotide, which hybridizes to a nucleic acid sequence encoding a polypeptide corresponding to a marker of the invention; or (2) a pair of primers useful for amplifying a nucleic acid molecule corresponding to a marker of the invention.
[87] The kit can also comprise, e.g., a buffering agent, a preservative or a protein-stabilizing agent. The kit can further comprise components necessary for detecting the detectable label, e.g., an enzyme or a substrate. The kit can also contain a control sample or a series of control samples, which can be assayed and compared to the test sample. Each component of the kit can be enclosed within an individual container and all of the various containers can be within a single package, along with instructions for interpreting the results of the assays performed using the kit.
[88] Nucleic Acid Sequences of the Invention. In one aspect, the invention comprises one or more isolated polynucleotides. The invention also encompasses allelic variants of the same, that is, naturally occurring alternative forms of the isolated polynucleotides that encode mutant polypeptides that are identical, homologous or related to those encoded by the polynucleotides. Alternatively, non-naturally occurring variants may be produced by mutagenesis techniques or by direct synthesis techniques well-known in the art.
[89] Accordingly, nucleic acid sequences capable of hybridizing at low stringency with any nucleic acid sequences encoding a mutant polypeptide of the present invention are considered to be within the scope of the invention. Standard stringency conditions are well characterized in standard molecular biology cloning texts. See, for example, Molecular Cloning: A Laboratory Manual, 2nd Ed., Sambrook, Fritsch & Maniatis, eds. (Cold Spring Harbor Laboratory Press, 1989); DNA Cloning, Volumes I and II, D.N. Glover, ed. (1985); Oligonucleotide Synthesis, M.J. Gait, ed. (1984); Nucleic Acid Hybridization, B.D. Hames & S.J. Higgins, eds. (1984).
[90] Recombinant Expression Vectors. Another aspect of the invention comprises vectors containing one or more nucleic acid sequences encoding a mutant polypeptide. In practicing the present invention, many conventional techniques in molecular biology, microbiology and recombinant DNA are used. These techniques are well-known and are explained in, e.g., Current Protocols in Molecular Biology, Vols. I-III, Ausubel, ed. (1997); Sambrook et al., Molecular Cloning: A Laboratory Manual, 2nd Ed. (Cold Spring Harbor Laboratory Press, Cold Spring Harbor, NY, 1989); DNA Cloning: A Practical Approach, Vols. I and II, Glover, Ed. (1985); Oligonucleotide Synthesis, Gait, Ed. (1984); Nucleic Acid Hybridization, Hames & Higgins, Eds. (1985); Transcription and Translation, Hames & Higgins, Eds. (1984); Animal Cell Culture, Freshney, ed. (1986); Immobilized Cells and Enzymes (IRL Press, 1986); Perbal, A Practical Guide to Molecular Cloning; the series Methods in Enzymology (Academic Press, Inc., 1984); Gene Transfer Vectors for Mammalian Cells, Miller & Calos, Eds. (Cold Spring Harbor Laboratory, New York, 1987); and Methods in Enzymology, Vols. 154 and 155, Wu & Grossman, and Wu, Eds., respectively.
[91] For recombinant expression of one or more of the polypeptides of the invention, the nucleic acid containing all or a portion of the nucleotide sequence encoding the polypeptide is inserted into an appropriate cloning vector, or an expression vector (i.e., a vector that contains the necessary elements for the transcription and translation of the inserted polypeptide coding sequence) by recombinant DNA techniques well-known in the art.
[92] In the specification, "plasmid" and "vector" can be used interchangeably as the plasmid is the most commonly used form of vector. However, the invention is intended to include such other forms of expression vectors that are not technically plasmids, such as viral vectors (e.g., replication defective retroviruses, adenoviruses and adeno-associated viruses), which serve equivalent functions. Such viral vectors permit infection of a subject and expression in that subject of a compound (Becker et al., Meth. Cell Biol. 43: 161-189 (1994)). Within a recombinant expression vector, "operably linked" is intended to mean that the nucleotide sequence of interest is linked to the regulatory sequences in a manner that allows for expression of the nucleotide sequence (e.g., in an in vitro transcription/translation system or in a host cell when the vector is introduced into the host cell). The term "regulatory sequence" is intended to include promoters, enhancers and other expression control elements (e.g., polyadenylation signals). Such regulatory sequences are described, for example, in Goeddel, Gene Expression Technology: Methods In Enzymology 185 (Academic Press, San Diego, Calif., 1990). Regulatory sequences include those that direct constitutive expression of a nucleotide sequence in many types of host cell and those that direct expression of the nucleotide sequence only in certain host cells (e.g., tissue-specific regulatory sequences). It will be appreciated by those skilled in the art that the design of the expression vector can depend on such factors as the choice of the host cell to be transformed, the level of expression of polypeptide desired, etc. The expression vectors of the invention can be introduced into host cells to thereby produce polypeptides or peptides, including fusion polypeptides, encoded by nucleic acids as described herein (e.g., mutant polypeptides and mutant-derived fusion polypeptides, etc.).
[93] Polypeptide-Expressing Host Cells. Another aspect of the invention pertains to polypeptide-expressing host cells, which contain a nucleic acid encoding one or more mutant polypeptides of the invention. The terms "host cell" and "recombinant host cell" are used interchangeably herein. It is understood that such terms refer not only to the particular subject cell but also to the progeny or potential progeny of such a cell. Because certain modifications may occur in succeeding generations due to either mutation or environmental influences, such progeny may not, in fact, be identical to the parent cell, but are still included within the scope of the term as used herein. A host cell can be any prokaryotic or eukaryotic cell. See Sambrook et al., Molecular Cloning: A Laboratory Manual, 2nd ed. (Cold Spring Harbor Laboratory Press, Cold Spring Harbor, New York, 1989).
[94] Vector DNA can be introduced into prokaryotic or eukaryotic cells via conventional transformation or transfection techniques. As used herein, the terms "transformation" and "transfection" are intended to refer to a variety of art-recognized techniques for introducing foreign nucleic acid (e.g., DNA) into a host cell, including calcium phosphate or calcium chloride co-precipitation, DEAE-dextran-mediated transfection, lipofection, or electroporation. Suitable methods for transforming or transfecting host cells can be found in Sambrook et al., Molecular Cloning: A Laboratory Manual, 2nd ed. (Cold Spring Harbor Laboratory Press, Cold Spring Harbor, New York, 1989), and other laboratory manuals. Methods such as electroporation, particle bombardment, calcium phosphate co-precipitation and viral transduction for introducing DNA into cells are known in the art; therefore, the choice of method may lie with the competence and preference of the skilled practitioner.
[95] To prepare a recombinant cell of the invention, the desired isogene may be introduced into a host cell in a vector such that the isogene remains extrachromosomal. In such a situation, the gene will be expressed by the cell from the extrachromosomal location. In a preferred embodiment, the isogene is introduced into a cell in such a way that it recombines with the endogenous gene present in the cell. Vectors for the introduction of genes both for recombination and for extrachromosomal maintenance are known in the art, and any suitable vector or vector construct may be used in the invention.
[96] The recombinant expression vectors of the invention can be designed for expression of mutant polypeptides in prokaryotic or eukaryotic cells. For example, mutant polypeptide can be expressed in bacterial cells such as Escherichia coli (E. coli), insect cells (using baculovirus expression vectors), fungal cells, e.g., yeast, or mammalian cells. Suitable host cells are discussed further in Goeddel, Gene Expression Technology: Methods In Enzymology 185 (Academic Press, San Diego, Calif., 1990).
[97] Expression of polypeptides in prokaryotes is most often carried out in E. coli with vectors containing constitutive or inducible promoters directing the expression of either fusion or non-fusion polypeptides. Typical fusion expression vectors include pGEX (Pharmacia Biotech Inc; Smith and Johnson, Gene 67: 31-40 (1988)), pMAL (New England Biolabs, Beverly, Mass., USA) and pRIT5 (Pharmacia, Piscataway, N.J., USA) that fuse glutathione S-transferase (GST), maltose E binding protein, or protein A, respectively, to the target recombinant polypeptide. Examples of suitable inducible non-fusion E. coli expression vectors include pTrc (Amann et al., Gene 69: 301-315 (1988)) and pET 11d (Studier et al., Gene Expression Technology: Methods In Enzymology (Academic Press, San Diego, Calif., 1990) pp. 60-89). Other strategies are described by Gottesman, Gene Expression Technology: Methods In Enzymology (Academic Press, San Diego, Calif., 1990) pp. 119-128 and by Wada et al., Nucl. Acids Res. 20: 2111-2118 (1992).
[98] The polypeptide expression vector may be a yeast expression vector. Examples of vectors for expression in the yeast Saccharomyces cerevisiae include pYepSec1 (Baldari et al., EMBO J. 6: 229-234 (1987)), pMFa (Kurjan & Herskowitz, Cell 30: 933-943 (1982)), pJRY88 (Schultz et al., Gene 54: 113-123 (1987)), pYES2 (InVitrogen Corporation, San Diego, Calif., USA), and picZ (InVitrogen Corp, San Diego, Calif., USA). Alternatively, mutant polypeptide can be expressed in insect cells using baculovirus expression vectors. Baculovirus vectors available for expression of polypeptides in cultured insect cells (e.g., SF9 cells) include the pAc series (Smith et al., Mol. Cell. Biol. 3: 2156-2165 (1983)) and the pVL series (Luckow & Summers, Virology 170: 31-39 (1989)). The nucleic acid of the invention may be expressed in mammalian cells using a mammalian expression vector such as pCDM8 (Seed, Nature 329: 842-846 (1987)) or pMT2PC (Kaufman et al., EMBO J. 6: 187-195 (1987)). For other suitable expression systems for both prokaryotic and eukaryotic cells see, e.g., Chapters 16 and 17 of Sambrook et al., Molecular Cloning: A Laboratory Manual, 2nd ed. (Cold Spring Harbor Laboratory Press, Cold Spring Harbor, New York, 1989). Tissue-specific and developmentally-regulated regulatory elements are known in the art. For a discussion of the regulation of gene expression using antisense genes see, e.g., Weintraub et al., "Antisense RNA as a molecular tool for genetic analysis," Trends in Genetics 1(1) (1986).
[99] A host cell that includes a compound of the invention, such as a prokaryotic or eukaryotic host cell in culture, can be used to produce (i.e., express) recombinant mutant polypeptide. Purification of recombinant polypeptides is well-known in the art and includes ion exchange purification techniques, or affinity purification techniques, for example with an antibody to the compound.
[100] Transgenic Animals. Recombinant organisms, i.e., transgenic animals, expressing a variant gene of the invention are prepared using standard procedures known in the art. Transgenic animals carrying the constructs of the invention can be made by several methods known to those having skill in the art. See, e.g., U.S. Pat. No. 5,610,053 and "The Introduction of Foreign Genes into Mice" and the cited references therein, in: Recombinant DNA, Eds. J.D. Watson, M. Gilman, J. Witkowski & M. Zoller (W.H. Freeman and Company, New York) pp. 254-272. Transgenic animals stably expressing a human isogene and producing human protein can be used as biological models for studying diseases related to abnormal expression and/or activity, and for screening and assaying various candidate drugs, compounds, and treatment regimens to reduce the symptoms or effects of these diseases.
[101] Characterizing Gene Expression Level. Methods to detect and measure mRNA levels (i.e., gene transcription level) and levels of polypeptide gene expression products (i.e., gene translation level) are well-known in the art and include the use of nucleotide microarrays and polypeptide detection methods involving mass spectrometers and/or antibody detection and quantification techniques. See also Tom Strachan & Andrew Read, Human Molecular Genetics, 2nd Edition (John Wiley and Sons, Inc. Publication, New York, 1999).
[102] Determination of Target Gene Transcription. The determination of the level of the expression product of the gene in a biological sample, e.g., the tissue or body fluids of an individual, may be performed in a variety of ways. The term "biological sample" is intended to include tissues, cells, biological fluids and isolates thereof, isolated from a subject, as well as tissues, cells and fluids present within a subject. Many expression detection methods use isolated RNA. For in vitro methods, any RNA isolation technique that does not select against the isolation of mRNA can be utilized for the purification of RNA from cells. See, e.g., Ausubel et al., Ed., Curr. Prot. Mol. Biol. (John Wiley & Sons, New York, 1987-1999).
[103] In one embodiment, the level of the mRNA expression product of the target gene is determined. Methods to measure the level of a specific mRNA are well-known in the art and include Northern blot analysis, reverse transcription PCR, real-time quantitative PCR, and hybridization to an oligonucleotide array or microarray. In other more preferred embodiments, the determination of the level of expression may be performed by determination of the level of the protein or polypeptide expression product of the gene in body fluids or tissue samples including but not limited to blood or serum. Large numbers of tissue samples can readily be processed using techniques well-known to those of skill in the art, such as, e.g., the single-step RNA isolation process of U.S. Pat. No. 4,843,155.
[104] The isolated mRNA can be used in hybridization or amplification assays that include, but are not limited to, Southern or Northern analyses, PCR analyses and probe arrays. One preferred diagnostic method for the detection of mRNA levels involves contacting the isolated mRNA with a nucleic acid molecule (probe) that can hybridize to the mRNA encoded by the gene being detected. The nucleic acid probe can be, e.g., a full-length cDNA, or a portion thereof, such as an oligonucleotide of at least 7, 15, 30, 50, 100, 250 or 500 nucleotides in length and sufficient to specifically hybridize under stringent conditions to an mRNA or genomic DNA encoding a marker of the present invention. Other suitable probes for use in the diagnostic assays of the invention are described herein. Hybridization of an mRNA with the probe indicates that the marker in question is being expressed.
[105] In one format, the probes are immobilized on a solid surface and the mRNA is contacted with the probes, for example, in an Affymetrix gene chip array (Affymetrix, Calif., USA). A skilled artisan can readily adapt known mRNA detection methods for use in detecting the level of mRNA encoded by the markers of the present invention.
[106] An alternative method for determining the level of mRNA corresponding to a marker of the present invention in a sample involves the process of nucleic acid amplification, e.g., by RT-PCR (the experimental embodiment set forth in U.S. Pat. No. 4,683,202); ligase chain reaction (Barany et al., Proc. Natl. Acad. Sci. USA 88: 189-193 (1991)); self-sustained sequence replication (Guatelli et al., Proc. Natl. Acad. Sci. USA 87: 1874-1878 (1990)); transcriptional amplification system (Kwoh et al., Proc. Natl. Acad. Sci. USA 86: 1173-1177 (1989)); Q-Beta Replicase (Lizardi et al., Bio/Technology 6: 1197 (1988)); rolling circle replication (U.S. Pat. No. 5,854,033); or any other nucleic acid amplification method, followed by the detection of the amplified molecules using techniques well-known to those of skill in the art. These detection schemes are especially useful for the detection of the nucleic acid molecules if such molecules are present in very low numbers. As used herein, "amplification primers" are defined as being a pair of nucleic acid molecules that can anneal to 5' or 3' regions of a gene (plus and minus strands, respectively, or vice-versa) and contain a short region in between. In general, amplification primers are from about 10-30 nucleotides in length and flank a region from about 50-200 nucleotides in length.
[107] Real-time quantitative PCR (RT-PCR) is one way to assess gene expression levels, e.g., of genes of the invention, e.g., those containing SNPs and polymorphisms of interest. The RT-PCR assay utilizes an RNA reverse transcriptase to catalyze the synthesis of a DNA strand from an RNA strand, including an mRNA strand. The resultant DNA may be specifically detected and quantified and this process may be used to determine the levels of specific species of mRNA. One method for doing this is TAQMAN® (PE Applied Biosystems, Foster City, Calif., USA), which exploits the 5' nuclease activity of AMPLITAQ GOLD™ DNA polymerase to cleave a specific form of probe during a PCR reaction. This is referred to as a TAQMAN™ probe. See Luthra et al., Am. J. Pathol. 153: 63-68 (1998); Kuimelis et al., Nucl. Acids Symp. Ser. 37: 255-256 (1997); and Mullah et al., Nucl. Acids Res. 26(4): 1026-1031 (1998). During the reaction, cleavage of the probe separates a reporter dye and a quencher dye, resulting in increased fluorescence of the reporter. The accumulation of PCR products is detected directly by monitoring the increase in fluorescence of the reporter dye. See Heid et al., Genome Res. 6(6): 986-994 (1996). The higher the starting copy number of nucleic acid target, the sooner a significant increase in fluorescence is observed. See Gibson, Heid & Williams, Genome Res. 6: 995-1001 (1996).
[108] Other technologies for measuring the transcriptional state of a cell produce pools of restriction fragments of limited complexity for electrophoretic analysis, such as methods combining double restriction enzyme digestion with phasing primers (see, e.g., EP 0 534 858 A1), or methods selecting restriction fragments with sites closest to a defined mRNA end (see, e.g., Prashar & Weissman, Proc. Natl. Acad. Sci. USA 93(2): 659-663 (1996)).
[109] Other methods statistically sample cDNA pools, such as by sequencing sufficient bases, e.g., 20-50 bases, in each of multiple cDNAs to identify each cDNA, or by sequencing short tags, e.g., 9-10 bases, which are generated at known positions relative to a defined mRNA end pathway pattern. See, e.g., Velculescu, Science 270: 484-487 (1995). The cDNA levels in the samples are quantified and the mean and standard deviation of each cDNA are determined using standard statistical means well-known to those of skill in the art. See Norman T.J. Bailey, Statistical Methods In Biology, 3rd Edition (Cambridge University Press, 1995).
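The real-time PCR discussion above notes that a higher starting copy number gives an earlier rise in fluorescence. A minimal sketch of that relationship follows. It assumes idealized exponential amplification at a constant per-cycle efficiency; the function names, the threshold value and the default efficiency of 1.0 (perfect doubling) are illustrative assumptions, not parameters from the specification.

```python
import math


def threshold_cycle(start_copies: float, threshold: float, efficiency: float = 1.0) -> float:
    """Fractional cycle at which start_copies * (1 + efficiency)**cycle reaches threshold.

    Assumes ideal exponential amplification with constant efficiency
    (1.0 means the product doubles every cycle).
    """
    if start_copies <= 0 or threshold <= 0 or efficiency <= 0:
        raise ValueError("start_copies, threshold and efficiency must be positive")
    return math.log(threshold / start_copies) / math.log(1.0 + efficiency)


def fold_difference(ct_a: float, ct_b: float, efficiency: float = 1.0) -> float:
    """Estimated ratio of starting template in sample A to that in sample B."""
    return (1.0 + efficiency) ** (ct_b - ct_a)


if __name__ == "__main__":
    threshold = 1e10                       # hypothetical detection threshold, in copies
    ct_high = threshold_cycle(1e5, threshold)
    ct_low = threshold_cycle(1e3, threshold)
    print(ct_high < ct_low)                # more template crosses the threshold sooner
    print(fold_difference(ct_high, ct_low))
```

Under these assumptions, a sample that crosses the threshold several cycles earlier than another is estimated to have started with proportionally more template; in practice the efficiency would need to be measured, for example from a dilution series.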
[110] Detection of Polypeptides. Immunological Detection Methods. Expression of the protein encoded by the genes of the invention can be detected by a probe which is detectably labelled, or which can be subsequently labelled. The term "labelled", with regard to the probe or antibody, is intended to encompass direct labelling of the probe or antibody by coupling, i.e., physically linking, a detectable substance to the probe or antibody, as well as indirect labelling of the probe or antibody by reactivity with another reagent that is directly labelled. Examples of indirect labelling include detection of a primary antibody using a fluorescently-labelled secondary antibody and end-labelling of a DNA probe with biotin such that it can be detected with fluorescently-labelled streptavidin. Generally, the probe is an antibody that recognizes the expressed protein. A variety of formats can be employed to determine whether a sample contains a target protein that binds to a given antibody. Immunoassay methods useful in the detection of target polypeptides of the present invention include, but are not limited to, e.g., dot blotting, western blotting, protein chips, competitive and noncompetitive protein binding assays, enzyme-linked immunosorbent assays (ELISA), immunohistochemistry, fluorescence activated cell sorting (FACS), and others commonly used and widely described in scientific and patent literature, and many employed commercially. A skilled artisan can readily adapt known protein/antibody detection methods for use in determining whether cells express a marker of the present invention and the relative concentration of that specific polypeptide expression product in blood or other body tissues. Proteins from individuals can be isolated using techniques that are well-known to those of skill in the art. The protein isolation methods employed can, e.g., be such as those described in Harlow & Lane, Antibodies: A Laboratory Manual (Cold Spring Harbor Laboratory Press, Cold Spring Harbor, NY, 1988).
[111] For the production of antibodies to a protein encoded by one of the disclosed genes, various host animals may be immunized by injection with the polypeptide, or a portion thereof. Such host animals may include, but are not limited to, rabbits, mice and rats. Various adjuvants may be used to increase the immunological response, depending on the host species, including, but not limited to, Freund's (complete and incomplete); mineral gels, such as aluminium hydroxide; surface active substances, such as lysolecithin, pluronic polyols, polyanions, peptides, oil emulsions, keyhole limpet haemocyanin and dinitrophenol; and potentially useful human adjuvants, such as bacille Calmette-Guerin (BCG) and Corynebacterium parvum.
[112] Monoclonal antibodies (mAbs), which are homogeneous populations of antibodies to a particular antigen, may be obtained by any technique that provides for the production of antibody molecules by continuous cell lines in culture. These include, but are not limited to, the hybridoma technique of Kohler & Milstein, Nature 256: 495-497 (1975); and U.S. Pat. No. 4,376,110; the human B-cell hybridoma technique of Kosbor et al., Immunol. Today 4: 72 (1983); Cole et al., Proc. Natl. Acad. Sci. USA 80: 2026-2030 (1983); and the EBV-hybridoma technique of Cole et al., Monoclonal Antibodies and Cancer Therapy (Alan R. Liss, Inc, 1985) pp. 77-96.
[113] In addition, techniques developed for the production of "chimaeric antibodies" (see Morrison et al., Proc. Natl. Acad. Sci. USA 81: 6851-6855 (1984); Neuberger et al., Nature 312: 604-608 (1984); and Takeda et al., Nature 314: 452-454 (1985)), by splicing the genes from a mouse antibody molecule of appropriate antigen specificity together with genes from a human antibody molecule of appropriate biological activity, can be used. A chimaeric antibody is a molecule in which different portions are derived from different animal species, such as those having a variable or hypervariable region derived from a murine mAb and a human immunoglobulin constant region.
[114] Alternatively, techniques described for the production of single chain antibodies (U.S. Pat. No. 4,946,778; Bird, Science 242: 423-426 (1988); Huston et al., Proc. Natl. Acad. Sci. USA 85: 5879-5883 (1988); and Ward et al., Nature 334: 544-546 (1989)) can be adapted to produce differentially expressed gene single-chain antibodies.
[115] Techniques useful for the production of "humanized antibodies" can be adapted to produce antibodies to the proteins, fragments or derivatives thereof. Such techniques are disclosed in U.S. Pat. Nos. 5,932,448; 5,693,762; 5,693,761; 5,585,089; 5,530,101; 5,569,825; 5,625,126; 5,633,425; 5,789,650; 5,661,016; and 5,770,429.
[116] Antibodies or antibody fragments can be used in methods, such as Western blots or immunofluorescence techniques, to detect the expressed proteins. In such uses, it is generally preferable to immobilize either the antibody or proteins on a solid support. Suitable solid phase supports or carriers include any support capable of binding an antigen or an antibody. Well-known supports or carriers include glass, polystyrene, polypropylene, polyethylene, dextran, nylon, amylases, natural and modified celluloses, polyacrylamides, gabbros and magnetite.
[117] A useful method, for ease of detection, is the sandwich ELISA, of which a number of variations exist, all of which are intended to be used in the methods and assays of the present invention. As used herein, "sandwich assay" is intended to encompass all variations on the basic two-site technique. Immunofluorescence and EIA techniques are both very well-established in the art. However, other reporter molecules, such as radioisotopes, chemiluminescent or bioluminescent molecules may also be employed. It will be readily apparent to the skilled artisan how to vary the procedure to suit the required use.
[118] Whole genome monitoring of protein, i.e., the "proteome," can be carried out by constructing a microarray in which binding sites comprise immobilized, preferably monoclonal, antibodies specific to a plurality of protein species encoded by the cell genome. Preferably, antibodies are present for a substantial fraction of the encoded proteins, or at least for those proteins relevant to testing or confirming a biological network model of interest. As noted above, methods for making monoclonal antibodies are well-known. See, e.g., Harlow & Lane, Antibodies: A Laboratory Manual (Cold Spring Harbor Laboratory Press, Cold Spring Harbor, New York, 1988). In a preferred embodiment, monoclonal antibodies are raised against synthetic peptide fragments designed based on genomic sequence of the cell. With such an antibody array, proteins from the cell are contacted to the array and their binding is measured with assays known in the art.
[119] Detection of Polypeptides. Two-Dimensional Gel Electrophoresis. Two-dimensional gel electrophoresis is well-known in the art and typically involves isoelectric focusing along a first dimension followed by SDS-PAGE electrophoresis along a second dimension. See, e.g., Hames et al., Gel Electrophoresis of Proteins: A Practical Approach (IRL Press, New York, 1990); Shevchenko et al., Proc. Natl. Acad. Sci. USA 93: 14440-14445 (1996); Sagliocco et al., Yeast 12: 1519-1533 (1996); and Lander, Science 274: 536-539 (1996).
[120] Detection of Polypeptides. Mass Spectroscopy. The identity as well as the expression level of a target polypeptide can be determined using mass spectroscopy (MS) techniques. MS-based analysis methodology is useful for analysis of isolated target polypeptide as well as analysis of target polypeptide in a biological sample. MS formats for use in analyzing a target polypeptide include ionization (I) techniques, such as, but not limited to, matrix assisted laser desorption (MALDI), continuous or pulsed electrospray ionization (ESI) and related methods, such as ionspray or thermospray, and massive cluster impact (MCI). Such ion sources can be matched with detection formats, including linear or non-linear reflectron time of flight (TOF), single or multiple quadrupole, single or multiple magnetic sector, Fourier transform ion cyclotron resonance (FTICR), ion trap and combinations thereof such as ion-trap/TOF. For ionization, numerous matrix/wavelength combinations (e.g., matrix assisted laser desorption (MALDI)) or solvent combinations (e.g., ESI) can be employed.
[121] For mass spectroscopy (MS) analysis, the target polypeptide can be solubilised in an appropriate solution or reagent system. The selection of a solution or reagent system, e.g., an organic or inorganic solvent, will depend on the properties of the target polypeptide and the type of MS performed, and is based on methods well-known in the art. See, e.g., Vorm et al., Anal. Chem. 61: 3281 (1994) for MALDI; and Valaskovic et al., Anal. Chem. 67: 3802 (1995), for ESI. MS of peptides also is described, e.g., in International PCT Application No. WO 93/24834 and U.S. Pat. No. 5,792,664. A solvent is selected that minimizes the risk that the target polypeptide will be decomposed by the energy introduced for the vaporization process. A reduced risk of target polypeptide decomposition can be achieved, e.g., by embedding the sample in a matrix. A suitable matrix can be an organic compound such as a sugar, e.g., a pentose or hexose, or a polysaccharide such as cellulose. Such compounds are decomposed thermolytically into CO2 and H2O such that no residues are formed that can lead to chemical reactions. The matrix also can be an inorganic compound, such as nitrate of ammonium, which is decomposed essentially without leaving any residue. Use of these and other solvents is known to those of skill in the art. See, e.g., U.S. Pat. No. 5,062,935. Electrospray MS has been described by Fenn et al., J. Phys. Chem. 88: 4451-4459 (1984); and PCT Application No. WO 90/14148; and current applications are summarized in review articles. See Smith et al., Anal. Chem. 62: 882-89 (1990); and Ardrey, Spectroscopy 4: 10-18 (1992).
[122] The mass of a target polypeptide determined by MS can be compared to the mass of a corresponding known polypeptide. For example, where the target polypeptide is a mutant protein, the corresponding known polypeptide can be the corresponding non-mutant protein, e.g., wild-type protein. With ESI, the determination of molecular weights in femtomole amounts of sample is very accurate due to the presence of multiple ion peaks, all of which can be used for mass calculation. Sub-attomole levels of protein have been detected, e.g., using ESI MS (Valaskovic et al, Science 273: 1199-1202 (1996)) and MALDI MS (Li et al, J. Am. Chem. Soc. 118: 1662-1663 (1996)).
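The comparison described above, in which several ESI ion peaks are used for one mass calculation and the result is compared with a known polypeptide, can be sketched as follows. This is a minimal illustration under stated assumptions: peaks are taken to be protonated ions with known charge, monoisotopic residue masses are used, and the example sequences and function names are hypothetical rather than drawn from the specification.

```python
PROTON = 1.007276   # mass of a proton in Da
WATER = 18.01056    # monoisotopic mass of H2O in Da

# Monoisotopic amino acid residue masses in Da
RESIDUE = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406,
    "N": 114.04293, "D": 115.02694, "Q": 128.05858, "K": 128.09496,
    "E": 129.04259, "M": 131.04049, "H": 137.05891, "F": 147.06841,
    "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}


def polypeptide_mass(sequence: str) -> float:
    """Monoisotopic neutral mass of an unmodified linear polypeptide."""
    return sum(RESIDUE[aa] for aa in sequence.upper()) + WATER


def mass_from_peaks(peaks: list) -> float:
    """Average the neutral mass implied by each (m/z, charge) peak.

    Each peak is assumed to be an [M + zH]z+ ion, so M = z * (m/z - proton).
    """
    masses = [z * (mz - PROTON) for mz, z in peaks]
    return sum(masses) / len(masses)


if __name__ == "__main__":
    wild_type = "ACDEFGHIK"     # hypothetical wild-type sequence
    mutant = "ACDEFAHIK"        # hypothetical Gly -> Ala variant
    m_mut = polypeptide_mass(mutant)
    # Simulate the 1+, 2+ and 3+ ions of the mutant, then recover its mass
    peaks = [((m_mut + z * PROTON) / z, z) for z in (1, 2, 3)]
    measured = mass_from_peaks(peaks)
    print(round(measured - polypeptide_mass(wild_type), 4))
```

A mass shift that matches the residue difference expected for a given substitution is consistent with the presence of the variant; intact proteins are often treated with average rather than monoisotopic masses, which this sketch does not cover.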
[123] Matrix Assisted Laser Desorption (MALDI). The level of the target protein in a biological sample, e.g., body fluid or tissue sample, may be measured by means of mass spectrometric (MS) methods including, but not limited to, those techniques known in the art as matrix-assisted laser desorption/ionization, time-of-flight mass spectrometry (MALDI- TOF-MS) and surfaces enhanced for laser desorption/ionization, time-of-flight mass spectrometry (SELDI-TOF-MS) as further detailed below. Methods for performing MALDI are well-known to those of skill in the art. See, e.g., Juhasz et al, Analysis, Anal. Chem. 68: 941-946 (1996), and see also, e.g., U.S. Pat. Nos. 5,777,325; 5,742,049; 5,654,545; 5,641,959; 5,654,545 and 5,760,393 for descriptions of MALDI and delayed extraction protocols. Numerous methods for improving resolution are also known. MALDI-TOF-MS has been described by Hillenkamp et ah, Biological Mass Spectrometry, Burlingame & McCloskey, eds. (Elsevier Science PubL, Amsterdam, 1990) pp. 49-60. [124] A variety of techniques for marker detection using mass spectroscopy can be used. See Bordeaux Mass Spectrometry Conference Report, Hillenkamp, Ed., pp. 354-362 (1988); Bordeaux Mass Spectrometry Conference Report, Karas & Hillenkamp, Eds., pp. 416-417 (1988); Karas & Hillenkamp, Anal. Chem. 60: 2299-2301 (1988); and Karas et al, Biomed. Environ. Mass Spectrum 18: 841-843 (1989). The use of laser beams in TOF-MS is shown, e.g., in U.S. Patent Nos. 4,694,167; 4,686,366, 4,295,046 and 5,045,694, which are incorporated herein by reference in their entireties. Other MS techniques allow the successful volatilization of high molecular weight biopolymers, without fragmentation, and have enabled a wide variety of biological macromolecules to be analyzed by mass spectrometry. [125] Surfaces Enhanced for Laser Desorption/Ionization (SELDI). 
Other techniques are used which employ new MS probe element compositions with surfaces that allow the probe element to actively participate in the capture and docking of specific analytes, described as Affinity Mass Spectrometry (AMS). See SELDI patents U.S. Pat. Nos. 5,719,060; 5,894,063; 6,020,208; 6,027,942; 6,124,137; and U.S. Patent Application No. US 2003/0003465. Several types of new MS probe elements have been designed with Surfaces Enhanced for Affinity Capture (SEAC). See Hutchens & Yip, Rapid Commun. Mass Spectrom. 7: 576-580 (1993). SEAC probe elements have been used successfully to retrieve and tether different classes of biopolymers, particularly proteins, by exploiting what is known about protein surface structures and biospecific molecular recognition. The immobilized affinity capture devices on the MS probe element surface, i.e., SEAC, determine the location and affinity (specificity) of the analyte for the probe surface; therefore, the subsequent analytical MS process is efficient.
[126] Within the general category of SELDI are three separate subcategories: (1) Surfaces Enhanced for Neat Desorption (SEND), where the probe element surfaces, i.e., sample presenting means, are designed to contain Energy Absorbing Molecules (EAM) instead of "matrix" to facilitate desorption/ionizations of analytes added directly (neat) to the surface. (2) SEAC, where the probe element surfaces, i.e., sample presenting means, are designed to contain chemically defined and/or biologically defined affinity capture devices to facilitate either the specific or non-specific attachment or adsorption (so-called docking or tethering) of analytes to the probe surface, by a variety of mechanisms (mostly non-covalent). (3) Surfaces Enhanced for Photolabile Attachment and Release (SEPAR), where the probe element surfaces, i.e., sample presenting means, are designed or modified to contain one or more types of chemically defined cross-linking molecules to serve as covalent docking devices. The chemical specificities determining the type and number of the photolabile molecule attachment points between the SEPAR sample presenting means (i.e., probe element surface) and the analyte (e.g., protein) may involve any one or more of a number of different residues or chemical structures in the analyte (e.g., His, Lys, Arg, Tyr, Phe and Cys residues in the case of proteins and peptides).
[127] Functionalizing Polypeptides. A polypeptide of interest also can be modified to facilitate conjugation to a solid support. A chemical or physical moiety can be incorporated into the polypeptide at an appropriate position. For example, a polypeptide of interest can be modified by adding an appropriate functional group to the carboxyl terminus or amino terminus of the polypeptide, or to an amino acid in the peptide (e.g., to a reactive side chain, or to the peptide backbone). The artisan will recognize, however, that such a modification, e.g., the incorporation of a biotin moiety, can affect the ability of a particular reagent to interact specifically with the polypeptide and, accordingly, will consider this factor, if relevant, in selecting how best to modify a polypeptide of interest. A naturally-occurring amino acid normally present in the polypeptide also can contain a functional group suitable for conjugating the polypeptide to the solid support. For example, a cysteine residue present in the polypeptide can be used to conjugate the polypeptide to a support containing a sulfhydryl group through a disulfide linkage, e.g., a support having cysteine residues attached thereto. Other bonds that can be formed between two amino acids include, but are not limited to, e.g., monosulfide bonds between two lanthionine residues, which are non-naturally-occurring amino acids that can be incorporated into a polypeptide; a lactam bond formed by a transamidation reaction between the side chains of an acidic amino acid and a basic amino acid, such as between the γ-carboxyl group of Glu (or alpha carboxyl group of Asp) and the amino group of Lys; or a lactone bond produced, e.g., by a crosslink between the hydroxy group of Ser and the carboxyl group of Glu (or alpha carboxyl group of Asp). Thus, a solid support can be modified to contain a desired amino acid residue, e.g., a Glu residue, and a polypeptide having a Ser residue, particularly a Ser residue at the N-terminus or C-terminus, can be conjugated to the solid support through the formation of a lactone bond. The support need not be modified to contain the particular amino acid, e.g., Glu, where it is desired to form a lactone-like bond with a Ser in the polypeptide, but can be modified, instead, to contain an accessible carboxyl group, thus providing a function corresponding to the alpha carboxyl group of Glu.
[128] Thiol-Reactive Functionalities. A thiol-reactive functionality is particularly useful for conjugating a polypeptide to a solid support. A thiol-reactive functionality is a chemical group that can rapidly react with a nucleophilic thiol moiety to produce a covalent bond, e.g., a disulfide bond or a thioether bond. A variety of thiol-reactive functionalities are known in the art, including, e.g., haloacetyls, such as iodoacetyl; diazoketones; epoxy ketones; alpha- and beta-unsaturated carbonyls, such as alpha-enones and beta-enones; and other reactive Michael acceptors, such as maleimide; acid halides; benzyl halides; and the like. See Greene & Wuts, Protective Groups in Organic Synthesis, 2nd Edition (John Wiley & Sons, 1991).
[129] If desired, the thiol groups can be blocked with a photocleavable protecting group, which then can be selectively cleaved, e.g., by photolithography, to provide portions of a surface activated for immobilization of a polypeptide of interest. Photocleavable protecting groups are known in the art (see, e.g., published International PCT Application No. WO 92/10092; and McCray et al., Ann. Rev. Biophys. Biophys. Chem. 18: 239-270 (1989)) and can be selectively de-blocked by irradiation of selected areas of the surface using, e.g., a photolithography mask.
[130] Linkers. A polypeptide of interest can be attached directly to a support via a linker. Any linkers known to those of skill in the art to be suitable for linking peptides or amino acids to supports, either directly or via a spacer, may be used. For example, the polypeptide can be conjugated to a support, such as a bead, by means of a variable spacer. Linkers include Rink amide linkers (see, e.g., Rink, Tetrahedron Lett. 28: 3787 (1976)); trityl chloride linkers (see, e.g., Leznoff, Acc. Chem. Res. 11: 327 (1978)); and Merrifield linkers (see, e.g., Bodansky et al., Peptide Synthesis, 2nd Edition (Academic Press, New York, 1976)). For example, trityl linkers are known. See, e.g., U.S. Pat. Nos. 5,410,068 and 5,612,474. Amino trityl linkers are also known. See, e.g., U.S. Pat. No. 5,198,531. Other linkers include those that can be incorporated into fusion proteins and expressed in a host cell. Such linkers may be selected amino acids, enzyme substrates or any suitable peptide. The linker may be made, e.g., by appropriate selection of primers when isolating the nucleic acid. Alternatively, the linkers may be added by post-translational modification of the protein of interest. Linkers that are suitable for chemically linking peptides to supports include disulfide bonds, thioether bonds, hindered disulfide bonds and covalent bonds between free reactive groups, such as amine and thiol groups.
[131] Cleavable Linkers. A linker can provide a reversible linkage such that it is cleaved under the select conditions. In particular, selectively cleavable linkers, including photocleavable linkers (see U.S. Pat. No. 5,643,722), acid cleavable linkers (see Fattom et al, Infect. Immun. 60: 584-589 (1992)), acid-labile linkers (see Welhöner et al, J. Biol. Chem. 266: 4309-4314 (1991)) and heat sensitive linkers are useful. A linkage can be, e.g., a disulfide bond, which is chemically cleavable by mercaptoethanol or dithioerythritol; a biotin/streptavidin linkage, which can be photocleavable; a heterobifunctional derivative of a trityl ether group, which can be cleaved by exposure to acidic conditions or under conditions of MS (see Köster et al, Tetrahedron Lett. 31: 7095 (1990)); a levulinyl-mediated linkage, which can be cleaved under almost neutral conditions with a hydrazinium/acetate buffer; an arginine-arginine or a lysine-lysine bond, either of which can be cleaved by an endopeptidase, such as trypsin; a pyrophosphate bond, which can be cleaved by a pyrophosphatase; or a ribonucleotide bond, which can be cleaved using a ribonuclease or by exposure to alkaline conditions. A photolabile cross-linker, such as 3-amino-(2-nitrophenyl)propionic acid, can be employed as a means for cleaving a polypeptide from a solid support. Brown et al, Mol. Divers., pp. 4-12 (1995); Rothschild et al, Nucl. Acids. Res. 24: 351-66 (1996); and U.S. Pat. No. 5,643,722. Other linkers include RNA linkers that are cleavable by ribozymes and other RNA enzymes and linkers, such as the various domains, such as CH1, CH2 and CH3, from the constant region of human IgG1. See, Batra et al, Mol. Immunol. 30: 379-396 (1993).
[132] Combinations of any linkers are also contemplated herein.
For example, a linker that is cleavable under MS conditions, such as a silyl linkage or photocleavable linkage, can be combined with a linker, such as an avidin biotin linkage, that is not cleaved under these conditions, but may be cleaved under other conditions. Acid-labile linkers are particularly useful chemically cleavable linkers for mass spectrometry, especially for MALDI-TOF, because the acid labile bond is cleaved during conditioning of the target polypeptide upon addition of a 3-HPA matrix solution. The acid labile bond can be introduced as a separate linker group, e.g., an acid labile trityl group, or can be incorporated in a synthetic linker by introducing one or more silyl bridges using diisopropylsilyl, thereby forming a diisopropylsilyl linkage between the polypeptide and the solid support. The diisopropylsilyl linkage can be cleaved using mildly acidic conditions, such as 1.5% trifluoroacetic acid (TFA) or 3-HPA/1% TFA MALDI-TOF matrix solution. Methods for the preparation of diisopropylsilyl linkages and analogues thereof are well-known in the art. See, e.g., Saha et al., J. Org. Chem. 58: 7827-7831 (1993).
[133] Use of a Pin Tool to Immobilize a Polypeptide. The immobilization of a polypeptide of interest to a solid support using a pin tool can be particularly advantageous. Pin tools include those disclosed herein or otherwise known in the art. See, e.g., U.S. Application Serial Nos. 08/786,988 and 08/787,639; and International PCT Application No. WO 98/20166.
[134] A pin tool in an array, e.g., a 4 x 4 array, can be applied to wells containing polypeptides of interest. Where the pin tool has a functional group attached to each pin tip, or a solid support, e.g., functionalized beads or paramagnetic beads, is attached to each pin, the polypeptides in a well can be captured (1 pmol capacity). During the capture step, the pins can be kept in motion (vertical, 1-2 mm travel) to increase the efficiency of the capture. Where a reaction, such as an in vitro transcription, is being performed in the wells, movement of the pins can increase efficiency of the reaction. Further immobilization can result by applying an electrical field to the pin tool. When a voltage is applied to the pin tool, the polypeptides are attracted to the anode or the cathode, depending on their net charge.
[135] For more specificity, the pin tool (with or without voltage) can be modified to have conjugated thereto a reagent specific for the polypeptide of interest, such that only the polypeptides of interest are bound by the pins. For example, the pins can have nickel ions attached, such that only polypeptides containing a polyhistidine sequence are bound. Similarly, the pins can have antibodies specific for a target polypeptide attached thereto, or to beads that, in turn, are attached to the pins, such that only the target polypeptides, which contain the epitope recognized by the antibody, are bound by the pins.
[136] Captured polypeptides can be analyzed by a variety of means including, e.g., spectrometric techniques, such as UV/VIS, IR, fluorescence, chemiluminescence, NMR spectroscopy, MS or other methods known in the art, or combinations thereof. If conditions preclude direct analysis of captured polypeptides, the polypeptides can be released or transferred from the pins, under conditions such that the advantages of sample concentration are not lost. Accordingly, the polypeptides can be removed from the pins using a minimal volume of eluent, and without any loss of sample. Where the polypeptides are bound to the beads attached to the pins, the beads containing the polypeptides can be removed from the pins and measurements made directly from the beads.
[137] Pin tools can be useful for immobilizing polypeptides of interest in a spatially addressable manner on an array. Such spatially addressable or pre-addressable arrays are useful in a variety of processes, including, for example, quality control and amino acid sequencing diagnostics. The pin tools described in U.S. Application Nos. 08/786,988 and 08/787,639 and International PCT Application No. WO 98/20166 are serial and parallel dispensing tools that can be employed to generate multi-element arrays of polypeptides on a surface of the solid support. The array surface can be flat, with beads, or geometrically altered to include wells, which can contain beads. In addition, MS geometries can be adapted for accommodating a pin tool apparatus.
[138] Other Aspects of the Biological State. In various embodiments of the invention, aspects of the biological activity state, or mixed aspects can be measured in order to obtain drug and pathway responses. The activities of proteins relevant to the characterization of cell function can be measured, and embodiments of this invention can be based on such measurements. Activity measurements can be performed by any functional, biochemical or physical means appropriate to the particular activity being characterized. Where the activity involves a chemical transformation, the cellular protein can be contacted with natural substrates, and the rate of transformation measured. Where the activity involves association in multimeric units, e.g., association of an activated DNA binding complex with DNA, the amount of associated protein or secondary consequences of the association, such as amounts of mRNA transcribed, can be measured. Also, where only a functional activity is known, e.g., as in cell cycle control, performance of the function can be observed. However known and measured, the changes in protein activities form the response data analyzed by the methods of this invention. In alternative and non-limiting embodiments, response data may be formed of mixed aspects of the biological state of a cell. Response data can be constructed from, e.g., changes in certain mRNA abundances, changes in certain protein abundances and changes in certain protein activities.
[139] The following EXAMPLE is presented in order to more fully illustrate the preferred embodiments of the invention. This EXAMPLE should in no way be construed as limiting the scope of the invention, as defined by the appended claims.
EXAMPLE
GENETIC VARIATIONS IN GENES TCF2, ACADSB, CPT1A, ESRRA, PPARD,
PPARGC1A, ACACB and SCD1 ARE ASSOCIATED WITH TYPE 2 DIABETES IN
NORTH AMERICAN CAUCASIANS
[140] Introduction and summary. The purpose of this EXAMPLE is to define type 2 diabetes more accurately by providing biomarkers for the identification of the disease. Such biomarkers will help to identify drug targets for better intervention and to treat patients more effectively on the basis of individualized medicine.
[141] Demographics of population used in T2DM polymorphism association analysis. The analysis consisted of over 1000 type 2 diabetes mellitus (T2DM) participants and over 1000 normal controls. Searches were conducted using USA collections only. Participants were limited to Caucasians only, excluding participants from Central and South America.
[142] Search Criteria for Diabetic Participants. Using Survey Source (US collections only), the following criteria were applied for participant selection: (1) The diabetic participants were confirmed to have type 2 diabetes, not type 1 diabetes. (2) The fasting plasma glucose levels of the participants were > 140 mg/dl for at least one measurement. (3) The haemoglobin A1c (HbA1c) test was > 7 for at least one measurement. 1004 diabetic participants met these criteria. The ages of the diabetic participants were > 30 years and < 80 years. The age of the participants at diagnosis was also > 30 years and < 80 years. The body mass index (BMI) of the diabetic participants was > 25 and < 40.
[143] Search Criteria for Controls. For controls, the following criteria were applied: The controls did not have type 1 or type 2 diabetes. 1609 control participants met these criteria. The ages of the control participants were > 30 years. The body mass index (BMI) of the control participants was > 25 and < 40.
[144] Match Criteria & Case Control Comparisons. Diabetic cases and controls were matched using the following criteria: (1) Exact match for gender. (2) Match for +/- 5 years of age (i.e., maximal age difference between diabetic and controls = 5 years). (3) Match for body mass index (BMI) +/- 5 units. Final maximal difference between diabetic and controls = 4.9 units. Blood samples were collected from each diabetic participant and control.
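By way of illustration only, the three match criteria above can be expressed as a simple predicate. The greedy one-to-one pairing shown here is an assumption made for the sketch; the text does not state how controls were assigned to cases, and the `Participant` type and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Participant:
    gender: str
    age: float  # years
    bmi: float  # body mass index units


def is_match(case, control, max_age_diff=5.0, max_bmi_diff=5.0):
    """Exact gender match, age within +/- 5 years, BMI within +/- 5 units."""
    return (
        case.gender == control.gender
        and abs(case.age - control.age) <= max_age_diff
        and abs(case.bmi - control.bmi) <= max_bmi_diff
    )


def match_cases(cases, controls):
    """Greedy pairing (assumed): each control is used at most once."""
    available = list(controls)
    pairs = []
    for case in cases:
        for i, control in enumerate(available):
            if is_match(case, control):
                pairs.append((case, available.pop(i)))
                break
    return pairs
```

A greedy pass is the simplest choice; an optimal matching procedure could pair more cases when eligible controls overlap.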
[145] Single Nucleotide Polymorphism Genotyping. A total of 115 SNPs in nine genes were successfully genotyped. SNP assays were designed using information from databases such as OMIM, the SNP Consortium, LocusLink and dbSNP.
[146] Genotyping of all SNPs was performed by single base extension followed by Mass Spectroscopy using Sequenom's MassArray™ Technology. Ross et al., Nat. Biotechnol. 16: 1347-1351 (1998). Ascertainment of genotypes on this system is based on matrix-assisted laser desorption/ionization time-of-flight (MALDI-TOF) analysis of homogenous Mass Extension (hME) reaction products. During the hME reaction, the primer is extended by a specific number of nucleotides dependent on the SNP allele and the few bases immediately 3' of the SNP. The extension is terminated by incorporating any one of 3 dideoxynucleotides (ddNTPs) matching a given allele, or continued using a single deoxynucleotide (dNTP), which matches the alternate allele. The hME reaction thus produces allele-specific extension products of different masses (Daltons). Based on the mass differences of these hME products, a number of different assays can be run simultaneously (multiplexing), which provides cost-effective and high-throughput genotyping. Each mass-specific peak is called as a specific allele based on the known sequence of the two extension products from each SNP in the multiplex. The entire genotyping process is supported by automation, with bar-coded tracking through the process. Final genotype data are approved and transferred from the platform system to a database for final statistical analysis. Genotype data are considered passing for a given assay if the pass rate of conservative and moderate calls combined is greater than or equal to 95%, and the nominal HWE p-value is 0.01 or higher.
[147] A list of all SNPs assayed is described in TABLE 2.
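By way of illustration only, the assay pass criterion stated in paragraph [146] can be sketched as below. The function and parameter names are hypothetical, and the sketch assumes the pass rate is computed over all attempted samples for the assay.

```python
def assay_passes(n_conservative, n_moderate, n_attempted, hwe_p,
                 min_call_rate=0.95, min_hwe_p=0.01):
    """Return True if an assay meets both quality thresholds.

    Assumption: the pass rate is (conservative + moderate calls) divided
    by the number of samples attempted for this assay.
    """
    if n_attempted <= 0:
        raise ValueError("n_attempted must be positive")
    call_rate = (n_conservative + n_moderate) / n_attempted
    return call_rate >= min_call_rate and hwe_p >= min_hwe_p
```

For example, an assay with 1800 conservative and 120 moderate calls out of 2000 samples has a pass rate of 96% and passes if its nominal HWE p-value is at least 0.01.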
TABLE 2A
List of genes analyzed in the association analysis
Gene Symbol  Gene Name                     Ref ACC      Major Allele  Minor Allele  Location
SCD          Stearoyl-CoA desaturase       rs735877     C             T             5'UTR
SCD                                        gs229601114  T             C             5'UTR
SCD                                        rs11190479   G             A             5'UTR
SCD                                        rs1054411    C             G             5'UTR
SCD                                        gs229601124  C             G             exon
SCD                                        rs1502593    G             A             intron
SCD                                        rs522951     C             G             intron
SCD                                        rs3870747    C             T             intron
SCD                                        rs3071       A             C             intron
SCD                                        rs3793767    T             C             intron
SCD                                        rs11598233   A             C             exon
SCD                                        rs7849       T             C             3'UTR
SCD                                        rs508384     C             A             3'UTR
SCD                                        rs1393491    T             C             3'UTR
TCF2         Hepatocyte nuclear factor 1β  rs11651755   G             A             intron
ESRRA        Estrogen related receptor α   gs229601623  T             C             5'UTR
ESRRA                                      rs11231746   T             C             5'UTR
ESRRA                                      rs2286613    C             T             5'UTR
ESRRA                                      rs731703     G             A             intron
ESRRA                                      rs2276014    G             A             exon
ESRRA                                      gs229601636  C             T             exon
ESRRA                                      rs11600990   C             T             intron
ESRRA                                      rs2079786    T             G             3'UTR
TABLE 2B
List of genes analyzed in the association analysis
Gene Symbol  Gene Name                 Ref ACC      Major Allele  Minor Allele  Location
ACACB        Acetyl-CoA carboxylase-β  rs1654884    C             T             5'UTR
ACACB                                  rs2878960    T             C             exon
ACACB                                  rs3858707    G             A             intron
ACACB                                  rs2430684    G             A             intron
ACACB                                  rs246090     G             T             intron
ACACB                                  rs34268      G             A             intron
ACACB                                  rs34274      C             T             intron
ACACB                                  rs4766516    C             T             exon
ACACB                                  rs34286      C             T             intron
ACACB                                  rs246092     G             A             intron
ACACB                                  rs6606697    G             A             intron
ACACB                                  rs2284695    T             C             intron
ACACB                                  rs11065765   T             G             exon
ACACB                                  rs2300455    G             A             exon
ACACB                                  rs1016331    C             T             intron
ACACB                                  rs2268403    C             T             intron
ACACB                                  rs7135947    T             C             exon
ACACB                                  rs2287221    T             C             intron
ACACB                                  rs2284694    C             T             intron
ACACB                                  rs2268393    G             A             intron
ACACB                                  rs2268391    T             C             intron
ACACB                                  rs2284690    C             T             intron
ACACB                                  rs2268389    A             G             intron
ACACB                                  rs2268388    G             A             intron
ACACB                                  rs2239607    A             G             intron
ACACB                                  rs2300453    T             G             intron
ACACB                                  rs2300452    G             A             intron
ACACB                                  rs3742027    C             T             intron
ACACB                                  rs2241220    C             T             exon
ACACB                                  rs2160602    A             T             intron
ACACB                                  rs759560     C             T             intron
ACACB                                  rs2284689    G             A             intron
ACACB                                  rs2284685    G             C             intron
ACACB                                  rs2075259    G             A             intron
ACACB                                  rs3742023    C             T             exon
ACACB                                  rs2075260    A             G             exon
ACACB                                  rs2075263    T             C             3'UTR
ACACB                                  rs882355     A             G             3'UTR
TABLE 2C
List of genes analyzed in the association analysis
Gene Symbol  Gene Name                                                         Ref ACC      Major Allele  Minor Allele  Location
PPARGC1A     Peroxisome proliferation-activated receptor gamma coactivator 1α  rs1532195    G             C             3'UTR
PPARGC1A                                                                       rs1532196    T             G             3'UTR
PPARGC1A                                                                       rs3774923    G             A             3'UTR
PPARGC1A                                                                       rs768695     T             C             intron
PPARGC1A                                                                       rs6833873    G             A             intron
PPARGC1A                                                                       rs2932965    C             T             intron
PPARGC1A                                                                       rs3774921    A             G             intron
PPARGC1A                                                                       rs3736265    C             T             exon
PPARGC1A                                                                       rs2932967    G             C             intron
PPARGC1A                                                                       rs3755863    G             A             exon
PPARGC1A                                                                       rs2970847    G             A             exon
PPARGC1A                                                                       rs1472095    G             A             intron
PPARGC1A                                                                       rs2932974    T             C             intron
PPARGC1A                                                                       rs3774907    T             C             intron
PPARGC1A                                                                       rs6839966    G             A             intron
PPARGC1A                                                                       rs10032795   G             A             intron
PPARGC1A                                                                       rs6838835    C             T             intron
PPARGC1A                                                                       rs7437482    T             G             intron
PPARGC1A                                                                       rs7657517    A             C             intron
PPARGC1A                                                                       rs6850464    T             C             intron
PPARGC1A                                                                       rs4235308    A             G             intron
PPARGC1A                                                                       rs7438250    T             C             intron
PPARGC1A                                                                       rs4637388    T             C             intron
PPARGC1A                                                                       rs4361373    A             G             intron
PPARGC1A                                                                       rs2305683    G             A             intron
PPARGC1A                                                                       rs4469064    T             C             intron
PPARGC1A                                                                       rs2946385    C             A             intron
PPARGC1A                                                                       rs3774902    C             T             intron
PPARGC1A                                                                       rs2970870    T             C             5'UTR
PPARGC1A                                                                       rs1878949    T             G             5'UTR
Figure imgf000048_0001
TABLE 2E
List of genes analyzed in the association analysis
Gene Symbol  Gene Name                          Ref ACC      Major Allele  Minor Allele  Location
ACADSB       Acyl-CoA dehydrogenase             rs10510119   A             G             5'UTR
ACADSB                                          rs4511227    G             A             5'UTR
ACADSB                                          rs3824685    T             G             5'UTR
ACADSB                                          rs6599641    G             T             intron
ACADSB                                          rs7912404    T             C             intron
ACADSB                                          rs7070793    G             A             intron
ACADSB                                          rs2421139    G             A             intron
ACADSB                                          rs10902866   C             T             exon
ACADSB                                          rs4128783    G             T             intron
ACADSB                                          gs229601113  A             G             exon
ACADSB                                          rs2421166    G             A             intron
ACADSB                                          rs3763738    G             C             3'UTR
ACADSB                                          rs2118318    G             A             3'UTR
CPT1A        Carnitine palmitoyl transferase I  rs2924699    G             A             intron
CPT1A                                           rs2123869    T             C             intron
CPT1A                                           rs2228502    C             T             exon
CPT1A                                           rs2305508    A             G             intron
CPT1A                                           rs747840     T             C             intron
CPT1A                                           rs11228356   G             A             exon
CPT1A                                           gs229601477  G             A             exon
CPT1A                                           gs229601476  G             A             exon
CPT1A                                           gs229601475  G             A             exon
CPT1A                                           rs3794020    G             A             intron
CPT1A                                           gs229601471  G             A             exon
CPT1A                                           rs6591356    A             G             intron
CPT1A                                           rs3019607    T             G             5'UTR
CPT1A                                           rs3019613    T             C             5'UTR
CPT1A                                           rs879784     G             A             5'UTR
CPT1A                                           rs1017641    A             G             5'UTR
CPT1A                                           rs613084     G             T             5'UTR
CPT1A                                           rs3019594    A             G             5'UTR
CPT1A                                           rs597316     C             G             5'UTR
CPT1A                                           rs2060982    T             C             5'UTR
[148] Statistical Methods for SNP Association Tests. Single Marker Association Tests. Tests of association between affectation status and each of the markers were performed on the basis of both allele frequency and genotype frequency. For the allele frequency measures, the contingency table summarizing the observed count of each allele in the two affectation groups was compared to the expectation from the null hypothesis of no association with a χ² test with 1 degree of freedom (df). For contingency tables having at least one sparse margin (< 10 observed counts), the Fisher exact test was used instead. Similarly, for associations based on genotype, the counts of genotypes in the two affectation groups were compared to the expectation for no association of disease with genetics using a χ² test with 2 df, or the Fisher exact test for contingency tables with at least one sparse margin. In addition, to detect genotype-based association derived from recessive or dominant modes of inheritance, the count of heterozygous genotypes was combined with the count of homozygous genotypes for the major or minor alleles, respectively, in constructing contingency tables for the observed data. Again, either a χ² test with 1 df or a Fisher exact test was used to assess the deviation from the null hypothesis of no association with affectation status given the specific model of inheritance.
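By way of illustration only, the allele-based and collapsed-genotype tests described above can be sketched as follows. The sketch assumes a Pearson χ² statistic without continuity correction (the text does not say whether a correction was applied), omits the Fisher exact fallback for sparse margins, and assumes no margin of the table is zero. Genotype counts are given as hypothetical tuples (major homozygote, heterozygote, minor homozygote).

```python
import math


def chi2_2x2(a, b, c, d):
    """Pearson chi-square for the table [[a, b], [c, d]]; returns (stat, p), 1 df."""
    n = a + b + c + d
    stat = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # Survival function of a chi-square variable with 1 df.
    return stat, math.erfc(math.sqrt(stat / 2.0))


def allele_counts(genotypes):
    """(major hom, het, minor hom) -> (major allele count, minor allele count)."""
    n_major_hom, n_het, n_minor_hom = genotypes
    return 2 * n_major_hom + n_het, n_het + 2 * n_minor_hom


def allele_test(case_gt, control_gt):
    """Allele-frequency test; returns (stat, p, odds ratio for the minor allele)."""
    case_major, case_minor = allele_counts(case_gt)
    ctrl_major, ctrl_minor = allele_counts(control_gt)
    stat, p = chi2_2x2(case_minor, case_major, ctrl_minor, ctrl_major)
    odds = (case_minor * ctrl_major) / (case_major * ctrl_minor)
    return stat, p, odds


def collapsed_test(case_gt, control_gt, model):
    """Dominant: het pooled with minor hom. Recessive: het pooled with major hom."""
    def split(gt):
        major_hom, het, minor_hom = gt
        if model == "dominant":
            return het + minor_hom, major_hom
        if model == "recessive":
            return minor_hom, major_hom + het
        raise ValueError("model must be 'dominant' or 'recessive'")
    a, b = split(case_gt)
    c, d = split(control_gt)
    return chi2_2x2(a, b, c, d)
```

With 1 df the p-value has the closed form erfc(sqrt(x/2)), which keeps the sketch free of external dependencies.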
[149] For both allele-based and genotype-based hypothesis testing, significance from the analytic distributions was confirmed by the standard re-sampling procedure of Westfall P & Young S, Resampling-based multiple testing: examples and methods for p-value adjustment (Wiley, New York, 1993) p. 340. To estimate the null distributions, test statistics were computed for at least 5000 simulated data sets, each having the affectation status re-assigned to the subjects by sampling at random without replacement while preserving the genotype assignments from the original data. To further assess the significance of the associations, the simulated null distributions were also used to compute corrected p-values for the markers. Each group of SNPs (i.e., all SNPs in a single gene or all singleton SNPs together) was viewed as a distinct set of multiple hypotheses. For each group, the statistic of the best marker was compared with the maximum value distribution derived from the re-sampled data sets to estimate a corrected p-value. For the statistics of the remaining markers in each group, corrected p-values were derived from comparison to the estimated maximum value distribution from the re-sampled data of subsets of markers from the original group, containing only those markers that had uncorrected p-values greater than or equal to the test marker p-value. Occasionally, these nested null distributions for the maximum values yielded corrected p-values that were not ordered the same way as their uncorrected counterparts. For these markers, monotonicity was imposed by setting the corrected p-value equal to the corrected p-value for the marker with the closest, smaller uncorrected p-value. See, Westfall P & Young S, Resampling-based multiple testing: examples and methods for p-value adjustment (Wiley, New York, 1993) p. 340; Schaid DJ et al, Am. J. Hum. Genet. 70(2): 425-34 (2002).
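By way of illustration only, the re-sampling correction described above resembles a step-down maximum-statistic procedure, which can be sketched as below. This is a simplified sketch under stated assumptions: genotypes are coded as minor-allele counts (0, 1 or 2) with no missing data, markers are ranked by test statistic rather than by analytic p-value, monotonicity is imposed with a running maximum, and all function names are hypothetical.

```python
import random


def allele_chi2(minor_counts, status):
    """Allele-based chi-square statistic for one marker (status: 1 = case, 0 = control)."""
    a = sum(g for g, s in zip(minor_counts, status) if s == 1)
    c = sum(g for g, s in zip(minor_counts, status) if s == 0)
    n_case = sum(status)
    b = 2 * n_case - a
    d = 2 * (len(status) - n_case) - c
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0
    return (a + b + c + d) * (a * d - b * c) ** 2 / denom


def stepdown_max_stat(genotypes, status, n_perm=5000, seed=0, stat_fn=allele_chi2):
    """Corrected p-values for a group of markers by permuting affectation status.

    genotypes: one list of minor-allele counts per marker, in subject order.
    """
    rng = random.Random(seed)
    observed = [stat_fn(g, status) for g in genotypes]
    order = sorted(range(len(observed)), key=lambda i: observed[i], reverse=True)
    exceed = [0] * len(order)
    labels = list(status)
    for _ in range(n_perm):
        rng.shuffle(labels)  # re-assign status; genotypes stay with subjects
        perm = [stat_fn(g, labels) for g in genotypes]
        running_max = float("-inf")
        # Walk from the weakest marker upward so each marker is compared with
        # the maximum over itself and all weaker markers only.
        for rank in range(len(order) - 1, -1, -1):
            running_max = max(running_max, perm[order[rank]])
            if running_max >= observed[order[rank]]:
                exceed[rank] += 1
    corrected = [0.0] * len(order)
    previous = 0.0
    for rank, idx in enumerate(order):
        previous = max(previous, exceed[rank] / n_perm)  # impose monotonicity
        corrected[idx] = previous
    return corrected
```

The best marker is compared with the maximum over the whole group, and each subsequent marker with the maximum over the progressively smaller subset of weaker markers, which mirrors the nested null distributions described in the text.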
[150] To assess the genotyping quality and to detect potential demographic structure in the analyzed population, observed genotype frequencies for each SNP were compared to the Hardy-Weinberg expectation based on allele frequencies estimated from the sample by the χ² test with 1 df.
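By way of illustration only, the Hardy-Weinberg comparison described above can be sketched as follows. The function name is hypothetical, and the sketch assumes a plain Pearson χ² statistic with allele frequencies estimated from the same sample.

```python
import math


def hwe_chi2(n_major_hom, n_het, n_minor_hom):
    """Chi-square test (1 df) of observed genotype counts against Hardy-Weinberg
    expectation; returns (stat, p)."""
    n = n_major_hom + n_het + n_minor_hom
    p_major = (2 * n_major_hom + n_het) / (2 * n)
    q_minor = 1.0 - p_major
    expected = (n * p_major ** 2, 2 * n * p_major * q_minor, n * q_minor ** 2)
    observed = (n_major_hom, n_het, n_minor_hom)
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Survival function of a chi-square variable with 1 df.
    return stat, math.erfc(math.sqrt(stat / 2.0))
```

Three genotype classes minus one for the total and one for the estimated allele frequency leave 1 df, which is why the same closed-form p-value as for a 2 x 2 table applies.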
[151] The R statistical package (Becker R et al, The new S language: A programming environment for data analysis and graphics (Wadsworth & Brooks/Cole Advanced Books, Pacific Grove, 1988) p. 702) was used in this analysis, as well as haploview (Barrett JC et al, "Haploview: analysis and visualization of LD and haplotype maps" Bioinformatics (2004)) for analysis of the haplotype block structure.
[152] Haplotype Analysis. All haplotype association analyses were performed using the haplo.score program from the haplo.stats package in R, using the recommended parameters for a binary trait. See, Becker R, Chambers J & Wilks A, The new S language: a programming environment for data analysis and graphics (Wadsworth & Brooks/Cole Advanced Books, Pacific Grove, 1988) p. 702; Schaid DJ et al, Am. J. Hum. Genet. 70(2): 425-34 (2002). Haplo.score infers haplotypes from genotype data using the EM algorithm and tests association with a trait both for the individual haplotypes at a locus and for the entire set of haplotypes. The significance of association for both tests is determined from analytic distributions and from an empirical null distribution by re-sampling, with the empirical distribution used also to assess the maximum value statistic for the most tightly associated haplotype as above. The manual for haplo.score warns that subjects with missing genotype data will dramatically degrade haplotype inference, and our experience reinforced this caution. In particular, for the gene ACACB, haplotype inference with haplo.score and the complete data would not converge unless the locus was divided into two groups (SNPs T27456C-A70941G and A71434G-A136178G), and even then the parameters had to be changed from their default settings. Specifically, the critical parameter "insert batch size" had to be reduced from its default setting of 6 (see TABLE 3). For each gene, the analysis was therefore performed not only on the whole data set but also on just the subset of individuals who had complete genotype information for all SNPs and for a reduced number of "tag" SNPs that were expected to capture the major haplotype diversity as determined from the block structure by the program haploview. 
The "tag" SNP method represents a compromise between the haplotype inference using all of the SNPs including individuals with some missing genotyping data and using all of the SNPs including only individuals with complete data. See TABLE 3 for the number of eligible subjects in the different modes of the analysis.
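By way of illustration only, the idea of inferring haplotypes from unphased genotypes with the EM algorithm can be sketched for the simplest case of two biallelic SNPs. This is not the haplo.score implementation, which handles many loci and missing data; the function name and genotype coding (minor-allele counts 0, 1 or 2 at each SNP, no missing values, at least one subject) are assumptions of the sketch.

```python
def em_two_snp_haplotypes(genotypes, n_iter=100, tol=1e-10):
    """Estimate frequencies of haplotypes '00', '01', '10', '11' by EM.

    genotypes: list of (g1, g2) minor-allele counts. Only double
    heterozygotes (1, 1) have ambiguous phase: either 00/11 or 01/10.
    """
    counts = {"00": 0.0, "01": 0.0, "10": 0.0, "11": 0.0}
    n_double_het = 0
    for g1, g2 in genotypes:
        if g1 == 1 and g2 == 1:
            n_double_het += 1
            continue
        # Phase is unambiguous: read off the two chromosomes directly.
        first = f"{1 if g1 == 2 else 0}{1 if g2 == 2 else 0}"
        second = f"{1 if g1 >= 1 else 0}{1 if g2 >= 1 else 0}"
        counts[first] += 1
        counts[second] += 1
    n_chrom = 2 * len(genotypes)
    freqs = {h: 0.25 for h in counts}
    for _ in range(n_iter):
        # E-step: probability that a double heterozygote carries 00/11.
        cis = freqs["00"] * freqs["11"]
        trans = freqs["01"] * freqs["10"]
        w = cis / (cis + trans) if cis + trans > 0 else 0.5
        # M-step: update frequencies from expected haplotype counts.
        expected = dict(counts)
        expected["00"] += n_double_het * w
        expected["11"] += n_double_het * w
        expected["01"] += n_double_het * (1 - w)
        expected["10"] += n_double_het * (1 - w)
        new_freqs = {h: c / n_chrom for h, c in expected.items()}
        delta = max(abs(new_freqs[h] - freqs[h]) for h in freqs)
        freqs = new_freqs
        if delta < tol:
            break
    return freqs
```

With more SNPs the number of phase-ambiguous genotypes grows quickly, which is consistent with the convergence difficulties reported above for ACACB.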
TABLE 3A
Details of the haplotype.score inference: all subjects, all SNPs
Parameter     ACACB block.1  ACACB block.2  ESRRA  PPARD
n.snps        24             16             8      22
insert.batch  2              3              6      3
n control     1000           1000           992    998
n case        1001           1001           1000   1001
TABLE 3B
Details of the haplotype.score inference
Subject set   All subjects with complete genotype, no tags  All subjects with complete genotype at tag SNPs
Parameter     ACACB   ESRRA   PPARD                         ACACB   ESRRA   PPARD
n.snps        38      8       22                            23      4       14
insert.batch  6       6       6                             6       6       6
n control     534     900     791                           687     959     840
n case        568     896     806                           698     983     867
[153] Stearoyl-CoA desaturase (SCD) Association. Genetic variations in gene stearoyl-CoA desaturase (SCD) are associated with type 2 diabetes. 3 SNPs (rs3870747, rs7849 and rs1393491) of the 14 SNPs genotyped showed statistically significant (P < 0.05) association with diabetes phenotype by genotype analysis using all three genotypes (co-dominant model).
[154] SCD catalyzes a rate-limiting step in the synthesis of unsaturated fatty acids and lipogenesis. The principal product of SCD is oleic acid, which is formed by desaturation of stearic acid. Analysis of transcription profiling of liver from genetically obese mice (ob/ob) showed that leptin specifically represses RNA levels and enzymatic activity of SCD1. Mice lacking Scd1 were lean and hypermetabolic; ob/ob mice with mutations in Scd1 were significantly less obese than ob/ob controls and had markedly increased energy expenditure. Ob/ob mice with mutations in SCD1 had histologically normal livers with significantly reduced triglyceride storage and VLDL production. The main consequence of SCD1 deficiency is hypothesized to be an activation of lipid oxidation in addition to reduced triglyceride synthesis and storage. Cohen P et al., Science 297: 240-243 (2002).
[155] SCD inhibition is therefore a promising approach to treat obesity, insulin resistance, fatty liver disease and T2DM.
[156] TABLE 4 summarizes the SCD SNPs used in this analysis. Of the 14 genotyped successfully, 3 SNPs (rs3870747, rs7849 and rs1393491) showed statistically significant (P < 0.05) association with diabetes phenotype by genotype analysis using all three genotypes (co-dominant model). The TABLE provides the frequency of each SNP in controls and diabetic patients and the results of two association analytical methods (by allele and by genotype). The Hardy-Weinberg Equilibrium (HWE), which is used to measure the population admixture of the samples, holds for two of these SNPs; the exception is rs3870747.
TABLE 4A
Association analysis based on allele (1df)
Ref ACC X-squared p.val df Odds ci.hi ci.low perm p-val cor p-val
Rs735877 0.0573 0.8108 1 0.9823 1.1174 0.8634 0.7884 0.9986
Gs229601114 1.5360 0.2152 1 0.8760 1.0712 0.7163 0.2000 0.7098
Rs11190479 1.8435 0.1745 1 1.2187 1.5997 0.9284 0.1614 0.6794
Rs1054411 0.0007 0.9793 1 0.9962 1.1309 0.8776 0.9616 0.9986
Gs229601124 -1.0000 1.0100 1 -1.0000 -1.0000 -1.0000 9.9000 0.9986
Rsl502593 0.3995 0.5273 1 0.9571 1.0891 0.8412 0.5084 0.9486
Rs522951 0.0484 0.8259 1 0.9839 1.1160 0.8675 0.8070 0.9986
Rs3870747 * 1.2642 0.2608 1 1.1344 1.3995 0.9196 0.2488 0.7328
Rs3071 0.0485 0.8258 1 0.9823 1.1268 0.8564 0.7916 0.9986
Rs3793767 0.0003 0.9858 1 1.0033 1.1391 0.8836 0.9652 0.9986
RsI 1598233 0.0010 0.9745 1 1.0041 1.1394 0.8849 0.9506 0.9986
Rs7849 0.3129 0.5759 1 1.0504 1.2334 0.8946 0.5406 0.9578
Rs508384 0.5845 0.4445 1 1.0683 1.2547 0.9096 0.4118 0.8876
Rsl393491 0.2569 0.6123 1 1.0466 1.2315 0.8894 0.5830 0.9690
* This SNP is not in Hardv-Weiri berε eαuili brium in this oartici nant DODulε ition.
TABLE 4B
Association analysis based on genotypes (2df)
Ref ACC X-squared p.val df perm p-val cor p-val
Rs735877 0.6952 0.7064 2 0.7026 0.9806
Gs229601114 1.7489 0.4171 2 0.4076 0.9068
Rs11190479 3.0033 0.0831 1 0.0696 0.3856
Rs1054411 0.0289 0.9856 2 0.9864 0.9980
Gs229601124 -1.0000 1.0100 0 9.9000 0.9980
Rs1502593 0.7001 0.7047 2 0.7084 0.9806
Rs522951 0.0651 0.9680 2 0.9724 0.9980
Rs3870747 * 7.3557 0.0253 2 0.0256 0.1894
Rs3071 1.3738 0.5031 2 0.4948 0.9252
Rs3793767 0.5847 0.7465 2 0.7468 0.9806
Rs11598233 1.0255 0.5988 2 0.5956 0.9418
Rs7849 * 8.4835 0.0144 2 0.0144 0.1190
Rs508384 * 7.8757 0.0195 2 0.0192 0.1534
Rs1393491 5.3419 0.0692 2 0.0720 0.3856
* This SNP is not in Hardy-Weinberg equilibrium in this participant population.
TABLE 4C
2df dominant 2df recessive
Ref ACC X-squared p.val df X-squared p.val df
Rs735877 0.0101 0.9199 1 0.4357 0.5092 1
Gs229601114 1.2559 0.2624 1 0.4112 0.5213 1
Rs11190479 2.4678 0.1162 1 0.5917 0.5044 0
Rs1054411 0.0071 0.9331 1 0.0000 0.9990 1
Gs229601124 1.0112 1.0000 0 0.0613 0.8044 1
Rs1502593 0.0691 0.7926 1 0.6004 0.4384 1
Rs522951 0.0148 0.9033 1 0.0336 0.8546 1
Rs3870747 * 2.8932 0.0890 1 1.8483 0.1740 1
Rs3071 0.0416 0.8383 1 0.8622 0.3531 1
Rs3793767 0.1481 0.7004 1 0.1526 0.6960 1
Rs11598233 0.2768 0.5988 1 0.2900 0.5902 1
Rs7849 2.0355 0.1537 1 3.6704 0.0554 1
Rs508384 2.4701 0.1160 1 2.7748 0.0958 1
Rs1393491 1.3633 0.2430 1 2.1375 0.1437 1
* This SNP is not in Hardy-Weinberg equilibrium in this participant population.
[157] In summary, our results suggest that multiple SCD variants are involved in T2DM. These SNPs can be used to improve T2DM diagnosis and to help design clinical trials through better patient stratification.
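The three genotype models named in TABLE 4 (co-dominant, dominant and recessive) can be illustrated with a minimal sketch. It is not the analysis code used for this study: it assumes an ordinary Pearson chi-square on a case/control by genotype contingency table, collapsed to two columns for the dominant and recessive models. The function names and the genotype counts are hypothetical. The p-value formulas for 1 and 2 degrees of freedom are exact closed forms of the chi-square upper tail.

```python
import math

def chi_square(table):
    """Pearson chi-square statistic and degrees of freedom for a 2-D count table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df

def p_value(stat, df):
    """Upper-tail chi-square probability; closed forms for df = 1 and df = 2 only."""
    if df == 1:
        return math.erfc(math.sqrt(stat / 2.0))
    if df == 2:
        return math.exp(-stat / 2.0)
    raise ValueError("only df = 1 or 2 are handled in this sketch")

def genotype_models(cases, controls):
    """cases/controls are counts of (AA, Aa, aa); 'a' is the allele under test."""
    tables = {
        # co-dominant: all three genotypes kept separate (2 df)
        "co-dominant": [list(cases), list(controls)],
        # dominant: AA versus carriers of 'a' (Aa + aa), 1 df
        "dominant": [[cases[0], cases[1] + cases[2]],
                     [controls[0], controls[1] + controls[2]]],
        # recessive: (AA + Aa) versus aa, 1 df
        "recessive": [[cases[0] + cases[1], cases[2]],
                      [controls[0] + controls[1], controls[2]]],
    }
    results = {}
    for name, table in tables.items():
        stat, df = chi_square(table)
        results[name] = (stat, df, p_value(stat, df))
    return results

# Hypothetical genotype counts, for illustration only (not data from this study).
results = genotype_models(cases=(120, 200, 80), controls=(160, 190, 50))
for name, (stat, df, p) in results.items():
    print(f"{name}: X-squared={stat:.4f} df={df} p={p:.4g}")
```

As a consistency check, `p_value(7.3557, 2)` reproduces the p.val of 0.0253 reported for Rs3870747 in TABLE 4B.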
[158] Hepatocyte nuclear factor 1β (HNF1β, HNF2, TCF2) Association. The G allele of SNP rs11651755 is associated with higher incidence of T2DM by both allele-specific analysis (p<0.01) and genotype analysis using both dominant and recessive models (p<0.05).
[159] Mutations in the homeodomain-containing transcription factor hepatocyte nuclear factor-1β (HNF-1β, HNF2, TCF2) are known to cause a rare subtype of maturity-onset diabetes of the young (MODY5), which is often associated with early-onset progressive non-diabetic renal dysfunction. Horikawa Y et al., (Letter) Nature Genet. 17: 384-385 (1997). We hypothesized that TCF2 could also be a candidate gene for the more common form of type 2 diabetes.
[160] SNP rs11651755, located within the intronic region of TCF2, was used for this analysis. SEQ ID NO: 166 shows the exact location of the variant within the surrounding sequence:
TAAAAATAAAAAATTTAGCTGGGTGTGGTGCTGGGCATCTATAATCCCAGTTACT CTGGAGGCTGAGGCAGGAGAATCGCTTGAACCCAGGATCCTGGTGGAGGTTGCA GTGAGCCAAGATGCCGTTGCACTCCAGCCTGGGTGACAAGAGCAAAACTCCACA TCAAAAAAAATAATAATAAATAAATTAATTAATTAATTAAATAAAACAAGAGCTT TTCTTTTTGCTTAATAAGAGAGAGTGGTGGTGGTGCTTTTTTATTCCTGAAGATGG GAAGTCCTCTTTTGCCCACTAACCTCR(A/G)GAAGAAAGGGATGAGGTGTACCGT ACAGGGGCAGTCACCTTCTCCTCTGTTTAGCTTCCATTTTGGCCTCATGTCTACCC CAAAGTTGTAGCTTAGATGGGGGGAAAATTCAGAATTTTGCATAGACCATAGGTA GCACCCCCTAGAAAAAGAATGTTTCTCCCCAGATGTCTCCCACTAGTACCCTAAC CATCTGCTTGTCTGTCTAGTGAGGACCCTTGGAGGGCTGCTAAAATGATCAAGGG TTACATGCAGCAACACAACATCCCCCAGAGGGAGGTGGTCGATGTCACCGGCCTG (SEQ ID NO: 166)
[161] TABLE 5 summarizes the result. The overall allelic frequency for G is 52.64 % in diabetic patients as compared to 48.08% in non-diabetic patients. The G allele is associated with higher incidence of T2DM by both allele-specific analysis (p < 0.01) and genotype analyses using all three genotypes (co-dominant model) and either of the dominant and recessive models (p < 0.05).
TABLE 5A
1df
Ref ACC X-squared p.val df Odds ci.hi ci.low perm p-val cor p-val
Rs11651755 * 7.9866 0.0047 1 0.8321 0.9439 0.7335 0.0030 0.1510
* This SNP is not in Hardy-Weinberg equilibrium in this participant population.
TABLE 5B
2df
Ref ACC X-squared p.val df perm p-val cor p-val
Rs11651755 * 7.6631 0.0217 2 0.0250 0.7270
* This SNP is not in Hardy-Weinberg equilibrium in this participant population.
TABLE 5C
2df dominant 2df recessive
Ref ACC X-squared p.val df X-squared p.val df
Rs11651755 * 5.7587 0.0164 1 4.2027 0.0404 1
* This SNP is not in Hardy-Weinberg equilibrium in this participant population.
[162] This analysis shows that, in addition to its key role in MODY5, TCF2 is involved in the pathogenesis of the more common forms of T2DM.
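The figures quoted for rs11651755 can be cross-checked with a few lines of arithmetic. The sketch below uses only numbers that appear in paragraph [161] and TABLE 5; the variable names are illustrative. Two readings are inferences rather than statements from the source: that the Odds column is expressed for the non-G allele (its value is close to the reciprocal of the G-allele odds ratio), and that the confidence interval was built on the log scale (its geometric mean matches the reported odds ratio).

```python
import math

# Allele frequencies quoted in paragraph [161].
freq_g_cases = 0.5264      # G allele, diabetic patients
freq_g_controls = 0.4808   # G allele, non-diabetic patients

def odds(p):
    """Convert a probability to odds."""
    return p / (1.0 - p)

# Odds ratio for the G allele, cases versus controls, and its reciprocal.
or_g = odds(freq_g_cases) / odds(freq_g_controls)
or_other = 1.0 / or_g

def chi2_p(stat, df):
    """Exact upper-tail chi-square probability for df = 1 or df = 2."""
    if df == 1:
        return math.erfc(math.sqrt(stat / 2.0))
    if df == 2:
        return math.exp(-stat / 2.0)
    raise ValueError("only df = 1 or 2 are handled in this sketch")

p_allele = chi2_p(7.9866, 1)      # TABLE 5A reports p.val 0.0047
p_genotype = chi2_p(7.6631, 2)    # TABLE 5B reports p.val 0.0217

# Geometric mean of the reported confidence limits (TABLE 5A: ci.low, ci.hi).
ci_geo_mean = math.sqrt(0.7335 * 0.9439)   # reported Odds is 0.8321

print(f"OR for G: {or_g:.4f}; reciprocal: {or_other:.4f}")
print(f"allele p: {p_allele:.4f}; genotype p: {p_genotype:.4f}")
print(f"geometric mean of CI limits: {ci_geo_mean:.4f}")
```

The small gap between the reciprocal odds ratio and the tabulated 0.8321 is what one would expect from allele frequencies rounded to two decimal places.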
[163] Oestrogen related receptor α (ESRRA) Association. Two SNPs (gs229601623 and rs11600990) showed association with T2DM (p<0.05) by both allele-specific and genotype analyses using all three genotypes (co-dominant model) and the dominant model. SNP rs2276014 is associated with higher incidence of T2DM (p<0.05) by genotype analyses using all three genotypes (co-dominant model) and the dominant model.
[164] ESRRA is an orphan nuclear receptor transcription factor expressed highly in kidney, heart, and brown adipocytes, all tissues that preferentially metabolize fatty acids. It has been hypothesized to play an important role in regulating mitochondriogenesis and cellular energy balance in vivo. Insulin resistance can develop as a result of an imbalance between triglyceride deposition in skeletal muscle and fatty acid oxidation capacity of the tissue. This capacity is directly dependent on tissue mitochondrial density. Thus, increasing skeletal muscle mitochondrial density through stimulation of mitochondriogenesis is expected to increase fatty acid oxidation capacity, leading to improvement of insulin sensitivity.
[165] Agonizing ESRRA is therefore a promising approach to treat obesity, dyslipidaemia, insulin resistance and T2DM.
[166] TABLE 6 summarizes the ESRRA SNPs used in this analysis. Three SNPs (gs229601623, rs2276014 and rs11600990) showed association with T2DM (p<0.05) by genotype analyses using all three genotypes (co-dominant model) or the dominant model. Two of these three SNPs (gs229601623 and rs11600990) also showed association with T2DM by allele-specific analysis (p<0.05). The table provides the frequency of each SNP in controls and diabetic patients and the results of two association analysis methods. HWE, which is used to assess population admixture in the samples, holds for these SNPs.
TABLE 6A
1df
Ref ACC X-squared p.val df Odds ci.hi ci.low perm p-val cor p-val
Gs229601623 4.0442 0.0443 1 1.1986 1.4248 1.0083 0.0364 0.1218
Rs11231746 1.1167 0.2906 1 0.9285 1.0609 0.8127 0.2780 0.4760
Rs2286613 -1.0000 1.0100 1 -1.0000 -1.0000 -1.0000 9.9000 0.4768
Rs731703 0.9723 0.3241 1 0.9327 1.0663 0.8158 0.3128 0.4768
Rs2276014 3.4842 0.0620 1 1.1826 1.4048 0.9955 0.0554 0.1750
Gs229601636 -1.0000 1.0100 1 -1.0000 -1.0000 -1.0000 9.9000 0.4768
Rs11600990 4.4429 0.0350 1 1.2088 1.4368 1.0169 0.0272 0.1022
Rs2079786 -1.0000 1.0100 1 -1.0000 -1.0000 -1.0000 9.9000 0.4768
TABLE 6B
2df
Ref ACC X-squared p.val df perm p-val cor p-val
Gs229601623 7.4860 0.0237 2 0.0224 0.0688
Rs11231746 1.2232 0.5425 2 0.5358 0.5838
Rs2286613 -1.0000 1.0100 0 9.9000 0.5838
Rs731703 1.1302 0.5683 2 0.5574 0.5838
Rs2276014 7.9674 0.0186 2 0.0184 0.0622
Gs229601636 -1.0000 1.0100 0 9.9000 0.5838
Rs11600990 8.0901 0.0175 2 0.0164 0.0578
Rs2079786 0.1175 0.7318 1 0.3330 0.5838
TABLE 6C
2df dominant 2df recessive
Ref ACC X-squared p.val df X-squared p.val df
Gs229601623 5.9526 0.0147 1 0.1336 0.7148 1
Rs11231746 0.7052 0.4010 1 0.7014 0.4023 1
Rs2286613 2.0362 0.4488 0 0.1296 0.7189 1
Rs731703 0.5280 0.4674 1 0.7498 0.3865 1
Rs2276014 5.6851 0.0171 1 0.3958 0.5293 1
Gs229601636 -1.0000 1.0000 1 0.0005 0.9818 1
Rs11600990 6.4993 0.0108 1 0.1200 0.7290 1
Rs2079786 0.1175 0.7318 1 0.1298 0.7186 1
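Several rows in these tables carry the values -1.0000, 1.0100 and 9.9000 in place of a statistic or p-value. The source does not explain them, but a chi-square cannot be negative and a probability cannot exceed 1, so they cannot be genuine results. The sketch below treats them as "test not computed" markers, which is an assumption; the column keys and function names are illustrative and follow the header of TABLE 6A.

```python
# Column order taken from the TABLE 6A header; the key names are illustrative.
ALLELE_COLUMNS = ["x_squared", "p_val", "df", "odds",
                  "ci_hi", "ci_low", "perm_p", "cor_p"]

def parse_allele_row(line):
    """Split one allele-table row into (SNP name, record); '*' flags are dropped."""
    tokens = [t for t in line.split() if t != "*"]
    name = tokens[0]
    values = [float(t) for t in tokens[1:]]
    if len(values) != len(ALLELE_COLUMNS):
        raise ValueError(
            f"expected {len(ALLELE_COLUMNS)} values, got {len(values)}")
    return name, dict(zip(ALLELE_COLUMNS, values))

def looks_computed(record):
    """True when the statistic and p-values fall in their valid ranges."""
    return (record["x_squared"] >= 0.0
            and 0.0 <= record["p_val"] <= 1.0
            and 0.0 <= record["perm_p"] <= 1.0)

# Two rows copied from TABLE 6A.
rows = [
    "Gs229601623 4.0442 0.0443 1 1.1986 1.4248 1.0083 0.0364 0.1218",
    "Rs2286613 -1.0000 1.0100 1 -1.0000 -1.0000 -1.0000 9.9000 0.4768",
]
parsed = [parse_allele_row(row) for row in rows]
for name, record in parsed:
    status = "ok" if looks_computed(record) else "not computed (assumed)"
    print(name, status)
```

Filtering on valid ranges, rather than on the literal placeholder values, keeps the check independent of whatever convention the original software used.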
[167] In summary, our results suggest that multiple ESRRA variants may have contributed to the development of T2DM. These SNPs and haplotypes can be used to improve T2DM diagnosis and to help design clinical trials through better patient stratification.
[168] ESRRA Haplotype Analysis. In addition, a significant association (p < 0.05) was found with one haplotype (hap 3) within the ESRRA genomic region. Two haplotype association methods were used for this analysis: Global_p and Max-Stat_p (see TABLE 3 and TABLE 7).
[TABLE 7 is available only as an image in the original publication.]
[169] Peroxisome proliferation-activated receptor gamma coactivator 1α (PPARGC1A; PGC-1α) Association. Two SNPs (rs2305683 and rs4469064) showed association with T2DM (p<0.05) by allele-specific analysis and by genotype analysis using all three genotypes or using the dominant model. One additional SNP (rs1532195) is associated with higher incidence of T2DM (p<0.05) by genotype analysis using all three genotypes (co-dominant model).
[170] PGC-1α stimulates mitochondrial biogenesis and respiration in muscle cells in mice through an induction of uncoupling protein-2 (Ucp2) and through regulation of the nuclear respiratory factors Nrf1 and Nrf2. Wu Z et al., Cell 98: 115-124 (1999). It plays a key role in insulin-regulated gluconeogenesis in liver. Yoon JC et al., Nature 413: 131-138 (2001).
[171] Microarray expression studies showed a modest but coordinated reduction in the expression of a set of genes involved in oxidative phosphorylation in human diabetic muscle. Expression of these genes is high at sites of insulin-mediated glucose disposal, is activated by PGC-1α, and correlates with total body aerobic capacity. PGC-1α expression is lower in diabetic patients. Mootha VK et al., Proc Natl Acad Sci USA. 101(17): 6570-5 (April 27, 2004).
[172] These data suggest that PGC-1α could play an important role in the development of human T2DM and that improving the PGC-1α-mediated oxidative phosphorylation pathway is a potential approach to treating diabetes.
[173] TABLE 8 summarizes the PGC-1α SNPs used in this analysis. Two SNPs (rs2305683 and rs4469064) showed association with T2DM (p<0.05) by allele-specific analysis and by genotype analysis using all three genotypes or using the dominant model. One additional SNP (rs1532195) is associated with higher incidence of T2DM (p<0.05) by genotype analysis using all three genotypes (co-dominant model). The table also provides the allele frequency of each SNP in controls and diabetic patients and the results of two association analyses. HWE, which is used to assess population admixture in the samples, holds for two of these three SNPs; the exception is rs1532195.
TABLE 8A
1df
Ref ACC X-squared p.val df Odds ci.hi ci.low perm p-val cor p-val
Rs1532195 * 0.1741 0.6765 1 1.0335 1.1918 0.8962 0.6500 1.0000
Rs1532196 0.0370 0.8475 1 0.9849 1.1239 0.8631 0.8150 1.0000
Rs3774923 0.1535 0.6952 1 0.9197 1.2953 0.6531 0.6340 1.0000
Rs768695 0.8413 0.3590 1 1.0625 1.2042 0.9375 0.3452 0.9964
Rs6833873 1.2383 0.2658 1 1.0774 1.2239 0.9484 0.2608 0.9860
Rs2932965 1.4030 0.2362 1 1.1055 1.2980 0.9416 0.2150 0.9694
Rs3774921 0.1449 0.7034 1 1.0268 1.1640 0.9057 0.6812 1.0000
Rs3736265 0.0363 0.8488 1 1.0364 1.3574 0.7913 0.8000 1.0000
Rs2932967 0.9794 0.3224 1 1.0857 1.2697 0.9284 0.3014 0.9934
Rs3755863 0.0786 0.7792 1 0.9800 1.1121 0.8637 0.7548 1.0000
Rs2970847 1.5290 0.2163 1 1.1106 1.3046 0.9455 0.1912 0.9568
Rs1472095 0.0687 0.7933 1 0.9784 1.1289 0.8481 0.7694 1.0000
Rs2932974 * 1.0303 0.3101 1 1.0861 1.2663 0.9315 0.2844 0.9904
Rs3774907 0.4129 0.5205 1 1.0548 1.2297 0.9048 0.4914 0.9998
Rs6839966 0.2856 0.5930 1 0.9144 1.2204 0.6851 0.5504 1.0000
Rs10032795 -1.0000 1.0100 1 -1.0000 -1.0000 -1.0000 9.9000 1.0000
Rs6838835 0.0028 0.9581 1 1.0334 1.9125 0.5583 0.7694 1.0000
Rs7437482 0.0527 0.8185 1 1.0177 1.1597 0.8931 0.7926 1.0000
Rs7657517 0.4400 0.5071 1 0.9337 1.1278 0.7731 0.4756 0.9998
Rs6850464 0.0052 0.9426 1 0.9884 1.1948 0.8177 0.9024 1.0000
Rs4235308 0.0000 0.9967 1 0.9977 1.1322 0.8791 0.9914 1.0000
Rs7438250 0.0612 0.8047 1 1.0245 1.2073 0.8694 0.7826 1.0000
Rs4637388 0.2205 0.6387 1 0.9597 1.1242 0.8192 0.6116 1.0000
Rs4361373 2.4080 0.1207 1 1.1489 1.3625 0.9687 0.1096 0.8414
Rs2305683 5.7634 0.0164 1 0.6775 0.9234 0.4971 0.0114 0.2010
Rs4469064 * 7.0700 0.0078 1 0.7145 0.9112 0.5603 0.0088 0.1666
Rs2946385 3.6369 [illegible] 1 0.8822 1.0014 0.7772 [illegible] 0.6208
Rs3774902 * 0.0065 0.9356 1 1.0000 1.3726 0.7285 0.8762 1.0000
Rs2970870 0.6157 0.4326 1 0.9477 1.0779 0.8333 0.4230 0.9992
Rs1878949 0.1423 0.7060 1 0.9622 1.1505 0.8047 0.6684 1.0000
* This SNP is not in Hardy-Weinberg equilibrium in this participant population.
TABLE 8B
2df
Ref ACC X-squared p.val df perm p-val cor p-val
Rs1532195 * 6.0019 0.0497 2 [illegible] 0.6356
Rs1532196 0.3271 0.8491 2 0.8504 1.0000
Rs3774923 0.1623 0.6871 1 0.6386 1.0000
Rs768695 2.4921 0.2876 2 0.2838 0.9948
Rs6833873 2.8221 0.2439 2 0.2428 0.9856
Rs2932965 3.0344 0.2193 2 0.2146 0.9820
Rs3774921 0.3089 0.8569 2 0.8560 1.0000
Rs3736265 0.3979 0.8196 2 0.8718 1.0000
Rs2932967 2.4616 0.2921 2 0.2838 0.9948
Rs3755863 1.8310 0.4003 2 0.4082 0.9990
Rs2970847 1.7668 0.4134 2 0.4122 0.9990
Rs1472095 0.9019 0.6370 2 0.6392 1.0000
Rs2932974 * 2.8557 0.2398 2 0.2380 0.9852
Rs3774907 1.8469 0.3971 2 0.3948 0.9990
Rs6839966 0.7660 0.3814 1 0.3512 0.9986
Rs10032795 -1.0000 1.0100 0 9.9000 1.0000
Rs6838835 0.0028 0.9579 1 0.7686 1.0000
Rs7437482 0.1208 0.9414 2 0.9472 1.0000
Rs7657517 0.6201 0.7334 2 0.7248 1.0000
Rs6850464 0.7301 0.6941 2 0.6878 1.0000
Rs4235308 0.0034 0.9983 2 0.9976 1.0000
Rs7438250 1.1605 0.5598 2 0.5612 1.0000
Rs4637388 0.8073 0.6679 2 0.6576 1.0000
Rs4361373 3.0509 0.2175 2 0.2182 0.9826
Rs2305683 6.4249 0.0113 1 0.0096 0.1900
Rs4469064 * 7.3114 0.0258 2 0.0228 0.3748
Rs2946385 3.9158 0.1412 2 0.1452 0.9354
Rs3774902 * 2.0664 0.3559 2 0.3510 0.9986
Rs2970870 0.7494 0.6875 2 0.6886 1.0000
Rs1878949 1.3492 0.5094 2 0.5076 1.0000
* This SNP is not in Hardy- Weinberg equilibrium in this participant population.
TABLE 8C
2df dominant 2df recessive
Ref ACC X-squared p.val df X-squared p.val df
Rs1532195 * 1.7368 0.1875 1 2.0147 0.1558 1
Rs1532196 0.0001 0.9914 1 0.2108 0.6461 1
Rs3774923 0.1606 0.6886 1 0.9783 1.0000 0
Rs768695 2.0901 0.1483 1 0.0011 0.9738 1
Rs6833873 2.5062 0.1134 1 0.0194 0.8893 1
Rs2932965 2.3016 0.1292 1 0.0411 0.8394 1
Rs3774921 0.2494 0.6175 1 0.0025 0.9598 1
Rs3736265 0.0032 0.9549 1 0.0985 0.7537 1
Rs2932967 1.7379 0.1874 1 0.0766 0.7819 1
Rs3755863 0.1248 0.7238 1 1.0629 0.3025 1
Rs2970847 1.6290 0.2018 1 0.0954 0.7575 1
Rs1472095 0.0008 0.9772 1 0.6264 0.4287 1
Rs2932974 * 1.9381 0.1639 1 0.1494 0.6991 1
Rs3774907 0.9874 0.3204 1 0.2176 0.6409 1
Rs6839966 0.5137 0.4735 1 1.6825 0.5053 0
Rs10032795 0.9749 1.0000 0 0.3177 0.5730 1
Rs6838835 0.0028 0.9579 1 0.1305 0.7179 1
Rs7437482 0.0062 0.9370 1 0.0785 0.7794 1
Rs7657517 0.5193 0.4712 1 0.0164 0.8981 1
Rs6850464 0.0810 0.7760 1 0.2557 0.6131 1
Rs4235308 0.0016 0.9682 1 0.0000 0.9987 1
Rs7438250 0.3141 0.5752 1 0.3124 0.5762 1
Rs4637388 0.0393 0.8428 1 0.6046 0.4368 1
Rs4361373 1.5295 0.2162 1 1.7977 0.1800 1
Rs2305683 6.2242 0.0126 1 0.9899 1.0000 0
Rs4469064 * 6.9435 0.0084 1 0.2711 0.6026 1
Rs2946385 1.8712 0.1713 1 2.9575 [illegible] [illegible]
Rs3774902 * 0.0652 0.7984 1 0.9048 [illegible] [illegible]
Rs2970870 0.6484 0.4207 1 0.1335 0.7149 1
Rs1878949 0.4392 0.5075 1 0.3042 0.5813 1
* This SNP is not in Hardy-Weinberg equilibrium in this participant population.
[174] In summary, multiple PGC-1α variants have been associated with T2DM. These SNPs and haplotypes can be used to improve T2DM diagnosis and to help design clinical trials through better patient stratification.
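The tables above flag SNPs that depart from Hardy-Weinberg equilibrium. The source does not say which test was used, so the following is only a minimal sketch of one common choice: a 1-df chi-square goodness-of-fit test comparing observed genotype counts with the counts expected from the estimated allele frequency. The function name and the genotype counts are hypothetical.

```python
import math

def hwe_chi_square(n_aa, n_ab, n_bb):
    """Chi-square test (1 df) of observed genotype counts against HWE expectations."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)   # estimated frequency of allele A
    q = 1.0 - p
    if p == 0.0 or q == 0.0:
        raise ValueError("monomorphic marker: HWE test is undefined")
    expected = (p * p * n, 2 * p * q * n, q * q * n)
    observed = (n_aa, n_ab, n_bb)
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    p_value = math.erfc(math.sqrt(stat / 2.0))   # exact chi-square tail for 1 df
    return stat, p_value

# Hypothetical counts: the first set sits exactly at HWE, the second has
# a strong heterozygote deficit.
balanced = hwe_chi_square(250, 500, 250)
deficit = hwe_chi_square(400, 200, 400)
print(f"balanced: X-squared={balanced[0]:.4f} p={balanced[1]:.4f}")
print(f"deficit:  X-squared={deficit[0]:.4f} p={deficit[1]:.3g}")
```

A small p-value here indicates departure from equilibrium, which is the condition marked with an asterisk in the tables.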
[175] Peroxisome proliferation-activated receptor δ (PPARD) Association. SNP rs1053046 showed association with T2DM (p<0.01) by allele-specific analysis and by genotype analysis using all three genotypes or the dominant model. Four additional SNPs (rs11571504, rs9296148, rs9658100 and rs3798343) showed association with T2DM by allele-specific analysis (p<0.05), three of which (rs11571504, rs9296148, rs9658100) also demonstrated association by genotype analysis using all three genotypes and the dominant model (P<0.05).
[176] PPARD belongs to the peroxisome proliferator-activated receptor transcription factor superfamily, which includes PPAR-alpha, PPAR-gamma and PPAR-delta. PPAR-alpha and PPAR-gamma have been shown to be activated by a variety of fatty acids and hypolipidaemic compounds, and human PPARD and mouse PPARD are known to be activated by C18 unsaturated fatty acids. PPARs are key mediators of lipid metabolism in the body.
[177] Treatment of middle-aged insulin-resistant obese rhesus monkeys with GW501516, a selective PPARD agonist, causes a dramatic dose-dependent increase in serum high density lipoprotein (HDL) cholesterol, while lowering the levels of low density lipoprotein (LDL), fasting triglycerides, and fasting insulin. Oliver WR Jr et al., Proc Natl Acad Sci USA. 98(9): 5306-11 (April 24, 2001). Administration of the agonist to mice fed a high-fat diet and to genetically obese ob/ob mice ameliorated obesity, diabetes, and insulin resistance. Tanaka T et al., Proc Natl Acad Sci USA. 100(26): 15924-9 (December 23, 2003).
[178] Agonizing PPARD is therefore a promising approach to treat obesity, dyslipidaemia, insulin resistance and T2DM.
[179] TABLE 9 summarizes the PPARD SNPs used in this analysis. Of the 24 SNPs genotyped successfully, SNP rs1053046 showed association with T2DM (p<0.01) by allele-specific analysis and by genotype analysis using all three genotypes or the dominant model. Four additional SNPs (rs11571504, rs9296148, rs9658100 and rs3798343) showed association with T2DM by allele-specific analysis (p<0.05), three of which (rs11571504, rs9296148, rs9658100) also demonstrated association by genotype analysis using all three genotypes and the dominant model (P<0.05). The TABLE also provides the allele frequency of each SNP in controls and diabetic patients and the results of two association analyses. HWE, which is used to assess population admixture in the samples, holds for these SNPs.
TABLE 9A
1df
Ref ACC X-squared p.val df odds ci.hi ci.low perm p-val cor p-val
Gs229601276 0.9594 0.3273 1 0.8586 1.1410 0.6460 0.3014 0.8642
Rs9658060 0.0629 0.8020 1 0.9910 2.8304 0.3469 0.7856 0.9808
Rs11571504 4.3270 0.0375 1 0.7690 0.9786 0.6043 0.0370 0.2856
Rs2267664 2.1718 0.1406 1 0.7091 1.0879 0.4622 0.1046 0.5694
Rs7744392 0.8413 0.3590 1 0.8657 1.1520 0.6505 0.3218 0.8872
Rs6457815 0.7363 0.3908 1 0.8639 1.1738 0.6359 0.3574 0.8872
Rs9296148 5.0124 0.0252 1 0.7541 0.9597 0.5925 0.0232 0.2062
Rs9658085 0.4405 0.5069 1 0.8864 1.2178 0.6451 0.4490 0.9186
Rs7770619 0.1138 0.7359 1 0.9329 1.2903 0.6744 0.6730 0.9760
Rs9658100 4.8805 0.02[illegible] 1 0.7561 0.9631 0.5936 0.0274 0.2324
Rs3798343 3.4621 [illegible] 1 0.7052 1.0035 0.4956 0.0478 0.3498
Rs1040436 0.7471 0.3874 1 0.9352 1.0818 0.8085 0.3826 0.8872
Rs2267665 0.2616 0.6090 1 1.0484 1.2392 0.8870 0.5992 0.9666
Rs2267667 0.7158 0.3975 1 0.9365 1.0832 0.8097 0.3892 0.8872
Rs2038068 0.0534 0.8173 1 0.9800 1.1357 0.8456 0.7932 0.9808
Rs2267669 0.7557 0.3847 1 1.0805 1.2762 0.9148 0.3634 0.8872
Rs2016520 0.0053 0.9418 1 0.9909 1.1610 0.8457 0.9118 0.9808
Gs229601291 -1.0000 1.0100 1 -1.0000 -1.0000 -1.0000 9.9000 0.9808
Rs2076169 1.5681 0.2105 1 1.1579 1.4427 0.9294 0.1854 0.7200
Rs2076167 0.5579 0.4551 1 0.9425 1.0928 0.8128 0.4440 0.9186
Rs9658163 -1.0000 1.0100 1 -1.0000 -1.0000 -1.0000 9.9000 0.9808
Rs3734254 0.7328 0.3920 1 0.9309 1.0888 0.7959 0.3858 0.8872
Rs1053046 8.9129 0.0028 1 0.6455 0.8563 0.4866 0.0026 0.0320
Rs760783 0.8869 0.3463 1 1.0914 1.2986 0.9172 0.3304 0.8872
TABLE 9B
2df
Ref ACC X-squared p.val df perm p-val cor p-val
Gs229601276 1.3474 0.2457 1 0.2212 0.8196
Rs9658060 0.0631 [illegible] 1 0.9736 0.9990
Rs11571504 4.8360 [illegible] 2 [illegible] 0.5730
Rs2267664 1.6638 [illegible] 1 0.1524 0.7264
Rs7744392 1.2068 0.2720 1 0.2442 0.8470
[Rows between Rs7744392 and Rs7770619 are not legible in the source.]
Rs7770619 0.2630 0.6080 1 0.5500 0.9794
Rs9658100 5.9736 [illegible] 2 [illegible] 0.4050
Rs3798343 2.4449 0.1179 1 [illegible] 0.5908
Rs1040436 1.0397 0.5946 2 0.5894 0.9794
Rs2267665 0.3207 0.8518 2 0.8490 0.9916
Rs2267667 0.7672 0.6814 2 0.6756 0.9862
Rs2038068 0.4692 0.7909 2 0.7876 0.9892
Rs2267669 0.8853 0.6423 2 0.6578 0.9794
Rs2016520 0.0969 0.9527 2 0.9506 0.9990
Gs229601291 -1.0000 1.0100 0 9.9000 0.9990
Rs2076169 1.7311 0.4208 2 0.4176 0.9438
Rs2076167 1.1030 0.5761 2 0.5860 0.9794
Rs9658163 -1.0000 1.0100 0 9.9000 0.9990
Rs3734254 0.7847 0.6755 2 0.6816 0.9862
Rs1053046 9.3599 0.0093 2 0.0074 [illegible]
Rs760783 0.9889 0.6099 2 0.6228 0.9794
TABLE 9C
2df dominant 2df recessive
Ref ACC X-squared p.val df X-squared p.val df
Gs229601276 1.1813 0.2771 1 0.5073 1.0000 0
Rs9658060 0.0631 0.8016 1 0.0407 0.8401 1
Rs11571504 4.5491 0.0329 1 0.0006 0.9812 1
Rs2267664 1.9278 0.1650 1 -1.0000 1.0000 1
Rs7744392 1.0479 0.3060 1 0.5027 1.0000 0
Rs6457815 0.9440 0.3313 1 1.5012 1.0000 0
Rs9296148 5.2888 0.0215 1 0.0009 0.9758 1
Rs9658085 0.5839 0.4448 1 1.9468 1.0000 0
Rs7770619 0.1856 0.6666 1 0.5068 1.0000 0
Rs9658100 5.4622 0.0194 1 0.0702 0.7911 1
Rs3798343 2.9715 [illegible] 1 -1.0000 1.0000 1
Rs1040436 0.9355 0.3334 1 0.0098 0.9212 1
Rs2267665 0.2088 0.6477 1 0.0524 0.8190 1
Rs2267667 0.5862 0.4439 1 0.2167 0.6416 1
Rs2038068 0.1891 0.6637 1 0.0507 0.8218 1
Rs2267669 0.5961 0.4401 1 0.2576 0.6118 1
Rs2016520 0.0014 0.9707 1 0.0342 0.8532 1
Gs229601291 -1.0000 1.0000 1 0.0731 0.7869 1
Rs2076169 1.4995 0.2207 1 0.0629 0.8019 1
Rs2076167 0.8795 0.3483 1 0.0011 0.9739 1
Rs9658163 -1.0000 1.0000 1 0.0511 0.8212 1
Rs3734254 0.6236 0.4297 1 0.1670 0.6828 1
Rs1053046 8.9093 0.0028 1 0.1223 0.7266 1
Rs760783 0.8513 0.3562 1 0.0773 0.7810 1
[180] In summary, our results suggest that multiple PPARD variants may have contributed to the development of T2DM. These SNPs and haplotypes can be used to improve T2DM diagnosis and to help design clinical trials through better patient stratification.
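The "perm p-val" column in the tables above presumably reports a permutation-based p-value, although the source does not describe the procedure. The sketch below is one generic way to obtain such a value and should be read under that assumption: case/control labels are shuffled, the 1-df allele chi-square is recomputed each time, and the p-value is the share of shuffles at least as extreme as the observed statistic (with the common add-one correction). The function names, the seed and the toy data are hypothetical.

```python
import random

def allele_chi_square(alleles, labels):
    """1-df Pearson chi-square for a 2x2 table of allele (0/1) by status (0/1)."""
    counts = [[0, 0], [0, 0]]
    for allele, status in zip(alleles, labels):
        counts[status][allele] += 1
    row = [sum(r) for r in counts]
    col = [counts[0][j] + counts[1][j] for j in range(2)]
    total = sum(row)
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = row[i] * col[j] / total
            if expected > 0:   # skip empty margins instead of dividing by zero
                stat += (counts[i][j] - expected) ** 2 / expected
    return stat

def permutation_p(alleles, labels, n_perm=200, seed=0):
    """Empirical p-value from shuffling the case/control labels."""
    rng = random.Random(seed)
    observed = allele_chi_square(alleles, labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if allele_chi_square(alleles, shuffled) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)

# Toy data: 20 control chromosomes (status 0) and 20 case chromosomes (status 1).
labels = [0] * 20 + [1] * 20
no_association = permutation_p([0, 1] * 20, labels)          # same allele mix in both groups
strong_association = permutation_p([0] * 20 + [1] * 20, labels)  # allele tracks status exactly
print(no_association, strong_association)
```

The "cor p-val" column is not reproduced here, since the source gives no indication of how the multiple-testing correction was carried out.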
[181] PPARD Haplotype Analysis. In addition, a significant association (p < 0.01) was found with one haplotype within the PPARD genomic region. Two haplotype association methods were used for this analysis: Global_p and Max-Stat_p (see TABLE 3 and TABLE 10).
[The first part of TABLE 10 is available only as an image in the original publication.]
hap. Hap-Freq Hap-Score p-val sim p-val
[1,] 0.0058 -2.9798 0.0029 0.0000
[2,] 0.0086 -1.4727 0.1408 0.1490
[3,] 0.0169 -0.9765 0.3288 0.3440
[4,] 0.0069 -0.8793 0.3792 0.3600
[5,] 0.0510 -0.5934 0.5529 0.5470
[6,] 0.0153 -0.2046 0.8379 0.8510
[7,] 0.0061 0.0946 0.9247 0.9810
[8,] 0.0144 0.2351 0.8141 0.7890
[9,] 0.0174 0.3156 0.7523 0.7540
[10,] 0.7347 0.7547 0.4505 0.4490
[11,] 0.0921 1.2111 0.2259 0.2280
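The p-val column of the haplotype table is numerically consistent with a two-sided normal tail probability of the Hap-Score column. That relationship is inferred from the numbers, not stated in the source, and the "sim p-val" column (presumably simulation-based) is not reproduced. The short check below uses a hypothetical function name and four rows copied from the table.

```python
import math

def two_sided_normal_p(score):
    """Two-sided tail probability of a standard normal score."""
    return math.erfc(abs(score) / math.sqrt(2.0))

# (Hap-Score, reported p-val) pairs copied from the haplotype table above.
rows = [(-2.9798, 0.0029), (-1.4727, 0.1408), (-0.2046, 0.8379), (1.2111, 0.2259)]
computed = [two_sided_normal_p(score) for score, _ in rows]
for (score, reported), value in zip(rows, computed):
    print(f"score {score:+.4f}: computed {value:.4f}, reported {reported:.4f}")
```

Under this reading, the haplotype in row 1 (score -2.9798) is the one that meets the p < 0.01 threshold cited in paragraph [181].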
[182] ACACB Haplotype Analysis. Acetyl-CoA carboxylase-beta (ACC-beta) is hypothesized to control fatty acid oxidation by means of the ability of malonyl-CoA to inhibit carnitine palmitoyl transferase I (CPT1A), the rate-limiting step in fatty acid uptake and oxidation by mitochondria. ACC-beta is expressed primarily in heart and skeletal muscles.
[183] Abu-Elheiga et al., Science 291(5513): 2613-6 (2001) generated mice deficient in ACC2 by targeted disruption. Acc2 -/- mutant mice have a normal life span, a higher fatty acid oxidation rate, and lower amounts of fat. In comparison to the wildtype, Acc2-deficient mice had 10- and 30-fold lower levels of malonyl-CoA in heart and muscle, respectively. The fatty acid oxidation rate in the soleus muscle of the Acc2 -/- mice was 30% higher than that of wildtype mice and was not affected by addition of insulin, while addition of insulin to the wildtype muscle reduced fatty acid oxidation by 45%. The mutant mice accumulated 50% less fat in their adipose tissue than did wildtype mice.
[184] ACACB inhibition is therefore a promising approach to treat obesity, insulin resistance, fatty liver disease and T2DM.
[185] While individual SNPs provided little or very weak evidence, significant association with diabetes was found with eight haplotypes within the ACACB genomic region: hap1, p < 0.001; hap2, 3, 30, 31, 32, 33 and 34, p < 0.05. Two haplotype association methods were used for this analysis: Global_p and Max-Stat_p (TABLE 11).
[TABLE 11 is available only as images in the original publication.]
[186] In summary, our results suggest that variations in ACACB are associated with T2DM. These SNPs and haplotypes can be used to improve T2DM diagnosis and to help design clinical trials through better patient stratification.
[187] Acyl-CoA dehydrogenase (ACADSB) association. One of the 13 SNPs analyzed is associated with higher incidence of T2DM (p<0.05) by allele-specific analysis and/or genotype analysis using both dominant and recessive models.
[188] The acyl-CoA dehydrogenases (ACADs) are a group of mitochondrial enzymes involved in the metabolism of fatty acids or branched-chain amino acids. ACADSB had the greatest activity toward the short branched chain acyl-CoA derivative, (S)-2-methylbutyryl-CoA, but also reacted significantly with other 2-methyl-branched chain substrates and with short straight chain acyl-CoA. It has been well established that dysregulation of fatty acid and lipid metabolism is of importance in the aetiology of obesity and type 2 diabetes mellitus.
[189] TABLE 12 summarizes the ACADSB SNPs used in this study. Of the 13 SNPs successfully genotyped and analyzed, one SNP (rs2421166) was shown to be associated with higher incidence of T2DM (p<0.05) by both allele-specific analysis and genotype analysis using all three genotypes or using the dominant model. The TABLE provides the frequency of each SNP overall, in controls and in diabetic patients, the Hardy-Weinberg equilibrium (HWE) analysis of the population and the results of two association analyses. HWE, which is used to assess population admixture in the samples, holds for these SNPs.
TABLE 12A
Association analysis based on allele (1df)
Ref ACC X-squared p.val df odds ci.hi ci.low perm p-val cor p-val
rs10510119 1.2375 0.2660 1 1.0781 1.2261 0.9479 0.2522 0.8062
rs4511227 0.0002 0.9884 1 0.9951 1.1768 0.8414 0.9738 0.9738
rs3824685 0.1319 0.7164 1 0.9682 1.1329 0.8274 0.6734 0.9510
rs6599641 0.1785 0.6727 1 1.0450 1.2562 0.8694 0.6278 0.9494
rs7912404 0.0909 0.7631 1 0.9706 1.1499 0.8193 0.7282 0.9510
rs7070793 0.4588 0.4982 1 0.9539 1.0864 0.8376 0.4868 0.9132
rs2421139 2.0511 0.1521 1 1.0987 1.2462 0.9686 0.1546 0.7278
rs10902866 0.6865 0.4073 1 0.9350 1.0884 0.8032 0.3866 0.8622
rs4128783 0.9254 0.3361 1 0.9027 1.1006 0.7405 0.3172 0.8488
gs229601113 1.3248 0.2497 1 1.1632 1.4854 0.9109 0.2320 0.8026
rs2421166 1.6122 0.2042 1 0.7073 1.1541 0.4335 0.1514 0.7278
rs3763738 3.8455 0.0499 1 0.5788 0.9727 0.3444 0.0330 0.2558
rs2118318 0.0018 0.9660 1 0.9541 1.6098 0.5655 0.8304 0.9738
[The remainder of TABLE 12 is available only as an image in the original publication.]
[190] Carnitine Palmitoyltransferase 1A (CPT1A) association. One of the 20 SNPs analyzed is associated with higher incidence of T2DM (p<0.05) by allele-specific analysis and/or genotype analysis using both dominant and recessive models.
[191] The CPT1A gene encodes carnitine palmitoyltransferase 1A, a liver enzyme involved in fatty acid oxidation. Major control over the fatty acid oxidation process is exerted at the level of CPT I, whose activity in turn is inhibited by high cellular concentrations of malonyl-CoA. It has been well established that dysregulation of fatty acid and lipid metabolism is of importance in the aetiology of obesity and type 2 diabetes mellitus.
[192] TABLE 13 summarizes the CPT1A SNPs used in this study. Of the 20 SNPs successfully genotyped and analyzed, one SNP (rs1017641) was shown to be associated with higher incidence of T2DM (p<0.05) by both allele-specific analysis and genotype analysis using all three genotypes or using the dominant model. The TABLE provides the frequency of each SNP overall, in controls and in diabetic patients, the Hardy-Weinberg equilibrium (HWE) analysis of the population and the results of two association analyses. HWE, which is used to assess population admixture in the samples, holds for these SNPs.
TABLE 13A
Association analysis based on allele (1df)
Ref ACC X-squared p.val df odds ci.hi ci.low perm p-val cor p-val
rs2924699 0.8270 0.3631 1 1.1391 1.4799 0.8768 0.3248 0.9768
rs2123869 0.0012 0.9727 1 0.9993 1.1791 0.8470 0.9460 0.9990
rs2228502 0.0117 0.9140 1 0.9756 1.2813 0.7429 0.8694 0.9990
rs2305508 0.3662 0.5451 1 0.9597 1.0891 0.8456 0.5404 0.9988
rs747840 0.8635 0.3528 1 0.7786 1.2447 0.4870 0.3074 0.9740
rs11228356 0.0089 0.9247 1 1.0143 1.2302 0.8362 0.8908 0.9990
gs229601477 0.0468 0.8288 1 0.8793 1.6968 0.4557 0.6170 0.9988
gs229601476 -1.0000 1.0100 1 -1.0000 -1.0000 -1.0000 9.9000 0.9990
gs229601475 1.1921 0.2749 1 1.1603 1.4933 0.9016 0.2438 0.9486
rs3794020 0.1246 0.7241 1 0.8518 1.5788 0.4596 0.5452 0.9988
gs229601471 -1.0000 1.0100 1 -1.0000 -1.0000 -1.0000 9.9000 0.9990
rs6591356 -1.0000 1.0100 1 -1.0000 -1.0000 -1.0000 9.9000 0.9990
rs3019607 0.0260 0.8719 1 0.9367 1.4841 0.5912 0.7408 0.9990
rs3019613 0.1551 0.6937 1 1.0410 1.2454 0.8702 0.6716 0.9988
rs879784 1.6137 0.2040 1 0.8783 1.0651 0.7243 0.2028 0.9166
rs1017641 1.7848 0.1816 1 0.9132 1.0400 0.8019 0.1796 0.8930
rs613084 0.2614 0.6092 1 0.9641 1.0997 0.8452 0.5814 0.9988
rs3019594 0.0031 0.9558 1 0.9865 1.2444 0.7821 0.9108 0.9990
rs597316 0.0683 0.7939 1 0.9800 1.1205 0.8571 0.7614 0.9990
rs2060982 0.5150 0.4730 1 0.9534 1.0801 0.8416 0.4596 0.9964
TABLE 13B
Association analysis based on genotypes (2df)
Ref ACC X-squared p.val df perm p-val cor p-val
rs2924699 0.9978 0.6072 2 0.6028 0.9984
rs2123869 3.0757 0.2148 2 0.2080 0.9134
rs2228502 0.1077 0.9476 2 0.9504 0.9992
rs2305508 0.6346 0.7281 2 0.7322 0.9992
rs747840 0.2244 0.6357 1 0.5350 0.9984
rs11228356 4.5516 0.1027 2 [illegible] 0.7188
gs229601477 0.0472 0.8280 1 [illegible] 0.9984
gs229601476 -1.0000 1.0100 0 9.9000 0.9992
gs229601475 2.4037 0.1210 1 0.1076 0.7356
rs3794020 0.1260 0.7227 1 0.5390 0.9984
gs229601471 -1.0000 1.0100 0 9.9000 0.9992
rs6591356 -1.0000 1.0100 0 9.9000 0.9992
rs3019607 0.0052 0.9426 1 0.9774 0.9992
rs3019613 1.6144 0.4461 2 0.4460 0.9958
rs879784 1.8208 0.4024 2 0.4092 0.9928
rs1017641 6.0927 0.0475 2 0.0456 0.4568
rs613084 0.8629 0.6496 2 0.6502 0.9984
rs3019594 0.5135 0.7736 2 0.7690 0.9992
rs597316 0.4905 0.7825 2 0.7826 0.9992
rs2060982 1.3602 0.5066 2 0.4970 0.9974
TABLE 13C
Association analysis based on genotypes (2df)
2df dominant 2df recessive
Ref ACC X-squared p.val df X-squared p.val df
rs2924699 0.6752 0.4113 1 0.0895 0.7648 1
rs2123869 0.2518 0.6158 1 1.7532 0.1855 1
rs2228502 0.0012 0.9722 1 0.0006 0.9812 1
rs2305508 0.0430 0.8358 1 0.5499 0.4584 1
rs747840 0.4966 0.4810 1 -1.0000 1.0000 1
rs11228356 0.4355 0.5093 1 2.5462 0.1106 1
gs229601477 0.0472 0.8280 1 0.1296 0.7189 1
gs229601476 -1.0000 1.0000 1 0.1294 0.7190 1
gs229601475 1.8271 0.1765 1 0.3929 0.2846 0
rs3794020 0.1260 0.7227 1 0.0728 0.7873 1
gs229601471 -1.0000 1.0000 1 0.1311 0.7172 1
rs6591356 4.0380 0.2166 0 0.0248 0.8749 1
rs3019607 0.0019 0.9649 1 -1.0000 1.0000 1
rs3019613 0.5130 0.4738 1 0.3866 0.5341 1
rs879784 1.0344 0.3091 1 0.8948 0.3442 1
rs1017641 4.5499 0.0329 1 0.0878 0.7670 1
rs613084 0.6132 0.4336 1 0.0041 0.9489 1
rs3019594 0.0001 0.9935 1 0.2050 0.6508 1
rs597316 0.2539 0.6144 1 0.0260 0.8720 1
rs2060982 1.1431 0.2850 1 0.0003 0.9872 1
EQUIVALENTS
[193] The details of one or more embodiments of the invention are set forth in the accompanying description above. Although any methods and materials similar or equivalent to those described herein can be used in the practice or testing of the present invention, the preferred methods and materials are now described. Other features, objects, and advantages of the invention will be apparent from the description and the claims. In the specification and the appended claims, the singular forms include plural referents unless the context clearly dictates otherwise. Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. All references cited herein are incorporated herein by reference in their entirety and for all purposes to the same extent as if each individual publication, patent, or patent application was specifically and individually indicated to be incorporated by reference in its entirety for all purposes. [194] The present invention is not to be limited in terms of the particular embodiments described in this application, which are intended as single illustrations of individual aspects of the invention. Many modifications and variations of this invention can be made without departing from its spirit and scope, as will be apparent to those skilled in the art. Functionally equivalent methods and apparatuses within the scope of the invention, in addition to those enumerated herein, will be apparent to those skilled in the art from the foregoing descriptions. Such modifications and variations are intended to fall within the scope of the appended claims. The present invention is to be limited only by the terms of the appended claims, along with the full scope of equivalents to which such claims are entitled.

Claims

CLAIMS What is claimed is:
1. A method for diagnosing a predisposition to type 2 diabetes in an individual, comprising the steps of:
(a) obtaining a sample from an individual; and
(b) determining the genotype of the individual, wherein the determination of a single nucleotide polymorphism (SNP) in a gene selected from the group consisting of TCF2 (hepatocyte nuclear factor 1β), ACADSB (acetyl-CoA dehydrogenase), CPT1A (carnitine palmitoyl transferase I), ESRRA (estrogen related receptor α), PPARD (peroxisome proliferation-activated receptor δ), PPARGC1A (peroxisome proliferation-activated receptor gamma coactivator 1α) and SCD (stearoyl-CoA desaturase) that is indicative of a predisposition to type 2 diabetes is diagnostic that the individual has a predisposition to type 2 diabetes.
2. The method of claim 1, wherein the SNP in the TCF2 gene indicative of a predisposition to type 2 diabetes is rs11651755.
3. The method of claim 1, wherein the SNP in the ACADSB gene indicative of a predisposition to type 2 diabetes is rs242116.
4. The method of claim 1, wherein the SNP in the CPT1A gene indicative of a predisposition to type 2 diabetes is rs1017641.
5. The method of claim 1, wherein the SNP in the ESRRA gene indicative of a predisposition to type 2 diabetes is selected from the group consisting of gs229601623 and rs1160090.
6. The method of claim 1, wherein the SNP in the PPARD gene indicative of a predisposition to type 2 diabetes is selected from the group consisting of rs1053046, rs11571504, rs9296148, rs9658100 and rs3798343.
7. The method of claim 1, wherein the SNP in the PPARGC1A gene indicative of a predisposition to type 2 diabetes is selected from the group consisting of rs2305683, rs4469064 and rs1532195.
8. The method of claim 1, wherein the SNP in the SCD gene indicative of a predisposition to type 2 diabetes is selected from the group consisting of rs3870747, rs7849 and rs508384.
9. A method for diagnosing a predisposition to type 2 diabetes in an individual, comprising the steps of:
(a) obtaining a sample from an individual; and
(b) determining the haplotype of the individual, wherein the determination of a haplotype in a gene selected from the group consisting of ESRRA (estrogen related receptor α), PPARD (peroxisome proliferation-activated receptor δ), SCD (stearoyl-CoA desaturase) and ACACB (acetyl-CoA carboxylase β), that is indicative of a predisposition to type 2 diabetes is diagnostic that the individual has a predisposition to type 2 diabetes.
10. The method of claim 9, wherein the haplotype in the ESRRA gene indicative of a predisposition to type 2 diabetes is hap3.
11. The method of claim 9, wherein the haplotype in the PPARD gene indicative of a predisposition to type 2 diabetes is hap1.
12. The method of claim 9, wherein the haplotype in the ACACB gene indicative of a predisposition to type 2 diabetes is selected from the group consisting of hap1, hap2, hap3, hap30, hap31, hap32, hap33, and hap34, wherein the presence of hap1, hap2 or hap3 is indicative of a lower predisposition to type 2 diabetes and the presence of hap30, hap31, hap32, hap33 or hap34 is indicative of a higher predisposition to type 2 diabetes.
13. A method for treating type 2 diabetes in an individual, comprising the steps of:
(a) obtaining a sample from an individual suspected of having type 2 diabetes;
(b) determining the genotype of the individual, wherein the genotype determined is a single nucleotide polymorphism (SNP) in a gene selected from the group consisting of TCF2, ACADSB, CPT1A, ESRRA, PPARD, PPARGC1A and SCD1 and is indicative of a predisposition to type 2 diabetes; and
(c) if the determined genotype indicates that the individual has a predisposition to type 2 diabetes, administering an anti-type 2 diabetic therapy to the individual.
14. The method of claim 13, wherein the anti-type 2 diabetic therapy is selected from the group consisting of nateglinide, LAF237 and LBM642.
15. Use of a compound selected from the group consisting of nateglinide, LAF237 and LBM642 in the manufacture of a medicament for the treatment in a selected population of type 2 diabetes patients, where the population is selected on the basis of a single nucleotide polymorphism (SNP) in a gene selected from the group consisting of TCF2, ACADSB, CPT1A, ESRRA, PPARD, PPARGC1A and SCD1 that is indicative of a predisposition to type 2 diabetes.
16. A method for treating type 2 diabetes in an individual, comprising the steps of:
(a) obtaining a sample from an individual suspected of having type 2 diabetes;
(b) determining the haplotype of the individual, wherein the haplotype determined is in a gene selected from the group consisting of ESRRA, PPARD and ACACB and is indicative of a predisposition to type 2 diabetes; and
(c) if the determined haplotype indicates that the individual has a predisposition to type 2 diabetes, administering an anti-type 2 diabetic therapy to the individual.
PCT/US2006/010464 2005-03-25 2006-03-22 Biomarkers for pharmacogenetic diagnosis of type 2 diabetes WO2006104812A2 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
JP2008503147A JP2008538177A (en) 2005-03-25 2006-03-22 Biomarkers for pharmacogenetic diagnosis of type 2 diabetes
EP06739311A EP1869214A2 (en) 2005-03-25 2006-03-22 Biomarkers for pharmacogenetic diagnosis of type 2 diabetes

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US66516805P 2005-03-25 2005-03-25
US60/665,168 2005-03-25

Publications (2)

Publication Number Publication Date
WO2006104812A2 true WO2006104812A2 (en) 2006-10-05
WO2006104812A3 WO2006104812A3 (en) 2007-06-28

Family

ID=37053914

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2006/010464 WO2006104812A2 (en) 2005-03-25 2006-03-22 Biomarkers for pharmacogenetic diagnosis of type 2 diabetes

Country Status (3)

Country Link
EP (1) EP1869214A2 (en)
JP (1) JP2008538177A (en)
WO (1) WO2006104812A2 (en)

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2008087209A1 (en) * 2007-01-19 2008-07-24 Integragen Human diabetes susceptibility iglc gene
WO2008087204A1 (en) * 2007-01-19 2008-07-24 Integragen Human diabetes susceptibility btbd9 gene
WO2008098256A1 (en) * 2007-02-09 2008-08-14 Bristol-Myers Squibb Company Methods for identifying patients with an increased likelihood of responding to dpp-iv inhibitors
WO2008101972A1 (en) * 2007-02-21 2008-08-28 Integragen Human diabetes susceptibility pebp4 gene
WO2008101971A1 (en) 2007-02-21 2008-08-28 Integragen Human diabetes susceptibility shank2 gene
WO2008135508A2 (en) * 2007-05-04 2008-11-13 Integragen Human diabetes susceptibility eefsec gene
JP2010510804A (en) * 2006-11-30 2010-04-08 デコード・ジェネティクス・イーエイチエフ A genetically susceptible variant of type 2 diabetes
CN104195136A (en) * 2014-09-04 2014-12-10 中国农业科学院北京畜牧兽医研究所 Method for identifying length and/or height of pig body and special primer pair therefor
CN113117097A (en) * 2020-01-15 2021-07-16 中国药科大学 Medical application of estrogen related receptor alpha coding gene ESRRA

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2014055687A1 (en) * 2012-10-05 2014-04-10 Hitachi Chemical Co., Ltd. Urine exosome mrnas and methods of using same to detect diabetic nephropathy
KR101741581B1 (en) 2015-11-16 2017-05-30 이화여자대학교 산학협력단 Prediction method for the level of anti-oxidation, anti-inflammation, or lipid metabolism of food
KR101980576B1 (en) * 2017-07-06 2019-05-22 충남대학교산학협력단 Biomarker for Diagnosing Diabetes Mellitus Type 2 Comprising PGC-1α

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO1998011254A1 (en) * 1996-09-10 1998-03-19 Arch Development Corporation MUTATIONS IN THE DIABETES SUSCEPTIBILITY GENES HEPATOCYTE NUCLEAR FACTOR (HNF) 1 ALPHA (α), HNF-1β AND HNF-4α
WO2004074514A2 (en) * 2003-02-20 2004-09-02 Centre National De La Recherche Scientifique (C.N.R.S.) Method of diagnosis of type 2 diabetes and early onset thereof
US20050064480A1 (en) * 2003-08-15 2005-03-24 Affymetrix, Inc. Association of FHOD2 with common type 2 diabetes mellitus

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
BAIER L. J. ET AL.,: "mutations in the genes for hepatocyte nuclear factor (HNF)-1a, -4a, -1b, and -3b, the dimerization cofactor of HNF-1, and insulin promoter factor 1 are not common causes of early-onset type 2 diabetes in Pima Indians" DIABETES CARE, vol. 23, no. 3, March 2000 (2000-03), pages 302-304, XP002414331 *
EK J. ET AL.,: "studies of the variability of the hepatocyte nuclear factor-1 beta (HNF-1 beta /TCF2) and the dimerization cofactor of HNF-1 (DcoH/PCBD) genes in relation to type 2 diabetes mellitus and beta-cell function" HUMAN MUTATION, #447, 2001, pages 1-7, XP002414332 *
HORIKAWA Y. ET AL.,: "mutation in hepatocyte nuclear factor-1 beta gene (TCF2) associated with MODY" NATURE GENETICS, vol. 17, December 1997 (1997-12), pages 384-385, XP009076886 cited in the application *

Cited By (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2010510804A (en) * 2006-11-30 2010-04-08 デコード・ジェネティクス・イーエイチエフ A genetically susceptible variant of type 2 diabetes
JP2014097060A (en) * 2006-11-30 2014-05-29 Decode Genetics Ehf Genetic sensitive variant of type 2 diabetes
WO2008087209A1 (en) * 2007-01-19 2008-07-24 Integragen Human diabetes susceptibility iglc gene
WO2008087204A1 (en) * 2007-01-19 2008-07-24 Integragen Human diabetes susceptibility btbd9 gene
WO2008098256A1 (en) * 2007-02-09 2008-08-14 Bristol-Myers Squibb Company Methods for identifying patients with an increased likelihood of responding to dpp-iv inhibitors
WO2008101972A1 (en) * 2007-02-21 2008-08-28 Integragen Human diabetes susceptibility pebp4 gene
WO2008101971A1 (en) 2007-02-21 2008-08-28 Integragen Human diabetes susceptibility shank2 gene
WO2008135508A2 (en) * 2007-05-04 2008-11-13 Integragen Human diabetes susceptibility eefsec gene
WO2008135508A3 (en) * 2007-05-04 2009-01-08 Integragen Sa Human diabetes susceptibility eefsec gene
CN104195136A (en) * 2014-09-04 2014-12-10 中国农业科学院北京畜牧兽医研究所 Method for identifying length and/or height of pig body and special primer pair therefor
CN113117097A (en) * 2020-01-15 2021-07-16 中国药科大学 Medical application of estrogen related receptor alpha coding gene ESRRA

Also Published As

Publication number Publication date
WO2006104812A3 (en) 2007-06-28
EP1869214A2 (en) 2007-12-26
JP2008538177A (en) 2008-10-16

Similar Documents

Publication Publication Date Title
WO2006104812A2 (en) Biomarkers for pharmacogenetic diagnosis of type 2 diabetes
US20100035251A1 (en) BioMarkers for the Progression of Alzheimer's Disease
US20100249107A1 (en) Biomarkers for Alzheimer's Disease Progression
WO2006130527A2 (en) Mutations and polymorphisms of fibroblast growth factor receptor 1
US20090118350A1 (en) Biomarkers for Identifying Efficacy of Tegaserod in Patients with Chronic Constipation
WO2006060429A2 (en) Identification of variants in histone deacetylase 1 (hdac1) to predict drug response
WO2006110478A2 (en) Mutations and polymorphisms of epidermal growth factor receptor
AU2006227283B2 (en) Biomarkers for efficacy of aliskiren as a hypertensive agent
US20090281090A1 (en) Biomarkers for the prediction of responsiveness to clozapine treatment
WO2007022041A2 (en) Mutations and polymorphisms of hdac3
RU2408363C2 (en) Biomarkers for estimating efficacy of aliskiren as hypertensive agent
WO2007109183A2 (en) Mutations and polymorphisms of fms-related tyrosine kinase 1
WO2007002217A2 (en) Mutations and polymorphisms of bcl-2
WO2007109515A2 (en) Mutations and polymorphisms of knockdown resistance polypeptide
JP2007512231A (en) Use of genetic polymorphisms to predict drug-induced hepatotoxicity
WO2007095032A2 (en) Mutations and polymorphisms of ptk2b
WO2007121017A2 (en) Mutations and polymorphisms of fms-like tyrosine kinase 4
WO2007127524A2 (en) Mutations and polymorphisms of insr
WO2007058991A2 (en) Mutations and polymorphisms of c-abl
CN101146915A (en) Biomarkers for efficacy of aliskiren as a hypertensive agent

Legal Events

Date Code Title Description
WWE Wipo information: entry into national phase

Ref document number: 2006739311

Country of ref document: EP

ENP Entry into the national phase in:

Ref document number: 2008503147

Country of ref document: JP

Kind code of ref document: A

NENP Non-entry into the national phase in:

Ref country code: DE

121 Ep: the epo has been informed by wipo that ep was designated in this application
NENP Non-entry into the national phase in:

Ref country code: RU