WO2011120043A1 - Methods and compositions for diagnosing and predicting risk of late-onset alzheimer's disease - Google Patents

Methods and compositions for diagnosing and predicting risk of late-onset Alzheimer's disease

Info

Publication number
WO2011120043A1
Authority
WO
WIPO (PCT)
Prior art keywords
imputed
snp
load
genotype
snps
Prior art date
Application number
PCT/US2011/030199
Other languages
French (fr)
Inventor
Margaret Pericak-Vance
Jonathan L. Haines
Joseph D. Buxbaum
John Gilbert
Gary Beecham
Eden Martin
Adam Naj
Original Assignee
University Of Miami
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by University Of Miami
Publication of WO2011120043A1

Classifications

    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C12Q1/6876 Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes
    • C12Q1/6883 Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes for diseases caused by alterations of genetic material
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q2600/00 Oligonucleotides characterized by their use
    • C12Q2600/156 Polymorphic or mutational markers

Definitions

  • the invention relates generally to the fields of genetics and medicine. More particularly, the invention relates to methods of and biomarkers for diagnosing and predicting risk of late-onset Alzheimer's Disease.
  • AD Alzheimer Disease
  • the invention relates to the development of reagents and methods for diagnosing LOAD and for determining a subject's risk of developing LOAD.
  • Studies looking for genetic variants across the genome that affect LOAD have had little success identifying genes other than APOE.
  • an expanded set of AD cases and controls was used to improve the power to detect genetic variants driving LOAD risk.
  • Adjacent variants on chromosome 6 were also genotyped in these same cases and controls, and it was found that these variants were also associated with LOAD.
  • the association was replicated with rs11754661 and additional SNPs in MTHFD1L in a combined dataset of cases and controls, some from publicly available datasets. This finding is important because the gene is known to be involved in biological pathways influencing levels of homocysteine, a significant risk factor for AD. Described herein are methods of diagnosing LOAD in an individual and methods of identifying an individual's risk of developing LOAD involving the use of SNPs.
  • Examples of these methods include obtaining a sample from the individual; purifying genomic DNA from the sample; genotyping one or more single nucleotide polymorphisms (SNPs) associated with LOAD (e.g., one or more SNPs in the MTHFD1L gene); and correlating the presence of one or more of the SNPs associated with LOAD with an increased risk of developing LOAD in the individual.
  • a method of predicting a subject's risk of developing LOAD includes the steps of: obtaining a biological sample from the subject; analyzing the biological sample (e.g., blood, plasma, serum, saliva, urine, and tissue) for the presence of at least one SNP in the MTHFD1L gene (e.g., an SNP selected from Table 2); and correlating the presence of the at least one SNP with an increased risk of developing LOAD.
  • the at least one SNP can be an SNP such as rs2075650, rs11754661, rs803424, rs2073067, rs2072064, rs17349743, and rs803422, e.g., rs11754661 of chromosome 6q25.1.
  • the method can further include analyzing the biological sample for the presence of a mutation in the APOE gene, and correlating the presence of the at least one SNP in the MTHFD1L gene and the presence of the mutation in the APOE gene with an increased risk of developing LOAD.
  • the method can additionally include the step of analyzing homocysteine levels in the biological sample and correlating increased levels of homocysteine and the presence of the at least one SNP in the MTHFD1L
  • the at least one SNP can be two or more SNPs selected from Table 2.
  • a method of diagnosing LOAD in a subject includes the steps of: obtaining a biological sample (e.g., blood, plasma, serum, saliva, urine, and tissue, etc.) from the subject; analyzing the biological sample for the presence of at least one SNP in the MTHFD1L gene; and correlating the presence of the at least one SNP with a diagnosis of LOAD in the subject.
  • the at least one SNP can be, e.g., an SNP selected from Table 2.
  • the at least one SNP can be one or more of the following SNPs: rs2075650, rs11754661, rs803424, rs2073067, rs2072064, rs17349743, and rs803422, e.g., rs11754661 of chromosome 6q25.1.
  • the method can further include analyzing the biological sample for the presence of a mutation in the APOE gene, and correlating the presence of the at least one SNP in the MTHFD1L gene and the presence of the mutation in the APOE gene with a diagnosis of LOAD in the subject.
  • the method can additionally include analyzing homocysteine levels in the biological sample and correlating increased levels of homocysteine and the presence of the at least one SNP in the MTHFD1L gene with a diagnosis of LOAD in the subject.
  • the at least one SNP can be two or more SNPs selected from Table 2. Unless otherwise defined, all technical terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs.
  • a kit for predicting a subject's risk of developing LOAD includes: at least one reagent for analyzing a biological sample from the subject for the presence of at least one SNP in the MTHFD1L gene; at least one control; and instructions for use.
  • the at least one SNP can be an SNP selected from Table 2.
  • the at least one SNP can be one of the following SNPs: rs2075650, rs11754661, rs803424, rs2073067, rs2072064, rs17349743, and rs803422, e.g., rs11754661 of chromosome 6q25.1.
  • by "gene" is meant a nucleic acid molecule that codes for a particular protein, or in certain cases, a functional or structural RNA molecule.
  • "nucleic acid" or a "nucleic acid molecule" means a chain of two or more nucleotides such as RNA (ribonucleic acid) and DNA (deoxyribonucleic acid).
  • an "allele" may refer to a nucleotide at a SNP position (wherein at least two alternative nucleotides are present in the population at the SNP position, in
  • SUBSTITUTE SHEET RULE 26 may refer to an amino acid residue that is encoded by the codon which contains the SNP position (where the alternative nucleotides that are present in the population at the SNP position form alternative codons that encode different amino acid residues).
  • An "allele" may also be referred to herein as a "variant".
  • an amino acid residue that is encoded by a codon containing a particular SNP may simply be referred to as being encoded by the SNP.
  • complementary means that two sequences are complementary when the sequence of one can bind to the sequence of the other in an anti-parallel sense wherein the 3'-end of each sequence binds to the 5'-end of the other sequence and each A, T(U), G, and C of one sequence is then aligned with a T(U), A, C, and G, respectively, of the other sequence.
  • complementary sequence as it refers to a polynucleotide sequence, relates to the base sequence in another nucleic acid molecule by the base-pairing rules. More particularly, the term or like term refers to the hybridization or base pairing between nucleotides or nucleic acids, such as, for instance, between the two strands of a double stranded DNA molecule or between an oligonucleotide primer and a primer binding site on a single stranded nucleic acid to be sequenced or amplified.
  • Complementary nucleotides are, generally, A and T (or A and U), or C and G.
  • Two single stranded RNA or DNA molecules are said to be substantially complementary when the nucleotides of one strand, optimally aligned and compared and with appropriate nucleotide insertions or deletions, pair with at least about 95% of the nucleotides of the other strand, usually at least about 98%, and more preferably from about 99% to about 100%.
  • Complementary polynucleotide sequences can be identified by a variety of approaches including use of well-known computer algorithms and software, for example the BLAST program.
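As a rough illustration of the "substantially complementary" threshold described above, the following Python sketch scores two strands by pairing one strand 5' to 3' against the other read 3' to 5'. It is a simplification under stated assumptions: the strands must have equal length and no insertions or deletions are considered, so it does not perform the optimal alignment the definition mentions (a tool such as BLAST would be used for that). The names `fraction_complementary` and `substantially_complementary`, and the default cutoff of 0.95, are illustrative choices and not part of the disclosure.

```python
# Watson-Crick partners; U is treated like T so RNA strands can be scored too.
PAIRS = {"A": "T", "T": "A", "U": "A", "G": "C", "C": "G"}


def fraction_complementary(strand_a: str, strand_b: str) -> float:
    """Fraction of positions in strand_a (5'->3') that base-pair with
    strand_b read in the anti-parallel direction (hypothetical helper)."""
    a = strand_a.upper()
    b = strand_b.upper()[::-1]  # reverse so the 3' end of b meets the 5' end of a
    if len(a) != len(b) or not a:
        raise ValueError("sketch assumes two non-empty strands of equal length")
    matches = 0
    for x, y in zip(a, b):
        partner = PAIRS.get(x)
        # T and U are interchangeable on the second strand.
        if partner is not None and partner == y.replace("U", "T"):
            matches += 1
    return matches / len(a)


def substantially_complementary(strand_a: str, strand_b: str,
                                threshold: float = 0.95) -> bool:
    """True when at least `threshold` of the aligned positions pair."""
    return fraction_complementary(strand_a, strand_b) >= threshold
```

For example, `fraction_complementary("ATGC", "GCAT")` pairs every position, whereas a single mismatch in a four-base strand already drops the score to 0.75, well below the 95% level.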
  • the terms "biomarker," "genetic marker," and "marker" are used interchangeably herein and refer to a region of a nucleotide sequence (e.g., in a chromosome) that is subject to variability (i.e., the region can be polymorphic for a variety of alleles).
  • a SNP in a nucleotide sequence is a biomarker that is polymorphic for two alleles.
  • Other examples of biomarkers of this invention can include but are not limited to microsatellites, restriction fragment length polymorphisms (RFLPs), repeats (i.e., duplications), insertions, deletions, etc.
  • By the phrases "risk genes," "risk genes for diseases," and "disease loci" is meant genetic variants that confer an increased likelihood of developing disease.
  • arrays are used herein interchangeably to refer to an array of distinct polynucleotides affixed to a substrate, such as
  • the polynucleotides can be synthesized directly on the substrate, or synthesized separate from the substrate and then affixed to the substrate. Microarrays can be prepared and used by a number of methods, including those described in U.S. Pat. No. 5,837,832 (Chee et al.), PCT application WO95/11995 (Chee et al.), Lockhart, D. J. et al. (Nat. Biotech. 14:1675-1680, 1996) and Schena, M. et al. (Proc. Natl. Acad. Sci. 93:10614-10619, 1996), all of which are incorporated herein in their entirety by reference.
  • such arrays can be produced by the methods described by Brown et al, U.S. Pat. No. 5,807,522.
  • the term "convergence study" means an evidence-based evaluation of multiple independent experimental methods that substantiate the role of a gene in disease.
  • patient means a mammalian (e.g., human) subject to be diagnosed and/or treated and/or from whom a biological sample is to be obtained.
  • FIG. 1 is a pair of Quantile-Quantile plots for 483,399 single SNP tests of association. Plots depict expected versus observed -log10 P-values for 483,399 single SNP tests of association (in 931 LOAD cases and 1,104 cognitive controls, with adjustment for principal components as covariates for population substructure). Plot A includes the most-strongly associated SNPs within the APOE locus, whereas plot B excludes the three most-strongly associated SNPs for clarity.
  • FIG. 2 is a Manhattan plot of SNP associations in MTHFD1L, on chromosome 6 between 151.2Mb and 151.3Mb. Plot of -log10 P-values for single SNP tests of association with LOAD with adjustment for population substructure for the chromosome 6
  • the orange line below the x-axis depicts the exons (thick line) and introns (thin line) of MTHFD1L, oriented 5' to 3' from left to right.
  • FIG. 3 is a pair of plots of -log10 P-values for 483,399 single SNP tests of association (in 931 LOAD cases and 1,104 cognitive controls, with adjustment for principal components as covariates for population substructure).
  • Plot A includes association results from all SNPs within the APOE locus, whereas plot B excludes the three most-strongly associated SNPs for clarity.
  • FIG. 4 shows LD (Plot A: D'; Plot B: r²) between 130 SNPs genotyped in
  • Described herein are results from an analysis of genome-wide association in a discovery dataset of 931 cases and 1,104 controls and results from a replication analysis on the strongest associations (P < 10⁻⁵) using genotype data from four existing studies totaling 1,338 cases and 2,003 controls.
  • SNPs were identified that are associated with LOAD, providing methods, assays, reagents and kits for predicting a subject's risk of developing LOAD.
  • a typical method of predicting a subject's risk of developing LOAD includes obtaining a biological sample from the subject; analyzing the biological sample for the presence of at least one SNP in the MTHFD1L gene; and correlating the presence of the at least one SNP with an increased risk of developing LOAD.
  • the at least one SNP can be one of the SNPs listed in Table 2 (e.g., one or more of rs2075650, rs11754661, rs803424, rs2073067, rs2072064, rs17349743, and rs803422).
  • a biomarker for predicting LOAD includes at least one SNP on chromosome 6q25.1 and/or at least one SNP set forth as: rs2075650, rs11754661, rs803424, rs2073067, rs2072064, rs17349743, and rs803422, variants, mutants, alleles or complementary sequences thereof. In some embodiments, a biomarker for predicting LOAD includes at least one SNP from Table 2.
  • One example of a method of identifying the risk of developing LOAD or diagnosing LOAD in a patient includes identifying in a patient a biomarker set including at least one SNP on chromosome 6q25.1 and/or at least one SNP set forth as: rs2075650, rs11754661, rs803424, rs2073067, rs2072064, rs17349743, and rs803422.
  • the biomarkers of this invention can be used individually or in combination.
  • a method of predicting a patient's risk of developing LOAD includes identifying one of the SNPs described herein in a sample from the patient, while in other embodiments, a method of predicting a patient's risk of developing LOAD includes identifying two or more of the SNPs described herein.
  • SNPs are single base positions in DNA at which different alleles, or alternative nucleotides, exist in a population, and are the most common form of genetic variation in the genome.
  • the SNP position (interchangeably referred to herein as SNP, SNP site, SNP locus, SNP marker, biomarker or marker) is usually preceded by and followed by highly conserved sequences of the allele (e.g., sequences that vary in less than 1/100 or 1/1000 members of the populations).
  • An individual may be homozygous or heterozygous for an allele at each SNP position.
  • an SNP is referred to as a "cSNP" to denote that the nucleotide sequence containing the SNP is an amino acid coding sequence.
  • a SNP may arise from a substitution of one nucleotide for another at the polymorphic site. Substitutions can be transitions or transversions. A transition is the replacement of one purine nucleotide by another purine nucleotide, or one pyrimidine by another pyrimidine. A transversion is the replacement of a purine by a pyrimidine, or vice versa.
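The transition/transversion distinction above can be expressed in a few lines of Python. This is a minimal sketch for DNA bases only; the function name `classify_substitution` is an illustrative choice and does not come from the disclosure.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def classify_substitution(ref: str, alt: str) -> str:
    """Classify a single-base substitution as a transition or transversion."""
    ref, alt = ref.upper(), alt.upper()
    valid = PURINES | PYRIMIDINES
    if ref not in valid or alt not in valid or ref == alt:
        raise ValueError("expected two different bases from A, C, G, T")
    # Purine<->purine or pyrimidine<->pyrimidine is a transition;
    # any change between the two classes is a transversion.
    same_class = (ref in PURINES) == (alt in PURINES)
    return "transition" if same_class else "transversion"
```

Under this definition an A-to-G or C-to-T change is a transition, while an A-to-C or G-to-T change is a transversion.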
  • a SNP may also be a single base insertion or deletion variant referred to as an "indel" (Weber et al., Am. J. Hum. Genet. 71:854-62, 2002).
  • references to SNPs and SNP genotypes include individual SNPs and/or haplotypes, which are groups of SNPs that are generally inherited together. Haplotypes can have stronger correlations with diseases or other phenotypic effects compared with individual SNPs, and therefore may provide increased diagnostic accuracy in some cases (Stephens et al., Science 293, 489-493, 2001).
  • Causative SNPs are those SNPs that produce alterations in gene expression or in the expression, structure, and/or function of a gene product, and therefore are most predictive of a possible clinical phenotype.
  • One such class includes SNPs falling within regions of genes encoding a polypeptide product, i.e. cSNPs.
  • SNPs may result in an alteration of the amino acid sequence of the polypeptide product (i.e., non-synonymous codon changes) and give rise to the expression of a defective or other variant protein. Furthermore, in the case of nonsense mutations, a SNP may lead to premature termination of a polypeptide product. Such variant products can result in a pathological condition, e.g. genetic disease.
  • causative SNPs do not necessarily occur in coding regions; causative SNPs can occur in, for example, any genetic region that can ultimately affect the expression, structure, and/or activity of the protein encoded by a nucleic acid.
  • Such genetic regions include, for example, those involved in transcription, such as SNPs in transcription factor binding domains, SNPs in promoter regions, in areas involved in transcript processing, such as SNPs at intron-exon boundaries that may cause defective splicing, or SNPs in mRNA processing signal sequences such as polyadenylation signal regions.
  • SNP single nucleotide polymorphism
  • An association study (e.g., GWAS) of a SNP and a specific disorder involves determining the presence or frequency of the SNP allele in biological samples from individuals with the disorder of interest, such as neurodegenerative disease, and comparing the information to that of controls (i.e., individuals (subjects) who do not have the disorder; controls may be also referred to as "healthy" or "normal" individuals (subjects)) who are preferably of similar age and race.
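The case-versus-control comparison described above can be sketched as an allele-frequency calculation followed by an allelic odds ratio. This is only an illustration of the idea, not the analysis used in the study (which relied on logistic regression with covariates). The function names and the genotype counts in the usage note are hypothetical.

```python
def allele_frequency(n_aa: int, n_ab: int, n_bb: int) -> float:
    """Frequency of allele B given counts of AA, AB and BB genotypes."""
    total_alleles = 2 * (n_aa + n_ab + n_bb)
    if total_alleles == 0:
        raise ValueError("no genotypes supplied")
    return (n_ab + 2 * n_bb) / total_alleles


def allelic_odds_ratio(case_counts, control_counts) -> float:
    """Odds ratio for carrying allele B, cases versus controls.

    Each argument is a tuple (n_AA, n_AB, n_BB) of genotype counts.
    """
    case_b = case_counts[1] + 2 * case_counts[2]
    case_a = 2 * case_counts[0] + case_counts[1]
    ctrl_b = control_counts[1] + 2 * control_counts[2]
    ctrl_a = 2 * control_counts[0] + control_counts[1]
    return (case_b * ctrl_a) / (case_a * ctrl_b)
```

With made-up counts of (80, 18, 2) in cases and (90, 10, 0) in controls, allele B has a frequency of 0.11 in cases versus 0.05 in controls, and an odds ratio above 1 indicates that the allele is more common among cases.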
  • the appropriate selection of patients and controls is important to the success of SNP association studies. Therefore, a pool of individuals with well-characterized phenotypes is desirable.
  • a SNP may be screened in diseased tissue samples or any biological sample obtained from a diseased individual, and compared to control samples, and selected for its increased (or decreased) occurrence in a specific pathological condition, such as pathologies related to neurodegenerative disease (e.g., LOAD).
  • LOAD late-onset Alzheimer's Disease (a neurodegenerative disease)
  • the regions around the SNPs can optionally be thoroughly screened to identify the causative genetic locus or sequences (e.g., the causative SNP/mutation, gene, regulatory region, etc.) that influences the pathological condition or phenotype.
  • Association studies such as GWAS are generally conducted within the general population, in contrast to studies performed on related individuals in affected families (linkage studies).
  • SNPs associated with LOAD were identified by methods involving GWAS and genomic convergence studies.
  • GWAS use high-throughput genotyping technologies to assay hundreds of thousands of the most common form of genetic variant, the SNP, and relate these variants to diseases or health-related traits.
  • the entire genome can now be interrogated with increased resolution, and increased power to detect disease loci for common diseases.
  • Genomic convergence is the combining of results from multiple sample sets and multiple genomic technologies to further facilitate the dissection of LOAD and other complex diseases. This convergence across multiple studies effectively increases sample sizes, decreases false-positives, and adds layers of replication to the initial results. This allows the detection of consistent effects that may otherwise be lost in the flood of statistical-genomic data.
  • the detection of a biomarker e.g., one or more of the SNPs described herein associated with a risk of developing LOAD
  • DNA is obtained from any suitable sample from the subject that will contain DNA, preferably genomic DNA, and the DNA is then prepared and analyzed according to well-established protocols for the presence of biomarkers.
  • analysis of the DNA can be carried out by amplification of the region of interest according to amplification protocols well known in the art (e.g., polymerase chain reaction, ligase chain reaction, strand displacement amplification, transcription-based amplification, self-sustained sequence replication (3SR), Qβ replicase protocols, nucleic acid sequence-based amplification (NASBA), repair chain reaction (RCR) and boomerang DNA amplification (BDA)).
  • the amplification product can then be visualized directly in a gel by staining or the product can be detected by hybridization with a detectable probe.
  • oligonucleotides for use as primers and/or probes for detecting and/or identifying biomarkers according to the methods described herein.
  • a typical method of predicting a subject's risk of developing LOAD includes obtaining a biological sample from the subject; analyzing the biological sample for the presence of at least one SNP in the MTHFD1L gene; and correlating the presence of the at least one SNP with an increased risk of developing LOAD.
  • the at least one SNP can be one of the SNPs listed in Table 2 (e.g., one or more of rs2075650, rs11754661, rs803424, rs2073067, rs2072064, rs17349743, and rs803422).
  • the at least one SNP is rs11754661 of chromosome 6q25.1.
  • the method can further include analyzing the biological sample for the presence of a mutation in the APOE gene, and correlating the presence of the at least one SNP in the MTHFD1L gene and the presence of the mutation in the APOE gene with an increased risk of developing LOAD. Additionally or alternatively, the method can include analyzing homocysteine levels in the biological sample and correlating increased levels of homocysteine and the presence of the at least one SNP in the MTHFD1L gene with an increased risk of developing LOAD. Typically, an increased risk is relative to the risk in an individual who does not have the at least one SNP in the MTHFD1L gene.
  • Typical sources of DNA include amniotic fluid, serum, blood, buccal cells, urine, whole genome amplified DNA, plasma, cells, and tissue (e.g. brain).
  • Any suitable platform or method of genotyping a plurality of samples can be used.
  • a platform is a series of arrays or chips on which high-throughput genotyping is performed.
  • genotyping platforms are commercially available, including those from Affymetrix, Illumina, and Perlegen. For example, one can use Illumina's Infinium II assay on the HumanHap550 BeadChip to genotype over half a million SNPs in each case and control participant.
  • genes, gene names, and gene products disclosed herein are intended to correspond to homologs from any species for which the compositions and methods disclosed herein are applicable.
  • the terms include, but are not limited to, genes and gene products from humans and mice. It is understood that when a gene or gene product from a particular species is disclosed, this disclosure is intended to be exemplary only, and is not to be interpreted as a limitation unless the context in which it appears clearly indicates.
  • the genes disclosed herein, which in some embodiments relate to mammalian nucleic acid and amino acid sequences, are intended to encompass homologous and/or orthologous genes and gene products from other animals including, but not limited to, other mammals, fish, amphibians, reptiles, and birds.
  • the genes or nucleic acid sequences are human.
  • GWAS Genome-wide association studies
  • MTHFD1L a gene involved in the tetrahydrofolate synthesis pathway
  • MTHFD1L is an excellent candidate for predicting LOAD risk on account of its involvement in folate-pathway abnormalities linked with homocysteine. Detection of abnormalities in the folate-pathway linked with homocysteine may be used as an index (e.g., a panel of biomarkers) to predict the probability of developing LOAD. Thus, high homocysteine levels can be used in the methods described herein as a biomarker to predict the increased risk of LOAD. The aging population may find particular use for this biomarker in predicting a risk of developing LOAD. In some embodiments, this biomarker may be used for diagnosing LOAD in a subject.
  • Table 1 depicts the demographic characteristics of the case and control samples examined in initial association analyses.
  • Table 2 shows the strongest associations (P < 10⁻⁵) from a GWAS of late-onset Alzheimer Disease.
  • Single nucleotide polymorphisms (SNPs) demonstrating association with late-onset Alzheimer Disease at P < 10⁻⁵ in association tests adjusting for covariates from principal components capturing population substructure, evaluated in the Discovery genome-wide association study (GWAS) dataset of 931 independent cases and 1,104 independent cognitively normal controls, in the Replication GWAS dataset of 1,242 independent cases and 1,737 independent controls, and in the Combined GWAS dataset of 2,174 cases and 2,181 controls.
  • Table 3 shows SNPs demonstrating association with late-onset Alzheimer
  • Table 4 shows genotyped and imputed SNPs demonstrating association with late-onset Alzheimer Disease at P < 10⁻⁴ in association tests adjusting for covariates from principal components capturing population substructure, evaluated in the Discovery genome-wide association study (GWAS) dataset of 931 independent cases and 1,104 independent cognitively normal controls, in the Replication GWAS dataset of 1,242 independent cases and 1,737 independent controls, and in the Combined GWAS dataset of 2,174 cases and 2,181 controls.
  • Table 5 shows the results from a follow-up of the strongest associations reported in the Beecham et al. (Am J Hum Genet vol. 84:35-43, 2009) GWAS of late-onset Alzheimer Disease. 32 SNPs demonstrating the strongest association with late-onset Alzheimer Disease at P < 10⁻⁵ in the Beecham et al. (Am J Hum Genet vol. 84:35-43, 2009) GWAS of late-onset Alzheimer's Disease, tested here for association with adjustment for covariates from principal components capturing population substructure, evaluated in the Discovery genome-wide association study (GWAS) dataset of 931 independent cases and 1,104 independent cognitively normal controls.
  • LOAD .70x l0 ⁇ s ; Bonferroni- corrected / ).022
  • We found no evidence of a difference in genotype frequencies among controls across subsets by genotyping platform (Fisher's exact test P 0 ⁇ ) or by study center (Fisher's exact test PNX95, Table 6).
  • Table 6 shows genotype frequency distributions and differences in three subsets of a GWAS dataset for SNPs with strong associations with late-onset Alzheimer Disease.
  • Table 7 shows demographic characteristics of participants, subsetted by study center, autopsy or clinical confirmation of case or control status, and by genotyping platform (Mean ± SD or Number (Percent)).
  • Table 8 shows changes in effect size and p-value with additional covariate adjustment for age, sex, and presence/absence of the APOE ε4 allele for SNP associations demonstrating P < 10⁻⁵ in preliminary analyses of late-onset Alzheimer Disease.
  • haplotypes of MTHFD1L which included this SNP to identify potential markers for untyped variants associated with LOAD.
  • Two haplotypes (the first comprising rs2073066-rs11754661-rs13201018, the second comprising rs2839947-rs11757561-rs2073066-rs11754661-rs13201018), both containing the risk-increasing A allele of rs11754661, had highly statistically significant associations similar to the genotypic association of rs11754661 [P 4.i> ( M ( ) x and /' 6.5 -1 i 0 ⁇ respectively) (Table 9). Both haplotypes had similar frequencies (MHF) to the A allele of rs11754661 (MHF=0.0696 and MHF
  • Table 9 shows associations with late-onset Alzheimer Disease of MTHFD1L haplotypes incorporating SNP rs11754661, with adjustment for covariates from principal components capturing population substructure, evaluated in the Discovery GWAS dataset of 931 independent cases and 1,104 independent cognitively normal controls.
  • Figure 1 shows the -log10-transformed P-values for single SNP tests of association in the MTHFD1L and 50kb flanking region (151.2Mb-153.3Mb) surrounding the chromosome 6 association signal at rs11754661, among both SNPs genotyped in the initial GWAS and those genotyped subsequently.
  • CAP Collaborative Alzheimer's Project
  • MSBB Mount Sinai Brain Bank
  • NCRAD National Cell Repository for Alzheimer's Disease
  • Genotyping efficiency was greater than 99%, and quality assurance was achieved by the inclusion of one CEPH control per 96-well plate that was genotyped multiple times. Technicians were blinded to affection status and quality-control samples.
  • genotype data was available on 870,954 SNPs (after quality control) using the Illumina 1M BeadChip on 440 cases and 437 controls, while genotype data on 490,960 SNPs (after quality control) from the Illumina 610Quad BeadChip was available on 172 controls.
  • sample efficiency is the proportion of valid genotype calls to attempted calls within a sample.
  • Samples with efficiency less than 0.98 were dropped from the analysis.
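The sample-efficiency filter described above can be sketched as follows. The data layout is an assumption made for illustration: each sample is a list of genotype calls in which `None` marks a failed call, and the names `sample_efficiency` and `filter_samples` are hypothetical.

```python
def sample_efficiency(calls) -> float:
    """Proportion of valid genotype calls to attempted calls in one sample.

    `calls` is a sequence of genotype calls; None marks a failed call.
    """
    attempted = len(calls)
    if attempted == 0:
        raise ValueError("sample has no attempted calls")
    valid = sum(1 for call in calls if call is not None)
    return valid / attempted


def filter_samples(samples: dict, min_efficiency: float = 0.98) -> dict:
    """Keep samples whose efficiency is at least `min_efficiency`;
    samples below the cutoff are dropped."""
    return {
        sample_id: calls
        for sample_id, calls in samples.items()
        if sample_efficiency(calls) >= min_efficiency
    }
```

A sample with 99 valid calls out of 100 attempted (efficiency 0.99) is retained, while one with 97 valid calls (0.97) falls below the 0.98 cutoff and is removed.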
  • Reported gender and genetic gender were examined with the use of X-linked SNPs; 32 inconsistent samples were dropped from the analysis.
  • Relatedness between samples was tested via the program Graphical Representation of Relatedness (GRR) (Abecasis et al, Bioinformatics vol 17:742-743, 2001), and 3 related samples were dropped from the analysis.
  • GRR Graphical Representation of Relatedness
  • association analysis was performed using logistic regression to test association of genotypes with LOAD under an additive model. Logistic regression was used to permit covariate adjustment for loadings taken from the first three principal components identified in EIGENSTRAT to account for population substructure. Here we report results from logistic regression models adjusting only for population substructure with principal components. Further regression modeling was also performed on SNPs with initial associations of P < 10⁻⁵, extending models to adjust for APOE genotype (designated as the number of ε4 alleles), age-at-onset in cases and age-at-exam in controls, and gender as covariates (Table 7). All analyses were performed using the PLINK software package (Purcell et al., Am J Hum Genet vol. 81:559-575, 2007). Quantile-quantile plots of the associations were made (Figure 1), and suggest the absence of systematic bias in the tests of association.
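To make the additive model concrete, the sketch below fits a logistic regression of case/control status on the number of minor alleles (0, 1 or 2) by Newton-Raphson. It is a deliberately reduced illustration: it omits the principal-component covariates described above, it is not PLINK's implementation, and the function name `fit_additive_logistic` and the fixed iteration count are assumptions made for this example.

```python
import math


def fit_additive_logistic(genotypes, status, iterations=25):
    """Fit logit(P(case)) = b0 + b1 * genotype by Newton-Raphson.

    genotypes: minor-allele counts per subject (0, 1 or 2, additive coding)
    status:    1 for a case, 0 for a control
    Returns (b0, b1); exp(b1) is the per-allele odds ratio.
    Assumes genotypes vary and classes are not perfectly separated.
    """
    b0, b1 = 0.0, 0.0
    for _ in range(iterations):
        g0 = g1 = 0.0           # gradient of the log-likelihood
        h00 = h01 = h11 = 0.0   # information matrix (negative Hessian)
        for x, y in zip(genotypes, status):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
            w = p * (1.0 - p)
            g0 += y - p
            g1 += (y - p) * x
            h00 += w
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2x2 system (information matrix) * delta = gradient.
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1
```

When only genotypes 0 and 1 are present, the fitted slope equals the log odds ratio from the corresponding 2x2 table, which gives a simple way to sanity-check the fit. Adding covariates such as principal-component loadings would extend the same update to more parameters.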
  • LOAD GWAS dataset (TGEN) (Reiman et al., Neuron vol. 54:713-720, 2007) (859 cases and 552 controls), and an additional set of LOAD cases and controls independent of the Discovery dataset and not used in prior publications (ADRC) (Slifer et al., Am J Med Genet B Neuropsychiatr Genet vol. 141B:208-213, 2006) (246 LOAD cases and 69 cognitively normal controls).
  • Haplotypes were constructed using LD blocks assigned by Haploview; the LD block containing the SNP (consisting of rs2073066, rs11754661, and rs13201018; analyzed as "Haplotype 1") and the block immediately adjacent to the SNP (consisting of rs2839947 and rs11757561; analyzed as "Haplotype 2") were examined for association.
  • Extended haplotypes were constructed by further incorporating SNPs from LD blocks immediately adjacent to the Haplotype 1 block in MTHFD1L, which included the Haplotype 2 block and a third, larger block incorporating SNPs rs17348429, rs17426727, rs803410, rs6917461, rs803407, rs803403, rs17348890, rs17427389, rs9397027, and rs10484779 (labeled here as "Haplotype 3").
  • This set of haplotypes included "Extended Haplotype 1" (comprising SNPs from Haplotype 1 and 2 blocks), and "Extended Haplotype 2" (comprising SNPs from Haplotype 1, 2, and 3 blocks).
  • Haplotypic association tests were performed in a manner similar to genotypic association analysis, using a logistic regression approach with covariate adjustment for loadings taken from the first three principal components identified in EIGENSTRAT (Price et al, Nat Genet 38:904-909, 2006) to account for population substructure. All analyses were performed using the "--hap-logistic" function in the PLINK software package (Purcell et al, Am J Hum Genet 81:559-575, 2007).
  • MTHFD1L, which encodes the methylenetetrahydrofolate dehydrogenase (NADP+ dependent) 1-like protein, is involved in tetrahydrofolate (THF) synthesis, catalyzing the reversible synthesis of 10-formyl-THF to formate and THF, an important step in homocysteine conversion to methionine.
  • THF tetrahydrofolate
  • Homocysteic acid, derived from homocysteine and methionine, is elevated in these mice, and treatment with antibodies to homocysteic acid reduced amyloid burden and inhibited cognitive decline in these animals [40].
  • B6-deficient diets lead to further increases in homocysteic acid in these mice.
  • MTHFD1L a novel association with experiment-wide statistical significance in a gene with a potential biological role
  • SUBSTITUTE SHEET RULE 26 replicated this association in additional publicly-available genomewide association datasets, and observed statistically significant association with a similar effect size and direction at this SNP. In summary, MTHFD1L is an excellent candidate for LOAD on account of its involvement in folate-pathway abnormalities linked with homocysteine, a significant biological risk factor for AD.
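
The sample-efficiency filter mentioned in the excerpts above (valid genotype calls divided by attempted calls, with samples below 0.98 dropped) can be illustrated with a minimal sketch. The data layout (a dictionary of per-sample call lists, with `None` marking a failed call) and the function names are assumptions made for illustration only; they are not taken from the study's pipeline.

```python
def sample_efficiency(calls):
    """Proportion of valid genotype calls among attempted calls for one sample.

    `calls` is a list with one entry per attempted SNP; None marks a no-call.
    (This representation is a hypothetical choice for the sketch.)
    """
    if not calls:
        raise ValueError("sample has no attempted calls")
    valid = sum(1 for call in calls if call is not None)
    return valid / len(calls)


def filter_samples(samples, threshold=0.98):
    """Keep samples whose efficiency is at least `threshold`.

    Samples with efficiency below the threshold are dropped, matching the
    0.98 cutoff stated in the text.
    """
    return {
        sample_id: calls
        for sample_id, calls in samples.items()
        if sample_efficiency(calls) >= threshold
    }


# Toy example: 100 attempted calls per sample.
samples = {
    "S1": ["AA"] * 99 + [None],      # efficiency 0.99, kept
    "S2": ["AG"] * 97 + [None] * 3,  # efficiency 0.97, dropped
}
kept = filter_samples(samples)
```

In practice this step would be run on the full genotype matrix before any association testing, so that low-quality samples do not contribute spurious calls.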

Abstract

An analysis of genome-wide association in a discovery dataset of 931 cases and 1,104 controls and results from a replication analysis on the strongest associations (P<10⁻⁵) using genotype data from four existing studies totaling 1,338 cases and 2,003 controls identified SNPs that are associated with late-onset Alzheimer's Disease (LOAD), providing methods, assays, reagents and kits for predicting a subject's risk of developing LOAD. A typical method of predicting a subject's risk of developing LOAD includes obtaining a biological sample from the subject; analyzing the biological sample for the presence of at least one SNP in the MTHFD1L gene; and correlating the presence of the at least one SNP with an increased risk of developing LOAD. The at least one SNP can be one of the SNPs listed in Table 2 (e.g., one or more of rs2075650, rs11754661, rs803424, rs2073067, rs2072064, rs17349743, and rs803422).

Description

METHODS AND COMPOSITIONS FOR DIAGNOSING AND PREDICTING RISK OF LATE-ONSET ALZHEIMER'S DISEASE
CROSS-REFERENCE TO RELATED APPLICATIONS
[0001] This application claims the benefit of Provisional Application Serial No. 61/317,900 filed March 26, 2010, which is herein incorporated by reference in its entirety.
FIELD OF THE INVENTION
[0002] The invention relates generally to the fields of genetics and medicine. More particularly, the invention relates to methods of and biomarkers for diagnosing and predicting risk of late-onset Alzheimer's Disease.
BACKGROUND
[0003] Alzheimer Disease (AD) [MIM 104300] is a neurodegenerative disorder characterized by memory and cognitive impairment affecting more than 13% of individuals aged 65 years and older (Alzheimer's Association 2009 Alzheimer's Disease Facts and Figures, Washington, D.C.; Hebert et al, Arch Neurol vol. 60:1119-1122, 2003) and constitutes the most common form of dementia among older adults. While several major genes contributing to risk of Alzheimer Disease have been identified (APP, PS1, PS2), all but one (APOE; Saunders et al, Neurology vol. 43:1467-1472, 1993; Corder et al, Science vol. 261:921-923, 1993; Strittmatter et al, Proc Natl Acad Sci USA vol. 90:1977-1981, 1993) contribute predominantly to early-onset forms of AD that cluster within families; other than APOE, few consistent association signals have been observed for late-onset AD (LOAD). Recent estimates of the heritability of LOAD fall between 60% and 80% (Gatz et al, J Gerontol A Biol Sci Med Sci vol. 52:M117-125, 1997). However, while APOE ε4 alleles elevate AD risk, only 50% of AD cases carry an APOE ε4 allele, suggesting genetic factors elsewhere in the genome contribute to AD risk (Huang et al, Arch Neurol vol. 61:1930-1934, 2004).
[0004] Studies have been conducted that have tested association with LOAD on genome-wide panels of single nucleotide polymorphisms (SNPs). However, it remains unlikely that these studies, incorporating cases and controls from multiple samples with varying case/control inclusion criteria, have identified all loci with modest effect sizes in LOAD.
SUMMARY
[0005] The invention relates to the development of reagents and methods for diagnosing LOAD and for determining a subject's risk of developing LOAD. Studies looking for genetic variants across the genome that affect LOAD have had little success identifying genes other than APOE. Here, an expanded set of AD cases and controls was used to improve the power to detect genetic variants driving LOAD risk. Analyzing 483,399 genetic variants across the genome in a discovery dataset of 931 cases and 1,104 controls, a strong association to the marker rs11754661 on chromosome 6 in the gene MTHFD1L was found, in addition to the highly replicated chromosome 19 APOE association. Adjacent variants on chromosome 6 were also genotyped in these same cases and controls, and it was found that these variants were also associated with LOAD. The association was replicated with rs11754661 and additional SNPs in MTHFD1L in a combined dataset of cases and controls, some from publicly-available datasets. This finding is important because the gene is known to be involved in biological pathways influencing levels of homocysteine, a significant risk factor for AD. Described herein are methods of diagnosing LOAD in an individual and methods of identifying an individual's risk of developing LOAD involving the use of SNPs. Examples of these methods include obtaining a sample from the individual; purifying genomic DNA from the sample; genotyping one or more single nucleotide polymorphisms (SNPs) associated with LOAD (e.g., one or more SNPs in the MTHFD1L gene); and correlating the presence of one or more of the SNPs associated with LOAD with an increased risk of developing LOAD in the individual.
[0006] Accordingly, described herein is a method of predicting a subject's risk of developing LOAD. The method includes the steps of: obtaining a biological sample from the subject; analyzing the biological sample (e.g., blood, plasma, serum, saliva, urine, and tissue) for the presence of at least one SNP in the MTHFD1L gene (e.g., an SNP selected from Table 2); and correlating the presence of the at least one SNP with an increased risk of developing LOAD. The at least one SNP can be an SNP such as rs2075650, rs11754661, rs803424, rs2073067, rs2072064, rs17349743, and rs803422, e.g., rs11754661 of chromosome 6q25.1. The method can further include analyzing the biological sample for the presence of a mutation in the APOE gene, and correlating the presence of the at least one SNP in the MTHFD1L gene and the presence of the mutation in the APOE gene with an increased risk of developing LOAD. In one embodiment, the method can additionally include the step of analyzing homocysteine levels in the biological sample and correlating increased levels of homocysteine and the presence of the at least one SNP in the MTHFD1L gene with an increased risk of developing LOAD. The at least one SNP can be two or more SNPs selected from Table 2.
[0007] Also described herein is a method of diagnosing LOAD in a subject. The method includes the steps of: obtaining a biological sample (e.g., blood, plasma, serum, saliva, urine, and tissue, etc.) from the subject; analyzing the biological sample for the presence of at least one SNP in the MTHFD1L gene; and correlating the presence of the at least one SNP with a diagnosis of LOAD in the subject. The at least one SNP can be, e.g., an SNP selected from Table 2. The at least one SNP can be one or more of the following SNPs: rs2075650, rs11754661, rs803424, rs2073067, rs2072064, rs17349743, and rs803422, e.g., rs11754661 of chromosome 6q25.1. The method can further include analyzing the biological sample for the presence of a mutation in the APOE gene, and correlating the presence of the at least one SNP in the MTHFD1L gene and the presence of the mutation in the APOE gene with a diagnosis of LOAD in the subject. In one embodiment, the method can additionally include analyzing homocysteine levels in the biological sample and correlating increased levels of homocysteine and the presence of the at least one SNP in the MTHFD1L gene with a diagnosis of LOAD in the subject. The at least one SNP can be two or more SNPs selected from Table 2. Unless otherwise defined, all technical terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs.
[0008] Further described herein is a kit for predicting a subject's risk of developing LOAD. A typical kit includes: at least one reagent for analyzing a biological sample from the subject for the presence of at least one SNP in the MTHFD1L gene; at least one control; and instructions for use. The at least one SNP can be an SNP selected from Table 2. The at least one SNP can be one of the following SNPs: rs2075650, rs11754661, rs803424, rs2073067, rs2072064, rs17349743, and rs803422, e.g., rs11754661 of chromosome 6q25.1.
[0009] As used herein, "protein" and "polypeptide" are used synonymously to mean any peptide-linked chain of amino acids, regardless of length or post-translational modification, e.g., glycosylation or phosphorylation.
[00010] By the term "gene" is meant a nucleic acid molecule that codes for a particular protein, or in certain cases, a functional or structural RNA molecule.
[00011] As used herein, a "nucleic acid" or a "nucleic acid molecule" means a chain of two or more nucleotides such as RNA (ribonucleic acid) and DNA (deoxyribonucleic acid).
[0010] As used herein, an "allele" may refer to a nucleotide at a SNP position (wherein at least two alternative nucleotides are present in the population at the SNP position, in accordance with the inherent definition of a SNP) or, for cSNPs, may refer to an amino acid residue that is encoded by the codon which contains the SNP position (where the alternative nucleotides that are present in the population at the SNP position form alternative codons that encode different amino acid residues). An "allele" may also be referred to herein as a "variant". Also, an amino acid residue that is encoded by a codon containing a particular SNP may simply be referred to as being encoded by the SNP.
[0011] The term "complementary" means that two sequences are complementary when the sequence of one can bind to the sequence of the other in an anti-parallel sense wherein the 3'-end of each sequence binds to the 5'-end of the other sequence and each A, T(U), G, and C of one sequence is then aligned with a T(U), A, C, and G, respectively, of the other sequence.
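
The anti-parallel pairing rule in this definition can be illustrated with a minimal sketch. The function names are hypothetical, sequences are assumed to be written 5' to 3', and the output is normalized to the DNA alphabet (U is treated as T); these are conventions chosen for the example, not part of the definition.

```python
# Watson-Crick partners; U (RNA) pairs with A just as T does.
_PARTNER = {"A": "T", "T": "A", "U": "A", "G": "C", "C": "G"}


def reverse_complement(seq):
    """Return the strand (written 5'->3', DNA alphabet) that pairs with `seq`.

    Reversing handles the anti-parallel orientation (3'-end against 5'-end);
    the lookup swaps each base for its pairing partner.
    """
    return "".join(_PARTNER[base] for base in reversed(seq.upper()))


def are_complementary(seq_a, seq_b):
    """True if every base of seq_a pairs with seq_b in the anti-parallel sense."""
    normalized_b = seq_b.upper().replace("U", "T")
    return reverse_complement(seq_a) == normalized_b


partner = reverse_complement("ATGC")
```

For example, `are_complementary("ATGC", "GCAT")` holds under these conventions, while a sequence compared with itself generally does not.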
[0012] The term "complementary sequence" as it refers to a polynucleotide sequence, relates to the base sequence in another nucleic acid molecule by the base-pairing rules. More particularly, the term or like term refers to the hybridization or base pairing between nucleotides or nucleic acids, such as, for instance, between the two strands of a double stranded DNA molecule or between an oligonucleotide primer and a primer binding site on a single stranded nucleic acid to be sequenced or amplified. Complementary nucleotides are, generally, A and T (or A and U), or C and G. Two single stranded RNA or DNA molecules are said to be substantially complementary when the nucleotides of one strand, optimally aligned and compared and with appropriate nucleotide insertions or deletions, pair with at least about 95% of the nucleotides of the other strand, usually at least about 98%, and more preferably from about 99% to about 100%. Complementary polynucleotide sequences can be identified by a variety of approaches including use of well-known computer algorithms and software, for example the BLAST program.
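
As a rough illustration of the "substantially complementary" criterion, the sketch below computes the fraction of paired positions between two strands. It makes a simplifying assumption that the definition does not: the strands are of equal length and are compared without insertions or deletions, so it is not a substitute for an alignment tool such as BLAST. The function names and the default 0.95 cutoff (taken from the "at least about 95%" wording) are illustrative.

```python
_PAIRS = {("A", "T"), ("T", "A"), ("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}


def paired_fraction(strand, other):
    """Fraction of positions in `strand` that base-pair with `other`.

    Both strands are given 5'->3'; `other` is reversed so the comparison is
    anti-parallel. Gap-free, equal-length comparison is assumed.
    """
    if not strand or len(strand) != len(other):
        raise ValueError("strands must be non-empty and of equal length")
    other_3_to_5 = other.upper()[::-1]
    hits = sum((x, y) in _PAIRS for x, y in zip(strand.upper(), other_3_to_5))
    return hits / len(strand)


def substantially_complementary(strand, other, threshold=0.95):
    """Apply the 'at least about 95%' pairing criterion from the text."""
    return paired_fraction(strand, other) >= threshold


perfect = paired_fraction("ATGC" * 5, "GCAT" * 5)
```

With a 20-nucleotide strand, one mismatch leaves 19 of 20 positions paired (0.95), which sits exactly at the lower bound quoted above.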
[0013] The terms "biomarker" and "genetic marker" and "marker" are used interchangeably herein and as used herein refer to a region of a nucleotide sequence (e.g., in a chromosome) that is subject to variability (i.e., the region can be polymorphic for a variety of alleles). For example, a SNP in a nucleotide sequence is a biomarker that is polymorphic for two alleles. Other examples of biomarkers of this invention can include but are not limited to microsatellites, restriction fragment length polymorphisms (RFLPs), repeats (i.e., duplications), insertions, deletions, etc.
[0014] By the phrases "risk genes," "risk genes for diseases," and "disease loci" is meant genetic variants that confer an increased likelihood of developing disease.
[0015] The terms "arrays," "microarrays," and "DNA chips" are used herein interchangeably to refer to an array of distinct polynucleotides affixed to a substrate, such as glass, plastic, paper, nylon or other type of membrane, filter, chip, or any other suitable solid support. The polynucleotides can be synthesized directly on the substrate, or synthesized separate from the substrate and then affixed to the substrate. Microarrays can be prepared and used by a number of methods, including those described in U.S. Pat. No. 5,837,832 (Chee et al.), PCT application WO95/11995 (Chee et al.), Lockhart, D. J. et al. (Nat. Biotech. 14:1675-1680, 1996) and Schena, M. et al. (Proc. Natl. Acad. Sci. 93:10614-10619, 1996), all of which are incorporated herein in their entirety by reference. In other embodiments, such arrays can be produced by the methods described by Brown et al, U.S. Pat. No. 5,807,522.
[0016] By the terms "genomewide association study" and "GWAS" is meant a study of genetic variation across the entire human genome designed to identify genetic association with observable traits or the presence or absence of a disease.
[0017] As used herein, the term "convergence study" means an evidence-based evaluation of multiple independent experimental methods that substantiate the role of a gene in disease.
[0018] The terms "patient," "subject" and "individual" are used interchangeably herein, and mean a mammalian (e.g., human) subject to be diagnosed and/or treated and/or to obtain a biological sample from.
[0019] Although methods and reagents similar or equivalent to those described herein can be used in the practice or testing of the present invention, suitable methods are described below. All publications, patent applications, and patents mentioned herein are incorporated by reference in their entirety. In the case of conflict, the present specification, including definitions, will control. The particular embodiments discussed below are illustrative only and not intended to be limiting.
BRIEF DESCRIPTION OF THE DRAWINGS
[0020] FIG. 1 is a pair of Quantile-Quantile plots for 483,399 single SNP tests of association. Plots depict expected versus observed -log10 P-values for 483,399 single SNP tests of association (in 931 LOAD cases and 1,104 cognitive controls, with adjustment for principal components as covariates for population substructure). Plot A includes the most-strongly associated SNPs within the APOE locus, whereas plot B excludes the three most-strongly associated SNPs for clarity.
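
For readers unfamiliar with how the expected and observed axes of such a plot are obtained, the following is a minimal sketch. It assumes that P-values are uniformly distributed under the null hypothesis and uses the plotting position (i - 0.5)/n for the i-th smallest of n P-values; that plotting position is a common convention chosen here for illustration, not one stated in this document, and the function name is hypothetical.

```python
import math


def qq_points(p_values):
    """Return (expected, observed) -log10 P pairs for a quantile-quantile plot.

    Pairs are ordered from least to most significant. Under the null, the
    i-th smallest of n P-values is taken to be near (i - 0.5) / n (assumed
    plotting position).
    """
    n = len(p_values)
    if n == 0:
        raise ValueError("at least one P-value is required")
    if any(not (0.0 < p <= 1.0) for p in p_values):
        raise ValueError("P-values must lie in (0, 1]")
    observed = sorted(-math.log10(p) for p in p_values)
    expected = sorted(-math.log10((i - 0.5) / n) for i in range(1, n + 1))
    return list(zip(expected, observed))


# Four P-values that sit exactly on the null expectation.
points = qq_points([0.875, 0.125, 0.625, 0.375])
```

Points falling along the diagonal (expected equal to observed) indicate no systematic inflation; a small number of points rising above the diagonal at the upper end would correspond to strongly associated SNPs such as those at the APOE locus.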
[0021] FIG. 2 is a Manhattan plot of SNP associations in MTHFD1L, on chromosome 6 between 151.2Mb and 151.3Mb. Plot of -log10 P-values for single SNP tests of association with LOAD with adjustment for population substructure for the chromosome 6 region from 151.2Mb to 151.3Mb in MTHFD1L. Blue circles depict results for SNPs examined as part of genomewide association testing, whereas the green circle depicts the genomewide significant association of rs11754661 and the green triangles show the association results of the additional six SNPs proximal to rs11754661 genotyped on the TaqMan platform. The orange line below the x-axis depicts the exons (thick line) and introns (thin line) of MTHFD1L, oriented 5' to 3' from left to right.
[0022] FIG. 3 is a pair of plots of -log10 P-values for 483,399 single SNP tests of association (in 931 LOAD cases and 1,104 cognitive controls, with adjustment for principal components as covariates for population substructure). Plot A includes association results from all SNPs within the APOE locus, whereas plot B excludes the three most-strongly associated SNPs for clarity.
[0023] FIG. 4 shows LD (Plot A: D'; Plot B: r²) between 130 SNPs genotyped in 931 cases and 1,304 controls in and around the gene MTHFD1L (± 50 kilobasepairs). The SNP with the most significant association, rs11754661, is highlighted with a blue arrow in the diagram below.
DETAILED DESCRIPTION
[0024] Described herein are results from an analysis of genome-wide association in a discovery dataset of 931 cases and 1,104 controls and results from a replication analysis on the strongest associations (P<10⁻⁵) using genotype data from four existing studies totaling 1,338 cases and 2,003 controls. As described in greater detail in Example 1, SNPs were identified that are associated with LOAD, providing methods, assays, reagents and kits for predicting a subject's risk of developing LOAD. A typical method of predicting a subject's risk of developing LOAD includes obtaining a biological sample from the subject; analyzing the biological sample for the presence of at least one SNP in the MTHFD1L gene; and correlating the presence of the at least one SNP with an increased risk of developing LOAD. The at least one SNP can be one of the SNPs listed in Table 2 (e.g., one or more of rs2075650, rs11754661, rs803424, rs2073067, rs2072064, rs17349743, and rs803422).
[0025] The below described preferred embodiments illustrate adaptations of these methods. Nonetheless, from the description of these embodiments, other aspects of the invention can be made and/or practiced based on the description provided below.
Biological Methods
[0026] Methods involving conventional molecular biology techniques are described herein. Such techniques are generally known in the art and are described in detail in methodology treatises such as Molecular Cloning: A Laboratory Manual, 3rd ed., vol. 1-3, ed. Sambrook et al, Cold Spring Harbor Laboratory Press, Cold Spring Harbor, N.Y., 2001; and Current Protocols in Molecular Biology, ed. Ausubel et al., Greene Publishing and Wiley-Interscience, New York, 1992 (with periodic updates). Genotyping techniques and SNP analysis methods are generally known in the art and are described in detail in methodology treatises such as Genetic Analysis of Complex Disease by Jonathan L. Haines and Margaret A. Pericak-Vance, 2nd ed., 2006, Wiley-Liss Publishing, Hoboken, NJ; and Single Nucleotide Polymorphisms Methods and Protocols (Methods in Molecular Biology) by Pui-Yan Kwok, 1st ed., 2002, Humana Press, New York, New York. Genome-wide association studies are reviewed in Pearson and Manolio, JAMA vol. 299:1335-1344, 2008. A human haplotype map of over 3.1 million SNPs is described in International HapMap Consortium et al, Nature vol. 449:851-861, 2007.
Biomarkers and SNPs
[0027] In the experiments described herein, biomarkers were identified as being associated with risk of LOAD and include SNPs which demonstrated strong association with LOAD risk. In a typical embodiment, a biomarker for predicting LOAD includes at least one SNP on chromosome 6q25.1 and/or at least one SNP set forth as: rs2075650, rs11754661, rs803424, rs2073067, rs2072064, rs17349743, and rs803422, variants, mutants, alleles or complementary sequences thereof. In some embodiments, a biomarker for predicting LOAD includes at least one SNP from Table 2. One example of a method of identifying the risk of developing LOAD or diagnosing LOAD in a patient includes identifying in a patient a biomarker set including at least one SNP on chromosome 6q25.1 and/or at least one SNP set forth as: rs2075650, rs11754661, rs803424, rs2073067, rs2072064, rs17349743, and rs803422. The biomarkers of this invention can be used individually or in combination. Thus, in some embodiments, a method of predicting a patient's risk of developing LOAD includes identifying one of the SNPs described herein in a sample from the patient, while in other embodiments, a method of predicting a patient's risk of developing LOAD includes identifying two or more of the SNPs described herein.
[0028] The coexistence of multiple forms of a genetic sequence gives rise to genetic polymorphisms, including SNPs. SNPs are single base positions in DNA at which different alleles, or alternative nucleotides, exist in a population, and are the most common form of genetic variation in the genome. The SNP position (interchangeably referred to herein as SNP, SNP site, SNP locus, SNP marker, biomarker or marker) is usually preceded by and followed by highly conserved sequences of the allele (e.g., sequences that vary in less than 1/100 or 1/1000 members of the populations). An individual may be homozygous or heterozygous for an allele at each SNP position. In some embodiments, an SNP is referred to as a "cSNP" to denote that the nucleotide sequence containing the SNP is an amino acid coding sequence.
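
The homozygous/heterozygous distinction can be made concrete with a short sketch. Genotypes are assumed to be two-character strings such as "AG" (one character per chromosome copy), and the function names are hypothetical. The 0/1/2 allele count shown is the coding commonly used for an additive genetic model; it is included as an illustrative assumption rather than as a description of any specific software.

```python
def zygosity(genotype):
    """Classify a diploid genotype such as 'AA' or 'AG' at one SNP position."""
    if len(genotype) != 2:
        raise ValueError("expected a two-allele genotype, e.g. 'AG'")
    first, second = genotype.upper()
    return "homozygous" if first == second else "heterozygous"


def additive_dosage(genotype, effect_allele):
    """Count copies (0, 1, or 2) of `effect_allele` in a diploid genotype."""
    if len(genotype) != 2:
        raise ValueError("expected a two-allele genotype, e.g. 'AG'")
    return sum(1 for allele in genotype.upper() if allele == effect_allele.upper())


dosages = [additive_dosage(g, "A") for g in ("GG", "AG", "AA")]
```

Under this coding a heterozygote contributes one copy of the allele and a homozygote for the allele contributes two, which is what allows a single regression coefficient to represent a per-allele effect.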
[0029] A SNP may arise from a substitution of one nucleotide for another at the polymorphic site. Substitutions can be transitions or transversions. A transition is the replacement of one purine nucleotide by another purine nucleotide, or one pyrimidine by another pyrimidine. A transversion is the replacement of a purine by a pyrimidine, or vice versa. A SNP may also be a single base insertion or deletion variant referred to as an "indel" (Weber et al, Am. J. Hum. Genet. 71:854-62, 2002).
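
The transition/transversion rule stated above can be sketched in a few lines. The function name is hypothetical and only DNA bases (A, C, G, T) are handled; indels are outside the scope of this example.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def classify_substitution(ref, alt):
    """Classify a single-base substitution as a transition or a transversion.

    A transition stays within one chemical class (purine to purine, or
    pyrimidine to pyrimidine); a transversion crosses classes.
    """
    ref, alt = ref.upper(), alt.upper()
    valid = PURINES | PYRIMIDINES
    if ref not in valid or alt not in valid:
        raise ValueError("bases must be one of A, C, G, T")
    if ref == alt:
        raise ValueError("reference and alternate bases are identical")
    same_class = (ref in PURINES) == (alt in PURINES)
    return "transition" if same_class else "transversion"


kinds = {pair: classify_substitution(*pair) for pair in ("AG", "CT", "AC", "GT")}
```

Here A to G and C to T are transitions, whereas A to C and G to T are transversions.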
[0030] As used herein, references to SNPs and SNP genotypes include individual SNPs and/or haplotypes, which are groups of SNPs that are generally inherited together. Haplotypes can have stronger correlations with diseases or other phenotypic effects compared with individual SNPs, and therefore may provide increased diagnostic accuracy in some cases (Stephens et al., Science 293, 489-493, 2001). Causative SNPs are those SNPs that produce alterations in gene expression or in the expression, structure, and/or function of a gene product, and therefore are most predictive of a possible clinical phenotype. One such class includes SNPs falling within regions of genes encoding a polypeptide product, i.e. cSNPs. These SNPs may result in an alteration of the amino acid sequence of the polypeptide product (i.e., non-synonymous codon changes) and give rise to the expression of a defective or other variant protein. Furthermore, in the case of nonsense mutations, a SNP may lead to premature termination of a polypeptide product. Such variant products can result in a pathological condition, e.g. genetic disease.
[0031] Causative SNPs do not necessarily occur in coding regions; causative SNPs can occur in, for example, any genetic region that can ultimately affect the expression, structure, and/or activity of the protein encoded by a nucleic acid. Such genetic regions include, for example, those involved in transcription, such as SNPs in transcription factor binding domains, SNPs in promoter regions, in areas involved in transcript processing, such as SNPs at intron-exon boundaries that may cause defective splicing, or SNPs in mRNA processing signal sequences such as polyadenylation signal regions. Some SNPs that are not causative SNPs nevertheless are in close association with, and therefore segregate with, a disease-causing sequence. In this situation, the presence of a SNP correlates with the presence of, or predisposition to, or an increased risk in developing the disease. These SNPs, although not causative, are nonetheless also useful for diagnostics, disease predisposition screening, and other uses.
[0032] An association study (e.g., GWAS) of a SNP and a specific disorder involves determining the presence or frequency of the SNP allele in biological samples from individuals with the disorder of interest, such as neurodegenerative disease, and comparing the information to that of controls (i.e., individuals (subjects) who do not have the disorder; controls may be also referred to as "healthy" or "normal" individuals (subjects)) who are preferably of similar age and race. The appropriate selection of patients and controls is important to the success of SNP association studies. Therefore, a pool of individuals with well-characterized phenotypes is desirable.
[0033] A SNP may be screened in diseased tissue samples or any biological sample obtained from a diseased individual, and compared to control samples, and selected for its increased (or decreased) occurrence in a specific pathological condition, such as pathologies related to neurodegenerative disease (e.g., LOAD). Once a statistically significant association is established between one or more SNPs and a pathological condition (or other phenotype) of interest, then the regions around the SNPs can optionally be thoroughly screened to identify the causative genetic locus or sequences (e.g., the causative SNP/mutation, gene, regulatory region, etc.) that influences the pathological condition or phenotype. Association studies such as GWAS are generally conducted within the general population, in contrast to studies performed on related individuals in affected families (linkage studies).
GWAS and Genomic Convergence
[0034] Described herein are SNPs associated with LOAD that were identified by methods involving GWAS and genomic convergence studies. GWAS use high-throughput genotyping technologies to assay hundreds of thousands of the most common form of genetic variant, the SNP, and relate these variants to diseases or health-related traits. With the advent of GWAS and genome-wide expression studies, the entire genome can now be interrogated with increased resolution, and increased power to detect disease loci for common diseases. A typical GWAS genotypes 300,000 to one million SNPs and has four parts: selection of a large number of individuals with the disease or trait of interest and a suitable comparison group; DNA isolation, genotyping, and data review to ensure high genotyping quality; statistical tests for associations between the SNPs passing quality thresholds and the disease/trait; and replication of identified associations in an independent population sample or examination of functional implications experimentally. GWAS have already been completed for a variety of complex genetic diseases with varying degrees of success (Maraganore et al., Am. J. Hum. Genet. 77:685, 2005; Herbert et al., Obesity 14:1454, 2006; Duerr et al, Science 2006; Libioulle et al., PLoS Genet 3: e58, 2007). Two published GWAS have examined LOAD (Coon et al., J. Clin. Psychiatry 68:613, 2007; Li et al, Arch. Neurol. 65:45, 2008).
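
To give a sense of what the "statistical tests for associations" step involves for a single SNP, the sketch below runs a basic allelic chi-square test on a 2x2 table of allele counts in cases and controls. This is a deliberately simple stand-in chosen for illustration: it is not the covariate-adjusted logistic regression used in the study described in this document, and the function name and the toy counts are hypothetical.

```python
import math


def allelic_test(case_alt, case_ref, ctrl_alt, ctrl_ref):
    """Pearson chi-square test (1 df, no continuity correction) on allele counts.

    Returns (chi_square, p_value, odds_ratio) for the alternate allele.
    """
    a, b, c, d = case_alt, case_ref, ctrl_alt, ctrl_ref
    n = a + b + c + d
    row_cases, row_ctrls = a + b, c + d
    col_alt, col_ref = a + c, b + d
    if min(row_cases, row_ctrls, col_alt, col_ref) == 0:
        raise ValueError("every row and column must contain at least one allele")
    chi_square = n * (a * d - b * c) ** 2 / (row_cases * row_ctrls * col_alt * col_ref)
    # Upper tail of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi_square / 2.0))
    odds_ratio = (a * d) / (b * c) if b * c else math.inf
    return chi_square, p_value, odds_ratio


# Toy counts: 30 of 100 case alleles vs 10 of 100 control alleles are alternate.
chi_square, p_value, odds_ratio = allelic_test(30, 70, 10, 90)
```

In a genome-wide setting the same test would be repeated for every SNP passing quality control, which is why a multiple-testing correction is applied to the resulting P-values.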
[0035] Genomic convergence is the combining of results from multiple sample sets and multiple genomic technologies to further facilitate the dissection of LOAD and other complex diseases. This convergence across multiple studies effectively increases sample sizes, decreases false-positives, and adds layers of replication to the initial results. This allows the detection of consistent effects that may otherwise be lost in the flood of statistical-genomic data.
Methods of Predicting Risk of Developing LOAD
[0036] In the methods described herein, the detection of a biomarker (e.g., one or more of the SNPs described herein associated with a risk of developing LOAD) in a subject can be carried out according to methods well known in the art. For example, DNA is obtained from any suitable sample from the subject that will contain DNA, preferably genomic DNA, and the DNA is then prepared and analyzed according to well-established protocols for the presence of biomarkers. In some embodiments, analysis of the DNA can be carried out by amplification of the region of interest according to amplification protocols well known in the art (e.g., polymerase chain reaction, ligase chain reaction, strand displacement amplification, transcription-based amplification, self-sustained sequence replication (3SR), Qβ replicase protocols, nucleic acid sequence-based amplification (NASBA), repair chain reaction (RCR) and boomerang DNA amplification (BDA)). The amplification product can then be visualized directly in a gel by staining or the product can be detected by hybridization with a detectable probe. When amplification conditions allow for amplification of all allelic types of a biomarker, the types can be distinguished by a variety of well-known methods, such as hybridization with an allele-specific probe, secondary amplification with allele-specific primers, by restriction endonuclease digestion, or by electrophoresis. Also described herein are oligonucleotides for use as primers and/or probes for detecting and/or identifying biomarkers according to the methods described herein.
[0037] A typical method of predicting a subject's risk of developing LOAD includes obtaining a biological sample from the subject; analyzing the biological sample for the presence of at least one SNP in the MTHFD1L gene; and correlating the presence of the at least one SNP with an increased risk of developing LOAD. The at least one SNP can be one of the SNPs listed in Table 2 (e.g., one or more of rs2075650, rs11754661, rs803424, rs2073067, rs2072064, rs17349743, and rs803422). In some embodiments, the at least one SNP is rs11754661 of chromosome 6q25.1. The method can further include analyzing the biological sample for the presence of a mutation in the APOE gene, and correlating the presence of the at least one SNP in the MTHFD1L gene and the presence of the mutation in the APOE gene with an increased risk of developing LOAD. Additionally or alternatively, the method can include analyzing homocysteine levels in the biological sample and correlating increased levels of homocysteine and the presence of the at least one SNP in the MTHFD1L gene with an increased risk of developing LOAD. Typically, an increased risk is relative to the risk in an individual who does not have the at least one SNP in the MTHFD1L gene.
[0038] Any suitable biological samples can be used to obtain DNA for genotyping. Typical sources of DNA include amniotic fluid, serum, blood, buccal cells, urine, whole genome amplified DNA, plasma, cells, and tissue (e.g., brain). Any suitable platform or method of genotyping a plurality of samples can be used. A platform is a series of arrays or chips on which high-throughput genotyping is performed. Several genotyping platforms are commercially available, including those from Affymetrix, Illumina, and Perlegen. For example, one can use Illumina's Infinium II assay on the HumanHap550 Beadchip to genotype over half a million SNPs in each case and control participant.
[0039] All genes, gene names, and gene products disclosed herein are intended to correspond to homologs from any species for which the compositions and methods disclosed herein are applicable. Thus, the terms include, but are not limited to, genes and gene products from humans and mice. It is understood that when a gene or gene product from a particular species is disclosed, this disclosure is intended to be exemplary only, and is not to be interpreted as a limitation unless the context in which it appears clearly indicates otherwise. Thus, for example, the genes disclosed herein, which in some embodiments relate to mammalian nucleic acid and amino acid sequences, are intended to encompass homologous and/or orthologous genes and gene products from other animals including, but not limited to, other mammals, fish, amphibians, reptiles, and birds. In preferred embodiments, the genes or nucleic acid sequences are human.
EXAMPLES
[0040] The present invention is further illustrated by the following specific examples. The examples are provided for illustration only and should not be construed as limiting the scope of the invention in any way.
Example 1 - Novel Chromosome 6 Locus for Late-Onset AD Provides Genetic Evidence for Folate-Pathway Abnormalities
[0041] Genome-wide association studies (GWAS) of LOAD have consistently observed strong evidence of association with polymorphisms in APOE; however, until recently, variants at few other loci with statistically significant associations have replicated across studies. The present study combines data on 483,399 SNPs from a previously reported GWAS of 492 LOAD cases and 496 controls and from an independent set of 439 LOAD cases and 608 controls to strengthen power to identify novel genetic association signals. Associations exceeding the experiment-wide significance threshold (α=1.03×10⁻⁷) were replicated in an additional 1,338 cases and 2,003 controls. As expected, these analyses unequivocally confirmed APOE's risk effect (rs2075650, P=1.9×10⁻³⁶). Additionally, the SNP rs11754661 at 151.2 Mb of chromosome 6q25.1 in the gene MTHFD1L (which encodes the methylenetetrahydrofolate dehydrogenase (NADP+ dependent) 1-like protein) was significantly associated with LOAD (P=4.70×10⁻⁸; Bonferroni-corrected P=0.022). Subsequent genotyping of SNPs in high linkage disequilibrium (r² > 0.8) with rs11754661 identified statistically significant associations in multiple SNPs (rs803424, P=0.016; rs2073067, P=0.03; rs2072064, P=0.035), reducing the likelihood of association due to genotyping error. In the replication case-control set, an association of rs11754661 was observed in the same direction as the previous association at P=0.002 (P=1.90×10⁻¹⁰ in combined analysis of discovery and replication sets), with associations of similar statistical significance at several adjacent SNPs (rs17349743, P=0.005; rs803422, P=0.004). In summary, a novel statistically significant association in MTHFD1L, a gene involved in the tetrahydrofolate synthesis pathway, was observed and replicated. This finding is noteworthy, as MTHFD1L may play a role in the generation of methionine from homocysteine and influence homocysteine-related pathways, and levels of homocysteine are a significant risk factor for LOAD development.
[0042] MTHFD1L is an excellent candidate for predicting LOAD risk on account of its involvement in folate-pathway abnormalities linked with homocysteine. Detection of abnormalities in the folate pathway linked with homocysteine may be used as an index (e.g., a panel of biomarkers) to predict the probability of developing LOAD. Thus, high homocysteine levels can be used in the methods described herein as a biomarker to predict the increased risk of LOAD. The aging population may find particular use for this biomarker in predicting a risk of developing LOAD. In some embodiments, this biomarker may be used for diagnosing LOAD in a subject.
Results
Dataset Characteristics
Table 1 depicts the demographic characteristics of the case and control samples examined in initial association analyses. We examined 931 LOAD cases, average age 74.4 years at onset (standard deviation: ± 8.1 years), and 1,104 cognitive controls, average age 73.8 years at exam (± 7.8 years) (Table 1). Cases were 64.5% female, while controls were 61.9% female.
Table 1. Demographic characteristics of participants in the study sample (Mean ± SD or Number (Percent)).
 | All | Cases | Controls
Number of subjects | 2,036 | 931 | 1,104
Females (%) | 1,284 (63.0%) | 601 (64.5%) | 683 (61.9%)
Age-at-onset [cases] (yr) / Age-at-exam [controls] (yr) | | 74.4 ± 8.1 | 73.8 ± 7.8
APOE ε4 carrier status | | |
-/- carriers (0 copies) | 1223 (60.1%) | 399 (42.8%) | 824 (74.6%)
ε4/- carriers (1 copy) | 629 (30.9%) | 398 (42.7%) | 231 (20.9%)
Figure imgf000014_0001
"Percentage of successfully genotyped SNPs among those attempted.
GWAS Results
Eleven SNPs had association P-values (P) <10⁻⁵ after adjustment for population substructure (Table 2; P<10⁻⁴ in Tables 3 and 4; all association results in FIG. 3).
Table 2:
* OR = Odds Ratio
** CI = Confidence Interval
*** Gene Annotation using SNPper database (Riva and Kohane, Bioinformatics vol. 18:1681-1685, 2002)
Table 3:
* OR = Odds Ratio
** CI = Confidence Interval
*** Freq. = Frequency
**** Gene Annotation using SNPper database (Riva and Kohane, Bioinformatics vol. 18:1681-1685, 2002)
Table 4:
* OR = Odds Ratio
** CI = Confidence Interval
*** Freq. = Frequency
**** Gene Annotation using SNPper database (Riva and Kohane, Bioinformatics vol. 18:1681-1685, 2002)
[0045] Table 2 shows the strongest associations (P<10⁻⁵) from a GWAS of late-onset Alzheimer Disease. It lists single nucleotide polymorphisms (SNPs) demonstrating association with late-onset Alzheimer Disease at P<10⁻⁵ in association tests adjusting for covariates from principal components capturing population substructure, evaluated in the Discovery genome-wide association study (GWAS) dataset of 931 independent cases and 1,104 independent cognitively normal controls, in the Replication GWAS dataset of 1,242 independent cases and 1,737 independent controls, and in the Combined GWAS dataset of 2,174 cases and 2,181 controls.
[0046] Table 3 shows SNPs demonstrating association with late-onset Alzheimer Disease at P<10⁻⁴ in association tests adjusting for covariates from principal components capturing population substructure, evaluated in the Discovery genome-wide association study (GWAS) dataset of 931 independent cases and 1,104 independent cognitively normal controls, in the Replication GWAS dataset of 1,242 independent cases and 1,737 independent controls, and in the Combined GWAS dataset of 2,174 cases and 2,181 controls.
[0047] Table 4 shows genotyped and imputed SNPs demonstrating association with late-onset Alzheimer Disease at P<10⁻⁴ in association tests adjusting for covariates from principal components capturing population substructure, evaluated in the Discovery genome-wide association study (GWAS) dataset of 931 independent cases and 1,104 independent cognitively normal controls, in the Replication GWAS dataset of 1,242 independent cases and 1,737 independent controls, and in the Combined GWAS dataset of 2,174 cases and 2,181 controls.
[0048] Although the SNPs defining the APOE ε2, ε3, and ε4 alleles, rs429358 and rs7412, were not included on our genotyping platforms, we independently genotyped these SNPs and tested the association of APOE ε4 with LOAD risk (OR (95% CI): 4.18 (3.51, 4.97); P=5.49×10⁻⁵⁸). SNPs adjacent to the APOE haplotype on chromosome 19 otherwise demonstrated the highest associations observed, with the peak association being rs2075650 with P=1.90×10⁻³⁶, confirming the expected effect of APOE on LOAD risk in this sample. The most significant non-APOE SNP in our previous GWAS (Beecham et al., Am J Hum Genet vol. 84:35-43, 2009) (Table 5) was rs11610206 on 12q13 (45.92 Mb) with P= 1.43x 10" °; in this study, this SNP was still strongly associated with LOAD (OR (95% CI): 0.67 (0.54, 0.85); P=7.70×10⁻⁴), but not with experiment-wide statistical significance.
Table 5:
* OR = Odds Ratio
** CI = Confidence Interval
*** Gene Annotation using SNPper database (Riva and Kohane, Bioinformatics vol. 18:1681-1685, 2002)
[0049] Table 5 shows the results from a follow-up of the strongest associations reported in the Beecham et al. (Am J Hum Genet vol. 84:35-43, 2009) GWAS of late-onset Alzheimer Disease. It lists 32 SNPs demonstrating the strongest association with late-onset Alzheimer Disease at P<10⁻⁵ in the Beecham et al. (Am J Hum Genet vol. 84:35-43, 2009) GWAS of late-onset Alzheimer's Disease, tested here for association with adjustment for covariates from principal components capturing population substructure, evaluated in the Discovery genome-wide association study (GWAS) dataset of 931 independent cases and 1,104 independent cognitively normal controls.
[0050] The SNP rs11754661, located at 151.2 Mb of chromosome 6q25.1 in the gene MTHFD1L, was significantly associated with LOAD (P=4.70×10⁻⁸; Bonferroni-corrected P=0.022). To ensure that this association was not spurious due to differences between subsets of genotyped samples, we performed several post-hoc quality control analyses. We examined clustering plots from genotype calling by platform to determine if misclassification could have affected the associations observed for the top 11 SNPs with P<10⁻⁵, and observed discrete clustering by genotype for all 11 SNPs. We found no evidence of a difference in genotype frequencies among controls across subsets by genotyping platform (Fisher's exact test P=0 \) or by study center (Fisher's exact test PNX95, Table 6).
Table 6:
* freq. = frequency (percent)
** P-value for deviation from Hardy-Weinberg Equilibrium (HWE)
*** P-value for Fisher's Exact Test (FET)
[0051] Table 6 shows genotype frequency distributions and differences in three subsets of a GWAS dataset for SNPs with strong associations with late-onset Alzheimer Disease. It gives genotype counts, P-values for Hardy-Weinberg Equilibrium (HWE), and P-values for differences in genotypic distribution from a Fisher's Exact Test (FET) comparing controls in SNPs with strong associations with late-onset Alzheimer Disease in three subsets of a GWAS dataset: cognitively normal controls from the previously published Beecham et al. (Am J Hum Genet 84:35-43, 2009) study ("Beecham et al. controls"), cognitively normal controls recruited after the Beecham et al. (Am J Hum Genet 84:35-43, 2009) study ("New AD Controls"), and cognitively normal controls consented for multiple genetic studies whose recruitment was funded through the Udall Parkinson's Disease Collaboration ("Udall Controls").
[0052] We also examined differences in dataset characteristics including variation in age, sex, and APOE ε4 genotype distributions, and found limited differences between subsets by study center, autopsy- or clinical-confirmation of case or control status, and by genotyping platform (Table 7).
Table 7:
Cases
Females (%) | 312 (63.4%) | - | 289 (65.7%)
Age-at-onset (yr) | 72.9 ± 6.5 | - | 76.5 ± 9.7
0 copies of APOE ε4 (%) | 169 (34.4%) | - | 230 (52.3%)
1 copy of APOE ε4 (%) | 234 (47.6%) | - | 164 (37.3%)
2 copies of APOE ε4 (%) | 86 (17.5%) | - | 41 (9.3%)
carrier status missing | 3 (0.6%) | - | 5 (1.1%)
Controls
Females (%) | 304 (61.4%) | 119 (69.2%) | 260 (59.5%)
Age-at-exam (yr) | 74.2 ± 6.5 | 71.4 ± 7.6 | 74.4 ± 9.1
0 copies of APOE ε4 (%) | 377 (76.2%) | 114 (66.3%) | 333 (76.2%)
1 copy of APOE ε4 (%) | 105 (21.2%) | 50 (29.1%) | 76 (17.4%)
2 copies of APOE ε4 (%) | 10 (2.0%) | 3 (1.7%) | 4 (0.9%)
carrier status missing | 3 (0.6%) | 5 (2.9%) | 24 (5.5%)
[0053] Table 7 shows demographic characteristics of participants, subsetted by study center, autopsy or clinical confirmation of case or control status, and by genotyping platform (Mean ± SD or Number (Percent)).
[0054] Subsequently, we examined the first hundred principal components generated from EIGENSTRAT to determine if any of the principal components were associated with both differences in genotyping platform subset and disease status at P<0.05 as markers of potential systematic bias. While two principal components other than those used to adjust for population substructure showed association with both genotyping platform subset and LOAD, additional adjustment for these principal components did not change the strength of association between rs11754661 and LOAD. Models further adjusting for age, sex, and APOE ε4 carrier status (+/-) (Table 8) only marginally diminished the effect size and statistical significance of the association of rs11754661 with LOAD (adjustment for age and sex, OR (95% CI): 2.03 (1.56, 2.64), P=1.42×10⁻⁷; adjustment for age, sex and APOE ε4 (+/-), OR (95% CI): 2.01 (1.51, 2.67), P=1.64×10⁻⁶).
Table 8:
* OR = Odds Ratio
** CI = Confidence Interval
*** Gene Annotation using SNPper database (Riva and Kohane, Bioinformatics vol. 18:1681-1685, 2002)
[0055] Table 8 shows changes in effect size and P-value with additional covariate adjustment for age, sex, and presence/absence of the APOE ε4 allele for SNP associations demonstrating P<10⁻⁵ in preliminary analyses of late-onset Alzheimer Disease. It lists SNPs demonstrating association with late-onset Alzheimer Disease at P<10⁻⁵ as identified in Table 2, here showing results from logistic regression modeling with (1) no additional covariate adjustment, (2) additional covariate adjustment for age-at-onset (years, in cases only) and age-at-exam (years, in controls only) and sex, and (3) additional covariate adjustment for age-at-onset (years, in cases only) and age-at-exam (years, in controls only); sex; and presence/absence of the APOE ε4 allele. All models include, at minimum, covariate adjustment for principal components capturing population substructure.
[0056] Furthermore, we examined the associations in 4 SNPs in linkage disequilibrium (LD) (FIG. 4) of D'>0.8 with rs11754661, which demonstrated variable patterns of association with LOAD (Figure 2; rs2839947, P=0.0479; rs11757561, P=0.000684; rs2073066, P=0.768; rs13201018, P=0.185). It should be noted that due to the low minor allele frequency (MAF) of rs11754661 (MAF=0.07), only one of these SNPs, rs11757561 (MAF=0.20), had an r² > 0.10 (r²=0.23). This SNP had a similar direction of association as rs11754661 (OR (95% CI): 1.31 (1.12, 1.53)).
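The contrast between D' and r² noted above follows from their standard definitions. The minimal sketch below computes both from haplotype and allele frequencies; the function name `ld_stats` and the input haplotype frequency are illustrative assumptions, not values from the study. It shows that with allele frequencies of 0.07 and 0.20, r² cannot exceed about 0.30 even when D' equals 1.

```python
def ld_stats(p_ab: float, p_a: float, p_b: float):
    """Return (D, D', r^2) for two biallelic loci.

    p_ab: frequency of the haplotype carrying allele A (locus 1) and allele B (locus 2)
    p_a, p_b: frequencies of allele A and allele B
    """
    d = p_ab - p_a * p_b
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max if d_max > 0 else 0.0
    r2 = d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))
    return d, d_prime, r2


# Hypothetical best case: every copy of the rarer allele (frequency 0.07)
# sits on a haplotype that also carries the second allele (frequency 0.20).
d, d_prime, r2 = ld_stats(p_ab=0.07, p_a=0.07, p_b=0.20)
print(f"D' = {d_prime:.2f}, r^2 = {r2:.2f}")
```

Because r² depends on how closely the two allele frequencies match, a rare variant can be in complete LD by D' with a common SNP and still have a low r².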
[0057] Based on the pattern of LD in the vicinity of rs11754661, we examined several haplotypes of MTHFD1L which included this SNP to identify potential markers for untyped variants associated with LOAD. Two haplotypes (the first comprising rs2073066-rs11754661-rs13201018, the second comprising rs2839947-rs11757561-rs2073066-rs11754661-rs13201018), both containing the risk-increasing A allele of rs11754661, had highly statistically significant associations similar to the genotypic association of rs11754661 (P 4.i>(M () x and /' 6.5 -1 i 0 \ respectively) (Table 9). Both haplotypes had similar frequencies (MHF) to the A allele of rs11754661 (MHF=0.0696 and MHF=0.0629, respectively).
Table 9:
[0058] Table 9 shows associations with late-onset Alzheimer Disease of MTHFD1L haplotypes incorporating SNP rs11754661, with adjustment for covariates from principal components capturing population substructure, evaluated in the Discovery GWAS dataset of 931 independent cases and 1,104 independent cognitively normal controls.
[0059] In order to ensure that the association we observed at rs11754661 was not merely due to genotyping error, we genotyped four additional SNPs in MTHFD1L proximal to and in high LD (r² > 0.8) with rs11754661. All SNPs but one (rs7765521, P=0.055) demonstrated associations with nominal statistical significance (rs803424, P=0.016; rs2073067, P=0.030; and rs2072064, P=0.035). Figure 1 shows the -log10-transformed P-values for single SNP tests of association in the MTHFD1L and 50 kb flanking region (151.2 Mb-153.3 Mb) surrounding the chromosome 6 association signal at rs11754661, among both SNPs genotyped in the initial GWAS and those genotyped subsequently.
[0060] Association analyses of pooled datasets combining data on 1,242 cases and 1,737 controls confirmed experiment-wide statistically significant associations for SNPs in/near APOE (replication from / ;.2i 10 to P=0.00187) for all but one SNP (rs8106922; discovery P^S. lOxl O"12, replication P=0.108) (Table 2); however, the direction of association in the replication was consistent across each of these SNPs. The association of the MTHFD1L SNP rs11754661 in the replication was both statistically significant (P=0.00187) and showed similar strength and direction (discovery OR (95% CI): 2.03 (1.58, 2.62); replication OR (95% CI): 2.34 (1.37, 3.98)).
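A reported odds ratio and its 95% confidence interval can be checked against the reported P-value, because a Wald interval is symmetric on the log-odds scale. The sketch below is an illustrative consistency check, not part of the described analysis; the function name `wald_from_ci` is hypothetical, and the agreement is only approximate because the published OR and interval are rounded.

```python
import math


def wald_from_ci(or_est: float, ci_low: float, ci_high: float, z_crit: float = 1.959964):
    """Recover the log-OR standard error, Wald Z, and two-sided P from an OR and its 95% CI."""
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * z_crit)
    z = math.log(or_est) / se
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail probability
    return se, z, p


# Discovery estimate for rs11754661 as reported in the text: OR 2.03 (1.58, 2.62).
se, z, p = wald_from_ci(2.03, 1.58, 2.62)
print(f"SE(log OR) = {se:.3f}, Z = {z:.2f}, P = {p:.2e}")
```

The recovered P-value is of the same order as the discovery P-value of 4.70×10⁻⁸ reported for this SNP, which is the level of agreement that rounded inputs allow.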
Association in Combined Discovery and Replication Datasets
[0061] In combined analyses, associations in and around the APOE locus were unequivocally strengthened, with the P-values observed ranging from P=3.8x l0~ L to =4.87x i0. Variation at rs11754661 was strongly associated (P=1.90×10⁻¹⁰) with an elevated risk of LOAD with OR=2.10 (95% CI: 1.67, 2.64). Several adjacent SNPs also demonstrated nominal associations with similar direction of effect, including rs11757561 (P=0.000846) with OR=1.31 (95% CI: 1.12, 1.53) and rs12195069 (P=0.0432) with OR=1.25 (95% CI: 1.01, 1.55).
[0062] Two SNPs with only modest statistical significance of association in the discovery GWAS demonstrated highly statistically significant association in analyses combining both discovery and replication datasets (Tables 3 and 4). SNPs rs4676049 and rs17034806, located at 109 Mb on chromosome 2q13, had associations of OR=1.62 (J .88x 10°) and OR=1.61 (P=2.66x10°), respectively, in the discovery dataset. When the discovery and replication datasets were combined, the SNP associations gained modest strength in effect size (OR=1.76 for rs4676049 and OR=1.75 for rs17034806), and the associations now exceeded the threshold for experiment-wide statistical significance, with P=4.31 x i0" for rs4676049 and P=5.14×10⁻⁸ for rs17034806.
Methods
Ethics Statement
[0063] After complete description of the study to the subjects, written informed consent was obtained from all participants, in agreement with protocols approved by the institutional review board at each contributing center.
Ascertainment
[0064] Discovery dataset cases and controls were clinically ascertained through the Collaborative Alzheimer's Project (CAP) comprising the University of Miami John P. Hussman Institute for Human Genomics (HIHG) and the Vanderbilt University Center for Human Genetics Research (CHGR), and autopsy-verified cases and controls were collected through the Mount Sinai Brain Bank (MSBB) at the Mount Sinai School of Medicine (see [47]). Additional controls were also identified in the National Cell Repository for Alzheimer's Disease (NCRAD). 266 cases and 643 controls genotyped in the discovery dataset from the NCRAD, HIHG, and CHGR (Edwards et al., Ann Hum Genet vol. 74:97-109, 2010) and NCRAD were independent from previously published data sets including those from our group's previously published GWAS (Beecham et al., Am J Hum Genet vol. 84:35-43, 2009). All CAP-ascertained cases and controls were recruited and evaluated using standardized criteria and protocols, and case adjudication in the CAP was performed jointly by a Clinical Advisory Board (CAB) composed of both HIHG and CHGR members, with controls evaluated jointly as well.
[0065] All cases and controls from the HIHG, CHGR, and NCRAD met selection criteria described in the Beecham et al. study (Beecham et al., Am J Hum Genet vol. 84:35-43, 2009). Briefly, the study was described and written informed consents were obtained from all participants, in accordance with institutional review board protocols at each study center. Each individual classified as a LOAD case met the NINCDS-ADRDA criteria for probable or definite AD and had an age at onset greater than 60 years of age (McKhann et al., Neurology vol. 34:939-944, 1984), as determined from specific questions within the clinical history answered by a reliable family informant or from documented significant cognitive impairment in the medical record. Vascular dementia was diagnosed according to contemporary standards (Roman et al., Neurology vol. 43:250-260, 1993) by the CAB, and individuals with confirmed vascular dementia or phenotypic uncertainty were excluded from analyses. Cognitive controls were individuals who showed no signs of dementia in clinical history or upon interview, and were drawn from spouses, friends, and other biologically unrelated individuals of cases, were frequency-matched by age and gender to the cases, and were located in the same clinical catchment areas. All cognitive controls were examined, and none showed signs of dementia in clinical history or upon interview. Also, each cognitive control had a documented Mini-Mental State Exam (MMSE) score > 27 or a Modified Mini-Mental State (3MS) Exam score > 87. Clinical history and interview data for NCRAD controls, including MMSE scores, were made available and collected along with whole blood for DNA extraction for inclusion in our study.
[0066] 306 cases and 81 controls identified in the MSBB were recently deceased patients at the Mount Sinai Medical Center in New York, NY, and had affection status verified through clinical review and brain autopsy. Neither the cases nor controls examined have been used in previously published studies. Covariates including age at death and sex were abstracted from reviews of medical charts performed by members of the MSBB.
[0067] In total, 572 new cases and 724 new controls were genotyped in this study and, after quality control measures, combined with data on 492 cases and 496 controls from the previous GWAS [18] for analysis. We also had available for replication from the HIHG a dataset of 246 cases and 69 cognitively normal controls from a previously described dataset (Slifer et al., Am J Med Genet B Neuropsychiatr Genet vol. 141B:208-213, 2006).
Genotyping
[0068] We extracted DNA for individuals ascertained by the HIHG, CHGR, MSBB, and NCRAD from whole blood by using Puregene chemistry (QIAGEN, Germantown, MD, USA). We performed genotyping using the Illumina Beadstation and the Illumina Infinium Human 1M beadchip on 530 cases and 393 controls following the recommended protocol, only with a more stringent GenCall score threshold of 0.25. Genotyping on 248 controls from the PD GWAS dataset (Edwards et al., Ann Hum Genet vol. 74:97-109, 2010) was performed using the Illumina Infinium Human 610-Quad beadchip. Genotyping efficiency was greater than 99%, and quality assurance was achieved by the inclusion of one CEPH control per 96-well plate that was genotyped multiple times. Technicians were blinded to affection status and quality-control samples. We used Taqman Genotyping Assays for SNPs +3937/rs429358 and +4075/rs7412 and performed allelic discrimination/genotype calling on the ABI 7900 Taqman system, the results of which were used to determine APOE ε2/ε3/ε4 genotypes.
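The step from the two SNP genotypes to an APOE ε2/ε3/ε4 genotype can be sketched as a lookup. The example below is a minimal illustration that assumes forward-strand allele calls (rs429358 T/C, rs7412 C/T) and the standard haplotype definitions (T-T = ε2, T-C = ε3, C-C = ε4); the function names are hypothetical and this is not the calling software used in the study.

```python
# (rs429358 genotype, rs7412 genotype) -> APOE genotype, forward-strand alleles assumed.
APOE_TABLE = {
    ("TT", "TT"): "ε2/ε2",
    ("TT", "CT"): "ε2/ε3",
    ("TT", "CC"): "ε3/ε3",
    ("CT", "CT"): "ε2/ε4",  # assumes the very rare C-T (ε1) haplotype is absent
    ("CT", "CC"): "ε3/ε4",
    ("CC", "CC"): "ε4/ε4",
}


def apoe_genotype(rs429358: str, rs7412: str) -> str:
    """Infer the APOE genotype from unphased two-letter genotypes such as 'TC' and 'CC'."""
    key = ("".join(sorted(rs429358.upper())), "".join(sorted(rs7412.upper())))
    if key not in APOE_TABLE:
        # Remaining combinations would require the ε1 haplotype or indicate a miscall.
        raise ValueError(f"genotype combination {key} is not resolvable to ε2/ε3/ε4")
    return APOE_TABLE[key]


def e4_copies(genotype: str) -> int:
    """Number of ε4 alleles (0, 1, or 2), as used for carrier status."""
    return genotype.count("ε4")


call = apoe_genotype("TC", "CC")
print(call, e4_copies(call))
```

Combinations outside the table are rejected rather than guessed, since they would imply a haplotype that the ε2/ε3/ε4 scheme does not cover.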
[0069] After excluding samples which failed quality control (described in the next section) with low genotyping call rates, genotype data was available on 870,954 SNPs (after quality control) using the Illumina 1M BeadChip on 440 cases and 437 controls, while genotype data on 490,960 SNPs (after quality control) from the Illumina 610Quad BeadChip was available on 172 controls. Combining these data with the 522,366 SNPs on 492 cases and 496 controls in our previous GWAS (Beecham et al., Am J Hum Genet vol. 84:35-43, 2009), a set of 483,399 SNPs common to all platforms was generated that passed quality control for each subset individually and in a pooled dataset. The Bonferroni-corrected threshold for experiment-wide statistical significance was thus set at α=1.03×10⁻⁷.
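The threshold above is consistent with dividing a family-wise error rate by the number of SNPs tested. The short sketch below reproduces that arithmetic; the family-wise alpha of 0.05 is an assumption, since the text does not state it, and the function names are illustrative.

```python
N_SNPS = 483_399       # SNPs common to all platforms (from the text)
FAMILY_ALPHA = 0.05    # assumed family-wise error rate (not stated in the text)


def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance threshold under Bonferroni correction."""
    return alpha / n_tests


def bonferroni_adjust(p_value: float, n_tests: int) -> float:
    """Bonferroni-adjusted P-value, capped at 1."""
    return min(1.0, p_value * n_tests)


threshold = bonferroni_threshold(FAMILY_ALPHA, N_SNPS)
adjusted = bonferroni_adjust(4.70e-8, N_SNPS)  # unadjusted P reported for rs11754661
print(f"threshold = {threshold:.3g}")
print(f"adjusted P = {adjusted:.4f}")
```

Under these assumptions the adjusted value for rs11754661 comes out close to the reported Bonferroni-corrected P of 0.022; any small difference presumably reflects rounding of the unadjusted P-value.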
Sample Quality Control
[0070] After genotyping, multiple quality controls were performed including assessment of sample efficiency, which is the proportion of valid genotype calls to attempted calls within a sample. Samples with efficiency less than 0.98 were dropped from the analysis. Reported gender and genetic gender were examined with the use of X-linked SNPs; 32 inconsistent samples were dropped from the analysis. Relatedness between samples was tested via the program Graphical Representation of Relatedness (GRR) (Abecasis et al., Bioinformatics vol. 17:742-743, 2001), and 3 related samples were dropped from the analysis.
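The sample efficiency filter described above amounts to a per-sample call-rate calculation. The following minimal sketch uses hypothetical toy data and function names; missing calls are represented as `None`, which is an assumption about the data encoding.

```python
from typing import Dict, List, Optional

Genotype = Optional[int]  # 0, 1, or 2 copies of the minor allele; None = no call


def sample_efficiency(calls: List[Genotype]) -> float:
    """Proportion of attempted genotype calls that produced a valid call."""
    if not calls:
        return 0.0
    return sum(1 for g in calls if g is not None) / len(calls)


def filter_samples(samples: Dict[str, List[Genotype]],
                   min_efficiency: float = 0.98) -> Dict[str, List[Genotype]]:
    """Keep samples whose efficiency is not less than `min_efficiency`."""
    return {sid: calls for sid, calls in samples.items()
            if sample_efficiency(calls) >= min_efficiency}


# Hypothetical toy data: 100 attempted SNPs per sample.
toy = {
    "S1": [0] * 100,               # efficiency 1.00
    "S2": [1] * 98 + [None] * 2,   # efficiency 0.98, kept (not less than 0.98)
    "S3": [2] * 90 + [None] * 10,  # efficiency 0.90, dropped
}
kept = filter_samples(toy)
print(sorted(kept))
```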
[0071] To determine if population substructure exists in the case-control sample, a set of 10,000 SNPs with MAF > 0.25, selected for minimal between-SNP linkage disequilibrium (r² < 0.20), and spread evenly across the autosomal chromosomes were analyzed using the program STRUCTURE (Pritchard et al., Genetics vol. 155:945-959, 2000; Pritchard et al., Am J Hum Genet vol. 67:170-181, 2000) (burn in: 5,000, iterations: 25,000) assuming different numbers of subpopulations (K). The -log likelihood was maximized at K=3, suggesting population substructure. Further analysis was performed in EIGENSTRAT (Price et al., Nat Genet vol. 38:904-909, 2006), where principal components analysis on the sample of 10,000 SNPs was used to generate principal component loadings for samples and remove outliers by using the top ten principal components over 5 iterations with a threshold of six standard deviations. The top three principal component loadings were used as covariates to account for population structure in the association analysis.
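The iterative outlier removal described above (top ten principal components, up to 5 iterations, six standard deviations) can be sketched with a plain singular value decomposition. This is an illustrative approximation written with NumPy, not the EIGENSTRAT implementation; the function name, the standardization scheme, and the toy data are assumptions.

```python
import numpy as np


def remove_pca_outliers(genotypes, n_pcs=10, n_iter=5, sd_threshold=6.0):
    """Return a boolean mask of samples kept after iterative PCA outlier removal.

    genotypes: (samples x SNPs) array of minor-allele counts (0, 1, 2).
    """
    genotypes = np.asarray(genotypes, dtype=float)
    keep = np.ones(genotypes.shape[0], dtype=bool)
    for _ in range(n_iter):
        X = genotypes[keep]
        X = X - X.mean(axis=0)
        sd = X.std(axis=0)
        X = X[:, sd > 0] / sd[sd > 0]            # drop monomorphic SNPs, scale the rest
        U, S, _ = np.linalg.svd(X, full_matrices=False)
        k = min(n_pcs, S.size)
        scores = U[:, :k] * S[:k]                # principal component scores per sample
        z = (scores - scores.mean(axis=0)) / scores.std(axis=0)
        outlier = (np.abs(z) > sd_threshold).any(axis=1)
        if not outlier.any():
            break
        keep[np.flatnonzero(keep)[outlier]] = False
    return keep


# Hypothetical toy data: 200 unstructured samples plus one sample carrying
# alleles at 20 SNPs that nobody else carries.
rng = np.random.default_rng(0)
common = rng.integers(0, 3, size=(201, 30))
private = np.zeros((201, 20), dtype=int)
private[-1, :] = 2
toy = np.hstack([common, private])
mask = remove_pca_outliers(toy)
print(int(mask.sum()), "samples kept; last sample kept:", bool(mask[-1]))
```

In this sketch the surviving samples' top component scores would then be carried forward as covariates, mirroring the use of the top three principal components in the association models.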
[0072] After removing genotyped individuals with low genotype call rates, incorrect reported gender, high relatedness with other samples, and extreme outliers in substructure analyses, 440 cases and 608 controls remained for inclusion in analysis, and were combined with 492 cases and 496 controls from the previous GWAS.
SNP Quality Control
[0073] Quality control was performed to remove any low quality SNPs. Genotype clusters were redefined using signal intensities of samples with efficiency greater than 0.98, and genotypes were recalled on the basis of these new clusters per the manufacturer's recommendation. Efficiency of individual SNPs was estimated as the proportion of samples with genotype calls for a given SNP, and SNPs with efficiency less than 0.95 were dropped from analysis. Due to concerns of low statistical power to detect association, SNPs with MAF < 0.005 were dropped from analysis. Hardy-Weinberg Disequilibrium (HWD) statistics were calculated among controls with the Fisher's exact test in the PLINK software package (Purcell et al., Am J Hum Genet vol. 81:559-575, 2007); SNPs with P<10⁻⁶ for HWD were dropped from analysis. In addition, due to concerns with the spurious association originating from the use of different genotyping platforms on samples in the previous and current GWAS studies, distributions of genotype frequencies at each SNP in each study were examined among controls using a Fisher's exact test, and SNPs with highly differing genotype distributions across genotyping subsets (P<0.001) were dropped prior to analysis. After these quality control measures, 483,399 SNPs remained for association analysis.
Association Analysis
[0074] Association analysis was performed using logistic regression to test association of genotypes with LOAD under an additive model. Logistic regression was used to permit covariate adjustment for loadings taken from the first three principal components identified in EIGENSTRAT to account for population substructure. Here we report results from logistic regression models adjusting only for population substructure with principal components. Further regression modeling was also performed on SNPs with initial associations of P<10⁻⁵, extending models to adjust for APOE genotype (designated as the number of ε4 alleles), age-at-onset in cases and age-at-exam in controls, and gender as covariates (Table 7). All analyses were performed using the PLINK software package (Purcell et al., Am J Hum Genet vol. 81:559-575, 2007). Quantile-quantile plots of the associations were made (Figure 1), and suggest the absence of systematic bias in the tests of association.
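An additive-model logistic regression of the kind described above codes each genotype as the number of minor alleles and estimates a per-allele odds ratio. The sketch below fits such a model by Newton-Raphson with NumPy; it is a minimal illustration rather than the PLINK implementation, the helper names are hypothetical, and the genotype counts are invented toy data. Covariates such as principal component loadings could be supplied as additional columns of `X`.

```python
import numpy as np


def fit_logistic(X, y, n_iter=50, tol=1e-10):
    """Logistic regression by Newton-Raphson.

    Returns (beta, se); beta[0] is the intercept and beta[1:] follow the columns of X.
    """
    X = np.column_stack([np.ones(len(y)), X])
    y = np.asarray(y, dtype=float)
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        mu = 1.0 / (1.0 + np.exp(-(X @ beta)))   # fitted probabilities
        w = mu * (1.0 - mu)
        information = X.T @ (X * w[:, None])
        step = np.linalg.solve(information, X.T @ (y - mu))
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    mu = 1.0 / (1.0 + np.exp(-(X @ beta)))
    w = mu * (1.0 - mu)
    cov = np.linalg.inv(X.T @ (X * w[:, None]))
    return beta, np.sqrt(np.diag(cov))


def expand_counts(case_counts, control_counts):
    """Turn per-genotype subject counts into (genotype, case status) vectors."""
    g = np.concatenate([np.repeat(np.arange(len(case_counts)), case_counts),
                        np.repeat(np.arange(len(control_counts)), control_counts)])
    y = np.concatenate([np.ones(sum(case_counts)), np.zeros(sum(control_counts))])
    return g.astype(float), y


# Hypothetical counts of subjects with 0, 1, and 2 copies of the minor allele.
g, y = expand_counts(case_counts=[850, 140, 10], control_counts=[930, 68, 2])
beta, se = fit_logistic(g, y)
odds_ratio = np.exp(beta[1])
ci = np.exp(beta[1] + np.array([-1.96, 1.96]) * se[1])
print(f"per-allele OR = {odds_ratio:.2f} (95% CI: {ci[0]:.2f}, {ci[1]:.2f})")
```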
Imputation and Replication Analysis
[0075] To provide independent replication of the associations observed in the discovery dataset, genome-wide genotyping data were combined from four additional datasets (one unpublished and three publicly available datasets) and missing genotype data imputed using IMPUTE v1.0 (Marchini et al., Nat Genet vol. 39:906-913, 2007) (Table 10).
Table 10:
Genotype Genotyp Genotyp Genotyp
TS669397 1 1 105351597 Imputed
d ed ed " ed
Genotype Genotyp
s 10225470 7 54155997 Imputed Imputed Imputed d ed
Genotype Genotyp
rs8074294 17 61 02137 Imputed Imputed Imputed d " ed "
Genotype
rs 1167272 1 49658574 Imputed Imputed imputed Imputed d
Genotype
s 17379721 1 50049596 imputed Imputed Imputed Imputed d
Genotype Genotyp
s 1 1025237 1 1 19726288 Imputed Imputed Imputed d " ed
Genotype Genotyp Genotyp Genotyp Genotyp
«7091819 10 26028836
d ed ed ed * ed
Genotype
s 12083887 1 1 18683212 imputed Imputed Imputed Imputed d
Genotype Genotyp Genotyp Genotyp Genotyp rs8113032 19 60245950
d " ed ed " ed ed
Genotype
s 12135821 1 1 18744972 Imputed Imputed Imputed imputed d
Genotype Genotyp Genotyp Genotypsl 1 131099 J 823802 Imputed
d " " ed * ed ed
Genotype
rs 1893953 4 160693858 Imputed Imputed Imputed Imputed d "
Genotype Genotyp
re4676049 109001689 Imputed Imputed Imputed d ed
Genotype Genotyp
rs7303876 iz 58421479 Imputed Imputed Imputed d ' " ed
Genotype
rsl 856297 1 49810849 Imputed Imputed Imputed Imputed d
Genotype
rs4388744 1 50294191 Imputed Imputed Imputed Imputed d
Genotype
sl 1 100238 4 160696507 Imputed Imputed Imputed Imputed d "
Genotype
rs4489606 1 50287140 Imputed Imputed imputed Imputed d
Genotype
s 12255607 10 20181408 Imputed Imputed Imputed Imputed d
Genotype
rs4926814 1 49729723 Imputed Imputed Imputed Imputed d "
Genotype Genoiyp Genotyp Genotyp rs4926547 1 50319397 Imputed
d ed ed ' ed
Genotype
sl l038913 1 1 46516306 Imputed imputed Imputed Imputed d
Genotype
rs!415985 1 49703336 Imputed Imputed Imputed Imputed d "
type Genotyp
rs2975139 12 Geno
16393084 Imputed Imputed Imputed d ed
Genotype Genotyp Genotyp
s 17034806 2 109002337 Imputed Imputed d ed " ed
Genotype
rs2305543 19 60251527 Imputed Imputed Imputed Imputed d
Genotype
rs 1 720720 3 65794510 Imputed Imputed Imputed Imputed d
Genotype
rs2529491 110959870 Imputed Imputed Imputed Imputed
d "
Genotype Genotyp
rs 1360873 S J 63489710 Imputed imputed Imputed d ed
Genotype
rs6957883 7 1478701 12 imputed Imputed Imputed Imputed d
Genotype
rsl538981 10 31451361 Imputed Imputed Imputed Imputed
d "
Genotype
rs 1185222 1 49731548 Imputed Imputed Imputed Imputed d
Genotype
rs 1727987 1 49761808 imputed Imputed Imputed Imputed d
Genotype
rs 10888665 1 4991 1493 Imputed Imputed Imputed. Imputed d "
Genotype Genotyp Genotyp Genotyp re2050876 10 31093734 imputed
d ed " ed ed "
Genotype
rs 1167262 1 49646567 Imputed Imputed Imputed imputed d "
Genotype Genotyp
rs 10888679 1 50338853 Imputed Imputed. Imputed d " ed
Genotype Genotyp Genotyp Genotyp rs 1891667 1 49867972 Imputed
d ed ed. ed
Genotype
rs2000886 4 160740930 Imputed Imputed Imputed Imputed
d ' "
Genotype
rsl 112687 1 49841291 Imputed Imputed imputed Imputed d
Genotype Genotyp
rc7019702 9 132897543 Imputed Imputed Imputed d ed
Genotype
rs3957 1 49630498 Imputed Imputed Imputed Imputed
d "
Genotype Genotyp
rs9989763 2 132855872 Imputed Imputed Imputed d ed
Genotype
rs 12049328 1 49463210 Imputed Imputed Imputed Imputed d
Genotype Genotyp Genotyp rs 1343161 1 49883437 imputed Imputed
d " ed " ed
Genotype Genotyp
rs 16974980 16 83530064 Imputed Imputed Imputed d ed
Genotype
rsl3151952 4 1.38539922 Imputed imputed Imputed Imputed d
Genotype Genotyp Genotyp Genotyp rsl713417 14 19933651 Imputed
d " ed ed ed
Genotype
rs6693294 1 49651709 Imputed Imputed Imputed Imputed d
Genotype Genotyp
rs4820297 22 36435927 Imputed Imputed Imputed d ed
Genotype
rs 1577969 1 49637882 Imputed Imputed Imputed Imputed d
Genotype Genotyp Genotyps 10736388 1 50108115 Imputed Imputed
d ed ed "
Genotype
rsi 112368 1 49932177 Imputed Imputed Imputed Imputed d "
Genotype
rs7530169 1 49941 178 Imputed Imputed imputed Imputed d
Genotype Genotyp
rs5930403 23 129318208 Imputed Imputed Imputed d ed
Genotype Genotyp Genotyp Genotyp rsi 179484 1 49681231 Imputed
d " ed ed " ed
Genotype
rs 1167270 1 49685647 Imputed Imputed Imputed Imputed d
Genotype Genotyp Genotyp
«5977248 23 129329168 Imputed Imputed d ed ed
re 1494462 1 49568744 Genotype
Imputed Imputed Imputed Imputed d "
Genotype Genotyp
rs2301343 40533653 Imputed Imputed imputed d ed
Genotype Genotyp
rs2064179 129236874 Imputed Imputed Imputed d " ed
Genotype
rs2846215 1 1 105376028 Imputed Imputed Imputed Imputed d "
Genotype
rs4926812 1 49586400 Imputed Imputed Imputed Imputed d
Genotype Genotyp Genotyp Genotyp rsl338214 1 49596084 Imputed
d ' " ed ed ed
Genotype
s 10259067 95069392 Imputed Imputed Imputed Imputed d
Genotype Genotyp
s 10244338 7 70359758 Imputed Imputed Imputed d ed
Genotype
rs6588362 1 50066939 Imputed Imputed Imputed Imputed d "
Genotype Genotyp Genotyp Genotyps 10888669 1 50022860 Imputed
d ed ed ed
Genotype
rs2787693 1 49688435 Imputed Imputed Imputed Imputed d
Genotype Genotyp
rs589104 1 1 105312982 Imputed Imputed Imputed d " ed "
Genotype
rs3092217 20 39813482 Imputed Imputed Imputed Imputed d
Genotype
rs4926545 1 50152063 Imputed imputed Imputed Imputed d
Genotype
rs6697839 1 50163925 Imputed Imputed Imputed Imputed d "
Genotype
s 10788924 1 50168618 Imputed Imputed Imputed Imputed d
Genotype Genotyp Genotyp Genotypsi 1603669 1 1 134212161 Imputed
d ed " ed ed
Genotype Genotyp rs5932752 23 129334460 Imputed imputed Imputed d ed
Genotype
s 12036551 1 49555928 Imputed Imputed Imputed Imputed d
Genotype Genotyp
rs7795083 70376454 Imputed Imputed Imputed d " ed
Genotype Genotyp
rs3798267 6 46058786 Imputed Imputed Imputed d ed
Genotype
TS6029791 20 39794198 imputed Imputed Imputed Imputed d
Genotype Genotyp Genotyp Genotyps 10424969 19 60258324 Imputed
d " ed ed " ed
Genotype
«6693846 1 50087939 Imputed Imputed Imputed Imputed d
Genotype
«7544728 1 50112236 imputed Imputed Imputed Imputed d
Genotype Genotyp rs2832594 21 30388188 Imputed Imputed Imputed d " ed
Genotype Genotyp
rsl 939153 i i 105256936 Imputed Imputed imputed d ed
Genotype Genotyp
s 11842468 13 55833285 Imputed Imputed Imputed ά ed
Genotype
s 12713404 2 59860209 Imputed Imputed Imputed Imputed d "
Genotype
s 12743369 1 50337003 Imputed Imputed Imputed Imputed d
Genotype Genotyp
rs256335 19 39007736 Imputed Imputed Imputed d " " ed
Genotype
rs7540194 1 49563736 Imputed Imputed Imputed Imputed d
Genotype Genotyp
rs2714068 i l 122904751 Imputed Imputed Imputed d ed
Genotype
rs4806636 19 60240370 Imputed Imputed Imputed Imputed d "
Genotype
rs3899856 1 50016441 Imputed Imputed Imputed Imputed d
Genotype
rs6695041 1 50018096 Imputed Imputed Imputed Imputed d
Genotype
rs5932738 23 129242314 Imputed Imputed Imputed Imputed d "
Genotype
rs 1654431 19 60241570 Imputed Imputed Imputed Imputed d
Genotype Genotyp Genotyp rs241472 1 49544440 imputed Imputed
d ed ed
Genotype Genotyp
s 1 1061995 1815303 Imputed Imputed Imputed d " ed "
Genotype Genotyp Genotyp Genotyp Genotyp rs2529489 110947132
d ed ed ed * ed
Genotype
s 12976416 19 34053326 Imputed Imputed Imputed Imputed d
[0076] Table 10 shows genotyping or imputation of SNPs associated with LOAD at P<10^-4. The index indicates whether SNPs demonstrating association with late-onset Alzheimer disease at P<10^-4, in association tests adjusting for population substructure in the Discovery dataset, were genotyped or imputed in the Discovery dataset (931 independent cases and 1,104 independent cognitively normal controls) or in any of the Replication Datasets. The Replication Datasets include those from the Alzheimer's Disease Neuroimaging Initiative (ADNI) (Frank et al., Neurobiol Aging vol. 24:521-536, 2003) (147 cases and 182 controls), the Framingham Study SHARe dataset (SHARe) (Bachman et al., Neurology vol. 42:115-119, 1992) (86 cases and 1,200 controls (all unrelated)), the Reiman et al. LOAD GWAS dataset (TGEN) (Reiman et al., Neuron vol. 54:713-720, 2007) (859 cases and 552 controls), and an additional set of LOAD cases and controls independent of the Discovery dataset and not used in prior publications (ADRC) (Slifer et al., Am J Med Genet B Neuropsychiatr Genet vol. 141B:208-213, 2006) (246 LOAD cases and 69 cognitively normal controls).
[0077] SNPs with differing genotypic distributions between datasets were excluded from imputation using the Fisher's exact test approach described earlier (Ziegler A., Genet Epidemiol vol. 33:845-850, 2009). Both primary and replication datasets were imputed to a HapMap reference of over 2.5 million SNPs. Individual genotypes with probability less than 0.90 were not included, and SNPs missing > 10% of genotypes within either data set were dropped. In addition to the combined HapMap Phase III CEPH Utah pedigree (CEU) and Tuscan (TSI) haplotype reference panels, imputation within each study used genotype data on controls from other datasets to improve imputation accuracy, as well as Affymetrix 5.0 genotype data on 105 individuals genotyped in an independent Ashkenazi Jewish genotyping panel (IntraGenDB population genetics database).
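The two post-imputation filters described above (a 0.90 genotype-probability threshold and a 10% per-SNP missingness limit applied within each data set) can be sketched as follows. This is a minimal illustration only, not the code used in the study: the function names, the representation of each genotype as a triple of posterior probabilities, and the dataset labels are all assumptions made for this example.

```python
from typing import Dict, List, Optional, Sequence, Tuple

# Assumed input format: posterior P(AA), P(AB), P(BB) for one individual at one SNP.
Probs = Tuple[float, float, float]


def call_genotypes(probs: Sequence[Probs], min_prob: float = 0.90) -> List[Optional[int]]:
    """Return the most likely genotype (0, 1 or 2) per individual,
    or None when its probability is below min_prob."""
    calls: List[Optional[int]] = []
    for triple in probs:
        best = max(range(3), key=lambda g: triple[g])
        calls.append(best if triple[best] >= min_prob else None)
    return calls


def missing_rate(calls: Sequence[Optional[int]]) -> float:
    """Fraction of individuals without a confident genotype call."""
    return sum(c is None for c in calls) / len(calls)


def keep_snp(probs_by_dataset: Dict[str, Sequence[Probs]],
             min_prob: float = 0.90,
             max_missing: float = 0.10) -> bool:
    """Keep the SNP only if no data set is missing more than max_missing
    of its genotypes after the probability threshold is applied."""
    return all(
        missing_rate(call_genotypes(p, min_prob)) <= max_missing
        for p in probs_by_dataset.values()
    )
```

A SNP would be retained only when `keep_snp` returns `True` for the data sets being compared; both thresholds are exposed as parameters so that other cut-offs can be tried.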
[0078] We analyzed existing pooled and imputed datasets of unrelated individuals from several studies: 147 cases and 182 controls from the Alzheimer's Disease Neuroimaging Initiative (ADNI) (Frank et al., Neurobiol Aging vol. 24:521-536, 2003); 86 cases and 1,200 controls (all unrelated) from the Framingham Study SHARe dataset (Bachman et al., Neurology vol. 42:115-119, 1992); 859 cases and 552 controls from the Reiman et al. (Neuron vol. 54:713-720, 2007) LOAD GWAS dataset; and a set of 246 LOAD cases and 69 cognitively normal controls previously described (Slifer et al., Am J Med Genet B Neuropsychiatr Genet vol. 141B:208-213, 2006), genotyped on the Affymetrix 6.0 genotyping platform, on which results have not been previously published.
Supplementary Methods
Haplotype Analyses
[0079] In order to define haplotypes, Linkage Disequilibrium (LD) structure in the vicinity of SNP rs11754661 was examined with the Haploview program (Barrett et al., Bioinformatics 21: 263-265, 2005), estimating both D' and r² measures for this region. Haplotypes were constructed using LD blocks assigned by Haploview. The LD block containing the SNP (consisting of rs2073066, rs11754661, and rs13201018; analyzed as "Haplotype 1") and the block immediately adjacent to the SNP (consisting of rs2839947 and rs11757561; analyzed as "Haplotype 2") were examined for association. Extended haplotypes were constructed by further incorporating SNPs from LD blocks immediately adjacent to the Haplotype 1 block in MTHFD1L, which included the Haplotype 2 block and a third, larger block incorporating SNPs rs17348429, rs17426727, rs803410, rs6917461, rs803407, rs803403, rs17348890, rs17427389, rs9397027, and rs10484779 (labeled here as "Haplotype 3"). This set of haplotypes included "Extended Haplotype 1" (comprising SNPs from the Haplotype 1 and 2 blocks) and "Extended Haplotype 2" (comprising SNPs from the Haplotype 1, 2, and 3 blocks).
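For reference, the two LD measures named above can be computed from haplotype and allele frequencies using their standard definitions. The sketch below assumes that phased haplotype frequencies are already available; Haploview itself estimates them from genotype data, and that step is not reproduced here. The function name and arguments are hypothetical.

```python
from typing import Tuple


def ld_measures(p_ab: float, p_a: float, p_b: float) -> Tuple[float, float]:
    """Return (D', r^2) for two biallelic SNPs.

    p_ab: frequency of the haplotype carrying allele A at SNP 1 and allele B at SNP 2
    p_a:  frequency of allele A at SNP 1
    p_b:  frequency of allele B at SNP 2
    """
    # Raw disequilibrium coefficient.
    d = p_ab - p_a * p_b

    # The largest |D| attainable given the allele frequencies depends on the sign of D.
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))

    d_prime = abs(d) / d_max if d_max > 0 else 0.0
    r2 = d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))
    return d_prime, r2
```

With allele frequencies of 0.5 and 0.2 and a shared haplotype frequency of 0.2, D' is 1 while r² is only 0.25, which illustrates why the two measures are usually reported together.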
[0080] Haplotypic association tests were performed in a manner similar to genotypic association analysis, using a logistic regression approach with covariate adjustment for loadings taken from the first three principal components identified in EIGENSTRAT (Price et al., Nat Genet 38: 904-909, 2006) to account for population substructure. All analyses were performed using the "--hap-logistic" function in the PLINK software package (Purcell et al., Am J Hum Genet 81: 559-575, 2007).
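The general idea of a covariate-adjusted logistic regression on haplotype dosage can be sketched as below. This is not PLINK's implementation and it does not reproduce the study's data: the Newton-Raphson fitting routine, the simulated haplotype dosages, the simulated principal-component loadings, and the effect sizes are all assumptions chosen only to make the example self-contained.

```python
import math
import random
from typing import List, Tuple


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = s / m[i][i]
    return x


def fit_logistic(rows: List[List[float]], y: List[int], iters: int = 25) -> List[float]:
    """Fit logistic regression by Newton-Raphson.
    Each row is [1.0 (intercept), haplotype dosage, PC1, PC2, PC3]."""
    k = len(rows[0])
    beta = [0.0] * k
    for _ in range(iters):
        grad = [0.0] * k
        hess = [[0.0] * k for _ in range(k)]
        for x, yi in zip(rows, y):
            eta = sum(b * v for b, v in zip(beta, x))
            p = 1.0 / (1.0 + math.exp(-eta))
            w = p * (1.0 - p)
            for i in range(k):
                grad[i] += (yi - p) * x[i]
                for j in range(k):
                    hess[i][j] += w * x[i] * x[j]
        step = solve(hess, grad)
        beta = [b + s for b, s in zip(beta, step)]
    return beta


def simulate(n: int, seed: int = 0) -> Tuple[List[List[float]], List[int]]:
    """Simulated cases/controls (illustrative values only, not study data)."""
    rng = random.Random(seed)
    rows, y = [], []
    for _ in range(n):
        dosage = sum(rng.random() < 0.2 for _ in range(2))  # copies of the haplotype
        pcs = [rng.gauss(0.0, 1.0) for _ in range(3)]       # stand-ins for PC loadings
        eta = -0.5 + 0.6 * dosage + 0.3 * pcs[0]
        p = 1.0 / (1.0 + math.exp(-eta))
        rows.append([1.0, float(dosage)] + pcs)
        y.append(1 if rng.random() < p else 0)
    return rows, y
```

After fitting, `math.exp(beta[1])` gives the per-copy odds ratio for the haplotype, adjusted for the three covariates.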
Summary
[0081] MTHFD1L, which encodes the methylenetetrahydrofolate dehydrogenase (NADP+ dependent) 1-like protein, is involved in tetrahydrofolate (THF) synthesis, catalyzing the reversible synthesis of 10-formyl-THF to formate and THF, an important step in homocysteine conversion to methionine. Ongoing biological investigations continue to elucidate the pathways connecting elevated homocysteine with AD. Mthfd1l protein has been reported to be decreased in the hippocampus in a mouse model of AD using a proteomic approach (Martin et al., PLoS One 3:e2750, 2008). Homocysteic acid, derived from homocysteine and methionine, is elevated in these mice, and treatment with antibodies to homocysteic acid reduced amyloid burden and inhibited cognitive decline in these animals [40]. B6-deficient diets lead to further increases in homocysteic acid in these mice. In this genome-wide association study of LOAD, we identified a novel association with experiment-wide statistical significance in a gene with a potential biological role, MTHFD1L. We replicated this association in additional publicly available genome-wide association datasets, and observed statistically significant association with a similar effect size and direction at this SNP. In summary, MTHFD1L is an excellent candidate for LOAD on account of its involvement in folate-pathway abnormalities linked with homocysteine, a significant biological risk factor for AD.
Other Embodiments
[0082] Any improvement may be made in part or all of the method steps. All references, including publications, patent applications, and patents, cited herein are hereby incorporated by reference. The use of any and all examples, or exemplary language (e.g., "such as") provided herein, is intended to illuminate the invention and does not pose a limitation on the scope of the invention unless otherwise claimed. Any statement herein as to the nature or benefits of the invention or of the preferred embodiments is not intended to be limiting, and the appended claims should not be deemed to be limited by such statements. More generally, no language in the specification should be construed as indicating any non-claimed element as being essential to the practice of the invention. This invention includes all modifications and equivalents of the subject matter recited in the claims appended hereto as permitted by applicable law. Moreover, any combination of the above-described elements in all possible variations thereof is encompassed by the invention unless otherwise indicated herein or otherwise clearly contraindicated by context.
What is claimed is:

Claims

1. A method of predicting a subject's risk of developing late-onset Alzheimer's Disease (LOAD), the method comprising the steps of:
a) obtaining a biological sample from the subject;
b) analyzing the biological sample for the presence of at least one single nucleotide polymorphism (SNP) in the MTHFD1L gene; and
c) correlating the presence of the at least one SNP with an increased risk of developing LOAD.
2. The method of claim 1, wherein the at least one SNP is an SNP selected from Table 2.
3. The method of claim 2, wherein the at least one SNP is an SNP selected from the group consisting of: rs2075650, rs11754661, rs803424, rs2073067, rs2072064, rs17349743, and rs803422.
4. The method of claim 3, wherein the at least one SNP is rs11754661 of chromosome 6q25.1.
5. The method of claim 1, further comprising analyzing the biological sample for the presence of a mutation in the APOE gene, and correlating the presence of the at least one SNP in the MTHFD1L gene and the presence of the mutation in the APOE gene with an increased risk of developing LOAD.
6. The method of claim 1, further comprising analyzing homocysteine levels in the biological sample and correlating increased levels of homocysteine and the presence of the at least one SNP in the MTHFD1L gene with an increased risk of developing LOAD.
7. The method of claim 1, wherein the at least one SNP comprises two or more SNPs selected from Table 2.
8. The method of claim 1, wherein the biological sample is selected from the group consisting of: blood, plasma, serum, saliva, urine, and tissue.
9. A method of diagnosing LOAD in a subject, the method comprising the steps of:
a) obtaining a biological sample from the subject;
b) analyzing the biological sample for the presence of at least one SNP in the MTHFD1L gene; and
c) correlating the presence of the at least one SNP with a diagnosis of LOAD in the subject.
10. The method of claim 9, wherein the at least one SNP is an SNP selected from Table 2.
11. The method of claim 10, wherein the at least one SNP is an SNP selected from the group consisting of: rs2075650, rs11754661, rs803424, rs2073067, rs2072064, rs17349743, and rs803422.
12. The method of claim 11, wherein the at least one SNP is rs11754661 of chromosome 6q25.1.
13. The method of claim 9, further comprising analyzing the biological sample for the presence of a mutation in the APOE gene, and correlating the presence of the at least one SNP in the MTHFD1L gene and the presence of the mutation in the APOE gene with a diagnosis of LOAD in the subject.
14. The method of claim 9, further comprising analyzing homocysteine levels in the biological sample and correlating increased levels of homocysteine and the presence of the at least one SNP in the MTHFD1L gene with a diagnosis of LOAD in the subject.
15. The method of claim 9, wherein the at least one SNP comprises two or more SNPs selected from Table 2.
16. The method of claim 9, wherein the biological sample is selected from the group consisting of: blood, plasma, serum, saliva, urine, and tissue.
17. A kit for predicting a subject's risk of developing LOAD, the kit comprising:
(a) at least one reagent for analyzing a biological sample from the subject for the presence of at least one SNP in the MTHFD1L gene;
(b) at least one control; and
(c) instructions for use.
18. The kit of claim 17, wherein the at least one SNP is an SNP selected from Table 2.
19. The method of claim 18, wherein the at least one SNP is an SNP selected from the group consisting of: rs2075650, rs11754661, rs803424, rs2073067, rs2072064, rs17349743, and rs803422.
20. The method of claim 19, wherein the at least one SNP is rs11754661 of chromosome 6q25.1.
PCT/US2011/030199 2010-03-26 2011-03-28 Methods and compositions for diagnosing and predicting risk of late-onset alzheimer's disease WO2011120043A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US31790010P 2010-03-26 2010-03-26
US61/317,900 2010-03-26

Publications (1)

Publication Number Publication Date
WO2011120043A1 true WO2011120043A1 (en) 2011-09-29

Family

ID=44673675

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2011/030199 WO2011120043A1 (en) 2010-03-26 2011-03-28 Methods and compositions for diagnosing and predicting risk of late-onset alzheimer's disease

Country Status (1)

Country Link
WO (1) WO2011120043A1 (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20060171938A1 (en) * 2005-02-03 2006-08-03 Stock Jeffry B Compositions and methods for enhancing cognitive function
US20080213775A1 (en) * 2005-06-16 2008-09-04 Government Of The United States Of America, Represented By The Secretary, Dept. Of Health And Methods and materials for identifying polymorphic variants, diagnosing susceptibilities, and treating disease
US20080286796A1 (en) * 2007-05-03 2008-11-20 Applera Corporation Genetic polymorphisms associated with neurodegenerative diseases, methods of detection and uses thereof

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
BEECHAM ET AL.: "Genome-wide association study implicates a chromosome 12 risk locus for late-onset Alzheimer disease.", AM. J. HUM. GENET., vol. 84, no. 1, 2009, pages 35 - 43 *
HAROLD ET AL.: "Genome-wide association study identifies variants at CLU and PICALM associated with Alzheimer's disease.", NAT. GENET., vol. 41, no. 10, 2009, pages 1088 - 1093 *
NAJ ET AL.: "Dementia revealed: novel chromosome 6 locus for late-onset Alzheimer disease provides genetic evidence for folate-pathway abnormalities. art. e1001130", PLOS GENET., vol. 6, no. 9, 12 September 2010 (2010-09-12) *

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 11760375

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 11760375

Country of ref document: EP

Kind code of ref document: A1