WO2010022235A2 - Genome-wide association study of autism reveals a comnon novel risk locus at 5p14.1 - Google Patents


Info

Publication number
WO2010022235A2
Authority
WO
WIPO (PCT)
Prior art keywords
rsl
autism
gene
cadherin
biomarker
Application number
PCT/US2009/054462
Other languages
French (fr)
Other versions
WO2010022235A3 (en)
Inventor
Margaret A. Pericak-Vance
Jonathan L. Haines
Original Assignee
University Of Miami
Vanderbilt University
Application filed by University Of Miami and Vanderbilt University
Publication of WO2010022235A2
Publication of WO2010022235A3


Classifications

    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C12Q1/6876 Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes
    • C12Q1/6883 Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes for diseases caused by alterations of genetic material
    • C12Q2600/00 Oligonucleotides characterized by their use
    • C12Q2600/136 Screening for pharmacological compounds
    • C12Q2600/156 Polymorphic or mutational markers

Definitions

  • Embodiments of the invention are directed to biomarkers associated with autism and the spectrum of disorders associated with autism.
  • Methods of diagnosis and risk assessment comprise detecting one or more of the biomarkers.
  • Autism is a neurodevelopmental disorder characterized by three primary areas of impairment: social interaction, communication, and restricted and repetitive patterns of interest or behavior (1). It is one of a spectrum of disorders (ASDs) with symptoms that may range from quite severe (autistic disorder) to relatively mild (Asperger syndrome). With improved surveillance and a broadening of the diagnostic criteria, the most recent prevalence studies suggest that ASDs may affect as many as 1 in 150 children in the U.S., making them among the most common neurodevelopmental disorders (2). ASDs are most often diagnosed before age four, and are at least three to four times more frequent in males than females (3).
  • Embodiments of the invention provide a genome-wide association analysis demonstrating that, in addition to multiple rare variations, part of the complex genetic architecture of autism involves common variation.
  • Biomarkers identified as being associated with autism or autism spectrum disorders include single nucleotide polymorphisms (SNPs) that demonstrated strong association with autism risk.
  • SNPs: single nucleotide polymorphisms
  • a novel region on chromosome 5p14.1 was also identified, which showed significance in both the discovery and validation datasets.
  • Biomarkers for diagnosing autism and autism spectrum associated disorders comprise at least one single nucleotide polymorphism on chromosome 5p14.1 and/or at least one single nucleotide polymorphism (SNP) set forth as: rs201171, rs4394668, rs4618985, rs1831870, rs155288, rs5008948, rs12024204, rs11162822, rs6424674, rs10493644, rs17425287, rs7523086, rs16833075, rs492780, rs1467068, rs6732653, rs1866206, rs4852531, rs11679682, rs11689493, rs4129081, rs6707773, rs13019278, rs12622496, rs3731723, rs72
  • the single nucleotide polymorphism on chromosome 5p14.1 comprises: rs10065041, rs7704909, rs1896731, rs10038113, rs6894838, rs12518194, rs4307059, or rs4327572, variants, mutants, alleles or complementary sequences thereof.
  • the biomarker further comprises at least one mutation in a cadherin gene and/or protocadherin gene, variants, mutants, alleles or complementary sequences thereof.
  • the cadherin gene comprises cadherin gene 9, cadherin 10, or protocadherin 10, variants, mutants, alleles or complementary sequences thereof.
  • a method of identifying the risk of developing or diagnosing a patient with autism or autism spectrum of disorders comprising: identifying in a patient a biomarker set comprising: at least one single nucleotide polymorphism on chromosome 5p14.1 and/or at least one single nucleotide polymorphism (SNP) set forth as: rs201171, rs4394668, rs4618985, rs1831870, rs155288, rs5008948, rs12024204, rs11162822, rs6424674, rs10493644, rs17425287, rs7523086, rs16833075, rs492780, rs1467068, rs6732653, rs1866206, rs4852531, rs11679682, rs11689493, rs4129081,
  • the cadherin gene comprises cadherin gene 9, cadherin 10, or protocadherin 10.
  • a biomarker for diagnostically distinguishing between autism and autism spectrum associated disorders comprising: identifying in a patient a biomarker set comprising at least one single nucleotide polymorphism on chromosome 5p14.1 and/or at least one single nucleotide polymorphism (SNP) set forth as: rs201171, rs4394668, rs4618985, rs1831870, rs155288, rs5008948, rs12024204, rs11162822, rs6424674, rs10493644, rs17425287, rs7523086, rs16833075, rs492780, rs1467068, rs6732653, rs1866206, rs4852531, rs11679682, rs1
  • the biomarker comprises a mutation in a cadherin gene, and/or 5' to cadherin gene 10 (CDH10) and 3' to cadherin gene 9 (CDH9), cadherin gene 9, cadherin 10, or protocadherin 10.
  • a method of diagnosing a patient pre-natally or post-natally with autism comprising: detecting at least one single nucleotide polymorphism on chromosome 5p14.1 and/or at least one single nucleotide polymorphism comprising: rs201171, rs4394668, rs4618985, rs1831870, rs155288, rs5008948, rs12024204, rs11162822, rs6424674, rs10493644, rs17425287, rs7523086, rs16833075, rs492780, rs1467068, rs6732653, rs1866206, rs4852531, rs11679682, rs11689493, rs4129081, rs6707773, rs13019278, rs126224
  • the sample comprises: amniotic fluid, serum, blood, plasma, cells or tissue.
  • a method of identifying a marker for the diagnosis of autism and autism spectrum disorders comprises obtaining a sample from individuals and their families and purifying genomic DNA from the sample; genotyping single nucleotide polymorphisms (SNP); assessing the single nucleotide polymorphisms.
  • the samples are assessed in genome-wide association analysis.
  • a biomarker for identifying a patient at risk of developing or for the diagnosis of autism and autism spectrum of disorders comprising: a mutation in a cadherin gene and/or 5' to cadherin gene 10 (CDH10) and 3' to cadherin gene 9 (CDH9).
  • the cadherin gene comprises cadherin gene 9, cadherin 10, or protocadherin 10.
  • one or more biomarkers further comprise at least one single nucleotide polymorphism on chromosome 5p14.1 and/or at least one single nucleotide polymorphism (SNP) set forth as: rs201171, rs4394668, rs4618985, rs1831870, rs155288, rs5008948, rs12024204, rs11162822, rs6424674, rs10493644, rs17425287, rs7523086, rs16833075, rs492780, rs1467068, rs6732653, rs1866206, rs4852531, rs11679682, rs11689493, rs4129081, rs6707773, rs13019278, rs126224
  • a biomarker for diagnostically distinguishing between autism and autism spectrum associated disorders comprising: at least one single nucleotide polymorphism on chromosome 5p14.1 and/or at least one single nucleotide polymorphism (SNP) set forth as: rs201171, rs4394668, rs4618985, rs1831870, rs155288, rs5008948, rs12024204, rs11162822, rs6424674, rs10493644, rs17425287, rs7523086, rs16833075, rs492780, rs1467068, rs6732653, rs1866206, rs4852531, rs11679682, rs11689493, rs4129081, rs6707773, rs13019278, rs12622496
  • Figure 1 shows a Quantile-Quantile (Q-Q) plot of PDT p-values for the discovery dataset.
  • the Q-Q plot measures the deviation of the observed p-values from the distribution expected under the null hypothesis.
  • the diagonal (red) line represents the expected (null) distribution. The slight deviation of the observed values above expected values at the tail of the distribution is consistent with modest genetic effects.
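A Q-Q plot of the kind described for Figure 1 can be sketched as follows. This is a minimal illustration, not the study's analysis pipeline: it assumes the usual convention that, under the null hypothesis, p-values are uniform on (0, 1), so the i-th smallest of n p-values is expected to lie near i/(n + 1). The function name `qq_points` and the input p-values are hypothetical.

```python
import math


def qq_points(p_values):
    """Pair expected and observed -log10(p) values for a Q-Q plot.

    Assumes p-values are uniform on (0, 1) under the null hypothesis, so
    the i-th smallest of n p-values is expected to be about i / (n + 1).
    """
    observed = sorted(p_values)  # smallest (most significant) first
    n = len(observed)
    return [
        (-math.log10(i / (n + 1)), -math.log10(p))
        for i, p in enumerate(observed, start=1)
    ]


# Hypothetical p-values, for illustration only.
for expected, observed in qq_points([0.9, 0.5, 0.2, 0.04, 0.001]):
    # Points with observed > expected sit above the diagonal null line.
    print(f"expected {expected:.2f}  observed {observed:.2f}")
```

Plotting observed values on the y-axis against expected values on the x-axis, together with the line y = x, gives the layout described for Figure 1; a modest upward departure confined to the tail is the pattern the text interprets as modest genetic effects.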
  • Figure 2 shows a genome-wide plot of association p-values in the discovery dataset. The -log10(p-value) for all 775,311 tested SNPs in 438 families is plotted against genomic location. 96 SNPs have p-values < 10^-4 (horizontal red line) and 6 SNPs have p-values < 10^-… (blue horizontal line). Individual chromosomes are demarcated by different colors.
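The threshold counts in a genome-wide plot like Figure 2 come from transforming each SNP's p-value to -log10(p) and counting how many fall below a given cutoff. A minimal sketch, assuming p-values are held in a dictionary keyed by SNP ID; the SNP names, p-values, and the second threshold are hypothetical, while 1e-4 mirrors the p < 0.0001 cutoff stated for the top 96 SNPs.

```python
import math


def manhattan_heights(p_values):
    """Map each SNP ID to its -log10(p) height on a genome-wide plot."""
    return {snp: -math.log10(p) for snp, p in p_values.items()}


def count_below(p_values, thresholds):
    """Count SNPs whose p-value is strictly below each threshold."""
    return {t: sum(p < t for p in p_values.values()) for t in thresholds}


# Hypothetical SNP IDs and p-values, for illustration only.
pvals = {"snpA": 3e-7, "snpB": 4e-5, "snpC": 0.02, "snpD": 0.6}

# 1e-4 mirrors the p < 0.0001 cutoff; 1e-6 is an arbitrary stricter line.
print(count_below(pvals, [1e-4, 1e-6]))
print(manhattan_heights(pvals))
```

A horizontal line at height 4 on the -log10 scale corresponds to p = 10^-4, so every SNP plotted above that line is counted by `count_below` at that threshold.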
  • Figure 3 shows the linkage disequilibrium pattern among validated SNPs on chromosome 5p14.1.
  • Linkage disequilibrium (LD) was measured as r² values, which range from 0 (no correlation) to 1 (complete correlation).
  • LD was calculated between each pair of SNPs. Two blocks of strong LD were observed, spanning 3 kb (SNPs 2-4) and 28 kb (SNPs 5-8). SNP numbers correspond to the order in Table 2.
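Pairwise r² values of the kind shown in Figure 3 can be computed from phased haplotypes with the standard formulas D = p_AB - p_A·p_B and r² = D² / (p_A(1 - p_A)·p_B(1 - p_B)). The source does not say which software or phasing method produced Figure 3, so this is only a sketch assuming haplotypes are already phased and given as strings with one allele per SNP; `ld_r2` and the example haplotypes are hypothetical.

```python
def ld_r2(haplotypes, i, j):
    """Return r^2 between SNP i and SNP j from phased haplotype strings.

    The allele carried by the first haplotype at each SNP is used as the
    reference allele; r^2 does not depend on which allele is chosen.
    """
    n = len(haplotypes)
    a, b = haplotypes[0][i], haplotypes[0][j]
    p_a = sum(h[i] == a for h in haplotypes) / n
    p_b = sum(h[j] == b for h in haplotypes) / n
    p_ab = sum(h[i] == a and h[j] == b for h in haplotypes) / n
    d = p_ab - p_a * p_b  # disequilibrium coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    return 0.0 if denom == 0 else d * d / denom  # monomorphic SNP -> 0


# Hypothetical two-SNP haplotypes, for illustration only.
print(ld_r2(["AC", "AC", "GT", "GT"], 0, 1))  # alleles always co-occur -> 1.0
print(ld_r2(["AC", "AT", "GC", "GT"], 0, 1))  # alleles independent -> 0.0
```

Evaluating `ld_r2` for every pair of SNP columns yields the triangular matrix typically drawn as an LD plot, in which blocks of high r² correspond to the LD blocks described above.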
  • genes, gene names, and gene products disclosed herein are intended to correspond to homologs from any species for which the compositions and methods disclosed herein are applicable.
  • the terms include, but are not limited to, genes and gene products from humans and mice. It is understood that when a gene or gene product from a particular species is disclosed, this disclosure is intended to be exemplary only, and is not to be interpreted as a limitation unless the context in which it appears clearly indicates otherwise.
  • the genes disclosed herein which in some embodiments relate to mammalian nucleic acid and amino acid sequences are intended to encompass homologous and/or orthologous genes and gene products from other animals including, but not limited to other mammals, fish, amphibians, reptiles, and birds. In preferred embodiments, the genes or nucleic acid sequences are human.
  • the term “about” or “approximately” means within an acceptable error range for the particular value as determined by one of ordinary skill in the art, which will depend in part on how the value is measured or determined, i.e., the limitations of the measurement system. For example, “about” can mean within 1 or more than 1 standard deviation, per the practice in the art. Alternatively, “about” can mean a range of up to 20%, preferably up to 10%, more preferably up to 5%, and more preferably still up to 1% of a given value. Alternatively, particularly with respect to biological systems or processes, the term can mean within an order of magnitude, preferably within 5-fold, and more preferably within 2-fold, of a value. Where particular values are described in the application and claims, unless otherwise stated the term "about” meaning within an acceptable error range for the particular value should be assumed.
  • mutation refers to one or more changes in a DNA sequence or a protein amino acid sequence relative to a reference sequence, usually a wild-type sequence.
  • a mutation in a DNA sequence may or may not result in a corresponding change to the amino acid sequence of the encoded protein.
  • a mutation may be a point mutation, i.e., an exchange of a single nucleotide and/or amino acid for another. Point mutations that occur within the protein-coding region of a gene's DNA sequence may be classified as silent mutations (coding for the same amino acid), missense mutations (coding for a different amino acid), or nonsense mutations (coding for a stop codon, which can truncate the protein).
  • a mutation may also be an insertion, i.e. an addition of one or more extra nucleotides and/or amino acids into the sequence. Insertions in the coding region of a gene may alter splicing of the mRNA (splice site mutation), or cause a shift in the reading frame (frameshift), both of which can significantly alter the gene product. A mutation may also be a deletion, i.e. removal of one or more nucleotides and/or amino acids from the sequence. Deletions in the coding region of a gene may alter the splicing and/or reading frame of the gene. A mutation may be spontaneous, induced, naturally occurring, or genetically engineered.
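The silent/missense/nonsense distinction drawn above can be made concrete by translating a codon before and after a single-base substitution using the standard genetic code. A minimal sketch; the function name is hypothetical, it handles one DNA codon at a time, and it adds a "stop-loss" label for substitutions that remove a stop codon, a case the definitions above do not name.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_point_mutation(codon, position, new_base):
    """Classify a single-nucleotide substitution at `position` (0-2) in a codon."""
    codon = codon.upper()
    mutant = codon[:position] + new_base.upper() + codon[position + 1:]
    before, after = CODON_TABLE[codon], CODON_TABLE[mutant]
    if before == after:
        return "silent"
    if after == "*":
        return "nonsense"  # new premature stop codon
    if before == "*":
        return "stop-loss"  # not one of the three classes named in the text
    return "missense"


print(classify_point_mutation("CTT", 2, "C"))  # CTT->CTC, Leu->Leu
print(classify_point_mutation("GAG", 1, "T"))  # GAG->GTG, Glu->Val
print(classify_point_mutation("TGG", 2, "A"))  # TGG->TGA, Trp->stop
```

Insertions and deletions, also described above, are not handled here because they can shift the reading frame and must be evaluated over the whole coding sequence rather than one codon.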
  • detecting a mutation in a subject may be done by any method useful for analyzing the DNA or amino acid sequence of the subject for the presence or absence of a mutation.
  • Such methods for analyzing a DNA or amino acid sequence are well known to those of skill in the art and any suitable means of detecting a mutation are encompassed by the present invention.
  • Such analysis may be done, for example, by isolating a genomic DNA sample from the subject and using nucleic acid hybridization with a detectable probe to test for the presence and/or absence of a mutation.
  • such analysis may be done using an mRNA sample from the subject, and optionally producing cDNA from the sample.
  • Such analysis may also be done, for example, using polymerase chain reaction to amplify a nucleic acid sequence and the amplification product may be sequenced and/or used for hybridization with a probe to detect the mutation.
  • Such analysis may also be done, for example, by isolating a protein sample from the subject and using antibodies to test for the presence and/or absence of a mutation in the protein.
  • “biomarker,” “genetic marker,” and “marker” are used interchangeably herein and refer to a region of a nucleotide sequence (e.g., in a chromosome) that is subject to variability (i.e., the region can be polymorphic for a variety of alleles).
  • a single nucleotide polymorphism (SNP) in a nucleotide sequence is a biomarker that is polymorphic for two alleles.
  • Other examples of biomarkers of this invention can include but are not limited to microsatellites, restriction fragment length polymorphisms (RFLPs), repeats (i.e., duplications), insertions, deletions, etc.
  • RFLPs: restriction fragment length polymorphisms
  • subject includes mammals, birds and reptiles.
  • subjects of this invention can include, but are not limited to, humans, non-human primates, dogs, cats, horses, cows, goats, guinea pigs, mice, rats and rabbits, as well as any other domestic, commercially or clinically valuable animal including animal models of autistic disorder.
  • by “modulate” it is meant that any of the mentioned activities is, e.g., increased, enhanced, agonized (acts as an agonist), promoted, decreased, reduced, suppressed, blocked, or antagonized (acts as an antagonist). Modulation can increase activity more than 1-fold, 2-fold, 3-fold, 5-fold, 10-fold, 100-fold, etc., over baseline values. Modulation can also decrease activity below baseline values.
  • an "allele” or “variant” is an alternative form of a gene. Variants may result from at least one mutation in the nucleic acid sequence and may result in altered mRNAs or in polypeptides whose structure or function may or may not be altered. Any given natural or recombinant gene may have none, one, or many allelic forms. Common mutational changes that give rise to variants are generally ascribed to natural deletions, additions, or substitutions of nucleotides. Each of these types of changes may occur alone, or in combination with the others, one or more times in a given sequence.
  • complementary means that two sequences are complementary when the sequence of one can bind to the sequence of the other in an anti-parallel sense wherein the 3'-end of each sequence binds to the 5'-end of the other sequence and each A, T(U), G, and C of one sequence is then aligned with a T(U), A, C, and G, respectively, of the other sequence.
  • the complementary sequence of the oligonucleotide has at least 80% or 90%, preferably 95%, most preferably 100%, complementarity to a defined sequence.
  • alleles or variants thereof can be identified.
  • a BLAST program also can be employed to assess such sequence identity.
  • complementary sequence as it refers to a polynucleotide sequence, relates to the base sequence in another nucleic acid molecule by the base-pairing rules. More particularly, the term or like term refers to the hybridization or base pairing between nucleotides or nucleic acids, such as, for instance, between the two strands of a double stranded DNA molecule or between an oligonucleotide primer and a primer binding site on a single stranded nucleic acid to be sequenced or amplified.
  • Complementary nucleotides are, generally, A and T (or A and U), or C and G.
  • Two single stranded RNA or DNA molecules are said to be substantially complementary when the nucleotides of one strand, optimally aligned and compared and with appropriate nucleotide insertions or deletions, pair with at least about 95% of the nucleotides of the other strand, usually at least about 98%, and more preferably from about 99 % to about 100%.
  • Complementary polynucleotide sequences can be identified by a variety of approaches including use of well-known computer algorithms and software, for example the BLAST program.
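The base-pairing rules above can be sketched as a reverse-complement function and an ungapped percent-complementarity score. This is a simplified illustration assuming DNA bases only (A, T, G, C) and equal-length sequences; unlike BLAST it performs no gapped alignment, and the function names are hypothetical.

```python
PAIRS = {("A", "T"), ("T", "A"), ("G", "C"), ("C", "G")}
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq):
    """Return the 5'->3' sequence that pairs anti-parallel with `seq`."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


def percent_complementarity(probe, target):
    """Percent of positions that base-pair when `probe` (5'->3') is laid
    anti-parallel on `target` (5'->3'); no gaps are allowed."""
    if len(probe) != len(target):
        raise ValueError("ungapped comparison needs equal-length sequences")
    flipped = probe.upper()[::-1]  # probe read 3'->5' to face the target
    matches = sum((p, t) in PAIRS for p, t in zip(flipped, target.upper()))
    return 100.0 * matches / len(target)


print(reverse_complement("AAGC"))  # GCTT
print(percent_complementarity("AAAA", "TTTA"))  # 3 of 4 positions pair -> 75.0
```

A score of 100 corresponds to perfect complementarity; the 80%, 90%, 95%, and 98% levels mentioned above would be compared against this value.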
  • Biological samples include solid and body fluid samples. Preferably, the sample is obtained from the heart. However, the biological samples used in the present invention can include cells, protein or membrane extracts of cells, blood or biological fluids such as ascites fluid or brain fluid (e.g., cerebrospinal fluid).
  • solid biological samples include, but are not limited to, samples taken from tissues of the central nervous system, bone, breast, kidney, cervix, endometrium, head/neck, gallbladder, parotid gland, prostate, pituitary gland, muscle, esophagus, stomach, small intestine, colon, liver, spleen, pancreas, thyroid, heart, lung, bladder, adipose, lymph node, uterus, ovary, adrenal gland, testes, tonsils and thymus.
  • body fluid samples include, but are not limited to blood, serum, semen, prostate fluid, seminal fluid, urine, saliva, sputum, mucus, bone marrow, lymph, and tears.
  • sample is used herein in its broadest sense.
  • a sample comprising polynucleotides, polypeptides, peptides, antibodies and the like may comprise a bodily fluid; a soluble fraction of a cell preparation, or media in which cells were grown; a chromosome, an organelle, or membrane isolated or extracted from a cell; genomic DNA, RNA, or cDNA, polypeptides, or peptides in solution or bound to a substrate; a cell; a tissue; a tissue print; a fingerprint, skin or hair; and the like.
  • “Mammal” covers warm-blooded mammals that are typically under medical care (e.g., humans and domesticated animals). Examples include feline, canine, equine, bovine, and human, as well as human alone.
  • Treating covers the treatment of a disease-state in a mammal, and includes: (a) preventing the disease-state from occurring in a mammal, in particular, when such mammal is predisposed to the disease-state but has not yet been diagnosed as having it; (b) inhibiting the disease-state, e.g., arresting its development; and/or (c) relieving the disease-state, e.g., causing regression of the disease state until a desired endpoint is reached. Treating also includes the amelioration of a symptom of a disease (e.g., lessening pain or discomfort), whether or not such amelioration directly affects the disease (e.g., its cause, transmission, or expression).
  • Diagnostic means identifying the presence or nature of a pathologic condition. Diagnostic methods differ in their sensitivity and specificity.
  • the "sensitivity” of a diagnostic assay is the percentage of diseased individuals who test positive (percent of "true positives”). Diseased individuals not detected by the assay are “false negatives.” Subjects who are not diseased and who test negative in the assay are termed “true negatives.”
  • the "specificity” of a diagnostic assay is 1 minus the false positive rate, where the "false positive” rate is defined as the proportion of those without the disease who test positive. While a particular diagnostic method may not provide a definitive diagnosis of a condition, it suffices if the method provides a positive indication that aids in diagnosis.
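Following the definitions above, sensitivity is TP / (TP + FN) and specificity is 1 minus the false-positive rate FP / (FP + TN). A minimal sketch with hypothetical counts, not results from this study:

```python
def sensitivity(tp, fn):
    """Fraction of diseased individuals who test positive."""
    return tp / (tp + fn)


def specificity(tn, fp):
    """1 minus the false-positive rate among non-diseased individuals."""
    false_positive_rate = fp / (fp + tn)
    return 1 - false_positive_rate


# Hypothetical confusion-matrix counts, for illustration only.
tp, fn, tn, fp = 80, 20, 90, 10
print(f"sensitivity = {sensitivity(tp, fn):.0%}")  # 80%
print(f"specificity = {specificity(tn, fp):.0%}")  # 90%
```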
  • safe and effective amount refers to the quantity of a component which is sufficient to yield a desired therapeutic response without undue adverse side effects (such as toxicity, irritation, or allergic response) commensurate with a reasonable benefit/risk ratio when used in the manner of this invention.
  • therapeutically effective amount is meant an amount of a compound of the present invention effective to yield the desired therapeutic response.
  • the specific safe and effective amount or therapeutically effective amount will vary with such factors as the particular condition being treated, the physical condition of the patient, the type of mammal or animal being treated, the duration of the treatment, the nature of concurrent therapy (if any), and the specific formulations employed and the structure of the compounds or their derivatives.
  • although autism is one of the most heritable neuropsychiatric disorders, its underlying genetic architecture had heretofore largely eluded description.
  • GWAS: genome-wide association study
  • 96 single nucleotide polymorphisms demonstrated strong association with autism risk (p-value < 0.0001).
  • the validation of the top 96 SNPs was performed using an independent dataset of 487 Caucasian autism families genotyped on the 550K Illumina BeadChip.
  • a novel region on chromosome 5p14.1 showed significance in both the discovery and validation datasets. Joint analysis of all SNPs in this region identified 8 SNPs with stronger p-values (3.24E-04 to 3.40E-06) than in either dataset alone.
  • detection of at least one single nucleotide polymorphism is diagnostic of autism or a spectrum of disorders (ASDs) associated with autism.
  • ASDs: autism spectrum disorders
  • the biomarkers of the present invention are single nucleotide polymorphisms (SNP).
  • Exemplary single nucleotide polymorphisms include but are not limited to T for G, T for A, C for A, C for T, A for G, A for C, A for T, G for A, G for T substitutions or any combinations thereof.
  • Common variations amongst individuals diagnosed with autism or ASD were identified on chromosome 5p14.1.
  • a single nucleotide polymorphism on chromosome 5p14.1 comprises: rs10065041, rs7704909, rs1896731, rs10038113, rs6894838, rs12518194, rs4307059, or rs4327572.
  • a biomarker for the diagnosis of autism and autism spectrum associated disorders comprising detecting at least one single nucleotide polymorphism on chromosome 5p14.1 and/or at least one single nucleotide polymorphism (SNP) set forth as: rs201171, rs4394668, rs4618985, rs1831870, rs155288, rs5008948, rs12024204, rs11162822, rs6424674, rs10493644, rs17425287, rs7523086, rs16833075, rs492780, rs1467068, rs6732653, rs1866206, rs4852531, rs11679682, rs11689493, rs4129081, rs67
  • the biomarkers further comprise complementary sequences, fragments, alleles, variants, or gene products thereof of any one or more sequences comprising the SNPs that are indicative of autism or autism associated spectrum of disorders.
  • a biomarker for identifying a patient at risk of developing or for the diagnosis of autism and autism spectrum of disorders comprises at least one mutation in a cadherin gene and/or protocadherin gene and/or a mutation 5' to cadherin gene 10 (CDH10) and 3' to cadherin gene 9 (CDH9).
  • the cadherin gene comprises cadherin gene 9, cadherin 10, or protocadherin 10.
  • the candidate region is flanked by two strong candidate genes, CDH9 (cadherin 9) and CDH10 (cadherin 10).
  • cadherins are integral membrane proteins involved in calcium-dependent cell-cell adhesion, and more specifically in protein cell adhesion activity. Both genes have been identified as being expressed in fetal brain. Abnormalities in the function of either gene could easily lead to inappropriate connections and development in the developing brain, such as those believed to be present in autism.
  • the region of association identified herein lies 5' to the CDH10 gene and 3' to CDH9.
  • Since regulatory regions, which can be quite distant from the gene they affect, generally tend to lie at the 5' ends of genes, detection of mutations or markers of CDH10 will be a priority, while not ignoring the possibility that CDH9, which interacts with CDH10, may also be a, or the, causative gene. Deletions in the related PCDH10 (protocadherin 10) may also be involved in the etiology of autism. The CDH10 gene is predominantly expressed in brain and is believed to be involved in synapse formation and in axon growth and guidance.
  • a method of diagnosing a patient with autism or autism spectrum of disorders comprises detecting in a patient a biomarker set comprising identifying at least one single nucleotide polymorphism on chromosome 5p14.1 and/or at least one single nucleotide polymorphism (SNP) set forth as: rs201171, rs4394668, rs4618985, rs1831870, rs155288, rs5008948, rs12024204, rs11162822, rs6424674, rs10493644, rs17425287, rs7523086, rs16833075, rs492780, rs1467068, rs6732653, rs1866206, rs4852531, rs11679682, rs11689493, rs4129081, rs6707773, r
  • a method of identifying a subject having an increased risk of developing autism or autism spectrum of disorders comprising: identifying in a patient a biomarker set comprising detecting at least one single nucleotide polymorphism on chromosome 5p14.1 and/or at least one single nucleotide polymorphism (SNP) set forth as: rs201171, rs4394668, rs4618985, rs1831870, rs155288, rs5008948, rs12024204, rs11162822, rs6424674, rs10493644, rs17425287, rs7523086, rs16833075, rs492780, rs1467068, rs6732653, rs1866206, rs4852531, rs11679682, rs11689493, rs4129081,
  • the assessment can also include detecting any one or more of the following single nucleotide polymorphisms on chromosome 5p14.1: rs10065041, rs7704909, rs1896731, rs10038113, rs6894838, rs12518194, rs4307059, or rs4327572, and/or detecting a mutation in a cadherin gene.
  • the region of association with autism or autism spectrum of disorders comprises detecting at least one mutation in a cadherin gene and/or protocadherin gene and/or a mutation 5' to cadherin gene 10 (CDH10) and 3' to cadherin gene 9 (CDH9).
  • a method of diagnosing a patient pre-natally or post-natally with autism comprises detecting at least one single nucleotide polymorphism on chromosome 5p14.1 and/or at least one single nucleotide polymorphism comprising: rs201171, rs4394668, rs4618985, rs1831870, rs155288, rs5008948, rs12024204, rs11162822, rs6424674, rs10493644, rs17425287, rs7523086, rs16833075, rs492780, rs1467068, rs6732653, rs1866206, rs4852531, rs11679682, rs11689493, rs4129081, rs6707773, rs13019278, rs12622496,
  • a sample comprises: amniotic fluid, serum, blood, plasma, or tissue from the fetus, and/or parents.
  • a method of identifying a marker for the diagnosis of autism and autism spectrum disorders comprises obtaining a sample from individuals and their families and purifying genomic DNA from the sample; genotyping single nucleotide polymorphisms (SNP); assessing the single nucleotide polymorphisms.
  • the samples are assessed in a genome-wide association analysis.
  • the samples are subject to stringent quality control. Examples of the quality control are detailed in the examples section which follows.
  • samples from families with Mendelian errors greater than about 2% are excluded from the study.
  • single nucleotide polymorphisms having a Hardy-Weinberg equilibrium (HWE) p-value of less than about 10^-… and a Mendelian error (ME) rate greater than about 4% are also excluded.
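The two quality-control screens above can be sketched as a Mendelian-consistency check on parent-child trios and a Hardy-Weinberg goodness-of-fit test. This is a minimal illustration assuming biallelic SNPs with genotypes coded as 0, 1, or 2 copies of one allele. The 1-degree-of-freedom chi-square HWE test is a common choice, but the source does not say which HWE test was used, so the sketch returns the HWE p-value rather than applying a cutoff. The same error-rate function can be applied per family (across SNPs, against the roughly 2% limit) or per SNP (across families, against the roughly 4% limit). Function names and example data are hypothetical.

```python
import math

# Allele counts (0 or 1) a parent with 0, 1, or 2 copies can pass on.
TRANSMISSIBLE = {0: {0}, 1: {0, 1}, 2: {1}}


def mendelian_consistent(father, mother, child):
    """True if the child's genotype can arise from the parents' genotypes."""
    return any(
        f + m == child for f in TRANSMISSIBLE[father] for m in TRANSMISSIBLE[mother]
    )


def mendelian_error_rate(trios):
    """Fraction of (father, mother, child) genotype triples that are inconsistent."""
    errors = sum(not mendelian_consistent(*trio) for trio in trios)
    return errors / len(trios)


def hwe_chi2_pvalue(n_hom_ref, n_het, n_hom_alt):
    """Chi-square (1 df) Hardy-Weinberg test from genotype counts."""
    n = n_hom_ref + n_het + n_hom_alt
    p = (2 * n_hom_ref + n_het) / (2 * n)  # reference allele frequency
    q = 1 - p
    expected = [n * p * p, 2 * n * p * q, n * q * q]
    if min(expected) == 0:
        return 1.0  # monomorphic SNP: no test possible
    observed = [n_hom_ref, n_het, n_hom_alt]
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return math.erfc(math.sqrt(chi2 / 2))  # upper tail of chi-square, 1 df


# Hypothetical genotypes for one family across four SNPs.
family = [(0, 0, 1), (1, 1, 2), (0, 2, 1), (2, 2, 2)]
rate = mendelian_error_rate(family)
print(f"Mendelian error rate {rate:.0%}; exclude family: {rate > 0.02}")
print(f"HWE p-value: {hwe_chi2_pvalue(50, 0, 50):.2e}")
```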
  • the data are subjected to mathematical and statistical analysis.
  • autosomal markers are analyzed for association using the pedigree disequilibrium test (PDT), and the samples are also assessed for linkage disequilibrium patterns.
  • PDT: pedigree disequilibrium test
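The pedigree disequilibrium test named above can be sketched for the simplest family structures. This follows the PDT statistic as it is commonly described, restricted to complete parent-child triads and discordant sibling pairs: for each family j, D_j is the average over informative units of triad scores (M alleles transmitted minus not transmitted) and sibpair scores (M alleles in the affected minus the unaffected sib), and T = ΣD_j / sqrt(ΣD_j²) is compared with a standard normal. Genotypes are assumed coded as counts of the tested allele M and to be Mendelian-consistent; the data layout and function name are hypothetical, and this is not the software used in the study.

```python
import math


def pdt(families):
    """Pedigree disequilibrium test restricted to triads and discordant sibpairs.

    Each family is a dict with optional keys:
      "triads":   [(father, mother, affected_child), ...]
      "sibpairs": [(affected_sib, unaffected_sib), ...]
    Genotypes are counts (0, 1, 2) of the tested allele M.
    Returns the statistic T and its two-sided normal p-value.
    """
    d_values = []
    for family in families:
        # Triad score: M alleles transmitted minus M alleles not transmitted,
        # which for count-coded genotypes equals 2*child - father - mother.
        # Only triads with a heterozygous parent are informative.
        scores = [2 * c - f - m for f, m, c in family.get("triads", [])
                  if f == 1 or m == 1]
        # Sibpair score: M alleles in affected sib minus unaffected sib;
        # only pairs with different genotypes are informative.
        scores += [a - u for a, u in family.get("sibpairs", []) if a != u]
        if scores:
            d_values.append(sum(scores) / len(scores))
    denom = math.sqrt(sum(d * d for d in d_values))
    if denom == 0:
        return 0.0, 1.0  # no informative transmissions
    t = sum(d_values) / denom
    return t, math.erfc(abs(t) / math.sqrt(2))


# Hypothetical families, for illustration only.
families = [
    {"triads": [(1, 1, 2)]},
    {"triads": [(1, 0, 1)]},
    {"triads": [(1, 1, 1)]},
    {"sibpairs": [(2, 1)]},
]
t, p = pdt(families)
print(f"T = {t:.3f}, p = {p:.3f}")
```

Averaging within each family before summing keeps a single large pedigree from dominating the statistic, which is the design choice that lets the PDT use extended families while remaining valid when related individuals are included.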
  • the detection of a biomarker in a subject can be carried out according to methods well known in the art.
  • DNA is obtained from any suitable sample from the subject that will contain DNA, preferably genomic DNA, and the DNA is then prepared and analyzed according to well-established protocols for the presence of biomarkers according to the methods of this invention.
  • analysis of the DNA can be carried out by amplification of the region of interest according to amplification protocols well known in the art (e.g., polymerase chain reaction, ligase chain reaction, strand displacement amplification, transcription-based amplification, self-sustained sequence replication (3SR), Qβ replicase protocols, nucleic acid sequence-based amplification (NASBA), repair chain reaction (RCR) and boomerang DNA amplification (BDA)).
  • amplification product can then be visualized directly in a gel by staining or the product can be detected by hybridization with a detectable probe.
  • the allele types can be distinguished by a variety of well-known methods, such as hybridization with an allele-specific probe, secondary amplification with allele-specific primers, restriction endonuclease digestion, or electrophoresis.
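Distinguishing alleles by restriction endonuclease digestion (PCR-RFLP) can be sketched as an in-silico digest: substitute each allele into the amplicon and check whether it creates or destroys a recognition site. This is a toy illustration; the amplicon, the SNP position, and the choice of EcoRI (recognition site GAATTC, cut after the G) are assumptions for the example and are not taken from the source. Only the top strand is scanned, which is sufficient here because GAATTC is palindromic.

```python
def digest(seq, site, cut_offset):
    """Return fragment lengths after cutting `seq` at every occurrence of `site`.

    `cut_offset` is the cut position within the recognition site (top strand).
    """
    cuts = []
    start = seq.find(site)
    while start != -1:
        cuts.append(start + cut_offset)
        start = seq.find(site, start + 1)
    bounds = [0] + cuts + [len(seq)]
    return [end - begin for begin, end in zip(bounds, bounds[1:])]


def rflp_patterns(amplicon, snp_index, alleles, site="GAATTC", cut_offset=1):
    """Fragment pattern for each allele placed at `snp_index` in the amplicon."""
    patterns = {}
    for allele in alleles:
        seq = amplicon[:snp_index] + allele + amplicon[snp_index + 1:]
        patterns[allele] = digest(seq, site, cut_offset)
    return patterns


# Hypothetical 14-bp amplicon whose SNP (index 6) can complete an EcoRI site.
print(rflp_patterns("TTTTGAGTTCTTTT", 6, ["A", "G"]))  # {'A': [5, 9], 'G': [14]}
```

On a gel, allele A would appear as two bands and allele G as one uncut band; a heterozygote would show all three.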
  • the present invention can further provide oligonucleotides for use as primers and/or probes for detecting and/or identifying biomarkers according to the methods of this invention.
  • biomarkers of this invention are correlated with an autistic disorder as described herein according to methods well known in the art and as disclosed in the Examples provided herein for correlating biomarkers with various phenotypic traits, including disease states, disorders and pathological conditions and levels of risk associated with developing a disease, disorder or pathological condition.
  • identifying such correlation involves conducting analyses that establish a statistically significant association and/or a statistically significant correlation between the presence of a biomarker or a combination of markers and the phenotypic trait in the subject.
  • An analysis that identifies a statistical association (e.g., a significant association) between the marker or combination of markers and the phenotype establishes a correlation between the presence of the marker or combination of markers in a subject and the particular phenotype being analyzed.
  • the present invention also provides a method wherein the biomarker is a combination of the single nucleotide polymorphisms, that is correlated with an aspect of autistic disorder as described herein.
  • detecting SNPs correlated with increased risk of autistic disorder, or with a diagnosis of autistic disorder, includes detecting at least one single nucleotide polymorphism (SNP) on chromosome 5p14.1 and/or at least one SNP set forth as: rs201171, rs4394668, rs4618985, rs1831870, rs155288, rs5008948, rs12024204, rs11162822, rs6424674, rs10493644, rs17425287, rs7523086, rs16833075, rs492780, rs1467068, rs6732653, rs1866206
  • the assessment can also include detecting any one or more of the following single nucleotide polymorphisms on chromosome 5p14.1: rs10065041, rs7704909, rs1896731, rs10038113, rs6894838, rs12518194, rs4307059, or rs4327572, and/or detecting a mutation in a cadherin gene.
  • the region of association with autism or autism spectrum disorders comprises at least one mutation in a cadherin gene and/or protocadherin gene and/or a mutation 5' to cadherin gene 10 (CDH10) and 3' to cadherin gene 9 (CDH9).
  • the present invention also provides a method of identifying an effective treatment regimen for a subject with an autistic disorder, comprising detecting one or more biomarkers described in embodiments of the invention and correlated with an effective treatment regimen for an autistic disorder.
  • the present invention provides a method of identifying an effective treatment regimen for a subject with an autistic disorder, comprising: a) correlating the presence of one or more biomarkers in a test subject with an autistic disorder for whom an effective treatment regimen has been identified; and b) detecting the one or more markers of step (a) in the subject, thereby identifying an effective treatment regimen for the subject.
  • Patients who respond well to particular treatment protocols can be analyzed for specific biomarkers and a correlation can be established according to the methods provided herein. Alternatively, patients who respond poorly to a particular treatment regimen can also be analyzed for particular biomarkers correlated with the poor response. Then, a subject who is a candidate for treatment for an autistic disorder can be assessed for the presence of the appropriate biomarkers and the most appropriate treatment regimen can be provided.
  • the methods of correlating biomarkers with treatment regimens can be carried out using a computer database.
  • the present invention provides a computer-assisted method of identifying a proposed treatment for autistic disorder.
  • the method involves the steps of (a) storing a database of biological data for a plurality of patients, the biological data that is being stored including for each of said plurality of patients (i) a treatment type, (ii) at least one biomarker associated with autistic disorder and (iii) at least one disease progression measure for autistic disorder from which treatment efficacy can be determined; and then (b) querying the database to determine the dependence on said biomarker of the effectiveness of a treatment type in treating autistic disorder, to thereby identify a proposed treatment as an effective treatment for a subject carrying a biomarker correlated with autistic disorder.
  • treatment information for a patient is entered into the database (through any suitable means such as a window or text interface), biomarker information for that patient is entered into the database, and disease progression information is entered into the database. These steps are then repeated until the desired number of patients has been entered into the database.
  • the database can then be queried to determine whether a particular treatment is effective for patients carrying a particular marker, not effective for patients carrying a particular marker, etc. Such querying can be carried out prospectively or retrospectively on the database by any suitable means, but is generally done by statistical analysis in accordance with known techniques, as described herein.
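The store-then-query workflow described above can be sketched with Python's built-in sqlite3 module. The table layout, the column names and the single numeric "improvement" score standing in for the disease-progression measure are hypothetical simplifications. A real analysis would apply a statistical test rather than compare raw means.

```python
import sqlite3


def build_db(records):
    """Store one row per patient: (id, treatment, carries_marker, improvement).

    carries_marker is 1 if the patient carries the biomarker, else 0;
    improvement is a hypothetical numeric disease-progression score.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE patients (id INTEGER PRIMARY KEY, treatment TEXT, "
        "carries_marker INTEGER, improvement REAL)"
    )
    conn.executemany("INSERT INTO patients VALUES (?, ?, ?, ?)", records)
    conn.commit()
    return conn


def effect_by_marker(conn, treatment):
    """Mean improvement and patient count for one treatment, by carrier status."""
    rows = conn.execute(
        "SELECT carries_marker, AVG(improvement), COUNT(*) FROM patients "
        "WHERE treatment = ? GROUP BY carries_marker ORDER BY carries_marker",
        (treatment,),
    ).fetchall()
    return {bool(carrier): (mean, n) for carrier, mean, n in rows}


if __name__ == "__main__":
    conn = build_db([
        (1, "A", 1, 8.0), (2, "A", 1, 6.0),
        (3, "A", 0, 2.0), (4, "A", 0, 4.0),
        (5, "B", 1, 3.0), (6, "B", 0, 3.0),
    ])
    print(effect_by_marker(conn, "A"))  # {False: (3.0, 2), True: (7.0, 2)}
```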
  • an agent which can be used to treat a patient comprises an antisense oligonucleotide which modulates the expression and/or function of a gene comprising an SNP which has been associated with autism.
  • the oligonucleotides can be used to modulate the expression of the normal sequence or decrease the expression of the variant sequence(s). Another example would be to express a normal gene variant in a patient.
  • homology, sequence identity or complementarity, between the oligonucleotide and target is from about 50% to about 60%. In some embodiments, homology, sequence identity or complementarity, is from about 60% to about 70%. In some embodiments, homology, sequence identity or complementarity, is from about 70% to about 80%. In some embodiments, homology, sequence identity or complementarity, is from about 80% to about 90%. In some embodiments, homology, sequence identity or complementarity, is about 90%, about 92%, about 94%, about 95%, about 96%, about 97%, about 98%, about 99% or about 100%.
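Percent complementarity between an antisense oligonucleotide and its target can be computed by aligning the two strands antiparallel and counting Watson-Crick pairs. This sketch assumes an ungapped alignment of equal-length sequences, treats U as T, and counts G-U wobble pairs as mismatches. The function name and sequences are hypothetical.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def percent_complementarity(oligo, target):
    """Percent of positions at which `oligo` Watson-Crick pairs with `target`.

    Both sequences are written 5'->3' and must have equal length. The strands
    are aligned antiparallel without gaps, so oligo[i] pairs with
    target[-1 - i]. U is treated as T; G-U wobble pairs count as mismatches.
    """
    if len(oligo) != len(target) or not oligo:
        raise ValueError("sequences must be non-empty and of equal length")
    oligo = oligo.upper().replace("U", "T")
    target = target.upper().replace("U", "T")
    matches = sum(
        1 for o, t in zip(oligo, reversed(target)) if COMPLEMENT.get(o) == t
    )
    return 100.0 * matches / len(oligo)


if __name__ == "__main__":
    target = "AUGCGU"  # hypothetical RNA target segment
    print(percent_complementarity("ACGCAT", target))  # 100.0
    print(percent_complementarity("ACGCAA", target))  # about 83.33
```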
  • an oligonucleotide comprises combinations of phosphorothioate internucleotide linkages and at least one internucleotide linkage selected from the group consisting of: alkylphosphonate, phosphorodithioate, alkylphosphonothioate, phosphoramidate, carbamate, carbonate, phosphate triester, acetamidate, carboxymethyl ester, and/or combinations thereof.
  • an oligonucleotide optionally comprises at least one modified nucleobase comprising peptide nucleic acids, locked nucleic acid (LNA) molecules, analogues, derivatives and/or combinations thereof.
  • LNA locked nucleic acid
  • An oligonucleotide is specifically hybridizable when binding of the compound to the target nucleic acid interferes with the normal function of the target nucleic acid to cause a loss of activity, and there is a sufficient degree of complementarity to avoid non-specific binding of the oligonucleotide to non-target nucleic acid sequences under conditions in which specific binding is desired.
  • Such conditions include physiological conditions in the case of in vivo assays or therapeutic treatment, and the conditions under which the assays are performed in the case of in vitro assays.
  • An oligonucleotide, whether DNA, RNA, chimeric, substituted, etc., is specifically hybridizable when binding of the compound to the target DNA or RNA molecule interferes with the normal function of the target DNA or RNA to cause a loss of utility, and there is a sufficient degree of complementarity to avoid non-specific binding of the oligonucleotide to non-target sequences under conditions in which specific binding is desired, i.e., under physiological conditions in the case of in vivo assays or therapeutic treatment, and, in the case of in vitro assays, under conditions in which the assays are performed.
  • oligomeric oligonucleotides bind to target nucleic acid molecules and modulate the expression and/or function of molecules encoded by a target gene.
  • the functions of DNA to be interfered with include, for example, replication and transcription.
  • the functions of RNA to be interfered with include all vital functions, such as, for example, translocation of the RNA to the site of protein translation, translation of protein from the RNA, splicing of the RNA to yield one or more mRNA species, and catalytic activity which may be engaged in or facilitated by the RNA.
  • the functions may be up-regulated or inhibited depending on the functions desired.
  • the oligonucleotides include antisense oligomeric compounds, antisense oligonucleotides, external guide sequence (EGS) oligonucleotides, alternate splicers, primers, probes, and other oligomeric compounds that hybridize to at least a portion of the target nucleic acid. As such, these compounds may be introduced in the form of single-stranded, double-stranded, partially single-stranded, or circular oligomeric compounds. Targeting an oligonucleotide to a particular nucleic acid molecule.
  • EGS external guide sequence
  • the targeting process usually also includes determination of at least one target region, segment, or site within the target nucleic acid for the antisense interaction to occur such that the desired effect, e.g., modulation of expression, will result.
  • region is defined as a portion of the target nucleic acid having at least one identifiable structure, function, or characteristic.
  • within regions of target nucleic acids are segments.
  • Segments are defined as smaller or sub-portions of regions within a target nucleic acid.
  • Sites, as used in the present invention, are defined as positions within a target nucleic acid.
  • because the translation initiation codon is typically 5'-AUG (in transcribed mRNA molecules; 5'-ATG in the corresponding DNA molecule), it is also referred to as the “AUG codon,” the “start codon” or the “AUG start codon”.
  • a minority of genes has a translation initiation codon having the RNA sequence 5'-GUG, 5'-UUG or 5'-CUG; and 5'-AUA, 5'-ACG and 5'-CUG have been shown to function in vivo.
  • the terms “translation initiation codon” and “start codon” can encompass many codon sequences, even though the initiator amino acid in each instance is typically methionine (in eukaryotes) or formylmethionine (in prokaryotes).
  • Eukaryotic and prokaryotic genes may have two or more alternative start codons, any one of which may be preferentially utilized for translation initiation in a particular cell type or tissue, or under a particular set of conditions.
  • the terms “start codon” and “translation initiation codon” refer to the codon or codons that are used in vivo to initiate translation of an mRNA transcribed from a gene, regardless of the sequence(s) of such codons.
  • a translation termination codon (or "stop codon") of a gene may have one of three sequences, i.e., 5'-UAA, 5'-UAG and 5'-UGA (the corresponding DNA sequences are 5'-TAA, 5'-TAG and 5'-TGA, respectively).
  • the terms “start codon region” and “translation initiation codon region” refer to a portion of such an mRNA or gene that encompasses from about 25 to about 50 contiguous nucleotides in either direction (i.e., 5' or 3') from a translation initiation codon.
  • the terms “stop codon region” and “translation termination codon region” refer to a portion of such an mRNA or gene that encompasses from about 25 to about 50 contiguous nucleotides in either direction (i.e., 5' or 3') from a translation termination codon. Consequently, the “start codon region” (or “translation initiation codon region”) and the “stop codon region” (or “translation termination codon region”) are both regions that may be targeted effectively with the antisense compounds of the present invention.
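To make the codon and codon-region definitions above concrete, the sketch below finds the first AUG, then the first in-frame stop codon (UAA, UAG or UGA) downstream of it, and reports a window of `flank` nucleotides on each side of each codon. It assumes the first AUG is the start codon, which ignores the alternative start codons discussed above. The function name and the toy transcript are hypothetical.

```python
STOP_CODONS = {"UAA", "UAG", "UGA"}


def codon_regions(mrna, flank=25):
    """Locate the first AUG and the first in-frame stop codon after it.

    Returns codon positions and (begin, end) windows extending `flank` nt on
    either side of each codon, in 0-based half-open coordinates clipped to
    the transcript. DNA input (T) is converted to RNA (U). Returns None if
    no start codon or no in-frame stop codon is found.
    """
    seq = mrna.upper().replace("T", "U")
    start = seq.find("AUG")
    if start == -1:
        return None
    stop = None
    for i in range(start + 3, len(seq) - 2, 3):  # step through codons in frame
        if seq[i:i + 3] in STOP_CODONS:
            stop = i
            break
    if stop is None:
        return None

    def window(pos):
        return max(0, pos - flank), min(len(seq), pos + 3 + flank)

    return {"start_codon": start, "stop_codon": stop,
            "start_region": window(start), "stop_region": window(stop)}


if __name__ == "__main__":
    # Toy transcript: 5'UTR GGGCC, then AUG GCU UAA, then CCC
    print(codon_regions("GGGCCAUGGCUUAACCC", flank=3))
```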
  • a targeted region is the intragenic region encompassing the translation initiation or termination codon of the open reading frame (ORF) of a gene.
  • Another target region includes the 5' untranslated region (5'UTR), known in the art to refer to the portion of an mRNA in the 5' direction from the translation initiation codon, and thus including nucleotides between the 5' cap site and the translation initiation codon of an mRNA (or corresponding nucleotides on the gene).
  • Still another target region includes the 3' untranslated region (3'UTR), known in the art to refer to the portion of an mRNA in the 3' direction from the translation termination codon, and thus including nucleotides between the translation termination codon and 3' end of an mRNA (or corresponding nucleotides on the gene).
  • the 5' cap site of an mRNA comprises an N7-methylated guanosine residue joined to the 5'-most residue of the mRNA via a 5'-5' triphosphate linkage.
  • the 5' cap region of an mRNA is considered to include the 5' cap structure itself as well as the first 50 nucleotides adjacent to the cap site.
  • Another target region for this invention is the 5' cap region.
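Given the positions of the start and stop codons, the 5'UTR, ORF, 3'UTR and 5' cap region described above are simple intervals on the transcript. This sketch uses 0-based, half-open coordinates and the 50-nucleotide cap region stated above. The cap structure itself sits upstream of position 0 and is not represented. The function name is hypothetical.

```python
def partition_transcript(length, start_codon, stop_codon, cap_region_nt=50):
    """Split a transcript of `length` nt into candidate target regions.

    start_codon and stop_codon are 0-based positions of the first base of the
    AUG and of the stop codon. Intervals are half-open (begin, end).
    """
    if not 0 <= start_codon < stop_codon <= length - 3:
        raise ValueError("need 0 <= start < stop <= length - 3")
    if (stop_codon - start_codon) % 3 != 0:
        raise ValueError("stop codon must be in frame with the start codon")
    return {
        "5'UTR": (0, start_codon),
        "ORF": (start_codon, stop_codon + 3),
        "3'UTR": (stop_codon + 3, length),
        # first 50 nt adjacent to the cap site (cap structure not modeled)
        "5' cap region": (0, min(cap_region_nt, length)),
    }


if __name__ == "__main__":
    print(partition_transcript(200, 40, 151))
```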
  • targeting splice sites, i.e., intron-exon junctions or exon-intron junctions, is particularly useful in situations where aberrant splicing is implicated in disease, or where an overproduction of a particular splice product is implicated in disease.
  • An aberrant fusion junction due to rearrangement or deletion is another embodiment of a target site.
  • mRNA transcripts produced via the process of splicing of two (or more) mRNAs from different gene sources are known as "fusion transcripts". Introns can be effectively targeted using antisense compounds targeted to, for example, DNA or pre-mRNA.
  • the antisense oligonucleotides bind to coding and/or non-coding regions of a target polynucleotide and modulate the expression and/or function of the target molecule.
  • alternative RNA transcripts can be produced from the same genomic region of DNA. These alternative transcripts or "pre-mRNA variants" are transcripts produced from the same genomic DNA that differ from other transcripts produced from the same genomic DNA in either their start or stop position and contain both intronic and exonic sequence. Upon excision of one or more exon or intron regions, or portions thereof, during splicing, pre-mRNA variants produce smaller "mRNA variants". Consequently, mRNA variants are processed pre-mRNA variants, and each unique pre-mRNA variant must always produce a unique mRNA variant as a result of splicing. These mRNA variants are also known as "alternative splice variants". If no splicing of the pre-mRNA variant occurs, then the pre-mRNA variant is identical to the mRNA variant.
  • Variants can be produced through the use of alternative signals to start or stop transcription.
  • Pre-mRNAs and mRNAs can possess more than one start codon or stop codon.
  • Variants that originate from a pre-mRNA or mRNA that use alternative start codons are known as "alternative start variants" of that pre-mRNA or mRNA.
  • Those transcripts that use an alternative stop codon are known as "alternative stop variants" of that pre-mRNA or mRNA.
  • One specific type of alternative stop variant is the "polyA variant” in which the multiple transcripts produced result from the alternative selection of one of the "polyA stop signals" by the transcription machinery, thereby producing transcripts that terminate at unique polyA sites.
  • the types of variants described herein are also embodiments of target nucleic acids.
  • the locations on the target nucleic acid to which the antisense compounds hybridize are defined as at least a 5-nucleobase portion of a target region to which an active antisense compound is targeted.
  • While the specific sequences of certain exemplary target segments are set forth herein, one of skill in the art will recognize that these serve to illustrate and describe particular embodiments within the scope of the present invention. Additional target segments are readily identifiable by one having ordinary skill in the art in view of this disclosure. Once one or more target regions, segments or sites have been identified, antisense compounds are chosen which are sufficiently complementary to the target, i.e., hybridize sufficiently well and with sufficient specificity, to give the desired effect.
  • the oligonucleotides bind to an antisense strand of a particular target.
  • the oligonucleotides are at least 5 nucleotides in length and can be synthesized so each oligonucleotide targets overlapping sequences such that oligonucleotides are synthesized to cover the entire length of the target polynucleotide.
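Tiling a target with overlapping oligonucleotides, as described above, amounts to sliding a window along the target and taking the reverse complement of each window. The oligo length and step below are illustrative assumptions, because the text only requires at least 5 nucleotides and overlapping coverage. A final window is anchored at the 3' end so the entire target is covered. The function names are hypothetical.

```python
_COMPLEMENT = str.maketrans("ACGTU", "TGCAA")


def reverse_complement(seq):
    """Reverse complement of a DNA/RNA sequence, returned as DNA."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def tile_oligos(target, length=20, step=5):
    """Return (offset, antisense_oligo) pairs whose windows cover `target`.

    Neighboring windows overlap by `length - step` nucleotides. If the last
    regular window stops short of the 3' end, one more window is anchored
    there so every target position is covered.
    """
    if length < 5:
        raise ValueError("oligonucleotides must be at least 5 nt long")
    if not 0 < step < length:
        raise ValueError("step must be between 1 and length - 1 so windows overlap")
    if len(target) < length:
        raise ValueError("target is shorter than one oligonucleotide")
    offsets = list(range(0, len(target) - length + 1, step))
    if offsets[-1] != len(target) - length:
        offsets.append(len(target) - length)
    return [(o, reverse_complement(target[o:o + length])) for o in offsets]


if __name__ == "__main__":
    for offset, oligo in tile_oligos("ACGTACGTACG", length=6, step=2):
        print(offset, oligo)
```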
  • the targets also include coding as well as non-coding regions.
  • antisense compounds include antisense oligonucleotides, ribozymes, external guide sequence (EGS) oligonucleotides, siRNA compounds, single- or double-stranded RNA interference (RNAi) compounds such as siRNA compounds, and other oligomeric compounds which hybridize to at least a portion of the target nucleic acid and modulate its function.
  • RNAi RNA interference
  • they may be DNA, RNA, DNA-like, RNA-like, or mixtures thereof, or may be mimetics of one or more of these.
  • These compounds may be single-stranded, double-stranded, circular or hairpin oligomeric compounds and may contain structural elements such as internal or terminal bulges, mismatches or loops.
  • Antisense compounds are routinely prepared linearly but can be joined or otherwise prepared to be circular and/or branched.
  • Antisense compounds can include constructs such as, for example, two strands hybridized to form a wholly or partially double-stranded compound or a single strand with sufficient self-complementarity to allow for hybridization and formation of a fully or partially double-stranded compound.
  • the two strands can be linked internally leaving free 3' or 5' termini or can be linked to form a continuous hairpin structure or loop.
  • the hairpin structure may contain an overhang on either the 5' or 3' terminus, producing an extension of single-stranded character.
  • the double stranded compounds optionally can include overhangs on the ends.
  • dsRNA can take the form of a self-complementary hairpin-type molecule that doubles back on itself to form a duplex.
  • the dsRNAs can be fully or partially double-stranded. Specific modulation of gene expression can be achieved by stable expression of dsRNA hairpins in transgenic cell lines; however, in some embodiments, the gene expression or function is up-regulated.
  • When formed from two strands, or a single strand that takes the form of a self-complementary hairpin-type molecule doubled back on itself to form a duplex, the two strands (or duplex-forming regions of a single strand) are complementary RNA strands that base pair in Watson-Crick fashion.
  • nucleic acids, including oligonucleotides, may be described as “DNA-like” (i.e., generally having one or more 2'-deoxy sugars and, generally, T rather than U bases) or “RNA-like” (i.e., generally having one or more 2'-hydroxyl or 2'-modified sugars and, generally, U rather than T bases).
  • Nucleic acid helices can adopt more than one type of structure, most commonly the A- and B-forms.
  • an antisense compound may contain both A- and B-form regions.
  • the desired oligonucleotides or antisense compounds comprise at least one of: antisense RNA, antisense DNA, chimeric antisense oligonucleotides, antisense oligonucleotides comprising modified linkages, RNA interference (RNAi) compounds, short interfering RNA (siRNA), microRNA (miRNA), small temporal RNA (stRNA), short hairpin RNA (shRNA), small RNA-induced gene activation (RNAa), small activating RNAs (saRNAs), or combinations thereof.
  • siRNA short interfering RNA
  • miRNA microRNA
  • stRNA small temporal RNA
  • shRNA short hairpin RNA
  • RNAa small RNA-induced gene activation
  • saRNAs small activating RNAs
  • chimeric oligonucleotides contain two or more chemically distinct regions, each made up of at least one nucleotide. These oligonucleotides typically contain at least one region of modified nucleotides that confers one or more beneficial properties (such as, for example, increased nuclease resistance, increased uptake into cells, or increased binding affinity for the target) and a region that is a substrate for enzymes capable of cleaving RNA:DNA or RNA:RNA hybrids.
  • antibodies and aptamers specifically bind to the biomarkers and components thereof.
  • the components include the nucleic acid sequences, complementary sequences, fragments, alleles, variants and gene products thereof of each component in each biomarker.
  • Aptamer polynucleotides are typically single-stranded standard phosphodiester DNA
  • a polynucleotide comprising a randomized sequence between "arms" having constant sequence is synthesized.
  • the arms can include restriction sites for convenient cloning and can also function as priming sites for PCR primers. The synthesis can easily be performed on commercial instruments.
  • the target protein is treated with the randomized polynucleotide.
  • the target protein can be in solution and then the complexes immobilized and separated from unbound nucleic acids by use of an antibody affinity column.
  • the target protein might be immobilized before treatment with the randomized polynucleotide.
  • the target protein-polynucleotide complexes are separated from the uncomplexed material and then the bound polynucleotides are separated from the target protein.
  • the bound nucleic acid can then be characterized but is more commonly amplified, e.g., by PCR, and the binding, separation and amplification steps are repeated.
  • the use, in subsequent iterations, of binding-buffer conditions that increasingly promote separation of the nucleic acid from the target protein (e.g., higher salt concentration) results in identification of polynucleotides having increasingly high affinity for the target protein.
  • nucleic acids showing high affinity for the target proteins are isolated and characterized. This is typically accomplished by cloning the nucleic acids using restriction sites incorporated into the arms, and then sequencing the cloned nucleic acid.
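The enrichment logic of the repeated binding, separation and amplification cycle can be illustrated with a toy model. This sketch assumes each aptamer species is retained in proportion to its equilibrium fraction bound, T / (T + K_D), with target in excess and no competition between species. Amplification is modeled as renormalizing the pool. The species names and K_D values are hypothetical, and real selections are far noisier.

```python
def selex_round(pool, target_nM):
    """One idealized selection + amplification round.

    `pool` maps a species name to (fraction_of_pool, K_D in nM). Each species
    is retained in proportion to its fraction bound, T / (T + K_D), assuming
    free target in excess and no competition between aptamers. Amplification
    is modeled as renormalizing the retained fractions to sum to 1.
    """
    retained = {name: frac * target_nM / (target_nM + kd)
                for name, (frac, kd) in pool.items()}
    total = sum(retained.values())
    return {name: (retained[name] / total, pool[name][1]) for name in pool}


def run_selex(pool, target_nM, rounds):
    """Apply `rounds` idealized rounds and return the final pool."""
    for _ in range(rounds):
        pool = selex_round(pool, target_nM)
    return pool


if __name__ == "__main__":
    start = {"weak": (0.99, 1000.0), "tight": (0.01, 1.0)}  # hypothetical
    for r in range(4):
        frac = run_selex(start, 10.0, r)["tight"][0]
        print(f"round {r}: tight binder fraction = {frac:.4f}")
```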
  • the dissociation constants (K_D) of aptamers for their target proteins are typically in the nanomolar range but can be as low as the picomolar range. That is, K_D is typically 1 pM to 500 nM, more typically from 1 pM to 100 nM. Aptamers having a K_D in the range of 1 pM to 10 nM are also useful.
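For a simple 1:1 interaction with target in excess, the fraction of aptamer bound at equilibrium is [T] / ([T] + K_D). This shows what the K_D ranges above mean in practice. The 10 nM target concentration below is an arbitrary illustrative choice, and the function name is hypothetical.

```python
def fraction_bound(target_conc_nM: float, kd_nM: float) -> float:
    """Equilibrium fraction of aptamer bound for a 1:1 interaction.

    Assumes the free target concentration is approximately the total target
    concentration (target in excess over aptamer).
    """
    if target_conc_nM < 0 or kd_nM <= 0:
        raise ValueError("concentration must be >= 0 and K_D must be > 0")
    return target_conc_nM / (target_conc_nM + kd_nM)


if __name__ == "__main__":
    # K_D values spanning the ranges above: 1 pM (0.001 nM) to 500 nM
    for kd in (0.001, 10.0, 100.0, 500.0):
        print(f"K_D = {kd} nM: {fraction_bound(10.0, kd):.3f} bound at 10 nM target")
```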
  • Aptamer polynucleotides can be synthesized on a commercially available nucleic acid synthesizer by methods known in the art.
  • the product can be purified by size selection or chromatographic methods.
  • Aptamer polynucleotides are typically from about 10 to 200 nucleotides long, more typically from about 10 to 100 nucleotides long, still more typically from about 10 to 50 nucleotides long and yet more typically from about 10 to 25 nucleotides long.
  • a preferred range of length is from about 10 to 50 nucleotides.
  • the aptamer sequences can be chosen as a desired sequence, or random or partially random populations of sequences can be made and then selected for specific binding to a desired target protein by assay in vitro. Any of the typical nucleic acid-protein binding assays known in the art can be used, e.g. "Southwestern" blotting using either labeled oligonucleotide or labeled protein as the probe. See also U.S. Pat. No. 5,445,935 for a fluorescence polarization assay of protein-nucleic acid interaction.
  • a desired aptamer-protein complex, for example, an aptamer-thrombin complex of the invention, can be labeled and used as a diagnostic agent in vitro in much the same manner as any specific protein-binding agent, e.g., a monoclonal antibody.
  • an aptamer-protein complex of the invention can be used to detect and quantitate the amount of its target protein in a sample, e.g. a blood sample, to provide diagnosis of a disease state correlated with the amount of the protein in the sample.
  • a desired aptamer-target/bait molecular complex can also be used for diagnostic imaging. In imaging uses, the complexes are labeled so that they can be detected outside the body.
  • Typical labels are radioisotopes, usually ones with short half-lives.
  • Nuclear magnetic resonance (NMR) imaging enhancers, such as gadolinium-153, can also be used to label the complex for detection by NMR. Methods and reagents for performing the labeling, either in the polynucleotide or in the protein moiety, are considered known in the art.
  • an antibody or aptamer is specific for each biomarker or genetic marker comprising: rs201171, rs4394668, rs4618985, rs1831870, rs155288, rs5008948, rs12024204, rs11162822, rs6424674, rs10493644, rs17425287, rs7523086, rs16833075, rs492780, rs1467068, rs6732653, rs1866206, rs4852531, rs11679682, rs11689493, rs4129081, rs6707773, rs13019278, rs12622496, rs3731723, rs722555, rs6436915, rs2279977, rs9822786, rs12…
  • the region of association with autism or autism spectrum disorders also comprises at least one mutation in a cadherin gene and/or protocadherin gene and/or a mutation 5' to cadherin gene 10 (CDH10) and 3' to cadherin gene 9 (CDH9).
  • the biomarkers also encompass any complementary sequences, fragments, alleles, variants and gene products thereof.
  • the biomarkers are useful for the identification of new drugs in the treatment of autism and autism spectrum disorders.
  • Small molecule test compounds or candidate therapeutic compounds can initially be members of an organic or inorganic chemical library. As used herein, “small molecules” refers to small organic or inorganic molecules of molecular weight below about 3,000 Daltons. The small molecules can be natural products or members of a combinatorial chemistry library. A set of diverse molecules should be used to cover a variety of functions such as charge, aromaticity, hydrogen bonding, flexibility, size, length of side chain, hydrophobicity, and rigidity.
  • Combinatorial techniques suitable for synthesizing small molecules are known in the art, e.g., as exemplified by Obrecht and Villalgordo, Solid-Supported Combinatorial and Parallel Synthesis of Small-Molecular-Weight Compound Libraries, Pergamon-Elsevier Science Limited (1998), and include those such as the “split and pool” or “parallel” synthesis techniques, solid-phase and solution-phase techniques, and encoding techniques (see, for example, Czarnik, Curr. Opin. Chem. Bio., 1:60 (1997)). In addition, a number of small molecule libraries are commercially available. Particular screening applications of this invention relate to the testing of pharmaceutical compounds in drug research.
  • Assessment of the activity of candidate pharmaceutical compounds generally involves administering a candidate compound, determining any change in the morphology, marker phenotype and expression, or metabolic activity of the cells and function of the cells that is attributable to the compound (compared with untreated cells or cells treated with an inert compound), and then correlating the effect of the compound with the observed change.
  • the screening may be done, for example, either because the compound is designed to have a pharmacological effect on certain cell types, or because a compound designed to have effects elsewhere may have unintended side effects.
  • Two or more drugs can be tested in combination (by combining with the cells either simultaneously or sequentially), to detect possible drug-drug interaction effects.
  • compounds are screened initially for potential toxicity (Castell et al., pp. 375-410 in “In vitro Methods in Pharmaceutical Research,” Academic Press, 1997). Cytotoxicity can be determined in the first instance by the effect on cell viability, survival, morphology, and expression or release of certain markers, receptors or enzymes. Effects of a drug on chromosomal DNA can be determined by measuring DNA synthesis or repair. [3H]thymidine or BrdU incorporation, especially at unscheduled times in the cell cycle, or above the level required for cell replication, is consistent with a drug effect. Unwanted effects can also include unusual rates of sister chromatid exchange, determined by metaphase spread. The reader is referred to A. Vickers (pp. 375-410 in “In vitro Methods in Pharmaceutical Research,” Academic Press, 1997) for further elaboration.
  • a method of identifying a candidate agent comprising: (a) contacting a biological sample from a patient with the candidate agent and determining the level of expression of one or more biomarkers described herein; (b) determining the level of expression of a corresponding biomarker or biomarkers in an aliquot of the biological sample not contacted with the candidate agent; (c) observing the effect of the candidate agent by comparing the level of expression of the biomarker or biomarkers in the aliquot of the biological sample contacted with the candidate agent and the level of expression of the corresponding biomarker or biomarkers in the aliquot of the biological sample not contacted with the candidate agent; and (d) identifying said agent from said observed effect, wherein a difference of at least 1%, 2%, 5%, or 10% between the level of expression of the biomarker gene or combination of biomarker genes in the aliquot of the biological sample contacted with the candidate agent and the level of expression of the corresponding biomarker gene or combination of bio
  • a method of producing a drug comprises the steps of the method according to the invention and (i) synthesizing the candidate agent identified in step (c) above, or an analog or derivative thereof, in an amount sufficient to provide said drug in a therapeutically effective amount to a subject; and/or (ii) combining the candidate agent identified in step (c) above, or an analog or derivative thereof, with a pharmaceutically acceptable carrier.
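The comparison in steps (a) to (d) of the candidate-agent method above can be sketched as a relative change in expression between the contacted and uncontacted aliquots, screened against a threshold such as 10%. Two choices here are assumptions for illustration: the difference is measured relative to the uncontacted aliquot, and the marker names are placeholders. The text does not specify how the percentage is computed.

```python
def percent_difference(treated, untreated):
    """Relative change of contacted vs. uncontacted expression, in percent."""
    if untreated == 0:
        raise ValueError("uncontacted expression level must be non-zero")
    return 100.0 * (treated - untreated) / untreated


def flag_candidate(treated_levels, untreated_levels, threshold_pct=10.0):
    """Return {marker: percent difference} for markers meeting the threshold.

    Both arguments map a (hypothetical) biomarker name to its expression
    level in the contacted and uncontacted aliquots, respectively.
    """
    hits = {}
    for marker, untreated in untreated_levels.items():
        diff = percent_difference(treated_levels[marker], untreated)
        if abs(diff) >= threshold_pct:
            hits[marker] = diff
    return hits


if __name__ == "__main__":
    treated = {"marker_1": 110.0, "marker_2": 101.0, "marker_3": 85.0}
    untreated = {"marker_1": 100.0, "marker_2": 100.0, "marker_3": 100.0}
    print(flag_candidate(treated, untreated))  # {'marker_1': 10.0, 'marker_3': -15.0}
```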
  • In some embodiments, it is desirable to express the biomarker in a vector and in cells. Such combinations have many applications.
  • the vectors and cells expressing the one or more biomarkers can be used in assays, kits, drug discovery, diagnostics, prognostics and the like.
  • the cells can be progenitor stem cells isolated from the bone marrow, or cells obtained from any other source, such as, for example, the ATCC.
  • BMDC bone marrow derived progenitor cell
  • bone marrow derived stem cell refers to a primitive stem cell with the machinery for self-renewal constitutively active.
  • the stem cells can be totipotent, pluripotent or precursor cells.
  • a "precursor cell” can be any cell in a cell differentiation pathway that is capable of differentiating into a more mature cell.
  • the term “precursor cell population” refers to a group of cells capable of developing into a more mature cell.
  • a precursor cell population can comprise cells that are totipotent, cells that are pluripotent and cells that are stem cell lineage restricted (i.e. cells capable of developing into less than all hematopoietic lineages, or into, for example, only cells of erythroid lineage).
  • the term “totipotent cell” refers to a cell capable of developing into all lineages of cells.
  • the term “totipotent population of cells” refers to a composition of cells capable of developing into all lineages of cells.
  • the term “pluripotent cell” refers to a cell capable of developing into a variety (albeit not all) of lineages and that is at least able to develop into all hematopoietic lineages (e.g., lymphoid, erythroid, and thrombocytic lineages).
  • Bone marrow derived stem cells contain two well-characterized types of stem cells.
  • Mesenchymal stem cells (MSC) normally form chondrocytes and osteoblasts.
  • Hematopoietic stem cells are of mesodermal origin and normally give rise to cells of the blood and immune system (e.g., erythroid, granulocyte/macrophage, megakaryocyte and lymphoid lineages).
  • hematopoietic stem cells also have been shown to have the potential to differentiate into the cells of the liver (including hepatocytes, bile duct cells), lung, kidney (e.g., renal tubular epithelial cells and renal parenchyma), gastrointestinal tract, skeletal muscle fibers, astrocytes of the CNS, Purkinje neurons, cardiac muscle (e.g., cardiomyocytes), endothelium and skin.
  • a method of identifying candidate therapeutic compounds comprises culturing cells expressing at least one biomarker, or complementary sequences, fragments, alleles, variants and gene products thereof, with a candidate therapeutic agent, and identifying candidate therapeutic agents which modulate the expression of the biomarkers.
  • a candidate therapeutic agent comprises organic molecules, inorganic molecules, vaccines, antibodies, nucleic acid molecules, proteins, peptides and vectors expressing nucleic acid molecules.
  • the methods include administering the compound to a model of the condition, e.g., contacting a cell (in vitro) model with the compound, or administering the compound to an animal model of the condition, e.g., an animal model of a condition associated with heart disease.
  • the model is then evaluated for an effect of the candidate compound on the clinical outcome in the model, and a compound producing such an effect can be considered a candidate therapeutic compound for the treatment of the condition.
  • Such effects can include clinically relevant effects such as decreased pain, increased life span, and so on.
  • Such effects can be determined on a macroscopic or microscopic scale.
  • Candidate therapeutic compounds identified by these methods can be further verified, e.g., by administration to human subjects in a clinical trial.
  • the biomarkers can be expressed from one or more vectors.
  • a “vector” (sometimes referred to as a gene delivery or gene transfer “vehicle”) refers to a macromolecule or complex of molecules comprising a polynucleotide to be delivered to a host cell, either in vitro or in vivo.
  • the polynucleotide to be delivered may comprise a coding sequence of interest in gene therapy.
  • Vectors include, for example, viral vectors (such as adenoviruses (“Ad”), adeno-associated viruses (AAV), and retroviruses), liposomes and other lipid-containing complexes, and other macromolecular complexes capable of mediating delivery of a polynucleotide to a host cell.
  • Vectors can also comprise other components or functionalities that further modulate gene delivery and/or gene expression, or that otherwise provide beneficial properties to the targeted cells.
  • such other components include, for example, components that influence binding or targeting to cells (including components that mediate cell-type or tissue-specific binding); components that influence uptake of the vector nucleic acid by the cell; components that influence localization of the polynucleotide within the cell after uptake (such as agents mediating nuclear localization); and components that influence expression of the polynucleotide.
  • Such components also might include markers, such as detectable and/or selectable markers that can be used to detect or select for cells that have taken up and are expressing the nucleic acid delivered by the vector.
  • Such components can be provided as a natural feature of the vector (such as the use of certain viral vectors which have components or functionalities mediating binding and uptake), or vectors can be modified to provide such functionalities.
  • Other vectors include those described by Chen et al., BioTechniques, 34: 167-171 (2003). A large variety of such vectors are known in the art and are generally available.
  • At least one candidate compound as defined elsewhere herein is used to promote a positive response with respect to a patient diagnosed with autism or ASD using the biomarkers described.
  • a “positive therapeutic response” means an improvement in the disorder, syndrome, symptoms, or translational profile associated with the disorder.
  • Treating a patient comprises administration of one or more of the identified agents.
  • the agents can be administered as part of a treatment regimen with one or more other therapies and medicaments.
  • the compounds of this invention may also be administered orally to the patient, in a manner such that the concentration of drug is sufficient to inhibit bone resorption or to achieve any other therapeutic indication as disclosed herein.
  • a pharmaceutical composition containing the compound is administered at an oral dose of about 0.1 to about 50 mg/kg in a manner consistent with the condition of the patient.
  • the oral dose would be about 0.5 to about 20 mg/kg.
  • an intravenous infusion of the compound in 5% dextrose in water or normal saline, or a similar formulation with suitable excipients is most effective, although an intramuscular bolus injection is also useful.
  • the parenteral dose will be about 0.01 to about 100 mg/kg; preferably between 0.1 and 20 mg/kg, in a manner to maintain the concentration of drug in the plasma at a concentration effective to inhibit a cysteine protease.
  • the compounds may be administered one to four times daily at a level to achieve a total daily dose of about 0.4 to about 400 mg/kg/day.
  • the therapeutically effective amount of an inventive compound is readily determined by one of ordinary skill in the art by comparing the blood level of the agent with the concentration required to have a therapeutic effect.
  • Prodrugs of compounds of the present invention may be prepared by any suitable method. For those compounds in which the prodrug moiety is a ketone functionality, specifically ketals and/or hemiacetals, the conversion may be effected in accordance with conventional methods.
  • No unacceptable toxicological effects are expected when compounds, derivatives, salts, compositions, etc., of the present invention are administered in accordance with the present invention.
  • the compounds of this invention, which may have good bioavailability, may be tested in one of several biological assays to determine the concentration of a compound required to have a given pharmacological effect.
  • a pharmaceutical or veterinary composition comprising one or more compounds (e.g. antisense, compounds identified through an assay described above, etc) and a pharmaceutically or veterinarily acceptable carrier.
  • Other active materials may also be present, as may be considered appropriate or advisable for the disease or condition being treated or prevented.
  • each of the carriers must be acceptable in the sense of being compatible with the other ingredients of the formulation and not deleterious to the recipient.
  • the compounds described herein are suitable for use in a variety of drug delivery systems described above. Additionally, in order to enhance the in vivo serum half-life of the administered compound, the compounds may be encapsulated, introduced into the lumen of liposomes, prepared as a colloid, or other conventional techniques may be employed which provide an extended serum half-life of the compounds. A variety of methods are available for preparing liposomes, as described in, e.g., Szoka, et al., U.S. Pat. Nos. 4,235,871, 4,501,728 and 4,837,028 each of which is incorporated herein by reference.
  • the formulations include those suitable for rectal, nasal, topical (including buccal and sublingual), vaginal or parenteral (including subcutaneous, intramuscular, intravenous and intradermal) administration, but preferably the formulation is an orally administered formulation.
  • the formulations may conveniently be presented in unit dosage form, e.g. tablets and sustained release capsules, and may be prepared by any methods well known in the art of pharmacy.
  • Such methods include the step of bringing into association the above defined active agent with the carrier.
  • the formulations are prepared by uniformly and intimately bringing into association the active agent with liquid carriers or finely divided solid carriers or both, and then if necessary shaping the product.
  • the invention extends to methods for preparing a pharmaceutical composition comprising bringing a compound into conjunction or association with a pharmaceutically or veterinarily acceptable carrier or vehicle.
  • kits comprising any one or more of the biomarkers set forth in Tables 1 and 2.
  • kits or articles of manufacture are also provided by the invention.
  • Such kits may comprise a carrier means being compartmentalized to receive in close confinement one or more container means such as vials, tubes, and the like, each of the container means comprising one of the separate elements to be used in the method.
  • one of the container means may comprise a probe that is or can be detectably labeled.
  • the kit may also have containers containing nucleotide(s) for amplification of the target nucleic acid sequence and/or a container comprising a reporter means, such as a biotin-binding protein (e.g., avidin or streptavidin) bound to a reporter molecule, such as an enzymatic, fluorescent, or radioisotope label.
  • the kit of the invention will typically comprise the container described above and one or more other containers comprising materials desirable from a commercial and user standpoint, including buffers, diluents, filters, needles, syringes, and package inserts with instructions for use.
  • a label may be present on the container to indicate that the composition is used for a specific therapy or non-therapeutic application, and may also indicate directions for either in vivo or in vitro use, such as those described above.
  • kits of the invention have a number of embodiments.
  • a typical embodiment is a kit comprising a container, a label on said container, and a composition contained within said container; wherein the composition includes a primary antibody that binds to the biomarkers of each molecular signature and instructions for using the antibody for evaluating the presence of biomarkers in at least one type of mammalian cell.
  • the kit can further comprise a set of instructions and materials for preparing a tissue sample and applying antibody and probe to the same section of a tissue sample.
  • the kit may include both a primary and secondary antibody, wherein the secondary antibody is conjugated to a label, e.g., an enzymatic label.
  • kits comprising a container, a label on said container, and a composition contained within said container; wherein the composition includes a polynucleotide that hybridizes to a complement of the polynucleotides under stringent conditions, the label on said container indicates that the composition can be used to evaluate the presence of a molecular signature in at least one type of mammalian cell, and instructions for using the polynucleotide for evaluating the presence of biomarker RNA or DNA in at least one type of mammalian cell.
  • kits may include microarrays, one or more buffers (e.g., block buffer, wash buffer, substrate buffer, etc.), other reagents such as a substrate (e.g., chromogen) that is chemically altered by an enzymatic label, epitope retrieval solution, control samples (positive and/or negative controls), control slide(s), etc.
  • Example 1: Chromosome 5 autism candidate gene analysis
  • the immediate area of association on chromosome 5 contains no clearly defined candidate genes.
  • Several potential genes, such as a "merlin"-like annotated gene derived from similarity to Drosophila and an MSNL1 (moesin-like) pseudogene, which may form part of the same Ch. X duplicated complex, lie in the immediate region of association.
  • several areas exhibit highly evolutionarily conserved sequence in this region.
  • Analysis of the sequence immediately flanking the region of maximum association indicates that one or more unannotated or uncharacterized genes or transcripts, some supported by spliced EST data, may lie close to the region of association.
  • CDH9
  • CDH10
  • Cadherins are integral membrane proteins involved in calcium-dependent cell-cell adhesion (more specifically, protein cell adhesion activity). Both genes have been identified as being expressed in fetal brain. Abnormal function of either gene could lead to inappropriate connections and development in the developing brain, such as those believed to be present in autism.
  • the region of association identified herein lies 5' to the CDH10 gene and 3' to CDH9.
  • Because regulatory regions, which can lie quite distant from the gene they affect, generally tend to lie at the 5' ends of genes, study of CDH10 will be a priority, while not ignoring the possibility that CDH9, which interacts with CDH10, may also be a, or the, causative gene. Deletions in the related PCDH10 (protocadherin 10) may also be involved in the etiology of autism. The CDH10 gene is predominantly expressed in brain and is believed to be involved in the formation of synapses and in axon growth and guidance.
  • Example 2: A genome-wide association study of autism reveals a common novel risk locus at 5p14.1
  • ADI-R Autism Diagnostic Interview-Revised
  • VABS Vineland Adaptive Behavior Scale
  • GWAS data were obtained from the Autism Genetic Resource Exchange (AGRE) (Autism Genetics Resource Exchange 2008) for use as a validation dataset.
  • the full AGRE dataset is publicly available and contains families with the full spectrum of autism spectrum disorders. Only families with one or more individuals diagnosed with autism (using DSM-IV and ADI-R) were selected; affected individuals with a non-autism diagnosis within these families were excluded from the analysis.
  • Genotyping of the discovery dataset: Genomic DNA was purified from whole blood using Puregene chemistry on the Qiagen Autopure LS according to standard automated Qiagen protocols (Qiagen, Valencia, CA).
  • DNA samples were quantitated with the ND-8000 spectrophotometer, and DNA quality was evaluated by gel electrophoresis on a 0.8% agarose gel. The concentration of all qualified samples was normalized to 50 ng/µl, and samples were arrayed in Matrix 0.5 ml 2D barcoded tubes in racks of 96. Sample identity was confirmed by genotyping 8 SNPs using TaqMan allelic discrimination assays (Applied Biosystems, Foster City, CA) and assessing concordance with historical data.
  • Samples that passed the above exclusion criteria were genotyped using Illumina's Human1M BeadChip, containing 1,072,820 SNPs (258,665 of which are in reported and new CNV regions). The samples were processed according to Illumina procedures for the Infinium II® assay (Illumina Inc., San Diego, CA).
  • Sample quality control: After genotyping, samples were subjected to a battery of quality control (QC) tests. The same protocol was used for both the discovery and validation datasets. Reported and genetic gender were examined using X-chromosome-linked SNPs. Relatedness between samples, sample contamination, misidentification and duplication were tested using genome-wide identity-by-descent (IBD) estimation; inconsistent samples were dropped from the analysis. The numbers of remaining samples are listed in Table S1.
  • IBD identity-by-descent
  • SNP quality control: SNPs were subjected to QC before analysis. SNPs with minor allele frequencies below 5% were removed because of limited power in the discovery sample. A negative correlation was observed between the proportion of Mendelian errors (ME) per SNP and the p-value for Hardy-Weinberg equilibrium (HWE). To minimize genotyping errors, we excluded SNPs with an HWE p-value < 10^-6 and ME > 4%. Remaining erroneous genotypes were set as missing. PLINK software was used for the quality control steps described above (Purcell et al., 2007, Am J Hum Genet 81, 559-575).
  • Illumina provides information on which 1M BeadChip SNPs are located within known common CNV regions. The distributions of ME per family and per SNP were compared. No significant differences in ME per SNP were identified between the known CNV regions and the remaining markers. The same quality criteria were used for both the discovery and the validation datasets. A summary of the SNPs is presented in Table S2.
  • Genotype imputation: Because the validation dataset was genotyped on a different GWAS SNP panel with fewer SNPs (558,183), the genotypes from our data and from the AGRE data were imputed independently with the program IMPUTE (Marchini et al., 2007, Nat Genet 39, 906-913), using a phased CEU HapMap dataset as a reference (International HapMap Consortium et al., 2007, Nature 449, 851-861). Individual genotypes with a probability of less than 0.90 were not included. All individuals were treated independently during imputation. Mendelian inconsistencies were zeroed out in PLINK (Purcell et al., 2007, Am J Hum Genet 81, 559-575). The results of the imputation are found in Table 1. Results for imputed SNPs missing more than 10% of genotypes are labeled in Table 1 and should be interpreted with caution because of possible bias.
  • Association analysis was performed using the pedigree disequilibrium test (PDT) (Martin et al., 2000, Am J Hum Genet 67, 146-154; Martin et al., 2001, Am J Hum Genet 68, 1065-1067). This method provides valid and robust tests of allelic association in both trios and extended families. Only autosomal markers were tested for association. Odds ratios and 95% confidence intervals were estimated using UNPHASED (Dudbridge et al., 2008, Hum Hered 66, 87-98). Power calculations for association analysis were performed using the Genetic Power Calculator (Purcell et al., 2008, Genetic Power Calculator [Homepage of Harvard University], [Online]. Available: pngu.mgh.harvard.edu/~purcell/gpc/ [2008]).
  • Linkage disequilibrium: Linkage disequilibrium (LD) patterns and haplotype block delineation were determined using Haploview 4.1 (Choi et al., 2001, Yonsei Medical Journal 42, 247-254). Blocks were defined using the confidence interval method described by Gabriel et al. (2002), Science 296, 2225-2229. Pair-wise LD measures (r^2) were calculated in the 3,822 unrelated founders of the joint sample.
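
The normalization of every qualified DNA sample to 50 ng/µl described above is a standard C1·V1 = C2·V2 dilution. The short Python sketch below shows only that arithmetic; the function name `dilution_volumes` and the 100 µl final volume are hypothetical choices for illustration, not values taken from the protocol.

```python
def dilution_volumes(stock_ng_per_ul: float,
                     target_ng_per_ul: float = 50.0,
                     final_volume_ul: float = 100.0) -> tuple[float, float]:
    """Return (stock_ul, diluent_ul) needed to reach the target concentration.

    Uses C1 * V1 = C2 * V2, so V1 = C2 * V2 / C1.
    final_volume_ul is a hypothetical working volume, not from the protocol.
    """
    if stock_ng_per_ul <= 0:
        raise ValueError("stock concentration must be positive")
    if stock_ng_per_ul < target_ng_per_ul:
        # A stock weaker than the target cannot be diluted up to it.
        raise ValueError("stock is already below the target concentration")
    stock_ul = target_ng_per_ul * final_volume_ul / stock_ng_per_ul
    return stock_ul, final_volume_ul - stock_ul


if __name__ == "__main__":
    # A hypothetical 200 ng/ul stock brought to 50 ng/ul in 100 ul total.
    print(dilution_volumes(200.0))  # -> (25.0, 75.0)
```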
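
The SNP-level filters described above (minor allele frequency below 5%, HWE p-value below 10^-6, Mendelian error rate above 4%) can be sketched as a simple filter over per-SNP summary statistics. This is a minimal illustration, not the PLINK commands used in the study: the `SnpStats` record and the function names are hypothetical, the statistics are assumed to be precomputed, and each criterion is assumed to exclude a SNP on its own.

```python
from dataclasses import dataclass


@dataclass
class SnpStats:
    snp_id: str
    maf: float      # minor allele frequency
    hwe_p: float    # Hardy-Weinberg equilibrium test p-value
    me_rate: float  # fraction of Mendelian errors at this SNP (denominator is an assumption)


def passes_qc(s: SnpStats, min_maf: float = 0.05,
              min_hwe_p: float = 1e-6, max_me_rate: float = 0.04) -> bool:
    """Keep a SNP only if it clears every threshold (assumed independent criteria)."""
    return s.maf >= min_maf and s.hwe_p >= min_hwe_p and s.me_rate <= max_me_rate


def filter_snps(snps: list[SnpStats]) -> list[str]:
    """Return the IDs of SNPs that survive quality control, in input order."""
    return [s.snp_id for s in snps if passes_qc(s)]
```

Keeping the thresholds as keyword arguments makes it easy to apply the same criteria to the discovery and validation datasets, as the text specifies.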
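
The imputation filters described above (drop genotypes whose probability is below 0.90, and flag SNPs missing more than 10% of genotypes) can be sketched as follows. This assumes three genotype probabilities per sample (0, 1 or 2 copies of a coded allele); the function names are hypothetical and this is not IMPUTE's own calling logic.

```python
from typing import Optional, Sequence


def call_genotype(probs: Sequence[float], threshold: float = 0.90) -> Optional[int]:
    """Hard-call one genotype from three probabilities (0, 1 or 2 copies).

    Returns None (missing) when the most probable genotype is below the threshold.
    """
    best = max(range(3), key=lambda g: probs[g])
    return best if probs[best] >= threshold else None


def call_snp(per_sample_probs: Sequence[Sequence[float]],
             threshold: float = 0.90,
             max_missing: float = 0.10) -> tuple[list[Optional[int]], bool]:
    """Hard-call every sample at one imputed SNP.

    Returns the calls and a flag that is True when the missing fraction
    exceeds max_missing (such results should be interpreted with caution).
    """
    calls = [call_genotype(p, threshold) for p in per_sample_probs]
    missing_fraction = sum(c is None for c in calls) / len(calls)
    return calls, missing_fraction > max_missing
```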
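
The PDT cited above can be illustrated for a single biallelic marker. This is a minimal sketch following our reading of the averaged statistic in Martin et al. (2000): each triad contributes X_T = (copies of allele A transmitted) - (copies not transmitted), each discordant sibling pair (DSP) contributes X_S = (copies of A in the affected sib) - (copies in the unaffected sib), each family's scores are averaged into D_j, and T = sum(D_j) / sqrt(sum(D_j^2)) is compared with a standard normal. The genotype coding and function names are assumptions; this is not the PDT software used in the study and it ignores details such as how uninformative units are counted.

```python
import math
from typing import Sequence


def triad_score(child: int, mother: int, father: int) -> int:
    """X_T for an affected child and both parents.

    Genotypes are coded as the count of allele A (0, 1, 2). Transmitted copies
    equal the child's count; untransmitted copies are the parents' total minus that.
    """
    return child - (mother + father - child)


def dsp_score(affected: int, unaffected: int) -> int:
    """X_S for one discordant sibling pair (counts of allele A)."""
    return affected - unaffected


def pdt(families: Sequence[Sequence[int]]) -> tuple[float, float]:
    """Return (T, two-sided p) from per-family lists of X_T and X_S scores."""
    d = [sum(scores) / len(scores) for scores in families if scores]
    denom = math.sqrt(sum(x * x for x in d))
    if denom == 0:
        return 0.0, 1.0  # no informative transmissions
    t = sum(d) / denom
    p = math.erfc(abs(t) / math.sqrt(2))  # two-sided normal p-value
    return t, p


if __name__ == "__main__":
    # Hypothetical families: three trios, and one family with a trio plus a DSP.
    fams = [[triad_score(2, 1, 1)], [triad_score(1, 1, 0)], [triad_score(1, 1, 1)],
            [triad_score(2, 2, 1), dsp_score(2, 1)]]
    print(pdt(fams))
```

Averaging within each family keeps large pedigrees from dominating the statistic; Martin et al. (2001) also describe a summed variant, which would replace the per-family mean with a sum.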
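
UNPHASED estimates odds ratios with likelihood-based methods. As a simpler, swapped-in illustration, the classic transmission-based estimate for trios is OR = b/c, where b and c count heterozygous parents transmitting allele A and allele a respectively, with a Wald 95% interval on the log scale. The sketch below implements that textbook estimate, not the UNPHASED procedure, and the function name is hypothetical.

```python
import math
from statistics import NormalDist


def tdt_odds_ratio(b: int, c: int, level: float = 0.95) -> tuple[float, float, float]:
    """Return (OR, lower, upper) from heterozygous-parent transmission counts.

    b: transmissions of allele A; c: transmissions of allele a.
    """
    if b <= 0 or c <= 0:
        raise ValueError("both transmission counts must be positive")
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for a 95% interval
    log_or = math.log(b / c)
    se = math.sqrt(1 / b + 1 / c)  # standard error of log(OR)
    return math.exp(log_or), math.exp(log_or - z * se), math.exp(log_or + z * se)
```

Because the interval is symmetric on the log scale, the product of its bounds equals OR squared, which is a quick sanity check on any output.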
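
The pairwise r^2 values described above follow the standard definition r^2 = D^2 / (p_A(1 - p_A) p_B(1 - p_B)), with D = p_AB - p_A p_B. The sketch below assumes the haplotype frequency p_AB is already known; in practice it must be estimated from unphased genotypes, which this sketch does not attempt. The function name is hypothetical.

```python
def ld_r2(p_ab: float, p_a: float, p_b: float) -> float:
    """r^2 between two biallelic SNPs.

    p_ab: frequency of the haplotype carrying allele A and allele B.
    p_a, p_b: frequencies of allele A and allele B.
    """
    d = p_ab - p_a * p_b  # disequilibrium coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("a monomorphic SNP has undefined r^2")
    return d * d / denom
```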

Abstract

A genome-wide association analysis demonstrates that, in addition to multiple rare variants, part of the complex genetic architecture of autism involves common variation. Biomarkers identified as being associated with autism or autism spectrum disorders include single nucleotide polymorphisms (SNPs) that demonstrated strong association with autism risk. A novel region on chromosome 5p14.1 was also identified that showed significance in both the discovery and validation datasets.

Description

GENOME-WIDE ASSOCIATION STUDY OF AUTISM REVEALS A COMMON NOVEL RISK LOCUS AT 5P14.1
CROSS-REFERENCE TO RELATED APPLICATIONS
[0001] The present application claims the priority of U.S. provisional patent application No. 61/090,436 filed August 20, 2008, which is incorporated herein by reference in its entirety.
STATEMENT REGARDING FEDERALLY SPONSORED RESEARCH
[0002] This invention was made with U.S. government support under grant number NS26630, funded by the National Institute of Neurological Disorders and Stroke, and grant numbers NS36768 and MH080647, funded by the National Institute of Neurological Disorders and Stroke and by the National Institute of Mental Health. The U.S. government has certain rights in the invention.
FIELD OF THE INVENTION
[0003] Embodiments of the invention are directed to biomarkers associated with autism and the spectrum of disorders associated with autism. Methods of diagnosis and risk assessment comprise detecting one or more of the biomarkers.
BACKGROUND
[0004] Autism is a neurodevelopmental disorder characterized by impairment in three primary areas: social interaction, communication, and restricted and repetitive patterns of interest or behavior (1). It belongs to a spectrum of disorders, the autism spectrum disorders (ASDs), whose symptoms range from quite severe (autistic disorder) to relatively mild (Asperger syndrome). With improved surveillance and a broadening of the diagnostic criteria, the most recent prevalence studies suggest that ASDs may affect as many as 1 in 150 children in the U.S., making them among the most common neurodevelopmental disorders (2). ASDs are most often diagnosed before age four and are at least three to four times more frequent in males than in females (3).
[0005] Overwhelming evidence from twin and sibling studies demonstrates that autism is highly heritable (4-6), but there is no consensus on the underlying genetic architecture. There are two alternative proposals: one involving numerous rare genetic mutations and the other involving fewer but more common genetic variants. In support of the rare-mutation hypothesis, both mutations in several genes and rare structural DNA variants have been identified, although the pervasiveness of these effects remains controversial (7, 8). Data supporting an effect of common variation have been more difficult to find. Several genome-wide linkage screens and focused candidate gene association studies have been performed in autism (9, 10), but the results have been disappointing and no universally accepted susceptibility polymorphism has yet emerged. Collectively, these data have suggested that the common-variant hypothesis may not be relevant to autism genetics.
[0006] A recent study by Arking et al. (11) combining linkage and genome-wide association in 72 multiplex autism families identified a common variant in the CNTNAP2 gene that was associated with autism primarily in families in which all affected individuals were male (male-only families). This association was also observed by Alarcon et al. (12); as in Arking et al. (11), the effect was seen primarily in male-only autism families. However, this association has not been widely replicated.
SUMMARY
[0007] This Summary is provided to present a summary of the invention to briefly indicate the nature and substance of the invention. It is submitted with the understanding that it will not be used to interpret or limit the scope or meaning of the claims.
[0008] Embodiments of the invention provide a genome-wide association analysis demonstrating that, in addition to multiple rare variants, part of the complex genetic architecture of autism involves common variation. Biomarkers identified as being associated with autism or autism spectrum disorders include single nucleotide polymorphisms (SNPs) that demonstrated strong association with autism risk. A novel region on chromosome 5p14.1 was also identified that showed significance in both the discovery and validation datasets.
[0009] Biomarkers for diagnosing autism and autism spectrum associated disorders comprise at least one single nucleotide polymorphism on chromosome 5p14.1 and/or at least one single nucleotide polymorphism (SNP) set forth as: rs201171, rs4394668, rs4618985, rs1831870, rs155288, rs5008948, rs12024204, rs11162822, rs6424674, rs10493644, rs17425287, rs7523086, rs16833075, rs492780, rs1467068, rs6732653, rs1866206, rs4852531, rs11679682, rs11689493, rs4129081, rs6707773, rs13019278, rs12622496, rs3731723, rs722555, rs6436915, rs2279977, rs9822786, rs12491012, rs1811763, rs1896731, rs10038113, rs11739167, rs7447989, rs6873221, rs12187724, rs1423435, rs12153325, rs10461556, rs350436, rs830907, rs315717, rs3804254, rs2317222, rs1936022, rs6907646, rs12529724, rs1504279, rs1504281, rs2799644, rs320813, rs1529001, rs11765584, rs1529813, rs2528795, rs1000058, rs1149558, rs1130496, rs12155975, rs16930253, rs2082804, rs4739071, rs6991229, rs1865641, rs1865638, rs11001685, rs881631, rs12245799, rs12249859, rs10748804, rs7910491, rs10840070, rs11021927, rs9919560, rs12366827, rs1894827, rs11055784, rs11055786, rs11117003, rs10492086, rs10444509, rs11616562, rs7325257, rs11636552, rs1532926, rs4456502, rs4148358, rs891754, rs8088001, rs547668, rs1144093, rs637644, rs6049129, rs742759, or rs171415, variants, mutants, alleles or complementary sequences thereof.
[0010] In another preferred embodiment, the single nucleotide polymorphism on chromosome 5p14.1 comprises: rs10065041, rs7704909, rs1896731, rs10038113, rs6894838, rs12518194, rs4307059, or rs4327572, variants, mutants, alleles or complementary sequences thereof.
[0011] In another preferred embodiment, the biomarker further comprises at least one mutation in a cadherin gene and/or protocadherin gene, variants, mutants, alleles or complementary sequences thereof.
[0012] In another preferred embodiment, the cadherin gene comprises cadherin gene 9, cadherin 10, or protocadherin 10, variants, mutants, alleles or complementary sequences thereof.
[0013] In another preferred embodiment, a method of identifying the risk of developing, or of diagnosing a patient with, autism or autism spectrum disorders comprises: identifying in a patient a biomarker set comprising at least one single nucleotide polymorphism on chromosome 5p14.1 and/or at least one single nucleotide polymorphism (SNP) set forth as: rs201171, rs4394668, rs4618985, rs1831870, rs155288, rs5008948, rs12024204, rs11162822, rs6424674, rs10493644, rs17425287, rs7523086, rs16833075, rs492780, rs1467068, rs6732653, rs1866206, rs4852531, rs11679682, rs11689493, rs4129081, rs6707773, rs13019278, rs12622496, rs3731723, rs722555, rs6436915, rs2279977, rs9822786, rs12491012, rs1811763, rs1896731, rs10038113, rs11739167, rs7447989, rs6873221, rs12187724, rs1423435, rs12153325, rs10461556, rs350436, rs830907, rs315717, rs3804254, rs2317222, rs1936022, rs6907646, rs12529724, rs1504279, rs1504281, rs2799644, rs320813, rs1529001, rs11765584, rs1529813, rs2528795, rs1000058, rs1149558, rs1130496, rs12155975, rs16930253, rs2082804, rs4739071, rs6991229, rs1865641, rs1865638, rs11001685, rs881631, rs12245799, rs12249859, rs10748804, rs7910491, rs10840070, rs11021927, rs9919560, rs12366827, rs1894827, rs11055784, rs11055786, rs11117003, rs10492086, rs10444509, rs11616562, rs7325257, rs11636552, rs1532926, rs4456502, rs4148358, rs891754, rs8088001, rs547668, rs1144093, rs637644, rs6049129, rs742759, rs171415, rs10065041, rs7704909, rs1896731, rs10038113, rs6894838, rs12518194, rs4307059, or rs4327572.
[0014] In another preferred embodiment, the method of claim 5 further comprises: a mutation in a cadherin gene and/or 5' to cadherin gene 10 (CDH10) and 3' to cadherin gene 9 (CDH9). Preferably, the cadherin gene comprises cadherin gene 9, cadherin 10, or protocadherin 10.
[0015] In another preferred embodiment, a biomarker for diagnostically distinguishing between autism and autism spectrum associated disorders, or a method of identifying a subject having an increased risk of developing autism or autism spectrum disorders, comprises: identifying in a patient a biomarker set comprising at least one single nucleotide polymorphism on chromosome 5p14.1 and/or at least one single nucleotide polymorphism (SNP) set forth as: rs201171, rs4394668, rs4618985, rs1831870, rs155288, rs5008948, rs12024204, rs11162822, rs6424674, rs10493644, rs17425287, rs7523086, rs16833075, rs492780, rs1467068, rs6732653, rs1866206, rs4852531, rs11679682, rs11689493, rs4129081, rs6707773, rs13019278, rs12622496, rs3731723, rs722555, rs6436915, rs2279977, rs9822786, rs12491012, rs1811763, rs1896731, rs10038113, rs11739167, rs7447989, rs6873221, rs12187724, rs1423435, rs12153325, rs10461556, rs350436, rs830907, rs315717, rs3804254, rs2317222, rs1936022, rs6907646, rs12529724, rs1504279, rs1504281, rs2799644, rs320813, rs1529001, rs11765584, rs1529813, rs2528795, rs1000058, rs1149558, rs1130496, rs12155975, rs16930253, rs2082804, rs4739071, rs6991229, rs1865641, rs1865638, rs11001685, rs881631, rs12245799, rs12249859, rs10748804, rs7910491, rs10840070, rs11021927, rs9919560, rs12366827, rs1894827, rs11055784, rs11055786, rs11117003, rs10492086, rs10444509, rs11616562, rs7325257, rs11636552, rs1532926, rs4456502, rs4148358, rs891754, rs8088001, rs547668, rs1144093, rs637644, rs6049129, rs742759, rs171415, rs10065041, rs7704909, rs1896731, rs10038113, rs6894838, rs12518194, rs4307059, or rs4327572.
[0016] In another embodiment, the biomarker comprises a mutation in a cadherin gene, and/or 5' to cadherin gene 10 (CDH10) and 3' to cadherin gene 9 (CDH9), cadherin gene 9, cadherin 10, or protocadherin 10.
[0017] In another preferred embodiment, a method of diagnosing a patient prenatally or postnatally with autism comprises: detecting at least one single nucleotide polymorphism on chromosome 5p14.1 and/or at least one single nucleotide polymorphism comprising: rs201171, rs4394668, rs4618985, rs1831870, rs155288, rs5008948, rs12024204, rs11162822, rs6424674, rs10493644, rs17425287, rs7523086, rs16833075, rs492780, rs1467068, rs6732653, rs1866206, rs4852531, rs11679682, rs11689493, rs4129081, rs6707773, rs13019278, rs12622496, rs3731723, rs722555, rs6436915, rs2279977, rs9822786, rs12491012, rs1811763, rs1896731, rs10038113, rs11739167, rs7447989, rs6873221, rs12187724, rs1423435, rs12153325, rs10461556, rs350436, rs830907, rs315717, rs3804254, rs2317222, rs1936022, rs6907646, rs12529724, rs1504279, rs1504281, rs2799644, rs320813, rs1529001, rs11765584, rs1529813, rs2528795, rs1000058, rs1149558, rs1130496, rs12155975, rs16930253, rs2082804, rs4739071, rs6991229, rs1865641, rs1865638, rs11001685, rs881631, rs12245799, rs12249859, rs10748804, rs7910491, rs10840070, rs11021927, rs9919560, rs12366827, rs1894827, rs11055784, rs11055786, rs11117003, rs10492086, rs10444509, rs11616562, rs7325257, rs11636552, rs1532926, rs4456502, rs4148358, rs891754, rs8088001, rs547668, rs1144093, rs637644, rs6049129, rs742759, rs171415, rs10065041, rs7704909, rs1896731, rs10038113, rs6894838, rs12518194, rs4307059, or rs4327572.
[0018] In a preferred embodiment, the sample comprises: amniotic fluid, serum, blood, plasma, cells or tissue.
[0019] In another preferred embodiment, a method of identifying a marker for the diagnosis of autism and autism spectrum disorders comprises obtaining a sample from individuals and their families and purifying genomic DNA from the sample; genotyping single nucleotide polymorphisms (SNPs); and assessing the single nucleotide polymorphisms.
[0020] In a preferred embodiment, the samples are assessed in a genome-wide association analysis.
[0021] In another preferred embodiment, a biomarker for identifying a patient at risk of developing, or for the diagnosis of, autism and autism spectrum disorders comprises: a mutation in a cadherin gene and/or 5' to cadherin gene 10 (CDH10) and 3' to cadherin gene 9 (CDH9). Preferably, the cadherin gene comprises cadherin gene 9, cadherin 10, or protocadherin 10.
[0022] In another preferred embodiment, in conjunction with the cadherin biomarkers, the one or more biomarkers further comprise at least one single nucleotide polymorphism on chromosome 5p14.1 and/or at least one single nucleotide polymorphism (SNP) set forth as: rs201171, rs4394668, rs4618985, rs1831870, rs155288, rs5008948, rs12024204, rs11162822, rs6424674, rs10493644, rs17425287, rs7523086, rs16833075, rs492780, rs1467068, rs6732653, rs1866206, rs4852531, rs11679682, rs11689493, rs4129081, rs6707773, rs13019278, rs12622496, rs3731723, rs722555, rs6436915, rs2279977, rs9822786, rs12491012, rs1811763, rs1896731, rs10038113, rs11739167, rs7447989, rs6873221, rs12187724, rs1423435, rs12153325, rs10461556, rs350436, rs830907, rs315717, rs3804254, rs2317222, rs1936022, rs6907646, rs12529724, rs1504279, rs1504281, rs2799644, rs320813, rs1529001, rs11765584, rs1529813, rs2528795, rs1000058, rs1149558, rs1130496, rs12155975, rs16930253, rs2082804, rs4739071, rs6991229, rs1865641, rs1865638, rs11001685, rs881631, rs12245799, rs12249859, rs10748804, rs7910491, rs10840070, rs11021927, rs9919560, rs12366827, rs1894827, rs11055784, rs11055786, rs11117003, rs10492086, rs10444509, rs11616562, rs7325257, rs11636552, rs1532926, rs4456502, rs4148358, rs891754, rs8088001, rs547668, rs1144093, rs637644, rs6049129, rs742759, rs171415, rs10065041, rs7704909, rs1896731, rs10038113, rs6894838, rs12518194, rs4307059, or rs4327572, variants, mutants, alleles or complementary sequences thereof.
[0023] In another preferred embodiment, a biomarker for diagnostically distinguishing between autism and autism spectrum associated disorders comprises at least one single nucleotide polymorphism on chromosome 5p14.1 and/or at least one single nucleotide polymorphism (SNP) set forth as: rs201171, rs4394668, rs4618985, rs1831870, rs155288, rs5008948, rs12024204, rs11162822, rs6424674, rs10493644, rs17425287, rs7523086, rs16833075, rs492780, rs1467068, rs6732653, rs1866206, rs4852531, rs11679682, rs11689493, rs4129081, rs6707773, rs13019278, rs12622496, rs3731723, rs722555, rs6436915, rs2279977, rs9822786, rs12491012, rs1811763, rs1896731, rs10038113, rs11739167, rs7447989, rs6873221, rs12187724, rs1423435, rs12153325, rs10461556, rs350436, rs830907, rs315717, rs3804254, rs2317222, rs1936022, rs6907646, rs12529724, rs1504279, rs1504281, rs2799644, rs320813, rs1529001, rs11765584, rs1529813, rs2528795, rs1000058, rs1149558, rs1130496, rs12155975, rs16930253, rs2082804, rs4739071, rs6991229, rs1865641, rs1865638, rs11001685, rs881631, rs12245799, rs12249859, rs10748804, rs7910491, rs10840070, rs11021927, rs9919560, rs12366827, rs1894827, rs11055784, rs11055786, rs11117003, rs10492086, rs10444509, rs11616562, rs7325257, rs11636552, rs1532926, rs4456502, rs4148358, rs891754, rs8088001, rs547668, rs1144093, rs637644, rs6049129, rs742759, rs171415, rs10065041, rs7704909, rs1896731, rs10038113, rs6894838, rs12518194, rs4307059, or rs4327572, or variants, mutants, alleles, or complementary sequences thereof. The biomarker further comprises at least one mutation in a cadherin gene and/or protocadherin gene, or variants, mutants, alleles, or complementary sequences thereof.
[0024] Other aspects are described infra.
BRIEF DESCRIPTION OF THE DRAWINGS
[0025] Figure 1 shows a quantile-quantile (Q-Q) plot of PDT p-values for the discovery dataset. The Q-Q plot measures the deviation of the observed p-values from their expected distribution. The diagonal (red) line represents the expected (null) distribution. The slight deviation of the observed values above the expected values at the tail of the distribution is consistent with modest genetic effects.
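A Q-Q plot such as Figure 1 pairs each observed p-value with the value expected under the null hypothesis of no association, where p-values are uniformly distributed between 0 and 1. The Python sketch below computes those plot coordinates on the -log10 scale. The function name is hypothetical, and using i/(n+1) as the expected quantile of the i-th smallest p-value is one common convention among several (for example (i - 0.5)/n); plotting itself is omitted.

```python
import math


def qq_points(p_values):
    """Return (expected, observed) -log10 p-value pairs for a Q-Q plot.

    Under the null hypothesis p-values are Uniform(0, 1), so the i-th smallest
    of n p-values is expected to be close to i / (n + 1). All p-values must be > 0.
    """
    observed = sorted(p_values)
    n = len(observed)
    points = []
    for i, p in enumerate(observed, start=1):
        expected = i / (n + 1)
        points.append((-math.log10(expected), -math.log10(p)))
    return points


# Points lying above the diagonal at the tail (observed > expected) indicate
# more small p-values than chance alone would produce.
pts = qq_points([0.5, 0.25, 0.001])
print(pts[0])  # coordinates for the smallest p-value
```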
[0026] Figure 2 shows a genome-wide plot of association p-values in the discovery dataset. The -log10(p-value) for each of the 775,311 SNPs tested in 438 families is plotted against its genomic location. 96 SNPs have p-values < 0.0001 (horizontal red line), and 6 SNPs have p-values below a more stringent threshold (blue horizontal line). Individual chromosomes are distinguished by different colors.
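A genome-wide plot of this kind transforms each SNP's p-value to -log10(p) and compares it with horizontal significance lines. The Python sketch below shows that transformation and how SNPs passing a threshold could be listed; the function names, SNP IDs, and p-values are hypothetical, and the 1e-4 threshold is used only because the text reports 96 SNPs at p < 0.0001.

```python
import math


def manhattan_values(results):
    """Map each SNP ID to -log10(p), the y-axis of a genome-wide (Manhattan) plot."""
    return {snp: -math.log10(p) for snp, p in results.items()}


def snps_below(results, threshold):
    """Return SNP IDs whose p-value is below `threshold`, strongest association first."""
    hits = [snp for snp, p in results.items() if p < threshold]
    return sorted(hits, key=lambda snp: results[snp])


# Hypothetical SNP IDs and p-values, for illustration only.
example = {"snpA": 3.0e-6, "snpB": 0.02, "snpC": 8.0e-5}
print(snps_below(example, 1e-4))  # ['snpA', 'snpC']
```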
[0027] Figure 3 shows the linkage disequilibrium pattern among validated SNPs on chromosome 5p14.1. Linkage disequilibrium (LD) was measured as r² values, which range from 0 (no correlation) to 1 (complete correlation), and was calculated between each pair of SNPs. Two blocks of strong LD were observed, spanning 3 kb (SNPs 2-4) and 28 kb (SNPs 5-8). SNP numbers correspond to the order in Table 2.
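The r² measure used for Figure 3 can be written as r² = D² / (pA(1 - pA) pB(1 - pB)), where D = pAB - pA·pB. The Python sketch below estimates r² for two biallelic SNPs from phased two-SNP haplotypes. Assuming known phase is a simplification (genotype data are usually unphased, and LD software estimates haplotype frequencies, for example by expectation-maximization), and the haplotype encoding and function name are hypothetical.

```python
def r_squared(haplotypes):
    """Estimate LD r^2 between two biallelic SNPs from phased haplotypes.

    Each haplotype is a 2-character string such as "AB" or "ab" (hypothetical
    encoding): "A" marks allele 1 at SNP 1 and "B" marks allele 1 at SNP 2.
    """
    n = len(haplotypes)
    p_a = sum(h[0] == "A" for h in haplotypes) / n  # frequency of allele A at SNP 1
    p_b = sum(h[1] == "B" for h in haplotypes) / n  # frequency of allele B at SNP 2
    p_ab = sum(h == "AB" for h in haplotypes) / n   # frequency of the A-B haplotype
    d = p_ab - p_a * p_b                            # disequilibrium coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("r^2 is undefined when either SNP is monomorphic")
    return d * d / denom


print(r_squared(["AB", "AB", "ab", "ab"]))  # 1.0 (complete correlation)
print(r_squared(["AB", "Ab", "aB", "ab"]))  # 0.0 (no correlation)
```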
DETAILED DESCRIPTION
[0028] Several aspects of the invention are described below with reference to example applications for illustration. It should be understood that numerous specific details, relationships, and methods are set forth to provide a full understanding of the invention. One having ordinary skill in the relevant art, however, will readily recognize that the invention can be practiced without one or more of the specific details or with other methods. The present invention is not limited by the illustrated ordering of acts or events, as some acts may occur in different orders and/or concurrently with other acts or events. Furthermore, not all illustrated acts or events are required to implement a methodology in accordance with the present invention.
[0029] All genes, gene names, and gene products disclosed herein are intended to correspond to homologs from any species for which the compositions and methods disclosed herein are applicable. Thus, the terms include, but are not limited to, genes and gene products from humans and mice. It is understood that when a gene or gene product from a particular species is disclosed, this disclosure is intended to be exemplary only, and is not to be interpreted as a limitation unless the context in which it appears clearly indicates otherwise. Thus, for example, the genes disclosed herein, which in some embodiments relate to mammalian nucleic acid and amino acid sequences, are intended to encompass homologous and/or orthologous genes and gene products from other animals including, but not limited to, other mammals, fish, amphibians, reptiles, and birds. In preferred embodiments, the genes or nucleic acid sequences are human.
Definitions
[0030] The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. As used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. Furthermore, to the extent that the terms "including", "includes", "having", "has", "with", or variants thereof are used in either the detailed description and/or the claims, such terms are intended to be inclusive in a manner similar to the term "comprising."
[0031] The term "about" or "approximately" means within an acceptable error range for the particular value as determined by one of ordinary skill in the art, which will depend in part on how the value is measured or determined, i.e., the limitations of the measurement system. For example, "about" can mean within 1 or more than 1 standard deviation, per the practice in the art. Alternatively, "about" can mean a range of up to 20%, preferably up to 10%, more preferably up to 5%, and more preferably still up to 1% of a given value. Alternatively, particularly with respect to biological systems or processes, the term can mean within an order of magnitude, preferably within 5-fold, and more preferably within 2-fold, of a value. Where particular values are described in the application and claims, unless otherwise stated the term "about" meaning within an acceptable error range for the particular value should be assumed.
[0032] The term "mutation" refers to one or more changes in a DNA sequence or a protein amino acid sequence relative to a reference sequence, usually a wild-type sequence. A mutation in a DNA sequence may or may not result in a corresponding change to the amino acid sequence of the encoded protein. A mutation may be a point mutation, i.e., an exchange of a single nucleotide and/or amino acid for another. Point mutations that occur within the protein-coding region of a gene's DNA sequence may be classified as silent mutations (coding for the same amino acid), missense mutations (coding for a different amino acid), or nonsense mutations (coding for a stop codon, which can truncate the protein). A mutation may also be an insertion, i.e., an addition of one or more extra nucleotides and/or amino acids into the sequence. Insertions in the coding region of a gene may alter splicing of the mRNA (splice site mutation) or cause a shift in the reading frame (frameshift), both of which can significantly alter the gene product. A mutation may also be a deletion, i.e., removal of one or more nucleotides and/or amino acids from the sequence. Deletions in the coding region of a gene may alter the splicing and/or reading frame of the gene. A mutation may be spontaneous, induced, naturally occurring, or genetically engineered.
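The silent, missense, and nonsense categories above can be made concrete by translating the reference and mutated codons with the standard genetic code, and the frameshift rule reduces to checking whether an indel length is a multiple of 3. The Python sketch below assumes an in-frame coding sequence that starts at a codon boundary; the function names and the example sequence are hypothetical, and splice-site effects are not modeled.

```python
from itertools import product

# Standard genetic code with codons enumerated in TCAG order; "*" marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def classify_point_mutation(coding_seq, position, alt_base):
    """Classify a single-base substitution in an in-frame coding sequence.

    `position` is 0-based. Stop-loss changes are reported as missense here for simplicity.
    """
    start = position - position % 3  # first base of the affected codon
    offset = position - start
    ref_codon = coding_seq[start:start + 3]
    alt_codon = ref_codon[:offset] + alt_base + ref_codon[offset + 1:]
    ref_aa, alt_aa = CODON_TABLE[ref_codon], CODON_TABLE[alt_codon]
    if ref_aa == alt_aa:
        return "silent"
    if alt_aa == "*":
        return "nonsense"
    return "missense"


def classify_indel(length):
    """Insertions or deletions whose length is not a multiple of 3 shift the reading frame."""
    return "in-frame" if length % 3 == 0 else "frameshift"


seq = "ATGGCCTGGTAA"  # hypothetical ORF: Met-Ala-Trp-Stop
print(classify_point_mutation(seq, 5, "T"))  # silent   (GCC -> GCT, Ala)
print(classify_point_mutation(seq, 3, "C"))  # missense (GCC -> CCC, Pro)
print(classify_point_mutation(seq, 8, "A"))  # nonsense (TGG -> TGA, stop)
print(classify_indel(2))                     # frameshift
```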
[0033] As used herein, detecting a mutation in a subject may be done by any method useful for analyzing the DNA or amino acid sequence of the subject for the presence or absence of a mutation. Such methods for analyzing a DNA or amino acid sequence are well known to those of skill in the art, and any suitable means of detecting a mutation is encompassed by the present invention. Such analysis may be done, for example, by isolating a genomic DNA sample from the subject and using nucleic acid hybridization with a detectable probe to test for the presence and/or absence of a mutation. Alternatively, such analysis may be done using an mRNA sample from the subject, optionally producing cDNA from the sample. Such analysis may also be done, for example, by using the polymerase chain reaction to amplify a nucleic acid sequence, and the amplification product may be sequenced and/or used for hybridization with a probe to detect the mutation. Such analysis may also be done, for example, by isolating a protein sample from the subject and using antibodies to test for the presence and/or absence of a mutation in the protein.
[0034] The terms "biomarker", "genetic marker", and "marker" are used interchangeably herein and refer to a region of a nucleotide sequence (e.g., in a chromosome) that is subject to variability (i.e., the region can be polymorphic for a variety of alleles). For example, a single nucleotide polymorphism (SNP) in a nucleotide sequence is a biomarker that is polymorphic for two alleles. Other examples of biomarkers of this invention can include, but are not limited to, microsatellites, restriction fragment length polymorphisms (RFLPs), repeats (i.e., duplications), insertions, deletions, etc.
[0035] The terms "subject", "patient", and "individual" are used interchangeably herein and include mammals, birds, and reptiles. Examples of subjects of this invention can include, but are not limited to, humans, non-human primates, dogs, cats, horses, cows, goats, guinea pigs, mice, rats, and rabbits, as well as any other domestic, commercially or clinically valuable animal, including animal models of autistic disorder.
[0036] By the term "modulate," it is meant that any of the mentioned activities are, e.g., increased, enhanced, agonized (as by an agonist), promoted, decreased, reduced, suppressed, blocked, or antagonized (as by an antagonist). Modulation can increase activity more than 1-fold, 2-fold, 3-fold, 5-fold, 10-fold, 100-fold, etc., over baseline values. Modulation can also decrease activity below baseline values.
[0037] An "allele" or "variant" is an alternative form of a gene. Variants may result from at least one mutation in the nucleic acid sequence and may result in altered mRNAs or in polypeptides whose structure or function may or may not be altered. Any given natural or recombinant gene may have none, one, or many allelic forms. Common mutational changes that give rise to variants are generally ascribed to natural deletions, additions, or substitutions of nucleotides. Each of these types of changes may occur alone, or in combination with the others, one or more times in a given sequence.
[0038] Two sequences are "complementary" when the sequence of one can bind to the sequence of the other in an anti-parallel sense, wherein the 3'-end of each sequence binds to the 5'-end of the other sequence and each A, T(U), G, and C of one sequence is aligned with a T(U), A, C, and G, respectively, of the other sequence. Normally, the complementary sequence of the oligonucleotide has at least 80% or 90%, preferably 95%, most preferably 100%, complementarity to a defined sequence. Preferably, alleles or variants thereof can be identified. A BLAST program also can be employed to assess such sequence identity.
[0039] The term "complementary sequence" as it refers to a polynucleotide sequence, relates to the base sequence in another nucleic acid molecule by the base-pairing rules. More particularly, the term or like term refers to the hybridization or base pairing between nucleotides or nucleic acids, such as, for instance, between the two strands of a double stranded DNA molecule or between an oligonucleotide primer and a primer binding site on a single stranded nucleic acid to be sequenced or amplified. Complementary nucleotides are, generally, A and T (or A and U), or C and G. Two single stranded RNA or DNA molecules are said to be substantially complementary when the nucleotides of one strand, optimally aligned and compared and with appropriate nucleotide insertions or deletions, pair with at least about 95% of the nucleotides of the other strand, usually at least about 98%, and more preferably from about 99 % to about 100%. Complementary polynucleotide sequences can be identified by a variety of approaches including use of well-known computer algorithms and software, for example the BLAST program.
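The base-pairing rules in [0038] and [0039] translate directly into code. The Python sketch below computes a reverse complement and the percentage of positions at which two equal-length strands pair in the anti-parallel sense. It assumes the strands are already aligned without gaps, so it is only a simplified stand-in for the alignment-based comparison that tools such as BLAST perform; the function names are illustrative.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G", "U": "A"}


def reverse_complement(seq):
    """Return the DNA strand (written 5'->3') that pairs with `seq` anti-parallel."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


def percent_complementarity(strand, partner):
    """Percent of positions at which `partner` (written 5'->3') pairs with `strand`.

    Assumes equal-length, ungapped, pre-aligned strands; U in an RNA partner is
    treated as T.
    """
    if len(strand) != len(partner):
        raise ValueError("strands must be the same length (gapped alignment not handled)")
    expected = reverse_complement(strand)
    partner_dna = partner.upper().replace("U", "T")
    matches = sum(a == b for a, b in zip(expected, partner_dna))
    return 100.0 * matches / len(strand)


print(reverse_complement("AACCGGTTAC"))                     # GTAACCGGTT
print(percent_complementarity("AACCGGTTAC", "GTAACCGGTA"))  # 90.0
```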
[0040] "Biological samples" include solid and body fluid samples. Preferably, the sample is obtained from heart. However, the biological samples used in the present invention can include cells, protein or membrane extracts of cells, blood or biological fluids such as ascites fluid or brain fluid (e.g., cerebrospinal fluid). Examples of solid biological samples include, but are not limited to, samples taken from tissues of the central nervous system, bone, breast, kidney, cervix, endometrium, head/neck, gallbladder, parotid gland, prostate, pituitary gland, muscle, esophagus, stomach, small intestine, colon, liver, spleen, pancreas, thyroid, heart, lung, bladder, adipose, lymph node, uterus, ovary, adrenal gland, testes, tonsils and thymus. Examples of "body fluid samples" include, but are not limited to blood, serum, semen, prostate fluid, seminal fluid, urine, saliva, sputum, mucus, bone marrow, lymph, and tears.
[0041] "Sample" is used herein in its broadest sense. A sample comprising polynucleotides, polypeptides, peptides, antibodies and the like may comprise a bodily fluid; a soluble fraction of a cell preparation, or media in which cells were grown; a chromosome, an organelle, or membrane isolated or extracted from a cell; genomic DNA, RNA, or cDNA, polypeptides, or peptides in solution or bound to a substrate; a cell; a tissue; a tissue print; a fingerprint, skin or hair; and the like.
[0042] "Mammal" covers warm-blooded mammals that are typically under medical care (e.g., humans and domesticated animals). Examples include feline, canine, equine, bovine, and human subjects, as well as humans alone.
[0043] "Treating" or "treatment" covers the treatment of a disease state in a mammal and includes: (a) preventing the disease state from occurring in a mammal, in particular when such mammal is predisposed to the disease state but has not yet been diagnosed as having it; (b) inhibiting the disease state, e.g., arresting its development; and/or (c) relieving the disease state, e.g., causing regression of the disease state until a desired endpoint is reached. Treating also includes the amelioration of a symptom of a disease (e.g., lessening pain or discomfort), wherein such amelioration may or may not directly affect the disease (e.g., its cause, transmission, expression, etc.).
[0044] "Diagnostic" means identifying the presence or nature of a pathologic condition. Diagnostic methods differ in their sensitivity and specificity. The "sensitivity" of a diagnostic assay is the percentage of diseased individuals who test positive (percent of "true positives"). Diseased individuals not detected by the assay are "false negatives." Subjects who are not diseased and who test negative in the assay are termed "true negatives." The "specificity" of a diagnostic assay is 1 minus the false positive rate, where the "false positive" rate is defined as the proportion of those without the disease who test positive. While a particular diagnostic method may not provide a definitive diagnosis of a condition, it suffices if the method provides a positive indication that aids in diagnosis.
[0045] As used herein, the term "safe and effective amount" or "therapeutic amount" refers to the quantity of a component that is sufficient to yield a desired therapeutic response without undue adverse side effects (such as toxicity, irritation, or allergic response), commensurate with a reasonable benefit/risk ratio when used in the manner of this invention. By "therapeutically effective amount" is meant an amount of a compound of the present invention effective to yield the desired therapeutic response. The specific safe and effective amount or therapeutically effective amount will vary with such factors as the particular condition being treated, the physical condition of the patient, the type of mammal or animal being treated, the duration of the treatment, the nature of concurrent therapy (if any), the specific formulations employed, and the structure of the compounds or their derivatives.
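The sensitivity and specificity definitions in [0044] reduce to simple ratios of assay outcome counts. The Python sketch below computes them; the function names and the counts in the example are hypothetical and are not results from this study.

```python
def sensitivity(true_pos, false_neg):
    """Fraction of diseased individuals who test positive."""
    return true_pos / (true_pos + false_neg)


def specificity(true_neg, false_pos):
    """Fraction of non-diseased individuals who test negative.

    Equivalent to 1 minus the false-positive rate, where the false-positive
    rate is false_pos / (false_pos + true_neg).
    """
    return true_neg / (true_neg + false_pos)


# Hypothetical counts, for illustration only.
print(sensitivity(true_pos=45, false_neg=5))   # 0.9
print(specificity(true_neg=80, false_pos=20))  # 0.8
```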
Biomarkers
[0046] Although autism is one of the most heritable neuropsychiatric disorders, its underlying genetic architecture had heretofore largely eluded description. To comprehensively examine the hypothesis that common variation is important in autism, a genome-wide association study (GWAS) was conducted using a discovery dataset of 438 autistic Caucasian families and the Illumina Human1M BeadChip. Ninety-six single nucleotide polymorphisms (SNPs) demonstrated strong association with autism risk (p-value < 0.0001). The top 96 SNPs were then validated in an independent dataset of 487 Caucasian autism families genotyped on the 550K Illumina BeadChip. A novel region on chromosome 5p14.1 showed significance in both the discovery and validation datasets. Joint analysis of all SNPs in this region identified 8 SNPs whose p-values (3.40E-06 to 3.24E-04) were more significant than in either dataset alone. These findings demonstrated that, in addition to multiple rare variations, part of the complex genetic architecture of autism involves common variation.
[0047] In a preferred embodiment, detection of at least one single nucleotide polymorphism is diagnostic of autism or of a spectrum of disorders associated with autism (ASDs). In addition to multiple rare variations, part of the complex genetic architecture of autism involves common variation. The biomarkers of the present invention are single nucleotide polymorphisms (SNPs). Exemplary single nucleotide polymorphisms include, but are not limited to, T for G, T for A, C for A, C for T, A for G, A for C, A for T, G for A, and G for T substitutions, or any combinations thereof. Common variations among individuals diagnosed with autism or ASD were identified on chromosome 5p14.1. For example, single nucleotide polymorphisms on chromosome 5p14.1 include rs10065041, rs7704909, rs1896731, rs10038113, rs6894838, rs12518194, rs4307059, and rs4327572.
[0048] The biomarkers of this invention can be used individually or in combination. Thus, in a preferred embodiment, a biomarker for the diagnosis of autism and autism spectrum associated disorders comprises at least one single nucleotide polymorphism on chromosome 5p14.1 and/or at least one single nucleotide polymorphism (SNP) set forth as: rs201171, rs4394668, rs4618985, rs1831870, rs155288, rs5008948, rs12024204, rs11162822, rs6424674, rs10493644, rs17425287, rs7523086, rs16833075, rs492780, rs1467068, rs6732653, rs1866206, rs4852531, rs11679682, rs11689493, rs4129081, rs6707773, rs13019278, rs12622496, rs3731723, rs722555, rs6436915, rs2279977, rs9822786, rs12491012, rs1811763, rs1896731, rs10038113, rs11739167, rs7447989, rs6873221, rs12187724, rs1423435, rs12153325, rs10461556, rs350436, rs830907, rs315717, rs3804254, rs2317222, rs1936022, rs6907646, rs12529724, rs1504279, rs1504281, rs2799644, rs320813, rs1529001, rs11765584, rs1529813, rs2528795, rs1000058, rs1149558, rs1130496, rs12155975, rs16930253, rs2082804, rs4739071, rs6991229, rs1865641, rs1865638, rs11001685, rs881631, rs12245799, rs12249859, rs10748804, rs7910491, rs10840070, rs11021927, rs9919560, rs12366827, rs1894827, rs11055784, rs11055786, rs11117003, rs10492086, rs10444509, rs11616562, rs7325257, rs11636552, rs1532926, rs4456502, rs4148358, rs891754, rs8088001, rs547668, rs1144093, rs637644, rs6049129, rs742759, or rs171415.
[0049] In another preferred embodiment, the biomarkers further comprise complementary sequences, fragments, alleles, variants, or gene products of any one or more sequences comprising the SNPs that are indicative of autism or autism spectrum disorders.
[0050] In another preferred embodiment, a biomarker for identifying a patient at risk of developing, or for the diagnosis of, autism and autism spectrum disorders comprises at least one mutation in a cadherin gene and/or protocadherin gene and/or a mutation 5' to cadherin gene 10 (CDH10) and 3' to cadherin gene 9 (CDH9).
[0051] In a preferred embodiment, the cadherin gene comprises cadherin gene 9, cadherin 10, or protocadherin 10.
[0052] Of great interest is the fact that the candidate region is flanked by two excellent candidate genes, CDH9 (cadherin 9) and CDH10 (cadherin 10). The cadherins are integral membrane proteins involved in calcium-dependent cell-cell adhesion, more specifically in protein cell adhesion activity. Both genes have been identified as being expressed in fetal brain. Abnormalities in the function of either gene could easily lead to inappropriate connections and development in the developing brain, such as those believed to be present in autism. The region of association identified herein lies 5' to the CDH10 gene and 3' to CDH9. Since regulatory regions, which can be quite distant from the gene affected, generally tend to lie at the 5' ends of genes, detection of mutations or markers of CDH10 will be a priority, while not ignoring the fact that CDH9, which interacts with CDH10, may also be a/the causative gene. Deletions in the related PCDH10 (protocadherin 10) may also be involved in the etiology of autism. The CDH10 gene is predominantly expressed in brain and is believed to be involved in synapse formation and in axon growth and guidance.
[0053] In another preferred embodiment, a method of diagnosing a patient with autism or autism spectrum disorders comprises detecting in the patient a biomarker set, comprising identifying at least one single nucleotide polymorphism on chromosome 5p14.1 and/or at least one single nucleotide polymorphism (SNP) set forth as: rs201171, rs4394668, rs4618985, rs1831870, rs155288, rs5008948, rs12024204, rs11162822, rs6424674, rs10493644, rs17425287, rs7523086, rs16833075, rs492780, rs1467068, rs6732653, rs1866206, rs4852531, rs11679682, rs11689493, rs4129081, rs6707773, rs13019278, rs12622496, rs3731723, rs722555, rs6436915, rs2279977, rs9822786, rs12491012, rs1811763, rs1896731, rs10038113, rs11739167, rs7447989, rs6873221, rs12187724, rs1423435, rs12153325, rs10461556, rs350436, rs830907, rs315717, rs3804254, rs2317222, rs1936022, rs6907646, rs12529724, rs1504279, rs1504281, rs2799644, rs320813, rs1529001, rs11765584, rs1529813, rs2528795, rs1000058, rs1149558, rs1130496, rs12155975, rs16930253, rs2082804, rs4739071, rs6991229, rs1865641, rs1865638, rs11001685, rs881631, rs12245799, rs12249859, rs10748804, rs7910491, rs10840070, rs11021927, rs9919560, rs12366827, rs1894827, rs11055784, rs11055786, rs11117003, rs10492086, rs10444509, rs11616562, rs7325257, rs11636552, rs1532926, rs4456502, rs4148358, rs891754, rs8088001, rs547668, rs1144093, rs637644, rs6049129, rs742759, or rs171415.
[0054] In another preferred embodiment, a method of identifying a subject having an increased risk of developing autism or autism spectrum disorders comprises identifying in a patient a biomarker set, comprising detecting at least one single nucleotide polymorphism on chromosome 5p14.1 and/or at least one single nucleotide polymorphism (SNP) set forth as: rs201171, rs4394668, rs4618985, rs1831870, rs155288, rs5008948, rs12024204, rs11162822, rs6424674, rs10493644, rs17425287, rs7523086, rs16833075, rs492780, rs1467068, rs6732653, rs1866206, rs4852531, rs11679682, rs11689493, rs4129081, rs6707773, rs13019278, rs12622496, rs3731723, rs722555, rs6436915, rs2279977, rs9822786, rs12491012, rs1811763, rs1896731, rs10038113, rs11739167, rs7447989, rs6873221, rs12187724, rs1423435, rs12153325, rs10461556, rs350436, rs830907, rs315717, rs3804254, rs2317222, rs1936022, rs6907646, rs12529724, rs1504279, rs1504281, rs2799644, rs320813, rs1529001, rs11765584, rs1529813, rs2528795, rs1000058, rs1149558, rs1130496, rs12155975, rs16930253, rs2082804, rs4739071, rs6991229, rs1865641, rs1865638, rs11001685, rs881631, rs12245799, rs12249859, rs10748804, rs7910491, rs10840070, rs11021927, rs9919560, rs12366827, rs1894827, rs11055784, rs11055786, rs11117003, rs10492086, rs10444509, rs11616562, rs7325257, rs11636552, rs1532926, rs4456502, rs4148358, rs891754, rs8088001, rs547668, rs1144093, rs637644, rs6049129, rs742759, or rs171415. The assessment can also include detecting any one or more of the following single nucleotide polymorphisms on chromosome 5p14.1: rs10065041, rs7704909, rs1896731, rs10038113, rs6894838, rs12518194, rs4307059, or rs4327572, and/or detecting a mutation in a cadherin gene. Assessing the region of association with autism or autism spectrum disorders comprises detecting at least one mutation in a cadherin gene and/or protocadherin gene and/or a mutation 5' to cadherin gene 10 (CDH10) and 3' to cadherin gene 9 (CDH9).
[0055] In another preferred embodiment, a method of diagnosing a patient prenatally or postnatally with autism comprises detecting at least one single nucleotide polymorphism on chromosome 5p14.1 and/or at least one single nucleotide polymorphism comprising: rs201171, rs4394668, rs4618985, rs1831870, rs155288, rs5008948, rs12024204, rs11162822, rs6424674, rs10493644, rs17425287, rs7523086, rs16833075, rs492780, rs1467068, rs6732653, rs1866206, rs4852531, rs11679682, rs11689493, rs4129081, rs6707773, rs13019278, rs12622496, rs3731723, rs722555, rs6436915, rs2279977, rs9822786, rs12491012, rs1811763, rs1896731, rs10038113, rs11739167, rs7447989, rs6873221, rs12187724, rs1423435, rs12153325, rs10461556, rs350436, rs830907, rs315717, rs3804254, rs2317222, rs1936022, rs6907646, rs12529724, rs1504279, rs1504281, rs2799644, rs320813, rs1529001, rs11765584, rs1529813, rs2528795, rs1000058, rs1149558, rs1130496, rs12155975, rs16930253, rs2082804, rs4739071, rs6991229, rs1865641, rs1865638, rs11001685, rs881631, rs12245799, rs12249859, rs10748804, rs7910491, rs10840070, rs11021927, rs9919560, rs12366827, rs1894827, rs11055784, rs11055786, rs11117003, rs10492086, rs10444509, rs11616562, rs7325257, rs11636552, rs1532926, rs4456502, rs4148358, rs891754, rs8088001, rs547668, rs1144093, rs637644, rs6049129, rs742759, rs171415; and/or rs10065041, rs7704909, rs1896731, rs10038113, rs6894838, rs12518194, rs4307059, or rs4327572; and/or at least one mutation in a cadherin gene and/or protocadherin gene and/or a mutation 5' to cadherin gene 10 (CDH10) and 3' to cadherin gene 9 (CDH9).
[0056] In a preferred embodiment, the sample comprises amniotic fluid, serum, blood, plasma, or tissue from the fetus and/or parents.
[0057] In another preferred embodiment, a method of identifying a marker for the diagnosis of autism and autism spectrum disorders comprises obtaining a sample from individuals and their families and purifying genomic DNA from the sample; genotyping single nucleotide polymorphisms (SNPs); and assessing the single nucleotide polymorphisms.
[0058] In a preferred embodiment, the samples are assessed in a genome-wide association analysis. Preferably, the samples are subjected to stringent quality control. Examples of the quality control are detailed in the Examples section that follows.
[0059] In a preferred embodiment, samples from families with Mendelian error rates greater than about 2% are excluded from the study. In another preferred embodiment, single nucleotide polymorphisms having a Hardy-Weinberg equilibrium (HWE) p-value below a stringent cutoff and a Mendelian error (ME) rate greater than about 4% are also excluded.
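The quality-control rules in [0059] can be sketched as simple filters. In the Python sketch below, genotypes are encoded as counts (0, 1, or 2) of one allele, Mendelian consistency is checked per parent-child trio, and Hardy-Weinberg equilibrium is assessed with a 1-degree-of-freedom chi-square test. The encoding, function names, and the choice of a chi-square rather than an exact HWE test are assumptions. Because the exact HWE cutoff is not specified here, it is left as a required argument, and the sketch drops a SNP that fails either criterion, which is one possible reading of the paragraph.

```python
import math

# Alleles (as 0/1 counts of allele A) a parent can transmit, by the parent's genotype.
TRANSMITTABLE = {0: (0,), 1: (0, 1), 2: (1,)}


def is_mendelian_consistent(father, mother, child):
    """True if the child's allele-A count can arise from one allele from each parent."""
    return any(f + m == child for f in TRANSMITTABLE[father] for m in TRANSMITTABLE[mother])


def mendelian_error_rate(trio_genotypes):
    """Fraction of SNPs with a Mendelian inconsistency; input is (father, mother, child) per SNP."""
    errors = sum(not is_mendelian_consistent(*g) for g in trio_genotypes)
    return errors / len(trio_genotypes)


def keep_family(trio_genotypes, max_me_rate=0.02):
    """Keep a family whose Mendelian error rate is at most about 2%."""
    return mendelian_error_rate(trio_genotypes) <= max_me_rate


def hwe_p_value(n_aa, n_ab, n_bb):
    """Chi-square (1 df) test of Hardy-Weinberg equilibrium from genotype counts."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Survival function of a chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(chi2 / 2))


def keep_snp(genotype_counts, snp_me_rate, hwe_threshold, max_me_rate=0.04):
    """Keep a SNP only if it passes both the HWE cutoff and the ~4% Mendelian error limit."""
    return hwe_p_value(*genotype_counts) >= hwe_threshold and snp_me_rate <= max_me_rate


print(hwe_p_value(25, 50, 25))  # 1.0 (counts exactly at HWE proportions)
```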
[0060] In a preferred embodiment, the data are subjected to mathematical and statistical analysis. For example, autosomal markers are analyzed for association using the pedigree disequilibrium test (PDT), and the samples are also assessed for linkage disequilibrium patterns.
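To illustrate the idea behind the pedigree disequilibrium test, the Python sketch below computes a simplified version for families consisting only of parent-child trios; the full PDT also incorporates discordant sibling pairs. It follows the general PDT construction as we understand it: each informative trio (at least one heterozygous parent) scores transmitted minus non-transmitted copies of an allele, scores are averaged within each family, and the statistic is compared with a standard normal distribution. The genotype encoding, function names, and example data are assumptions, and this is not the software used in the study.

```python
import math


def trio_score(father, mother, child):
    """Transmitted minus non-transmitted copies of allele A in one trio.

    Genotypes are counts (0, 1, 2) of allele A. Parents carry four alleles,
    two transmitted and two not, so the score is child - (father + mother - child).
    """
    return 2 * child - father - mother


def pdt_trios(families):
    """Simplified trio-only PDT; returns (T statistic, two-sided p-value).

    `families` is a list of families, each a list of (father, mother, child) trios.
    Each family's score D_j is the mean score of its informative trios, and
    T = sum(D_j) / sqrt(sum(D_j ** 2)) is approximately N(0, 1) under the null.
    """
    d = []
    for family in families:
        informative = [t for t in family if t[0] == 1 or t[1] == 1]
        if informative:
            d.append(sum(trio_score(*t) for t in informative) / len(informative))
    denom = math.sqrt(sum(x * x for x in d))
    if denom == 0:
        return 0.0, 1.0  # statistic undefined; treat as no evidence of association
    t_stat = sum(d) / denom
    p_two_sided = math.erfc(abs(t_stat) / math.sqrt(2))  # equals 2 * (1 - Phi(|T|))
    return t_stat, p_two_sided


# Hypothetical genotypes at one SNP; the last family is uninformative and is skipped.
families = [[(1, 0, 1)], [(1, 0, 1)], [(1, 1, 2)], [(1, 0, 0)], [(2, 2, 2)]]
t_stat, p = pdt_trios(families)
print(round(t_stat, 3))  # 1.134
```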
[0061] In the methods described herein, the detection of a biomarker in a subject can be carried out according to methods well known in the art. For example, DNA is obtained from any suitable sample from the subject that will contain DNA, preferably genomic DNA, and the DNA is then prepared and analyzed according to well-established protocols for the presence of biomarkers according to the methods of this invention. In some embodiments, analysis of the DNA can be carried out by amplification of the region of interest according to amplification protocols well known in the art (e.g., polymerase chain reaction, ligase chain reaction, strand displacement amplification, transcription-based amplification, self-sustained sequence replication (3SR), Qβ replicase protocols, nucleic acid sequence-based amplification (NASBA), repair chain reaction (RCR), and boomerang DNA amplification (BDA)). The amplification product can then be visualized directly in a gel by staining, or the product can be detected by hybridization with a detectable probe. When amplification conditions allow for amplification of all allelic types of a biomarker, the types can be distinguished by a variety of well-known methods, such as hybridization with an allele-specific probe, secondary amplification with allele-specific primers, restriction endonuclease digestion, or electrophoresis. Thus, the present invention can further provide oligonucleotides for use as primers and/or probes for detecting and/or identifying biomarkers according to the methods of this invention.
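Restriction endonuclease digestion, one of the allele-discrimination options listed in [0061], works when one allele creates or destroys an enzyme recognition site in the amplified region (PCR-RFLP). The Python sketch below predicts the fragment lengths each allele would yield. The amplicon sequence and SNP position are invented for illustration, and EcoRI (recognition site GAATTC, cutting after the G) is only an example enzyme; a real assay would be designed from the sequence flanking the SNP of interest.

```python
ECORI_SITE = "GAATTC"  # EcoRI recognition sequence
ECORI_CUT_OFFSET = 1   # EcoRI cuts between G and AATTC


def digest(amplicon, site=ECORI_SITE, cut_offset=ECORI_CUT_OFFSET):
    """Return fragment lengths after cutting `amplicon` at every occurrence of `site`."""
    cuts = []
    start = amplicon.find(site)
    while start != -1:
        cuts.append(start + cut_offset)
        start = amplicon.find(site, start + 1)
    bounds = [0] + cuts + [len(amplicon)]
    return [b - a for a, b in zip(bounds, bounds[1:])]


def rflp_pattern(amplicon, snp_pos, allele):
    """Predicted fragment lengths for the amplicon carrying `allele` at `snp_pos` (0-based)."""
    variant = amplicon[:snp_pos] + allele + amplicon[snp_pos + 1:]
    return digest(variant)


# Hypothetical 20-bp amplicon; allele T at position 9 creates an EcoRI site.
amplicon = "CCGTAGAATCCAGGTTACGA"
print(rflp_pattern(amplicon, 9, "C"))  # [20]    uncut
print(rflp_pattern(amplicon, 9, "T"))  # [6, 14] cut
```

A heterozygous sample would show both patterns on a gel (bands at 20, 14, and 6 bp in this made-up example).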
[0062] The biomarkers of this invention are correlated with an autistic disorder as described herein according to methods well known in the art and as disclosed in the Examples provided herein for correlating biomarkers with various phenotypic traits, including disease states, disorders and pathological conditions, and levels of risk associated with developing a disease, disorder, or pathological condition. In general, identifying such a correlation involves conducting analyses that establish a statistically significant association and/or a statistically significant correlation between the presence of a biomarker or a combination of markers and the phenotypic trait in the subject. An analysis that identifies a statistical association (e.g., a significant association) between the marker or combination of markers and the phenotype establishes a correlation between the presence of the marker or combination of markers in a subject and the particular phenotype being analyzed.
[0063] The present invention also provides a method wherein the biomarker is a combination of the single nucleotide polymorphisms that is correlated with an aspect of autistic disorder as described herein. Thus, for example, SNPs correlated with increased risk of autistic disorder or with a diagnosis of autistic disorder include at least one single nucleotide polymorphism on chromosome 5p14.1 and/or at least one single nucleotide polymorphism (SNP) set forth as: rs201171, rs4394668, rs4618985, rs1831870, rs155288, rs5008948, rs12024204, rs11162822, rs6424674, rs10493644, rs17425287, rs7523086, rs16833075, rs492780, rs1467068, rs6732653, rs1866206, rs4852531, rs11679682, rs11689493, rs4129081, rs6707773, rs13019278, rs12622496, rs3731723, rs722555, rs6436915, rs2279977, rs9822786, rs12491012, rs1811763, rs1896731, rs10038113, rs11739167, rs7447989, rs6873221, rs12187724, rs1423435, rs12153325, rs10461556, rs350436, rs830907, rs315717, rs3804254, rs2317222, rs1936022, rs6907646, rs12529724, rs1504279, rs1504281, rs2799644, rs320813, rs1529001, rs11765584, rs1529813, rs2528795, rs1000058, rs1149558, rs1130496, rs12155975, rs16930253, rs2082804, rs4739071, rs6991229, rs1865641, rs1865638, rs11001685, rs881631, rs12245799, rs12249859, rs10748804, rs7910491, rs10840070, rs11021927, rs9919560, rs12366827, rs1894827, rs11055784, rs11055786, rs11117003, rs10492086, rs10444509, rs11616562, rs7325257, rs11636552, rs1532926, rs4456502, rs4148358, rs891754, rs8088001, rs547668, rs1144093, rs637644, rs6049129, rs742759, or rs171415. The assessment can also include detecting any one or more of the following single nucleotide polymorphisms on chromosome 5p14.1: rs10065041, rs7704909, rs1896731, rs10038113, rs6894838, rs12518194, rs4307059, or rs4327572, and/or detecting a mutation in a cadherin gene.
The region of association with autism or autism spectrum of disorders comprises detecting at least one mutation in a cadherin gene and/or protocadherin gene and/or a mutation 5' to cadherin gene 10 (CDHlO) and 3' to cadherin gene 9 (CDH9).
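To make the "at least one SNP on 5p14.1" criterion concrete, the sketch below checks a set of variant identifiers against the eight 5p14.1 SNPs named in [0063]. It assumes that a hypothetical upstream genotyping step (not shown) has already reduced the subject's data to the rs identifiers at which a variant allele was detected; the text does not specify which allele counts as the risk allele, so that decision is left to the upstream step.

```python
# The eight 5p14.1 SNP identifiers listed in [0063].
PANEL_5P14_1 = frozenset({
    "rs10065041", "rs7704909", "rs1896731", "rs10038113",
    "rs6894838", "rs12518194", "rs4307059", "rs4327572",
})


def panel_hits(detected_ids: set[str]) -> list[str]:
    """Return, in sorted order, the panel SNPs present in detected_ids.

    detected_ids is assumed (hypothetically) to hold the rs IDs at which
    a variant allele was detected for the subject.
    """
    return sorted(PANEL_5P14_1 & set(detected_ids))


def meets_5p14_1_criterion(detected_ids: set[str]) -> bool:
    """True if at least one 5p14.1 panel SNP was detected."""
    return len(panel_hits(detected_ids)) > 0


subject = {"rs4307059", "rs7704909", "rs201171"}
print(panel_hits(subject))              # ['rs4307059', 'rs7704909']
print(meets_5p14_1_criterion(subject))  # True
```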
[0064] The present invention also provides a method of identifying an effective treatment regimen for a subject with an autistic disorder, comprising detecting one or more biomarkers described in embodiments of the invention and correlated with an effective treatment regimen for an autistic disorder.
[0065] In addition, the present invention provides a method of identifying an effective treatment regimen for a subject with an autistic disorder, comprising: a) correlating the presence of one or more biomarkers in a test subject with an autistic disorder for whom an effective treatment regimen has been identified; and b) detecting the one or more markers of step (a) in the subject, thereby identifying an effective treatment regimen for the subject.
[0066] Patients who respond well to particular treatment protocols can be analyzed for specific biomarkers and a correlation can be established according to the methods provided herein. Alternatively, patients who respond poorly to a particular treatment regimen can also be analyzed for particular biomarkers correlated with the poor response. Then, a subject who is a candidate for treatment for an autistic disorder can be assessed for the presence of the appropriate biomarkers and the most appropriate treatment regimen can be provided.
[0067] In some embodiments, the methods of correlating biomarkers with treatment regimens can be carried out using a computer database. Thus the present invention provides a computer-assisted method of identifying a proposed treatment for autistic disorder. The method involves the steps of (a) storing a database of biological data for a plurality of patients, the biological data that is being stored including for each of said plurality of patients (i) a treatment type, (ii) at least one biomarker associated with autistic disorder and (iii) at least one disease progression measure for autistic disorder from which treatment efficacy can be determined; and then (b) querying the database to determine the dependence on said biomarker of the effectiveness of a treatment type in treating autistic disorder, to thereby identify a proposed treatment as an effective treatment for a subject carrying a biomarker correlated with autistic disorder.
[0068] In one embodiment, treatment information for a patient is entered into the database (through any suitable means such as a window or text interface), biomarker information for that patient is entered into the database, and disease progression information is entered into the database. These steps are then repeated until the desired number of patients has been entered into the database. The database can then be queried to determine whether a particular treatment is effective for patients carrying a particular marker, not effective for patients carrying a particular marker, etc. Such querying can be carried out prospectively or retrospectively on the database by any suitable means, but is generally done by statistical analysis in accordance with known techniques, as described herein.
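The database steps in [0067] and [0068] can be sketched with Python's built-in sqlite3 module. This is a minimal illustration, assuming a single binary biomarker per patient and a single numeric progression score (a hypothetical "improvement" value, where higher means better); the table and column names are invented, and a real analysis would follow the grouped summary with a proper statistical test.

```python
import sqlite3


def build_database(records):
    """Create an in-memory table of (treatment, carries_marker, improvement)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """CREATE TABLE patients (
               patient_id     INTEGER PRIMARY KEY,
               treatment      TEXT    NOT NULL,
               carries_marker INTEGER NOT NULL,  -- 1 = biomarker present
               improvement    REAL    NOT NULL   -- hypothetical progression measure
           )"""
    )
    conn.executemany(
        "INSERT INTO patients (treatment, carries_marker, improvement) "
        "VALUES (?, ?, ?)",
        records,
    )
    conn.commit()
    return conn


def efficacy_by_marker(conn, treatment):
    """Mean improvement and patient count for one treatment, split by marker status."""
    rows = conn.execute(
        """SELECT carries_marker, AVG(improvement), COUNT(*)
           FROM patients
           WHERE treatment = ?
           GROUP BY carries_marker
           ORDER BY carries_marker""",
        (treatment,),
    ).fetchall()
    return {bool(marker): (mean, n) for marker, mean, n in rows}


# Invented example records: (treatment, carries_marker, improvement).
conn = build_database([
    ("A", 1, 5.0), ("A", 1, 7.0), ("A", 0, 1.0), ("A", 0, 3.0),
    ("B", 1, 2.0), ("B", 0, 2.0),
])
print(efficacy_by_marker(conn, "A"))  # {False: (2.0, 2), True: (6.0, 2)}
```

In this toy data, treatment "A" appears more effective in marker carriers, while "B" shows no marker dependence; the parameterized query (`?`) keeps user input out of the SQL text.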
[0069] In one embodiment, an agent which can be used to treat a patient comprises an antisense oligonucleotide which modulates the expression and/or function of a gene comprising an SNP which has been associated with autism. The oligonucleotides can be used to modulate the expression of the normal sequence or decrease the expression of the variant sequence(s). Another example would be to express a normal gene variant in a patient.
[0070] In some embodiments, homology, sequence identity or complementarity between the oligonucleotide and target is from about 50% to about 60%. In some embodiments, homology, sequence identity or complementarity is from about 60% to about 70%. In some embodiments, homology, sequence identity or complementarity is from about 70% to about 80%. In some embodiments, homology, sequence identity or complementarity is from about 80% to about 90%. In some embodiments, homology, sequence identity or complementarity is about 90%, about 92%, about 94%, about 95%, about 96%, about 97%, about 98%, about 99% or about 100%.
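As an illustration of how a percent complementarity figure like those in [0070] might be computed, the sketch below compares an antisense oligonucleotide with a target segment of the same length, both written 5' to 3', and counts Watson-Crick pairs across the antiparallel, ungapped alignment. The text does not define the alignment or scoring method, so this position-by-position measure is an assumption; homology and sequence identity would instead compare sequences directly rather than through base pairing.

```python
# DNA Watson-Crick pairs; U in RNA input is treated as T.
PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}


def percent_complementarity(oligo: str, target: str) -> float:
    """Percent of positions at which oligo (5'->3') pairs with target (5'->3')
    in an antiparallel, ungapped alignment of equal-length sequences."""
    if len(oligo) != len(target) or not oligo:
        raise ValueError("oligo and target must be non-empty and equal in length")
    oligo = oligo.upper().replace("U", "T")
    target = target.upper().replace("U", "T")
    # Antiparallel: the oligo's 5' end pairs with the target's 3' end.
    pairs = sum(PAIR.get(o) == t for o, t in zip(oligo, reversed(target)))
    return 100.0 * pairs / len(oligo)


print(percent_complementarity("GCAT", "AUGC"))  # 100.0
print(percent_complementarity("GCAA", "ATGC"))  # 75.0
```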
[0071] In another preferred embodiment, an oligonucleotide comprises combinations of phosphorothioate internucleotide linkages and at least one internucleotide linkage selected from the group consisting of: alkylphosphonate, phosphorodithioate, alkylphosphonothioate, phosphoramidate, carbamate, carbonate, phosphate triester, acetamidate, carboxymethyl ester, and/or combinations thereof.
[0072] In another preferred embodiment, an oligonucleotide optionally comprises at least one modified nucleobase comprising peptide nucleic acids, locked nucleic acid (LNA) molecules, analogues, derivatives and/or combinations thereof.
[0073] An oligonucleotide is specifically hybridizable when binding of the compound to the target nucleic acid interferes with the normal function of the target nucleic acid to cause a loss of activity, and there is a sufficient degree of complementarity to avoid non-specific binding of the oligonucleotide to non-target nucleic acid sequences under conditions in which specific binding is desired. Such conditions include physiological conditions in the case of in vivo assays or therapeutic treatment, and the conditions under which the assays are performed in the case of in vitro assays.
[0074] An oligonucleotide, whether DNA, RNA, chimeric, substituted, etc., is specifically hybridizable when binding of the compound to the target DNA or RNA molecule interferes with the normal function of the target DNA or RNA to cause a loss of utility, and there is a sufficient degree of complementarity to avoid non-specific binding of the oligonucleotide to non-target sequences under conditions in which specific binding is desired, i.e., under physiological conditions in the case of in vivo assays or therapeutic treatment, and in the case of in vitro assays, under conditions in which the assays are performed.
[0075] In embodiments of the present invention, oligomeric compounds, particularly oligonucleotides, bind to target nucleic acid molecules and modulate the expression and/or function of molecules encoded by a target gene. The functions of DNA to be interfered with include, for example, replication and transcription. The functions of RNA to be interfered with include all vital functions such as, for example, translocation of the RNA to the site of protein translation, translation of protein from the RNA, splicing of the RNA to yield one or more mRNA species, and catalytic activity which may be engaged in or facilitated by the RNA. Depending on the desired outcome, these functions may be up-regulated or inhibited.
[0076] The oligonucleotides include antisense oligomeric compounds, antisense oligonucleotides, external guide sequence (EGS) oligonucleotides, alternate splicers, primers, probes, and other oligomeric compounds that hybridize to at least a portion of the target nucleic acid. As such, these compounds may be introduced in the form of single-stranded, double-stranded, partially single-stranded, or circular oligomeric compounds.
[0077] Targeting an oligonucleotide to a particular nucleic acid molecule is a process that usually includes determining at least one target region, segment, or site within the target nucleic acid for the antisense interaction to occur such that the desired effect, e.g., modulation of expression, will result. Within the context of the present invention, the term "region" is defined as a portion of the target nucleic acid having at least one identifiable structure, function, or characteristic. Within regions of target nucleic acids are segments. "Segments" are defined as smaller or sub-portions of regions within a target nucleic acid. "Sites," as used in the present invention, are defined as positions within a target nucleic acid.
[0078] Since, as is known in the art, the translation initiation codon is typically 5'-AUG (in transcribed mRNA molecules; 5'-ATG in the corresponding DNA molecule), the translation initiation codon is also referred to as the "AUG codon," the "start codon" or the "AUG start codon." A minority of genes have a translation initiation codon having the RNA sequence 5'-GUG, 5'-UUG or 5'-CUG, and 5'-AUA, 5'-ACG and 5'-CUG have been shown to function in vivo. Thus, the terms "translation initiation codon" and "start codon" can encompass many codon sequences, even though the initiator amino acid in each instance is typically methionine (in eukaryotes) or formylmethionine (in prokaryotes). Eukaryotic and prokaryotic genes may have two or more alternative start codons, any one of which may be preferentially utilized for translation initiation in a particular cell type or tissue, or under a particular set of conditions. In the context of the invention, "start codon" and "translation initiation codon" refer to the codon or codons that are used in vivo to initiate translation of an mRNA transcribed from a gene, regardless of the sequence(s) of such codons. A translation termination codon (or "stop codon") of a gene may have one of three sequences: 5'-UAA, 5'-UAG or 5'-UGA (the corresponding DNA sequences are 5'-TAA, 5'-TAG and 5'-TGA, respectively).
[0079] The terms "start codon region" and "translation initiation codon region" refer to a portion of such an mRNA or gene that encompasses from about 25 to about 50 contiguous nucleotides in either direction (i.e., 5' or 3') from a translation initiation codon. Similarly, the terms "stop codon region" and "translation termination codon region" refer to a portion of such an mRNA or gene that encompasses from about 25 to about 50 contiguous nucleotides in either direction (i.e., 5' or 3') from a translation termination codon.
Consequently, the "start codon region" (or "translation initiation codon region") and the "stop codon region" (or "translation termination codon region") are all regions that may be targeted effectively with the antisense compounds of the present invention.
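The start codon region and stop codon region defined in [0079] can be located programmatically. The sketch below reads "about 25 to about 50 contiguous nucleotides in either direction" as a window extending a chosen number of nucleotides (the hypothetical parameter `flank`) 5' and 3' of the codon, clipped to the transcript ends. It also assumes the first AUG is the start codon, which ignores the non-AUG and alternative start codons mentioned in [0078].

```python
STOP_CODONS = ("UAA", "UAG", "UGA")


def codon_region(mrna: str, codon_pos: int, flank: int = 25) -> tuple[int, int]:
    """Half-open (start, end) window covering the codon at codon_pos plus
    `flank` nucleotides on each side, clipped to the transcript ends."""
    if not 25 <= flank <= 50:
        raise ValueError("flank is expected to be about 25-50 nt")
    if not 0 <= codon_pos <= len(mrna) - 3:
        raise ValueError("codon_pos out of range")
    return max(0, codon_pos - flank), min(len(mrna), codon_pos + 3 + flank)


def first_in_frame_stop(mrna: str, start_pos: int) -> int:
    """Position of the first stop codon in frame with start_pos, or -1."""
    for i in range(start_pos + 3, len(mrna) - 2, 3):
        if mrna[i:i + 3] in STOP_CODONS:
            return i
    return -1


mrna = "G" * 30 + "AUG" + "GCC" * 20 + "UAA" + "A" * 30
start = mrna.find("AUG")                   # 30 (assumes the first AUG is used)
stop = first_in_frame_stop(mrna, start)    # 93
print(codon_region(mrna, start))           # (5, 58)
print(codon_region(mrna, stop, flank=50))  # (43, 126)
```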
[0080] The open reading frame (ORF) or "coding region," which is known in the art to refer to the region between the translation initiation codon and the translation termination codon, is also a region which may be targeted effectively. Within the context of the present invention, a targeted region is the intragenic region encompassing the translation initiation or termination codon of the open reading frame (ORF) of a gene.
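To illustrate the coding region definition in [0080], the following sketch scans the three forward reading frames of an mRNA and returns the longest stretch running from an AUG to the next in-frame stop codon. It is a simplification under stated assumptions: only AUG starts are recognized, only the given strand is scanned, and the stop codon is included in the returned half-open interval.

```python
STOP_CODONS = {"UAA", "UAG", "UGA"}


def longest_orf(mrna: str):
    """Return (start, end) of the longest AUG...stop ORF on the given strand,
    as 0-based half-open coordinates including the stop codon, or None."""
    mrna = mrna.upper().replace("T", "U")  # accept DNA or RNA input
    best = None
    for frame in range(3):
        start = None
        for i in range(frame, len(mrna) - 2, 3):
            codon = mrna[i:i + 3]
            if start is None and codon == "AUG":
                start = i  # open an ORF at the first AUG in this frame
            elif start is not None and codon in STOP_CODONS:
                if best is None or (i + 3 - start) > (best[1] - best[0]):
                    best = (start, i + 3)
                start = None  # close the ORF and look for the next AUG
    return best


seq = "GG" + "AUG" + "GCU" * 4 + "UGA" + "CC"
print(longest_orf(seq))  # (2, 20)
```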
[0081] Another target region includes the 5' untranslated region (5'UTR), known in the art to refer to the portion of an mRNA in the 5' direction from the translation initiation codon, and thus including nucleotides between the 5' cap site and the translation initiation codon of an mRNA (or corresponding nucleotides on the gene). Still another target region includes the 3' untranslated region (3'UTR), known in the art to refer to the portion of an mRNA in the 3' direction from the translation termination codon, and thus including nucleotides between the translation termination codon and 3' end of an mRNA (or corresponding nucleotides on the gene). The 5' cap site of an mRNA comprises an N7-methylated guanosine residue joined to the 5'-most residue of the mRNA via a 5'-5' triphosphate linkage. The 5' cap region of an mRNA is considered to include the 5' cap structure itself as well as the first 50 nucleotides adjacent to the cap site. Another target region for this invention is the 5' cap region.
[0082] Although some eukaryotic mRNA transcripts are directly translated, many contain one or more regions, known as "introns," which are excised from a transcript before it is translated. The remaining (and therefore translated) regions are known as "exons" and are spliced together to form a continuous mRNA sequence. In one embodiment, targeting splice sites, i.e., intron-exon junctions or exon-intron junctions, is particularly useful in situations where aberrant splicing is implicated in disease, or where an overproduction of a particular splice product is implicated in disease. An aberrant fusion junction due to rearrangement or deletion is another embodiment of a target site. mRNA transcripts produced via the process of splicing of two (or more) mRNAs from different gene sources are known as "fusion transcripts." Introns can be effectively targeted using antisense compounds targeted to, for example, DNA or pre-mRNA.
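The target regions defined in [0081] can be expressed as coordinate ranges once the coding sequence (CDS) boundaries of a transcript are known. The sketch below assumes 0-based, half-open coordinates on the transcript and a CDS running from the first base of the start codon through the last base of the stop codon. The cap structure itself is not a templated nucleotide, so it is represented only implicitly and the 5' cap region is reported as the first 50 transcript nucleotides.

```python
def transcript_regions(length: int, cds_start: int, cds_end: int,
                       cap_window: int = 50) -> dict[str, tuple[int, int]]:
    """Half-open (start, end) coordinates of the candidate target regions."""
    if not 0 <= cds_start < cds_end <= length:
        raise ValueError("require 0 <= cds_start < cds_end <= length")
    return {
        "5'UTR": (0, cds_start),         # cap site up to the start codon
        "CDS": (cds_start, cds_end),     # start codon through stop codon
        "3'UTR": (cds_end, length),      # after the stop codon to the 3' end
        "5' cap region": (0, min(cap_window, length)),  # first 50 nt after the cap
    }


for name, (start, end) in transcript_regions(200, 40, 160).items():
    print(f"{name}: {start}-{end}")
```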
[0083] In another preferred embodiment, the antisense oligonucleotides bind to coding and/or non-coding regions of a target polynucleotide and modulate the expression and/or function of the target molecule.
[0084] Alternative RNA transcripts can be produced from the same genomic region of DNA. These alternative transcripts or "pre-mRNA variants" are transcripts produced from the same genomic DNA that differ from other transcripts produced from the same genomic DNA in either their start or stop position and contain both intronic and exonic sequence.
[0085] Upon excision of one or more exon or intron regions, or portions thereof, during splicing, pre-mRNA variants produce smaller "mRNA variants." Consequently, mRNA variants are processed pre-mRNA variants, and each unique pre-mRNA variant must always produce a unique mRNA variant as a result of splicing. These mRNA variants are also known as "alternative splice variants." If no splicing of the pre-mRNA variant occurs, then the pre-mRNA variant is identical to the mRNA variant.
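The relationship in [0084] and [0085] between a pre-mRNA variant and the mRNA variant produced from it can be modeled as concatenation of exon intervals. In this sketch the exon coordinates are hypothetical caller-supplied inputs, not predicted splice sites: choosing a different exon set models an alternative splice variant, a single exon spanning the whole transcript models the unspliced case, and the junction positions correspond to the splice sites discussed in [0082].

```python
def splice(pre_mrna: str, exons: list[tuple[int, int]]) -> str:
    """Join exon intervals (0-based, half-open) of pre_mrna in 5'->3' order."""
    ordered = sorted(exons)
    for start, end in ordered:
        if not 0 <= start < end <= len(pre_mrna):
            raise ValueError(f"exon ({start}, {end}) out of range")
    for (_, prev_end), (next_start, _) in zip(ordered, ordered[1:]):
        if next_start < prev_end:
            raise ValueError("exons overlap")
    return "".join(pre_mrna[start:end] for start, end in ordered)


def exon_junctions(exons: list[tuple[int, int]]) -> list[int]:
    """Positions in the spliced mRNA where one exon is joined to the next."""
    positions, total = [], 0
    for start, end in sorted(exons)[:-1]:
        total += end - start
        positions.append(total)
    return positions


pre = "AAAA" + "GUCCAG" + "UUUU"        # exon 1, intron, exon 2
print(splice(pre, [(0, 4), (10, 14)]))  # AAAAUUUU (intron removed)
print(splice(pre, [(0, 4)]))            # AAAA (alternative variant)
print(exon_junctions([(0, 4), (10, 14)]))  # [4]
```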
[0086] Variants can be produced through the use of alternative signals to start or stop transcription. Pre-mRNAs and mRNAs can possess more than one start codon or stop codon. Variants that originate from a pre-mRNA or mRNA that use alternative start codons are known as "alternative start variants" of that pre-mRNA or mRNA. Those transcripts that use an alternative stop codon are known as "alternative stop variants" of that pre-mRNA or mRNA. One specific type of alternative stop variant is the "polyA variant," in which the multiple transcripts produced result from the alternative selection of one of the "polyA stop signals" by the transcription machinery, thereby producing transcripts that terminate at unique polyA sites. Within the context of the invention, the types of variants described herein are also embodiments of target nucleic acids.
[0087] The locations on the target nucleic acid to which the antisense compounds hybridize are defined as at least a 5-nucleobase portion of a target region to which an active antisense compound is targeted.
[0088] While the specific sequences of certain exemplary target segments are set forth herein, one of skill in the art will recognize that these serve to illustrate and describe particular embodiments within the scope of the present invention. Additional target segments are readily identifiable by one having ordinary skill in the art in view of this disclosure.
[0089] Once one or more target regions, segments or sites have been identified, antisense compounds are chosen which are sufficiently complementary to the target, i.e., hybridize sufficiently well and with sufficient specificity, to give the desired effect.
[0090] In embodiments of the invention the oligonucleotides bind to an antisense strand of a particular target. The oligonucleotides are at least 5 nucleotides in length and can be synthesized so that each oligonucleotide targets an overlapping sequence, such that together the oligonucleotides cover the entire length of the target polynucleotide. The targets also include coding as well as non-coding regions.
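The tiling strategy in [0090] can be sketched as follows, assuming the target is supplied as a sense-strand DNA sequence written 5' to 3' and that each oligonucleotide is the exact reverse complement of its window. The oligonucleotide length and the step between windows are hypothetical design parameters; only the 5-nucleotide minimum comes from the text, and a step smaller than the length makes adjacent tiles overlap.

```python
PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence (U is treated as T)."""
    return "".join(PAIR[base] for base in reversed(seq.upper().replace("U", "T")))


def tile_antisense(target: str, length: int = 20,
                   step: int = 10) -> list[tuple[int, str]]:
    """Design overlapping antisense oligos that together cover the whole target.

    Returns (start, oligo) pairs, where start is the 0-based position of the
    covered target window and oligo is its reverse complement (5'->3').
    """
    if length < 5:
        raise ValueError("oligonucleotides must be at least 5 nt long")
    if not 0 < step < length:
        raise ValueError("step must satisfy 0 < step < length so tiles overlap")
    if len(target) < length:
        raise ValueError("target is shorter than one oligonucleotide")
    starts = list(range(0, len(target) - length + 1, step))
    if starts[-1] != len(target) - length:
        starts.append(len(target) - length)  # extra tile so the 3' end is covered
    return [(s, reverse_complement(target[s:s + length])) for s in starts]


for start, oligo in tile_antisense("ATGCGTACGTTAGC", length=6, step=4):
    print(start, oligo)
# 0 ACGCAT
# 4 ACGTAC
# 8 GCTAAC
```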
[0091] According to the present invention, antisense compounds include antisense oligonucleotides, ribozymes, external guide sequence (EGS) oligonucleotides, siRNA compounds, single- or double-stranded RNA interference (RNAi) compounds such as siRNA compounds, and other oligomeric compounds which hybridize to at least a portion of the target nucleic acid and modulate its function. As such, they may be DNA, RNA, DNA-like, RNA-like, or mixtures thereof, or may be mimetics of one or more of these. These compounds may be single-stranded, double-stranded, circular or hairpin oligomeric compounds and may contain structural elements such as internal or terminal bulges, mismatches or loops. Antisense compounds are routinely prepared linearly but can be joined or otherwise prepared to be circular and/or branched. Antisense compounds can include constructs such as, for example, two strands hybridized to form a wholly or partially double-stranded compound or a single strand with sufficient self-complementarity to allow for hybridization and formation of a fully or partially double-stranded compound. The two strands can be linked internally leaving free 3' or 5' termini or can be linked to form a continuous hairpin structure or loop. The hairpin structure may contain an overhang on either the 5' or 3' terminus producing an extension of single stranded character. The double stranded compounds optionally can include overhangs on the ends. Further modifications can include conjugate groups attached to one of the termini, selected nucleobase positions, sugar positions or to one of the internucleoside linkages. Alternatively, the two strands can be linked via a non-nucleic acid moiety or linker group. When formed from only one strand, dsRNA can take the form of a self-complementary hairpin-type molecule that doubles back on itself to form a duplex. Thus, the dsRNAs can be fully or partially double stranded. 
Specific modulation of gene expression can be achieved by stable expression of dsRNA hairpins in transgenic cell lines; however, in some embodiments, the gene expression or function is upregulated. When formed from two strands, or from a single strand that takes the form of a self-complementary hairpin-type molecule doubled back on itself to form a duplex, the two strands (or duplex-forming regions of a single strand) are complementary RNA strands that base pair in Watson-Crick fashion.
[0092] Once introduced to a system, the compounds of the invention may elicit the action of one or more enzymes or structural proteins to effect cleavage or other modification of the target nucleic acid or may work via occupancy-based mechanisms. In general, nucleic acids (including oligonucleotides) may be described as "DNA-like" (i.e., generally having one or more 2'-deoxy sugars and, generally, T rather than U bases) or "RNA-like" (i.e., generally having one or more 2'-hydroxyl or 2'-modified sugars and, generally U rather than T bases). Nucleic acid helices can adopt more than one type of structure, most commonly the A- and B-forms. It is believed that, in general, oligonucleotides which have B-form-like structure are "DNA-like" and those which have A-form-like structure are "RNA-like." In some (chimeric) embodiments, an antisense compound may contain both A- and B-form regions.
[0093] In another preferred embodiment, the desired oligonucleotides or antisense compounds comprise at least one of: antisense RNA, antisense DNA, chimeric antisense oligonucleotides, antisense oligonucleotides comprising modified linkages, interference RNA (RNAi), short interfering RNA (siRNA), micro interfering RNA (miRNA), small temporal RNA (stRNA), short hairpin RNA (shRNA), small RNA-induced gene activation (RNAa), small activating RNAs (saRNAs), or combinations thereof.
[0094] Certain preferred oligonucleotides of this invention are chimeric oligonucleotides. "Chimeric oligonucleotides" or "chimeras," in the context of this invention, are oligonucleotides which contain two or more chemically distinct regions, each made up of at least one nucleotide. These oligonucleotides typically contain at least one region of modified nucleotides that confers one or more beneficial properties (such as, for example, increased nuclease resistance, increased uptake into cells, or increased binding affinity for the target) and a region that is a substrate for enzymes capable of cleaving RNA:DNA or RNA:RNA hybrids.
Antibodies and Aptamers
[001] In a preferred embodiment, antibodies and aptamers specifically bind to the biomarkers and components thereof. The components include the nucleic acid sequences, complementary sequences, fragments, alleles, variants and gene products thereof of each component in each biomarker.
[002] Aptamer polynucleotides are typically single-stranded standard phosphodiester DNA (ssDNA). Close DNA analogs can also be incorporated into the aptamer as described below.
[003] A typical aptamer discovery procedure is described below:
[004] A polynucleotide comprising a randomized sequence between "arms" having constant sequence is synthesized. The arms can include restriction sites for convenient cloning and can also function as priming sites for PCR primers. The synthesis can easily be performed on commercial instruments.
[005] The target protein is treated with the randomized polynucleotide. The target protein can be in solution and then the complexes immobilized and separated from unbound nucleic acids by use of an antibody affinity column. Alternatively, the target protein might be immobilized before treatment with the randomized polynucleotide.
[006] The target protein-polynucleotide complexes are separated from the uncomplexed material, and then the bound polynucleotides are separated from the target protein. The bound nucleic acid can then be characterized, but is more commonly amplified, e.g., by PCR, and the binding, separation and amplification steps are repeated. In many instances, using conditions that increasingly promote separation of the nucleic acid from the target protein in subsequent iterations, e.g., a higher salt concentration in the binding buffer of the binding step described in [005], results in identification of polynucleotides having increasingly high affinity for the target protein.
[007] The nucleic acids showing high affinity for the target proteins are isolated and characterized. This is typically accomplished by cloning the nucleic acids using restriction sites incorporated into the arms, and then sequencing the cloned nucleic acid.
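The iterative logic of steps [005] to [007] can be illustrated with a deliberately simplified, deterministic model. Everything numeric here is an assumption: each sequence is given a hypothetical dissociation constant, binding is treated as simple 1:1 equilibrium with target in excess, increasing stringency (e.g., higher salt, as in [006]) is modeled as a multiplier on every KD, and amplification is modeled as restoring the pool to its original size. It is meant only to show why repeated rounds enrich high-affinity sequences, not to predict real selection outcomes.

```python
def selex_rounds(kd_nm: dict[str, float], target_nm: float,
                 stringency: list[float]) -> list[dict[str, float]]:
    """Return the pool composition (sequence -> fraction) after each round."""
    pool = {seq: 1.0 / len(kd_nm) for seq in kd_nm}  # equimolar starting library
    history = []
    for factor in stringency:
        # Binding and separation: keep the bound fraction of each sequence.
        bound = {seq: frac * target_nm / (target_nm + kd_nm[seq] * factor)
                 for seq, frac in pool.items()}
        # Amplification (e.g., PCR): rescale so the pool sums to 1 again.
        total = sum(bound.values())
        pool = {seq: amount / total for seq, amount in bound.items()}
        history.append(pool)
    return history


rounds = selex_rounds({"weak": 100.0, "medium": 10.0, "strong": 1.0},
                      target_nm=10.0, stringency=[1.0, 2.0, 4.0])
for i, pool in enumerate(rounds, start=1):
    print(i, {seq: round(frac, 3) for seq, frac in pool.items()})
```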
[008] The affinity of aptamers for their target proteins is typically in the nanomolar range, but KD values can be as low as the picomolar range. That is, the KD is typically 1 pM to 500 nM, more typically from 1 pM to 100 nM. Aptamers having a KD in the range of 1 pM to 10 nM are also useful.
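To relate the KD ranges in [008] to binding in practice, the sketch below uses the standard single-site equilibrium relation, fraction bound = [T] / ([T] + KD), which assumes 1:1 binding and that the free target concentration is approximately the total (target in excess over aptamer). The range check uses the 1 pM to 500 nM bounds given in the text; the constant names are illustrative.

```python
PICOMOLAR = 1e-12
NANOMOLAR = 1e-9


def fraction_bound(target_molar: float, kd_molar: float) -> float:
    """Equilibrium fraction of aptamer bound for simple 1:1 binding,
    assuming free target concentration ~ total target concentration."""
    if target_molar < 0 or kd_molar <= 0:
        raise ValueError("concentrations must be non-negative and KD positive")
    return target_molar / (target_molar + kd_molar)


def kd_in_typical_range(kd_molar: float,
                        low: float = 1 * PICOMOLAR,
                        high: float = 500 * NANOMOLAR) -> bool:
    """True if KD lies within the 1 pM to 500 nM range described for aptamers."""
    return low <= kd_molar <= high


print(fraction_bound(10 * NANOMOLAR, 10 * NANOMOLAR))  # 0.5
print(kd_in_typical_range(2 * NANOMOLAR))             # True
```

At a target concentration equal to KD, half of the aptamer is bound; at ten times KD, about 91% is bound.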
[009] Aptamer polynucleotides can be synthesized on a commercially available nucleic acid synthesizer by methods known in the art. The product can be purified by size selection or chromatographic methods.
[0010] Aptamer polynucleotides are typically from about 10 to 200 nucleotides long, more typically from about 10 to 100 nucleotides long, still more typically from about 10 to 50 nucleotides long and yet more typically from about 10 to 25 nucleotides long. A preferred range of length is from about 10 to 50 nucleotides.
[0011] The aptamer sequences can be chosen as a desired sequence, or random or partially random populations of sequences can be made and then selected for specific binding to a desired target protein by assay in vitro. Any of the typical nucleic acid-protein binding assays known in the art can be used, e.g. "Southwestern" blotting using either labeled oligonucleotide or labeled protein as the probe. See also U.S. Pat. No. 5,445,935 for a fluorescence polarization assay of protein-nucleic acid interaction.
[0012] Appropriate nucleotides for aptamer synthesis and their use, and reagents for covalent linkage of proteins to nucleic acids and their use, are considered known in the art.
[0013] A desired aptamer-protein complex, for example, an aptamer-thrombin complex of the invention, can be labeled and used as a diagnostic agent in vitro in much the same manner as any specific protein-binding agent, e.g., a monoclonal antibody. Thus, an aptamer-protein complex of the invention can be used to detect and quantitate the amount of its target protein in a sample, e.g., a blood sample, to provide diagnosis of a disease state correlated with the amount of the protein in the sample.
[0014] A desired aptamer-target/bait molecular complex can also be used for diagnostic imaging. In imaging uses, the complexes are labeled so that they can be detected outside the body. Typical labels are radioisotopes, usually ones with short half-lives. The usual imaging radioisotopes (see Figures imgf000031_0001 and imgf000031_0002) can be used. Nuclear magnetic resonance (NMR) imaging enhancers, such as gadolinium-153, can also be used to label the complex for detection by NMR. Methods and reagents for performing the labeling, either in the polynucleotide or in the protein moiety, are considered known in the art.
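The preference in [0014] for short-lived radioisotopes follows from exponential decay: activity falls by half every half-life, so the label clears quickly after imaging. The sketch below applies A(t) = A0 * 2^(-t / T_half); the half-life is a caller-supplied parameter, and the 6-hour value in the usage lines is a hypothetical example, not a value for any particular isotope named in the figures.

```python
import math


def remaining_activity(initial: float, half_life: float, elapsed: float) -> float:
    """Activity left after `elapsed` time: A(t) = A0 * 2 ** (-t / T_half).
    half_life and elapsed must use the same time unit."""
    if half_life <= 0:
        raise ValueError("half_life must be positive")
    return initial * 2 ** (-elapsed / half_life)


def time_to_fraction(fraction: float, half_life: float) -> float:
    """Time for activity to decay to `fraction` of its initial value."""
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    return -half_life * math.log2(fraction)


# Hypothetical label with a 6-hour half-life.
print(remaining_activity(100.0, half_life=6.0, elapsed=12.0))  # 25.0
print(time_to_fraction(0.01, half_life=6.0))  # about 39.9 hours
```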
[0015] In a preferred embodiment, an antibody or aptamer is specific for each biomarker or genetic marker comprising: rs201171, rs4394668, rs4618985, rs1831870, rs155288, rs5008948, rs12024204, rs11162822, rs6424674, rs10493644, rs17425287, rs7523086, rs16833075, rs492780, rs1467068, rs6732653, rs1866206, rs4852531, rs11679682, rs11689493, rs4129081, rs6707773, rs13019278, rs12622496, rs3731723, rs722555, rs6436915, rs2279977, rs9822786, rs12491012, rs1811763, rs1896731, rs10038113, rs11739167, rs7447989, rs6873221, rs12187724, rs1423435, rs12153325, rs10461556, rs350436, rs830907, rs315717, rs3804254, rs2317222, rs1936022, rs6907646, rs12529724, rs1504279, rs1504281, rs2799644, rs320813, rs1529001, rs11765584, rs1529813, rs2528795, rs1000058, rs1149558, rs1130496, rs12155975, rs16930253, rs2082804, rs4739071, rs6991229, rs1865641, rs1865638, rs11001685, rs881631, rs12245799, rs12249859, rs10748804, rs7910491, rs10840070, rs11021927, rs9919560, rs12366827, rs1894827, rs11055784, rs11055786, rs11117003, rs10492086, rs10444509, rs11616562, rs7325257, rs11636552, rs1532926, rs4456502, rs4148358, rs891754, rs8088001, rs547668, rs1144093, rs637644, rs6049129, rs742759, rs171415, rs10065041, rs7704909, rs1896731, rs10038113, rs6894838, rs12518194, rs4307059, rs4327572, and/or a mutation in a cadherin gene. The region of association with autism or autism spectrum disorders also comprises at least one mutation in a cadherin gene and/or protocadherin gene and/or a mutation 5' to cadherin gene 10 (CDH10) and 3' to cadherin gene 9 (CDH9). The biomarkers also encompass any complementary sequences, fragments, alleles, variants and gene products thereof.
Drug Discovery
[0095] In other preferred embodiments, the biomarkers are useful for the identification of new drugs for the treatment of autism and autism spectrum disorders.
[0096] Small Molecules: Small molecule test compounds or candidate therapeutic compounds can initially be members of an organic or inorganic chemical library. As used herein, "small molecules" refers to small organic or inorganic molecules of molecular weight below about 3,000 Daltons. The small molecules can be natural products or members of a combinatorial chemistry library. A set of diverse molecules should be used to cover a variety of properties such as charge, aromaticity, hydrogen bonding, flexibility, size, length of side chain, hydrophobicity, and rigidity. Combinatorial techniques suitable for synthesizing small molecules are known in the art, e.g., as exemplified by Obrecht and Villalgordo, Solid-Supported Combinatorial and Parallel Synthesis of Small-Molecular-Weight Compound Libraries, Pergamon-Elsevier Science Limited (1998), and include techniques such as "split and pool" or "parallel" synthesis, solid-phase and solution-phase techniques, and encoding techniques (see, for example, Czarnik, Curr. Opin. Chem. Bio., 1:60 (1997)). In addition, a number of small molecule libraries are commercially available.
[0097] Particular screening applications of this invention relate to the testing of pharmaceutical compounds in drug research. The reader is referred generally to the standard textbook "In vitro Methods in Pharmaceutical Research," Academic Press, 1997, and U.S. Pat. No. 5,030,015.
Assessment of the activity of candidate pharmaceutical compounds generally involves administering a candidate compound, determining any change in the morphology, marker phenotype and expression, metabolic activity, or function of the cells that is attributable to the compound (compared with untreated cells or cells treated with an inert compound), and then correlating the effect of the compound with the observed change.
[0098] The screening may be done, for example, either because the compound is designed to have a pharmacological effect on certain cell types, or because a compound designed to have effects elsewhere may have unintended side effects. Two or more drugs can be tested in combination (by combining them with the cells either simultaneously or sequentially) to detect possible drug-drug interaction effects. In some applications, compounds are screened initially for potential toxicity (Castell et al., pp. 375-410 in "In vitro Methods in Pharmaceutical Research," Academic Press, 1997). Cytotoxicity can be determined in the first instance by the effect on cell viability, survival, morphology, and expression or release of certain markers, receptors or enzymes. Effects of a drug on chromosomal DNA can be determined by measuring DNA synthesis or repair. [3H]thymidine or BrdU incorporation, especially at unscheduled times in the cell cycle or above the level required for cell replication, is consistent with a drug effect. Unwanted effects can also include unusual rates of sister chromatid exchange, determined by metaphase spread. The reader is referred to A. Vickers (pp. 375-410 in "In vitro Methods in Pharmaceutical Research," Academic Press, 1997) for further elaboration.
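For a first-pass cytotoxicity screen of the kind described in [0098], a common readout is the viability of treated cells expressed as a percentage of untreated (or inert-compound) controls, matching the comparison described in [0097]. This is a minimal sketch assuming a plate-reader style signal that is proportional to viable cell number, with optional subtraction of a cell-free background reading; the 80% cut-off for flagging a compound is a hypothetical choice, not a value from the text.

```python
from statistics import mean


def percent_viability(treated: list[float], control: list[float],
                      background: float = 0.0) -> float:
    """Mean treated signal as a percentage of mean control signal,
    after subtracting a background (cell-free) reading from both."""
    control_signal = mean(control) - background
    if control_signal <= 0:
        raise ValueError("control signal must exceed background")
    return 100.0 * (mean(treated) - background) / control_signal


def flag_cytotoxic(viability_pct: float, cutoff_pct: float = 80.0) -> bool:
    """Hypothetical rule: flag compounds that reduce viability below cutoff_pct."""
    return viability_pct < cutoff_pct


v = percent_viability(treated=[0.42, 0.38, 0.40], control=[1.00, 0.96, 1.04])
print(round(v, 1), flag_cytotoxic(v))  # 40.0 True
```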
[0099] In one embodiment of the invention, a method of identifying a candidate agent is provided, said method comprising: (a) contacting a biological sample from a patient with the candidate agent and determining the level of expression of one or more biomarkers described herein; (b) determining the level of expression of a corresponding biomarker or biomarkers in an aliquot of the biological sample not contacted with the candidate agent; (c) observing the effect of the candidate agent by comparing the level of expression of the biomarker or biomarkers in the aliquot of the biological sample contacted with the candidate agent and the level of expression of the corresponding biomarker or biomarkers in the aliquot of the biological sample not contacted with the candidate agent; and (d) identifying said agent from said observed effect, wherein a difference of at least 1%, 2%, 5%, or 10% between the level of expression of the biomarker gene or combination of biomarker genes in the aliquot of the biological sample contacted with the candidate agent and the level of expression of the corresponding biomarker gene or combination of biomarker genes in the aliquot of the biological sample not contacted with the candidate agent is an indication of an effect of the candidate agent.
[00100] In another embodiment of the invention, a pharmaceutical preparation comprising an agent according to the invention is provided.
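Step (d) of [0099] can be expressed as a simple threshold rule. The sketch assumes the difference is measured relative to the untreated aliquot (the text does not state the denominator) and that expression values are already normalized and comparable between aliquots; the biomarker names below are placeholders.

```python
def percent_difference(treated: float, untreated: float) -> float:
    """Signed change in expression, as a percentage of the untreated level."""
    if untreated == 0:
        raise ValueError("untreated expression level must be non-zero")
    return 100.0 * (treated - untreated) / untreated


def biomarkers_with_effect(treated: dict[str, float], untreated: dict[str, float],
                           threshold_pct: float = 5.0) -> dict[str, float]:
    """Biomarkers whose |percent difference| meets threshold_pct (e.g., 1, 2, 5 or 10)."""
    changes = {name: percent_difference(treated[name], level)
               for name, level in untreated.items()}
    return {name: pct for name, pct in changes.items() if abs(pct) >= threshold_pct}


untreated = {"marker_A": 100.0, "marker_B": 50.0, "marker_C": 20.0}
treated = {"marker_A": 103.0, "marker_B": 44.0, "marker_C": 22.0}
print(biomarkers_with_effect(treated, untreated, threshold_pct=5.0))
# {'marker_B': -12.0, 'marker_C': 10.0}
```

Both increases and decreases count as an effect here, since the text speaks only of a "difference."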
[00101] In another preferred embodiment of the invention, a method of producing a drug is provided, comprising the steps of the method according to the invention and (i) synthesizing the candidate agent identified in step (c) above, or an analog or derivative thereof, in an amount sufficient to provide said drug in a therapeutically effective amount to a subject; and/or (ii) combining the candidate agent identified in step (c) above, or an analog or derivative thereof, with a pharmaceutically acceptable carrier.
[00102] Vectors, Cells: In some embodiments it is desirable to express the biomarker in a vector and in cells. Such vectors and cells have a wide range of applications; for example, vectors and cells expressing the one or more biomarkers can be used in assays, kits, drug discovery, diagnostics, prognostics and the like. The cells can be stem cells isolated from the bone marrow as progenitor cells, or cells obtained from any other source, such as, for example, ATCC.
[00103] "Bone marrow derived progenitor cell" (BMDC) or "bone marrow derived stem cell" refers to a primitive stem cell with the machinery for self-renewal constitutively active. Included in this definition are stem cells that are totipotent, pluripotent and precursors. A "precursor cell" can be any cell in a cell differentiation pathway that is capable of differentiating into a more mature cell. As such, the term "precursor cell population" refers to a group of cells capable of developing into a more mature cell. A precursor cell population can comprise cells that are totipotent, cells that are pluripotent and cells that are stem cell lineage restricted (i.e., cells capable of developing into less than all hematopoietic lineages, or into, for example, only cells of erythroid lineage). As used herein, the term "totipotent cell" refers to a cell capable of developing into all lineages of cells. Similarly, the term "totipotent population of cells" refers to a composition of cells capable of developing into all lineages of cells. Also as used herein, the term "pluripotent cell" refers to a cell capable of developing into a variety (albeit not all) of lineages and at least able to develop into all hematopoietic lineages (e.g., lymphoid, erythroid, and thrombocytic lineages). Bone marrow derived stem cells contain two well-characterized types of stem cells. Mesenchymal stem cells (MSC) normally form chondrocytes and osteoblasts.
Hematopoietic stem cells (HSC) are of mesodermal origin and normally give rise to cells of the blood and immune system (e.g., erythroid, granulocyte/macrophage, megakaryocyte and lymphoid lineages). In addition, hematopoietic stem cells have also been shown to have the potential to differentiate into cells of the liver (including hepatocytes and bile duct cells), lung, kidney (e.g., renal tubular epithelial cells and renal parenchyma), gastrointestinal tract, skeletal muscle fibers, astrocytes of the CNS, Purkinje neurons, cardiac muscle (e.g., cardiomyocytes), endothelium and skin.
[00104] In a preferred embodiment, a method of identifying candidate therapeutic compounds comprises culturing cells expressing at least one biomarker, or complementary sequences, fragments, alleles, variants and gene products thereof, with a candidate therapeutic agent; and identifying, as a candidate therapeutic agent, an agent that modulates the expression of the biomarkers. Preferably, candidate therapeutic agents comprise organic molecules, inorganic molecules, vaccines, antibodies, nucleic acid molecules, proteins, peptides and vectors expressing nucleic acid molecules.
[00105] Such compounds are useful, e.g., as candidate therapeutic compounds for the treatment of autism, the autism spectrum of disorders and associated conditions. The methods include administering the compound to a model of the condition, e.g., contacting a cell (in vitro) model with the compound, or administering the compound to an animal model of the condition. The model is then evaluated for an effect of the candidate compound on the clinical outcome, and a compound that produces such an effect can be considered a candidate therapeutic compound for the treatment of the condition. Such effects can include clinically relevant effects, e.g., decreased pain or increased life span, and can be determined on a macroscopic or microscopic scale. Candidate therapeutic compounds identified by these methods can be further verified, e.g., by administration to human subjects in a clinical trial.
[00106] The biomarkers can be expressed from one or more vectors. A "vector" (sometimes referred to as a gene delivery or gene transfer "vehicle") refers to a macromolecule or complex of molecules comprising a polynucleotide to be delivered to a host cell, either in vitro or in vivo. The polynucleotide to be delivered may comprise a coding sequence of interest in gene therapy. Vectors include, for example, viral vectors (such as adenoviruses ("Ad"), adeno-associated viruses (AAV), and retroviruses), liposomes and other lipid-containing complexes, and other macromolecular complexes capable of mediating delivery of a polynucleotide to a host cell. Vectors can also comprise other components or functionalities that further modulate gene delivery and/or gene expression, or that otherwise provide beneficial properties to the targeted cells. As described and illustrated in more detail below, such other components include, for example, components that influence binding or targeting to cells (including components that mediate cell-type or tissue-specific binding); components that influence uptake of the vector nucleic acid by the cell; components that influence localization of the polynucleotide within the cell after uptake (such as agents mediating nuclear localization); and components that influence expression of the polynucleotide. Such components also might include markers, such as detectable and/or selectable markers that can be used to detect or select for cells that have taken up and are expressing the nucleic acid delivered by the vector. Such components can be provided as a natural feature of the vector (such as the use of certain viral vectors which have components or functionalities mediating binding and uptake), or vectors can be modified to provide such functionalities. Other vectors include those described by Chen et al., BioTechniques, 34: 167-171 (2003). A large variety of such vectors are known in the art and are generally available.
[00107] In accordance with the methods of the present invention, at least one candidate compound as defined elsewhere herein is used to promote a positive therapeutic response in a patient diagnosed with autism or ASD using the biomarkers described. By "positive therapeutic response" is intended an improvement in the disorder, syndrome, symptoms, or translational profile associated with the disorder.
[00108] Treating a patient comprises administration of one or more of the identified agents. In another preferred embodiment, the agents can be administered as part of a treatment regimen with one or more other therapies and medicaments.
[00109] The compounds of this invention may also be administered orally to the patient in a manner such that the concentration of drug is sufficient to inhibit bone resorption or to achieve any other therapeutic indication as disclosed herein. Typically, a pharmaceutical composition containing the compound is administered at an oral dose of between about 0.1 and about 50 mg/kg in a manner consistent with the condition of the patient. Preferably, the oral dose is about 0.5 to about 20 mg/kg.
[00110] An intravenous infusion of the compound in 5% dextrose in water or normal saline, or a similar formulation with suitable excipients, is most effective, although an intramuscular bolus injection is also useful. Typically, the parenteral dose will be about 0.01 to about 100 mg/kg; preferably between 0.1 and 20 mg/kg, in a manner to maintain the concentration of drug in the plasma at a concentration effective to inhibit a cysteine protease. The compounds may be administered one to four times daily at a level to achieve a total daily dose of about 0.4 to about 400 mg/kg/day. The precise amount of an inventive compound which is therapeutically effective, and the route by which such compound is best administered, is readily determined by one of ordinary skill in the art by comparing the blood level of the agent to the concentration required to have a therapeutic effect. Prodrugs of compounds of the present invention may be prepared by any suitable method. For those compounds in which the prodrug moiety is a ketone functionality, specifically ketals and/or hemiacetals, the conversion may be effected in accordance with conventional methods.
[00111] No unacceptable toxicological effects are expected when compounds, derivatives, salts, compositions, etc., of the present invention are administered in accordance with the present invention. The compounds of this invention, which may have good bioavailability, may be tested in one of several biological assays to determine the concentration of a compound which is required to have a given pharmacological effect.
[00112] In another preferred embodiment, there is provided a pharmaceutical or veterinary composition comprising one or more compounds (e.g., antisense compounds, compounds identified through an assay described above, etc.) and a pharmaceutically or veterinarily acceptable carrier. Other active materials may also be present, as may be considered appropriate or advisable for the disease or condition being treated or prevented.
[00113] The carrier, or, if more than one be present, each of the carriers, must be acceptable in the sense of being compatible with the other ingredients of the formulation and not deleterious to the recipient.
[00114] The compounds described herein are suitable for use in a variety of drug delivery systems described above. Additionally, in order to enhance the in vivo serum half-life of the administered compound, the compounds may be encapsulated, introduced into the lumen of liposomes, prepared as a colloid, or other conventional techniques may be employed which provide an extended serum half-life of the compounds. A variety of methods are available for preparing liposomes, as described in, e.g., Szoka, et al., U.S. Pat. Nos. 4,235,871, 4,501,728 and 4,837,028, each of which is incorporated herein by reference. Furthermore, one may administer the drug in a targeted drug delivery system, for example, in a liposome coated with a tissue-specific antibody. The liposomes will be targeted to and taken up selectively by the organ.
[00115] The formulations include those suitable for rectal, nasal, topical (including buccal and sublingual), vaginal or parenteral (including subcutaneous, intramuscular, intravenous and intradermal) administration, but preferably the formulation is an orally administered formulation. The formulations may conveniently be presented in unit dosage form, e.g., tablets and sustained release capsules, and may be prepared by any methods well known in the art of pharmacy.
[00116] Such methods include the step of bringing into association the above defined active agent with the carrier. In general, the formulations are prepared by uniformly and intimately bringing into association the active agent with liquid carriers or finely divided solid carriers or both, and then if necessary shaping the product. The invention extends to methods for preparing a pharmaceutical composition comprising bringing a compound into conjunction or association with a pharmaceutically or veterinarily acceptable carrier or vehicle.
Kits
[00117] In another preferred embodiment, a kit is provided comprising any one or more of the biomarkers listed in Tables 1 and 2.
[00118] For use in the applications described or suggested above, kits or articles of manufacture are also provided by the invention. Such kits may comprise a carrier means being compartmentalized to receive in close confinement one or more container means such as vials, tubes, and the like, each of the container means comprising one of the separate elements to be used in the method. For example, one of the container means may comprise a probe that is or can be detectably labeled. Where the kit utilizes nucleic acid hybridization to detect the target nucleic acid, the kit may also have containers containing nucleotide(s) for amplification of the target nucleic acid sequence and/or a container comprising a reporter means, such as a biotin-binding protein, such as avidin or streptavidin, bound to a reporter molecule, such as an enzymatic, fluorescent, or radioisotope label.
[00119] The kit of the invention will typically comprise the container described above and one or more other containers comprising materials desirable from a commercial and user standpoint, including buffers, diluents, filters, needles, syringes, and package inserts with instructions for use. A label may be present on the container to indicate that the composition is used for a specific therapy or non-therapeutic application, and may also indicate directions for either in vivo or in vitro use, such as those described above.
[00120] The kits of the invention have a number of embodiments. A typical embodiment is a kit comprising a container, a label on said container, and a composition contained within said container; wherein the composition includes a primary antibody that binds to the biomarkers of each molecular signature and instructions for using the antibody for evaluating the presence of biomarkers in at least one type of mammalian cell. The kit can further comprise a set of instructions and materials for preparing a tissue sample and applying antibody and probe to the same section of a tissue sample. The kit may include both a primary and secondary antibody, wherein the secondary antibody is conjugated to a label, e.g., an enzymatic label.
[00121] Another embodiment is a kit comprising a container, a label on said container, and a composition contained within said container; wherein the composition includes a polynucleotide that hybridizes to a complement of the polynucleotides under stringent conditions, the label on said container indicates that the composition can be used to evaluate the presence of a molecular signature in at least one type of mammalian cell, and instructions for using the polynucleotide for evaluating the presence of biomarker RNA or DNA in at least one type of mammalian cell.
[00122] Other optional components in the kit include microarrays, one or more buffers (e.g., block buffer, wash buffer, substrate buffer, etc.), other reagents such as substrate (e.g., chromogen) which is chemically altered by an enzymatic label, epitope retrieval solution, control samples (positive and/or negative controls), control slide(s), etc.
[00123] While various embodiments of the present invention have been described above, it should be understood that they have been presented by way of example only, and not limitation. Numerous changes to the disclosed embodiments can be made in accordance with the disclosure herein without departing from the spirit or scope of the invention. Thus, the breadth and scope of the present invention should not be limited by any of the above described embodiments.
[00124] All documents mentioned herein are incorporated herein by reference. All publications and patent documents cited in this application are incorporated by reference for all purposes to the same extent as if each individual publication or patent document were so individually denoted. By their citation of various references in this document, Applicants do not admit any particular reference is "prior art" to their invention. Embodiments of inventive compositions and methods are illustrated in the following examples.
EXAMPLES
[00125] The following non-limiting Examples serve to illustrate selected embodiments of the invention. It will be appreciated that variations in proportions and alternatives in elements of the components shown will be apparent to those skilled in the art and are within the scope of embodiments of the present invention.
Example 1: Chromosome 5 Autism candidate gene analysis
[00126] The immediate area of association on chromosome 5 contains no clearly defined candidate genes. Several potential genes, such as a "merlin"-like annotated gene derived from similarity to Drosophila and an MSNL1 (moesin-like) pseudogene, which may form part of the same Ch. X duplicated complex, lie in the immediate region of association. In addition, several areas in this region exhibit highly evolutionarily conserved sequence. Analysis of the sequence immediately flanking the region of maximum association indicates that one or more unannotated or uncharacterized genes or transcripts, some supported by spliced EST data, may lie close to the region of association. Of great interest is the fact that the candidate region is flanked by two excellent candidate genes, CDH9 (Cadherin 9) and CDH10 (Cadherin 10). The cadherins are integral membrane proteins involved in calcium-dependent cell-cell adhesion (more specifically, protein cell adhesion activity). Both genes have been identified as being expressed in fetal brain. Abnormal function of either gene could lead to inappropriate connections and development in the developing brain, such as those believed to be present in autism. The region of association identified herein lies 5' to the CDH10 gene and 3' to CDH9. Since regulatory regions, which can be quite distant from the gene affected, generally tend to lie at the 5' ends of genes, study of CDH10 will be a priority, while not ignoring the fact that CDH9, which interacts with CDH10, may also be a (or the) causative gene. Deletions in the related PCDH10 (protocadherin 10) may also be involved in the etiology of autism. The CDH10 gene is predominantly expressed in brain and is believed to be involved in the formation of synapses and in axon growth and guidance.
Example 2: A genome-wide association study of autism reveals a common novel risk locus at 5p14.1
[00127] To comprehensively examine the hypothesis that common variation is important in autism, a genome-wide association study (GWAS) was performed using a discovery dataset of 438 autistic Caucasian families and the Illumina Human1M BeadChip. 96 single nucleotide polymorphisms (SNPs) demonstrated strong association with autism risk (p-value < 0.0001). Validation of the top 96 SNPs was performed using an independent dataset of 487 Caucasian autism families genotyped on the Illumina 550K BeadChip. A novel region on chromosome 5p14.1 showed significance in both the discovery and validation datasets. Joint analysis of all SNPs in this region identified 8 SNPs with more significant p-values (3.24E-04 to 3.40E-06) than in either dataset alone. The findings demonstrated that, in addition to multiple rare variants, part of the complex genetic architecture of autism involves common variation.
[00128] Materials and Methods:
[00129] Ascertainment and Sample description: Autism patients and their affected and unaffected family members were ascertained as part of the Collaborative Autism Project (CAP) through four clinical groups at the Miami Institute for Human Genomics (MIHG, Miami, Florida), University of South Carolina (Columbia, South Carolina), W.S. Hall Psychiatric Institute (Columbia, South Carolina) and Vanderbilt Center for Human Genetics Research (Vanderbilt University, Nashville, Tennessee).
[00130] Participating families were enrolled through a multi-site study of autism genetics and recruited via support groups, advertisements, and clinical and educational settings. All participants and families were ascertained using a standard protocol, which was approved by the appropriate Institutional Review Boards. Written informed consent was obtained from parents as well as from minors who were able to give informed consent; for individuals unable to give consent due to age or developmental problems, assent was obtained whenever possible.
[00131] Core inclusion criteria were as follows: (1) chronological age between 3 and 21 years; (2) presumptive clinical diagnosis of autism; (3) expert clinical determination of autism diagnosis using DSM-IV criteria supported by the Autism Diagnostic Interview-Revised (ADI-R) in the majority of cases and all available clinical information. The ADI-R is a semi-structured diagnostic interview which provides diagnostic algorithms for classification of autism (Autism Genetics Resource Exchange 2008, AGRE [Homepage of Autism Speaks], [Online]. Available: agre.org). All ADI-R interviews were conducted by formally trained interviewers who had achieved reliability according to established methods. Thirty-eight individuals were missing an ADI-R. For those cases, a best-estimate procedure was implemented to determine a final diagnosis using all available information from the research record and data from other assessment procedures. This information was reviewed by a clinical panel that was led by an experienced clinical psychologist and included two other psychologists and a pediatric medical geneticist, all of whom were experienced in autism. Following review of the case material, the panel discussed each case until a consensus diagnosis was obtained. Only those cases in which a consensus diagnosis of autism was reached were included; (4) a minimal developmental level of 18 months as determined by the Vineland Adaptive Behavior Scale (VABS) (Sparrow, S. S., Balla, D. & Cicchetti, D. (1984) Vineland Adaptive Behavior Scales, Interview Edition, AGS Publishing, Circle Pines, MN) or the VABS-II (Sparrow, S.S., Cicchetti, D.V. & Balla, D. (2005) Vineland Adaptive Behavior Scales-Second Edition, AGS, Circle Pines, MN) or an IQ equivalent > 35. These minimal developmental levels help ensure that ADI-R results are valid and reduce the likelihood of including individuals with severe mental retardation only.
Participants with severe sensory problems (e.g., visual impairment or hearing loss), significant motor impairments (e.g., failure to sit by 12 months or walk by 24 months), or identified metabolic, genetic, or progressive neurological disorders were excluded.
[00132] A total of 487 Caucasian families (1537 individuals) were genotyped. This dataset consisted of 80 multiplex families (more than one affected individual) and 407 singleton (parent-child trio) families. In addition, GWAS data were obtained from the Autism Genetic Resource Exchange (AGRE) (Autism Genetics Resource Exchange 2008) for use as a validation dataset. The full AGRE dataset is publicly available and contains families with the full spectrum of autism spectrum disorders. Only families with one or more individuals diagnosed with autism (using DSM-IV and ADI-R) were selected; affected individuals with non-autism diagnoses within these families were excluded from the analysis. This resulted in a confirmation dataset of 680 multiplex families (3512 individuals) from the AGRE 'SingleAllAgre' BeadStudio file (Autism Genetics Resource Exchange 2008). Family and individual identifiers for all AGRE samples which passed our quality control are listed in Table S3.
[00133] Genotyping of the discovery dataset: Genomic DNA was purified from whole blood using Puregene chemistry on the Qiagen Autopure LS according to standard automated Qiagen protocols (Qiagen, Valencia, CA).
[00134] DNA samples were quantitated via the ND-8000 spectrophotometer and DNA quality was evaluated via gel electrophoresis on a 0.8% agarose gel. The concentration for all qualified samples was normalized to 50 ng/μl and samples were arrayed in Matrix 0.5ml 2D barcoded tubes in racks of 96. Sample identity was confirmed by genotyping 8 SNPs using Taqman allelic discrimination assays (Applied Biosystems, Foster City, CA) and assessing for concordance with historical data.
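The normalization step above is a standard C1 x V1 = C2 x V2 dilution. The Python sketch below shows that arithmetic for one sample. The 200 ng/μl stock concentration, the 100 μl final volume and the function name `dilution_volumes` are illustrative assumptions; the study reports only the 50 ng/μl target.

```python
def dilution_volumes(stock_ng_per_ul: float, final_volume_ul: float,
                     target_ng_per_ul: float = 50.0) -> tuple[float, float]:
    """Return (stock_ul, diluent_ul) that bring a DNA stock to the target concentration.

    Based on C1 * V1 = C2 * V2, so the stock volume is V1 = C2 * V2 / C1.
    """
    if stock_ng_per_ul <= 0 or final_volume_ul <= 0:
        raise ValueError("concentration and volume must be positive")
    if stock_ng_per_ul < target_ng_per_ul:
        raise ValueError("stock is already below the target; it cannot be diluted up")
    stock_ul = target_ng_per_ul * final_volume_ul / stock_ng_per_ul
    return stock_ul, final_volume_ul - stock_ul


# Hypothetical sample: a 200 ng/ul stock diluted to 100 ul at 50 ng/ul.
stock, diluent = dilution_volumes(200.0, 100.0)
print(f"{stock:.1f} ul stock + {diluent:.1f} ul diluent")  # 25.0 ul stock + 75.0 ul diluent
```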
[00135] Samples that passed the above exclusion criteria were genotyped using Illumina's Human1M BeadChip, containing 1,072,820 SNPs (258,665 of which are in reported and new CNV regions). The samples were processed according to Illumina procedures for the Infinium II® assay (Illumina Inc., San Diego, CA).
[00136] The above protocol was automated using the Tecan EVO-I to further enhance the efficiency and consistency of the assay (Tecan Group Ltd., Mannedorf, Switzerland). Samples were processed in batches of 48. The same quality control DNA sample was repeated during each run to ensure reproducibility of results between runs. Data were extracted by the Illumina® BeadStudio software from data files created by the Illumina BeadArray Reader. Samples and markers with call rates below 95% were excluded from analysis, and a GenCall cutoff score of 0.15 was used for all Infinium II® products.
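The 95% call-rate rule can be written as a short filter. This is a minimal Python sketch, assuming genotypes are held in a sample-by-SNP matrix with `None` marking a no-call and that samples are filtered before markers; the study applied this filter within Illumina BeadStudio, whose internal order of operations is not described here, and `filter_by_call_rate` is a hypothetical helper.

```python
from typing import Optional

Genotypes = list[list[Optional[int]]]  # rows = samples, columns = SNPs; None = no-call


def call_rate(calls: list[Optional[int]]) -> float:
    """Fraction of genotype calls that are not missing."""
    return sum(c is not None for c in calls) / len(calls)


def filter_by_call_rate(geno: Genotypes,
                        threshold: float = 0.95) -> tuple[list[int], list[int]]:
    """Return indices of the samples, then the SNPs, with call rate >= threshold.

    Samples are filtered first; SNP call rates are then computed on the kept samples.
    """
    kept_samples = [i for i, row in enumerate(geno) if call_rate(row) >= threshold]
    if not kept_samples:
        return [], []
    kept_snps = [j for j in range(len(geno[0]))
                 if call_rate([geno[i][j] for i in kept_samples]) >= threshold]
    return kept_samples, kept_snps
```

With a real 1M-SNP array the matrix would be far too large for nested Python lists; the sketch only illustrates the rule, not an efficient implementation.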
[00137] Sample quality control: After genotyping, samples were subjected to a battery of quality control (QC) tests. The same protocol was used for both the discovery and validation datasets. Reported and genetic gender were examined using X-chromosome SNPs. Relatedness between samples, sample contamination, misidentification and duplication were tested using genome-wide identity-by-descent (IBD) estimation; inconsistent samples were dropped from the analysis. The numbers of remaining samples are listed in Table S1.
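Genetic sex is commonly inferred from X-chromosome heterozygosity: males carry a single X and should show almost no heterozygous calls there. The Python sketch below computes an inbreeding coefficient F = 1 - (observed heterozygous count / expected heterozygous count) and classifies it with the 0.2 and 0.8 cut-offs that PLINK's `--check-sex` uses by default. The document does not state which tool or thresholds were used for this check, so both are assumptions, and all function names are hypothetical.

```python
from typing import Optional, Sequence


def x_inbreeding_f(x_genotypes: Sequence[Optional[int]],
                   allele_freqs: Sequence[float]) -> float:
    """F = 1 - observed / expected heterozygosity over non-missing X-chromosome SNPs.

    Genotypes are coded 0/1/2 (1 = heterozygous); allele_freqs are population
    frequencies of the counted allele, e.g. estimated from founders.
    """
    observed_het = 0
    expected_het = 0.0
    for g, p in zip(x_genotypes, allele_freqs):
        if g is None:
            continue  # skip no-calls
        observed_het += int(g == 1)
        expected_het += 2 * p * (1 - p)
    if expected_het == 0:
        raise ValueError("no informative X-chromosome SNPs")
    return 1 - observed_het / expected_het


def genetic_sex(f: float, female_max: float = 0.2, male_min: float = 0.8) -> str:
    """Classify genetic sex from F (assumed thresholds, PLINK-style defaults)."""
    if f < female_max:
        return "female"
    if f > male_min:
        return "male"
    return "ambiguous"


def sex_mismatch(reported: str, x_genotypes, allele_freqs) -> bool:
    """True when inferred sex differs from the reported sex; ambiguous calls are flagged too."""
    return genetic_sex(x_inbreeding_f(x_genotypes, allele_freqs)) != reported
```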
Table S1: Numbers of families after quality control (QC) was performed.
[00138] As a next step, Mendelian inconsistencies were tested on all SNPs and samples. Mendelian errors (ME) can emerge from sample misidentification, DNA contamination, copy-number variation (CNV), genotype calling errors and other causes. The median ME rate per family in both investigated cohorts was below 0.005%. More than 99% of the discovery families and 98% of the validation families had ME below 0.02%. Families with ME > 2% were excluded from the analysis. This threshold still allows for the small deletions and duplications that are common in the human genome.
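For a single parent-child trio, a Mendelian error is a child genotype that cannot be formed from one allele of each parent. A minimal Python sketch of the per-family error rate and the 2% exclusion rule follows, assuming autosomal SNPs coded as 0/1/2 allele counts. Multiplex families would need the check repeated over every parent-offspring trio, and the function names are hypothetical (the study itself used PLINK for its QC steps).

```python
from typing import Optional, Sequence

ME_FAMILY_MAX = 0.02  # families with ME > 2% were excluded

# Copies of the counted allele that a parent with genotype 0/1/2 can transmit.
_TRANSMISSIBLE = {0: (0,), 1: (0, 1), 2: (1,)}


def mendelian_consistent(father: int, mother: int, child: int) -> bool:
    """True if the child's genotype (0/1/2 allele count) can arise from the parents."""
    return any(f + m == child
               for f in _TRANSMISSIBLE[father]
               for m in _TRANSMISSIBLE[mother])


def trio_me_rate(father: Sequence[Optional[int]], mother: Sequence[Optional[int]],
                 child: Sequence[Optional[int]]) -> float:
    """Fraction of fully called SNPs that show a Mendelian error in one trio."""
    checked = errors = 0
    for f, m, c in zip(father, mother, child):
        if f is None or m is None or c is None:
            continue  # missing calls cannot be checked
        checked += 1
        errors += int(not mendelian_consistent(f, m, c))
    return errors / checked if checked else 0.0


def keep_family(me_rate: float) -> bool:
    return me_rate <= ME_FAMILY_MAX
```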
[00139] SNP quality control: SNPs were subjected to QC before analysis. SNPs with minor allele frequencies below 5% were removed because of restricted power in the discovery sample. A negative correlation was observed between the proportion of ME per SNP and the p-value for Hardy-Weinberg equilibrium (HWE). To minimize genotyping errors, we excluded SNPs with a p-value < 10^-6 for HWE and ME > 4%. Remaining erroneous genotypes were set as missing. PLINK software was used for the quality control steps described above (Purcell et al., 2007, Am J Hum Genet 81, 559-575).
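The three SNP filters can be sketched as below. This Python version assumes founder genotype counts per SNP and a precomputed per-SNP Mendelian error rate, and treats failing any one filter as grounds for removal (the "HWE and ME" wording above could also be read as a joint condition). For brevity it uses the 1-df chi-square HWE test; PLINK's `--hwe` filter uses an exact test, which behaves better for rare genotypes, so p-values near the 10^-6 cut-off may differ.

```python
import math

# Genotype counts among founders: (n_hom_a, n_het, n_hom_b).
Counts = tuple[int, int, int]


def minor_allele_freq(counts: Counts) -> float:
    n_hom_a, n_het, _n_hom_b = counts
    p = (2 * n_hom_a + n_het) / (2 * sum(counts))
    return min(p, 1 - p)


def hwe_chisq_p(counts: Counts) -> float:
    """1-df chi-square goodness-of-fit test for Hardy-Weinberg equilibrium."""
    n = sum(counts)
    p = (2 * counts[0] + counts[1]) / (2 * n)
    q = 1 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    if min(expected) == 0:
        return 1.0  # monomorphic SNP: nothing to test
    chi2 = sum((o - e) ** 2 / e for o, e in zip(counts, expected))
    return math.erfc(math.sqrt(chi2 / 2))  # upper tail of chi-square with 1 df


def passes_snp_qc(counts: Counts, me_rate: float, maf_min: float = 0.05,
                  hwe_p_min: float = 1e-6, me_max: float = 0.04) -> bool:
    """Apply the MAF, HWE and Mendelian-error filters; failing any one removes the SNP."""
    return (minor_allele_freq(counts) >= maf_min
            and hwe_chisq_p(counts) >= hwe_p_min
            and me_rate <= me_max)
```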
[00140] Illumina provides information on which 1M BeadChip SNPs are located within known common CNV regions. The distributions of ME per family and per SNP were compared, and no significant differences in ME per SNP were identified between markers in the known CNV regions and the remaining markers. The same quality criteria were used for both the discovery and the validation datasets. The summary of SNPs is presented in Table S2.
Table S2: Numbers of SNPs after quality control was performed.
[00141] Population Stratification: Although population substructure does not cause type I error in family-based association tests, multiple founder effects could reduce the power to detect an association in a heterogeneous disease such as autism. Thus, EIGENSTRAT (Patterson et al., 2006) analysis was conducted on all parents from the analyzed families to look for evidence of population substructure, using the 491,664 SNPs genotyped in both the discovery and validation datasets. To ensure the most homogeneous groups for association screening and replication, we excluded all families with outliers defined by EIGENSTRAT (Patterson et al., 2006, PLoS Genet 2, e190), i.e., parents lying more than 4 standard deviations out on principal components 1 and 2. After all QC steps, 1,390 samples from 438 autistic families were retained in the final discovery dataset and 2,390 samples from 457 autistic families (Tables S1 and S3) in the validation dataset. The average genotyping rate in the remaining individuals was 99.8%.
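The outlier rule can be sketched as a single pass over principal component scores that have already been computed (for example by EIGENSTRAT). This simplified Python illustration assumes the reference cluster is summarized by the mean and standard deviation of all parents' scores; EIGENSTRAT's own outlier removal is iterative and configurable, and `outlier_families` and its input layout are hypothetical.

```python
from statistics import mean, pstdev

# Hypothetical input: {(family_id, parent_id): (pc1, pc2)} taken from a PCA such as EIGENSTRAT.
ParentPCs = dict[tuple[str, str], tuple[float, float]]


def outlier_families(parent_pcs: ParentPCs, n_sd: float = 4.0) -> set[str]:
    """Families with any parent more than n_sd SDs from the mean on PC1 or PC2."""
    flagged: set[str] = set()
    for axis in (0, 1):
        values = [pcs[axis] for pcs in parent_pcs.values()]
        mu, sd = mean(values), pstdev(values)
        if sd == 0:
            continue  # no spread on this component
        for (family_id, _parent_id), pcs in parent_pcs.items():
            if abs(pcs[axis] - mu) > n_sd * sd:
                flagged.add(family_id)  # the whole family is excluded
    return flagged
```

Excluding the whole family, rather than only the outlying parent, keeps the remaining pedigrees complete for the family-based tests.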
[00142] Genotype Imputation: Since the validation dataset was genotyped on a different GWAS SNP panel with a smaller number of SNPs (558,183), the genotypes from our data and from the AGRE data were imputed independently by the program IMPUTE (Marchini et al., 2007, Nat Genet 39, 906-913) using a phased CEU HapMap dataset as a reference (International HapMap Consortium et al., 2007, Nature 449, 851-861). Individual genotypes with probability less than 0.90 were not included. All individuals were treated independently during imputation. Mendelian inconsistencies were zeroed out in PLINK (Purcell et al., 2007, Am J Hum Genet 81, 559-575). The imputation results are presented in Table 1. Results for imputed SNPs missing more than 10% of genotypes are labeled in Table 1 and should be interpreted with caution because of possible bias.
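IMPUTE reports, for each individual at each SNP, probabilities for the three possible genotypes. Converting these to hard calls with the 0.90 cut-off, and flagging SNPs with more than 10% missing calls, might look like the Python sketch below; the (P(AA), P(AB), P(BB)) ordering and the function names are assumptions for illustration.

```python
from typing import Optional, Sequence


def hard_call(probs: Sequence[float], min_prob: float = 0.90) -> Optional[int]:
    """Most likely genotype (0, 1 or 2) from (P(AA), P(AB), P(BB)), or None below min_prob."""
    best = max(range(3), key=lambda g: probs[g])
    return best if probs[best] >= min_prob else None


def call_imputed_snp(sample_probs: Sequence[Sequence[float]], min_prob: float = 0.90,
                     max_missing: float = 0.10) -> tuple[list[Optional[int]], bool]:
    """Hard-call every sample; also report whether the SNP exceeds the missingness flag."""
    calls = [hard_call(p, min_prob) for p in sample_probs]
    missing_fraction = sum(c is None for c in calls) / len(calls)
    return calls, missing_fraction > max_missing
```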
[00143] Association Analysis: Association analysis was performed using the pedigree disequilibrium test (PDT) (Martin et al, 2000, Am J Hum Genet 67, 146-154; Martin et al, 2001, Am J Hum Genet 68, 1065-1067). This method provides valid and robust tests for allelic association across both trios and extended families. Only autosomal markers were tested for association. The estimation of odds ratios and 95% confidence interval calculations were performed using UNPHASED (Dudbridge et al, 2008, Hum Hered 66,87-98). Power calculations for association analysis were performed using the Genetic Power Calculator (Purcell et al., 2008, Genetic Power Calculator [Homepage of Harvard University], [Online]. Available: pngu.mgh.harvard.edu/~purcell/gpc/ [2008]).
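For intuition about family-based allelic tests, the Python sketch below implements the classical transmission disequilibrium test (TDT) for trios, the simpler test that this Example later uses as a lower bound for the power of the PDT. It is not the PDT itself, which additionally combines information from discordant sibships in extended pedigrees. The transmission ratio b/c returned here is one simple allelic odds ratio estimate and is not necessarily identical to the estimates produced by UNPHASED.

```python
import math


def tdt(b: int, c: int) -> tuple[float, float, float]:
    """McNemar-style TDT from heterozygous parents of affected offspring.

    b: transmissions of allele A from heterozygous parents
    c: transmissions of the other allele from heterozygous parents
    Returns (chi-square, p-value with 1 df, transmission odds ratio b / c).
    """
    if b + c == 0:
        raise ValueError("no informative (heterozygous) parents")
    chi2 = (b - c) ** 2 / (b + c)
    p_value = math.erfc(math.sqrt(chi2 / 2))  # upper tail of chi-square, 1 df
    odds_ratio = b / c if c else math.inf
    return chi2, p_value, odds_ratio


# Hypothetical counts: 60 vs 40 transmissions gives chi2 = 4.0, p ~ 0.046, OR = 1.5.
print(tdt(60, 40))
```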
[00144] Linkage disequilibrium: Linkage disequilibrium (LD) patterns and haplotype block delineation were determined using Haploview 4.1 (Choi et al., 2001, Yonsei Medical Journal 42, 247-254). Blocks were defined using the confidence interval method described by Gabriel et al. (2002), Science 296, 2225-2229. Pairwise LD measures (r^2) were calculated in the 3,822 unrelated founders of the joint sample.
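Both common pairwise LD measures follow from a two-SNP haplotype frequency and the two allele frequencies. A minimal Python sketch is below. It assumes haplotype frequencies have already been estimated (Haploview does this with an EM algorithm) and does not reproduce the confidence-interval block definition of Gabriel et al., which is built on D' rather than r^2.

```python
def ld_stats(p_ab: float, p_a: float, p_b: float) -> tuple[float, float]:
    """Return (D', r^2) for two biallelic SNPs.

    p_ab: frequency of the haplotype carrying allele A at SNP 1 and allele B at SNP 2
    p_a, p_b: frequencies of A and B (must lie strictly between 0 and 1)
    """
    d = p_ab - p_a * p_b  # coefficient of linkage disequilibrium D
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max if d_max > 0 else 0.0
    r2 = d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))
    return d_prime, r2
```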
[00145] Results:
[00146] To test the common variant hypothesis more comprehensively, an unbiased genome-wide association study of common variation was performed using the Caucasian autistic families from the Collaborative Autism Project (CAP) as the discovery dataset. The findings were validated using an independent, publicly available, family-based genome-wide association study (GWAS) dataset from the Autism Genetic Resource Exchange (AGRE) (Autism Genetics Resource Exchange 2008). Quality-control (QC) procedures were applied to the more than 1,000,000 single nucleotide polymorphisms (SNPs) in the discovery dataset and the 550,000 SNPs in the validation dataset.
[00147] After applying QC filters, 775,311 common autosomal SNPs remained in the discovery dataset with an average genotyping rate of 99.80%, and 500,100 common autosomal SNPs remained in the validation dataset with an average genotyping rate of 99.82%. To account for possible population stratification, families were excluded if the values for the top two principal components for either of the probands' parents were > 4 standard deviations from the core Caucasian cluster generated in EIGENSTRAT (Patterson et al, 2006 PLoS Genet 2, e190). The final datasets included 1,390 samples from 438 autistic families in the discovery dataset and 2,390 samples from 457 autistic families in the validation dataset. For any SNP of interest in the discovery dataset not directly genotyped in the validation dataset, genotypes were imputed in the validation dataset using the program IMPUTE (Marchini et al, 2007 Nat Genet 39, 906-913). The Pedigree Disequilibrium Test (PDT) (Martin et al, 2000 Am J Hum Genet 67, 146-154; Martin et al, 2001 Am J Hum Genet 68, 1065-1067) was used for all association analyses. The distribution of p-values in the discovery dataset closely matched that expected under the null distribution except at the extreme tail of low p-values (Figure 1). This is expected if there is little residual error in the data and common variants of modest effect size are acting in autism. In the discovery dataset, none of the p-values met the stringent and overly conservative Bonferroni correction for genome-wide significance (Fig. 2).
[00148] Examination of the 651 SNPs in the CNTNAP2 gene (Arking et al, 2008 Am J Hum Genet 82, 160-164; Bakkaloglu et al, 2008 Am J Hum Genet 82, 165-173) in the discovery dataset revealed only eight genotyped SNPs that were nominally significant (p-values = 0.002-0.04). The results did not significantly improve in male-only families. The tagging SNP rs270102, reported by Alarcón et al. (2008, Am J Hum Genet 82, 7-9), was not significant in either the overall or the male-only family dataset. SNP rs7794745, showing linkage in the Arking et al. (2008) study, was not genotyped in our dataset, and association of imputed genotypes for this SNP was not significant (p = 0.62). None of the tested markers met gene-wide (CNTNAP2) significance after correction.
[00149] Despite the absence of genome-wide significant association, 96 SNPs showed strongly suggestive association with autism risk (Table 1, p < 0.0001) and met the initial criteria for follow-up.
[00150] Among the 96 top hits, 2 SNPs residing in 5p14.1 had improved p-values in the joint analysis and also had nominally significant association signals in the validation dataset, encouraging us to look at this region in more detail. Therefore, every SNP (n=46) genotyped in this region (25,830 kb to 26,100 kb) was examined in both datasets regardless of its initial p-value. Analyses of these data revealed a cluster of 19 SNPs, including 8 imputed SNPs, showing nominally significant association (p < 0.05) in the validation dataset. Eight SNPs on chromosome 5p14.1 (Table 2) showed improved association signals in the joint dataset. Risk was associated with the same allele for these eight SNPs in both datasets, and the p-values became more significant in the joint analysis (p-values: 3.24E-04 to 3.40E-06), with the most significant p-value coming from one of the top 96 hits, rs10038113. The odds ratios for the major alleles ranged from 0.75 to 1.32 (Table 2).
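The selection logic in this paragraph (SNPs inside the 5p14.1 window, nominal replication in the validation set, the same risk allele in both datasets, and a joint p-value smaller than either single-dataset p-value) can be written as a filter. The Python sketch below assumes the per-dataset and joint PDT results have already been computed; the `SnpResult` record, its field names and the kb coordinates (genome build not stated) are illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass
class SnpResult:
    snp_id: str
    position_kb: float
    p_discovery: float
    p_validation: float
    p_joint: float
    risk_allele_discovery: str
    risk_allele_validation: str


def replicated_in_region(results: list[SnpResult], start_kb: float = 25_830,
                         end_kb: float = 26_100, p_nominal: float = 0.05) -> list[SnpResult]:
    """Keep SNPs in the window that replicate nominally with a consistent risk allele
    and whose joint p-value is smaller than in either dataset alone."""
    return [r for r in results
            if start_kb <= r.position_kb <= end_kb
            and r.p_validation < p_nominal
            and r.risk_allele_discovery == r.risk_allele_validation
            and r.p_joint < min(r.p_discovery, r.p_validation)]
```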
[00151] To determine whether a strong signal might be missed by using only the CAP dataset for discovery, we reversed the discovery and validation datasets and applied the same two-stage approach. Twenty-one SNPs had p-values < 0.0001 in the AGRE dataset, but none of them replicated in the CAP dataset, even at nominal significance (p < 0.05). The power of the Transmission Disequilibrium Test (TDT) was computed for 438 triad families, which approximates a lower bound for the power of the PDT in our discovery sample. Given an autism prevalence of 0.0066 (Chakrabarti & Fombonne 2005, Am J Psychiatry 162, 1133-1141) and a SNP in complete LD (D' = 1) with a risk allele frequency of 0.6, the expected power to detect an association at p = 0.0001 was 84% under a recessive model (GRR_AA = 2, GRR_Aa = 1) and 33% under an additive model (GRR_AA = 2, GRR_Aa = 1.5). These values are consistent with the allelic GRRs estimated for the chromosome 5 region. The power to reach Bonferroni-corrected genome-wide significance (p = 0.05 / 775,311 SNPs = 6.4 × 10^-8) drops to 30% and 2.5% for the recessive and additive models, respectively.
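A rough sketch of how TDT power for trios can be approximated is shown below. It assumes that the tested SNP is itself the risk variant (consistent with D' = 1 and equal allele frequencies), Hardy-Weinberg proportions in parents, the genotype relative risks defined above, and a one-sided normal approximation to the TDT statistic Z = (b - c) / sqrt(b + c); under these assumptions the prevalence cancels out of the calculation. The original calculation method is not specified, so this sketch yields figures only roughly comparable to the reported 84%/33% and 30%/2.5%. The function `tdt_power` and its parameters are hypothetical.

```python
from statistics import NormalDist


def tdt_power(n_trios, p, grr_het, grr_hom, alpha, two_sided=False):
    """Approximate TDT power for affected-child trios (normal approximation).

    p: risk-allele frequency; grr_het / grr_hom: genotype relative risks of
    Aa and AA versus aa. Assumes the marker is the causal variant and HWE.
    """
    q = 1.0 - p
    f_AA, f_Aa, f_aa = grr_hom, grr_het, 1.0  # relative penetrances
    # Mean relative penetrance of the child given one parent transmits A (or a),
    # averaging over the other parent's transmission (A with probability p).
    given_A = p * f_AA + q * f_Aa
    given_a = p * f_Aa + q * f_aa
    mean_f = p * p * f_AA + 2 * p * q * f_Aa + q * q * f_aa
    # P(parent of an affected child is heterozygous) and
    # P(heterozygous parent transmits the risk allele A).
    p_het = p * q * (given_A + given_a) / mean_f
    t = given_A / (given_A + given_a)
    n_het = 2 * n_trios * p_het  # expected number of informative transmissions
    tail = alpha / 2 if two_sided else alpha
    z_crit = NormalDist().inv_cdf(1 - tail)
    mu = (2 * t - 1) * n_het ** 0.5  # mean of Z under the alternative
    sd = 2 * (t * (1 - t)) ** 0.5    # SD of Z under the alternative
    return NormalDist().cdf((mu - z_crit) / sd)


bonferroni_alpha = 0.05 / 775_311  # about 6.4e-8
for model, grr_het in (("recessive", 1.0), ("additive", 1.5)):
    at_1e4 = tdt_power(438, 0.6, grr_het, 2.0, 1e-4)
    at_bonf = tdt_power(438, 0.6, grr_het, 2.0, bonferroni_alpha)
    print(f"{model}: {at_1e4:.2f} at p=1e-4, {at_bonf:.3f} genome-wide")
```

The sketch makes the qualitative point of the paragraph concrete: for the same sample, moving from p = 0.0001 to the Bonferroni threshold sharply reduces power, and the additive model is much harder to detect than the recessive one.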
[Table 2 (image not reproduced)]
[00152] Discussion: To better understand the association, the linkage disequilibrium (LD) pattern was examined among the eight replicated SNPs with improved p-values (Figure 3). Seven of these SNPs form two tightly linked LD blocks. Given that none of these SNPs reside within known genes or known regulatory sequences, the clustering of association signals indicates that one or more nearby functional variants are responsible for the signal. A survey of the genomic landscape surrounding this region of association revealed several interesting avenues for further molecular investigation. Numerous sequence segments exhibit a high degree of evolutionary conservation, indicating potential, but currently undetermined, regulatory functions. In addition, there were three known copy number variants (CNVs) in proximity to the most significant SNPs (Table 2). Preliminary investigation of these CNVs in the discovery dataset did not suggest a causal relationship with autism. Exhaustive molecular analysis of the candidate region is ongoing. In addition, although the immediate 1 Mb vicinity of the association region contains no known genes, the region is flanked by CDH9 and CDH10, two genes of the cadherin family, a group of proteins with members involved in calcium-dependent cell-cell junctions in the nervous system (Liu et al, 2006, Gene Expr Patterns 6, 703-710; Pokutta & Weis 2007, Annu Rev Cell Dev Biol, 23, 237-261); these genes are possible targets of regulatory action.
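LD blocks such as those in Figure 3 are typically defined from pairwise LD statistics. As a hedged illustration (the patent does not name the LD software or the block definition used), the sketch below computes Lewontin's D' and r² for a pair of biallelic SNPs from an already-estimated haplotype frequency. In practice, haplotype frequencies would first be estimated from unphased genotypes, for example by an EM algorithm. The function name `ld_stats` is hypothetical.

```python
def ld_stats(p_ab, p_a, p_b):
    """Lewontin's D' and r^2 for two biallelic SNPs.

    p_ab: frequency of the haplotype carrying allele A at SNP 1 and B at SNP 2.
    p_a, p_b: frequencies of allele A and allele B.
    """
    d = p_ab - p_a * p_b  # raw disequilibrium coefficient D
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max if d_max > 0 else 0.0
    r2 = d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))
    return d_prime, r2


d_prime, r2 = ld_stats(p_ab=0.3, p_a=0.6, p_b=0.3)
print(f"D' = {d_prime:.2f}, r^2 = {r2:.2f}")  # D' = 1.00, r^2 = 0.29
```

In this example D' equals 1 while r² is only about 0.29, because the two SNPs have different allele frequencies; this is why LD blocks and tagging decisions can differ depending on which measure is used.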
[00153] The power calculation showed that, given this sample size, stringent adjustments for multiple testing provide power only to detect loci with large effects. Lowering the significance threshold allowed detection of loci with relatively small effects (such as the chromosome 5 locus), while replication was relied upon to limit false positives. Notably, this region of 5p14.1 did not generate exceptional p-values in our initial GWAS, indicating that a strong single-gene association, such as those seen with the APOE gene in Alzheimer disease and the CFH gene in age-related macular degeneration (International Multiple Sclerosis Genetics Consortium et al., 2007, N Engl J Med, 357, 851-862), is highly unlikely in autism. The absence of a large effect is consistent with the results of previously published linkage studies (Ma et al., 2007, Mol Psychiatry 12, 376-384; Allen-Brady et al, 2008, Mol Psychiatry Feb 19 Epub ahead of print). This replicated signal was identified only through analysis of the validation dataset, highlighting the value of both a validation dataset and joint analyses. Two additional datasets have shown association of autism with 5p14.1: a cohort of 1,241 ASD cases and 6,491 control subjects, and a cohort of 108 ASD cases and 540 controls. When these datasets are combined with the present results, for a total of over 10,000 subjects, the combined p-values for SNPs in the 5p14.1 region range from 7.4 × 10^-8 to 2.1 × 10^-10. These results survive stringent Bonferroni correction (Wang et al., 2008, Nature, In Press).
[00154] The approach described herein, which uses a validation set as an indication of true association, has proven successful in other GWAS, as exemplified by the identification of IL7RA and IL2RA susceptibility alleles in multiple sclerosis (MS): no SNP in either gene met genome-wide significance in the discovery dataset, but the associations were confirmed through validation in an additional dataset (International Multiple Sclerosis Genetics Consortium et al., 2007, N Engl J Med, 357, 851-862). These MS findings have recently been confirmed across numerous datasets (International Multiple Sclerosis Genetics Consortium (IMSGC) 2008, Lancet Neurol, 7, 567-569). Other such common variants are likely to exist in autism, and further GWAS are warranted.
[00155] The identification and replication of common variation on chromosome 5pl4.1 associated with autism is a promising development in the struggle to understand the genetics of autism. It also highlights the power of GWAS for detecting moderate genetic effects in neurobehavioral phenotypes. These results, in combination with the multiple rare variants already identified, indicate that the genetic architecture of autism is as exquisitely complex as is its clinical phenotype.
[00156] Although the invention has been illustrated and described with respect to one or more implementations, equivalent alterations and modifications will occur to others skilled in the art upon the reading and understanding of this specification and the annexed drawings. In addition, while a particular feature of the invention may have been disclosed with respect to only one of several implementations, such feature may be combined with one or more other features of the other implementations as may be desired and advantageous for any given or particular application.
[00157] The Abstract of the disclosure will allow the reader to quickly ascertain the nature of the technical disclosure. It is submitted with the understanding that it will not be used to interpret or limit the scope or meaning of the following claims.
References:
1. Centers for Disease Control, "Autism," (2008).
2. S. Chakrabarti, E. Fombonne, Am. J. Psychiatry 162, 1133 (2005).
3. NCBI, "Online Mendelian Inheritance in Man, Autism," (2008).
4. S. Steffenburg et al., J. Child Psychol. Psychiatry 30, 405 (1989).
5. A. Bailey et al., Psychol. Med. 25, 63 (1995).
6. P. Bolton et al., Journal of Child Psychology & Psychiatry & Allied Disciplines 35, 877 (1994).
7. L. A. Weiss et al., N. Engl. J. Med. (2008).
8. J. Sebat et al., Science 316, 445 (2007).
9. Y. Shao et al., Am. J. Med. Genet. 114, 99 (2002).
10. AGP (Autism Genome Project), Nat. Genet. 39, 319 (2007).
11. D. E. Arking et al., Am. J. Hum. Genet. 82, 160 (2008).
12. M. Alarcón et al., Am. J. Hum. Genet. 82, 150 (2008).
13. Autism Genetics Resource Exchange, "AGRE," (2008).
14. N. Patterson, A. L. Price, D. Reich, PLoS Genet. 2, e190 (2006).
15. E. R. Martin, M. D. Ritchie, L. Hahn, S. Kang, J. H. Moore, Genet. Epidemiol. 30, 111 (2006).
16. E. H. Corder et al., Science 261, 921 (1993).
17. J. L. Haines et al., Science 308, 419 (2005).
18. S. F. Grant et al., Nat. Genet. 38, 320 (2006).
19. Q. Liu et al., Gene Expr. Patterns 6, 703 (2006).
20. S. Pokutta, W. I. Weis, Annu. Rev. Cell Dev. Biol. 23, 237 (2007).
21. IMSGC, N. Engl. J. Med. 357, 851 (2007).
22. IMSGC, Lancet Neurol. 7, 567 (July 2008).

Claims

What is claimed is:
1. A biomarker for the diagnosis of autism and autism spectrum associated disorders comprising: at least one single nucleotide polymorphism on chromosome 5p14.1 and/or at least one single nucleotide polymorphism (SNP) set forth as: rs201171, rs4394668, rs4618985, rs1831870, rs155288, rs5008948, rs12024204, rs11162822, rs6424674, rs10493644, rs17425287, rs7523086, rs16833075, rs492780, rs1467068, rs6732653, rs1866206, rs4852531, rs11679682, rs11689493, rs4129081, rs6707773, rs13019278, rs12622496, rs3731723, rs722555, rs6436915, rs2279977, rs9822786, rs12491012, rs1811763, rs1896731, rs10038113, rs11739167, rs7447989, rs6873221, rs12187724, rs1423435, rs12153325, rs10461556, rs350436, rs830907, rs315717, rs3804254, rs2317222, rs1936022, rs6907646, rs12529724, rs1504279, rs1504281, rs2799644, rs320813, rs1529001, rs11765584, rs1529813, rs2528795, rs1000058, rs1149558, rs1130496, rs12155975, rs16930253, rs2082804, rs4739071, rs6991229, rs1865641, rs1865638, rs11001685, rs881631, rs12245799, rs12249859, rs10748804, rs7910491, rs10840070, rs11021927, rs9919560, rs12366827, rs1894827, rs11055784, rs11055786, rs11117003, rs10492086, rs10444509, rs11616562, rs7325257, rs11636552, rs1532926, rs4456502, rs4148358, rs891754, rs8088001, rs547668, rs1144093, rs637644, rs6049129, rs742759, or rs171415, variants, mutants, alleles or complementary sequences thereof.
2. The biomarker of claim 1, wherein a single nucleotide polymorphism on chromosome 5p14.1 comprises: rs10065041, rs7704909, rs1896731, rs10038113, rs6894838, rs12518194, rs4307059, or rs4327572, variants, mutants, alleles or complementary sequences thereof.
3. The biomarker of claim 1, further comprising at least one mutation in a cadherin gene and/or protocadherin gene, variants, mutants, alleles or complementary sequences thereof.
4. The biomarker of claim 3, wherein the cadherin gene comprises cadherin gene 9, cadherin 10, or protocadherin 10, variants, mutants, alleles or complementary sequences thereof.
5. A method of diagnosing a patient with autism or autism spectrum disorders comprising: identifying in a patient a biomarker set comprising: at least one single nucleotide polymorphism on chromosome 5p14.1 and/or at least one single nucleotide polymorphism (SNP) set forth as: rs201171, rs4394668, rs4618985, rs1831870, rs155288, rs5008948, rs12024204, rs11162822, rs6424674, rs10493644, rs17425287, rs7523086, rs16833075, rs492780, rs1467068, rs6732653, rs1866206, rs4852531, rs11679682, rs11689493, rs4129081, rs6707773, rs13019278, rs12622496, rs3731723, rs722555, rs6436915, rs2279977, rs9822786, rs12491012, rs1811763, rs1896731, rs10038113, rs11739167, rs7447989, rs6873221, rs12187724, rs1423435, rs12153325, rs10461556, rs350436, rs830907, rs315717, rs3804254, rs2317222, rs1936022, rs6907646, rs12529724, rs1504279, rs1504281, rs2799644, rs320813, rs1529001, rs11765584, rs1529813, rs2528795, rs1000058, rs1149558, rs1130496, rs12155975, rs16930253, rs2082804, rs4739071, rs6991229, rs1865641, rs1865638, rs11001685, rs881631, rs12245799, rs12249859, rs10748804, rs7910491, rs10840070, rs11021927, rs9919560, rs12366827, rs1894827, rs11055784, rs11055786, rs11117003, rs10492086, rs10444509, rs11616562, rs7325257, rs11636552, rs1532926, rs4456502, rs4148358, rs891754, rs8088001, rs547668, rs1144093, rs637644, rs6049129, rs742759, or rs171415, variants, mutants, alleles or complementary sequences thereof; and, diagnosing a patient with autism or autism spectrum disorders.
6. The method of claim 5, wherein a single nucleotide polymorphism on chromosome 5p14.1 comprises: rs10065041, rs7704909, rs1896731, rs10038113, rs6894838, rs12518194, rs4307059, or rs4327572, variants, mutants, alleles or complementary sequences thereof.
7. The method of claim 5, further comprising: a mutation in a cadherin gene and/or 5' to cadherin gene 10 (CDH10) and 3' to cadherin gene 9 (CDH9).
8. The method of claim 7, wherein the cadherin gene comprises cadherin gene 9, cadherin 10, or protocadherin 10.
9. A method of identifying a subject having an increased risk of developing autism or autism spectrum disorders comprising: identifying in a patient a biomarker set comprising at least one single nucleotide polymorphism on chromosome 5p14.1 and/or at least one single nucleotide polymorphism (SNP) set forth as: rs201171, rs4394668, rs4618985, rs1831870, rs155288, rs5008948, rs12024204, rs11162822, rs6424674, rs10493644, rs17425287, rs7523086, rs16833075, rs492780, rs1467068, rs6732653, rs1866206, rs4852531, rs11679682, rs11689493, rs4129081, rs6707773, rs13019278, rs12622496, rs3731723, rs722555, rs6436915, rs2279977, rs9822786, rs12491012, rs1811763, rs1896731, rs10038113, rs11739167, rs7447989, rs6873221, rs12187724, rs1423435, rs12153325, rs10461556, rs350436, rs830907, rs315717, rs3804254, rs2317222, rs1936022, rs6907646, rs12529724, rs1504279, rs1504281, rs2799644, rs320813, rs1529001, rs11765584, rs1529813, rs2528795, rs1000058, rs1149558, rs1130496, rs12155975, rs16930253, rs2082804, rs4739071, rs6991229, rs1865641, rs1865638, rs11001685, rs881631, rs12245799, rs12249859, rs10748804, rs7910491, rs10840070, rs11021927, rs9919560, rs12366827, rs1894827, rs11055784, rs11055786, rs11117003, rs10492086, rs10444509, rs11616562, rs7325257, rs11636552, rs1532926, rs4456502, rs4148358, rs891754, rs8088001, rs547668, rs1144093, rs637644, rs6049129, rs742759, or rs171415, variants, mutants, alleles or complementary sequences thereof; and, identifying a subject having an increased risk of developing autism or autism spectrum disorders.
10. The method of claim 9, wherein a single nucleotide polymorphism on chromosome 5p14.1 comprises: rs10065041, rs7704909, rs1896731, rs10038113, rs6894838, rs12518194, rs4307059, or rs4327572, variants, mutants, alleles or complementary sequences thereof.
11. The method of claim 9, further comprising detecting a mutation in a cadherin gene.
12. The method of claim 11, wherein the cadherin gene comprises cadherin gene 9, cadherin 10, or protocadherin 10.
13. A method of diagnosing a patient pre-natally or post-natally with autism comprising: detecting at least one single nucleotide polymorphism on chromosome 5p14.1 and/or at least one single nucleotide polymorphism comprising: rs201171, rs4394668, rs4618985, rs1831870, rs155288, rs5008948, rs12024204, rs11162822, rs6424674, rs10493644, rs17425287, rs7523086, rs16833075, rs492780, rs1467068, rs6732653, rs1866206, rs4852531, rs11679682, rs11689493, rs4129081, rs6707773, rs13019278, rs12622496, rs3731723, rs722555, rs6436915, rs2279977, rs9822786, rs12491012, rs1811763, rs1896731, rs10038113, rs11739167, rs7447989, rs6873221, rs12187724, rs1423435, rs12153325, rs10461556, rs350436, rs830907, rs315717, rs3804254, rs2317222, rs1936022, rs6907646, rs12529724, rs1504279, rs1504281, rs2799644, rs320813, rs1529001, rs11765584, rs1529813, rs2528795, rs1000058, rs1149558, rs1130496, rs12155975, rs16930253, rs2082804, rs4739071, rs6991229, rs1865641, rs1865638, rs11001685, rs881631, rs12245799, rs12249859, rs10748804, rs7910491, rs10840070, rs11021927, rs9919560, rs12366827, rs1894827, rs11055784, rs11055786, rs11117003, rs10492086, rs10444509, rs11616562, rs7325257, rs11636552, rs1532926, rs4456502, rs4148358, rs891754, rs8088001, rs547668, rs1144093, rs637644, rs6049129, rs742759, or rs171415.
14. The method of claim 13, wherein the sample comprises: amniotic fluid, serum, blood, plasma, cells or tissue.
15. The method of claim 13, wherein a single nucleotide polymorphism on chromosome 5p14.1 comprises: rs10065041, rs7704909, rs1896731, rs10038113, rs6894838, rs12518194, rs4307059, or rs4327572.
16. The method of claim 13, further comprising detecting a mutation in a cadherin gene.
17. The method of claim 13, wherein the cadherin gene comprises cadherin gene 9, cadherin 10, or protocadherin 10.
18. A method of identifying a marker for the diagnosis of autism and autism spectrum disorders comprising: obtaining a sample from individuals and their families and purifying genomic DNA from the sample; genotyping single nucleotide polymorphisms (SNP); assessing the single nucleotide polymorphisms; and, identifying a marker for the diagnosis of autism and autism spectrum disorders.
19. The method of claim 18, wherein samples are assessed in genome-wide association analysis.
20. The method of claim 18, wherein samples from families with Mendelian errors greater than 2% are excluded.
21. The method of claim 18, wherein single nucleotide polymorphisms having a Hardy-Weinberg equilibrium (HWE) p-value of less than about 10^-6 and a Mendelian Error (ME) rate of greater than about 4% are excluded.
22. The method of claim 18, wherein autosomal markers are analyzed for association comprising a pedigree disequilibrium test (PDT).
23. The method of claim 18, wherein the samples are assessed for linkage disequilibrium patterns.
24. The method of claim 18, wherein the single nucleotide polymorphism is correlated with autism and/or autism spectrum disorders.
25. A biomarker for identifying a patient at risk of developing, or for the diagnosis of, autism and autism spectrum disorders comprising: a mutation in a cadherin gene and/or 5' to cadherin gene 10 (CDH10) and 3' to cadherin gene 9 (CDH9).
26. The biomarker of claim 25, wherein the cadherin gene comprises cadherin gene 9, cadherin 10, or protocadherin 10.
27. The biomarker of claim 25, wherein the biomarker further comprises: at least one single nucleotide polymorphism on chromosome 5p14.1 and/or at least one single nucleotide polymorphism (SNP) set forth as: rs201171, rs4394668, rs4618985, rs1831870, rs155288, rs5008948, rs12024204, rs11162822, rs6424674, rs10493644, rs17425287, rs7523086, rs16833075, rs492780, rs1467068, rs6732653, rs1866206, rs4852531, rs11679682, rs11689493, rs4129081, rs6707773, rs13019278, rs12622496, rs3731723, rs722555, rs6436915, rs2279977, rs9822786, rs12491012, rs1811763, rs1896731, rs10038113, rs11739167, rs7447989, rs6873221, rs12187724, rs1423435, rs12153325, rs10461556, rs350436, rs830907, rs315717, rs3804254, rs2317222, rs1936022, rs6907646, rs12529724, rs1504279, rs1504281, rs2799644, rs320813, rs1529001, rs11765584, rs1529813, rs2528795, rs1000058, rs1149558, rs1130496, rs12155975, rs16930253, rs2082804, rs4739071, rs6991229, rs1865641, rs1865638, rs11001685, rs881631, rs12245799, rs12249859, rs10748804, rs7910491, rs10840070, rs11021927, rs9919560, rs12366827, rs1894827, rs11055784, rs11055786, rs11117003, rs10492086, rs10444509, rs11616562, rs7325257, rs11636552, rs1532926, rs4456502, rs4148358, rs891754, rs8088001, rs547668, rs1144093, rs637644, rs6049129, rs742759, or rs171415, variants, mutants, alleles or complementary sequences thereof.
28. The biomarker of claim 27, wherein a single nucleotide polymorphism on chromosome 5p14.1 comprises: rs10065041, rs7704909, rs1896731, rs10038113, rs6894838, rs12518194, rs4307059, or rs4327572, variants, mutants, alleles or complementary sequences thereof.
29. A biomarker for diagnostically distinguishing between autism and autism spectrum associated disorders comprising: at least one single nucleotide polymorphism on chromosome 5p14.1 and/or at least one single nucleotide polymorphism (SNP) set forth as: rs201171, rs4394668, rs4618985, rs1831870, rs155288, rs5008948, rs12024204, rs11162822, rs6424674, rs10493644, rs17425287, rs7523086, rs16833075, rs492780, rs1467068, rs6732653, rs1866206, rs4852531, rs11679682, rs11689493, rs4129081, rs6707773, rs13019278, rs12622496, rs3731723, rs722555, rs6436915, rs2279977, rs9822786, rs12491012, rs1811763, rs1896731, rs10038113, rs11739167, rs7447989, rs6873221, rs12187724, rs1423435, rs12153325, rs10461556, rs350436, rs830907, rs315717, rs3804254, rs2317222, rs1936022, rs6907646, rs12529724, rs1504279, rs1504281, rs2799644, rs320813, rs1529001, rs11765584, rs1529813, rs2528795, rs1000058, rs1149558, rs1130496, rs12155975, rs16930253, rs2082804, rs4739071, rs6991229, rs1865641, rs1865638, rs11001685, rs881631, rs12245799, rs12249859, rs10748804, rs7910491, rs10840070, rs11021927, rs9919560, rs12366827, rs1894827, rs11055784, rs11055786, rs11117003, rs10492086, rs10444509, rs11616562, rs7325257, rs11636552, rs1532926, rs4456502, rs4148358, rs891754, rs8088001, rs547668, rs1144093, rs637644, rs6049129, rs742759, or rs171415, variants, mutants, alleles or complementary sequences thereof.
30. The biomarker of claim 29, wherein a single nucleotide polymorphism on chromosome 5p14.1 comprises: rs10065041, rs7704909, rs1896731, rs10038113, rs6894838, rs12518194, rs4307059, or rs4327572, variants, mutants, alleles or complementary sequences thereof.
31. The biomarker of claim 29, further comprising at least one mutation in a cadherin gene and/or protocadherin gene, variants, mutants, alleles or complementary sequences thereof.
32. The biomarker of claim 31, wherein the cadherin gene comprises cadherin gene 9, cadherin 10, or protocadherin 10, variants, mutants, alleles or complementary sequences thereof.

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US9043608P 2008-08-20 2008-08-20
US61/090,436 2008-08-20

Publications (2)

Publication Number Publication Date
WO2010022235A2 true WO2010022235A2 (en) 2010-02-25
WO2010022235A3 WO2010022235A3 (en) 2010-04-22

Family

ID=41707657

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2009/054462 WO2010022235A2 (en) 2008-08-20 2009-08-20 Genome-wide association study of autism reveals a comnon novel risk locus at 5p14.1

Country Status (1)

Country Link
WO (1) WO2010022235A2 (en)

Non-Patent Citations (5)

* Cited by examiner, † Cited by third party
Title
ABRAHAMS ET AL.: 'Advances in autism genetics: on the threshold of a new neurobiology.' NATURE REV GENET MAY vol. 9, 2008, pages 341 - 355 *
DATABASE NCBI 25 May 2006 Database accession no. rs201171 *
MORROW ET AL.: 'Identifying autism loci and genes by tracing recent shared ancestry.' SCIENCE vol. 321, 11 July 2008, pages 218 - 223 *
WANG ET AL.: 'Common genetic variants on 5p14.1 associate with autism spectrum disorders' NATURE vol. 459, 28 May 2009, pages 528 - 533 *
YAGI ET AL.: 'Cadherin superfamily genes: functions, genomic organization, and neurologic diversity.' GENES & DEVELOPMENT vol. 14, 2000, pages 1169 - 1180 *


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 09808818

Country of ref document: EP

Kind code of ref document: A2

NENP Non-entry into the national phase in:

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 09808818

Country of ref document: EP

Kind code of ref document: A2