CA2863887C - Methods of screening low frequency gdna variation biomarkers for pervasive developmental disorder (pdd) or pervasive developmental disorder - not otherwise specified (pdd_nos) - Google Patents

Publication number
CA2863887C
Authority
CA
Canada
Legal status
Active
Application number
CA2863887A
Other languages
French (fr)
Other versions
CA2863887A1 (en)
Inventor
Eli Hatchwell
Peggy S. Eis
Stephen Scherer
Aparna PRASAD
Current Assignee
Hospital for Sick Children HSC
Population Bio Inc
Original Assignee
Hospital for Sick Children HSC
Population Bio Inc
Application filed by Hospital for Sick Children HSC, Population Bio Inc
Publication of CA2863887A1
Application granted
Publication of CA2863887C

Classifications

    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C12Q1/6876 Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes
    • C12Q1/6883 Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes for diseases caused by alterations of genetic material
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N15/00 Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
    • C12N15/09 Recombinant DNA-technology
    • C12N15/11 DNA or RNA fragments; Modified forms thereof; Non-coding nucleic acids having a biological activity
    • C12N15/111 General methods applicable to biologically active non-coding nucleic acids
    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01N INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
    • G01N33/00 Investigating or analysing materials by specific methods not covered by groups G01N1/00 - G01N31/00
    • G01N33/48 Biological material, e.g. blood, urine; Haemocytometers
    • G01N33/50 Chemical analysis of biological material, e.g. blood, urine; Testing involving biospecific ligand binding methods; Immunological testing
    • G01N33/68 Chemical analysis of biological material, e.g. blood, urine; Testing involving biospecific ligand binding methods; Immunological testing involving proteins, peptides or amino acids
    • G01N33/6803 General methods of protein analysis not limited to specific proteins or families of proteins
    • G01N33/6845 Methods of identifying protein-protein interactions in protein mixtures
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N2310/00 Structure or type of the nucleic acid
    • C12N2310/10 Type of nucleic acid
    • C12N2310/14 Type of nucleic acid interfering N.A.
    • C12N2310/141 MicroRNAs, miRNAs
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N2320/00 Applications; Uses
    • C12N2320/10 Applications; Uses in screening processes
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q2600/00 Oligonucleotides characterized by their use
    • C12Q2600/156 Polymorphic or mutational markers
    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01N INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
    • G01N2800/00 Detection or diagnosis of diseases
    • G01N2800/38 Pediatrics

Abstract

Provided herein are methods for screening a subject for a Pervasive Developmental Disorder (PDD) or a Pervasive Developmental Disorder - Not Otherwise Specified (PDD-NOS) using a panel comprising low frequency genomic DNA (gDNA) variation biomarkers. Also provided are methods for measuring expression levels of polypeptides using antibodies or aptamer panels for the gDNA variation biomarkers.

Description

JUMBO APPLICATIONS / PATENTS
THIS SECTION OF THE APPLICATION / PATENT CONTAINS MORE THAN ONE VOLUME.
THIS IS VOLUME 1 OF 2.

NOTE: For additional volumes please contact the Canadian Patent Office.

Methods of Screening Low Frequency gDNA Variation Biomarkers for Pervasive Developmental Disorder (PDD) or Pervasive Developmental Disorder - Not Otherwise Specified (PDD_NOS)
[0001]
[0002]
BACKGROUND OF THE INVENTION
[0003] Genetic risk can be conferred by subtle differences in individual genomes within a population. Genes can differ between individuals because of genomic variability, the most frequent form of which is single nucleotide polymorphisms (SNPs). SNPs occur, on average, every 500-1000 base pairs in the human genome. Duplication, insertion, deletion, translocation and/or inversion of short and/or long stretches of DNA can cause additional genetic polymorphisms in a human genome. Genetic variability among individuals therefore occurs on many scales, from single nucleotide changes to gross changes in chromosome structure and function. Recently, many copy number variations (CNVs) of DNA segments have been discovered, including deletions, insertions, duplications, amplifications and complex multi-site variants, ranging in size from kilobases to megabases (Redon, R. et al. Nature 444:444-54 (2006) and Estivill, X. & Armengol, L. PLoS Genetics 3:e190 (2007)). To date, known CNVs account for over 15% of the assembled human genome (Estivill, X. & Armengol, L. PLoS Genetics 3:e190 (2007)). However, the majority of these variants are extremely rare, and they cover only a small percentage of the genome of any particular individual.
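A rough worked example helps put the scales quoted in [0003] in perspective. It assumes a haploid human genome length of G ≈ 3.1 × 10^9 base pairs, which is not stated in the source. Under that assumption, SNP spacing d of 500-1000 bp and CNV coverage of more than 15% translate to approximately:

```latex
% Assumption: haploid genome length G \approx 3.1 \times 10^{9} bp (not from the source)
N_{\mathrm{SNP}} \approx \frac{G}{d}:\qquad
\frac{3.1\times10^{9}}{1000} = 3.1\times10^{6}
\quad\text{to}\quad
\frac{3.1\times10^{9}}{500} = 6.2\times10^{6}\ \text{SNPs}

L_{\mathrm{CNV}} > 0.15\,G \approx 0.15 \times 3.1\times10^{9}\ \mathrm{bp} = 4.65\times10^{8}\ \mathrm{bp}\ (\approx 465\ \mathrm{Mb})
```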
[0004] Today, it is estimated that one in every 110 children is diagnosed with Autism Spectrum Disorder (ASD), making it more common than childhood cancer, juvenile diabetes and pediatric AIDS combined. An estimated 1.5 million individuals in the U.S. and tens of millions worldwide are affected by autism. Government statistics suggest that the prevalence rate of autism is increasing 10-17 percent annually. There is no established explanation for this increase, although improved screening and environmental influences are two reasons often considered. Studies suggest that boys are more likely than girls to develop autism and that they receive screening three to four times more frequently. Current estimates are that, in the United States alone, one out of 70 boys is diagnosed with autism. ASD can be characterized by problems and symptoms in the following areas: communication, both verbal and non-verbal, such as pointing, eye contact, and smiling; social interaction, such as sharing emotions, understanding how others think and feel, and holding a conversation; and routines or repetitive behaviors (also called stereotyped behaviors), such as repeating words or actions, obsessively following routines or schedules, and playing in repetitive ways. As genetic variations conferring risk for developmental disorders, including ASD, are uncovered, genetic testing can play a role in clinical therapeutics.
Date Recue/Date Received 2021-04-27
[0005] Despite these advances towards an understanding of the etiology of developmental disorders, a large fraction of the genetic contribution to these disorders remains undetermined. Identification of underlying genetic variants that can contribute to developmental disorder pathogenesis can aid in the screening and identification of individuals at risk of developing these disorders and can be useful for disease management. There is a need to identify new treatments for developmental disorders, specifically ASD, and the identification of novel genetic risk factors can assist in the development of potential therapeutics and agents. There is also a need for improved assays for predicting and determining potential treatments and their effectiveness.
SUMMARY OF THE INVENTION
[0006] An aspect of the invention includes a method of screening one or more subjects for at least one genetic variation that disrupts or modulates one or more genes in Tables 1-7, comprising: assaying at least one genetic sample obtained from each of the one or more subjects for the at least one genetic variation in one or more genes in Tables 1-7.
[0007] In some embodiments, the at least one genetic variation is associated with a Pervasive Developmental Disorder (PDD) or a Pervasive Developmental Disorder - Not Otherwise Specified (PDD-NOS). In some embodiments, the at least one genetic variation is one encoded by SEQ ID NOs 1-643 or 2418-2557. In some embodiments, the at least one genetic variation comprises one or more point mutations, polymorphisms, translocations, insertions, deletions, amplifications, inversions, microsatellites, interstitial deletions, copy number variations (CNVs), or any combination thereof. In some embodiments, the at least one genetic variation comprises a loss of heterozygosity. In some embodiments, the at least one genetic variation disrupts or modulates one or more genomic sequences of SEQ ID NOs 644-2417 or 2558-2739.
In some embodiments, the at least one genetic variation disrupts or modulates the expression or function of one or more RNA transcripts, one or more polypeptides, or a combination thereof, expressed from the one or more genomic sequences of SEQ ID NOs 644-2417 or 2558-2739.
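One simple way to ask whether a detected variation "disrupts" a gene is to test whether the variant's genomic interval overlaps the gene's interval. The sketch below does that under stated assumptions. Coordinates are treated as 1-based closed intervals on one reference build. The gene names and coordinates in `PANEL` are hypothetical placeholders, not the contents of Tables 1-7, which are not reproduced in this section. Direct overlap is only a proxy: a variation could also modulate a gene through regulatory sequence outside the gene body, and this check would miss that.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    chrom: str
    start: int  # 1-based, inclusive
    end: int    # 1-based, inclusive

    def overlaps(self, other: "Interval") -> bool:
        # Two closed intervals on the same chromosome overlap when each one
        # starts at or before the other one ends.
        return (self.chrom == other.chrom
                and self.start <= other.end
                and other.start <= self.end)


def genes_hit(variant: Interval, panel: dict[str, Interval]) -> list[str]:
    """Return the sorted names of panel genes whose interval overlaps the variant."""
    return sorted(name for name, gene in panel.items() if variant.overlaps(gene))


# Hypothetical panel: placeholder names and coordinates, NOT taken from Tables 1-7.
PANEL = {
    "GENE_A": Interval("chr2", 50_000_000, 51_000_000),
    "GENE_B": Interval("chr2", 52_000_000, 52_500_000),
    "GENE_C": Interval("chr7", 10_000, 20_000),
}

if __name__ == "__main__":
    deletion = Interval("chr2", 50_900_000, 52_100_000)
    print(genes_hit(deletion, PANEL))  # ['GENE_A', 'GENE_B']
```

With many genes, an interval tree or a sorted sweep would replace the linear scan. The linear version keeps the overlap rule easy to read.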
[0008] In some embodiments, the assaying comprises detecting nucleic acid information from the at least one genetic sample. In some embodiments, the nucleic acid information is detected by one or more methods selected from the group comprising PCR, sequencing, Northern blots, or any combination thereof. In some embodiments, the sequencing comprises one or more high-throughput sequencing methods. In some embodiments, the one or more high-throughput sequencing methods comprise Massively Parallel Signature Sequencing (MPSS), polony sequencing, 454 pyrosequencing, Illumina sequencing, SOLiD sequencing, ion semiconductor sequencing, DNA nanoball sequencing, heliscope single molecule sequencing, single molecule real time (SMRT) sequencing, RNAP sequencing, Nanopore DNA sequencing, sequencing by hybridization, or microfluidic Sanger sequencing. In some embodiments, the at least one genetic sample is collected from blood, saliva, urine, serum, tears, skin, tissue, or hair from the one or more subjects. In some embodiments, the assaying the at least one genetic sample of the one or more subjects comprises purifying nucleic acids from the at least one genetic sample. In some embodiments, the assaying the at least one genetic sample of the one or more subjects comprises amplifying at least one nucleotide sequence in the at least one genetic sample. In some embodiments, the assaying the at least one genetic sample for at least one genetic variation comprises a microarray analysis of the at least one genetic sample. In some embodiments, the microarray analysis comprises a CGH array analysis. In some embodiments, the CGH array detects the presence or absence of the at least one genetic variation.
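In array CGH, each probe reports the log2 ratio of test to reference signal. Against a diploid reference, a single-copy loss would ideally read log2(1/2) = -1 and a single-copy gain log2(3/2) ≈ 0.58, although observed ratios are usually compressed toward zero. The sketch below shows how such ratios could be turned into gain/loss calls. It is a deliberately simple threshold-and-run caller. It is not the patent's stated method, and it is not a segmentation algorithm such as circular binary segmentation. The thresholds and minimum probe count are hypothetical values chosen for illustration.

```python
from itertools import groupby

# Hypothetical calling parameters (assumptions, not from the source);
# real pipelines tune these per platform and per sample quality.
LOSS_THRESHOLD = -0.3  # log2(test/reference) at or below this -> "loss"
GAIN_THRESHOLD = 0.3   # log2(test/reference) at or above this -> "gain"
MIN_PROBES = 3         # minimum consecutive probes needed to report a call


def probe_state(log2_ratio: float) -> str:
    """Classify a single probe's log2 ratio as loss, gain, or neutral."""
    if log2_ratio <= LOSS_THRESHOLD:
        return "loss"
    if log2_ratio >= GAIN_THRESHOLD:
        return "gain"
    return "neutral"


def call_segments(probes: list[tuple[int, float]]) -> list[tuple[str, int, int, int]]:
    """Group consecutive probes that share a non-neutral state.

    `probes` holds (position, log2_ratio) pairs for one chromosome, sorted by
    position. Returns (state, first_position, last_position, n_probes) for
    every run of at least MIN_PROBES probes in the same gain or loss state.
    """
    calls = []
    for state, run in groupby(probes, key=lambda probe: probe_state(probe[1])):
        run = list(run)
        if state != "neutral" and len(run) >= MIN_PROBES:
            calls.append((state, run[0][0], run[-1][0], len(run)))
    return calls


if __name__ == "__main__":
    # Toy data: three probes near -1 (single-copy loss), two probes near +0.58.
    probes = [(100, 0.02), (200, -0.95), (300, -1.05), (400, -0.88),
              (500, 0.05), (600, 0.61), (700, 0.55)]
    print(call_segments(probes))  # [('loss', 200, 400, 3)]
```

The two-probe gain in the toy data is dropped by the `MIN_PROBES` rule. Requiring several adjacent probes is one common way to trade sensitivity for fewer single-probe false positives.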
[0009] In some embodiments, the method further comprises determining whether the one or more subjects has a Pervasive Developmental Disorder (PDD) or a Pervasive Developmental Disorder - Not Otherwise Specified (PDD-NOS), or an altered susceptibility to a PDD or PDD-NOS. In some embodiments, the one or more subjects were previously diagnosed or are suspected as having the PDD or PDD-NOS based on an evaluation by a psychologist, a neurologist, a psychiatrist, a speech therapist, or other professionals who screen subjects for a PDD or a PDD-NOS. In some embodiments, the determining comprises an evaluation of the one or more subjects' communication, socialization, cognitive abilities, body movements, or a combination thereof. In some embodiments, the evaluation comprises observation, a questionnaire, a checklist, a test, or a combination thereof. In some embodiments, the evaluation comprises a Checklist for Autism in Toddlers (CHAT), a Modified Checklist for Autism in Toddlers (M-CHAT), a Screening Tool for Autism in Two-Year-Olds (STAT), a Social Communication Questionnaire (SCQ) for children 4 years of age and older, an Autism Diagnostic Interview-Revised (ADI-R), an Autism Diagnostic Observation Schedule (ADOS), a Childhood Autism Rating Scale (CARS), an Autism Spectrum Screening Questionnaire (ASSQ), an Australian Scale for Asperger's Syndrome, a Childhood Asperger Syndrome Test (CAST), or a combination thereof. In some embodiments, screening the one or more subjects further comprises selecting one or more therapies based on the presence or absence of the one or more genetic variations. In some embodiments, the assaying at least one genetic sample obtained from each of the one or more subjects comprises analyzing the whole genome or whole exome from the one or more subjects.
In some embodiments, the nucleic acid information has already been obtained for the whole genome or whole exome from the one or more individuals and the nucleic acid information is obtained from in silico analysis.
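When whole-genome or whole-exome data already exist, the in silico step often starts from a variant file rather than from a new assay. The sketch below assumes that file is in VCF, which the source does not specify, with CNVs written as symbolic structural-variant alleles (such as `<DEL>` or `<DUP>`) and `SVTYPE` and `END` keys in the INFO column. For such records, POS is the padding base before the event, so the affected bases run from POS+1 to END. The function names are hypothetical, and only DEL and DUP records are kept.

```python
def parse_info(info: str) -> dict[str, str]:
    """Split a VCF INFO column into a dict; flag entries such as IMPRECISE map to 'true'."""
    fields = {}
    for item in info.split(";"):
        key, sep, value = item.partition("=")
        fields[key] = value if sep else "true"
    return fields


def read_cnv_calls(vcf_lines: list[str]) -> list[tuple[str, int, int, str]]:
    """Return (chrom, start, end, svtype) for DEL/DUP records in VCF text lines.

    Assumes symbolic SV alleles with SVTYPE and END in INFO. POS is the
    padding base before the event, so the affected bases are POS+1 .. END.
    """
    calls = []
    for line in vcf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines, meta-information lines and the header line
        cols = line.rstrip("\n").split("\t")
        chrom, pos, info = cols[0], int(cols[1]), parse_info(cols[7])
        svtype = info.get("SVTYPE")
        if svtype in {"DEL", "DUP"} and "END" in info:
            calls.append((chrom, pos + 1, int(info["END"]), svtype))
    return calls


if __name__ == "__main__":
    vcf = [
        "##fileformat=VCFv4.2",
        "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
        "chr2\t50899999\t.\tT\t<DEL>\t.\tPASS\tSVTYPE=DEL;END=52100000;IMPRECISE",
        "chr7\t15000\t.\tA\tG\t.\tPASS\tDP=30",
    ]
    print(read_cnv_calls(vcf))  # [('chr2', 50900000, 52100000, 'DEL')]
```

The resulting (chrom, start, end) tuples can then be compared against gene coordinates with an ordinary interval-overlap check.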
[0010] In some embodiments, the PDD is Autism Spectrum Disorder (ASD). In some embodiments, the PDD-NOS is Asperger Syndrome, Rett Syndrome or Childhood Disintegrative Disorder. In some embodiments, the one or more subjects has at least one symptom of a PDD. In some embodiments, the at least one symptom comprises difficulty with verbal communication, difficulty using language, difficulty understanding language, difficulty with non-verbal communication, difficulty with social interaction, unusual ways of playing with toys and other objects, difficulty adjusting to changes in routine or familiar surroundings, repetitive body movements or patterns of behavior, changing response to sound, temper tantrums, difficulty sleeping, aggressive behavior, fearfulness or anxiety, or a combination thereof. In some embodiments, the at least one symptom comprises not babbling, pointing, or making meaningful gestures by 1 year of age, not speaking one word by 16 months of age, not combining two words by 2 years of age, not responding to their name, losing language, losing social skills, qualitative impairment in social interaction, impairments in the use of multiple nonverbal behaviors to regulate social interaction, failure to develop peer relationships appropriate to developmental level, not spontaneously seeking to share enjoyment or interests or achievements with other people, lacking social or emotional reciprocity, qualitative impairments in verbal communication, repetitive and stereotyped patterns of behavior and interests and activities, encompassing preoccupation with one or more stereotyped and restricted patterns of interest that is abnormal either in intensity or focus, apparently inflexible adherence to specific and nonfunctional routines or rituals, stereotyped and repetitive motor mannerisms, persistent preoccupation with parts of objects, abnormal functioning in symbolic or imaginative play, or a combination thereof.
In some embodiments, the one or more subjects has at least one symptom of a PDD-NOS. In some embodiments, the at least one symptom of a PDD-NOS comprises qualitative impairment in social interaction, marked impairments in the use of multiple nonverbal behaviors to regulate social interaction, failure to develop peer relationships appropriate to developmental level, a lack of spontaneous seeking to share enjoyment or interests or achievements with other people, lack of social or emotional reciprocity, restricted, repetitive and stereotyped patterns of behavior or interests and activities, encompassing preoccupation with one or more stereotyped and restricted patterns of interest, nonfunctional routines or rituals, stereotyped and repetitive motor mannerisms, persistent preoccupation with parts of objects, clinically significant impairments in social or occupational or other important areas of functioning, deceleration of head growth between ages 5 and 48 months, loss of previously acquired purposeful hand skills between ages 5 and 30 months with the subsequent development of stereotyped hand movements, loss of social engagement early in the course, appearance of poorly coordinated gait or trunk movements, severely impaired expressive and receptive language development with severe psychomotor retardation, clinically significant loss of previously acquired skills before age 10 years, impairment in nonverbal behaviors, failure to develop peer relationships, lack of social or emotional reciprocity, qualitative impairments in communication, restricted, repetitive and stereotyped patterns of behavior or interests and activities, or a combination thereof.
[0011] In some embodiments, the one or more subjects is human. In some embodiments, the one or more subjects is less than 12 years old, less than 8 years old, less than 6 years old, or less than 3 years old.
[0012] An aspect of the invention includes a method of diagnosing one or more subjects for a PDD or a PDD-NOS, comprising: assaying at least one genetic sample of each of the one or more subjects for the presence or absence of at least one genetic variation in one or more genes in Tables 1-7.
[0013] In some embodiments, the at least one genetic variation is one encoded by SEQ ID NOs 1-643 or 2418-2557. In some embodiments, the one or more subjects is diagnosed with the PDD or PDD-NOS if the at least one genetic variation is present. In some embodiments, the one or more subjects is not diagnosed with PDD or PDD-NOS if the at least one genetic variation is absent.
[0014] In some embodiments, the assaying comprises detecting nucleic acid information from the at least one genetic sample. In some embodiments, the nucleic acid information is detected by one or more methods selected from the group comprising PCR, sequencing, Northern blots, or any combination thereof. In some embodiments, the sequencing comprises one or more high-throughput sequencing methods. In some embodiments, the one or more high-throughput sequencing methods comprise Massively Parallel Signature Sequencing (MPSS), polony sequencing, 454 pyrosequencing, Illumina sequencing, SOLiD sequencing, ion semiconductor sequencing, DNA nanoball sequencing, heliscope single molecule sequencing, single molecule real time (SMRT) sequencing, RNAP sequencing, Nanopore DNA sequencing, sequencing by hybridization, or microfluidic Sanger sequencing. In some embodiments, the method further comprises determining whether the one or more subjects has a PDD or PDD-NOS or an altered susceptibility to a PDD or PDD-NOS. In some embodiments, the one or more subjects were previously diagnosed or are suspected as having the PDD or PDD-NOS based on an evaluation by a psychologist, a neurologist, a psychiatrist, a speech therapist, or other professionals who screen subjects for a PDD or a PDD-NOS.
[0015] In some embodiments, the determining comprises an evaluation of the one or more subjects' communication, socialization, cognitive abilities, body movements, or a combination thereof. In some embodiments, the evaluation comprises observation, a questionnaire, a checklist, a test, or a combination thereof. In some embodiments, the evaluation comprises a Checklist for Autism in Toddlers (CHAT), a Modified Checklist for Autism in Toddlers (M-CHAT), a Screening Tool for Autism in Two-Year-Olds (STAT), a Social Communication Questionnaire (SCQ) for children 4 years of age and older, an Autism Diagnostic Interview-Revised (ADI-R), an Autism Diagnostic Observation Schedule (ADOS), a Childhood Autism Rating Scale (CARS), an Autism Spectrum Screening Questionnaire (ASSQ), an Australian Scale for Asperger's Syndrome, a Childhood Asperger Syndrome Test (CAST), or a combination thereof. In some embodiments, the determining comprises comparing the nucleic acid information to that of one or more other subjects.
[0016] In some embodiments, the one or more subjects comprise one or more subjects not suspected of having the PDD or the PDD-NOS. In some embodiments, the one or more other subjects comprise one or more subjects suspected of having the PDD or the PDD-NOS. In some embodiments, the one or more subjects comprise one or more subjects with the PDD or the PDD-NOS. In some embodiments, the one or more other subjects comprise one or more subjects without the PDD or the PDD-NOS. In some embodiments, the one or more subjects comprise one or more subjects who are symptomatic for the PDD or the PDD-NOS. In some embodiments, the one or more other subjects comprise one or more subjects who are asymptomatic for the PDD or the PDD-NOS. In some embodiments, the one or more subjects comprise one or more subjects that have an increased susceptibility to the PDD or the PDD-NOS. In some embodiments, the one or more subjects comprise one or more subjects that have a decreased susceptibility to the PDD or the PDD-NOS. In some embodiments, the one or more subjects comprise one or more subjects receiving a treatment, therapeutic regimen, or any combination thereof for a PDD or PDD-NOS.
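One common way to compare nucleic acid information between subjects with the PDD or PDD-NOS and other subjects is to test whether a low-frequency variant is carried more often by cases than by controls. The source does not name a statistical test. The sketch below uses a one-sided Fisher's exact test on a 2×2 carrier table, plus an odds ratio with a 0.5 Haldane-Anscombe correction when any cell is zero, which is common for rare variants. Both the test and the correction are swapped-in choices, and the counts in the example are hypothetical.

```python
from fractions import Fraction
from math import comb


def fisher_one_sided(a: int, b: int, c: int, d: int) -> float:
    """One-sided Fisher's exact p-value for carrier enrichment in cases.

    Table layout: [[a, b], [c, d]] with rows = cases / other subjects and
    columns = carriers / non-carriers. Returns P(X >= a) under the
    hypergeometric null with all margins fixed.
    """
    n_cases = a + b
    n_carriers = a + c
    n_total = a + b + c + d
    denom = comb(n_total, n_cases)
    # Exact rational arithmetic avoids overflow from very large binomials.
    tail = sum(
        Fraction(comb(n_carriers, x) * comb(n_total - n_carriers, n_cases - x), denom)
        for x in range(a, min(n_carriers, n_cases) + 1)
    )
    return float(tail)


def odds_ratio(a: int, b: int, c: int, d: int, pseudocount: float = 0.5) -> float:
    """Odds ratio (a*d)/(b*c); adds `pseudocount` to every cell if any cell is zero."""
    if 0 in (a, b, c, d):
        a, b, c, d = (v + pseudocount for v in (a, b, c, d))
    return (a * d) / (b * c)


if __name__ == "__main__":
    # Hypothetical counts: 5 of 1000 cases carry the variant, 0 of 1000 controls do.
    print(fisher_one_sided(5, 995, 0, 1000))
    print(odds_ratio(5, 995, 0, 1000))
```

For panels of many variants, per-variant p-values like these would normally also need a multiple-testing correction before any variant is reported as associated.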
[0017] In some embodiments, determining whether the one or more subjects have the PDD or the PDD-NOS or an altered susceptibility to the PDD or the PDD-NOS comprises analyzing at least one behavioral analysis of the one or more subjects, the nucleic acid sequence information of the one or more subjects, or a combination thereof.
[0018] In some embodiments, the at least one genetic sample is collected from blood, saliva, urine, serum, tears, skin, tissue, or hair from the one or more subjects. In some embodiments, the assaying the at least one genetic sample of the one or more subjects comprises purifying nucleic acids from the at least one genetic sample. In some embodiments, the assaying the at least one genetic sample of the one or more subjects comprises amplifying at least one nucleotide sequence in the at least one genetic sample. In some embodiments, the assaying the at least one genetic sample for at least one genetic variation comprises a microarray analysis of the at least one genetic sample. In some embodiments, the microarray analysis comprises a CGH array analysis. In some embodiments, the CGH array detects the presence or absence of the at least one genetic variation. In some embodiments, the at least one genetic variation comprises one or more point mutations, polymorphisms, translocations, insertions, deletions, amplifications, inversions, microsatellites, interstitial deletions, copy number variations (CNVs), or any combination thereof. In some embodiments, the at least one genetic variation comprises a loss of heterozygosity. In some embodiments, the at least one genetic variation disrupts or modulates one or more genomic sequences of SEQ ID NOs 644-2417 or 2558-2739. In some embodiments, the at least one genetic variation disrupts or modulates the expression or function of one or more RNA transcripts from the one or more genomic sequences of SEQ ID NOs 644-2417 or 2558-2739.
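Loss of heterozygosity typically appears in SNP genotype data as a long stretch of consecutive homozygous calls where heterozygous calls would otherwise be expected. The sketch below finds such runs in a list of VCF-style genotypes. It is a hypothetical helper, not the patent's method. The `min_snps` default is an illustrative threshold, not a validated one. Missing calls are skipped rather than breaking a run. A run of homozygosity is consistent with LOH, but it can also reflect a deletion or shared parental ancestry. Telling these apart needs copy-number information as well.

```python
def homozygous_runs(genotypes: list[tuple[int, str]], min_snps: int = 50) -> list[tuple[int, int, int]]:
    """Find runs of consecutive homozygous genotype calls on one chromosome.

    `genotypes` is a list of (position, genotype) pairs sorted by position,
    with genotypes such as '0/1' or '1|1'. Calls containing '.' (missing)
    are skipped without ending a run. Returns (first_position, last_position,
    n_snps) for each run of at least `min_snps` homozygous calls.
    """
    runs = []
    run_start = last_pos = None
    count = 0
    for pos, gt in genotypes:
        alleles = gt.replace("|", "/").split("/")
        if "." in alleles:
            continue  # missing call: neither extends nor breaks the run
        if len(set(alleles)) == 1:  # homozygous, e.g. 0/0 or 1/1
            if count == 0:
                run_start = pos
            count += 1
            last_pos = pos
        else:  # a heterozygous call ends the current run
            if count >= min_snps:
                runs.append((run_start, last_pos, count))
            count = 0
    if count >= min_snps:  # close a run that reaches the end of the list
        runs.append((run_start, last_pos, count))
    return runs


if __name__ == "__main__":
    calls = [(i * 1000, "0/0") for i in range(1, 6)] + [(6000, "0/1")]
    calls += [(7000 + i, "1/1") for i in range(3)]
    print(homozygous_runs(calls, min_snps=3))  # [(1000, 5000, 5), (7000, 7002, 3)]
```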
[0019] In some embodiments, the assaying at least one genetic sample obtained from each of the one or more subjects comprises analyzing the whole genome or whole exome from the one or more subjects. In some embodiments, the nucleic acid information has already been obtained for the whole genome or whole exome from the one or more individuals and the nucleic acid information is obtained from in silico analysis. In some embodiments, the method further comprises selecting one or more therapies based on the presence or absence of the one or more genetic variations.

[0020] In some embodiments, the PDD is ASD. In some embodiments, the PDD-NOS is Asperger Syndrome, Rett Syndrome or Childhood Disintegrative Disorder. In some embodiments, the one or more subjects has at least one symptom of a PDD. In some embodiments, the at least one symptom comprises difficulty with verbal communication, difficulty using language, difficulty understanding language, difficulty with non-verbal communication, difficulty with social interaction, unusual ways of playing with toys and other objects, difficulty adjusting to changes in routine or familiar surroundings, repetitive body movements or patterns of behavior, changing response to sound, temper tantrums, difficulty sleeping, aggressive behavior, fearfulness or anxiety, or a combination thereof. In some embodiments, the at least one symptom comprises not babbling, pointing, or making meaningful gestures by 1 year of age, not speaking one word by 16 months of age, not combining two words by 2 years of age, not responding to their name, losing language, losing social skills, qualitative impairment in social interaction, impairments in the use of multiple nonverbal behaviors to regulate social interaction, failure to develop peer relationships appropriate to developmental level, not spontaneously seeking to share enjoyment or interests or achievements with other people, lacking social or emotional reciprocity, qualitative impairments in verbal communication, repetitive and stereotyped patterns of behavior and interests and activities, encompassing preoccupation with one or more stereotyped and restricted patterns of interest that is abnormal either in intensity or focus, apparently inflexible adherence to specific and nonfunctional routines or rituals, stereotyped and repetitive motor mannerisms, persistent preoccupation with parts of objects, abnormal functioning in symbolic or imaginative play, or a combination thereof. In some embodiments, the one or more subjects has at least one symptom of a PDD-NOS.
In some embodiments, the at least one symptom of a PDD-NOS comprises qualitative impairment in social interaction, marked impairments in the use of multiple nonverbal behaviors to regulate social interaction, failure to develop peer relationships appropriate to developmental level, a lack of spontaneous seeking to share enjoyment or interests or achievements with other people, lack of social or emotional reciprocity, restricted, repetitive and stereotyped patterns of behavior or interests and activities, encompassing preoccupation with one or more stereotyped and restricted patterns of interest, nonfunctional routines or rituals, stereotyped and repetitive motor mannerisms, persistent preoccupation with parts of objects, clinically significant impairments in social or occupational or other important areas of functioning, deceleration of head growth between ages 5 and 48 months, loss of previously acquired purposeful hand skills between ages 5 and 30 months with the subsequent development of stereotyped hand movements, loss of social engagement early in the course, appearance of poorly coordinated gait or trunk movements, severely impaired expressive and receptive language development with severe psychomotor retardation, clinically significant loss of previously acquired skills before age 10 years, impairment in nonverbal behaviors, failure to develop peer relationships, lack of social or emotional reciprocity, qualitative impairments in communication, restricted, repetitive and stereotyped patterns of behavior or interests and activities, or a combination thereof.
[0021] In some embodiments, the one or more subjects is human. In some embodiments, the one or more subjects is less than 12 years old, less than 8 years old, less than 6 years old, or less than 3 years old.
[0022] One aspect of the invention includes a method of screening for a therapeutic agent for treatment of a PDD or a PDD-NOS, comprising identifying an agent that disrupts or modulates one or more genomic sequences of SEQ ID NOs 644-2417 or 2558-2739 or one or more expression products thereof.
[0023] In some embodiments, the one or more expression products comprise one or more RNA transcripts. In some embodiments, the one or more RNA transcripts comprise one or more RNA transcripts of Tables 4 and/or 7. In some embodiments, the one or more expression products comprise one or more polypeptides. In some embodiments, the one or more polypeptides are translated from one or more RNA transcripts of Tables 4 and/or 7. In some embodiments, disrupting or modulating the one or more genomic sequences of SEQ ID NOs 644-2417 or 2558-2739 or expression products thereof comprises an increase in expression of the one or more expression products. In some embodiments, disrupting or modulating the one or more genomic sequences of SEQ ID NOs 644-2417 or 2558-2739 or expression products thereof comprises a decrease in expression of the one or more expression products.
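Whether a candidate agent increases or decreases expression of an RNA transcript could be quantified in several ways, and the source does not specify one. The sketch below uses the widely used comparative Ct (2^-ΔΔCt) method for RT-qPCR. That method assumes roughly 100% amplification efficiency for both the target transcript and the reference gene. The Ct values and the two-fold cut-off used by `classify` are hypothetical.

```python
def fold_change_ddct(ct_target_treated: float, ct_ref_treated: float,
                     ct_target_control: float, ct_ref_control: float) -> float:
    """Relative expression of a target transcript by the 2^-ddCt method.

    Each sample's target Ct is normalized to a reference gene (dCt), and the
    treated dCt is compared with the untreated control dCt (ddCt). Assumes
    ~100% PCR efficiency for both target and reference.
    """
    delta_treated = ct_target_treated - ct_ref_treated
    delta_control = ct_target_control - ct_ref_control
    ddct = delta_treated - delta_control
    return 2.0 ** (-ddct)


def classify(fold_change: float, threshold: float = 2.0) -> str:
    """Label a fold change using a hypothetical symmetric cut-off (default 2-fold)."""
    if fold_change >= threshold:
        return "increase"
    if fold_change <= 1.0 / threshold:
        return "decrease"
    return "no clear change"


if __name__ == "__main__":
    # Hypothetical Ct values: treated target 24.0, reference 18.0;
    # untreated control target 26.0, reference 18.0.
    fc = fold_change_ddct(24.0, 18.0, 26.0, 18.0)
    print(fc, classify(fc))  # 4.0 increase
```

A lower Ct means more starting template, so a treated ΔCt that is 2 cycles lower than the control corresponds to about a 4-fold increase under the efficiency assumption.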
[0024] An aspect of the invention includes a method of treating a subject for a PDD or a PDD-NOS, comprising administering one or more agents to disrupt or modulate one or more genomic sequences of SEQ ID NOs 644-2417 or 2558-2739 or one or more expression products thereof, thereby treating the PDD or the PDD-NOS.
[0025] In some embodiments, the one or more expression products comprise one or more RNA transcripts. In some embodiments, the one or more RNA transcripts comprise one or more RNA transcripts of Tables 4 and/or 7. In some embodiments, the one or more expression products comprise one or more polypeptides. In some embodiments, the one or more polypeptides are translated from one or more RNA transcripts of Tables 4 and/or 7. In some embodiments, the one or more agents are selected from the group comprising: an antibody, a drug, a combination of drugs, a compound, a combination of compounds, radiation, a genetic sequence, a combination of genetic sequences, heat, cryogenics, and any combination of two or more thereof.
[0026] In some embodiments, the PDD is ASD. In some embodiments, the PDD-NOS is Asperger Syndrome, Rett Syndrome or Childhood Disintegrative Disorder. In some embodiments, the one or more subjects has at least one symptom of a PDD. In some embodiments, the at least one symptom comprises difficulty with verbal communication, difficulty using language, difficulty understanding language, difficulty with non-verbal communication, difficulty with social interaction, unusual ways of playing with toys and other objects, difficulty adjusting to changes in routine or familiar surroundings, repetitive body movements or patterns of behavior, changing response to sound, temper tantrums, difficulty sleeping, aggressive behavior, fearfulness or anxiety, or a combination thereof. In some embodiments, the at least one symptom comprises not babbling, pointing, or making meaningful gestures by 1 year of age, not speaking one word by 16 months of age, not combining two words by 2 years of age, not responding to their name, losing language, losing social skills, qualitative impairment in social interaction, impairments in the use of multiple nonverbal behaviors to regulate social interaction, failure to develop peer relationships appropriate to developmental level, not spontaneously seeking to share enjoyment or interests or achievements with other people, lacking social or emotional reciprocity, qualitative impairments in verbal communication, repetitive and stereotyped patterns of behavior and interests and activities, encompassing preoccupation with one or more stereotyped and restricted patterns of interest that is abnormal either in intensity or focus, apparently inflexible adherence to specific and nonfunctional routines or rituals, stereotyped and repetitive motor mannerisms, persistent preoccupation with parts of objects, abnormal functioning in symbolic or imaginative play, or a combination thereof. In some embodiments, the one or more subjects has at least one symptom of a PDD-NOS.
In some embodiments, the at least one symptom of a PDD-NOS comprises qualitative impairment in social interaction, marked impairments in the use of multiple nonverbal behaviors to regulate social interaction, failure to develop peer relationships appropriate to developmental level, a lack of spontaneous seeking to share enjoyment or interest or achievements with other people, lack of social or emotional reciprocity, restricted, repetitive, and stereotyped patterns of behavior or interests and activities, encompassing preoccupation with one or more stereotyped and restricted patterns of interest, nonfunctional routines or rituals, stereotyped and repetitive motor mannerisms, persistent preoccupation with parts of objects, clinically significant impairments in social, occupational, or other important areas of functioning, deceleration of head growth between ages 5 and 48 months, loss of previously acquired purposeful hand skills between ages 5 and 30 months with the subsequent development of stereotyped hand movements, loss of social engagement early in the course, appearance of poorly coordinated gait or trunk movements, severely impaired expressive and receptive language development with severe psychomotor retardation, clinically significant loss of previously acquired skills before age 10 years, impairment in nonverbal behaviors, failure to develop peer relationships, lack of social or emotional reciprocity, qualitative impairments in communication, restricted, repetitive, and stereotyped patterns of behavior or interests and activities, or a combination thereof.
[0027] In some embodiments, the one or more subjects is human. In some embodiments, the one or more subjects is less than 12 years old, less than 8 years old, less than 6 years old, or less than 3 years old.
[0028] An aspect of the invention includes a kit for screening for a PDD or PDD-NOS in one or more subjects, the kit comprising reagents for assaying a genetic sample from the one or more subjects for the presence of at least one genetic variation encoded by SEQ ID NOs 1-643 or 2418-2557.
[0029] In some embodiments, the at least one genetic variation disrupts or modulates one or more genomic sequences of SEQ ID NOs 644-2417 or 2558-2739, or one or more expression products thereof. In some embodiments, the one or more expression products comprise one or more RNA transcripts. In some embodiments, the one or more RNA transcripts comprise one or more RNA transcripts of Tables 4 and/or 7. In some embodiments, the one or more expression products comprise one or more polypeptides. In some embodiments, the one or more polypeptides are translated from one or more RNA transcripts of Tables 4 and/or 7.
[0030] In some embodiments, the reagents comprise nucleic acid probes. In some embodiments, the reagents comprise oligonucleotides. In some embodiments, the reagents comprise primers.
[0031] In some embodiments, the PDD is ASD. In some embodiments, the PDD-NOS is Asperger Syndrome, Rett Syndrome or Childhood Disintegrative Disorder. In some embodiments, the one or more subjects has at least one symptom of a PDD. In some embodiments, the one or more subjects has at least one symptom of a PDD-NOS.
[0032] In some embodiments, the one or more subjects is human. In some embodiments, the one or more subjects is less than 12 years old, less than 8 years old, less than 6 years old, or less than 3 years old.

[0033] An aspect of the invention includes an isolated polynucleotide sequence or fragment thereof, comprising at least 60% identity to any polynucleotide sequence of SEQ ID NOs 1 to 2739.
[0034] In some embodiments, the isolated polynucleotide sequence comprises at least 70% identity to any polynucleotide sequence of SEQ ID NOs 1 to 2739. In some embodiments, the isolated polynucleotide sequence comprises at least 80% identity to any polynucleotide sequence of SEQ ID NOs 1 to 2739. In some embodiments, the isolated polynucleotide sequence comprises at least 90% identity to any polynucleotide sequence of SEQ ID NOs 1 to 2739.
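The identity thresholds above (60%, 70%, 80%, 90%) presuppose a way of scoring identity between two sequences, which this section does not define. One common reading is the number of identical aligned positions divided by the length of a global alignment. The sketch below assumes that definition, together with simple hypothetical scoring parameters (match +1, mismatch -1, gap -2); tools such as BLAST or EMBOSS use other alignment modes and denominators, so their numbers can differ. The quadratic dynamic-programming table is only practical for short sequences such as probes or small fragments, not for CNV-sized regions.

```python
def global_alignment_counts(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -2):
    """Needleman-Wunsch global alignment; returns (identical positions, alignment length)."""
    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap

    def pair(i: int, j: int) -> int:
        return match if a[i - 1] == b[j - 1] else mismatch

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(score[i - 1][j - 1] + pair(i, j),
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)

    # Trace back one optimal path, counting alignment columns and identical pairs.
    i, j, identical, length = n, m, 0, 0
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair(i, j):
            identical += int(a[i - 1] == b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            i -= 1  # gap in b
        else:
            j -= 1  # gap in a
        length += 1
    return identical, length


def percent_identity(a: str, b: str) -> float:
    """Identical aligned positions as a percentage of the global alignment length."""
    identical, length = global_alignment_counts(a.upper(), b.upper())
    return 100.0 * identical / length if length else 0.0


def meets_identity_threshold(query: str, reference: str, threshold: float = 60.0) -> bool:
    return percent_identity(query, reference) >= threshold


print(percent_identity("ACGTACGT", "ACGTCGT"))  # 87.5: one gap across 8 alignment columns
```

Because the denominator is the alignment length, gaps lower the identity just as mismatches do; a definition that divides by the shorter sequence length would report a higher value for the same pair.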
[0035] An aspect of the invention includes an isolated polynucleotide sequence comprising at least 60% identity to a complement of any polynucleotide sequence of SEQ ID NOs 1 to 2739.
[0036] In some embodiments, the isolated polynucleotide sequence comprises at least 70% identity to a complement of any polynucleotide sequence of SEQ ID NOs 1 to 2739. In some embodiments, the isolated polynucleotide sequence comprises at least 80% identity to a complement of any polynucleotide sequence of SEQ ID NOs 1 to 2739. In some embodiments, the isolated polynucleotide sequence comprises at least 90% identity to a complement of any polynucleotide sequence of SEQ ID NOs 1 to 2739. In some embodiments, the isolated polynucleotide sequence comprises any CNV of SEQ ID NOs 1-643 or 2418-2557. In some embodiments, the isolated polynucleotide sequence comprises any genomic sequence of SEQ ID NOs 644-2417 or 2558-2739. In some embodiments, the isolated polynucleotide sequence comprises an RNA sequence transcribed from a genomic sequence of SEQ ID NOs 644-2417 or 2558-2739. In some embodiments, the isolated polynucleotide sequence comprises any genetic variation not present in the human genome.
[0037] An aspect of the invention includes an isolated polypeptide encoded by an RNA sequence transcribed from any genomic sequence of SEQ ID NOs 644-2417 or 2558-2739.
[0038] An aspect of the invention includes a host cell comprising an expression control sequence operably linked to a polynucleotide selected from the group consisting of any polynucleotide sequence of SEQ ID NOs 644-2417 or 2558-2739, or a fragment thereof.
[0039] In some embodiments, the expression control sequence is non-native to the host cell. In some embodiments, the expression control sequence is native to the host cell.
[0040] An aspect of the invention includes a method for identifying an agent having a therapeutic benefit for treatment of a PDD or a PDD-NOS, comprising: a) providing cells comprising at least one genetic variation of SEQ ID NOs 1-643 or 2418-2557; b) contacting the cells of step a) with a test agent; and c) analyzing whether the agent has a therapeutic benefit for treatment of the PDD or the PDD-NOS of step a), thereby identifying agents which have a therapeutic benefit for treatment of the PDD or the PDD-NOS.
[0041] In some embodiments, the method further comprises: d) providing cells which do not comprise at least one genetic variation of SEQ ID NOs 1-643 or 2418-2557; e) contacting the cells of steps a) and d) with a test agent; and f) analyzing whether the agent has a therapeutic benefit for treatment of the PDD or the PDD-NOS of step a) relative to those of step d), thereby identifying agents which have a therapeutic benefit for treatment of the PDD or the PDD-NOS.
In some embodiments, the therapeutic agent has efficacy for the treatment of a PDD or a PDD-NOS.
[0042] An aspect of the invention includes a therapeutic agent identified by any of the methods described herein.
[0043] An aspect of the invention includes a panel of biomarkers for a PDD or a PDD-NOS comprising one or more genes contained in the one or more polynucleotide sequences selected from SEQ ID NOs 644-2417 or 2558-2739.
[0044] In some embodiments, the panel comprises two or more genes contained in the one or more polynucleotide sequences selected from SEQ ID NOs 644-2417 or 2558-2739.
In some embodiments, the panel comprises at least 5, 10, 25, 50, 100 or 200 genes contained in the one or more polynucleotide sequences selected from SEQ ID NOs 644-2417 or 2558-2739.
In some embodiments, at least one of the polynucleotide sequences is a fragment of the one or more polynucleotide sequences selected from SEQ ID NOs 644-2417 or 2558-2739. In some embodiments, at least one of the polynucleotide sequences is a variant of the one or more polynucleotide sequences selected from SEQ ID NOs 644-2417 or 2558-2739. In some embodiments, the panel is selected for analysis of polynucleotide expression levels for a PDD or a PDD-NOS. In some embodiments, the polynucleotide expression levels are mRNA expression levels. In some embodiments, the panel is used in the management of patient care for a PDD or a PDD-NOS, wherein the management of patient care includes one or more of risk assessment, early diagnosis, prognosis establishment, patient treatment monitoring, and treatment efficacy detection. In some embodiments, the panel is used in the discovery of therapeutic interventions for a PDD or a PDD-NOS.
[0045] An aspect of the invention includes a method for measuring expression levels of polynucleotide sequences from biomarkers for a PDD or a PDD-NOS in a subject, comprising:
a) selecting a panel of biomarkers comprising two or more genes contained in one or more polynucleotide sequences selected from SEQ ID NOs 644-2417 or 2558-2739; b) isolating cellular RNA from a sample obtained from the subject; c) synthesizing cDNA from the cellular RNA for each biomarker in the panel using suitable primers; d) optionally amplifying the cDNA; and e) quantifying levels of the cDNA from the sample.
[0046] In some embodiments, the step of selecting a panel of biomarkers comprises selecting at least 5, 10, 25, 50, 100 or 200 genes contained in one or more polynucleotide sequences selected from SEQ ID NOs 644-2417 or 2558-2739. In some embodiments, the step of quantifying the levels of cDNA further comprises labeling cDNA. In some embodiments, labeling cDNA comprises labeling with at least one chromophore. In some embodiments, the cDNA levels for the sample are compared to a control cDNA level. In some embodiments, the comparison is used in the management of patient care in PDD or PDD-NOS. In some embodiments, the management of patient care includes one or more of risk assessment, early diagnosis, establishing prognosis, monitoring patient treatment, and detecting treatment efficacy. In some embodiments, the comparison is used in the discovery of therapeutic interventions for PDD or PDD-NOS.
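The expression-measurement method of [0045] and the control comparison of [0046] leave the comparison metric open. A minimal sketch, assuming quantified cDNA levels are available as non-negative numbers per biomarker and that a log2 fold change against a matched control, with a hypothetical cutoff of plus or minus 1 (a two-fold change), is an acceptable summary. The gene names and values are placeholders, not data from this disclosure, and the pseudocount is an assumption added only to avoid taking the logarithm of zero.

```python
import math


def log2_fold_changes(sample: dict, control: dict, pseudocount: float = 1.0) -> dict:
    """log2((sample + pseudocount) / (control + pseudocount)) for biomarkers present in both."""
    return {
        gene: math.log2((sample[gene] + pseudocount) / (control[gene] + pseudocount))
        for gene in sample
        if gene in control
    }


def call_direction(fold_changes: dict, cutoff: float = 1.0) -> dict:
    """Label each biomarker as increased, decreased, or unchanged relative to control."""
    calls = {}
    for gene, lfc in fold_changes.items():
        if lfc >= cutoff:
            calls[gene] = "increased"
        elif lfc <= -cutoff:
            calls[gene] = "decreased"
        else:
            calls[gene] = "unchanged"
    return calls


# Hypothetical quantified cDNA levels for a three-gene panel.
sample_levels = {"GENE_A": 31.0, "GENE_B": 7.0, "GENE_C": 15.0}
control_levels = {"GENE_A": 15.0, "GENE_B": 15.0, "GENE_C": 15.0}
print(call_direction(log2_fold_changes(sample_levels, control_levels)))
```

A symmetric log scale is used so that a doubling and a halving have equal magnitude; in practice the cutoff would need to be set from replicate variability rather than fixed at two-fold.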
[0047] An aspect of the invention includes a method for measuring expression levels of polypeptides comprising: a) selecting a panel of biomarkers comprising at least two polypeptides encoded by an RNA sequence transcribed from a genomic sequence of SEQ ID NOs 644-2417 or 2558-2739; b) obtaining a biological sample; c) creating an antibody panel for each biomarker in the panel; d) using the antibody panel to bind the polypeptides from the sample; and e) quantifying levels of the polypeptides bound from the sample to the antibody panel.
[0048] In some embodiments, the polypeptide levels of the biological sample are increased or decreased compared to the polypeptide levels of a control biological sample. In some embodiments, the subject is treated for a PDD or PDD-NOS based on the quantified levels of the polypeptides bound from the sample to the antibody panel. In some embodiments, the treatment of a subject includes one or more of risk assessment, early diagnosis, establishing prognosis, monitoring patient treatment, and detecting treatment efficacy. In some embodiments, the comparison is used in the discovery of a therapeutic intervention for a PDD or PDD-NOS.
[0049] An aspect of the invention includes a kit for the determination of PDD or PDD-NOS comprising: at least one reagent that is used in analysis of one or more polynucleotide expression levels for a panel of biomarkers for PDD or PDD-NOS, wherein the panel comprises two or more genes contained in one or more polynucleotide sequences selected from SEQ ID NOs 644-2417 or 2558-2739, and instructions for using the kit for analyzing the expression levels.
[0050] In some embodiments, the one or more polynucleotide expression levels comprise one or more RNA transcript expression levels. In some embodiments, the one or more RNA transcript expression levels correspond to one or more RNA transcripts of Tables 4 and/or 7. In some embodiments, the at least one reagent comprises at least two sets of suitable primers. In some embodiments, the at least one reagent comprises a reagent for the preparation of cDNA. In some embodiments, the at least one reagent comprises a reagent that is used for detection and quantification of polynucleotides. In some embodiments, the at least one reagent comprises at least one chromophore.
[0051] An aspect of the invention includes a kit for the determination of PDD or PDD-NOS comprising: at least one reagent that is used in analysis of polypeptide expression levels for a panel of biomarkers for PDD or PDD-NOS, wherein the panel comprises at least two polypeptides expressed from two or more genes contained in one or more polynucleotide sequences selected from SEQ ID NOs 644-2417 or 2558-2739; and instructions for using the kit for analyzing the expression levels.
[0052] In some embodiments, the reagent is an antibody reagent that binds a polypeptide selected in the panel. In some embodiments, the kit further comprises a reagent that is used for detection of a bound polypeptide. In some embodiments, the reagent includes a second antibody.
[0053] An aspect of the invention includes a method of screening a subject for a PDD or PDD-NOS, the method comprising: a) assaying a nucleic acid sample obtained from the subject by PCR, array Comparative Genomic Hybridization, sequencing, SNP genotyping, or Fluorescence in Situ Hybridization to detect sequence information for more than one genetic locus; b) comparing the sequence information to a panel of nucleic acid biomarkers, wherein the panel comprises at least one nucleic acid biomarker for each of the more than one genetic loci, and wherein the panel comprises at least 2 low frequency nucleic acid biomarkers, wherein the low frequency nucleic acid biomarkers occur at a frequency of 0.1% or less in a population of subjects without a diagnosis of the PDD or PDD-NOS; and c) screening the subject for the presence or absence of the PDD or the PDD-NOS if one or more of the low frequency biomarkers in the panel are present in the sequence information.
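Steps b) and c) of the screening method above amount to intersecting a subject's detected variants with the low-frequency members of a panel. The sketch below assumes each biomarker is keyed by a hypothetical string identifier (chromosome coordinates plus a gain/loss label) and stores its frequency in the population without a diagnosis as a fraction, so the 0.1% cutoff becomes 0.001. Only the SYNGAP1 deletion coordinates and its 0 of 1005 control count come from this document; the other identifiers and frequencies are invented for illustration.

```python
LOW_FREQUENCY_CUTOFF = 0.001  # 0.1% expressed as a fraction


def low_frequency_markers(panel: dict, cutoff: float = LOW_FREQUENCY_CUTOFF) -> set:
    """Return panel biomarkers whose frequency in unaffected subjects is <= cutoff."""
    return {marker for marker, freq in panel.items() if freq <= cutoff}


def screen_subject(subject_variants: set, panel: dict,
                   cutoff: float = LOW_FREQUENCY_CUTOFF) -> list:
    """Return the low-frequency panel biomarkers present in the subject's sequence information."""
    low = low_frequency_markers(panel, cutoff)
    if len(low) < 2:
        raise ValueError("panel must contain at least 2 low frequency biomarkers")
    return sorted(subject_variants & low)


panel = {
    "chr6:33400195-33511247:loss": 0.0,  # SYNGAP1 deletion, 0 of 1005 controls (Figure 1)
    "chrX:1000-2000:gain": 0.0005,       # hypothetical
    "chr1:5000-9000:loss": 0.02,         # hypothetical; too common to count as low frequency
}
hits = screen_subject({"chr6:33400195-33511247:loss", "chr1:5000-9000:loss"}, panel)
print(hits)  # a non-empty list means at least one low-frequency biomarker was found
```

Real variant calls rarely match panel coordinates exactly, so a practical implementation would match by overlap (for example, a minimum reciprocal overlap between CNV intervals) rather than by identical string keys.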
[0054] In some embodiments, the panel comprises at least 5, 10, 25, 50, 100 or 200 low frequency nucleic acid biomarkers. In some embodiments, the presence or absence of the PDD or the PDD-NOS in the subject is determined with at least 50% confidence. In some embodiments, the low frequency biomarkers occur at a frequency of 0.01% or less, 0.001% or less, or 0.0001% or less in a population of subjects without a diagnosis of the PDD or the PDD-NOS. In some embodiments, the panel of nucleic acid biomarkers comprises at least two genes contained in the one or more polynucleotide sequences selected from SEQ ID NOs 644-2417 or 2558-2739. In some embodiments, the PDD is ASD.
[0055] In some embodiments, the PDD-NOS is Asperger Syndrome, Rett Syndrome or Childhood Disintegrative Disorder. In some embodiments, the method further comprises identifying a therapeutic agent useful for treating the PDD or the PDD-NOS. In some embodiments, the method further comprises administering one or more of the therapeutic agents to the subject if one or more of the low frequency biomarkers in the panel are present in the sequence information.
[0056] An aspect of the invention includes a kit for screening a subject for a PDD or a PDD-NOS, the kit comprising at least one reagent for assaying a nucleic acid sample from the subject for information on a panel of nucleic acid biomarkers, wherein the panel comprises at least 2 low frequency biomarkers, and wherein the low frequency biomarkers occur at a frequency of 0.1% or less in a population of subjects without a diagnosis of the PDD or the PDD-NOS.
[0057] In some embodiments, a presence or absence of the PDD or the PDD-NOS in the subject is determined with 50% confidence. In some embodiments, the panel comprises at least 5, 10, 25, 50, 100 or 200 low frequency nucleic acid biomarkers. In some embodiments, the low frequency biomarkers occur at a frequency of 0.01% or less, 0.001% or less, or 0.0001% or less in a population of subjects without a diagnosis of the PDD or PDD-NOS. In some embodiments, the panel of nucleic acid biomarkers comprises at least two genes contained in the one or more polynucleotide sequences selected from SEQ ID NOs 644-2417 or 2558-2739. In some embodiments, the at least one reagent comprises at least two sets of suitable primers. In some embodiments, the at least one reagent comprises a reagent for the preparation of cDNA. In some embodiments, the at least one reagent comprises a reagent that is used for detection and quantification of polynucleotides. In some embodiments, the at least one reagent comprises at least one chromophore.
[0058] An aspect of the invention includes a method of generating a panel of nucleic acid biomarkers comprising: a) assaying a nucleic acid sample from a first population of subjects by PCR, array Comparative Genomic Hybridization, sequencing, SNP genotyping, or Fluorescence in Situ Hybridization for nucleic acid sequence information, wherein the subjects of the first population have a diagnosis of a PDD or a PDD-NOS; b) assaying a nucleic acid sample from a second population of subjects by PCR, array Comparative Genomic Hybridization, sequencing, SNP genotyping, or Fluorescence in Situ Hybridization for nucleic acid sequence information, wherein the subjects of the second population are without a diagnosis of a PDD or a PDD-NOS; c) comparing the nucleic acid sequence information from step (a) to that of step (b); d) determining the frequency of one or more biomarkers from the comparing step; and e) generating the panel of nucleic acid biomarkers, wherein the panel comprises at least 2 low frequency biomarkers, and wherein the low frequency biomarkers occur at a frequency of 0.1% or less in a population of subjects without a diagnosis of a PDD or a PDD-NOS.
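Steps c) to e) of [0058] can be sketched as counting, for each variant seen in the first (diagnosed) population, how often it occurs in the second (undiagnosed) population and keeping those at or below 0.1%. This sketch assumes "frequency" means the fraction of control subjects carrying the variant (carrier frequency, not allele frequency) and adds a hypothetical `min_cases` parameter requiring the variant to appear in at least one diagnosed subject; variant identifiers are placeholders.

```python
from collections import Counter


def build_panel(case_calls: list, control_calls: list,
                cutoff: float = 0.001, min_cases: int = 1) -> dict:
    """Compare case and control variant calls and keep low-frequency biomarkers.

    case_calls / control_calls: one collection of variant identifiers per subject.
    Returns {variant: carrier frequency among controls}.
    """
    if not control_calls:
        raise ValueError("need at least one control subject")
    # count each variant at most once per subject
    case_counts = Counter(v for calls in case_calls for v in set(calls))
    control_counts = Counter(v for calls in control_calls for v in set(calls))
    n_controls = len(control_calls)

    panel = {}
    for variant, n_case in case_counts.items():
        control_freq = control_counts[variant] / n_controls  # Counter returns 0 if absent
        if n_case >= min_cases and control_freq <= cutoff:
            panel[variant] = control_freq
    if len(panel) < 2:
        raise ValueError("fewer than 2 low frequency biomarkers; panel not generated")
    return panel


cases = [{"v1"}, {"v1", "v2"}, {"v3", "v4"}]
controls = [set() for _ in range(997)] + [{"v3"}, {"v4"}, {"v4"}]  # 1000 controls
print(build_panel(cases, controls))
```

Here v4 is dropped because it is carried by 2 of 1000 controls (0.2%). With small control groups a zero count carries little information: a variant absent from 1005 controls could still have a true frequency above 0.1%, so the control population size limits how confidently the cutoff can be applied.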

[0059] In some embodiments, the subjects in the second population of subjects without a diagnosis of a PDD or a PDD-NOS comprise one or more subjects not suspected of having the PDD or the PDD-NOS. In some embodiments, the subjects in the second population of subjects without a diagnosis of a PDD or a PDD-NOS comprise one or more subjects without the PDD or the PDD-NOS. In some embodiments, the subjects in the second population of subjects without a diagnosis of a PDD or a PDD-NOS comprise one or more subjects who are asymptomatic for the PDD or the PDD-NOS. In some embodiments, the subjects in the second population of subjects without a diagnosis of a PDD or a PDD-NOS comprise one or more subjects who have decreased susceptibility to the PDD or the PDD-NOS. In some embodiments, the subjects in the second population of subjects without a diagnosis of a PDD or a PDD-NOS comprise one or more subjects who are unassociated with a treatment, therapeutic regimen, or any combination thereof. In some embodiments, the panel comprises at least 5, 10, 25, 50, 100 or 200 low frequency nucleic acid biomarkers. In some embodiments, the low frequency biomarkers occur at a frequency of 0.01% or less, 0.001% or less, or 0.0001% or less in the second population of subjects without a diagnosis of a PDD or a PDD-NOS. In some embodiments, the panel of nucleic acid biomarkers comprises at least two genes contained in the one or more polynucleotide sequences selected from SEQ ID NOs 644-2417 or 2558-2739.
[0060] An aspect of the invention includes an array comprising a plurality of nucleic acid probes, wherein each probe comprises a sequence complementary to a target sequence of one of the polynucleotide sequences selected from SEQ ID NOs 644-2417 or 2558-2739, or a fragment thereof.
[0061] In some embodiments, the plurality of nucleic acid probes comprises at least 5, 10, 25, 50, 100 or 200 of the nucleic acid probes. In some embodiments, the array further comprises a second plurality of nucleic acid probes, wherein each probe in the second plurality of nucleic acid probes comprises a sequence complementary to a complementary target sequence of one of the polynucleotide sequences selected from SEQ ID NOs 1-643 or 2418-2557, or a fragment thereof. In some embodiments, the second plurality of nucleic acid probes comprises at least 5, 10, 25, 50, 100 or 200 nucleic acid probes. In some embodiments, each different nucleic acid probe is attached to a bead. In some embodiments, each different nucleic acid probe is labeled with a detectable label. In some embodiments, each different nucleic acid probe is attached to a solid support in a determinable location of the array. In some embodiments, the solid support comprises plastics, glass, beads, microparticles, microtitre dishes, or gels. In some embodiments, the array further comprises control probes.
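Paragraphs [0060] and [0061] describe probes complementary to target sequences without fixing probe length or spacing. A minimal sketch that tiles a target with overlapping windows and takes the reverse complement of each, assuming a hypothetical 60-nucleotide probe length (a length used on many oligonucleotide CGH platforms) and a hypothetical 30-nucleotide step; neither value is given in this disclosure. A real design would also filter probes for GC content, repeats, and uniqueness in the genome.

```python
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def tile_probes(target: str, probe_len: int = 60, step: int = 30) -> list:
    """Return (start, probe) pairs; each probe is complementary to target[start:start + probe_len]."""
    if probe_len <= 0 or step <= 0:
        raise ValueError("probe_len and step must be positive")
    return [
        (start, reverse_complement(target[start:start + probe_len]))
        for start in range(0, len(target) - probe_len + 1, step)
    ]


print(tile_probes("ACGTACGTAC", probe_len=4, step=3))
```

The reverse complement, rather than the plain complement, is taken so that each probe reads 5' to 3' and pairs antiparallel with its target strand. Windows that would run past the end of the target are skipped, so a target shorter than `probe_len` yields no probes.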

[0062]
BRIEF DESCRIPTION OF THE DRAWINGS
[0063] The novel features of the disclosure are set forth with particularity in the appended claims. A better understanding of the features and advantages of the present disclosure will be obtained by reference to the following detailed description that sets forth illustrative embodiments, in which the principles of the disclosure are utilized, and the accompanying drawings.
[0064] Figure 1 depicts a log2 ratio plot of CGH probe data showing a deletion impacting the SYNGAP1 gene (gray bar located at chr6:33400195-33511247) in an individual with ASD. See Table 1 for other deletions (11-111 Kb size range) impacting SYNGAP1 that are present in other ASD patients (10 of 682 ASD patients and 0 of 1005 controls). The overall OR for this gene was calculated to be 14.9.
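The odds ratio for SYNGAP1 in the Figure 1 description is reported without a method. With 0 carriers among the 1005 controls, the plain odds ratio is undefined (division by zero), so some zero-count convention must have been applied. One convention that reproduces the stated value of 14.9 is to count the control carriers as 1 instead of 0; this is an inference from the arithmetic, not a method stated in the disclosure.

```python
def odds_ratio(case_carriers: int, n_cases: int, control_carriers: int, n_controls: int,
               zero_substitute: int = 1) -> float:
    """Odds ratio for carrying a variant in cases versus controls.

    Assumption: a zero control-carrier count is replaced by `zero_substitute`
    (and the control non-carrier count reduced to match) to avoid dividing by zero.
    """
    if control_carriers == 0:
        control_carriers = zero_substitute
    case_noncarriers = n_cases - case_carriers
    control_noncarriers = n_controls - control_carriers
    return (case_carriers * control_noncarriers) / (case_noncarriers * control_carriers)


# 10 of 682 ASD patients vs 0 of 1005 controls: (10 * 1004) / (672 * 1)
print(round(odds_ratio(10, 682, 0, 1005), 1))  # 14.9
```

Other conventions, such as the Haldane-Anscombe correction of adding 0.5 to every cell, give a considerably larger estimate for the same counts, so the convention should be reported alongside the value.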
[0065] Figure 2 depicts log2 ratio plots of CGH probe data (chr17: 76.3-78.0 Mb) for 2 unaffected parents (top and middle panels) and one male child with ASD (bottom panel). The child has a de novo complex rearrangement, resulting in a large duplication (chr17:76954271-77777066, size 822,795 bp) and a smaller deletion (chr17:77787243-77847938, size 60,695 bp), as detailed in Table 1.
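The sizes quoted for Figure 2 equal end coordinate minus start coordinate (for example, 77,777,066 - 76,954,271 = 822,795), and the log2 ratio plots in Figures 1 and 2 are read against the value expected for each copy number. A short sketch of both calculations, assuming a diploid reference (2 copies) and ideal, uncompressed ratios; measured aCGH ratios are typically attenuated, so observed values usually sit closer to zero than these. Under a 1-based, inclusive coordinate convention each span would be one base longer than the figures given here.

```python
import math


def cnv_size(start: int, end: int) -> int:
    """Span of a CNV computed as end - start, the convention matching the sizes quoted above."""
    if end < start:
        raise ValueError("end must not precede start")
    return end - start


def expected_log2_ratio(copy_number: int, reference_copies: int = 2) -> float:
    """Ideal log2(test/reference) for a region present in `copy_number` copies."""
    if copy_number <= 0:
        return float("-inf")  # homozygous deletion: no test signal
    return math.log2(copy_number / reference_copies)


print(cnv_size(76954271, 77777066))  # 822795 (duplication, Figure 2)
print(cnv_size(77787243, 77847938))  # 60695 (deletion, Figure 2)
print(expected_log2_ratio(3), expected_log2_ratio(1))  # one-copy gain vs one-copy loss
```

A heterozygous deletion is therefore expected near -1 and a single-copy duplication near +0.58, which is why gains are generally harder to distinguish from noise than losses on such plots.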
DETAILED DESCRIPTION OF THE DISCLOSURE
[0066] The details of one or more inventive embodiments are set forth in the accompanying drawings, the claims, and in the description herein. Other features, objects, and advantages of inventive embodiments disclosed and contemplated herein will be apparent from the description and drawings, and from the claims. As used herein, unless otherwise indicated, the article "a" means one or more unless explicitly otherwise provided for. As used herein, unless otherwise indicated, terms such as "contain," "containing," "include," "including," and the like mean "comprising." As used herein, unless otherwise indicated, the term "or" can be conjunctive or disjunctive. As used herein, unless otherwise indicated, any embodiment can be combined with any other embodiment. As used herein, unless otherwise indicated, some inventive embodiments herein contemplate numerical ranges. When ranges are present, the ranges include the range endpoints. Additionally, every subrange and value within the range is present as if explicitly written out.
Date Recue/Date Received 2021-04-27
[0067] Described herein are methods of identifying variations in nucleic acids and genes associated with one or more developmental conditions. Described herein are methods of screening for, or determining, a subject's susceptibility to developing or having one or more developmental disorders, for example, Autism Spectrum Disorder (ASD), based on identification and detection of genetic nucleic acid variations. Also described herein are methods and compositions for treating and/or preventing one or more developmental conditions using a therapeutic modality. The present disclosure encompasses methods of assessing an individual for probability of response to a therapeutic agent for a developmental disorder, methods for predicting the effectiveness of a therapeutic agent for a developmental disorder, nucleic acids, polypeptides and antibodies, and computer-implemented functions. Kits for screening a sample from a subject to detect or determine susceptibility to a developmental disorder are also encompassed by the disclosure.
Genetic Variations Associated with Developmental Disorders
[0068] Genomic sequences within populations exhibit variability between individuals at many locations in the genome. For example, the human genome exhibits sequence variations that occur on average once every 1,000 base pairs. Such genetic variations in nucleic acid sequences are commonly referred to as polymorphisms or polymorphic sites. In some embodiments, these genetic variations can be found to be associated with one or more disorders and/or diseases using the methods disclosed herein. In some embodiments, the one or more disorders and/or diseases comprise one or more developmental disorders. In some embodiments, the one or more developmental disorders comprise one or more Pervasive Developmental Disorders (PDD). In some embodiments, the one or more PDDs comprise ASD. ASD can refer to autism.
In another embodiment, the one or more developmental disorders comprise Pervasive Developmental Disorder - Not Otherwise Specified (PDD-NOS). In some embodiments, PDD-NOS can comprise Asperger Syndrome, Rett Syndrome, fragile X syndrome and/or Childhood Disintegrative Disorder. In some embodiments, genetic variations can be associated with one or more PDDs. In some embodiments, genetic variations can be associated with one or more PDD-NOSs.

[0069] Scientific evidence suggests there is a potential for various combinations of factors causing ASD, such as multiple genetic variations that may cause autism on their own or when combined with exposure to as yet undetermined environmental factors. Timing of exposure during the child's development, such as before, during, or after birth, may also play a role in the development or final presentation of the disorder. A small number of cases can be linked to genetic disorders such as Fragile X, Tuberous Sclerosis, and Angelman's Syndrome, as well as exposure to environmental agents such as infectious ones (maternal rubella or cytomegalovirus) or chemical ones (thalidomide or valproate) during pregnancy.
[0070] In some embodiments, these genetic variations comprise point mutations, polymorphisms, translocations, insertions, deletions, amplifications, inversions, interstitial deletions, copy number variations (CNVs), loss of heterozygosity, or any combination thereof. In some embodiments, polymorphisms (e.g., polymorphic markers) can comprise any nucleotide position at which two or more sequences are possible in a subject population.
In some embodiments, each version of a nucleotide sequence with respect to the polymorphism can represent a specific allele of the polymorphism. In some embodiments, genomic DNA from a subject can contain two alleles for any given polymorphic marker, representative of each copy of the marker on each chromosome. In some embodiments, an allele can be a nucleotide sequence at a given location on a chromosome. Polymorphisms can comprise any number of specific alleles. In some embodiments of the disclosure, a polymorphism can be characterized by the presence of two or more alleles in a population. In some embodiments, the polymorphism can be characterized by the presence of three or more alleles. In some embodiments, the polymorphism can be characterized by four or more alleles, five or more alleles, six or more alleles, seven or more alleles, nine or more alleles, or ten or more alleles. In some embodiments, an allele can be associated with one or more diseases or disorders; for example, a developmental disorder risk allele can be an allele that is associated with increased or decreased risk of developing a developmental disorder. In some embodiments, genetic variations and alleles can be used to associate an inherited phenotype, for example, a developmental disorder, with a responsible genotype. In some embodiments, a developmental disorder risk allele can be a variant allele that is statistically associated with a screening of one or more developmental disorders. In some embodiments, genetic variations can be of any measurable frequency in the population, for example, a frequency higher than 10%, a frequency between 5-10%, a frequency between 1-5%, or a frequency below 1%. As used herein, variant alleles can be alleles that differ from a reference allele. As used herein, a variant can be a segment of DNA that differs from the reference DNA, such as a genetic variation.
In some embodiments, genetic variations can be used to track the inheritance of a gene that has not yet been identified, but whose approximate location is known.
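[0070] groups genetic variations by population frequency (higher than 10%, 5-10%, 1-5%, below 1%). For a biallelic marker in diploid subjects, each carrying two alleles as described above, the variant allele frequency follows from genotype counts, as sketched below. How the boundary values are assigned (for example, whether exactly 5% falls in 1-5% or 5-10%) is an assumption, since the text does not say.

```python
def allele_frequency(n_ref_hom: int, n_het: int, n_alt_hom: int) -> float:
    """Variant allele frequency for a biallelic marker in diploid subjects."""
    n_subjects = n_ref_hom + n_het + n_alt_hom
    if n_subjects == 0:
        raise ValueError("no genotyped subjects")
    # each heterozygote carries one variant allele, each variant homozygote two
    return (n_het + 2 * n_alt_hom) / (2 * n_subjects)


def frequency_class(freq: float) -> str:
    """Bin a frequency into the ranges named in [0070] (boundary handling assumed)."""
    if freq > 0.10:
        return "higher than 10%"
    if freq >= 0.05:
        return "5-10%"
    if freq >= 0.01:
        return "1-5%"
    return "below 1%"


print(frequency_class(allele_frequency(95, 5, 0)))  # 1-5% (5 variant alleles out of 200)
```

Note that an allele frequency and a carrier frequency differ: in the example, 5% of subjects carry the variant, but only 2.5% of alleles are variant.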
[0071] As used herein, a haplotype can be information regarding the presence or absence of one or more genetic markers in a given chromosomal region in a subject. In some embodiments, a haplotype can be a segment of DNA characterized by one or more alleles arranged along the segment, for example, a haplotype can comprise one member of the pair of alleles for each genetic variation or locus. In some embodiments, the haplotype can comprise two or more alleles, three or more alleles, four or more alleles, five or more alleles, or any combination thereof, wherein each allele can comprise one or more genetic variations along the segment.
[0072] In some embodiments, a genetic variation can be a functional aberration that can alter gene function, gene expression, protein expression, protein function, or any combination thereof. In some embodiments, a genetic variation can be a loss-of-function mutation, gain-of-function mutation, dominant negative mutation, or reversion. In some embodiments, a genetic variation can be part of a gene's coding region or regulatory region. Regulatory regions can control gene expression and thus protein expression. In some embodiments, a regulatory region can be a segment of DNA to which regulatory proteins, for example, transcription factors, can bind. In some embodiments, a regulatory region can be positioned near the gene being regulated, for example, upstream of the gene being regulated.
[0073] In some embodiments, variants can include changes that affect a polypeptide, such as a change in expression level, sequence, function, localization, binding partners, or any combination thereof. In some embodiments, a genetic variation can be a frameshift mutation, nonsense mutation, missense mutation, neutral mutation, or silent mutation.
For example, sequence differences, when compared to a reference nucleotide sequence, can include the insertion or deletion of a single nucleotide, or of more than one nucleotide (where the number inserted or deleted is not a multiple of three), resulting in a frame shift; the change of at least one nucleotide, resulting in a change in the encoded amino acid; the change of at least one nucleotide, resulting in the generation of a premature stop codon; the deletion of several nucleotides, resulting in a deletion of one or more amino acids encoded by the nucleotides; the insertion of one or several nucleotides, such as by unequal recombination or gene conversion, resulting in an interruption of the coding sequence of a reading frame;
duplication of all or a part of a sequence; transposition; or a rearrangement of a nucleotide sequence. Such sequence changes can alter the polypeptide encoded by the nucleic acid, for example, if the change in the nucleic acid sequence causes a frame shift, the frame shift can result in a change in the encoded amino acids, and/or can result in the generation of a premature stop codon, causing generation of a truncated polypeptide. In some embodiments, a genetic variation associated with a developmental disorder can be a synonymous change in one or more nucleotides, for example, a change that does not result in a change in the amino acid sequence.
Such a polymorphism can, for example, alter splice sites, affect the stability or transport of mRNA, or otherwise affect the transcription or translation of an encoded polypeptide. In some embodiments, a synonymous mutation can result in the protein product having an altered structure due to rare codon usage that impacts protein folding during translation, which in some cases may alter its function and/or drug binding properties if it is a drug target. In some embodiments, the changes can alter DNA so as to increase the likelihood that structural changes, such as amplifications or deletions, occur at the somatic level. A polypeptide encoded by the reference nucleotide sequence can be a reference polypeptide with a particular reference amino acid sequence, and polypeptides encoded by variant nucleotide sequences can be variant polypeptides with variant amino acid sequences.
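The effect of the sequence changes listed above on an encoded polypeptide can be illustrated with a short Python sketch that translates a toy coding sequence using the standard genetic code. The 18-nucleotide reference sequence is invented for illustration, positions are 0-based, and the `classify` heuristic is a simplification: it separates frameshifts and in-frame insertions/deletions from single-base substitutions, and does not model splicing, regulatory effects, or stop-loss changes.

```python
# Standard genetic code, built in T, C, A, G order for each codon position.
BASES = "TCAG"
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = dict(zip(CODONS, AMINO_ACIDS))  # '*' marks a stop codon


def translate(cds: str) -> str:
    """Translate complete codons in frame 0, stopping after the first stop codon."""
    protein = []
    for i in range(0, len(cds) - len(cds) % 3, 3):
        aa = CODON_TABLE[cds[i:i + 3]]
        protein.append(aa)
        if aa == "*":
            break
    return "".join(protein)


def substitute(seq: str, pos: int, base: str) -> str:
    return seq[:pos] + base + seq[pos + 1:]


def insert(seq: str, pos: int, bases: str) -> str:
    return seq[:pos] + bases + seq[pos:]


def classify(ref: str, var: str) -> str:
    """Simplified label for a variant coding sequence relative to the reference."""
    diff = len(var) - len(ref)
    if diff % 3:
        return "frameshift"
    if diff:
        return "in-frame insertion/deletion"
    ref_p, var_p = translate(ref), translate(var)
    if ref_p == var_p:
        return "synonymous"
    if var_p.endswith("*") and len(var_p) < len(ref_p):
        return "nonsense"
    return "missense"


REFERENCE = "ATGGCTGAAAAGTGGTAA"  # hypothetical: Met-Ala-Glu-Lys-Trp-stop

variants = {
    "C>T at position 4": substitute(REFERENCE, 4, "T"),  # GCT -> GTT
    "T>C at position 5": substitute(REFERENCE, 5, "C"),  # GCT -> GCC
    "A>T at position 9": substitute(REFERENCE, 9, "T"),  # AAG -> TAG
    "T inserted at position 3": insert(REFERENCE, 3, "T"),
}
for label, seq in variants.items():
    print(f"{label}: {translate(seq)} ({classify(REFERENCE, seq)})")
```

Run as written, this prints `MVEKW* (missense)`, `MAEKW* (synonymous)`, `MAE* (nonsense)`, and `MC* (frameshift)` for the four variants; the single-base insertion shifts the reading frame so that a premature stop codon (TGA) appears after two amino acids.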
[0074] In some embodiments, one or more variant polypeptides or proteins can be associated with one or more diseases or disorders, such as ASD. In some embodiments, variant polypeptides and changes in their expression, localization, and interaction partners can be used to associate an inherited phenotype, for example, a developmental disorder, with a responsible genotype. In some embodiments, a developmental disorder-associated variant polypeptide can be statistically associated with a diagnosis, prognosis, or theranosis of one or more developmental disorders.
[0075] The most common sequence variants comprise base variations at a single base position in the genome, and such sequence variants, or polymorphisms, are commonly called single nucleotide polymorphisms (SNPs) or single nucleotide variants (SNVs). In some embodiments, a SNP represents a genetic variant present at greater than or equal to 1%
occurrence in a population, and in some embodiments a SNP can represent a genetic variant present at any frequency level in a population. A SNP can be a nucleotide sequence variation occurring when a single nucleotide at a location in the genome differs between members of a species or between paired chromosomes in a subject. SNPs can include variants of a single nucleotide; for example, at a given nucleotide position, some subjects can have a 'G', while others can have a 'C'. SNPs typically arise from a single mutational event, and therefore there are usually two possible alleles at each SNP site: the original allele and the mutated allele. SNPs that are found to have two different bases at a single nucleotide position are referred to as biallelic SNPs, those with three are referred to as triallelic, and those with all four bases represented in the population are quadallelic. In some embodiments, SNPs can be considered neutral. In some embodiments, SNPs can affect susceptibility to developmental disorders. For a biallelic SNP, a subject can be homozygous for one allele of the polymorphism, wherein both chromosomal copies of the subject have the same nucleotide at the SNP location, or a subject can be heterozygous, wherein the two homologous chromosomes of the subject contain different nucleotides. The SNP nomenclature as reported herein is the official Reference SNP (rs) ID assigned to each unique SNP by the National Center for Biotechnology Information (NCBI).
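The zygosity and allele-count terms defined above map directly onto simple checks. The Python sketch below assumes genotypes are given as pairs of single-base calls and that rs IDs follow the dbSNP pattern of the prefix `rs` followed by digits; it only checks the format of an identifier and does not query NCBI. The function names are hypothetical.

```python
import re
from typing import Iterable, Tuple

ALLELE_CLASSES = {1: "monomorphic", 2: "biallelic", 3: "triallelic", 4: "quadallelic"}


def zygosity(genotype: Tuple[str, str]) -> str:
    """Homozygous if both chromosomal copies carry the same base, else heterozygous."""
    first, second = genotype
    return "homozygous" if first == second else "heterozygous"


def allele_class(observed_bases: Iterable[str]) -> str:
    """Classify a SNP site by how many distinct bases are seen in the population."""
    distinct = {b.upper() for b in observed_bases}
    if not distinct or not distinct <= set("ACGT"):
        raise ValueError("expected one or more of A, C, G, T")
    return ALLELE_CLASSES[len(distinct)]


def is_rs_id(identifier: str) -> bool:
    """True if the identifier has the form of a Reference SNP ID, e.g. 'rs123'."""
    return re.fullmatch(r"rs\d+", identifier) is not None


print(zygosity(("G", "G")), zygosity(("G", "C")))  # homozygous heterozygous
print(allele_class(["G", "C", "G", "G"]))  # biallelic
```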
[0076] Another type of genetic variation of the disclosure is copy number variation (CNV).
CNVs can be alterations of the DNA of a genome that result in an abnormal number of copies of one or more sections of DNA. CNVs can be inherited or caused by de novo mutation and can be responsible for a substantial amount of human phenotypic variability, behavioral traits, and disease susceptibility. In a preferred embodiment, CNVs of the current disclosure can be associated with susceptibility to one or more developmental disorders, for example, ASD. In some embodiments, a CNV can encompass a single gene or a contiguous set of genes. In some embodiments, CNVs can be caused by structural rearrangements of the genome, for example, unbalanced translocations, insertions, deletions, amplifications, inversions, and interstitial deletions. In some embodiments, these structural rearrangements occur on one or more chromosomes. Low copy repeats (LCRs), which are region-specific repeat sequences, can be susceptible to these structural rearrangements, resulting in CNVs. Factors such as size, orientation, percentage similarity, and the distance between the copies can influence the susceptibility of LCRs to genomic rearrangement. In some embodiments, CNVs are referred to as structural variants. In some embodiments, structural variants can be a broader class of variant that also includes copy-number-neutral alterations such as inversions and balanced translocations.
[0077] CNVs can account for genetic variation affecting a substantial proportion of the human genome; for example, known CNVs can cover over 15% of the human genome sequence (Estivill, X. and Armengol, L., PLoS Genetics 3: 1787-99 (2007)). CNVs can affect gene expression, phenotypic variation, and adaptation by disrupting gene dosage; they can cause disease, for example, microdeletion and microduplication disorders, and can confer susceptibility to diseases and disorders. Updated information about the location, type, and size of known CNVs can be found in one or more databases, for example, the Database of Genomic Variants (http://projects.tcag.ca/variation/), which contained data for over 66,000 CNVs as of November 2, 2010.
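Copy number is commonly estimated from a log2 ratio of test signal to reference signal (for example, from array intensities or sequencing read depth). The Python sketch below converts such a ratio to an integer copy number for a diploid region, labels the segment as a loss, gain, or neutral, and checks whether it overlaps a known CNV interval, such as one exported from a database like the Database of Genomic Variants. The plain rounding rule, the coordinates, and the data structures are simplifying assumptions for illustration; practical callers segment noisy data and apply calibrated thresholds.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Segment:
    chrom: str
    start: int          # 0-based, inclusive
    end: int            # 0-based, exclusive
    log2_ratio: float   # log2(test signal / reference signal)


def copy_number(seg: Segment, ploidy: int = 2) -> int:
    """Estimate integer copy number; a log2 ratio of 0 means `ploidy` copies."""
    return round(ploidy * 2 ** seg.log2_ratio)


def cnv_call(seg: Segment, ploidy: int = 2) -> str:
    cn = copy_number(seg, ploidy)
    if cn < ploidy:
        return "loss"
    if cn > ploidy:
        return "gain"
    return "neutral"


def overlaps(seg: Segment, known: Tuple[str, int, int]) -> bool:
    """True if the segment shares any base with a known (chrom, start, end) interval."""
    chrom, start, end = known
    return seg.chrom == chrom and seg.start < end and start < seg.end


deletion = Segment("chr1", 1_000_000, 1_500_000, -1.0)    # hypothetical: one copy left
duplication = Segment("chr1", 1_000_000, 1_500_000, 0.58)  # hypothetical: about three copies
known_cnv = ("chr1", 1_400_000, 1_800_000)                 # hypothetical database entry

print(cnv_call(deletion), copy_number(deletion))        # loss 1
print(cnv_call(duplication), copy_number(duplication))  # gain 3
print(overlaps(deletion, known_cnv))                    # True
```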
Other types of sequence variants can be found in the human genome and can be associated with a disease or disorder, including but not limited to, microsatellites.
Microsatellite markers are stable, polymorphic, easily analyzed, and can occur regularly throughout the genome, making them especially suitable for genetic analysis. A polymorphic microsatellite can comprise multiple small repeats of bases, for example, CA repeats, at a particular site wherein the number of repeats varies in a population. In some embodiments, microsatellites, for example, variable number of tandem repeats (VNTRs), can be short segments of DNA that have one or more repeated sequences, for example, about 2 to 5 nucleotides long, that can occur in non-coding DNA. In some embodiments, changes in microsatellites can occur during genetic recombination in sexual reproduction, increasing or decreasing the number of repeats found at an allele, or changing allele length.
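Because microsatellite alleles differ in the number of repeat units, an allele can be summarized by the longest uninterrupted run of the repeat unit. The Python sketch below does this for a CA repeat by scanning each reading phase of the unit. The function name and example sequences are illustrative assumptions; real genotyping would work from aligned reads or sized PCR products rather than from clean strings.

```python
def longest_tandem_repeat(seq: str, unit: str) -> int:
    """Return the largest number of consecutive copies of `unit` in `seq`."""
    if not unit:
        raise ValueError("repeat unit must be non-empty")
    k = len(unit)
    best = 0
    for phase in range(k):  # a run can start at any offset within the unit length
        run = 0
        for i in range(phase, len(seq) - k + 1, k):
            if seq[i:i + k] == unit:
                run += 1
                best = max(best, run)
            else:
                run = 0
    return best


allele_1 = "GGCACACACATT"      # hypothetical allele with 4 CA repeats
allele_2 = "TTCACACACACACAGG"  # hypothetical allele with 6 CA repeats
print(longest_tandem_repeat(allele_1, "CA"), longest_tandem_repeat(allele_2, "CA"))  # 4 6
```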
Developmental Disorders
[0078] Developmental disorders are disorders that occur at some stage in a child's development, often retarding development, and can include psychological or physical disorders. In some embodiments, they can be divided into specific developmental disorders, including Pervasive Developmental Disorders (PDDs) and Pervasive Developmental Disorder - Not Otherwise Specified (PDD-NOS). A PDD can comprise ASD. Generally, symptoms that may be present to some degree in a subject of the present disclosure with a PDD can include difficulty with verbal communication, including problems using and understanding language; difficulty with non-verbal communication, such as gestures and facial expressions such as smiling; difficulty with social interaction, including relating to people and to his or her surroundings; unusual ways of playing with toys and other objects; difficulty adjusting to changes in routine or familiar surroundings; repetitive body movements or patterns of behavior, such as hand flapping, spinning, and head banging; changing response to sound; temper tantrums; difficulty sleeping; aggressive behavior; and/or fearfulness or anxiety. ASD can be defined by a certain set of behaviors that can range from the very mild to the severe. Possible indicators of ASDs include a subject who does not babble, point, or make meaningful gestures by 1 year of age; does not speak one word by 16 months; does not combine two words by 2 years; does not respond to his or her name; and/or loses language or social skills.
Other symptoms include qualitative impairment in social interaction, as manifested by marked impairments in the use of multiple nonverbal behaviors such as eye-to-eye gaze, facial expression, body posture, and gestures to regulate social interaction; failure to develop peer relationships appropriate to developmental level; a lack of spontaneous seeking to share enjoyment, interests, or achievements with other people (e.g., by a lack of showing, bringing, or pointing out objects of interest to other people); or lack of social or emotional reciprocity (e.g., not actively participating in simple social play or games, preferring solitary activities, or involving others in activities only as tools or "mechanical" aids). Symptoms of Autism can also include qualitative impairments in communication, as manifested by delay in, or total lack of, the development of spoken language (not accompanied by an attempt to compensate through alternative modes of communication such as gesture or mime); in individuals with adequate speech, marked impairment in the ability to initiate or sustain a conversation with others; stereotyped and repetitive use of language or idiosyncratic language; or lack of varied, spontaneous make-believe play or social imitative play appropriate to developmental level. Other symptoms of Autism include restricted, repetitive, and stereotyped patterns of behavior, interests, and activities, as manifested by encompassing preoccupation with one or more stereotyped and restricted patterns of interest that is abnormal either in intensity or focus; apparently inflexible adherence to specific, nonfunctional routines or rituals; stereotyped and repetitive motor mannerisms (e.g., hand or finger flapping or twisting, or complex whole-body movements); or persistent preoccupation with parts of objects.
Other symptoms of Autism include delays or abnormal functioning in at least one of the following areas, with onset prior to age 3 years: social interaction, language as used in social communication, or symbolic or imaginative play. As described herein, Pervasive Developmental Disorder - Not Otherwise Specified (PDD-NOS) can comprise Asperger Syndrome, Rett Syndrome, fragile X syndrome, and/or Childhood Disintegrative Disorder. In some embodiments, a screening result of PDD-NOS can indicate that a subject is on the autism spectrum but does not fall within any of the existing specific categories of autism. PDD-NOS is a pervasive developmental disorder (PDD)/autism spectrum disorder (ASD) and is often referred to as atypical autism.
[0079] Symptoms of Asperger Syndrome can include qualitative impairment in social interaction: marked impairments in the use of multiple nonverbal behaviors such as eye-to-eye gaze, facial expression, body posture, and gestures to regulate social interaction; failure to develop peer relationships appropriate to developmental level; a lack of spontaneous seeking to share enjoyment, interests, or achievements with other people (e.g., by a lack of showing, bringing, or pointing out objects of interest to other people); and lack of social or emotional reciprocity. Other symptoms can include restricted, repetitive, and stereotyped patterns of behavior, interests, and activities, encompassing preoccupation with one or more stereotyped and restricted patterns of interest that is abnormal either in intensity or focus; apparently inflexible adherence to specific, nonfunctional routines or rituals; stereotyped and repetitive motor mannerisms (e.g., hand or finger flapping or twisting, or complex whole-body movements); persistent preoccupation with parts of objects; and clinically significant impairments in social, occupational, or other important areas of functioning. There may be no clinically significant general delay in language (for example, single words used by age 2 years, communicative phrases used by age 3 years).
There may be no clinically significant delay in cognitive development or in the development of age-appropriate self-help skills, adaptive behavior (other than in social interaction), and curiosity about the environment in childhood.
[0080] Although apparently normal prenatal and perinatal development, apparently normal psychomotor development through the first 5 months after birth, and normal head circumference at birth are observed, symptoms of Rett Syndrome begin after this period of normal development and include deceleration of head growth between ages 5 and 48 months; loss of previously acquired purposeful hand skills between ages 5 and 30 months, with the subsequent development of stereotyped hand movements (e.g., hand-wringing or hand washing); loss of social engagement early in the course (although social interaction often develops later); appearance of poorly coordinated gait or trunk movements; and severely impaired expressive and receptive language development with severe psychomotor retardation.
[0081] In Childhood Disintegrative Disorder, apparently normal development, as manifested by the presence of age-appropriate verbal and nonverbal communication, social relationships, play, and adaptive behavior, occurs for at least the first 2 years after birth. Symptoms include clinically significant loss of previously acquired skills (before age 10 years), including expressive or receptive language, social skills or adaptive behavior, bowel or bladder control, play, and motor skills. Other symptoms include abnormalities of functioning in areas including qualitative impairment in social interaction (e.g., impairment in nonverbal behaviors, failure to develop peer relationships, lack of social or emotional reciprocity), qualitative impairments in communication (e.g., delay or lack of spoken language, inability to initiate or sustain a conversation, stereotyped and repetitive use of language, lack of varied make-believe play), and restricted, repetitive, and stereotyped patterns of behavior, interests, and activities, including motor stereotypies and mannerisms.
Subjects
[0082] A subject, as used herein, can be an individual of any age or sex from whom a sample containing nucleotides is obtained for analysis by one or more methods described herein so as to obtain genetic data, for example, a male or female adult, child, newborn, or fetus. In some embodiments, a subject can be any target of therapeutic administration. In some embodiments, a subject can be a test subject or a reference subject. In some embodiments, a subject can be associated with a condition or disease or disorder, asymptomatic or symptomatic, have increased or decreased susceptibility to a disease or disorder, be associated or unassociated with a treatment or treatment regimen, or any combination thereof. As used in the present disclosure, a cohort can represent an ethnic group, a patient group, a particular age group, a group not associated with a particular disease or disorder, a group associated with a particular disease or disorder, a group of asymptomatic subjects, a group of symptomatic subjects, or a group or subgroup of subjects associated with a particular response to a treatment regimen or clinical trial.
In some embodiments, a patient can be a subject afflicted with a disease or disorder. In some embodiments, a patient can be a subject not afflicted with a disease or disorder. In some embodiments, a subject can be a test subject, a patient, or a candidate for a therapeutic, wherein genomic DNA from said subject, patient, or candidate is obtained for analysis by one or more methods of the present disclosure, so as to obtain genetic variation information of said subject, patient, or candidate.
[0083] In some embodiments, the sample can be obtained prenatally from a fetus or embryo or from the mother, for example, from fetal or embryonic cells in the maternal circulation. In some embodiments, the sample can be obtained with the assistance of a health care provider, for example, to draw blood. In some embodiments, the sample can be obtained without the assistance of a health care provider, for example, where the sample is obtained non-invasively, such as a sample comprising buccal cells that is obtained using a buccal swab or brush, or a mouthwash sample.
[0084] The present disclosure also provides methods for assessing genetic variations in subjects who are members of a target population. Such a target population is in some embodiments a population or group of subjects at risk of developing the disease, based on, for example, other genetic factors, biomarkers, biophysical parameters, family history of a developmental disorder, previous screening or medical history, or any combination thereof.
[0085] Although ASD is known to affect children to a higher extent than adults, subjects of all ages are contemplated in the present disclosure. In some embodiments, subjects can be from specific age subgroups, such as those over the age of 1, over the age of 2, over the age of 3, over the age of 4, over the age of 5, over the age of 6, over the age of 7, over the age of 8, over the age of 9, over the age of 10, over the age of 15, over the age of 20, over the age of 25, over the age of 30, over the age of 35, over the age of 40, over the age of 45, over the age of 50, over the age of 55, over the age of 60, over the age of 65, over the age of 70, over the age of 75, over the age of 80, or over the age of 85. Other embodiments of the disclosure pertain to other age groups, such as subjects aged less than 85, such as less than age 80, less than age 75, less than age 70, less than age 65, less than age 60, less than age 55, less than age 50, less than age 45, less than age 40, less than age 35, less than age 30, less than age 25, less than age 20, less than age 15, less than age 10, less than age 9, less than age 8, less than age 7, less than age 6, less than age 5, less than age 4, less than age 3, less than age 2, or less than age 1. Other embodiments relate to subjects with age at onset of the disease in any of the particular ages or age ranges defined by the numerical values described above, or by other numerical values bridging these numbers. It is also contemplated that a range of ages can be relevant in certain embodiments, such as age at onset at more than age 15 but less than age 20. Other age ranges are, however, also contemplated, including all age ranges bracketed by the age values listed above.
[0086] The genetic variations of the present disclosure found to be associated with a developmental disorder can show similar association in other human populations. Particular embodiments comprising subject human populations are thus also contemplated and within the scope of the disclosure. Such embodiments relate to human subjects that are from one or more human populations including, but not limited to, Caucasian, European, American, Eurasian, Asian, Central/South Asian, East Asian, Middle Eastern, African, Hispanic, and Oceanic populations. European populations include, but are not limited to, Swedish, Norwegian, Finnish, Russian, Danish, Icelandic, Irish, Kelt, English, Scottish, Dutch, Belgian, French, German, Spanish, Portuguese, Italian, Polish, Bulgarian, Slavic, Serbian, Bosnian, Czech, Greek, and Turkish populations. The racial contribution in subjects can also be determined by genetic analysis; for example, genetic analysis of ancestry can be carried out using unlinked microsatellite markers such as those set out in Smith et al. (Am J Hum Genet 74, 1001-13 (2004)).
[0087] It is also well known to the person skilled in the art that certain genetic variations have different population frequencies in different populations, or are polymorphic in one population but not in another. A person skilled in the art can, however, apply the methods available and as taught herein to practice the present disclosure in any given human population. This can include assessment of genetic variations of the present disclosure so as to identify those markers that give the strongest association within the specific population. Thus, the at-risk variants of the present disclosure can reside on different haplotype backgrounds and at different frequencies in various human populations.
Samples
[0088] Samples that are suitable for use in the methods described herein can be from a subject and can contain genetic or proteinaceous material, for example, genomic DNA (gDNA). Genetic material can be extracted from one or more biological samples including, but not limited to, blood, saliva, urine, mucosal scrapings of the lining of the mouth, expectorant, serum, tears, skin, tissue, or hair.
[0089] In some embodiments, the sample can comprise cells or tissue, for example, cell lines.
Exemplary cell types from which genetic material can be obtained using the methods described herein include, but are not limited to, a blood cell, such as a B lymphocyte, T lymphocyte, leukocyte, erythrocyte, macrophage, or neutrophil; a muscle cell, such as a skeletal muscle cell, smooth muscle cell, or cardiac muscle cell; a germ cell, such as a sperm or egg; an epithelial cell; a connective tissue cell, such as an adipocyte, chondrocyte, fibroblast, or osteoblast; a neuron; an astrocyte; a stromal cell; an organ-specific cell, such as a kidney cell, pancreatic cell, liver cell, or a keratinocyte; a stem cell; or any cell that develops therefrom. A cell from which gDNA is obtained can be at a particular developmental level including, for example, a hematopoietic stem cell or a cell that arises from a hematopoietic stem cell, such as a red blood cell, B lymphocyte, T lymphocyte, natural killer cell, neutrophil, basophil, eosinophil, monocyte, macrophage, or platelet. Generally, any type of stem cell can be used including, without limitation, an embryonic stem cell, adult stem cell, or pluripotent stem cell.
[0090] In some embodiments, a sample can be processed for DNA isolation; for example, DNA in a cell or tissue sample can be separated from other components of the sample. Cells can be harvested from a biological sample using standard techniques known in the art, for example, by centrifuging a cell sample and resuspending the pelleted cells in a buffered solution such as phosphate-buffered saline (PBS). In some embodiments, after centrifuging the cell suspension to obtain a cell pellet, the cells can be lysed to extract DNA. In some embodiments, the sample can be concentrated and/or purified to isolate DNA.
All samples obtained from a subject, including those subjected to any sort of further processing, are considered to be obtained from the subject. In some embodiments, standard techniques and kits known in the art can be used to extract genomic DNA from a biological sample, including, for example, phenol extraction, a QIAamp Tissue Kit (Qiagen, Chatsworth, Calif.), a Wizard Genomic DNA purification kit (Promega), or a Qiagen Autopure method using Puregene chemistry, which can enable purification of highly stable DNA well-suited for archiving.
[0091] In some embodiments, determining the identity of an allele or determining copy number can, but need not, include obtaining a sample comprising DNA from a subject, and/or assessing the identity, copy number, presence or absence of one or more genetic variations and their chromosomal locations in the sample. The individual or organization that performs the determination need not actually carry out the physical analysis of a sample from a subject. In some embodiments, the methods can include using information obtained by analysis of the sample by a third party. In some embodiments, the methods can include steps that occur at more than one site. For example, a sample can be obtained from a subject at a first site, such as at a health care provider or at the subject's home in the case of a self-testing kit. The sample can be analyzed at the same or a second site, for example, at a laboratory or other testing facility.
Methods of Screening
[0092] As used herein, screening a subject comprises diagnosing or determining, theranosing, or determining the susceptibility to developing (prognosing) a developmental disorder, for example, ASD. In particular embodiments, the disclosure is a method of determining a presence of, or a susceptibility to, a developmental disorder by detecting at least one genetic variation in a sample from a subject as described herein. In some embodiments, detection of particular alleles, markers, variations, or haplotypes is indicative of a presence of, or susceptibility to, a developmental disorder. Although there can be many concerns about screening a subject for an ASD, the earlier an ASD is identified by screening, the earlier needed interventions can begin. Evidence over the last 15 years indicates that intensive early intervention in optimal educational settings for at least 2 years during the preschool years results in improved outcomes in most young children with ASD. In evaluating a child, clinicians rely on behavioral characteristics to make a diagnosis, prognosis, or theranosis. Some of the characteristic behaviors of ASD may be apparent in the first few months of a child's life, or they may appear at any time during the early years. For the screening, problems in at least one of the areas of communication, socialization, or restricted behavior must be present before the age of 3. The screening requires a two-stage process. The first stage involves developmental screening during "well-child" check-ups; the second stage entails a comprehensive evaluation by a multidisciplinary team. A "well-child"
check-up should include a developmental screening test. Several screening instruments have been developed to quickly gather information about a child's social and communicative development within medical settings. Among them are the Checklist for Autism in Toddlers (CHAT), the Modified Checklist for Autism in Toddlers (M-CHAT), the Screening Tool for Autism in Two-Year-Olds (STAT), and the Social Communication Questionnaire (SCQ) for children 4 years of age and older. Some screening instruments rely solely on parent responses to a questionnaire, and some rely on a combination of parent report and observation. Key items on these instruments that appear to differentiate children with autism from other groups before the age of 2 include pointing and pretend play. Screening instruments do not provide an individual diagnosis, prognosis, or theranosis, but serve to assess the need for referral for possible screening of ASD. These screening methods may not identify children with mild ASD, such as those with high-functioning autism or Asperger syndrome. The second stage of screening must be comprehensive in order to accurately rule in or rule out an ASD or other developmental problem. This evaluation may be done by a multidisciplinary team that includes a psychologist, a neurologist, a psychiatrist, a speech therapist, or other professionals who screen children with ASD. Because ASDs are complex disorders and may involve other developmental or genetic problems, a comprehensive evaluation should entail developmental and genetic assessment, along with in-depth cognitive and language testing. In addition, measures developed specifically for screening autism are often used. These include the Autism Diagnostic Interview-Revised (ADI-R) and the Autism Diagnostic Observation Schedule-Generic (ADOS-G). The ADI-R is a structured interview that contains over 100 items and is conducted with a caregiver.
It consists of four main factors: the child's communication, social interaction, repetitive behaviors, and age-of-onset symptoms. The ADOS-G is an observational measure used to "press" for socio-communicative behaviors that are often delayed, abnormal, or absent in children with ASD.
Still another instrument often used by professionals is the Childhood Autism Rating Scale (CARS). It can aid in evaluating the child's body movements, adaptation to change, listening response, verbal communication, and relationship to people. It is suitable for use with children over 2 years of age. The examiner observes the child and also obtains relevant information from the parents. The child's behavior is rated on a scale based on deviation from the typical behavior of children of the same age. Two other tests that can be used to assess any child with a developmental delay are a formal audiologic hearing evaluation and a lead screening. Although some hearing loss can co-occur with ASD, some children with ASD may be incorrectly thought to have such a loss. In addition, if the child has suffered from an ear infection, transient hearing loss can occur. Lead screening is essential for children who remain for a long period of time in the oral-motor stage, in which they put anything and everything into their mouths. Children with an autistic disorder usually have elevated blood lead levels. Customarily, an expert screening team has the responsibility of thoroughly evaluating the child, assessing the child's unique strengths and weaknesses, and determining a formal screening result. The team will then meet with the parents to explain the results of the evaluation.
[0093] PDD-NOS is typically screened by psychologists and pediatric neurologists. No single specific test can be administered to determine whether or not a child is on the spectrum. Screening can be made through observations, questionnaires, and tests. A parent will usually initiate the screening by asking the child's pediatrician about the child's development after noticing abnormalities. From there, doctors will ask questions to gauge the child's development in comparison to age-appropriate milestones. One test that measures this is the Modified Checklist for Autism in Toddlers (M-CHAT). This is a list of questions whose answers determine whether or not the child should be referred to a specialist such as a developmental pediatrician, a neurologist, a psychiatrist, or a psychologist.
Another checklist, the DSM-IV, is a series of characteristics and criteria that must be met to qualify for an autism diagnosis. Because PDD-NOS is a spectrum disorder, not every child shows the same signs. The two main characteristics of the disorder are difficulties with social interaction skills and with communication.
Signs are often visible in babies, but a diagnosis is usually not made until around age 4. Even though PDD-NOS is considered milder than typical autism, this is not always true: while some characteristics may be milder, others may be more severe. Once a child with PDD-NOS enters school, he or she will often be very eager to interact with classmates but may behave socially differently from peers and be unable to make genuine connections. As they age, the closest connections they make are typically with their parents. Children with PDD-NOS have difficulty reading facial expressions and relating to the feelings of others. They may not know how to respond when someone is laughing or crying. Literal thinking is also characteristic of PDD-NOS; these children will most likely have difficulty understanding figurative speech and sarcasm.
Inhibited communication skills are a sign of PDD-NOS that begins immediately after birth. As infants, these children will not babble, and as they grow older, they do not speak when age appropriate.
Once verbal communication begins, their vocabulary is often limited. Some characteristics of language-based patterns are repetitive or rigid language, narrow interests, uneven language development, and poor nonverbal communication. A very common characteristic of PDD-NOS is severe difficulty grasping the difference between pronouns, particularly between "you" and "me" when conversing. In the last few years, screening instruments have been devised to screen for Asperger syndrome and higher functioning autism. The Autism Spectrum Screening Questionnaire (ASSQ), the Australian Scale for Asperger's Syndrome, and the most recent, the Childhood Asperger Syndrome Test (CAST), are some of the instruments that are reliable for identifying school-age children with Asperger syndrome or higher functioning autism.
These tools concentrate on social and behavioral impairments in children without significant language delay. If, following the screening process or during a routine "well child" check-up, a subject's doctor sees any of the possible indicators of ASD, further evaluation is indicated.
[0094] While means for screening ASDs exist, symptoms often go unnoticed until late in childhood, or are so mild that they are missed altogether. Thus there exists a need for an improved ASD screening test. Described herein are methods of screening an individual for one or more developmental disorders, including but not limited to determining the identity and location of genetic variations, such as variations in nucleotide sequence and copy number, and the presence or absence of alleles or genotypes in one or more samples from one or more subjects using any of the methods described herein. In some embodiments, an association with having or developing a developmental disorder can be determined by detecting particular variations that appear more frequently in test subjects than in reference subjects and analyzing the molecular and physiological pathways these variations can affect.
[0095] Within any given population, there can be an absolute susceptibility of developing a disease or trait, defined as the chance of a person developing the specific disease or trait over a specified time period. Susceptibility (e.g., being at risk) is typically measured by looking at very large numbers of people, rather than at a particular individual. As described herein, certain copy number variations (genetic variations) are found to be useful for susceptibility assessment of a developmental disorder. Susceptibility assessment can involve detecting particular genetic variations in the genome of individuals undergoing assessment. Particular genetic variations are found more frequently in individuals with a developmental disorder than in individuals without a developmental disorder. Therefore, these genetic variations have predictive value for detecting a developmental disorder, or a susceptibility to a developmental disorder, in an individual. Without intending to be limited by theory, it is believed that the genetic variations described herein as associated with susceptibility to a developmental disorder represent functional variants predisposing to the disease. In some embodiments, a genetic variation can confer susceptibility to the condition; for example, carriers of the genetic variation are at a different risk of the condition than non-carriers. In a preferred embodiment, the presence of a genetic variation is indicative of increased susceptibility to a developmental disorder, such as ASD.
[0096] In some embodiments, screening can be performed using any of the methods disclosed, alone or in combination. In some embodiments, screening can be performed using the polymerase chain reaction (PCR). In a preferred embodiment, screening can be performed using array comparative genomic hybridization (aCGH). In some embodiments, the genetic variation information as it relates to the current disclosure can be used in conjunction with any of the above-mentioned symptomatic screening tests to screen a subject for ASD, for example, using a combination of aCGH and a childhood screening test, such as the Checklist for Autism in Toddlers (CHAT).
[0097] In some embodiments, information from any of the above screening methods (e.g., specific symptoms, scoring matrix, or genetic variation data) can be used to define a subject as a test subject or reference subject. In some embodiments, information from any of the above screening methods can be used to associate a subject with a test or reference population, for example, a subject in a population. In the present study, for example, all the probands in Tables 1 and 5 met the criteria for autism on one or both of the screening measures including the Autism Diagnostic Interview-Revised (ADI-R) training and the Autism Diagnostic Observation Schedule (ADOS) training.
[0098] In one embodiment, an association with a developmental disorder can be determined by the statistical likelihood of the presence of a genetic variation in a subject with a developmental disorder, for example, an unrelated individual or a first- or second-degree relation of the subject.
In some embodiments, an association with a developmental disorder can be determined by determining the statistical likelihood of the absence of a genetic variation in an unaffected reference subject, for example, an unrelated individual or a first or second-degree relation of the subject. The methods described herein can include obtaining and analyzing a sample from one or more suitable reference subjects.
[0099] In the present context, the term screening comprises diagnosis, prognosis, and theranosis.
Screening can refer to any available screening method, including those mentioned herein. As used herein, susceptibility can be the proneness of a subject towards the development of a developmental condition, or towards being less able to resist a particular developmental condition than one or more control subjects. In some embodiments, susceptibility can encompass increased susceptibility. For example, particular nucleic acid variations of the disclosure as described herein can be characteristic of increased susceptibility to development of a developmental disorder. In some embodiments, susceptibility can encompass decreased susceptibility; for example, particular nucleic acid variations of the disclosure as described herein can be characteristic of decreased susceptibility to development of a developmental disorder.
[00100] As described herein, a genetic variation predictive of susceptibility to or presence of a developmental disorder can be one where the particular genetic variation is more frequently present in a subject with the condition (affected) than in a reference group (control), such that the presence of the genetic variation is indicative of susceptibility to or presence of the developmental disorder. In some embodiments, the reference group can be a population sample, for example, a random sample from the general population or a mixture of two or more samples from a population. In some embodiments, disease-free controls can be characterized by the absence of one or more specific disease-associated symptoms, for example, individuals who have not experienced symptoms associated with a developmental disorder. In another embodiment, the disease-free control group is characterized by the absence of one or more disease-specific risk factors, for example, at least one genetic and/or environmental risk factor. In some embodiments, a reference sequence can be referred to for a particular site of genetic variation. In some embodiments, a reference allele can be a wild-type allele and can be chosen as either the first sequenced allele or the allele from a control individual. In some embodiments, one or more reference subjects can be characteristically matched with one or more affected subjects, for example, matched by age, gender, or ethnicity.
[00101] A person skilled in the art will appreciate that for genetic variations with two alleles present in the population being studied, where one allele is found at increased frequency in a group of individuals with a developmental disorder compared with controls, the other allele of the marker will be found at decreased frequency in that group compared with controls. In such a case, one allele of the marker, for example, the allele found at increased frequency in individuals with a developmental disorder, can be the at-risk allele, while the other allele can be a neutral or protective allele.
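The at-risk versus neutral or protective labelling described above can be sketched in Python. This is a minimal illustration only, assuming genotypes of a biallelic marker are encoded as two-character strings (e.g. "AG"); the function names and toy genotypes are hypothetical, and no significance testing is performed.

```python
from collections import Counter


def allele_frequencies(genotypes):
    """Frequency of each allele across diploid genotypes such as "AG" (assumed encoding)."""
    counts = Counter(allele for genotype in genotypes for allele in genotype)
    total = sum(counts.values())
    return {allele: n / total for allele, n in counts.items()}


def label_alleles(case_genotypes, control_genotypes):
    """Label each allele by comparing its frequency in cases with that in controls.

    An allele more frequent in cases is labelled "at-risk"; for a biallelic
    marker the other allele is then necessarily less frequent in cases and is
    labelled "neutral or protective" (exact ties also fall into this group).
    """
    case_freq = allele_frequencies(case_genotypes)
    control_freq = allele_frequencies(control_genotypes)
    labels = {}
    for allele in set(case_freq) | set(control_freq):
        excess = case_freq.get(allele, 0.0) - control_freq.get(allele, 0.0)
        labels[allele] = "at-risk" if excess > 0 else "neutral or protective"
    return labels


# Hypothetical toy genotypes, for illustration only.
print(label_alleles(["AG", "GG", "AG"], ["AA", "AG", "AA"]))
```

In this toy input, G makes up two thirds of case alleles but one sixth of control alleles, so G is labelled at-risk and A neutral or protective.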
[00102] A genetic variant associated with a developmental disorder can be used to predict susceptibility to the disease for a given genotype. For any genetic variation, there can be one or more possible genotypes, for example, homozygote for the at-risk variant (e.g., in autosomal recessive disorders), heterozygote, and non-carrier of the at-risk variant. In some embodiments, susceptibility associated with variants at multiple loci can be used to estimate overall susceptibility. For multiple genetic variants, there can be $k$ possible genotypes, where $k = 3^{n} \times 2^{p}$, wherein $n$ can be the number of autosomal loci and $p$ can be the number of gonosomal (sex chromosomal) loci. Overall susceptibility assessment calculations can assume that the relative susceptibilities of different genetic variants multiply; for example, the overall susceptibility associated with a particular genotype combination can be the product of the susceptibility values for the genotype at each locus. If the susceptibility presented is the relative susceptibility for a person, or for a specific genotype of a person, compared to a reference population, then the combined susceptibility can be the product of the locus-specific susceptibility values and can correspond to an overall susceptibility estimate compared with the population.
If the susceptibility for a person is based on a comparison to non-carriers of the at-risk allele, then the combined susceptibility can correspond to an estimate that compares the person with a given combination of genotypes at all loci to a group of individuals who do not carry at-risk variants at any of those loci. The group of non-carriers of any at-risk variant can have the lowest estimated susceptibility and a combined susceptibility, compared with itself (i.e., non-carriers), of 1.0, but can have an overall susceptibility, compared with the population, of less than 1.0.
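The genotype count and the multiplicative combination of per-locus susceptibilities described above can be illustrated with a short Python sketch. The function names and example values are hypothetical, and the sketch assumes that every per-locus value is expressed against the same baseline (either non-carriers or the population average), as the text requires.

```python
import math


def genotype_count(n_autosomal, n_gonosomal):
    """Number of possible multi-locus genotypes, k = 3**n * 2**p.

    Each autosomal locus contributes three genotypes (homozygous at-risk,
    heterozygous, non-carrier) and each gonosomal locus two.
    """
    return 3 ** n_autosomal * 2 ** n_gonosomal


def combined_susceptibility(per_locus_values):
    """Multiplicative model: overall susceptibility is the product of per-locus values.

    All values must share one baseline; the product is relative to that same baseline.
    """
    return math.prod(per_locus_values)  # math.prod requires Python 3.8+


# Hypothetical per-locus relative susceptibilities for one individual.
print(genotype_count(2, 1))                      # 18
print(combined_susceptibility([1.5, 2.0, 1.0]))  # 3.0
```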

[00103] Overall risk for multiple risk variants can be assessed using standard methodology.
Genetic variations described herein can form the basis of a risk analysis that combines them with other genetic variations known to increase risk of a developmental disorder, or with other genetic risk variants for a developmental disorder. In certain embodiments of the disclosure, a plurality of variants (genetic variations, variant alleles, and/or haplotypes) can be used for overall risk assessment. These variants are in some embodiments selected from the genetic variations disclosed herein. Other embodiments include the use of the variants of the present disclosure in combination with other variants known to be useful for screening for a susceptibility to a developmental disorder. In such embodiments, the genotype status of a plurality of genetic variations, markers, and/or haplotypes is determined in an individual, and the status of the individual is compared with the population frequency of the associated variants, or with the frequency of the variants in clinically healthy subjects, such as age-matched and sex-matched subjects.
[00104] Methods known in the art, such as the use of available algorithms and software, can be used to identify, or call, significant genetic variations, including but not limited to the algorithms of DNA Analytics or DNAcopy, iPattern, and/or QuantiSNP. For example, an Aberration Detection Module 2 (ADM2) algorithm, such as that of DNA Analytics 4.0.85, can be used to identify, or call, significant genetic variations. In some embodiments, two or more algorithms can be used to identify, or call, significant genetic variations. For example, 2, 3, 4, 5, 6, 7, 8, 9, or 10 or more algorithms can be used to identify, or call, significant genetic variations.
In some embodiments, significant genetic variations can be CNVs.
[00105] CNVs detected by two or more algorithms can be defined as stringent and can be used for further analyses. In some embodiments, the information and calls from two or more of the methods described herein can be compared to each other to identify significant genetic variations more or less stringently. For example, CNV calls generated by both Aberration Detection Module 2 (ADM2) algorithms and DNAcopy algorithms can be defined as stringent CNVs. In some embodiments, significant or stringent genetic variations can be tagged as identified or called if they have a minimal reciprocal overlap with a genetic variation detected by one or more platforms and/or methods described herein. For example, significant or stringent genetic variations can be tagged as identified or called if they have a reciprocal overlap of more than about 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, or 99%, or equal to 100%, with a genetic variation detected by one or more platforms and/or methods described herein. For example, significant or stringent genetic variations can be tagged as identified or called if they have a reciprocal overlap of more than about 50% with a genetic variation detected by one or more platforms and/or methods described herein.
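A minimal Python sketch of reciprocal-overlap matching between the calls of two algorithms follows. It assumes half-open coordinates, that matched calls must agree on type (loss or gain), and a threshold of more than 50%, one of the values listed above. `CnvCall`, `reciprocal_overlap`, and `stringent_calls` are hypothetical names, and the calls are toy data rather than real ADM2 or DNAcopy output.

```python
from typing import List, NamedTuple


class CnvCall(NamedTuple):
    """A CNV call on half-open genomic coordinates [start, end) (assumed convention)."""
    chrom: str
    start: int
    end: int
    kind: str  # "loss" or "gain"


def reciprocal_overlap(a: CnvCall, b: CnvCall) -> float:
    """Overlap length as a fraction of the longer call (the smaller of the two fractions)."""
    if a.chrom != b.chrom or a.kind != b.kind:
        return 0.0
    overlap = min(a.end, b.end) - max(a.start, b.start)
    if overlap <= 0:
        return 0.0
    return min(overlap / (a.end - a.start), overlap / (b.end - b.start))


def stringent_calls(calls_a: List[CnvCall], calls_b: List[CnvCall],
                    min_overlap: float = 0.5) -> List[CnvCall]:
    """Keep calls from one algorithm that reciprocally overlap a call of the
    same type from a second algorithm by more than min_overlap."""
    return [a for a in calls_a
            if any(reciprocal_overlap(a, b) > min_overlap for b in calls_b)]


# Hypothetical calls standing in for two algorithms (e.g. ADM2 and DNAcopy).
adm2 = [CnvCall("chr1", 100, 200, "loss"), CnvCall("chr2", 500, 900, "gain")]
dnacopy = [CnvCall("chr1", 110, 210, "loss"), CnvCall("chr2", 850, 1000, "gain")]
print(stringent_calls(adm2, dnacopy))  # only the chr1 loss is kept
```

Taking the smaller of the two fractions is what makes the overlap "reciprocal": a short call nested inside a much longer one does not qualify, even though it is fully covered.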
[00106] In some embodiments, a threshold log ratio value can be used to determine losses and gains. A log ratio value can be any log ratio value; for example, a log ratio value can be a log2 ratio or a log10 ratio. In some embodiments, a CNV segment whose median log2 ratio is less than or equal to a log2 ratio threshold value can be classified as a loss. For example, any segment whose median log2 ratio is less than or equal to -0.1, -0.11, -0.12, -0.13, -0.14, -0.15, -0.16, -0.17, -0.18, -0.19, -0.2, -0.21, -0.22, -0.23, -0.24, -0.25, -0.26, -0.27, -0.28, -0.29, -0.3, -0.31, -0.32, -0.33, -0.34, -0.35, -0.36, -0.37, -0.38, -0.39, -0.4, -0.41, -0.42, -0.43, -0.44, -0.45, -0.46, -0.47, -0.48, -0.49, -0.5, -0.55, -0.6, -0.65, -0.7, -0.75, -0.8, -0.85, -0.9, -0.95, -1, -1.1, -1.2, -1.3, -1.4, -1.5, -1.6, -1.7, -1.8, -1.9, -2, -2.1, -2.2, -2.3, -2.4, -2.5, -2.6, -2.7, -2.8, -2.9, -3, -3.1, -3.2, -3.3, -3.4, -3.5, -3.6, -3.7, -3.8, -3.9, -4, -4.1, -4.2, -4.3, -4.4, -4.5, -4.6, -4.7, -4.8, -4.9, -5, -5.5, -6, -6.5, -7, -7.5, -8, -8.5, -9, -9.5, -10, -11, -12, -13, -14, -15, -16, -17, -18, -19, -20 or less can be classified as a loss.
[00107] In some embodiments, one algorithm can be used to call or identify significant genetic variations, wherein any segment whose median log2 ratio is less than or equal to -0.1, -0.11, -0.12, -0.13, -0.14, -0.15, -0.16, -0.17, -0.18, -0.19, -0.2, -0.21, -0.22, -0.23, -0.24, -0.25, -0.26, -0.27, -0.28, -0.29, -0.3, -0.31, -0.32, -0.33, -0.34, -0.35, -0.36, -0.37, -0.38, -0.39, -0.4, -0.41, -0.42, -0.43, -0.44, -0.45, -0.46, -0.47, -0.48, -0.49, -0.5, -0.55, -0.6, -0.65, -0.7, -0.75, -0.8, -0.85, -0.9, -0.95, -1, -1.1, -1.2, -1.3, -1.4, -1.5, -1.6, -1.7, -1.8, -1.9, -2, -2.1, -2.2, -2.3, -2.4, -2.5, -2.6, -2.7, -2.8, -2.9, -3, -3.1, -3.2, -3.3, -3.4, -3.5, -3.6, -3.7, -3.8, -3.9, -4, -4.1, -4.2, -4.3, -4.4, -4.5, -4.6, -4.7, -4.8, -4.9, -5, -5.5, -6, -6.5, -7, -7.5, -8, -8.5, -9, -9.5, -10, -11, -12, -13, -14, -15, -16, -17, -18, -19, -20 or less can be classified as a loss. For example, any CNV segment whose median log2 ratio is less than -0.35 as determined by DNAcopy can be classified as a loss. For example, losses can be determined according to a threshold log2 ratio, which can be set at -0.35.
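The single-algorithm rule above reduces to comparing each segment's median log2 ratio with a fixed threshold. The Python sketch below assumes the segments have already been produced by a segmentation algorithm such as DNAcopy (segmentation itself is not shown). It uses -0.35 for losses and, as the disclosure also describes for gains, 0.35 for gains, with strict inequalities as in the DNAcopy example. `classify_segment` is a hypothetical name.

```python
from statistics import median


def classify_segment(probe_log2_ratios, loss_threshold=-0.35, gain_threshold=0.35):
    """Classify a segment from the median log2 ratio of its probes.

    A median below loss_threshold is a loss, a median above gain_threshold is a
    gain, and anything in between (including the thresholds themselves) is not called.
    """
    value = median(probe_log2_ratios)
    if value < loss_threshold:
        return "loss"
    if value > gain_threshold:
        return "gain"
    return "no call"


# Hypothetical probe-level log2 ratios for two segments.
print(classify_segment([-0.52, -0.47, -0.61]))  # loss
print(classify_segment([0.05, -0.10, 0.12]))    # no call
```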
[00108] In some embodiments, two algorithms can be used to call or identify significant genetic variations, wherein any segment whose median log2 ratio is less than or equal to -0.1, -0.11, -0.12, -0.13, -0.14, -0.15, -0.16, -0.17, -0.18, -0.19, -0.2, -0.21, -0.22, -0.23, -0.24, -0.25, -0.26, -0.27, -0.28, -0.29, -0.3, -0.31, -0.32, -0.33, -0.34, -0.35, -0.36, -0.37, -0.38, -0.39, -0.4, -0.41, -0.42, -0.43, -0.44, -0.45, -0.46, -0.47, -0.48, -0.49, -0.5, -0.55, -0.6, -0.65, -0.7, -0.75, -0.8, -0.85, -0.9, -0.95, -1, -1.1, -1.2, -1.3, -1.4, -1.5, -1.6, -1.7, -1.8, -1.9, -2, -2.1, -2.2, -2.3, -2.4, -2.5, -2.6, -2.7, -2.8, -2.9, -3, -3.1, -3.2, -3.3, -3.4, -3.5, -3.6, -3.7, -3.8, -3.9, -4, -4.1, -4.2, -4.3, -4.4, -4.5, -4.6, -4.7, -4.8, -4.9, -5, -5.5, -6, -6.5, -7, -7.5, -8, -8.5, -9, -9.5, -10, -11, -12, -13, -14, -15, -16, -17, -18, -19, -20 or less, as determined by one algorithm, and less than or equal to -0.1, -0.11, -0.12, -0.13, -0.14, -0.15, -0.16, -0.17, -0.18, -0.19, -0.2, -0.21, -0.22, -0.23, -0.24, -0.25, -0.26, -0.27, -0.28, -0.29, -0.3, -0.31, -0.32, -0.33, -0.34, -0.35, -0.36, -0.37, -0.38, -0.39, -0.4, -0.41, -0.42, -0.43, -0.44, -0.45, -0.46, -0.47, -0.48, -0.49, -0.5, -0.55, -0.6, -0.65, -0.7, -0.75, -0.8, -0.85, -0.9, -0.95, -1, -1.1, -1.2, -1.3, -1.4, -1.5, -1.6, -1.7, -1.8, -1.9, -2, -2.1, -2.2, -2.3, -2.4, -2.5, -2.6, -2.7, -2.8, -2.9, -3, -3.1, -3.2, -3.3, -3.4, -3.5, -3.6, -3.7, -3.8, -3.9, -4, -4.1, -4.2, -4.3, -4.4, -4.5, -4.6, -4.7, -4.8, -4.9, -5, -5.5, -6, -6.5, -7, -7.5, -8, -8.5, -9, -9.5, -10, -11, -12, -13, -14, -15, -16, -17, -18, -19, -20 or less, as determined by the other algorithm, can be classified as a loss. For example, CNV calling can comprise using the Aberration Detection Module 2 (ADM2) algorithm and the DNAcopy algorithm, wherein losses can be determined according to two threshold log2 ratios: the Aberration Detection Module 2 (ADM2) algorithm log2 ratio threshold can be -0.25 and the DNAcopy algorithm log2 ratio threshold can be -0.41.
[00109] In some embodiments, the use of two algorithms to call or identify significant genetic variations can be a stringent method. In some embodiments, the use of two algorithms to call or identify significant genetic variations can be a more stringent method compared to the use of one algorithm to call or identify significant genetic variations.
[00110] In some embodiments, any CNV segment whose median log2 ratio is greater than a log2 ratio threshold value can be classified as a gain. For example, any segment whose median log2 ratio is greater than 0.1, 0.11, 0.12, 0.13, 0.14, 0.15, 0.16, 0.17, 0.18, 0.19, 0.2, 0.21, 0.22, 0.23, 0.24, 0.25, 0.26, 0.27, 0.28, 0.29, 0.3, 0.31, 0.32, 0.33, 0.34, 0.35, 0.36, 0.37, 0.38, 0.39, 0.4, 0.41, 0.42, 0.43, 0.44, 0.45, 0.46, 0.47, 0.48, 0.49, 0.5, 0.55, 0.6, 0.65, 0.7, 0.75, 0.8, 0.85, 0.9, 0.95, 1, 1.1, 1.2, 1.3, 1.4, 1.5, 1.6, 1.7, 1.8, 1.9, 2, 2.1, 2.2, 2.3, 2.4, 2.5, 2.6, 2.7, 2.8, 2.9, 3, or more can be classified as a gain.
[00111] In some embodiments, one algorithm can be used to call or identify significant genetic variations, wherein any segment whose median log2 ratio is greater than or equal to 0.1, 0.11, 0.12, 0.13, 0.14, 0.15, 0.16, 0.17, 0.18, 0.19, 0.2, 0.21, 0.22, 0.23, 0.24, 0.25, 0.26, 0.27, 0.28, 0.29, 0.3, 0.31, 0.32, 0.33, 0.34, 0.35, 0.36, 0.37, 0.38, 0.39, 0.4, 0.41, 0.42, 0.43, 0.44, 0.45, 0.46, 0.47, 0.48, 0.49, 0.5, 0.55, 0.6, 0.65, 0.7, 0.75, 0.8, 0.85, 0.9, 0.95, 1, 1.1, 1.2, 1.3, 1.4, 1.5, 1.6, 1.7, 1.8, 1.9, 2, 2.1, 2.2, 2.3, 2.4, 2.5, 2.6, 2.7, 2.8, 2.9, 3, or more can be classified as a gain.
For example, any CNV segment whose median log2 ratio is greater than 0.35 as determined by DNAcopy can be classified as a gain. For example, gains can be determined according to a threshold log2 ratio, which can be set at 0.35.
[00112] In some embodiments, two algorithms can be used to call or identify significant genetic variations, wherein any segment whose median log2 ratio is greater than or equal to 0.1, 0.11, 0.12, 0.13, 0.14, 0.15, 0.16, 0.17, 0.18, 0.19, 0.2, 0.21, 0.22, 0.23, 0.24, 0.25, 0.26, 0.27, 0.28, 0.29, 0.3, 0.31, 0.32, 0.33, 0.34, 0.35, 0.36, 0.37, 0.38, 0.39, 0.4, 0.41, 0.42, 0.43, 0.44, 0.45, 0.46, 0.47, 0.48, 0.49, 0.5, 0.55, 0.6, 0.65, 0.7, 0.75, 0.8, 0.85, 0.9, 0.95, 1, 1.1, 1.2, 1.3, 1.4, 1.5, 1.6, 1.7, 1.8, 1.9, 2, 2.1, 2.2, 2.3, 2.4, 2.5, 2.6, 2.7, 2.8, 2.9, 3, or more, as determined by one algorithm, and greater than or equal to 0.1, 0.11, 0.12, 0.13, 0.14, 0.15, 0.16, 0.17, 0.18, 0.19, 0.2, 0.21, 0.22, 0.23, 0.24, 0.25, 0.26, 0.27, 0.28, 0.29, 0.3, 0.31, 0.32, 0.33, 0.34, 0.35, 0.36, 0.37, 0.38, 0.39, 0.4, 0.41, 0.42, 0.43, 0.44, 0.45, 0.46, 0.47, 0.48, 0.49, 0.5, 0.55, 0.6, 0.65, 0.7, 0.75, 0.8, 0.85, 0.9, 0.95, 1, 1.1, 1.2, 1.3, 1.4, 1.5, 1.6, 1.7, 1.8, 1.9, 2, 2.1, 2.2, 2.3, 2.4, 2.5, 2.6, 2.7, 2.8, 2.9, 3, or more, as determined by the other algorithm, can be classified as a gain. For example, CNV calling can comprise using the Aberration Detection Module 2 (ADM2) algorithm and the DNAcopy algorithm, wherein gains can be determined according to two threshold log2 ratios: the Aberration Detection Module 2 (ADM2) algorithm log2 ratio threshold can be 0.25 and the DNAcopy algorithm log2 ratio threshold can be 0.32.
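The two-algorithm rule in paragraphs [00108] and [00112], together with the MAD-based exclusion in paragraph [00113], can be sketched as follows. This is a simplified illustration, assuming each segment has already been matched between the two algorithms (for example by reciprocal overlap) and that the MAD filter is applied to the DNAcopy segment; both assumptions, and all function names, are hypothetical. The thresholds are the example values given in the text.

```python
# Per-algorithm (loss, gain) log2 thresholds taken from the examples in the text.
THRESHOLDS = {"ADM2": (-0.25, 0.25), "DNAcopy": (-0.41, 0.32)}


def single_call(median_log2, loss_threshold, gain_threshold):
    """Return "loss", "gain", or None for one algorithm's segment."""
    if median_log2 <= loss_threshold:
        return "loss"
    if median_log2 >= gain_threshold:
        return "gain"
    return None


def passes_mad_filter(median_log2, mad, min_score=2.0):
    """Keep a segment only if |median log ratio / MAD| is at least min_score."""
    if mad <= 0:
        raise ValueError("MAD must be positive")
    return abs(median_log2 / mad) >= min_score


def two_algorithm_call(adm2_median, dnacopy_median, dnacopy_mad):
    """Call a CNV only when ADM2 and DNAcopy agree on its type and the DNAcopy
    segment passes the MAD filter (applying it to DNAcopy is an assumption)."""
    adm2 = single_call(adm2_median, *THRESHOLDS["ADM2"])
    dnacopy = single_call(dnacopy_median, *THRESHOLDS["DNAcopy"])
    if adm2 is None or adm2 != dnacopy:
        return None
    if not passes_mad_filter(dnacopy_median, dnacopy_mad):
        return None
    return adm2


print(two_algorithm_call(-0.50, -0.48, 0.10))  # loss
print(two_algorithm_call(-0.30, -0.30, 0.10))  # None: DNAcopy median is above -0.41
```

Requiring agreement from both callers trades sensitivity for specificity, which is why the text describes the two-algorithm approach as the more stringent one.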
[00113] Any CNV segment whose absolute value of (median log ratio / MAD), where MAD is the median absolute deviation, is less than 2 can be excluded (not identified as a significant genetic variation). For example, any CNV segment whose absolute (median log ratio / MAD) value is less than 2, 1.9, 1.8, 1.7, 1.6, 1.5, 1.4, 1.3, 1.2, 1.1, 1, 0.9, 0.8, 0.7, 0.6, or 0.5 or less can be excluded.
[00114] In some embodiments, multivariate analyses or joint risk analyses, including the use of a multiplicative model for overall risk assessment, can subsequently be used to determine the overall risk conferred based on the genotype status at the multiple loci. Use of a multiplicative model, for example, assuming that the risks of individual risk variants multiply to establish the overall effect, allows for a straightforward calculation of the overall risk for multiple markers.
The multiplicative model is a parsimonious model that usually fits the data of complex traits reasonably well. Deviations from multiplicativity have rarely been described in the context of common variants for common diseases, and if reported are usually only suggestive, since very large sample sizes are usually required to demonstrate statistical interactions between loci. Assessment of risk based on such analysis can subsequently be used in the methods, uses, and kits of the disclosure, as described herein.
[00115] In some embodiments, the significance of increased or decreased susceptibility can be measured by a percentage. In some embodiments, a significant increase in susceptibility can be measured as a relative susceptibility of at least 1.2, including but not limited to at least 1.3, at least 1.4, at least 1.5, at least 1.6, at least 1.7, at least 1.8, at least 1.9, at least 2.0, at least 2.5, at least 3.0, at least 4.0, at least 5.0, at least 6.0, at least 7.0, at least 8.0, at least 9.0, at least 10.0, and at least 15.0. In some embodiments, a relative susceptibility of at least 2.0, at least 3.0, at least 4.0, at least 5.0, at least 6.0, or at least 10.0 is significant. Other values for significant susceptibility are also contemplated, for example, at least 2.5, 3.5, 4.5, 5.5, or any other suitable numerical values, and these values are also within the scope of the present disclosure. In some embodiments, a significant increase in susceptibility is at least about 20%, including but not limited to about 25%, 30%, 35%, 40%, 45%, 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, 100%, 150%, 200%, 300%, 400%, 500%, 600%, 700%, 800%, 900%, 1000%, and 1500%. In one particular embodiment, a significant increase in susceptibility is at least 100%. In other embodiments, a significant increase in susceptibility is at least 200%, at least 300%, at least 400%, at least 500%, at least 700%, at least 800%, at least 900%, or at least 1000%.
Other cutoffs or ranges as deemed suitable by the person skilled in the art to characterize the disclosure are also contemplated, and those are also within scope of the present disclosure. In certain embodiments, a significant increase in susceptibility is characterized by a p-value, such as a p-value of less than 0.5, less than 0.4, less than 0.3, less than 0.2, less than 0.1, less than 0.05, less than 0.01, less than 0.001, less than 0.0001, less than 0.00001, less than 0.000001, less than 0.0000001, less than 0.00000001, or less than 0.000000001.
[00116] In some embodiments, an individual who is at decreased susceptibility to, or shows no presence of, a developmental condition can be an individual in whom at least one genetic variation conferring decreased susceptibility to, or the absence of, the developmental disorder is identified. In some embodiments, the genetic variations conferring decreased susceptibility are also said to be protective. In one aspect, the genetic variations can confer a significantly decreased susceptibility to, or the absence of, the developmental disorder.
[00117] In some embodiments, a significant decrease in susceptibility can be measured as a relative susceptibility of less than 0.9, including but not limited to less than 0.8, less than 0.7, less than 0.6, less than 0.5, less than 0.4, less than 0.3, less than 0.2, and less than 0.1. In another embodiment, the decrease in susceptibility is at least 20%, including but not limited to at least 25%, at least 30%, at least 35%, at least 40%, at least 45%, at least 50%, at least 55%, at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, and at least 98%. Other cutoffs or ranges as deemed suitable by the person skilled in the art to characterize the disclosure are, however, also contemplated, and those are also within the scope of the present disclosure. In certain embodiments, a significant decrease in susceptibility is characterized by a p-value, such as a p-value of less than 0.05, less than 0.01, less than 0.001, less than 0.0001, less than 0.00001, less than 0.000001, less than 0.0000001, less than 0.00000001, or less than 0.000000001. Other tests for significance can be used, for example, a Fisher exact test. Other statistical tests of significance known to the skilled person are also contemplated and are also within the scope of the disclosure.
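The Fisher exact test mentioned above can be applied to a 2x2 table of carrier counts in affected and reference subjects. The standard-library Python sketch below computes a two-sided p-value using the common convention of summing the hypergeometric probabilities of all tables with the observed margins that are no more probable than the observed table. The function name and the counts are illustrative only.

```python
from math import comb


def fisher_exact_two_sided(case_carriers, case_noncarriers,
                           control_carriers, control_noncarriers):
    """Two-sided Fisher exact test for the table
    [[case_carriers, case_noncarriers], [control_carriers, control_noncarriers]].
    """
    n_cases = case_carriers + case_noncarriers
    n_carriers = case_carriers + control_carriers
    n = n_cases + control_carriers + control_noncarriers

    def table_probability(x):
        # Hypergeometric probability of x carriers among the cases, margins fixed.
        return comb(n_carriers, x) * comb(n - n_carriers, n_cases - x) / comb(n, n_cases)

    observed = table_probability(case_carriers)
    lowest = max(0, n_cases - (n - n_carriers))
    highest = min(n_cases, n_carriers)
    # A small relative tolerance keeps tables tied with the observed one despite rounding.
    p_value = sum(p for p in map(table_probability, range(lowest, highest + 1))
                  if p <= observed * (1 + 1e-7))
    return min(p_value, 1.0)


# Hypothetical counts: 3 of 4 cases and 1 of 4 controls carry the variant.
print(fisher_exact_two_sided(3, 1, 1, 3))  # about 0.486 (= 17/35)
```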
[00118] In some preferred embodiments, the significance of increased or decreased susceptibility can be determined according to the ratio of measurements from a test subject to a reference subject. In a preferred embodiment, losses or gains of one or more CNVs can be determined according to a threshold log2 ratio determined by these measurements. In some embodiments, a log2 ratio value greater than 0.35 is indicative of a gain of one or more CNVs. In some embodiments, a log2 ratio value less than -0.35 is indicative of a loss of one or more CNVs. In some embodiments, the ratio of measurements from a test subject to a reference subject may be inverted such that the log2 ratios of copy number gains are negative and the log2 ratios of copy number losses are positive.
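Under idealized assumptions (signal proportional to copy number, a diploid reference), a single-copy loss gives a log2 ratio of log2(1/2) = -1 and a single-copy gain gives log2(3/2) ≈ 0.58, both beyond the ±0.35 thresholds above. The Python sketch below, with hypothetical function names, handles the inverted orientation by flipping the sign before applying the thresholds. Real array ratios are compressed and noisy, so this is illustrative only.

```python
import math


def log2_ratio(numerator_signal, denominator_signal):
    """log2 of the ratio of two (normalized) hybridization signals."""
    return math.log2(numerator_signal / denominator_signal)


def call_from_log2(reported_log2, inverted=False, threshold=0.35):
    """Call a gain or loss from a reported log2 ratio.

    If the ratio was reported as reference/test (inverted=True), its sign is
    flipped first so that gains are positive and losses are negative.
    """
    value = -reported_log2 if inverted else reported_log2
    if value > threshold:
        return "gain"
    if value < -threshold:
        return "loss"
    return "no call"


# Idealized signals against a diploid reference (2 copies).
print(call_from_log2(log2_ratio(1, 2)))                 # loss (log2 ratio -1.0)
print(call_from_log2(log2_ratio(2, 3), inverted=True))  # gain (reported as reference/test)
```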
[00119] In some embodiments, the combined or overall susceptibility associated with a plurality of variants associated with a developmental disorder can also be assessed; for example, the genetic variations described herein to be associated with susceptibility to a developmental disorder can be combined with other common genetic risk factors. Combined risk for such genetic variants can be estimated in an analogous fashion to the methods described herein.
[00120] Calculating the risk conferred by a particular genotype for the individual can be based on comparing the genotype of the individual to a previously determined risk expressed, for example, as a relative risk (RR) or an odds ratio (OR) for the genotype, for example, for a heterozygous carrier of an at-risk variant for a developmental disorder. An odds ratio is a statistical measure of association. For example, in genetic disease research it can be used to convey the significance of a variant in a disease cohort relative to an unaffected/normal cohort.
The calculated risk for the individual can be the relative risk for a subject, or for a specific genotype of a subject, compared to the average population. The average population risk can be expressed as a weighted average of the risks of different genotypes, using results from a reference population, and the appropriate calculations to calculate the risk of a genotype group relative to the population can then be performed. Alternatively, the risk for an individual can be based on a comparison of particular genotypes, for example, heterozygous carriers of an at-risk allele of a marker compared with non-carriers of the at-risk allele. Using the population average can, in certain embodiments, be more convenient, since it provides a measure which can be easy to interpret for the user, such as a measure that gives the risk for the individual, based on his/her genotype, compared with the average in the population.
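The quantities in this paragraph can be computed as in the Python sketch below. It assumes simple counts with no zero cells; the function names and numbers are hypothetical. Note that a relative risk requires incidence data (for example, from a cohort), whereas a case-control study directly yields an odds ratio.

```python
def odds_ratio(case_carriers, case_noncarriers, control_carriers, control_noncarriers):
    """Odds of carrying the variant among cases divided by the odds among controls."""
    return (case_carriers / case_noncarriers) / (control_carriers / control_noncarriers)


def relative_risk(carrier_cases, carriers, noncarrier_cases, noncarriers):
    """Disease risk in carriers divided by disease risk in non-carriers (cohort data)."""
    return (carrier_cases / carriers) / (noncarrier_cases / noncarriers)


def risk_relative_to_population(genotype_risks, genotype_freqs, genotype):
    """Risk of one genotype divided by the population average, taken here as the
    frequency-weighted mean of the genotype risks."""
    population_average = sum(genotype_risks[g] * genotype_freqs[g] for g in genotype_risks)
    return genotype_risks[genotype] / population_average


# Hypothetical values for illustration only.
print(odds_ratio(30, 70, 10, 90))
risks = {"non-carrier": 1.0, "heterozygote": 2.0}
freqs = {"non-carrier": 0.9, "heterozygote": 0.1}
print(risk_relative_to_population(risks, freqs, "non-carrier"))  # below 1.0
```

The last line shows why non-carriers, whose risk is 1.0 relative to themselves, can still have a risk below 1.0 relative to the population average.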
[00121] In certain embodiments of the disclosure, a genetic variation is correlated to a developmental disorder by referencing genetic variation data to a look-up table that comprises correlations between the genetic variation and a developmental disorder. The genetic variation in certain embodiments comprises at least one indication of the genetic variation. In some embodiments, the table comprises a correlation for one genetic variation. In other embodiments, the table comprises a correlation for a plurality of genetic variations. In both scenarios, by referencing a look-up table that gives an indication of a correlation between a genetic variation and a developmental disorder, a risk for a developmental disorder, or a susceptibility to a developmental disorder, can be identified in the individual from whom the sample is derived.
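A look-up table of this kind can be as simple as a mapping from a variation identifier to its recorded correlation. The Python sketch below is a hypothetical illustration: the identifiers, the use of an odds ratio as the stored correlation, and the function name are assumptions, and the values are placeholders rather than data from this disclosure.

```python
# Hypothetical look-up table: identifiers and odds ratios are placeholders,
# not values from this disclosure.
LOOKUP_TABLE = {
    "hypothetical_cnv_1": {"disorder": "ASD", "odds_ratio": 2.0},
    "hypothetical_cnv_2": {"disorder": "ASD", "odds_ratio": 0.5},
}


def correlate(variations, table=LOOKUP_TABLE):
    """Return, for each detected variation listed in the table, its entry plus a
    coarse reading of the direction of the association."""
    results = {}
    for variation in variations:
        entry = table.get(variation)
        if entry is None:
            continue  # no correlation recorded for this variation
        if entry["odds_ratio"] > 1:
            direction = "increased"
        elif entry["odds_ratio"] < 1:
            direction = "decreased"
        else:
            direction = "no change"
        results[variation] = {**entry, "susceptibility": direction}
    return results


print(correlate(["hypothetical_cnv_1", "unlisted_cnv"]))
```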
[00122] The present disclosure also pertains to methods of clinical screening, for example, diagnosis, prognosis, or theranosis of a subject performed by a medical professional using the methods disclosed herein. In other embodiments, the disclosure pertains to methods of screening performed by a layman. The layman can be a customer of a genotyping service.
The layman can also be a genotype service provider, who performs genotype analysis on a DNA sample from an individual in order to provide services related to genetic risk factors for particular traits or diseases, based on the genotype status of the subject obtained using the methods described herein. The resulting genotype information can be made available to the individual and can be compared to information about a developmental disorder, or the risk of developing a developmental disorder, associated with various genetic variations, including but not limited to information from public literature and scientific publications. The screening applications of developmental disorder-associated genetic variations, as described herein, can, for example, be performed by an individual, a health professional, or a third party, for example, a service provider who interprets genotype information from the subject.
[00123] The information derived from analyzing sequence data can be communicated to any party, including the individual from whom the sample or sequence data is derived, a guardian or representative of the individual, a clinician, a research professional, a medical professional, a service provider, or a medical insurer or insurance company.
Medical professionals can be, for example, doctors, nurses, medical laboratory technologists, and pharmacists.
Research professionals can be, for example, principal investigators, research technicians, postdoctoral trainees, and graduate students.
[00124] In some embodiments, a professional can be assisted by determining whether specific genetic variants are present in a biological sample from a subject, and communicating information about genetic variants to a professional. After information about specific genetic variants is reported, a medical professional can take one or more actions that can affect subject care. For example, a medical professional can record information in the subject's medical record regarding the subject's risk of developing a developmental disorder. In some embodiments, a medical professional can record information regarding risk assessment, or otherwise transform the subject's medical record, to reflect the subject's current medical condition. In some embodiments, a medical professional can review and evaluate a subject's entire medical record and assess multiple treatment strategies for clinical intervention of a subject's condition.
[00125] A medical professional can initiate or modify treatment after receiving information regarding a subject's screening of a developmental disorder, for example. In some embodiments, a medical professional can recommend a change in therapy. In some embodiments, a medical professional can enroll a subject in a clinical trial for, by way of example, detecting correlations between a haplotype as described herein and any measurable or quantifiable parameter relating to the outcome of the treatment as described above.
[00126] In some embodiments, a medical professional can communicate information regarding a subject's screening for a developmental disorder to a subject or a subject's family. In some embodiments, a medical professional can provide a subject and/or a subject's family with information regarding a developmental disorder and risk assessment information, including treatment options, and referrals to specialists. In some embodiments, a medical professional can provide a copy of a subject's medical records to a specialist. In some embodiments, a research professional can apply information regarding a subject's risk of developing a developmental disorder to advance scientific research. In some embodiments, a research professional can obtain a subject's haplotype as described herein to evaluate a subject's enrollment, or continued participation, in a research study or clinical trial. In some embodiments, a research professional can communicate information regarding a subject's screening for a developmental disorder to a medical professional. In some embodiments, a research professional can refer a subject to a medical professional.
[00127] Any appropriate method can be used to communicate information to another person. For example, information can be given directly or indirectly to a professional, and a laboratory technician can input a subject's genetic variation as described herein into a computer-based record. In some embodiments, information is communicated by making a physical alteration to medical or research records. For example, a medical professional can make a permanent notation or flag a medical record for communicating the risk assessment to other medical professionals reviewing the record. In addition, any type of communication can be used to communicate the risk assessment information. For example, mail, e-mail, telephone, and face-to-face interactions can be used. The information also can be communicated to a professional by making that information electronically available to the professional. For example, the information can be communicated to a professional by placing the information on a computer database such that the professional can access the information. In addition, the information can be communicated to a hospital, clinic, or research facility serving as an agent for the professional.
[00128] Results of these tests, and optionally interpretive information, can be returned to the subject, the health care provider or to a third party. The results can be communicated to the tested subject, for example, with a prognosis and optionally interpretive materials that can help the subject understand the test results and prognosis; used by a health care provider, for example, to determine whether to administer a specific drug, or whether a subject should be assigned to a specific category, for example, a category associated with a specific disease endophenotype, or with drug response or non-response; used by a third party such as a healthcare payer, for example, an insurance company or HMO, or other agency, to determine whether or not to reimburse a health care provider for services to the subject, or whether to approve the provision of services to the subject. For example, the healthcare payer can decide to reimburse a health care provider for treatments for a developmental disorder if the subject has a developmental disorder or has an increased risk of developing a developmental disorder.
[00129] Also provided herein are databases that include a list of genetic variations as described herein, wherein the list can be largely or entirely limited to genetic variations identified as useful for screening a developmental disorder as described herein. The list can be stored, for example, in a flat file or on a computer-readable medium. The databases can further include information regarding one or more subjects, for example, whether a subject is affected or unaffected, clinical information such as endophenotype, age of onset of symptoms, any treatments administered and outcomes, for example, data relevant to pharmacogenomics, diagnostics, prognostics or theranostics, and other details, for example, data about the disorder in the subject, or environmental or other genetic factors. The databases can be used to detect correlations between a particular haplotype and the information regarding the subject.
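The paragraph leaves open how a correlation between a variation and subject information is detected. One common choice, swapped in here as an assumption rather than the disclosure's method, is an odds ratio computed from a 2x2 table of carriers versus non-carriers and affected versus unaffected subjects. The record fields and the 0.5 Haldane-Anscombe correction for empty cells are illustrative.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple


@dataclass
class SubjectRecord:
    """One hypothetical database row: does the subject carry the variation, and are they affected?"""
    subject_id: str
    carries_variation: bool
    affected: bool


def contingency_table(records: Iterable[SubjectRecord]) -> Tuple[int, int, int, int]:
    """Return counts (a, b, c, d):
    a = carrier and affected, b = carrier and unaffected,
    c = non-carrier and affected, d = non-carrier and unaffected."""
    a = b = c = d = 0
    for r in records:
        if r.carries_variation and r.affected:
            a += 1
        elif r.carries_variation:
            b += 1
        elif r.affected:
            c += 1
        else:
            d += 1
    return a, b, c, d


def odds_ratio(a: float, b: float, c: float, d: float, correction: float = 0.5) -> float:
    """Odds ratio (a*d)/(b*c); if any cell is zero, add `correction` to every cell."""
    cells = (a, b, c, d)
    if 0 in cells:
        a, b, c, d = (x + correction for x in cells)
    return (a * d) / (b * c)
```

An odds ratio above 1 suggests carriers are over-represented among affected subjects; in practice it would be reported alongside a significance test such as Fisher's exact test.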
[00130] The methods described herein can also include the generation of reports for use, for example, by a subject, care giver, or researcher, that include information regarding a subject's genetic variations, and optionally further information such as treatments administered, treatment history, medical history, predicted response, and actual response. The reports can be recorded in a tangible medium, e.g., a computer-readable disk, a solid state memory device, or an optical storage device.
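A report of this kind is essentially a structured record. The sketch below assumes a simple dataclass serialized to JSON so it can be written to any of the storage media mentioned; the class and field names are hypothetical and chosen only to mirror the items listed in the paragraph.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List, Optional


@dataclass
class VariationReport:
    """Hypothetical report layout; field names are illustrative, not from the disclosure."""
    subject_id: str
    genetic_variations: List[str]
    treatments_administered: List[str] = field(default_factory=list)
    medical_history: Optional[str] = None
    predicted_response: Optional[str] = None
    actual_response: Optional[str] = None


def to_json(report: VariationReport) -> str:
    """Serialize the report to a JSON string that can be written to any storage medium."""
    return json.dumps(asdict(report), indent=2, sort_keys=True)


def from_json(text: str) -> VariationReport:
    """Rebuild a report from its JSON form."""
    return VariationReport(**json.loads(text))
```

Keeping optional items (treatment history, predicted and actual response) as nullable fields lets the same layout serve a subject, a caregiver, or a researcher.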
Methods of Screening using Variations in Polypeptides
[00131] In another embodiment of the disclosure, screening of a developmental disorder can be made by examining or comparing changes in expression, localization, binding partners, and composition of a polypeptide encoded by a nucleic acid associated with a developmental disorder, for example, in those instances where the genetic variations of the present disclosure result in a change in the composition or expression of the polypeptide. Thus, screening of a developmental disorder can be made by examining the expression and/or composition of one of these polypeptides, or of another polypeptide encoded by a nucleic acid associated with a developmental disorder, in those instances where the genetic variation of the present disclosure results in a change in the expression, localization, binding partners, and/or composition of the polypeptide. In some embodiments, screening can comprise diagnosing a subject.
In some embodiments, screening can comprise determining a prognosis of a subject, for example, determining the susceptibility of developing a developmental disorder. In some embodiments, screening can comprise theranosing a subject.
[00132] The genetic variations described herein that show association to a developmental disorder can play a role through their effect on one or more of these nearby genes. For example, while not intending to be limited by theory, it is generally expected that a deletion of a chromosomal segment comprising a particular gene, or a fragment of a gene, can either result in an altered composition or expression, or both, of the encoded protein.
Likewise, duplications, or high copy number variations, are in general expected to result in increased expression of the encoded polypeptide. Other possible mechanisms affecting genes within a genetic variation region include, for example, effects on transcription, effects on RNA splicing, alterations in the relative amounts of alternative splice forms of mRNA, effects on RNA stability, effects on transport from the nucleus to the cytoplasm, and effects on the efficiency and accuracy of translation. Thus, DNA variations can be detected directly, using the subject's unamplified or amplified genomic DNA, or indirectly, using RNA or DNA obtained from the subject's tissue(s) that is present in an aberrant form or at an aberrant expression level as a result of the genetic variations of the disclosure showing association to ASD.
[00133] In some embodiments, the genetic variations of the disclosure showing association to a developmental disorder can affect the expression of a gene within the genetic variation region. In some embodiments, a genetic variation affecting an exonic region of a gene can affect, disrupt, or modulate the expression of the gene. In some embodiments, a genetic variation affecting an intergenic region of a gene can affect, disrupt, or modulate the expression of the gene. Certain genetic variation regions can have flanking duplicated segments, and genes within such segments can have altered expression and/or composition as a result of such genomic alterations.
Regulatory elements affecting gene expression can be located far away, even as far as tens or hundreds of kilobases away, from the promoter region of a gene. Thus, in some embodiments, regulatory elements for genes that are located outside the genetic variation region can be located within the genetic variation, and can be affected by the genetic variation. It is thus contemplated that the detection of the genetic variations described herein can be used for assessing expression of one or more associated genes not directly impacted by the genetic variations. In some embodiments, a genetic variation affecting an intergenic region of a gene can affect, disrupt, or modulate the expression of a gene located elsewhere in the genome, such as described above. For example, a genetic variation affecting an intergenic region of a gene can affect, disrupt, or modulate the expression of a transcription factor, located elsewhere in the genome, which regulates the gene.
[00134] In some embodiments, genetic variations of the disclosure showing association to ASD can affect protein expression at the translational level. It can be appreciated by those skilled in the art that this can occur by increased or decreased expression of one or more microRNAs (miRNAs) that regulate expression of a protein known to be important, or implicated, in the cause, onset, or progression of ASD. Increased or decreased expression of the one or more miRNAs can result from gain or loss of the whole miRNA gene, disruption of a portion of the gene (e.g., by an indel or CNV), or even a single base change (SNP or SNV) that produces an altered, non-functional, or aberrantly functioning miRNA sequence. It can also be appreciated by those skilled in the art that altered expression of a protein, for example, one known to cause ASD when its expression is increased or decreased, can result from a genetic variation that alters an existing miRNA binding site within the protein's mRNA transcript, or even creates a new miRNA binding site that leads to aberrant protein expression.
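To make the binding-site mechanism concrete, the sketch below checks whether a single-base change in a transcript creates or destroys a canonical 7mer-m8 seed match, i.e. a stretch complementary to nucleotides 2-8 of the miRNA. This is a simplified model assumed for illustration: real target prediction also weighs other site types and sequence context, and the miRNA and UTR sequences here are hypothetical.

```python
RNA_COMPLEMENT = str.maketrans("ACGU", "UGCA")


def to_rna(seq: str) -> str:
    """Upper-case the sequence and convert DNA thymine (T) to RNA uracil (U)."""
    return seq.upper().replace("T", "U")


def seed_site(mirna: str) -> str:
    """Transcript site complementary to miRNA nucleotides 2-8 (7mer-m8), written 5'->3'."""
    seed = to_rna(mirna)[1:8]
    return seed.translate(RNA_COMPLEMENT)[::-1]


def count_sites(transcript: str, site: str) -> int:
    """Count (possibly overlapping) exact occurrences of `site` in the transcript."""
    t = to_rna(transcript)
    k = len(site)
    return sum(1 for i in range(len(t) - k + 1) if t[i:i + k] == site)


def apply_snv(seq: str, pos: int, alt: str) -> str:
    """Return seq with the base at 0-based `pos` replaced by `alt`."""
    return seq[:pos] + alt + seq[pos + 1:]


def site_change(transcript: str, pos: int, alt: str, mirna: str) -> int:
    """Net number of seed sites gained (+) or lost (-) when the SNV is applied."""
    site = seed_site(mirna)
    return count_sites(apply_snv(transcript, pos, alt), site) - count_sites(transcript, site)


if __name__ == "__main__":
    mirna = "UACGUCAGUUAGCCAUGGAUCA"  # hypothetical miRNA
    utr = "AAAUUUCUGACGAAAGGG"        # hypothetical 3' UTR fragment
    print(seed_site(mirna))                  # CUGACGU
    print(site_change(utr, 12, "U", mirna))  # 1 (the A->U change creates a site)
```

A positive result corresponds to the "creates a new miRNA binding site" case in the text, and a negative result to loss of an existing site.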
[00135] A variety of methods can be used for detecting protein composition and/or expression levels, including but not limited to enzyme-linked immunosorbent assays (ELISA), Western blots, spectroscopy, mass spectrometry, peptide arrays, colorimetry, electrophoresis, isoelectric focusing, immunoprecipitations, immunoassays, and immunofluorescence and other methods well-known in the art. A test sample from a subject can be assessed for the presence of an alteration in the expression and/or an alteration in composition of the polypeptide encoded by a nucleic acid associated with a developmental disorder. An "alteration" in the polypeptide expression or composition, as used herein, refers to an alteration in expression or composition in a test sample, as compared to the expression or composition of the polypeptide in a control sample. Such alteration can, for example, be an alteration in the quantitative polypeptide expression or can be an alteration in the qualitative polypeptide expression, for example, expression of a mutant polypeptide or of a different splicing variant, or a combination thereof. In some embodiments, screening of a developmental disorder can be made by detecting a particular splicing variant encoded by a nucleic acid associated with a developmental disorder, or a particular pattern of splicing variants.
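For the quantitative case, comparing a test sample with a control often reduces to a fold change. The sketch below assumes replicate readings (for example, ELISA signals) for test and control, uses the log2 ratio of their means, and flags an alteration at an assumed two-fold cut-off; the threshold is hypothetical, and qualitative alterations such as mutant polypeptides or splice variants are not modelled.

```python
import math
from statistics import mean
from typing import Sequence


def log2_fold_change(test: Sequence[float], control: Sequence[float]) -> float:
    """log2 of mean test level over mean control level (positive = increased expression)."""
    test_mean, control_mean = mean(test), mean(control)
    if test_mean <= 0 or control_mean <= 0:
        raise ValueError("expression levels must be positive to take a log ratio")
    return math.log2(test_mean / control_mean)


def is_quantitatively_altered(
    test: Sequence[float], control: Sequence[float], threshold_log2: float = 1.0
) -> bool:
    """Flag an alteration when |log2 fold change| meets the threshold (1.0 = two-fold, an assumed cut-off)."""
    return abs(log2_fold_change(test, control)) >= threshold_log2
```

Using the absolute log ratio treats increased and decreased expression symmetrically, which matches the definition of "alteration" relative to a control sample.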
[00136] Antibodies can be polyclonal or monoclonal and can be labeled or unlabeled. An intact antibody, or a fragment thereof, can be used. The term "labeled", with regard to the probe or antibody, is intended to encompass direct labeling of the probe or antibody by coupling a detectable substance to the probe or antibody, as well as indirect labeling of the probe or antibody by reactivity with another reagent that is directly labeled as previously described herein. Other non-limiting examples of indirect labeling include detection of a primary antibody using a labeled secondary antibody, for example, a fluorescently-labeled secondary antibody, and end-labeling of a DNA probe with biotin such that it can be detected with fluorescently-labeled streptavidin.
Detecting Genetic Variations Associated with Autism Spectrum Disorder
[00137] Described herein are methods that can be used to detect genetic variations. Detecting specific genetic variations, for example, polymorphic markers and/or haplotypes, copy number, absence or presence of an allele, or genotype associated with a developmental disorder as described herein, can be accomplished by methods known in the art for analyzing nucleic acids and/or detecting sequences at polymorphic or genetically variable sites, for example, amplification techniques, hybridization techniques, sequencing, arrays, or any combination thereof. Thus, by use of these methods disclosed herein or other methods available to the person skilled in the art, one or more alleles at polymorphic markers, including microsatellites, SNPs, CNVs, or other types of genetic variations, can be identified in a sample obtained from a subject.
Nucleic Acids
[00138] The nucleic acids and polypeptides described herein can be used in methods and kits of the present disclosure. In some embodiments, aptamers that specifically bind the nucleic acids and polypeptides described herein can be used in methods and kits of the present disclosure. As used herein, a nucleic acid can comprise a deoxyribonucleotide (DNA) or ribonucleotide (RNA), whether singular or in polymers, naturally occurring or non-naturally occurring, double-stranded or single-stranded, coding, for example, a translated gene, or non-coding, for example, a regulatory region, or any fragments, derivatives, mimetics or complements thereof. In some embodiments, nucleic acids can comprise oligonucleotides, nucleotides, polynucleotides, nucleic acid sequences, genomic sequences, antisense nucleic acids, DNA regions, probes, primers, genes, regulatory regions, introns, exons, open-reading frames, binding sites, target nucleic acids and allele-specific nucleic acids.
[00139] "Isolated" nucleic acids, as used herein, are separated from nucleic acids that normally flank the gene or nucleotide sequence (as in genomic sequences) and/or have been completely or partially purified from other transcribed sequences (e.g., as in an RNA library). For example, isolated nucleic acids of the disclosure can be substantially isolated with respect to the complex cellular milieu in which they naturally occur, or culture medium when produced by recombinant techniques, or chemical precursors or other chemicals when chemically synthesized. In some instances, the isolated material can form part of a composition, for example, a crude extract containing other substances, buffer system or reagent mix. In some embodiments, the material can be purified to essential homogeneity using methods known in the art, for example, by polyacrylamide gel electrophoresis (PAGE) or column chromatography (e.g., HPLC). With regard to genomic DNA (gDNA), the term "isolated" also can refer to nucleic acids that are separated from the chromosome with which the genomic DNA is naturally associated. For example, the isolated nucleic acid molecule can contain less than about 250 kb, 200 kb, 150 kb, 100 kb, 75 kb, 50 kb, 25 kb, 10 kb, 5 kb, 4 kb, 3 kb, 2 kb, 1 kb, 0.5 kb or 0.1 kb of the nucleotides that flank the nucleic acid molecule in the gDNA of the cell from which the nucleic acid molecule is derived.
[00140] Nucleic acids fused to other coding or regulatory sequences can be considered isolated. For example, recombinant DNA contained in a vector is included in the definition of "isolated" as used herein. In some embodiments, isolated nucleic acids can include recombinant DNA molecules in heterologous host cells or heterologous organisms, as well as partially or substantially purified DNA molecules in solution. Isolated nucleic acids also encompass in vivo and in vitro RNA transcripts of the DNA molecules of the present disclosure.
An isolated nucleic acid molecule or nucleotide sequence can be synthesized chemically or by recombinant means.
Such isolated nucleotide sequences can be useful, for example, in the manufacture of the encoded polypeptide, as probes for isolating homologous sequences (e.g., from other mammalian species), for gene mapping (e.g., by in situ hybridization with chromosomes), or for detecting expression of the gene in tissue (e.g., human tissue), such as by Northern blot analysis or other hybridization techniques disclosed herein. The disclosure also pertains to nucleic acid sequences that hybridize under high stringency hybridization conditions, such as for selective hybridization, to a nucleotide sequence described herein. Such nucleic acid sequences can be detected and/or isolated by allele- or sequence-specific hybridization (e.g., under high stringency conditions).
Stringency conditions and methods for nucleic acid hybridizations are well known to the skilled person (see, e.g., Current Protocols in Molecular Biology, Ausubel, F. et al., John Wiley & Sons (1998), and Kraus, M. and Aaronson, S., Methods Enzymol., 200:546-556 (1991)).
[00141] Calculations of "identity" or "percent identity" between two or more nucleotide or amino acid sequences can be determined by aligning the sequences for optimal comparison purposes (e.g., gaps can be introduced into a first sequence). The nucleotides at corresponding positions are then compared, and the percent identity between the two sequences is a function of the number of identical positions shared by the sequences (i.e., % identity = # of identical positions / total # of positions x 100). For example, if a position in the first sequence is occupied by the same nucleotide as the corresponding position in the second sequence, then the molecules are identical at that position. The percent identity between the two sequences is a function of the number of identical positions shared by the sequences, taking into account the number of gaps, and the length of each gap, which need to be introduced for optimal alignment of the two sequences.
In some embodiments, the length of a sequence aligned for comparison purposes is at least 30%, at least 40%, at least 50%, at least 60%, at least 70%, at least 80%, at least 90%, or at least 95% of the length of the reference sequence. The actual comparison of the two sequences can be accomplished by well-known methods, for example, using a mathematical algorithm. A non-limiting example of such a mathematical algorithm is described in Karlin, S. and Altschul, S., Proc. Natl. Acad. Sci. USA, 90:5873-5877 (1993). Such an algorithm is incorporated into the NBLAST and XBLAST programs (version 2.0), as described in Altschul, S. et al., Nucleic Acids Res., 25:3389-3402 (1997). When utilizing BLAST and Gapped BLAST programs, any relevant parameters of the respective programs (e.g., NBLAST) can be used. For example, parameters for sequence comparison can be set at score = 100, word length = 12, or can be varied (e.g., W = 5 or W = 20). Other examples include the algorithm of Myers and Miller, CABIOS (1989), ADVANCE, ADAM, BLAT, and FASTA. In another embodiment, the percent identity between two amino acid sequences can be determined using, for example, the GAP program in the GCG software package (Accelrys, Cambridge, UK).
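The percent-identity formula above presupposes an optimal alignment with gaps. As an illustration only, the sketch below uses a textbook Needleman-Wunsch global alignment with a linear gap penalty; it is not the BLAST, FASTA, or GAP implementation, and the scoring values (match +1, mismatch -1, gap -2) are assumptions. Identity is then counted as identical positions over all alignment columns, gap columns included, which is one common reading of the formula.

```python
from typing import List, Tuple


def global_align(
    a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -2
) -> Tuple[str, str, int]:
    """Needleman-Wunsch global alignment with a linear gap penalty.

    Returns the two aligned strings (gaps shown as '-') and the optimal score.
    """
    n, m = len(a), len(b)
    score: List[List[int]] = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap

    def pair(i: int, j: int) -> int:
        return match if a[i - 1] == b[j - 1] else mismatch

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + pair(i, j),  # align a[i-1] with b[j-1]
                score[i - 1][j] + gap,             # gap in b
                score[i][j - 1] + gap,             # gap in a
            )

    # Trace back from the bottom-right cell to recover one optimal alignment.
    out_a: List[str] = []
    out_b: List[str] = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair(i, j):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b)), score[n][m]


def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """% identity = identical positions / total alignment positions x 100 (gap columns count in the total)."""
    if len(aligned_a) != len(aligned_b) or not aligned_a:
        raise ValueError("need two non-empty aligned sequences of equal length")
    identical = sum(1 for x, y in zip(aligned_a, aligned_b) if x == y and x != gap)
    return identical / len(aligned_a) * 100
```

For example, aligning `ACGTA` with `ACGA` yields `ACGTA` over `ACG-A` (score 2), giving 4 identical positions out of 5, or 80% identity. Other conventions divide by the shorter sequence length instead; the choice changes the number, so it should be stated alongside any reported identity.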
[00142] "Probes" or "primers" can be oligonucleotides that hybridize in a base-specific manner to a complementary strand of a nucleic acid molecule. Probes can include primers, which can be single-stranded oligonucleotide probes that act as a point of initiation of template-directed DNA synthesis using methods including, but not limited to, polymerase chain reaction (PCR) and ligase chain reaction (LCR) for amplification of a target sequence.
Oligonucleotides, as described herein, can include segments or fragments of nucleic acid sequences, or their complements. In some embodiments, DNA segments can be between 5 and 10,000 contiguous bases, and can range from 5, 10, 12, 15, 20, or 25 nucleotides to 10, 15, 20, 25, 30, 40, 50, 100, 200, 500, 1000 or 10,000 nucleotides. In addition to DNA and RNA, probes and primers can include peptide nucleic acids (PNA), as described in Nielsen, P. et al., Science 254: 1497-1500 (1991). A probe or primer can comprise a region of nucleotide sequence that hybridizes to at least about 15, typically about 20-25, and in certain embodiments about 40, 50 or 75, consecutive nucleotides of a nucleic acid molecule.
[00143] The present disclosure also provides isolated nucleic acids, for example, probes or primers, that contain a fragment or portion that can selectively hybridize to a nucleic acid that comprises, or consists of, a nucleotide sequence, wherein the nucleotide sequence can comprise at least one polymorphism or polymorphic allele contained in the genetic variations described herein, or the wild-type nucleotide located at the same position, or the complements thereof. In some embodiments, the probe or primer can be at least 70% identical, at least 80% identical, at least 85% identical, at least 90% identical, or at least 95% identical to the contiguous nucleotide sequence or to the complement of the contiguous nucleotide sequence.
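The identity thresholds above can be checked computationally. A minimal sketch, assuming an ungapped comparison of the probe against every same-length window of the target and of its reverse complement (the strand a probe would pair with); the function names are hypothetical, and a gapped alignment as described in [00141] would be the more general choice.

```python
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence (A<->T, C<->G)."""
    return seq.translate(_COMPLEMENT)[::-1]


def best_window_identity(probe: str, target: str) -> float:
    """Highest ungapped % identity of the probe against any contiguous, same-length window of target."""
    k = len(probe)
    if k == 0 or k > len(target):
        raise ValueError("probe must be non-empty and no longer than the target")
    p, t = probe.upper(), target.upper()
    best = max(
        sum(x == y for x, y in zip(p, t[start:start + k]))
        for start in range(len(t) - k + 1)
    )
    return best / k * 100


def probe_identity(probe: str, target: str) -> float:
    """Best identity to the target sequence or to its complementary strand."""
    return max(
        best_window_identity(probe, target),
        best_window_identity(probe, reverse_complement(target)),
    )


def meets_threshold(probe: str, target: str, min_identity: float = 90.0) -> bool:
    """Compare against a chosen cut-off, e.g. 70, 80, 85, 90 or 95 as listed above."""
    return probe_identity(probe, target) >= min_identity
```

Checking both strands reflects the wording "to the contiguous nucleotide sequence or to the complement" of it.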
[00144] In a preferred embodiment, a nucleic acid probe can be an oligonucleotide capable of hybridizing with a complementary region of a gene associated with a developmental disorder containing a genetic variation described herein. The nucleic acid fragments of the disclosure can be used as probes or primers in assays such as those described herein.
[00145] The nucleic acids of the disclosure, such as those described above, can be identified and isolated using standard molecular biology techniques well known to the skilled person. In some embodiments, DNA can be amplified and/or can be labeled (e.g., radiolabeled, fluorescently labeled) and used as a probe for screening, for example, a cDNA library derived from an organism. cDNA can be derived from mRNA and can be contained in a suitable vector. For example, corresponding clones can be isolated, DNA obtained following in vivo excision, and the cloned insert can be sequenced in either or both orientations by art-recognized methods to identify the correct reading frame encoding a polypeptide of the appropriate molecular weight.
Using these or similar methods, the polypeptide and the DNA encoding the polypeptide can be isolated, sequenced and further characterized.
[00146] In some embodiments, a nucleic acid can comprise one or more polymorphisms, variations, or mutations, for example, single nucleotide polymorphisms (SNPs) or copy number variations (CNVs), for example, insertions, deletions, inversions, and translocations. In some embodiments, nucleic acids can comprise analogs, for example, phosphorothioates, phosphoramidates, methyl phosphonates, chiral methyl phosphonates, 2'-O-methyl ribonucleotides, or modified nucleic acids, for example, modified backbone residues or linkages, or nucleic acids combined with carbohydrates, lipids, protein or other materials (for example, chromatin, ribosomes, and transcriptosomes), or peptide nucleic acids (PNAs). In some embodiments, nucleic acids can comprise nucleic acids in various structures, for example, A-DNA, B-DNA, Z-DNA, siRNA, tRNA, and ribozymes. In some embodiments, the nucleic acid may be naturally or non-naturally polymorphic, for example, having one or more sequence differences, for example, additions, deletions and/or substitutions, as compared to a reference sequence. In some embodiments, a reference sequence can be based on publicly available information, for example, the U.C. Santa Cruz Human Genome Browser Gateway or the NCBI website. In another embodiment, a reference sequence can be determined by a practitioner of the present invention using methods well known in the art, for example, by sequencing a reference nucleic acid.
[00147] In some embodiments, a probe can hybridize to an allele, SNP, or CNV as described herein. In some embodiments, the probe can bind to another marker sequence associated with a developmental disorder as described herein.
[00148] One of skill in the art would know how to design a probe so that sequence-specific hybridization will occur only if a particular allele is present in a genomic sequence from a test sample. The disclosure can also be reduced to practice using any convenient genotyping method, including commercially available technologies and methods for genotyping particular genetic variations.
[00149] Control probes can also be used; for example, a probe that binds a less variable sequence, such as a repetitive DNA associated with a centromere of a chromosome, can be used as a control. In some embodiments, probes can be obtained from commercial sources. In some embodiments, probes can be synthesized, for example, chemically or in vitro, or made from chromosomal or genomic DNA through standard techniques. In some embodiments, sources of DNA that can be used include genomic DNA, cloned DNA sequences, somatic cell hybrids that contain one, or a part of one, human chromosome along with the normal chromosome complement of the host, and chromosomes purified by flow cytometry or microdissection. The region of interest can be isolated through cloning, or by site-specific amplification using PCR.
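The allele-specific probe design of [00148] can be sketched in silico. The model below is an idealization assumed for illustration: a probe centred on the variant base gives a signal only when its full sequence, or its reverse complement, occurs exactly in the sample, standing in for high-stringency hybridization. With the default flank of 10 bases the probe is 21 nucleotides, within the typical 20-25 range mentioned in [00142]; all function names and sequences are hypothetical.

```python
from typing import Dict, Tuple

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence (A<->T, C<->G)."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def allele_probe(reference: str, snp_index: int, allele: str, flank: int) -> str:
    """Probe spanning the variant site: `flank` bases each side, with `allele` at the centre."""
    start, end = snp_index - flank, snp_index + flank + 1
    if start < 0 or end > len(reference):
        raise ValueError("flanks extend past the reference sequence")
    return reference[start:snp_index] + allele + reference[snp_index + 1:end]


def hybridizes(probe: str, sample: str) -> bool:
    """Idealized high-stringency model: signal only for a perfect match on either strand."""
    probe, sample = probe.upper(), sample.upper()
    return probe in sample or reverse_complement(probe) in sample


def genotype(
    sample: str, reference: str, snp_index: int, alleles: Tuple[str, ...], flank: int = 10
) -> Dict[str, bool]:
    """Report which allele-specific probes give a (modelled) hybridization signal."""
    return {a: hybridizes(allele_probe(reference, snp_index, a, flank), sample) for a in alleles}
```

Placing the variant at the probe centre is a common design choice because a central mismatch destabilizes the duplex more than one near the ends; real assays would also tune probe length and stringency empirically.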
[00150] One or more nucleic acids, for example, a probe or primer, can also be labeled, for example, by direct labeling, to comprise a detectable label. A detectable label can comprise any label capable of detection by a physical, chemical, or biological process, for example, a radioactive label, such as 32P or 3H; a fluorescent label, such as FITC; a chromophore label; an affinity-ligand label; an enzyme label, such as alkaline phosphatase, horseradish peroxidase, or β-galactosidase; an enzyme cofactor label; a hapten conjugate label, such as digoxigenin or dinitrophenyl; a Raman signal generating label; a magnetic label; a spin label; an epitope label, such as the FLAG or HA epitope; a luminescent label; a heavy atom label; a nanoparticle label; an electrochemical label; a light scattering label; a spherical shell label; a semiconductor nanocrystal label, such as quantum dots (described in U.S. Pat. No. 6,207,392); and probes labeled with any other signal generating label known to those of skill in the art, wherein a label can allow the probe to be visualized with or without a secondary detection molecule. A labeled nucleotide can be directly incorporated into a probe with standard techniques, for example, nick translation, random priming, and PCR labeling.
[00151] Non-limiting examples of label moieties useful for detection in the invention include suitable enzymes such as horseradish peroxidase, alkaline phosphatase, beta-galactosidase, or acetylcholinesterase; members of a binding pair that are capable of forming complexes, such as streptavidin/biotin, avidin/biotin, or an antigen/antibody complex including, for example, rabbit IgG and anti-rabbit IgG; fluorophores such as umbelliferone, fluorescein, fluorescein isothiocyanate, rhodamine, tetramethyl rhodamine, eosin, green fluorescent protein, erythrosin, coumarin, methyl coumarin, pyrene, malachite green, stilbene, lucifer yellow, Cascade Blue, Texas Red, dichlorotriazinylamine fluorescein, dansyl chloride, phycoerythrin, fluorescent lanthanide complexes such as those including Europium and Terbium, cyanine dye family members, such as Cy3 and Cy5, molecular beacons and fluorescent derivatives thereof, as well as others known in the art as described, for example, in Principles of Fluorescence Spectroscopy, Joseph R. Lakowicz (Editor), Plenum Pub Corp, 2nd edition (July 1999) and the 6th Edition of the Molecular Probes Handbook by Richard P. Haugland; a luminescent material such as luminol; light scattering or plasmon resonant materials such as gold or silver particles or quantum dots; or radioactive material including 14C, 123I, 124I, 125I, 99mTc, 32P, 33P, 35S or 3H.
[00152] Other labels can also be used in the methods of the present disclosure, for example, backbone labels. Backbone labels comprise nucleic acid stains that bind nucleic acids in a sequence-independent manner. Non-limiting examples include intercalating dyes such as phenanthridines and acridines (e.g., ethidium bromide, propidium iodide, hexidium iodide, dihydroethidium, ethidium homodimer-1 and -2, ethidium monoazide, and ACMA); some minor groove binders such as indoles and imidazoles (e.g., Hoechst 33258, Hoechst 33342, Hoechst 34580 and DAPI); and miscellaneous nucleic acid stains such as acridine orange (also capable of intercalating), 7-AAD, actinomycin D, LDS751, and hydroxystilbamidine. All of the aforementioned nucleic acid stains are commercially available from suppliers such as Molecular Probes, Inc. Still other examples of nucleic acid stains include the following dyes from Molecular Probes: cyanine dyes such as SYTOX Blue, SYTOX Green, SYTOX Orange, POPO-1, POPO-3, YOYO-1, YOYO-3, TOTO-1, TOTO-3, JOJO-1, LOLO-1, BOBO-1, BOBO-3, PO-PRO-1, PO-PRO-3, BO-PRO-1, BO-PRO-3, TO-PRO-1, TO-PRO-3, TO-PRO-5, JO-PRO-1, LO-PRO-1, YO-PRO-1, YO-PRO-3, PicoGreen, OliGreen, RiboGreen, SYBR Gold, SYBR Green I, SYBR Green II, SYBR DX, SYTO-40, -41, -42, -43, -44, -45 (blue), SYTO-13, -16, -24, -21, -23, -12, -11, -20, -22, -15, -14, -25 (green), SYTO-81, -80, -82, -83, -84, -85 (orange), and SYTO-64, -17, -59, -61, -62, -60, -63 (red).
[00153] In some embodiments, fluorophores of different colors can be chosen, for example, 7-amino-4-methylcoumarin-3-acetic acid (AMCA), 5-(and-6)-carboxy-X-rhodamine, lissamine rhodamine B, 5-(and-6)-carboxyfluorescein, fluorescein-5-isothiocyanate (FITC), 7-diethylaminocoumarin-3-carboxylic acid, tetramethylrhodamine-5-(and-6)-isothiocyanate, 5-(and-6)-carboxytetramethylrhodamine, 7-hydroxycoumarin-3-carboxylic acid, 6-[fluorescein 5-(and-6)-carboxamido]hexanoic acid, 4,4-difluoro-5,7-dimethyl-4-bora-3a,4a-diaza-s-indacene-3-propionic acid, eosin-5-isothiocyanate, erythrosin-5-isothiocyanate, TRITC, rhodamine, tetramethylrhodamine, R-phycoerythrin, Cy-3, Cy-5, Cy-7, Texas Red, Phar-Red, allophycocyanin (APC), and CASCADE™ blue acetylazide, such that each probe in or not in a set can be distinctly visualized. In some embodiments, fluorescently labeled probes can be viewed with a fluorescence microscope and an appropriate filter for each fluorophore, or by using dual or triple band-pass filter sets to observe multiple fluorophores.
In some embodiments, techniques such as flow cytometry can be used to examine the hybridization pattern of the probes.
[00154] In other embodiments, the probes can be indirectly labeled, for example, with biotin or digoxigenin, or labeled with radioactive isotopes such as ³²P and/or ³H. As a non-limiting example, a probe indirectly labeled with biotin can be detected by avidin conjugated to a detectable marker. For example, avidin can be conjugated to an enzymatic marker such as alkaline phosphatase or horseradish peroxidase. In some embodiments, enzymatic markers can be detected by colorimetric reactions using a substrate and/or a catalyst for the enzyme. In some embodiments, substrates for alkaline phosphatase can be used, for example, 5-bromo-4-chloro-3-indolylphosphate and nitro blue tetrazolium. In some embodiments, a substrate for horseradish peroxidase can be used, for example, diaminobenzidine.
Methods of Detecting Genetic Variations
[00155] In some embodiments, standard techniques for genotyping for the presence of genetic variations, for example, amplification, can be used. Amplification of nucleic acids can be accomplished using methods known in the art. Generally, sequence information from the region of interest can be used to design oligonucleotide primers that can be identical or similar in sequence to opposite strands of a template to be amplified. In some embodiments, amplification methods can include, but are not limited to, fluorescence-based techniques utilizing PCR, ligase chain reaction (LCR), nested PCR, transcription amplification, self-sustained sequence replication, nucleic acid sequence-based amplification (NASBA), and multiplex ligation-dependent probe amplification (MLPA). Guidelines for selecting primers for PCR amplification are well known in the art. In some embodiments, a computer program can be used to design primers, for example, Oligo (National Biosciences, Inc., Plymouth, Minn.), MacVector (Kodak/IBI), and the GCG suite of sequence analysis programs.
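The primer-design principle above can be sketched in a few lines of Python. This is a hypothetical illustration and not the algorithm used by Oligo, MacVector, or GCG. The forward primer is taken from the top strand at the 5' end of a target region, and the reverse primer from the opposite strand at the 3' end. The sketch then reports GC content and a rough melting temperature by the Wallace rule (2 °C per A/T plus 4 °C per G/C), an approximation generally reserved for short oligonucleotides. The template, region, primer length, and function names are all assumptions chosen for illustration.

```python
"""Hypothetical primer-pair sketch: top-strand forward primer, bottom-strand reverse primer."""

COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A/C/G/T only)."""
    return seq.translate(COMPLEMENT)[::-1]


def gc_fraction(primer: str) -> float:
    """Fraction of G and C bases in the primer."""
    p = primer.upper()
    return (p.count("G") + p.count("C")) / len(p)


def wallace_tm(primer: str) -> int:
    """Rough melting temperature in degrees C: 2 per A/T plus 4 per G/C (Wallace rule)."""
    p = primer.upper()
    return 2 * (p.count("A") + p.count("T")) + 4 * (p.count("G") + p.count("C"))


def design_primer_pair(template: str, start: int, end: int, length: int = 20) -> tuple[str, str]:
    """Return (forward, reverse) primers flanking template[start:end] (0-based, end exclusive).

    The forward primer is identical to the top strand at the start of the region.
    The reverse primer is identical to the bottom strand at the end of the region,
    i.e. the reverse complement of the last `length` top-strand bases.
    """
    if end - start < 2 * length:
        raise ValueError("region too short for two non-overlapping primers")
    forward = template[start:start + length]
    reverse = reverse_complement(template[end - length:end])
    return forward, reverse


if __name__ == "__main__":
    # Hypothetical 60-bp template, not a sequence from this disclosure.
    template = "ATGCGTACGTTAGCCTAGGCTAACGTTGCATGCCGATCGATCGGCTAGCTAGGATCCGTA"
    fwd, rev = design_primer_pair(template, 0, len(template), length=18)
    for name, primer in (("forward", fwd), ("reverse", rev)):
        print(name, primer, f"GC={gc_fraction(primer):.2f}", f"Tm~{wallace_tm(primer)}C")
```

A practical design tool would also screen candidates for self-complementarity and primer-dimer formation, and would match melting temperatures between the two primers.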
[00156] In some embodiments, commercially available genotyping methodologies, for example, for SNP genotyping, can be used, including but not limited to TaqMan genotyping assays (Applied Biosystems), SNPlex platforms (Applied Biosystems), gel electrophoresis, capillary electrophoresis, size exclusion chromatography, mass spectrometry, for example, the MassARRAY system (Sequenom), minisequencing methods, real-time polymerase chain reaction (PCR), the Bio-Plex system (Bio-Rad), CEQ and SNPstream systems (Beckman), array hybridization technology, for example, Affymetrix GeneChip (Perlegen), BeadArray technologies, for example, Illumina GoldenGate and Infinium assays, array tag technology, multiplex ligation-dependent probe amplification (MLPA), and endonuclease-based fluorescence hybridization technology (Invader; Third Wave). PCR can be a procedure in which target nucleic acid is amplified in a manner similar to that described in U.S. Pat. No. 4,683,195 and subsequent modifications of the procedure described therein. In some embodiments, real-time quantitative PCR can be used to determine genetic variations, wherein quantitative PCR can permit both detection and quantification of a DNA sequence in a sample, for example, as an absolute number of copies or as a relative amount when normalized to DNA input or other normalizing genes.
In some embodiments, methods of quantification can include the use of fluorescent dyes that can intercalate with double-stranded DNA, and modified DNA oligonucleotide probes that can fluoresce when hybridized with a complementary DNA.
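Relative quantification "normalized to DNA input or other normalizing genes" is often done with the comparative Ct (2^-ΔΔCt) method. The sketch below shows that arithmetic under two stated assumptions. First, both the target and the reference assay amplify at roughly 100% efficiency. Second, the calibrator is a DNA sample assumed to carry two copies of the target. The Ct values and function names are hypothetical.

```python
def relative_quantity(ct_target_sample: float, ct_ref_sample: float,
                      ct_target_calibrator: float, ct_ref_calibrator: float) -> float:
    """Comparative Ct (2^-ddCt) relative quantity.

    Assumes the target and reference assays both amplify with ~100% efficiency,
    so a one-cycle difference corresponds to a two-fold difference in template.
    """
    delta_ct_sample = ct_target_sample - ct_ref_sample            # normalize to reference gene
    delta_ct_calibrator = ct_target_calibrator - ct_ref_calibrator
    delta_delta_ct = delta_ct_sample - delta_ct_calibrator
    return 2.0 ** (-delta_delta_ct)


def estimated_copy_number(rq: float, calibrator_copies: int = 2) -> float:
    """Scale the relative quantity by the copy number assumed for the calibrator."""
    return calibrator_copies * rq


if __name__ == "__main__":
    # Hypothetical Ct values: after normalizing to the reference gene, the target
    # amplifies one cycle later in the test sample than in the two-copy calibrator.
    rq = relative_quantity(26.0, 24.0, 25.0, 24.0)
    print(rq, estimated_copy_number(rq))  # 0.5 1.0
```

A one-cycle delay relative to the calibrator corresponds to half the template, which is consistent with a single-copy (heterozygous) loss in this idealized model.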
[00157] In some embodiments of the disclosure, a sample containing genomic DNA obtained from the subject can be collected and PCR can be used to amplify a fragment of nucleic acid that comprises one or more genetic variations that can be indicative of a susceptibility to a developmental disorder. In another embodiment, detection of genetic variations can be accomplished by expression analysis, for example, by using quantitative PCR.
In some embodiments, this technique can assess the presence of an alteration in the expression or composition of one or more polypeptides or splicing variants encoded by a nucleic acid associated with a developmental disorder.
[00158] In a preferred embodiment, the DNA template of a sample from a subject containing a SNP can be amplified by PCR prior to detection with a probe. In such an embodiment, the amplified DNA serves as the template for a detection probe and, in some embodiments, an enhancer probe. Certain embodiments of the detection probe, the enhancer probe, and/or the primers used for amplification of the template by PCR can comprise the use of modified bases, for example, modified A, T, C, G, and U, wherein the use of modified bases can be useful for adjusting the melting temperature of the nucleotide probe and/or primer to the template DNA. In a preferred embodiment, modified bases are used in the design of the detection nucleotide probe.
Any modified base known to the skilled person can be selected in these methods, and the selection of suitable bases is well within the scope of the skilled person based on the teachings herein and known bases available from commercial sources as known to the skilled person.
[00159] In some embodiments, identification of genetic variations can be accomplished using hybridization methods. The presence of a specific marker allele or a particular genomic segment comprising a genetic variation, or representative of a genetic variation, can be indicated by sequence-specific hybridization of a nucleic acid probe specific for the particular allele or the genetic variation in a nucleic acid-containing sample that has or has not been amplified by methods described herein. The presence of more than one specific marker allele or several genetic variations can be indicated by using two or more sequence-specific nucleic acid probes, each specific for a particular allele and/or genetic variation.
[00160] Hybridization can be performed by methods well known to the person skilled in the art, for example, hybridization techniques such as fluorescent in situ hybridization (FISH), Southern analysis, Northern analysis, or in situ hybridization. In some embodiments, hybridization refers to specific hybridization, wherein hybridization can be performed with no mismatches. Specific hybridization, if present, can be detected using standard methods. In some embodiments, if specific hybridization occurs between a nucleic acid probe and the nucleic acid in the sample, the sample can contain a sequence that is complementary to a nucleotide sequence present in the nucleic acid probe. In some embodiments, if a nucleic acid probe contains a particular allele of a polymorphic marker, or particular alleles for a plurality of markers, specific hybridization is indicative of the nucleic acid being completely complementary to the nucleic acid probe, including the particular alleles at polymorphic markers within the probe. In some embodiments, a probe can contain more than one marker allele of a particular haplotype, for example, a probe can contain alleles complementary to 2, 3, 4, 5 or all of the markers that make up a particular haplotype. In some embodiments, detection of one or more particular markers of the haplotype in the sample is indicative that the source of the sample has the particular haplotype.
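At the sequence level, "specific hybridization ... with no mismatches" means that the target contains the exact reverse complement of the probe, because the two strands pair antiparallel. The sketch below checks this and also counts the fewest mismatches over any ungapped window. That count shows how a probe for one allele differs from the other allele by a single base. This is a simplified model that considers only Watson-Crick pairing and ignores temperature, salt, probe length, and other conditions that govern hybridization in practice. The sequences and function names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an A/C/G/T sequence (case-insensitive input)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def min_mismatches(probe: str, target: str) -> int:
    """Fewest base-pairing mismatches between the probe and any ungapped window of target.

    The probe pairs antiparallel with the target, so a perfect duplex exists
    exactly where the target contains the probe's reverse complement.
    """
    site = reverse_complement(probe)
    target = target.upper()
    n = len(site)
    if n == 0 or n > len(target):
        raise ValueError("probe must be non-empty and no longer than the target")
    best = n
    for i in range(len(target) - n + 1):
        window = target[i:i + n]
        best = min(best, sum(1 for a, b in zip(site, window) if a != b))
    return best


def hybridizes_specifically(probe: str, target: str) -> bool:
    """True only for a mismatch-free duplex, the strict sense of specific hybridization."""
    return min_mismatches(probe, target) == 0


if __name__ == "__main__":
    target = "TTACGGACTGCAATT"                        # hypothetical target strand
    probe_allele_1 = reverse_complement("CGGACTGCA")  # pairs perfectly with this target
    probe_allele_2 = reverse_complement("CGGATTGCA")  # differs at one polymorphic base
    for probe in (probe_allele_1, probe_allele_2):
        print(probe, min_mismatches(probe, target), hybridizes_specifically(probe, target))
```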
[00161] In some embodiments, PCR conditions and primers can be developed that amplify a product only when the variant allele is present or only when the wild-type allele is present, for example, allele-specific PCR. In some embodiments of allele-specific PCR, a method utilizing a detection oligonucleotide probe comprising a fluorescent moiety or group at its 3' terminus and a quencher at its 5' terminus, and an enhancer oligonucleotide, can be employed, as described by Kutyavin et al. (Nucleic Acids Res. 34:e128 (2006)).
[00162] An allele-specific primer/probe, which can be an oligonucleotide specific for a particular polymorphism, can be prepared using standard methods. In some embodiments, allele-specific oligonucleotide probes can specifically hybridize to a nucleic acid region that contains a genetic variation. In some embodiments, hybridization conditions can be selected such that a nucleic acid probe can specifically bind to the sequence of interest, for example, the variant nucleic acid sequence.
[00163] In some embodiments, allele-specific restriction digest analysis can be used to detect the existence of a polymorphic variant of a polymorphism, if alternate polymorphic variants of the polymorphism can result in the creation or elimination of a restriction site.
Allele-specific restriction digests can be performed, for example, with the particular restriction enzyme that can differentiate the alleles. In some embodiments, PCR can be used to amplify a region comprising the polymorphic site, and restriction fragment length polymorphism analysis can be conducted.
In some embodiments, for sequence variants that do not alter a common restriction site, mutagenic primers can be designed that can introduce one or more restriction sites when the variant allele is present or when the wild type allele is present.
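Allele-specific restriction digestion of a PCR product can be sketched by locating the enzyme's recognition site and computing the fragment lengths that complete digestion would produce for each allele. The example uses the EcoRI site GAATTC, which EcoRI cuts after the G (G^AATTC). The amplicons, and the single-base change that destroys the site, are hypothetical. The sketch reports top-strand lengths only; it ignores sticky-end overhangs and incomplete digestion.

```python
def cut_positions(seq: str, site: str, cut_offset: int) -> list[int]:
    """Top-strand cut coordinates for every occurrence of a recognition site.

    cut_offset is where the enzyme cuts relative to the start of the site
    (1 for EcoRI, G^AATTC). Scanning one strand suffices for palindromic sites.
    """
    seq, site = seq.upper(), site.upper()
    positions = []
    start = seq.find(site)
    while start != -1:
        positions.append(start + cut_offset)
        start = seq.find(site, start + 1)
    return positions


def fragment_lengths(seq: str, site: str, cut_offset: int) -> list[int]:
    """Lengths of the linear fragments produced by complete digestion."""
    bounds = [0] + cut_positions(seq, site, cut_offset) + [len(seq)]
    return [b - a for a, b in zip(bounds, bounds[1:])]


if __name__ == "__main__":
    # Hypothetical 30-bp amplicons; a single-base change (A->G) destroys the EcoRI site.
    wild_type = "A" * 10 + "GAATTC" + "T" * 14
    variant = "A" * 10 + "GAGTTC" + "T" * 14
    print("wild type:", fragment_lengths(wild_type, "GAATTC", 1))  # [11, 19]
    print("variant:  ", fragment_lengths(variant, "GAATTC", 1))    # [30]
```

On a gel, the two-fragment pattern marks the allele that retains the site and the single uncut band marks the allele that lost it; a heterozygote shows all three bands.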
[00164] In some embodiments, fluorescence polarization template-directed dye-terminator incorporation (FP-TDI) can be used to determine which of multiple polymorphic variants of a polymorphism can be present in a subject. Unlike the use of allele-specific probes or primers, this method can employ primers that can terminate adjacent to a polymorphic site, so that extension of the primer by a single nucleotide can result in incorporation of a nucleotide complementary to the polymorphic variant at the polymorphic site.
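The single-base extension step underlying FP-TDI can be illustrated by computing which dye-terminator a primer incorporates. The sketch assumes the primer is identical to the sense strand and ends immediately 5' of the polymorphic site, so it anneals to the antisense strand. The nucleotide added is therefore complementary to the antisense base, which is the sense-strand allele. Only the incorporation logic is modeled, not the fluorescence polarization readout. The primer, allele sequences, and function names are hypothetical.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def incorporated_terminator(sense_seq: str, primer: str) -> str:
    """Dye-terminator added when the primer is extended by exactly one base.

    The primer matches the sense strand and ends immediately 5' of the site,
    so it anneals to the antisense (template) strand. The polymerase adds the
    nucleotide complementary to the template base opposite the site.
    """
    idx = sense_seq.find(primer)
    if idx == -1:
        raise ValueError("primer does not match this allele sequence")
    site = idx + len(primer)
    if site >= len(sense_seq):
        raise ValueError("no template base beyond the primer")
    template_base = COMPLEMENT[sense_seq[site]]  # antisense base opposite the site
    return COMPLEMENT[template_base]             # base that pairs with the template


def genotype(allele_seqs: list[str], primer: str) -> list[str]:
    """Sorted set of terminators incorporated across a subject's alleles."""
    return sorted({incorporated_terminator(s, primer) for s in allele_seqs})


if __name__ == "__main__":
    primer = "ACGTTGCA"               # hypothetical primer
    allele_g = primer + "G" + "TTAC"  # G allele at the polymorphic site
    allele_a = primer + "A" + "TTAC"  # A allele at the polymorphic site
    print(genotype([allele_g, allele_a], primer))  # ['A', 'G'] (heterozygous)
    print(genotype([allele_g, allele_g], primer))  # ['G'] (homozygous)
```

Evaluating both copies of the site in this way mirrors the genotype determination "with respect to both copies of the polymorphic site" described in the following paragraph.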
[00165] In some embodiments, DNA containing an amplified portion can be dot-blotted using standard methods, and the blot contacted with the oligonucleotide probe. The presence of specific hybridization of the probe to the DNA can then be detected. The methods can include determining the genotype of a subject with respect to both copies of the polymorphic site present in the genome, wherein if multiple polymorphic variants exist at a site, this can be appropriately indicated by specifying which variants are present in a subject. Any of the detection means described herein can be used to determine the genotype of a subject with respect to one or both copies of the polymorphism present in the subject's genome.
[00166] In some embodiments, a peptide nucleic acid (PNA) probe can be used in addition to, or instead of, a nucleic acid probe in the methods described herein. A PNA can be a DNA mimic having a peptide-like backbone, for example, N-(2-aminoethyl)glycine units, with an organic base (A, G, C, T or U) attached to the glycine nitrogen via a methylene carbonyl linker.
[00167] Nucleic acid sequence analysis can also be used to detect genetic variations, for example, by sequencing exons, introns, 5' untranslated sequences, or 3' untranslated sequences. One or more methods of nucleic acid analysis that are available to those skilled in the art can be used to detect genetic variations, including but not limited to, direct manual sequencing, automated fluorescent sequencing, single-stranded conformation polymorphism assays (SSCP), clamped denaturing gel electrophoresis (CDGE), denaturing gradient gel electrophoresis (DGGE), two-dimensional gel electrophoresis (2DGE or TDGE), conformational sensitive gel electrophoresis (CSGE), denaturing high performance liquid chromatography (DHPLC), infrared matrix-assisted laser desorption/ionization (IR-MALDI) mass spectrometry, mobility shift analysis, quantitative real-time PCR, restriction enzyme analysis, heteroduplex analysis, chemical mismatch cleavage (CMC), RNase protection assays, use of polypeptides that recognize nucleotide mismatches, allele-specific PCR, real-time pyrophosphate DNA sequencing, PCR amplification in combination with DHPLC, and combinations of such methods.
[00168] Sequencing can be accomplished through classic Sanger sequencing methods, which are known in the art. In a preferred embodiment, sequencing can be performed using high-throughput sequencing methods, some of which allow detection of a sequenced nucleotide immediately after or upon its incorporation into a growing strand, for example, detection of sequence in substantially real time or real time. In some cases, high-throughput sequencing generates at least 1,000, at least 5,000, at least 10,000, at least 20,000, at least 30,000, at least 40,000, at least 50,000, at least 100,000 or at least 500,000 sequence reads per hour, with each read being at least 50, at least 60, at least 70, at least 80, at least 90, at least 100, at least 120 or at least 150 bases long (or 500 to 1,000 bases per read for 454 sequencing).
[00169] High-throughput sequencing methods can include, but are not limited to, Massively Parallel Signature Sequencing (MPSS, Lynx Therapeutics), polony sequencing, pyrosequencing, Illumina (Solexa) sequencing, SOLiD sequencing, ion semiconductor sequencing, DNA nanoball sequencing, HeliScope™ single molecule sequencing, Single Molecule SMRT™ sequencing, Single Molecule real time (RNAP) sequencing, nanopore DNA sequencing, and/or sequencing by hybridization, for example, a non-enzymatic method that uses a DNA microarray, or microfluidic Sanger sequencing.
[00170] In some embodiments, high-throughput sequencing can involve the use of technology available from Helicos BioSciences Corporation (Cambridge, Mass.), such as the Single Molecule Sequencing by Synthesis (SMSS) method. SMSS is unique because it allows for sequencing the entire human genome in up to 24 hours. This fast sequencing method also allows for detection of a SNP/nucleotide in a sequence in substantially real time or real time. Finally, SMSS is powerful because, like the MIP technology, it does not use a pre-amplification step prior to hybridization; in fact, SMSS does not use any amplification. SMSS is described in U.S. Patent Application Publication Nos. 20060024711, 20060024678, 20060012793, 20060012784, and 20050100932. In some embodiments, high-throughput sequencing involves the use of technology available from 454 Life Sciences, Inc. (a Roche company, Branford, Conn.), such as the PicoTiterPlate device, which includes a fiber optic plate that transmits the chemiluminescent signal generated by the sequencing reaction to be recorded by a CCD camera in the instrument. This use of fiber optics allows for the detection of a minimum of 20 million base pairs in 4.5 hours.
[00171] In some embodiments, PCR-amplified single-stranded nucleic acid can be hybridized to a primer and incubated with a polymerase, ATP sulfurylase, luciferase, apyrase, and the substrates luciferin and adenosine 5'-phosphosulfate. Next, deoxynucleotide triphosphates corresponding to the bases A, C, G, and T (U) can be added sequentially. Each base incorporation can be accompanied by release of pyrophosphate, which can be converted to ATP by sulfurylase; the ATP can then drive the luciferase-catalyzed synthesis of oxyluciferin and the release of visible light. Since pyrophosphate release can be equimolar with the number of incorporated bases, the light given off can be proportional to the number of nucleotides added in any one step. The process can be repeated until the entire sequence is determined. In some embodiments, pyrosequencing can be utilized to analyze amplicons to determine whether breakpoints are present. In another embodiment, pyrosequencing can map surrounding sequences as an internal quality control.
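Because light output is proportional to the number of bases incorporated per dNTP addition, an idealized pyrogram can be computed directly from the sequence being synthesized. Each dispensation then yields the length of the homopolymer run it extends, or zero. The sketch assumes a simple cyclic dispensation order of A, C, G, T and ignores noise, incomplete extension, and signal decay. Real instruments use their own dispensation orders. The read and function names are hypothetical.

```python
def pyrogram(synthesized: str, flow_order: str = "ACGT") -> list[tuple[str, int]]:
    """Idealized pyrosequencing signal for each dNTP dispensation.

    `synthesized` is the strand built by the polymerase (complementary to the
    template). Each dispensation incorporates the whole run of matching bases,
    and the signal equals the run length, mirroring light output proportional
    to the pyrophosphate released.
    """
    if not set(synthesized) <= set(flow_order):
        raise ValueError("sequence contains bases absent from the flow order")
    signals = []
    i = 0
    while i < len(synthesized):  # each full cycle of flows consumes at least one base
        for base in flow_order:
            run = 0
            while i < len(synthesized) and synthesized[i] == base:
                run += 1
                i += 1
            signals.append((base, run))
    return signals


def decode(signals: list[tuple[str, int]]) -> str:
    """Recover the synthesized sequence from (base, signal) pairs."""
    return "".join(base * count for base, count in signals)


if __name__ == "__main__":
    flows = pyrogram("AACGTTTG")  # hypothetical read
    print(flows)
    print(decode(flows))          # AACGTTTG
```

The triple-height T peak in this example illustrates why homopolymer runs are read from signal intensity rather than from the number of flows.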
[00172] Pyrosequencing analysis methods are known in the art. Sequence analysis can include a four-color sequencing by ligation scheme (degenerate ligation), which involves hybridizing an anchor primer to one of four positions. Then an enzymatic ligation reaction of the anchor primer to a population of degenerate nonamers that are labeled with fluorescent dyes can be performed.
At any given cycle, the population of nonamers that is used can be structured such that the identity of one of its positions can be correlated with the identity of the fluorophore attached to that nonamer. To the extent that the ligase discriminates for complementarity at that queried position, the fluorescent signal can allow the inference of the identity of the base. After performing the ligation and four-color imaging, the anchor primer:nonamer complexes can be stripped and a new cycle begins. Methods to image sequence information after performing ligation are known in the art.
[00173] In some embodiments, analysis by restriction enzyme digestion can be used to detect a particular genetic variation if the genetic variation results in creation or elimination of one or more restriction sites relative to a reference sequence. In some embodiments, restriction fragment length polymorphism (RFLP) analysis can be conducted, wherein the digestion pattern of the relevant DNA fragment indicates the presence or absence of the particular genetic variation in the sample.
[00174] In some embodiments, arrays of oligonucleotide probes that can be complementary to target nucleic acid sequence segments from a subject can be used to identify genetic variations.
In some embodiments, an array of oligonucleotide probes comprises an oligonucleotide array, for example, a microarray. In some embodiments, the present disclosure features arrays that include a substrate having a plurality of addressable areas, and methods of using them. At least one area of the plurality includes a nucleic acid probe that binds specifically to a sequence comprising a genetic variation, and can be used to detect the absence or presence of said genetic variation, for example, one or more SNPs, microsatellites, or CNVs, as described herein, to determine or identify an allele or genotype. For example, the array can include one or more nucleic acid probes that can be used to detect a genetic variation such as those listed in Tables 1 and 5. In some embodiments, the array can further comprise at least one area that includes a nucleic acid probe that can be used to specifically detect another marker associated with a developmental disorder, for example, ASD, as described herein.
[00175] Microarray hybridization can be performed by hybridizing a nucleic acid of interest, for example, a nucleic acid encompassing a genetic variation, with the array and detecting hybridization using nucleic acid probes. In some embodiments, the nucleic acid of interest is amplified prior to hybridization. Hybridization and detection can be carried out according to standard methods described in Published PCT Applications WO 92/10092 and WO 95/11995, and U.S. Pat. No. 5,424,186. For example, an array can be scanned to determine the position on the array to which the nucleic acid hybridizes. The hybridization data obtained from the scan can be, for example, in the form of fluorescence intensities as a function of location on the array.
[00176] Arrays can be formed on substrates fabricated with materials such as paper; glass; plastic, for example, polypropylene, nylon, or polystyrene; polyacrylamide; nitrocellulose; silicon; optical fiber; or any other suitable solid or semisolid support; and can be configured in a planar configuration, for example, glass plates or silicon chips, or in a three-dimensional configuration, for example, pins, fibers, beads, particles, microtiter wells, and capillaries.
[00177] Methods for generating arrays are known in the art and can include, for example, photolithographic methods (U.S. Pat. Nos. 5,143,854, 5,510,270 and 5,527,681); mechanical methods, for example, directed-flow methods (U.S. Pat. No. 5,384,261); pin-based methods (U.S. Pat. No. 5,288,514); bead-based techniques (PCT US/93/04145); solid phase oligonucleotide synthesis methods; or other methods known to a person skilled in the art (see, e.g., Bier, F.F., et al., Adv Biochem Eng Biotechnol 109:433-53 (2008); Hoheisel, J.D., Nat Rev Genet 7:200-10 (2006); Fan, J.B., et al., Methods Enzymol 410:57-73 (2006); Ragoussis, J. & Elvidge, G., Expert Rev Mol Diagn 6:145-52 (2006); Mockler, T.C., et al., Genomics 85:1-15 (2005)). Many additional descriptions of the preparation and use of oligonucleotide arrays for detection of polymorphisms can be found, for example, in US 6,858,394, US 6,429,027, US 5,445,934, US 5,700,637, US 5,744,305, US 5,945,334, US 6,054,270, US 6,300,063, US 6,733,977, US 7,364,858, EP 619 321, and EP 373 203. Methods for array production, hybridization, and analysis are also described in Snijders et al., Nat. Genetics 29:263-264 (2001); Klein et al., Proc. Natl. Acad. Sci. USA 96:4494-4499 (1999); Albertson et al., Breast Cancer Research and Treatment 78:289-298 (2003); and Snijders et al., "BAC microarray based comparative genomic hybridization," in: Zhao et al. (eds), Bacterial Artificial Chromosomes: Methods and Protocols, Methods in Molecular Biology, Humana Press, 2002.
[00178] In some embodiments, oligonucleotide probes forming an array can be attached to a substrate by any number of techniques, including, but not limited to, in situ synthesis, for example, high-density oligonucleotide arrays, using photolithographic techniques; spotting/printing at medium to low density on glass, nylon, or nitrocellulose; masking; and dot-blotting on a nylon or nitrocellulose hybridization membrane. In some embodiments, oligonucleotides can be immobilized via a linker, including but not limited to, by covalent, ionic, or physical linkage. Linkers for immobilizing nucleic acids and polypeptides, including reversible or cleavable linkers, are known in the art (U.S. Pat. No. 5,451,683 and WO 98/20019). In some embodiments, oligonucleotides can be non-covalently immobilized on a substrate by hybridization to anchors, by means of magnetic beads, or in a fluid phase, for example, in wells or capillaries.
[00179] An array can comprise oligonucleotide hybridization probes capable of specifically hybridizing to different genetic variations. In some embodiments, oligonucleotide arrays can comprise a plurality of different oligonucleotide probes coupled to a surface of a substrate in different known locations. In some embodiments, oligonucleotide probes can exhibit differential or selective binding to polymorphic sites, and can be readily designed by one of ordinary skill in the art. For example, an oligonucleotide that is perfectly complementary to a sequence that encompasses a polymorphic site, for example, a sequence that includes the polymorphic site within it or at one end, can hybridize preferentially to a nucleic acid comprising that sequence, as opposed to a nucleic acid comprising an alternate polymorphic variant.
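When each polymorphic site is interrogated by two probes, each perfectly complementary to one allele, the preferential hybridization described above can be turned into a genotype call. The call is based on the fraction of signal from the allele-B probe, θ = B / (A + B). The sketch uses fixed cutoffs that are hypothetical placeholders, not values from any platform. Commercial SNP arrays typically call genotypes by clustering many samples rather than by fixed thresholds. The intensities and function name are also hypothetical.

```python
def call_genotype(intensity_a: float, intensity_b: float, min_total: float = 100.0,
                  aa_max: float = 0.2, bb_min: float = 0.8,
                  ab_band: tuple[float, float] = (0.3, 0.7)) -> str:
    """Call a biallelic genotype from two allele-specific probe intensities.

    theta = B / (A + B) is the fraction of signal from the allele-B probe.
    All thresholds are hypothetical placeholders.
    """
    total = intensity_a + intensity_b
    if total < min_total:
        return "NoCall"  # too little signal to trust either probe
    theta = intensity_b / total
    if theta <= aa_max:
        return "AA"
    if theta >= bb_min:
        return "BB"
    if ab_band[0] <= theta <= ab_band[1]:
        return "AB"
    return "NoCall"      # ambiguous region between the expected clusters


if __name__ == "__main__":
    # Hypothetical background-subtracted intensities for three subjects at one SNP.
    for a, b in [(900, 50), (500, 480), (40, 800)]:
        print(a, b, call_genotype(a, b))  # AA, AB, BB
```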
[00180] In some embodiments, arrays can include multiple detection blocks, for example, multiple groups of probes designed for detection of particular polymorphisms.
In some embodiments, these arrays can be used to analyze multiple different polymorphisms. In some embodiments, detection blocks can be grouped within a single array or in multiple, separate arrays, wherein varying conditions, for example, conditions optimized for particular polymorphisms, can be used during hybridization. General descriptions of using oligonucleotide arrays for detection of polymorphisms can be found, for example, in U.S. Pat.
Nos. 5,858,659 and 5,837,832. In addition to oligonucleotide arrays, cDNA arrays can be used similarly in certain embodiments.
[00181] The methods described herein can include, but are not limited to, providing an array as described herein, contacting the array with a sample, and detecting binding of a nucleic acid from the sample to the array. In some embodiments, the method can comprise amplifying nucleic acid from the sample, for example, a region associated with a developmental disorder or a region that includes another region associated with a developmental disorder. In some embodiments, the methods described herein can include using an array that can identify differential expression patterns or copy numbers of one or more genes in samples from control and affected individuals. For example, arrays of probes to a marker described herein can be used to identify genetic variations between DNA from an affected subject and control DNA obtained from an individual who does not have a developmental disorder. Since the oligonucleotides on the array can contain sequence tags, their positions on the array can be accurately known relative to the genomic sequence.
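Comparing DNA from an affected subject with control DNA on the same array, as in comparative genomic hybridization, is commonly summarized per probe as log2(test/reference). In a diploid genome, a one-copy loss is expected near log2(1/2) = -1 and a one-copy gain near log2(3/2) ≈ 0.58. The sketch median-centers the ratios, which assumes that most probes are copy-number neutral, and applies hypothetical cutoffs. It does not segment adjacent probes, which practical CNV callers do. The intensities and function name are hypothetical.

```python
import math
from statistics import median


def acgh_calls(test: list[float], reference: list[float],
               loss_cutoff: float = -0.4, gain_cutoff: float = 0.3) -> list[tuple[float, str]]:
    """Median-centered log2(test/reference) ratio and a call for each probe.

    Median centering removes global labeling or scanning differences between the
    two samples, assuming most probes are copy-number neutral. Cutoffs are hypothetical.
    """
    if len(test) != len(reference):
        raise ValueError("test and reference must have one intensity per probe")
    ratios = [math.log2(t / r) for t, r in zip(test, reference)]
    center = median(ratios)
    calls = []
    for ratio in ratios:
        adjusted = ratio - center
        if adjusted <= loss_cutoff:
            label = "loss"
        elif adjusted >= gain_cutoff:
            label = "gain"
        else:
            label = "neutral"
        calls.append((adjusted, label))
    return calls


if __name__ == "__main__":
    # Hypothetical intensities for five probes; the test channel is globally twice
    # as bright, which median centering removes.
    test = [2000, 2000, 1000, 3000, 2000]
    reference = [1000, 1000, 1000, 1000, 1000]
    for value, label in acgh_calls(test, reference):
        print(f"{value:+.2f} {label}")
```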
[00182] In some embodiments, it can be desirable to employ methods that can detect the presence of multiple genetic variations, for example, polymorphic variants at a plurality of polymorphic sites, in parallel or substantially simultaneously. In some embodiments, these methods can comprise oligonucleotide arrays and other methods, including methods in which reactions, for example, amplification and hybridization, can be performed in individual vessels, for example, within individual wells of a multi-well plate or other vessel.

[00183] Determining the identity of a genetic variation can also include or consist of reviewing a subject's medical history, where the medical history includes information regarding the identity, copy number, presence or absence of one or more alleles or SNPs in the subject, e.g., results of a genetic test.
[00184] In some embodiments, extended runs of homozygosity (ROH) may be useful to map recessive disease genes in outbred populations. Furthermore, even in complex disorders, a high number of affected individuals may have the same haplotype in the region surrounding a disease mutation. Therefore, a rare pathogenic variant and surrounding haplotype can be enriched in frequency in a group of affected individuals compared with the haplotype frequency in a cohort of unaffected controls. Homozygous haplotypes (HH) that are shared by multiple affected individuals can be important for the discovery of recessive disease genes in complex disorders such as ASD. In some embodiments, the traditional homozygosity mapping method can be extended by analyzing the haplotype within shared ROH regions to identify homozygous segments of identical haplotype that are present uniquely or at a higher frequency in ASD probands compared to parental controls. Such regions are termed risk homozygous haplotypes (rHH), which may contain low-frequency recessive variants that contribute to ASD risk in a subset of ASD patients.
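The first step of homozygosity mapping, finding runs of consecutive homozygous genotype calls along a chromosome, can be sketched as follows. The sketch also includes a helper that intersects two runs, for example from two affected individuals, to find a shared segment. It does not compare haplotypes within that segment, which the rHH approach requires. The "AA"/"AB"/"BB" coding, the minimum-length and minimum-marker thresholds, and the function names are hypothetical. Published ROH tools typically also tolerate occasional heterozygous or missing calls inside a run.

```python
from typing import Optional


def runs_of_homozygosity(positions: list[int], genotypes: list[str],
                         min_snps: int = 3, min_length_bp: int = 1000) -> list[tuple[int, int, int]]:
    """Return (start, end, n_markers) for each run of consecutive homozygous calls.

    Any call other than "AA" or "BB" (a heterozygote or a no-call) ends the run.
    Thresholds are hypothetical.
    """
    runs = []
    start = end = None
    count = 0

    def close_run():
        if start is not None and count >= min_snps and end - start >= min_length_bp:
            runs.append((start, end, count))

    for pos, call in zip(positions, genotypes):
        if call in ("AA", "BB"):
            if start is None:
                start, count = pos, 0
            end = pos
            count += 1
        else:
            close_run()
            start = None
    close_run()  # a run may extend to the last marker
    return runs


def overlap(run_a: tuple[int, int], run_b: tuple[int, int]) -> Optional[tuple[int, int]]:
    """Shared interval of two runs (e.g. from two probands), or None if disjoint."""
    start, end = max(run_a[0], run_b[0]), min(run_a[1], run_b[1])
    return (start, end) if start <= end else None


if __name__ == "__main__":
    # Hypothetical marker positions (bp) and genotype calls for one individual.
    pos = [100, 600, 1200, 1800, 2500, 3000, 3400, 3900, 4500]
    gt = ["AA", "BB", "AA", "AA", "AB", "BB", "BB", "AA", "BB"]
    print(runs_of_homozygosity(pos, gt))  # [(100, 1800, 4), (3000, 4500, 4)]
```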
[00185] Genetic variations can also be identified using any of a number of methods well known in the art. For example, genetic variations available in public databases, which can be searched using methods and custom algorithms or algorithms known in the art, can be used. In some embodiments, a reference sequence can be from, for example, the human draft genome sequence, publicly available in various databases, or a sequence deposited in a database such as GenBank.
Methods of Detecting CNVs
[00186] Detection of genetic variations, specifically CNVs, can be accomplished by one or more suitable techniques described herein. Generally, techniques that can selectively determine whether a particular chromosomal segment is present or absent in an individual can be used for genotyping CNVs. Identification of novel copy number variations can be done by methods for assessing genomic copy number changes.
[00187] In some embodiments, methods include, but are not limited to, methods that can quantitatively estimate the number of copies of a particular genomic segment, and can also include methods that indicate whether a particular segment is present in a sample or not. In some embodiments, the technique to be used can quantify the amount of segment present, for example, by determining whether a DNA segment is deleted, duplicated, or triplicated in a subject, for example, using Fluorescent In Situ Hybridization (FISH) techniques and other methods described herein. In some embodiments, methods include detection of copy number variation from array intensity and sequencing read depth using a stepwise Bayesian model (Zhang Z.D., et al. BMC Bioinformatics. 2010 Oct 31;11:539). In some embodiments, methods include detecting copy number variations using shotgun sequencing, CNV-seq (Xie C., et al. BMC Bioinformatics. 2009 Mar. 6;10:80). In some embodiments, methods include analyzing next-generation sequencing (NGS) data for CNV detection using any one of several algorithms developed for each of the four broad methods for CNV detection using NGS, namely the depth of coverage (DOC), read-pair (RP), split-read (SR) and assembly-based (AS) methods (Teo S.M., et al. Bioinformatics. 2012 Aug. 31). In some embodiments, methods include combining coverage with map information for the identification of deletions and duplications in targeted sequence data (Nord A.S., et al. BMC Genomics. 2011 Apr 12;12:184).
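The core idea behind depth-of-coverage (DOC) CNV detection can be sketched in its simplest form. Reads are counted in fixed windows for a test sample and a control, each count is normalized by its library's total reads, and the ratio is scaled by the ploidy. This is an illustrative simplification and not the Bayesian, CNV-seq, or other cited algorithms. It assumes the control is copy-number neutral everywhere. It also ignores GC bias, mappability, and statistical significance. Read positions, counts, and function names are hypothetical.

```python
from typing import Optional


def window_counts(read_starts: list[int], chrom_length: int, window: int) -> list[int]:
    """Count reads whose 0-based start coordinate falls in each fixed-size window."""
    counts = [0] * ((chrom_length + window - 1) // window)
    for start in read_starts:
        if 0 <= start < chrom_length:
            counts[start // window] += 1
    return counts


def copy_number_estimates(test_counts: list[int], control_counts: list[int],
                          ploidy: int = 2) -> list[Optional[float]]:
    """Per-window copy number from library-size-normalized depth ratios.

    Assumes the control carries `ploidy` copies in every window.
    """
    test_total, control_total = sum(test_counts), sum(control_counts)
    if test_total == 0 or control_total == 0:
        raise ValueError("both samples need at least one read")
    estimates: list[Optional[float]] = []
    for t, c in zip(test_counts, control_counts):
        if c == 0:
            estimates.append(None)  # ratio undefined without control reads
        else:
            estimates.append(ploidy * (t / test_total) / (c / control_total))
    return estimates


if __name__ == "__main__":
    # Hypothetical per-window read counts for a test and a control library.
    print(copy_number_estimates([100, 100, 50, 150], [100, 100, 100, 100]))
    # [2.0, 2.0, 1.0, 3.0]: a one-copy loss in window 3, a one-copy gain in window 4
```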
[00188] In some embodiments, other genotyping technologies can be used for detection of CNVs, including but not limited to, karyotype analysis, Molecular Inversion Probe array technology, for example, Affymetrix SNP Array 6.0, BeadArray Technologies, for example, Illumina GoldenGate and Infinium assays, other platforms such as NimbleGen HD2.1 or HD4.2, High-Definition Comparative Genomic Hybridization (CGH) arrays (Agilent Technologies), tiling array technology (Affymetrix), multiplex ligation-dependent probe amplification (MLPA), Invader assay, fluorescence in situ hybridization, and, in one preferred embodiment, Array Comparative Genomic Hybridization (aCGH) methods. As described herein, karyotype analysis can be a method to determine the content and structure of chromosomes in a sample. In some embodiments, karyotyping can be used, in lieu of aCGH, to detect translocations, which can be copy number neutral and, therefore, not detectable by aCGH.
Information about the signal amplitude of particular probes, which can be representative of particular alleles, can provide quantitative dosage information for the particular allele and, consequently, dosage information about the CNV in question, since the marker can be selected as a marker representative of the CNV and can be located within the CNV. In some embodiments, if the CNV is a deletion, the absence of a particular marker allele is representative of the deletion. In some embodiments, if the CNV is a duplication or a higher order copy number variation, the signal intensity representative of the allele correlating with the CNV can represent the copy number. A summary of commonly used methodologies is provided in Perkel (Perkel J., Nature Methods 5:447-453 (2008)).

[00189] PCR assays can be utilized to detect CNVs and can provide an alternative to array analysis. In particular, PCR assays can enable detection of the precise boundaries of gene/chromosome variants at the molecular level, and can show whether those boundaries are identical in different individuals. PCR assays can be based on the amplification of a junction fragment present only in individuals that carry a deletion. This assay can convert the detection of a loss by array CGH to one of a gain by PCR.
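The junction-fragment logic can be made concrete with amplicon arithmetic. Primers placed outside a large deletion are too far apart to amplify the reference allele under short-extension conditions. On the deleted allele they are brought close together, so a product appears only in carriers. The sketch assumes 1-based inclusive reference coordinates, primers that bind entirely outside the deleted interval, and a hypothetical maximum amplifiable length. The coordinates and function name are also hypothetical.

```python
from typing import Optional


def junction_pcr(fwd_start: int, rev_end: int, del_start: int, del_end: int,
                 max_amplifiable: int = 5000) -> tuple[Optional[int], Optional[int]]:
    """Expected products (reference allele, deletion allele) for primers flanking a deletion.

    Coordinates are 1-based and inclusive on the reference genome. A product is
    reported as None when it exceeds max_amplifiable, a hypothetical limit for
    the PCR conditions used.
    """
    if not (fwd_start < del_start <= del_end < rev_end):
        raise ValueError("primers must flank the deleted interval")
    reference_len = rev_end - fwd_start + 1
    deletion_len = reference_len - (del_end - del_start + 1)

    def observed(length: int) -> Optional[int]:
        return length if length <= max_amplifiable else None

    return observed(reference_len), observed(deletion_len)


if __name__ == "__main__":
    # Hypothetical 49,001-bp deletion flanked by primers 50,001 bp apart.
    print(junction_pcr(1_000, 51_000, 1_500, 50_500))  # (None, 1000)
```

Only the 1,000-bp junction product is expected, and only from a deletion carrier, which is the "gain by PCR" readout described above.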
[00190] Examples of PCR techniques that can be used in the present invention include, but are not limited to, quantitative PCR, real-time quantitative PCR (qPCR), quantitative fluorescent PCR (QF-PCR), multiplex fluorescent PCR (MF-PCR), real time PCR (RT-PCR), single cell PCR, PCR-RFLP/RT-PCR-RFLP, hot start PCR and nested PCR. Other suitable amplification methods include the ligase chain reaction (LCR), ligation-mediated PCR (LM-PCR), degenerate oligonucleotide primed PCR (DOP-PCR), transcription amplification, self-sustained sequence replication, selective amplification of target polynucleotide sequences, consensus sequence primed polymerase chain reaction (CP-PCR), arbitrarily primed polymerase chain reaction (AP-PCR) and nucleic acid sequence-based amplification (NASBA).
[00191] Alternative methods for the simultaneous interrogation of multiple regions include quantitative multiplex PCR of short fluorescent fragments (QMPSF), multiplex amplifiable probe hybridization (MAPH) and multiplex ligation-dependent probe amplification (MLPA), in which copy-number differences for up to 40 regions can be scored in one experiment. Another approach can be to specifically target regions that harbor known segmental duplications, which are often sites of copy-number variation. By targeting the variable nucleotides between two copies of a segmental duplication (called paralogous sequence variants) using a SNP-genotyping method that provides independent fluorescence intensities for the two alleles, it is possible to detect an increase in intensity of one allele compared with the other.
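Multiplex dosage methods such as MLPA and QMPSF are commonly scored by a dosage quotient (DQ). The sketch below shows one common style of normalization, under stated assumptions: each probe's peak signal is normalized to the median of designated reference probes in the same sample, then divided by the same normalized value in a two-copy control sample, so a DQ near 1.0 suggests two copies, near 0.5 a heterozygous loss and near 1.5 a single-copy gain. The probe names, peak values and the 0.7/1.3 cutoffs are illustrative assumptions, not parameters of any particular kit.

```python
from statistics import median


def dosage_quotients(test_peaks, control_peaks, reference_probes):
    """Dosage quotient per probe for MLPA/QMPSF-style peak data.

    Each probe signal is normalized to the median of the reference probes in
    the same sample, then divided by the equivalent normalized value from a
    control sample assumed to carry two copies at every probe."""
    test_norm = median([test_peaks[p] for p in reference_probes])
    ctrl_norm = median([control_peaks[p] for p in reference_probes])
    return {
        probe: (test_peaks[probe] / test_norm) / (control_peaks[probe] / ctrl_norm)
        for probe in test_peaks
    }


def classify_dq(dq, loss_cutoff=0.7, gain_cutoff=1.3):
    """Map a dosage quotient to a copy-number call (cutoffs are illustrative)."""
    if dq < loss_cutoff:
        return "loss"
    if dq > gain_cutoff:
        return "gain"
    return "normal"


# Hypothetical peak heights: three reference probes plus two target probes.
test = {"ref1": 1000, "ref2": 980, "ref3": 1020, "target_a": 500, "target_b": 1500}
control = {"ref1": 1000, "ref2": 1000, "ref3": 1000, "target_a": 1000, "target_b": 1000}
dq = dosage_quotients(test, control, ["ref1", "ref2", "ref3"])
print({probe: classify_dq(value) for probe, value in dq.items()})
```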
[00192] In another embodiment, the amplified piece of DNA can be bound to beads using the sequencing element of the nucleic acid tag, under conditions that favor the binding of each single amplified DNA molecule to a different bead, and amplification occurs on each bead. In some embodiments, such amplification can occur by PCR. Each bead can be placed in a separate well, which can be a picoliter-sized well. In some embodiments, each bead is captured within a droplet of a PCR-reaction-mixture-in-oil emulsion and PCR amplification occurs within each droplet. The amplification on the bead results in each bead carrying at least one million, at least 5 million, or at least 10 million copies of the single amplified DNA molecule.
[00193] In embodiments where PCR occurs in oil-emulsion mixtures, the emulsion droplets are broken, the DNA is denatured and the beads carrying single-stranded nucleic acid clones are deposited into a well, such as a picoliter-sized well, for further analysis according to the methods described herein. These amplification methods allow for the analysis of genomic DNA regions. Methods for using bead amplification followed by fiber optics detection are described in Margulies et al., Nature 437(7057):376-80 (2005), as well as in US Patent Application Publication Nos. 20020012930; 20030068629; 20030100102; 20030148344; 20040248161; 20050079510; 20050124022; and 20060078909.
[00194] Another variation on the array-based approach can be to use the hybridization signal intensities that are obtained from the oligonucleotides employed on Affymetrix SNP arrays or on Illumina BeadArrays. Here, hybridization intensities are compared with average values that are derived from controls, such that deviations from these averages indicate a change in copy number. As well as providing information about copy number, SNP arrays have the added advantage of providing genotype information. For example, they can reveal loss of heterozygosity, which could provide supporting evidence for the presence of a deletion, or might indicate segmental uniparental disomy (which can recapitulate the effects of structural variation in some genomic regions, for example in Prader-Willi and Angelman syndromes).
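The combination of intensity and genotype evidence described above can be sketched as follows. The sketch assumes per-probe intensities for a test sample and a set of controls, and per-SNP B-allele frequencies (BAF, the fraction of signal from one allele, about 0.5 at heterozygous SNPs). The log2 ratio to the control mean, the 0.2 to 0.8 heterozygous band and the -0.3 loss cutoff are illustrative assumptions, not the processing pipeline of any particular array vendor.

```python
import math
from statistics import mean


def log_r_ratios(test_intensities, control_intensities):
    """Per-probe log2 ratio of the test intensity to the mean control intensity.

    control_intensities holds, for each probe, the intensities observed in a
    set of control samples at that probe."""
    return [math.log2(t / mean(ctrls))
            for t, ctrls in zip(test_intensities, control_intensities)]


def has_no_heterozygous_snps(b_allele_freqs, het_low=0.2, het_high=0.8):
    """True if no SNP in the window has a B-allele frequency in the
    heterozygous band, i.e. the window shows loss of heterozygosity (LOH)."""
    return not any(het_low < baf < het_high for baf in b_allele_freqs)


def interpret_window(lrrs, bafs, loss_cutoff=-0.3):
    """Combine copy-number and genotype evidence for one genomic window."""
    loh = has_no_heterozygous_snps(bafs)
    if loh and mean(lrrs) < loss_cutoff:
        return "LOH with reduced intensity: supports a deletion"
    if loh:
        return "copy-neutral LOH: possible segmental uniparental disomy"
    return "heterozygous SNPs present: no LOH"


# Hypothetical window: intensities about half the control mean, no heterozygous SNPs.
print(interpret_window([-1.0, -0.9, -1.1], [0.0, 1.0, 0.02]))
```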
[00195] Many of the basic procedures followed in microarray-based genome profiling are similar, if not identical, to those followed in expression profiling and SNP analysis, including the use of specialized microarray equipment and data-analysis tools. Since microarray-based expression profiling has been well established in the last decade, much can be learned from the technical advances made in this area. Examples of the use of microarrays in nucleic acid analysis that can be used are described in U.S. Pat. No. 6,300,063, U.S. Pat. No. 5,837,832, U.S. Pat. No. 6,969,589, U.S. Pat. No. 6,040,138, U.S. Pat. No. 6,858,412, U.S. application Ser. No. 08/529,115, U.S. application Ser. No. 10/272,384, U.S. application Ser. No. 10/045,575, U.S. application Ser. No. 10/264,571 and U.S. application Ser. No. 10/264,574. It should be noted that there are also distinct differences, such as target and probe complexity, the stability of DNA relative to RNA, the presence of repetitive DNA and the need to identify single copy number alterations in genome profiling.
[00196] In a preferred embodiment, the genetic variations detected comprise CNVs and can be detected using array CGH. In some embodiments, array CGH can be implemented using a wide variety of techniques. The initial approaches used arrays produced from large-insert genomic clones such as bacterial artificial chromosomes (BACs). Producing sufficient BAC DNA of adequate purity to make arrays is arduous, so several techniques to amplify small amounts of starting material have been employed. These techniques include ligation-mediated PCR (Snijders et al., Nat. Genet. 29:263-64), degenerate primer PCR using one or several sets of primers, and rolling circle amplification. BAC arrays that provide complete genome tiling paths are also available. Arrays made from less complex nucleic acids such as cDNAs, selected PCR products, and oligonucleotides can also be used. Although most CGH procedures employ hybridization with total genomic DNA, it is possible to use reduced complexity representations of the genome produced by PCR techniques. Computational analysis of the genome sequence can be used to design array elements complementary to the sequences contained in the representation. Various SNP genotyping platforms, some of which use reduced complexity genomic representations, can be useful for their ability to determine both DNA copy number and allelic content across the genome. In some embodiments, small amounts of genomic DNA can be amplified with a variety of whole genome amplification methods prior to CGH analysis of the sample.
[00197] The different basic approaches to array CGH provide different levels of performance, so some are more suitable for particular applications than others. The factors that determine performance include the magnitudes of the copy number changes, their genomic extents, the state and composition of the specimen, how much material is available for analysis, and how the results of the analysis will be used. Many applications require reliable detection of copy number changes of much less than 50%, a more stringent requirement than for other microarray technologies. Note that technical details are extremely important, and different implementations of methods using the same array CGH approach can yield different levels of performance. Various CGH methods are known in the art and are equally applicable to one or more methods of the present invention. For example, CGH methods are disclosed in U.S. Pat. Nos. 7,034,144; 7,030,231; 7,011,949; 7,014,997; 6,977,148; 6,951,761; and 6,916,621.
[00198] The data provided by array CGH are quantitative measures of DNA sequence dosage. Array CGH provides high-resolution estimates of copy number aberrations, and can be performed efficiently on many samples. The advent of array CGH technology makes it possible to monitor DNA copy number changes on a genomic scale, and many projects have been launched to study the genome in specific diseases.
[00199] In a preferred embodiment, whole genome array-based comparative genomic hybridization (array CGH) analysis, or array CGH on a subset of genomic regions, can be used to efficiently interrogate human genomes for genomic imbalances at multiple loci within a single assay. The development of comparative genomic hybridization (CGH) (Kallioniemi et al., Science 258:818-21 (1992)) provided the first efficient approach to scanning entire genomes for variations in DNA copy number. The importance of normal copy number variation involving large segments of DNA had been unappreciated. Array CGH is a breakthrough technique in human genetics, which is attracting interest from clinicians working in fields as diverse as cancer and in vitro fertilization (IVF). The use of CGH microarrays in the clinic holds great promise for identifying regions of genomic imbalance associated with disease. Progressing from the identification of chromosomal critical regions associated with specific phenotypes to the identification of specific dosage-sensitive genes can lead to therapeutic opportunities of benefit to patients. Array CGH is a specific, sensitive and rapid technique that can enable the screening of the whole genome in a single test. It can facilitate and accelerate the screening process in human genetics and is expected to have a profound impact on the screening and counseling of patients with genetic disorders. It is now possible to identify the exact location on the chromosome where an aberration has occurred and to map these changes directly onto the genomic sequence.
[00200] An array CGH approach provides a robust method for carrying out a genome-wide scan to find novel copy number variants (CNVs). The array CGH methods can use labeled fragments from a genome of interest, which can be competitively hybridized with a second differentially labeled genome to arrays that are spotted with cloned DNA fragments, revealing copy-number differences between the two genomes. Genomic clones (for example, BACs), cDNAs, PCR products and oligonucleotides can all be used as array targets. The use of array CGH with BACs was one of the earliest employed methods and is popular, owing to the extensive coverage of the genome it provides, the availability of reliable mapping data and ready access to clones. The last of these factors is important both for the array experiments themselves and for confirmatory FISH experiments.
[00201] In a typical CGH measurement, total genomic DNA is isolated from test and reference subjects, differentially labeled, and hybridized to a representation of the genome that allows the binding of sequences at different genomic locations to be distinguished. More than two genomes can be compared simultaneously with suitable labels. Hybridization of highly repetitive sequences is typically suppressed by the inclusion of unlabeled Cot-1 DNA in the reaction. In some embodiments of array CGH, it is beneficial to mechanically shear the genomic DNA sample, for example, by sonication, prior to its labeling and hybridization step. In another embodiment, array CGH may be performed without the use of Cot-1 DNA or a sonication step in the preparation of the genomic DNA sample. The relative hybridization intensity of the test and reference signals at a given location can be proportional to the relative copy number of those sequences in the test and reference genomes. If the reference genome is normal, then increases and decreases in signal intensity ratios directly indicate DNA copy number variation within the genome of the test cells. Data are typically normalized so that the modal ratio for the genome is set to some standard value, typically 1.0 on a linear scale or 0.0 on a logarithmic scale. Additional measurements such as FISH or flow cytometry can be used to determine the actual copy number associated with a ratio level.
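The normalization step described above can be sketched in a few lines. This is a minimal illustration, assuming background-subtracted test and reference intensities per array element: it computes log2 test/reference ratios, re-centers them so the typical genome-wide ratio sits at 0.0 on the log scale (using the median as a robust stand-in for the modal ratio), and applies illustrative ±0.3 cutoffs. Real pipelines typically add segmentation across neighboring probes before calling CNVs.

```python
import math
from statistics import median


def normalized_log2_ratios(test_signal, reference_signal):
    """Log2 test/reference ratio per array element, re-centred so that the
    genome-wide typical ratio is 0.0 on the log scale (1.0 linear).

    The median is used here as a robust stand-in for the modal ratio."""
    ratios = [math.log2(t / r) for t, r in zip(test_signal, reference_signal)]
    center = median(ratios)
    return [x - center for x in ratios]


def call_elements(log2_ratios, gain_cutoff=0.3, loss_cutoff=-0.3):
    """Label each element as gain, loss or neutral (cutoffs are illustrative)."""
    calls = []
    for x in log2_ratios:
        if x > gain_cutoff:
            calls.append("gain")
        elif x < loss_cutoff:
            calls.append("loss")
        else:
            calls.append("neutral")
    return calls


# Hypothetical intensities with a dye imbalance: the test channel is globally
# about twice as bright as the reference channel.
test = [200, 210, 190, 100, 300]
reference = [100, 100, 100, 100, 100]
print(call_elements(normalized_log2_ratios(test, reference)))
# ['neutral', 'neutral', 'neutral', 'loss', 'gain']
```

Without the re-centering step, the global dye imbalance would make every element look like a gain.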
[00202] In some embodiments, an array CGH procedure can include the following steps. First, large-insert clones, for example, BACs, can be obtained from a supplier of clone libraries. Then, small amounts of clone DNA can be amplified, for example, by degenerate oligonucleotide-primed (DOP) PCR or ligation-mediated PCR in order to obtain the quantities needed for spotting. Next, PCR products can be spotted onto glass slides using, for example, microarray robots equipped with high-precision printing pins. Depending on the number of clones to be spotted and the space available on the microarray slide, clones can either be spotted once per array or in replicate. Repeated spotting of the same clone on an array can increase the precision of the measurements if the spot intensities are averaged, and allows for a detailed statistical analysis of the quality of the experiments. Subject and control DNAs can be labeled, for example, with either Cy3- or Cy5-dUTP using random priming and can subsequently be hybridized onto the microarray in a solution containing an excess of Cot-1 DNA to block repetitive sequences. Hybridizations can be performed either manually, under a coverslip or in a gasket with gentle rocking, or automatically, using commercially available hybridization stations. These automated hybridization stations can allow for an active hybridization process, thereby improving reproducibility as well as reducing the actual hybridization time, which increases throughput. The hybridized DNAs can be detected through the two different fluorochromes using standard microarray scanning equipment with either a scanning confocal laser or a charge coupled device (CCD) camera-based reader, followed by spot identification using commercially or freely available software packages.
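Averaging replicate spots and using their spread as a quality measure, as described above, might look like the following sketch. It assumes each clone's replicate spots have already been reduced to log2 ratios; the clone identifiers and the 0.2 maximum standard deviation are hypothetical choices for illustration.

```python
from statistics import mean, stdev


def summarize_replicate_spots(spots, max_sd=0.2):
    """Average replicate spots per clone and flag inconsistent clones.

    spots maps a clone identifier to the log2 ratios measured at each of its
    replicate spots on one array."""
    summary = {}
    for clone, values in spots.items():
        if len(values) > 1:
            sd = stdev(values)
            consistent = sd <= max_sd
        else:
            sd, consistent = None, None  # a single spot cannot be assessed
        summary[clone] = {"mean": mean(values), "sd": sd, "consistent": consistent}
    return summary


# Hypothetical clones: one with tight replicates, one with scattered replicates.
result = summarize_replicate_spots({"clone_A": [0.48, 0.52, 0.50],
                                    "clone_B": [-0.9, 0.1, 0.8]})
for clone, stats in result.items():
    print(clone, round(stats["mean"], 3), stats["consistent"])
```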
[00203] The use of CGH with arrays that comprise long oligonucleotides (60-100 bp) can improve the detection resolution (in some embodiments, to CNVs as small as ~3-5 kb on arrays designed for interrogation of whole human genomes) over that achieved using BACs (limited to CNVs of 50-100 kb or larger due to the large size of BAC clones). In some embodiments, the resolution of oligonucleotide CGH arrays is achieved via in situ synthesis of 1-2 million unique features/probes per microarray, which can include microarrays available from Roche NimbleGen and Agilent Technologies. In addition to array CGH methods for copy number detection, other embodiments for partial or whole genome analysis of CNVs within a genome include, but are not limited to, the use of SNP genotyping microarrays and sequencing methods.

[00204] Another method for copy number detection that uses oligonucleotides can be representational oligonucleotide microarray analysis (ROMA). It is similar to the approach used with BAC and other CGH arrays but, to increase the signal-to-noise ratio, the 'complexity' of the input DNA is reduced by a method called representation or whole-genome sampling. Here, the DNA that is to be hybridized to the array can be treated by restriction digestion and then ligated to adapters, which results in the PCR-based amplification of fragments in a specific size range. As a result, the amplified DNA can make up a fraction of the entire genomic sequence; that is, it is a representation of the input DNA that has significantly reduced complexity, which can lead to a reduction in background noise. Other suitable methods available to the skilled person can also be used, and are within the scope of the present disclosure.
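The complexity reduction achieved by a representation can be estimated in silico. The sketch below is an assumption-laden illustration: it cuts a sequence at every occurrence of a recognition site (BglII, A^GATCT, is used only as an example), keeps only fragments bounded by a cut on both sides (so they can carry adapters at both ends), retains those inside an assumed 200-1200 bp amplifiable window, and reports the fraction of the input they cover.

```python
import re


def simulate_representation(sequence, site="AGATCT", cut_offset=1,
                            min_len=200, max_len=1200):
    """In-silico sketch of a restriction-based genome representation.

    The sequence is cut at every occurrence of the recognition site (cut
    position = site start + cut_offset). Only fragments bounded by a cut on
    both sides can carry adapters at both ends, and only those within the
    size window are assumed to amplify. Returns the kept fragment lengths
    and the fraction of the input sequence they represent."""
    cuts = [m.start() + cut_offset for m in re.finditer(site, sequence.upper())]
    kept = [right - left for left, right in zip(cuts, cuts[1:])
            if min_len <= right - left <= max_len]
    return kept, sum(kept) / len(sequence)


# Hypothetical toy sequence with three sites: fragments of 306 bp and 56 bp
# lie between cuts, and only the 306 bp fragment falls in the size window.
toy = "A" * 100 + "AGATCT" + "C" * 300 + "AGATCT" + "G" * 50 + "AGATCT" + "T" * 100
print(simulate_representation(toy))
```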
[00205] A comparison of one or more genomes relative to one or more other genomes with array CGH, or a variety of other CNV detection methods, can reveal the set of CNVs between two genomes, between one genome in comparison to multiple genomes, or between one set of genomes in comparison to another set of genomes. In some embodiments, an array CGH experiment can be performed by hybridizing a single test genome against a pooled sample of two or more genomes, which can result in minimizing the detection of higher frequency variants in the experiment. In some embodiments, a test genome can be hybridized alone (i.e., one-color detection) to a microarray, for example, using array CGH or SNP genotyping methods, and the comparison step to one or more reference genomes can be performed in silico to reveal the set of CNVs in the test genome relative to the one or more reference genomes. In one preferred embodiment, a single test genome is compared to a single reference genome in a 2-color experiment wherein both genomes are cohybridized to the microarray.
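The in silico comparison step can be illustrated with a simple interval-overlap filter. This sketch assumes CNV calls are already available as (chromosome, start, end, type) records with 1-based inclusive coordinates, and treats any shared base between CNVs of the same type as an overlap; the coordinates are hypothetical, and a production implementation might instead require a minimum reciprocal overlap.

```python
from typing import NamedTuple


class CNV(NamedTuple):
    chrom: str
    start: int  # 1-based, inclusive
    end: int    # 1-based, inclusive
    kind: str   # "gain" or "loss"


def overlaps(a, b):
    """True if two CNVs of the same type share at least one base."""
    return (a.chrom == b.chrom and a.kind == b.kind
            and a.start <= b.end and b.start <= a.end)


def cnvs_absent_from_reference(test_cnvs, reference_cnvs):
    """Keep test-genome CNVs that overlap no CNV of the same type in the
    reference genome(s). A linear scan is used for clarity; an interval tree
    would scale better to large reference sets."""
    return [c for c in test_cnvs
            if not any(overlaps(c, r) for r in reference_cnvs)]


test_calls = [CNV("chr1", 1000, 5000, "loss"), CNV("chr2", 200, 900, "gain")]
reference_calls = [CNV("chr1", 4000, 8000, "loss")]
print(cnvs_absent_from_reference(test_calls, reference_calls))
```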
[00206] Array CGH can be used to identify genes that are causative or associated with a particular phenotype, condition, or disease by comparing the set of CNVs found in the affected cohort to the set of CNVs found in an unaffected cohort. An unaffected cohort may consist of any individuals unaffected by the phenotype, condition, or disease of interest, but in one preferred embodiment is composed of individuals or subjects that are apparently healthy (normal). Methods employed for such analyses are described in US Patent Nos. 7,702,468 and 7,957,913. In some embodiments of CNV comparison methods, candidate genes that are causative or associated (i.e., potentially serving as a biomarker) with a phenotype, condition, or disease will be identified by CNVs that occur in the affected cohort but not in the unaffected cohort. In some embodiments of CNV comparison methods, candidate genes that are causative or associated (i.e., potentially serving as a biomarker) with a phenotype, condition, or disease will be identified by CNVs that occur at a statistically significant higher frequency in the affected cohort as compared to their frequency in the unaffected cohort. Thus, CNVs preferentially detected in the affected cohort as compared to the unaffected cohort can serve as beacons of genes that are causative or associated with a particular phenotype, condition, or disease. In some embodiments, CNV detection and comparison methods can result in direct identification of the gene that is causative or associated with the phenotype, condition, or disease if the CNVs are found to overlap with or encompass the gene(s). In some embodiments, CNV detection and comparison methods can result in identification of regulatory regions of the genome (e.g., promoters, enhancers, transcription factor binding sites) that regulate the expression of one or more genes that are causative or associated with the phenotype, condition, or disease of interest.
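The text does not name a specific statistical test for "a statistically significant higher frequency". One common choice for small carrier counts is a one-sided Fisher's exact test, sketched below with only the standard library (the same one-sided test is available as `scipy.stats.fisher_exact` with `alternative="greater"`). The cohort sizes in the example are hypothetical; they show that a single carrier among 100 cases and none among 900 controls gives p = 0.1, which is why additional cohorts are often needed for very rare CNVs.

```python
from math import comb


def fisher_one_sided(affected_carriers, affected_total,
                     unaffected_carriers, unaffected_total):
    """One-sided Fisher's exact test: probability, with all margins fixed,
    of observing at least `affected_carriers` carriers among the affected
    cohort (upper tail of the hypergeometric distribution)."""
    total = affected_total + unaffected_total
    carriers = affected_carriers + unaffected_carriers
    upper = min(affected_total, carriers)
    tail = sum(
        comb(carriers, x) * comb(total - carriers, affected_total - x)
        for x in range(affected_carriers, upper + 1)
    )
    return tail / comb(total, affected_total)


# Hypothetical cohorts: 100 affected and 900 unaffected individuals.
print(fisher_one_sided(1, 100, 0, 900))  # 0.1
print(fisher_one_sided(5, 100, 0, 900) < 0.001)  # True
```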
[00207] Due to the large amount of genetic variation between any two genomes, or two sets (cohorts) of genomes, being compared, one preferred embodiment is to reduce the genetic variation search space by interrogating only CNVs, as opposed to the full set of genetic variants that can be identified in an individual's genome or exome. The set of CNVs that occur only, or at a statistically significant higher frequency, in the affected cohort as compared to the unaffected cohort can then be further investigated in targeted sequencing experiments to reveal the full set of genetic variants (of any size or type) that are causative or associated (i.e., potentially serving as a biomarker) with a phenotype, condition, or disease. It can be appreciated by those skilled in the art that the targeted sequencing experiments are performed in both the affected and unaffected cohorts in order to identify the genetic variants (e.g., SNVs and indels) that occur only, or at a statistically significant higher frequency, in the affected individual or cohort as compared to the unaffected cohort.
[00208] When investigating a particular phenotype, condition, or disease, such as ASD, it can be appreciated by those skilled in the art that the number of ASD candidate genes (or regulatory sequences) identified via CNV (or other variant type) detection methods may increase or decrease when additional ASD cohorts are analyzed. Similarly, the number of ASD candidate genes (or regulatory sequences) identified, for example, via CNV (or other variant type) detection methods may increase or decrease when additional unaffected cohorts are used to interpret the affected cohort CNVs (or other variant types). For very rare CNVs (e.g., <0.1% frequency in the general population), only a single case may be observed in a given ASD cohort (e.g., 100 cases), but further statistical significance or evidence for the gene (or regulatory sequence/locus in the genome) can be established by: 1) CNV analysis of additional ASD cohorts, 2) CNV analysis of additional Normal cohorts, 3) targeted gene sequencing of both ASD and Normal cohorts, and/or 4) functional characterization of the ASD candidate gene (e.g., in silico analysis of the predicted impact of the candidate mutation on the gene product, RNAi knockdown experiments, biochemical assays on ASD patient tissue, gene expression analysis of disease-relevant tissues or of induced pluripotent stem cells (iPSCs) created from the ASD patient(s) harboring the candidate ASD-causing genetic variant).
[00209] It can be appreciated by those skilled in the art that a candidate gene may validate as causative of the phenotype, condition, or disease (e.g., ASD), which may, for example, be confirmed via mechanism-of-action experiments, or it may serve as a biomarker of the phenotype, condition, or disease. Thus, in the example of ASD, in some embodiments, the ASD-specific gene (or regulatory sequence/locus) may be a biomarker of age of onset and disease severity for ASD, and thus have diagnostic utility for monitoring patients known to be at risk for ASD or as a general screening test in the population for early diagnosis of the disease. In some embodiments, the ASD-specific gene/biomarker may be an indicator of drug response (e.g., a particular subtype of ASD may respond best to a therapeutic targeting a particular phenotype, causative gene, or other gene in the same pathway as the causative gene) and thus have utility during drug development in clinical trials. For example, clinical trials for a therapeutic that targets an ASD genetic subtype comprising only 10% of all patients exhibiting symptoms of ASD can be designed to enroll only those 10% of patients with a specific genotype(s) in order to reduce the time and cost of such clinical trials (e.g., a smaller number of patients in the clinical trial). It can be appreciated by those skilled in the art that such patient stratification methods (i.e., specific genotypes correlated with the disease or drug response) can be employed not only for targeted therapeutics, but in general for any drug that is approved or in development (i.e., the mechanism of action may or may not be known). For example, drugs in development or approved to treat, for example, cancer, may have utility in being repurposed to treat ASD.
Such patient stratification methods can also be utilized to develop a companion diagnostic test (e.g., comprising the specific genes/genotypes found in patients that are indicative of drug response) for a particular drug, either concurrently during the clinical trials for the drug or after drug approval (e.g., as a new indication or for the physician to use in guiding medical decisions for the patient).
[00210] Further links to neurodevelopment and/or ASD pathology can be established via pathway analysis of the genes, which may take into consideration binding interactions (e.g., via yeast two-hybrid screen) and molecular events (e.g., kinase activity or other enzymatic processes) if such information is available for the gene(s) of interest (i.e., specified in the analysis). Both commercial (e.g., Ingenuity's IPA software and Thomson Reuters' GeneGo software) and open source software (e.g., STRING: string-db.org/) are available for such analyses. To assess connections to established ASD biology, analyses can be performed for the set of candidate ASD genes independently or against known causative ASD genes singly or as a group.
In some embodiments, ASD candidate genes can be distributed into 5 main categories: 1) genes with neuroprotective function, 2) neuropsychiatric genes, some of which are known drug targets, 3) genes linked to a known causative ASD gene (e.g., binding partner) or a novel gene family member of a known ASD gene, 4) genes linked to neurodevelopmental regulation, neurogenesis, and G-protein signaling pathways, and 5) other (e.g., established role in other diseases with no obvious neurodevelopmental biology, such as cancer) or unknown gene function (e.g., limited or no gene information presently annotated for the ASD-specific gene).
[00211] A method of screening a subject for a disease or disorder can comprise assaying a nucleic acid sample from the subject to detect sequence information for more than one genetic locus, comparing the sequence information to a panel of nucleic acid biomarkers, and screening the subject for the presence or absence of the disease or disorder if one or more of the low frequency biomarkers in the panel are present in the sequence information.
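The comparison of a subject's sequence information against a biomarker panel can be sketched as a set lookup. This is a minimal illustration under stated assumptions: variants are represented as hypothetical (locus, variant) pairs, each panel entry carries its frequency among subjects without a diagnosis, and the default cutoff corresponds to the 0.1% low frequency threshold recited in this section (expressed as the fraction 0.001).

```python
from typing import NamedTuple


class Biomarker(NamedTuple):
    locus: str          # hypothetical locus identifier
    variant: str        # hypothetical variant label, e.g. "loss" or "gain"
    normal_freq: float  # frequency in subjects without a diagnosis (0.001 == 0.1%)


def screen_subject(subject_variants, panel, max_normal_freq=0.001):
    """Compare a subject's detected variants with the low frequency
    biomarkers of a panel.

    subject_variants is a set of (locus, variant) pairs detected in the
    subject's nucleic acid sample. Returns whether any low frequency
    biomarker was found, together with the matching biomarkers."""
    low_frequency = [b for b in panel if b.normal_freq <= max_normal_freq]
    hits = [b for b in low_frequency if (b.locus, b.variant) in subject_variants]
    return bool(hits), hits


panel = [Biomarker("LOCUS_A", "loss", 0.0005),
         Biomarker("LOCUS_B", "gain", 0.0001),
         Biomarker("LOCUS_C", "loss", 0.02)]  # too common to count as low frequency
found, hits = screen_subject({("LOCUS_A", "loss"), ("LOCUS_C", "loss")}, panel)
print(found, [b.locus for b in hits])
```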
[00212] The panel can comprise at least one nucleic acid biomarker for each of the more than one genetic loci. For example, the panel can comprise 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 25, 30, 35, 40, 45, 50, 75, 100, 150, 200 or more nucleic acid biomarkers for each of the more than one genetic loci. The panel can comprise at least 25 low frequency biomarkers. For example, the panel can comprise at least 30, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80, 85, 90, 95, 100, 135, 150, 175, 200, 250, 500, or 1000 or more low frequency biomarkers. In some embodiments, the panel can comprise from about 2-1000 nucleic acid biomarkers. For example, the panel can comprise from about 2-900, 2-800, 2-700, 2-600, 2-500, 2-400, 2-300, 2-200, 2-100, 25-900, 25-800, 25-700, 25-600, 25-500, 25-400, 25-300, 25-200, 25-100, 100-1000, 100-900, 100-800, 100-700, 100-600, 100-500, 100-400, 100-300, 100-200, 200-1000, 200-900, 200-800, 200-700, 200-600, 200-500, 200-400, 200-300, 300-1000, 300-900, 300-800, 300-700, 300-600, 300-500, 300-400, 400-1000, 400-900, 400-800, 400-700, 400-600, 400-500, 500-1000, 500-900, 500-800, 500-700, 500-600, 600-1000, 600-900, 600-800, 600-700, 700-1000, 700-900, 700-800, 800-1000, 800-900, or 900-1000 nucleic acid biomarkers.
[00213] The panel can comprise at least 2 low frequency biomarkers. For example, the panel can comprise at least 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 25, 30, 35, 40, 45, 50, 75, 100, 150, 200, 250, 500, or 1000 or more low frequency biomarkers. In some embodiments, the panel can comprise from about 2-1000 low frequency biomarkers. For example, the panel can comprise from about 2-900, 2-800, 2-700, 2-600, 2-500, 2-400, 2-300, 2-200, 2-100, 25-900, 25-800, 25-700, 25-600, 25-500, 25-400, 25-300, 25-200, 25-100, 100-1000, 100-900, 100-800, 100-700, 100-600, 100-500, 100-400, 100-300, 100-200, 200-1000, 200-900, 200-800, 200-700, 200-600, 200-500, 200-400, 200-300, 300-1000, 300-900, 300-800, 300-700, 300-600, 300-500, 300-400, 400-1000, 400-900, 400-800, 400-700, 400-600, 400-500, 500-1000, 500-900, 500-800, 500-700, 500-600, 600-1000, 600-900, 600-800, 600-700, 700-1000, 700-900, 700-800, 800-1000, 800-900, or 900-1000 low frequency biomarkers. In some embodiments, a low frequency biomarker can occur at a frequency of 0.1% or less in a population of subjects without a diagnosis of the disease or disorder. For example, a low frequency biomarker can occur at a frequency of 0.05%, 0.01%, 0.005%, 0.001%, 0.0005%, 0.0001%, 0.00005%, or 0.00001% or less in a population of subjects without a diagnosis of the disease or disorder. In some embodiments, a low frequency biomarker can occur at a frequency from about 0.00001%-0.1% in a population of subjects without a diagnosis of the disease or disorder. For example, a low frequency biomarker can occur at a frequency of from about 0.00001%-0.00005%, 0.00001%-0.0001%, 0.00001%-0.0005%, 0.00001%-0.001%, 0.00001%-0.005%, 0.00001%-0.01%, 0.00001%-0.05%, 0.00005%-0.0001%, 0.00005%-0.0005%, 0.00005%-0.001%, 0.00005%-0.005%, 0.00005%-0.01%, 0.00005%-0.05%, 0.00005%-0.1%, 0.0001%-0.0005%, 0.0001%-0.001%, 0.0001%-0.005%, 0.0001%-0.01%, 0.0001%-0.05%, 0.0001%-0.1%, 0.0005%-0.001%, 0.0005%-0.005%, 0.0005%-0.01%, 0.0005%-0.05%, 0.0005%-0.1%, 0.001%-0.005%, 0.001%-0.01%, 0.001%-0.05%, 0.001%-0.1%, 0.005%-0.01%, 0.005%-0.05%, 0.005%-0.1%, 0.01%-0.05%, 0.01%-0.1%, or 0.05%-0.1% in a population of subjects without a diagnosis of the disease or disorder.
[00214] In some embodiments, the presence or absence of the disease or disorder in the subject can be determined with at least 50% confidence. For example, the presence or absence of the disease or disorder in the subject can be determined with at least 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, 98%, 99%, or 100% confidence. In some embodiments, the presence or absence of the disease or disorder in the subject can be determined with 50%-100% confidence. For example, the presence or absence of the disease or disorder in the subject can be determined with 60%-100%, 70%-100%, 80%-100%, 90%-100%, 50%-90%, 50%-80%, 50%-70%, 50%-60%, 60%-90%, 60%-80%, 60%-70%, 70%-90%, 70%-80%, or 80%-90% confidence. In one embodiment, ASD candidate CNV-subregions and genes associated with these regions can be determined or identified by comparing genetic data from a cohort of normal individuals (NVE) to that of a cohort of individuals known to have, or be susceptible to, a developmental disorder such as ASD.
[00215] In some embodiments, genomic DNA samples from individuals within an NVE (reference) cohort and an ASD (test) cohort can be hybridized against one or more sex-matched reference individuals. For example, reference DNA samples can be labeled with a fluorophore such as Cy5, using methods described herein, and test subject DNA samples can be labeled with a different fluorophore, such as Cy3. After labeling, samples can be combined, co-hybridized to a microarray and analyzed using any of the methods described herein, such as aCGH.
[00216] Arrays can then be scanned and the data can be analyzed with software.
Genetic alterations, such as CNVs, can be called using any of the methods described herein. A list of the genetic alterations, such as CNVs, can be generated for each cohort. The list of CNVs can be used to generate a master list of non-redundant CNVs and/or CNV-subregions for each cohort.
The list can be based on the presence or absence of the CNV-subregion in individuals within the cohort. In this manner, the master list can contain a number of distinct CNV-subregions, some of which are uniquely present in a single individual and some of which are present in multiple individuals.
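One way to build such a master list of non-redundant CNV-subregions is sketched below, assuming each CNV call is a hypothetical (individual, chromosome, start, end, type) tuple with 1-based inclusive coordinates. Every CNV start and every position just past a CNV end becomes a boundary, so each resulting subregion is either fully covered or not covered by any given call, and the set of individuals carrying each subregion is recorded alongside it.

```python
from collections import defaultdict


def cnv_subregions(calls):
    """Split overlapping CNV calls into non-redundant CNV-subregions.

    calls: iterable of (individual, chrom, start, end, kind) with 1-based,
    inclusive coordinates. Returns (chrom, start, end, kind, carriers)
    tuples, where carriers is the frozenset of individuals whose CNV
    covers the whole subregion."""
    by_key = defaultdict(list)
    for individual, chrom, start, end, kind in calls:
        by_key[(chrom, kind)].append((individual, start, end))

    subregions = []
    for (chrom, kind), items in sorted(by_key.items()):
        # Every CNV start and every position just past a CNV end is a boundary.
        bounds = sorted({s for _, s, _ in items} | {e + 1 for _, _, e in items})
        for left, right in zip(bounds, bounds[1:]):
            carriers = frozenset(ind for ind, s, e in items
                                 if s <= left and e >= right - 1)
            if carriers:  # skip gaps between non-overlapping CNVs
                subregions.append((chrom, left, right - 1, kind, carriers))
    return subregions


calls = [("ASD1", "chr1", 100, 200, "loss"), ("ASD2", "chr1", 150, 300, "loss")]
for region in cnv_subregions(calls):
    print(region)
```

Subregions whose carrier set contains a single individual correspond to the CNV-subregions described above as uniquely present in one individual.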
[00217] In some embodiments, CNV-subregions of interest can be obtained by annotation of each CNV-subregion with relevant information, such as overlap with known genes and/or exons.
In some embodiments, CNV-subregions of interest can be obtained by calculating the odds ratio (OR) for a CNV-subregion according to the following formula:

$$\mathrm{OR} = \frac{\mathrm{ASD} / (N_{\mathrm{ASD}} - \mathrm{ASD})}{\mathrm{NVE} / (N_{\mathrm{NVE}} - \mathrm{NVE})}$$

where ASD is the number of ASD individuals with the CNV-subregion of interest, NVE is the number of NVE individuals with the CNV-subregion of interest, and $N_{\mathrm{ASD}}$ and $N_{\mathrm{NVE}}$ are the numbers of individuals in the ASD and NVE cohorts, respectively. If NVE = 0, it can be set to 1 to avoid dealing with infinities in cases where no CNVs are seen in the NVE cohort. In some embodiments, a set of publicly available CNVs (e.g., the Database of Genomic Variants, http://projects.tcag.ca/variation/) can be used as the Normal cohort for comparison to the affected cohort CNVs. In another embodiment, the set of Normal cohort CNVs may comprise a private database generated by the same CNV detection method, such as array CGH, or by a plurality of CNV detection methods that include, but are not limited to, array CGH, SNP genotyping arrays, custom CGH arrays, custom genotyping arrays, exome sequencing, whole genome sequencing, targeted sequencing, FISH, q-PCR, or MLPA.
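The odds ratio formula above, including the rule that replaces NVE = 0 with 1, translates directly into code. The function name and the cohort counts in the example are hypothetical; the extra guard reflects the fact that the formula is undefined when every ASD individual carries the CNV-subregion.

```python
def cnv_subregion_odds_ratio(asd, n_asd, nve, n_nve):
    """Odds ratio for one CNV-subregion.

    asd / nve: number of ASD / NVE individuals carrying the CNV-subregion.
    n_asd / n_nve: number of individuals in the ASD / NVE cohorts.
    NVE = 0 is replaced by 1 so that the ratio stays finite."""
    if nve == 0:
        nve = 1
    if asd >= n_asd:
        raise ValueError("OR is undefined when every ASD individual carries the CNV")
    return (asd / (n_asd - asd)) / (nve / (n_nve - nve))


# Hypothetical counts: 5 of 100 ASD individuals and 0 of 1000 NVE individuals.
print(round(cnv_subregion_odds_ratio(5, 100, 0, 1000), 2))  # 52.58
```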
[00218] The number of individuals in any given cohort can be at least about 10, 50, 100, 200, 300, 400, 500, 600, 700, 800, 900, 1000, 2500, 5000, 7500, 10,000, 100,000, or more. In some embodiments, the number of individuals in any given cohort can be from 25-900, 25-800, 25-700, 25-600, 25-500, 25-400, 25-300, 25-200, 25-100, 100-1000, 100-900, 100-800, 100-700, 100-600, 100-500, 100-400, 100-300, 100-200, 200-1000, 200-900, 200-800, 200-700, 200-600, 200-500, 200-400, 200-300, 300-1000, 300-900, 300-800, 300-700, 300-600, 300-500, 300-400, 400-1000, 400-900, 400-800, 400-700, 400-600, 400-500, 500-1000, 500-900, 500-800, 500-700, 500-600, 600-1000, 600-900, 600-800, 600-700, 700-1000, 700-900, 700-800, 800-1000, 800-900, or 900-1000.
[00219] Different categories for CNVs of interest can be defined. In some embodiments, CNVs can be of interest if the CNVs are rare in the general population or in a cohort of individuals without the disease or condition of interest. In another embodiment, CNVs can be of interest if they are found only in those affected by a disease or condition and not in those without the disease or condition. In another embodiment, CNVs can be of interest if they are found at much greater frequency in those affected by the disease or condition as compared to those without the disease or condition.
[00220] Further categories of CNVs of interest can also be defined. In some embodiments, CNVs/CNV-subregions can be of interest if they occur in the offspring of two parents, neither of whom has the relevant CNV (i.e., de novo CNVs). In some embodiments, CNVs/CNV-subregions can be of interest if they affect exons only, introns only, or exons and/or introns. In some embodiments, CNVs/CNV-subregions can be of interest if they are overlapping and/or non-overlapping within the same gene or regulatory locus. In some embodiments, CNVs/CNV-subregions can be of interest if they include regions present at high frequency in the ASD cohort compared with the normal cohort. In some embodiments, CNVs/CNV-subregions can be of interest if they occur in 2 or more ASD individuals and affect different exons of the same gene. In some embodiments, CNVs/CNV-subregions can be of interest if they occur in 2 or more ASD individuals and affect the same exon of a gene. In some embodiments, CNVs/CNV-subregions can be of interest if they are related to genes with strong biological evidence for involvement in ASD. In some embodiments, CNVs can be of interest if they are associated with an odds ratio (OR) greater than 0.5, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 12, 14, 16, 18, 20, 25, 30, 35, 40, 45, 50, or more. In some embodiments, CNVs can be of interest if they are associated with an OR of about 2.8-100, 2.8-50, 2.8-40, 2.8-30, 2.8-20, 2.8-10, 2.8-9, 2.8-8, 2.8-7, 5-100, 5-50, 5-40, 5-30, 5-20, 5-10, 10-100, 10-50, 10-40, 10-30, 10-20, 20-100, 20-50, 20-40, 20-30, 30-100, 30-50, 30-40, 40-100, 40-50, or 50-100.
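Several of the categories in [00220] reduce to small predicates over trio and case data. The following Python sketch assumes a hypothetical record layout of (case ID, gene, exon) tuples and uses the 2.8-100 OR range quoted above as a default; it illustrates the filtering logic and is not the analysis pipeline used to build the tables.

```python
from collections import defaultdict


def is_de_novo(child_has: bool, mother_has: bool, father_has: bool) -> bool:
    """CNV present in the offspring but in neither parent."""
    return child_has and not (mother_has or father_has)


def recurrent_genes(hits, min_cases: int = 2) -> dict:
    """Genes hit by CNVs in at least `min_cases` distinct ASD cases.

    `hits` is an iterable of (case_id, gene, exon) tuples (hypothetical layout).
    A gene is labelled "same_exon" when all hits fall in one and the same exon,
    otherwise "different_exons" (a simplification of the categories in the text).
    """
    cases_by_gene = defaultdict(set)
    exons_by_gene = defaultdict(set)
    for case_id, gene, exon in hits:
        cases_by_gene[gene].add(case_id)
        exons_by_gene[gene].add(exon)
    return {
        gene: "same_exon" if len(exons_by_gene[gene]) == 1 else "different_exons"
        for gene, cases in cases_by_gene.items()
        if len(cases) >= min_cases
    }


def in_or_range(odds_ratio: float, low: float = 2.8, high: float = 100.0) -> bool:
    """True if the OR falls in the chosen range (default: 2.8-100, as quoted above)."""
    return low <= odds_ratio <= high
```

The exon labels in any input are placeholders; in practice they would come from overlapping each CNV with a RefSeq exon annotation on the same genome build.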
[00221] The data presented herein were generated by comparing CNVs/CNV-subregions identified in an ASD cohort with those identified in a normal cohort. CNV/CNV-subregion genome locations are given using the Human Mar. 2006 (NCBI36/hg18) assembly. Those skilled in the art will appreciate that an affected individual may carry one or more CNVs/CNV-subregions that are preferentially found in the affected cohort compared with the unaffected cohort and, similarly, other CNVs/CNV-subregions that are found at comparable frequencies, or at frequencies that do not differ significantly, in the affected and unaffected cohorts. In a preferred embodiment, CNV/CNV-subregion detection and analysis methods are employed that enable comparison of CNVs/CNV-subregions to facilitate identification of genes (or regulatory loci) that are causative of, or associated with, the phenotype, condition, or disease being investigated (or detected for diagnostic purposes). In Tables 1 and 5, SEQ IDs 1-643 and 2418-2557 refer to the CNV sequences (the full sequence of the whole CNV). In Tables 4 and 7, SEQ IDs 644-2417 and 2558-2739 refer to the genomic sequences over which the relevant transcripts extend (the full genomic extent of the transcripts, not just the shorter sequence associated with the mRNA).
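The SEQ ID ranges and the column order of Table 1 lend themselves to a small consistency check. The sketch below assumes a row has already been split into its first seven columns (SEQ ID, Chr, Start, Stop, Size, CNV type, ASD case ID); the field names and the "|" delimiter in the usage line are hypothetical. In the rows of Table 1, the Size column equals Stop minus Start in hg18 coordinates, which the parser verifies.

```python
from typing import NamedTuple


class CnvRow(NamedTuple):
    seq_id: int
    chrom: str
    start: int  # NCBI36/hg18 coordinate
    stop: int
    size: int
    cnv_type: str
    case_id: int


def seq_id_kind(seq_id: int) -> str:
    """Classify a SEQ ID using the ranges stated in the text."""
    if 1 <= seq_id <= 643 or 2418 <= seq_id <= 2557:
        return "CNV sequence"
    if 644 <= seq_id <= 2417 or 2558 <= seq_id <= 2739:
        return "transcript genomic extent"
    raise ValueError(f"SEQ ID {seq_id} is outside the stated ranges")


def parse_row(fields: list[str]) -> CnvRow:
    """Parse the first seven columns of a Table 1 row and check Size = Stop - Start."""
    seq_id = int(fields[0].strip().removeprefix("SEQ ID"))
    row = CnvRow(seq_id, fields[1].strip(), int(fields[2]), int(fields[3]),
                 int(fields[4]), fields[5].strip().capitalize(), int(fields[6]))
    if row.stop - row.start != row.size:
        raise ValueError(f"SEQ ID {seq_id}: size does not equal stop - start")
    return row


row = parse_row("SEQ ID 1 | 17 | 77787243 | 77847938 | 60695 | Loss | 1891".split("|"))
print(row.size, seq_id_kind(row.seq_id))  # 60695 CNV sequence
```

Normalising the CNV type with `capitalize()` lets "loss" and "Loss" rows be grouped together without altering the source data. `str.removeprefix` requires Python 3.9 or later.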

SEQ ID No | Chr | Orig CNV Start | Orig CNV Stop | Orig CNV Size | CNV type | ASD Case ID(s) | RefSeq Gene Symbol(s) | Category | OR
SEQ ID 1 | 17 | 77787243 | 77847938 | 60695 | Loss | 1891 | SLC16A3, CSNK1D | De Novo | NA
SEQ ID 2 | 17 | 76954271 | 77777066 | 822795 | Gain | 1891 | C17orf70, ACTG1, TSPAN10, DCXR, C17orf90, STRA13, ARL16, MIR3186, NPLOC4, PYCR1, SLC25A10, GPS1, DUS1L, ANAPC11, LOC92659, FASN, ARHGDIA, MAFG, BAHCC1, DYSFIP1, MRPL12, SIRT7, RAC3, CCDC57, P4HB, PCYT2, HGS, RFNG, MYADML2, FSCN2, THOC4, ASPSCR1, CCDC137, NOTUM, NPB, PDE6G, LRRC45 | De Novo | NA
SEQ ID 3 | 5 | 180189516 | 180362342 | 172826 | loss | 1229 | BTNL8, BTNL3, LOC729678 | Exon+ve, >2 cases | 59.24223602
SEQ ID 3 | 5 | 180189516 | 180362342 | 172826 | loss | 1548 | BTNL8, BTNL3, LOC729678 | Exon+ve, >2 cases | 59.24223602
SEQ ID 4 | 5 | 180189516 | 180365977 | 176461 | loss | 1532 | BTNL8, BTNL3, LOC729678 | Exon+ve, >2 cases | 59.24223602
SEQ ID 5 | 5 | 180346557 | 180365977 | 19420 | Loss | 1540 | BTNL3 | Ctrl pos High OR | 59.24223602
SEQ ID 5 | 5 | 180346557 | 180365977 | 19420 | Loss | 1754 | BTNL3 | Ctrl pos High OR | 59.24223602
SEQ ID 5 | 5 | 180346557 | 180365977 | 19420 | Loss | 1755 | BTNL3 | Ctrl pos High OR | 59.24223602
SEQ ID 6 | 5 | 180344964 | 180365977 | 21013 | Loss | 1261 | BTNL3 | Ctrl pos High OR | 59.24223602
SEQ ID 6 | 5 | 180344964 | 180365977 | 21013 | Loss | 1265 | BTNL3 | Ctrl pos High OR | 59.24223602
SEQ ID 6 | 5 | 180344964 | 180365977 | 21013 | Loss | 1438 | BTNL3 | Ctrl pos High OR | 59.24223602
SEQ ID 6 | 5 | 180344964 | 180365977 | 21013 | Loss | 1467 | BTNL3 | Ctrl pos High OR | 59.24223602
SEQ ID 6 | 5 | 180344964 | 180365977 | 21013 | Loss | 1568 | BTNL3 | Ctrl pos High OR | 59.24223602
SEQ ID 6 | 5 | 180344964 | 180365977 | 21013 | Loss | 1570 | BTNL3 | Ctrl pos High OR | 59.24223602
SEQ ID 6 | 5 | 180344964 | 180365977 | 21013 | Loss | 1662 | BTNL3 | Ctrl pos High OR | 59.24223602
SEQ ID 6 | 5 | 180344964 | 180365977 | 21013 | Loss | 1671 | BTNL3 | Ctrl pos High OR | 59.24223602
SEQ ID 6 | 5 | 180344964 | 180365977 | 21013 | Loss | 1726 | BTNL3 | Ctrl pos High OR | 59.24223602
SEQ ID 6 | 5 | 180344964 | 180365977 | 21013 | Loss | 1769 | BTNL3 | Ctrl pos High OR | 59.24223602
SEQ ID 6 | 5 | 180344964 | 180365977 | 21013 | Loss | 1799 | BTNL3 | Ctrl pos High OR | 59.24223602
SEQ ID 7 | 5 | 180346557 | 180378586 | 32029 | Loss | 1942 | BTNL3 | Ctrl pos High OR | 59.24223602
SEQ ID 8 | 5 | 180344964 | 180378586 | 33622 | Loss | 1268 | BTNL3 | Ctrl pos High OR | 59.24223602
SEQ ID 8 | 5 | 180344964 | 180378586 | 33622 | Loss | 1354 | BTNL3 | Ctrl pos High OR | 59.24223602
SEQ ID 8 | 5 | 180344964 | 180378586 | 33622 | Loss | 1463 | BTNL3 | Ctrl pos High OR | 59.24223602
SEQ ID 8 | 5 | 180344964 | 180378586 | 33622 | Loss | 1849 | BTNL3 | Ctrl pos High OR | 59.24223602
SEQ ID 9 | 5 | 180344964 | 180379663 | 34699 | Loss | 1277 | BTNL3 | Ctrl pos High OR | 59.24223602
SEQ ID 10 | 5 | 180189516 | 180357210 | 167694 | loss | 1861 | BTNL8, BTNL3, LOC729678 | Exon+ve, >2 cases | 59.24223602
SEQ ID 11 | 5 | 180192214 | 180362342 | 170128 | gain | 1316 | BTNL8, BTNL3, LOC729678 | Exon+ve, >2 cases | 59.24223602
SEQ ID 11 | 5 | 180192214 | 180362342 | 170128 | loss | 1580 | BTNL8, BTNL3, LOC729678 | Exon+ve, >2 cases | 59.24223602
SEQ ID 11 | 5 | 180192214 | 180362342 | 170128 | loss | 1641 | BTNL8, BTNL3, LOC729678 | Exon+ve, >2 cases | 59.24223602
SEQ ID 12 | 5 | 180194323 | 180365977 | 171654 | Loss | 1546 | BTNL8, BTNL3, LOC729678 | Ctrl pos High OR | 59.24223602
SEQ ID 12 | 5 | 180194323 | 180365977 | 171654 | Loss | 1696 | BTNL8, BTNL3, LOC729678 | Ctrl pos High OR | 59.24223602
SEQ ID 12 | 5 | 180194323 | 180365977 | 171654 | Loss | 1792 | BTNL8, BTNL3, LOC729678 | Ctrl pos High OR | 59.24223602
SEQ ID 12 | 5 | 180194323 | 180365977 | 171654 | Loss | 1927 | BTNL8, BTNL3, LOC729678 | Ctrl pos High OR | 59.24223602
SEQ ID 13 | 5 | 180192214 | 180365977 | 173763 | loss | 1606 | BTNL8, BTNL3, LOC729678 | Exon+ve, >2 cases | 59.24223602
SEQ ID 4 | 5 | 180189516 | 180365977 | 176461 | loss | 1612 | BTNL8, BTNL3, LOC729678 | Exon+ve, >2 cases | 59.24223602
SEQ ID 4 | 5 | 180189516 | 180365977 | 176461 | loss | 1686 | BTNL8, BTNL3, LOC729678 | Exon+ve, >2 cases | 59.24223602
SEQ ID 14 | 5 | 180194323 | 180378586 | 184263 | Loss | 1429 | BTNL8, BTNL3, LOC729678 | Ctrl pos High OR | 59.24223602
SEQ ID 14 | 5 | 180194323 | 180378586 | 184263 | Loss | 1634 | BTNL8, BTNL3, LOC729678 | Ctrl pos High OR | 59.24223602
SEQ ID 14 | 5 | 180194323 | 180378586 | 184263 | Loss | 1851 | BTNL8, BTNL3, LOC729678 | Ctrl pos High OR | 59.24223602
SEQ ID 14 | 5 | 180194323 | 180378586 | 184263 | Loss | 1902 | BTNL8, BTNL3, LOC729678 | Ctrl pos High OR | 59.24223602
SEQ ID 15 | 7 | 147704200 | 147708382 | 4182 | Loss | 1371 | CNTNAP2 | Ctrl pos High OR | 46.19631902
SEQ ID 15 | 7 | 147704200 | 147708382 | 4182 | Loss | 1617 | CNTNAP2 | Ctrl pos High OR | 46.19631902
SEQ ID 15 | 7 | 147704200 | 147708382 | 4182 | Loss | 1803 | CNTNAP2 | Ctrl pos High OR | 46.19631902
SEQ ID 16 | 7 | 147704200 | 147710037 | 5837 | Loss | 1227 | CNTNAP2 | Ctrl pos High OR | 46.19631902
SEQ ID 16 | 7 | 147704200 | 147710037 | 5837 | Loss | 1346 | CNTNAP2 | Ctrl pos High OR | 46.19631902
SEQ ID 16 | 7 | 147704200 | 147710037 | 5837 | Loss | 1517 | CNTNAP2 | Ctrl pos High OR | 46.19631902
SEQ ID 16 | 7 | 147704200 | 147710037 | 5837 | Loss | 1621 | CNTNAP2 | Ctrl pos High OR | 46.19631902
SEQ ID 16 | 7 | 147704200 | 147710037 | 5837 | Loss | 1636 | CNTNAP2 | Ctrl pos High OR | 46.19631902
SEQ ID 16 | 7 | 147704200 | 147710037 | 5837 | Loss | 1639 | CNTNAP2 | Ctrl pos High OR | 46.19631902
SEQ ID 16 | 7 | 147704200 | 147710037 | 5837 | Loss | 1645 | CNTNAP2 | Ctrl pos High OR | 46.19631902
SEQ ID 16 | 7 | 147704200 | 147710037 | 5837 | Loss | 1670 | CNTNAP2 | Ctrl pos High OR | 46.19631902
SEQ ID 16 | 7 | 147704200 | 147710037 | 5837 | Loss | 1727 | CNTNAP2 | Ctrl pos High OR | 46.19631902
SEQ ID 16 | 7 | 147704200 | 147710037 | 5837 | Loss | 1753 | CNTNAP2 | Ctrl pos High OR | 46.19631902
SEQ ID 16 | 7 | 147704200 | 147710037 | 5837 | Loss | 1754 | CNTNAP2 | Ctrl pos High OR | 46.19631902
SEQ ID 16 | 7 | 147704200 | 147710037 | 5837 | Loss | 1761 | CNTNAP2 | Ctrl pos High OR | 46.19631902
SEQ ID 16 | 7 | 147704200 | 147710037 | 5837 | Loss | 1792 | CNTNAP2 | Ctrl pos High OR | 46.19631902
SEQ ID 16 | 7 | 147704200 | 147710037 | 5837 | Loss | 1806 | CNTNAP2 | Ctrl pos High OR | 46.19631902
SEQ ID 16 | 7 | 147704200 | 147710037 | 5837 | Loss | 1820 | CNTNAP2 | Ctrl pos High OR | 46.19631902
SEQ ID 16 | 7 | 147704200 | 147710037 | 5837 | Loss | 1826 | CNTNAP2 | Ctrl pos High OR | 46.19631902
SEQ ID 16 | 7 | 147704200 | 147710037 | 5837 | Loss | 1836 | CNTNAP2 | Ctrl pos High OR | 46.19631902
SEQ ID 16 | 7 | 147704200 | 147710037 | 5837 | Loss | 1854 | CNTNAP2 | Ctrl pos High OR | 46.19631902
SEQ ID 16 | 7 | 147704200 | 147710037 | 5837 | Loss | 1867 | CNTNAP2 | Ctrl pos High OR | 46.19631902
SEQ ID 16 | 7 | 147704200 | 147710037 | 5837 | Loss | 1872 | CNTNAP2 | Ctrl pos High OR | 46.19631902
SEQ ID 16 | 7 | 147704200 | 147710037 | 5837 | Loss | 1916 | CNTNAP2 | Ctrl pos High OR | 46.19631902
SEQ ID 16 | 7 | 147704200 | 147710037 | 5837 | Loss | 1918 | CNTNAP2 | Ctrl pos High OR | 46.19631902
SEQ ID 16 | 7 | 147704200 | 147710037 | 5837 | Loss | 1960 | CNTNAP2 | Ctrl pos High OR | 46.19631902
SEQ ID 16 | 7 | 147704200 | 147710037 | 5837 | Loss | 2003 | CNTNAP2 | Ctrl pos High OR | 46.19631902
SEQ ID 16 | 7 | 147704200 | 147710037 | 5837 | Loss | 2028 | CNTNAP2 | Ctrl pos High OR | 46.19631902
SEQ ID 16 | 7 | 147704200 | 147710037 | 5837 | Loss | 2041 | CNTNAP2 | Ctrl pos High OR | 46.19631902
SEQ ID 17 | 7 | 147702365 | 147710037 | 7672 | Loss | 1728 | CNTNAP2 | Ctrl pos High OR | 46.19631902
SEQ ID 18 | 15 | 99632987 | 99635701 | 2714 | gain | 1404 | SELS | Exon+ve, >2 cases | 41.38625954
SEQ ID 19 | 15 | 99632987 | 99636724 | 3737 | gain | 1728 | SELS | Exon+ve, >2 cases | 41.38625954
SEQ ID 20 | 15 | 99634434 | 99635701 | 1267 | loss | 1389 | SELS | Exon+ve, >2 cases | 41.38625954
SEQ ID 20 | 15 | 99634434 | 99635701 | 1267 | gain | 1401 | SELS | Exon+ve, >2 cases | 41.38625954
SEQ ID 20 | 15 | 99634434 | 99635701 | 1267 | loss | 1413 | SELS | Exon+ve, >2 cases | 41.38625954
SEQ ID 20 | 15 | 99634434 | 99635701 | 1267 | loss | 1416 | SELS | Exon+ve, >2 cases | 41.38625954
SEQ ID 20 | 15 | 99634434 | 99635701 | 1267 | gain | 1434 | SELS | Exon+ve, >2 cases | 41.38625954
SEQ ID 20 | 15 | 99634434 | 99635701 | 1267 | loss | 1446 | SELS | Exon+ve, >2 cases | 41.38625954
SEQ ID 20 | 15 | 99634434 | 99635701 | 1267 | loss | 1449 | SELS | Exon+ve, >2 cases | 41.38625954
SEQ ID 20 | 15 | 99634434 | 99635701 | 1267 | loss | 1461 | SELS | Exon+ve, >2 cases | 41.38625954
SEQ ID 20 | 15 | 99634434 | 99635701 | 1267 | loss | 1477 | SELS | Exon+ve, >2 cases | 41.38625954
SEQ ID 20 | 15 | 99634434 | 99635701 | 1267 | loss | 1505 | SELS | Exon+ve, >2 cases | 41.38625954
SEQ ID 20 | 15 | 99634434 | 99635701 | 1267 | loss | 1529 | SELS | Exon+ve, >2 cases | 41.38625954
SEQ ID 20 | 15 | 99634434 | 99635701 | 1267 | loss | 1548 | SELS | Exon+ve, >2 cases | 41.38625954
SEQ ID 20 | 15 | 99634434 | 99635701 | 1267 | loss | 1559 | SELS | Exon+ve, >2 cases | 41.38625954
SEQ ID 20 | 15 | 99634434 | 99635701 | 1267 | loss | 1572 | SELS | Exon+ve, >2 cases | 41.38625954
SEQ ID 20 | 15 | 99634434 | 99635701 | 1267 | gain | 1576 | SELS | Exon+ve, >2 cases | 41.38625954
SEQ ID 20 | 15 | 99634434 | 99635701 | 1267 | loss | 1584 | SELS | Exon+ve, >2 cases | 41.38625954
SEQ ID 20 | 15 | 99634434 | 99635701 | 1267 | gain | 1596 | SELS | Exon+ve, >2 cases | 41.38625954
SEQ ID 20 | 15 | 99634434 | 99635701 | 1267 | loss | 1609 | SELS | Exon+ve, >2 cases | 41.38625954
SEQ ID 20 | 15 | 99634434 | 99635701 | 1267 | gain | 1633 | SELS | Exon+ve, >2 cases | 41.38625954
SEQ ID 20 | 15 | 99634434 | 99635701 | 1267 | loss | 1672 | SELS | Exon+ve, >2 cases | 41.38625954
SEQ ID 20 | 15 | 99634434 | 99635701 | 1267 | loss | 1687 | SELS | Exon+ve, >2 cases | 41.38625954
SEQ ID 20 | 15 | 99634434 | 99635701 | 1267 | loss | 1829 | SELS | Exon+ve, >2 cases | 41.38625954
SEQ ID 20 | 15 | 99634434 | 99635701 | 1267 | gain | 1842 | SELS | Exon+ve, >2 cases | 41.38625954
SEQ ID 20 | 15 | 99634434 | 99635701 | 1267 | loss | 1913 | SELS | Exon+ve, >2 cases | 41.38625954
SEQ ID 20 | 15 | 99634434 | 99635701 | 1267 | loss | 1964 | SELS | Exon+ve, >2 cases | 41.38625954
| | | | | Loss | 1800 | MAOA | Intronic | 38.20395738
| | | | | Loss | 1842 | MAOA | Intronic | 38.20395738
| | | | | Loss | 1848 | MAOA | Intronic | 38.20395738
| | | | | Loss | 1855 | MAOA | Intronic | 38.20395738
| | | | | Loss | 1859 | MAOA | Intronic | 38.20395738
| | | | | Loss | 1898 | MAOA | Intronic | 38.20395738
| | | | | Loss | 1907 | MAOA | Intronic | 38.20395738
| | | | | Loss | 1916 | MAOA | Intronic | 38.20395738
| | | | | Loss | 1921 | MAOA | Intronic | 38.20395738
| | | | | Loss | 1935 | MAOA | Intronic | 38.20395738
| | | | | Loss | 1946 | MAOA | Intronic | 38.20395738
| | | | | Loss | 1958 | MAOA | Intronic | 38.20395738
| | | | | Loss | 1960 | MAOA | Intronic | 38.20395738
| | | | | Loss | 1961 | MAOA | Intronic | 38.20395738
| | | | | Loss | 1965 | MAOA | Intronic | 38.20395738
| | | | | Loss | 1966 | MAOA | Intronic | 38.20395738
| | | | | Loss | 1967 | MAOA | Intronic | 38.20395738
SEQ ID 21 | X | 43458232 | 43465307 | 7075 | Loss | 1969 | MAOA | Intronic | 38.20395738
| | | | | Loss | 1993 | MAOA | Intronic | 38.20395738
| | | | | Loss | 2033 | MAOA | Intronic | 38.20395738
| | | | | Loss | 2035 | MAOA | Intronic | 38.20395738
| | | | | Loss | 1369 | MAOA | Intronic | 38.20395738
| | | | | Loss | 1300 | MAOA | Intronic | 38.20395738
| | | | | Loss | 1697 | MAOA | Intronic | 38.20395738
| | | | | Loss | 1751 | MAOA | Intronic | 38.20395738
| | | | 204083 | loss | 1319 | LOC644246, KIAA1267 | Exon+ve, >2 cases | 31.89712557
| | | | 205568 | loss | 1320 | LOC644246, KIAA1267 | Exon+ve, >2 cases | 31.89712557
SEQ ID 25 | 17 | 41508943 | 42142363 | 633420 | loss | 1542 | NSFP1, NSF, ARL17B, LOC644246, LRRC37A2, ARL17A, LRRC37A, KIAA1267 | Exon+ve, >2 cases | 31.89712557
| | | | 57597 | loss | 1656 | KIAA1267 | Exon+ve, >2 cases | 31.89712557
| | | | | loss | 1861 | KIAA1267 | Exon+ve, >2 cases | 31.89712557
| | | | 195919 | loss | 1655 | LOC644246, KIAA1267 | Exon+ve, >2 cases | 31.89712557
| | | | 198082 | loss | 1530 | LOC644246, KIAA1267 | Exon+ve, >2 cases | 31.89712557
| | | | 198082 | loss | 1533 | LOC644246, KIAA1267 | Exon+ve, >2 cases | 31.89712557
| | | | 198082 | loss | 1535 | LOC644246, KIAA1267 | Exon+ve, >2 cases | 31.89712557
| | | | 198082 | loss | 1537 | LOC644246, KIAA1267 | Exon+ve, >2 cases | 31.89712557
| | | | 198082 | loss | 1539 | LOC644246, KIAA1267 | Exon+ve, >2 cases | 31.89712557
| | | | 198082 | loss | 1586 | LOC644246, KIAA1267 | Exon+ve, >2 cases | 31.89712557
| | | | 198082 | loss | 1684 | LOC644246, KIAA1267 | Exon+ve, >2 cases | 31.89712557
| | | | 201457 | loss | 1587 | LOC644246, KIAA1267 | Exon+ve, >2 cases | 31.89712557
| | | | 440355 | gain | 1991 | NSF, ARL17B, NSFP1, LRRC37A2, LRRC37A, ARL17A | Exon+ve, >2 cases | 31.89712557
| | | | 578686 | gain | 2032 | NSFP1, NSF, ARL17B, LOC644246, LRRC37A2, ARL17A, LRRC37A, KIAA1267 | Exon+ve, >2 cases | 31.89712557
| | | | 583402 | gain | 1800 | NSFP1, NSF, ARL17B, LOC644246, LRRC37A2, ARL17A, LRRC37A, KIAA1267 | Exon+ve, >2 cases | 31.89712557
| | | | 627093 | gain | 1671 | NSFP1, NSF, ARL17B, LOC644246, LRRC37A2, ARL17A, LRRC37A, KIAA1267 | Exon+ve, >2 cases | 31.89712557
| | | | 627093 | gain | 1751 | NSFP1, NSF, ARL17B, LOC644246, LRRC37A2, ARL17A, LRRC37A, KIAA1267 | Exon+ve, >2 cases | 31.89712557
| | | | 630045 | loss | 1662 | NSFP1, NSF, ARL17B, LOC644246, LRRC37A2, ARL17A, LRRC37A, KIAA1267 | Exon+ve, >2 cases | 31.89712557
| | | | 639623 | loss | 1536 | NSFP1, NSF, ARL17B, LOC644246, LRRC37A2, ARL17A, LRRC37A, KIAA1267 | Exon+ve, >2 cases | 31.89712557
SEQ ID 37 | 7 | 147704200 | 147707161 | 2961 | Gain | 1808 | CNTNAP2 | Ctrl pos High OR | 30.75754113
SEQ ID 37 | 7 | 147704200 | 147707161 | 2961 | Gain | 1877 | CNTNAP2 | Ctrl pos High OR | 30.75754113
SEQ ID 37 | 7 | 147704200 | 147707161 | 2961 | Gain | 1895 | CNTNAP2 | Ctrl pos High OR | 30.75754113
SEQ ID 37 | 7 | 147704200 | 147707161 | 2961 | Gain | 1907 | CNTNAP2 | Ctrl pos High OR | 30.75754113
SEQ ID 37 | 7 | 147704200 | 147707161 | 2961 | Gain | 1951 | CNTNAP2 | Ctrl pos High OR | 30.75754113
SEQ ID 37 | 7 | 147704200 | 147707161 | 2961 | Gain | 1994 | CNTNAP2 | Ctrl pos High OR | 30.75754113
SEQ ID 37 | 7 | 147704200 | 147707161 | 2961 | Gain | 2006 | CNTNAP2 | Ctrl pos High OR | 30.75754113
SEQ ID 15 | 7 | 147704200 | 147708382 | 4182 | Gain | 1220 | CNTNAP2 | Ctrl pos High OR | 30.75754113
SEQ ID 15 | 7 | 147704200 | 147708382 | 4182 | Gain | 1223 | CNTNAP2 | Ctrl pos High OR | 30.75754113
SEQ ID 15 | 7 | 147704200 | 147708382 | 4182 | Gain | 1230 | CNTNAP2 | Ctrl pos High OR | 30.75754113
SEQ ID 15 | 7 | 147704200 | 147708382 | 4182 | Gain | 1234 | CNTNAP2 | Ctrl pos High OR | 30.75754113
SEQ ID 15 | 7 | 147704200 | 147708382 | 4182 | Gain | 1240 | CNTNAP2 | Ctrl pos High OR | 30.75754113
SEQ ID 15 | 7 | 147704200 | 147708382 | 4182 | Gain | 1252 | CNTNAP2 | Ctrl pos High OR | 30.75754113
SEQ ID 15 | 7 | 147704200 | 147708382 | 4182 | Gain | 1281 | CNTNAP2 | Ctrl pos High OR | 30.75754113
SEQ ID 15 | 7 | 147704200 | 147708382 | 4182 | Gain | 1282 | CNTNAP2 | Ctrl pos High OR | 30.75754113
SEQ ID 15 | 7 | 147704200 | 147708382 | 4182 | Gain | 1284 | CNTNAP2 | Ctrl pos High OR | 30.75754113
SEQ ID 15 | 7 | 147704200 | 147708382 | 4182 | Gain | 1286 | CNTNAP2 | Ctrl pos High OR | 30.75754113
SEQ ID 15 | 7 | 147704200 | 147708382 | 4182 | Gain | 1290 | CNTNAP2 | Ctrl pos High OR | 30.75754113
SEQ ID 15 | 7 | 147704200 | 147708382 | 4182 | Gain | 1307 | CNTNAP2 | Ctrl pos High OR | 30.75754113
SEQ ID 15 | 7 | 147704200 | 147708382 | 4182 | Gain | 1308 | CNTNAP2 | Ctrl pos High OR | 30.75754113
SEQ ID 15 | 7 | 147704200 | 147708382 | 4182 | Gain | 1309 | CNTNAP2 | Ctrl pos High OR | 30.75754113
SEQ ID 15 | 7 | 147704200 | 147708382 | 4182 | Gain | 1318 | CNTNAP2 | Ctrl pos High OR | 30.75754113
SEQ ID 15 | 7 | 147704200 | 147708382 | 4182 | Gain | 1320 | CNTNAP2 | Ctrl pos High OR | 30.75754113
SEQ ID 15 | 7 | 147704200 | 147708382 | 4182 | Gain | 1345 | CNTNAP2 | Ctrl pos High OR | 30.75754113
SEQ ID 15 | 7 | 147704200 | 147708382 | 4182 | Gain | 1389 | CNTNAP2 | Ctrl pos High OR | 30.75754113
SEQ ID 15 | 7 | 147704200 | 147708382 | 4182 | Gain | 1405 | CNTNAP2 | Ctrl pos High OR | 30.75754113
SEQ ID 15 | 7 | 147704200 | 147708382 | 4182 | Gain | 1415 | CNTNAP2 | Ctrl pos High OR | 30.75754113
SEQ ID 15 | 7 | 147704200 | 147708382 | 4182 | Gain | 1421 | CNTNAP2 | Ctrl pos High OR | 30.75754113
SEQ ID 15 | 7 | 147704200 | 147708382 | 4182 | Gain | 1422 | CNTNAP2 | Ctrl pos High OR | 30.75754113
SEQ ID 15 | 7 | 147704200 | 147708382 | 4182 | Gain | 1425 | CNTNAP2 | Ctrl pos High OR | 30.75754113
SEQ ID 15 | 7 | 147704200 | 147708382 | 4182 | Gain | 1432 | CNTNAP2 | Ctrl pos High OR | 30.75754113
SEQ ID 15 | 7 | 147704200 | 147708382 | 4182 | Gain | 1434 | CNTNAP2 | Ctrl pos High OR | 30.75754113
SEQ ID 15 | 7 | 147704200 | 147708382 | 4182 | Gain | 1438 | CNTNAP2 | Ctrl pos High OR | 30.75754113
SEQ ID 15 | 7 | 147704200 | 147708382 | 4182 | Gain | 1440 | CNTNAP2 | Ctrl pos High OR | 30.75754113
SEQ ID 15 | 7 | 147704200 | 147708382 | 4182 | Gain | 1442 | CNTNAP2 | Ctrl pos High OR | 30.75754113
SEQ ID 15 | 7 | 147704200 | 147708382 | 4182 | Gain | 1463 | CNTNAP2 | Ctrl pos High OR | 30.75754113
SEQ ID 15 | 7 | 147704200 | 147708382 | 4182 | Gain | 1466 | CNTNAP2 | Ctrl pos High OR | 30.75754113
SEQ ID 15 | 7 | 147704200 | 147708382 | 4182 | Gain | 1472 | CNTNAP2 | Ctrl pos High OR | 30.75754113
SEQ ID 15 | 7 | 147704200 | 147708382 | 4182 | Gain | 1473 | CNTNAP2 | Ctrl pos High OR | 30.75754113
SEQ ID 15 | 7 | 147704200 | 147708382 | 4182 | Gain | 1490 | CNTNAP2 | Ctrl pos High OR | 30.75754113
SEQ ID 15 | 7 | 147704200 | 147708382 | 4182 | Gain | 1492 | CNTNAP2 | Ctrl pos High OR | 30.75754113
SEQ ID 15 | 7 | 147704200 | 147708382 | 4182 | Gain | 1495 | CNTNAP2 | Ctrl pos High OR | 30.75754113
SEQ ID 15 | 7 | 147704200 | 147708382 | 4182 | Gain | 1496 | CNTNAP2 | Ctrl pos High OR | 30.75754113
SEQ ID 15 | 7 | 147704200 | 147708382 | 4182 | Gain | 1497 | CNTNAP2 | Ctrl pos High OR | 30.75754113
SEQ ID 15 | 7 | 147704200 | 147708382 | 4182 | Gain | 1498 | CNTNAP2 | Ctrl pos High OR | 30.75754113
SEQ ID 15 | 7 | 147704200 | 147708382 | 4182 | Gain | 1502 | CNTNAP2 | Ctrl pos High OR | 30.75754113
SEQ ID 15 | 7 | 147704200 | 147708382 | 4182 | Gain | 1504 | CNTNAP2 | Ctrl pos High OR | 30.75754113
SEQ ID 15 | 7 | 147704200 | 147708382 | 4182 | Gain | 1506 | CNTNAP2 | Ctrl pos High OR | 30.75754113
SEQ ID 15 | 7 | 147704200 | 147708382 | 4182 | Gain | 1508 | CNTNAP2 | Ctrl pos High OR | 30.75754113
SEQ ID 15 | 7 | 147704200 | 147708382 | 4182 | Gain | 1512 | CNTNAP2 | Ctrl pos High OR | 30.75754113
SEQ ID 15 | 7 | 147704200 | 147708382 | 4182 | Gain | 1513 | CNTNAP2 | Ctrl pos High OR | 30.75754113
SEQ ID 15 | 7 | 147704200 | 147708382 | 4182 | Gain | 1514 | CNTNAP2 | Ctrl pos High OR | 30.75754113
SEQ ID 15 | 7 | 147704200 | 147708382 | 4182 | Gain | 1515 | CNTNAP2 | Ctrl pos High OR | 30.75754113
SEQ ID 15 | 7 | 147704200 | 147708382 | 4182 | Gain | 1519 | CNTNAP2 | Ctrl pos High OR | 30.75754113
SEQ ID 15 | 7 | 147704200 | 147708382 | 4182 | Gain | 1520 | CNTNAP2 | Ctrl pos High OR | 30.75754113
SEQ ID 15 | 7 | 147704200 | 147708382 | 4182 | Gain | 1528 | CNTNAP2 | Ctrl pos High OR | 30.75754113
SEQ ID 15 | 7 | 147704200 | 147708382 | 4182 | Gain | 1534 | CNTNAP2 | Ctrl pos High OR | 30.75754113
SEQ ID 15 | 7 | 147704200 | 147708382 | 4182 | Gain | 1543 | CNTNAP2 | Ctrl pos High OR | 30.75754113
SEQ ID 15 | 7 | 147704200 | 147708382 | 4182 | Gain | 1544 | CNTNAP2 | Ctrl pos High OR | 30.75754113
SEQ ID 15 | 7 | 147704200 | 147708382 | 4182 | Gain | 1556 | CNTNAP2 | Ctrl pos High OR | 30.75754113
SEQ ID 15 | 7 | 147704200 | 147708382 | 4182 | Gain | 1557 | CNTNAP2 | Ctrl pos High OR | 30.75754113
SEQ ID 15 | 7 | 147704200 | 147708382 | 4182 | Gain | 1558 | CNTNAP2 | Ctrl pos High OR | 30.75754113
SEQ ID 15 | 7 | 147704200 | 147708382 | 4182 | Gain | 1559 | CNTNAP2 | Ctrl pos High OR | 30.75754113
SEQ ID 15 | 7 | 147704200 | 147708382 | 4182 | Gain | 1560 | CNTNAP2 | Ctrl pos High OR | 30.75754113
SEQ ID 15 | 7 | 147704200 | 147708382 | 4182 | Gain | 1565 | CNTNAP2 | Ctrl pos High OR | 30.75754113
SEQ ID 15 | 7 | 147704200 | 147708382 | 4182 | Gain | 1570 | CNTNAP2 | Ctrl pos High OR | 30.75754113
SEQ ID 15 | 7 | 147704200 | 147708382 | 4182 | Gain | 1571 | CNTNAP2 | Ctrl pos High OR | 30.75754113
SEQ ID 15 | 7 | 147704200 | 147708382 | 4182 | Gain | 1573 | CNTNAP2 | Ctrl pos High OR | 30.75754113
SEQ ID 15 | 7 | 147704200 | 147708382 | 4182 | Gain | 1584 | CNTNAP2 | Ctrl pos High OR | 30.75754113
SEQ ID 15 | 7 | 147704200 | 147708382 | 4182 | Gain | 1586 | CNTNAP2 | Ctrl pos High OR | 30.75754113
SEQ ID 15 | 7 | 147704200 | 147708382 | 4182 | Gain | 1592 | CNTNAP2 | Ctrl pos High OR | 30.75754113
SEQ ID 15 | 7 | 147704200 | 147708382 | 4182 | Gain | 1597 | CNTNAP2 | Ctrl pos High OR | 30.75754113
SEQ ID 15 | 7 | 147704200 | 147708382 | 4182 | Gain | 1601 | CNTNAP2 | Ctrl pos High OR | 30.75754113
SEQ ID 15 | 7 | 147704200 | 147708382 | 4182 | Gain | 1602 | CNTNAP2 | Ctrl pos High OR | 30.75754113
SEQ ID 15 | 7 | 147704200 | 147708382 | 4182 | Gain | 1603 | CNTNAP2 | Ctrl pos High OR | 30.75754113
SEQ ID 15 | 7 | 147704200 | 147708382 | 4182 | Gain | 1610 | CNTNAP2 | Ctrl pos High OR | 30.75754113
SEQ ID 15 | 7 | 147704200 | 147708382 | 4182 | Gain | 1618 | CNTNAP2 | Ctrl pos High OR | 30.75754113
SEQ ID 15 | 7 | 147704200 | 147708382 | 4182 | Gain | 1619 | CNTNAP2 | Ctrl pos High OR | 30.75754113
SEQ ID 15 | 7 | 147704200 | 147708382 | 4182 | Gain | 1620 | CNTNAP2 | Ctrl pos High OR | 30.75754113
SEQ ID 15 | 7 | 147704200 | 147708382 | 4182 | Gain | 1622 | CNTNAP2 | Ctrl pos High OR | 30.75754113
SEQ ID 15 | 7 | 147704200 | 147708382 | 4182 | Gain | 1624 | CNTNAP2 | Ctrl pos High OR | 30.75754113
SEQ ID 15 | 7 | 147704200 | 147708382 | 4182 | Gain | 1626 | CNTNAP2 | Ctrl pos High OR | 30.75754113
SEQ ID 15 | 7 | 147704200 | 147708382 | 4182 | Gain | 1632 | CNTNAP2 | Ctrl pos High OR | 30.75754113
SEQ ID 15 | 7 | 147704200 | 147708382 | 4182 | Gain | 1640 | CNTNAP2 | Ctrl pos High OR | 30.75754113
SEQ ID 15 | 7 | 147704200 | 147708382 | 4182 | Gain | 1641 | CNTNAP2 | Ctrl pos High OR | 30.75754113
SEQ ID 15 | 7 | 147704200 | 147708382 | 4182 | Gain | 1647 | CNTNAP2 | Ctrl pos High OR | 30.75754113
SEQ ID 15 | 7 | 147704200 | 147708382 | 4182 | Gain | 1650 | CNTNAP2 | Ctrl pos High OR | 30.75754113
SEQ ID 15 | 7 | 147704200 | 147708382 | 4182 | Gain | 1653 | CNTNAP2 | Ctrl pos High OR | 30.75754113
SEQ ID 15 | 7 | 147704200 | 147708382 | 4182 | Gain | 1654 | CNTNAP2 | Ctrl pos High OR | 30.75754113
SEQ ID 15 | 7 | 147704200 | 147708382 | 4182 | Gain | 1662 | CNTNAP2 | Ctrl pos High OR | 30.75754113
SEQ ID 15 | 7 | 147704200 | 147708382 | 4182 | Gain | 1667 | CNTNAP2 | Ctrl pos High OR | 30.75754113
SEQ ID 15 | 7 | 147704200 | 147708382 | 4182 | Gain | 1688 | CNTNAP2 | Ctrl pos High OR | 30.75754113
SEQ ID 15 | 7 | 147704200 | 147708382 | 4182 | Gain | 1707 | CNTNAP2 | Ctrl pos High OR | 30.75754113
SEQ ID 15 | 7 | 147704200 | 147708382 | 4182 | Gain | 1708 | CNTNAP2 | Ctrl pos High OR | 30.75754113
SEQ ID 15 | 7 | 147704200 | 147708382 | 4182 | Gain | 1710 | CNTNAP2 | Ctrl pos High OR | 30.75754113
SEQ ID 15 | 7 | 147704200 | 147708382 | 4182 | Gain | 1715 | CNTNAP2 | Ctrl pos High OR | 30.75754113
SEQ ID 15 | 7 | 147704200 | 147708382 | 4182 | Gain | 1720 | CNTNAP2 | Ctrl pos High OR | 30.75754113
SEQ ID 15 | 7 | 147704200 | 147708382 | 4182 | Gain | 1755 | CNTNAP2 | Ctrl pos High OR | 30.75754113
SEQ ID 15 | 7 | 147704200 | 147708382 | 4182 | Gain | 1760 | CNTNAP2 | Ctrl pos High OR | 30.75754113
SEQ ID 15 | 7 | 147704200 | 147708382 | 4182 | Gain | 1774 | CNTNAP2 | Ctrl pos High OR | 30.75754113
SEQ ID 15 | 7 | 147704200 | 147708382 | 4182 | Gain | 1779 | CNTNAP2 | Ctrl pos High OR | 30.75754113
SEQ ID 15 | 7 | 147704200 | 147708382 | 4182 | Gain | 1782 | CNTNAP2 | Ctrl pos High OR | 30.75754113
SEQ ID 15 | 7 | 147704200 | 147708382 | 4182 | Gain | 1783 | CNTNAP2 | Ctrl pos High OR | 30.75754113
SEQ ID 15 | 7 | 147704200 | 147708382 | 4182 | Gain | 1784 | CNTNAP2 | Ctrl pos High OR | 30.75754113
SEQ ID 15 | 7 | 147704200 | 147708382 | 4182 | Gain | 1796 | CNTNAP2 | Ctrl pos High OR | 30.75754113
SEQ ID 15 | 7 | 147704200 | 147708382 | 4182 | Gain | 1804 | CNTNAP2 | Ctrl pos High OR | 30.75754113
SEQ ID 15 | 7 | 147704200 | 147708382 | 4182 | Gain | 1805 | CNTNAP2 | Ctrl pos High OR | 30.75754113
SEQ ID 15 | 7 | 147704200 | 147708382 | 4182 | Gain | 1811 | CNTNAP2 | Ctrl pos High OR | 30.75754113
SEQ ID 15 | 7 | 147704200 | 147708382 | 4182 | Gain | 1813 | CNTNAP2 | Ctrl pos High OR | 30.75754113
SEQ ID 15 | 7 | 147704200 | 147708382 | 4182 | Gain | 1814 | CNTNAP2 | Ctrl pos High OR | 30.75754113
SEQ ID 15 | 7 | 147704200 | 147708382 | 4182 | Gain | 1815 | CNTNAP2 | Ctrl pos High OR | 30.75754113
SEQ ID 15 | 7 | 147704200 | 147708382 | 4182 | Gain | 1818 | CNTNAP2 | Ctrl pos High OR | 30.75754113
SEQ ID 15 | 7 | 147704200 | 147708382 | 4182 | Gain | 1831 | CNTNAP2 | Ctrl pos High OR | 30.75754113
SEQ ID 15 | 7 | 147704200 | 147708382 | 4182 | Gain | 1832 | CNTNAP2 | Ctrl pos High OR | 30.75754113
SEQ ID 15 | 7 | 147704200 | 147708382 | 4182 | Gain | 1835 | CNTNAP2 | Ctrl pos High OR | 30.75754113
SEQ ID 15 | 7 | 147704200 | 147708382 | 4182 | Gain | 1838 | CNTNAP2 | Ctrl pos High OR | 30.75754113
SEQ ID 15 | 7 | 147704200 | 147708382 | 4182 | Gain | 1839 | CNTNAP2 | Ctrl pos High OR | 30.75754113
SEQ ID 15 | 7 | 147704200 | 147708382 | 4182 | Gain | 1845 | CNTNAP2 | Ctrl pos High OR | 30.75754113
SEQ ID 15 | 7 | 147704200 | 147708382 | 4182 | Gain | 1851 | CNTNAP2 | Ctrl pos High OR | 30.75754113
SEQ ID 15 | 7 | 147704200 | 147708382 | 4182 | Gain | 1861 | CNTNAP2 | Ctrl pos High OR | 30.75754113
SEQ ID 15 | 7 | 147704200 | 147708382 | 4182 | Gain | 1874 | CNTNAP2 | Ctrl pos High OR | 30.75754113
SEQ ID 15 | 7 | 147704200 | 147708382 | 4182 | Gain | 1881 | CNTNAP2 | Ctrl pos High OR | 30.75754113
SEQ ID 15 | 7 | 147704200 | 147708382 | 4182 | Gain | 1883 | CNTNAP2 | Ctrl pos High OR | 30.75754113
SEQ ID 15 | 7 | 147704200 | 147708382 | 4182 | Gain | 1893 | CNTNAP2 | Ctrl pos High OR | 30.75754113
SEQ ID 15 | 7 | 147704200 | 147708382 | 4182 | Gain | 1905 | CNTNAP2 | Ctrl pos High OR | 30.75754113
SEQ ID 15 | 7 | 147704200 | 147708382 | 4182 | Gain | 1927 | CNTNAP2 | Ctrl pos High OR | 30.75754113
SEQ ID 15 | 7 | 147704200 | 147708382 | 4182 | Gain | 1930 | CNTNAP2 | Ctrl pos High OR | 30.75754113
SEQ ID 15 | 7 | 147704200 | 147708382 | 4182 | Gain | 1944 | CNTNAP2 | Ctrl pos High OR | 30.75754113
SEQ ID 15 | 7 | 147704200 | 147708382 | 4182 | Gain | 1948 | CNTNAP2 | Ctrl pos High OR | 30.75754113
SEQ ID 15 | 7 | 147704200 | 147708382 | 4182 | Gain | 1970 | CNTNAP2 | Ctrl pos High OR | 30.75754113
SEQ ID 15 | 7 | 147704200 | 147708382 | 4182 | Gain | 1997 | CNTNAP2 | Ctrl pos High OR | 30.75754113
SEQ ID 15 | 7 | 147704200 | 147708382 | 4182 | Gain | 2024 | CNTNAP2 | Ctrl pos High OR | 30.75754113
SEQ ID 15 | 7 | 147704200 | 147708382 | 4182 | Gain | 2026 | CNTNAP2 | Ctrl pos High OR | 30.75754113
SEQ ID 15 | 7 | 147704200 | 147708382 | 4182 | Gain | 2034 | CNTNAP2 | Ctrl pos High OR | 30.75754113
SEQ ID 38 | 7 | 147704200 | 147711471 | 7271 | Gain | 1423 | CNTNAP2 | Ctrl pos High OR | 30.75754113
SEQ ID 39 | 1 | 85964576 | 85967615 | 3039 | loss | 1266 | COL24A1 | Exon+ve, >2 cases | 28.77224736
SEQ ID 39 | 1 | 85964576 | 85967615 | 3039 | loss | 1283 | COL24A1 | Exon+ve, >2 cases | 28.77224736
SEQ ID 39 | 1 | 85964576 | 85967615 | 3039 | loss | 1284 | COL24A1 | Exon+ve, >2 cases | 28.77224736
SEQ ID 39 | 1 | 85964576 | 85967615 | 3039 | loss | 1308 | COL24A1 | Exon+ve, >2 cases | 28.77224736
SEQ ID 39 | 1 | 85964576 | 85967615 | 3039 | loss | 1309 | COL24A1 | Exon+ve, >2 cases | 28.77224736
SEQ ID 39 | 1 | 85964576 | 85967615 | 3039 | loss | 1354 | COL24A1 | Exon+ve, >2 cases | 28.77224736
SEQ ID 39 | 1 | 85964576 | 85967615 | 3039 | loss | 1413 | COL24A1 | Exon+ve, >2 cases | 28.77224736
SEQ ID 39 | 1 | 85964576 | 85967615 | 3039 | loss | 1418 | COL24A1 | Exon+ve, >2 cases | 28.77224736
SEQ ID 39 | 1 | 85964576 | 85967615 | 3039 | loss | 1433 | COL24A1 | Exon+ve, >2 cases | 28.77224736
SEQ ID 39 | 1 | 85964576 | 85967615 | 3039 | loss | 1449 | COL24A1 | Exon+ve, >2 cases | 28.77224736
SEQ ID 39 | 1 | 85964576 | 85967615 | 3039 | loss | 1451 | COL24A1 | Exon+ve, >2 cases | 28.77224736
SEQ ID 39 | 1 | 85964576 | 85967615 | 3039 | loss | 1640 | COL24A1 | Exon+ve, >2 cases | 28.77224736
SEQ ID 39 | 1 | 85964576 | 85967615 | 3039 | loss | 1781 | COL24A1 | Exon+ve, >2 cases | 28.77224736
SEQ ID 39 | 1 | 85964576 | 85967615 | 3039 | loss | 1815 | COL24A1 | Exon+ve, >2 cases | 28.77224736
SEQ ID 39 | 1 | 85964576 | 85967615 | 3039 | loss | 1818 | COL24A1 | Exon+ve, >2 cases | 28.77224736
SEQ ID 39 | 1 | 85964576 | 85967615 | 3039 | loss | 1929 | COL24A1 | Exon+ve, >2 cases | 28.77224736
SEQ ID 39 | 1 | 85964576 | 85967615 | 3039 | loss | 1994 | COL24A1 | Exon+ve, >2 cases | 28.77224736
SEQ ID 39 | 1 | 85964576 | 85967615 | 3039 | loss | 2031 | COL24A1 | Exon+ve, >2 cases | 28.77224736
SEQ ID 39 | 1 | 85964576 | 85967615 | 3039 | loss | 2040 | COL24A1 | Exon+ve, >2 cases | 28.77224736
SEQ ID 40 | 6 | 35853209 | 35862502 | 9293 | loss | 1940 | C6orf127, C6orf126 | Exon+ve, >2 cases | 28.77224736
SEQ ID 41 | 6 | 35855652 | 35873335 | 17683 | loss | 1301 | C6orf127, CLPS | Exon+ve, >2 cases | 28.77224736
SEQ ID 41 | 6 | 35855652 | 35873335 | 17683 | loss | 1837 | C6orf127, CLPS | Exon+ve, >2 cases | 28.77224736
SEQ ID 41 | 6 | 35855652 | 35873335 | 17683 | loss | 1839 | C6orf127, CLPS | Exon+ve, >2 cases | 28.77224736
SEQ ID 41 | 6 | 35855652 | 35873335 | 17683 | loss | 1952 | C6orf127, CLPS | Exon+ve, >2 cases | 28.77224736
SEQ ID 41 | 6 | 35855652 | 35873335 | 17683 | loss | 1959 | C6orf127, CLPS | Exon+ve, >2 cases | 28.77224736
SEQ ID 42 | 6 | 35853209 | 35873335 | 20126 | loss | 1958 | C6orf127, C6orf126, CLPS | Exon+ve, >2 cases | 28.77224736
SEQ ID 42 | 6 | 35853209 | 35873335 | 20126 | loss | 1961 | C6orf127, C6orf126, CLPS | Exon+ve, >2 cases | 28.77224736
SEQ ID 42 | 6 | 35853209 | 35873335 | 20126 | loss | 1962 | C6orf127, C6orf126, CLPS | Exon+ve, >2 cases | 28.77224736
SEQ ID 42 | 6 | 35853209 | 35873335 | 20126 | loss | 2005 | C6orf127, C6orf126, CLPS | Exon+ve, >2 cases | 28.77224736
SEQ ID 43 | 6 | 35851495 | 35872078 | 20583 | loss | 1852 | C6orf127, C6orf126, CLPS | Exon+ve, >2 cases | 28.77224736
SEQ ID 44 | 6 | 35851495 | 35873335 | 21840 | loss | 1965 | C6orf127, C6orf126, CLPS | Exon+ve, >2 cases | 28.77224736
SEQ ID 44 | 6 | 35851495 | 35873335 | 21840 | loss | 2018 | C6orf127, C6orf126, CLPS | Exon+ve, >2 cases | 28.77224736
SEQ ID 45 | 6 | 35853209 | 35875112 | 21903 | loss | 1946 | C6orf127, C6orf126, CLPS | Exon+ve, >2 cases | 28.77224736
SEQ ID 46 | 6 | 35851495 | 35875112 | 23617 | loss | 1950 | C6orf127, C6orf126, CLPS | Exon+ve, >2 cases | 28.77224736
SEQ ID 47 | 6 | 35851495 | 35878656 | 27161 | loss | 2006 | C6orf127, C6orf126, CLPS | Exon+ve, >2 cases | 28.77224736
SEQ ID 48 | 6 | 35849860 | 35878656 | 28796 | loss | 1680 | C6orf127, C6orf126, CLPS | Exon+ve, >2 cases | 28.77224736
SEQ ID 49 | 6 | 35848099 | 35878656 | 30557 | loss | 1718 | C6orf127, C6orf126, CLPS | Exon+ve, >2 cases | 28.77224736
SEQ ID 50 | 6 | 35846772 | 35878656 | 31884 | loss | 1694 | C6orf127, C6orf126, CLPS | Exon+ve, >2 cases | 28.77224736
SEQ ID 51 | 12 | 130944468 | 130946248 | 1780 | gain | 1448 | ULK1 | Exon+ve, >2 cases | 24.12012012
SEQ ID 51 | 12 | 130944468 | 130946248 | 1780 | loss | 1471 | ULK1 | Exon+ve, >2 cases | 24.12012012
SEQ ID 51 | 12 | 130944468 | 130946248 | 1780 | loss | 1474 | ULK1 | Exon+ve, >2 cases | 24.12012012
SEQ ID 51 | 12 | 130944468 | 130946248 | 1780 | loss | 1492 | ULK1 | Exon+ve, >2 cases | 24.12012012
SEQ ID 51 | 12 | 130944468 | 130946248 | 1780 | loss | 1493 | ULK1 | Exon+ve, >2 cases | 24.12012012
SEQ ID 51 | 12 | 130944468 | 130946248 | 1780 | loss | 1496 | ULK1 | Exon+ve, >2 cases | 24.12012012
SEQ ID 51 | 12 | 130944468 | 130946248 | 1780 | loss | 1497 | ULK1 | Exon+ve, >2 cases | 24.12012012
SEQ ID 51 | 12 | 130944468 | 130946248 | 1780 | loss | 1498 | ULK1 | Exon+ve, >2 cases | 24.12012012
SEQ ID 51 | 12 | 130944468 | 130946248 | 1780 | loss | 1500 | ULK1 | Exon+ve, >2 cases | 24.12012012
SEQ ID 51 | 12 | 130944468 | 130946248 | 1780 | loss | 1505 | ULK1 | Exon+ve, >2 cases | 24.12012012
SEQ ID 51 | 12 | 130944468 | 130946248 | 1780 | loss | 1517 | ULK1 | Exon+ve, >2 cases | 24.12012012
SEQ ID 51 | 12 | 130944468 | 130946248 | 1780 | loss | 1566 | ULK1 | Exon+ve, >2 cases | 24.12012012
SEQ ID 51 | 12 | 130944468 | 130946248 | 1780 | loss | 1579 | ULK1 | Exon+ve, >2 cases | 24.12012012
SEQ ID 51 | 12 | 130944468 | 130946248 | 1780 | loss | 1580 | ULK1 | Exon+ve, >2 cases | 24.12012012
SEQ ID 51 | 12 | 130944468 | 130946248 | 1780 | loss | 1582 | ULK1 | Exon+ve, >2 cases | 24.12012012
SEQ ID 52 | 12 | 130944468 | 130947790 | 3322 | loss | 1416 | ULK1 | Exon+ve, >2 cases | 24.12012012
SEQ ID 53 | 14 | 22946615 | 22947034 | 419 | Loss | 1820 | MYH6 | Ctrl pos High OR | 22.57871064
SEQ ID 54 | 14 | 22946615 | 22947639 | 1024 | Loss | 1718 | MYH6 | Ctrl pos High OR | 22.57871064
SEQ ID 54 | 14 | 22946615 | 22947639 | 1024 | Loss | 1802 | MYH6 | Ctrl pos High OR | 22.57871064
SEQ ID 54 | 14 | 22946615 | 22947639 | 1024 | Loss | 1816 | MYH6 | Ctrl pos High OR | 22.57871064
SEQ ID 54 | 14 | 22946615 | 22947639 | 1024 | Loss | 1817 | MYH6 | Ctrl pos High OR | 22.57871064
SEQ ID 54 | 14 | 22946615 | 22947639 | 1024 | Loss | 1819 | MYH6 | Ctrl pos High OR | 22.57871064
SEQ ID 54 | 14 | 22946615 | 22947639 | 1024 | Loss | 1850 | MYH6 | Ctrl pos High OR | 22.57871064
SEQ ID 54 | 14 | 22946615 | 22947639 | 1024 | Loss | 1895 | MYH6 | Ctrl pos High OR | 22.57871064
SEQ ID 54 | 14 | 22946615 | 22947639 | 1024 | Loss | 1993 | MYH6 | Ctrl pos High OR | 22.57871064
SEQ ID 54 | 14 | 22946615 | 22947639 | 1024 | Loss | 2043 | MYH6 | Ctrl pos High OR | 22.57871064
a SEQ ID 55 14 22943262 22951086 7824 Loss 1577 MYH6 Ctrl pos High OR 22.57871064 SEQ ID 56 14 22946615 22955470 8855 Loss 2032 MYH6, MYH7 Ctrl pos High OR 22.57871064 ao SEQ ID 57 14 22943262 22955470 12208 Loss 1856 MYH6, MYH7 Ctrl pos High OR 22.57871064 SEQ ID 58 14 22929952 22958797 28845 Loss 1537 MIR208B, MYH6, MYH7 Ctrl pos High OR 22.57871064 SEQ ID 59 14 22929952 22959469 29517 Loss 1669 MIR208B, MYH6, MYH7 Ctrl pos High OR 22.57871064 SEQ ID 60 7 142027745 142152205 124460 loss 1568 PRSSI, MTRNR2L6 Exon+ve, >2 cases 22.57871064 SEQ ID 60 ,7 142027745 142152205 ,124460 ,loss , 1753 PRSSI, MTRNR2L6 Exon+ve, >2 cases 22.57871064 SEQ ID 61 7 142021348 142152205 130857 loss 1347 PRSSI, MTRNR2L6 Exon+ve, >2 cases 22.57871064 SEQ ID 62 7 142009000 142140540 131540 loss 2018 PRSSI, MTRNR2L6 Exon+ve, >2 cases 22.57871064 P
SEQ ID 63 7 142018368 142152205 133837 loss 1349 PRSSI, MTRNR2L6 Exon+ve, >2 cases 22.57871064 .
SEQ ID 63 7 142018368 142152205 133837 loss 1374 PRSSI, MTRNR2L6 Exon+ve, >2 cases 22.57871064 .

SEQ ID 63 7 ,142018368 , 142152205 133837 ,loss 1697 ,PRSS1, MTRNR2L6 Exon+ve, >2 cases 22.57871064 2 SEQ ID 64 7 142007171 142152205 145034 loss 1242 PRSSI, MTRNR2L6 Exon+ve, >2 cases 22.57871064 SEQ ID 65 7 142005505 142152205 146700 loss 1601 PRSSI, MTRNR2L6 Exon+ve, >2 cases 22.57871064 ' SEQ ID 66 7 142041787 142205830 164043 loss 1837 PRSSI, TRY6, PRSS2, MTRNR2L6 Exon+ve, >2 cases 22.57871064 , o, SEQ ID 67 7 142018368 142202274 183906 loss 1784 PRSS1, TRY6, PRSS2, MTRNR2L6 Exon+ve, >2 cases 22.57871064 SEQ ID 68 7 142009000 142205830 196830 loss 2024 PRSS I, TRY6, PRSS2, MTRNR2L6 Exon+ve, >2 cases 22.57871064 SEQ ID 69 7 141993718 142207147 213429 loss 1930 PRSSI, TRY6, PRSS2, MTRNR2L6 Exon+ve, >2 cases 22.57871064 SEQ ID 70 7 141989750 142205830 216080 loss 1803 PRSSI, TRY6, PRSS2, MTRNR2L6 Exon+ve, >2 cases 22.57871064 SEQ ID 71 7 141953817 142205830 252013 loss 1232 PRSSI, TRY6, PRSS2, MTRNR2L6 Exon+vc, >2 cases 22.57871064 SEQ ID 72 19 14666403 14667646 1243 loss 1677 ZNF333 Exon+ve, >2 cases 17.98208955 SEQ ID 72 19 14666403 14667646 1243 loss 1738 ZNF333 Exon+ve, >2 cases 17.98208955 .o n SEQ ID 72 19 14666403 14667646 1243 loss 1775 ZNF333 Exon+ve, >2 cases 17.98208955 -3 SEQ ID 72 19 14666403 14667646 1243 loss 1826 ZNF333 Exon+ve, >2 cases 17.98208955 ci) SEQ ID 72 19 14666403 14667646 1243 loss 1837 ZNF333 Exon+vc, >2 cases 17.98208955 t.) a -, SEQ ID 72 19 14666403 14667646 1243 loss 1957 ZNF333 Exon+ve, >2 cases 17.98208955 w -a-SEQ ID 72 19 14666403 14667646 1243 loss 1968 ZNF333 Exon+ve, >2 cases 17.98208955 ,J1 .1, w a SEQ ID 72 19 14666403 14667646 1243 loss 2004 ZNF333 .. 
Exon+ve, >2 cases 17.98208955 SEQ ID 72 19 14666403 14667646 1243 loss 2031 ZNF333 Exon+ve, >2 cases 17.98208955 SEQ ID 73 19 14665135 14667646 2511 loss 1416 ZNF333 Exon+ve, >2 cases 17.98208955 t-) a SEQ ID 73 19 14665135 14667646 2511 loss 1578 ZNF333 Exon+ve, >2 cases 17.98208955 -, w , SEQ ID 73 19 14665135 14667646 2511 loss 1881 ZNF333 Exon+ve, >2 cases 17.98208955 -, t..) a SEQ ID 74 5 122534134 122535395 1261 loss 1224 PRDM6 Exon+ve, >2 cases 16.45901639 SEQ ID 74 5 122534134 122535395 1261 loss 1548 PRDM6 Exon+ve, >2 cases 16.45901639 ao SEQ ID 74 5 122534134 122535395 1261 loss 1552 PRDM6 Exon+ve, >2 cases 16.45901639 SEQ ID 74 5 122534134 122535395 1261 loss 1681 PRDM6 Exon+ve, >2 cases 16.45901639 SEQ ID 74 5 122534134 122535395 1261 loss 1740 PRDM6 Exon+ve, >2 cases 16.45901639 SEQ ID 74 5 122534134 122535395 1261 loss 1763 PRDM6 Exon+ve, >2 cases 16.45901639 SEQ ID 74 ,5 122534134 122535395 ,1261 ,loss , 1786 PRDM6 ,Exon+ve, >2 cases ,16.45901639 SEQ ID 74 5 122534134 122535395 1261 loss 1807 PRDM6 Exon+ve, >2 cases 16.45901639 SEQ ID 74 5 122534134 122535395 1261 loss 1880 PRDM6 Exon+vc, >2 cases 16.45901639 P
SEQ ID 74 5 122534134 122535395 1261 loss 1881 PRDM6 Exon+ve, >2 cases 16.45901639 .
SEQ ID 74 5 122534134 122535395 1261 loss 1915 PRDM6 Exon+ve, >2 cases 16.45901639 .

SEQ ID 75 2 10263146 10272211 9065 loss 1256 C2orf48 Exon+ve, >2 cases 14.94047619
SEQ ID 75 2 10263146 10272211 9065 loss 1285 C2orf48 Exon+ve, >2 cases 14.94047619
SEQ ID 75 2 10263146 10272211 9065 loss 1370 C2orf48 Exon+ve, >2 cases 14.94047619
SEQ ID 75 2 10263146 10272211 9065 loss 1396 C2orf48 Exon+ve, >2 cases 14.94047619
SEQ ID 76 6 33495074 33505974 10900 loss 1824 SYNGAP1 Exon+ve, >2 cases 14.94047619
SEQ ID 76 6 33495074 33505974 10900 loss 1840 SYNGAP1 Exon+ve, >2 cases 14.94047619
SEQ ID 77 2 10263146 10274556 11410 loss 1307 C2orf48 Exon+ve, >2 cases 14.94047619
loss 1415 C2orf48 Exon+ve, >2 cases 14.94047619
loss 1616 C2orf48 Exon+ve, >2 cases 14.94047619
loss 1654 C2orf48 Exon+ve, >2 cases 14.94047619
loss 1830 C2orf48 Exon+ve, >2 cases 14.94047619
loss 1931 C2orf48 Exon+ve, >2 cases 14.94047619
SEQ ID 78 6 33491109 33504619 13510 loss 1718 SYNGAP1, CUTA, PHF1 Exon+ve, >2 cases 14.94047619
SEQ ID 78 6 33491109 33504619 13510 loss 2032 SYNGAP1, CUTA, PHF1 Exon+ve, >2 cases 14.94047619
loss 1872 SYNGAP1, CUTA Exon+ve, >2 cases 14.94047619
loss 1967 SYNGAP1, CUTA Exon+ve, >2 cases 14.94047619
SEQ ID 80 6 33491109 33505974 14865 loss 1905 SYNGAP1, CUTA, PHF1 Exon+ve, >2 cases 14.94047619
SEQ ID 80 6 33491109 33505974 14865 loss 2031 SYNGAP1, CUTA, PHF1 Exon+ve, >2 cases 14.94047619
SEQ ID 81 6 33491109 33507587 16478 loss 1297 SYNGAP1, CUTA, PHF1 Exon+ve, >2 cases 14.94047619
SEQ ID 82 11 5742476 5774108 31632 gain 1394 OR52N5, OR52N1 Exon+ve, >2 cases 14.94047619
SEQ ID 82 11 5742476 5774108 31632 gain 1536 OR52N5, OR52N1 Exon+ve, >2 cases 14.94047619
SEQ ID 82 11 5742476 5774108 31632 gain 1821 OR52N5, OR52N1 Exon+ve, >2 cases 14.94047619
SEQ ID 82 11 5742476 5774108 31632 gain 1825 OR52N5, OR52N1 Exon+ve, >2 cases 14.94047619
SEQ ID 82 11 5742476 5774108 31632 gain 1902 OR52N5, OR52N1 Exon+ve, >2 cases 14.94047619
SEQ ID 83 11 5742476 5775970 33494 gain 1538 OR52N5, OR52N1 Exon+ve, >2 cases 14.94047619
SEQ ID 83 11 5742476 5775970 33494 gain 1551 OR52N5, OR52N1 Exon+ve, >2 cases 14.94047619
SEQ ID 83 11 5742476 5775970 33494 gain 1727 OR52N5, OR52N1 Exon+ve, >2 cases 14.94047619
SEQ ID 83 11 5742476 5775970 33494 gain 1823 OR52N5, OR52N1 Exon+ve, >2 cases 14.94047619
SEQ ID 83 11 5742476 5775970 33494 gain 1824 OR52N5, OR52N1 Exon+ve, >2 cases 14.94047619
111052 loss 1841 SYNGAP1, PHF1, CUTA, KIFC1 Exon+ve, >2 cases 14.94047619
SEQ ID 85 19 59174756 59183718 8962 loss 1859 CACNG8, MIR935 Exon+ve, >2 cases 13.42644874
SEQ ID 86 10 131651597 131652807 1210 loss 1572 EBF3 Exon+ve, >2 cases 11.91691395
SEQ ID 86 10 131651597 131652807 1210 gain 1597 EBF3 Exon+ve, >2 cases 11.91691395
SEQ ID 86 10 131651597 131652807 1210 gain 1644 EBF3 Exon+ve, >2 cases 11.91691395
SEQ ID 86 10 131651597 131652807 1210 loss 1691 EBF3 Exon+ve, >2 cases 11.91691395
SEQ ID 86 10 131651597 131652807 1210 loss 1703 EBF3 Exon+ve, >2 cases 11.91691395
SEQ ID 86 10 131651597 131652807 1210 loss 1704 EBF3 Exon+ve, >2 cases 11.91691395
SEQ ID 86 10 131651597 131652807 1210 gain 1709 EBF3 Exon+ve, >2 cases 11.91691395
SEQ ID 86 10 131651597 131652807 1210 loss 1724 EBF3 Exon+ve, >2 cases 11.91691395
SEQ ID 87 15 54513726 54522863 9137 loss 1237 TEX9, MNS1 Exon+ve, >2 cases 11.91691395
SEQ ID 87 15 54513726 54522863 9137 loss 1347 TEX9, MNS1 Exon+ve, >2 cases 11.91691395
SEQ ID 87 15 54513726 54522863 9137 loss 1441 TEX9, MNS1 Exon+ve, >2 cases 11.91691395
SEQ ID 87 15 54513726 54522863 9137 loss 1456 TEX9, MNS1 Exon+ve, >2 cases 11.91691395
SEQ ID 87 15 54513726 54522863 9137 loss 1494 TEX9, MNS1 Exon+ve, >2 cases 11.91691395
SEQ ID 87 15 54513726 54522863 9137 loss 1496 TEX9, MNS1 Exon+ve, >2 cases 11.91691395
SEQ ID 87 15 54513726 54522863 9137 loss 1997 TEX9, MNS1 Exon+ve, >2 cases 11.91691395
SEQ ID 88 15 54513726 54523657 9931 loss 1497 TEX9, MNS1 Exon+ve, >2 cases 11.91691395
SEQ ID 89 5 10683077 10691335 8258 loss 1438 ANKRD33B Exon+ve, >2 cases 11.91691395
loss 1619 ANKRD33B Exon+ve, >2 cases 11.91691395
loss 1629 ANKRD33B Exon+ve, >2 cases 11.91691395
loss 1630 ANKRD33B Exon+ve, >2 cases 11.91691395
8258 loss 1998 ANKRD33B Exon+ve, >2 cases 11.91691395
8258 loss 2026 ANKRD33B Exon+ve, >2 cases 11.91691395
SEQ ID 90 6 143693693 143705189 11496 gain 1372 AIG1 Exon+ve, >2 cases 11.91691395
gain 1281 AIG1 Exon+ve, >2 cases 11.91691395
loss 1666 ANKRD33B Exon+ve, >2 cases 11.91691395
12716482 3437233 loss 1850 TAG, CMBL, SEMA5A, FAM173B, ROPN1L, CCT5, LOC285692, MARCH6, DAP, CTNND2, SNORD123, ANKRD33B, TAS2R1 Exon+ve, >2 cases 11.91691395
SEQ ID 94 6 143697902 143705189 7287 gain 1905 AIG1 Exon+ve, >2 cases 11.91691395
SEQ ID 95 6 143696259 143705189 8930 gain 1429 AIG1 Exon+ve, >2 cases 11.91691395
SEQ ID 95 6 143696259 143705189 8930 gain 1926 AIG1 Exon+ve, >2 cases 11.91691395
SEQ ID 90 6 143693693 143705189 11496 gain 1409 AIG1 Exon+ve, >2 cases 11.91691395
gain 1619 AIG1 Exon+ve, >2 cases 11.91691395
gain 1639 AIG1 Exon+ve, >2 cases 11.91691395
loss 1419 C16orf89 Exon+ve, >2 cases 10.41185185
loss 1447 ELK3 Exon+ve, >2 cases 10.41185185
loss 1728 ELK3 Exon+ve, >2 cases 10.41185185
loss 1742 ELK3 Exon+ve, >2 cases 10.41185185
loss 1957 ELK3 Exon+ve, >2 cases 10.41185185
loss 1961 ELK3 Exon+ve, >2 cases 10.41185185
loss 1965 ELK3 Exon+ve, >2 cases 10.41185185
loss 1967 ELK3 Exon+ve, >2 cases 10.41185185
gain 1324 C11orf96 Exon+ve, >2 cases 10.41185185
loss 1396 C11orf96 Exon+ve, >2 cases 10.41185185
gain 1530 C11orf96 Exon+ve, >2 cases 10.41185185
loss 1829 C11orf96 Exon+ve, >2 cases 10.41185185
gain 1860 C11orf96 Exon+ve, >2 cases 10.41185185
loss 1874 C11orf96 Exon+ve, >2 cases 10.41185185
SEQ ID 98 11 43920001 43921971 1970 gain 1996 C11orf96 Exon+ve, >2 cases 10.41185185
SEQ ID 99 16 3868512 3870705 2193 loss 1590 CREBBP Exon+ve, >2 cases 10.41185185
SEQ ID 100 16 3868512 3872218 3706 loss 1533 CREBBP Exon+ve, >2 cases 10.41185185
SEQ ID 100 16 3868512 3872218 3706 loss 1539 CREBBP Exon+ve, >2 cases 10.41185185
loss 1676 HEATR4 Exon+ve, >2 cases 10.41185185
loss 1806 HEATR4 Exon+ve, >2 cases 10.41185185
SEQ ID 103 16 4187745 4192873 5128 loss 1442 SRL Exon+ve, >2 cases 10.41185185
loss 1275 PKD1L2 Exon+ve, >2 cases 10.41185185
loss 1998 PKD1L2 Exon+ve, >2 cases 10.41185185
loss 1798 C11orf49, ARFGAP2, PACSIN3 Exon+ve, >2 cases 10.41185185
loss 1852 C11orf49, ARFGAP2, PACSIN3 Exon+ve, >2 cases 10.41185185
SEQ ID 105 11 47142460 47155662 13202 loss 1854 C11orf49, ARFGAP2, PACSIN3 Exon+ve, >2 cases 10.41185185
loss 1855 C11orf49, ARFGAP2, PACSIN3 Exon+ve, >2 cases 10.41185185
loss 1857 C11orf49, ARFGAP2, PACSIN3 Exon+ve, >2 cases 10.41185185
loss 1936 C11orf49, ARFGAP2, PACSIN3 Exon+ve, >2 cases 10.41185185
loss 2031 C11orf49, ARFGAP2, PACSIN3 Exon+ve, >2 cases 10.41185185
SEQ ID 106 14 73058103 73071404 13301 loss 1687 HEATR4 Exon+ve, >2 cases 10.41185185
gain 1252 PKD1L2 Exon+ve, >2 cases 10.41185185
loss 1404 PKD1L2 Exon+ve, >2 cases 10.41185185
loss 1237 HEATR4 Exon+ve, >2 cases 10.41185185
SEQ ID 110 X 2768213 2788489 20276 loss 1654 GYG2 Exon+ve, >2 cases 10.41185185
gain 1763 PKD1L2, LOC100329108, GCSH Exon+ve, distinct CNVs, same Gene 10.41185185
SEQ ID 112 16 4554395 4588011 33616 loss 1689 LOC342346 Exon+ve, >2 cases 10.41185185
loss 1721 HEATR4, ACOT1 Exon+ve, >2 cases 10.41185185
SEQ ID 114 2 73732303 73770615 38312 gain 1533 ALMS1P Exon+ve, >2 cases 10.41185185
SEQ ID 114 2 73732303 73770615 38312 loss 1738 ALMS1P Exon+ve, >2 cases 10.41185185
SEQ ID 115 2 73732303 73785403 53100 gain 1887 NAT8B, ALMS1P Exon+ve, >2 cases 10.41185185
loss 1718 HEATR4, ACOT2, ACOT1 Exon+ve, >2 cases 10.41185185
gain 1369 NAT8, ALMS1P Exon+ve, >2 cases 10.41185185
gain 1626 NAT8, ALMS1P Exon+ve, >2 cases 10.41185185
SEQ ID 118 2 73706727 73766459 59732 loss 1551 NAT8, ALMS1P Exon+ve, >2 cases 10.41185185
SEQ ID 118 2 73706727 73766459 59732 loss 1728 NAT8, ALMS1P Exon+ve, >2 cases 10.41185185
loss 1917 PKD1L2 Exon+ve, >2 cases 10.41185185
gain 1291 HEATR4, C14orf169, ACOT1 Exon+ve, >2 cases 10.41185185
SEQ ID 121 X 2705378 2814330 108952 gain 1509 XG, GYG2 Exon+ve, >2 cases 10.41185185
SEQ ID 121 X 2705378 2814330 108952 gain 1732 XG, GYG2 Exon+ve, >2 cases 10.41185185
SEQ ID 121 X 2705378 2814330 108952 gain 1825 XG, GYG2 Exon+ve, >2 cases 10.41185185
SEQ ID 122 X 2705374 2814330 108956 gain 1434 XG, GYG2 Exon+ve, >2 cases 10.41185185
145826 gain 1459 PKD1L2, BCMO1 Exon+ve, >2 cases 10.41185185
SEQ ID 124 X 2554044 2747802 193758 gain 1917 XGPY2, CD99P1, XG, CD99 Exon+ve, >2 cases 10.41185185
SEQ ID 125 X 2749116 3191663 442547 gain 1917 ARSD, ARSE, ARSF, ARSH, GYG2 Exon+ve, >2 cases 10.41185185
1706224 gain 1567 CLUAP1, NAGPA, CORO7-PAM16, GLIS2, ALG1, ROGDI, SEC14L5, C16orf5, ZNF597, NUDT16L1, GLYR1, LOC40335, UBN1, CORO7, C16orf89, LOC342346, SLX4, TRAP1, DNASE1, PPL, ZNF434, PAM16, ANKS3, FAM100A, NLRC3, MTRNR2L4, C16orf71, VASN, NMRAL1, SRL, NAT15, DNAJA3, TFAP4, ZNF174, ADCY9, HMOX2, C16orf90, ZNF500, SEPT12, MGRN1, CREBBP Exon+ve, >2 cases 10.41185185
loss 1773 SNUPN Exon+ve, >2 cases 8.911242604
SEQ ID 128 15 73443782 73460290 16508 gain 1301 MAN2C1, SIN3A Exon+ve, >2 cases 8.911242604
gain 2018 IMP3, SNX33, SNUPN, CSPG4 Exon+ve, >2 cases 8.911242604
gain 1309 CYP1A1 Exon+ve, >2 cases 8.911242604
loss 1548 ARHGAP21 Exon+ve, >2 cases 8.911242604
loss 1699 ARHGAP21 Exon+ve, >2 cases 8.911242604
loss 1724 ARHGAP21 Exon+ve, >2 cases 8.911242604
loss 1961 ARHGAP21 Exon+ve, >2 cases 8.911242604
gain 1401 ARHGAP21 Exon+ve, >2 cases 8.911242604
gain 1820 ARHGAP21 Exon+ve, >2 cases 8.911242604
gain 1293 NEO1 Exon+ve, >2 cases 8.911242604
SEQ ID 134 6 139638465 139651247 12782 loss 1387 TXLNB Exon+ve, >2 cases 8.911242604
SEQ ID 134 6 139638465 139651247 12782 loss 1396 TXLNB Exon+ve, >2 cases 8.911242604
SEQ ID 134 6 139638465 139651247 12782 loss 1696 TXLNB Exon+ve, >2 cases 8.911242604
SEQ ID 135 6 139635466 139648318 12852 loss 1403 TXLNB Exon+ve, >2 cases 8.911242604
SEQ ID 135 6 139635466 139648318 12852 loss 1895 TXLNB Exon+ve, >2 cases 8.911242604
SEQ ID 136 6 139635466 139651247 15781 loss 1401 TXLNB Exon+ve, >2 cases 8.911242604
SEQ ID 137 7 100166257 100183859 17602 loss 1896 ZAN Exon+ve, >2 cases 8.911242604
gain 1587 LCE1D, LCE1C Exon+ve, >2 cases 8.911242604
gain 1695 LCE1D, LCE1C Exon+ve, >2 cases 8.911242604
SEQ ID 139 7 100162851 100183859 21008 loss 1227 ZAN Exon+ve, >2 cases 8.911242604
SEQ ID 139 7 100162851 100183859 21008 loss 1236 ZAN Exon+ve, >2 cases 8.911242604
SEQ ID 139 7 100162851 100183859 21008 loss 1803 ZAN Exon+ve, >2 cases 8.911242604
SEQ ID 139 7 100162851 100183859 21008 loss 1824 ZAN Exon+ve, >2 cases 8.911242604
SEQ ID 139 7 100162851 100183859 21008 loss 2034 ZAN Exon+ve, >2 cases 8.911242604
SEQ ID 140 1 151028700 151050046 21346 gain 1223 LCE1D, LCE1C Exon+ve, >2 cases 8.911242604
SEQ ID 140 1 151028700 151050046 21346 gain 1664 LCE1D, LCE1C Exon+ve, >2 cases 8.911242604
SEQ ID 140 1 151028700 151050046 21346 gain 1740 LCE1D, LCE1C Exon+ve, >2 cases 8.911242604
SEQ ID 141 1 151026228 151050046 23818 gain 1936 LCE1D, LCE1E, LCE1C Exon+ve, >2 cases 8.911242604
73892403 4300039 loss 1415 PKM2, C15orf59, PPCDC, CELF6, UBL7, HCN4, C15orf39, EDC3, ADPGK, MAN2C1, C15orf34, COX5A, LOXL1, CYP11A1, NPTN, CSK, TBC1D21, MIR631, MIR630, COMMD4, GRAMD2, TMEM202, NEO1, CCDC33, PML, SNX33, PARP6, SIN3A, ULK3, SCAMP5, SCAMP2, ARIH1, SENP8, PTPN9, STRA6, THSD4, SNUPN, RPP25, CPLX3, C15orf60, GOLGA6D, GOLGA6C, GOLGA6B, GOLGA6A, NR2E3, MIR4313, C15orf17, DNM1P35, SEMA7A, LOC283731, IMP3, CYP1A1, CYP1A2, ARID3B, ISLR, CSPG4, HEXA, HIGD2B, CD276, BBS4, STOML1, MPI, ODF3L1, NEIL1, MYO9A, LMAN1L, CLK3, ISLR2 Exon+ve, >2 cases 8.911242604
loss 1665 STARD3 Exon+ve, >2 cases 7.41506647
loss 2045 STARD3 Exon+ve, >2 cases 7.41506647
SEQ ID 145 9 21321182 21330461 9279 loss 1687 KLHL9 Exon+ve, >2 cases 7.41506647
SEQ ID 146 9 21422879 21434788 11909 loss 1777 IFNA1 Exon+ve, >2 cases 7.41506647
SEQ ID 147 10 116949327 116971507 22180 gain 1292 ATRNL1 Exon+ve, >2 cases 7.41506647
SEQ ID 147 10 116949327 116971507 22180 gain 1880 ATRNL1 Exon+ve, >2 cases 7.41506647
SEQ ID 148 9 21245159 21274020 28861 gain 2020 IFNA22P Exon+ve, >2 cases 7.41506647
SEQ ID 149 10 116940096 116971507 31411 gain 1394 ATRNL1 Exon+ve, >2 cases 7.41506647
SEQ ID 149 10 116940096 116971507 31411 gain 1834 ATRNL1 Exon+ve, >2 cases 7.41506647
SEQ ID 149 10 116940096 116971507 31411 gain 1924 ATRNL1 Exon+ve, >2 cases 7.41506647
SEQ ID 150 4 20161068 20161847 779 loss 1426 SLIT2 Exon+ve, >2 cases 7.41506647
SEQ ID 150 4 20161068 20161847 779 loss 1528 SLIT2 Exon+ve, >2 cases 7.41506647
SEQ ID 150 4 20161068 20161847 779 loss 1665 SLIT2 Exon+ve, >2 cases 7.41506647
SEQ ID 150 4 20161068 20161847 779 loss 1667 SLIT2 Exon+ve, >2 cases 7.41506647
loss 1269 SLC38A6 Exon+ve, >2 cases 7.41506647
gain 1281 SLC38A6 Exon+ve, >2 cases 7.41506647
gain 1773 SLC38A6 Exon+ve, >2 cases 7.41506647
SEQ ID 152 X 15463254 15464663 1409 loss 1234 BMX Exon+ve, >2 cases 7.41506647
SEQ ID 152 X 15463254 15464663 1409 loss 1320 BMX Exon+ve, >2 cases 7.41506647
SEQ ID 152 X 15463254 15464663 1409 loss 1822 BMX Exon+ve, >2 cases 7.41506647
SEQ ID 152 X 15463254 15464663 1409 loss 1827 BMX Exon+ve, >2 cases 7.41506647
SEQ ID 152 X 15463254 15464663 1409 loss 1876 BMX Exon+ve, >2 cases 7.41506647
loss 1442 ADAMTS5 Exon+ve, >2 cases 7.41506647
loss 1522 ADAMTS5 Exon+ve, >2 cases 7.41506647
loss 1714 ADAMTS5 Exon+ve, >2 cases 7.41506647
loss 1828 ADAMTS5 Exon+ve, >2 cases 7.41506647
loss 1915 ADAMTS5 Exon+ve, >2 cases 7.41506647
loss 1471 MIR1470, WIZ Exon+ve, >2 cases 7.41506647
loss 1687 MIR1470, WIZ Exon+ve, >2 cases 7.41506647
SEQ ID 154 19 15420954 15422784 1830 loss 1887 MIR1470, WIZ Exon+ve, >2 cases 7.41506647
SEQ ID 155 19 15420382 15422978 2596 loss 1676 MIR1470, WIZ Exon+ve, >2 cases 7.41506647
SEQ ID 156 10 5985730 5988631 2901 loss 2024 FBXO18 Exon+ve, >2 cases 7.41506647
SEQ ID 157 6 159234892 159238587 3695 loss 1419 C6orf99 Exon+ve, >2 cases 7.41506647
SEQ ID 144 17 35069605 35073438 3833 loss 1316 STARD3 Exon+ve, >2 cases 7.41506647
SEQ ID 144 17 35069605 35073438 3833 loss 1318 STARD3 Exon+ve, >2 cases 7.41506647
SEQ ID 144 17 35069605 35073438 3833 loss 1676 STARD3 Exon+ve, >2 cases 7.41506647
SEQ ID 158 4 20157798 20161847 4049 loss 1671 SLIT2 Exon+ve, >2 cases 7.41506647
SEQ ID 159 19 15418682 15422978 4296 loss 1726 MIR1470, WIZ Exon+ve, >2 cases 7.41506647
SEQ ID 160 2 206586117 206590636 4519 gain 1220 INO80D Exon+ve, >2 cases 7.41506647
SEQ ID 161 9 132916080 132921442 5362 loss 1897 LAMC3 Exon+ve, >2 cases 7.41506647
SEQ ID 162 6 105298061 105303833 5772 loss 1426 HACE1 Exon+ve, >2 cases 7.41506647
SEQ ID 162 6 105298061 105303833 5772 loss 1458 HACE1 Exon+ve, >2 cases 7.41506647
SEQ ID 162 6 105298061 105303833 5772 loss 1490 HACE1 Exon+ve, >2 cases 7.41506647
SEQ ID 162 6 105298061 105303833 5772 loss 1492 HACE1 Exon+ve, >2 cases 7.41506647
SEQ ID 163 2 206586117 206592116 5999 gain 1803 INO80D Exon+ve, >2 cases 7.41506647
SEQ ID 163 2 206586117 206592116 5999 gain 1988 INO80D Exon+ve, >2 cases 7.41506647
SEQ ID 163 2 206586117 206592116 5999 gain 2028 INO80D Exon+ve, >2 cases 7.41506647
SEQ ID 164 19 56882602 56889437 6835 loss 1965 MIR99B, MIRLET7E, MIR125A, Exon+ve, >2 cases 7.41506647
SEQ ID 164 19 56882602 56889437 6835 loss 2032 MIR99B, MIRLET7E, MIR125A, Exon+ve, >2 cases 7.41506647
SEQ ID 165 3 64479002 64486008 7006 loss 1428 ADAMTS9 Exon+ve, >2 cases 7.41506647
SEQ ID 165 3 64479002 64486008 7006 loss 1434 ADAMTS9 Exon+ve, >2 cases 7.41506647
SEQ ID 165 3 64479002 64486008 7006 loss 1572 ADAMTS9 Exon+ve, >2 cases 7.41506647
SEQ ID 165 3 64479002 64486008 7006 loss 1592 ADAMTS9 Exon+ve, >2 cases 7.41506647
SEQ ID 165 3 64479002 64486008 7006 loss 1763 ADAMTS9 Exon+ve, >2 cases 7.41506647
SEQ ID 166 2 135704927 135712021 7094 loss 1512 ZRANB3 Exon+ve, >2 cases 7.41506647
SEQ ID 166 2 135704927 135712021 7094 loss 1574 ZRANB3 Exon+ve, >2 cases 7.41506647
SEQ ID 166 2 135704927 135712021 7094 loss 1757 ZRANB3 Exon+ve, >2 cases 7.41506647
SEQ ID 166 2 135704927 135712021 7094 gain 1970 ZRANB3 Exon+ve, >2 cases 7.41506647
SEQ ID 167 19 56881984 56889437 7453 loss 1859 MIR99B, MIRLET7E, MIR125A, Exon+ve, >2 cases 7.41506647
SEQ ID 168 4 74504402 74511880 7478 loss 1373 ALB Exon+ve, >2 cases 7.41506647
SEQ ID 168 4 74504402 74511880 7478 loss 1464 ALB Exon+ve, >2 cases 7.41506647
SEQ ID 168 4 74504402 74511880 7478 loss 1798 ALB Exon+ve, >2 cases 7.41506647
SEQ ID 168 4 74504402 74511880 7478 loss 1959 ALB Exon+ve, >2 cases 7.41506647
SEQ ID 169 9 19775974 19783547 7573 loss 1511 SLC24A2 Exon+ve, >2 cases 7.41506647
SEQ ID 170 2 206584487 206592116 7629 gain 1921 INO80D Exon+ve, >2 cases 7.41506647
SEQ ID 171 10 5985730 5993423 7693 loss 1307 FBXO18 Exon+ve, >2 cases 7.41506647
SEQ ID 171 10 5985730 5993423 7693 loss 1409 FBXO18 Exon+ve, >2 cases 7.41506647
SEQ ID 171 10 5985730 5993423 7693 loss 1619 FBXO18 Exon+ve, >2 cases 7.41506647
loss 1470 SLC38A6 Exon+ve, >2 cases 7.41506647
SEQ ID 172 14 60544757 60553070 8313 loss 2000 SLC38A6 Exon+ve, >2 cases 7.41506647
SEQ ID 173 2 135704927 135713556 8629 gain 1451 ZRANB3 Exon+ve, >2 cases 7.41506647
loss 1232 MIR99B, MIRLET7E, MIR125A, Exon+ve, >2 cases 7.41506647
SEQ ID 175 10 5984217 5993423 9206 loss 1654 FBXO18 Exon+ve, >2 cases 7.41506647
SEQ ID 176 9 132912215 132921442 9227 loss 1345 LAMC3 Exon+ve, >2 cases 7.41506647
SEQ ID 177 6 159234892 159244475 9583 loss 1742 C6orf99 Exon+ve, >2 cases 7.41506647
SEQ ID 177 6 159234892 159244475 9583 loss 1900 C6orf99 Exon+ve, >2 cases 7.41506647
SEQ ID 178 9 132910836 132921442 10606 loss 1621 LAMC3 Exon+ve, >2 cases 7.41506647
SEQ ID 178 9 132910836 132921442 10606 loss 1639 LAMC3 Exon+ve, >2 cases 7.41506647
SEQ ID 179 4 74504402 74515385 10983 loss 1852 ALB Exon+ve, >2 cases 7.41506647
SEQ ID 180 9 132907202 132921442 14240 loss 1720 LAMC3 Exon+ve, >2 cases 7.41506647
loss 1993 MIR99B, MIRLET7E, MIR125A, Exon+ve, >2 cases 7.41506647
SEQ ID 182 6 159184210 159203355 19145 loss 1582 OSTCL Exon+ve, >2 cases 7.41506647
SEQ ID 183 6 105291227 105311034 19807 loss 1500 HACE1 Exon+ve, >2 cases 7.41506647
SEQ ID 184 7 153742206 153792779 50573 loss 1885 DPP6 Exon+ve, >2 cases 7.41506647
SEQ ID 185 6 159190838 159251696 60858 loss 1468 OSTCL, C6orf99 Exon+ve, >2 cases 7.41506647
SEQ ID 186 7 153775546 153845854 70308 loss 1949 DPP6 Exon+ve, >2 cases 7.41506647
SEQ ID 187 7 153134693 153290833 156140 gain 1486 DPP6 Exon+ve, >2 cases 7.41506647
SEQ ID 188 7 153158956 153384745 225789 gain 1755 DPP6 Exon+ve, >2 cases 7.41506647
SEQ ID 189 7 152883490 154689863 1806373 gain 1730 HTR5A, LOC100132707, LOC202781, DPP6, PAXIP1 Exon+ve, >2 cases 7.41506647
4997715 loss 1418 MIR31, ELAVL2, PTPLAD2, CDKN2B-AS1, MIR491, MLLT3, IFNW1, IFNB1, C9orf53, IFNA22P, IFNA13, IFNA10, IFNA17, IFNA16, IFNA14, CDKN2B, CDKN2A, IFNE, SLC24A2, KIAA1797, MTAP, KLHL9, IFNA8, IFNA2, IFNA1, DMRTA1, IFNA7, IFNA6, IFNA5, IFNA4, IFNA21, Exon+ve, >2 cases 7.41506647
SEQ ID 191 6 160246670 160248266 1596 gain 1870 MAS1 Exon+ve, >2 cases 5.923303835
loss 2002 MAP3K9 Exon+ve, >2 cases 5.923303835
gain 1864 ELAVL3 Exon+ve, >2 cases 5.923303835
SEQ ID 194 2 218849164 218852974 3810 gain 2024 PNKD, TMBIM1 Exon+ve, >2 cases 5.923303835
loss 1662 CASC4 Exon+ve, >2 cases 5.923303835
SEQ ID 196 14 102447536 102455572 8036 loss 1800 TRAF3 Exon+ve, >2 cases 5.923303835
loss 1475 PCDH15 Exon+ve, >2 cases 5.923303835
loss 1537 PCDH15 Exon+ve, >2 cases 5.923303835
SEQ ID 198 8 22631429 22641498 10069 loss 1849 PEBP4 Exon+ve, >2 cases 5.923303835
gain 1309 PCDH15 Exon+ve, >2 cases 5.923303835
SEQ ID 200 6 134622620 134635779 13159 loss 1708 SGK1 Exon+ve, >2 cases 5.923303835
SEQ ID 201 7 45079997 45096030 16033 loss 1907 NACAD, CCM2 Exon+ve, >2 cases 5.923303835
gain 1717 GRAP, SLC5A10, FAM83G Exon+ve, >2 cases 5.923303835
loss 1919 BASP1P1 Exon+ve, >2 cases 5.923303835
589618 gain 1695 HGSNAT, FNTA, POTEA, SGK196 Exon+ve, >2 cases 5.923303835
loss 1225 SLCO1B3 Exon+ve, >2 cases 5.923303835
loss 1577 SLCO1B3 Exon+ve, >2 cases 5.923303835
loss 1581 SLCO1B3 Exon+ve, >2 cases 5.923303835
SEQ ID 206 1 91632025 91632374 349 loss 1582 HFM1 Exon+ve, >2 cases 5.923303835
loss 1687 HFM1 Exon+ve, >2 cases 5.923303835
SEQ ID 206 1 91632025 91632374 349 loss 1929 HFM1 Exon+ve, >2 cases 5.923303835
SEQ ID 206 1 91632025 91632374 349 loss 2045 HFM1 Exon+ve, >2 cases 5.923303835
SEQ ID 207 6 160247865 160248266 401 gain 1242 MAS1 Exon+ve, >2 cases 5.923303835
SEQ ID 208 1 94115122 94116506 1384 loss 1782 DNTTIP2 Exon+ve, >2 cases 5.923303835
loss 1910 MAP3K9 Exon+ve, >2 cases 5.923303835
loss 2001 MAP3K9 Exon+ve, >2 cases 5.923303835
SEQ ID 210 2 201713188 201714627 1439 gain 1344 CFLAR Exon+ve, >2 cases 5.923303835
SEQ ID 210 2 201713188 201714627 1439 gain 1824 CFLAR Exon+ve, >2 cases 5.923303835
SEQ ID 210 2 201713188 201714627 1439 gain 1841 CFLAR Exon+ve, >2 cases 5.923303835
SEQ ID 210 2 201713188 201714627 1439 gain 1927 CFLAR Exon+ve, >2 cases 5.923303835
gain 1637 ELAVL3 Exon+ve, >2 cases 5.923303835
SEQ ID 212 1 3752549 3754045 1496 loss 1426 KIAA0562 Exon+ve, >2 cases 5.923303835
SEQ ID 212 1 3752549 3754045 1496 loss 1439 KIAA0562 Exon+ve, >2 cases 5.923303835
SEQ ID 212 1 3752549 3754045 1496 loss 1441 KIAA0562 Exon+ve, >2 cases 5.923303835
SEQ ID 212 1 3752549 3754045 1496 loss 1912 KIAA0562 Exon+ve, >2 cases 5.923303835
SEQ ID 191 6 160246670 160248266 1596 gain 1571 MAS1 Exon+ve, >2 cases 5.923303835
loss 1488 SLCO1B3 Exon+ve, >2 cases 5.923303835
SEQ ID 214 1 94113132 94115122 1990 loss 1904 DNTTIP2 Exon+ve, >2 cases 5.923303835
SEQ ID 215 7 147734925 147737360 2435 loss 1346 CNTNAP2 Exon+ve, >2 cases 5.923303835
SEQ ID 215 7 147734925 147737360 2435 loss 1403 CNTNAP2 Exon+ve, >2 cases 5.923303835
SEQ ID 215 7 147734925 147737360 2435 loss 1988 CNTNAP2 Exon+ve, >2 cases 5.923303835
gain 1309 LOC400456 Exon+ve, >2 cases 5.923303835
gain 1825 LOC400456 Exon+ve, >2 cases 5.923303835
gain 1837 LOC400456 Exon+ve, >2 cases 5.923303835
loss 1386 C9orf93 Exon+ve, >2 cases 5.923303835
SEQ ID 217 9 15655922 15658483 2561 loss 1477 C9orf93 Exon+ve, >2 cases 5.923303835
loss 1594 C9orf93 Exon+ve, >2 cases 5.923303835
loss 1881 C9orf93 Exon+ve, >2 cases 5.923303835
loss 1314 MAP3K9 Exon+ve, >2 cases 5.923303835
SEQ ID 218 1 94113132 94116506 3374 loss 1802 DNTTIP2 Exon+ve, >2 cases 5.923303835
gain 1780 ELAVL3 Exon+ve, >2 cases 5.923303835
gain 1788 ELAVL3 Exon+ve, >2 cases 5.923303835
SEQ ID 219 2 218971708 218975318 3610 loss 1913 CTDSP1 Exon+ve, >2 cases 5.923303835
SEQ ID 194 2 218849164 218852974 3810 gain 1284 PNKD, TMBIM1 Exon+ve, >2 cases 5.923303835
SEQ ID 194 2 218849164 218852974 3810 gain 1728 PNKD, TMBIM1 Exon+ve, >2 cases 5.923303835
SEQ ID 220 2 214582921 214586936 4015 loss 1512 SPAG16 Exon+ve, >2 cases 5.923303835
SEQ ID 221 6 29653815 29658113 4298 loss 1275 SNORD32B Exon+ve, >2 cases 5.923303835
SEQ ID 221 6 29653815 29658113 4298 loss 1862 SNORD32B Exon+ve, >2 cases 5.923303835
SEQ ID 222 1 94113132 94117960 4828 loss 1233 DNTTIP2 Exon+ve, >2 cases 5.923303835
SEQ ID 223 2 218972428 218978243 5815 loss 1718 MIR26B, CTDSP1 Exon+ve, >2 cases 5.923303835
SEQ ID 224 6 29653815 29659892 6077 loss 1440 SNORD32B Exon+ve, >2 cases 5.923303835
SEQ ID 224 6 29653815 29659892 6077 loss 1750 SNORD32B Exon+ve, >2 cases 5.923303835
SEQ ID 225 8 43288182 43294454 6272 loss 1549 POTEA Exon+ve, >2 cases 5.923303835
SEQ ID 226 17 57329783 57336509 6726 loss 1784 INTS2 Exon+ve, >2 cases 5.923303835
loss 1227 FUT2 Exon+ve, >2 cases 5.923303835
loss 1448 FUT2 Exon+ve, >2 cases 5.923303835
SEQ ID 228 2 218844854 218852974 8120 gain 1660 PNKD, TMBIM1 Exon+ve, >2 cases 5.923303835
SEQ ID 229 14 102447174 102455572 8398 loss 1820 TRAF3 Exon+ve, >2 cases 5.923303835
SEQ ID 230 14 102401445 102409996 8551 gain 1838 TRAF3 Exon+ve, >2 cases 5.923303835
loss 1439 INTS2 Exon+ve, >2 cases 5.923303835
SEQ ID 231 17 57327446 57336509 9063 loss 1601 INTS2 Exon+ve, >2 cases 5.923303835
loss 1697 FUT2 Exon+ve, >2 cases 5.923303835
loss 1641 INTS2 Exon+ve, >2 cases 5.923303835
SEQ ID 234 1 226061846 226072012 10166 loss 1371 PRSS38 Exon+ve, >2 cases 5.923303835
SEQ ID 234 1 226061846 226072012 10166 loss 1653 PRSS38 Exon+ve, >2 cases 5.923303835
loss 1694 FUT2 Exon+ve, >2 cases 5.923303835
SEQ ID 236 X 8463131 8473482 10351 loss 1298 KAL1 Exon+ve, >2 cases 5.923303835
SEQ ID 236 X 8463131 8473482 10351 loss 1432 KAL1 Exon+ve, >2 cases 5.923303835
SEQ ID 237 2 218967950 218978839 10889 loss 1721 MIR26B, CTDSP1, SLC11A1 Exon+ve, >2 cases 5.923303835
SEQ ID 237 2 218967950 218978839 10889 loss 1993 MIR26B, CTDSP1, SLC11A1 Exon+ve, >2 cases 5.923303835
SEQ ID 238 6 134624093 134635779 11686 loss 1576 SGK1 Exon+ve, >2 cases 5.923303835
SEQ ID 238 6 134624093 134635779 11686 loss 1667 SGK1 Exon+ve, >2 cases 5.923303835
SEQ ID 239 8 22629771 22641498 11727 loss 1293 PEBP4 Exon+ve, >2 cases 5.923303835
SEQ ID 239 8 22629771 22641498 11727 loss 1296 PEBP4 Exon+ve, >2 cases 5.923303835
SEQ ID 239 8 22629771 22641498 11727 loss 1842 PEBP4 Exon+ve, >2 cases 5.923303835
SEQ ID 200 6 134622620 134635779 13159 loss 1224 SGK1 Exon+ve, >2 cases 5.923303835
SEQ ID 240 1 179250547 179263983 13436 loss 1950 STX6 Exon+ve, >2 cases 5.923303835
SEQ ID 241 1 226061846 226075375 13529 loss 1234 PRSS38 Exon+ve, >2 cases 5.923303835
loss 1659 CASC4 Exon+ve, >2 cases 5.923303835
SEQ ID 243 2 213922938 213938010 15072 loss 1870 SPAG16 Exon+ve, distinct CNVs, same Gene 5.923303835
SEQ ID 244 1 179248755 179263983 15228 loss 1662 STX6 Exon+ve, >2 cases 5.923303835
SEQ ID 201 7 45079997 45096030 16033 loss 1642 NACAD, CCM2 Exon+ve, >2 cases 5.923303835
SEQ ID 201 7 45079997 45096030 16033 loss 1819 NACAD, CCM2 Exon+ve, >2 cases 5.923303835
SEQ ID 201 7 45079997 45096030 16033 loss 1825 NACAD, CCM2 Exon+ve, >2 cases 5.923303835
SEQ ID 245 2 214582921 214599105 16184 loss 1636 SPAG16 Exon+ve, >2 cases 5.923303835
SEQ ID 246 1 179250547 179269450 18903 loss 1638 MR1, STX6 Exon+ve, >2 cases 5.923303835
loss 1638 CASC4 Exon+ve, >2 cases 5.923303835
SEQ ID 248 2 213900382 213922938 22556 loss 1832 SPAG16 Exon+ve, distinct CNVs, same Gene 5.923303835
SEQ ID 249 1 179250547 179274160 23613 loss 1659 MR1, STX6 Exon+ve, >2 cases 5.923303835
gain 1841 LOC400456 Exon+ve, >2 cases 5.923303835
SEQ ID 251 1 226061846 226091036 29190 loss 1344 PRSS38 Exon+ve, >2 cases 5.923303835
loss 1660 CASC4 Exon+ve, >2 cases 5.923303835
SEQ ID 253 11 5848930 5892024 43094 gain 1593 OR52E4 Exon+ve, >2 cases 5.923303835
gain 1920 OR52E4 Exon+ve, >2 cases 5.923303835
SEQ ID 254 11 5839924 5892024 52100 gain 1333 OR52E4 Exon+ve, >2 cases 5.923303835
SEQ ID 255 11 5848930 5902760 53830 gain 1301 OR52E4 Exon+ve, >2 cases 5.923303835
loss 1714 BASP1P1 Exon+ve, >2 cases 5.923303835
gain 1596 SLC5A10, FAM83G, PRPSAP2 Exon+ve, >2 cases 5.923303835
SEQ ID 257 6 160237631 160371016 133385 gain 1574 IGF2R, MAS1 Exon+ve, >2 cases 5.923303835
153706 gain 1662 BASP1P1 Exon+ve, >2 cases 5.923303835
SEQ ID 259 X 8397974 8677639 279665 gain 1566 KAL1 Exon+ve, >2 cases 5.923303835
383428 gain 1744 BASP1P1 Exon+ve, >2 cases 5.923303835
SEQ ID 261 X 8397974 8790795 392821 gain 1901 KAL1, FAM9A Exon+ve, >2 cases 5.923303835
-476825 gain 1316 HGSNAT, POTEA Exon+ve, >2 cases 5.923303835 r.) ui .1, ca a SEQ ID 204 8 43057445 43647063 589618 gain 1406 HGSNAT, FNTA, POTEA, SGK196 Exon+ve, 22 cases 5.923303835 1976322 gain 1429 MTRNR2L5, PCDH15 Exon+ve, >2 cases 5.923303835 SEQ ID 264 14 102008576 105330913 3322337 gain 1447 BAG5,SNORA28,TRMT61A,EIF5, Exon+ve, 22 cases 5.923303835 t-) =
MIR4309,RCORLEXOC3L4,TME
-, w , M179,XRCC3,LOC100131366.INF2 -, t..) =
,ASPG,AMN,CKB,SIVA1.AN(RD9 ,MIR203,CDC42BPB,MARA(3,JAG
ao 2,C14orf153,L0C647310,MTA1,TD
RD9,TRAFITMEM121,CDCA4,TE
CPR2,KIF26A,NUDT14,AHNAK2, MGC23270,ADSSL1,BRF1,C14orfl 80,PAC S2, Cl4orf79,PLD4,ZFYVE2 1,AKT1,C14orf80,KIAA0284,TNFA
IP2,ZBTB42.PPP1R13B,GPR132,C1 4orf2,KLC ETBD6,CRIP1,CRIP2 P
SEQ ID 265 2 1469952 1472562 2610 loss 1564 TPO Exon+ve, >2 cases 4.435935199
SEQ ID 265 2 1469952 1472562 2610 loss 1639 TPO Exon+ve, >2 cases 4.435935199
SEQ ID 266 X 70057266 70062203 4937 gain 1346 SLC7A3 Exon+ve, >2 cases 4.435935199
loss 1395 BRD7 Exon+ve, >2 cases 4.435935199
loss 1409 BRD7 Exon+ve, >2 cases 4.435935199
loss 1428 BRD7 Exon+ve, >2 cases 4.435935199
loss 1995 ZIM3 Exon+ve, >2 cases 4.435935199
loss 1996 ZIM3 Exon+ve, >2 cases 4.435935199
SEQ ID 269 X 46832380 46837814 5434 loss 1675 RGN Exon+ve, >2 cases 4.435935199
SEQ ID 269 X 46832380 46837814 5434 gain 1896 RGN Exon+ve, >2 cases 4.435935199
SEQ ID 269 X 46832380 46837814 5434 gain 2040 RGN Exon+ve, >2 cases 4.435935199
SEQ ID 270 X 128775325 128780946 5621 gain 1459 ZDHHC9 Exon+ve, >2 cases 4.435935199
SEQ ID 271 X 123691710 123698719 7009 loss 1421 ODZ1 Exon+ve, >2 cases 4.435935199
SEQ ID 271 X 123691710 123698719 7009 loss 1428 ODZ1 Exon+ve, >2 cases 4.435935199
SEQ ID 271 X 123691710 123698719 7009 loss 1805 ODZ1 Exon+ve, >2 cases 4.435935199
SEQ ID 272 X 100665462 100673058 7596 gain 1269 ARMCX4 Exon+ve, >2 cases 4.435935199
SEQ ID 272 X 100665462 100673058 7596 gain 1857 ARMCX4 Exon+ve, >2 cases 4.435935199
loss 1901 GRIN2D Exon+ve, >2 cases 4.435935199
loss 1959 GRIN2D Exon+ve, >2 cases 4.435935199
SEQ ID 274 X 128772381 128782290 9909 gain 1824 ZDHHC9 Exon+ve, >2 cases 4.435935199
SEQ ID 275 X 70051128 70062203 11075 gain 1308 SLC7A3 Exon+ve, >2 cases 4.435935199
SEQ ID 276 X 70049036 70062203 13167 gain 1284 SLC7A3 Exon+ve, >2 cases 4.435935199
loss 1671 KDELR1, GRIN2D Exon+ve, >2 cases 4.435935199
SEQ ID 278 X 128768758 128782290 13532 gain 1806 ZDHHC9 Exon+ve, >2 cases 4.435935199
SEQ ID 279 X 100658130 100673058 14928 loss 1413 ARMCX4 Exon+ve, >2 cases 4.435935199
gain 1541 RPSAP58 Exon+ve, >2 cases 4.435935199
gain 1608 RPSAP58 Exon+ve, >2 cases 4.435935199
loss 1805 MICAL3 Exon+ve, >2 cases 4.435935199
loss 1780 MICAL3 Exon+ve, >2 cases 4.435935199
loss 2034 MICAL3 Exon+ve, >2 cases 4.435935199
gain 1783 RPSAP58 Exon+ve, >2 cases 4.435935199
gain 1879 TMEM231, CHST5 Exon+ve, >2 cases 4.435935199
gain 2032 TMEM231, CHST5 Exon+ve, >2 cases 4.435935199
gain 1993 TMEM231, CHST5 Exon+ve, >2 cases 4.435935199
842889 loss 1461 TRAPPC2P1, ZNF835, USP29, ZNF17, ZNF71, ZNF749, ZNF264, LOC147670, VN1R1, AURKC, PEG3-AS1, ZIM2, ZIM3, ZNF304, ZNF805, ZNF547, ZNF543, MIMT1, ZNF460, DUXA, ZNF548 Exon+ve, >2 cases 4.435935199
SEQ ID 289 9 98831789 98831814 25 gain 1629 CTSL2 Exon+ve, >2 cases 4.435935199
SEQ ID 289 9 98831789 98831814 25 loss 1715 CTSL2 Exon+ve, >2 cases 4.435935199
SEQ ID 289 9 98831789 98831814 25 loss 1718 CTSL2 Exon+ve, >2 cases 4.435935199
SEQ ID 290 X 12833576 12834706 1130 loss 1633 TLR8, LOC349408 Exon+ve, >2 cases 4.435935199
SEQ ID 290 X 12833576 12834706 1130 loss 1901 TLR8, LOC349408 Exon+ve, >2 cases 4.435935199
SEQ ID 290 X 12833576 12834706 1130 loss 2024 TLR8, LOC349408 Exon+ve, >2 cases 4.435935199
SEQ ID 291 1 22787161 22788440 1279 loss 1278 EPHA8 Exon+ve, >2 cases 4.435935199
loss 1687 EPHA8 Exon+ve, >2 cases 4.435935199
SEQ ID 291 1 22787161 22788440 1279 loss 1895 EPHA8 Exon+ve, >2 cases 4.435935199
SEQ ID 292 6 149109599 149110881 1282 loss 1369 UST Exon+ve, >2 cases 4.435935199
SEQ ID 292 6 149109599 149110881 1282 loss 1645 UST Exon+ve, >2 cases 4.435935199
SEQ ID 293 4 47358255 47359575 1320 gain 1658 CORIN Exon+ve, >2 cases 4.435935199
loss 1656 TGFBR3 Exon+ve, >2 cases 4.435935199
SEQ ID 294 1 91946409 91948225 1816 loss 2043 TGFBR3 Exon+ve, >2 cases 4.435935199
loss 1536 EPSTI1 Exon+ve, distinct CNVs, same Gene 4.435935199
SEQ ID 296 6 146912375 146914496 2121 loss 1291 RAB32 Exon+ve, >2 cases 4.435935199
SEQ ID 296 6 146912375 146914496 2121 loss 1309 RAB32 Exon+ve, >2 cases 4.435935199
SEQ ID 297 3 9720244 9722646 2402 gain 1264 CPNE9 Exon+ve, >2 cases 4.435935199
SEQ ID 297 3 9720244 9722646 2402 gain 1587 CPNE9 Exon+ve, >2 cases 4.435935199
SEQ ID 297 3 9720244 9722646 2402 gain 1618 CPNE9 Exon+ve, >2 cases 4.435935199
loss 1226 C14orf166 Exon+ve, >2 cases 4.435935199
loss 1253 C14orf166 Exon+ve, >2 cases 4.435935199
loss 1650 C14orf166 Exon+ve, >2 cases 4.435935199
loss 1544 ALDH1A3 Exon+ve, >2 cases 4.435935199
loss 1626 ALDH1A3 Exon+ve, >2 cases 4.435935199
gain 1644 ALDH1A3 Exon+ve, >2 cases 4.435935199
loss 1738 KIF7 Exon+ve, >2 cases 4.435935199
SEQ ID 265 2 1469952 1472562 2610 loss 1510 TPO Exon+ve, >2 cases 4.435935199
loss 1966 CACNG8 Exon+ve, >2 cases 4.435935199
SEQ ID 302 5 90081197 90084436 3239 gain 1489 GPR98 Exon+ve, >2 cases 4.435935199
SEQ ID 303 2 106174179 106177686 3507 loss 1505 UXS1 Exon+ve, >2 cases 4.435935199
SEQ ID 303 2 106174179 106177686 3507 loss 1611 UXS1 Exon+ve, >2 cases 4.435935199
SEQ ID 304 4 47358255 47361851 3596 gain 1252 CORIN Exon+ve, >2 cases 4.435935199
SEQ ID 305 3 33868917 33873484 4567 loss 1259 PDCD6IP Exon+ve, >2 cases 4.435935199
SEQ ID 305 3 33868917 33873484 4567 loss 1274 PDCD6IP Exon+ve, >2 cases 4.435935199
SEQ ID 305 3 33868917 33873484 4567 loss 1724 PDCD6IP Exon+ve, >2 cases 4.435935199
loss 1953 CACNG8 Exon+ve, >2 cases 4.435935199
SEQ ID 307 2 43857496 43862163 4667 loss 1688 DYNC2LI1 Exon+ve, >2 cases 4.435935199
loss 1786 DYNC2LI1 Exon+ve, >2 cases 4.435935199
SEQ ID 307 2 43857496 43862163 4667 loss 1790 DYNC2LI1 Exon+ve, >2 cases 4.435935199
loss 1970 CTNNA3 Exon+ve, distinct CNVs, same Gene 4.435935199
loss 1317 KIF7 Exon+ve, >2 cases 4.435935199
loss 1720 CACNG8 Exon+ve, >2 cases 4.435935199
SEQ ID 311 6 146908491 146914496 6005 loss 1535 RAB32 Exon+ve, >2 cases 4.435935199
SEQ ID 312 7 99028753 99035131 6378 gain 1411 LOC100289187 Exon+ve, >2 cases 4.435935199
SEQ ID 312 7 99028753 99035131 6378 gain 1755 LOC100289187 Exon+ve, >2 cases 4.435935199
SEQ ID 313 7 99028753 99037212 8459 gain 1799 LOC100289187 Exon+ve, >2 cases 4.435935199
SEQ ID 314 3 197848634 197857567 8933 loss 1285 LRRC33 Exon+ve, >2 cases 4.435935199
SEQ ID 315 3 197276556 197285789 9233 gain 1565 TFRC Exon+ve, >2 cases 4.435935199
loss 1333 ZNF878 Exon+ve, >2 cases 4.435935199
loss 1391 ZNF878 Exon+ve, >2 cases 4.435935199
loss 1742 ZNF878 Exon+ve, >2 cases 4.435935199
SEQ ID 317 9 73771180 73780717 9537 gain 1793 C9orf85 Exon+ve, >2 cases 4.435935199
SEQ ID 317 9 73771180 73780717 9537 gain 1883 C9orf85 Exon+ve, >2 cases 4.435935199
loss 1918 FA2H Exon+ve, >2 cases 4.435935199
SEQ ID 319 9 73771087 73780717 9630 gain 1893 C9orf85 Exon+ve, >2 cases 4.435935199
SEQ ID 320 3 58161589 58171419 9830 gain 1267 DNASE1L3 Exon+ve, >2 cases 4.435935199
SEQ ID 320 3 58161589 58171419 9830 gain 1268 DNASE1L3 Exon+ve, >2 cases 4.435935199
SEQ ID 320 3 58161589 58171419 9830 gain 1354 DNASE1L3 Exon+ve, >2 cases 4.435935199
SEQ ID 321 2 106174179 106184290 10111 loss 1697 UXS1 Exon+ve, >2 cases 4.435935199
SEQ ID 322 3 197848634 197859317 10683 loss 1909 LRRC33 Exon+ve, >2 cases 4.435935199
loss 1293 FA2H Exon+ve, >2 cases 4.435935199
loss 1297 FA2H Exon+ve, >2 cases 4.435935199
SEQ ID 324 3 197846987 197859317 12330 loss 2030 LRRC33 Exon+ve, >2 cases 4.435935199
gain 1946 VWA3A Exon+ve, >2 cases 4.435935199
gain 1962 VWA3A Exon+ve, >2 cases 4.435935199
SEQ ID 326 16 3047597 3065241 17644 loss 1585 MMP25, IL32 Exon+ve, >2 cases 4.435935199
SEQ ID 326 16 3047597 3065241 17644 loss 1919 MMP25, IL32 Exon+ve, >2 cases 4.435935199
SEQ ID 327 4 47314693 47335844 21151 loss 1308 CORIN Exon+ve, distinct CNVs, same Gene 4.435935199
SEQ ID 328 16 3044051 3065241 21190 loss 1804 MMP25, IL32 Exon+ve, >2 cases 4.435935199
gain 1299 ZNF37BP Exon+ve, >2 cases 4.435935199
SEQ ID 330 17 6673256 6695979 22723 gain 1600 TEKT1 Exon+ve, >2 cases 4.435935199
SEQ ID 331 6 149098235 149121186 22951 loss 1660 UST Exon+ve, >2 cases 4.435935199
SEQ ID 332 9 116122595 116146858 24263 loss 1301 ORM1, ORM2, AKNA Exon+ve, >2 cases 4.435935199
SEQ ID 333 9 5632749 5660083 27334 gain 1463 KIAA1432 Exon+ve, >2 cases 4.435935199
SEQ ID 334 9 5634019 5661740 27721 gain 1818 KIAA1432 Exon+ve, >2 cases 4.435935199
SEQ ID 335 3 48583014 48611409 28395 loss 1428 MIR711, COL7A1, UQCRC1 Exon+ve, >2 cases 4.435935199
SEQ ID 336 9 5632749 5661740 28991 gain 1667 KIAA1432 Exon+ve, >2 cases 4.435935199
SEQ ID 337 9 79037727 79067111 29384 gain 1782 VPS13A Exon+ve, >2 cases 4.435935199
SEQ ID 337 9 79037727 79067111 29384 gain 1897 VPS13A Exon+ve, >2 cases 4.435935199
SEQ ID 337 9 79037727 79067111 29384 gain 1938 VPS13A Exon+ve, >2 cases 4.435935199
gain 1502 EPSTI1 Exon+ve, >2 cases 4.435935199
SEQ ID 339 9 116088109 116142499 54390 gain 1406 COL27A1, ORM1, ORM2, AKNA Exon+ve, >2 cases 4.435935199
SEQ ID 340 9 116088109 116144225 56116 gain 2020 COL27A1, ORM1, ORM2, AKNA Exon+ve, >2 cases 4.435935199
gain 1780 CTNNA3 Exon+ve, distinct CNVs, same Gene 4.435935199
loss 2035 TMEM89, COL7A1, CELSR3, MIR711, SLC26A6, UCN2 Exon+ve, >2 cases 4.435935199
100316 gain 1548 KIF7, C15orf42 Exon+ve, >2 cases 4.435935199
101846 loss 1969 TMEM89, COL7A1, CELSR3, MIR711, SLC26A6, UCN2 Exon+ve, >2 cases 4.435935199
104764 loss 1600 ALOX12P2 Exon+ve, >2 cases 4.435935199
SEQ ID 346 4 191041481 191153613 112132 gain 1230 FRG1, TUBB4Q Exon+ve, >2 cases 4.435935199
SEQ ID 346 4 191041481 191153613 112132 gain 1292 FRG1, TUBB4Q Exon+ve, >2 cases 4.435935199
SEQ ID 347 3 197289125 197410852 121727 gain 1565 LOC401109, TFRC, ZDHHC19 Exon+ve, >2 cases 4.435935199
124384 loss 1835 CTNNA3 Exon+ve, distinct CNVs, same Gene 4.435935199
SEQ ID 349 4 190982421 191133609 151188 gain 1411 FRG1 Exon+ve, >2 cases 4.435935199
174310 loss 1927 TEKT1, ALOX12P2, XAF1 Exon+ve, >2 cases 4.435935199
178127 gain 1405 TGFBR3 Exon+ve, >2 cases 4.435935199
314645 gain 1897 ENOX1, DNAJC15, EPSTI1 Exon+ve, >2 cases 4.435935199
SEQ ID 353 1 144099302 144458571 359269 loss 1874 RNF115, RBM8A, GNRHR2, CD160, HFE2, ANKRD34A, LIX1L, POLR3GL, ANKRD35, ITGA10, PEX11B, NUDT17, TXNIP, PDZK1, POLR3C, PIAS3 Exon+ve, >2 cases 4.435935199
SEQ ID 354 3 197135314 197531031 395717 gain 1227 PCYT1A, TM4SF19-TCTEX1D2, ZDHHC19, OSTalpha, TFRC, LOC401109, TCTEX1D2, SDHAP1 Exon+ve, >2 cases 4.435935199
SEQ ID 355 1 144099302 144544352 445050 gain 1599 RNF115, GPR89A, RBM8A, GNRHR2, CD160, HFE2, ANKRD34A, LIX1L, POLR3GL, ANKRD35, ITGA10, PEX11B, NUDT17, TXNIP, PDZK1, POLR3C, PIAS3 Exon+ve, >2 cases 4.435935199
SEQ ID 355 1 144099302 144544352 445050 gain 1968 RNF115, GPR89A, RBM8A, GNRHR2, CD160, HFE2, ANKRD34A, LIX1L, POLR3GL, ANKRD35, ITGA10, PEX11B, NUDT17, TXNIP, PDZK1, POLR3C, PIAS3 Exon+ve, >2 cases 4.435935199
22338034 479154 gain 1426 EEF2K, CDR2, POLR3E, C16orf52, UQCRC2, PDZD9, VWA3A Exon+ve, >2 cases 4.435935199
SEQ ID 357 5 89477991 90142704 664713 gain 1786 LYSMD3, POLR3G, CETN3, MBLAC2, GPR98 Exon+ve, >2 cases 4.435935199
SEQ ID 357 5 89477991 90142704 664713 gain 1886 LYSMD3, POLR3G, CETN3, MBLAC2, GPR98 Exon+ve, >2 cases 4.435935199
676222 gain 1968 RASGEF1A, BMS1, ZNF48713, FXYD4, RET, CSGALNACT2, HNRNPF Exon+ve, >2 cases 4.435935199
1078030 gain 1746 RASGEF1A, BMS1, ZNF37BP, RET, LOC441666, ZNF33B, LOC84856, CSGALNACT2 Exon+ve, >2 cases 4.435935199
SEQ ID 360 4 149047165 149047423 258 loss 1498 ARHGAP10 Exon+ve, >2 cases 2.952941176
SEQ ID 360 4 149047165 149047423 258 loss 1916 ARHGAP10 Exon+ve, >2 cases 2.952941176
loss 1349 CEP57 Exon+ve, >2 cases 2.952941176
loss 1946 CEP57 Exon+ve, >2 cases 2.952941176
gain 1660 GRAMD4 Exon+ve, >2 cases 2.952941176
SEQ ID 362 22 45453176 45454102 926 gain 1880 GRAMD4 Exon+ve, >2 cases 2.952941176
SEQ ID 363 X 13695016 13696059 1043 gain 1590 OFD1 Exon+ve, >2 cases 2.952941176
gain 1790 SLC25A29 Exon+ve, distinct CNVs, same Gene 2.952941176
SEQ ID 365 1 206023028 206024152 1124 loss 1724 CD46 Exon+ve, >2 cases 2.952941176
SEQ ID 366 8 42134084 42135245 1161 loss 1251 AP3M2 Exon+ve, distinct CNVs, same Gene 2.952941176
gain 1585 MIR516B2 Exon+ve, >2 cases 2.952941176
SEQ ID 368 1 156784465 156785660 1195 loss 1877 OR6Y1 Exon+ve, >2 cases 2.952941176
SEQ ID 369 4 56070868 56072259 1391 loss 1529 CLOCK Exon+ve, >2 cases 2.952941176
SEQ ID 370 X 13673158 13674550 1392 loss 1320 OFD1 Exon+ve, >2 cases 2.952941176
SEQ ID 371 2 179837050 179838443 1393 loss 1727 SESTD1 Exon+ve, >2 cases 2.952941176
loss 1774 FER1L4 Exon+ve, >2 cases 2.952941176
loss 1705 SLC25A29 Exon+ve, distinct CNVs, same Gene 2.952941176
SEQ ID 374 X 40940810 40942301 1491 loss 1583 USP9X Exon+ve, >2 cases 2.952941176
SEQ ID 375 12 9777077 9778598 1521 loss 1264 CLECL1 Exon+ve, >2 cases 2.952941176
SEQ ID 375 12 9777077 9778598 1521 loss 1705 CLECL1 Exon+ve, >2 cases 2.952941176
loss 1295 XPO6 Exon+ve, >2 cases 2.952941176
loss 1917 XPO6 Exon+ve, >2 cases 2.952941176
SEQ ID 377 3 155353325 155355022 1697 gain 1371 ARHGEF26 Exon+ve, distinct CNVs, same Gene 2.952941176
gain 1417 TRIO Exon+ve, distinct CNVs, same Gene 2.952941176
loss 2001 ANO5 Exon+ve, >2 cases 2.952941176
SEQ ID 380 11 125808845 125810734 1889 gain 1861 KIRREL3 Exon+ve, >2 cases 2.952941176
SEQ ID 381 2 30306530 30308506 1976 loss 1429 LBH Exon+ve, >2 cases 2.952941176
SEQ ID 381 2 30306530 30308506 1976 loss 1884 LBH Exon+ve, >2 cases 2.952941176
SEQ ID 382 X 29595687 29597689 2002 loss 1506 IL1RAPL1 Exon+ve, >2 cases 2.952941176
SEQ ID 382 X 29595687 29597689 2002 loss 1811 IL1RAPL1 Exon+ve, >2 cases 2.952941176
SEQ ID 383 11 127895094 127897121 2027 gain 1429 ETS1 Exon+ve, >2 cases 2.952941176
SEQ ID 383 11 127895094 127897121 2027 gain 1779 ETS1 Exon+ve, >2 cases 2.952941176
SEQ ID 384 X 105750701 105752733 2032 loss 1239 CXorf57 Exon+ve, >2 cases 2.952941176
SEQ ID 384 X 105750701 105752733 2032 loss 1372 CXorf57 Exon+ve, >2 cases 2.952941176
loss 1775 HECTD1 Exon+ve, distinct CNVs, same Gene 2.952941176
SEQ ID 386 2 106784966 106787143 2177 loss 1592 ST6GAL2 Exon+ve, >2 cases 2.952941176
SEQ ID 386 2 106784966 106787143 2177 loss 1720 ST6GAL2 Exon+ve, >2 cases 2.952941176
loss 1241 COMMD7 Exon+ve, >2 cases 2.952941176
gain 1877 FANCA Exon+ve, distinct CNVs, same Gene 2.952941176
SEQ ID 389 12 21514182 21516409 2227 gain 1465 RECQL, PYROXD1 Exon+ve, >2 cases 2.952941176
gain 1925 RECQL, PYROXD1 Exon+ve, >2 cases 2.952941176
gain 1524 CSDAP1 Exon+ve, >2 cases 2.952941176
SEQ ID 391 3 155389583 155391992 2409 gain 1446 ARHGEF26 Exon+ve, distinct CNVs, same Gene 2.952941176
loss 1419 FER1L4 Exon+ve, >2 cases 2.952941176
SEQ ID 393 X 137525298 137527811 2513 gain 1223 LOC158696 Exon+ve, >2 cases 2.952941176
SEQ ID 393 X 137525298 137527811 2513 gain 2041 LOC158696 Exon+ve, >2 cases 2.952941176
SEQ ID 394 7 6004111 6006782 2671 gain 1266 PMS2 Exon+ve, >2 cases 2.952941176
SEQ ID 394 7 6004111 6006782 2671 gain 1938 PMS2 Exon+ve, >2 cases 2.952941176
SEQ ID 395 1 93492660 93495455 2795 gain 1832 CCDC18 Exon+ve, >2 cases 2.952941176
SEQ ID 395 1 93492660 93495455 2795 gain 2032 CCDC18 Exon+ve, >2 cases 2.952941176
SEQ ID 396 2 44403707 44406514 2807 gain 1826 PREPL Exon+ve, >2 cases 2.952941176
SEQ ID 397 1 156784465 156787318 2853 loss 1858 OR6Y1 Exon+ve, >2 cases 2.952941176
gain 1642 HOMEZ Exon+ve, >2 cases 2.952941176
gain 1875 HOMEZ Exon+ve, >2 cases 2.952941176
loss 1630 UBR1 Exon+ve, >2 cases 2.952941176
loss 2018 UBR1 Exon+ve, >2 cases 2.952941176
loss 1959 APOBEC3C Exon+ve, >2 cases 2.952941176
loss 1965 APOBEC3C Exon+ve, >2 cases 2.952941176
SEQ ID 401 10 118190679 118193786 3107 loss 1287 PNLIPRP3 Exon+ve, >2 cases 2.952941176
SEQ ID 402 9 32459710 32463040 3330 loss 2003 DDX58 Exon+ve, distinct CNVs, same Gene 2.952941176
SEQ ID 403 8 67685665 67689015 3350 loss 1275 MYBL1 Exon+ve, >2 cases 2.952941176
SEQ ID 403 8 67685665 67689015 3350 loss 1650 MYBL1 Exon+ve, >2 cases 2.952941176
SEQ ID 404 12 108878848 108882203 3355 loss 1279 GIT2 Exon+ve, >2 cases 2.952941176
SEQ ID 404 12 108878848 108882203 3355 loss 1665 GIT2 Exon+ve, >2 cases 2.952941176
SEQ ID 405 8 54952820 54956193 3373 loss 1604 RGS20 Exon+ve, >2 cases 2.952941176
SEQ ID 406 3 46687043 46690457 3414 loss 1834 ALS2CL Exon+ve, >2 cases 2.952941176
SEQ ID 407 8 42145982 42149494 3512 gain 1634 AP3M2 Exon+ve, distinct CNVs, same Gene 2.952941176
SEQ ID 408 11 110872005 110875598 3593 loss 1465 BTG4 Exon+ve, >2 cases 2.952941176
SEQ ID 409 X 8960105 8963721 3616 gain 1454 FAM9B Exon+ve, >2 cases 2.952941176
SEQ ID 410 7 48528408 48532031 3623 loss 1891 ABCA13 Exon+ve, >2 cases 2.952941176
SEQ ID 411 3 96161892 96165551 3659 loss 1619 LOC255025 Exon+ve, >2 cases 2.952941176
SEQ ID 411 3 96161892 96165551 3659 loss 1624 LOC255025 Exon+ve, >2 cases 2.952941176
SEQ ID 412 7 133906667 133910372 3705 gain 1783 AKR1B15 Exon+ve, >2 cases 2.952941176
SEQ ID 413 X 40938342 40942301 3959 loss 1415 USP9X Exon+ve, >2 cases 2.952941176
SEQ ID 414 9 6606637 6610662 4025 loss 1391 GLDC Exon+ve, distinct CNVs, same Gene 2.952941176
loss 1295 LOC388387 Exon+ve, >2 cases 2.952941176
loss 1470 LOC388387 Exon+ve, >2 cases 2.952941176
SEQ ID 416 4 68168394 68172597 4203 loss 1221 UBA6 Exon+ve, >2 cases 2.952941176
SEQ ID 416 4 68168394 68172597 4203 loss 1222 UBA6 Exon+ve, >2 cases 2.952941176
SEQ ID 417 6 166499289 166503493 4204 loss 1859 T Exon+ve, distinct CNVs, same Gene 2.952941176
SEQ ID 418 1 206019923 206024152 4229 loss 1843 CD46 Exon+ve, >2 cases 2.952941176
loss 1659 STAT3 Exon+ve, >2 cases 2.952941176
loss 1887 STAT3 Exon+ve, >2 cases 2.952941176
SEQ ID 420 4 107311633 107316223 4590 loss 1280 TBCK Exon+ve, >2 cases 2.952941176
SEQ ID 420 4 107311633 107316223 4590 loss 1933 TBCK Exon+ve, >2 cases 2.952941176
SEQ ID 421 4 39829776 39834522 4746 loss 1947 N4BP2 Exon+ve, >2 cases 2.952941176
SEQ ID 422 7 122051537 122056508 4971 loss 1354 CADPS2 Exon+ve, distinct CNVs, same Gene 2.952941176
SEQ ID 423 9 36263984 36268995 5011 gain 1716 GNE Exon+ve, >2 cases 2.952941176
SEQ ID 423 9 36263984 36268995 5011 gain 1829 GNE Exon+ve, >2 cases 2.952941176
loss 1764 GATA6 Exon+ve, >2 cases 2.952941176
loss 1969 GATA6 Exon+ve, >2 cases 2.952941176
SEQ ID 425 5 128326107 128331280 5173 loss 1699 SLC27A6 Exon+ve, >2 cases 2.952941176
SEQ ID 426 1 243768850 243774213 5363 loss 1840 KIF26B Exon+ve, >2 cases 2.952941176
loss 1950 RARRES3 Exon+ve, >2 cases 2.952941176
loss 1988 SPECC1 Exon+ve, distinct CNVs, same Gene 2.952941176
loss 1920 IRAK2 Exon+ve, distinct CNVs, same Gene 2.952941176
SEQ ID 430 8 134336459 134342059 5600 loss 1552 NDRG1 Exon+ve, distinct CNVs, same Gene 2.952941176
loss 1238 ATAD5 Exon+ve, >2 cases 2.952941176
loss 1831 ATAD5 Exon+ve, >2 cases 2.952941176
loss 1403 HECTD1 Exon+ve, distinct CNVs, same Gene 2.952941176
SEQ ID 433 6 74521789 74527607 5818 gain 1638 CD109 Exon+ve, >2 cases 2.952941176
loss 1230 DNAH3 Exon+ve, >2 cases 2.952941176
loss 1247 IRAK2 Exon+ve, distinct CNVs, same Gene 2.952941176
SEQ ID 436 12 8173177 8179355 6178 gain 1246 POU5F1P3, CLEC4A Exon+ve, >2 cases 2.952941176
SEQ ID 436 12 8173177 8179355 6178 gain 1308 POU5F1P3, CLEC4A Exon+ve, >2 cases 2.952941176
SEQ ID 437 9 26919782 26925984 6202 loss 1539 PLAA Exon+ve, >2 cases 2.952941176
SEQ ID 438 5 95183456 95189721 6265 gain 1281 GLRX Exon+ve, >2 cases 2.952941176
SEQ ID 438 5 95183456 95189721 6265 gain 1824 GLRX Exon+ve, >2 cases 2.952941176
SEQ ID 439 8 54951684 54958115 6431 loss 1993 RGS20 Exon+ve, >2 cases 2.952941176
loss 1619 ALG12 Exon+ve, >2 cases 2.952941176
loss 1930 ALG12 Exon+ve, >2 cases 2.952941176
SEQ ID 441 8 134331224 134337808 6584 gain 1854 NDRG1 Exon+ve, distinct CNVs, same Gene 2.952941176
gain 1895 LOC91316, RGL4 Exon+ve, distinct CNVs, same Gene 2.952941176
SEQ ID 443 6 165748837 165755595 6758 loss 1590 PDE10A Exon+ve, >2 cases 2.952941176
loss 1884 ANKS1B Exon+ve, >2 cases 2.952941176
loss 1694 CECR2 Exon+ve, >2 cases 2.952941176
loss 1940 TAS1R2 Exon+ve, >2 cases 2.952941176
SEQ ID 447 5 37398626 37405778 7152 loss 1426 NUP155 Exon+ve, >2 cases 2.952941176
SEQ ID 448 9 32490919 32498096 7177 loss 1645 DDX58 Exon+ve, distinct CNVs, same Gene 2.952941176
SEQ ID 449 6 166487200 166494679 7479 gain 1392 T Exon+ve, distinct CNVs, same Gene 2.952941176
loss 1591 STIL Exon+ve, >2 cases 2.952941176
SEQ ID 450 1 47549912 47557441 7529 loss 1759 STIL Exon+ve, >2 cases 2.952941176
SEQ ID 451 4 99104657 99112516 7859 gain 1489 C4orf37 Exon+ve, distinct CNVs, same Gene 2.952941176
loss 1776 RARRES3 Exon+ve, >2 cases 2.952941176
SEQ ID 453 4 186681553 186689469 7916 loss 1458 PDLIM3 Exon+ve, >2 cases 2.952941176
SEQ ID 454 7 122003026 122010979 7953 loss 1910 CADPS2 Exon+ve, distinct CNVs, same Gene 2.952941176
SEQ ID 455 4 44319603 44327596 7993 loss 1487 YIPF7 Exon+ve, >2 cases 2.952941176
SEQ ID 455 4 44319603 44327596 7993 loss 1659 YIPF7 Exon+ve, >2 cases 2.952941176
gain 1803 C13orf38-SOHLH2, C13orf38 Exon+ve, >2 cases 2.952941176
SEQ ID 457 4 56070868 56079086 8218 loss 1738 CLOCK Exon+ve, >2 cases 2.952941176
loss 2023 PLA2G15 Exon+ve, >2 cases 2.952941176
loss 1901 COMMD7 Exon+ve, >2 cases 2.952941176
SEQ ID 460 1 201194532 201202914 8382 loss 1572 CYB5R1 Exon+ve, >2 cases 2.952941176
SEQ ID 460 1 201194532 201202914 8382 loss 1687 CYB5R1 Exon+ve, >2 cases 2.952941176
SEQ ID 461 19 12650727 12659347 8620 loss 1638 DHPS Exon+ve, >2 cases 2.952941176
SEQ ID 462 1 149957941 149966646 8705 loss 1867 RIIAD1 Exon+ve, >2 cases 2.952941176
SEQ ID 463 16 20861337 20870187 8850 loss 1760 DNAH3 Exon+ve, >2 cases 2.952941176
SEQ ID 464 19 12651862 12660732 8870 loss 1538 FBXW9, DHPS Exon+ve, >2 cases 2.952941176
SEQ ID 465 9 17347695 17356839 9144 loss 1502 CNTLN Exon+ve, distinct CNVs, same Gene 2.952941176
SEQ ID 466 8 82910933 82920255 9322 loss 1638 SNX16 Exon+ve, >2 cases 2.952941176
SEQ ID 466 8 82910933 82920255 9322 loss 1950 SNX16 Exon+ve, >2 cases 2.952941176
SEQ ID 467 1 177589995 177599597 9602 loss 1372 SOAT1 Exon+ve, >2 cases 2.952941176
SEQ ID 467 1 177589995 177599597 9602 loss 1635 SOAT1 Exon+ve, >2 cases 2.952941176
SEQ ID 468 7 86932062 86941683 9621 loss 1439 ABCB4 Exon+ve, >2 cases 2.952941176
SEQ ID 469 2 201773817 201783547 9730 loss 1534 CASP10 Exon+ve, >2 cases 2.952941176
SEQ ID 470 22 24636477 24646275 9798 gain 1348 MIR1302-1, MYO18B Exon+ve, >2 cases 2.952941176
SEQ ID 471 1 97937667 97947671 10004 loss 1221 DPYD Exon+ve, >2 cases 2.952941176
SEQ ID 472 2 48666246 48676336 10090 gain 1386 STON1-GTF2A1L, STON1 Exon+ve, distinct CNVs, same Gene 2.952941176
SEQ ID 473 9 17260655 17271186 10531 loss 1743 CNTLN Exon+ve, distinct CNVs, same Gene 2.952941176
gain 1293 CACNA2D3 Exon+ve, >2 cases 2.952941176
gain 1921 CACNA2D3 Exon+ve, >2 cases 2.952941176
SEQ ID 475 12 97694069 97704854 10785 loss 1872 ANKS1B Exon+ve, >2 cases 2.952941176
SEQ ID 476 22 16635762 16646613 10851 loss 1718 BID Exon+ve, >2 cases 2.952941176
SEQ ID 476 22 16635762 16646613 10851 loss 1859 BID Exon+ve, >2 cases 2.952941176
SEQ ID 477 17 19924055 19935009 10954 loss 2038 SPECC1 Exon+ve, distinct CNVs, same Gene 2.952941176
SEQ ID 478 5 150506984 150518075 11091 loss 1433 ANXA6 Exon+ve, >2 cases 2.952941176
SEQ ID 479 18 22717441 22728600 11159 loss 1442 C18orf16 Exon+ve, >2 cases 2.952941176
SEQ ID 480 7 100967884 100979053 11169 loss 1680 EMID2 Exon+ve, >2 cases 2.952941176
SEQ ID 481 22 16366605 16378078 11473 loss 1226 CECR2 Exon+ve, >2 cases 2.952941176
SEQ ID 482 1 110102580 110114121 11541 loss 1680 EPS8L3 Exon+ve, >2 cases 2.952941176
loss 1883 N4BP2 Exon+ve, >2 cases 2.952941176
SEQ ID 484 7 86930016 86941683 11667 loss 1579 ABCB4 Exon+ve, >2 cases 2.952941176
loss 1852 UPF0639 Exon+ve, >2 cases 2.952941176
loss 1871 UPF0639 Exon+ve, >2 cases 2.952941176
loss 1502 C18orf16 Exon+ve, >2 cases 2.952941176
gain 1232 TGFB1I1, ARMC5 Exon+ve, >2 cases 2.952941176
gain 1508 TGFB1I1, ARMC5 Exon+ve, >2 cases 2.952941176
SEQ ID 488 3 46677853 46690457 12604 loss 1318 ALS2CL Exon+ve, >2 cases 2.952941176
SEQ ID 489 3 38415026 38428090 13064 loss 1802 XYLB Exon+ve, >2 cases 2.952941176
SEQ ID 490 19 58910511 58923614 13103 gain 1606 MIR526A2, MIR517B, MIR516B2, MIR520G, MIR520D, MIR521-2 Exon+ve, >2 cases 2.952941176
SEQ ID 491 1 110102580 110115770 13190 loss 1802 EPS8L3 Exon+ve, >2 cases 2.952941176
loss 1315 C1orf144 Exon+ve, distinct CNVs, same Gene 2.952941176
loss 1454 ZNF324B Exon+ve, >2 cases 2.952941176
gain 1564 C13orf38-SOHLH2, C13orf38 Exon+ve, >2 cases 2.952941176
gain 1502 TAS1R2 Exon+ve, >2 cases 2.952941176
loss 1993 RIN1 Exon+ve, >2 cases 2.952941176
loss 1858 PLA2G15 Exon+ve, >2 cases 2.952941176
SEQ ID 498 6 74517372 74531383 14011 gain 1894 CD109 Exon+ve, >2 cases 2.952941176
loss 1678 ZNF808 Exon+ve, >2 cases 2.952941176
loss 1855 ZNF808 Exon+ve, >2 cases 2.952941176
SEQ ID 500 5 128316373 128331280 14907 loss 1248 SLC27A6 Exon+ve, >2 cases 2.952941176
SEQ ID 501 4 101572938 101587882 14944 gain 1867 EMCN Exon+ve, >2 cases 2.952941176
SEQ ID 502 6 155530613 155545570 14957 loss 1347 TIAM2 Exon+ve, >2 cases 2.952941176
SEQ ID 502 6 155530613 155545570 14957 loss 1598 TIAM2 Exon+ve, >2 cases 2.952941176
SEQ ID 503 7 100967884 100982939 15055 loss 1820 EMID2 Exon+ve, >2 cases 2.952941176
SEQ ID 504 4 101572411 101587882 15471 gain 1752 EMCN Exon+ve, >2 cases 2.952941176
loss 1354 BCAS1 Exon+ve, >2 cases 2.952941176
SEQ ID 505 20 52074911 52090393 15482 loss 1860 BCAS1 Exon+ve, >2 cases 2.952941176
SEQ ID 506 9 127014097 127029947 15850 loss 1222 RABEPK Exon+ve, >2 cases 2.952941176
loss 2041 KLHDC4 Exon+ve, distinct CNVs, same Gene 2.952941176
loss 1909 SEPT9 Exon+ve, >2 cases 2.952941176
loss 1844 KRT6C Exon+ve, >2 cases 2.952941176
loss 2037 KRT6C Exon+ve, >2 cases 2.952941176
SEQ ID 510 7 107049716 107067706 17990 loss 1321 BCAP29 Exon+ve, >2 cases 2.952941176
SEQ ID 510 7 107049716 107067706 17990 loss 1475 BCAP29 Exon+ve, >2 cases 2.952941176
loss 1958 RIN1 Exon+ve, >2 cases 2.952941176
SEQ ID 512 3 38415026 38433483 18457 loss 1725 XYLB Exon+ve, >2 cases 2.952941176
loss 1258 KLHDC4 Exon+ve, distinct CNVs, same Gene 2.952941176
SEQ ID 514 1 233582552 233602295 19743 loss 1720 TBCE Exon+ve, >2 cases 2.952941176
SEQ ID 515 7 91585706 91605955 20249 loss 1856 CYP51A1 Exon+ve, >2 cases 2.952941176
SEQ ID 516 5 150504105 150524435 20330 loss 1942 ANXA6 Exon+ve, >2 cases 2.952941176
SEQ ID 517 9 92596909 92617806 20897 gain 1423 SYK Exon+ve, distinct CNVs, same Gene 2.952941176
SEQ ID 518 6 170680224 170701779 21555 gain 1954 PSMB1 Exon+ve, >2 cases 2.952941176
SEQ ID 519 9 134924325 134946471 22146 gain 1887 CEL Exon+ve, >2 cases 2.952941176
SEQ ID 520 11 110853365 110875598 22233 loss 1276 BTG4 Exon+ve, >2 cases 2.952941176
SEQ ID 521 3 197537870 197560934 23064 gain 1775 TM4SF19, TM4SF19-TCTEX1D2 Exon+ve, >2 cases 2.952941176
SEQ ID 522 1 149941641 149964885 23244 loss 2033 CELF3, RIIAD1 Exon+ve, >2 cases 2.952941176
SEQ ID 523 1 206053098 206076352 23254 loss 1638 LOC148696 Exon+ve, >2 cases 2.952941176
SEQ ID 524 17 423068 446585 23517 loss 1268 VPS53 Exon+ve, >2 cases 2.952941176
SEQ ID 525 9 6555187 6578755 23568 loss 1609 GLDC Exon+ve, distinct CNVs, same Gene 2.952941176
SEQ ID 526 3 197712985 197736785 23800 loss 1546 RNF168, C3orf43 Exon+ve, >2 cases 2.952941176
gain 1276 STON1-GTF2A1L, STON1 Exon+ve, distinct CNVs, same Gene 2.952941176
SEQ ID 528 1 246138090 246162296 24206 gain 1798 OR218 Exon+ve, >2 cases 2.952941176
SEQ ID 529 X 32203770 32228244 24474 gain 2018 DMD Exon+ve, >2 cases 2.952941176
SEQ ID 530 1 206054159 206078819 24660 loss 1659 LOC148696 Exon+ve, >2 cases 2.952941176
loss 1833 MIR1302-1, MYO18B Exon+ve, >2 cases 2.952941176
.., SEQ ID 532 2 125058391 125084599 26208 gain 1803 CNTNAP5 Exon+ve, >2 cases 2.952941176 w SEQ ID ID 533 X 8931895 8958319 26424 loss 1496 FAM9B Exon+ve, >2 cases 2.952941176 r.) ul .1, ca a SEQ ID 534 X 48688957 48716140 27183 loss 1639 KCND1, OTUD5, GRIPAP1 Exon+ve, >2 cases 2.952941176 SEQ ID 535 2 143888582 143915868 27286 gain 1750 ARHGAP15 Exon+ve, >2 cases 2.952941176 SEQ ID 536 9 26919782 26947140 27358 loss 1656 PLAA. IFT74 Exon+ve, >2 cases 2.952941176 t-) =
SEQ ID 537 9 127001024 127028444 27420 loss 1669 RABEPK
Exon+ve, >2 cases 2.952941176 .., w , SEQ ID 538 7 89824673 89852155 27482 gain 1864 GTPBP10 Exon+ve, distinct 2.952941176 ¨, i..) =
CNVs, same Gene SEQ ID 539 4 70523201 70551081 27880 loss 1285 UGT2A2, UGT2A1 Exon+ve, >2 cases 2.952941176 ao SEQ ID 539 4 70523201 70551081 27880 loss 1433 UGT2A2, UGT2A1 Exon+ve, >2 cases 2.952941176 SEQ ID 540 2 125058391 125088012 29621 gain 1532 CNTNAP5 Exon+ve, >2 cases 2.952941176 SEQ ID 541 6 30021908 30052053 30145 loss 1244 HCG9 Exon+ve, >2 cases 2.952941176 SEQ ID 542 6 26539830 26571434 31604 loss 1968 BTN2A1, BTN3A3 Exon+ve, >2 cases 2.952941176 loss 1825 SEPT9 Exon+ve, >2 cases 2.952941176 loss 1724 APOL2 Exon+ve, >2 cases 2.952941176 loss 2035 APOL2 Exon+ve, >2 cases 2.952941176 P

loss 1549 L0C91316 Exon+ve, distinct 2.952941176 .
CNVs, same Gene .

SEQ ID 546 2 179804969 179838443 33474 loss 1425 SESTD1 Exon+ve, >2 cases 2.952941176
SEQ ID 547 X 154395845 154429912 34067 gain 1724 TMLHE Exon+ve, >2 cases 2.952941176
loss 1274 FANCA Exon+ve, distinct CNVs, same Gene 2.952941176
SEQ ID 549 6 26536902 26571434 34532 gain 1842 BTN2A3, BTN2A1, BTN3A3 Exon+ve, >2 cases 2.952941176
SEQ ID 550 7 133872990 133908027 35037 gain 1494 AKR1B1, AKR1B15 Exon+ve, >2 cases 2.952941176
SEQ ID 551 7 127640643 127675911 35268 gain 1733 LEP Exon+ve, >2 cases 2.952941176
SEQ ID 552 6 30021908 30057524 35616 loss 1488 HCG9 Exon+ve, >2 cases 2.952941176
SEQ ID 553 7 127640643 127678165 37522 gain 1266 LEP Exon+ve, >2 cases 2.952941176
gain 1295 LOC100133308 Exon+ve, >2 cases 2.952941176
SEQ ID 555 7 141408013 141446728 38715 gain 1225 MGAM Exon+ve, >2 cases 2.952941176
SEQ ID 555 7 141408013 141446728 38715 gain 1720 MGAM Exon+ve, >2 cases 2.952941176
SEQ ID 556 2 31279154 31321453 42299 loss 1544 CAPN14, EHD3 Exon+ve, >2 cases 2.952941176
SEQ ID 556 2 31279154 31321453 42299 loss 1929 CAPN14, EHD3 Exon+ve, >2 cases 2.952941176
gain 1609 ANO5 Exon+ve, >2 cases 2.952941176
SEQ ID 558 9 115858589 115903754 45165 gain 1406 ZNF618, AMBP, KIF12 Exon+ve, >2 cases 2.952941176
SEQ ID 558 9 115858589 115903754 45165 gain 2020 ZNF618, AMBP, KIF12 Exon+ve, >2 cases 2.952941176
SEQ ID 559 4 100955189 101000511 45322 gain 1462 DAPP1 Exon+ve, >2 cases 2.952941176
SEQ ID 559 4 100955189 101000511 45322 gain 1913 DAPP1 Exon+ve, >2 cases 2.952941176
gain 1740 EFTUD1, FAM154B Exon+ve, >2 cases 2.952941176
SEQ ID 561 2 44403707 44458771 55064 loss 1504 CAMKMT, PREPL Exon+ve, >2 cases 2.952941176
gain 1466 TSGA10, C2orf15, MRPL30, MITD1, LIPT1 Exon+ve, >2 cases 2.952941176
loss 1570 MIR548Y Exon+ve, >2 cases 2.952941176
gain 1995 C1orf144, FBXO42 Exon+ve, distinct CNVs, same Gene 2.952941176
gain 1768 ANKRD33 Exon+ve, >2 cases 2.952941176
loss 1908 NRXN3 Exon+ve, distinct CNVs, same Gene 2.952941176
gain 1836 ANKRD33 Exon+ve, >2 cases 2.952941176
SEQ ID 568 2 143887281 143956453 69172 loss 1677 ARHGAP15 Exon+ve, >2 cases 2.952941176
gain 1618 CSDAP1 Exon+ve, >2 cases 2.952941176
SEQ ID 570 2 201740139 201811330 71191 gain 1943 CASP10, CFLAR, CASP8 Exon+ve, >2 cases 2.952941176
SEQ ID 571 8 10658422 10732498 74076 loss 1663 PINX1, MIR1322 Exon+ve, >2 cases 2.952941176
SEQ ID 572 X 154297852 154375564 77712 gain 1831 F8A1, F8A3, F8A2, H2AFB3, H2AFB2, H2AFB1, MIR1184-1, MIR1184-2, MIR1184-3, TMLHE Exon+ve, >2 cases 2.952941176
SEQ ID 573 9 92658019 92739799 81780 gain 1626 SYK Exon+ve, distinct CNVs, same Gene 2.952941176
SEQ ID 574 8 10649592 10741416 91824 gain 2042 PINX1, MIR1322 Exon+ve, >2 cases 2.952941176
100664 gain 1252 LEPR Exon+ve, >2 cases 2.952941176
SEQ ID 576 9 118469713 118571048 101335 loss 1559 ASTN2, TRIM32 Exon+ve, >2 cases 2.952941176
103914 loss 1534 C4orf37 Exon+ve, distinct CNVs, same Gene 2.952941176
104325 gain 1709 MIR548Y Exon+ve, >2 cases 2.952941176
104909 gain 1793 MIR663 Exon+ve, >2 cases 2.952941176
106804 gain 1920 LEPR Exon+ve, >2 cases 2.952941176
SEQ ID 581 9 118405993 118524253 118260 loss 1622 ASTN2, TRIM32 Exon+ve, >2 cases 2.952941176
129718 loss 1824 CDH13 Exon+ve, >2 cases 2.952941176
SEQ ID 583 12 110666479 110799506 133027 gain 2022 ACAD10, MAPKAPK5, C12orf47, Exon+ve, >2 cases 2.952941176
SEQ ID 584 12 110665203 110799506 134303 gain 1763 ACAD10, MAPKAPK5, C12orf47, Exon+ve, >2 cases 2.952941176
SEQ ID 585 10 118141035 118275679 134644 gain 2036 PNLIPRP3 Exon+ve, >2 cases 2.952941176
136089 gain 1708 OR4C46, OR4A5 Exon+ve, >2 cases 2.952941176
SEQ ID 587 6 170616733 170753106 136373 gain 1729 TBP, PDCD2, PSMB1 Exon+ve, >2 cases 2.952941176
140117 loss 1538 LOC729513, PDPR, AARS, EXOSC6, CLEC18C Exon+ve, >2 cases 2.952941176
142979 gain 1354 EFTUD1, FAM154B Exon+ve, >2 cases 2.952941176
SEQ ID 590 1 199054239 199199515 145276 gain 1587 CAMSAP1L1, C1orf106, GPR25 Exon+ve, >2 cases 2.952941176
SEQ ID 590 1 199054239 199199515 145276 gain 1799 CAMSAP1L1, C1orf106, GPR25 Exon+ve, >2 cases 2.952941176
SEQ ID 591 1 246025834 246172497 146663 gain 2034 OR2L13, OR11L1, TRIM58, OR2T8, OR14A16, OR2W3 Exon+ve, >2 cases 2.952941176
149643 loss 1793 LOC729513, PDPR, AARS, EXOSC6, CLEC18C Exon+ve, >2 cases 2.952941176
159426 gain 1566 ZNF626 Exon+ve, >2 cases 2.952941176
159426 gain 1761 ZNF626 Exon+ve, >2 cases 2.952941176
182262 loss 1991 FHIT Exon+ve, >2 cases 2.952941176
197698 gain 1274 STEAP1, GTPBP10, STEAP2, C7orf63 Exon+ve, distinct CNVs, same Gene 2.952941176
SEQ ID 596 8 6718944 6926661 207717 gain 1572 DEFB1, DEFA10P, DEFT1P2, DEFA6, DEFA5, DEFA4, DEFA3, DEFA1, DEFA1B, DEFT1P Exon+ve, >2 cases 2.952941176
SEQ ID 597 9 134914697 135122604 207907 loss 1321 GBGT1, RALGDS, OBP2B, CEL, CELP, ABO, GTF3C5 Exon+ve, >2 cases 2.952941176
221166 gain 1862 ZNF324B, ZNF446, LOC646862, ZNF324, ZNF8, ZNF497, RPS5, ZNF584, ZNF837, SLC27A5, ZNF132, A1BG-AS1, ZSCAN22, Exon+ve, >2 cases 2.952941176
SEQ ID 599 4 74035932 74268619 232687 gain 1347 COX18, ANKRD17 Exon+ve, >2 cases 2.952941176
SEQ ID 599 4 74035932 74268619 232687 gain 1945 COX18, ANKRD17 Exon+ve, >2 cases 2.952941176
237299 gain 1765 NUP155, C5orf42 Exon+ve, >2 cases 2.952941176
SEQ ID 601 17 365082 612187 247105 gain 1494 VPS53, DBIL5P, FAM57A, GEMIN4, GLOD4 Exon+ve, >2 cases 2.952941176
250149 gain 1828 NFIA Exon+ve, distinct CNVs, same Gene 2.952941176
SEQ ID 603 1 233499409 233769452 270043 gain 1466 B3GALNT2, ARID4B, TBCE, Exon+ve, >2 cases 2.952941176
289310 loss 2036 NRXN3 Exon+ve, distinct CNVs, same Gene 2.952941176
SEQ ID 605 6 165458835 165766046 307211 gain 1760 C6orf118, PDE10A Exon+ve, >2 cases 2.952941176
324478 loss 1621 XKR5, DEFB1, DEFA10P, DEFA6, AGPT5, DEFA4 Exon+ve, >2 cases 2.952941176
SEQ ID 607 4 186649665 186977002 327337 gain 1281 SORBS2, PDLIM3 Exon+ve, >2 cases 2.952941176
SEQ ID 608 3 76072 406838 330766 gain 1598 CHL1 Exon+ve, >2 cases 2.952941176
SEQ ID 609 19 47894889 48276273 381384 gain 1282 PSG11, LOC100289650, PSG10P, PSG8, PSG6, PSG7, PSG2, PSG3, Exon+ve, >2 cases 2.952941176
48279312 384423 gain 1281 PSG11, LOC100289650, PSG10P, PSG8, PSG6, PSG7, PSG2, PSG3, Exon+ve, >2 cases 2.952941176
386819 gain 1864 DMD Exon+ve, >2 cases 2.952941176
388979 gain 1282 TRIO, DNAH5 Exon+ve, distinct CNVs, same Gene 2.952941176
SEQ ID 613 11 125616034 126095587 479553 gain 1713 DCPS, SRPR, FLJ39051, TIRAP, FAM118B, FOXRED1, ST3GAL4, Exon+ve, >2 cases 2.952941176
504837 loss 1734 MTERF, LOC401387, AKAP9, CYP511 Exon+ve, >2 cases 2.952941176
555002 gain 1968 ANUBL1, ALOX5, LOC338579, LOC100133308, MIR3156-1, OR13A1, MARCH8 Exon+ve, >2 cases 2.952941176
SEQ ID 616 3 197412253 197977900 565647 gain 1565 PCYT1A, FBXO45, C3orf34, LRRC33, WDR53, TM4SF19-TCTEX1D2, RNF168, ZDHHC19, OSTalpha, C3orf43, TM4SF19, PIGX, TCTEX1D2, UBXN7, PAK2 Exon+ve, >2 cases 2.952941176
SEQ ID 617 4 188688388 189297555 609167 gain 1704 ZFP42, TRIML2, TRIML1 Exon+ve, >2 cases 2.952941176
779703 gain 1461 MRPL30, LYG2, LIPT1, AFF3, MITD1, TXNDC9, TSGA10, C2orf15, REV1, EIF5B, LYG1 Exon+ve, >2 cases 2.952941176
826339 gain 1936 FHIT Exon+ve, >2 cases 2.952941176
1027568 loss 1886 UPP1, ABCA13, PKD1L1, HUS1, CDC14C, C7orf57, SUN3 Exon+ve, >2 cases 2.952941176
1027644 gain 1396 NFIA Exon+ve, distinct CNVs, same Gene 2.952941176
1080169 gain 1653 LOC643650, ANUBL1, GPRIN2, PTPN20B, PTPN20A, FAM35B, LOC728643, FRMPD2P1, AGAP4, SYT15, BMS1P1, FAM21C, BMS1P5, PPYR1 Exon+ve, >2 cases 2.952941176
1102391 loss 1454 MIR137, DPYD Exon+ve, >2 cases 2.952941176
SEQ ID 624 3 227364 1488979 1261615 gain 1657 CHL1, CNTN6 Exon+ve, >2 cases 2.952941176
1349121 loss 1994 LOC100289656, TJP1, APBA2, NDNL2, LOC646278, FAM189A1 Exon+ve, >2 cases 2.952941176
1503241 gain 1875 CDH13, MIR3182, MPHOSPH6 Exon+ve, >2 cases 2.952941176
47017598 1539495 gain 1408 GPRIN2, LOC643650, PTPN20B, PTPN20A, FAM35B, FAM21C, SYT15, FAM25C, LOC728643, FAM25G, LOC642826, ANXA8, FAM35B2, ANXA8L1, FRMPD2P1, AGAP4, FAM25B, BMS1P1, AGAP9, BMS1P5, PPYR1, ANUBL1 Exon+ve, >2 cases 2.952941176
1633947 gain 1988 LOC100289656, TJP1, APBA2, FAM7A1, LOC653075, DKFZP434L187, FAM7A2, FAM7A3, NDNL2, LOC646278, Exon+ve, >2 cases 2.952941176
SEQ ID 629 1 242999910 244841528 1841618 loss 1767 CNST, TFB2M, HNRNPU, KIF26B, NCRNA00201, FAM36A, SMYD3, Exon+ve, >2 cases 2.952941176
SEQ ID 630 4 188089090 190030740 1941650 gain 1691 LOC401164, ZFP42, TRIML2, Exon+ve, >2 cases 2.952941176
2171274 gain 1694 MIR663, FRG1B Exon+ve, >2 cases 2.952941176
3549326 gain 1943 OR4C46, OR4A5 Exon+ve, >2 cases 2.952941176
SEQ ID 633 X 48171740 52710629 4538889 gain 1349 SSX7, SSX8, ERAS, PPP1R3F, GAGE1, WAS, XAGE2B, GAGE5, GAGE4, CACN1F, GAGE6, GATA1, NUDT10, SLC38A5, TFE3, PORCN, GAGE2D, GAGE2E, GAGE2A, GAGE2B, GAGE2C, GAGE12J, MAGIX, AKAP4, MAGED1, MAGED4, PQBP1, LOC347376, FOXP3, XAGE1D, PAGE4, PAGE1, WDR45, CCDC120, FTSJ1, SYP, TBC1D25, MIR532, GSPT2, GAGE8, GLOD5, XAGE2, HDAC6, OTUD5, PRAF2, SHROOM4, PLP2, OPKOW, MIR500A, MIR500B, LOC158572, CENPVL1, LOC441495, MIR188, GAGE12H, GAGE12I, MIR660, GRIPAP1, GAGE12B, GAGE12C, GAGE12D, GAGE12E, GAGE12F, GAGE12G, MIR502, MIR501, WDR13, RBM3, C6C22, BMP15, TIMM17B, PRICKLE3, DGKK, KCND1, XAGE1A, XAGE1B, XAGE1C, PIM2, XAGE1E, SUV39H1, USP27X, SLC35A2, CLCN5, GAGE7, CCNB3, MIR362, PCSK1N, SNORA11E, SNORA11D, GAGE10, GAGE13, NUDT11, EBP, MAGED4B Exon+ve, >2 cases 2.952941176
SEQ ID 634 19 62653275 62660645 7370 loss 1522 VN1R1 Exon+ve, >2 cases 1.474302496
SEQ ID 635 15 56031543 56044966 13423 loss 1680 ALDH1A2 Exon+ve, distinct CNVs, same Gene 1.474302496
SEQ ID 636 11 99646264 99660303 14039 loss 1936 CNTN5 Special 1.474302496
SEQ ID 637 11 70167828 70217957 50129 loss 1835 SHANK2 Special 1.474302496
SEQ ID 638 X 151730135 151853605 123470 gain 1887 ZNF185, CETN2, NSDHL Exon+ve, >2 cases 1.474302496
486431 loss 1597 NRXN1 Exon+ve, distinct CNVs, same Gene 1.474302496
SEQ ID 640 3 2389001 2955718 566717 gain 1851 CNTN4 Special 1.474302496
SEQ ID 641 1 244191230 244851275 660045 gain 1819 TFB2M, CNST, SMYD3 Exon+ve, >2 cases 1.474302496
SEQ ID 642 X 96492941 97405356 912415 gain 1348 DIAPH2 Exon+ve, >2 cases 1.474302496
SEQ ID 643 17 26847029 26870510 23481 loss 1411 RAB11FIP4 Special 1.474302496
* Position references refer to the human genomic sequence Hg18, March 2006 (NCBI Build 36.1).
[00222] Table 1 lists all CNVs of interest, obtained as described in the text.
For each entry, the originating CNV start and stop positions are noted, along with CNV size, CNV type (loss or gain), gene annotation (for the original CNV), category of interest, and odds ratio (OR). The table also includes SEQ IDs for the CNVs in the range SEQ ID 1 to SEQ ID 643. CNVs that are identical between different ASD subjects are grouped under a single SEQ ID. Each SEQ ID refers to a numbered sequence in the file 33655-708.202_PDx_SK_ST25.txt. The category labels are defined as follows: "De novo" refers to CNVs found in the offspring of two parents, neither of whom carries the relevant CNV; "Intronic" refers to CNV subregions affecting introns only; "Ctrl pos High OR" refers to CNVs that include regions present at high frequency in the ASD cohort compared with the normal cohort; "Exon+ve, distinct CNVs, same Gene" refers to CNVs in 2 or more ASD individuals affecting different exons of the same gene; "Exon+ve, >2 cases" refers to CNVs in 2 or more ASD individuals affecting the same exon of a gene; and "Special" refers to CNVs added to the list because of their relationship to genes with strong biological evidence in ASD. OR refers to the odds ratio calculated for the candidate CNV. The OR is calculated by grouping together all cases with an identical CNV/CNV subregion and comparing their frequency with the frequency of the same CNV/CNV subregion in the normal cohort, as follows:

$$\mathrm{OR} = \frac{\mathrm{ASD}_A / (682 - \mathrm{ASD}_A)}{\mathrm{NVE}_A / (1005 - \mathrm{NVE}_A)}$$

where ASD_A is the number of ASD cases with the CNV, NVE_A is the number of normals with the CNV, and 682 and 1,005 are the sizes of the ASD and normal cohorts, respectively. In those cases for which no normals possess the CNV of interest, NVE_A is set to 1 by convention. For example, the OR calculation for the MAOA Intronic CNV is:

$$\mathrm{OR} = \frac{26 / (682 - 26)}{1 / (1005 - 1)} = \frac{26/656}{1/1004} = 39.79268293$$
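The odds-ratio formula above can be sketched in a few lines of Python. This is a minimal illustration that assumes only the counts and cohort sizes given in the text; the function name `cnv_odds_ratio` and its parameter names are hypothetical and are not part of the described method.

```python
def cnv_odds_ratio(asd_a: int, nve_a: int, n_asd: int = 682, n_nve: int = 1005) -> float:
    """Odds ratio of a CNV in the ASD cohort versus the normal (NVE) cohort.

    asd_a: number of ASD cases carrying the CNV/CNV subregion
    nve_a: number of normal subjects carrying the same CNV/CNV subregion
    n_asd, n_nve: cohort sizes used as denominators in the formula
    """
    # Convention from the text: if no normal subject carries the CNV,
    # NVE_A is set to 1 so that the ratio stays finite.
    if nve_a == 0:
        nve_a = 1
    odds_asd = asd_a / (n_asd - asd_a)  # odds of carrying the CNV among ASD cases
    odds_nve = nve_a / (n_nve - nve_a)  # odds of carrying the CNV among normals
    return odds_asd / odds_nve


# Worked example from the text: MAOA Intronic CNV (26 ASD cases, 1 normal)
print(round(cnv_odds_ratio(26, 1), 8))  # prints 39.79268293
```

With no normal carriers, two ASD cases give 2008/680 ≈ 2.952941176 and a single case gives 1004/681 ≈ 1.474302496, which appears consistent with the OR values listed for those entries in Table 1.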
[00223] Column 3 refers to the nucleotide position on the respective chromosome (column 2) at which the corresponding CNV begins, and column 4 refers to the nucleotide position on that chromosome at which the CNV ends. Column 5 refers to the length (size) of the CNV in base pairs (bp). Nucleotide positions were determined using the Hg18 human genome assembly (March 2006, NCBI Build 36.1). The CNV classification in column 6, gain or loss, indicates whether each CNV region found in the subjects was duplicated/amplified (gain) or deleted (loss) in the genome.
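The column layout described above can be illustrated with a small parser for one complete Table 1 row. This is a sketch under stated assumptions: it expects the one-row-per-line layout used here, reads only the fixed leading columns (SEQ ID, chromosome, start, stop, size, type, case ID), and rejects rows whose leading columns were lost in extraction. The names `parse_row`, `CnvRecord`, and `size_matches` are hypothetical. The check that size equals stop minus start is an observation from spot-checked rows (for example SEQ ID 539, 633, and 642), not a rule stated in the text.

```python
import re
from dataclasses import dataclass

# Hypothetical pattern for a complete Table 1 row:
# SEQ ID <n> <chr> <start> <stop> <size> <gain|loss> <case ID> <genes, category, OR>
ROW_PREFIX = re.compile(
    r"^SEQ ID (\d+) ([0-9]{1,2}|X|Y) (\d+) (\d+) (\d+) (gain|loss) (\d+) (.*)$"
)


@dataclass
class CnvRecord:
    seq_id: int
    chrom: str     # column 2
    start: int     # column 3, Hg18 (NCBI Build 36.1) position where the CNV begins
    stop: int      # column 4, position where the CNV ends
    size: int      # column 5, CNV length in bp as reported
    cnv_type: str  # column 6, "gain" (duplicated/amplified) or "loss" (deleted)
    case_id: int   # ASD case ID
    rest: str      # gene symbols, category and OR, left unparsed here


def parse_row(line: str) -> CnvRecord:
    """Parse the fixed leading columns of one complete Table 1 row."""
    m = ROW_PREFIX.match(line.strip())
    if m is None:
        # Rows whose leading columns were lost in extraction are rejected.
        raise ValueError(f"not a complete Table 1 row: {line!r}")
    seq_id, chrom, start, stop, size, cnv_type, case_id, rest = m.groups()
    return CnvRecord(int(seq_id), chrom, int(start), int(stop), int(size),
                     cnv_type, int(case_id), rest)


def size_matches(rec: CnvRecord) -> bool:
    """Compare the reported size with stop - start (observed, not stated, relation)."""
    return rec.stop - rec.start == rec.size


row = "SEQ ID 539 4 70523201 70551081 27880 loss 1285 UGT2A2, UGT2A1 Exon+ve, >2 cases 2.952941176"
rec = parse_row(row)
print(rec.chrom, rec.start, rec.stop, rec.size, size_matches(rec))
# prints: 4 70523201 70551081 27880 True
```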

Chr  CNV Subregion Start  CNV Subregion Stop  CNV Subregion Size  CNV type  ASD Case ID(s)  RefSeq Gene Symbol(s)  Category  Exon overlap
loss 1426 KIAA0562 Exon+ve, >2 cases Yes
loss 1439 KIAA0562 Exon+ve, >2 cases Yes
loss 1441 KIAA0562 Exon+ve, >2 cases Yes
loss 1912 KIAA0562 Exon+ve, >2 cases Yes
gain 1995 C1orf144 Exon+ve, distinct CNVs, same Gene Yes
1 16578594 16591820 13226 loss 1315 C1orf144 Exon+ve, distinct CNVs, same Gene Yes
gain 1502 TAS1R2 Exon+ve, >2 cases Yes
loss 1940 TAS1R2 Exon+ve, >2 cases Yes
loss 1278 EPHA8 Exon+ve, >2 cases Yes
loss 1687 EPHA8 Exon+ve, >2 cases Yes
loss 1895 EPHA8 Exon+ve, >2 cases Yes
loss 1591 STIL Exon+ve, >2 cases Yes
loss 1759 STIL Exon+ve, >2 cases Yes
1 61097736 61359814 262078 gain 1396 NFIA Exon+ve, distinct CNVs, same Gene Yes
1 61661443 61707075 45632 gain 1828 NFIA Exon+ve, distinct CNVs, same Gene Yes
1 65729501 65793446 63945 gain 1252 LEPR Exon+ve, >2 cases Yes
1 65729501 65793446 63945 gain 1920 LEPR Exon+ve, >2 cases Yes
loss 1266 COL24A1 Exon+ve, >2 cases Yes
loss 1283 COL24A1 Exon+ve, >2 cases Yes
loss 1284 COL24A1 Exon+ve, >2 cases Yes
loss 1308 COL24A1 Exon+ve, >2 cases Yes
loss 1309 COL24A1 Exon+ve, >2 cases Yes
loss 1354 COL24A1 Exon+ve, >2 cases Yes
loss 1413 COL24A1 Exon+ve, >2 cases Yes
loss 1418 COL24A1 Exon+ve, >2 cases Yes
loss 1433 COL24A1 Exon+ve, >2 cases Yes
loss 1449 COL24A1 Exon+ve, >2 cases Yes
loss 1451 COL24A1 Exon+ve, >2 cases Yes
loss 1640 COL24A1 Exon+ve, >2 cases Yes
loss 1781 COL24A1 Exon+ve, >2 cases Yes
loss 1815 COL24A1 Exon+ve, >2 cases Yes
loss 1818 COL24A1 Exon+ve, >2 cases Yes
loss 1929 COL24A1 Exon+ve, >2 cases Yes
loss 1994 COL24A1 Exon+ve, >2 cases Yes
loss 2031 COL24A1 Exon+ve, >2 cases Yes
loss 2040 COL24A1 Exon+ve, >2 cases Yes
loss 1582 HEM1 Exon+ve, >2 cases Yes
loss 1687 HEM1 Exon+ve, >2 cases Yes
loss 1929 HEM1 Exon+ve, >2 cases Yes
loss 2045 HEM1 Exon+ve, >2 cases Yes
1 91946409 91948225 1816 gain 1405 TGFBR3 Exon+ve, >2 cases Yes
loss 1656 TGFBR3 Exon+ve, >2 cases Yes
loss 2043 TGFBR3 Exon+ve, >2 cases Yes
gain 1832 CCDC18 Exon+ve, >2 cases Yes
gain 2032 CCDC18 Exon+ve, >2 cases Yes
loss 1233 DNTTIP2 Exon+ve, >2 cases Yes
loss 1802 DNTTIP2 Exon+ve, >2 cases Yes
loss 1904 DNTTIP2 Exon+ve, >2 cases Yes
loss 1233 DNTTIP2 Exon+ve, >2 cases Yes
loss 1782 DNTTIP2 Exon+ve, >2 cases Yes
loss 1802 DNTTIP2 Exon+ve, >2 cases Yes
1 97937667 97947671 10004 loss 1221 DPYD Exon+ve, >2 cases Yes
1 97937667 97947671 10004 loss 1454 DPYD Exon+ve, >2 cases Yes
1 110102580 110114121 11541 loss 1680 EPS8L3 Exon+ve, >2 cases Yes
1 110102580 110114121 11541 loss 1802 EPS8L3 Exon+ve, >2 cases Yes
1 144099302 144337286 237984 gain 1599 RNF115,RBM8A,GNRHR2,HFE2,ANKRD34A,LIX1L,POLR3GL,ANKRD35,ITGA10,PEX11B,NUDT17,TXNIP,POLR3C,PIAS3 Exon+ve, >2 cases Yes
1 144099302 144337286 237984 loss 1874 RNF115,RBM8A,GNRHR2,HFE2,ANKRD34A,LIX1L,POLR3GL,ANKRD35,ITGA10,PEX11B,NUDT17,TXNIP,POLR3C,PIAS3 Exon+ve, >2 cases Yes
1 144099302 144337286 237984 gain 1968 RNF115,RBM8A,GNRHR2,HFE2,ANKRD34A,LIX1L,POLR3GL,ANKRD35,ITGA10,PEX11B,NUDT17,TXNIP,POLR3C,PIAS3 Exon+ve, >2 cases Yes
loss 1867 RIIAD1 Exon+ve, >2 cases Yes
loss 2033 RIIAD1 Exon+ve, >2 cases Yes
gain 1223 LCE1C Exon+ve, >2 cases Yes
gain 1587 LCE1C Exon+ve, >2 cases Yes
gain 1664 LCE1C Exon+ve, >2 cases Yes
gain 1695 LCE1C Exon+ve, >2 cases Yes
gain 1740 LCE1C Exon+ve, >2 cases Yes
gain 1936 LCE1C Exon+ve, >2 cases Yes
loss 1858 OR6Y1 Exon+ve, >2 cases Yes
loss 1877 OR6Y1 Exon+ve, >2 cases Yes
loss 1372 SOAT1 Exon+ve, >2 cases Yes
loss 1635 SOAT1 Exon+ve, >2 cases Yes
1 179250547 179263983 13436 loss 1638 STX6 Exon+ve, >2 cases Yes
1 179250547 179263983 13436 loss 1659 STX6 Exon+ve, >2 cases Yes
1 179250547 179263983 13436 loss 1662 STX6 Exon+ve, >2 cases Yes
1 179250547 179263983 13436 loss 1950 STX6 Exon+ve, >2 cases Yes
loss 1638 MR1 Exon+ve, >2 cases Yes
loss 1659 MR1 Exon+ve, >2 cases Yes
1 199054239 199082294 28055 gain 1587 CAMSAP1L1 Exon+ve, >2 cases Yes
1 199054239 199082294 28055 gain 1799 CAMSAP1L1 Exon+ve, >2 cases Yes
1 199149079 199185984 36905 gain 1587 C1orf106 Exon+ve, >2 cases Yes
1 199149079 199185984 36905 gain 1799 C1orf106 Exon+ve, >2 cases Yes
loss 1572 CYB5R1 Exon+ve, >2 cases Yes
loss 1687 CYB5R1 Exon+ve, >2 cases Yes
loss 1724 CD46 Exon+ve, >2 cases Yes
loss 1843 CD46 Exon+ve, >2 cases Yes
1 206054159 206076352 22193 loss 1638 LOC148696 Exon+ve, >2 cases Yes
1 206054159 206076352 22193 loss 1659 LOC148696 Exon+ve, >2 cases Yes
loss 1234 PRSS38 Exon+ve, >2 cases Yes
loss 1344 PRSS38 Exon+ve, >2 cases Yes
loss 1371 PRSS38 Exon+ve, >2 cases Yes
loss 1653 PRSS38 Exon+ve, >2 cases Yes
1 233582552 233602295 19743 gain 1466 TBCE Exon+ve, >2 cases Yes
1 233582552 233602295 19743 loss 1720 TBCE Exon+ve, >2 cases Yes
loss 1767 KIF26B Exon+ve, >2 cases Yes
loss 1840 KIF26B Exon+ve, >2 cases Yes
loss 1767 TFB2M Exon+ve, >2 cases Yes
1 244768366 244771085 2719 gain 1819 TFB2M Exon+ve, >2 cases Yes
1 246138090 246162296 24206 gain 1798 OR2T8 Exon+ve, >2 cases Yes
1 246138090 246162296 24206 gain 2034 OR2T8 Exon+ve, >2 cases Yes
loss 1510 TPO Exon+ve, >2 cases Yes
loss 1564 TPO Exon+ve, >2 cases Yes
loss 1639 TPO Exon+ve, >2 cases Yes
loss 1256 C2orf48 Exon+ve, >2 cases Yes
loss 1285 C2orf48 Exon+ve, >2 cases Yes
loss 1307 C2orf48 Exon+ve, >2 cases Yes
loss 1370 C2orf48 Exon+ve, >2 cases Yes
loss 1396 C2orf48 Exon+ve, >2 cases Yes
loss 1415 C2orf48 Exon+ve, >2 cases Yes
loss 1616 C2orf48 Exon+ve, >2 cases Yes
loss 1654 C2orf48 Exon+ve, >2 cases Yes
loss 1830 C2orf48 Exon+ve, >2 cases Yes
loss 1931 C2orf48 Exon+ve, >2 cases Yes
loss 1429 LBH Exon+ve, >2 cases Yes
loss 1884 LBH Exon+ve, >2 cases Yes
2 31279154 31321453 42299 loss 1544 CAPN14,EHD3 Exon+ve, >2 cases Yes
2 31279154 31321453 42299 loss 1929 CAPN14,EHD3 Exon+ve, >2 cases Yes
loss 1688 DYNC2LI1 Exon+ve, >2 cases Yes
loss 1786 DYNC2LI1 Exon+ve, >2 cases Yes
loss 1790 DYNC2LI1 Exon+ve, >2 cases Yes
loss 1504 PREPL Exon+ve, >2 cases Yes
gain 1826 PREPL Exon+ve, >2 cases Yes
2 48603879 48627703 23824 gain 1276 STON1-GTF2A1L,STON1 Exon+ve, distinct CNVs, same Gene Yes
2 48666246 48676336 10090 gain 1386 STON1-GTF2A1L,STON1 Exon+ve, distinct CNVs, same Gene Yes
2 50421622 50452128 30506 loss 1597 NRXN1 Exon+ve, distinct CNVs, same Gene Yes
2 50458654 50639069 180415 loss 1597 NRXN1 Exon+ve, distinct CNVs, same Gene Yes
2 50642430 50722328 79898 loss 1597 NRXN1 Exon+ve, distinct CNVs, same Gene Yes
2 73706727 73732302 25575 gain 1369 NAT8,ALMS1P Exon+ve, >2 cases Yes
2 73706727 73732302 25575 loss 1551 NAT8,ALMS1P Exon+ve, >2 cases Yes
2 73706727 73732302 25575 gain 1626 NAT8,ALMS1P Exon+ve, >2 cases Yes
2 73706727 73732302 25575 loss 1728 NAT8,ALMS1P Exon+ve, >2 cases Yes
2 73732303 73764497 32194 gain 1369 ALMS1P Exon+ve, >2 cases Yes
2 73732303 73764497 32194 gain 1533 ALMS1P Exon+ve, >2 cases Yes
2 73732303 73764497 32194 loss 1551 ALMS1P Exon+ve, >2 cases Yes
2 73732303 73764497 32194 gain 1626 ALMS1P Exon+ve, >2 cases Yes
2 73732303 73764497 32194 loss 1728 ALMS1P Exon+ve, >2 cases Yes
2 73732303 73764497 32194 loss 1738 ALMS1P Exon+ve, >2 cases Yes
2 73732303 73764497 32194 gain 1887 ALMS1P Exon+ve, >2 cases Yes
gain 1533 ALMS1P Exon+ve, >2 cases Yes
loss 1551 ALMS1P Exon+ve, >2 cases Yes
loss 1728 ALMS1P Exon+ve, >2 cases Yes
loss 1738 ALMS1P Exon+ve, >2 cases Yes
gain 1887 ALMS1P Exon+ve, >2 cases Yes
2 99109502 99129872 20370 gain 1461 TSGA10,C2orf15 Exon+ve, >2 cases Yes
2 99109502 99129872 20370 gain 1466 TSGA10,C2orf15 Exon+ve, >2 cases Yes
2 99134855 99165006 30151 gain 1461 TSGA10,MRPL30,MITD1,LIPT1 Exon+ve, >2 cases Yes
2 99134855 99165006 30151 gain 1466 TSGA10,MRPL30,MITD1,LIPT1 Exon+ve, >2 cases Yes
loss 1505 UXS1 Exon+ve, >2 cases Yes
loss 1611 UXS1 Exon+ve, >2 cases Yes
loss 1697 UXS1 Exon+ve, >2 cases Yes
loss 1592 ST6GAL2 Exon+ve, >2 cases Yes
loss 1720 ST6GAL2 Exon+ve, >2 cases Yes
gain 1532 CNTNAP5 Exon+ve, >2 cases Yes
gain 1803 CNTNAP5 Exon+ve, >2 cases Yes
gain 1451 ZRANB3 Exon+ve, >2 cases Yes
loss 1512 ZRANB3 Exon+ve, >2 cases Yes
loss 1574 ZRANB3 Exon+ve, >2 cases Yes
loss 1757 ZRANB3 Exon+ve, >2 cases Yes
gain 1970 ZRANB3 Exon+ve, >2 cases Yes
2 143888582 143915868 27286 loss 1677 ARHGAP15 Exon+ve, >2 cases Yes
2 143888582 143915868 27286 gain 1750 ARHGAP15 Exon+ve, >2 cases Yes
loss 1425 SESTD1 Exon+ve, >2 cases Yes
loss 1727 SESTD1 Exon+ve, >2 cases Yes
gain 1344 CFLAR Exon+ve, >2 cases Yes
gain 1824 CFLAR Exon+ve, >2 cases Yes
gain 1841 CFLAR Exon+ve, >2 cases Yes
gain 1927 CFLAR Exon+ve, >2 cases Yes
loss 1534 CASP10 Exon+ve, >2 cases Yes
gain 1943 CASP10 Exon+ve, >2 cases Yes
gain 1220 INO80D Exon+ve, >2 cases Yes
gain 1803 INO80D Exon+ve, >2 cases Yes
gain 1921 INO80D Exon+ve, >2 cases Yes
gain 1988 INO80D Exon+ve, >2 cases Yes
gain 2028 INO80D Exon+ve, >2 cases Yes
gain 1803 INO80D Exon+ve, >2 cases Yes
2 206590637 206592116 1479 gain 1921 INO80D Exon+ve, >2 cases Yes
gain 1988 INO80D Exon+ve, >2 cases Yes
gain 2028 INO80D Exon+ve, >2 cases Yes
2 213900382 213922938 22556 loss 1832 SPAG16 Exon+ve, distinct CNVs, same Gene Yes
loss 1870 SPAG16 Exon+ve, distinct CNVs, same Gene Yes
loss 1870 SPAG16 Exon+ve, distinct CNVs, same Gene Yes
loss 1512 SPAG16 Exon+ve, >2 cases Yes
2 214585717 214586936 1219 loss 1636 SPAG16 Exon+ve, >2 cases Yes
2 214586937 214599105 12168 loss 1636 SPAG16 Exon+ve, distinct CNVs, same Gene Yes
gain 1284 PNKD,TMBIM1 Exon+ve, >2 cases Yes
gain 1660 PNKD,TMBIM1 Exon+ve, >2 cases Yes
gain 1728 PNKD,TMBIM1 Exon+ve, >2 cases Yes
gain 2024 PNKD,TMBIM1 Exon+ve, >2 cases Yes
loss 1721 SLC11A1 Exon+ve, >2 cases Yes
loss 1993 SLC11A1 Exon+ve, >2 cases Yes
loss 1718 CTDSP1 Exon+ve, >2 cases Yes
loss 1721 CTDSP1 Exon+ve, >2 cases Yes
loss 1913 CTDSP1 Exon+ve, >2 cases Yes
loss 1993 CTDSP1 Exon+ve, >2 cases Yes
loss 1718 MIR26B,CTDSP1 Exon+ve, >2 cases Yes
loss 1721 MIR26B,CTDSP1 Exon+ve, >2 cases Yes
loss 1993 MIR26B,CTDSP1 Exon+ve, >2 cases Yes
2 218978244 218978839 595 loss 1721 CTDSP1 Exon+ve, >2 cases Yes
2 218978244 218978839 595 loss 1993 CTDSP1 Exon+ve, >2 cases Yes
gain 1598 CHL1 Exon+ve, >2 cases Yes
gain 1657 CHL1 Exon+ve, >2 cases Yes
3 2548711 2645342 96631 gain 1851 CNTN4 Special Yes
gain 1264 CPNE9 Exon+ve, >2 cases Yes
gain 1587 CPNE9 Exon+ve, >2 cases Yes
gain 1618 CPNE9 Exon+ve, >2 cases Yes
loss 1247 IRAK2 Exon+ve, distinct CNVs, same Gene Yes
loss 1920 IRAK2 Exon+ve, distinct CNVs, same Gene Yes
loss 1259 PDCD6IP Exon+ve, >2 cases Yes
loss 1274 PDCD6IP Exon+ve, >2 cases Yes
loss 1724 PDCD6IP Exon+ve, >2 cases Yes
3 38415026 38417567 2541 loss 1725 XYLB Exon+ve, >2 cases Yes
loss 1802 XYLB Exon+ve, >2 cases Yes
loss 1318 ALS2CL Exon+ve, >2 cases Yes
loss 1834 ALS2CL Exon+ve, >2 cases Yes
loss 1428 COL7A1,UQCRC1 Exon+ve, >2 cases Yes
loss 1969 COL7A1,UQCRC1 Exon+ve, >2 cases Yes
loss 2035 COL7A1,UQCRC1 Exon+ve, >2 cases Yes
3 48611410 48667744 56334 loss 1969 TMEM89,CELSR3,SLC26A6,UQCRC1 Exon+ve, >2 cases Yes
3 48611410 48667744 56334 loss 2035 TMEM89,CELSR3,SLC26A6,UQCRC1 Exon+ve, >2 cases Yes
3 54504338 54514944 10606 gain 1293 CACNA2D3 Exon+ve, >2 cases Yes
3 54504338 54514944 10606 gain 1921 CACNA2D3 Exon+ve, >2 cases Yes
gain 1267 DNASE1L3 Exon+ve, >2 cases Yes
gain 1268 DNASE1L3 Exon+ve, >2 cases Yes
gain 1354 DNASE1L3 Exon+ve, >2 cases Yes
3 59891946 60045382 153436 gain 1936 FHIT Exon+ve, >2 cases Yes
3 59891946 60045382 153436 loss 1991 FHIT Exon+ve, >2 cases Yes
loss 1428 ADAMTS9 Exon+ve, >2 cases Yes
loss 1434 ADAMTS9 Exon+ve, >2 cases Yes
loss 1572 ADAMTS9 Exon+ve, >2 cases Yes
loss 1592 ADAMTS9 Exon+ve, >2 cases Yes
loss 1763 ADAMTS9 Exon+ve, >2 cases Yes
loss 1619 LOC255025 Exon+ve, >2 cases Yes
loss 1624 LOC255025 Exon+ve, >2 cases Yes
gain 1371 ARHGEF26 Exon+ve, distinct CNVs, same Gene Yes
gain 1446 ARHGEF26 Exon+ve, distinct CNVs, same Gene Yes
gain 1227 TFRC Exon+ve, >2 cases Yes
gain 1565 TFRC Exon+ve, >2 cases Yes
3 197289125 197410852 121727 gain 1227 LOC401109,TFRC,ZDHHC19 Exon+ve, >2 cases Yes
3 197289125 197410852 121727 gain 1565 LOC401109,TFRC,ZDHHC19 Exon+ve, >2 cases Yes
3 197516474 197531031 14557 gain 1227 TCTEX1D2,TM4SF19-TCTEX1D2 Exon+ve, >2 cases Yes
3 197516474 197531031 14557 gain 1565 TCTEX1D2,TM4SF19-TCTEX1D2 Exon+ve, >2 cases Yes
3 197537870 197560934 23064 gain 1565 TM4SF19,TM4SF19-TCTEX1D2 Exon+ve, >2 cases Yes
3 197537870 197560934 23064 gain 1775 TM4SF19,TM4SF19-TCTEX1D2 Exon+ve, >2 cases Yes
3 197712985 197736785 23800 loss 1546 RNF168,C3orf43 Exon+ve, >2 cases Yes
3 197712985 197736785 23800 gain 1565 RNF168,C3orf43 Exon+ve, >2 cases Yes
loss 1285 LRRC33 Exon+ve, >2 cases Yes
gain 1565 LRRC33 Exon+ve, >2 cases Yes
loss 1909 LRRC33 Exon+ve, >2 cases Yes
loss 2030 LRRC33 Exon+ve, >2 cases Yes
loss 1426 SLIT2 Exon+ve, >2 cases Yes
loss 1528 SLIT2 Exon+ve, >2 cases Yes
loss 1665 SLIT2 Exon+ve, >2 cases Yes
loss 1667 SLIT2 Exon+ve, >2 cases Yes
loss 1671 SLIT2 Exon+ve, >2 cases Yes
loss 1883 N4BP2 Exon+ve, >2 cases Yes
loss 1947 N4BP2 Exon+ve, >2 cases Yes
loss 1487 YIPF7 Exon+ve, >2 cases Yes
loss 1659 YIPF7 Exon+ve, >2 cases Yes
4 47314693 47335844 21151 loss 1308 CORIN Exon+ve, distinct CNVs, same Gene Yes
gain 1252 CORIN Exon+ve, >2 cases Yes
gain 1658 CORIN Exon+ve, >2 cases Yes
gain 1252 CORIN Exon+ve, distinct CNVs, same Gene Yes
loss 1529 CLOCK Exon+ve, >2 cases Yes
loss 1738 CLOCK Exon+ve, >2 cases Yes
loss 1221 UBA6 Exon+ve, >2 cases Yes
loss 1222 UBA6 Exon+ve, >2 cases Yes
4 70523201 70551081 27880 loss 1285 UGT2A2,UGT2A1 Exon+ve, >2 cases Yes
4 70523201 70551081 27880 loss 1433 UGT2A2,UGT2A1 Exon+ve, >2 cases Yes
4 74035932 74268619 232687 gain 1347 COX18,ANKRD17 Exon+ve, >2 cases Yes
4 74035932 74268619 232687 gain 1945 COX18,ANKRD17 Exon+ve, >2 cases Yes
loss 1373 ALB Exon+ve, >2 cases Yes
loss 1464 ALB Exon+ve, >2 cases Yes
loss 1798 ALB Exon+ve, >2 cases Yes
loss 1852 ALB Exon+ve, >2 cases Yes
loss 1959 ALB Exon+ve, >2 cases Yes
gain 1489 C4orf37 Exon+ve, distinct CNVs, same Gene Yes
4 99278436 99382350 103914 loss 1534 C4orf37 Exon+ve, distinct CNVs, same Gene Yes
4 100955189 100969192 14003 gain 1462 DAPP1 Exon+ve, >2 cases Yes
4 100955189 100969192 14003 gain 1913 DAPP1 Exon+ve, >2 cases Yes
4 100980535 101000511 19976 gain 1462 DAPP1 Exon+ve, >2 cases Yes
4 100980535 101000511 19976 gain 1913 DAPP1 Exon+ve, >2 cases Yes
4 101572938 101587882 14944 gain 1752 EMCN Exon+ve, >2 cases Yes
4 101572938 101587882 14944 gain 1867 EMCN Exon+ve, >2 cases Yes
loss 1280 TBCK Exon+ve, >2 cases Yes
loss 1933 TBCK Exon+ve, >2 cases Yes
4 149047165 149047423 258 loss 1498 ARHGAP10 Exon+ve, >2 cases Yes
4 149047165 149047423 258 loss 1916 ARHGAP10 Exon+ve, >2 cases Yes
gain 1281 PDLIM3 Exon+ve, >2 cases Yes
loss 1458 PDLIM3 Exon+ve, >2 cases Yes
gain 1691 TRIML1 Exon+ve, >2 cases Yes
gain 1704 TRIML1 Exon+ve, >2 cases Yes
4 191041482 191133608 92126 gain 1230 FRG1 Exon+ve, >2 cases Yes
4 191041482 191133608 92126 gain 1292 FRG1 Exon+ve, >2 cases Yes
4 191041482 191133608 92126 gain 1411 FRG1 Exon+ve, >2 cases Yes
10688337 10691335 2998 loss 1438 ANKRD33B Exon+ve, >2 cases Yes
loss 1619 ANKRD33B Exon+ve, >2 cases Yes
loss 1629 ANKRD33B Exon+ve, >2 cases Yes
loss 1630 ANKRD33B Exon+ve, >2 cases Yes
10688337 10691335 2998 loss 1666 ANKRD33B Exon+ve, >2 cases Yes
loss 1850 ANKRD33B Exon+ve, >2 cases Yes
loss 1998 ANKRD33B Exon+ve, >2 cases Yes
loss 2026 ANKRD33B Exon+ve, >2 cases Yes
gain 1282 TRIO Exon+ve, distinct CNVs, same Gene Yes
gain 1417 TRIO Exon+ve, distinct CNVs, same Gene Yes
5 37398626 37405778 7152 loss 1426 NUP155 Exon+ve, >2 cases Yes
gain 1765 NUP155 Exon+ve, >2 cases Yes
89477991 90081196 603205 gain 1786 LYSMD3,POLR3G,CETN3,MBLAC2,GPR98 Exon+ve, >2 cases Yes
89477991 90081196 603205 gain 1886 LYSMD3,POLR3G,CETN3,MBLAC2,GPR98 Exon+ve, >2 cases Yes
gain 1489 GPR98 Exon+ve, >2 cases Yes
gain 1786 GPR98 Exon+ve, >2 cases Yes
gain 1886 GPR98 Exon+ve, >2 cases Yes
gain 1786 GPR98 Exon+ve, >2 cases Yes
gain 1886 GPR98 Exon+ve, >2 cases Yes
gain 1281 GLRX Exon+ve, >2 cases Yes
gain 1824 GLRX Exon+ve, >2 cases Yes
5 122534134 122535395 1261 loss 1224 PRDM6 Exon+ve, >2 cases Yes
5 122534134 122535395 1261 loss 1548 PRDM6 Exon+ve, >2 cases Yes
5 122534134 122535395 1261 loss 1552 PRDM6 Exon+ve, >2 cases Yes
5 122534134 122535395 1261 loss 1681 PRDM6 Exon+ve, >2 cases Yes
5 122534134 122535395 1261 loss 1740 PRDM6 Exon+ve, >2 cases Yes
5 122534134 122535395 1261 loss 1763 PRDM6 Exon+ve, >2 cases Yes
5 122534134 122535395 1261 loss 1786 PRDM6 Exon+ve, >2 cases Yes
5 122534134 122535395 1261 loss 1807 PRDM6 Exon+ve, >2 cases Yes
5 122534134 122535395 1261 loss 1880 PRDM6 Exon+ve, >2 cases Yes
5 122534134 122535395 1261 loss 1881 PRDM6 Exon+ve, >2 cases Yes
5 122534134 122535395 1261 loss 1915 PRDM6 Exon+ve, >2 cases Yes
5 128326107 128331280 5173 loss 1248 SLC27A6 Exon+ve, >2 cases Yes
128326107 128331280 5173 loss 1699 SLC27A6 Exon+ve, >2 cases Yes
5 150506984 150518075 11091 loss 1433 ANXA6 Exon+ve, >2 cases Yes
5 150506984 150518075 11091 loss 1942 ANXA6 Exon+ve, >2 cases Yes
5 180189516 180365977 176461 loss 1532 BTNL3 Ctrl pos High OR Yes
5 180189516 180365977 176461 loss 1612 BTNL3 Ctrl pos High OR Yes
5 180189516 180365977 176461 loss 1686 BTNL3 Ctrl pos High OR Yes
loss 1229 LOC729678 Exon+ve, >2 cases Yes
loss 1532 LOC729678 Exon+ve, >2 cases Yes
loss 1548 LOC729678 Exon+ve, >2 cases Yes
loss 1612 LOC729678 Exon+ve, >2 cases Yes
loss 1686 LOC729678 Exon+ve, >2 cases Yes
loss 1861 LOC729678 Exon+ve, >2 cases Yes
5 180192214 180365977 173763 loss 1606 BTNL3 Ctrl pos High OR Yes
loss 1229 LOC729678 Exon+ve, >2 cases Yes
gain 1316 LOC729678 Exon+ve, >2 cases Yes
loss 1532 LOC729678 Exon+ve, >2 cases Yes
loss 1548 LOC729678 Exon+ve, >2 cases Yes
loss 1580 LOC729678 Exon+ve, >2 cases Yes
loss 1606 LOC729678 Exon+ve, >2 cases Yes
loss 1612 LOC729678 Exon+ve, >2 cases Yes
loss 1641 LOC729678 Exon+ve, >2 cases Yes
loss 1686 LOC729678 Exon+ve, >2 cases Yes
loss 1861 LOC729678 Exon+ve, >2 cases Yes
5 180194323 180378586 184263 loss 1429 BTNL3 Ctrl pos High OR Yes
5 180194323 180365977 171654 loss 1546 BTNL3 Ctrl pos High OR Yes
5 180194323 180378586 184263 loss 1634 BTNL3 Ctrl pos High OR Yes
5 180194323 180365977 171654 loss 1696 BTNL3 Ctrl pos High OR Yes
5 180194323 180365977 171654 loss 1792 BTNL3 Ctrl pos High OR Yes
5 180194323 180378586 184263 loss 1851 BTNL3 Ctrl pos High OR Yes
5 180194323 180378586 184263 loss 1902 BTNL3 Ctrl pos High OR Yes
180194323 180365977 171654 loss 1927 BTNL3 Ctrl pos High OR Yes
5 180344964 180365977 21013 loss 1261 BTNL3 Ctrl pos High OR Yes
5 180344964 180365977 21013 loss 1265 BTNL3 Ctrl pos High OR Yes
5 180344964 180378586 33622 loss 1268 BTNL3 Ctrl pos High OR Yes
5 180344964 180379663 34699 loss 1277 BTNL3 Ctrl pos High OR Yes
5 180344964 180378586 33622 loss 1354 BTNL3 Ctrl pos High OR Yes
ao 5 180344964 180365977 21013 Loss 1438 BTNL3 Ctrl pos High OR Yes 5 180344964 180378586 33622 Loss 1463 BTNL3 Ctrl pos High OR Yes 5 180344964 180365977 21013 Loss 1467 BTNL3 Ctrl pos High OR Yes 5 180344964 180365977 21013 Loss 1568 BTNL3 Ctrl pos High OR Yes 5 180344964 180365977 21013 Loss 1570 BTNL3 Ctrl pos High OR Yes 5 180344964 180365977 21013 Loss 1662 BTNL3 Ctrl pos High OR Yes 5 180344964 180365977 21013 Loss 1671 BTNL3 Ctrl pos High OR Yes 5 180344964 180365977 21013 Loss 1726 BTNL3 Ctrl pos High OR Yes P
5 180344964 180365977 21013 Loss 1769 BTNL3 Ctrl pos High OR Yes 5 180344964 180365977 21013 Loss 1799 BTNL3 Ctrl pos High OR Yes o's , 5 180344964 180378586 33622 Loss 1849 BTNL3 Ctrl pos High OR Yes 5 180346557 180365977 19420 Loss 1540 BTNL3 Ctrl pos High OR Yes .
, 5 180346557 180365977 19420 Loss 1754 BTNL3 Ctrl pos High OR Yes ' o, 5 180346557 180365977 19420 Loss 1755 BTNL3 Ctrl pos High OR Yes 5 180346557 180378586 32029 Loss 1942 BTNL3 Ctrl pos High OR , Yes 6 26539830 26571434 31604 gain 1842 BTN2A1,BTN3A3 Exon+ve, >2 cases Yes 6 26539830 26571434 31604 loss 1968 BTN2A1,BTN3A3 Exon+ve, >2 cases Yes loss 1275 SNORD32B Exon+ve, >2 cases Yes loss 1440 SNORD32B Exon+ve, >2 cases Yes loss 1750 SNORD32B Exon+ve, >2 cases Yes mo n loss 1862 SNORD32B Exon+ve, >2 cases Yes -3 loss 1244 HCG9 Exon+ve, >2 cases Yes ci) i.0 loss 1488 HCG9 Exon+ve, >2 cases Yes =
.., w loss 1297 CUTA,PHF1 Exon+ve, >2 cases Yes -i-r.) ul .1, ca a loss 1718 CUTA,PHF1 Exon+ve, >2 cases Yes loss 1841 CUTA,PHF1 Exon+ve, >2 cases Yes loss 1905 CUTA,PHF1 Exon+ve, >2 cases Yes "
=
.., loss 2031 CUTA,PHF1 Exon+ve, >2 cases Yes w , ¨, loss 2032 CUTA,PHF1 Exon+ve, >2 cases Yes "
=
=

loss 1297 CUTA Exon+ve, >2 cases Yes
loss 1718 CUTA Exon+ve, >2 cases Yes
loss 1841 CUTA Exon+ve, >2 cases Yes
loss 1872 CUTA Exon+ve, >2 cases Yes
loss 1905 CUTA Exon+ve, >2 cases Yes
loss 1967 CUTA Exon+ve, >2 cases Yes
loss 2031 CUTA Exon+ve, >2 cases Yes
loss 2032 CUTA Exon+ve, >2 cases Yes
loss 1297 SYNGAP1 Exon+ve, >2 cases Yes
loss 1718 SYNGAP1 Exon+ve, >2 cases Yes
loss 1824 SYNGAP1 Exon+ve, >2 cases Yes
loss 1840 SYNGAP1 Exon+ve, >2 cases Yes
loss 1841 SYNGAP1 Exon+ve, >2 cases Yes
loss 1872 SYNGAP1 Exon+ve, >2 cases Yes
loss 1905 SYNGAP1 Exon+ve, >2 cases Yes
loss 1967 SYNGAP1 Exon+ve, >2 cases Yes
loss 2031 SYNGAP1 Exon+ve, >2 cases Yes
loss 2032 SYNGAP1 Exon+ve, >2 cases Yes
loss 1680 C6orf126 Exon+ve, >2 cases Yes
loss 1694 C6orf126 Exon+ve, >2 cases Yes
loss 1718 C6orf126 Exon+ve, >2 cases Yes
loss 1852 C6orf126 Exon+ve, >2 cases Yes
loss 1950 C6orf126 Exon+ve, >2 cases Yes
loss 1965 C6orf126 Exon+ve, >2 cases Yes
loss 2006 C6orf126 Exon+ve, >2 cases Yes
loss 2018 C6orf126 Exon+ve, >2 cases Yes
loss 1680 C6orf126 Exon+ve, >2 cases Yes
loss 1694 C6orf126 Exon+ve, >2 cases Yes
loss 1718 C6orf126 Exon+ve, >2 cases Yes
loss 1852 C6orf126 Exon+ve, >2 cases Yes
loss 1940 C6orf126 Exon+ve, >2 cases Yes
loss 1946 C6orf126 Exon+ve, >2 cases Yes
loss 1950 C6orf126 Exon+ve, >2 cases Yes
loss 1958 C6orf126 Exon+ve, >2 cases Yes
loss 1961 C6orf126 Exon+ve, >2 cases Yes
loss 1962 C6orf126 Exon+ve, >2 cases Yes
loss 1965 C6orf126 Exon+ve, >2 cases Yes
loss 2005 C6orf126 Exon+ve, >2 cases Yes
loss 2006 C6orf126 Exon+ve, >2 cases Yes
loss 2018 C6orf126 Exon+ve, >2 cases Yes
loss 1301 C6orf127 Exon+ve, >2 cases Yes
loss 1680 C6orf127 Exon+ve, >2 cases Yes
loss 1694 C6orf127 Exon+ve, >2 cases Yes
loss 1718 C6orf127 Exon+ve, >2 cases Yes
loss 1837 C6orf127 Exon+ve, >2 cases Yes
loss 1839 C6orf127 Exon+ve, >2 cases Yes
loss 1852 C6orf127 Exon+ve, >2 cases Yes
loss 1940 C6orf127 Exon+ve, >2 cases Yes
loss 1946 C6orf127 Exon+ve, >2 cases Yes
loss 1950 C6orf127 Exon+ve, >2 cases Yes
loss 1952 C6orf127 Exon+ve, >2 cases Yes
loss 1958 C6orf127 Exon+ve, >2 cases Yes
loss 1959 C6orf127 Exon+ve, >2 cases Yes
loss 1961 C6orf127 Exon+ve, >2 cases Yes
loss 1962 C6orf127 Exon+ve, >2 cases Yes
loss 1965 C6orf127 Exon+ve, >2 cases Yes
loss 2005 C6orf127 Exon+ve, >2 cases Yes
loss 2006 C6orf127 Exon+ve, >2 cases Yes
loss 2018 C6orf127 Exon+ve, >2 cases Yes
gain 1638 CD109 Exon+ve, >2 cases Yes
gain 1894 CD109 Exon+ve, >2 cases Yes
loss 1426 HACE1 Exon+ve, >2 cases Yes
loss 1458 HACE1 Exon+ve, >2 cases Yes
loss 1490 HACE1 Exon+ve, >2 cases Yes
loss 1492 HACE1 Exon+ve, >2 cases Yes
loss 1500 HACE1 Exon+ve, >2 cases Yes
loss 1224 SGK1 Exon+ve, >2 cases Yes
loss 1576 SGK1 Exon+ve, >2 cases Yes
loss 1667 SGK1 Exon+ve, >2 cases Yes
loss 1708 SGK1 Exon+ve, >2 cases Yes
loss 1387 TXLNB Exon+ve, >2 cases Yes
loss 1396 TXLNB Exon+ve, >2 cases Yes
loss 1401 TXLNB Exon+ve, >2 cases Yes
loss 1403 TXLNB Exon+ve, >2 cases Yes
loss 1696 TXLNB Exon+ve, >2 cases Yes
loss 1895 TXLNB Exon+ve, >2 cases Yes
gain 1281 AIG1 Exon+ve, >2 cases Yes
gain 1372 AIG1 Exon+ve, >2 cases Yes
gain 1409 AIG1 Exon+ve, >2 cases Yes
gain 1619 AIG1 Exon+ve, >2 cases Yes
gain 1639 AIG1 Exon+ve, >2 cases Yes
gain 1281 AIG1 Exon+ve, >2 cases Yes
gain 1372 AIG1 Exon+ve, >2 cases Yes
gain 1409 AIG1 Exon+ve, >2 cases Yes
6 143696259 143697901 1642 gain 1429 AIG1 Exon+ve, >2 cases Yes
gain 1619 AIG1 Exon+ve, >2 cases Yes
gain 1639 AIG1 Exon+ve, >2 cases Yes
gain 1926 AIG1 Exon+ve, >2 cases Yes
gain 1281 AIG1 Exon+ve, >2 cases Yes
gain 1372 AIG1 Exon+ve, >2 cases Yes
gain 1409 AIG1 Exon+ve, >2 cases Yes
gain 1429 AIG1 Exon+ve, >2 cases Yes
gain 1619 AIG1 Exon+ve, >2 cases Yes
gain 1639 AIG1 Exon+ve, >2 cases Yes
gain 1905 AIG1 Exon+ve, >2 cases Yes
gain 1926 AIG1 Exon+ve, >2 cases Yes
loss 1291 RAB32 Exon+ve, >2 cases Yes
loss 1309 RAB32 Exon+ve, >2 cases Yes
loss 1535 RAB32 Exon+ve, >2 cases Yes
loss 1369 UST Exon+ve, >2 cases Yes
loss 1645 UST Exon+ve, >2 cases Yes
loss 1660 UST Exon+ve, >2 cases Yes
6 155530613 155545570 14957 loss 1347 TIAM2 Exon+ve, >2 cases Yes
6 155530613 155545570 14957 loss 1598 TIAM2 Exon+ve, >2 cases Yes
6 159190838 159203355 12517 loss 1468 OSTCL Exon+ve, >2 cases Yes
6 159190838 159203355 12517 loss 1582 OSTCL Exon+ve, >2 cases Yes
loss 1419 C6orf99 Exon+ve, >2 cases Yes
loss 1468 C6orf99 Exon+ve, >2 cases Yes
loss 1742 C6orf99 Exon+ve, >2 cases Yes
loss 1900 C6orf99 Exon+ve, >2 cases Yes
6 160247865 160248266 401 gain 1242 MAS1 Exon+ve, >2 cases Yes
6 160247865 160248266 401 gain 1571 MAS1 Exon+ve, >2 cases Yes
6 160247865 160248266 401 gain 1574 MAS1 Exon+ve, >2 cases Yes
6 160247865 160248266 401 gain 1870 MAS1 Exon+ve, >2 cases Yes
loss 1590 PDE10A Exon+ve, >2 cases Yes
gain 1760 PDE10A Exon+ve, >2 cases Yes
gain 1392 T Exon+ve, distinct CNVs, same Gene Yes
loss 1859 T Exon+ve, distinct CNVs, same Gene Yes
6 170683495 170701779 18284 gain 1729 PSMB1 Exon+ve, >2 cases Yes
6 170683495 170701779 18284 gain 1954 PSMB1 Exon+ve, >2 cases Yes
7 6004111 6006782 2671 gain 1266 PMS2 Exon+ve, >2 cases Yes
gain 1938 PMS2 Exon+ve, >2 cases Yes
7 45079997 45096030 16033 loss 1642 NACAD,CCM2 Exon+ve, >2 cases Yes
7 45079997 45096030 16033 loss 1819 NACAD,CCM2 Exon+ve, >2 cases Yes
7 45079997 45096030 16033 loss 1825 NACAD,CCM2 Exon+ve, >2 cases Yes
7 45079997 45096030 16033 loss 1907 NACAD,CCM2 Exon+ve, >2 cases Yes
loss 1886 ABCA13 Exon+ve, >2 cases Yes
loss 1891 ABCA13 Exon+ve, >2 cases Yes
loss 1439 ABCB4 Exon+ve, >2 cases Yes
loss 1579 ABCB4 Exon+ve, >2 cases Yes
7 89728688 89820179 91491 gain 1274 GTPBP10,C7orf63 Exon+ve, distinct CNVs, same Gene Yes
7 89824673 89852155 27482 gain 1864 GTPBP10 Exon+ve, distinct CNVs, same Gene Yes
7 91585706 91605955 20249 loss 1734 CYP51A1 Exon+ve, >2 cases Yes
7 91585706 91605955 20249 loss 1856 CYP51A1 Exon+ve, >2 cases Yes
gain 1411 LOC100289187 Exon+ve, >2 cases Yes
gain 1755 LOC100289187 Exon+ve, >2 cases Yes
gain 1799 LOC100289187 Exon+ve, >2 cases Yes
loss 1227 ZAN Exon+ve, >2 cases Yes
loss 1236 ZAN Exon+ve, >2 cases Yes
loss 1803 ZAN Exon+ve, >2 cases Yes
loss 1824 ZAN Exon+ve, >2 cases Yes
loss 1896 ZAN Exon+ve, >2 cases Yes
loss 2034 ZAN Exon+ve, >2 cases Yes
7 100967884 100979053 11169 loss 1680 EMID2 Exon+ve, >2 cases Yes
7 100967884 100979053 11169 loss 1820 EMID2 Exon+ve, >2 cases Yes
7 107049716 107067706 17990 loss 1321 BCAP29 Exon+ve, >2 cases Yes
7 107049716 107067706 17990 loss 1475 BCAP29 Exon+ve, >2 cases Yes
loss 1910 CADPS2 Exon+ve, distinct CNVs, same Gene Yes
loss 1354 CADPS2 Exon+ve, distinct CNVs, same Gene Yes
7 127640643 127675911 35268 gain 1266 LEP Exon+ve, >2 cases Yes
7 127640643 127675911 35268 gain 1733 LEP Exon+ve, >2 cases Yes
gain 1494 AKR1B15 Exon+ve, >2 cases Yes
gain 1783 AKR1B15 Exon+ve, >2 cases Yes
gain 1225 MGAM Exon+ve, >2 cases Yes
gain 1720 MGAM Exon+ve, >2 cases Yes
7 142041787 142083554 41767 loss 1232 MTRNR2L6 Exon+ve, >2 cases Yes
7 142041787 142083554 41767 loss 1242 MTRNR2L6 Exon+ve, >2 cases Yes
7 142041787 142083554 41767 loss 1347 MTRNR2L6 Exon+ve, >2 cases Yes
7 142041787 142083554 41767 loss 1349 MTRNR2L6 Exon+ve, >2 cases Yes
7 142041787 142083554 41767 loss 1374 MTRNR2L6 Exon+ve, >2 cases Yes
7 142041787 142083554 41767 loss 1568 MTRNR2L6 Exon+ve, >2 cases Yes
7 142041787 142083554 41767 loss 1601 MTRNR2L6 Exon+ve, >2 cases Yes
7 142041787 142083554 41767 loss 1697 MTRNR2L6 Exon+ve, >2 cases Yes
7 142041787 142083554 41767 loss 1753 MTRNR2L6 Exon+ve, >2 cases Yes
7 142041787 142083554 41767 loss 1784 MTRNR2L6 Exon+ve, >2 cases Yes
7 142041787 142083554 41767 loss 1803 MTRNR2L6 Exon+ve, >2 cases Yes
7 142041787 142083554 41767 loss 1837 MTRNR2L6 Exon+ve, >2 cases Yes
7 142041787 142083554 41767 loss 1930 MTRNR2L6 Exon+ve, >2 cases Yes
7 142041787 142083554 41767 loss 2018 MTRNR2L6 Exon+ve, >2 cases Yes
7 142041787 142083554 41767 loss 2024 MTRNR2L6 Exon+ve, >2 cases Yes
7 147702365 147710037 7672 Loss 1728 CNTNAP2 Ctrl pos High OR No
Loss 1227 CNTNAP2 Ctrl pos High OR No
Loss 1346 CNTNAP2 Ctrl pos High OR No
Loss 1371 CNTNAP2 Ctrl pos High OR No
Loss 1517 CNTNAP2 Ctrl pos High OR No
Loss 1617 CNTNAP2 Ctrl pos High OR No
Loss 1621 CNTNAP2 Ctrl pos High OR No
Loss 1636 CNTNAP2 Ctrl pos High OR No
Loss 1639 CNTNAP2 Ctrl pos High OR No
Loss 1645 CNTNAP2 Ctrl pos High OR No
Loss 1670 CNTNAP2 Ctrl pos High OR No
Loss 1727 CNTNAP2 Ctrl pos High OR No
Loss 1753 CNTNAP2 Ctrl pos High OR No
Loss 1754 CNTNAP2 Ctrl pos High OR No
Loss 1761 CNTNAP2 Ctrl pos High OR No
Loss 1792 CNTNAP2 Ctrl pos High OR No
Loss 1803 CNTNAP2 Ctrl pos High OR No
Loss 1806 CNTNAP2 Ctrl pos High OR No
Loss 1820 CNTNAP2 Ctrl pos High OR No
Loss 1826 CNTNAP2 Ctrl pos High OR No
Loss 1836 CNTNAP2 Ctrl pos High OR No
Loss 1854 CNTNAP2 Ctrl pos High OR No
Loss 1867 CNTNAP2 Ctrl pos High OR No
Loss 1872 CNTNAP2 Ctrl pos High OR No
Loss 1916 CNTNAP2 Ctrl pos High OR No
Loss 1918 CNTNAP2 Ctrl pos High OR No
Loss 1960 CNTNAP2 Ctrl pos High OR No
Loss 2003 CNTNAP2 Ctrl pos High OR No
Loss 2028 CNTNAP2 Ctrl pos High OR No
Loss 2041 CNTNAP2 Ctrl pos High OR No
7 147704200 147708382 4182 Gain 1220 CNTNAP2 Ctrl pos High OR No
Gain 1223 CNTNAP2 Ctrl pos High OR No
Gain 1230 CNTNAP2 Ctrl pos High OR No
Gain 1234 CNTNAP2 Ctrl pos High OR No
Gain 1240 CNTNAP2 Ctrl pos High OR No
Gain 1252 CNTNAP2 Ctrl pos High OR No
Gain 1281 CNTNAP2 Ctrl pos High OR No
Gain 1282 CNTNAP2 Ctrl pos High OR No
Gain 1284 CNTNAP2 Ctrl pos High OR No
Gain 1286 CNTNAP2 Ctrl pos High OR No
Gain 1290 CNTNAP2 Ctrl pos High OR No
Gain 1307 CNTNAP2 Ctrl pos High OR No
Gain 1308 CNTNAP2 Ctrl pos High OR No
Gain 1309 CNTNAP2 Ctrl pos High OR No
Gain 1318 CNTNAP2 Ctrl pos High OR No
Gain 1320 CNTNAP2 Ctrl pos High OR No
Gain 1345 CNTNAP2 Ctrl pos High OR No
Gain 1389 CNTNAP2 Ctrl pos High OR No
Gain 1405 CNTNAP2 Ctrl pos High OR No
Gain 1415 CNTNAP2 Ctrl pos High OR No
Gain 1421 CNTNAP2 Ctrl pos High OR No
Gain 1422 CNTNAP2 Ctrl pos High OR No
Gain 1423 CNTNAP2 Ctrl pos High OR No
Gain 1425 CNTNAP2 Ctrl pos High OR No
Gain 1432 CNTNAP2 Ctrl pos High OR No
Gain 1434 CNTNAP2 Ctrl pos High OR No
Gain 1438 CNTNAP2 Ctrl pos High OR No
Gain 1440 CNTNAP2 Ctrl pos High OR No
Gain 1442 CNTNAP2 Ctrl pos High OR No
Gain 1463 CNTNAP2 Ctrl pos High OR No
7 147704200 147708382 4182 Gain 1466 CNTNAP2 Ctrl pos High OR No
Gain 1472 CNTNAP2 Ctrl pos High OR No
Gain 1473 CNTNAP2 Ctrl pos High OR No
Gain 1490 CNTNAP2 Ctrl pos High OR No
Gain 1492 CNTNAP2 Ctrl pos High OR No
Gain 1495 CNTNAP2 Ctrl pos High OR No
Gain 1496 CNTNAP2 Ctrl pos High OR No
Gain 1497 CNTNAP2 Ctrl pos High OR No
Gain 1498 CNTNAP2 Ctrl pos High OR No
Gain 1502 CNTNAP2 Ctrl pos High OR No
Gain 1504 CNTNAP2 Ctrl pos High OR No
Gain 1506 CNTNAP2 Ctrl pos High OR No
Gain 1508 CNTNAP2 Ctrl pos High OR No
Gain 1512 CNTNAP2 Ctrl pos High OR No
Gain 1513 CNTNAP2 Ctrl pos High OR No
Gain 1514 CNTNAP2 Ctrl pos High OR No
Gain 1515 CNTNAP2 Ctrl pos High OR No
Gain 1519 CNTNAP2 Ctrl pos High OR No
Gain 1520 CNTNAP2 Ctrl pos High OR No
Gain 1528 CNTNAP2 Ctrl pos High OR No
Gain 1534 CNTNAP2 Ctrl pos High OR No
Gain 1543 CNTNAP2 Ctrl pos High OR No
Gain 1544 CNTNAP2 Ctrl pos High OR No
Gain 1556 CNTNAP2 Ctrl pos High OR No
Gain 1557 CNTNAP2 Ctrl pos High OR No
Gain 1558 CNTNAP2 Ctrl pos High OR No
Gain 1559 CNTNAP2 Ctrl pos High OR No
Gain 1560 CNTNAP2 Ctrl pos High OR No
Gain 1565 CNTNAP2 Ctrl pos High OR No
Gain 1570 CNTNAP2 Ctrl pos High OR No
7 147704200 147708382 4182 Gain 1571 CNTNAP2 Ctrl pos High OR No
Gain 1573 CNTNAP2 Ctrl pos High OR No
Gain 1584 CNTNAP2 Ctrl pos High OR No
Gain 1586 CNTNAP2 Ctrl pos High OR No
Gain 1592 CNTNAP2 Ctrl pos High OR No
Gain 1597 CNTNAP2 Ctrl pos High OR No
Gain 1601 CNTNAP2 Ctrl pos High OR No
Gain 1602 CNTNAP2 Ctrl pos High OR No
Gain 1603 CNTNAP2 Ctrl pos High OR No
Gain 1610 CNTNAP2 Ctrl pos High OR No
Gain 1618 CNTNAP2 Ctrl pos High OR No
Gain 1619 CNTNAP2 Ctrl pos High OR No
Gain 1620 CNTNAP2 Ctrl pos High OR No
Gain 1622 CNTNAP2 Ctrl pos High OR No
Gain 1624 CNTNAP2 Ctrl pos High OR No
Gain 1626 CNTNAP2 Ctrl pos High OR No
Gain 1632 CNTNAP2 Ctrl pos High OR No
Gain 1640 CNTNAP2 Ctrl pos High OR No
Gain 1641 CNTNAP2 Ctrl pos High OR No
Gain 1647 CNTNAP2 Ctrl pos High OR No
Gain 1650 CNTNAP2 Ctrl pos High OR No
Gain 1653 CNTNAP2 Ctrl pos High OR No
Gain 1654 CNTNAP2 Ctrl pos High OR No
Gain 1662 CNTNAP2 Ctrl pos High OR No
Gain 1667 CNTNAP2 Ctrl pos High OR No
Gain 1688 CNTNAP2 Ctrl pos High OR No
Gain 1707 CNTNAP2 Ctrl pos High OR No
Gain 1708 CNTNAP2 Ctrl pos High OR No
Gain 1710 CNTNAP2 Ctrl pos High OR No
Gain 1715 CNTNAP2 Ctrl pos High OR No
7 147704200 147708382 4182 Gain 1720 CNTNAP2 Ctrl pos High OR No
Gain 1755 CNTNAP2 Ctrl pos High OR No
Gain 1760 CNTNAP2 Ctrl pos High OR No
Gain 1774 CNTNAP2 Ctrl pos High OR No
Gain 1779 CNTNAP2 Ctrl pos High OR No
Gain 1782 CNTNAP2 Ctrl pos High OR No
Gain 1783 CNTNAP2 Ctrl pos High OR No
Gain 1784 CNTNAP2 Ctrl pos High OR No
Gain 1796 CNTNAP2 Ctrl pos High OR No
Gain 1804 CNTNAP2 Ctrl pos High OR No
Gain 1805 CNTNAP2 Ctrl pos High OR No
Gain 1808 CNTNAP2 Ctrl pos High OR No
Gain 1811 CNTNAP2 Ctrl pos High OR No
Gain 1813 CNTNAP2 Ctrl pos High OR No
Gain 1814 CNTNAP2 Ctrl pos High OR No
Gain 1815 CNTNAP2 Ctrl pos High OR No
Gain 1818 CNTNAP2 Ctrl pos High OR No
Gain 1831 CNTNAP2 Ctrl pos High OR No
Gain 1832 CNTNAP2 Ctrl pos High OR No
Gain 1835 CNTNAP2 Ctrl pos High OR No
Gain 1838 CNTNAP2 Ctrl pos High OR No
Gain 1839 CNTNAP2 Ctrl pos High OR No
Gain 1845 CNTNAP2 Ctrl pos High OR No
Gain 1851 CNTNAP2 Ctrl pos High OR No
Gain 1861 CNTNAP2 Ctrl pos High OR No
Gain 1874 CNTNAP2 Ctrl pos High OR No
Gain 1877 CNTNAP2 Ctrl pos High OR No
Gain 1881 CNTNAP2 Ctrl pos High OR No
Gain 1883 CNTNAP2 Ctrl pos High OR No
Gain 1893 CNTNAP2 Ctrl pos High OR No
7 147704200 147707161 2961 Gain 1895 CNTNAP2 Ctrl pos High OR No
Gain 1905 CNTNAP2 Ctrl pos High OR No
Gain 1907 CNTNAP2 Ctrl pos High OR No
Gain 1927 CNTNAP2 Ctrl pos High OR No
Gain 1930 CNTNAP2 Ctrl pos High OR No
Gain 1944 CNTNAP2 Ctrl pos High OR No
Gain 1948 CNTNAP2 Ctrl pos High OR No
Gain 1951 CNTNAP2 Ctrl pos High OR No
Gain 1970 CNTNAP2 Ctrl pos High OR No
Gain 1994 CNTNAP2 Ctrl pos High OR No
Gain 1997 CNTNAP2 Ctrl pos High OR No
Gain 2006 CNTNAP2 Ctrl pos High OR No
Gain 2024 CNTNAP2 Ctrl pos High OR No
Gain 2026 CNTNAP2 Ctrl pos High OR No
Gain 2034 CNTNAP2 Ctrl pos High OR No
loss 1346 CNTNAP2 Exon+ve, >2 cases Yes
loss 1403 CNTNAP2 Exon+ve, >2 cases Yes
loss 1988 CNTNAP2 Exon+ve, >2 cases Yes
7 153158956 153290833 131877 gain 1486 DPP6 Exon+ve, >2 cases Yes
7 153158956 153290833 131877 gain 1730 DPP6 Exon+ve, >2 cases Yes
7 153158956 153290833 131877 gain 1755 DPP6 Exon+ve, >2 cases Yes
7 153290834 153384745 93911 gain 1730 DPP6 Exon+ve, >2 cases Yes
7 153290834 153384745 93911 gain 1755 DPP6 Exon+ve, >2 cases Yes
7 153742206 153775545 33339 gain 1730 DPP6 Exon+ve, >2 cases Yes
7 153742206 153775545 33339 loss 1885 DPP6 Exon+ve, >2 cases Yes
7 153798366 153819463 21097 gain 1730 DPP6 Exon+ve, >2 cases Yes
7 153798366 153819463 21097 loss 1949 DPP6 Exon+ve, >2 cases Yes
60483 gain 1572 DEFA6,DEFB1 Exon+ve, >2 cases Yes
60483 loss 1621 DEFA6,DEFB1 Exon+ve, >2 cases Yes
loss 1663 PINX1 Exon+ve, >2 cases Yes
gain 2042 PINX1 Exon+ve, >2 cases Yes
8 10670976 10732498 61522 loss 1663 PINX1,MIR1322 Exon+ve, >2 cases Yes
8 10670976 10732498 61522 gain 2042 PINX1,MIR1322 Exon+ve, >2 cases Yes
8 22631429 22641498 10069 loss 1293 PEBP4 Exon+ve, >2 cases Yes
8 22631429 22641498 10069 loss 1296 PEBP4 Exon+ve, >2 cases Yes
8 22631429 22641498 10069 loss 1842 PEBP4 Exon+ve, >2 cases Yes
8 22631429 22641498 10069 loss 1849 PEBP4 Exon+ve, >2 cases Yes
loss 1251 AP3M2 Exon+ve, distinct CNVs, same Gene Yes
gain 1634 AP3M2 Exon+ve, distinct CNVs, same Gene Yes
8 43057445 43170237 112792 gain 1406 HGSNAT,ENTA,SGK196 Exon+ve, >2 cases Yes
8 43057445 43170237 112792 gain 1695 HGSNAT,ENTA,SGK196 Exon+ve, >2 cases Yes
gain 1316 POTEA Exon+ve, >2 cases Yes
gain 1406 POTEA Exon+ve, >2 cases Yes
loss 1549 POTEA Exon+ve, >2 cases Yes
gain 1695 POTEA Exon+ve, >2 cases Yes
loss 1604 RGS20 Exon+ve, >2 cases Yes
loss 1993 RGS20 Exon+ve, >2 cases Yes
loss 1275 MYBL1 Exon+ve, >2 cases Yes
loss 1650 MYBL1 Exon+ve, >2 cases Yes
loss 1638 SNX16 Exon+ve, >2 cases Yes
loss 1950 SNX16 Exon+ve, >2 cases Yes
gain 1854 NDRG1 Exon+ve, distinct CNVs, same Gene Yes
loss 1552 NDRG1 Exon+ve, distinct CNVs, same Gene Yes
13668 gain 1463 KIAA1432 Exon+ve, >2 cases Yes
13668 gain 1667 KIAA1432 Exon+ve, >2 cases Yes
13668 gain 1818 KIAA1432 Exon+ve, >2 cases Yes
23568 loss 1609 GLDC Exon+ve, distinct CNVs, same Gene Yes
loss 1391 GLDC Exon+ve, distinct CNVs, same Gene Yes
loss 1386 C9orf93 Exon+ve, >2 cases Yes
loss 1477 C9orf93 Exon+ve, >2 cases Yes
loss 1594 C9orf93 Exon+ve, >2 cases Yes
loss 1881 C9orf93 Exon+ve, >2 cases Yes
9 17260655 17271186 10531 loss 1743 CNTLN Exon+ve, distinct CNVs, same Gene Yes
loss 1502 CNTLN Exon+ve, distinct CNVs, same Gene Yes
loss 1418 SLC24A2 Exon+ve, >2 cases Yes
loss 1511 SLC24A2 Exon+ve, >2 cases Yes
loss 1418 IFNA22P Exon+ve, >2 cases Yes
gain 2020 IFNA22P Exon+ve, >2 cases Yes
loss 1418 KLHL9 Exon+ve, >2 cases Yes
loss 1687 KLHL9 Exon+ve, >2 cases Yes
9 21422879 21434788 11909 loss 1418 IFNA1 Exon+ve, >2 cases Yes
9 21422879 21434788 11909 loss 1777 IFNA1 Exon+ve, >2 cases Yes
loss 1539 PLAA Exon+ve, >2 cases Yes
loss 1656 PLAA Exon+ve, >2 cases Yes
loss 2003 DDX58 Exon+ve, distinct CNVs, same Gene Yes
loss 1645 DDX58 Exon+ve, distinct CNVs, same Gene Yes
gain 1716 GNE Exon+ve, >2 cases Yes
gain 1829 GNE Exon+ve, >2 cases Yes
gain 1793 C9orf85 Exon+ve, >2 cases Yes
gain 1883 C9orf85 Exon+ve, >2 cases Yes
gain 1893 C9orf85 Exon+ve, >2 cases Yes
9 79049925 79067111 17186 gain 1782 VPS13A Exon+ve, >2 cases Yes
9 79049925 79067111 17186 gain 1897 VPS13A Exon+ve, >2 cases Yes
9 79049925 79067111 17186 gain 1938 VPS13A Exon+ve, >2 cases Yes
9 92596909 92617806 20897 gain 1423 SYK Exon+ve, distinct CNVs, same Gene Yes
9 92658019 92700662 42643 gain 1626 SYK Exon+ve, distinct CNVs, same Gene Yes
9 98831789 98831814 25 gain 1629 CTSL2 Exon+ve, >2 cases Yes
9 98831789 98831814 25 loss 1715 CTSL2 Exon+ve, >2 cases Yes
9 98831789 98831814 25 loss 1718 CTSL2 Exon+ve, >2 cases Yes
9 115858589 115903754 45165 gain 1406 ZNF618,AMBP,KIF12 Exon+ve, >2 cases Yes
9 115858589 115903754 45165 gain 2020 ZNF618,AMBP,KIF12 Exon+ve, >2 cases Yes
9 116088109 116118906 30797 gain 1406 COL27A1 Exon+ve, >2 cases Yes
9 116088109 116118906 30797 gain 2020 COL27A1 Exon+ve, >2 cases Yes
loss 1301 AKNA Exon+ve, >2 cases Yes
gain 2020 AKNA Exon+ve, >2 cases Yes
9 118405993 118469712 63719 loss 1622 ASTN2 Exon+ve, distinct CNVs, same Gene Yes
9 118469713 118507633 37920 loss 1559 ASTN2,TRIM32 Exon+ve, >2 cases Yes
9 118469713 118507633 37920 loss 1622 ASTN2,TRIM32 Exon+ve, >2 cases Yes
loss 1559 ASTN2 Exon+ve, distinct CNVs, same Gene Yes
9 127014097 127028444 14347 loss 1222 RABEPK Exon+ve, >2 cases Yes
9 127014097 127028444 14347 loss 1669 RABEPK Exon+ve, >2 cases Yes
loss 1621 LAMC3 Exon+ve, >2 cases Yes
loss 1639 LAMC3 Exon+ve, >2 cases Yes
loss 1720 LAMC3 Exon+ve, >2 cases Yes
loss 1345 LAMC3 Exon+ve, >2 cases Yes
9 132912215 132916079 3864 loss 1621 LAMC3 Exon+ve, >2 cases Yes
loss 1639 LAMC3 Exon+ve, >2 cases Yes
loss 1720 LAMC3 Exon+ve, >2 cases Yes
loss 1345 LAMC3 Exon+ve, >2 cases Yes
loss 1621 LAMC3 Exon+ve, >2 cases Yes
loss 1639 LAMC3 Exon+ve, >2 cases Yes
loss 1720 LAMC3 Exon+ve, >2 cases Yes
loss 1897 LAMC3 Exon+ve, >2 cases Yes
loss 1321 CEL Exon+ve, >2 cases Yes
gain 1887 CEL Exon+ve, >2 cases Yes
loss 1307 FBXO18 Exon+ve, >2 cases Yes
2901 loss 1409 FBXO18 Exon+ve, >2 cases Yes
2901 loss 1619 FBXO18 Exon+ve, >2 cases Yes
2901 loss 1654 FBXO18 Exon+ve, >2 cases Yes
2901 loss 2024 FBXO18 Exon+ve, >2 cases Yes
4791 loss 1307 FBXO18 Exon+ve, >2 cases Yes
4791 loss 1409 FBXO18 Exon+ve, >2 cases Yes
4791 loss 1619 FBXO18 Exon+ve, >2 cases Yes
4791 loss 1654 FBXO18 Exon+ve, >2 cases Yes
gain 1401 ARHGAP21 Exon+ve, >2 cases Yes
loss 1548 ARHGAP21 Exon+ve, >2 cases Yes
loss 1699 ARHGAP21 Exon+ve, >2 cases Yes
loss 1724 ARHGAP21 Exon+ve, >2 cases Yes
gain 1820 ARHGAP21 Exon+ve, >2 cases Yes
loss 1961 ARHGAP21 Exon+ve, >2 cases Yes
10 25051426 25057232 5806 gain 1401 ARHGAP21 Exon+ve, >2 cases Yes
gain 1820 ARHGAP21 Exon+ve, >2 cases Yes
22149 gain 1299 ZNF37BP Exon+ve, >2 cases Yes
22149 gain 1746 ZNF37BP Exon+ve, >2 cases Yes
10 42955952 43009997 54045 gain 1746 RASGEF1A,CSGALNACT2 Exon+ve, >2 cases Yes
10 42955952 43009997 54045 gain 1968 RASGEF1A,CSGALNACT2 Exon+ve, >2 cases Yes
38566 gain 1295 LOC100133308 Exon+ve, >2 cases Yes
38566 gain 1968 LOC100133308 Exon+ve, >2 cases Yes
gain 1408 ANUBL1 Exon+ve, >2 cases Yes
10 45478103 45487334 9231 gain 1653 ANUBL1 Exon+ve, >2 cases Yes
55328218 55334606 6388 gain 1309 PCDH15 Exon+ve, >2 cases Yes
gain 1429 PCDH15 Exon+ve, >2 cases Yes
gain 1429 PCDH15 Exon+ve, >2 cases Yes
loss 1475 PCDH15 Exon+ve, >2 cases Yes
loss 1537 PCDH15 Exon+ve, >2 cases Yes
96041 loss 1835 CTNNA3 Exon+ve, distinct CNVs, same Gene Yes
loss 1970 CTNNA3 Exon+ve, distinct CNVs, same Gene Yes
82565 gain 1780 CTNNA3 Exon+ve, distinct CNVs, same Gene Yes
gain 1292 ATRNL1 Exon+ve, >2 cases Yes
gain 1394 ATRNL1 Exon+ve, >2 cases Yes
gain 1834 ATRNL1 Exon+ve, >2 cases Yes
gain 1880 ATRNL1 Exon+ve, >2 cases Yes
gain 1924 ATRNL1 Exon+ve, >2 cases Yes
loss 1287 PNLIPRP3 Exon+ve, >2 cases Yes
gain 2036 PNLIPRP3 Exon+ve, >2 cases Yes
loss 1572 EBF3 Exon+ve, >2 cases Yes
gain 1597 EBF3 Exon+ve, >2 cases Yes
gain 1644 EBF3 Exon+ve, >2 cases Yes
loss 1691 EBF3 Exon+ve, >2 cases Yes
loss 1703 EBF3 Exon+ve, >2 cases Yes
loss 1704 EBF3 Exon+ve, >2 cases Yes
gain 1709 EBF3 Exon+ve, >2 cases Yes
loss 1724 EBF3 Exon+ve, >2 cases Yes
11 5766616 5774108 7492 gain 1394 OR52N1 Exon+ve, >2 cases Yes
7492 gain 1536 OR52N1 Exon+ve, >2 cases Yes
7492 gain 1538 OR52N1 Exon+ve, >2 cases Yes
7492 gain 1551 OR52N1 Exon+ve, >2 cases Yes
gain 1727 OR52N1 Exon+ve, >2 cases Yes
7492 gain 1821 OR52N1 Exon+ve, >2 cases Yes
7492 gain 1823 OR52N1 Exon+ve, >2 cases Yes
7492 gain 1824 OR52N1 Exon+ve, >2 cases Yes
7492 gain 1825 OR52N1 Exon+ve, >2 cases Yes
7492 gain 1902 OR52N1 Exon+ve, >2 cases Yes
43094 gain 1301 OR52E4 Exon+ve, >2 cases Yes
43094 gain 1333 OR52E4 Exon+ve, >2 cases Yes
43094 gain 1593 OR52E4 Exon+ve, >2 cases Yes
43094 gain 1920 OR52E4 Exon+ve, >2 cases Yes
gain 1609 ANO5 Exon+ve, >2 cases Yes
loss 2001 ANO5 Exon+ve, >2 cases Yes
gain 1324 C11orf96 Exon+ve, >2 cases Yes
loss 1396 C11orf96 Exon+ve, >2 cases Yes
gain 1530 C11orf96 Exon+ve, >2 cases Yes
loss 1829 C11orf96 Exon+ve, >2 cases Yes
gain 1860 C11orf96 Exon+ve, >2 cases Yes
loss 1874 C11orf96 Exon+ve, >2 cases Yes
gain 1996 C11orf96 Exon+ve, >2 cases Yes
13202 loss 1798 C11orf49,ARFGAP2,PACSIN3 Exon+ve, >2 cases Yes
13202 loss 1852 C11orf49,ARFGAP2,PACSIN3 Exon+ve, >2 cases Yes
13202 loss 1854 C11orf49,ARFGAP2,PACSIN3 Exon+ve, >2 cases Yes
13202 loss 1855 C11orf49,ARFGAP2,PACSIN3 Exon+ve, >2 cases Yes
13202 loss 1857 C11orf49,ARFGAP2,PACSIN3 Exon+ve, >2 cases Yes
13202 loss 1936 C11orf49,ARFGAP2,PACSIN3 Exon+ve, >2 cases Yes
13202 loss 2031 C11orf49,ARFGAP2,PACSIN3 Exon+ve, >2 cases Yes
45193 gain 1708 OR4A5 Exon+ve, >2 cases Yes
45193 gain 1943 OR4A5 Exon+ve, >2 cases Yes
loss 1776 RARRES3 Exon+ve, >2 cases Yes
loss 1950 RARRES3 Exon+ve, >2 cases Yes
loss 1958 RIN1 Exon+ve, >2 cases Yes
loss 1993 RIN1 Exon+ve, >2 cases Yes
11 70167828 70206326 38498 loss 1835 SHANK2 Special Yes
loss 1349 CEP57 Exon+ve, >2 cases Yes
loss 1946 CEP57 Exon+ve, >2 cases Yes
11 99646264 99660303 14039 loss 1936 CNTN5 Special Yes
loss 1276 BTG4 Exon+ve, >2 cases Yes
loss 1465 BTG4 Exon+ve, >2 cases Yes
gain 1713 KIRREL3 Exon+ve, >2 cases Yes
gain 1861 KIRREL3 Exon+ve, >2 cases Yes
gain 1429 ETS1 Exon+ve, >2 cases Yes
gain 1779 ETS1 Exon+ve, >2 cases Yes
12 8173177 8179355 6178 gain 1246 POU5F1P3,CLEC4A Exon+ve, >2 cases Yes
6178 gain 1308 POU5F1P3,CLEC4A Exon+ve, >2 cases Yes
1521 loss 1264 CLECL1 Exon+ve, >2 cases Yes
1521 loss 1705 CLECL1 Exon+ve, >2 cases Yes
loss 1225 SLCO1B3 Exon+ve, >2 cases Yes
loss 1488 SLCO1B3 Exon+ve, >2 cases Yes
loss 1577 SLCO1B3 Exon+ve, >2 cases Yes
loss 1581 SLCO1B3 Exon+ve, >2 cases Yes
12 21514182 21516409 2227 gain 1465 RECQL,PYROXD1 Exon+ve, >2 cases Yes
12 21514182 21516409 2227 gain 1925 RECQL,PYROXD1 Exon+ve, >2 cases Yes
59229 gain 1768 ANKRD33 Exon+ve, >2 cases Yes
59229 gain 1836 ANKRD33 Exon+ve, >2 cases Yes
17929 loss 1844 KRT6C Exon+ve, >2 cases Yes
17929 loss 2037 KRT6C Exon+ve, >2 cases Yes
loss 1447 ELK3 Exon+ve, >2 cases Yes
loss 1728 ELK3 Exon+ve, >2 cases Yes
loss 1742 ELK3 Exon+ve, >2 cases Yes
loss 1957 ELK3 Exon+ve, >2 cases Yes
loss 1961 ELK3 Exon+ve, >2 cases Yes
loss 1965 ELK3 Exon+ve, >2 cases Yes
loss 1967 ELK3 Exon+ve, >2 cases Yes
loss 1872 ANKS1B Exon+ve, >2 cases Yes
loss 1884 ANKS1B Exon+ve, >2 cases Yes
loss 1279 GIT2 Exon+ve, >2 cases Yes
loss 1665 GIT2 Exon+ve, >2 cases Yes
12 110666479 110799506 133027 gain 1763 ACAD10,MAPKAPK5,C12orf47,ALDH2 Exon+ve, >2 cases Yes
12 110666479 110799506 133027 gain 2022 ACAD10,MAPKAPK5,C12orf47,ALDH2 Exon+ve, >2 cases Yes
loss 1416 ULK1 Exon+ve, >2 cases Yes
gain 1448 ULK1 Exon+ve, >2 cases Yes
loss 1471 ULK1 Exon+ve, >2 cases Yes
loss 1474 ULK1 Exon+ve, >2 cases Yes
loss 1492 ULK1 Exon+ve, >2 cases Yes
loss 1493 ULK1 Exon+ve, >2 cases Yes
loss 1496 ULK1 Exon+ve, >2 cases Yes
loss 1497 ULK1 Exon+ve, >2 cases Yes
loss 1498 ULK1 Exon+ve, >2 cases Yes
loss 1500 ULK1 Exon+ve, >2 cases Yes
loss 1505 ULK1 Exon+ve, >2 cases Yes
loss 1517 ULK1 Exon+ve, >2 cases Yes
loss 1566 ULK1 Exon+ve, >2 cases Yes
loss 1579 ULK1 Exon+ve, >2 cases Yes
loss 1580 ULK1 Exon+ve, >2 cases Yes
loss 1582 ULK1 Exon+ve, >2 cases Yes
13 22323381 22381531 58150 gain 1662 BASP1P1 Exon+ve, >2 cases Yes
58150 loss 1714 BASP1P1 Exon+ve, >2 cases Yes
58150 gain 1744 BASP1P1 Exon+ve, >2 cases Yes
58150 loss 1919 BASP1P1 Exon+ve, >2 cases Yes
gain 1564 C13orf38-SOHLH2,C13orf38 Exon+ve, >2 cases Yes
gain 1803 C13orf38-SOHLH2,C13orf38 Exon+ve, >2 cases Yes
loss 1536 EPSTI1 Exon+ve, distinct CNVs, same Gene Yes
gain 1502 EPSTI1 Exon+ve, distinct CNVs, same Gene Yes
48219 gain 1502 EPSTI1 Exon+ve, >2 cases Yes
48219 gain 1897 EPSTI1 Exon+ve, >2 cases Yes
18347 gain 1897 EPSTI1 Exon+ve, distinct CNVs, same Gene Yes
14 22811680 22814547 2867 gain 1642 HOMEZ Exon+ve, >2 cases Yes
gain 1875 HOMEZ Exon+ve, >2 cases Yes
14 22929952 22958797 28845 Loss 1537 MYH6 Ctrl pos High OR Yes
14 22929952 22959469 29517 Loss 1669 MYH6 Ctrl pos High OR Yes
Loss 1577 MYH6 Ctrl pos High OR Yes
12208 Loss 1856 MYH6 Ctrl pos High OR Yes
Loss 1718 MYH6 Ctrl pos High OR Yes
Loss 1802 MYH6 Ctrl pos High OR Yes
Loss 1816 MYH6 Ctrl pos High OR Yes
Loss 1817 MYH6 Ctrl pos High OR Yes
Loss 1819 MYH6 Ctrl pos High OR Yes
Loss 1820 MYH6 Ctrl pos High OR Yes
Loss 1850 MYH6 Ctrl pos High OR Yes
Loss 1895 MYH6 Ctrl pos High OR Yes
Loss 1993 MYH6 Ctrl pos High OR Yes
14 22946615 22955470 8855 Loss 2032 MYH6 Ctrl pos High OR Yes
Loss 2043 MYH6 Ctrl pos High OR Yes
loss 1775 HECTD1 Exon+ve, distinct CNVs, same Gene Yes
loss 1403 HECTD1 Exon+ve, distinct CNVs, same Gene Yes
10665 loss 1570 MIR548Y Exon+ve, >2 cases Yes
10665 gain 1709 MIR548Y Exon+ve, >2 cases Yes
loss 1226 C14orf166 Exon+ve, >2 cases Yes
loss 1253 C14orf166 Exon+ve, >2 cases Yes
loss 1650 C14orf166 Exon+ve, >2 cases Yes
loss 1269 SLC38A6 Exon+ve, >2 cases Yes
gain 1281 SLC38A6 Exon+ve, >2 cases Yes
loss 1470 SLC38A6 Exon+ve, >2 cases Yes
gain 1773 SLC38A6 Exon+ve, >2 cases Yes
loss 2000 SLC38A6 Exon+ve, >2 cases Yes
10106 loss 1852 UPF0639 Exon+ve, >2 cases Yes
10106 loss 1871 UPF0639 Exon+ve, >2 cases Yes
loss 1314 MAP3K9 Exon+ve, >2 cases Yes
loss 1910 MAP3K9 Exon+ve, >2 cases Yes
loss 2001 MAP3K9 Exon+ve, >2 cases Yes
loss 2002 MAP3K9 Exon+ve, >2 cases Yes
gain 1291 HEATR4 Exon+ve, >2 cases Yes
loss 1806 HEATR4 Exon+ve, >2 cases Yes
loss 1237 HEATR4 Exon+ve, >2 cases Yes
gain 1291 HEATR4 Exon+ve, >2 cases Yes
loss 1237 HEATR4 Exon+ve, >2 cases Yes
gain 1291 HEATR4 Exon+ve, >2 cases Yes
loss 1676 HEATR4 Exon+ve, >2 cases Yes
loss 1687 HEATR4 Exon+ve, >2 cases Yes
loss 1718 HEATR4 Exon+ve, >2 cases Yes
loss 1721 HEATR4 Exon+ve, >2 cases Yes
59617 loss 1908 NRXN3 Exon+ve, distinct CNVs, same Gene Yes
25126 loss 2036 NRXN3 Exon+ve, distinct CNVs, same Gene Yes
gain 1790 SLC25A29 Exon+ve, distinct CNVs, same Gene Yes
loss 1705 SLC25A29 Exon+ve, distinct CNVs, same Gene Yes
gain 1447 TRAF3 Exon+ve, >2 cases Yes
gain 1838 TRAF3 Exon+ve, >2 cases Yes
gain 1447 TRAF3 Exon+ve, >2 cases Yes
loss 1820 TRAF3 Exon+ve, >2 cases Yes
gain 1447 TRAF3 Exon+ve, >2 cases Yes
loss 1800 TRAF3 Exon+ve, >2 cases Yes
loss 1820 TRAF3 Exon+ve, >2 cases Yes
15 26805834 27028093 222259 gain 1988 L0C646278,L0C100289656,APBA2 Exon+ve, >2 cases Yes 222259 loss 1994 L0C646278,L0C100289656,APBA2 Exon+ve, >2 cases Yes 61206 gain 1988 FAM189A1 Exon+ve, >2 cases Yes 61206 loss 1994 FAM189A1 Exon+ve, >2 cases Yes 148085 gain 1988 FAM189A1,NDNL2 Exon+ve, >2 cases Yes 148085 loss 1994 FAM189A1,NDNL2 Exon+ve, >2 cases Yes 102612 gain 1988 FAM189A1 Exon+ve, >2 cases Yes 102612 loss 1994 FAM189A1 Exon+ve, >2 cases Yes o's loss 1630 UBR1 Exon+ve, >2 cases Yes loss 2018 UBR1 Exon+ve, >2 cases Yes loss 1638 CASC4 Exon+ve, >2 cases Yes loss 1659 CASC4 Exon+ve, >2 cases Yes loss 1660 CASC4 Exon+ve, >2 cases Yes loss 1662 CASC4 Exon+ve, >2 cases Yes loss 1237 TEX9,MNS1 Exon+ve, >2 cases Yes loss 1347 TEX9,MNS1 Exon+ve, >2 cases Yes loss 1441 TEX9,MNS1 Exon+ve, >2 cases Yes loss 1456 TEX9,MNS1 Exon+ve, >2 cases Yes loss 1494 1'EX9,MNS1 Exon+ve, >2 cases Yes -3 loss 1496 TEX9,MNS1 Exon+ve, >2 cases Yes ci) loss 1497 TEX9,MNS1 Exon+ve, >2 cases Yes loss 1997 1'EX9,MNS1 Exon+ve, >2 cases Yes JI
loss 1680 ALDH1A2 Exon+ve, distinct CNVs, same Gene Yes
loss 1680 ALDH1A2 Exon+ve, distinct CNVs, same Gene Yes
10994 gain 1293 NEO1 Exon+ve, >2 cases Yes
10994 loss 1415 NEO1 Exon+ve, >2 cases Yes
gain 1309 CYP1A1 Exon+ve, >2 cases Yes
loss 1415 CYP1A1 Exon+ve, >2 cases Yes
16508 gain 1301 MAN2C1,SIN3A Exon+ve, >2 cases Yes
16508 loss 1415 MAN2C1,SIN3A Exon+ve, >2 cases Yes
18616 loss 1415 SNUPN Exon+ve, >2 cases Yes
18616 gain 2018 SNUPN Exon+ve, >2 cases Yes
loss 1415 SNUPN Exon+ve, >2 cases Yes
loss 1773 SNUPN Exon+ve, >2 cases Yes
gain 2018 SNUPN Exon+ve, >2 cases Yes
39164 loss 1415 IMP3,SNX33,SNUPN Exon+ve, >2 cases Yes
39164 gain 2018 IMP3,SNX33,SNUPN Exon+ve, >2 cases Yes
gain 1354 EFTUD1 Exon+ve, >2 cases Yes
gain 1740 EFTUD1 Exon+ve, >2 cases Yes
35972 gain 1354 EFTUD1,FAM154B Exon+ve, >2 cases Yes
35972 gain 1740 EFTUD1,FAM154B Exon+ve, >2 cases Yes
loss 1317 KIF7 Exon+ve, >2 cases Yes
gain 1548 KIF7 Exon+ve, >2 cases Yes
loss 1317 KIF7 Exon+ve, >2 cases Yes
gain 1548 KIF7 Exon+ve, >2 cases Yes
loss 1738 KIF7 Exon+ve, >2 cases Yes
gain 1309 LOC400456 Exon+ve, >2 cases Yes
gain 1825 LOC400456 Exon+ve, >2 cases Yes
gain 1837 LOC400456 Exon+ve, >2 cases Yes
gain 1841 LOC400456 Exon+ve, >2 cases Yes
15 99236636 99239178 2542 loss 1544 ALDH1A3 Exon+ve, >2 cases Yes
loss 1626 ALDH1A3 Exon+ve, >2 cases Yes
gain 1644 ALDH1A3 Exon+ve, >2 cases Yes
gain 1404 SELS Exon+ve, >2 cases Yes
gain 1728 SELS Exon+ve, >2 cases Yes
loss 1389 SELS Exon+ve, >2 cases Yes
gain 1401 SELS Exon+ve, >2 cases Yes
gain 1404 SELS Exon+ve, >2 cases Yes
loss 1413 SELS Exon+ve, >2 cases Yes
loss 1416 SELS Exon+ve, >2 cases Yes
gain 1434 SELS Exon+ve, >2 cases Yes
loss 1446 SELS Exon+ve, >2 cases Yes
loss 1449 SELS Exon+ve, >2 cases Yes
loss 1461 SELS Exon+ve, >2 cases Yes
loss 1477 SELS Exon+ve, >2 cases Yes
loss 1505 SELS Exon+ve, >2 cases Yes
loss 1529 SELS Exon+ve, >2 cases Yes
loss 1548 SELS Exon+ve, >2 cases Yes
loss 1559 SELS Exon+ve, >2 cases Yes
loss 1572 SELS Exon+ve, >2 cases Yes
gain 1576 SELS Exon+ve, >2 cases Yes
loss 1584 SELS Exon+ve, >2 cases Yes
gain 1596 SELS Exon+ve, >2 cases Yes
loss 1609 SELS Exon+ve, >2 cases Yes
gain 1633 SELS Exon+ve, >2 cases Yes
loss 1672 SELS Exon+ve, >2 cases Yes
loss 1687 SELS Exon+ve, >2 cases Yes
gain 1728 SELS Exon+ve, >2 cases Yes
loss 1829 SELS Exon+ve, >2 cases Yes
gain 1842 SELS Exon+ve, >2 cases Yes
loss 1913 SELS Exon+ve, >2 cases Yes
loss 1964 SELS Exon+ve, >2 cases Yes
16 3047597 3065144 17547 loss 1585 MMP25,IL32 Exon+ve, >2 cases Yes
17547 loss 1804 MMP25,IL32 Exon+ve, >2 cases Yes
17547 loss 1919 MMP25,IL32 Exon+ve, >2 cases Yes
2192 loss 1533 CREBBP Exon+ve, >2 cases Yes
2192 loss 1539 CREBBP Exon+ve, >2 cases Yes
2192 gain 1567 CREBBP Exon+ve, >2 cases Yes
2192 loss 1590 CREBBP Exon+ve, >2 cases Yes
5127 loss 1442 SRL Exon+ve, >2 cases Yes
5127 gain 1567 SRL Exon+ve, >2 cases Yes
14584 gain 1567 LOC342346 Exon+ve, >2 cases Yes
14584 loss 1689 LOC342346 Exon+ve, >2 cases Yes
13999 gain 1567 LOC342346 Exon+ve, >2 cases Yes
13999 loss 1689 LOC342346 Exon+ve, >2 cases Yes
1759 loss 1419 C16orf89 Exon+ve, >2 cases Yes
1759 gain 1567 C16orf89 Exon+ve, >2 cases Yes
loss 1230 DNAH3 Exon+ve, >2 cases Yes
loss 1760 DNAH3 Exon+ve, >2 cases Yes
12896 gain 1426 VWA3A Exon+ve, >2 cases Yes
12896 gain 1946 VWA3A Exon+ve, >2 cases Yes
12896 gain 1962 VWA3A Exon+ve, >2 cases Yes
loss 1295 XPO6 Exon+ve, >2 cases Yes
loss 1917 XPO6 Exon+ve, >2 cases Yes
12193 gain 1232 TGFB1I1,ARMC5 Exon+ve, >2 cases Yes
12193 gain 1508 TGFB1I1,ARMC5 Exon+ve, >2 cases Yes
gain 1524 CSDAP1 Exon+ve, >2 cases Yes
gain 1618 CSDAP1 Exon+ve, >2 cases Yes
loss 1395 BRD7 Exon+ve, >2 cases Yes
loss 1409 BRD7 Exon+ve, >2 cases Yes
loss 1428 BRD7 Exon+ve, >2 cases Yes
loss 1858 PLA2G15 Exon+ve, >2 cases Yes
loss 2023 PLA2G15 Exon+ve, >2 cases Yes
loss 1538 AARS Exon+ve, >2 cases Yes
loss 1793 AARS Exon+ve, >2 cases Yes
loss 1293 FA2H Exon+ve, >2 cases Yes
loss 1297 FA2H Exon+ve, >2 cases Yes
loss 1293 FA2H Exon+ve, >2 cases Yes
loss 1297 FA2H Exon+ve, >2 cases Yes
loss 1918 FA2H Exon+ve, >2 cases Yes
gain 1879 TMEM231 Exon+ve, >2 cases Yes
gain 1993 TMEM231 Exon+ve, >2 cases Yes
gain 2032 TMEM231 Exon+ve, >2 cases Yes
gain 1763 PKD1L2 Exon+ve, distinct CNVs, same Gene Yes
loss 1404 PKD1L2 Exon+ve, distinct CNVs, same Gene Yes
loss 1275 PKD1L2 Exon+ve, >2 cases Yes
loss 1404 PKD1L2 Exon+ve, >2 cases Yes
loss 1917 PKD1L2 Exon+ve, >2 cases Yes
loss 1998 PKD1L2 Exon+ve, >2 cases Yes
loss 1917 PKD1L2 Exon+ve, distinct CNVs, same Gene Yes
10851 gain 1252 PKD1L2 Exon+ve, >2 cases Yes
10851 loss 1917 PKD1L2 Exon+ve, >2 cases Yes
gain 1252 PKD1L2 Exon+ve, >2 cases Yes
gain 1459 PKD1L2 Exon+ve, >2 cases Yes
loss 1917 PKD1L2 Exon+ve, >2 cases Yes
24105 gain 1459 PKD1L2 Exon+ve, >2 cases Yes
24105 loss 1917 PKD1L2 Exon+ve, >2 cases Yes
61312 loss 1824 CDH13 Exon+ve, >2 cases Yes
16 81442167 81503479 61312 gain 1875 CDH13 Exon+ve, >2 cases Yes
loss 1258 KLHDC4 Exon+ve, distinct CNVs, same Gene Yes
17568 loss 2041 KLHDC4 Exon+ve, distinct CNVs, same Gene Yes
14939 loss 1274 FANCA Exon+ve, distinct CNVs, same Gene Yes
gain 1877 FANCA Exon+ve, distinct CNVs, same Gene Yes
17 423069 446585 23516 loss 1268 VPS53 Exon+ve, >2 cases Yes
17 423069 446585 23516 gain 1494 VPS53 Exon+ve, >2 cases Yes
22723 gain 1600 TEKT1 Exon+ve, >2 cases Yes
22723 loss 1927 TEKT1 Exon+ve, >2 cases Yes
46342 loss 1600 ALOX12P2 Exon+ve, >2 cases Yes
46342 loss 1927 ALOX12P2 Exon+ve, >2 cases Yes
18698 gain 1596 SLC5A10,FAM83G Exon+ve, >2 cases Yes
18698 gain 1717 SLC5A10,FAM83G Exon+ve, >2 cases Yes
18993 gain 1596 SLC5A10,FAM83G Exon+ve, >2 cases Yes
18993 gain 1717 SLC5A10,FAM83G Exon+ve, >2 cases Yes
10954 loss 2038 SPECC1 Exon+ve, distinct CNVs, same Gene Yes
loss 1988 SPECC1 Exon+ve, distinct CNVs, same Gene Yes
loss 1238 ATAD5 Exon+ve, >2 cases Yes
loss 1831 ATAD5 Exon+ve, >2 cases Yes
17 26865992 26870510 4518 loss 1411 RAB11FIP4 Special Yes
loss 1316 STARD3 Exon+ve, >2 cases Yes
loss 1318 STARD3 Exon+ve, >2 cases Yes
loss 1676 STARD3 Exon+ve, >2 cases Yes
loss 2045 STARD3 Exon+ve, >2 cases Yes
loss 1316 STARD3 Exon+ve, >2 cases Yes
17 35072083 35073438 1355 loss 1318 STARD3 Exon+ve, >2 cases Yes
loss 1665 STARD3 Exon+ve, >2 cases Yes
loss 1676 STARD3 Exon+ve, >2 cases Yes
loss 2045 STARD3 Exon+ve, >2 cases Yes
loss 1659 STAT3 Exon+ve, >2 cases Yes
loss 1887 STAT3 Exon+ve, >2 cases Yes
loss 1295 LOC388387 Exon+ve, >2 cases Yes
loss 1470 LOC388387 Exon+ve, >2 cases Yes
loss 1319 KIAA1267 Exon+ve, >2 cases Yes
loss 1320 KIAA1267 Exon+ve, >2 cases Yes
loss 1530 KIAA1267 Exon+ve, >2 cases Yes
loss 1533 KIAA1267 Exon+ve, >2 cases Yes
loss 1535 KIAA1267 Exon+ve, >2 cases Yes
loss 1536 KIAA1267 Exon+ve, >2 cases Yes
loss 1537 KIAA1267 Exon+ve, >2 cases Yes
loss 1539 KIAA1267 Exon+ve, >2 cases Yes
loss 1542 KIAA1267 Exon+ve, >2 cases Yes
loss 1586 KIAA1267 Exon+ve, >2 cases Yes
loss 1587 KIAA1267 Exon+ve, >2 cases Yes
loss 1655 KIAA1267 Exon+ve, >2 cases Yes
loss 1656 KIAA1267 Exon+ve, >2 cases Yes
loss 1662 KIAA1267 Exon+ve, >2 cases Yes
loss 1684 KIAA1267 Exon+ve, >2 cases Yes
loss 1861 KIAA1267 Exon+ve, >2 cases Yes
loss 1536 NSF Exon+ve, >2 cases Yes
gain 1671 NSF Exon+ve, >2 cases Yes
gain 1751 NSF Exon+ve, >2 cases Yes
gain 1800 NSF Exon+ve, >2 cases Yes
gain 1991 NSF Exon+ve, >2 cases Yes
gain 2032 NSF Exon+ve, >2 cases Yes
loss 1439 INTS2 Exon+ve, >2 cases Yes
loss 1601 INTS2 Exon+ve, >2 cases Yes
loss 1641 INTS2 Exon+ve, >2 cases Yes
loss 1439 INTS2 Exon+ve, >2 cases Yes
loss 1601 INTS2 Exon+ve, >2 cases Yes
loss 1641 INTS2 Exon+ve, >2 cases Yes
loss 1784 INTS2 Exon+ve, >2 cases Yes
17574 loss 1825 SEPT9 Exon+ve, >2 cases Yes
17574 loss 1909 SEPT9 Exon+ve, >2 cases Yes
822795 Gain 1891 C17orf70,ACTG1,TSPAN10,DCXR,C17orf90,STRA13,ARL16,MIR3186,NPLOC4,PYCR1,SLC25A10,GPS1,DUS1L,ANAPC11,LOC92659,FASN,ARHGDIA,MAFG,BAHCC1,DYSFIP1,MRPL12,SIRT7,RAC3,CCDC57,P4HB,PCYT2,HGS,RFNG,MYADML2,FSCN2,THOC4,ASPSCR1,CCDC137,NOTUM,NPB,PDE6G, De Novo Yes
60695 Loss 1891 SLC16A3,CSNK1D De Novo Yes
18 17999811 18004912 5101 loss 1764 GATA6 Exon+ve, >2 cases Yes
loss 1969 GATA6 Exon+ve, >2 cases Yes
11159 loss 1442 C18orf16 Exon+ve, >2 cases Yes
11159 loss 1502 C18orf16 Exon+ve, >2 cases Yes
19 11450908 11452390 1482 gain 1637 ELAVL3 Exon+ve, >2 cases Yes
gain 1780 ELAVL3 Exon+ve, >2 cases Yes
gain 1788 ELAVL3 Exon+ve, >2 cases Yes
gain 1864 ELAVL3 Exon+ve, >2 cases Yes
loss 1333 ZNF878 Exon+ve, >2 cases Yes
loss 1391 ZNF878 Exon+ve, >2 cases Yes
loss 1742 ZNF878 Exon+ve, >2 cases Yes
loss 1538 DHPS Exon+ve, >2 cases Yes
loss 1638 DHPS Exon+ve, >2 cases Yes
loss 1416 ZNF333 Exon+ve, >2 cases Yes
loss 1578 ZNF333 Exon+ve, >2 cases Yes
loss 1881 ZNF333 Exon+ve, >2 cases Yes
loss 1416 ZNF333 Exon+ve, >2 cases Yes
loss 1578 ZNF333 Exon+ve, >2 cases Yes
loss 1677 ZNF333 Exon+ve, >2 cases Yes
loss 1738 ZNF333 Exon+ve, >2 cases Yes
loss 1775 ZNF333 Exon+ve, >2 cases Yes
loss 1826 ZNF333 Exon+ve, >2 cases Yes
loss 1837 ZNF333 Exon+ve, >2 cases Yes
loss 1881 ZNF333 Exon+ve, >2 cases Yes
loss 1957 ZNF333 Exon+ve, >2 cases Yes
loss 1968 ZNF333 Exon+ve, >2 cases Yes
loss 2004 ZNF333 Exon+ve, >2 cases Yes
loss 2031 ZNF333 Exon+ve, >2 cases Yes
loss 1471 MIR1470,WIZ Exon+ve, >2 cases Yes
loss 1676 MIR1470,WIZ Exon+ve, >2 cases Yes
loss 1687 MIR1470,WIZ Exon+ve, >2 cases Yes
loss 1726 MIR1470,WIZ Exon+ve, >2 cases Yes
loss 1887 MIR1470,WIZ Exon+ve, >2 cases Yes
gain 1566 ZNF626 Exon+ve, >2 cases Yes
gain 1761 ZNF626 Exon+ve, >2 cases Yes
19 23800105 23804481 4376 gain 1541 RPSAP58 Exon+ve, >2 cases Yes
gain 1608 RPSAP58 Exon+ve, >2 cases Yes
gain 1783 RPSAP58 Exon+ve, >2 cases Yes
58728 gain 1281 PSG3,PSG8 Exon+ve, >2 cases Yes
58728 gain 1282 PSG3,PSG8 Exon+ve, >2 cases Yes
loss 1671 GRIN2D Exon+ve, >2 cases Yes
loss 1901 GRIN2D Exon+ve, >2 cases Yes
loss 1959 GRIN2D Exon+ve, >2 cases Yes
loss 1227 FUT2 Exon+ve, >2 cases Yes
loss 1448 FUT2 Exon+ve, >2 cases Yes
loss 1694 FUT2 Exon+ve, >2 cases Yes
loss 1697 FUT2 Exon+ve, >2 cases Yes
loss 1227 FUT2 Exon+ve, >2 cases Yes
loss 1448 FUT2 Exon+ve, >2 cases Yes
loss 1694 FUT2 Exon+ve, >2 cases Yes
loss 1697 FUT2 Exon+ve, >2 cases Yes
19 56882602 56889437 6835 loss 1232 MIR99B,MIRLET7E,MIR125A,NCRNA00085 Exon+ve, >2 cases Yes
19 56882602 56889437 6835 loss 1859 MIR99B,MIRLET7E,MIR125A,NCRNA00085 Exon+ve, >2 cases Yes
19 56882602 56889437 6835 loss 1965 MIR99B,MIRLET7E,MIR125A,NCRNA00085 Exon+ve, >2 cases Yes
19 56882602 56889437 6835 loss 1993 MIR99B,MIRLET7E,MIR125A,NCRNA00085 Exon+ve, >2 cases Yes
19 56882602 56889437 6835 loss 2032 MIR99B,MIRLET7E,MIR125A,NCRNA00085 Exon+ve, >2 cases Yes
14659 loss 1678 ZNF808 Exon+ve, >2 cases Yes
14659 loss 1855 ZNF808 Exon+ve, >2 cases Yes
gain 1585 MIR516B2 Exon+ve, >2 cases Yes
gain 1606 MIR516B2 Exon+ve, >2 cases Yes
loss 1720 CACNG8 Exon+ve, >2 cases Yes
loss 1859 CACNG8 Exon+ve, >2 cases Yes
loss 1720 CACNG8 Exon+ve, >2 cases Yes
loss 1859 CACNG8 Exon+ve, >2 cases Yes
loss 1953 CACNG8 Exon+ve, >2 cases Yes
loss 1720 CACNG8 Exon+ve, >2 cases Yes
loss 1859 CACNG8 Exon+ve, >2 cases Yes
loss 1953 CACNG8 Exon+ve, >2 cases Yes
loss 1966 CACNG8 Exon+ve, >2 cases Yes
loss 1461 ZIM3 Exon+ve, >2 cases Yes
loss 1995 ZIM3 Exon+ve, >2 cases Yes
loss 1996 ZIM3 Exon+ve, >2 cases Yes
loss 1461 VN1R1 Exon+ve, >2 cases Yes
loss 1522 VN1R1 Exon+ve, >2 cases Yes
13258 loss 1454 ZNF324B Exon+ve, >2 cases Yes
13258 gain 1862 ZNF324B Exon+ve, >2 cases Yes
20 26127265 26144660 17395 gain 1694 MIR663 Exon+ve, >2 cases Yes
17395 gain 1793 MIR663 Exon+ve, >2 cases Yes
20 30793762 30795954 2192 loss 1241 COMMD7 Exon+ve, >2 cases Yes
20 30793762 30795954 2192 loss 1901 COMMD7 Exon+ve, >2 cases Yes
loss 1419 FER1L4 Exon+ve, >2 cases Yes
loss 1774 FER1L4 Exon+ve, >2 cases Yes
loss 1354 BCAS1 Exon+ve, >2 cases Yes
loss 1860 BCAS1 Exon+ve, >2 cases Yes
21 27260832 27262559 1727 loss 1442 ADAMTS5 Exon+ve, >2 cases Yes
loss 1522 ADAMTS5 Exon+ve, >2 cases Yes
loss 1714 ADAMTS5 Exon+ve, >2 cases Yes
loss 1828 ADAMTS5 Exon+ve, >2 cases Yes
loss 1915 ADAMTS5 Exon+ve, >2 cases Yes
22 16366605 16373481 6876 loss 1226 CECR2 Exon+ve, >2 cases Yes
loss 1694 CECR2 Exon+ve, >2 cases Yes
loss 1718 BID Exon+ve, >2 cases Yes
loss 1859 BID Exon+ve, >2 cases Yes
19703 loss 1780 MICAL3 Exon+ve, >2 cases Yes
19703 loss 1805 MICAL3 Exon+ve, >2 cases Yes
19703 loss 2034 MICAL3 Exon+ve, >2 cases Yes
30004 loss 1549 LOC91316 Exon+ve, distinct CNVs, same Gene Yes
gain 1895 LOC91316,RGL4 Exon+ve, distinct CNVs, same Gene Yes
gain 1348 MIR1302-1,MYO18B Exon+ve, >2 cases Yes
loss 1833 MIR1302-1,MYO18B Exon+ve, >2 cases Yes
21901 loss 1724 APOL2 Exon+ve, >2 cases Yes
21901 loss 2035 APOL2 Exon+ve, >2 cases Yes
loss 1959 APOBEC3C Exon+ve, >2 cases Yes
loss 1965 APOBEC3C Exon+ve, >2 cases Yes
22 45453176 45454102 926 gain 1660 GRAMD4 Exon+ve, >2 cases Yes
22 45453176 45454102 926 gain 1880 GRAMD4 Exon+ve, >2 cases Yes
loss 1619 ALG12 Exon+ve, >2 cases Yes
loss 1930 ALG12 Exon+ve, >2 cases Yes
gain 1434 XG Exon+ve, >2 cases Yes
gain 1509 XG Exon+ve, >2 cases Yes
gain 1732 XG Exon+ve, >2 cases Yes
gain 1825 XG Exon+ve, >2 cases Yes
gain 1917 XG Exon+ve, >2 cases Yes
19096 gain 1434 GYG2 Exon+ve, >2 cases Yes
19096 gain 1509 GYG2 Exon+ve, >2 cases Yes
19096 gain 1732 GYG2 Exon+ve, >2 cases Yes
19096 gain 1825 GYG2 Exon+ve, >2 cases Yes
19096 gain 1917 GYG2 Exon+ve, >2 cases Yes
20276 gain 1434 GYG2 Exon+ve, >2 cases Yes
20276 gain 1509 GYG2 Exon+ve, >2 cases Yes
20276 loss 1654 GYG2 Exon+ve, >2 cases Yes
20276 gain 1732 GYG2 Exon+ve, >2 cases Yes
20276 gain 1825 GYG2 Exon+ve, >2 cases Yes
20276 gain 1917 GYG2 Exon+ve, >2 cases Yes
X 2788490 2814330 25840 gain 1434 GYG2 Exon+ve, >2 cases Yes
25840 gain 1509 GYG2 Exon+ve, >2 cases Yes
25840 gain 1732 GYG2 Exon+ve, >2 cases Yes
25840 gain 1825 GYG2 Exon+ve, >2 cases Yes
25840 gain 1917 GYG2 Exon+ve, >2 cases Yes
65155 gain 1566 KAL1 Exon+ve, >2 cases Yes
65155 gain 1901 KAL1 Exon+ve, >2 cases Yes
10351 loss 1298 KAL1 Exon+ve, >2 cases Yes
10351 loss 1432 KAL1 Exon+ve, >2 cases Yes
10351 gain 1566 KAL1 Exon+ve, >2 cases Yes
10351 gain 1901 KAL1 Exon+ve, >2 cases Yes
57777 gain 1566 KAL1 Exon+ve, >2 cases Yes
57777 gain 1901 KAL1 Exon+ve, >2 cases Yes
20643 gain 1566 KAL1 Exon+ve, >2 cases Yes
20643 gain 1901 KAL1 Exon+ve, >2 cases Yes
26424 loss 1496 FAM9B Exon+ve, >2 cases Yes
gain 1454 FAM9B Exon+ve, >2 cases Yes
loss 1633 TLR8,LOC349408 Exon+ve, >2 cases Yes
loss 1901 TLR8,LOC349408 Exon+ve, >2 cases Yes
loss 2024 TLR8,LOC349408 Exon+ve, >2 cases Yes
loss 1320 OFD1 Exon+ve, >2 cases Yes
gain 1590 OFD1 Exon+ve, >2 cases Yes
loss 1234 BMX Exon+ve, >2 cases Yes
loss 1320 BMX Exon+ve, >2 cases Yes
loss 1822 BMX Exon+ve, >2 cases Yes
loss 1827 BMX Exon+ve, >2 cases Yes
loss 1876 BMX Exon+ve, >2 cases Yes
loss 1506 IL1RAPL1 Exon+ve, >2 cases Yes
loss 1811 IL1RAPL1 Exon+ve, >2 cases Yes
X 32210107 32228244 18137 gain 2018 DMD Exon+ve, >2 cases Yes
X 32958581 33069843 111262 gain 1864 DMD Exon+ve, >2 cases Yes
X 33074762 33228204 153442 gain 1864 DMD Exon+ve, >2 cases Yes
X 33230517 33336759 106242 gain 1864 DMD Exon+ve, >2 cases Yes
loss 1415 USP9X Exon+ve, >2 cases Yes
loss 1415 USP9X Exon+ve, >2 cases Yes
loss 1583 USP9X Exon+ve, >2 cases Yes
X 43457175 43465307 8132 Loss 1369 MAOA Intronic No
X 43458232 43465307 7075 Loss 1300 MAOA Intronic No
X 43458232 43465307 7075 Loss 1697 MAOA Intronic No
X 43458232 43465307 7075 Loss 1751 MAOA Intronic No
X 43458232 43465307 7075 Loss 1800 MAOA Intronic No
X 43458232 43465307 7075 Loss 1842 MAOA Intronic No
X 43458232 43465307 7075 Loss 1848 MAOA Intronic No
X 43458232 43465307 7075 Loss 1855 MAOA Intronic No
X 43458232 43465307 7075 Loss 1859 MAOA Intronic No
X 43458232 43465307 7075 Loss 1898 MAOA Intronic No
X 43458232 43465307 7075 Loss 1907 MAOA Intronic No
X 43458232 43465307 7075 Loss 1916 MAOA Intronic No
X 43458232 43465307 7075 Loss 1921 MAOA Intronic No
X 43458232 43465307 7075 Loss 1935 MAOA Intronic No
X 43458232 43465307 7075 Loss 1946 MAOA Intronic No
X 43458232 43465307 7075 Loss 1958 MAOA Intronic No
X 43458232 43465307 7075 Loss 1960 MAOA Intronic No
X 43458232 43465307 7075 Loss 1961 MAOA Intronic No
X 43458232 43465307 7075 Loss 1965 MAOA Intronic No
X 43458232 43465307 7075 Loss 1966 MAOA Intronic No
X 43458232 43465307 7075 Loss 1967 MAOA Intronic No
X 43458232 43465307 7075 Loss 1969 MAOA Intronic No
X 43458232 43465307 7075 Loss 1993 MAOA Intronic No
X 43458232 43465307 7075 Loss 2033 MAOA Intronic No
X 43458232 43465307 7075 Loss 2035 MAOA Intronic No
loss 1675 RGN Exon+ve, >2 cases Yes
gain 1896 RGN Exon+ve, >2 cases Yes
gain 2040 RGN Exon+ve, >2 cases Yes
X 48688957 48716140 27183 gain 1349 KCND1,OTUD5,GRIPAP1 Exon+ve, >2 cases Yes
X 48688957 48716140 27183 loss 1639 KCND1,OTUD5,GRIPAP1 Exon+ve, >2 cases Yes
gain 1284 SLC7A3 Exon+ve, >2 cases Yes
gain 1308 SLC7A3 Exon+ve, >2 cases Yes
gain 1346 SLC7A3 Exon+ve, >2 cases Yes
X 96561809 96658023 96214 gain 1348 DIAPH2 Exon+ve, >2 cases Yes
X 96718563 97203519 484956 gain 1348 DIAPH2 Exon+ve, >2 cases Yes
X 100665462 100673058 7596 gain 1269 ARMCX4 Exon+ve, >2 cases Yes
X 100665462 100673058 7596 loss 1413 ARMCX4 Exon+ve, >2 cases Yes
X 100665462 100673058 7596 gain 1857 ARMCX4 Exon+ve, >2 cases Yes
X 105750701 105752733 2032 loss 1239 CXorf57 Exon+ve, >2 cases Yes
X 105750701 105752733 2032 loss 1372 CXorf57 Exon+ve, >2 cases Yes
X 123691710 123698719 7009 loss 1421 ODZ1 Exon+ve, >2 cases Yes
X 123691710 123698719 7009 loss 1428 ODZ1 Exon+ve, >2 cases Yes
X 123691710 123698719 7009 loss 1805 ODZ1 Exon+ve, >2 cases Yes
X 128772381 128775324 2943 gain 1806 ZDHHC9 Exon+ve, >2 cases Yes
X 128772381 128775324 2943 gain 1824 ZDHHC9 Exon+ve, >2 cases Yes
X 128775325 128777107 1782 gain 1459 ZDHHC9 Exon+ve, >2 cases Yes
X 128775325 128777107 1782 gain 1806 ZDHHC9 Exon+ve, >2 cases Yes
X 128775325 128777107 1782 gain 1824 ZDHHC9 Exon+ve, >2 cases Yes
X 137525298 137527811 2513 gain 1223 LOC158696 Exon+ve, >2 cases Yes
X 137525298 137527811 2513 gain 2041 LOC158696 Exon+ve, >2 cases Yes
X 151736328 151770679 34351 gain 1887 CETN2,NSDHL Exon+ve, >2 cases Yes
X 151788383 151853605 65222 gain 1887 ZNF185,NSDHL Exon+ve, >2 cases Yes
X 154321522 154375563 54041 gain 1831 F8A1,F8A3,F8A2,H2AFB3,H2AFB2,H2AFB1,MIR1184-1,MIR1184-2,MIR1184-3,TMLHE Exon+ve, >2 cases Yes
X 154404962 154427678 22716 gain 1724 TMLHE Exon+ve, >2 cases Yes
* Position references refer to the human genomic sequence Hg18 Mar. 2006 (NCBI Build 36.1).
[00224] Table 2 is identical to Table 1, with four exceptions. Firstly, the CNV coordinates listed refer to the actual CNV subregions found to be unique or significantly different in frequency between ASD and Normal cohorts, as opposed to Table 1, which lists the originating CNVs. For example, a CNV of a particular size/length (e.g., 100,000bp) in an ASD patient may contain one or more smaller subregions within it (e.g., 10,000bp in size/length) that do not occur at higher frequency in one or more ASD patients relative to the normal cohort. Another example is that a CNV unique to, or present at higher frequency in, ASD patients relative to normal subjects may partially overlap a second CNV that is present at comparable or higher frequency in normal subjects; in this case, only the unique subregion is reported in Table 2, as such subregions may further refine specific genomic loci causative of autism/ASD phenotypes. Secondly, an extra column details whether the CNV subregion of interest overlaps an exon or only an intron. Thirdly, no OR values are reported (see Table 1 for OR values). Fourthly, gene annotation is for CNV subregions only (i.e., other genes that may be impacted by the parent CNV reported in Table 1 are excluded if they are not likewise impacted by the CNV subregion(s)). "De novo" refers to CNV subregions found to occur in the offspring of two parents, neither of whom has the relevant CNV subregion(s); "Intronic" refers to CNV subregions affecting introns only; "Ctrl pos High OR" refers to CNV subregions present at high frequency in the ASD cohort compared to the normal cohort; "Exon+ve, distinct CNVs, same Gene" refers to CNV subregions in 2 or more ASD individuals affecting different exons of the same gene; "Exon+ve, >2 cases" refers to CNV subregions in 2 or more ASD individuals affecting the same exon of a gene; "Special" refers to CNV subregions added to the list because of their relationship to genes with strong biological evidence in ASD.
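The subregion refinement described in [00224] can be pictured as interval subtraction: the parts of an ASD subject's CNV that are overlapped by CNVs seen in the normal cohort are removed, and only the remainder is reported. The sketch below is a simplification under stated assumptions, not the procedure used to build Table 2: it treats coordinates on a single chromosome as half-open [start, end) intervals, and it discards any stretch touched by a control CNV instead of comparing frequencies between cohorts. The function names `merge` and `unique_subregions` are hypothetical.

```python
from typing import List, Tuple

# Assumption: coordinates on one chromosome are half-open [start, end),
# so an interval's length is end - start.
Interval = Tuple[int, int]


def merge(intervals: List[Interval]) -> List[Interval]:
    """Sort intervals and merge any that overlap or abut."""
    merged: List[Interval] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def unique_subregions(case_cnv: Interval, control_cnvs: List[Interval]) -> List[Interval]:
    """Return the pieces of an ASD-subject CNV not covered by any control CNV.

    Simplification: any overlap with a control CNV disqualifies that stretch;
    no frequency comparison between cohorts is made.
    """
    start, end = case_cnv
    pieces: List[Interval] = []
    cursor = start  # left edge of the part of case_cnv not yet examined
    for c_start, c_end in merge(control_cnvs):
        if c_end <= cursor or c_start >= end:
            continue  # this control CNV does not touch the remaining span
        if c_start > cursor:
            pieces.append((cursor, c_start))  # uncovered gap before the control CNV
        cursor = c_end
        if cursor >= end:
            break  # the rest of the case CNV is covered
    if cursor < end:
        pieces.append((cursor, end))  # uncovered tail after the last control CNV
    return pieces


if __name__ == "__main__":
    case = (1_000_000, 1_100_000)      # a 100,000 bp CNV in an ASD subject
    controls = [(900_000, 1_090_000)]  # control CNV covering all but the last 10,000 bp
    print(unique_subregions(case, controls))  # [(1090000, 1100000)]
```

With the 100,000 bp / 10,000 bp scenario from the paragraph above, only the final 10,000 bp survives as a reportable subregion; a real pipeline would additionally weigh how often each stretch occurs in each cohort.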
[00225] Column 2 refers to the nucleotide position in the respective chromosome (column 1) where the corresponding CNV subregion begins and column 3 refers to the nucleotide position in the respective chromosome where the corresponding CNV subregion ends. Column 4 refers to the length of the CNV subregion in bps. Nucleotide positions were determined using the database Hg18 Mar. 2006 (NCBI Build 36.1). The CNV
classifications of gain or loss indicate whether each CNV subregion found in the subjects was duplicated/amplified (gain) or deleted (loss) in the genome.
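In the fully populated rows of Table 2, column 4 equals column 3 minus column 2 (for example, 27028093 - 26805834 = 222259 for the chromosome 15 APBA2 subregion). The following is a hypothetical parser, not something described in the source, that reads one such row into a record and checks that relationship; the `CnvSubregion` and `parse_row` names are illustrative only. Rows whose chromosome and coordinate columns were not reproduced (those beginning with a length or directly with gain/loss) are rejected with a `ValueError` rather than guessed.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class CnvSubregion:
    chromosome: str
    start: int       # column 2: position where the subregion begins (Hg18)
    end: int         # column 3: position where the subregion ends (Hg18)
    length: int      # column 4: subregion length in bp
    cnv_type: str    # "gain" (duplication/amplification) or "loss" (deletion)
    subject_id: int
    genes: List[str]
    category: str    # e.g. "Exon+ve, >2 cases", "Intronic", "De Novo", "Special"
    exonic: bool     # True when the last column reads "Yes"


def parse_row(line: str) -> CnvSubregion:
    """Parse one fully populated, space-separated Table 2 row (hypothetical helper)."""
    tokens = line.split()
    # chrom, start, end, length, type, subject, genes, >=1 category word, Yes/No
    if len(tokens) < 9:
        raise ValueError("row lacks coordinate or category columns")
    chrom, start, end, length, cnv_type, subject, genes = tokens[:7]
    row = CnvSubregion(
        chromosome=chrom,
        start=int(start),
        end=int(end),
        length=int(length),
        cnv_type=cnv_type.lower(),  # the table mixes "gain"/"Gain" and "loss"/"Loss"
        subject_id=int(subject),
        genes=[g for g in genes.split(",") if g],
        category=" ".join(tokens[7:-1]),
        exonic=tokens[-1] == "Yes",
    )
    if row.cnv_type not in ("gain", "loss"):
        raise ValueError(f"unexpected CNV type: {cnv_type!r}")
    if row.end - row.start != row.length:
        raise ValueError("length column does not equal end - start")
    return row


if __name__ == "__main__":
    r = parse_row("X 43458232 43465307 7075 Loss 1300 MAOA Intronic No")
    print(r.cnv_type, r.genes, r.category, r.exonic)  # loss ['MAOA'] Intronic False
```

Keeping the length check explicit makes transcription errors in the coordinate columns surface immediately instead of propagating into downstream overlap calculations.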

Gene Name | CNV Gene Region | NCBI Gene ID | Gene Description | RefSeq Summary
AARS | Exonic | 16 | alanyl-tRNA synthetase, cytoplasmic | The human alanyl-tRNA synthetase (AARS) belongs to a family of tRNA synthases, of the class II enzymes. Class II tRNA synthases evolved early in evolution and are highly conserved. This is reflected by the fact that 498 of the 968-residue polypeptide human AARS shares 41% identity with the E. coli protein. tRNA synthases are the enzymes that interpret the RNA code and attach specific amino acids to the tRNAs that contain the cognate trinucleotide anticodons. They consist of a catalytic domain which interacts with the amino acid acceptor-T psi C helix of the tRNA, and a second domain which interacts with the rest of the tRNA structure. [provided by RefSeq, Jul 2008].
ABCA13 | Exonic | 154664 | ATP-binding cassette sub-family A member 13 | In human, the ATP-binding cassette (ABC) family of transmembrane transporters has at least 48 genes and 7 gene subfamilies. This gene is a member of ABC gene subfamily A (ABCA). Genes within the ABCA family typically encode several thousand amino acids. Like other ABC transmembrane transporter proteins, this protein has 12 or more transmembrane alpha-helix domains that likely arrange to form a single central chamber with multiple substrate binding sites. It is also predicted to have two large extracellular domains and two nucleotide binding domains as is typical for ABCA proteins. Alternative splice variants have been described but their biological validity has not been demonstrated. [provided by RefSeq, Mar 2009]. Sequence Note: This RefSeq record was created from transcript and genomic sequence data to make the sequence consistent with the reference genome assembly. The genomic coordinates used for the transcript record were based on transcript alignments.
ABCB4 | Exonic | 5244 | multidrug resistance protein 3 isoform B | The membrane-associated protein encoded by this gene is a member of the superfamily of ATP-binding cassette (ABC) transporters. ABC proteins transport various molecules across extra- and intra-cellular membranes. ABC genes are divided into seven distinct subfamilies (ABC1, MDR/TAP, MRP, ALD, OABP, GCN20, White). This protein is a member of the MDR/TAP subfamily. Members of the MDR/TAP subfamily are involved in multidrug resistance as well as antigen presentation. This gene encodes a full transporter and member of the p-glycoprotein family of membrane proteins with phosphatidylcholine as its substrate. The function of this protein has not yet been determined; however, it may involve transport of phospholipids from liver hepatocytes into bile. Alternative splicing of this gene results in several products of undetermined function. [provided by RefSeq, Jul 2008]. Transcript Variant: This variant (B) uses an alternate in-frame splice site in the 3' coding region, compared to variant A, resulting in a longer protein (isoform B).

ACAD10 | Exonic | 80724 | acyl-CoA dehydrogenase family member 10 isoform a | This gene encodes a member of the acyl-CoA dehydrogenase family of enzymes (ACADs), which participate in the beta-oxidation of fatty acids in mitochondria. The encoded enzyme contains a hydrolase domain at the N-terminal portion, a serine/threonine protein kinase catalytic domain in the central region, and a conserved ACAD domain at the C-terminus. Several alternatively spliced transcript variants of this gene have been described, but the full-length nature of some of these variants has not been determined. [provided by RefSeq, Nov 2008]. Transcript Variant: This variant (1) represents the longest transcript and encodes the longest isoform (a).
ACTG1 | Exonic | 71 | actin, cytoplasmic 2 | Actins are highly conserved proteins that are involved in various types of cell motility, and maintenance of the cytoskeleton. In vertebrates, three main groups of actin isoforms, alpha, beta and gamma have been identified. The alpha actins are found in muscle tissues and are a major constituent of the contractile apparatus. The beta and gamma actins co-exist in most cell types as components of the cytoskeleton, and as mediators of internal cell motility. Actin, gamma 1, encoded by this gene, is a cytoplasmic actin found in non-muscle cells. Mutations in this gene are associated with DFNA20/26, a subtype of autosomal dominant non-syndromic sensorineural progressive hearing loss. Alternative splicing results in multiple transcript variants. [provided by RefSeq, Jan 2011]. Transcript Variant: This variant (1) represents the longest transcript. Variants 1 and 2 encode the same protein.
ADAMTS5 | Exonic | 11096 | A disintegrin and metalloproteinase with thrombospondin motifs 5 preproprotein | This gene encodes a member of the ADAMTS (a disintegrin and metalloproteinase with thrombospondin motifs) protein family. Members of the family share several distinct protein modules, including a propeptide region, a metalloproteinase domain, a disintegrin-like domain, and a thrombospondin type 1 (TS) motif. Individual members of this family differ in the number of C-terminal TS motifs, and some have unique C-terminal domains. The enzyme encoded by this gene contains two C-terminal TS motifs and functions as aggrecanase to cleave aggrecan, a major proteoglycan of cartilage. [provided by RefSeq, Jul 2008]. Sequence Note: The RefSeq transcript and protein were derived from genomic sequence to make the sequence consistent with the reference genome assembly. The genomic coordinates used for the transcript record were based on alignments.

ADAMTS9 | Exonic | 56999 | A disintegrin and metalloproteinase with thrombospondin motifs 9 preproprotein | This gene encodes a member of the ADAMTS (a disintegrin and metalloproteinase with thrombospondin motifs) protein family. Members of the family share several distinct protein modules, including a propeptide region, a metalloproteinase domain, a disintegrin-like domain, and a thrombospondin type 1 (TS) motif. Individual members of this family differ in the number of C-terminal TS motifs, and some have unique C-terminal domains. Members of the ADAMTS family have been implicated in the cleavage of proteoglycans, the control of organ shape during development, and the inhibition of angiogenesis. This gene is localized to chromosome 3p14.3-p14.2, an area known to be lost in hereditary renal tumors. [provided by RefSeq, Jul 2008].
AIG1 | Exonic | 51390 | androgen-induced gene 1 protein | N/A
AKNA | Exonic | 80709 | AT-hook-containing transcription factor | N/A
AKR1B15 | Exonic | 441282 | aldo-keto reductase family 1 member | N/A
ALB | Exonic | 213 | serum albumin preproprotein | Albumin is a soluble, monomeric protein which comprises about one-half of the blood serum protein. Albumin functions primarily as a carrier protein for steroids, fatty acids, and thyroid hormones and plays a role in stabilizing extracellular fluid volume. Albumin is a globular unglycosylated serum protein of molecular weight 65,000. Albumin is synthesized in the liver as preproalbumin which has an N-terminal peptide that is removed before the nascent protein is released from the rough endoplasmic reticulum. The product, proalbumin, is in turn cleaved in the Golgi vesicles to produce the secreted albumin. [provided by RefSeq, Jul 2008].
ALDH1A2 | Exonic | 8854 | retinal dehydrogenase 2 isoform 3 | This protein belongs to the aldehyde dehydrogenase family of proteins. The product of this gene is an enzyme that catalyzes the synthesis of retinoic acid (RA) from retinaldehyde. Retinoic acid, the active derivative of vitamin A (retinol), is a hormonal signaling molecule that functions in developing and adult tissues. The studies of a similar mouse gene suggest that this enzyme and the cytochrome CYP26A1, concurrently establish local embryonic retinoic acid levels which facilitate posterior organ development and prevent spina bifida. Four transcript variants encoding distinct isoforms have been identified for this gene. [provided by RefSeq, May 2011]. Transcript Variant: This variant (3) differs in the 5' UTR and coding sequence compared to variant 1. The resulting isoform (3) is shorter at the N-terminus compared to isoform 1.
ALDH1A3 | Exonic | 220 | aldehyde dehydrogenase family 1 member A3 | Aldehyde dehydrogenase isozymes are thought to play a major role in the detoxification of aldehydes generated by alcohol metabolism and lipid peroxidation. The enzyme encoded by this gene uses retinal as a substrate, either in a free or cellular retinol-binding protein form. [provided by RefSeq, Jul 2008].
ALDH2 | Exonic | 217 | aldehyde dehydrogenase, mitochondrial isoform 2 precursor | This protein belongs to the aldehyde dehydrogenase family of proteins. Aldehyde dehydrogenase is the second enzyme of the major oxidative pathway of alcohol metabolism. Two major liver isoforms of aldehyde dehydrogenase, cytosolic and mitochondrial, can be distinguished by their electrophoretic mobilities, kinetic properties, and subcellular localizations. Most Caucasians have two major isozymes, while approximately 50% of Orientals have the cytosolic isozyme but not the mitochondrial isozyme. A remarkably higher frequency of acute alcohol intoxication among Orientals than among Caucasians could be related to the absence of a catalytically active form of the mitochondrial isozyme. The increased exposure to acetaldehyde in individuals with the catalytically inactive form may also confer greater susceptibility to many types of cancer. This gene encodes a mitochondrial isoform, which has a low Km for acetaldehydes, and is localized in mitochondrial matrix. Alternative splicing results in multiple transcript variants encoding distinct isoforms. [provided by RefSeq, Mar 2011]. Transcript Variant: This variant (2) lacks an in-frame exon in the 5' coding region, compared to variant 1, and encodes a shorter isoform (2), compared to isoform 1.
ALG12 | Exonic | 79087 | dol-P-Man:Man(7)GlcNAc(2)-PP-Dol alpha-1,6-mannosyltransferase | This gene encodes a member of the glycosyltransferase 22 family. The encoded protein catalyzes the addition of the eighth mannose residue in an alpha-1,6 linkage onto the dolichol-PP-oligosaccharide precursor (dolichol-PP-Man(7)GlcNAc(2)) required for protein glycosylation. Mutations in this gene have been associated with congenital disorder of glycosylation type Ig (CDG-Ig) characterized by abnormal N-glycosylation. [provided by RefSeq, Jul 2008].
ALMS1P Exonic 200420 N/A N/A
ALOX12P2 Exonic 245 N/A N/A
ALS2CL Exonic 259173 ALS2 C-terminal-like protein isoform N/A
AMBP Exonic 259 protein AMBP preproprotein This gene encodes a complex glycoprotein secreted in plasma. The precursor is proteolytically processed into distinct functioning proteins: alpha-1-microglobulin, which belongs to the superfamily of lipocalin transport proteins and may play a role in the regulation of inflammatory processes, and bikunin, which is a urinary trypsin inhibitor belonging to the superfamily of Kunitz-type protease inhibitors and plays an important role in many physiological and pathological processes. This gene is located on chromosome 9 in a cluster of lipocalin genes. [provided by RefSeq, Jul 2008].
ANAPC11 Exonic 51529 anaphase-promoting complex subunit 11 isoform N/A
ANKRD17 Exonic 26057 ankyrin repeat domain-containing protein 17 isoform This gene encodes a protein with ankyrin repeats, which are associated with protein-protein interactions. Studies in mice suggest that this protein is involved in liver development. Two transcript variants encoding different isoforms have been found for this gene. [provided by RefSeq, Jul 2008]. Transcript Variant: This variant (2) lacks an alternate in-frame exon compared to variant 1. The resulting isoform (b) has the same N- and C-termini but is shorter compared to isoform a.
ANKRD33 Exonic 341405 ankyrin repeat domain-containing protein 33 isoform N/A
ANKRD33B Exonic 651746 ankyrin repeat domain-containing protein 33B N/A
ANKRD34A Exonic 284615 ankyrin repeat domain-containing protein 34A N/A
ANKRD35 Exonic 148741 ankyrin repeat domain-containing protein 35 N/A
ANKS1B Exonic 56899 ankyrin repeat and sterile alpha motif domain-containing protein 1B isoform 1 This gene encodes a multi-domain protein that is predominantly expressed in brain and testis. This protein interacts with amyloid beta protein precursor (AbetaPP) and may have a role in normal brain development, and in the pathogenesis of Alzheimer's disease. Expression of this gene has been shown to be elevated in patients with pre-B cell acute lymphocytic leukemia associated with t(1;19) translocation. Alternatively spliced transcript variants encoding different isoforms (some with different subcellular localization, PMID:15004329) have been described for this gene. [provided by RefSeq, Aug 2011]. Transcript Variant: This variant (12) differs in the 5' UTR and coding region compared to variant 1. The resulting isoform (1) has a shorter and distinct N-terminus compared to isoform a. Publication Note: This RefSeq record includes a subset of the publications that are available for this gene. Please see the Gene record to access additional publications.
ANO5 Exonic 203859 anoctamin-5 isoform b This gene encodes a member of the anoctamin family of transmembrane proteins. The encoded protein is likely a calcium activated chloride channel. Mutations in this gene have been associated with gnathodiaphyseal dysplasia. Alternatively spliced transcript variants have been described. [provided by RefSeq, Nov 2009]. Transcript Variant: This variant (2) lacks an alternate in-frame segment, compared to variant 1, resulting in a shorter protein (isoform b), compared to isoform a.
ANUBL1 Exonic N/A N/A N/A
ANXA6 Exonic 309 annexin A6 isoform 2 Annexin VI belongs to a family of calcium-dependent membrane and phospholipid binding proteins. Several members of the annexin family have been implicated in membrane-related events along exocytotic and endocytotic pathways. The annexin VI gene is approximately 60 kbp long and contains 26 exons. It encodes a protein of about 68 kDa that consists of eight 68-amino acid repeats separated by linking sequences of variable lengths. It is highly similar to human annexins I and II sequences, each of which contain four such repeats. Annexin VI has been implicated in mediating the endosome aggregation and vesicle fusion in secreting epithelia during exocytosis. Alternatively spliced transcript variants have been described. [provided by RefSeq, Aug 2010]. Transcript Variant: This variant (2) differs in the 5' UTR, lacks a portion of the 5' coding region, and initiates translation at a downstream start codon, compared to variant 1. The encoded isoform (2) is shorter than isoform 1. Publication Note: This RefSeq record includes a subset of the publications that are available for this gene. Please see the Gene record to access additional publications.
AP3M2 Exonic 10947 AP-3 complex subunit mu-2 This gene encodes a subunit of the heterotetrameric adaptor-related protein complex 3 (AP-3), which belongs to the adaptor complexes medium subunits family. The AP-3 complex plays a role in protein trafficking to lysosomes and specialized organelles. Multiple alternatively spliced variants, encoding the same protein, have been identified. [provided by RefSeq, Aug 2008]. Transcript Variant: This variant (1) represents the longest transcript. Variants 1 and 2 encode the same protein.
APBA2 Exonic 321 amyloid beta A4 precursor protein-binding family A member 2 isoform The protein encoded by this gene is a member of the X11 protein family. It is a neuronal adapter protein that interacts with the Alzheimer's disease amyloid precursor protein (APP). It stabilizes APP and inhibits production of proteolytic APP fragments including the A beta peptide that is deposited in the brains of Alzheimer's disease patients. This gene product is believed to be involved in signal transduction processes. It is also regarded as a putative vesicular trafficking protein in the brain that can form a complex with the potential to couple synaptic vesicle exocytosis to neuronal cell adhesion. Multiple transcript variants encoding different isoforms have been found for this gene. [provided by RefSeq, Jul 2008]. Transcript Variant: This variant (2) lacks an alternate in-frame exon, compared to variant 1, resulting in a shorter protein (isoform b), compared to isoform a. Publication Note: This RefSeq record includes a subset of the publications that are available for this gene. Please see the Gene record to access additional publications.
APOBEC3C Exonic 27350 probable DNA dC->dU-editing enzyme APOBEC-3C This gene is a member of the cytidine deaminase gene family. It is one of seven related genes or pseudogenes found in a cluster thought to result from gene duplication, on chromosome 22. Members of the cluster encode proteins that are structurally and functionally related to the C to U RNA-editing cytidine deaminase APOBEC1. It is thought that the proteins may be RNA editing enzymes and have roles in growth or cell cycle control. [provided by RefSeq, Jul 2008].
APOL2 Exonic 23780 apolipoprotein L2 This gene is a member of the apolipoprotein L gene family. The encoded protein is found in the cytoplasm, where it may affect the movement of lipids or allow the binding of lipids to organelles. Two transcript variants encoding the same protein have been found for this gene. [provided by RefSeq, Jul 2008]. Transcript Variant: This variant (beta) differs in the 5' UTR compared to variant alpha. Both variants encode the same protein.
ARFGAP2 Exonic 84364 ADP-ribosylation factor GTPase-activating protein 2 isoform 1 N/A
ARHGAP10 Exonic 79658 rho GTPase-activating protein N/A
ARHGAP15 Exonic 55843 rho GTPase-activating protein RHO GTPases (see ARHA; MIM 165390) regulate diverse biologic processes, and their activity is regulated by RHO GTPase-activating proteins (GAPs), such as ARHGAP15 (Seoh et al., 2003 [PubMed 12650940]). [supplied by OMIM, Mar 2008].
ARHGAP21 Exonic 57584 rho GTPase-activating protein 21 ARHGAP21 functions preferentially as a GTPase-activating protein (GAP) for CDC42 (MIM 116952) and regulates the ARP2/3 complex (MIM 604221) and F-actin dynamics at the Golgi through control of CDC42 activity (Dubois et al., 2005 [PubMed 15793564]). [supplied by OMIM, Mar 2008]. Sequence Note: The 5'-most in-frame translation start codon is selected for this RefSeq and is well-conserved among mammalian species. An alternative start codon that would reduce the protein length by 1 aa is also present. The use of the downstream start codon is assumed in the literature, including PMIDs:12056806, 15793564 and 17347647.
ARHGDIA Exonic 396 rho GDP-dissociation inhibitor 1 isoform a Aplysia Ras-related homologs (ARHs), also called Rho genes, belong to the RAS gene superfamily encoding small guanine nucleotide exchange (GTP/GDP) factors. The ARH proteins may be kept in the inactive, GDP-bound state by interaction with GDP dissociation inhibitors, such as ARHGDIA (Leffers et al., 1993 [PubMed 8262133]). [supplied by OMIM, Jan 2009]. Transcript Variant: This variant (1) represents the longest transcript and encodes the longer isoform (a). Variants 1 and 2 both encode isoform a.
ARHGEF26 Exonic 26084 Src homology 3 domain-containing guanine nucleotide exchange factor isoform 1 This gene encodes a member of the Rho-guanine nucleotide exchange factor (Rho-GEF) family. These proteins regulate Rho GTPases by catalyzing the exchange of GDP for GTP. The encoded protein specifically activates RhoG and plays a role in the promotion of macropinocytosis. Underexpression of the encoded protein may be a predictive marker of chemoresistant disease. Alternatively spliced transcript variants encoding multiple isoforms have been observed for this gene. [provided by RefSeq, Oct 2011]. Transcript Variant: This variant (2) differs in the 5' UTR compared to variant 1. Variants 1 and 2 encode the same isoform (1). Sequence Note: This RefSeq record was created from transcript and genomic sequence data because no single transcript was available for the full length of the gene. The extent of this transcript is supported by transcript alignments.
ARL16 Exonic 339231 ADP-ribosylation factor-like protein N/A
ARMC5 Exonic 79798 armadillo repeat-containing protein isoform a precursor N/A
ARMCX4 Exonic 100131 N/A N/A
ASPSCR1 Exonic 79058 N/A The protein encoded by this gene contains a UBX domain and interacts with glucose transporter type 4 (GLUT4). This protein is a tether, which sequesters the GLUT4 in intracellular vesicles in muscle and fat cells in the absence of insulin, and redistributes the GLUT4 to the plasma membrane within minutes of insulin stimulation. Translocation t(X;17)(p11;q25) of this gene with transcription factor TFE3 gene results in an ASPSCR1-TFE3 fusion protein in alveolar soft part sarcoma and in renal cell carcinomas. Multiple alternatively spliced transcript variants have been found. [provided by RefSeq, Oct 2011]. Transcript Variant: This variant (3) lacks an internal exon in the 5' region, which results in a frame-shift and premature translation termination, compared to variant 1. The resulting transcript is a nonsense-mediated mRNA decay candidate.
ASTN2 Exonic 23245 astrotactin-2 isoform f This gene encodes a protein that is expressed in the brain and may function in neuronal migration, based on functional studies of the related astrotactin 1 gene in human and mouse. A deletion at this locus has been associated with schizophrenia. Multiple transcript variants encoding different proteins have been found for this locus. [provided by RefSeq, May 2010]. Transcript Variant: This variant (6) has multiple differences compared to variant 1. These differences result in a distinct 5' UTR and lead to translation initiation at an alternate start codon, compared to variant 1. The encoded isoform (f) has distinct N- and C-termini and is shorter than isoform a.
ATAD5 Exonic 79915 ATPase family AAA domain-containing protein N/A
ATRNL1 Exonic 26033 attractin-like protein 1 precursor N/A
BAHCC1 Exonic 57597 BAH and coiled-coil domain-containing protein N/A
BASP1P1 Exonic 646201 N/A N/A
BCAP29 Exonic 55973 B-cell receptor-associated protein 29 isoform a N/A
BCAS1 Exonic 8537 breast carcinoma-amplified sequence 1 This gene resides in a region at 20q13 which is amplified in a variety of tumor types and associated with more aggressive tumor phenotypes. Among the genes identified from this region, it was found to be highly expressed in three amplified breast cancer cell lines and in one breast tumor without amplification at 20q13.2. However, this gene is not in the common region of maximal amplification and its expression was not detected in the breast cancer cell line MCF7, in which this region is highly amplified. Although not consistently expressed, this gene is a candidate oncogene. [provided by RefSeq, Jul 2008]. Sequence Note: The RefSeq transcript and protein were derived from genomic sequence to make the sequence consistent with the reference genome assembly. The genomic coordinates used for the transcript record were based on alignments.
BID Exonic 637 BH3-interacting domain death agonist isoform 3 This gene encodes a death agonist that heterodimerizes with either agonist BAX or antagonist BCL2. The encoded protein is a member of the BCL-2 family of cell death regulators. It is a mediator of mitochondrial damage induced by caspase-8 (CASP8); CASP8 cleaves this encoded protein, and the COOH-terminal part translocates to mitochondria where it triggers cytochrome c release. Multiple alternatively spliced transcript variants have been found, but the full-length nature of some variants has not been defined. [provided by RefSeq, Jul 2008]. Transcript Variant: This variant (7) lacks two alternate coding exons compared to variant 1, which causes a frameshift. This variant uses a downstream in-frame start codon, so the encoded isoform 3 has a shorter N-terminus, as compared to isoform 1.
BMX Exonic 660 cytoplasmic tyrosine-protein kinase BMX This gene encodes a non-receptor tyrosine kinase belonging to the Tec kinase family. The protein contains a PH-like domain, which mediates membrane targeting by binding to phosphatidylinositol 3,4,5-triphosphate (PIP3), and a SH2 domain that binds to tyrosine-phosphorylated proteins and functions in signal transduction. The protein is implicated in several signal transduction pathways including the Stat pathway, and regulates differentiation and tumorigenicity of several types of cancer cells. Multiple alternatively spliced variants, encoding the same protein, have been identified. [provided by RefSeq, Sep 2009]. Transcript Variant: This variant (2) has an alternate 5' UTR exon, as compared to variant 1. Both variants 1 and 2 encode the same protein.
BRD7 Exonic 29117 bromodomain-containing protein 7 isoform 1 This gene encodes a protein which is a member of the bromodomain-containing protein family. The product of this gene has been identified as a component of one form of the SWI/SNF chromatin remodeling complex, and as a protein which interacts with p53 and is required for p53-dependent oncogene-induced senescence which prevents tumor growth. Pseudogenes have been described on chromosomes 2, 3, 6, 13 and 14. Alternative splicing results in multiple transcript variants. [provided by RefSeq, Jul 2010]. Transcript Variant: This variant (1) represents the longer transcript and encodes the longer isoform (1). Sequence Note: This RefSeq record was created from transcript and genomic sequence data to make the sequence consistent with the reference genome assembly. The genomic coordinates used for the transcript record were based on transcript alignments.
BTG4 Exonic 54766 protein BTG4 The protein encoded by this gene is a member of the BTG/Tob family. This family has structurally related proteins that appear to have antiproliferative properties. This encoded protein can induce G1 arrest in the cell cycle. [provided by RefSeq, Jul 2008]. Sequence Note: This RefSeq record was created from transcript and genomic sequence data to make the sequence consistent with the reference genome assembly. The genomic coordinates used for the transcript record were based on transcript alignments.
BTN2A1 Exonic 11120 butyrophilin subfamily 2 member A1 isoform 4 precursor This gene is a member of the BTN2 subfamily of genes, which encode proteins belonging to the butyrophilin protein family. The gene is located in a cluster on chromosome 6, consisting of seven genes belonging to the expanding B7/butyrophilin-like group, a subset of the immunoglobulin gene superfamily. The encoded protein is an integral plasma membrane B box protein involved in lipid, fatty-acid and sterol metabolism. Multiple alternatively spliced transcript variants encoding different isoforms have been found for this gene. [provided by RefSeq, Oct 2010]. Transcript Variant: This variant (4) has an alternate 3' exon compared to variant 1. The encoded isoform (4) is shorter and has a unique C-terminus compared to isoform 1.
BTN3A3 Exonic 10384 butyrophilin subfamily 3 member A3 isoform c The butyrophilin (BTN) genes are a group of major histocompatibility complex (MHC)-associated genes that encode type I membrane proteins with 2 extracellular immunoglobulin (Ig) domains and an intracellular B30.2 (PRYSPRY) domain. Three subfamilies of human BTN genes are located in the MHC class I region: the single-copy BTN1A1 gene (MIM 601610) and the BTN2 (e.g., BTN2A1; MIM 613590) and BTN3 (e.g., BTN3A3) genes, which have undergone tandem duplication, resulting in 3 copies of each (summary by Smith et al., 2010 [PubMed 20208008]). [supplied by OMIM, Nov 2010]. Transcript Variant: This variant (3) lacks several exons in two regions, but the open reading frame is retained, compared to variant 1. The encoded isoform (c) has a shorter N-terminus and lacks an internal segment, compared to isoform a.
BTNL3 Exonic 10917 butyrophilin-like protein 3 precursor N/A
C11orf49 Exonic 79096 UPF0705 protein C11orf49 isoform 4 N/A
C11orf96 Exonic 387763 uncharacterized protein C11orf96 N/A
C12orf47 Exonic 51275 N/A N/A
C13orf38 Exonic N/A N/A N/A
C13orf38- Exonic N/A N/A N/A
C14orf166 Exonic 51637 UPF0568 protein C14orf166 N/A
C16orf89 Exonic 146556 UPF0764 protein C16orf89 isoform 1 precursor This gene is expressed predominantly in the thyroid. Based on expression patterns similar to thyroid transcription factors and proteins, this gene may function in the development and function of the thyroid. Multiple transcript variants encoding different isoforms have been found for this gene. [provided by RefSeq, Oct 2011]. Transcript Variant: This variant (1) encodes the longer isoform (1).
C17orf70 Exonic 80233 Fanconi anemia-associated protein of 100 kDa isoform b FAAP100 is a component of the Fanconi anemia (FA; MIM 277650) core complex and is required for core complex stability and FANCD2 (see MIM 227646) monoubiquitination (Ling et al., 2007 [PubMed 17396147]). [supplied by OMIM, Mar 2008]. Transcript Variant: This variant (2) represents the shorter transcript and encodes the functional protein. Sequence Note: This RefSeq record was created from transcript and genomic sequence data to make the sequence consistent with the reference genome assembly. The genomic coordinates used for the transcript record were based on transcript alignments.
C17orf90 Exonic 339229 uncharacterized protein C17orf90 N/A
C18orf16 Exonic N/A N/A N/A
C1orf106 Exonic 55765 uncharacterized protein C1orf106 isoform 2 N/A
C1orf144 Exonic 26099 UPF0485 protein C1orf144 isoform 1 N/A
C2orf15 Exonic 150590 uncharacterized protein C2orf15 N/A
C2orf48 Exonic 348738 uncharacterized protein C2orf48 N/A
C3orf43 Exonic 255798 uncharacterized protein C3orf43 N/A
C4orf37 Exonic 285555 uncharacterized protein C4orf37 N/A
C6orf126 Exonic 389383 colipase-like protein C6orf126 precursor N/A
C6orf127 Exonic 340204 colipase-like protein C6orf127 precursor N/A
C6orf99 Exonic 100130967 putative uncharacterized protein C6orf99 N/A
C7orf63 Exonic 79846 uncharacterized protein C7orf63 isoform 1 N/A
C9orf85 Exonic 138241 uncharacterized protein C9orf85 N/A
C9orf93 Exonic 203238 uncharacterized protein C9orf93 N/A
CACNA2D3 Exonic 55799 voltage-dependent calcium channel subunit alpha-2/delta-3 precursor This gene encodes a member of the alpha-2/delta subunit family, a protein in the voltage-dependent calcium channel complex. Calcium channels mediate the influx of calcium ions into the cell upon membrane polarization and consist of a complex of alpha-1, alpha-2/delta, beta, and gamma subunits in a 1:1:1:1 ratio. Various versions of each of these subunits exist, either expressed from similar genes or the result of alternative splicing. Research on a highly similar protein in rabbit suggests the protein described in this record is cleaved into alpha-2 and delta subunits. Alternate transcriptional splice variants of this gene have been observed but have not been thoroughly characterized. [provided by RefSeq, Jul 2008].
CACNG8 Exonic 59283 voltage-dependent calcium channel gamma-8 subunit The protein encoded by this gene is a type I transmembrane AMPA receptor regulatory protein (TARP). TARPs regulate both trafficking and channel gating of the AMPA receptors. This gene is part of a functionally diverse eight-member protein subfamily of the PMP-22/EMP/MP20 family and is located in a cluster with two family members, a type II TARP and a calcium channel gamma subunit. The mRNA for this gene is believed to initiate translation from a non-AUG (CUG) start codon. [provided by RefSeq, Dec 2010]. Sequence Note: This RefSeq record was created from transcript and genomic sequence data to make the sequence consistent with the reference genome assembly. The genomic coordinates used for the transcript record were based on transcript alignments.
CADPS2 Exonic 93664 calcium-dependent secretion activator 2 isoform c This gene encodes a member of the calcium-dependent activator of secretion (CAPS) protein family, which are calcium binding proteins that regulate the exocytosis of synaptic and dense-core vesicles in neurons and neuroendocrine cells. Mutations in this gene may contribute to autism susceptibility. Multiple transcript variants encoding different isoforms have been found for this gene. [provided by RefSeq, Nov 2009]. Transcript Variant: This variant (3) represents the longest transcript and encodes the longest isoform (c). Sequence Note: This RefSeq record was created from transcript and genomic sequence data to make the sequence consistent with the reference genome assembly. The genomic coordinates used for the transcript record were based on transcript alignments.
CAMSAP1L1 Exonic N/A N/A N/A
CAPN14 Exonic 440854 calpain-14 Calpains are a family of cytosolic calcium-activated cysteine proteases involved in a variety of cellular processes including apoptosis, cell division, modulation of integrin-cytoskeletal interactions, and synaptic plasticity (Dear et al., [PubMed 10964513]). CAPN14 belongs to the calpain large subunit family. [supplied by OMIM, Mar 2008].
CASC4 Exonic 113201 protein CASC4 isoform b The increased expression level of this gene is associated with HER-2/neu proto-oncogene overexpression. Amplification and resulting overexpression of this proto-oncogene are found in approximately 30% of human breast and 20% of human ovarian cancers. Alternatively spliced variants encoding different isoforms have been identified for this gene. [provided by RefSeq, Dec 2010]. Transcript Variant: This variant (2) lacks an in-frame segment of the coding region, compared to variant 1. It encodes a shorter isoform (b), that is missing an internal segment compared to isoform a.
CASP10 Exonic 843 caspase-10 isoform 6 preproprotein This gene encodes a protein which is a member of the cysteine-aspartic acid protease (caspase) family. Sequential activation of caspases plays a central role in the execution-phase of cell apoptosis. Caspases exist as inactive proenzymes which undergo proteolytic processing at conserved aspartic residues to produce two subunits, large and small, that dimerize to form the active enzyme. This protein cleaves and activates caspases 3 and 7, and the protein itself is processed by caspase 8. Mutations in this gene are associated with type IIA autoimmune lymphoproliferative syndrome, non-Hodgkin lymphoma and gastric cancer. Alternatively spliced transcript variants encoding different isoforms have been described for this gene. [provided by RefSeq, Apr 2011]. Transcript Variant: This variant (6) lacks two in-frame coding exons compared to variant 1. This results in a shorter isoform (6) missing an internal protein segment compared to isoform 1. Sequence Note: This RefSeq record was created from transcript and genomic sequence data to make the sequence consistent with the reference genome assembly. The genomic coordinates used for the transcript record were based on transcript alignments.
CCDC137 Exonic 339230 coiled-coil domain-containing protein N/A
CCDC18 Exonic 343099 coiled-coil domain-containing protein N/A
CCDC57 Exonic 284001 coiled-coil domain-containing protein N/A
CCM2 Exonic 83605 malcavernin isoform 4 This gene encodes a scaffold protein that functions in the stress-activated p38 Mitogen-activated protein kinase (MAPK) signaling cascade. The protein interacts with SMAD specific E3 ubiquitin protein ligase 1 (also known as SMURF1) via a phosphotyrosine binding domain to promote RhoA degradation. The protein is required for normal cytoskeletal structure, cell-cell interactions, and lumen formation in endothelial cells. Mutations in this gene result in cerebral cavernous malformations. Multiple transcript variants encoding different isoforms have been found for this gene. [provided by RefSeq, Nov 2009]. Transcript Variant: This variant (4) represents use of an alternate promoter and 5' UTR, uses a distinct start codon, and lacks two alternate in-frame exons in the central coding region, compared to variant 1. The resulting isoform (4) has a shorter and distinct N-terminus and lacks an internal segment, compared to isoform 1. Publication Note: This RefSeq record includes a subset of the publications that are available for this gene. Please see the Gene record to access additional publications.
CD109 Exonic 135228 CD109 antigen isoform 3 precursor This gene encodes a member of the alpha2-macroglobulin/complement superfamily. The encoded GPI-linked glycoprotein is found on the cell surface of platelets, activated T-cells, and endothelial cells. The protein binds to and negatively regulates signaling of transforming growth factor beta (TGF-beta). Multiple transcript variants encoding different isoforms have been found for this gene. [provided by RefSeq, Apr 2009]. Transcript Variant: This variant (3) lacks an alternate in-frame exon in the 5' coding region, compared to variant 1. The resulting isoform (3) lacks an internal 77-aa segment near the N-terminus, compared to isoform 1.
CD46 Exonic 4179 membrane cofactor protein isoform 14 precursor The protein encoded by this gene is a type I membrane protein and is a regulatory part of the complement system. The encoded protein has cofactor activity for inactivation of complement components C3b and C4b by serum factor I, which protects the host cell from damage by complement. In addition, the encoded protein can act as a receptor for the Edmonston strain of measles virus, human herpesvirus-6, and type IV pili of pathogenic Neisseria. Finally, the protein encoded by this gene may be involved in the fusion of the spermatozoa with the oocyte during fertilization. Mutations at this locus have been associated with susceptibility to hemolytic uremic syndrome. Alternatively spliced transcript variants encoding different isoforms have been described. [provided by RefSeq, Jun 2010]. Transcript Variant: This variant (n) lacks three alternate in-frame exons as well as an alternate segment compared to variant a, which causes a frameshift. The resulting isoform (14) is shorter and has a distinct C-terminus compared to isoform 1.
CDH13 Exonic 1012 cadherin-13 isoform 6 precursor This gene encodes a member of the cadherin superfamily. The encoded protein is localized to the surface of the cell membrane and is anchored by a GPI moiety, rather than by a transmembrane domain. The protein lacks the cytoplasmic domain characteristic of other cadherins, and so is not thought to be a cell-cell adhesion glycoprotein. This protein acts as a negative regulator of axon growth during neural differentiation. It also protects vascular endothelial cells from apoptosis due to oxidative stress, and is associated with resistance to atherosclerosis. The gene is hypermethylated in many types of cancer. Alternative splicing results in multiple transcript variants encoding different isoforms. [provided by RefSeq, May 2011]. Transcript Variant: This variant (6) lacks several coding exons and includes two alternate exons at the 3' end, compared to variant 1. It encodes isoform 6, which is shorter and has a distinct C-terminus, compared to isoform 1.
CECR2 Exonic 27443 cat eye syndrome critical region protein 2 N/A
CEL Exonic 1056 bile salt-activated lipase precursor The protein encoded by this gene is a glycoprotein secreted from the pancreas into the digestive tract and from the lactating mammary gland into human milk. The physiological role of this protein is in cholesterol and lipid-soluble vitamin ester hydrolysis and absorption. This encoded protein promotes large chylomicron production in the intestine. Also its presence in plasma suggests its interactions with cholesterol and oxidized lipoproteins to modulate the progression of atherosclerosis. In pancreatic tumoral cells, this encoded protein is thought to be sequestered within the Golgi compartment and is probably not secreted. This gene contains a variable number of tandem repeat (VNTR) polymorphism in the coding region that may influence the function of the encoded protein. [provided by RefSeq, Jul 2008].
CELSR3 Exonic 1951 cadherin EGF LAG seven-pass G-type receptor 3 precursor The protein encoded by this gene is a member of the flamingo subfamily, part of the cadherin superfamily. The flamingo subfamily consists of nonclassic-type cadherins; a subpopulation that does not interact with catenins. The flamingo cadherins are located at the plasma membrane and have nine cadherin domains, seven epidermal growth factor-like repeats and two laminin A G-type repeats in their ectodomain. They also have seven transmembrane domains, a characteristic unique to this subfamily. It is postulated that these proteins are receptors involved in contact-mediated communication, with cadherin domains acting as homophilic binding regions and the EGF-like domains involved in cell adhesion and receptor-ligand interactions. The specific function of this particular member has not been determined. [provided by RefSeq, Jul 2008].
CEP57 Exonic 9702 centrosomal protein of 57 kDa isoform a This gene encodes a cytoplasmic protein called Translokin. This protein localizes to the centrosome and has a function in microtubular stabilization. The N-terminal half of this protein is required for its centrosome localization and for its multimerization, and the C-terminal half is required for nucleating, bundling and anchoring microtubules to the centrosomes. This protein specifically interacts with fibroblast growth factor 2 (FGF2), sorting nexin 6, Ran-binding protein M and the kinesins KIF3A and KIF3B, and thus mediates the nuclear translocation and mitogenic activity of the FGF2. It also interacts with cyclin D1 and controls nucleocytoplasmic distribution of the cyclin D1 in quiescent cells. This protein is crucial for maintaining correct chromosomal number during cell division. Mutations in this gene cause mosaic variegated aneuploidy syndrome, a rare autosomal recessive disorder. Multiple alternatively spliced transcript variants encoding different isoforms have been identified. [provided by RefSeq, Aug 2011]. Transcript Variant: This variant (1) encodes the longest isoform (a).
CETN2 Exonic 1069 centrin-2 Caltractin belongs to a family of calcium-binding proteins and is a structural component of the centrosome. The high level of conservation from algae to humans and its association with the centrosome suggested that caltractin plays a fundamental role in the structure and function of the microtubule-organizing center, possibly required for the proper duplication and segregation of the centrosome. [provided by RefSeq, Jul 2008].
CETN3 Exonic 1070 centrin-3 The protein encoded by this gene contains four EF-hand calcium binding domains, and is a member of the centrin protein family. Centrins are evolutionarily conserved proteins similar to the CDC31 protein of S. cerevisiae.
Yeast CDC31 is located at the centrosome of interphase and mitotic cells, where it plays a fundamental role in centrosome duplication and separation. Multiple forms of the proteins similar to the yeast centrin have been identified in human and other mammalian cells, some of which have been shown to be associated with centrosome fractions. This protein appears to be one of the most abundant centrins associated with the centrosome, which suggests a similar function to its yeast counterpart. [provided by RefSeq, Jul 2008]. Publication Note: This RefSeq record includes a subset of the publications that are available for this gene. Please see the Gene record to access additional publications.
CFLAR Exonic 8837 CASP8 and FADD-like apoptosis regulator isoform 6 The protein encoded by this gene is a regulator of apoptosis and is structurally similar to caspase-8. However, the encoded protein lacks caspase activity and appears to be itself cleaved into two peptides by caspase-8. Several transcript variants encoding different isoforms have been found for this gene, and partial evidence for several more variants exists. [provided by RefSeq, Feb 2011].
Transcript Variant: This variant (7) differs in the 5' UTR and coding sequence and the 3' UTR and coding sequence compared to variant 1. The resulting isoform (6) is shorter at the N-terminus and has a shorter and distinct C-terminus compared to isoform 1. Variants 7 and 8 both encode isoform 6.
Sequence Note: This RefSeq record was created from transcript and genomic sequence data to make the sequence consistent with the reference genome assembly. The genomic coordinates used for the transcript record were based on transcript alignments.
CHL1 Exonic 10752 neural cell adhesion molecule L1-like protein precursor The protein encoded by this gene is a member of the L1 gene family of neural cell adhesion molecules. It is a neural recognition molecule that may be involved in signal transduction pathways. The deletion of one copy of this gene may be responsible for mental defects in patients with 3p- syndrome. Several alternatively spliced transcript variants of this gene have been described, but their full-length nature is not known. [provided by RefSeq, Jul 2008].
CLEC4A Exonic 50856 C-type lectin domain family 4 member A isoform 2 This gene encodes a member of the C-type lectin/C-type lectin-like domain (CTL/CTLD) superfamily. Members of this family share a common protein fold and have diverse functions, such as cell adhesion, cell-cell signalling, glycoprotein turnover, and roles in inflammation and immune response. The encoded type 2 transmembrane protein may play a role in inflammatory and immune response. Multiple transcript variants encoding distinct isoforms have been identified for this gene. This gene is closely linked to other CTL/CTLD superfamily members on chromosome 12p13 in the natural killer gene complex region. [provided by RefSeq, Jul 2008]. Transcript Variant: This variant (2), also known as C-type lectin DDB27 short form, lacks an in-frame segment of the coding region, compared to variant 1. It encodes a shorter isoform (2) that is missing the transmembrane domain compared to isoform 1.
CLECL1 Exonic 160365 C-type lectin-like domain family 1 DCAL1 is a type II transmembrane, C-type lectin-like protein expressed on dendritic cells (DCs) and B cells. It interacts with subsets of T cells as a costimulatory molecule that enhances interleukin-4 (IL4; MIM 147780) production. [supplied by OMIM, Apr 2004].
CLOCK Exonic 9575 circadian locomoter output cycles protein kaput This gene encodes a protein that belongs to the basic helix-loop-helix (bHLH) family of transcription factors. Polymorphisms within the encoded protein have been associated with circadian rhythm sleep disorders. A similar protein in mice is a circadian regulator that acts as a transcription factor and forms a heterodimer with aryl hydrocarbon receptor nuclear translocator-like to activate transcription of mouse period 1. [provided by RefSeq, Jul 2008].
CNTLN Exonic 54875 centlein isoform 2 N/A
CNTN4 Exonic 152330 contactin-4 isoform a precursor This gene encodes a member of the contactin family of immunoglobulins. Contactins are axon-associated cell adhesion molecules that function in neuronal network formation and plasticity. The encoded protein is a glycosylphosphatidylinositol-anchored neuronal membrane protein that may play a role in the formation of axon connections in the developing nervous system. Deletion or mutation of this gene may play a role in 3p deletion syndrome and autism spectrum disorders. Alternative splicing results in multiple transcript variants. [provided by RefSeq, May 2011]. Transcript Variant: This variant (1) encodes the longest isoform (a). Both variants 1 and 4 encode the same isoform.
CNTN5 Exonic 53942 contactin-5 isoform 1 precursor The protein encoded by this gene is a member of the immunoglobulin superfamily and the contactin family, which mediate cell surface interactions during nervous system development. This protein is a glycosylphosphatidylinositol (GPI)-anchored neuronal membrane protein that functions as a cell adhesion molecule. It may play a role in the formation of axon connections in the developing nervous system. Alternatively spliced transcript variants encoding different isoforms have been described for this gene. [provided by RefSeq, Aug 2011]. Transcript Variant: This variant (2) lacks an exon in the 5' non-coding region, and thus has a shorter 5' UTR compared to variant 1. Variants 1 and 2 encode the same isoform (1). Sequence Note: This RefSeq record was created from transcript and genomic sequence data to make the sequence consistent with the reference genome assembly. The genomic coordinates used for the transcript record were based on transcript alignments.
CNTNAP2 Both 26047 contactin-associated protein-like 2 precursor This gene encodes a member of the neurexin family which functions in the vertebrate nervous system as cell adhesion molecules and receptors. This protein, like other neurexin proteins, contains epidermal growth factor repeats and laminin G domains. In addition, it includes an F5/8 type C domain, discoidin/neuropilin- and fibrinogen-like domains, thrombospondin N-terminal-like domains and a putative PDZ binding site. This protein is localized at the juxtaparanodes of myelinated axons, and mediates interactions between neurons and glia during nervous system development and is also involved in localization of potassium channels within differentiating axons. This gene encompasses almost 1.5% of chromosome 7 and is one of the largest genes in the human genome. It is directly bound and regulated by forkhead box protein P2 (FOXP2), a transcription factor related to speech and language development. This gene has been implicated in multiple neurodevelopmental disorders, including Gilles de la Tourette syndrome, schizophrenia, epilepsy, autism, ADHD and mental retardation. [provided by RefSeq, Mar 2010]. Sequence Note: This RefSeq record was created from transcript and genomic sequence data to make the sequence consistent with the reference genome assembly. The genomic coordinates used for the transcript record were based on transcript alignments.
CNTNAP5 Exonic 129684 contactin-associated protein-like 5 precursor This gene product belongs to the neurexin family, members of which function in the vertebrate nervous system as cell adhesion molecules and receptors. This protein, like other neurexin proteins, contains epidermal growth factor repeats and laminin G domains. In addition, it includes an F5/8 type C domain, discoidin/neuropilin- and fibrinogen-like domains, and thrombospondin N-terminal-like domains. [provided by RefSeq, Jul 2008].
COL24A1 Exonic 255631 collagen alpha-1(XXIV) chain precursor N/A
COL27A1 Exonic 85301 collagen alpha-1(XXVII) chain preproprotein Fibrillar collagens, such as COL27A1, compose one of the most ancient families of extracellular matrix molecules. They form major structural elements in extracellular matrices of cartilage, skin, and tendon (Boot-Handford et al., 2003 [PubMed 12766169]). [supplied by OMIM, Mar 2008].
COL7A1 Exonic 1294 collagen alpha-1(VII) chain precursor This gene encodes the alpha chain of type VII collagen. The type VII collagen fibril, composed of three identical alpha collagen chains, is restricted to the basement zone beneath stratified squamous epithelia. It functions as an anchoring fibril between the external epithelia and the underlying stroma.
Mutations in this gene are associated with all forms of dystrophic epidermolysis bullosa. In the absence of mutations, however, an acquired form of this disease can result from an autoimmune response made to type VII collagen. [provided by RefSeq, Jul 2008]. Publication Note: This RefSeq record includes a subset of the publications that are available for this gene. Please see the Gene record to access additional publications.
COMMD7 Exonic 149951 COMM domain-containing protein 7 isoform 2 N/A
CORIN Exonic 10699 atrial natriuretic peptide-converting enzyme This gene encodes a member of the type II transmembrane serine protease class of the trypsin superfamily. Members of this family are composed of multiple structurally distinct domains. The encoded protein converts pro-atrial natriuretic peptide to biologically active atrial natriuretic peptide, a cardiac hormone that regulates blood volume and pressure. This protein may also function as a pro-brain-type natriuretic peptide convertase. [provided by RefSeq, Jul 2008].
COX18 Exonic 285521 mitochondrial inner membrane protein COX18 precursor COX18 encodes a cytochrome c oxidase (COX)-assembly protein. The S. cerevisiae Cox18 protein catalyzes the insertion of the Cox2 (MTCO2; MIM 516040) C-terminal tail into the mitochondrial inner membrane, an intermediate step in the assembly of complex IV of the mitochondrial respiratory chain (Sacconi et al., 2005 [PubMed 16212937]). [supplied by OMIM, Mar 2008].
CPNE9 Exonic 151835 copine-9 N/A
CREBBP Exonic 1387 CREB-binding protein isoform b This gene is ubiquitously expressed and is involved in the transcriptional coactivation of many different transcription factors. First isolated as a nuclear protein that binds to cAMP-response element binding protein (CREB), this gene is now known to play critical roles in embryonic development, growth control, and homeostasis by coupling chromatin remodeling to transcription factor recognition. The protein encoded by this gene has intrinsic histone acetyltransferase activity and also acts as a scaffold to stabilize additional protein interactions with the transcription complex. This protein acetylates both histone and non-histone proteins. This protein shares regions of very high sequence similarity with protein p300 in its bromodomain, cysteine-histidine-rich regions, and histone acetyltransferase domain. Mutations in this gene cause Rubinstein-Taybi syndrome (RTS). Chromosomal translocations involving this gene have been associated with acute myeloid leukemia. Alternative splicing results in multiple transcript variants encoding different isoforms. [provided by RefSeq, Feb 2009]. Transcript Variant: This variant (2) lacks an alternate in-frame exon in the 5' coding region, compared to variant 1, resulting in a shorter protein (isoform b), compared to isoform a.
CSDAP1 Exonic 440359 N/A N/A
CSGALNACT2 Exonic 55454 chondroitin sulfate N-acetylgalactosaminyltransferase 2 N/A
CSNK1D Exonic 1453 casein kinase I isoform delta isoform 2 This gene is a member of the casein kinase I (CKI) gene family whose members have been implicated in the control of cytoplasmic and nuclear processes, including DNA replication and repair. The encoded protein is highly similar to the mouse and rat CK1 delta homologs. Two transcript variants encoding different isoforms have been found for this gene. [provided by RefSeq, Jul 2008]. Transcript Variant: This variant (2) has an additional exon at the 3' end compared to transcript variant 1. This results in a shorter isoform (2) with a different C-terminus compared to isoform 1. Sequence Note: This RefSeq record was created from transcript and genomic sequence data to make the sequence consistent with the reference genome assembly. The genomic coordinates used for the transcript record were based on transcript alignments.
Publication Note: This RefSeq record includes a subset of the publications that are available for this gene. Please see the Gene record to access additional publications.
CTDSP1 Exonic 58190 carboxy-terminal domain RNA polymerase II polypeptide A small phosphatase 1 isoform 3 This gene encodes a member of the small C-terminal domain phosphatase (SCP) family of nuclear phosphatases. These proteins play a role in transcriptional regulation through specific dephosphorylation of phosphoserine 5 within tandem heptapeptide repeats of the C-terminal domain of RNA polymerase II. The encoded protein plays a role in neuronal gene silencing in non-neuronal cells, and may also inhibit osteoblast differentiation. Alternatively spliced transcript variants encoding multiple isoforms have been observed for this gene. [provided by RefSeq, Oct 2011]. Transcript Variant: This variant (3) differs in the 5' UTR and has multiple differences in the coding region, including the use of an alternate start codon, compared to variant 1. The encoded isoform (3) is shorter and has a distinct N-terminus, compared to isoform 1. Sequence Note: This RefSeq record was created from transcript and genomic sequence data to make the sequence consistent with the reference genome assembly. The genomic coordinates used for the transcript record were based on transcript alignments.
CTNNA3 Exonic 29119 catenin alpha-3 N/A
CTSL2 Exonic 1515 cathepsin L2 preproprotein The protein encoded by this gene, a member of the peptidase C1 family, is a lysosomal cysteine proteinase that may play an important role in corneal physiology. This gene is expressed in colorectal and breast carcinomas but not in normal colon, mammary gland, or peritumoral tissues, suggesting a possible role for this gene in tumor processes. Alternatively spliced variants, encoding the same protein, have been identified. [provided by RefSeq, Jan 2011].
Transcript Variant: This variant (2) differs in the 5' UTR compared to variant 1.
Both variants 1 and 2 encode the same protein. Sequence Note: This RefSeq record was created from transcript and genomic sequence data to make the sequence consistent with the reference genome assembly. The genomic coordinates used for the transcript record were based on transcript alignments.
Publication Note: This RefSeq record includes a subset of the publications that are available for this gene. Please see the Gene record to access additional publications.
CUTA Exonic 51596 protein CutA isoform 3 precursor N/A
CXorf57 Exonic 55086 uncharacterized protein CXorf57 isoform 2 N/A
CYB5R1 Exonic 51706 NADH-cytochrome b5 reductase 1 N/A
CYP1A1 Exonic 1543 cytochrome P450 1A1 This gene, CYP1A1, encodes a member of the cytochrome P450 superfamily of enzymes. The cytochrome P450 proteins are monooxygenases which catalyze many reactions involved in drug metabolism and synthesis of cholesterol, steroids and other lipids. This protein localizes to the endoplasmic reticulum and its expression is induced by some polycyclic aromatic hydrocarbons (PAHs), some of which are found in cigarette smoke. The enzyme's endogenous substrate is unknown; however, it is able to metabolize some PAHs to carcinogenic intermediates. The gene has been associated with lung cancer risk.
A related family member, CYP1A2, is located approximately 25 kb away from CYP1A1 on chromosome 15. [provided by RefSeq, Jul 2008]. Sequence Note: The RefSeq transcript and protein were derived from genomic sequence to make the sequence consistent with the reference genome assembly. The genomic coordinates used for the transcript record were based on alignments.
CYP51A1 Exonic 1595 lanosterol 14-alpha demethylase isoform 2 This gene encodes a member of the cytochrome P450 superfamily of enzymes. The cytochrome P450 proteins are monooxygenases which catalyze many reactions involved in drug metabolism and synthesis of cholesterol, steroids and other lipids. This endoplasmic reticulum protein participates in the synthesis of cholesterol by catalyzing the removal of the 14alpha-methyl group from lanosterol. Homologous genes are found in all three eukaryotic phyla: fungi, plants, and animals, suggesting that this is one of the oldest cytochrome P450 genes. Two transcript variants encoding different isoforms have been found for this gene. [provided by RefSeq, Mar 2009]. Transcript Variant: This variant (2) differs in the 5' UTR and coding sequence compared to variant 1. The resulting isoform (2) is shorter at the N-terminus compared to isoform 1. Sequence Note: This RefSeq record was created from transcript and genomic sequence data to make the sequence consistent with the reference genome assembly. The genomic coordinates used for the transcript record were based on transcript alignments.
DAPP1 Exonic 27071 dual adapter for phosphotyrosine and 3-phosphotyrosine and 3-phosphoinositide N/A
DCXR Exonic 51181 L-xylulose reductase isoform 2 The protein encoded by this gene acts as a homotetramer to catalyze diacetyl reductase and L-xylulose reductase reactions. The encoded protein may play a role in the uronate cycle of glucose metabolism and in the cellular osmoregulation in the proximal renal tubules. Defects in this gene are a cause of pentosuria. Two transcript variants encoding different isoforms have been found for this gene. [provided by RefSeq, Aug 2010]. Transcript Variant: This variant (2) uses an alternate in-frame splice junction at the 5' end of an exon compared to variant 1. The resulting isoform (2) has the same N- and C-termini but is 2 aa shorter compared to isoform 1.
DDX58 Exonic 23586 probable ATP-dependent RNA helicase DDX58 DEAD box proteins, characterized by the conserved motif Asp-Glu-Ala-Asp (DEAD), are putative RNA helicases which are implicated in a number of cellular processes involving RNA binding and alteration of RNA secondary structure. This gene encodes a protein containing RNA helicase-DEAD box protein motifs and a caspase recruitment domain (CARD). It is involved in viral double-stranded (ds) RNA recognition and the regulation of immune response. [provided by RefSeq, Jul 2008].
DEFA6 Exonic 1671 defensin-6 preproprotein Defensins are a family of microbicidal and cytotoxic peptides thought to be involved in host defense. They are abundant in the granules of neutrophils and also found in the epithelia of mucosal surfaces such as those of the intestine, respiratory tract, urinary tract, and vagina. Members of the defensin family are highly similar in protein sequence and distinguished by a conserved cysteine motif. Several alpha defensin genes appear to be clustered on chromosome 8.
The protein encoded by this gene, defensin, alpha 6, is highly expressed in the secretory granules of Paneth cells of the small intestine, and likely plays a role in host defense of human bowel. [provided by RefSeq, Jul 2008].
DEFB1 Exonic 1672 beta-defensin 1 preproprotein Defensins form a family of microbicidal and cytotoxic peptides made by neutrophils. Members of the defensin family are highly similar in protein sequence. This gene encodes defensin, beta 1, an antimicrobial peptide implicated in the resistance of epithelial surfaces to microbial colonization.
This gene maps in close proximity to defensin family member, defensin, alpha 1 and has been implicated in the pathogenesis of cystic fibrosis. [provided by RefSeq, Jul 2008].
DHPS Exonic 1725 N/A This gene encodes a protein that is required for the formation of hypusine, a unique amino acid formed by the posttranslational modification of only one protein, eukaryotic translation initiation factor 5A. The encoded protein catalyzes the first step in hypusine formation by transferring the butylamine moiety of spermidine to a specific lysine residue of the eukaryotic translation initiation factor 5A precursor, forming an intermediate deoxyhypusine residue.
Alternatively spliced transcript variants encoding multiple isoforms have been observed for this gene. [provided by RefSeq, May 2011]. Transcript Variant: This variant (5) lacks an alternate internal exon, compared to variant 1. This variant is represented as non-coding because the use of the 5'-most expected translational start codon, as used in variant 1, renders the transcript a candidate for nonsense-mediated mRNA decay (NMD).
DIAPH2 Exonic 1730 protein diaphanous homolog 2 isoform 12C The product of this gene belongs to the diaphanous subfamily of the formin homology family of proteins. This gene may play a role in the development and normal function of the ovaries. Defects in this gene have been linked to premature ovarian failure 2. Alternatively spliced transcript variants encoding different isoforms have been identified. [provided by RefSeq, Jul 2008].
Transcript Variant: This variant (12C) differs in the 3' UTR and the 3' coding region, compared to variant 156. The resulting isoform (isoform 12C) contains a distinct C-terminus, compared to isoform 156.
DMD Exonic 1756 dystrophin Dp140c isoform The dystrophin gene is the largest gene found in nature, measuring 2.4 Mb. The gene was identified through a positional cloning approach, targeted at the isolation of the gene responsible for Duchenne (DMD) and Becker (BMD) Muscular Dystrophies. DMD is a recessive, fatal, X-linked disorder occurring at a frequency of about 1 in 3,500 new-born males. BMD is a milder allelic form.
In general, DMD patients carry mutations which cause premature translation termination (nonsense or frame shift mutations), while in BMD patients dystrophin is reduced either in molecular weight (derived from in-frame deletions) or in expression level. The dystrophin gene is highly complex, containing at least eight independent, tissue-specific promoters and two polyA-addition sites. Furthermore, dystrophin RNA is differentially spliced, producing a range of different transcripts, encoding a large set of protein isoforms.
Dystrophin (as encoded by the Dp427 transcripts) is a large, rod-like cytoskeletal protein which is found at the inner surface of muscle fibers. Dystrophin is part of the dystrophin-glycoprotein complex (DGC), which bridges the inner cytoskeleton (F-actin) and the extra-cellular matrix.
[provided by RefSeq, Jul 2008]. Transcript Variant: Dp140 transcripts use exons 45-79, starting at a promoter/exon 1 located in intron 44. Dp140 transcripts have a long (1 kb) 5' UTR since translation is initiated in exon 51 (corresponding to aa of dystrophin). In addition to the alternative promoter and exon 1, differential splicing of exons 71-74 and 78 produces at least five Dp140 isoforms. Of these, this transcript (Dp140c) lacks exons 71-74. Sequence Note: This RefSeq record was created from transcript and genomic sequence data to make the sequence consistent with the reference genome assembly. The genomic coordinates used for the transcript record were based on transcript alignments.
DNAH3 Exonic 55567 dynein heavy chain 3, axonemal N/A
DNASE1L3 Exonic 1776 deoxyribonuclease gamma precursor This gene encodes a member of the DNase family. The protein hydrolyzes DNA, is not inhibited by actin, and mediates the breakdown of DNA during apoptosis. Alternate transcriptional splice variants of this gene have been observed but have not been thoroughly characterized. [provided by RefSeq, Jul 2008].
DNTTIP2 Exonic 30836 deoxynucleotidyltransferase terminal-interacting protein 2 This gene is thought to be involved in chromatin remodeling and gene transcription. The encoded nuclear protein binds to and enhances the transcriptional activity of the estrogen receptor alpha, and also interacts with terminal deoxynucleotidyltransferase.
The expression profile of this gene is a potential biomarker for chronic obstructive pulmonary disease. [provided by RefSeq, Dec 2010].
DPP6 Exonic 1804 dipeptidyl aminopeptidase-like protein 6 isoform 2 This gene encodes a single-pass type II membrane protein that is a member of the S9B family in clan SC of the serine proteases. This protein has no detectable protease activity, most likely due to the absence of the conserved serine residue normally present in the catalytic domain of serine proteases. However, it does bind specific voltage-gated potassium channels and alters their expression and biophysical properties. Alternate transcriptional splice variants, encoding different isoforms, have been characterized. [provided by RefSeq, Jul 2008].
Transcript Variant: This variant (2) includes an alternate in-frame exon, compared to variant 1, resulting in a shorter protein (isoform 2, also referred to as S) that has a shorter and distinct N-terminus, compared to isoform 1.
Sequence Note: This RefSeq record was created from transcript and genomic sequence data to make the sequence consistent with the reference genome assembly. The genomic coordinates used for the transcript record were based on transcript alignments. Publication Note: This RefSeq record includes a subset of the publications that are available for this gene. Please see the Gene record to access additional publications.
DPYD Exonic 1806 dihydropyrimidine dehydrogenase [NADP+] isoform 1 The protein encoded by this gene is a pyrimidine catabolic enzyme and the initial and rate-limiting factor in the pathway of uracil and thymidine catabolism. Mutations in this gene result in dihydropyrimidine dehydrogenase deficiency, an error in pyrimidine metabolism associated with thymine-uraciluria and an increased risk of toxicity in cancer patients receiving 5-fluorouracil chemotherapy. Two transcript variants encoding different isoforms have been found for this gene. [provided by RefSeq, May 2009]. Transcript Variant: This variant (1) represents the longer transcript and encodes the longer isoform (1).
DUS1L Exonic 64118 tRNA-dihydrouridine synthase 1-like N/A
DYNC2LI1 Exonic 51626 cytoplasmic dynein 2 light intermediate chain 1 isoform 4 N/A
DYSFIP1 Exonic N/A N/A N/A
EBF3 Exonic 253738 transcription factor COE3 This gene encodes a member of the early B-cell factor (EBF) family of DNA binding transcription factors. EBF proteins are involved in B-cell differentiation, bone development and neurogenesis, and may also function as tumor suppressors. The encoded protein inhibits cell survival through the regulation of genes involved in cell cycle arrest and apoptosis, and aberrant methylation or deletion of this gene may play a role in multiple malignancies including glioblastoma multiforme and gastric carcinoma. [provided by RefSeq, Sep 2011].
EFTUD1 Exonic 79631 elongation factor Tu GTP-binding domain-containing protein 1 isoform 2 N/A
EHD3 Exonic 30845 EH domain-containing protein N/A
ELAVL3 Exonic 1995 ELAV-like protein 3 isoform 2 A member of the ELAVL protein family, ELAV-like 3 is a neural-specific RNA-binding protein which contains three RNP-type RNA recognition motifs. The observation that ELAVL3 is one of several Hu antigens (neuronal-specific RNA-binding proteins) recognized by the anti-Hu serum antibody present in sera from patients with paraneoplastic encephalomyelitis and sensory neuronopathy (PEM/PSN) suggests it has a role in neurogenesis. Two alternatively spliced transcript variants encoding distinct isoforms have been found for this gene. [provided by RefSeq, Jul 2008]. Transcript Variant: This variant (2) lacks an in-frame segment in the coding region, as compared to variant 1. It encodes isoform 2 which lacks an internal segment, as compared to isoform 1.
ELK3 Exonic 2004 ETS domain-containing protein Elk-3 The protein encoded by this gene is a member of the ETS-domain transcription factor family and the ternary complex factor (TCF) subfamily. Proteins in this subfamily regulate transcription when recruited by serum response factor to bind to serum response elements. This protein is activated by signal-induced phosphorylation; studies in rodents suggest that it is a transcriptional inhibitor in the absence of Ras, but activates transcription when Ras is present. [provided by RefSeq, Jul 2008].
EMCN Exonic 51705 endomucin isoform 1 EMCN is a mucin-like sialoglycoprotein that interferes with the assembly of focal adhesion complexes and inhibits interaction between cells and the extracellular matrix (Kinoshita et al., 2001 [PubMed 11418125]). [supplied by OMIM, Mar 2008]. Transcript Variant: This variant (1) represents the longer transcript and encodes the longer isoform (1). Sequence Note: This RefSeq record was created from transcript and genomic sequence data to make the sequence consistent with the reference genome assembly. The genomic coordinates used for the transcript record were based on transcript alignments.
EMID2 Exonic 136227 collagen alpha-1(XXVI) chain precursor N/A
EPHA8 Exonic 2046 ephrin type-A receptor 8 isoform 2 precursor This gene encodes a member of the ephrin receptor subfamily of the protein-tyrosine kinase family. EPH and EPH-related receptors have been implicated in mediating developmental events, particularly in the nervous system. Receptors in the EPH subfamily typically have a single kinase domain and an extracellular region containing a Cys-rich domain and 2 fibronectin type III repeats. The ephrin receptors are divided into 2 groups based on the similarity of their extracellular domain sequences and their affinities for binding ephrin-A and ephrin-B ligands. The protein encoded by this gene functions as a receptor for ephrin A2, A3 and A5 and plays a role in short-range contact-mediated axonal guidance during development of the mammalian nervous system. [provided by RefSeq, Jul 2008]. Transcript Variant: This variant (2) uses an alternate splice site in the 3' coding region, compared to variant 1, that results in a frameshift. It encodes isoform 2, which has a shorter and distinct C-terminus compared to isoform 1. This transcript is supported by mRNA transcripts but the predicted ORF and its predicted precursor sequence have not yet been experimentally confirmed.
EPS8L3 Exonic 79574 epidermal growth factor receptor kinase substrate 8-like protein 3 isoform c This gene encodes a protein that is related to epidermal growth factor receptor pathway substrate 8 (EPS8), a substrate for the epidermal growth factor receptor. The function of this protein is unknown. Alternatively spliced transcript variants encoding different isoforms exist. [provided by RefSeq, Jul 2008]. Transcript Variant: This variant (3) uses different splice acceptor sites for two coding region exons compared to variant 1. The encoded protein (isoform c) is shorter than isoform a.
EPSTI1 Exonic 94240 epithelial-stromal interaction protein 1 isoform 1 N/A
ETS1 Exonic 2113 protein C-ets-1 isoform 1 This gene encodes a member of the ETS family of transcription factors, which are defined by the presence of a conserved ETS DNA-binding domain that recognizes the core consensus DNA sequence GGAA/T in target genes. These proteins function either as transcriptional activators or repressors of numerous genes, and are involved in stem cell development, cell senescence and death, and tumorigenesis. Alternatively spliced transcript variants encoding different isoforms have been described for this gene. [provided by RefSeq, Jul 2011].
Transcript Variant: This variant (1) encodes the longest isoform (1).
F8A1 Exonic 8263 factor VIII intron 22 protein This gene is contained entirely within intron 22 of the factor VIII gene; spans less than 2 kb, and is transcribed in the direction opposite of factor VIII. A portion of intron 22 (int22h), containing F8A, is repeated twice extragenically closer to the Xq telomere. Although its function is unknown, the observation that this gene is conserved in the mouse implies it has some function. Unlike factor VIII, this gene is transcribed abundantly in a wide variety of cell types. [provided by RefSeq, Jul 2008].
F8A2 Exonic 474383 factor VIII intron 22 protein This gene is part of a region that is repeated three times on chromosome X, once in intron 22 of the F8 gene and twice closer to the Xq telomere. This record represents the middle copy. Although its function is unknown, the observation that this gene is conserved in the mouse implies it has some function. Unlike factor VIII, this gene is transcribed abundantly in a wide variety of cell types. [provided by RefSeq, Jul 2008].
F8A3 Exonic 474384 factor VIII intron 22 protein This gene is part of a region that is repeated three times on chromosome X, once in intron 22 of the F8 gene and twice closer to the Xq telomere. This record represents the most telomeric copy. Although its function is unknown, the observation that this gene is conserved in the mouse implies it has some function. Unlike factor VIII, this gene is transcribed abundantly in a wide variety of cell types. [provided by RefSeq, Jul 2008].
FA2H Exonic 79152 fatty acid 2-hydroxylase This gene encodes a protein that catalyzes the synthesis of 2-hydroxysphingolipids, a subset of sphingolipids that contain 2-hydroxy fatty acids. Sphingolipids play roles in many cellular processes and their structural diversity arises from modification of the hydrophobic ceramide moiety, such as by 2-hydroxylation of the N-acyl chain, and the existence of many different head groups. Mutations in this gene have been associated with leukodystrophy dysmyelinating with spastic paraparesis with or without dystonia. [provided by RefSeq, Mar 2010].
FAM154B Exonic 283726 protein FAM154B N/A
FAM189A1 Exonic 23359 protein N/A
FAM83G Exonic 644815 protein FAM83G N/A
FAM9B Exonic 171483 protein FAM9B This gene is a member of a gene family which arose through duplication on the X chromosome. The encoded protein may be localized to the nucleus as the protein contains several nuclear localization signals, and has similarity to a synaptonemal complex protein. [provided by RefSeq, Aug 2011].
FANCA Exonic 2175 Fanconi anemia group A protein isoform a The Fanconi anemia complementation group (FANC) currently includes FANCA, FANCB, FANCC, FANCD1 (also called BRCA2), FANCD2, FANCE, FANCF, FANCG, FANCI, FANCJ (also called BRIP1), FANCL, FANCM and FANCN (also called PALB2). The previously defined group FANCH is the same as FANCA. Fanconi anemia is a genetically heterogeneous recessive disorder characterized by cytogenetic instability, hypersensitivity to DNA crosslinking agents, increased chromosomal breakage, and defective DNA repair. The members of the Fanconi anemia complementation group do not share sequence similarity; they are related by their assembly into a common nuclear protein complex. This gene encodes the protein for complementation group A. Alternative splicing results in multiple transcript variants encoding different isoforms. Mutations in this gene are the most common cause of Fanconi anemia. [provided by RefSeq, Jul 2008]. Transcript Variant: This variant (1) represents the longer transcript and encodes the longer isoform (a).
FASN Exonic 2194 fatty acid synthase The enzyme encoded by this gene is a multifunctional protein. Its main function is to catalyze the synthesis of palmitate from acetyl-CoA and malonyl-CoA, in the presence of NADPH, into long-chain saturated fatty acids. In some cancer cell lines, this protein has been found to be fused with estrogen receptor-alpha (ER-alpha), in which the N-terminus of FAS is fused in-frame with the C-terminus of ER-alpha. [provided by RefSeq, Jul 2008].
FBXO18 Exonic 84893 F-box only protein 18 isoform 1 This gene encodes a member of the F-box protein family, members of which are characterized by an approximately 40 amino acid motif, the F-box. The F-box proteins constitute one of the four subunits of ubiquitin protein ligase complex called SCFs (SKP1-cullin-F-box), which function in phosphorylation-dependent ubiquitination. The F-box proteins are divided into three classes: Fbws containing WD-40 domains, Fbls containing leucine-rich repeats, and Fbxs containing either different protein-protein interaction modules or no recognizable motifs. The protein encoded by this gene belongs to the Fbx class. It contains an F-box motif and seven conserved helicase motifs, and has both DNA-dependent ATPase and DNA unwinding activities. Alternatively spliced transcript variants encoding distinct isoforms have been identified for this gene. [provided by RefSeq, Jul 2008]. Transcript Variant: This variant (1) encodes the longer isoform (1).
FER1L4 Exonic 80307 N/A N/A
FHIT Exonic 2272 bis(5'-adenosyl)-triphosphatase This gene, a member of the histidine triad gene family, encodes a diadenosine 5',5'''-P1,P3-triphosphate hydrolase involved in purine metabolism. The gene encompasses the common fragile site FRA3B on chromosome 3, where carcinogen-induced damage can lead to translocations and aberrant transcripts of this gene. In fact, aberrant transcripts from this gene have been found in about half of all esophageal, stomach, and colon carcinomas. Alternatively spliced transcript variants have been found for this gene. [provided by RefSeq, Oct 2009]. Transcript Variant: This variant (2) has an alternate splice site in the 3' UTR, as compared to variant 1. Both variants 1 and 2 encode the same protein.
FNTA Exonic 2339 protein farnesyltransferase/geranylgeranyltransferase type-1 subunit alpha Prenyltransferases can attach either a farnesyl group or a geranylgeranyl group in thioether linkage to the cysteine residue of proteins with a C-terminal CAAX box. CAAX geranylgeranyltransferase and CAAX farnesyltransferase are heterodimers that share the same alpha subunit but have different beta subunits. This gene encodes the alpha subunit of these transferases. Alternative splicing results in multiple transcript variants. Related pseudogenes have been identified on chromosomes 11 and 13. [provided by RefSeq, May 2010]. Transcript Variant: This variant (1) represents the longer transcript and encodes the functional protein.
FRG1 Exonic 2483 protein FRG1 This gene maps to a location 100 kb centromeric of the repeat units on chromosome 4q35 which are deleted in facioscapulohumeral muscular dystrophy (FSHD). It is evolutionarily conserved and has related sequences on multiple human chromosomes but DNA sequence analysis did not reveal any homology to known genes. In vivo studies demonstrate the encoded protein is localized to the nucleolus. [provided by RefSeq, Jul 2008].
FSCN2 Exonic 25794 fascin-2 isoform 2 This gene encodes a member of the fascin protein family. Fascins crosslink actin into filamentous bundles within dynamic cell extensions. This family member is proposed to play a role in photoreceptor disk morphogenesis. A mutation in this gene results in one form of autosomal dominant retinitis pigmentosa and macular degeneration. Multiple transcript variants encoding different isoforms have been found for this gene. [provided by RefSeq, Jul 2008]. Transcript Variant: This variant (2) represents the longer transcript and encodes the longer isoform (2). Sequence Note: This RefSeq record was created from transcript and genomic sequence data to make the sequence consistent with the reference genome assembly. The genomic coordinates used for the transcript record were based on transcript alignments.
FUT2 Exonic 2524 galactoside 2-alpha-L-fucosyltransferase 2 The protein encoded by this gene is a Golgi stack membrane protein that is involved in the creation of a precursor of the H antigen, which is required for the final step in the soluble A and B antigen synthesis pathway. This gene is one of two encoding the galactoside 2-L-fucosyltransferase enzyme. Two transcript variants encoding the same protein have been found for this gene. [provided by RefSeq, Jul 2008]. Transcript Variant: This variant (2) differs in the 5' UTR compared to variant 1. Variants 1 and 2 both encode the same protein. Sequence Note: This RefSeq record was created from transcript and genomic sequence data because no single transcript was available for the full length of the gene. The extent of this transcript is supported by transcript alignments. Sequence Note: This RefSeq record represents the SE*01.01.01 allele.
GATA6 Exonic 2627 transcription factor N/A

GIT2 Exonic 9815 ARF GTPase-activating protein GIT2 isoform 6 This gene encodes a member of the GIT protein family, which interact with G protein-coupled receptor kinases and possess ADP-ribosylation factor (ARF) GTPase-activating protein (GAP) activity. GIT proteins traffic between cytoplasmic complexes, focal adhesions, and the cell periphery, and interact with Pak interacting exchange factor beta (PIX) to form large oligomeric complexes that transiently recruit other proteins. GIT proteins regulate cytoskeletal dynamics and participate in receptor internalization and membrane trafficking. This gene has been shown to repress lamellipodial extension and focal adhesion turnover, and is thought to regulate cell motility. This gene undergoes extensive alternative splicing to generate multiple isoforms, but the full-length nature of some of these variants has not been determined. The various isoforms have functional differences, with respect to ARF GAP activity and to G protein-coupled receptor kinase 2 binding. [provided by RefSeq, Sep 2008]. Transcript Variant: This variant (6) lacks two in-frame exons in the 3' coding region and includes an additional short in-frame exon in the central coding region, compared to isoform 1. The resulting isoform (6) is missing two internal fragments and includes a 2 residue insertion, compared to isoform 1.
GLDC Exonic 2731 glycine dehydrogenase [decarboxylating], mitochondrial precursor Degradation of glycine is brought about by the glycine cleavage system, which is composed of four mitochondrial protein components: P protein (a pyridoxal phosphate-dependent glycine decarboxylase), H protein (a lipoic acid-containing protein), T protein (a tetrahydrofolate-requiring enzyme), and L protein (a lipoamide dehydrogenase). The protein encoded by this gene is the P protein, which binds to glycine and enables the methylamine group from glycine to be transferred to the T protein. Defects in this gene are a cause of nonketotic hyperglycinemia (NKH). [provided by RefSeq, Jan 2010].
GLRX Exonic 2745 glutaredoxin-1 This gene encodes a member of the glutaredoxin family. The encoded protein is a cytoplasmic enzyme catalyzing the reversible reduction of glutathione-protein mixed disulfides. This enzyme highly contributes to the antioxidant defense system. It is crucial for several signalling pathways by controlling the S-glutathionylation status of signalling mediators. It is involved in beta-amyloid toxicity and Alzheimer's disease. Multiple alternatively spliced transcript variants encoding the same protein have been identified. [provided by RefSeq, Aug 2011]. Transcript Variant: This variant (3) differs in the 3' UTR, compared to variant 1. Variants 1-4 encode the same protein. Sequence Note: This RefSeq record was created from transcript and genomic sequence data to make the sequence consistent with the reference genome assembly. The genomic coordinates used for the transcript record were based on transcript alignments.
GNE Exonic 10020 bifunctional UDP-N-acetylglucosamine 2-epimerase/N-acetylmannosamine kinase isoform 4 The protein encoded by this gene is a bifunctional enzyme that initiates and regulates the biosynthesis of N-acetylneuraminic acid (NeuAc), a precursor of sialic acids. It is a rate-limiting enzyme in the sialic acid biosynthetic pathway. Sialic acid modification of cell surface molecules is crucial for their function in many biologic processes, including cell adhesion and signal transduction. Differential sialylation of cell surface molecules is also implicated in the tumorigenicity and metastatic behavior of malignant cells. Mutations in this gene are associated with sialuria, autosomal recessive inclusion body myopathy, and Nonaka myopathy. Alternative splicing of this gene results in transcript variants encoding different isoforms. [provided by RefSeq, Jul 2008]. Transcript Variant: This variant (4) contains a different 5' terminal exon and lacks a 3' coding region segment, compared to transcript variant 1, which results in translation initiation from an in-frame downstream AUG. The predicted protein (isoform 4) is shorter when it is compared to isoform 1. Sequence Note: This RefSeq record was created from transcript and genomic sequence data to make the sequence consistent with the reference genome assembly. The genomic coordinates used for the transcript record were based on transcript alignments.
GNRHR2 Exonic 114814 N/A In non-hominoid primates and non-mammalian vertebrates, the gonadotropin releasing hormone 2 receptor (GnRHR2) encodes a seven-transmembrane G-protein coupled receptor. However, in human, the N-terminus of the predicted protein contains a frameshift and premature stop codon. In human, GnRHR2 transcription occurs but the gene does not likely produce a functional C-terminal multi-transmembrane protein. A non-transcribed pseudogene of GnRHR2 is located on chromosome 14. [provided by RefSeq, Feb 2011]. Publication Note: This RefSeq record includes a subset of the publications that are available for this gene. Please see the Gene record to access additional publications.
GPR98 Exonic 84059 G-protein coupled receptor 98 precursor This gene encodes a member of the G-protein coupled receptor superfamily. The encoded protein contains a 7-transmembrane receptor domain, binds calcium and is expressed in the central nervous system. Mutations in this gene are associated with Usher syndrome 2 and familial febrile seizures. Several alternatively spliced transcripts have been described. [provided by RefSeq, Jul 2008]. Transcript Variant: This variant (1), also known as VLGR1b, encodes the predominant isoform (1). Publication Note: This RefSeq record includes a subset of the publications that are available for this gene. Please see the Gene record to access additional publications.
GPS1 Exonic 2873 COP9 signalosome complex subunit 1 isoform 2 This gene is known to suppress G-protein and mitogen-activated signal transduction in mammalian cells. The encoded protein shares significant similarity with Arabidopsis FUS6, which is a regulator of light-mediated signal transduction in plant cells. Two alternatively spliced transcript variants encoding different isoforms have been found for this gene. [provided by RefSeq, Jul 2008]. Transcript Variant: This variant (2) lacks an internal segment in the 5' region and uses an upstream translation start codon, as compared to variant 1. It encodes isoform 2 which has a shorter and distinct N-terminus, as compared to isoform 1.
GRAMD4 Exonic 23151 GRAM domain-containing protein 4 GRAMD4 is a mitochondrial effector of E2F1 (MIM 189971)-induced apoptosis (Stanelle et al., 2005 [PubMed 15565177]). [supplied by OMIM, Jan 2011]. Sequence Note: This RefSeq record was created from transcript and genomic sequence data to make the sequence consistent with the reference genome assembly. The genomic coordinates used for the transcript record were based on transcript alignments.
GRIN2D Exonic 2906 glutamate [NMDA] receptor subunit epsilon-4 precursor N-methyl-D-aspartate (NMDA) receptors are a class of ionotropic glutamate receptors. NMDA channel has been shown to be involved in long-term potentiation, an activity-dependent increase in the efficiency of synaptic transmission thought to underlie certain kinds of memory and learning. NMDA receptor channels are heteromers composed of the key receptor subunit NMDAR1 (GRIN1) and 1 or more of the 4 NMDAR2 subunits: NMDAR2A (GRIN2A), NMDAR2B (GRIN2B), NMDAR2C (GRIN2C), and NMDAR2D (GRIN2D). [provided by RefSeq, Mar 2010].
GRIPAP1 Exonic 56850 GRIP1-associated protein 1 isoform 2 This gene encodes a guanine nucleotide exchange factor for the Ras family of small G proteins (RasGEF). In brain studies, the encoded protein was found with the GRIP/AMPA receptor complex. Multiple alternatively spliced transcript variants have been described that encode different protein isoforms; however, the full-length nature and biological validity of all of these variants have not been determined. [provided by RefSeq, Nov 2009]. Transcript Variant: This variant (2) lacks an alternate in-frame coding region segment and uses a different splice site in the 3' coding region, compared to variant 1. The reading frame is changed, such that the resulting protein (isoform 2) has a shorter and distinct C-terminus when compared to isoform 1.
GTPBP10 Exonic 85865 GTP-binding protein 10 isoform 1 Small G proteins, such as GTPBP10, act as molecular switches that play crucial roles in the regulation of fundamental cellular processes such as protein synthesis, nuclear transport, membrane trafficking, and signal transduction (Hirano et al., 2006 [PubMed 17054726]). [supplied by OMIM, Mar 2008]. Transcript Variant: This variant (1) lacks alternate in-frame exons in the 5' coding region, compared to variant 2. The resulting protein (isoform 1) is shorter when it is compared to isoform 2. Sequence Note: This RefSeq record was created from transcript and genomic sequence data to make the sequence consistent with the reference genome assembly. The genomic coordinates used for the transcript record were based on transcript alignments.
GYG2 Exonic 8908 glycogenin-2 isoform a This gene encodes a member of the glycogenin family. Glycogenin is a self-glucosylating protein involved in the initiation reactions of glycogen biosynthesis. A gene on chromosome 3 encodes the muscle glycogenin and this X-linked gene encodes the glycogenin mainly present in liver; both are involved in blood glucose homeostasis. This gene has a short version on chromosome Y, which is 3' truncated and can not make a functional protein. Multiple alternatively spliced transcript variants encoding different isoforms have been identified. [provided by RefSeq, May 2010]. Transcript Variant: This variant (1) lacks an in-frame exon in the CDS, as compared to variant 2. The resulting isoform (a) lacks an internal segment, as compared to isoform b.
H2AFB1 Exonic 474382 histone H2A-Bbd type 1 Histones are basic nuclear proteins that are responsible for the nucleosome structure of the chromosomal fiber in eukaryotes. Nucleosomes consist of approximately 146 bp of DNA wrapped around a histone octamer composed of pairs of each of the four core histones (H2A, H2B, H3, and H4). The chromatin fiber is further compacted through the interaction of a linker histone, H1, with the DNA between the nucleosomes to form higher order chromatin structures.
This gene encodes a member of the histone H2A family. This gene is part of a region that is repeated three times on chromosome X, once in intron 22 of the gene and twice closer to the Xq telomere. This record represents the most centromeric copy which is in intron 22 of the F8 gene. [provided by RefSeq, Jul 2008].
H2AFB2 Exonic 474381 histone H2A-Bbd type 2/3 Histones are basic nuclear proteins that are responsible for the nucleosome structure of the chromosomal fiber in eukaryotes. Nucleosomes consist of approximately 146 bp of DNA wrapped around a histone octamer composed of pairs of each of the four core histones (H2A, H2B, H3, and H4). The chromatin fiber is further compacted through the interaction of a linker histone, H1, with the DNA between the nucleosomes to form higher order chromatin structures.
This gene encodes a member of the histone H2A family. This gene is part of a region that is repeated three times on chromosome X, once in intron 22 of the gene and twice closer to the Xq telomere. This record represents the middle copy. [provided by RefSeq, Jul 2008]. Sequence Note: The RefSeq transcript and protein were derived from genomic sequence to make the sequence consistent with the reference genome assembly. The genomic coordinates used for the transcript record were based on alignments.
H2AFB3 Exonic 83740 histone H2A-Bbd type 2/3 Histones are basic nuclear proteins that are responsible for the nucleosome structure of the chromosomal fiber in eukaryotes. Nucleosomes consist of approximately 146 bp of DNA wrapped around a histone octamer composed of pairs of each of the four core histones (H2A, H2B, H3, and H4). The chromatin fiber is further compacted through the interaction of a linker histone, H1, with the DNA between the nucleosomes to form higher order chromatin structures.
This gene encodes a member of the histone H2A family. This gene is part of a region that is repeated three times on chromosome X, once in intron 22 of the gene and twice closer to the Xq telomere. This record represents the most telomeric copy. [provided by RefSeq, Jul 2008].
HACE1 Exonic 57531 E3 ubiquitin-protein ligase N/A
HCG9 Exonic 10255 N/A This gene lies within the MHC class I region on chromosome 6p21.3. This gene is believed to be non-coding, but its function has not been determined. [provided by RefSeq, Jul 2009].
HEATR4 Exonic 399671 HEAT repeat-containing protein N/A
HECTD1 Exonic 25831 E3 ubiquitin-protein ligase N/A
HFE2 Exonic 148738 hemojuvelin isoform c The product of this gene is involved in iron metabolism. It may be a component of the signaling pathway which activates hepcidin or it may act as a modulator of hepcidin expression. It could also represent the cellular receptor for hepcidin. Alternatively spliced transcript variants encoding different isoforms have been identified for this gene. Defects in this gene are the cause of hemochromatosis type 2A, also called juvenile hemochromatosis (JH). JH is an early-onset autosomal recessive disorder due to severe iron overload resulting in hypogonadotrophic hypogonadism, hepatic fibrosis or cirrhosis and cardiomyopathy, occurring typically before age of 30. [provided by RefSeq, Jul 2008]. Transcript Variant: This variant (c) lacks two segments in the 5' UTR and an in-frame portion of the 5' coding region, compared to variant a. The resulting isoform (c) has a shorter N-terminus when compared to isoform a. Variants c and d encode the same isoform (c).
HFM1 Exonic 164045 probable ATP-dependent DNA helicase HFM1 N/A
HGS Exonic 9146 hepatocyte growth factor-regulated tyrosine kinase substrate The protein encoded by this gene regulates endosomal sorting and plays a critical role in the recycling and degradation of membrane receptors. The encoded protein sorts monoubiquitinated membrane proteins into the multivesicular body, targeting these proteins for lysosome-dependent degradation. [provided by RefSeq, Dec 2010].
HGSNAT Exonic 138050 heparan-alpha-glucosaminide N-acetyltransferase precursor This gene encodes a lysosomal acetyltransferase, which is one of several enzymes involved in the lysosomal degradation of heparin sulfate. Mutations in this gene are associated with Sanfilippo syndrome C, one type of the lysosomal storage disease mucopolysaccharidosis III, which results from impaired degradation of heparan sulfate. [provided by RefSeq, Jan 2009].
HOMEZ Exonic 57594 homeobox and leucine zipper protein Homez N/A
IFNA1 Exonic 3439 interferon alpha-1/13 precursor The protein encoded by this gene is produced by macrophages and has antiviral activity. This gene is intronless and the encoded protein is secreted. [provided by RefSeq, Sep 2011].
IFNA22P Exonic 3453 N/A N/A
IL1RAPL1 Exonic 11141 interleukin-1 receptor accessory protein-like 1 precursor The protein encoded by this gene is a member of the interleukin 1 receptor family and is similar to the interleukin 1 accessory proteins. It is most closely related to interleukin 1 receptor accessory protein-like 2 (IL1RAPL2). This gene and IL1RAPL2 are located at a region on chromosome X that is associated with X-linked non-syndromic mental retardation. Deletions and mutations in this gene were found in patients with mental retardation. This gene is expressed at a high level in post-natal brain structures involved in the hippocampal memory system, which suggests a specialized role in the physiological processes underlying memory and learning abilities. [provided by RefSeq, Jul 2008].
IL32 Exonic 9235 interleukin-32 isoform D This gene encodes a member of the cytokine family. The protein contains a tyrosine sulfation site, 3 potential N-myristoylation sites, multiple putative phosphorylation sites, and an RGD cell-attachment sequence. Expression of this protein is increased after the activation of T-cells by mitogens or the activation of NK cells by IL-2. This protein induces the production of TNFalpha from macrophage cells. Alternate transcriptional splice variants, encoding different isoforms, have been characterized. [provided by RefSeq, Jul 2008]. Transcript Variant: This variant (7) lacks two alternate exons in the 5' UTR and an alternate in-frame exon within the coding region, compared to variant 1, resulting in a shorter protein (isoform D).
IMP3 Exonic 55272 U3 small nucleolar ribonucleoprotein protein IMP3 This gene encodes the human homolog of the yeast Imp3 protein. The protein localizes to the nucleoli and interacts with the U3 snoRNP complex. The protein contains an S4 domain. [provided by RefSeq, Jul 2008].
INO80D Exonic 54891 INO80 complex subunit D N/A
INTS2 Exonic 57508 integrator complex subunit 2 INTS2 is a subunit of the Integrator complex, which associates with the C-terminal domain of RNA polymerase II large subunit (POLR2A; MIM 180660) and mediates 3-prime end processing of small nuclear RNAs U1 (RNU1; MIM 180680) and U2 (RNU2; MIM 180690) (Baillat et al., 2005 [PubMed 16239144]). [supplied by OMIM, Mar 2008]. Transcript Variant: This variant (1) is the protein-coding variant. Sequence Note: This RefSeq record was created from transcript and genomic sequence data because no single transcript was available for the full length of the gene. The extent of this transcript is supported by transcript alignments.
IRAK2 Exonic 3656 interleukin-1 receptor-associated kinase-like 2 IRAK2 encodes the interleukin-1 receptor-associated kinase 2, one of two putative serine/threonine kinases that become associated with the interleukin-1 receptor (IL1R) upon stimulation. IRAK2 is reported to participate in the IL1-induced upregulation of NF-kappaB. [provided by RefSeq, Jul 2008].
ITGA10 Exonic 8515 integrin alpha-10 precursor Integrins are integral membrane proteins composed of an alpha chain and a beta chain, and are known to participate in cell adhesion as well as cell-surface mediated signalling. The I-domain containing alpha 10 combines with the integrin beta 1 chain (ITGB1) to form a novel collagen type II-binding integrin expressed in cartilage tissue. [provided by RefSeq, Jul 2008].
KAL1 Exonic 3730 anosmin-1 precursor Mutations in this gene cause the X-linked Kallmann syndrome. The encoded protein is similar in sequence to proteins known to function in neural cell adhesion and axonal migration. In addition, this cell surface protein is N-glycosylated and may have anti-protease activity. [provided by RefSeq, Jul 2008].
KCND1 Exonic 3750 potassium voltage-gated channel subfamily D member 1 precursor Voltage-gated potassium (Kv) channels represent the most complex class of voltage-gated ion channels from both functional and structural standpoints. Their diverse functions include regulating neurotransmitter release, heart rate, insulin secretion, neuronal excitability, epithelial electrolyte transport, smooth muscle contraction, and cell volume. Four sequence-related potassium channel genes - shaker, shaw, shab, and shal - have been identified in Drosophila, and each has been shown to have human homolog(s). This gene encodes a member of the potassium channel, voltage-gated, shal-related subfamily, members of which form voltage-activated A-type potassium ion channels and are prominent in the repolarization phase of the action potential. This gene is expressed at moderate levels in all tissues analyzed, with lower levels in skeletal muscle. [provided by RefSeq, Jul 2008]. Sequence Note: The RefSeq transcript and protein were derived from genomic sequence to make the sequence consistent with the reference genome assembly. The genomic coordinates used for the transcript record were based on alignments.
KIAA0562 Exonic N/A N/A N/A
KIAA1267 Exonic 284058 MLL1/MLL complex subunit KIAA1267 isoform N/A
KIAA1432 Exonic 57589 protein RIC1 homolog isoform b N/A
KIF12 Exonic 113220 kinesin-like protein KIF12 KIF12 is a member of the kinesin superfamily of microtubule-associated molecular motors (see MIM 148760) that play important roles in intracellular transport and cell division (Nakagawa et al., 1997 [PubMed 9275178]). [supplied by OMIM, Mar 2008].

KIF26B Exonic 55083 kinesin-like protein N/A

KIF7 Exonic 374654 kinesin-like protein KIF7 This gene encodes a cilia-associated protein belonging to the kinesin family. This protein plays a role in the sonic hedgehog (SHH) signaling pathway through the regulation of GLI transcription factors. It functions as a negative regulator of the SHH pathway by preventing inappropriate activation of GLI2 in the absence of ligand, and as a positive regulator by preventing the processing of GLI3 into its repressor form. Mutations in this gene have been associated with various ciliopathies. [provided by RefSeq, Oct 2011].
KIRREL3 Exonic 84623 kin of IRRE-like protein 3 isoform 2 precursor The protein encoded by this gene is a member of the nephrin-like protein family. These proteins are expressed in fetal and adult brain, and also in podocytes of kidney glomeruli. The cytoplasmic domains of these proteins interact with the C-terminus of podocin, also expressed in the podocytes, cells involved in ensuring size- and charge-selective ultrafiltration. Mutations in this gene are associated with mental retardation autosomal dominant type 4 (MRD4). Alternatively spliced transcript variants encoding different isoforms have been found for this gene. [provided by RefSeq, Sep 2009]. Transcript Variant: This variant (2) includes an alternate segment at the 3' end compared to variant 1. This results in a frame-shift, and a shorter isoform (2) with a distinct C-terminus compared to isoform 1.
KLHDC4 Exonic 54758 kelch domain-containing protein 4 isoform 2 N/A
KLHL9 Exonic 55958 kelch-like protein 9 N/A
KRT6C Exonic 286887 keratin, type II cytoskeletal 6C Keratins are intermediate filament proteins responsible for the structural integrity of epithelial cells and are subdivided into epithelial keratins and hair keratins. The type II keratins are clustered in a region of chromosome 12q13. [provided by RefSeq, Jul 2009].
LAMC3 Exonic 10319 laminin subunit gamma-3 precursor Laminins, a family of extracellular matrix glycoproteins, are the major noncollagenous constituent of basement membranes. They have been implicated in a wide variety of biological processes including cell adhesion, differentiation, migration, signaling, neurite outgrowth and metastasis. Laminins are composed of 3 non identical chains: laminin alpha, beta and gamma (formerly A, B1, and B2, respectively) and they form a cruciform structure consisting of 3 short arms, each formed by a different chain, and a long arm composed of all 3 chains. Each laminin chain is a multidomain protein encoded by a distinct gene. Several isoforms of each chain have been described. Different alpha, beta and gamma chain isomers combine to give rise to different heterotrimeric laminin isoforms which are designated by Arabic numerals in the order of their discovery, i.e. alpha1beta1gamma1 heterotrimer is laminin 1. The biological functions of the different chains and trimer molecules are largely unknown, but some of the chains have been shown to differ with respect to their tissue distribution, presumably reflecting diverse functions in vivo. This gene encodes the gamma chain isoform laminin, gamma 3. The gamma 3 chain is most similar to the gamma 1 chain, and contains all the 6 domains expected of the gamma chain. It is a component of laminin 12. The gamma 3 chain is broadly expressed in skin, heart, lung, and the reproductive tracts. In skin, it is seen within the basement membrane of the dermal-epidermal junction at points of nerve penetration. Gamma 3 is also a prominent element of the apical surface of ciliated epithelial cells of lung, oviduct, epididymis, ductus deferens, and seminiferous tubules. The distribution of gamma 3-containing laminins along ciliated epithelial surfaces suggests that the apical laminins are important in the morphogenesis and structural stability of the ciliated processes of these cells. [provided by RefSeq, Aug 2011].

LBH Exonic 81606 protein LBH N/A
LCE1C Exonic 353133 late cornified envelope protein N/A
LEP Exonic 3952 leptin precursor This gene encodes a protein that is secreted by white adipocytes, and which plays a major role in the regulation of body weight. This protein, which acts through the leptin receptor, functions as part of a signaling pathway that can inhibit food intake and/or regulate energy expenditure to maintain constancy of the adipose mass. This protein also has several endocrine functions, and is involved in the regulation of immune and inflammatory responses, hematopoiesis, angiogenesis and wound healing. Mutations in this gene and/or its regulatory regions cause severe obesity, and morbid obesity with hypogonadism. This gene has also been linked to type 2 diabetes mellitus development. [provided by RefSeq, Jul 2008]. Sequence Note: This RefSeq record was created from transcript and genomic sequence data to make the sequence consistent with the reference genome assembly. The genomic coordinates used for the transcript record were based on transcript alignments.
LEPR Exonic 3953 leptin receptor isoform 3 precursor The protein encoded by this gene belongs to the gp130 family of cytokine receptors that are known to stimulate gene transcription via activation of cytosolic STAT proteins. This protein is a receptor for leptin (an adipocyte-specific hormone that regulates body weight), and is involved in the regulation of fat metabolism, as well as in a novel hematopoietic pathway that is required for normal lymphopoiesis. Mutations in this gene have been associated with obesity and pituitary dysfunction. Alternatively spliced transcript variants encoding different isoforms have been described for this gene. It is noteworthy that this gene and LEPROT gene (GeneID:54741) share the same promoter and the first 2 exons, however, encode distinct proteins (PMID:9207021). [provided by RefSeq, Nov 2010]. Transcript Variant: This variant (6) contains alternate 5' UTR and 3' terminal exon compared to variant 1, resulting in a shorter isoform (3) with a distinct C-terminus compared to isoform 1. Variants 3 and 6 encode the same isoform.
LIPT1 Exonic 51601 lipoyltransferase 1, mitochondrial precursor The process of transferring lipoic acid to proteins is a two-step process. The first step is the activation of lipoic acid by lipoate-activating enzyme to form lipoyl-AMP. For the second step, the protein encoded by this gene transfers the lipoyl moiety to apoproteins. Alternative splicing results in multiple transcript variants. A related pseudogene has been identified on chromosome 13. Read-through transcription also exists between this gene and the neighboring downstream mitochondrial ribosomal protein L30 (MRPL30) gene. [provided by RefSeq, Mar 2011]. Transcript Variant: This variant (1) encodes the same protein as variants 3-6.
LIX1L Exonic 128077 LIX1-like protein N/A
LOC10013330 Exonic 100133 N/A N/A
LOC100289187 Exonic 100289187 transmembrane protein 225-like N/A
LOC100289656 Exonic 100289656 N/A N/A
LOC148696 Exonic 148696 N/A N/A
LOC158696 Exonic 158696 N/A N/A
LOC255025 Exonic 255025 N/A N/A
LOC342346 Exonic N/A N/A N/A
LOC349408 Exonic N/A N/A N/A
LOC388387 Exonic 388387 N/A N/A
LOC400456 Exonic 400456 N/A N/A
LOC401109 Exonic 401109 N/A N/A
LOC646278 Exonic 646278 N/A N/A
LOC729678 Exonic 729678 N/A N/A
LOC91316 Exonic N/A N/A N/A
LOC92659 Exonic 92659 N/A N/A
LRRC33 Exonic 375387 leucine-rich repeat-containing protein 33 precursor N/A
LRRC45 Exonic 201255 leucine-rich repeat-containing protein N/A
LYSMD3 Exonic 116068 lysM and putative peptidoglycan-binding domain-containing protein N/A
MAFG Exonic 4097 transcription factor MafG Globin gene expression is regulated through nuclear factor erythroid-2 (NFE2) elements located in enhancer-like locus control regions positioned many kb upstream of alpha- and beta-gene clusters (summarized by Blank et al., 1997 [PubMed 9166829]). NFE2 DNA-binding activity consists of a heterodimer containing a ubiquitous small Maf protein (MafF, MIM 604877; MafG; or MafK, MIM 600197) and the tissue-restricted protein p45 NFE2 (MIM 601490). Both subunits are members of the activator protein-1-like superfamily of basic leucine zipper (bZIP) proteins (see MIM 165160). [supplied by OMIM, Mar 2010]. Transcript Variant: This variant (2) differs in the 5' UTR compared to variant 1. Both variants 1 and 2 encode the same protein. Sequence Note: This RefSeq record was created from transcript and genomic sequence data to make the sequence consistent with the reference genome assembly. The genomic coordinates used for the transcript record were based on transcript alignments. Publication Note: This RefSeq record includes a subset of the publications that are available for this gene. Please see the Gene record to access additional publications.
MAN2C1 Exonic 4123 alpha-mannosidase N/A

MAOA Intronic 4128 amine oxidase [flavin-containing] A This gene encodes monoamine oxidase A, an enzyme that degrades amine neurotransmitters, such as dopamine, norepinephrine, and serotonin. The protein localizes to the mitochondrial outer membrane. The gene is adjacent to a related gene on the opposite strand of chromosome X. Mutation in this gene results in monoamine oxidase deficiency, or Brunner syndrome. [provided by RefSeq, Jul 2008].
MAP3K9 Exonic 4293 mitogen-activated protein kinase kinase kinase 9 N/A
MAPKAPK5 Exonic 8550 MAP kinase-activated protein kinase 5 isoform 2 The protein encoded by this gene is a member of the serine/threonine kinase family. In response to cellular stress and proinflammatory cytokines, this kinase is activated through its phosphorylation by MAP kinases including MAPK1/ERK, MAPK14/p38-alpha, and MAPK11/p38-beta. In vitro, this kinase phosphorylates heat shock protein HSP27 at its physiologically relevant sites. Two alternately spliced transcript variants of this gene encoding distinct isoforms have been reported. [provided by RefSeq, Jul 2008]. Transcript Variant: This variant (2) contains an extra 6 nt segment in the coding region when compared to variant 1. It encodes an isoform (2) longer by 2 aa, as compared to isoform 1.
MAS1 Exonic 4142 proto-oncogene Mas The structure of the MAS1 product indicates that it belongs to the class of receptors that are coupled to GTP-binding proteins and share a conserved structural motif, which is described as a '7-transmembrane segment' following the prediction that these hydrophobic segments form membrane-spanning alpha-helices. The MAS1 protein may be a receptor that, when activated, modulates a critical component in a growth-regulating pathway to bring about oncogenic effects. [provided by RefSeq, Jul 2008].
MBLAC2 Exonic 153364 metallo-beta-lactamase domain-containing protein N/A
MGAM Exonic 8972 maltase-glucoamylase, intestinal This gene encodes maltase-glucoamylase, which is a brush border membrane enzyme that plays a role in the final steps of digestion of starch. The protein has two catalytic sites identical to those of sucrase-isomaltase, but the proteins are only 59% homologous. Both are members of glycosyl hydrolase family 31, which has a variety of substrate specificities. [provided by RefSeq, Jul 2008].
MICAL3 Exonic 57553 protein MICAL-3 isoform 3 N/A
MIR1184-1 Exonic 100302111 N/A microRNAs (miRNAs) are short (20-24 nt) non-coding RNAs that are involved in post-transcriptional regulation of gene expression in multicellular organisms by affecting both the stability and translation of mRNAs. miRNAs are transcribed by RNA polymerase II as part of capped and polyadenylated primary transcripts (pri-miRNAs) that can be either protein-coding or non-coding. The primary transcript is cleaved by the Drosha ribonuclease III enzyme to produce an approximately 70-nt stem-loop precursor miRNA (pre-miRNA), which is further cleaved by the cytoplasmic Dicer ribonuclease to generate the mature miRNA and antisense miRNA star (miRNA*) products. The mature miRNA is incorporated into a RNA-induced silencing complex (RISC), which recognizes target mRNAs through imperfect base pairing with the miRNA and most commonly results in translational inhibition or destabilization of the target mRNA. The RefSeq represents the predicted microRNA stem-loop. [provided by RefSeq, Sep 2009]. Sequence Note: This record represents a predicted microRNA stem-loop as defined by miRBase. Some sequence at the 5' and 3' ends may not be included in the intermediate precursor miRNA produced by Drosha cleavage.
MIR1184-2 Exonic 100422985 N/A microRNAs (miRNAs) are short (20-24 nt) non-coding RNAs that are involved in post-transcriptional regulation of gene expression in multicellular organisms by affecting both the stability and translation of mRNAs. miRNAs are transcribed by RNA polymerase II as part of capped and polyadenylated primary transcripts (pri-miRNAs) that can be either protein-coding or non-coding. The primary transcript is cleaved by the Drosha ribonuclease III enzyme to produce an approximately 70-nt stem-loop precursor miRNA (pre-miRNA), which is further cleaved by the cytoplasmic Dicer ribonuclease to generate the mature miRNA and antisense miRNA star (miRNA*) products. The mature miRNA is incorporated into a RNA-induced silencing complex (RISC), which recognizes target mRNAs through imperfect base pairing with the miRNA and most commonly results in translational inhibition or destabilization of the target mRNA. The RefSeq represents the predicted microRNA stem-loop. [provided by RefSeq, Sep 2009]. Sequence Note: This record represents a predicted microRNA stem-loop as defined by miRBase. Some sequence at the 5' and 3' ends may not be included in the intermediate precursor miRNA produced by Drosha cleavage.

MIR1184-3 Exonic 100422977 N/A microRNAs (miRNAs) are short (20-24 nt) non-coding RNAs that are involved in post-transcriptional regulation of gene expression in multicellular organisms by affecting both the stability and translation of mRNAs. miRNAs are transcribed by RNA polymerase II as part of capped and polyadenylated primary transcripts (pri-miRNAs) that can be either protein-coding or non-coding. The primary transcript is cleaved by the Drosha ribonuclease III enzyme to produce an approximately 70-nt stem-loop precursor miRNA (pre-miRNA), which is further cleaved by the cytoplasmic Dicer ribonuclease to generate the mature miRNA and antisense miRNA star (miRNA*) products. The mature miRNA is incorporated into a RNA-induced silencing complex (RISC), which recognizes target mRNAs through imperfect base pairing with the miRNA and most commonly results in translational inhibition or destabilization of the target mRNA. The RefSeq represents the predicted microRNA stem-loop. [provided by RefSeq, Sep 2009]. Sequence Note: This record represents a predicted microRNA stem-loop as defined by miRBase. Some sequence at the 5' and 3' ends may not be included in the intermediate precursor miRNA produced by Drosha cleavage.
MIR125A Exonic 406910 N/A microRNAs (miRNAs) are short (20-24 nt) non-coding RNAs that are involved in post-transcriptional regulation of gene expression in multicellular organisms by affecting both the stability and translation of mRNAs. miRNAs are transcribed by RNA polymerase II as part of capped and polyadenylated primary transcripts (pri-miRNAs) that can be either protein-coding or non-coding. The primary transcript is cleaved by the Drosha ribonuclease III enzyme to produce an approximately 70-nt stem-loop precursor miRNA (pre-miRNA), which is further cleaved by the cytoplasmic Dicer ribonuclease to generate the mature miRNA and antisense miRNA star (miRNA*) products. The mature miRNA is incorporated into a RNA-induced silencing complex (RISC), which recognizes target mRNAs through imperfect base pairing with the miRNA and most commonly results in translational inhibition or destabilization of the target mRNA. The RefSeq represents the predicted microRNA stem-loop. [provided by RefSeq, Sep 2009]. Sequence Note: This record represents a predicted microRNA stem-loop as defined by miRBase. Some sequence at the 5' and 3' ends may not be included in the intermediate precursor miRNA produced by Drosha cleavage. Publication Note: This RefSeq record includes a subset of the publications that are available for this gene. Please see the Gene record to access additional publications.
MIR1302-1 Exonic 100302227 N/A microRNAs (miRNAs) are short (20-24 nt) non-coding RNAs that are involved in post-transcriptional regulation of gene expression in multicellular organisms by affecting both the stability and translation of mRNAs. miRNAs are transcribed by RNA polymerase II as part of capped and polyadenylated primary transcripts (pri-miRNAs) that can be either protein-coding or non-coding. The primary transcript is cleaved by the Drosha ribonuclease III enzyme to produce an approximately 70-nt stem-loop precursor miRNA (pre-miRNA), which is further cleaved by the cytoplasmic Dicer ribonuclease to generate the mature miRNA and antisense miRNA star (miRNA*) products. The mature miRNA is incorporated into a RNA-induced silencing complex (RISC), which recognizes target mRNAs through imperfect base pairing with the miRNA and most commonly results in translational inhibition or destabilization of the target mRNA. The RefSeq represents the predicted microRNA stem-loop. [provided by RefSeq, Sep 2009]. Sequence Note: This record represents a predicted microRNA stem-loop as defined by miRBase. Some sequence at the 5' and 3' ends may not be included in the intermediate precursor miRNA produced by Drosha cleavage.
MIR1322 Exonic 100302166 N/A microRNAs (miRNAs) are short (20-24 nt) non-coding RNAs that are involved in post-transcriptional regulation of gene expression in multicellular organisms by affecting both the stability and translation of mRNAs. miRNAs are transcribed by RNA polymerase II as part of capped and polyadenylated primary transcripts (pri-miRNAs) that can be either protein-coding or non-coding. The primary transcript is cleaved by the Drosha ribonuclease III enzyme to produce an approximately 70-nt stem-loop precursor miRNA (pre-miRNA), which is further cleaved by the cytoplasmic Dicer ribonuclease to generate the mature miRNA and antisense miRNA star (miRNA*) products. The mature miRNA is incorporated into a RNA-induced silencing complex (RISC), which recognizes target mRNAs through imperfect base pairing with the miRNA and most commonly results in translational inhibition or destabilization of the target mRNA. The RefSeq represents the predicted microRNA stem-loop. [provided by RefSeq, Sep 2009]. Sequence Note: This record represents a predicted microRNA stem-loop as defined by miRBase. Some sequence at the 5' and 3' ends may not be included in the intermediate precursor miRNA produced by Drosha cleavage.

MIR1470 Exonic 100302127 N/A microRNAs (miRNAs) are short (20-24 nt) non-coding RNAs that are involved in post-transcriptional regulation of gene expression in multicellular organisms by affecting both the stability and translation of mRNAs. miRNAs are transcribed by RNA polymerase II as part of capped and polyadenylated primary transcripts (pri-miRNAs) that can be either protein-coding or non-coding. The primary transcript is cleaved by the Drosha ribonuclease III enzyme to produce an approximately 70-nt stem-loop precursor miRNA (pre-miRNA), which is further cleaved by the cytoplasmic Dicer ribonuclease to generate the mature miRNA and antisense miRNA star (miRNA*) products. The mature miRNA is incorporated into a RNA-induced silencing complex (RISC), which recognizes target mRNAs through imperfect base pairing with the miRNA and most commonly results in translational inhibition or destabilization of the target mRNA. The RefSeq represents the predicted microRNA stem-loop. [provided by RefSeq, Sep 2009]. Sequence Note: This record represents a predicted microRNA stem-loop as defined by miRBase. Some sequence at the 5' and 3' ends may not be included in the intermediate precursor miRNA produced by Drosha cleavage.
MIR26B Exonic 407017 N/A microRNAs (miRNAs) are short (20-24 nt) non-coding RNAs that are involved in post-transcriptional regulation of gene expression in multicellular organisms by affecting both the stability and translation of mRNAs. miRNAs are transcribed by RNA polymerase II as part of capped and polyadenylated primary transcripts (pri-miRNAs) that can be either protein-coding or non-coding. The primary transcript is cleaved by the Drosha ribonuclease III enzyme to produce an approximately 70-nt stem-loop precursor miRNA (pre-miRNA), which is further cleaved by the cytoplasmic Dicer ribonuclease to generate the mature miRNA and antisense miRNA star (miRNA*) products. The mature miRNA is incorporated into a RNA-induced silencing complex (RISC), which recognizes target mRNAs through imperfect base pairing with the miRNA and most commonly results in translational inhibition or destabilization of the target mRNA. The RefSeq represents the predicted microRNA stem-loop. [provided by RefSeq, Sep 2009]. Sequence Note: This record represents a predicted microRNA stem-loop as defined by miRBase. Some sequence at the 5' and 3' ends may not be included in the intermediate precursor miRNA produced by Drosha cleavage. Publication Note: This RefSeq record includes a subset of the publications that are available for this gene. Please see the Gene record to access additional publications.

MIR3186 Exonic 100422944 N/A microRNAs (miRNAs) are short (20-24 nt) non-coding RNAs that are involved in post-transcriptional regulation of gene expression in multicellular organisms by affecting both the stability and translation of mRNAs. miRNAs are transcribed by RNA polymerase II as part of capped and polyadenylated primary transcripts (pri-miRNAs) that can be either protein-coding or non-coding. The primary transcript is cleaved by the Drosha ribonuclease III enzyme to produce an approximately 70-nt stem-loop precursor miRNA (pre-miRNA), which is further cleaved by the cytoplasmic Dicer ribonuclease to generate the mature miRNA and antisense miRNA star (miRNA*) products. The mature miRNA is incorporated into a RNA-induced silencing complex (RISC), which recognizes target mRNAs through imperfect base pairing with the miRNA and most commonly results in translational inhibition or destabilization of the target mRNA. The RefSeq represents the predicted microRNA stem-loop. [provided by RefSeq, Sep 2009]. Sequence Note: This record represents a predicted microRNA stem-loop as defined by miRBase. Some sequence at the 5' and 3' ends may not be included in the intermediate precursor miRNA produced by Drosha cleavage.
MIR516B2 Exonic 574485 N/A microRNAs (miRNAs) are short (20-24 nt) non-coding RNAs that are involved in post-transcriptional regulation of gene expression in multicellular organisms by affecting both the stability and translation of mRNAs. miRNAs are transcribed by RNA polymerase II as part of capped and polyadenylated primary transcripts (pri-miRNAs) that can be either protein-coding or non-coding. The primary transcript is cleaved by the Drosha ribonuclease III enzyme to produce an approximately 70-nt stem-loop precursor miRNA (pre-miRNA), which is further cleaved by the cytoplasmic Dicer ribonuclease to generate the mature miRNA and antisense miRNA star (miRNA*) products. The mature miRNA is incorporated into a RNA-induced silencing complex (RISC), which recognizes target mRNAs through imperfect base pairing with the miRNA and most commonly results in translational inhibition or destabilization of the target mRNA. The RefSeq represents the predicted microRNA stem-loop. [provided by RefSeq, Sep 2009]. Sequence Note: This record represents a predicted microRNA stem-loop as defined by miRBase. Some sequence at the 5' and 3' ends may not be included in the intermediate precursor miRNA produced by Drosha cleavage.
MIR548Y Exonic 100500919 N/A microRNAs (miRNAs) are short (20-24 nt) non-coding RNAs that are involved in post-transcriptional regulation of gene expression in multicellular organisms by affecting both the stability and translation of mRNAs. miRNAs are transcribed by RNA polymerase II as part of capped and polyadenylated primary transcripts (pri-miRNAs) that can be either protein-coding or non-coding. The primary transcript is cleaved by the Drosha ribonuclease III enzyme to produce an approximately 70-nt stem-loop precursor miRNA (pre-miRNA), which is further cleaved by the cytoplasmic Dicer ribonuclease to generate the mature miRNA and antisense miRNA star (miRNA*) products. The mature miRNA is incorporated into a RNA-induced silencing complex (RISC), which recognizes target mRNAs through imperfect base pairing with the miRNA and most commonly results in translational inhibition or destabilization of the target mRNA. The RefSeq represents the predicted microRNA stem-loop. [provided by RefSeq, Sep 2009]. Sequence Note: This record represents a predicted microRNA stem-loop as defined by miRBase. Some sequence at the 5' and 3' ends may not be included in the intermediate precursor miRNA produced by Drosha cleavage.
MIR663 Exonic N/A N/A N/A
MIR99B Exonic 407056 N/A microRNAs (miRNAs) are short (20-24 nt) non-coding RNAs that are involved in post-transcriptional regulation of gene expression in multicellular organisms by affecting both the stability and translation of mRNAs. miRNAs are transcribed by RNA polymerase II as part of capped and polyadenylated primary transcripts (pri-miRNAs) that can be either protein-coding or non-coding. The primary transcript is cleaved by the Drosha ribonuclease III enzyme to produce an approximately 70-nt stem-loop precursor miRNA (pre-miRNA), which is further cleaved by the cytoplasmic Dicer ribonuclease to generate the mature miRNA and antisense miRNA star (miRNA*) products. The mature miRNA is incorporated into a RNA-induced silencing complex (RISC), which recognizes target mRNAs through imperfect base pairing with the miRNA and most commonly results in translational inhibition or destabilization of the target mRNA. The RefSeq represents the predicted microRNA stem-loop. [provided by RefSeq, Sep 2009]. Sequence Note: This record represents a predicted microRNA stem-loop as defined by miRBase. Some sequence at the 5' and 3' ends may not be included in the intermediate precursor miRNA produced by Drosha cleavage.
MIRLET7E Exonic 406887 N/A microRNAs (miRNAs) are short (20-24 nt) non-coding RNAs that are involved in post-transcriptional regulation of gene expression in multicellular organisms by affecting both the stability and translation of mRNAs. miRNAs are transcribed by RNA polymerase II as part of capped and polyadenylated primary transcripts (pri-miRNAs) that can be either protein-coding or non-coding. The primary transcript is cleaved by the Drosha ribonuclease III enzyme to produce an approximately 70-nt stem-loop precursor miRNA (pre-miRNA), which is further cleaved by the cytoplasmic Dicer ribonuclease to generate the mature miRNA and antisense miRNA star (miRNA*) products. The mature miRNA is incorporated into a RNA-induced silencing complex (RISC), which recognizes target mRNAs through imperfect base pairing with the miRNA and most commonly results in translational inhibition or destabilization of the target mRNA. The RefSeq represents the predicted microRNA stem-loop. [provided by RefSeq, Sep 2009]. Sequence Note: This record represents a predicted microRNA stem-loop as defined by miRBase. Some sequence at the 5' and 3' ends may not be included in the intermediate precursor miRNA produced by Drosha cleavage.
MITD1 Exonic 129531 MIT domain-containing protein N/A
MMP25 Exonic 64386 matrix metalloproteinase-25 preproprotein Proteins of the matrix metalloproteinase (MMP) family are involved in the breakdown of extracellular matrix in normal physiological processes, such as embryonic development, reproduction, and tissue remodeling, as well as in disease processes, such as arthritis and metastasis. Most MMPs are secreted as inactive proproteins which are activated when cleaved by extracellular proteinases. However, the protein encoded by this gene is a member of the membrane-type MMP (MT-MMP) subfamily, attached to the plasma membrane via a glycosylphosphatidylinositol anchor. In response to bacterial infection or inflammation, the encoded protein is thought to inactivate alpha-1 proteinase inhibitor, a major tissue protectant against proteolytic enzymes released by activated neutrophils, facilitating the transendothelial migration of neutrophils to inflammatory sites. The encoded protein may also play a role in tumor invasion and metastasis through activation of MMP2. The gene has previously been referred to as MMP20 but has been renamed MMP25. [provided by RefSeq, Jul 2008].
MNS1 Exonic 55329 meiosis-specific nuclear structural protein 1 This gene encodes a protein highly similar to the mouse meiosis-specific nuclear structural 1 protein. The mouse protein was shown to be expressed at the pachytene stage during spermatogenesis and may function as a nuclear skeletal protein to regulate nuclear morphology during meiosis. [provided by RefSeq, Oct 2008].
MR1 Exonic 3140 major histocompatibility complex class I-related gene protein isoform 4 precursor N/A
MRPL12 Exonic 6182 39S ribosomal protein L12, mitochondrial precursor Mammalian mitochondrial ribosomal proteins are encoded by nuclear genes and help in protein synthesis within the mitochondrion. Mitochondrial ribosomes (mitoribosomes) consist of a small 28S subunit and a large 39S subunit. They have an estimated 75% protein to rRNA composition compared to prokaryotic ribosomes, where this ratio is reversed. Another difference between mammalian mitoribosomes and prokaryotic ribosomes is that the latter contain a 5S rRNA. Among different species, the proteins comprising the mitoribosome differ greatly in sequence, and sometimes in biochemical properties, which prevents easy recognition by sequence homology. This gene encodes a 39S subunit protein which forms homodimers. In prokaryotic ribosomes, two L7/L12 dimers and one L10 protein form the L8 protein complex. [provided by RefSeq, Jul 2008].
MRPL30 Exonic 51263 39S ribosomal protein L30, mitochondrial precursor Mammalian mitochondrial ribosomal proteins are encoded by nuclear genes and help in protein synthesis within the mitochondrion. Mitochondrial ribosomes (mitoribosomes) consist of a small 28S subunit and a large 39S subunit. They have an estimated 75% protein to rRNA composition compared to prokaryotic ribosomes, where this ratio is reversed. Another difference between mammalian mitoribosomes and prokaryotic ribosomes is that the latter contain a 5S rRNA. Among different species, the proteins comprising the mitoribosome differ greatly in sequence, and sometimes in biochemical properties, which prevents easy recognition by sequence homology. This gene encodes a 39S subunit protein. Alternative splicing results in multiple transcript variants. Pseudogenes corresponding to this gene are found on chromosomes 6p and 12p. Read-through transcription also exists between this gene and the neighboring upstream lipoyltransferase 1 (LIPT1) gene. [provided by RefSeq, Mar 2011]. Transcript Variant: This variant (1) represents the longer transcript and encodes the supported protein. Sequence Note: This RefSeq record was created from transcript and genomic sequence data to make the sequence consistent with the reference genome assembly. The genomic coordinates used for the transcript record were based on transcript alignments.
MTRNR2L6 Exonic 100463482 humanin-like protein 6 N/A
MYADML2 Exonic 255275 myeloid-associated differentiation marker-like protein N/A
MYBL1 Exonic 4603 myb-related protein A isoform 2 N/A
MYH6 Exonic 4624 myosin-6 Cardiac muscle myosin is a hexamer consisting of two heavy chain subunits, two light chain subunits, and two regulatory subunits. This gene encodes the alpha heavy chain subunit of cardiac myosin. The gene is located 4kb downstream of the gene encoding the beta heavy chain subunit of cardiac myosin. Mutations in this gene cause familial hypertrophic cardiomyopathy and atrial septal defect 3. [provided by RefSeq, Mar 2010].
MYO18B Exonic 84700 myosin-XVIIIb The protein encoded by this gene may regulate muscle-specific genes when in the nucleus and may influence intracellular trafficking when in the cytoplasm. The encoded protein functions as a homodimer and may interact with F-actin. Mutations in this gene are associated with lung cancer. [provided by RefSeq, Jul 2008].

N4BP2 Exonic 55728 NEDD4-binding protein 2 This gene encodes a protein containing a polynucleotide kinase domain (PNK) near the N-terminal region, and a Small MutS Related (Smr) domain near the C-terminal region. The encoded protein can bind to both B-cell leukemia/lymphoma 3 (BCL-3) and neural precursor cell expressed, developmentally downregulated 4 (Nedd4) proteins. This protein binds and hydrolyzes ATP, may function as a 5'-polynucleotide kinase, and has the capacity to be a ubiquitylation substrate. This protein may play a role in transcription-coupled DNA repair or genetic recombination. [provided by RefSeq, Jul 2008]. Sequence Note: This RefSeq record was created from transcript and genomic sequence data to make the sequence consistent with the reference genome assembly. The genomic coordinates used for the transcript record were based on transcript alignments.
NACAD Exonic 23148 NAC-alpha domain-containing protein 1 N/A
NAT8 Exonic 9027 probable N-acetyltransferase 8 This gene, isolated using the differential display method to detect tissue-specific genes, is specifically expressed in kidney and liver. The encoded protein shows amino acid sequence similarity to N-acetyltransferases. A similar protein in Xenopus affects cell adhesion and gastrulation movements, and may be localized in the secretory pathway. A highly similar paralog is found in a cluster with this gene. [provided by RefSeq, Sep 2008].
NCRNA00085 Exonic N/A N/A N/A
NDNL2 Exonic 56160 melanoma-associated antigen G1 The protein encoded by this gene is part of the SMC5-6 chromatin reorganizing complex and is a member of the MAGE superfamily. This is an intronless gene. [provided by RefSeq, May 2011].
NDRG1 Exonic 10397 protein NDRG1 This gene is a member of the N-myc downregulated gene family, which belongs to the alpha/beta hydrolase superfamily. The protein encoded by this gene is a cytoplasmic protein involved in stress responses, hormone responses, cell growth, and differentiation. It is necessary for p53-mediated caspase activation and apoptosis. Mutation in this gene has been reported to be causative for hereditary motor and sensory neuropathy-Lom. Multiple alternatively spliced variants, encoding the same protein, have been identified. [provided by RefSeq, Sep 2008]. Transcript Variant: This variant (2) uses an alternate splice site in the 5' UTR. Both variants 1 and 2 encode the same protein.
NEO1 Exonic 4756 neogenin isoform 1 precursor This gene encodes a cell surface protein that is a member of the immunoglobulin superfamily. The encoded protein consists of four N-terminal immunoglobulin-like domains, six fibronectin type III domains, a transmembrane domain and a C-terminal internal domain that shares homology with the tumor suppressor candidate gene DCC. This protein may be involved in cell growth and differentiation and in cell-cell adhesion. Defects in this gene are associated with cell proliferation in certain cancers. Alternate splicing results in multiple transcript variants. [provided by RefSeq, Feb 2010]. Transcript Variant: This variant (1) represents the longest transcript and encodes the longest isoform (1).
NFIA Exonic 4774 nuclear factor 1 A-type isoform 4 This gene encodes a member of the NF1 (nuclear factor 1) family of transcription factors. Multiple transcript variants encoding different isoforms have been found for this gene. [provided by RefSeq, Sep 2011]. Transcript Variant: This variant (4) differs in the 5' UTR and coding region compared to variant 1. The resulting protein (isoform 4) has a longer and distinct N-terminus compared to isoform 1. Sequence Note: This RefSeq record was created from transcript and genomic sequence data to make the sequence consistent with the reference genome assembly. The genomic coordinates used for the transcript record were based on transcript alignments.
NOTUM Exonic 147111 protein notum homolog precursor N/A
NPB Exonic 256933 neuropeptide B preproprotein Neuropeptide B (NPB) is an endogenous peptide ligand for G protein-coupled receptor-7 (GPR7; MIM 600730). [supplied by OMIM, Apr 2004].
NPLOC4 Exonic 55666 nuclear protein localization protein 4 homolog N/A
NRXN1 Exonic 9378 neurexin-1-beta isoform beta precursor Neurexins function in the vertebrate nervous system as cell adhesion molecules and receptors. Two neurexin genes are among the largest known in human (NRXN1 and NRXN3). By using alternate promoters, splice sites and exons, predictions of hundreds or even thousands of distinct mRNAs have been made. Most transcripts use the upstream promoter and encode alpha-neurexin isoforms; fewer transcripts are produced from the downstream promoter and encode beta-neurexin isoforms. Alpha-neurexins contain epidermal growth factor-like (EGF-like) sequences and laminin G domains, and they interact with neurexophilins. Beta-neurexins lack EGF-like sequences and contain fewer laminin G domains than alpha-neurexins. The RefSeq Project has decided to create only a few representative transcript variants of the multitude that are possible. [provided by RefSeq, Oct 2008]. Transcript Variant: This variant (beta) represents a beta-neurexin transcript. It is transcribed from a downstream promoter, includes a different segment for its 5' UTR and 5' coding region, and lacks most of the 5' exons present in alpha transcripts, as compared to variant alpha2. The resulting protein (isoform beta) has a shorter and distinct N-terminus when it is compared to isoform alpha2. Sequence Note: The RefSeq transcript and protein were derived from transcript and genomic sequence to make the sequence consistent with the reference genome assembly. The genomic coordinates used for the transcript record were based on alignments.
NRXN3 Exonic 9369 neurexin-3-beta isoform 3 precursor Neurexins are a family of proteins that function in the vertebrate nervous system as cell adhesion molecules and receptors. They are encoded by several unlinked genes of which two, NRXN1 and NRXN3, are among the largest known human genes. Three of the genes (NRXN1-3) utilize two alternate promoters and include numerous alternatively spliced exons to generate thousands of distinct mRNA transcripts and protein isoforms. The majority of transcripts are produced from the upstream promoter and encode alpha-neurexin isoforms; a much smaller number of transcripts are produced from the downstream promoter and encode beta-neurexin isoforms. The alpha-neurexins contain epidermal growth factor-like (EGF-like) sequences and laminin G domains, and have been shown to interact with neurexophilins. The beta-neurexins lack EGF-like sequences and contain fewer laminin G domains than alpha-neurexins. [provided by RefSeq, Jul 2008]. Transcript Variant: This variant (3) differs in the 5' UTR and has multiple coding region differences, compared to variant 1. The resulting isoform (3) has a shorter and distinct N-terminus when compared to isoform 1. Publication Note: This RefSeq record includes a subset of the publications that are available for this gene. Please see the Gene record to access additional publications.
NSDHL Exonic 50814 sterol-4-alpha-carboxylate 3-dehydrogenase, decarboxylating The protein encoded by this gene is localized in the endoplasmic reticulum and is involved in cholesterol biosynthesis. Mutations in this gene are associated with CHILD syndrome, which is an X-linked dominant disorder of lipid metabolism with disturbed cholesterol biosynthesis, and typically lethal in males. Alternatively spliced transcript variants with differing 5' UTR have been found for this gene. [provided by RefSeq, Jul 2008]. Transcript Variant: This variant (1) represents the more predominant transcript. Transcript variants 1 and 2 encode the same protein.
NSF Exonic 4905 N/A N/A
NUDT17 Exonic 200035 nucleoside diphosphate-linked moiety X motif 17 N/A
NUP155 Exonic 9631 nuclear pore complex protein Nup155 isoform 2 Nucleoporins are the main components of the nuclear pore complex (NPC) of eukaryotic cells. They are involved in the bidirectional trafficking of molecules, especially mRNAs and proteins, between the nucleus and the cytoplasm. The protein encoded by this gene does not contain the typical FG repeat sequences found in most vertebrate nucleoporins. Two protein isoforms are encoded by transcript variants of this gene. [provided by RefSeq, Jul 2008]. Transcript Variant: This variant (2) uses alternate splicing in the 5' region and a downstream start codon, compared to variant 1. Isoform 2 has a shorter N-terminus, compared to isoform 1.
ODZ1 Exonic 10178 teneurin-1 isoform 3 The protein encoded by this gene belongs to the tenascin family and teneurin subfamily. It is expressed in the neurons and may function as a cellular signal transducer. Several alternatively spliced transcript variants encoding different isoforms have been found for this gene. [provided by RefSeq, Sep 2009]. Transcript Variant: This variant (3) lacks an in-frame coding exon compared to variant 1. This results in a shorter isoform (3) missing an internal 7 aa protein segment compared to isoform 1. Sequence Note: This RefSeq record was created from transcript and genomic sequence data to make the sequence consistent with the reference genome assembly. The genomic coordinates used for the transcript record were based on transcript alignments.
OFD1 Exonic 8481 oral-facial-digital syndrome 1 protein This gene is located on the X chromosome and encodes a centrosomal protein. A knockout mouse model has been used to study the effect of mutations in this gene. The mouse gene is also located on the X chromosome; however, unlike the human gene, it is not subject to X inactivation. Mutations in this gene are associated with oral-facial-digital syndrome type I and Simpson-Golabi-Behmel syndrome type 2. Many pseudogenes have been identified; a single pseudogene is found on chromosome 5 while as many as fifteen have been found on the Y chromosome. Alternatively spliced transcripts have been described for this gene but the biological validity of these transcripts has not been determined. [provided by RefSeq, Jul 2008].
OR2T8 Exonic 343172 olfactory receptor 2T8 Olfactory receptors interact with odorant molecules in the nose to initiate a neuronal response that triggers the perception of a smell. The olfactory receptor proteins are members of a large family of G-protein-coupled receptors (GPCR) arising from single coding-exon genes. Olfactory receptors share a 7-transmembrane domain structure with many neurotransmitter and hormone receptors and are responsible for the recognition and G protein-mediated transduction of odorant signals. The olfactory receptor gene family is the largest in the genome. The nomenclature assigned to the olfactory receptor genes and proteins for this organism is independent of other organisms. [provided by RefSeq, Jul 2008].
OR4A5 Exonic 81318 olfactory receptor 4A5 Olfactory receptors interact with odorant molecules in the nose to initiate a neuronal response that triggers the perception of a smell. The olfactory receptor proteins are members of a large family of G-protein-coupled receptors (GPCR) arising from single coding-exon genes. Olfactory receptors share a 7-transmembrane domain structure with many neurotransmitter and hormone receptors and are responsible for the recognition and G protein-mediated transduction of odorant signals. The olfactory receptor gene family is the largest in the genome. The nomenclature assigned to the olfactory receptor genes and proteins for this organism is independent of other organisms. [provided by RefSeq, Jul 2008]. Sequence Note: The RefSeq transcript and protein were derived from genomic sequence to make the sequence consistent with the reference genome assembly. The genomic coordinates used for the transcript record were based on alignments.
OR52E4 Exonic 390081 olfactory receptor 52E4 Olfactory receptors interact with odorant molecules in the nose to initiate a neuronal response that triggers the perception of a smell. The olfactory receptor proteins are members of a large family of G-protein-coupled receptors (GPCR) arising from single coding-exon genes. Olfactory receptors share a 7-transmembrane domain structure with many neurotransmitter and hormone receptors and are responsible for the recognition and G protein-mediated transduction of odorant signals. The olfactory receptor gene family is the largest in the genome. The nomenclature assigned to the olfactory receptor genes and proteins for this organism is independent of other organisms. [provided by RefSeq, Jul 2008].
OR52N1 Exonic 79473 olfactory receptor 52N1 Olfactory receptors interact with odorant molecules in the nose to initiate a neuronal response that triggers the perception of a smell. The olfactory receptor proteins are members of a large family of G-protein-coupled receptors (GPCR) arising from single coding-exon genes. Olfactory receptors share a 7-transmembrane domain structure with many neurotransmitter and hormone receptors and are responsible for the recognition and G protein-mediated transduction of odorant signals. The olfactory receptor gene family is the largest in the genome. The nomenclature assigned to the olfactory receptor genes and proteins for this organism is independent of other organisms. [provided by RefSeq, Jul 2008].
OR6Y1 Exonic 391112 olfactory receptor 6Y1 Olfactory receptors interact with odorant molecules in the nose to initiate a neuronal response that triggers the perception of a smell. The olfactory receptor proteins are members of a large family of G-protein-coupled receptors (GPCR) arising from single coding-exon genes. Olfactory receptors share a 7-transmembrane domain structure with many neurotransmitter and hormone receptors and are responsible for the recognition and G protein-mediated transduction of odorant signals. The olfactory receptor gene family is the largest in the genome. The nomenclature assigned to the olfactory receptor genes and proteins for this organism is independent of other organisms. [provided by RefSeq, Jul 2008].
OSTCL Exonic N/A N/A N/A

OTUD5 Exonic 55593 OTU domain-containing protein isoform b This gene encodes a member of the OTU (ovarian tumor) domain-containing cysteine protease superfamily. The OTU domain confers deubiquitinase activity and the encoded protein has been shown to suppress the type I interferon-dependent innate immune response by cleaving the polyubiquitin chain from an essential type I interferon adaptor protein. Cleavage results in disassociation of the adaptor protein from a downstream signaling complex and disruption of the type I interferon signaling cascade. Alternatively spliced transcript variants encoding different isoforms have been described. [provided by RefSeq, Oct 2008]. Transcript Variant: This variant (3) differs in the 3' UTR and lacks an in-frame portion of an internal coding exon, compared to variant 1, resulting in a shorter protein compared to isoform a. Variants 2 and 3 encode the same isoform (b).
P4HB Exonic 5034 protein disulfide-isomerase precursor This gene encodes the beta subunit of prolyl 4-hydroxylase, a highly abundant multifunctional enzyme that belongs to the protein disulfide isomerase family. When present as a tetramer consisting of two alpha and two beta subunits, this enzyme is involved in hydroxylation of prolyl residues in preprocollagen. This enzyme is also a disulfide isomerase containing two thioredoxin domains that catalyze the formation, breakage and rearrangement of disulfide bonds. Other known functions include its ability to act as a chaperone that inhibits aggregation of misfolded proteins in a concentration-dependent manner, its ability to bind thyroid hormone, its role in both the influx and efflux of S-nitrosothiol-bound nitric oxide, and its function as a subunit of the microsomal triglyceride transfer protein complex. [provided by RefSeq, Jul 2008].
PACSIN3 Exonic 29763 protein kinase C and casein kinase substrate in neurons protein 3 This gene is a member of the protein kinase C and casein kinase substrate in neurons family. The encoded protein is involved in linking the actin cytoskeleton with vesicle formation. Alternative splicing results in multiple transcript variants. [provided by RefSeq, May 2010]. Transcript Variant: This variant (3) differs in the 5' UTR compared to variant 1. Variants 1, 2 and 3 encode the same protein.
PCDH15 Exonic 65217 protocadherin-15 isoform CD1-4 precursor This gene is a member of the cadherin superfamily. Family members encode integral membrane proteins that mediate calcium-dependent cell-cell adhesion. The encoded protein plays an essential role in maintenance of normal retinal and cochlear function. Mutations in this gene result in hearing loss and Usher syndrome type 1F (USH1F). Extensive alternative splicing resulting in multiple isoforms has been observed in the mouse ortholog. Similar alternatively spliced transcripts are inferred to occur in human, and additional variants are likely to occur. [provided by RefSeq, Dec 2008]. Transcript Variant: This variant (C) lacks two alternate in-frame exons in the 5' and 3' coding region, compared to variant A. The resulting isoform (CD1-4) lacks a 5-aa segment near the N-terminus and a 2-aa segment near the C-terminus, compared to isoform CD1-1. Publication Note: This RefSeq record includes a subset of the publications that are available for this gene. Please see the Gene record to access additional publications.
PCYT2 Exonic 5833 ethanolamine-phosphate cytidylyltransferase isoform 1 This gene encodes an enzyme that catalyzes the formation of CDP-ethanolamine from CTP and phosphoethanolamine in the Kennedy pathway of phospholipid synthesis. Alternative splicing results in multiple transcript variants. [provided by RefSeq, May 2010]. Transcript Variant: This variant (1) encodes the longer isoform (1). Sequence Note: This RefSeq record was created from transcript and genomic sequence data to make the sequence consistent with the reference genome assembly. The genomic coordinates used for the transcript record were based on transcript alignments.
PDCD6IP Exonic 10015 programmed cell death 6-interacting protein isoform 2 This gene encodes a protein thought to participate in programmed cell death. Studies using mouse cells have shown that overexpression of this protein can block apoptosis. In addition, the product of this gene binds to the product of the PDCD6 gene, a protein required for apoptosis, in a calcium-dependent manner. This gene product also binds to endophilins, proteins that regulate membrane shape during endocytosis. Overexpression of this gene product and endophilins results in cytoplasmic vacuolization, which may be partly responsible for the protection against cell death. Several alternatively spliced transcript variants encoding different isoforms have been found for this gene. [provided by RefSeq, Jun 2009]. Transcript Variant: This variant (2) uses an alternative in-frame acceptor splice site at an internal coding exon compared to variant 1. This results in an isoform (2) 5 aa longer than isoform 1.
PDE10A Exonic 10846 cAMP and cAMP-inhibited cGMP 3',5'-cyclic phosphodiesterase 10A isoform 2 Various cellular responses are regulated by the second messengers cAMP and cGMP. Phosphodiesterases, such as PDE10A, eliminate cAMP- and cGMP-mediated intracellular signaling by hydrolyzing the cyclic nucleotide to the corresponding nucleoside 5-prime monophosphate (Fujishige et al., 2000 [PubMed 10998054]). [supplied by OMIM, Mar 2008]. Transcript Variant: This variant (2) has an additional exon in the 5' region, which includes an in-frame AUG start codon, as compared to variant 1. The resulting isoform (2) has an alternate and shorter N-terminus, as compared to isoform 1.
PDE6G Exonic 5148 retinal rod rhodopsin-sensitive cGMP 3',5'-cyclic phosphodiesterase subunit gamma This gene encodes the gamma subunit of cyclic GMP-phosphodiesterase, which is composed of alpha and beta catalytic subunits and two identical, inhibitory gamma subunits. This gene is expressed in rod photoreceptors and functions in the phototransduction signaling cascade. It is also expressed in a variety of other tissues, and has been shown to regulate the c-Src protein kinase and G-protein-coupled receptor kinase 2. Alternative splicing results in multiple transcript variants. [provided by RefSeq, Feb 2009]. Transcript Variant: This variant (1) represents the longer transcript.
PDLIM3 Exonic 27295 PDZ and LIM domain protein 3 isoform a The protein encoded by this gene contains a PDZ domain and a LIM domain, indicating that it may be involved in cytoskeletal assembly. In support of this, the encoded protein has been shown to bind the spectrin-like repeats of alpha-actinin-2 and to colocalize with alpha-actinin-2 at the Z lines of skeletal muscle. This gene is found near a region of chromosome 4 that has been implicated in facioscapulohumeral muscular dystrophy, but this gene does not appear to be involved in the disease. Two transcript variants encoding different isoforms have been found for this gene. [provided by RefSeq, Jul 2008]. Transcript Variant: This variant (1) represents the longer transcript and encodes the longer isoform (a). Sequence Note: This RefSeq record was created from transcript and genomic sequence data to make the sequence consistent with the reference genome assembly. The genomic coordinates used for the transcript record were based on transcript alignments.
PEBP4 Exonic 157310 phosphatidylethanolamine-binding protein 4 precursor The phosphatidylethanolamine (PE)-binding proteins, including PEBP4, are an evolutionarily conserved family of proteins with pivotal biologic functions, such as lipid binding and inhibition of serine proteases (Wang et al., 2004 [PubMed 15302887]). [supplied by OMIM, Dec 2008].
PEX11B Exonic 8799 peroxisomal membrane protein 11B isoform 1 N/A
PHF1 Exonic 5252 PHD finger protein 1 isoform a This gene encodes a Polycomb group protein. The protein is a component of a histone H3 lysine-27 (H3K27)-specific methyltransferase complex, and functions in transcriptional repression of homeotic genes. The protein is also recruited to double-strand breaks, and reduced protein levels result in X-ray sensitivity and increased homologous recombination. Multiple transcript variants encoding different isoforms have been found for this gene. [provided by RefSeq, May 2009]. Transcript Variant: This variant (1) uses an alternate splice site and lacks an alternate exon in the 3' coding region, resulting in a frameshift, compared to variant 2. The resulting isoform (a) has a shorter and distinct C-terminus, compared to isoform b.
PIAS3 Exonic 10401 E3 SUMO-protein ligase PIAS3 This gene encodes a member of the PIAS [protein inhibitor of activated STAT (signal transducer and activator of transcription)] family of transcriptional modulators. The protein functions as a SUMO (small ubiquitin-like modifier)-E3 ligase, which catalyzes the covalent attachment of a SUMO protein to specific target substrates. It directly binds to several transcription factors and either blocks or enhances their activity. Alternatively spliced transcript variants of this gene have been identified, but the full-length nature of some of these variants has not been determined. [provided by RefSeq, Jul 2008].
PINX1 Exonic 54984 PIN2/TERF1-interacting telomerase inhibitor 1
PKD1L2 Exonic 114780 polycystic kidney disease protein 1-like 2 isoform a precursor This gene encodes a member of the polycystin protein family. The encoded protein contains 11 transmembrane domains, a latrophilin/CL-1-like GPCR proteolytic site (GPS) domain, and a polycystin-1, lipoxygenase, alpha-toxin (PLAT) domain. This protein may function as a component of cation channel pores. Two transcript variants encoding different isoforms have been found for this gene. [provided by RefSeq, Jul 2008]. Transcript Variant: This variant (1) represents the longer transcript, and encodes the longer isoform (a).
PLA2G15 Exonic 23659 group XV phospholipase A2 precursor Lysophospholipases are enzymes that act on biological membranes to regulate the multifunctional lysophospholipids. The protein encoded by this gene hydrolyzes lysophosphatidylcholine to glycerophosphorylcholine and a free fatty acid. This enzyme is present in the plasma and thought to be associated with high-density lipoprotein. A later paper contradicts the function of this gene. It demonstrates that this gene encodes a lysosomal enzyme instead of a lysophospholipase and has both calcium-independent phospholipase A2 and transacylase activities. [provided by RefSeq, Jul 2008].
PLAA Exonic 9373 phospholipase A-2-activating protein N/A
PMS2 Exonic 5395 N/A This gene is one of the PMS2 gene family members found in clusters on chromosome 7. The product of this gene is involved in DNA mismatch repair. It forms a heterodimer with MLH1 and this complex interacts with other complexes bound to mismatched bases. Mutations in this gene are associated with hereditary nonpolyposis colorectal cancer, Turcot syndrome, and are a cause of supratentorial primitive neuroectodermal tumors. Alternatively spliced transcript variants have been observed for this gene. [provided by RefSeq, Jul 2008]. Transcript Variant: This variant (2) uses an alternate acceptor splice site at exon 2, resulting in a frame-shift and premature translation termination, rendering the transcript susceptible to nonsense-mediated mRNA decay (NMD).
PNKD Exonic 25953 probable hydrolase PNKD isoform 1 precursor This gene is thought to play a role in the regulation of myofibrillogenesis. Mutations in this gene have been associated with the movement disorder paroxysmal non-kinesigenic dyskinesia. Alternative splicing results in multiple transcript variants. [provided by RefSeq, Mar 2010]. Transcript Variant: This variant (1), alternately referred to as the long form (MR-1L), represents the longest transcript and encodes the longest isoform (1). Publication Note: This RefSeq record includes a subset of the publications that are available for this gene. Please see the Gene record to access additional publications.
PNLIPRP3 Exonic 119548 pancreatic lipase-related protein 3 precursor N/A
POLR3C Exonic 10623 DNA-directed RNA polymerase III subunit RPC3 N/A
POLR3G Exonic 10622 DNA-directed RNA polymerase III subunit RPC7 N/A
POLR3GL Exonic 84265 DNA-directed RNA polymerase III subunit RPC7-like N/A
POTEA Exonic 340441 POTE ankyrin domain family member A isoform N/A
POU5F1P3 Exonic 642559 N/A N/A
PRDM6 Exonic 93166 putative histone-lysine N-methyltransferase N/A
PREPL Exonic 9581 prolyl endopeptidase-like isoform 4 The protein encoded by this gene belongs to the prolyl oligopeptidase subfamily of serine peptidases. Mutations in this gene have been associated with hypotonia-cystinuria syndrome, also known as the 2p21 deletion syndrome. Several alternatively spliced transcript variants encoding either the same or different isoforms have been described for this gene. [provided by RefSeq, Jan 2010]. Transcript Variant: This variant (7, also known as variant B) contains an alternate exon at the 5' end compared to variant 1, resulting in translation initiation from an in-frame downstream AUG and a shorter isoform (4) compared to isoform 1. Variants 6 and 7 encode the same isoform. Sequence Note: This RefSeq record was created from transcript and genomic sequence data to make the sequence consistent with the reference genome assembly. The genomic coordinates used for the transcript record were based on transcript alignments.
PRSS38 Exonic 339501 serine protease 38 precursor N/A
PSG3 Exonic 5671 pregnancy-specific beta-1-glycoprotein 3 precursor The human pregnancy-specific glycoproteins (PSGs) are a family of proteins that are synthesized in large amounts by placental trophoblasts and released into the maternal circulation during pregnancy. Molecular cloning and analysis of several PSG genes have indicated that the PSGs form a subgroup of the carcinoembryonic antigen (CEA) gene family, which belongs to the immunoglobulin superfamily of genes. Members of the CEA family consist of a single N domain, with structural similarity to the immunoglobulin variable domains, followed by a variable number of immunoglobulin constant-like A and/or B domains. Most PSGs have an arg-gly-asp (RGD) motif in the N-terminal domain, which has been shown to function as an adhesion recognition signal for several integrins (summary by Teglund et al., 1994 [PubMed 7851896]). For additional general information about the PSG gene family, see PSG1 (MIM 176390). [supplied by OMIM, Oct 2009].

PSG8 Exonic 440533 pregnancy-specific beta-1-glycoprotein 8 isoform a precursor The human pregnancy-specific glycoproteins (PSGs) are a group of molecules that are mainly produced by the placental syncytiotrophoblasts during pregnancy. PSGs comprise a subgroup of the carcinoembryonic antigen (CEA) family, which belongs to the immunoglobulin superfamily. For additional general information about the PSG gene family, see PSG1 (MIM 176390). [supplied by OMIM, Oct 2009]. Transcript Variant: This variant (1) encodes the longest isoform (a).
PSMB1 Exonic 5689 proteasome subunit beta type-1 The proteasome is a multicatalytic proteinase complex with a highly ordered ring-shaped 20S core structure. The core structure is composed of 4 rings of 28 non-identical subunits; 2 rings are composed of 7 alpha subunits and 2 rings are composed of 7 beta subunits. Proteasomes are distributed throughout eukaryotic cells at a high concentration and cleave peptides in an ATP/ubiquitin-dependent process in a non-lysosomal pathway. An essential function of a modified proteasome, the immunoproteasome, is the processing of class I MHC peptides. This gene encodes a member of the proteasome B-type family, also known as the T1B family, that is a 20S core beta subunit. This gene is tightly linked to the TBP (TATA-binding protein) gene in human and in mouse, and is transcribed in the opposite orientation in both species. [provided by RefSeq, Jul 2008].
PYCR1 Exonic 5831 pyrroline-5-carboxylate reductase 1, mitochondrial isoform 1 This gene encodes an enzyme that catalyzes the NAD(P)H-dependent conversion of pyrroline-5-carboxylate to proline. This enzyme may also play a physiologic role in the generation of NADP(+) in some cell types. The protein forms a homopolymer and localizes to the mitochondrion. Alternate splicing results in two transcript variants encoding different isoforms. [provided by RefSeq, Jul 2008]. Transcript Variant: This variant (1) encodes the longer isoform (1) of this protein.
PYROXD1 Exonic 79912 pyridine nucleotide-disulfide oxidoreductase domain-containing protein 1 N/A
RAB11FIP4 Exonic 84440 rab11 family-interacting protein 4 Proteins of the large Rab GTPase family (see RAB1A; MIM 179508) have regulatory roles in the formation, targeting, and fusion of intracellular transport vesicles. RAB11FIP4 is one of many proteins that interact with and regulate Rab GTPases (Hales et al., 2001 [PubMed 11495908]). [supplied by OMIM, Apr 2008].
RAB32 Exonic 10981 ras-related protein Rab-32 Small GTP-binding proteins of the RAB family, such as RAB32, play essential roles in vesicle and granule targeting (Bao et al., 2002 [PubMed 11784320]). [supplied by OMIM, Aug 2009]. Sequence Note: removed 2 bases from the 5' end that did not align to the reference genome assembly.
RABEPK Exonic 10244 rab9 effector protein with kelch motifs isoform b N/A
RAC3 Exonic 5881 ras-related C3 botulinum toxin substrate 3 The protein encoded by this gene is a GTPase which belongs to the RAS superfamily of small GTP-binding proteins. Members of this superfamily appear to regulate a diverse array of cellular events, including the control of cell growth, cytoskeletal reorganization, and the activation of protein kinases. [provided by RefSeq, Jul 2008].
RARRES3 Exonic 5920 retinoic acid receptor responder protein 3 Retinoids exert biologic effects such as potent growth inhibitory and cell differentiation activities and are used in the treatment of hyperproliferative dermatological diseases. These effects are mediated by specific nuclear receptor proteins that are members of the steroid and thyroid hormone receptor superfamily of transcriptional regulators. RARRES1, RARRES2, and RARRES3 are genes whose expression is upregulated by the synthetic retinoid tazarotene. RARRES3 is thought to act as a tumor suppressor or growth regulator. [provided by RefSeq, Jul 2008].
RASGEF1A Exonic 221002 ras-GEF domain-containing family member 1A N/A

RBM8A Exonic 9939 RNA-binding protein 8A This gene encodes a protein with a conserved RNA-binding motif. The protein is found predominantly in the nucleus, although it is also present in the cytoplasm. It is preferentially associated with mRNAs produced by splicing, including both nuclear mRNAs and newly exported cytoplasmic mRNAs. It is thought that the protein remains associated with spliced mRNAs as a tag to indicate where introns had been present, thus coupling pre- and post-mRNA splicing events. Previously, it was thought that two genes encode this protein, RBM8A and RBM8B; it is now thought that the RBM8B locus is a pseudogene. Two alternative start codons result in two forms of the protein, and this gene also uses multiple polyadenylation sites. [provided by RefSeq, Jul 2008]. Sequence Note: This RefSeq record was created from transcript and genomic sequence data to make the sequence consistent with the reference genome assembly. The genomic coordinates used for the transcript record were based on transcript alignments.
RECQL Exonic 5965 ATP-dependent DNA helicase Q1 The protein encoded by this gene is a member of the RecQ DNA helicase family. DNA helicases are enzymes involved in various types of DNA repair, including mismatch repair, nucleotide excision repair and direct repair. Some members of this family are associated with genetic disorders with predisposition

JUMBO APPLICATIONS / PATENTS
THIS SECTION OF THE APPLICATION / PATENT CONTAINS MORE THAN ONE VOLUME.
THIS IS VOLUME 1 OF 2.
NOTE: For additional volumes please contact the Canadian Patent Office.

Claims (19)

1. A method of screening a subject comprising:
providing a panel of nucleic acid biomarkers for a Pervasive Developmental Disorder (PDD) or a Pervasive Developmental Disorder - Not Otherwise Specified (PDD-NOS) characterised in that the panel comprises at least one low frequency genomic DNA (gDNA) variation biomarker for each of a plurality of genetic loci, said low frequency gDNA variation biomarkers occurring at a frequency of 0.1% or less in a population of subjects without a diagnosis of the PDD or PDD-NOS who are ethnically matched to the test subject, the genetic loci being selected from the genes in the polynucleotides of SEQ ID NOs 644 to 2417 and 2558 to 2739, and wherein the panel comprises at least 50 low frequency gDNA variation biomarkers, and wherein each gDNA variation biomarker is a copy number variation (CNV) encoded by SEQ ID NOs 1 to 643 or 2418 to 2557, or is a variation that results in a nonsense or frameshift mutation in one of the said genes;
assaying a nucleic acid sample obtained from the subject to detect sequence information for the genetic loci; and
comparing the panel of nucleic acid biomarkers to the sequence information for the genetic loci in order to detect if low frequency gDNA variation biomarkers in the panel are present in the sequence information.
2. The method of claim 1 further comprising diagnosing one or more subjects for the presence or absence of, or an altered susceptibility to a PDD or a PDD-NOS.
3. The method of claim 2, wherein the one or more subjects is diagnosed with the PDD or PDD-NOS if a gDNA variation biomarker of the panel is present.
4. The method of claim 2 or 3, wherein the one or more subjects is not diagnosed with PDD or PDD-NOS if none of the gDNA variation biomarkers of the panel is present.
5. The method of any one of claims 1 to 4, wherein the at least one gDNA variation biomarker comprises one or more point mutations, polymorphisms, translocations, insertions, deletions, amplifications, inversions, microsatellites, interstitial deletions, copy number variations (CNVs), single nucleotide variations (SNVs) or any combination thereof.

Date recue / Date received 2021-12-15
6. The method of any one of claims 1 to 5, wherein the nucleic acid information is detected or obtained by one or more methods selected from the group consisting of PCR, sequencing, Northern blots, fluorescence in situ hybridization (FISH), Invader assay, microarrays, and any combination thereof.
7. The method of any one of claims 1 to 6, wherein the nucleic acid information is detected by analysing the whole genome or whole exome from the one or more subjects or obtaining nucleic acid information from in silico analysis of a previously obtained exome or whole genome sequence from the one or more subjects.
8. The method of any one of claims 1 to 7, comprising comparing the nucleic acid information to those of one or more other subjects.
9. The method of claim 8, wherein the one or more subjects comprise:
one or more subjects not suspected of having the PDD or the PDD-NOS;
one or more subjects suspected of having the PDD or the PDD-NOS;
one or more subjects with the PDD or the PDD-NOS;
one or more subjects who are symptomatic for the PDD or the PDD-NOS;
one or more subjects who are asymptomatic for the PDD or the PDD-NOS;
one or more subjects that have an increased susceptibility to the PDD or the PDD-NOS;
one or more subjects that have a decreased susceptibility to the PDD or the PDD-NOS;
or one or more subjects receiving a treatment, therapeutic regimen, or any combination thereof for a PDD or PDD-NOS.
10. The method of any one of claims 1 to 9 wherein the screening the one or more subjects further comprises selecting one or more therapies based on the presence or absence of the one or more gDNA variations.
11. The method of any one of claims 2 to 10, wherein the PDD is Autism Spectrum Disorder (ASD), or wherein the PDD-NOS is Asperger Syndrome, Rett Syndrome, Childhood Disintegrative Disorder, or Fragile X syndrome.

12. The method of claim 11, wherein the one or more subjects has at least one symptom of a PDD.
13. The method of claim 12, wherein the PDD is ASD.
14. A method for measuring expression levels of biomarkers for a PDD or a PDD-NOS in a subject, comprising quantifying the level of nucleic acid expression products from two or more or at least 5, 10, or 25 genes as defined in claim 1 in a sample from the subject.
15. A method for measuring expression levels of polypeptides comprising:
a) selecting a panel of low frequency biomarkers as defined in claim 1;
c) creating an antibody or aptamer panel for each biomarker in the panel;
d) using the antibody or aptamer panel to bind the polypeptides in a sample from an individual; and
e) quantifying levels of the polypeptides bound from the sample to the antibody or aptamer panel.
16. The method of claim 14 or 15, wherein the nucleic acid expression levels for the sample are compared to a control nucleic acid expression level, or wherein the polypeptide levels of the biological sample are increased or decreased compared to the polypeptide levels of a control biological sample.
17. The method of claim 16, wherein the comparison of nucleic acid expression level or quantified levels of polypeptides are used in the management of patient care in PDD or PDD-NOS.
18. The method of claim 17, wherein the management of patient care includes one or more of risk assessment, early diagnosis, establishing prognosis, monitoring patient treatment, and detecting treatment efficacy.
19. The method of claim 16, wherein the comparison of nucleic acid expression level or quantified levels of polypeptides are used in the discovery of a therapeutic intervention of PDD or PDD-NOS.
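The comparison step of claim 1, together with the decision rule of claims 3 and 4, can be sketched as set membership over a frequency-filtered panel. This is a minimal illustration under stated assumptions, not the applicant's method. The `Biomarker` record, the string variant keys, and the function names are hypothetical. Only the 0.1% control-frequency ceiling and the minimum of 50 panel biomarkers are taken from claim 1.

```python
from dataclasses import dataclass
from typing import Iterable, List

# Thresholds stated in claim 1.
MAX_CONTROL_FREQ = 0.001  # 0.1% or less in ethnically matched unaffected controls
MIN_PANEL_SIZE = 50       # at least 50 low frequency gDNA variation biomarkers


@dataclass(frozen=True)
class Biomarker:
    """Hypothetical record for one panel entry."""
    locus: str           # gene symbol of the genetic locus
    variant_id: str      # hypothetical key for a CNV or nonsense/frameshift variant
    control_freq: float  # observed frequency in the matched control population


def build_panel(candidates: Iterable[Biomarker]) -> List[Biomarker]:
    """Keep candidates at or below the frequency ceiling and enforce the minimum panel size."""
    panel = [b for b in candidates if b.control_freq <= MAX_CONTROL_FREQ]
    if len(panel) < MIN_PANEL_SIZE:
        raise ValueError(
            f"panel has {len(panel)} biomarkers; at least {MIN_PANEL_SIZE} are required"
        )
    return panel


def detect(panel: List[Biomarker], observed_variant_ids: Iterable[str]) -> List[Biomarker]:
    """Return panel biomarkers whose variant keys occur in the subject's sequence information."""
    observed = set(observed_variant_ids)
    return [b for b in panel if b.variant_id in observed]


def screening_call(hits: List[Biomarker]) -> bool:
    """Claims 3 and 4: positive if any panel biomarker is present, negative if none is."""
    return len(hits) > 0


if __name__ == "__main__":
    # Illustrative data only: 60 rare candidates plus one common variant that is filtered out.
    candidates = [Biomarker(f"GENE{i}", f"var{i}", 0.0005) for i in range(60)]
    candidates.append(Biomarker("GENE_COMMON", "common_var", 0.05))
    panel = build_panel(candidates)
    hits = detect(panel, ["var3", "common_var", "unrelated_var"])
    print([b.variant_id for b in hits], screening_call(hits))  # prints ['var3'] True
```

The common variant never reaches the comparison step because it fails the 0.1% filter. Its presence in the subject therefore does not produce a positive call.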
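Claim 1 also counts as a biomarker any variation that results in a nonsense or frameshift mutation in one of the listed genes. Both categories can be checked with standard reading-frame arithmetic. The sketch below assumes that alleles and codons are already given on the coding strand of the transcript. The function names are hypothetical, and the sketch ignores splice-site and start-loss effects.

```python
# Stop codons of the standard genetic code, written on the coding strand.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def is_frameshift(ref_allele: str, alt_allele: str) -> bool:
    """An insertion or deletion shifts the reading frame when its net length
    change is not a multiple of three."""
    return (len(alt_allele) - len(ref_allele)) % 3 != 0


def is_nonsense(ref_codon: str, pos_in_codon: int, alt_base: str) -> bool:
    """A single-base substitution is nonsense when it turns a sense codon into a stop codon.

    pos_in_codon is 0, 1 or 2 (the position of the substituted base within the codon).
    """
    ref_codon = ref_codon.upper()
    if len(ref_codon) != 3 or pos_in_codon not in (0, 1, 2):
        raise ValueError("expected a 3-base codon and a position of 0, 1 or 2")
    alt_codon = ref_codon[:pos_in_codon] + alt_base.upper() + ref_codon[pos_in_codon + 1:]
    return ref_codon not in STOP_CODONS and alt_codon in STOP_CODONS


if __name__ == "__main__":
    print(is_frameshift("A", "AT"))    # 1-base insertion: True
    print(is_frameshift("ATGC", "A"))  # 3-base deletion, stays in frame: False
    print(is_nonsense("TGG", 2, "A"))  # TGG (Trp) -> TGA (stop): True
```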
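Sequence information may come from an in silico analysis of a previously obtained exome or genome (claim 7). In that case, the subject's CNV calls still have to be matched against the panel CNVs of claim 1. A common convention, assumed here and not stated in the claims, is to require the same direction of change (gain or loss) and at least 50% reciprocal overlap. The `Cnv` type, the half-open coordinate convention, and the threshold are illustrative assumptions.

```python
from typing import List, NamedTuple, Tuple


class Cnv(NamedTuple):
    """Hypothetical CNV record using half-open coordinates [start, end)."""
    chrom: str
    start: int
    end: int
    kind: str  # "gain" or "loss"


def reciprocal_overlap(a: Cnv, b: Cnv) -> float:
    """Shared length divided by the longer interval; 0.0 if the intervals are disjoint.

    A value >= f means the overlap covers at least a fraction f of both intervals.
    """
    if a.chrom != b.chrom:
        return 0.0
    overlap = min(a.end, b.end) - max(a.start, b.start)
    if overlap <= 0:
        return 0.0
    return overlap / max(a.end - a.start, b.end - b.start)


def match_cnvs(calls: List[Cnv], panel: List[Cnv],
               min_overlap: float = 0.5) -> List[Tuple[Cnv, Cnv]]:
    """Pair each subject CNV call with every panel CNV of the same kind that it overlaps enough."""
    return [
        (call, ref)
        for call in calls
        for ref in panel
        if call.kind == ref.kind and reciprocal_overlap(call, ref) >= min_overlap
    ]


if __name__ == "__main__":
    panel = [Cnv("chr1", 1_000, 2_000, "loss")]  # illustrative coordinates only
    calls = [
        Cnv("chr1", 1_200, 2_100, "loss"),  # 800 bp shared, 800/1000 = 0.8 -> match
        Cnv("chr1", 1_200, 2_100, "gain"),  # opposite direction -> no match
        Cnv("chr1", 1_900, 5_000, "loss"),  # 100 bp shared -> no match
    ]
    print(len(match_cnvs(calls, panel)))  # 1
```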
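Claims 14 and 16 quantify expression products for the claimed genes and compare them with a control level. One simple way to express that comparison is a per-gene log2 fold change with a pseudocount, flagging genes whose absolute change reaches a chosen threshold. This approach is an assumption, not something the claims prescribe. The function names, pseudocount, and threshold are illustrative, and a real assay would also need normalization and replicate-based statistics.

```python
import math
from typing import Dict


def log2_fold_changes(sample: Dict[str, float], control: Dict[str, float],
                      pseudocount: float = 1.0) -> Dict[str, float]:
    """log2((sample + pseudocount) / (control + pseudocount)) for genes measured in both."""
    return {
        gene: math.log2((sample[gene] + pseudocount) / (control[gene] + pseudocount))
        for gene in sample
        if gene in control
    }


def flag_changes(fold_changes: Dict[str, float], threshold: float = 1.0) -> Dict[str, str]:
    """Label genes whose nonzero log2 fold change has magnitude of at least `threshold`."""
    return {
        gene: "increased" if lfc > 0 else "decreased"
        for gene, lfc in fold_changes.items()
        if lfc != 0 and abs(lfc) >= threshold
    }


if __name__ == "__main__":
    # Illustrative expression values for hypothetical genes GENE_A..GENE_C.
    sample = {"GENE_A": 15.0, "GENE_B": 3.0, "GENE_C": 7.0}
    control = {"GENE_A": 7.0, "GENE_B": 7.0, "GENE_C": 7.0}
    print(flag_changes(log2_fold_changes(sample, control)))
    # {'GENE_A': 'increased', 'GENE_B': 'decreased'}
```

The pseudocount keeps the ratio defined when a gene is not detected in the control sample. Working on the log2 scale makes increases and decreases of the same factor symmetric around zero.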


Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
US201261633323P 2012-02-09 2012-02-09
US61/633,323 2012-02-09
PCT/US2013/025436 WO2013120018A1 (en) 2012-02-09 2013-02-08 Methods and compositions for screening and treating developmental disorders

Publications (2)

Publication Number Publication Date
CA2863887A1 CA2863887A1 (en) 2013-08-15
CA2863887C CA2863887C (en) 2023-01-03

Family

ID=48948073

Family Applications (1)

Application Number Title Priority Date Filing Date
CA2863887A Active CA2863887C (en) 2012-02-09 2013-02-08 Methods of screening low frequency gdna variation biomarkers for pervasive developmental disorder (pdd) or pervasive developmental disorder - not otherwise specified (pdd_nos)

Country Status (5)

Country Link
US (2) US10407724B2 (en)
EP (1) EP2812452B1 (en)
CA (1) CA2863887C (en)
DK (1) DK2812452T3 (en)
WO (1) WO2013120018A1 (en)

Families Citing this family (39)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10522240B2 (en) 2006-05-03 2019-12-31 Population Bio, Inc. Evaluating genetic disorders
US7702468B2 (en) 2006-05-03 2010-04-20 Population Diagnostics, Inc. Evaluating genetic disorders
EP2601609B1 (en) 2010-08-02 2017-05-17 Population Bio, Inc. Compositions and methods for discovery of causative mutations in genetic disorders
US10221454B2 (en) 2011-10-10 2019-03-05 The Hospital For Sick Children Methods and compositions for screening and treating developmental disorders
US11180807B2 (en) 2011-11-04 2021-11-23 Population Bio, Inc. Methods for detecting a genetic variation in attractin-like 1 (ATRNL1) gene in subject with Parkinson's disease
DK2812452T3 (en) 2012-02-09 2020-06-29 Population Bio Inc METHODS AND COMPOSITIONS FOR SCREENING AND TREATING DEVELOPMENT DISORDERS
US10039777B2 (en) 2012-03-20 2018-08-07 Neuro-Lm Sas Methods and pharmaceutical compositions of the treatment of autistic syndrome disorders
EP2895621B1 (en) 2012-09-14 2020-10-21 Population Bio, Inc. Methods and compositions for diagnosing, prognosing, and treating neurological conditions
US10233495B2 (en) 2012-09-27 2019-03-19 The Hospital For Sick Children Methods and compositions for screening and treating developmental disorders
US10328062B2 (en) 2014-04-04 2019-06-25 Amgen, Inc. Biomarkers and use of MET inhibitor for treatment of cancer
GB2558326B (en) 2014-09-05 2021-01-20 Population Bio Inc Methods and compositions for inhibiting and treating neurological conditions
US20210357481A1 (en) 2014-10-06 2021-11-18 State Farm Mutual Automobile Insurance Company Medical diagnostic-initiated insurance offering
US11574368B1 (en) 2014-10-06 2023-02-07 State Farm Mutual Automobile Insurance Company Risk mitigation for affinity groupings
US10664920B1 (en) 2014-10-06 2020-05-26 State Farm Mutual Automobile Insurance Company Blockchain systems and methods for providing insurance coverage to affinity groups
WO2016077273A1 (en) * 2014-11-11 2016-05-19 Q Therapeutics, Inc. Engineering mesenchymal stem cells using homologous recombination
US20170369945A1 (en) * 2014-12-29 2017-12-28 The Board Of Trustees Of The Leland Stanford Junior University Methods of diagnosing autism spectrum disorders
KR101745590B1 (en) * 2015-11-13 2017-06-09 고려대학교 산학협력단 Optical Microscopy for RNA Splicing on Single Molecules Using Scattering Intensity, Localized Surface Plasmon Resonance and Surface-Enhanced Raman Scattering
US11753682B2 (en) 2016-03-07 2023-09-12 Father Flanagan's Boys'Home Noninvasive molecular controls
US10240205B2 (en) 2017-02-03 2019-03-26 Population Bio, Inc. Methods for assessing risk of developing a viral disease using a genetic test
US10618932B2 (en) 2017-02-21 2020-04-14 Arizona Board Of Regents On Behalf Of Arizona State University Method for targeted protein quantification by bar-coding affinity reagent with unique DNA sequences
US11266677B2 (en) 2017-03-10 2022-03-08 Memorial Sloan Kettering Cancer Center Methods for treatment or prevention of leukemia
EP3652743A1 (en) * 2017-09-07 2020-05-20 Liposcience, Inc. Multi-parameter metabolic vulnerability index evaluations
US11591656B2 (en) 2017-09-07 2023-02-28 The Children's Hospital Of Philadelphia Association of genetic variations to diagnose and treat attention-deficit hyperactivity disorder (ADHD)
US20200354419A1 (en) * 2017-11-03 2020-11-12 Hunterian Medicine Llc Compositions and methods of use thereof for the treatment of duchenne muscular dystrophy
WO2019233921A1 (en) 2018-06-05 2019-12-12 F. Hoffmann-La Roche Ag Oligonucleotides for modulating atxn2 expression
HRP20221504T1 (en) 2018-08-08 2023-03-31 Pml Screening, Llc Methods for assessing the risk of developing progressive multifocal leukoencephalopathy caused by john cunningham virus by genetic testing
KR20210095859A (en) * 2018-09-25 2021-08-03 Emory University Nucleic Acids for Cell Recognition and Integration
US11286485B2 (en) 2019-04-04 2022-03-29 Hoffmann-La Roche Inc. Oligonucleotides for modulating ATXN2 expression
CN110777204B (en) * 2019-11-23 2023-04-07 The Third Xiangya Hospital of Central South University Application of MAFG-AS1 knock-out reagent in preparation of medicines for treating bladder cancer
US11434493B2 (en) 2020-02-05 2022-09-06 Viridos, Inc. Regulatory sequences for expression of transgenes
CA3173971A1 (en) * 2020-03-06 2021-09-10 Regeneron Pharmaceuticals, Inc. Fascin-2 (fscn2) variants and uses thereof
WO2021236832A1 (en) * 2020-05-19 2021-11-25 Nucleus Biologics Media formulations and production
CN111784585B (en) * 2020-09-07 2020-12-15 Chengdu Zongheng Automation Technology Co., Ltd. Image splicing method and device, electronic equipment and computer readable storage medium
WO2022094720A1 (en) * 2020-11-06 2022-05-12 The Hospital For Sick Children System and method for cancer-cell specific transcription identification
WO2022240762A1 (en) * 2021-05-10 2022-11-17 University Of Iowa Research Foundation Targeted massively parallel sequencing for screening of genetic hearing loss and congenital cytomegalovirus- associated hearing loss
WO2023159225A2 (en) * 2022-02-18 2023-08-24 Precidiag, Inc. Microbial signatures of autism spectrum disorder
WO2023244737A1 (en) * 2022-06-16 2023-12-21 Immunovec, Inc. Improved enhancers and vectors
WO2024092095A1 (en) * 2022-10-27 2024-05-02 The Broad Institute, Inc. Systems, methods, and compositions for treating vascular disease
CN116083458B (en) * 2023-02-20 2024-06-11 Xiangya Hospital of Central South University Mucopolysaccharide storage disease IIIC pathogenic mutant gene and application thereof

Family Cites Families (161)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US3625214A (en) 1970-05-18 1971-12-07 Alza Corp Drug-delivery device
US4906474A (en) 1983-03-22 1990-03-06 Massachusetts Institute Of Technology Bioerodible polyanhydrides for controlled drug delivery
US4683195A (en) 1986-01-30 1987-07-28 Cetus Corporation Process for amplifying, detecting, and/or-cloning nucleic acid sequences
US4789734A (en) 1985-08-06 1988-12-06 La Jolla Cancer Research Foundation Vitronectin specific cell receptor derived from mammalian mesenchymal tissue
US5023252A (en) 1985-12-04 1991-06-11 Conrex Pharmaceutical Corporation Transdermal and trans-membrane delivery of drugs
NL8720442A (en) 1986-08-18 1989-04-03 Clinical Technologies Ass DELIVERY SYSTEMS FOR PHARMACOLOGICAL AGENTS.
US6024983A (en) 1986-10-24 2000-02-15 Southern Research Institute Composition for delivering bioactive agents for immune response and its preparation
US5075109A (en) 1986-10-24 1991-12-24 Southern Research Institute Method of potentiating an immune response
US4992445A (en) 1987-06-12 1991-02-12 American Cyanamid Co. Transdermal delivery of pharmaceuticals
US5001139A (en) 1987-06-12 1991-03-19 American Cyanamid Company Enchancers for the transdermal flux of nivadipine
US4897268A (en) 1987-08-03 1990-01-30 Southern Research Institute Drug delivery system and method of making the same
GB8810400D0 (en) 1988-05-03 1988-06-08 Southern E Analysing polynucleotide sequences
US5700637A (en) 1988-05-03 1997-12-23 Isis Innovation Limited Apparatus and method for analyzing polynucleotide sequences and method of generating oligonucleotide arrays
US6054270A (en) 1988-05-03 2000-04-25 Oxford Gene Technology Limited Analying polynucleotide sequences
ATE151110T1 (en) 1988-09-02 1997-04-15 Protein Eng Corp PRODUCTION AND SELECTION OF RECOMBINANT PROTEINS WITH DIFFERENT BINDING SITES
US5223409A (en) 1988-09-02 1993-06-29 Protein Engineering Corp. Directed evolution of novel binding proteins
US5750373A (en) 1990-12-03 1998-05-12 Genentech, Inc. Enrichment method for variant proteins having altered binding properties, M13 phagemids, and growth hormone variants
US5776434A (en) 1988-12-06 1998-07-07 Riker Laboratories, Inc. Medicinal aerosol formulations
US5744101A (en) 1989-06-07 1998-04-28 Affymax Technologies N.V. Photolabile nucleoside protecting groups
US6040138A (en) 1995-09-15 2000-03-21 Affymetrix, Inc. Expression monitoring by hybridization to high density oligonucleotide arrays
US5424186A (en) 1989-06-07 1995-06-13 Affymax Technologies N.V. Very large scale immobilized polymer synthesis
US5527681A (en) 1989-06-07 1996-06-18 Affymax Technologies N.V. Immobilized molecular synthesis of systematically substituted compounds
US5143854A (en) 1989-06-07 1992-09-01 Affymax Technologies N.V. Large scale photolithographic solid phase synthesis of polypeptides and receptor binding screening thereof
DE69032809T2 (en) 1989-11-06 1999-07-08 Cell Genesys Inc Production of proteins using homologous recombination
US5252743A (en) 1989-11-13 1993-10-12 Affymax Technologies N.V. Spatially-addressable immobilization of anti-ligands on surfaces
US5272071A (en) 1989-12-22 1993-12-21 Applied Research Systems Ars Holding N.V. Method for the modification of the expression characteristics of an endogenous gene of a given cell line
US5427908A (en) 1990-05-01 1995-06-27 Affymax Technologies N.V. Recombinant library screening methods
WO1992020791A1 (en) 1990-07-10 1992-11-26 Cambridge Antibody Technology Limited Methods for producing members of specific binding pairs
GB9015198D0 (en) 1990-07-10 1990-08-29 Brien Caroline J O Binding substance
ES2155822T3 (en) 1990-12-06 2001-06-01 Affymetrix Inc COMPOUNDS AND ITS USE IN A BINARY SYNTHESIS STRATEGY.
US5190029A (en) 1991-02-14 1993-03-02 Virginia Commonwealth University Formulation for delivery of drugs by metered dose inhalers with reduced or no chlorofluorocarbon content
CA2105300C (en) 1991-03-01 2008-12-23 Robert C. Ladner Process for the development of binding mini-proteins
JP3672306B2 (en) 1991-04-10 2005-07-20 The Scripps Research Institute Heterodimeric receptor library using phagemids
DE4122599C2 (en) 1991-07-08 1993-11-11 Deutsches Krebsforsch Phagemid for screening antibodies
US5384261A (en) 1991-11-22 1995-01-24 Affymax Technologies N.V. Very large scale immobilized polymer synthesis using mechanically directed flow paths
US7534567B2 (en) 1992-03-04 2009-05-19 The Regents Of The University Of California Detection of nucleic acid sequence differences by comparative genomic hybridization
ATE205542T1 (en) 1992-03-04 2001-09-15 Univ California COMPARATIVE GENOME HYBRIDIZATION
US5541061A (en) 1992-04-29 1996-07-30 Affymax Technologies N.V. Methods for screening factorial chemical libraries
US5376359A (en) 1992-07-07 1994-12-27 Glaxo, Inc. Method of stabilizing aerosol formulations
US5288514A (en) 1992-09-14 1994-02-22 The Regents Of The University Of California Solid phase and combinatorial synthesis of benzodiazepine compounds on a solid support
US20040197774A1 (en) 1992-11-12 2004-10-07 Michael Wigler Representational approach to DNA analysis
US5928647A (en) 1993-01-11 1999-07-27 Dana-Farber Cancer Institute Inducing cytotoxic T lymphocyte responses
US5858659A (en) 1995-11-29 1999-01-12 Affymetrix, Inc. Polymorphism detection
US5837832A (en) 1993-06-25 1998-11-17 Affymetrix, Inc. Arrays of nucleic acid probes on biological chips
AU8126694A (en) 1993-10-26 1995-05-22 Affymax Technologies N.V. Arrays of nucleic acid probes on biological chips
DE69527585T2 (en) 1994-06-08 2003-04-03 Affymetrix Inc Method and device for packaging chips
US6287850B1 (en) 1995-06-07 2001-09-11 Affymetrix, Inc. Bioarray chip reaction apparatus and its manufacture
US6300063B1 (en) 1995-11-29 2001-10-09 Affymetrix, Inc. Polymorphism detection
EP2369007B1 (en) 1996-05-29 2015-07-29 Cornell Research Foundation, Inc. Detection of nucleic acid sequence differences using coupled ligase detection and polymerase chain reactions
DE19782097T1 (en) 1996-11-06 1999-10-14 Sequenom Inc Compositions and methods for immobilizing nucleic acids on solid supports
EP0878552A1 (en) 1997-05-13 1998-11-18 Erasmus Universiteit Rotterdam Molecular detection of chromosome aberrations
US6210878B1 (en) 1997-08-08 2001-04-03 The Regents Of The University Of California Array-based detection of genetic alterations associated with disease
CA2307674C (en) 1997-10-30 2013-02-05 Cold Spring Harbor Laboratory Probe arrays and methods of using probe arrays for distinguishing dna
US6207392B1 (en) 1997-11-25 2001-03-27 The Regents Of The University Of California Semiconductor nanocrystal probes for biological applications and process for making and using such probes
US20090304653A1 (en) 1998-01-30 2009-12-10 Evolutionary Genomics, Inc. Methods to identify polynucleotide and polypeptide sequences which may be associated with physiological and medical conditions
US6429027B1 (en) 1998-12-28 2002-08-06 Illumina, Inc. Composite arrays utilizing microspheres
US20030207295A1 (en) 1999-04-20 2003-11-06 Kevin Gunderson Detection of nucleic acid reactions on bead arrays
CA2381451A1 (en) 1999-08-30 2001-03-08 Human Genome Sciences, Inc. Attractin-like polynucleotides, polypeptides, and antibodies
US6146834A (en) 1999-09-10 2000-11-14 The United States Of America As Represented By The Secretary Of Agriculture PCR primers for detection of plant pathogenic species and subspecies of acidovorax
US6423499B1 (en) 1999-09-10 2002-07-23 The United States Of America, As Represented By The Secretary Of Agriculture PCR primers for detection and identification of plant pathogenic species, subspecies, and strains of acidovorax
US7211390B2 (en) 1999-09-16 2007-05-01 454 Life Sciences Corporation Method of sequencing a nucleic acid
US7244559B2 (en) 1999-09-16 2007-07-17 454 Life Sciences Corporation Method of sequencing a nucleic acid
US7030231B1 (en) 1999-09-30 2006-04-18 Catalyst Biosciences, Inc. Membrane type serine protease 1 (MT-SP1) and uses thereof
US6251607B1 (en) 1999-12-09 2001-06-26 National Science Council Of Republic Of China PCR primers for the rapid and specific detection of Salmonella typhimurium
US6892141B1 (en) 2000-03-17 2005-05-10 Hitachi, Ltd. Primer design system
EP1248832A4 (en) 2000-01-21 2004-07-07 Variagenics Inc Identification of genetic components of drug response
US6828097B1 (en) 2000-05-16 2004-12-07 The Childrens Mercy Hospital Single copy genomic hybridization probes and method of generating same
JP4287652B2 (en) 2000-10-24 2009-07-01 The Board of Trustees of the Leland Stanford Junior University Characterization of genomic DNA by direct multiple processing
US20040018491A1 (en) 2000-10-26 2004-01-29 Kevin Gunderson Detection of nucleic acid reactions on bead arrays
AU785425B2 (en) 2001-03-30 2007-05-17 Genetic Technologies Limited Methods of genomic analysis
GB0113908D0 (en) 2001-06-07 2001-08-01 Univ London Designing degenerate PCR primers
US20030049663A1 (en) 2001-06-27 2003-03-13 Michael Wigler Use of reflections of DNA for genetic analysis
US6951761B2 (en) 2001-08-31 2005-10-04 The United States Of America As Represented By The Department Of Health And Human Services Measurements of multiple molecules using a CryoArray
US20030082606A1 (en) 2001-09-04 2003-05-01 Lebo Roger V. Optimizing genome-wide mutation analysis of chromosomes and genes
US6977148B2 (en) 2001-10-15 2005-12-20 Qiagen Gmbh Multiple displacement amplification
US6902921B2 (en) 2001-10-30 2005-06-07 454 Corporation Sulfurylase-luciferase fusion proteins and thermostable sulfurylase
US20050124022A1 (en) 2001-10-30 2005-06-09 Maithreyan Srinivasan Novel sulfurylase-luciferase fusion proteins and thermostable sulfurylase
US7107155B2 (en) 2001-12-03 2006-09-12 Dnaprint Genomics, Inc. Methods for the identification of genetic features for complex genetics classifiers
US6916621B2 (en) 2002-03-27 2005-07-12 Spectral Genomics, Inc. Methods for array-based comparitive binding assays
US7282330B2 (en) 2002-05-28 2007-10-16 U.S. Genomics, Inc. Methods and apparati using single polymer analysis
AU2003262789A1 (en) 2002-08-20 2004-03-11 Aventis Pharma Sa Abca13 nucleic acids and proteins, and uses thereof
US7011949B2 (en) 2002-09-30 2006-03-14 Agilent Technologies, Inc. Methods and compositions for producing labeled probe nucleic acids for use in array based comparative genomic hybridization applications
US7822555B2 (en) 2002-11-11 2010-10-26 Affymetrix, Inc. Methods for identifying DNA copy number changes
US10229244B2 (en) 2002-11-11 2019-03-12 Affymetrix, Inc. Methods for identifying DNA copy number changes using hidden markov model based estimations
US7424368B2 (en) 2002-11-11 2008-09-09 Affymetix, Inc. Methods for identifying DNA copy number changes
EP2159285B1 (en) 2003-01-29 2012-09-26 454 Life Sciences Corporation Methods of amplifying and sequencing nucleic acids
JP2006519440A (en) 2003-02-14 2006-08-24 InterGenetics Incorporated Statistical identification of increased risk of disease
US8694263B2 (en) 2003-05-23 2014-04-08 Cold Spring Harbor Laboratory Method of identifying virtual representations of nucleotide sequences
EP1636337A4 (en) 2003-06-20 2007-07-04 Illumina Inc Methods and compositions for whole genome amplification and genotyping
KR100647277B1 (en) 2003-08-14 2006-11-17 Samsung Electronics Co., Ltd. PCR primer set for a detection of hepatitis B and a hepatitis B detection kit comprising the same
US20070141577A1 (en) 2003-09-11 2007-06-21 Moore Thomas F Method
CA2544041C (en) 2003-10-28 2015-12-08 Bioarray Solutions Ltd. Optimization of gene expression analysis using immobilized capture probes
US7169560B2 (en) 2003-11-12 2007-01-30 Helicos Biosciences Corporation Short cycle methods for sequencing polynucleotides
JP2008504803A (en) 2004-01-09 2008-02-21 The Regents of the University of California Cell type-specific pattern of gene expression
US20050233339A1 (en) 2004-04-20 2005-10-20 Barrett Michael T Methods and compositions for determining the relationship between hybridization signal of aCGH probes and target genomic DNA copy number
WO2005108621A1 (en) 2004-04-30 2005-11-17 Yale University Methods and compositions for cancer diagnosis
WO2005108997A1 (en) 2004-05-11 2005-11-17 Bayer Healthcare Ag Diagnostics and therapeutics for diseases associated with dipeptidyl-peptidase 6 (dpp6)
US20060024711A1 (en) 2004-07-02 2006-02-02 Helicos Biosciences Corporation Methods for nucleic acid amplification and sequence determination
US20060012793A1 (en) 2004-07-19 2006-01-19 Helicos Biosciences Corporation Apparatus and methods for analyzing samples
US7276720B2 (en) 2004-07-19 2007-10-02 Helicos Biosciences Corporation Apparatus and methods for analyzing samples
US20060024678A1 (en) 2004-07-28 2006-02-02 Helicos Biosciences Corporation Use of single-stranded nucleic acid binding proteins in sequencing
US7595159B2 (en) 2004-11-03 2009-09-29 The Brigham And Women's Hospital, Inc. Prediction of Parkinson's disease using gene expression levels of peripheral blood samples
NO323175B1 (en) 2004-12-23 2007-01-15 Jan O Aasly Procedure for showing a mutation that causes hereditary parkinsonism
KR20080016789A (en) 2005-02-18 2008-02-22 The Government of the United States of America as represented by the Secretary of the Department of Health and Human Services Identification of molecular diagnostic markers for endometriosis in blood lymphocytes
CN100340674C (en) 2005-04-28 2007-10-03 Chinese PLA General Hospital Deaf-related gene mutation and its detecting method
EP1889058A4 (en) 2005-05-05 2008-10-29 Mount Sinai Hospital Corp Diagnosis and treatment of endometriosis
CA2633203A1 (en) 2005-12-14 2007-06-21 Michael H. Wigler Use of roma for characterizing genomic rearrangements
EP1969514A2 (en) 2005-12-14 2008-09-17 Cold Spring Harbor Laboratory Methods for assessing probabilistic measures of clinical outcome using genomic profiling
CA2641513C (en) 2006-02-28 2021-09-21 Elan Pharmaceuticals, Inc. Methods of treating inflammatory and autoimmune diseases with natalizumab
ES2337207T3 (en) 2006-04-12 2010-04-21 Medical Research Council PROCEDURE TO DETERMINE THE NUMBER OF COPIES.
US7702468B2 (en) 2006-05-03 2010-04-20 Population Diagnostics, Inc. Evaluating genetic disorders
US10522240B2 (en) 2006-05-03 2019-12-31 Population Bio, Inc. Evaluating genetic disorders
CN101148684A (en) 2006-09-22 2008-03-26 Ruijin Hospital, Shanghai Jiao Tong University School of Medicine Method for detecting multiple endocrine adenoma II gene mutation
CA2667909A1 (en) 2006-10-31 2008-05-08 Janssen Pharmaceutica N.V. Treatment of pervasive developmental disorders
US20080131887A1 (en) 2006-11-30 2008-06-05 Stephan Dietrich A Genetic Analysis Systems and Methods
US8262900B2 (en) 2006-12-14 2012-09-11 Life Technologies Corporation Methods and apparatus for measuring analytes using large scale FET arrays
EP2092322B1 (en) 2006-12-14 2016-02-17 Life Technologies Corporation Methods and apparatus for measuring analytes using large scale fet arrays
US11287425B2 (en) 2009-04-22 2022-03-29 Juneau Biosciences, Llc Genetic markers associated with endometriosis and use thereof
GB2463833B (en) 2007-06-26 2012-02-08 Parkinson S Inst Compositions that prevent or reverse alpha-synuclein fibrillation for use in the treatment of neurological disorders
EP2200611A4 (en) 2007-09-14 2010-09-22 Biogen Idec Inc Compositions and methods for the treatment of progressive multifocal leukoencephalopathy (pml)
WO2009043178A1 (en) * 2007-10-04 2009-04-09 The Hospital For Sick Children Biomarkers for autism spectrum disorders
JP2011505579A (en) 2007-12-04 2011-02-24 University of Miami Molecular targets for modulating intraocular pressure and distinguishing steroid responders from non-responders
US8996318B2 (en) 2007-12-28 2015-03-31 Pioneer Hi-Bred International, Inc. Using oligonucleotide microarrays to analyze genomic differences for the prediction of heterosis
US8470594B2 (en) 2008-04-15 2013-06-25 President And Fellows Of Harvard College Methods for identifying agents that affect the survival of motor neurons
WO2010008486A2 (en) 2008-06-24 2010-01-21 Parkinsons Institute Pluripotent cell lines and methods of use thereof
US20110111419A1 (en) 2008-07-04 2011-05-12 deCODE Geneties ehf. Copy Number Variations Predictive of Risk of Schizophrenia
EP2149613A1 (en) 2008-07-28 2010-02-03 Greenwood Genetic Center, Inc. Methods for determining dysregulation of methylation of brain expressed genes on the X chromosome to diagnose autism spectrum disorders
US20100035252A1 (en) 2008-08-08 2010-02-11 Ion Torrent Systems Incorporated Methods for sequencing individual nucleic acids under tension
EP2344672B8 (en) 2008-09-25 2014-12-24 SureGene LLC Genetic markers for optimizing treatment for schizophrenia
US20100137143A1 (en) 2008-10-22 2010-06-03 Ion Torrent Systems Incorporated Methods and apparatus for measuring analytes
CN101403008B (en) 2008-11-07 2011-07-27 Beijing Neurosurgical Institute Gene diagnosis reagent kit for detecting adults progeria syndrome
EP2358907B1 (en) 2008-11-12 2015-05-20 University of Utah Research Foundation Autism associated genetic markers
JP2012511895A (en) 2008-11-14 2012-05-31 The Children's Hospital of Philadelphia Genetic variants responsible for human cognition and methods of using them as diagnostic and therapeutic targets
KR20110114664A (en) 2009-01-16 2011-10-19 Massachusetts Institute of Technology Diagnosis and treatment of autism spectrum disorders
US9493834B2 (en) 2009-07-29 2016-11-15 Pharnext Method for detecting a panel of biomarkers
JP5686335B2 (en) 2009-08-25 2015-03-18 Hiroshima University Diagnostic marker and method for amyotrophic lateral sclerosis, model animal that develops amyotrophic lateral sclerosis, and model cell
US20120172251A1 (en) 2009-09-16 2012-07-05 The Regents Of The University Of Colorado, A Body Corporate Methods and compositions for diagnosing heart failure
WO2011065982A2 (en) 2009-11-30 2011-06-03 23Andme, Inc. Polymorphisms associated with parkinson's disease
US20130123124A1 (en) * 2010-03-12 2013-05-16 Children's Medical Center Corporation Methods and compositions for characterizing autism spectrum disorder based on gene expression patterns
WO2012006291A2 (en) 2010-07-06 2012-01-12 Life Technologies Corporation Systems and methods to detect copy number variation
EP2601609B1 (en) 2010-08-02 2017-05-17 Population Bio, Inc. Compositions and methods for discovery of causative mutations in genetic disorders
JP5867392B2 (en) 2010-08-17 2016-02-24 Toppan Printing Co., Ltd. Compound semiconductor thin film ink and method for producing solar cell
WO2012027491A1 (en) 2010-08-24 2012-03-01 The Children's Hospital Of Philadelphia Association of rare recurrent genetic variations to attention-deficit, hyperactivity disorder (adhd) and methods of use thereof for the diagnosis and treatment of the same
US20120048778A1 (en) 2010-08-25 2012-03-01 Catalytic Distillation Technologies Selective desulfurization of fcc gasoline
CA2744424A1 (en) 2010-09-14 2012-03-14 The Hospital For Sick Children Biomarkers for autism spectrum disorders
JP2013538589A (en) 2010-10-07 2013-10-17 The Johns Hopkins University Compositions and methods for diagnosing autism
ES2387358B1 (en) 2011-02-25 2013-08-02 Fundación Alzheimur PROCEDURE FOR THE DETERMINATION OF THE GENETIC PREDISPOSITION TO THE DISEASE OF PARKINSON.
WO2012135468A2 (en) 2011-03-29 2012-10-04 Cornell University Genetics of gender discrimination in date palm
US10221454B2 (en) 2011-10-10 2019-03-05 The Hospital For Sick Children Methods and compositions for screening and treating developmental disorders
US8798710B2 (en) 2011-10-19 2014-08-05 Cognionics, Inc. Apparatuses, systems and methods for biopotential sensing with dry electrodes
US11180807B2 (en) 2011-11-04 2021-11-23 Population Bio, Inc. Methods for detecting a genetic variation in attractin-like 1 (ATRNL1) gene in subject with Parkinson's disease
CA2854779A1 (en) 2011-11-10 2013-05-16 Genentech, Inc. Methods for treating, diagnosing and monitoring alzheimer's disease
DK2812452T3 (en) 2012-02-09 2020-06-29 Population Bio Inc METHODS AND COMPOSITIONS FOR SCREENING AND TREATING DEVELOPMENT DISORDERS
US9481889B2 (en) 2012-03-19 2016-11-01 The Malasian Palm Oil Board Gene controlling shell phenotype in palm
US10995342B2 (en) 2012-05-11 2021-05-04 Wisconsin Alumni Research Foundation Rhg1 mediated resistance to soybean cyst nematode
US20140024541A1 (en) 2012-07-17 2014-01-23 Counsyl, Inc. Methods and compositions for high-throughput sequencing
EP2895621B1 (en) 2012-09-14 2020-10-21 Population Bio, Inc. Methods and compositions for diagnosing, prognosing, and treating neurological conditions
US10233495B2 (en) 2012-09-27 2019-03-19 The Hospital For Sick Children Methods and compositions for screening and treating developmental disorders
CN103436606B (en) 2013-08-01 2014-10-29 Sun Yat-sen University Cancer Center Kit for auxiliary diagnosis and/or prognosis judgment of esophageal carcinoma
WO2015131078A1 (en) 2014-02-27 2015-09-03 Biogen Ma Inc. Method of assessing risk of pml
US9909167B2 (en) 2014-06-23 2018-03-06 The Board Of Trustees Of The Leland Stanford Junior University On-slide staining by primer extension

Also Published As

Publication number Publication date
US11174516B2 (en) 2021-11-16
CA2863887A1 (en) 2013-08-15
EP2812452A4 (en) 2016-03-02
DK2812452T3 (en) 2020-06-29
WO2013120018A1 (en) 2013-08-15
EP2812452B1 (en) 2020-05-27
US10407724B2 (en) 2019-09-10
US20200199674A1 (en) 2020-06-25
EP2812452A1 (en) 2014-12-17
US20140161721A1 (en) 2014-06-12

Similar Documents

Publication Publication Date Title
CA2863887C (en) Methods of screening low frequency gdna variation biomarkers for pervasive developmental disorder (pdd) or pervasive developmental disorder - not otherwise specified (pdd_nos)
US11920199B2 (en) Methods and compositions for screening and treating developmental disorders
US11618925B2 (en) Methods and compositions for screening and treating developmental disorders
Ogino et al. Spinal muscular atrophy: molecular genetics and diagnostics
JP6078211B2 (en) Genetic changes associated with autism and the phenotype of autism and its use for diagnosis and treatment of autism
US20220380851A1 (en) Methods and compositions for diagnosing, prognosing, and treating endometriosis
JP6216486B2 (en) Association of low-frequency recurrent genetic variation with attention-deficit / hyperactivity disorder and its use for diagnosis and treatment
JP2011509096A (en) Mutations in contact-related protein 2 (CNTNAP2) associated with increased risk of idiopathic autism
US20070249518A1 (en) Compositions and Methods for Treating Mental Disorders
de Morais Unravelling New Spastic Paraplegia Genes and Their Functions Through Next Generation Sequencing

Legal Events

Date Code Title Description
EEER Examination request

Effective date: 20180111
