WO2005123951A2 - Methods of human leukocyte antigen typing by neighboring single nucleotide polymorphism haplotypes - Google Patents


Info

Publication number
WO2005123951A2
Authority
WO
WIPO (PCT)
Prior art keywords
hla
allele
snp
determining
haplotype
Application number
PCT/US2005/017958
Other languages
French (fr)
Other versions
WO2005123951A3 (en
Inventor
Emily Walsh
John D. Rioux
Eric S. Lander
Original Assignee
Whitehead Institute For Biomedical Research
Application filed by Whitehead Institute For Biomedical Research
Publication of WO2005123951A2
Publication of WO2005123951A3


Classifications

    • C: CHEMISTRY; METALLURGY
    • C12: BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q: MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00: Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68: Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C12Q1/6876: Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes
    • C12Q1/6881: Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes for tissue or cell typing, e.g. human leukocyte antigen [HLA] probes
    • C12Q2600/00: Oligonucleotides characterized by their use
    • C12Q2600/156: Polymorphic or mutational markers

Definitions

  • HLA: Human Leukocyte Antigen
  • HLA loci have been associated with many human autoimmune and inflammatory diseases, and many research laboratories genotype their human subjects for these loci as a matter of course.
  • the HLA loci were originally studied by lower resolution serotyping techniques until the recent advent of "dot blot" hybridization-based molecular typing such as SSOP and SSP (Dynal Biotech, Biotest, One Lambda) that greatly improved examination of the region. Direct sequencing of HLA alleles is also possible.
  • these current methods are laborious and expensive. Accordingly, novel approaches to map the HLA loci in the context of the MHC region are desirable.
  • the invention provides a more uniform, comprehensive map of commonly linked variation, e.g., a haplotype map, that will help to discriminate between causal alleles and variation that is merely in linkage disequilibrium (LD) with them. Such a resource will also allow a more complete description of the haplotype structure and, potentially, insight into the evolutionary and recombinational history of the chromosomal region in question.
  • the invention provides an integrated SNP-haplotype map of a 4-Mb major histocompatibility complex (MHC) region.
  • MHC: major histocompatibility complex
  • The integrated map comprises SNPs that are preferably reliable, polymorphic, and evenly spaced, e.g., one SNP every 20 kb.
  • The integrated map further comprises genotyped HLA genes, TAP genes, microsatellites, or a combination thereof.
  • the invention further features a novel method of genotyping Human Leukocyte Antigen (HLA) genes using patterns of neighboring single nucleotide polymorphisms (SNPs).
  • SNPs: single nucleotide polymorphisms
  • The SNP-based method is an improvement over existing hybridization-based techniques, as it allows quick and inexpensive genotyping of the HLA loci. This method does not directly assess the intra-gene variation, as is done by all other current methods for HLA genotyping, but rather defines HLA genotypes by studying the neighboring extra-genic variation(s), which fall outside the HLA allele to be genotyped and which, due to LD patterns, are conveniently linked to the HLA loci.
  • This approach can be employed to map variation(s) in the regions neighboring HLA genes to fully resolve all known common HLA gene variants in multiple different ethnic populations.
  • This method can benefit clinical laboratories typing individuals for transplantation procedures, as well as research laboratories that are interested in studying HLA gene variation(s) in particular patient populations or disease associations.
  • this method can be employed to predict the likelihood or probability of developing a disease, particularly MHC-linked diseases or autoimmune diseases.
  • this method can be employed to predict the likelihood or probability of developing an immune response, e.g., a response against infection or a host-graft response (e.g., elicited by organ transplantation) in a subject, preferably a human subject.
  • One aspect of the invention provides a method of genotyping an HLA gene, such as, for example, an HLA-A or an HLA-DRB1 gene.
  • the method comprises determining the nucleotide present at one or more extra-genic SNP sites, wherein the SNP is associated with an HLA genotype.
  • the extra-genic SNP sites correspond to the HLA allele to be genotyped, that is, the SNP sites are outside and in the neighboring region(s) of the HLA allele to be genotyped.
  • For example, an extra-genic SNP to be assessed that corresponds to the HLA-A allele can be rs2517862, rs1655930, rs1616549, rs376253, rs1961135, rs2517706, rs2517701, rs2517699, rs435766, rs410909, rs2394255, rs1264807, rs2530388, rs356963, rs2286405, rs2240619, rs3129012, rs259938, or any combination thereof.
  • Another example involves genotyping the HLA-DRB1 allele, wherein an extra-genic SNP to be assessed can be rs742697, rs523627, rs3129960, rs2395163, rs2395165, rs983561, rs2239804, rs2213584, rs2395182, rs2858860, rs3129907, rs1059544, rs1987529, or any combination thereof.
  • Another aspect of the invention provides a method of predicting or assisting in predicting the likelihood of developing a disease, in particular an inflammatory disease, an MHC-linked disease, or an autoimmune disease, in a subject, preferably a human subject.
  • The method comprises genotyping an HLA gene in the subject to be tested by determining the nucleotide present at one or more extra-genic SNP sites, wherein the SNP is associated with the HLA genotype.
  • a further aspect of the invention provides a method of predicting or assisting in predicting the likelihood of developing an immune response in a subject, preferably a human subject.
  • An immune response may be developed against an infection or inflammation.
  • an immune response may comprise a host-graft response, e.g., rejection of organ transplants.
  • the method comprises genotyping an HLA gene in the subject to be tested by determining the nucleotide present at one or more extra-genic SNP sites, wherein the SNP is associated with an HLA genotype.
  • Figures 1A-1E show an integrated SNP map of the 4-Mb MHC in CEPH Europeans.
  • Figure 1A shows the location and exon-intron structure for a subset of genes above the map, for positional reference.
  • Figure 1B shows 201 reliable, polymorphic SNPs, indicated on the map with ticks below the line. Ticks above the line are placed with 100-kb spacing.
  • Figure 1C shows haplotype blocks below and common haplotype variants (13% frequency) shown as colored lines (thickness indicates relative population frequency).
  • Figure 1E shows the relative recombination rate, based on the sperm meiotic map, in bar-graph form; the value on the line is the regional average, 0.49 cM/Mb. Green bars indicate recombination rates >0.49 cM/Mb, and yellow bars indicate rates <0.49 cM/Mb.
  • Figures 2A-2D show block comparisons between the MHC and other autosomal regions.
  • Figure 2A shows a plot of LD by physical distance revealing that LD is extended in the MHC.
  • Figure 2B shows that the average physical length of blocks in the MHC is longer than in the rest of the genome.
  • Figure 2C shows that, measured by genetic distance, block size in the MHC is somewhat less than in the rest of the genome.
  • Figure 2D shows that the number of haplotype variants in blocks not spanning classical HLA genes is the same as elsewhere in the genome.
  • FIG. 3A-3C shows EHH analysis of haplotype blocks, microsatellites, HLA genes, and TAP genes in the region.
  • EHH is computed as the percentage of instances in which two randomly selected chromosomes with the same variant locus have identical alleles at all SNPs assayed up to a particular distance from that locus (e.g., an EHH of 0.5 at marker X means that 50% of possible pairings of a particular variant exhibit sequence identity from the locus to marker X).
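The EHH definition above can be sketched in a few lines of Python. This is a minimal illustration, assuming phased chromosomes encoded as equal-length strings with one allele per marker in map order; the function name `ehh` and the toy haplotypes are hypothetical, and distance is expressed as a marker index rather than the centimorgan distances used in the figures.

```python
from itertools import combinations


def ehh(haplotypes, core_index, core_allele, end_index):
    """Extended haplotype homozygosity (EHH) for one core variant.

    haplotypes:  phased chromosomes, each a string with one allele per marker in map order
    core_index:  column of the core locus
    core_allele: the variant at the core locus whose EHH is wanted
    end_index:   column of marker X (on either side of the core)

    Returns the fraction of all pairs of chromosomes carrying core_allele that are
    identical at every marker from the core through marker X (inclusive).
    """
    carriers = [h for h in haplotypes if h[core_index] == core_allele]
    lo, hi = sorted((core_index, end_index))
    pairs = list(combinations(carriers, 2))
    if not pairs:
        return float("nan")  # EHH is undefined with fewer than two carriers
    identical = sum(a[lo:hi + 1] == b[lo:hi + 1] for a, b in pairs)
    return identical / len(pairs)


# Toy data (hypothetical): four chromosomes typed at four markers.
haps = ["AAGT", "AAGC", "AAGT", "CAGT"]
print(ehh(haps, 0, "A", 2))  # 1.0: all three carrier pairs match through marker 2
print(ehh(haps, 0, "A", 3))  # only 1 of 3 carrier pairs still matches through marker 3
```

Multiplying the result by 100 gives the percentage form used in the text; an EHH of 0.5 at marker X means half of the carrier pairs are identical from the core locus to X.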
  • Figure 3A shows points representing the EHH at a distance of 0.25 cM from an allele at a particular locus. Outlying variants are indicated in color. The nine outlying variants define three extended haplotypes.
  • The six points labeled "1" indicate variants that map on the DRB1*1501 haplotype (associated with lupus and MS).
  • the two overlapping points labeled as “2” indicate variants C*0701 and D6S2840*219, which are both found on a haplotype associated with autoimmune diabetes, lupus, and hepatitis.
  • the point labeled as "3" indicates DRB1*1101 (associated with pemphigoid disease).
  • Figure 3B shows a recombination-distance-based map of the region. Microsatellites/genes are labeled and indicated with ticks above the line.
  • Figure 3C shows EHH values for loci that have at least one outlying variant.
  • EHH values are converted to grayscale values: an EHH of 1 is shown as black, and an EHH of 0.5 as 50% grayscale.
  • the solid lines 4-10 indicate the locus about which values were derived.
  • the dotted lines 11-17 and 11'-17' indicate 0.25-cM distance at which outliers were assessed.
  • Two HLA-C alleles, C*0702 and C*0701, are extended, as are two DRB1 alleles, DRB1*1501 and DRB1*1101.
  • the other HLA gene alleles with extended haplotypes are DQA1*0102 and DQB1*0602.
  • FIGS. 4A-4B show the correlation of HLA alleles to SNP haplotype background. A map of the region showing the placement of the SNPs and haplotypes assayed is shown for reference. Multi-SNP haplotypes are coded by single capital letters.
  • Figure 4A shows SNP-HLA haplotypes sorted by HLA allele. Percents indicate the percentage of a particular HLA allele that falls on the indicated SNP haplotype.
  • Figure 4B shows SNP-HLA haplotypes sorted by SNP haplotype allele. Percents indicate the percentage of a particular SNP haplotype allele that bears the indicated HLA allele. Counts are the overall number of chromosomes bearing the indicated SNP-HLA haplotype.
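The two kinds of percentages in Figures 4A and 4B are conditional frequencies taken over the same table of chromosome counts, normalized in opposite directions. The sketch below assumes the counts are held in a dictionary keyed by (HLA allele, SNP haplotype) pairs; the function name and the example allele labels are hypothetical placeholders, not data from the figures.

```python
from collections import defaultdict


def conditional_percentages(counts):
    """counts: dict mapping (hla_allele, snp_haplotype) -> number of chromosomes.

    Returns two dicts keyed like `counts`:
      by_hla[(h, s)]: % of chromosomes with HLA allele h that lie on SNP haplotype s (Fig. 4A style)
      by_snp[(h, s)]: % of chromosomes with SNP haplotype s that bear HLA allele h (Fig. 4B style)
    """
    hla_totals = defaultdict(int)
    snp_totals = defaultdict(int)
    for (h, s), n in counts.items():
        hla_totals[h] += n
        snp_totals[s] += n
    by_hla = {k: 100.0 * n / hla_totals[k[0]] for k, n in counts.items()}
    by_snp = {k: 100.0 * n / snp_totals[k[1]] for k, n in counts.items()}
    return by_hla, by_snp


# Hypothetical counts for two placeholder HLA alleles on SNP haplotypes "N" and "M".
example = {("allele1", "N"): 9, ("allele1", "M"): 1, ("allele2", "N"): 3}
by_hla, by_snp = conditional_percentages(example)
print(by_hla[("allele1", "N")])  # 90.0: 9 of the 10 allele1 chromosomes are on haplotype N
print(by_snp[("allele1", "N")])  # 75.0: 9 of the 12 haplotype-N chromosomes bear allele1
```

A high Figure 4B-style percentage is what makes a SNP haplotype useful as a proxy for an HLA allele, since it answers "given this SNP haplotype, which HLA allele is present?"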
  • an SNP haplotype map of the region was created.
  • 201 reliable, polymorphic, evenly spaced SNPs were genotyped in 136 independent chromosomes also genotyped for nine HLA genes, two TAP genes (involved in antigen processing), and 18 microsatellites. Markers were genotyped in families (18 multigenerational European pedigrees) to allow direct assessment of chromosomal phase and, thus, simple reconstruction of haplotypes.
  • FIGS. 1A-1E show an integrated map of the SNP, microsatellite, and HLA variation in the MHC. This map shows that, aside from the classical HLA loci, the variation and LD structure of the MHC are not different from a genomewide control data set. Specifically, whereas LD appears to extend over longer physical distances in the MHC, this seems to be accounted for by the reduced recombination rate in the region.
  • Multiblock SNP haplotypes contain considerable predictive information for common HLA alleles at HLA-A, HLA-B, HLA-C, and HLA-DRB1.
  • Multiblock SNP haplotypes should enable cost-efficient, large-scale exploration of the variation at the classical HLA loci and beyond.
  • An additional implication of these results is that multiblock SNP haplotypes may be sufficient to identify low-frequency variants throughout the genome. Such low-frequency variants would likely be missed in single, block-based, common variant analysis; however, their contribution to disease may be assayed by use of multiblock haplotypes in analysis. This integrated variation map of the MHC has considerable utility.
  • the MHC has been implicated in almost every human inflammatory and autoimmune disease. Although the MHC has been studied by typing of the classical HLA genes and microsatellites for many years, only rarely has this analysis definitively identified causal variation. Often, association studies using these methods implicate more than one allele at a single locus as influencing disease susceptibility. Although this may represent allelic heterogeneity underlying disease, reinterpreting such results with attention to shared SNP-haplotype variation might point to additional hypotheses regarding the causal variant.
  • Additional SNPs yet to be discovered, as well as genotype information from other populations, may be used to build a more complete map according to the materials and methods described herein, which were used to build the map shown in Figures 1A-1E.
  • a full understanding of the patterns of LD and haplotype diversity of this region should allow the identification of a subset of SNPs required for disease studies. This will allow MHC-association studies to be completed cost effectively by using a combination of haplotype-tagging and HLA allele-tagging SNPs.
  • SNP-based haplotype approaches will allow the examination of larger disease cohorts and enable the identification of rare recombinant haplotypes that would refine association signals and potentially identify the causal alleles for MHC-associated diseases.
  • Integrated Map: The SNPs used in creating the integrated map shown in Figure 1 include those listed in TABLE 1. All SNPs are located on human chromosome 6, and their chromosomal positions are given in column CHROM_POS. The frequencies of the two alleles are given in columns FREQ1 and FREQ2. The primer sequences, probe information, and flanking sequences for the SNPs are described in detail at: http://www.broad.mit.edu/mpg/idrg/projects/HLA_data_SNP_Info.xls (incorporated herein by reference in its entirety). The primer sequences for the SNPs are also provided herein in TABLE 2.
  • TABLE 1 (excerpt):

        SNP        CHROM  CHROM_POS  ALLELE1  FREQ1        ALLELE2  FREQ2
        rs210139   6      33545520   A        0.564885496  C        0.435114504
        rs210145   6      33549551   C        0.467213115  G        0.532786885
        rs396746   6      33558906   A        0.148148148  C        0.851851852
        rs210120   6      33576523   A        0.588235294  G        0.411764706
        rs407415   6      33581077   A        0.786764706  G        0.213235294
  • Genotyping HLA Loci: The invention features a novel method of genotyping Human Leukocyte Antigen (HLA) genes using patterns of neighboring single nucleotide polymorphisms (SNPs).
  • The SNP-based method is an improvement over existing hybridization-based techniques, as it allows quick and inexpensive genotyping of the HLA loci.
  • This method does not directly assess the intra-gene variation, as is done by all other current methods for HLA genotyping, but rather defines HLA genotypes by studying the neighboring extra-genic variation(s) which, due to LD patterns, is conveniently linked to the HLA loci.
  • By "extra-genic" herein is meant outside of, and in the neighboring region(s) of, the HLA allele to be genotyped.
  • One aspect of the invention provides a method of genotyping an HLA gene, such as, for example, an HLA-A or an HLA-DRB1 gene.
  • the method comprises determining the nucleotide present at one or more extra-genic SNP sites, wherein the SNP is associated with an HLA genotype.
  • For example, in genotyping the HLA-A allele, an extra-genic SNP to be assessed can be rs2517862, rs1655930, rs1616549, rs376253, rs1961135, rs2517706, rs2517701, rs2517699, rs435766, rs410909, rs2394255, rs1264807, rs2530388, rs356963, rs2286405, rs2240619, rs3129012, rs259938, or any combination thereof.
  • HLA-DRB1 allele: Another example involves genotyping the HLA-DRB1 allele, wherein an extra-genic SNP to be assessed can be rs742697, rs523627, rs3129960, rs2395163, rs2395165, rs983561, rs2239804, rs2213584, rs2395182, rs2858860, rs3129907, rs1059544, rs1987529, or any combination thereof.
  • Nomenclature and designations of the HLA alleles have been described by Marsh et al., Tissue Antigens (2002) 60:407-464.
  • HLA-A, -B, -C, -DRB1/3/4/5, and -DQB1 alleles and their association with serologically defined HLA-A, -B, -C, -DR, and -DQ antigens are described by Schreuder et al., Tissue Antigens (2001) 58:109-140.
  • Methods of determining or analyzing SNPs are known in the art. To detect any particular SNP in a target DNA sample, e.g., a DNA sample from a subject to be tested, preferably a human subject, one can employ any of the known procedures in the art. For example, two distinct types of analysis and seven procedures are described in U.S. Patent Application Serial No.
  • The first type of analysis is sometimes referred to as de novo characterization. This analysis compares target sequences in different individuals to identify points of variation, i.e., polymorphic sites. By analyzing a group of individuals representing the greatest variety, patterns characteristic of the most common alleles/haplotypes of the locus can be identified, and the frequencies of such patterns in the population determined. Additional allelic frequencies can be determined for subpopulations characterized by criteria such as geography, race, or gender.
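Determining allele frequencies, as in the FREQ1 and FREQ2 columns of TABLE 1, reduces to counting alleles over the typed chromosomes. A minimal sketch, assuming diploid genotype calls are given as two-letter strings with missing calls as `None` or an empty string; the function name is hypothetical, and the frequencies it returns always sum to 1 over the observed alleles.

```python
from collections import Counter


def allele_frequencies(genotypes):
    """genotypes: iterable of diploid calls such as "AC" or "GG".

    Missing calls (None or "") are skipped, so each SNP's denominator is the number
    of alleles actually called. Returns {allele: frequency}.
    """
    counts = Counter()
    for g in genotypes:
        if not g:
            continue
        counts.update(g)  # each character of the call is one allele
    total = sum(counts.values())
    return {allele: n / total for allele, n in counts.items()}


calls = ["AA", "AC", "AA", None]
print(allele_frequencies(calls))  # A: 5 of 6 called alleles, C: 1 of 6
```

With phased, independent chromosomes (as in the 136 chromosomes used for the map), the same counting applies with one allele per chromosome instead of two per individual.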
  • The second type of analysis determines which form(s) of a characterized polymorphism are present in individuals under assessment. There are a variety of suitable procedures, as follows.
  • 1) Allele-Specific Probes: The design and use of allele-specific probes for analyzing SNPs are described by, e.g., Saiki et al., Nature 324:163-166 (1986); Dattagupta, EP 235,726; and Saiki, WO 89/11548. Allele-specific probes can be designed that hybridize to a segment of target DNA from one individual but do not hybridize to the corresponding segment from another individual, due to the presence of different polymorphic forms in the respective segments from the two individuals. Hybridization conditions should be sufficiently stringent that there is a significant difference in hybridization intensity between alleles, and preferably an essentially binary response, whereby a probe hybridizes to only one of the alleles.
  • Some probes are designed to hybridize to a segment of target DNA such that the polymorphic site aligns with a central position of the probe (e.g., in a 15-mer, at position 7; in a 16-mer, at either position 8 or 9).
  • This design of probe achieves good discrimination in hybridization between different allelic forms.
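A pair of allele-specific probes with the polymorphic site at a central position can be generated from the flanking sequence. This is a sketch, assuming the SNP is placed at 0-based index `length // 2` (the exact middle of an odd-length probe, i.e. the eighth base of a 15-mer counted from 1); the numbering convention in the text may differ, and the function name and flanks are hypothetical. The probes are written on the same strand as the flanks, so they would hybridize to the complementary strand; stringency and melting temperature are not modeled.

```python
def allele_specific_probes(left_flank, right_flank, ref_allele, alt_allele, length=15):
    """Return (ref_probe, alt_probe): two probes of `length` bases that differ only
    at the SNP, which sits at 0-based index length // 2.

    left_flank / right_flank: sequence immediately 5' and 3' of the SNP on the same strand.
    """
    snp_idx = length // 2
    n_left = snp_idx
    n_right = length - snp_idx - 1
    if len(left_flank) < n_left or len(right_flank) < n_right:
        raise ValueError("flanking sequence too short for the requested probe length")
    left = left_flank[len(left_flank) - n_left:]  # bases closest to the SNP
    right = right_flank[:n_right]
    return left + ref_allele + right, left + alt_allele + right


ref_probe, alt_probe = allele_specific_probes("CCCAAAAAAA", "TTTTTTTGGG", "C", "G")
print(ref_probe)  # AAAAAAACTTTTTTT
print(alt_probe)  # AAAAAAAGTTTTTTT
```

Because the two probes differ only at the central base, a mismatch there disrupts the duplex as much as possible, which is the rationale the text gives for central placement.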
  • Allele-specific probes are often used in pairs, one member of a pair showing a perfect match to a reference form of a target sequence and the other member showing a perfect match to a variant form. Several pairs of probes can then be immobilized on the same support for simultaneous analysis of multiple polymorphisms within the same target sequence.
  • 2) Tiling Arrays: The SNPs can also be identified by hybridization to nucleic acid arrays.
  • Subarrays that are optimized for detection of variant forms of a precharacterized polymorphism can also be utilized.
  • Such a subarray contains probes designed to be complementary to a second reference sequence, which is an allelic variant of the first reference sequence.
  • The inclusion of a second group (or further groups) can be particularly useful for analyzing short subsequences of the primary reference sequence in which multiple mutations are expected to occur within a short distance commensurate with the length of the probes (i.e., two or more mutations within 9 to 21 bases).
  • 3) Allele-Specific Primers: An allele-specific primer hybridizes to a site on target DNA overlapping an SNP and only primes amplification of an allelic form to which the primer exhibits perfect complementarity.
  • This primer is used in conjunction with a second primer which hybridizes at a distal site. Amplification proceeds from the two primers leading to a detectable product signifying the particular allelic form is present.
  • A control is usually performed with a second pair of primers, one of which shows a single-base mismatch at the polymorphic site and the other of which exhibits perfect complementarity to a distal site. The single-base mismatch prevents amplification, and no detectable product is formed. The method works best when the mismatch is at the 3'-most position of the oligonucleotide aligned with the polymorphism, because this position is most destabilizing to elongation from the primer.
  • 4) Direct Sequencing: The direct analysis of the sequence of any samples for use with the present invention can be accomplished using either the dideoxy chain-termination method or the Maxam-Gilbert method (see Sambrook et al., Molecular Cloning, A Laboratory Manual (2nd Ed., CSHP, New York 1989); Zyskind et al., Recombinant DNA Laboratory Manual (Acad. Press, 1988)).
  • 5) Denaturing Gradient Gel Electrophoresis: Amplification products generated using the polymerase chain reaction can be analyzed by denaturing gradient gel electrophoresis. Different alleles can be identified based on the different sequence-dependent melting properties and electrophoretic migration of DNA in solution.
  • 6) Single-Strand Conformation Polymorphism Analysis: The different electrophoretic mobilities of single-stranded amplification products can be related to base-sequence differences between alleles of target sequences.
  • 7) Single-Base Extension: An alternative method for identifying and analyzing SNPs is based on single-base extension (SBE) of a fluorescently-labeled primer coupled with fluorescence resonance energy transfer (FRET) between the label of the added base and the label of the primer.
  • SBE: single-base extension
  • FRET: fluorescence resonance energy transfer
  • The method, such as that described by Chen et al. (PNAS 94:10756-61 (1997)), uses a locus-specific oligonucleotide primer labeled on the 5' terminus with 5-carboxyfluorescein (FAM).
  • FAM: 5-carboxyfluorescein
  • This labeled primer is designed so that its 3' end is immediately adjacent to the polymorphic site of interest.
  • The labeled primer is hybridized to the locus, and single-base extension of the labeled primer is performed with fluorescently-labeled dideoxyribonucleotides (ddNTPs), as in dye-terminator sequencing.
  • ddNTPs: fluorescently-labeled dideoxyribonucleotides
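The logic of single-base extension can be illustrated in a few lines: because the primer's 3' end sits immediately adjacent to the SNP and a ddNTP lacks the 3'-OH needed for further extension, exactly one base is added, and it is the complement of the template base at the SNP. This sketch assumes the template-strand base(s) at the SNP are known for each allele present in the sample; dye chemistry and FRET detection are not modeled, and the function name is hypothetical.

```python
# Watson-Crick pairing used by the polymerase when it extends the primer.
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def sbe_calls(template_bases_at_snp):
    """Return the set of ddNTPs incorporated in a single-base extension reaction.

    template_bases_at_snp: the template-strand base at the SNP for each allele in the
    sample (e.g. "GG" for a homozygote, "AG" for a heterozygote). Each allele adds
    exactly one ddNTP, complementary to its template base, and extension then stops.
    """
    return {"dd" + COMPLEMENT[base] + "TP" for base in template_bases_at_snp}


print(sbe_calls("GG"))  # a homozygote yields a single labeled terminator: ddCTP
print(sbe_calls("AG"))  # a heterozygote yields two: ddTTP and ddCTP
```

Reading out which labeled terminator(s) were added therefore reports the genotype directly, with a heterozygote producing two distinguishable signals.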
  • TABLE 3 shows exemplary extra-genic SNPs that correspond to HLA-A alleles and can be used in genotyping HLA-A alleles.
  • The SNPs and the HLA-A allele are lined up in each row of the table from left to right according to their respective positions on chromosome 6.
  • The percentages in the rightmost column represent the likelihood of the identity of a particular HLA-A allele when the exemplary SNPs are determined to be as shown in the respective rows.
  • For example, the HLA-A allele has a 100% likelihood of being HLA-A*2402 when the 18 SNPs listed are determined to be the respective nucleotides shown in row 1.
  • Likewise, the HLA-A allele has a 92% likelihood of being HLA-A*101 when the 18 SNPs listed are determined to be the respective nucleotides shown in row 5.
  • The allele-type determinative SNPs between HLA-A*2402 and HLA-A*101 include: rs2517862, rs1655930, rs376253, rs1961135, rs2517706, rs1264807, and rs3129012.
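The table lookup and the notion of "determinative" SNPs described above can be sketched as two small functions. TABLE 3 is not reproduced in this text, so the rows below are hypothetical placeholders (made-up SNP IDs, alleles, HLA labels, and likelihoods), and the function names are assumptions. A row is reported only if it agrees with every SNP that was actually observed.

```python
def determinative_snps(hap_a, hap_b):
    """SNP IDs at which two SNP-haplotype rows carry different alleles."""
    return sorted(s for s in hap_a if s in hap_b and hap_a[s] != hap_b[s])


def matching_rows(observed, table):
    """Return (hla_allele, likelihood) for every row whose SNP alleles agree with all
    observed SNPs; SNPs that were not typed are ignored."""
    return [(hla, pct) for snps, hla, pct in table
            if all(snps.get(s) == a for s, a in observed.items())]


# Hypothetical two-row table: (SNP alleles, HLA allele, % likelihood).
table = [
    ({"snp1": "A", "snp2": "C", "snp3": "G"}, "allele_P", 100.0),
    ({"snp1": "G", "snp2": "C", "snp3": "T"}, "allele_Q", 92.0),
]
print(determinative_snps(table[0][0], table[1][0]))  # ['snp1', 'snp3']
print(matching_rows({"snp1": "G"}, table))           # [('allele_Q', 92.0)]
print(matching_rows({"snp2": "C"}, table))           # both rows: snp2 does not discriminate
```

Typing only determinative SNPs is enough to tell two candidate rows apart, whereas a shared SNP such as `snp2` leaves both candidates in play.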
  • In some cases ambiguity exists, e.g., in row 4, where the SNP-haplotype could be B or W; this ambiguity may be resolved by determining an additional SNP, rs3129907. If rs3129907 is 1 or A, the SNP-haplotype allele is B, and if rs3129907 is 3 or G, the SNP-haplotype allele is W. Similarly, in row 6, the SNP-haplotype allele can be ascertained by determining the SNP rs1059544 (2 or C corresponds to SNP-haplotype allele U, and 4 or T corresponds to SNP-haplotype allele V).
  • The SNP-haplotype allele can be ascertained by determining the SNP rs1987529 (3 or G corresponds to SNP-haplotype allele K, and 1 or A corresponds to SNP-haplotype allele T). Similarly, in row 14, the SNP-haplotype allele can be ascertained by determining the SNP rs1987529 (1 or A corresponds to SNP-haplotype allele G, and 3 or G corresponds to SNP-haplotype allele H).
  • the SNP-haplotype allele can be ascertained by determining the SNP rs2395165 (4 or T will correspond to SNP-haplotype allele A, and 2 or C will correspond to SNP-haplotype allele R).
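The tie-breaking rules above use two interchangeable allele notations, the numeric codes 1 = A, 2 = C, 3 = G, 4 = T and the base letters. A minimal sketch of resolving an ambiguous row with one extra SNP, accepting either notation; the function names are hypothetical, while the example mappings are the rs3129907 and rs1059544 rules stated in the text.

```python
# Numeric allele codes as used in the text: 1 = A, 2 = C, 3 = G, 4 = T.
CODE_TO_BASE = {1: "A", 2: "C", 3: "G", 4: "T"}


def normalize(call):
    """Accept a numeric code (1-4, as int or str) or a base letter; return the base letter."""
    s = str(call).strip().upper()
    return CODE_TO_BASE[int(s)] if s.isdigit() else s


def resolve(candidates, call):
    """candidates: {base at the tie-breaking SNP: SNP-haplotype label} for an ambiguous row.
    call: the allele observed at the tie-breaking SNP, in either notation."""
    return candidates[normalize(call)]


rs3129907_rule = {"A": "B", "G": "W"}  # row 4: B versus W
rs1059544_rule = {"C": "U", "T": "V"}  # row 6: U versus V
print(resolve(rs3129907_rule, 1))    # B
print(resolve(rs3129907_rule, "G"))  # W
print(resolve(rs1059544_rule, "4"))  # V
```

An observed allele that matches neither candidate raises a `KeyError`, which is a reasonable signal that the sample's haplotype is not among those in the table.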
  • Figure 4B shows the percentage of a particular SNP haplotype allele that bears the indicated HLA allele.
  • SNP-haplotype allele J, having the SNPs shown in row 1 above, corresponds to an HLA-DRB1 allele that has a 100% likelihood of being HLA-DRB1*1302.
  • SNP-haplotype allele N, having the SNPs shown in this row, corresponds to an HLA-DRB1 allele that has a 92.6% likelihood of being HLA-DRB1*1501.
  • the invention further features a method of predicting or assisting in the prediction of the likelihood or probability of development of a disease, particularly an MHC-linked disease, in a subject, preferably a human subject.
  • the method comprises genotyping an HLA gene in the subject to be tested by determining the nucleotide present at one or more extra-genic SNP sites, wherein the SNP is associated with an HLA genotype.
  • MHC-linked diseases include, but are not limited to, ankylosing spondylitis, Behcet syndrome, common variable immunodeficiency, Goodpasture syndrome, psoriasis, inflammatory bowel disease, insulin-dependent diabetes mellitus (type 1), multiple sclerosis, myasthenia gravis, pemphigus vulgaris, rheumatoid arthritis, and systemic lupus erythematosus.
  • Identification of an HLA genotype in the subject that is associated with a disease indicates that the subject has a greater likelihood of developing the disease. For example, the HLA-DRB1*1101 genotype is associated with pemphigoid disease, as discussed above.
  • the invention further features a method of predicting or assisting in the prediction of the likelihood or probability of development of a disease, particularly an autoimmune disease, in a subject, preferably a human subject.
  • the method comprises genotyping an HLA gene in the subject to be tested by determining the nucleotide present at one or more extra-genic SNP sites, wherein the SNP is associated with an HLA genotype. Identification of an HLA genotype in the subject which is associated with a disease is indicative that the subject has a greater likelihood of developing the disease. For example, HLA-DR2 haplotype is linked or associated with multiple sclerosis and lupus.
  • Autoimmune diseases grouped by main target organ include, but are not limited to: 1) Nervous System: multiple sclerosis, myasthenia gravis, autoimmune neuropathies such as Guillain-Barre, autoimmune uveitis; 2) Gastrointestinal System: Crohn's disease, ulcerative colitis, primary biliary cirrhosis, autoimmune hepatitis; 3) Blood: autoimmune hemolytic anemia, pernicious anemia, autoimmune thrombocytopenia; 4) Endocrine Glands: type 1 or immune-mediated diabetes mellitus, Graves' disease, Hashimoto's thyroiditis, autoimmune oophoritis and orchitis, autoimmune disease of the adrenal gland; 5) Blood Vessels: temporal arteritis, anti-phospholipid syndrome, vasculitides such as Wegener's granulomatosis, Behcet's disease; 6) Multiple Organs Including the Musculoskeletal System (These diseases are also called connective tissue (muscle, ste
  • a further aspect of the invention provides a method of predicting or assisting in the prediction of the likelihood of developing an immune response in a subject, preferably a human subject.
  • An immune response may be developed against an infecting organism or agent.
  • an immune response may comprise a host-graft response, e.g., rejection of organ transplants.
  • the method comprises genotyping an HLA gene in the subject to be tested by determining the nucleotide present at one or more extra-genic SNP sites, wherein the SNP is associated with an HLA genotype.
  • the method may also comprise separately genotyping an HLA gene in a host (e.g., a blood or organ recipient or donee) and the same HLA gene (or the corresponding HLA gene) in a graft (e.g., a blood or organ donor) by determining the nucleotide present at one or more extra-genic SNP sites in the host and the graft, wherein the SNP is associated with an HLA genotype.
  • Genotyping an HLA gene in a host may involve assessing more, fewer, or the same extra-genic SNPs as compared with the extra-genic SNP(s) to be assessed in a graft.
  • more than one extra-genic SNP is determined in order to determine the genotype of an HLA allele.
  • An exemplary method of determining whether or not a host and a graft have the same HLA alleles or immune-compatible HLA alleles may include: a) determining the HLA allele in the host (or graft) by ascertaining the nucleotide present at one or more extra-genic SNP sites or any other method; b) selecting extra-genic SNPs to be assessed in the graft (or host) based on the HLA allele identity as determined in a); c) assessing the selected extra-genic SNPs to identify the HLA allele genotype.
  • For example, if a host is determined by a method of the invention or any other method to have an HLA-A*101 allele (e.g., having the SNP-haplotype shown in row 5 of TABLE 3), only rs2517862 and/or rs1655930 need to be assessed to ascertain that a graft does not have HLA-A*101.
  • Based on the information in TABLE 3, one can optimize the selection of SNPs to be assessed.
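One way to optimize the SNP selection for step b) above is to pick a small set of SNPs at which every other table row differs from the host's row, so that a single mismatch in the graft rules out the host haplotype. The text does not say how the selection is made; the greedy set-cover heuristic below is a swapped-in illustration, not the patent's procedure. It works on haplotype rows only (diploid phase and unlisted haplotypes are not considered), and the function name and data are hypothetical.

```python
def select_snps(host, others):
    """Greedily choose SNPs so that each row in `others` differs from `host` at one or
    more chosen SNPs.

    host: {snp_id: allele} for the host's SNP-haplotype row
    others: list of {snp_id: allele} rows to be excluded
    A SNP missing from a row is treated as non-distinguishing for that row.
    """
    remaining = list(others)
    chosen = []
    while remaining:
        # Pick the SNP that separates the host from the most rows still unresolved.
        best = max(host, key=lambda s: sum(o.get(s, host[s]) != host[s] for o in remaining))
        if all(o.get(best, host[best]) == host[best] for o in remaining):
            raise ValueError("some rows cannot be distinguished from the host row")
        chosen.append(best)
        remaining = [o for o in remaining if o.get(best, host[best]) == host[best]]
    return chosen


host_row = {"s1": "A", "s2": "C", "s3": "G"}
other_rows = [
    {"s1": "G", "s2": "C", "s3": "G"},
    {"s1": "G", "s2": "T", "s3": "G"},
    {"s1": "A", "s2": "C", "s3": "T"},
]
print(select_snps(host_row, other_rows))  # ['s1', 's3']
```

Greedy set cover is not guaranteed to find the smallest possible SNP set, but it is simple and typically close for tables of this size.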
  • Exemplification. Example 1: Materials and Methods.
  • DNA Samples: Samples were obtained from the Coriell Cell Repository and drawn from the collection of Utah CEPH pedigrees of European descent. One hundred thirty-six independent, grandparental chromosomes were used for haplotype construction. Of these chromosomes, 96 were in common with Gabriel et al. (2002) and were therefore used for comparison with the genome-wide LD structure. Identifiers for all individuals can be found at the Inflammatory Disease Research Group (IDRG) Web site.
  • Genotyping and Data Checking: All SNPs for which genotyping was attempted were publicly available at the dbSNP Web site.
  • IDRG: Inflammatory Disease Research Group
  • SNPs were selected mainly to achieve a desired spacing (1/20 kb); however, SNPs with more than one submitter were preferentially chosen.
  • SNP primers and probes were designed in multiplex format (average fivefold multiplexing) with SpectroDESIGNER software (Sequenom). A total of 435 assays were designed. Assays were considered successful, and their genotype data were included in the analyses described herein, if they passed all of the following criteria: (1) a minimum of 75% of all genotyping calls were obtained, (2) markers did not deviate from Hardy-Weinberg equilibrium, and (3) markers had no more than one Mendelian error. These criteria defined 201 successful assays. Genotype calls for successful markers were then set to zero for any single Mendelian error.
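The three assay-acceptance criteria can be expressed as a single filter. The text does not name the Hardy-Weinberg test or its significance threshold, so this sketch assumes a 1-degree-of-freedom chi-square goodness-of-fit test with a hypothetical cutoff of p < 0.001; "75% of calls obtained" is interpreted as a per-marker call rate. The p-value uses the identity that the chi-square survival function with 1 df equals erfc(sqrt(x/2)), so only the standard library is needed.

```python
import math


def hwe_chi2_p(n_aa, n_ab, n_bb):
    """P-value of a 1-df chi-square test of Hardy-Weinberg proportions for a biallelic marker."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele a
    q = 1 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    return math.erfc(math.sqrt(chi2 / 2))  # survival function of chi-square with 1 df


def passes_qc(n_called, n_total, genotype_counts, mendelian_errors, hwe_alpha=0.001):
    """Apply the three criteria: call rate >= 75%, no HWE deviation, <= 1 Mendelian error.
    hwe_alpha is an assumed threshold; the source does not state one."""
    call_rate = n_called / n_total
    return (call_rate >= 0.75
            and hwe_chi2_p(*genotype_counts) >= hwe_alpha
            and mendelian_errors <= 1)


print(hwe_chi2_p(25, 50, 25))               # 1.0: counts exactly match HWE expectations
print(passes_qc(90, 100, (25, 50, 25), 1))  # True
print(passes_qc(90, 100, (50, 0, 50), 0))   # False: no heterozygotes, strong HWE deviation
```

For family-based data such as the CEPH pedigrees, HWE would normally be assessed on unrelated founders only, so the genotype counts passed in should be chosen accordingly.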
  • TAP1 and TAP2 were genotyped as described elsewhere (Carrington et al. 1993).
  • D6S2971, D6S2749, D6S2874, D6S273, D6S2876, D6S2751, D6S2741, and D6S2739 were typed as described elsewhere (Martin et al. 1998).
  • Genotyping details for the 11 remaining microsatellites can be found as supplemental information on the IDRG Web site.
  • D6S2972 and D6S265 were genotyped twice (IDRG; Martin et al. 1998), and conflicts were resolved by retyping.
  • Unincorporated dNTPs were deactivated using 0.3 U of Shrimp Alkaline Phosphatase (Roche). This was followed by primer extension using 5.4 pmol of each primer extension probe, 50 µmol of the appropriate dNTP/ddNTP combination, and 0.5 units of Thermosequenase (Amersham Pharmacia). Reactions were heated to 94 °C for 2 minutes, followed by 40 cycles of 94 °C for 5 s, 50 °C for 5 s, and 72 °C for 5 s.
  • SpectroCHIPs were analyzed using a Bruker Biflex III MALDI-TOF mass spectrometer (SpectroREADER, Sequenom, San Diego, CA), and spectra were processed using SpectroTYPER (Sequenom). Eleven of the microsatellites were amplified using the following primers/amplification programs:
  • The forward primers were fluorescently tagged with 6-FAM, TET, or HEX. Amplification was performed in 15-microliter volumes containing 0.8 units Taq polymerase (Roche Applied Science), 25 ng DNA, 200 µM dNTPs, 2.4 pmol of each primer, 3.0 nmol dNTPs, and 1x PCR buffer (1.5 mM MgCl2, 10 mM Tris-HCl, 50 mM KCl, pH 8.3, Roche Applied Science). Reactions were run in one of the following MJ Research thermocyclers: PTC-100, PTC-200, or Genomyx CycLR.
  • Genotypes for individuals from families 1331, 1332, 1347, 1362, 1413, 1416, and 884 for D6S1542, D6S1560, D6S1701, D6S1666, D6S265, and D6S258 were obtained from the CEPH website (http://www.cephb.fr/test/cephdb/). To ensure correspondence in allele sizes with those genotyped for this study, individual 1347-2 was genotyped for these loci.
  • RG-MSATS amplification reactions were heated to 95 °C for 2 minutes followed by 29 cycles of 94 °C for 45 s, 57 °C for 45 s, 72 °C for 1 minute. The final extension was at 72 °C for 7 minutes.
  • MSATJH and 64ANN were the same as RG-MSATS, except that annealing was carried out at 55 °C and 64 °C, respectively.
  • MSATTD was a touchdown annealing starting at
  • D' confidence limits were determined by calculating the probability of the observed data for all possible values of D', from which an overall probability distribution was determined. For all blocks identified, the outermost marker pair was required to be in strong LD, with an upper confidence limit (CU) > 0.98 and a lower confidence limit (CL) > 0.7.
  • Most genotypes (91.6% of SNP genotypes and 95% of HLA, TAP, and microsatellite genotypes) were phased with family information. Apart from this initial phasing with family information, HLA, TAP, and microsatellite genotypes were not phased further, and the 5% of genotypes that were indeterminate were considered "ambiguous" in further analyses. Further haplotype inference of the SNP genotyping data was performed with a procedure based on a probability model for haplotypes proposed elsewhere (Fearnhead and Donnelly 2001). This model can be regarded as a refinement of the model used in the well-known program PHASE (Stephens et al. 2001), one that allows for recombination.
  • EHH: extended haplotype homozygosity
  • microsatellite types D6S258, D6S2840, D6S2814, D6S2793, D6S1666, D6S1701, D6S1560, and D6S1542
  • These genotypes were left as "null calls."
  • 5% of microsatellite, HLA, and TAP genotypes could not be phased with family information. Since EHH is a cumulative statistic, these heterozygotes and missing data are predicted to result in a conservative estimate of EHH values.
  • Outlying variants depicted in Figures 3A-3C were chosen on the basis of two criteria designed to pick alleles with high EHH values for their frequency class.
  • Scores were ranked by EHH value times allele frequency.
  • Outliers had values >4.5 SDs above the mean.
  • All variants were sorted by frequency into 5% bins.
  • Multiblock SNP haplotypes include information from the blocks indicated in Figure 4, as well as from any intervening SNPs not in those blocks. "Leave-one-out" cross-validation was performed using the LeaveOneOut program. In brief, a single chromosome is selected from the data set. The remaining samples are used to build a predictor.
  • This predictor is then used to predict the HLA genotype of the sample that has been removed. If the sample's SNP haplotype occurred only once in the data set, it was not considered in the test. For each locus, prediction was performed with 106 iterations. (See the IDRG Web site for the LeaveOneOut program and genotyping details.)
  • Example 2: Analysis of the MHC Region Based on the Integrated Map. Structure of LD in the HLA Genes, Compared with the Genome at Large: Recent studies have shown that LD extends across long segments of the genome (Daly et al. 2001; Dawson et al. 2002; Gabriel et al. 2002; Phillips et al. 2003). Within such segments, a small number of distinct, common patterns of sequence variation (haplotype alleles) are observed in the general population. Between these segments are short intervals where recombination is apparently most active in creating assortments of these patterns (Daly et al. 2001; Jeffreys et al. 2001; Gabriel et al. 2002).
  • Haplotype-specific recombinational suppression may result in high-frequency, extended haplotypes by reducing the number of recombination events a given haplotype undergoes.
  • Every element in the DR2 haplotype has an EHH value at least 4.8 SDs above the mean EHH for other variants with the same allele frequency.
  • Two of the remaining outlying alleles map to a single haplotype (D6S2840*219-C*0701), and the last outlying allele is DRB1*1101.
  • The DR2 haplotype is associated with susceptibility to systemic lupus erythematosus (SLE [MIM 152700]) and multiple sclerosis (MS [MIM 126200]), and it is protective for type I diabetes (IDDM [MIM 222100]) (Thorsby 1997; Chataway et al. 1998; Haines et al. 1998; Barcellos et al. 2002).
  • DRB1*1101 is associated with pemphigoid vulgaris
  • D6S2840*219-C*0701 is associated with autoimmune diabetes (MIM 275000) and thyroid disease (MIM 140300) (Drouet et al. 1998; Price et al. 1999; Okazaki et al. 2000).
  • These three haplotypes appear to have functional consequences for the human immune system. Although these haplotypes are at present associated with autoimmune diseases, it is possible that, under certain conditions, these functional differences were (and perhaps still are) beneficial for disease resistance and, therefore, may have undergone positive selection in the past. The other possibility is that these extended haplotypes are subject to allele-specific recombination suppression. Examination of the individual recombination rates used to construct the recombination map shows that, of the 12 individuals examined, the single individual bearing DRB1*1501 had many fewer recombination events across the MHC than did the others, although this difference from the mean was not statistically significant.
  • It was also examined whether SNP haplotypes spanning classical HLA loci contained enough information to predict HLA alleles. If so, it might be possible to use high-throughput SNP genotyping as a first-pass surrogate for traditional HLA gene molecular typing (e.g., probe-based typing or direct sequencing) in disease association studies. For one of these classical genes, HLA-A, a single 7-SNP haplotype block spanning the locus was identified.
  • This 7-SNP HLA-A block has only six common variants, and those are predictive of the correct HLA-A allele 66.2% of the time, as shown by cross-validation analysis (LeaveOneOut [see the "Materials and Methods" section]).
  • The genotype information for a neighboring block was then included, and the SNP haplotypes comprising the combinations of alleles of these two blocks were examined. The success of prediction improved from 66.2% to 82.6% of all HLA-A alleles present.
  • Multiblock haplotypes can act as surrogate markers for HLA alleles.
  • The HLA-A*0101 allele occurs on the "G" SNP haplotype (comprising the haplotype alleles of two blocks) 92% of the time (Figure 4A), and the "G" SNP haplotype correlates with HLA-A*0101 95.6% of the time (Figure 4B).
  • Cross-validation analysis was used to estimate the success rate of prediction.
  • HLA alleles can be accurately predicted by SNP haplotype 75%-84% of the time (HLA-A: 82.6%; HLA-B: 79.8%; HLA-C: 84.3%; and HLA-DRB1: 75.0%).
  • HLA-A 96.2%
  • HLA-B 98.8%
  • HLA-C 96.0%
  • HLA-DRB1 82.26%
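The outlier rule in the Materials and Methods sorts variants into 5% allele-frequency bins and flags values more than 4.5 SDs above the mean. It can be sketched in Python as shown below. The text does not fully specify how its two criteria, or the ranking by EHH times allele frequency, were combined, so this sketch makes an assumption. Each variant's EHH is compared with the mean and SD of the *other* variants in its frequency bin, in the spirit of the statement that every DR2 element lies at least 4.8 SDs above the mean EHH for other variants of the same allele frequency. The function name and input layout are hypothetical.

```python
import statistics
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical input layout: (variant name, allele frequency, EHH at 0.25 cM)
Variant = Tuple[str, float, float]


def ehh_outliers(variants: List[Variant], bin_width: float = 0.05,
                 threshold_sd: float = 4.5) -> List[str]:
    """Flag variants whose EHH is unusually high for their frequency class."""
    bins: Dict[int, List[Variant]] = defaultdict(list)
    for variant in variants:
        # Small epsilon guards against float error at bin edges (e.g. 0.15 / 0.05).
        bins[int(variant[1] / bin_width + 1e-9)].append(variant)

    outliers = []
    for members in bins.values():
        for i, (name, _freq, value) in enumerate(members):
            # Compare against the other variants in the bin, not including itself.
            others = [m[2] for j, m in enumerate(members) if j != i]
            if len(others) < 2:
                continue  # too few comparison variants to estimate an SD
            mean = statistics.mean(others)
            sd = statistics.stdev(others)
            if sd > 0 and (value - mean) / sd > threshold_sd:
                outliers.append(name)
    return outliers
```

This sketch excludes the tested variant from its own comparison set, so a single extreme value cannot inflate the SD it is judged against. That choice is made here for illustration and is not necessarily the study's.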

Abstract

The disclosure relates to novel approaches to mapping the MHC region and provides novel methods of genotyping the HLA loci. A haplotype map of the region and methods of using the map are also disclosed.

Description

Methods of Human Leukocyte Antigen Typing by Neighboring Single Nucleotide Polymorphism Haplotypes
Related Applications
This application is a continuation of U.S. Patent Application No. 10/850,359, filed May 19, 2004, the disclosure of which is incorporated by reference herein in its entirety.
Statement Regarding Federal Funding
Work described herein was funded, in whole or in part, by the National Cancer Institute, National Institutes of Health, under contract N01-CO-12400. The United States government has certain rights in the invention.
Background of the Invention
The classical Human Leukocyte Antigen (HLA) loci are the most highly variable genes in the human genome. Historically, attempts to characterize the region have focused on a handful of highly variable, classical HLA genes (class I genes: HLA-A, HLA-B, and HLA-C; and class II genes: HLA-DRB1, HLA-DQA1, HLA-DQB1, HLA-DPA1, and HLA-DPB1). These genes encode antigen-presenting molecules that mediate the acquired immune response during infection, as well as host-graft responses after organ transplantation. All organ transplant donors and recipients are typed for these genes in order to best match donor with recipient. These genes have also been associated with many human autoimmune and inflammatory diseases, and many research laboratories genotype their human subjects for these loci as a matter of course. The HLA loci were originally studied by lower-resolution serotyping techniques until the recent advent of "dot blot" hybridization-based molecular typing such as SSOP and SSP (Dynal Biotech, Biotest, One Lambda), which greatly improved examination of the region. Direct sequencing of HLA alleles is also possible. However, these current methods are laborious and expensive. Accordingly, novel approaches to map the HLA loci in the context of the MHC region are desirable.
Summary of the Invention
Accordingly, the invention provides a more uniform, comprehensive map of commonly linked variation, e.g., a haplotype map, that will help to discriminate between causal alleles and variation that is merely in linkage disequilibrium (LD) with them. Such a resource will also allow a more complete description of the haplotype structure and, potentially, insight into the evolutionary and recombinational history of the chromosomal region in question. The invention provides an integrated SNP-haplotype map of a 4-Mb major histocompatibility complex (MHC) region. Preferably, the integrated map comprises SNPs that are reliable, polymorphic, and evenly spaced, e.g., one SNP every 20 kb. The integrated map further comprises genotyped HLA genes, TAP genes, microsatellites, or a combination thereof. The invention further features a novel method of genotyping Human Leukocyte Antigen (HLA) genes using patterns of neighboring single nucleotide polymorphisms (SNPs). The SNP-based method is an improvement over existing hybridization-based techniques, as it allows quick and inexpensive genotyping of the HLA loci. Unlike all other current methods for HLA genotyping, this method does not directly assess intra-genic variation. Rather, it defines HLA genotypes by studying neighboring extra-genic variation that falls outside the HLA allele to be genotyped and that, owing to LD patterns, is conveniently linked to the HLA loci. Identifying the correlation of this extra-genic variation with HLA gene alleles allows surrogate markers for HLA genotypes to be discovered and used. This approach to genotyping the HLA loci overcomes a substantial technical difficulty in applying high-throughput genotyping techniques to these hypervariable genes.
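The correlation between extra-genic SNP haplotypes and HLA alleles that underlies these surrogate markers is reported in Figures 4A and 4B as two conditional percentages. For example, HLA-A*0101 occurs on the "G" SNP haplotype 92% of the time, and the "G" haplotype bears HLA-A*0101 95.6% of the time. The minimal Python sketch below computes both percentages from a single joint count table. It assumes that each phased chromosome has already been reduced to a (SNP haplotype, HLA allele) pair. The function name is hypothetical, and the toy input is illustrative only, not study data.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

Pair = Tuple[str, str]  # (SNP haplotype code, HLA allele), one per chromosome


def haplotype_percentages(pairs: Iterable[Pair]) -> Tuple[Dict[Pair, float], Dict[Pair, float]]:
    """Return (by_hla, by_snp) percentages keyed by (snp_haplotype, hla_allele).

    by_hla: % of chromosomes with the HLA allele that carry the SNP haplotype (Figure 4A style).
    by_snp: % of chromosomes with the SNP haplotype that carry the HLA allele (Figure 4B style).
    """
    joint = Counter(pairs)
    hla_totals: Counter = Counter()
    snp_totals: Counter = Counter()
    for (snp_hap, hla), n in joint.items():
        hla_totals[hla] += n
        snp_totals[snp_hap] += n
    by_hla = {key: 100 * n / hla_totals[key[1]] for key, n in joint.items()}
    by_snp = {key: 100 * n / snp_totals[key[0]] for key, n in joint.items()}
    return by_hla, by_snp


# Toy input for illustration only (not study data).
toy = [("G", "A*0101")] * 4 + [("F", "A*0101")] + [("G", "A*0201")] * 2
by_hla, by_snp = haplotype_percentages(toy)
```

In the toy data, 80% of A*0101 chromosomes carry "G", but only two-thirds of "G" chromosomes carry A*0101. This shows why both directions are reported.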
By focusing on variation outside of the hypervariable HLA genes themselves, this method avoids the pitfalls of polymerase chain reaction (PCR) primer design in locations where nucleotide diversity can be as high as 12% (i.e., an average of 12 base pairs substituted per 100 nucleotides assessed). Instead, ancestral "tcl^king mutations" outside of these genes are used to resolve HLA genotypes with traditional SNP genotyping methods. This approach can be employed to map variation in the regions neighboring HLA genes in order to fully resolve all known common HLA gene variants in multiple ethnic populations. This method can benefit clinical laboratories typing individuals for transplantation procedures, as well as research laboratories interested in studying HLA gene variation in particular patient populations or disease associations. Further, this method can be employed to predict the likelihood or probability of developing a disease, particularly an MHC-linked disease or an autoimmune disease. Alternatively, this method can be employed to predict the likelihood or probability of developing an immune response in a subject, preferably a human subject. Such a response may be, e.g., a response against infection or a host-graft response (e.g., elicited by organ transplantation). One aspect of the invention provides a method of genotyping an HLA gene, such as an HLA-A or an HLA-DRB1 gene. The method comprises determining the nucleotide present at one or more extra-genic SNP sites, wherein the SNP is associated with an HLA genotype. The extra-genic SNP sites correspond to the HLA allele to be genotyped; that is, the SNP sites lie outside of, and in the region(s) neighboring, the HLA allele to be genotyped.
For example, to genotype the HLA-A allele, an extra-genic SNP to be assessed that corresponds to the HLA-A allele can be rs2517862, rs1655930, rs1616549, rs376253, rs1961135, rs2517706, rs2517701, rs2517699, rs435766, rs410909, rs2394255, rs1264807, rs2530388, rs356963, rs2286405, rs2240619, rs3129012, rs259938, or any combination thereof. Another example involves genotyping the HLA-DRB1 allele, wherein an extra-genic SNP to be assessed can be rs742697, rs523627, rs3129960, rs2395163, rs2395165, rs983561, rs2239804, rs2213584, rs2395182, rs2858860, rs3129907, rs1059544, rs1987529, or any combination thereof. Another aspect of the invention provides a method of predicting, or assisting in predicting, the likelihood of developing a disease, in particular an inflammatory disease, an MHC-linked disease, or an autoimmune disease, in a subject, preferably a human subject. The method comprises genotyping an HLA gene in the subject to be tested by determining the nucleotide present at one or more extra-genic SNP sites, wherein the SNP is associated with the HLA genotype. A further aspect of the invention provides a method of predicting, or assisting in predicting, the likelihood of developing an immune response in a subject, preferably a human subject. An immune response may be developed against an infection or inflammation. Alternatively, an immune response may comprise a host-graft response, e.g., rejection of organ transplants. The method comprises genotyping an HLA gene in the subject to be tested by determining the nucleotide present at one or more extra-genic SNP sites, wherein the SNP is associated with an HLA genotype.
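The exemplary host/graft method notes that, for an HLA-A*101 host, only rs2517862 and/or rs1655930 need to be assessed in the graft. That comparison can be illustrated with a simple exclusion rule on unphased genotypes: a graft cannot carry the host's SNP haplotype if, at any assessed SNP, its genotype lacks the allele that the haplotype carries. The sketch below assumes this rule. The allele letters in the example are hypothetical. Excluding a SNP haplotype excludes the HLA allele only to the extent that the two are correlated.

```python
from typing import Dict, Tuple

Genotype = Tuple[str, str]  # unphased alleles at one SNP, e.g. ("A", "G")


def graft_lacks_haplotype(host_haplotype: Dict[str, str],
                          graft_genotypes: Dict[str, Genotype]) -> bool:
    """True if the graft's genotypes rule out the host's SNP haplotype.

    host_haplotype maps SNP id -> allele carried on the host's HLA-linked haplotype.
    A single assessed SNP whose graft genotype lacks that allele is enough to
    exclude the haplotype; SNPs not genotyped in the graft are skipped.
    """
    for snp, host_allele in host_haplotype.items():
        genotype = graft_genotypes.get(snp)
        if genotype is not None and host_allele not in genotype:
            return True
    return False


# Hypothetical alleles for illustration; the real ones would come from TABLE 3.
host = {"rs2517862": "A", "rs1655930": "G"}
print(graft_lacks_haplotype(host, {"rs2517862": ("G", "G")}))  # True (no "A" copy)
print(graft_lacks_haplotype(host, {"rs2517862": ("A", "G"), "rs1655930": ("G", "T")}))  # False
```

A False result does not show that the graft carries HLA-A*101. It only means that the assessed SNPs cannot rule it out.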
Brief Description of the Drawings
The patent or application file contains at least one drawing executed in color. Copies of this patent or patent application publication with color drawing(s) will be provided by the Office upon request and payment of the necessary fee. Figures 1A-1E show an integrated SNP map of the 4-Mb MHC in CEPH Europeans. Figure 1A shows the location and exon-intron structure of a subset of genes above the map, for positional reference. Figure 1B shows 201 reliable, polymorphic SNPs, indicated on the map with ticks below the line; ticks above the line are placed at 100-kb spacing. Figure 1C shows haplotype blocks (below) and common haplotype variants (13% frequency) as colored lines, where line thickness indicates relative population frequency. Colors serve only to distinguish haplotypes and do not indicate block-to-block connections. Asterisks are found below the seven largest haplotype blocks. Figure 1D shows pairwise D' values (Lewontin 1964) for SNPs, indicated below the haplotype blocks. Note that each block represents a single D' calculation and is placed midway between the two SNPs analyzed. Red indicates strong LD and high confidence in the D' estimate (D'>0.95; LOD>=3.0). Blue indicates strong LD with low confidence in the estimate of D' (D'=1; LOD<3.0). White indicates weak LD. Figure 1E shows the relative recombination rate, based on the sperm meiotic map, in bar-graph form; the value on the line is the regional average, 0.49 cM/Mb. Green bars indicate recombination rates >0.49 cM/Mb, and yellow bars indicate rates <0.49 cM/Mb. The black arrowhead denotes a region of well-mapped recombination rate from Jeffreys et al. (2001). SNP marker density in that region is too low to comment on similarities between that study and the studies described herein. Note that five of seven long haplotype blocks map to regions where the recombination rate is <=0.49 cM/Mb.
The remaining two long blocks are found in domains where recombination rates are 0.64 cM/Mb and 0.83 cM/Mb (rates near or below the genome-wide average). Figures 2A-2D show block comparisons between the MHC and other autosomal regions. Figure 2A shows a plot of LD by physical distance, revealing that LD is extended in the MHC. Figure 2B shows that the average physical length of blocks in the MHC is longer than in the rest of the genome. Figure 2C shows that, measured by genetic distance, block size in the MHC is somewhat smaller than in the rest of the genome. Figure 2D shows that the number of haplotype variants in blocks not spanning classical HLA genes is the same as elsewhere in the genome. Figures 3A-3C show EHH analysis of haplotype blocks, microsatellites, HLA genes, and TAP genes in the region. EHH is computed as the percentage of instances in which two randomly selected chromosomes carrying the same variant at a locus have identical alleles at all SNPs assayed up to a particular distance from that locus. For example, an EHH of 0.5 at marker X means that 50% of possible pairings of chromosomes carrying a particular variant exhibit sequence identity from the locus to marker X. Figure 3A shows points representing the EHH at a distance of 0.25 cM from an allele at a particular locus. Outlying variants are indicated in color. The nine outlying variants define three extended haplotypes. The six points labeled "1" indicate variants that map on the DRB1*1501 haplotype (associated with lupus and MS). The two overlapping points labeled "2" indicate variants C*0701 and D6S2840*219, which are both found on a haplotype associated with autoimmune diabetes, lupus, and hepatitis. The point labeled "3" indicates DRB1*1101 (associated with pemphigoid disease). Figure 3B shows a recombination-distance-based map of the region; microsatellites and genes are labeled and indicated with ticks above the line. Figure 3C shows EHH values for loci that have at least one outlying variant.
Outlying variants were seen at 7 of the 48 independent loci tested. The x-axis denotes distance in cM. EHH values are converted to grayscale: an EHH of 1 is black, and an EHH of 0.5 is 50% gray. The solid lines 4-10 indicate the locus about which values were derived. The dotted lines 11-17 and 11'-17' indicate the 0.25-cM distance at which outliers were assessed. Two HLA-C alleles, C*0702 and C*0701, are extended, as are two DRB1 alleles, DRB1*1501 and DRB1*1101. The other HLA gene alleles with extended haplotypes are DQA1*0102 and DQB1*0602. The microsatellite alleles with extended haplotypes are D6S2793*244, D6S2876*11, and D6S2840*219. Figures 4A-4B show the correlation of HLA alleles with SNP haplotype background. A map of the region showing the placement of the SNPs and haplotypes assayed is shown for reference. Multi-SNP haplotypes are coded by single capital letters. Figure 4A shows SNP-HLA haplotypes sorted by HLA allele; percentages indicate the fraction of a particular HLA allele that falls on the indicated SNP haplotype. Figure 4B shows SNP-HLA haplotypes sorted by SNP haplotype allele; percentages indicate the fraction of a particular SNP haplotype allele that bears the indicated HLA allele. Counts are the overall number of chromosomes bearing the indicated SNP-HLA haplotype.
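The EHH definition given for Figures 3A-3C can be sketched directly. Among phased chromosomes carrying a given core allele, count the pairs that are identical at every SNP from the core out to a chosen marker, then divide by all possible pairs. This minimal Python version assumes that haplotypes are strings with one allele character per SNP in map order. It measures distance as a SNP index rather than in cM. The handling of unphased or missing genotypes described in the Materials and Methods is not modeled.

```python
from collections import Counter
from math import comb
from typing import Sequence


def ehh(haplotypes: Sequence[str], core: int, core_allele: str, marker: int) -> float:
    """Extended haplotype homozygosity of `core_allele` out to SNP index `marker`.

    Returns the fraction of pairs of chromosomes carrying the core allele that
    are identical at every SNP between the core and the marker (inclusive).
    """
    lo, hi = min(core, marker), max(core, marker)
    carriers = [h for h in haplotypes if h[core] == core_allele]
    pairs = comb(len(carriers), 2)
    if pairs == 0:
        return float("nan")  # fewer than two carriers: EHH is undefined
    # Chromosomes identical over the segment form one group; a group of k
    # chromosomes contributes C(k, 2) identical pairs.
    groups = Counter(h[lo:hi + 1] for h in carriers)
    return sum(comb(k, 2) for k in groups.values()) / pairs


haps = ["AAAA", "AAAT", "AATT", "GAAA"]
print(ehh(haps, core=0, core_allele="A", marker=1))  # 1.0: all three carriers still identical
print(ehh(haps, core=0, core_allele="A", marker=3))  # 0.0: every carrier now differs
```

Grouping identical segments avoids an explicit loop over all pairs. This gives the same fraction as pairwise comparison.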
Detailed Description of the Invention
Overview
An SNP haplotype map of the MHC region was created in order to fully map its variation, especially variation associated with disease. To integrate this map with the wealth of findings from association studies, 201 reliable, polymorphic, evenly spaced SNPs (target density: one SNP every 20 kb) were genotyped in 136 independent chromosomes. These chromosomes were also genotyped for nine HLA genes, two TAP genes (involved in antigen processing), and 18 microsatellites. Markers were genotyped in families (18 multigenerational European pedigrees) to allow direct assessment of chromosomal phase and, thus, simple reconstruction of haplotypes. Using these SNP data, the haplotype patterns of the region were examined and mapped relative to both physical distance and genetic distance, the latter as assayed by an exceedingly high-resolution recombination map (Figures 1A-1E). This recombination map is the result of the analysis of 20,000 sperm meioses from 12 men (Cullen et al. 2002). This SNP density is a large first step toward a comprehensive characterization of the patterns of common variation in the MHC. Here, this map is used first to explore the structure of LD in the region, with respect to both haplotype blocks and extended haplotypes. Next, SNP-haplotype variation in the MHC was examined, first considering regions between the classical HLA loci and then examining SNP-haplotype variation across these genes. Whether the SNP haplotype diversity near classical HLA loci contains enough information to predict the HLA allele carried on the chromosome was also examined. Figures 1A-1E show an integrated map of the SNP, microsatellite, and HLA variation in the MHC. This map shows that, aside from the classical HLA loci, the variation and LD structure of the MHC are not different from those of a genome-wide control data set.
Specifically, LD appears to extend over longer physical distances in the MHC, but this seems to be accounted for by the reduced recombination rate in the region. Furthermore, this map shows that, in the regions that do not span classical HLA loci, the number of common haplotype alleles in the MHC is not different from that in the rest of the genome. The integrated map of Figures 1A-1E and the results shown in Figures 3A-3C and 4A-4B demonstrate that multiblock SNP haplotypes contain considerable predictive information for common HLA alleles at HLA-A, HLA-B, HLA-C, and HLA-DRB1. Multiblock SNP haplotypes should enable cost-efficient, large-scale exploration of the variation at the classical HLA loci and beyond. An additional implication of these results is that multiblock SNP haplotypes may be sufficient to identify low-frequency variants throughout the genome. Such low-frequency variants would likely be missed in single, block-based, common-variant analysis; however, their contribution to disease may be assayed by using multiblock haplotypes in the analysis. This integrated variation map of the MHC has considerable utility. In the 50 years of study since its discovery, the MHC has been implicated in almost every human inflammatory and autoimmune disease. Although the MHC has been studied by typing of the classical HLA genes and microsatellites for many years, only rarely has this analysis definitively identified causal variation. Often, association studies using these methods implicate more than one allele at a single locus as influencing disease susceptibility. Although this may represent allelic heterogeneity underlying disease, reinterpreting such results with attention to shared SNP-haplotype variation might point to additional hypotheses regarding the causal variant. For instance, one may find that two different disease-predisposing HLA alleles share a common SNP haplotype. This would suggest that a variant carried on that haplotype may, in fact, be the underlying cause of disease. Another common finding in MHC studies is that an extended haplotype, rather than a single variant, is associated with disease. A uniform map of the variation in the region would allow fine mapping of association signals on the basis of rare recombinant chromosomes.
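The predictive performance behind this conclusion was estimated by leave-one-out cross-validation (e.g., 82.6% for HLA-A, as reported in Example 2). Each chromosome is held out in turn, and a predictor is built from the rest. Chromosomes whose SNP haplotype occurs only once are skipped. The minimal Python sketch below is not the LeaveOneOut program. Because the program's actual rule is not described here, the sketch assumes that the predictor simply assigns each multiblock SNP haplotype its most frequent HLA allele in the training data.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Optional, Tuple

Chromosome = Tuple[str, str]  # (multiblock SNP haplotype, HLA allele)


def build_predictor(training: List[Chromosome]) -> Dict[str, str]:
    """Map each SNP haplotype to the HLA allele it most often carries."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    for snp_hap, hla in training:
        counts[snp_hap][hla] += 1
    # Ties go to the allele seen first (Counter.most_common keeps insertion order).
    return {hap: c.most_common(1)[0][0] for hap, c in counts.items()}


def leave_one_out_accuracy(chromosomes: List[Chromosome]) -> Optional[float]:
    """Fraction of tested held-out chromosomes whose HLA allele is predicted correctly."""
    tested = correct = 0
    for i, (snp_hap, true_hla) in enumerate(chromosomes):
        training = chromosomes[:i] + chromosomes[i + 1:]
        predictor = build_predictor(training)
        if snp_hap not in predictor:
            continue  # haplotype seen only in the held-out chromosome: skipped
        tested += 1
        correct += predictor[snp_hap] == true_hla
    return correct / tested if tested else None
```

With this rule, a chromosome carrying a minority HLA allele on its SNP haplotype is always mispredicted. That is one reason success rates stay below 100% even for strongly correlated haplotypes.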
Because SNPs are more abundant, more reliably typed, and more cost-efficient than microsatellites, they are an excellent choice for this sort of large-scale, high-density genotyping. A denser sampling of all the haplotype variation in the region will allow researchers to fully consider all of the 120 genes that lie in the MHC, rather than focusing solely on the classical HLA loci. The map shown in Figures 1A-1E identifies haplotype blocks covering 24.5% of the MHC. On the basis of the estimated average size of blocks in this region, SNP coverage must be increased fourfold to reach saturation. The map in Figures 1A-1E is based only on the genotypes of individuals of European ancestry; variation in other populations must also be examined to unify MHC association results between populations. The additional SNPs to be discovered, as well as genotype information in other populations, may be employed to build a more complete map according to the materials and methods described herein that were used to build the map shown in Figures 1A-1E. Ultimately, a full understanding of the patterns of LD and haplotype diversity of this region should allow the identification of a subset of SNPs required for disease studies. This will allow MHC association studies to be completed cost-effectively by using a combination of haplotype-tagging and HLA allele-tagging SNPs. A large number of SNPs were used to construct the map shown in Figures 1A-1E, and more SNPs will be needed to fully describe the haplotype structure of the region. Even so, an estimated 10-15 SNPs per locus may be sufficient for common, classical HLA alleles. Moreover, in cases where there is already significant association to a particular locus, these informative SNPs may be used to map outward from the original signal and delimit the region of association. An estimated few dozen SNPs may be needed in such endeavors.
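The haplotype blocks behind this coverage estimate were defined from pairwise D' (Lewontin 1964), the normalized LD measure plotted in Figure 1D. The minimal sketch below assumes phased two-locus haplotype counts for biallelic SNPs. It computes D = p_AB - p_A p_B and divides it by the largest magnitude attainable given the allele frequencies. The likelihood-based confidence limits used to call blocks are not computed here, and the function name is hypothetical.

```python
def d_prime(n_AB: int, n_Ab: int, n_aB: int, n_ab: int) -> float:
    """|D'| (Lewontin 1964) for two biallelic loci from phased haplotype counts."""
    n = n_AB + n_Ab + n_aB + n_ab
    p_AB = n_AB / n
    p_A = (n_AB + n_Ab) / n  # frequency of allele A at the first locus
    p_B = (n_AB + n_aB) / n  # frequency of allele B at the second locus
    d = p_AB - p_A * p_B
    # D_max depends on the sign of D and on the allele frequencies.
    if d > 0:
        d_max = min(p_A * (1 - p_B), (1 - p_A) * p_B)
    else:
        d_max = min(p_A * p_B, (1 - p_A) * (1 - p_B))
    return abs(d) / d_max if d_max > 0 else 0.0


print(d_prime(40, 10, 0, 50))   # 1.0: one of the four haplotypes (aB) is absent
print(d_prime(25, 25, 25, 25))  # 0.0: the loci are independent
```

D' reaches 1 whenever one of the four two-locus haplotypes is absent, however common the others are. This is why the text pairs high D' with a LOD or confidence threshold before treating LD as strong.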
SNP-based haplotype approaches will allow the examination of larger disease cohorts and enable the identification of rare recombinant haplotypes that would refine association signals and potentially identify the causal alleles for MHC-associated diseases.
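Phasing with family information, as used for the 18 multigenerational pedigrees, rests on Mendelian transmission. For a single marker in one parent-offspring trio, the logic can be sketched as follows: list the ways the child's two alleles could have been inherited, and keep the assignments consistent with both parents. This minimal Python illustration is not the study's pipeline. It returns None when phase is ambiguous (e.g., child and both parents heterozygous for the same alleles) or when the genotypes are Mendelian-inconsistent. It does not use additional generations, and the function name is hypothetical.

```python
from typing import Optional, Tuple

Genotype = Tuple[str, str]  # unphased alleles at one marker


def phase_trio(child: Genotype, mother: Optional[Genotype],
               father: Optional[Genotype]) -> Optional[Tuple[str, str]]:
    """Return (maternal allele, paternal allele) for the child, or None.

    A parent genotype of None (missing) is treated as compatible with anything.
    None is returned if phase is ambiguous or no assignment is Mendelian-consistent.
    """
    a, b = child
    options = [(a, b), (b, a)] if a != b else [(a, a)]
    consistent = [
        (m, p) for m, p in options
        if (mother is None or m in mother) and (father is None or p in father)
    ]
    return consistent[0] if len(consistent) == 1 else None


print(phase_trio(("A", "G"), ("A", "A"), ("A", "G")))  # ('A', 'G'): G must be paternal
print(phase_trio(("A", "G"), ("A", "G"), ("A", "G")))  # None: ambiguous
```

Because allele labels are arbitrary strings, the same rule applies to multiallelic microsatellite genotypes.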
Integrated Map
SNPs used in creating the integrated map shown in Figure 1 include the SNPs listed in TABLE 1. All SNPs are located on human chromosome 6, and their respective chromosome positions are shown in column CHROM_POS. The frequencies of the two alleles are shown in columns FREQ1 and FREQ2. The primer sequences, as well as probe information and flanking sequences for the SNPs, are described in detail at: http://www.broad.mit.edu/mpg/idrg/projects/HLA data SNP Info.xls (incorporated herein by reference in its entirety). The primer sequences for the SNPs are also provided herein in TABLE 2.
rs2187978 6 30418709 C 0.888888889 T 0.111111111
SNP CHROM CHROM_POS ALLELE1 FREQ1 ALLELE2 FREQ2 rs1264562 6 30428292 A 0.362204724 C 0.637795276 rs1150769 6 30438720 A 0.438461538 G 0.561538462 rs1264511 6 30469758 C 0.571428571 G 0.428571429 rs2524172 6 30489460 A 0.125925926 G 0.874074074 rs2021720 6 30504972 C 0.410852713 G 0.589147287 rs1059510 6 30513496 C 0.669230769 T 0.330769231 rs2844724 6 30524708 C 0.348484848 T 0.651515152 rs2516650 6 30550400 C 0.179104478 G 0.820895522 rs1362119 6 30555226 A 0.656716418 T 0.343283582 rs1468079 6 30562131 A 0.338461538 C 0.661538462 rs2516640 6 30572208 C 0.121212121 T 0.878787879 rs2074505 6 30576689 C 0.351145038 T 0.648854962 rs3130242 6 30593090 A 0.654411765 G 0.345588235 rs1264440 6 30606834 C 0.669117647 T 0.330882353 rs2252745 6 30635057 C 0.325925926 T 0.674074074 rs1059612 6 30764464 C 0.858208955 T 0.141791045 rs3129973 6 30776893 C 0.851851852 T 0.148148148 rs3130673 6 30802269 G 0.843283582 T 0.156716418 rs1264377 6 30820115 C 0.828358209 T 0.171641791 rs1264352 6 30845173 C 0.172932331 G 0.827067669 rs2535335 6 30868020 A 0.7 G 0.3 rs3095354 6 30891957 C 0.544117647 T 0.455882353 rs1264332 6 30901222 C 0.715384615 G 0.284615385 rs1264314 6 30926804 A 0.782945736 T 0.217054264 rs1264297 6 30940639 A 0.705882353 C 0.294117647 rs2532936 6 30950044 G 0.298507463 T 0.701492537 rs3132571 6 30961148 A 0.632352941 G 0.367647059 rs2517434 6 30999873 C 0.92481203 T 0.07518797 rs2253417 6 31025801 A 0.088235294 G 0.911764706 rs1619376 6 31039164 A 0.213235294 G 0.786764706 rs2523897 6 31049548 A 0.15037594 G 0.84962406 rs2844670 6 31061565 A 0.792592593 G 0.207407407 rs2517523 6 31082273 A 0.635658915 G 0.364341085 rs2523882 6 31097806 C 0.75 T 0.25 rs2517407 6 31122306 C 0.287878788 T 0.712121212 rs1064190 6 31130758 G 0.488549618 T 0.511450382
rs1265099 6 31161025 A 0.679389313 G 0.320610687 rs1265114 6 31173035 C 0.661764706 T 0.338235294
SNP CHROM CHROM_POS ALLELE1 FREQ1 ALLELE2 FREQ2 rs915660 6 31199448 C 0.874074074 G 0.125925926 rs1265181 6 31211656 C 0.257575758 G 0.742424242 rs3132502 6 31239363 A 0.280701754 G 0.719298246 rs1793895 6 31247789 G 0.748148148 T 0.251851852 rs3134756 6 31269208 C 0.257352941 T 0.742647059 rs1793892 6 31275440 C 0.139705882 T 0.860294118 rs3130542 6 31286439 A 0.274074074 G 0.725925926 rs364415 6 31327348 C 0.80952381 T 0.19047619 rs3130690 6 31340260 G 0.832061069 T 0.167938931 rs2524227 6 31356125 A 0.353383459 G 0.646616541 rs2854008 6 31366306 A 0.291044776 G 0.708955224 rs2853996 6 31386696 C 0.821705426 T 0.178294574 hCV2995747 6 31393338 A 0.777777778 G 0.222222222 rs2301747 6 31425383 C 0.902985075 G 0.097014925 rs2256184 6 31434180 A 0.612403101 G 0.387596899 hCV2995705 6 31441607 C 0.066176471 G 0.933823529 rs2855804 6 31521290 C 0.681481481 T 0.318518519 rs3132464 6 31531399 C 0.259259259 T 0.740740741 rs2516400 6 31553393 C 0.671641791 T 0.328358209 hCV3273612 6 31564553 A 0.674242424 T 0.325757576 rs361525 6 31606263 A 0.068181818 G 0.931818182 rs986475 6 31619673 C 0.066666667 T 0.933333333 rs2857595 6 31631860 A 0.161764706 G 0.838235294 rs2844476 6 31645038 A 0.378787879 G 0.621212121 rs750332 6 31670185 C 0.233082707 T 0.766917293 rs1052486 6 31673871 C 0.465648855 T 0.534351145 rs3117583 6 31682962 A 0.857142857 G 0.142857143 rs805256 6 31698902 A 0.803149606 G 0.196850394 rs805290 6 31711788 C 0.735294118 T 0.264705882 rs805281 6 31724677 A 0.736842105 G 0.263157895 rs805292 6 31753147 A 0.21641791 G 0.78358209 rs1150793 6 31789133 A 0.961538462 G 0.038461538 rs707928 6 31814030 A 0.681481481 G 0.318518519 rs480092 6 31836340 C 0.174242424 T 0.825757576 rs2075799 6 31850226 C 0.941176471 T 0.058823529
rs539689 6 31857089 C 0.492647059 G 0.507352941
rs2763979 6 31866094 C 0.653846154 T 0.346153846
rs3130679 6 31879245 A 0.904 G 0.096
rs574914 6 31890829 A 0.151515152 G 0.848484848
rs660550 6 31909117 A 0.52238806 C 0.47761194
rs605203 6 31918651 G 0.351145038 T 0.648854962
rs589428 6 31919809 G 0.62962963 T 0.37037037
rs558702 6 31941915 A 0.082706767 G 0.917293233
rs419788 6 31990590 C 0.705882353 T 0.294117647
rs429608 6 31992054 A 0.112781955 G 0.887218045
rs433061 6 32043622 A 0.090225564 G 0.909774436
rs1269852 6 32119789 C 0.080882353 G 0.919117647
rs2269425 6 32150480 C 0.823076923 T 0.176923077
rs204989 6 32188886 A 0.110294118 G 0.889705882
rs2071280 6 32191654 C 0.294117647 G 0.705882353
rs2071277 6 32210496 C 0.435114504 T 0.564885496
rs3130316 6 32250273 C 0.691588785 T 0.308411215
rs926070 6 32283275 C 0.346153846 T 0.653846154
rs742697 6 32318423 C 0.345864662 T 0.654135338
rs3129960 6 32327712 A 0.227941176 G 0.772058824
rs2022534 6 32333790 A 0.596899225 G 0.403100775
rs3129907 6 32350292 A 0.759398496 G 0.240601504
rs2143462 6 32361583 C 0.860294118 T 0.139705882
rs1555115 6 32381050 C 0.895522388 G 0.104477612
rs2395158 6 32401103 A 0.880597015 G 0.119402985
rs2395161 6 32414066 A 0.867647059 C 0.132352941
rs983561 6 32430210 A 0.785185185 C 0.214814815
rs2239802 6 32438402 C 0.723880597 G 0.276119403
rs7194 6 32439002 A 0.563492063 G 0.436507937
rs1987529 6 32502240 A 0.85 G 0.15
rs1059544 6 32578551 C 0.268292683 T 0.731707317
rs2858860 6 32598186 G 0.454545455 T 0.545454545
rs2395253 6 32717006 A 0.053030303 G 0.946969697
rs2857210 6 32738722 A 0.35483871 G 0.64516129
rs719654 6 32749097 A 0.22962963 G 0.77037037
rs2157080 6 32758398 A 0.376923077 G 0.623076923
rs2621343 6 32771751 C 0.383458647 T 0.616541353
rs1383267 6 32830393 C 0.560606061 T 0.439393939
rs1029295 6 32853431 C 0.105263158 T 0.894736842
rs241404 6 32862944 C 0.428571429 T 0.571428571
rs2187688 6 32868648 A 0.446153846 G 0.553846154
rs151719 6 32900606 A 0.785714286 G 0.214285714
rs188245 6 32955171 C 0.529411765 T 0.470588235
rs663310 6 33009016 C 0.217054264 T 0.782945736
rs2071351 6 33045675 A 0.84496124 G 0.15503876
rs2144014 6 33067793 C 0.703125 T 0.296875
rs3130216 6 33079418 A 0.574626866 G 0.425373134
rs3129272 6 33099767 C 0.696296296 T 0.303703704
rs2294478 6 33102058 A 0.474074074 C 0.525925926
rs734181 6 33133246 C 0.201550388 G 0.798449612
rs2076311 6 33148710 A 0.298507463 C 0.701492537
rs2855433 6 33161160 A 0.701492537 C 0.298507463
rs421446 6 33178124 A 0.729323308 G 0.270676692
rs213213 6 33186822 C 0.725925926 T 0.274074074
rs213194 6 33198941 A 0.234375 G 0.765625
rs105445 6 33220331 C 0.146153846 G 0.853846154
rs464865 6 33259080 A 0.441176471 G 0.558823529
rs1014779 6 33278611 A 0.533333333 G 0.466666667
rs1061783 6 33284571 A 0.544776119 G 0.455223881
rs3130267 6 33318929 G 0.5390625 T 0.4609375
rs456993 6 33360326 C 0.444444444 T 0.555555556
rs211457 6 33367887 C 0.873134328 T 0.126865672
rs1705003 6 33388001 A 0.888888889 G 0.111111111
rs2076775 6 33396500 C 0.634328358 G 0.365671642
rs453590 6 33405670 C 0.634328358 T 0.365671642
rs1755047 6 33433266 C 0.451851852 G 0.548148148
rs210190 6 33466198 A 0.066176471 G 0.933823529
rs1755038 6 33467280 A 0.909774436 G 0.090225564
rs769051 6 33476004 G 0.659090909 T 0.340909091
rs210180 6 33487120 A 0.348148148 T 0.651851852
rs210196 6 33509584 A 0.669117647 G 0.330882353
rs210203 6 33513090 A 0.37037037 G 0.62962963
rs210132 6 33538781 G 0.536764706 T 0.463235294
rs210135 6 33542803 A 0.827868852 T 0.172131148
rs210139 6 33545520 A 0.564885496 C 0.435114504
rs210145 6 33549551 C 0.467213115 G 0.532786885
rs396746 6 33558906 A 0.148148148 C 0.851851852
rs210120 6 33576523 A 0.588235294 G 0.411764706
rs407415 6 33581077 A 0.786764706 G 0.213235294
rs999943 6 33626118 C 0.266666667 T 0.733333333
rs2229634 6 33640290 C 0.701492537 T 0.298507463
rs658087 6 33667130 A 0.148148148 T 0.851851852
rs2281829 6 33677752 A 0.544117647 G 0.455882353
rs1555965 6 33679261 A 0.555555556 G 0.444444444
rs549652 6 33688213 A 0.147058824 G 0.852941176
rs608971 6 33703990 C 0.78030303 T 0.21969697
rs530614 6 33716891 A 0.161764706 G 0.838235294
rs2395449 6 33730616 A 0.388059701 T 0.611940299
rs943473 6 33745761 C 0.816176471 G 0.183823529
rs2395402 6 33755534 A 0.559701493 G 0.440298507
rs2894342 6 33776504 A 0.25 C 0.75
rs1547668 6 33777490 A 0.139344262 G 0.860655738
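The allele frequency table above lends itself to simple mechanical checks. The sketch below is only an illustration, not part of the described method: it assumes rows in the column order of the table header (SNP, CHROM, CHROM_POS, ALLELE1, FREQ1, ALLELE2, FREQ2), skips repeated header lines, confirms that the two frequencies of each biallelic SNP sum to 1, and reports the minor allele. The names `SnpFreq` and `parse_rows` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class SnpFreq:
    snp: str
    chrom: str
    pos: int
    allele1: str
    freq1: float
    allele2: str
    freq2: float

    def minor(self):
        """Return (minor allele, minor allele frequency)."""
        if self.freq1 <= self.freq2:
            return self.allele1, self.freq1
        return self.allele2, self.freq2


def parse_rows(lines, tol=1e-6):
    """Parse whitespace-separated rows; blank lines and header lines are skipped."""
    rows = []
    for line in lines:
        fields = line.split()
        if not fields or fields[0] == "SNP":
            continue
        snp, chrom, pos, a1, f1, a2, f2 = fields
        row = SnpFreq(snp, chrom, int(pos), a1.upper(), float(f1), a2.upper(), float(f2))
        # For a biallelic SNP the two allele frequencies must sum to 1.
        if abs(row.freq1 + row.freq2 - 1.0) > tol:
            raise ValueError(f"frequencies for {snp} do not sum to 1")
        rows.append(row)
    return rows


TABLE = """SNP CHROM CHROM_POS ALLELE1 FREQ1 ALLELE2 FREQ2
rs1264562 6 30428292 A 0.362204724 C 0.637795276
rs2524172 6 30489460 A 0.125925926 G 0.874074074
rs1059510 6 30513496 C 0.669230769 T 0.330769231"""

for row in parse_rows(TABLE.splitlines()):
    print(row.snp, *row.minor())
```

Upper-casing the allele letters is a small design choice that also absorbs extraction debris such as a lowercase allele code.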
SNP | HG12 | Primer 1 (SEQ ID NOS: 1-639) | Primer 2 (SEQ ID NOS: 640-1278)
rs1611750 | 29891606 | ACGTTGGATGTGAGGACCACAAAAGTCAGG | ACGTTGGATGCCCATCAATTGACCCAGTTC
rs2517862 | 29898525 | ACGTTGGATGGGGAAAACAGCAAGGTACAG | ACGTTGGATGTGTTCTTTCTCCCTTTGCAC
rs885933 | 29913724 | ACGTTGGATGTGTAGCCAGTCATAGCTGTC | ACGTTGGATGACTTCTCAGCTGCATCGATG
rs886399 | 29913789 | ACGTTGGATGGCTATCTGCCCTTTTGCTAC | ACGTTGGATGTGGCTACATTTGACACCCTC
rs2394233 | 29924799 | ACGTTGGATGGATAAATGGGTGTTGTTTCG | ACGTTGGATGGCAAAACACGGAAAAAGTTC
rs1054175 | 29925728 | ACGTTGGATGGGCCCCATGATGTATAAATG | ACGTTGGATGACAGGTACACTGCAAAAGTG
rs1611545 | 29931342 | ACGTTGGATGACAATACCTGCAGTACCCTC | ACGTTGGATGAAAACTTCCCTCATCCCAGC
rs1632910 | 29935874 | ACGTTGGATGGGCTCAACAGACTCGGAATG | ACGTTGGATGACGTGAGCATATGAGGGCAT
rs2517762 | 29938895 | ACGTTGGATGTCCCTGGAATACTGATGAGG | ACGTTGGATGAAAGCAGAGAACAAGGCCTG
rs1655930 | 29942211 | ACGTTGGATGTGTAGTAATCCTAGTGCTGG | ACGTTGGATGATGGGTCCAATTTTCCACCC
rs2905764 | 29953186 | ACGTTGGATGTTTGGTGCCAGAGAGTAAGC | ACGTTGGATGTTCTGTCTCATGCACTCAGG
rs1616549 | 29957706 | ACGTTGGATGAGTTCACGTGGACATCCATG | ACGTTGGATGTTTGTGCTGAAGTGTGCAGG
rs376253 | 29957969 | ACGTTGGATGGGGTTATGGTGCATACGTTC | ACGTTGGATGTCACTCCAGGACTCAGGTTC
rs1961135 | 29958142 | ACGTTGGATGGAACCCTCCTTTTCAGTGAC | ACGTTGGATGGGCTGATACTCTGGGTTATC
rs2735099 | 29958264 | ACGTTGGATGGTCAGAAAAGATGGGCAGAC | ACGTTGGATGTGCTCCTCAATTCCACATGC
rs2524037 | 29958386 | ACGTTGGATGATAGGCTCCTTTGCAGAAGG | ACGTTGGATGAAGAACCTTGGGACACGATG
rs2517706 | 29963094 | ACGTTGGATGGGATTAGAAGCATGAGCCAC | ACGTTGGATGGGCACACAAGGTGCATTTTG
rs2975041 | 29964111 | ACGTTGGATGCTCCATTCTCTGTCTCAAAG | ACGTTGGATGCTTGTATCTGACTGATTTTC
rs382875 | 29967813 | ACGTTGGATGAGTCTTTGAGGGAAAGGAGG | ACGTTGGATGAAAATTCCTGGTGCCCAAGG
rs2517701 | 29969404 | ACGTTGGATGATTGGAGTCATGGGAACCTG | ACGTTGGATGACCCTAGGTAAGAGGATGTG
rs2517699 | 29970029 | ACGTTGGATGCCCCACTTCTCACATGATAC | ACGTTGGATGGCCTCTGTCTTCTCTTTCTG
rs435766 | 29983284 | ACGTTGGATGTCCATGCCTTTCTGTGTGGG | ACGTTGGATGTGAGGAATAGGGGTCAGCAG
rs410909 | 29992085 | ACGTTGGATGCACACAGATTCACACACACG | ACGTTGGATGAAGTCAGCCTGTCCCACAAC
rs2246555 | 29992480 | ACGTTGGATGATCTCCCCCCTTCTTCAGAG | ACGTTGGATGGTCTCTTTTTCCTGGAGGTG
rs2394255 | 29993239 | ACGTTGGATGGGTGTAAAGGAAACTGCAGG | ACGTTGGATGATGGAGACAGTCCTTTCCAG
rs1264807 | 30002241 | ACGTTGGATGTAACATTCCCCTTTCTCCAC | ACGTTGGATGGAGATATATCTTACCCTAACC
rs1632926 | 30004710 | ACGTTGGATGGCACAATTCTCATCCGACAC | ACGTTGGATGTAACCTCTGTCTCCTTCCAG
rs2530388 | 30010564 | ACGTTGGATGCCCAACTCTCAACAAGGTAG | ACGTTGGATGTCAGCCTCGTTATTCCTTCC
rs356962 | 30012120 | ACGTTGGATGAACACAGAAGGCAGAGGTTG | ACGTTGGATGAAAGTCTTGCTCTTGTCCCC
rs356963 | 30013075 | ACGTTGGATGTGTGGTTGCTCCATTCATGC | ACGTTGGATGCAGACAAATGGCAGTTAGCC
rs2286405 | 30016829 | ACGTTGGATGAAAACAGGCAGTGCATGAGC | ACGTTGGATGTCACCTCAAAGTTGCAAGCG
rs2240619 | 30018890 | ACGTTGGATGATTCCTCTCCGTCAGGACAG | ACGTTGGATGATCTCCTGTAGATCTCCCGG
rs886997 | 30021056 | ACGTTGGATGACAAGGTTCTACTGAAGGGC | ACGTTGGATGACCATGGGCTTTATGTGGTC
rs3129012 | 30032096 | ACGTTGGATGCCCACTTGGCATGGTGAATC | ACGTTGGATGAAGGTCTTAGGAGAGGGCTG
rs1150743 | 30035516 | ACGTTGGATGCTGGACATTTCATCAGGACC | ACGTTGGATGATCTCAGCATGTGAGGCTTC
rs259938 | 30051799 | ACGTTGGATGGAGGCTATGGTACCAAACTG | ACGTTGGATGCTGTGGATTCTGGGATAGAG
rs2844800 | 30061564 | ACGTTGGATGCTCGGACTCCTTTGCTCATT | ACGTTGGATGCCCACAGGAAAGGAGAAAAG
rs259948 | 30065078 | ACGTTGGATGACTTTCACTCCACTGCCTTC | ACGTTGGATGAAACTTTCGTGCTGCAGGTG
rs3132129 | 30071689 | ACGTTGGATGCGTCTCCCTTTGTAAGACAG | ACGTTGGATGACTGCTGAAGAGTGACAAGC
rs1150736 | 30098575 | ACGTTGGATGTCGCGGAGTTGTTGGTGGAAG | ACGTTGGATGTAAACTTCCACAGGGCCTCC
rs1264709 | 30112291 | ACGTTGGATGTTTTGGAGCTAGGATTCTGG | ACGTTGGATGCTACCTCACTTCTGCATTTC
rs1264701 | 30122157 | ACGTTGGATGCCCTCTATGCTCACTATCTC | ACGTTGGATGAAAAGAGCCAAGGGCAACAC
rs2023472 | 30160044 | ACGTTGGATGATTTTTTCAGGCTCCCTGTG | ACGTTGGATGTCAGTTTCTCAACCCAACCC
rs2427749 | 30163400 | ACGTTGGATGAGTGGGGTCACAATGTCTTC | ACGTTGGATGTATTGTTTGAGCCTGGGAGG
rs 015465 | 30170856 | ACGTTGGATGGATAGTGCCCATTCACACTG | ACGTTGGATGAATGTCCAGAGCTGATGAGG
rs1419673 | 30181231 | ACGTTGGATGAGTCATTGGCCTGTTTTTCG | ACGTTGGATGTGACATCTACAAACAGTTTC
rs1362104 | 30186130 | ACGTTGGATGGGGAAAAAAAACCGTAAGTG | ACGTTGGATGGGCAGTTGTAAATATTTTTC
rs1573297 | 30200857 | ACGTTGGATGAGTCTTGTGGCTGCAAGAAC | ACGTTGGATGGGGAAGAATCTGCTTCCAAG
rs2285797 | 30204577 | ACGTTGGATGATCCAAAGCACCCAAACCTG | ACGTTGGATGCAAAAGACACAGCTAAAGCC
rs2074477 | 30216292 | ACGTTGGATGTTCTTTGCACTCCACCTCTG | ACGTTGGATGTGGATGTGTGGTAGTTCCTG
rs2844786 | 30228928 | ACGTTGGATGTAGCCAAAGAAATCCTGAGC | ACGTTGGATGCCTGACAAAAATGTCTCTAG
rs2074473 | 30238687 | ACGTTGGATGACTTCCACTTCCCAGTAGAC | ACGTTGGATGTGTACAAGAGTGCCTACCTG
rs2517614 | 30248435 | ACGTTGGATGCATAAAAAGCTTCCACCAGC | ACGTTGGATGTCATCACTGCCATCACAAGC
rs2021722 | 30258150 | ACGTTGGATGACACTTGGCTTACTTTCCCC | ACGTTGGATGAGCCCTGGTAGTTTTTGTGG
rs885916 | 30260107 | ACGTTGGATGTGGATTTTTCTTCCCCACTC | ACGTTGGATGTAAGATGTTGCCACAGTTCC
rs3129696 | 30270016 | ACGTTGGATGAACTCCTGACGTGATCTGCC | ACGTTGGATGAAAAAATAGGCTGGGCACGG
rs928824 | 30280200 | ACGTTGGATGCGCAAAAAAAAGTTGCAGTC | ACGTTGGATGGGAATTGTTGGGTGATATGG
rs3132656 | 30289505 | ACGTTGGATGACCATGATTCTGAGGCCTCC | ACGTTGGATGCTGCCGATAAAGACATACCC
rs968909 | 30300336 | ACGTTGGATGCAGTGTGAAATTGGACCCTG | ACGTTGGATGTCCCTAAAGGGATCAATGGC
rs1264626 | 30303763 | ACGTTGGATGAGTCATGAAAGATCCACCCC | ACGTTGGATGCCCACCCAAATTTCGTGTTC
rs261956 | 30320688 | ACGTTGGATGTTAGCAGGTATGGTGGCATG | ACGTTGGATGAACTCCTGGCTCAAATGATC
rs261945 | 30327954 | ACGTTGGATGCATGGCCTCTTATGAGAACC | ACGTTGGATGGGGCAACAAGAGTGAAACTG
rs3094628 | 30341046 | ACGTTGGATGAGGTGTGTTGGAAGGTGGTG | ACGTTGGATGCCCATGCATGCAATTACCTC
rs1264582 | 30350326 | ACGTTGGATGTTGATTCCCCCTGCTGCTTC | ACGTTGGATGTCGTGTCAGTGGAAGCTGGG
rs1110464 | 30351625 | ACGTTGGATGCTCAGAACTGCTGAAAACTG | ACGTTGGATGAGACTCGTTGCTCTCTTTTC
rs1264579 | 30358950 | ACGTTGGATGTCTGCCTTCTTTGCTCAAGC | ACGTTGGATGATTAGCTGAGTCTGGTGGTG
rs3132608 | 30665408 | ACGTTGGATGGCACCCACTGACAGTAAGAG | ACGTTGGATGTAGGGAGAAAGATCGAAGGG
rs1127955 | 30676469 | ACGTTGGATGAAACAGAACCTGACACCAGC | ACGTTGGATGTCCCAAATGTTCCCACAAGC
rs1124795 | 30677163 | ACGTTGGATGTCCTGACCCCTATCATCCTG | ACGTTGGATGTATGCTCTGGGAGCCCTCAAC
rs1076829 | 30682989 | ACGTTGGATGCAGGCACACAGCTTTTTCAC | ACGTTGGATGTCAGTTGGAGAAACCCACAC
rs1076828 | 30684025 | ACGTTGGATGTGATCTGCAACCTATCCCAG | ACGTTGGATGTATGGCTAACTTGTCCTGGC
rs2285320 | 30696449 | ACGTTGGATGATGGCGACTCACGCTCCCTG | ACGTTGGATGTAGAGGTCCCAAGGTAGCTG
rs2394392 | 30705580 | ACGTTGGATGAGAGTTCCTCTGACCCAGAC | ACGTTGGATGTTGCAGCAGAGCTGGGACAAG
rs2239888 | 30705674 | ACGTTGGATGCCTCTGTACTTTATTTTCTAC | ACGTTGGATGTGAGGAGACAGGCAGGGTAG
rs1075496 | 30714001 | ACGTTGGATGCCATGCTTTTTGCAACTGCC | ACGTTGGATGTTCCATCCCTAGTTTCTGCC
rs3130644 | 30716427 | ACGTTGGATGAAGTGCTGGGATTACAGCTG | ACGTTGGATGCAGACAGCAGGTATGGTAAG
rs3094090 | 30725716 | ACGTTGGATGACCTGTAGTCCCAGCTACTC | ACGTTGGATGTCTCGGCTCACTACAATCTC
rs2239886 | 30726450 | ACGTTGGATGGCTCTCTCTAAATGCTAGGC | ACGTTGGATGAGCAGTCAGCATCAAAGCTC
rs2394394 | 30733626 | ACGTTGGATGACCTGAGATCGGGAGCTTGA | ACGTTGGATGTTACAGGCATGCACCACCAC
rs2075015 | 30736108 | ACGTTGGATGAGCTTGGCTTTTCTCCAGAG | ACGTTGGATGTCCATGGAGTAGGTACAAGG
rs25525 | 30746293 | ACGTTGGATGATCCCCTTTGGGTGAATCTG | ACGTTGGATGAGACTTGTCATTCCAGGTCC
rs2244011 | 30750829 | ACGTTGGATGCAGACTGTTTGAGCCTGTTG | ACGTTGGATGAAGTTGAAAACCTCCAGCCC
rs1059612 | 30764464 | ACGTTGGATGCCCCCCTCATTTTGACATCC | ACGTTGGATGTCATGGCCCACATGACTGTG
rs3129973 | 30776893 | ACGTTGGATGAGTTCCCAACCCAAATCCAG | ACGTTGGATGGATGCACAACATCAAGAAGC
rs2894045 | 30788522 | ACGTTGGATGGGGCACCTTGAAAAAAGAGC | ACGTTGGATGAAATATGGCTCTGTTCCGCC
rs2394402 | 30789242 | ACGTTGGATGTTTCTGCAACCTCTGCCTCC | ACGTTGGATGTTTGTGGCATGCGCCTGTAG
rs3130673 | 30802269 | ACGTTGGATGTCTTTAAGTGGATGGGCTCG | ACGTTGGATGTGGCAGGCAGAGCAATTTAG
rs3131041 | 30810816 | ACGTTGGATGAGGTTGAAGCGATTCTCCTG | ACGTTGGATGACAAAAGTTAGCTGGGCGTG
rs1264377 | 30820115 | ACGTTGGATGAAGACCACTTCAGAGTCCAG | ACGTTGGATGGGAGAGGTGGTCATGATCAG
rs2394403 | 30823632 | ACGTTGGATGCTATTCCAAAACATCACTGGC | ACGTTGGATGCGGCCTATTTCTAGTCTTTTG
rs1264364 | 30831067 | ACGTTGGATGAGCCTCCCACCCACTCAAAG | ACGTTGGATGTTGGGTGGTCGATGGGACTG
rs2894046 | 30837877 | ACGTTGGATGCCATGGTTGAAGGAGAAGAG | ACGTTGGATGATCTTCTGTGGCAGACGTAG
rs1264352 | 30845173 | ACGTTGGATGCTTGGTACAAGTGAAACTGG | ACGTTGGATGGCTCTTGCTCTTTCTTCTGG
rs915664 | 30850194 | ACGTTGGATGTATGACAGCACGTTTCTGCC | ACGTTGGATGCCTCAAGGAGGCAGTTAAAC
rs2535338 | 30860692 | ACGTTGGATGGCCTGGCAACATAGCAAGAC | ACGTTGGATGTCAGCCTCATGAGTAGCTGG
rs2535335 | 30868020 | ACGTTGGATGACCCCTCATCTCCTAAGCTC | ACGTTGGATGTGAGCTGTCTTCCTTGCCTC
rs2250264 | 30876536 | ACGTTGGATGAGGAGGGAAGGAAGTATAAC | ACGTTGGATGGAAACTGTCACCACAATCAAG
rs3095354 | 30891957 | ACGTTGGATGGCTGCATAATAAATTGCCCC | ACGTTGGATGGTGTGTATGTGTTTAAGAGAG
rs1264332 | 30901222 | ACGTTGGATGGGAAAGAGATTCAGGCTTGG | ACGTTGGATGCCTTTCTGACCTCTCTCTTG
rs2855542 | 30912003 | ACGTTGGATGGAAACTAGGGCAGAGATCAG | ACGTTGGATGTCTAAGCCGTTGTTTATGGG
rs3130799 | 30921946 | ACGTTGGATGTGTGACTGATGGAGACCAGG | ACGTTGGATGTGCATCCTCATGGTGAGCAG
rs885701 | 31199563 | ACGTTGGATGTCTTCTCTGTCAAGCACATC | ACGTTGGATGAGTGCATGCTGGGTACATGG
rs1052989 | 31202267 | ACGTTGGATGGGAGGCACTAAATATTCACG | ACGTTGGATGTTGAAACCTCCTGCATCCTG
rs1265181 | 31211656 | ACGTTGGATGTTTGGCCTAGTTTGAGTGCC | ACGTTGGATGGCTGCACAAACAACTTTCGC
rs886389 | 31222612 | ACGTTGGATGAGAAAGAAAGAAGAGAGAGAG | ACGTTGGATGGTCCATTGAATGGAGTATAGC
rs1793899 | 31225739 | ACGTTGGATGACCTCTCTGCTCTCTGTCTC | ACGTTGGATGTCCTTGTCAGGGACCACAAG
rs3132502 | 31239363 | ACGTTGGATGCAAGACTCCTTTCCTGTAAC | ACGTTGGATGATCGTGCCATTGCACTCTAG
rs1793895 | 31247789 | ACGTTGGATGTCTGAACCCACACAGTACAC | ACGTTGGATGTGGCACAGTCAGAATAAGGC
rs1793894 | 31252511 | ACGTTGGATGTTTCTCCATGTTGGTCAGGC | ACGTTGGATGAATCTCAGCACCTTGAGAGG
rs3134756 | 31269208 | ACGTTGGATGAAAACATTGCAGGAGCTGAC | ACGTTGGATGCAGCTTTATCAGGTTGGTTTC
rs1793893 | 31272501 | ACGTTGGATGTACCATGAATATAGCTATCG | ACGTTGGATGTTTGCCTGAAGGACTGAAAC
rs2394948 | 31275364 | ACGTTGGATGGGGTCTAGAGAAGTAGGTTG | ACGTTGGATGGGCAATACAGCTGCATTCAG
rs1793892 | 31275440 | ACGTTGGATGTTTGCATCCCTAGTCCTGAG | ACGTTGGATGTACAATCCTTCCCAAGGTGG
rs3130542 | 31286439 | ACGTTGGATGGTCTGCTAAACACAGGTTTC | ACGTTGGATGTTATGTGACCCCCTCAAAGG
rs2040748 | 31297875 | ACGTTGGATGAGCAATCACAGCAAAGGAAC | ACGTTGGATGTCAGGAACACTGAGAGAATG
rs2253288 | 31301099 | ACGTTGGATGCAAAGCCACAATGAGATACC | ACGTTGGATGAGCCTCACCAGCATCTATTG
rs2253487 | 31303455 | ACGTTGGATGTCATGCTGAAAGGCTGTGTG | ACGTTGGATGAGGTCAATCTTCTCCAGAGC
rs2853941 | 31303557 | ACGTTGGATGGTGGTCCCATGAATGCTTTC | ACGTTGGATGAAGTTCATTGACACCCCCTC
rs2844604 | 31304836 | ACGTTGGATGCTGAAAGTGGACTGTGAAATG | ACGTTGGATGTGAGACTCAAGACTGGCTAG
rs2853939 | 31304971 | ACGTTGGATGAAACCCTAGCCAGTCTTGAG | ACGTTGGATGTAACTCCTCTTTCTGGGCAG
rs2524059 | 31305152 | ACGTTGGATGCAGTGACTTTGTTGCCTTGC | ACGTTGGATGTTCTCCAAGTGTGGACACAG
rs2844603 | 31305183 | ACGTTGGATGATTCCACTTTACCCAGTGTC | ACGTTGGATGTCAAGGTTTCTTTCTCCAAG
rs2853938 | 31305806 | ACGTTGGATGCCTGGAGGATGAGCAATGAC | ACGTTGGATGTTGCAGTGCTCCTGCTCCCA
rs2524058 | 31305898 | ACGTTGGATGTGGGAGCAGGAGCACTGCAA | ACGTTGGATGAGAAATCCCAAGGAGAGGCC
rs2524053 | 31306798 | ACGTTGGATGGACTTTTACGATCATCACTTC | ACGTTGGATGTTTCAAGGAAGAATCTATAG
rs2853935 | 31308207 | ACGTTGGATGCTATAATCAAAGCCTGGGAC | ACGTTGGATGGGAAATGCAAGAATGAGAGC
rs2853933 | 31308417 | ACGTTGGATGTTCCCTCATGTTGTTGCTGG | ACGTTGGATGACAGCTACGGGTCTATCAAG
rs2524151 | 31316283 | ACGTTGGATGCCTTCAGATAAGGTATTGGG | ACGTTGGATGTTGGATCAGCAGCTCTTTTG
rs2524123 | 31319639 | ACGTTGGATGTCCCCAAGAGGTTTTCACAG | ACGTTGGATGCTGCAGTGGTAGAAGAGAAG
rs2247056 | 31319815 | ACGTTGGATGTGCATGGCTGTAAATTAGGC | ACGTTGGATGAGGGCTGTCTAATCATTCCC
rs2524089 | 31320847 | ACGTTGGATGCCCCTTCCTTGTATAGTTCC | ACGTTGGATGTACAGGTCTGTCCCACCATC
rs364415 | 31327348 | ACGTTGGATGTTGAACCATGAGGAGGAGTC | ACGTTGGATGTCTCCTCTCACACCATCCAG
rs3130690 | 31340260 | ACGTTGGATGATGAGGTCATGTGAGTGTGC | ACGTTGGATGTTCCTCCGTATCTGTCTGTG
rs2524227 | 31356125 | ACGTTGGATGAAAGAGAATGCCCTGAATGG | ACGTTGGATGAAAAAGAGTAGAGCCCCTGG
rs2854008 | 31366306 | ACGTTGGATGAAGACCCATTTGCTGCTTCC | ACGTTGGATGTGGGAGGGCCTTGAAAATAC
rs709052 | 31376822 | ACGTTGGATGAGATCACACTGACCTGGCAG | ACGTTGGATGTTCTATCTCCTGCTGGTCTG
rs491870 | 32298352 | ACGTTGGATGAAAGAATTGGGACTTGCCTC | ACGTTGGATGTGGTGTATTTAGACCTACAC
rs1018433 | 32308410 | ACGTTGGATGTGTCCTAACTTCCTGGGTAC | ACGTTGGATGTGTGCTACCCATGCAGTGTG
rs513095 | 32314754 | ACGTTGGATGATCACTCACCACTCACATGG | ACGTTGGATGGCCCCCAAGGAATAAGAAAC
rs742697 | 32318423 | ACGTTGGATGAGGGAAAACTTTCCCTTTGG | ACGTTGGATGGGGTGCATACTACTTTAACC
rs523627 | 32318719 | ACGTTGGATGTGACCTGCTGATAAACTTTC | ACGTTGGATGTGACACATCATTCTCTCACC
rs2077333 | 32320455 | ACGTTGGATGACCACCACCTAAGTTTCCAG | ACGTTGGATGAAGCAAGAGATGGCTAGTGC
rs2395143 | 32320463 | ACGTTGGATGACCACCAAAGTTTCCAGAAC | ACGTTGGATGCTGTCAAAGCAAGAGATGGC
rs504703 | 32320902 | ACGTTGGATGAGAAGTGACAGGGAAGCTAC | ACGTTGGATGTTCTTTGTACCAGCTAGGCC
rs3129958 | 32327056 | ACGTTGGATGACTGATGGTAGGGAAAGGTG | ACGTTGGATGAGAACAGTCCCTTGAGAAAG
rs3129960 | 32327712 | ACGTTGGATGATCATGCCACTGCATTCCAG | ACGTTGGATGTTTCTAGCTCTGATCGCCTG
rs2022537 | 32328814 | ACGTTGGATGCTTGGATAGGTGATCACTTC | ACGTTGGATGAGGGAAATGAGTATGTTGAG
rs2022534 | 32333790 | ACGTTGGATGCTCTCTTTTACCAGTGTGAG | ACGTTGGATGCAGTCACTTAGAGGATCTTG
rs2143468 | 32335906 | ACGTTGGATGTATCCACAGAGACAATGTCC | ACGTTGGATGGGGCAGTGGAAGGTATTTAC
rs2395145 | 32338774 | ACGTTGGATGCTCACCATCTTTTGGAACTG | ACGTTGGATGAAACCCTGTCATTGATCGAC
rs2076542 | 32343684 | ACGTTGGATGTACATGGCTAGCACGAAAGG | ACGTTGGATGATCTCTTCCATTGCTGCCAG
rs2076541 | 32343772 | ACGTTGGATGGGATAAGAGCAAAAAGTTAG | ACGTTGGATGCTGAGGACACAGCTAATATC
rs2076540 | 32343828 | ACGTTGGATGGAGAGCAATTTCCAAACCTG | ACGTTGGATGCTAACTTTTTGCTCTTATCC
rs3129907 | 32350292 | ACGTTGGATGGTTATAAGGTAAGTTGAGGTC | ACGTTGGATGTGAATTCTCAGTCAGCTGAG
rs3129927 | 32360382 | ACGTTGGATGCCTGCCACAACATAAAAGGC | ACGTTGGATGAAATGGTGCCTCATAGCGTG
rs2143462 | 32361583 | ACGTTGGATGTTAGTGGTACTGGTGTGTCC | ACGTTGGATGCAGGTTTTGAAACGTGAGAG
rs2073047 | 32362481 | ACGTTGGATGTTGGTGATTGACACAGTCAC | ACGTTGGATGGCAGGAACTAGGAATTGTGC
rs2073044 | 32365548 | ACGTTGGATGGTACTGAGTACACCATCTAG | ACGTTGGATGCAAGTAGTCAATATGCCCTC
rs2050190 | 32365638 | ACGTTGGATGCCCTATTAATAGGGTGGACC | ACGTTGGATGAGTGTCTGAAATGCCCTGTC
rs2076536 | 32365910 | ACGTTGGATGTCCTTGCCTGCTTCCTTTTC | ACGTTGGATGTAACTGTGGGTTGTTTCCCC
rs2050189 | 32366209 | ACGTTGGATGGCTTAGGTCTGATCAATCTG | ACGTTGGATGTATGAACTTGGGTGTCAGGG
rs2395151 | 32369884 | ACGTTGGATGAAAACGATGCCCCTATCAGC | ACGTTGGATGGTACGTCTAACTGCTGTTCG
rs2894252 | 32371988 | ACGTTGGATGGGGAAAGAAAATGTCTATGGC | ACGTTGGATGTAGATGAGAGTGCAACTTCG
rs2395156 | 32374635 | ACGTTGGATGTTGCACAGATGCAAAGATTC | ACGTTGGATGAAATGTTTGTGCCATCTAAG
rs2395157 | 32374682 | ACGTTGGATGTTTAAAATGTTTGTGCCATC | ACGTTGGATGTTGCACAGATGCAAAGATTC
rs1555115 | 32381050 | ACGTTGGATGATCTATTCCAGCCAGGCTAG | ACGTTGGATGCCCATCCTGAAAACCTTACC
rs2076534 | 32388689 | ACGTTGGATGTTTGCAGAGGATAGCAGGAG | ACGTTGGATGAGACCAACTCAGACTTACTC
rs2076533 | 32390050 | ACGTTGGATGCTAATAACACACTGTGAAAC | ACGTTGGATGAGGAAATCTGAGTATCTTAC
rs2076530 | 32390339 | ACGTTGGATGAGGCCAGTTTGGATCTGAAG | ACGTTGGATGATTAAAGTGGCAGGAGCAGG
rs2076529 | 32390478 | ACGTTGGATGTCAGTCTGCCCTCGTCAATG | ACGTTGGATGGAGAGCAGATGGCAGAGTAC
rs2294880 | 32394245 | ACGTTGGATGACCTGACAGGAAGCAAAGGG | ACGTTGGATGTAAGTCATGGTAACCTCCGG
rs2294878 | 32394318 | ACGTTGGATGTAGGAACAACAGGACATGGG | ACGTTGGATGTCCTCTGAGTTCTCTGAGAC
rs2076525 | 32397124 | ACGTTGGATGGCACCTCGTATTTTTATCAAG | ACGTTGGATGTGGCTTTCAATACATATTGC
rs2076524 | 32397192 | ACGTTGGATGTACGAGGTGCTATGGTGCAG | ACGTTGGATGAGGTCAGTGCTCTGCCTCTAG
rs2076523 | 32397343 | ACGTTGGATGTATTGGGAAGACATCCGGG | ACGTTGGATGTGGCTTCCGCATAGAACAGG
rs2395158 | 32401103 | ACGTTGGATGGCTGAGTCACCTTTGGAAAG | ACGTTGGATGGGCCTCTGAGATGTAGTTAC
rs3135380 | 32411189 | ACGTTGGATGTAAAATTGGGCATGGGAAAC | ACGTTGGATGGAAATCTGCTAGGCTTAAAC
rs2395161 | 32414066 | ACGTTGGATGTTTCCCTCCCCACAATCTAC | ACGTTGGATGTCACCTGGACCTGATTGATC
rs2395163 | 32414323 | ACGTTGGATGATCGGCAGCTTGGAAACTAC | ACGTTGGATGGGGCTGGATAATGATGGATG
rs2395165 | 32414658 | ACGTTGGATGCAGCTTCCATGTGGTGTTTG | ACGTTGGATGTTTGTCCCTCTAGCCCTTTG
rs2395166 | 32414789 | ACGTTGGATGCAGTTCCTATGAAGGATGATC | ACGTTGGATGCCATAGAAACCTTGGAAGTC
rs2213581 | 32415060 | ACGTTGGATGCAGTATCCCACAGAGAAGTC | ACGTTGGATGGGAGCCTCAAATTATCACTC
rs732163 | 32421456 | ACGTTGGATGACCCCTTTCTAATATCTCTC | ACGTTGGATGTCTTCTATATCGGATAATGC
rs732162 | 32421458 | ACGTTGGATGACCCCTTTCTAATATCTCTC | ACGTTGGATGTCTTCTATATCGGATAATGC
rs1894552 | 32422010 | ACGTTGGATGGCTCTTCAACTTATGATGGG | ACGTTGGATGGCCACATGATCATGAAGGTG
rs2105903 | 32422201 | ACGTTGGATGAAACTACAGACACACCTGAC | ACGTTGGATGTCACCTTCATGATCATGTGG
rs983561 | 32430210 | ACGTTGGATGTCATATTGGCCACTCCGAAG | ACGTTGGATGTGAGAAGATGAGAGCAACAG
rs3129868 | 32430931 | ACGTTGGATGTATTCCAGCAGACCAGCTTC | ACGTTGGATGGAGGTGCTGAGGGAATATTG
rs2395173 | 32431414 | ACGTTGGATGTACATCTCTCAGGCTTGCTC | ACGTTGGATGACTTCCACCTCCCAAATCTC
rs2395174 | 32431433 | ACGTTGGATGTACATCTCTCAGGCTTGCTC | ACGTTGGATGACTTCCACCTCCCAAATCTC
rs2395177 | 32431631 | ACGTTGGATGATCTGCAACATCAGCAGAGG | ACGTTGGATGAGCCCTTAAAACTGTTAGGG
rs2239804 | 32438079 | ACGTTGGATGTGTTACTTCTTCCCACACTC | ACGTTGGATGGCTTGGAGCATCAAACTCTG
rs2239802 | 32438402 | ACGTTGGATGCTGAAGCTTTGGGATACCAG | ACGTTGGATGAGGAACAGATGTGGCTCTTG
rs1051336 | 32438914 | ACGTTGGATGAGTGTGGATATGCCTCTTCG | ACGTTGGATGGGAAAAGGCAATAGACAGGG
rs3177928 | 32438957 | ACGTTGGATGGGTAACTATGTGTGTCTTGC | ACGTTGGATGGCAGAAGTTTCTTCAGTGATC
rs7194 | 32439002 | ACGTTGGATGCATGGAGGTGATGGTGTTTC | ACGTTGGATGTGCTTTCACTGAGGTCAAGG
rs2213586 | 32439616 | ACGTTGGATGTCTGAGATCCATACCTTGGG | ACGTTGGATGTTGGGAGATCTCTACTGAGC
rs2213585 | 32439672 | ACGTTGGATGAACCCCAAGGTATGGATCTC | ACGTTGGATGTTCCTTCTCCCCACTCTAAC
rs2213584 | 32439781 | ACGTTGGATGAATGGGTAAGGCCAGTCTTC | ACGTTGGATGGAAGGAAGACAGAAGAATCC
rs2395182 | 32439839 | ACGTTGGATGGGCCTTACCCATTCTGTTAG | ACGTTGGATGTCAGTCAGACTACTCTCTCG
rs2227139 | 32439981 | ACGTTGGATGGACATTAAGATGAGAGGAAGG | ACGTTGGATGTGGTTTATGGCAGGTTCTAG
rs1547422 | 32453362 | ACGTTGGATGTGCATAAGCATTTCACTGAG | ACGTTGGATGCAAACCTGTACATGTATCCC
rs1548306 | 32453442 | ACGTTGGATGATAATGTGAGGAGGCTAGTC | ACGTTGGATGATTTCAGAGATTTCGGGATC
rs2187824 | 32465527 | ACGTTGGATGCTCTAGCCTTCTTTCTGTCC | ACGTTGGATGTTCCAGGGAGACAGAATGTG
rs2187823 | 32465789 | ACGTTGGATGCCAGGATCCAAACAGTGATC | ACGTTGGATGAGTACACAGTAGCTGCTGAG
rs2187822 | 32475997 | ACGTTGGATGACCAGGCCTTTGATTTTCAG | ACGTTGGATGACTACATTTGGGATACTGGG
rs1547668 | 33777490 | ACGTTGGATGTAGTGGCTGTTTCTCTCCTG | ACGTTGGATGATATCCGTGGCAATTCCCAC
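Every primer in the table above begins with the same 10-nt sequence, ACGTTGGATG, so only the remainder of each primer is specific to its locus. The source does not state the purpose of this shared 5' sequence; the sketch below simply treats it as a common tag (an assumption) and separates it from the locus-specific part, using the last row of the table (rs1547668) in its pipe-delimited form. The names `SHARED_PREFIX`, `split_prefix`, `gc_fraction`, and `parse_primer_row` are hypothetical.

```python
SHARED_PREFIX = "ACGTTGGATG"  # observed at the 5' end of every primer in the table


def split_prefix(primer):
    """Split a primer into the shared 5' prefix and the remaining locus-specific sequence."""
    if not primer.startswith(SHARED_PREFIX):
        raise ValueError(f"primer lacks the shared prefix: {primer}")
    return SHARED_PREFIX, primer[len(SHARED_PREFIX):]


def gc_fraction(seq):
    """Fraction of G or C bases in a DNA sequence."""
    return sum(base in "GC" for base in seq) / len(seq)


def parse_primer_row(line):
    """Parse 'SNP | position | primer 1 | primer 2' into the locus-specific primer parts."""
    snp, pos, primer1, primer2 = (field.strip() for field in line.split("|"))
    return snp, int(pos), split_prefix(primer1)[1], split_prefix(primer2)[1]


row = "rs1547668 | 33777490 | ACGTTGGATGTAGTGGCTGTTTCTCTCCTG | ACGTTGGATGATATCCGTGGCAATTCCCAC"
snp, pos, specific1, specific2 = parse_primer_row(row)
print(snp, pos, specific1, len(specific1), gc_fraction(specific1))
# rs1547668 33777490 TAGTGGCTGTTTCTCTCCTG 20 0.5
```

Raising an error on a primer without the shared prefix is deliberate: in an OCR-derived table it flags rows whose sequence was damaged during extraction.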
Genotyping HLA Loci

The invention features a novel method of genotyping Human Leukocyte Antigen (HLA) genes using patterns of neighboring single nucleotide polymorphisms (SNPs). The SNP-based method improves on existing hybridization-based techniques because it allows quick and inexpensive genotyping of the HLA loci. Unlike all other current methods for HLA genotyping, it does not directly assess intra-genic variation; instead, it defines HLA genotypes from neighboring extra-genic variation that, owing to patterns of linkage disequilibrium (LD), is linked to the HLA loci. As used herein, "extra-genic" means outside of, or in the region(s) neighboring, the HLA allele to be genotyped. Identifying the correlation between this extra-genic variation and the HLA gene alleles allows surrogate markers for HLA genotypes to be discovered and used.

One aspect of the invention provides a method of genotyping an HLA gene, such as an HLA-A or an HLA-DRB1 gene. The method comprises determining the nucleotide present at one or more extra-genic SNP sites, wherein the SNP is associated with an HLA genotype. For example, to genotype the HLA-A allele, the extra-genic SNP(s) assessed can be rs2517862, rs1655930, rs1616549, rs376253, rs1961135, rs2517706, rs2517701, rs2517699, rs435766, rs410909, rs2394255, rs1264807, rs2530388, rs356963, rs2286405, rs2240619, rs3129012, rs259938, or any combination thereof. In another example, to genotype the HLA-DRB1 allele, the extra-genic SNP(s) assessed can be rs742697, rs523627, rs3129960, rs2395163, rs2395165, rs983561, rs2239804, rs2213584, rs2395182, rs2858860, rs3129907, rs1059544, rs1987529, or any combination thereof. Nomenclature and designations of the HLA alleles have been described by Marsh et al., Tissue Antigens (2002) 60:407-464.
A summary of HLA-A, -B, -C, -DRB1/3/4/5, and -DQB1 alleles and their association with serologically defined HLA-A, -B, -C, -DR, and -DQ antigens is provided by Schreuder et al., Tissue Antigens (2001) 58:109-140.

Methods of determining or analyzing SNPs are known in the art. To detect a particular SNP in a target DNA sample, e.g., a DNA sample from a subject to be tested (preferably a human subject), any of the known procedures can be employed. For example, two distinct types of analysis and seven procedures are described in U.S. Patent Application Serial No. 10/213,272, Publication No. 20030170665, incorporated herein by reference in its entirety.

The first type of analysis, sometimes referred to as de novo characterization, compares target sequences in different individuals to identify points of variation, i.e., polymorphic sites. By analyzing a group of individuals representing the greatest possible diversity, patterns characteristic of the most common alleles/haplotypes of the locus can be identified, and the frequencies of those alleles/haplotypes in the population determined. Additional allelic frequencies can be determined for subpopulations characterized by criteria such as geography, race, or gender.

The second type of analysis determines which form(s) of a characterized polymorphism are present in individuals under assessment. A variety of procedures are suitable:

1) Allele-Specific Probes
The design and use of allele-specific probes for analyzing SNPs is described by, e.g., Saiki et al., Nature 324:163-166 (1986); Dattagupta, EP 235,726; and Saiki, WO 89/11548. Allele-specific probes can be designed to hybridize to a segment of target DNA from one individual but not to the corresponding segment from another individual, owing to the presence of different polymorphic forms in the respective segments from the two individuals.
Hybridization conditions should be sufficiently stringent that there is a significant difference in hybridization intensity between alleles, and preferably an essentially binary response, whereby a probe hybridizes to only one of the alleles. Some probes are designed to hybridize to a segment of target DNA such that the polymorphic site aligns with a central position of the probe (e.g., position 7 of a 15-mer; position 8 or 9 of a 16-mer). This probe design achieves good discrimination in hybridization between different allelic forms. Allele-specific probes are often used in pairs, one member of a pair perfectly matching a reference form of a target sequence and the other perfectly matching a variant form. Several pairs of probes can then be immobilized on the same support for simultaneous analysis of multiple polymorphisms within the same target sequence.

2) Tiling Arrays
SNPs can also be identified by hybridization to nucleic acid arrays. Subarrays optimized for detection of variant forms of a precharacterized polymorphism can also be used. Such a subarray contains probes designed to be complementary to a second reference sequence, which is an allelic variant of the first reference sequence. Including a second group (or further groups) of probes can be particularly useful for analyzing short subsequences of the primary reference sequence in which multiple mutations are expected within a distance commensurate with the probe length (i.e., two or more mutations within 9 to 21 bases).

3) Allele-Specific Primers
An allele-specific primer hybridizes to a site on target DNA overlapping an SNP and primes amplification only of an allelic form to which the primer is perfectly complementary (see Gibbs, Nucleic Acids Res. 17:2427-2448 (1989)). This primer is used in conjunction with a second primer that hybridizes at a distal site.
Amplification proceeds from the two primers, and a detectable product signifies that the particular allelic form is present. A control is usually performed with a second pair of primers, one of which has a single-base mismatch at the polymorphic site and the other of which is perfectly complementary to a distal site. The single-base mismatch prevents amplification, so no detectable product is formed. The method works best when the mismatch is at the 3'-most position of the oligonucleotide aligned with the polymorphism, because a mismatch at this position is the most destabilizing to elongation from the primer.

4) Direct Sequencing
The sequence of any sample for use with the present invention can be analyzed directly using either the dideoxy chain-termination method or the Maxam-Gilbert method (see Sambrook et al., Molecular Cloning: A Laboratory Manual (2nd Ed., CSHP, New York 1989); Zyskind et al., Recombinant DNA Laboratory Manual (Acad. Press, 1988)).

5) Denaturing Gradient Gel Electrophoresis
Amplification products generated by the polymerase chain reaction (PCR) can be analyzed by denaturing gradient gel electrophoresis. Different alleles can be identified by their different sequence-dependent melting properties and the resulting differences in electrophoretic migration of DNA in solution (Erlich, ed., PCR Technology: Principles and Applications for DNA Amplification (W. H. Freeman and Co., New York, 1992), Chapter 7).

6) Single-Strand Conformation Polymorphism Analysis
Alleles of target sequences can be differentiated by single-strand conformation polymorphism analysis, which identifies base differences by alterations in the electrophoretic migration of single-stranded PCR products, as described in Orita et al., Proc. Natl. Acad. Sci. 86:2766-2770 (1989). Amplified PCR products can be generated as described above and heated or otherwise denatured to form single-stranded amplification products.
Single-stranded nucleic acids may refold or form secondary structures that depend partly on the base sequence. The different electrophoretic mobilities of single-stranded amplification products can therefore be related to base-sequence differences between alleles of target sequences.

7) Single Base Extension
An alternative method for identifying and analyzing SNPs is based on single-base extension (SBE) of a fluorescently labeled primer coupled with fluorescence resonance energy transfer (FRET) between the label of the added base and the label of the primer. Typically, the method, such as that described by Chen et al. (PNAS 94:10756-61 (1997)), uses a locus-specific oligonucleotide primer labeled at the 5' terminus with 5-carboxyfluorescein (FAM). This labeled primer is designed so that its 3' end is immediately adjacent to the polymorphic site of interest. The labeled primer is hybridized to the locus, and single-base extension of the labeled primer is performed with fluorescently labeled dideoxyribonucleotides (ddNTPs), as in dye-terminator sequencing.

TABLE 3 shows exemplary extra-genic SNPs that correspond to HLA-A alleles and can be used in genotyping HLA-A alleles. In each row of the table, the SNPs and the HLA-A allele are listed from left to right in order of their positions on chromosome 6. The percentages in the right-hand column give the likelihood of a particular HLA-A allele when the exemplary SNPs are determined to be as shown in the respective row. For example, in row 1, the HLA-A allele has a 100% likelihood of being HLA-A*2402 when the 18 SNPs listed are determined to be the respective nucleotides shown in row 1. As another example, in row 5, the HLA-A allele has a 92% likelihood of being HLA-A*101 when the 18 SNPs listed are determined to be the respective nucleotides shown in row 5.
The SNPs that distinguish HLA-A*2402 from HLA-A*101 include rs2517862, rs1655930, rs376253, rs1961135, rs2517706, rs1264807, and rs3129012.
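Reading TABLE 3 amounts to matching the observed nucleotides at the panel SNPs against each row, and the SNPs that distinguish two alleles are simply the positions at which their rows differ. The sketch below illustrates both operations under stated assumptions: the calls are assumed to be already resolved into haplotypes (phasing is not addressed in this section), and because the table contents are not reproduced in this text, the rows are toy placeholders with hypothetical SNP names and allele labels, not data from TABLE 3. The function names are also hypothetical.

```python
# Toy stand-in for TABLE 3: (haplotype {SNP: nucleotide}, HLA allele label, % likelihood).
# SNP names, nucleotides, and allele labels are hypothetical placeholders.
TOY_TABLE = [
    ({"snp1": "A", "snp2": "C", "snp3": "G"}, "allele_X", 100),
    ({"snp1": "A", "snp2": "T", "snp3": "G"}, "allele_Y", 92),
    ({"snp1": "G", "snp2": "T", "snp3": "C"}, "allele_Z", 75),
]


def discriminating_snps(hap_a, hap_b):
    """SNPs typed in both rows whose nucleotides differ, in the order of hap_a."""
    return [snp for snp, base in hap_a.items() if snp in hap_b and hap_b[snp] != base]


def matching_rows(observed, table):
    """All (allele, likelihood) rows consistent with every observed SNP call."""
    return [
        (allele, pct)
        for haplotype, allele, pct in table
        if all(haplotype.get(snp) == base for snp, base in observed.items())
    ]


print(discriminating_snps(TOY_TABLE[0][0], TOY_TABLE[1][0]))  # ['snp2']
print(matching_rows({"snp1": "A", "snp2": "T"}, TOY_TABLE))   # [('allele_Y', 92)]
```

With only `snp1` typed, both `allele_X` and `allele_Y` still match. This mirrors why a set of discriminating SNPs, such as the seven listed for HLA-A*2402 versus HLA-A*101, is needed to separate closely related rows.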
[Table images: extra-genic SNP haplotypes with the percentage likelihood of the corresponding HLA allele (not reproduced in this text).]
TABLE 4
Legend: 1 = A; 2 = C; 3 = G; 4 = T

The above table shows exemplary extra-genic SNPs that correspond to HLA-DRB1 alleles and can be used to genotype HLA-DRB1 alleles. Again, the relative positions of the SNPs and HLA-DRB1 on human chromosome 6 are shown from left to right in each row. The letters in the right-most column are arbitrary labels assigned to the SNP-haplotype alleles shown in each row. For example, row 1 corresponds to SNP-haplotype allele J, with the extra-genic SNPs determined to be the nucleotides shown in that row. Where ambiguity exists, it may be resolved by determining an additional SNP. In row 4, for example, the SNP-haplotype could be B or W; if rs3129907 is 1 (A), the SNP-haplotype allele is B, and if it is 3 (G), the SNP-haplotype allele is W. Similarly, in row 6, the SNP-haplotype allele can be ascertained by determining rs1059544 (2, or C, corresponds to SNP-haplotype allele U, and 4, or T, to SNP-haplotype allele V). In row 11, it can be ascertained by determining rs1987529 (3, or G, corresponds to SNP-haplotype allele K, and 1, or A, to SNP-haplotype allele T). In row 14, it can likewise be ascertained by determining rs1987529 (1, or A, corresponds to SNP-haplotype allele G, and 3, or G, to SNP-haplotype allele H). In row 16, it can be ascertained by determining rs2395165 (4, or T, corresponds to SNP-haplotype allele A, and 2, or C, to SNP-haplotype allele R).

Figure 4B shows the percentage of a particular SNP-haplotype allele that bears the indicated HLA allele. For example, SNP-haplotype allele J, having the SNPs shown in row 1 above, corresponds to an HLA-DRB1 allele that has a 100% likelihood of being HLA-DRB1*1302.
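The tie-breaking rules in the paragraph above form a small lookup keyed by TABLE 4 row number. The sketch below encodes exactly those five rules together with the numeric legend of TABLE 4; the function and variable names are hypothetical, and only the ambiguous rows described in the text are covered. Note that the right-hand column uses arbitrary letters as labels, so a SNP-haplotype label such as "A" or "T" is not a nucleotide.

```python
# Numeric nucleotide codes used in TABLE 4 (from its legend).
LEGEND = {1: "A", 2: "C", 3: "G", 4: "T"}
CODE_OF = {base: code for code, base in LEGEND.items()}

# Ambiguous TABLE 4 rows and the tie-breaking SNP described in the text:
# row number -> (SNP to type, {numeric code: SNP-haplotype allele label}).
TIE_BREAKERS = {
    4: ("rs3129907", {1: "B", 3: "W"}),
    6: ("rs1059544", {2: "U", 4: "V"}),
    11: ("rs1987529", {3: "K", 1: "T"}),
    14: ("rs1987529", {1: "G", 3: "H"}),
    16: ("rs2395165", {4: "A", 2: "R"}),
}


def resolve_row(row, calls):
    """Return the SNP-haplotype allele label for an ambiguous row, or None if unresolved.

    `calls` maps SNP IDs to nucleotide letters (A, C, G, T).
    """
    snp, rule = TIE_BREAKERS[row]
    base = calls.get(snp)
    if base is None:
        return None  # tie-breaking SNP was not typed
    # An unexpected nucleotide maps to no code, and rule.get(None) returns None.
    return rule.get(CODE_OF.get(base.upper()))


print(resolve_row(4, {"rs3129907": "G"}))   # W
print(resolve_row(11, {"rs1987529": "A"}))  # T
```

Returning `None` rather than raising keeps an untyped or unexpected tie-breaker visible as "still ambiguous" instead of silently picking one of the two labels.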
SNP-haplotype allele N, having the SNPs shown in row 2, corresponds to an HLA-DRB1 allele that has a 92.6% likelihood of being HLA-DRB1*1501.

The invention further features a method of predicting, or assisting in the prediction of, the likelihood or probability that a subject, preferably a human subject, will develop a disease, particularly an MHC-linked disease. The method comprises genotyping an HLA gene in the subject to be tested by determining the nucleotide present at one or more extra-genic SNP sites, wherein the SNP is associated with an HLA genotype. MHC-linked diseases include, but are not limited to, ankylosing spondylitis, Behcet syndrome, common variable immunodeficiency, Goodpasture syndrome, psoriasis, inflammatory bowel disease, insulin-dependent (type 1) diabetes mellitus, multiple sclerosis, myasthenia gravis, pemphigus vulgaris, rheumatoid arthritis, and systemic lupus erythematosus. Identification in the subject of an HLA genotype that is associated with a disease indicates that the subject has a greater likelihood of developing the disease. For example, the HLA-DRB1*1101 genotype is associated with pemphigoid diseases, as discussed above.

The invention further features a method of predicting, or assisting in the prediction of, the likelihood or probability that a subject, preferably a human subject, will develop a disease, particularly an autoimmune disease. The method comprises genotyping an HLA gene in the subject to be tested by determining the nucleotide present at one or more extra-genic SNP sites, wherein the SNP is associated with an HLA genotype. Identification in the subject of an HLA genotype that is associated with a disease indicates that the subject has a greater likelihood of developing the disease. For example, the HLA-DR2 haplotype is linked or associated with multiple sclerosis and lupus.
Examples of autoimmune diseases, grouped by main target organ, include, but are not limited to:
1) Nervous system: multiple sclerosis, myasthenia gravis, autoimmune neuropathies such as Guillain-Barre syndrome, autoimmune uveitis;
2) Gastrointestinal system: Crohn's disease, ulcerative colitis, primary biliary cirrhosis, autoimmune hepatitis;
3) Blood: autoimmune hemolytic anemia, pernicious anemia, autoimmune thrombocytopenia;
4) Endocrine glands: type 1 or immune-mediated diabetes mellitus, Graves' disease, Hashimoto's thyroiditis, autoimmune oophoritis and orchitis, autoimmune disease of the adrenal gland;
5) Blood vessels: temporal arteritis, anti-phospholipid syndrome, vasculitides such as Wegener's granulomatosis, Behcet's disease;
6) Multiple organs, including the musculoskeletal system (these are also called connective tissue diseases, affecting muscle, skeleton, tendons, fascia, etc.): rheumatoid arthritis, systemic lupus erythematosus, scleroderma, polymyositis, dermatomyositis, spondyloarthropathies such as ankylosing spondylitis, Sjogren's syndrome;
7) Skin: psoriasis, dermatitis herpetiformis, pemphigus vulgaris, vitiligo.

A further aspect of the invention provides a method of predicting, or assisting in the prediction of, the likelihood that a subject, preferably a human subject, will develop an immune response. The immune response may be directed against an infecting organism or agent, or it may be a host-graft response, e.g., rejection of an organ transplant. The method comprises genotyping an HLA gene in the subject to be tested by determining the nucleotide present at one or more extra-genic SNP sites, wherein the SNP is associated with an HLA genotype.
The method may also comprise separately genotyping an HLA gene in a host (e.g., a blood or organ recipient or donee) and the same (or the corresponding) HLA gene in a graft (e.g., from a blood or organ donor) by determining the nucleotide present at one or more extra-genic SNP sites in the host and in the graft, wherein the SNP is associated with an HLA genotype. Genotyping an HLA gene in the host may involve assessing more, fewer, or the same extra-genic SNPs as compared with the extra-genic SNP(s) assessed in the graft. In preferred embodiments of the invention, more than one extra-genic SNP, more preferably more than three, more preferably more than five, and more preferably more than seven extra-genic SNPs are determined in order to determine the genotype of an HLA allele. An exemplary method of determining whether or not a host and a graft have the same or immune-compatible HLA alleles may include: a) determining the HLA allele in the host (or graft) by ascertaining the nucleotide present at one or more extra-genic SNP sites, or by any other method; b) selecting extra-genic SNPs to be assessed in the graft (or host) based on the HLA allele identity determined in a); and c) assessing the selected extra-genic SNPs to identify the HLA allele genotype. For example, if a host is determined, by a method of the invention or any other method, to have an HLA-A*101 allele (e.g., having the SNP haplotype shown in row 5 of TABLE 3 above), only rs2517862 and/or rs1655930 need to be assessed to ascertain that a graft does not have HLA-A*101. Based on the information in TABLE 3, the selection of SNPs to be assessed can be optimized.
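Steps b) and c) can be sketched as a greedy search for SNPs at which the host's SNP haplotype differs from every other haplotype in a reference panel, followed by a check of only those SNPs in the graft. The panel below (`PANEL`, `snp1` to `snp4`, and the alleles `A*X`, `A*Y`, `A*Z`) is entirely hypothetical and stands in for TABLE 3, which is not reproduced here; the sketch also assumes phased, single-chromosome haplotypes. As in the HLA-A*101 example above, a mismatch at a selected SNP excludes the host allele, whereas agreement only means the graft is consistent with it within this panel.

```python
from typing import Dict, List, Tuple

# Hypothetical reference panel standing in for TABLE 3: HLA allele -> the
# nucleotides its SNP haplotype carries at each extra-genic SNP in SNPS.
SNPS = ["snp1", "snp2", "snp3", "snp4"]
PANEL: Dict[str, Tuple[str, ...]] = {
    "A*X": ("A", "G", "T", "C"),
    "A*Y": ("A", "C", "T", "C"),
    "A*Z": ("G", "G", "C", "C"),
}


def select_snps(host_allele: str) -> List[str]:
    """Step b): greedily pick SNPs separating the host haplotype from all others."""
    host = PANEL[host_allele]
    remaining = {a for a in PANEL if a != host_allele}  # not yet excluded
    chosen: List[str] = []
    while remaining:
        # Pick the SNP whose host nucleotide differs from the most remaining haplotypes.
        best = max(
            range(len(SNPS)),
            key=lambda i: sum(PANEL[a][i] != host[i] for a in remaining),
        )
        ruled_out = {a for a in remaining if PANEL[a][best] != host[best]}
        if not ruled_out:
            raise ValueError("some haplotypes cannot be told apart from the host")
        chosen.append(SNPS[best])
        remaining -= ruled_out
    return chosen


def graft_may_match(host_allele: str, graft: Dict[str, str], snps: List[str]) -> bool:
    """Step c): False as soon as any selected SNP in the graft differs from the host."""
    host = PANEL[host_allele]
    return all(graft[s] == host[SNPS.index(s)] for s in snps)


snps = select_snps("A*X")  # ['snp1', 'snp2']
print(graft_may_match("A*X", {"snp1": "A", "snp2": "C"}, snps))  # False
```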
All publications, patents, patent applications and information from databases cited above are hereby incorporated by reference in their entirety for all purposes to the same extent as if each individual publication or patent application were specifically and individually indicated to be so incorporated by reference. The invention is now further described in the following non-limiting examples.
Exemplification

Example 1. Materials and Methods

DNA Samples
Samples were obtained from the Coriell Cell Repository and drawn from the collection of Utah CEPH pedigrees of European descent. One hundred thirty-six independent grandparental chromosomes were used for haplotype construction. Of these chromosomes, 96 were in common with Gabriel et al. (2002) and were therefore used for comparison with the genome-wide LD structure. Identifiers for all individuals can be found at the Inflammatory Disease Research Group (IDRG) Web site.

Genotyping and Data Checking
All SNPs for which genotyping was attempted were publicly available at the dbSNP Web site. SNPs were selected mainly to achieve a desired spacing (one SNP per 20 kb); however, SNPs with more than one submitter were preferentially chosen. SNP primers and probes were designed in multiplex format (average fivefold multiplexing) with SpectroDESIGNER software (Sequenom). A total of 435 assays were designed. Assays were considered successful, and their genotype data were included in the analyses described herein, if they passed all of the following criteria: (1) a minimum of 75% of all genotyping calls were obtained, (2) markers did not deviate from Hardy-Weinberg equilibrium, and (3) markers had no more than one Mendelian error. These criteria defined 201 successful assays. For successful markers, any genotype call involved in a single Mendelian error was then set to zero (missing). All of these working assays had minor allele frequencies ≥5%, and 89% of them had minor allele frequencies ≥10%. Overall, for successful markers, 97.6% of all attempted genotypes were obtained. The entire list of SNP assays, as well as detailed genotyping information, can be found at the IDRG Web site. Four-digit HLA types were determined for HLA-A, HLA-B, HLA-C, HLA-DRB1, HLA-DMB1, HLA-DQA1, HLA-DQB1, HLA-DPA1, and HLA-DPB1, as described elsewhere (Begovich et al. 1992; Carrington et al. 1994, 1999; Moonsamy et al. 1997; Bugawan et al. 2000).
Typing was performed twice independently, and conflicting types were resolved, in most cases, by two independent retyping experiments. TAP1 and TAP2 were genotyped as described elsewhere (Carrington et al. 1993). D6S2971, D6S2749, D6S2874, D6S273, D6S2876, D6S2751, D6S2741, and D6S2739 were typed as described elsewhere (Martin et al. 1998). Genotyping details for the 11 remaining microsatellites can be found as supplemental information on the IDRG Web site. D6S2972 and D6S265 were typed twice (IDRG; Martin et al. 1998), and conflicts were resolved by retyping. Alias details for all microsatellites are provided elsewhere (Cullen et al. 2003).

Genotyping Details
Multiplex PCR was performed in six-microliter volumes containing 0.1 units of Taq polymerase (AmpliTaq Gold, Applied Biosystems), 5 ng genomic DNA, 2.5 pmol of each PCR primer, and 2.5 μmol of dNTP. Thermocycling was at 95 °C for 15 minutes, followed by 45 cycles of 95 °C for 20 s, 56 °C for 30 s and 72 °C for 30 s. Unincorporated dNTPs were deactivated using 0.3 U of shrimp alkaline phosphatase (Roche), followed by primer extension using 5.4 pmol of each primer extension probe, 50 μmol of the appropriate dNTP/ddNTP combination, and 0.5 units of Thermo Sequenase (Amersham Pharmacia). Reactions were heated to 94 °C for 2 minutes, followed by 40 cycles of 94 °C for 5 s, 50 °C for 5 s and 72 °C for 5 s. After a cation exchange resin was added to remove residual salt from the reactions, seven nanoliters of the purified primer extension reaction was loaded onto a matrix pad (3-hydroxypicolinic acid) of a SpectroCHIP (Sequenom, San Diego, CA). SpectroCHIPs were analyzed using a Bruker Biflex III MALDI-TOF mass spectrometer (SpectroREADER, Sequenom, San Diego, CA), and spectra were processed using SpectroTYPER (Sequenom). Eleven of the microsatellites were amplified using the following primers/amplification programs:
[Table: primers and amplification programs for the eleven microsatellites]
The forward primers were fluorescently tagged with 6-FAM, TET or HEX. Amplification was performed in 15-microliter volumes containing 0.8 units of Taq polymerase (Roche Applied Science), 25 ng DNA, 200 μM dNTPs, 2.4 pmol of each primer, 3.0 nmol dNTPs, and 1x PCR buffer (1.5 mM MgCl2, 10 mM Tris-HCl, 50 mM KCl, pH 8.3; Roche Applied Science). Reactions were run in one of the following MJ Research thermocyclers: PTC-100, PTC-200 or Genomyx CycLR. Samples were then multiplexed, and 2-3 μl of each multiplex was combined with an equal volume of size-standard loading buffer mix (containing formamide, blue dextran and the fluorescently labeled size standard GeneScan-350 or -500 TAMRA), denatured for 3 minutes at 95 °C, and electrophoresed on a 5% gel (National Diagnostics) using an ABI model 377 DNA sequencer (Applied Biosystems). Genotypes for individuals from families 1331, 1332, 1347, 1362, 1413, 1416 and 884 for D6S1542, D6S1560, D6S1701, D6S1666, D6S265 and D6S258 were obtained from the CEPH website (http://www.cephb.fr/test/cephdb/). To ensure correspondence in allele sizes with those genotyped for this study, individual 1347-2 was genotyped for these loci. For RG-MSATS amplification, reactions were heated to 95 °C for 2 minutes, followed by 29 cycles of 94 °C for 45 s, 57 °C for 45 s and 72 °C for 1 minute, with a final extension at 72 °C for 7 minutes. MSATJH and 64ANN were the same as RG-MSATS, except that annealing was carried out at 55 °C and 64 °C, respectively. MSATTD was a touchdown annealing starting at
60 °C and decreasing by 0.3 °C in each subsequent cycle until reaching 55 °C, where annealing was held constant for the remaining 15 cycles. EPA reactions were heated to 95 °C for 2 minutes, followed by 4 cycles of 96 °C for 30 s, 57 °C for 90 s and 72 °C for 90 s, then 28 cycles of 95 °C for 30 s, 55 °C for 45 s and 72 °C for 1 minute, with a final extension at 72 °C for 30 minutes.

D' Confidence Limits, Definition of Haplotype Blocks, and Structure Comparison
Pairwise D' values (estimates of the strength of LD; Lewontin 1964) were assessed for SNP markers, and haplotype blocks were defined as per Gabriel et al. (2002). In brief, D' confidence limits were determined by calculating the probability of the observed data for all possible values of D', from which an overall probability distribution was determined. For every block identified, the outermost marker pair was required to be in strong LD, with an upper confidence limit (CU) > 0.98 and a lower confidence limit (CL) > 0.7. Blocks defined by only two markers required CL > 0.8, CU > 0.98 and an intervening distance of ≤20 kb; for three consecutive markers, all pairs had to have CL > 0.5 and CU > 0.98 and an intervening distance of <30 kb; and for four markers, the fraction of informative pairs in strong LD (CL > 0.7 and CU > 0.98) was required to be >95%, with an intervening distance of <30 kb. For runs of five or more markers, the fraction of informative pairs in strong LD was required to be >95%, and markers were allowed to span any distance. SNP genotypes from Gabriel et al. (2002) were used for comparison of haplotype block structure. Because the density of coverage differed between the two studies, 20 data sets were derived from the Gabriel et al. (2002) data by randomly removing markers to match the average spacing and spacing distribution of the present study.
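The D' calculation and the strong-LD test for a marker pair can be sketched in Python. This is a simplified illustration, not the Gabriel et al. (2002) implementation: it assumes phased two-SNP haplotype counts (rather than unphased genotypes), holds allele frequencies fixed, evaluates the likelihood on a grid of 101 |D'| values, and takes CL and CU as the 5th and 95th percentiles of the resulting distribution. The function names are hypothetical; only the thresholds (CL > 0.7, CU > 0.98) come from the text.

```python
import math
from typing import List, Tuple

Counts = Tuple[int, int, int, int]  # phased haplotype counts n11, n12, n21, n22


def _freqs(c: Counts) -> Tuple[int, float, float]:
    n = sum(c)
    return n, (c[0] + c[1]) / n, (c[0] + c[2]) / n  # n, p(A1), p(B1)


def _d_max(d: float, p_a: float, p_b: float) -> float:
    if d >= 0:
        return min(p_a * (1 - p_b), (1 - p_a) * p_b)
    return min(p_a * p_b, (1 - p_a) * (1 - p_b))


def d_prime(c: Counts) -> float:
    """Lewontin's |D'| = |D| / Dmax from phased two-SNP haplotype counts."""
    n, p_a, p_b = _freqs(c)
    d = c[0] / n - p_a * p_b
    dmax = _d_max(d, p_a, p_b)
    return abs(d) / dmax if dmax > 0 else 0.0


def d_prime_bounds(c: Counts, steps: int = 101) -> Tuple[float, float]:
    """(CL, CU): 5th and 95th percentiles of the likelihood over a |D'| grid."""
    n, p_a, p_b = _freqs(c)
    d_obs = c[0] / n - p_a * p_b
    sign = 1.0 if d_obs >= 0 else -1.0
    dmax = _d_max(d_obs, p_a, p_b)
    grid = [i / (steps - 1) for i in range(steps)]
    log_l: List[float] = []
    for dp in grid:
        d = sign * dp * dmax
        probs = (p_a * p_b + d, p_a * (1 - p_b) - d,
                 (1 - p_a) * p_b - d, (1 - p_a) * (1 - p_b) + d)
        if any(k > 0 and p <= 0 for k, p in zip(c, probs)):
            log_l.append(-math.inf)  # an observed haplotype would be impossible
        else:
            log_l.append(sum(k * math.log(p) for k, p in zip(c, probs) if k > 0))
    top = max(log_l)
    weights = [math.exp(x - top) for x in log_l]
    total = sum(weights)
    cum, lower, upper = 0.0, grid[-1], grid[-1]
    found_lower = False
    for dp, w in zip(grid, weights):
        cum += w / total
        if not found_lower and cum >= 0.05:
            lower, found_lower = dp, True
        if cum >= 0.95:
            upper = dp
            break
    return lower, upper


def strong_ld(c: Counts, cl_min: float = 0.7, cu_min: float = 0.98) -> bool:
    """Strong-LD test for a marker pair: CL > cl_min and CU > cu_min."""
    cl, cu = d_prime_bounds(c)
    return cl > cl_min and cu > cu_min


print(d_prime((50, 0, 0, 50)), strong_ld((50, 0, 0, 50)))  # 1.0 True
```

Working on a grid keeps the sketch short; a finer grid or the genotype-based likelihood of the original method would change the bounds slightly.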
For the block-structure comparison, the MHC was segmented into three parts at two existing 100-kb gaps in the SNP coverage described herein, which arose from a lack of available SNPs to type near FLOT1 and DQB1.

Phase Inference for Extended-Haplotype-Homozygosity Analysis
Initial chromosomal phasing of SNP, HLA, TAP, and microsatellite genotypes was done on the basis of segregation analysis, using the Genehunter program (Kruglyak et al. 1996). The bulk of genotypes (91.6% of SNP genotypes and 95% of HLA, TAP, and microsatellite genotypes) were phased with family information. Apart from this initial phasing, HLA, TAP, and microsatellite genotypes were not phased further, and the 5% of these genotypes that remained indeterminate were considered "ambiguous" in further analyses. Further haplotype inference of the SNP genotyping data was performed with a procedure based on a probability model for haplotypes proposed elsewhere (Fearnhead and Donnelly 2001). This model can be regarded as a refinement of the model used in the well-known program PHASE (Stephens et al. 2001) that allows for recombination. Both unphased and missing SNP data were inferred in this manner. Because a dense set of markers was used, and most markers are in strong LD with several other markers, the phasing is unlikely to have introduced serious bias into the results.

Extended-Haplotype-Homozygosity Analysis
Extended-haplotype-homozygosity (EHH) analysis was performed as described elsewhere (Sabeti et al. 2002) for each haplotype block, microsatellite, HLA, and TAP allele, using genetic distance in cM. Grandparental chromosomes from all families were analyzed. However, some microsatellite types (D6S258, D6S2840, D6S2814, D6S2793, D6S1666, D6S1701, D6S1560, and D6S1542) were not determined for five of these families (1346, 1345, 1420, 1350, and 13292). Rather than being inferred, these genotypes were left as "null calls."
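The EHH statistic of Sabeti et al. (2002) can be sketched as follows: among chromosomes carrying a given core allele, it is the probability that two randomly chosen chromosomes carry identical haplotypes from the core out to a given marker. This minimal Python version indexes markers by position rather than by cM and assumes complete, phased data, so the null calls and ambiguous genotypes described above are not handled. The function name `ehh` and the example chromosomes are illustrative.

```python
from collections import Counter
from math import comb
from typing import Sequence


def ehh(chromosomes: Sequence[Sequence[str]], core: int, core_allele: str, stop: int) -> float:
    """Extended haplotype homozygosity of `core_allele` from marker `core` to `stop`.

    Among chromosomes carrying the core allele, this is the probability that two
    drawn at random (without replacement) are identical at every marker between
    `core` and `stop`, inclusive.
    """
    lo, hi = sorted((core, stop))
    carriers = [tuple(ch[lo:hi + 1]) for ch in chromosomes if ch[core] == core_allele]
    n = len(carriers)
    if n < 2:
        raise ValueError("need at least two chromosomes carrying the core allele")
    # Pairs sharing the same extended haplotype, over all pairs of carriers.
    return sum(comb(k, 2) for k in Counter(carriers).values()) / comb(n, 2)


chroms = ["AGTC", "AGTG", "AGCC", "TGTC"]  # one string per phased chromosome
print([round(ehh(chroms, 0, "A", s), 3) for s in range(4)])  # [1.0, 1.0, 0.333, 0.0]
```

EHH can only stay level or fall as the interval grows, which is why it is described above as a cumulative statistic.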
As mentioned above, 5% of microsatellite, HLA, and TAP genotypes could not be phased with family information. Because EHH is a cumulative statistic, these unphased heterozygotes and missing data are expected to yield conservative estimates of EHH values. Outlying variants, depicted in Figures 3A-3C, were chosen on the basis of two criteria designed to pick alleles with high EHH values for their frequency class. First, as a simple approximation of the distribution, scores were ranked by EHH value times allele frequency; outliers had values >4.5 SDs above the mean. Second, all variants were sorted by frequency into 5% bins; outliers had EHH values ≥4.79 SDs above the mean of the remaining values in that bin.

Analysis of SNP Haplotypes around HLA-A, HLA-B, HLA-C, and HLA-DRB1
Subsequent to the initial SNP genotyping and analysis of the entire region, additional SNP genotyping was performed near HLA-A, HLA-B, HLA-C, and HLA-DRB1 to assess the correlation between the HLA genotype and the local SNP haplotype. Multiblock SNP haplotypes include information from the blocks indicated in Figure 4, as well as from any intervening SNPs not in those blocks. "Leave-one-out" cross-validation was performed using the LeaveOneOut program. In brief, a single chromosome is removed from the data set, the remaining samples are used to build a predictor, and this predictor is then used to predict the HLA genotype of the removed sample. If the SNP haplotype of the removed chromosome occurs only once in the data set, that chromosome is not considered in the test. For each locus, prediction was performed with 106 iterations. (See the IDRG Web site for the LeaveOneOut program and genotyping details.)
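The leave-one-out procedure can be illustrated with a simple majority-vote predictor that assigns each SNP haplotype the HLA allele it most often carries among the training chromosomes. The LeaveOneOut program itself is not reproduced here, so this predictor, its tie-breaking (first occurrence, as `Counter.most_common` orders equal counts), the function names and the example data are all assumptions. As described above, chromosomes whose SNP haplotype occurs only once are skipped.

```python
from collections import Counter, defaultdict
from typing import Dict, Hashable, List, Tuple

Record = Tuple[Hashable, str]  # (SNP haplotype, HLA allele) for one chromosome


def build_predictor(records: List[Record]) -> Dict[Hashable, str]:
    """Map each SNP haplotype to the HLA allele it most often carries."""
    by_hap: Dict[Hashable, Counter] = defaultdict(Counter)
    for hap, allele in records:
        by_hap[hap][allele] += 1
    return {hap: counts.most_common(1)[0][0] for hap, counts in by_hap.items()}


def leave_one_out_accuracy(records: List[Record]) -> float:
    """Fraction of held-out chromosomes whose HLA allele is predicted correctly."""
    hap_counts = Counter(hap for hap, _ in records)
    correct = tested = 0
    for i, (hap, allele) in enumerate(records):
        if hap_counts[hap] < 2:
            continue  # SNP haplotype seen only once: not considered in the test
        predictor = build_predictor(records[:i] + records[i + 1:])
        tested += 1
        correct += int(predictor[hap] == allele)
    return correct / tested if tested else float("nan")


# Hypothetical chromosomes: SNP-haplotype label and HLA-A allele.
data = [("G", "A*0101")] * 3 + [("G", "A*2402")] + [("H", "A*0201")] * 2 + [("J", "A*0301")]
print(leave_one_out_accuracy(data))  # 0.8333333333333334 (5 of 6 tested chromosomes)
```

The rare allele on haplotype G is always mispredicted, which mirrors the observation in Example 2 that most prediction failures involve low-frequency HLA variants.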
Example 2. Analysis of the MHC Region Based on the Integrated Map

Structure of LD in the HLA Genes, Compared with the Genome at Large
Recent studies have shown that LD extends across long segments of the genome (Daly et al. 2001; Dawson et al. 2002; Gabriel et al. 2002; Phillips et al. 2003). Within such segments, a small number of distinct, common patterns of sequence variation (haplotype alleles) are observed in the general population. Between these segments are short intervals where recombination is apparently most active in creating new assortments of these patterns (Daly et al. 2001; Jeffreys et al. 2001; Gabriel et al. 2002). Operationally, therefore, it is not necessary to test each variant within an LD segment for association with a disease phenotype; a small subset of variants that identifies all common haplotype alleles within a segment can be used instead. To compare the LD structure in the MHC with that of the genome as a whole, these MHC data were compared with the data set from Gabriel et al. (2002), which offers a genome-wide comparison in which the same CEPH samples were genotyped. The empirical definition of an LD segment or "haplotype block" described in Gabriel et al. (2002) was used, as it provides a common measure for comparing genomic regions (see "Materials and Methods" section). Because the SNP coverage described herein is less dense than that of Gabriel et al. (2002), subsets of markers were randomly selected from the Gabriel et al. (2002) study to create data sets with a spacing similar to that of the present study and thus appropriate for comparison (see "Materials and Methods" section). At this coverage, not all haplotype blocks are detected: only 25% of the MHC and 14.5% of the Gabriel et al. (2002) data set are found to lie in blocks, compared with 85% when the full density of the Gabriel et al. (2002) data set is used.
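The point that a small subset of SNPs can identify all common haplotype alleles in a segment can be illustrated with a greedy search. This sketch is an illustration under stated assumptions, not the method used in this work: the caller supplies the common haplotypes of one block as equal-length strings (the example haplotypes are hypothetical), and the greedy choice is not guaranteed to find the smallest possible subset in general.

```python
from typing import List, Sequence


def distinct_patterns(haps: Sequence[str], positions: List[int]) -> int:
    """Number of distinct allele patterns the haplotypes show at `positions`."""
    return len({tuple(h[p] for p in positions) for h in haps})


def pick_tag_snps(haps: Sequence[str]) -> List[int]:
    """Greedily add SNP positions until every distinct haplotype is identified."""
    target = len(set(haps))
    chosen: List[int] = []
    while distinct_patterns(haps, chosen) < target:
        # Add the unused SNP that separates the most haplotype patterns.
        best = max(
            (p for p in range(len(haps[0])) if p not in chosen),
            key=lambda p: distinct_patterns(haps, chosen + [p]),
        )
        chosen.append(best)
    return chosen


common = ["AGTC", "AGTT", "GCTC", "GCAC"]  # hypothetical common haplotypes of one block
print(pick_tag_snps(common))  # [0, 2, 3]
```

Here three of the four SNPs suffice, because no pair of SNPs separates all four haplotypes.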
The comparison shows that LD extends over greater physical distances in the MHC than elsewhere in the genome (Figure 2A). Seventeen LD segments that meet the criteria of haplotype blocks (Gabriel et al. 2002) were identified in the region (Figures 1A-1E). These MHC blocks are longer, on average, in physical distance than those found in the rest of the genome (average length of 31.1 kb vs. 22.3 kb) (Figure 2B), although this difference does not reach significance, likely because of the small sample size. Despite being longer in physical distance, haplotype blocks in the MHC are actually shorter in terms of genetic distance. The average recombination rate in the
MHC is 0.49 cM/Mb, versus 0.81 cM/Mb in the genome as a whole (Cullen et al. 2002; Kong et al. 2002). Given this difference in recombination rate, blocks in the MHC were found to have an average length of 0.012 cM, versus 0.017 cM for the genome-wide control data set (significance not tested) (Figure 2C). Furthermore, the distribution of recombination across the region correlates well with most of the long blocks (Figure 1, asterisks). Six of the seven largest blocks (≥75 kb) lie in areas where the recombination rate is well below the genome average of 0.81 cM/Mb, and five of these lie in regions where the rate is below the MHC regional average of 0.49 cM/Mb. The remaining large block falls in a region where the rate is 0.83 cM/Mb. Thus, LD in the MHC extends farther than elsewhere in the genome in physical distance, but not in genetic distance.

Extended-Haplotype Analysis
This work looked for alleles of haplotype blocks, microsatellites, or classical HLA genes that occur on haplotypes extending across multiple blocks. Such so-called "extended haplotypes" are believed to be a common feature of the MHC (Alper et al. 1992). To analyze the long-range structure of the region, EHH analysis was used, which determines the length of the chromosomal haplotypes that extend from a specific allele at a particular locus (Sabeti et al. 2002). High-frequency, extended haplotypes may result from positive selection or from haplotype-specific recombination suppression. Positive selection brings rare alleles to high frequency in relatively few generations, thus affording fewer opportunities for recombination events to separate an allele from its original chromosomal context. Alternatively, haplotype-specific recombination suppression may produce high-frequency, extended haplotypes by reducing the number of recombination events a given haplotype undergoes.
Because a detailed sperm-typing recombination map of the region is available, it was used to control for positional variation in average recombination rates that would otherwise artificially affect the length of haplotypes. Using the integrated haplotype map, the entire MHC was scanned, treating each HLA gene, TAP gene, microsatellite, and haplotype block as an independent locus from which to determine EHH values; every allele at a total of 46 loci was assessed. The 50 regions in the Gabriel et al. (2002) data set each span only 250 kb and are therefore not long enough to serve as a suitable control data set for this analysis. Instead, the EHH values of haplotype, microsatellite, and gene alleles within the MHC data set were compared with one another, and outlying allelic variants were identified on the basis of the statistical rank of their EHH value at 0.25 cM, relative to allele frequency (see "Materials and Methods" section) (Figure 3A). Nine alleles were identified that map onto three different extended haplotypes (Figure 3B). Strikingly, six of these nine variants map to a single multigene haplotype (HLA-C*0702-D6S2793*244-DRB1*1501-DQA1*0102-DQB1*0602-D6S2876*11, hereafter referred to as "DR2"). Every element in the DR2 haplotype has an EHH value at least 4.8 SDs above the mean EHH for other variants with the same allele frequency. Two of the remaining outlying alleles map to a single haplotype (D6S2840*219-C*0701), and the last outlying allele is DRB1*1101. As noted above, there are at least two possible underlying causes for these extended haplotypes. One possibility is that a variant on the haplotype has experienced recent positive selection. Interestingly, each of the three extended haplotypes has been implicated elsewhere in autoimmune disease (Thorsby 1997; Klein and Sato 2000).
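The two outlier criteria described in Materials and Methods (EHH times allele frequency more than 4.5 SDs above the mean; EHH at least 4.79 SDs above the mean of the other variants in the same 5% frequency bin) can be sketched as below. The tuple layout, the use of the sample SD, the small tolerance used when assigning bins, and the example values are assumptions of this illustration.

```python
from statistics import mean, stdev
from typing import Dict, List, Tuple

Variant = Tuple[str, float, float]  # (name, allele frequency, EHH at a fixed distance)


def global_outliers(variants: List[Variant], z: float = 4.5) -> List[str]:
    """Criterion 1: EHH x frequency more than z SDs above the mean over all variants."""
    scores = [freq * e for _, freq, e in variants]
    cutoff = mean(scores) + z * stdev(scores)
    return [v[0] for v, s in zip(variants, scores) if s > cutoff]


def bin_outliers(variants: List[Variant], width: float = 0.05, z: float = 4.79) -> List[str]:
    """Criterion 2: EHH >= z SDs above the mean of the other variants in its frequency bin."""
    bins: Dict[int, List[Variant]] = {}
    for v in variants:
        # Small tolerance so that frequencies on a bin edge are not rounded down.
        bins.setdefault(int(v[1] / width + 1e-9), []).append(v)
    hits: List[str] = []
    for members in bins.values():
        for i, (name, _, e) in enumerate(members):
            rest = [m[2] for j, m in enumerate(members) if j != i]
            if len(rest) < 2 or stdev(rest) == 0:
                continue  # SD undefined or zero: skip this comparison
            if e >= mean(rest) + z * stdev(rest):
                hits.append(name)
    return hits


same_bin = [(f"v{i}", 0.12, e) for i, e in enumerate([0.10, 0.11, 0.12] * 3)]
print(bin_outliers(same_bin + [("out", 0.12, 0.95)]))  # ['out']
```

Comparing each variant with the other members of its bin, rather than with the whole bin, keeps a single extreme value from inflating the SD it is judged against.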
The DR2 haplotype is associated with susceptibility to systemic lupus erythematosus (SLE [MIM 152700]) and multiple sclerosis (MS [MIM 126200]), and it is protective for type I diabetes (IDDM [MIM 222100]) (Thorsby 1997; Chataway et al. 1998; Haines et al. 1998; Barcellos et al. 2002). DRB1*1101 is associated with pemphigoid vulgaris, and D6S2840*219-C*0701 is associated with autoimmune diabetes (MIM 275000) and thyroid disease (MIM 140300) (Drouet et al. 1998; Price et al. 1999; Okazaki et al. 2000). Thus, these three haplotypes appear to have functional consequences for the human immune system. Although these haplotypes are currently associated with autoimmune diseases, it is possible that, under certain conditions, these functional differences were (and perhaps still are) beneficial for disease resistance and therefore may have undergone positive selection in the past. The other possibility is that these extended haplotypes are subject to allele-specific recombination suppression. Examination of the individual recombination rates used to construct the recombination map shows that, of the 12 individuals examined, the single individual bearing DRB1*1501 showed many fewer recombination events across the MHC than the others did, although this difference was not statistically significant. This suggests that allele-specific recombination suppression is possible in this case. Sperm typing of additional individuals bearing each of these extended haplotypes should resolve whether the underlying cause is haplotype-specific recombination suppression or whether recent positive selection is more likely.

Common Patterns of Sequence Variation in the MHC in Regions Between the Classical HLA Loci
Next, haplotype block variation in the MHC was compared with that in the rest of the genome. With the initial coverage, no blocks spanning the classical loci were identified.
The blocks identified between the classical loci have the same number of common patterns of sequence variation (haplotype alleles) as found in other regions of the genome (3.9 vs. 4.1 for blocks with five or more markers) (Figure 1D). Furthermore, the same percentage of rare haplotype alleles (3%) is seen in both data sets, which indicates that the MHC, aside from the classical loci, does not appear to have an excess of rare haplotype variants detectable at the current marker density. The observation that haplotype diversity outside the classical loci is typical of the rest of the genome is perhaps surprising, given the high level of variation at the classical HLA genes.

Common Variation in Regions Spanning the Classical HLA Loci
SNP haplotype diversity was analyzed separately in regions spanning the classical HLA genes (but outside the highly variable exons) to understand how this variation is structured. For this purpose, it was necessary to increase the density of SNP coverage three- to five-fold around the four HLA genes chosen for analysis: HLA-A, HLA-B, HLA-C, and HLA-DRB1. One motivating question in this analysis was whether SNP haplotypes spanning classical HLA loci contain enough information to predict HLA alleles. If so, it might be possible to use high-throughput SNP genotyping as a first-pass surrogate for traditional molecular HLA typing (e.g., probe-based typing or direct sequencing) in disease association studies. For one of these classical genes, HLA-A, a single 7-SNP haplotype block spanning the locus was identified. This block has only six common variants, which predict the correct HLA-A allele 66.2% of the time, as shown by cross-validation analysis (LeaveOneOut; see the "Materials and Methods" section). To capture more of the variation at this locus, the genotype information for a neighboring block was included, and the SNP haplotypes formed by combining the alleles of these two blocks were examined.
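Combining the alleles of two neighboring blocks yields a multiblock haplotype, and the two percentages used to describe its relationship with an HLA allele (as for HLA-A*0101 and the "G" haplotype in Figures 4A and 4B) are simple conditional frequencies: the share of chromosomes carrying the HLA allele that lie on the SNP haplotype, and the share of chromosomes carrying the SNP haplotype that bear the HLA allele. In this sketch, a multiblock haplotype is represented as a tuple of block-allele labels; the labels, counts and function name are hypothetical.

```python
from collections import Counter
from typing import Hashable, List, Tuple


def conditional_frequencies(
    records: List[Tuple[Hashable, str]], hap: Hashable, allele: str
) -> Tuple[float, float]:
    """Return (share of `allele` chromosomes on `hap`, share of `hap` chromosomes with `allele`)."""
    pairs = Counter(records)
    both = pairs[(hap, allele)]
    with_allele = sum(n for (_, a), n in pairs.items() if a == allele)
    with_hap = sum(n for (h, _), n in pairs.items() if h == hap)
    return both / with_allele, both / with_hap


# Hypothetical chromosomes: (block-1 allele, block-2 allele) and HLA-A allele.
g = ("a1", "b1")  # one multiblock SNP haplotype
records = [(g, "A*0101")] * 9 + [(("a1", "b2"), "A*0101")] + [(g, "A*2601")] * 3
print(conditional_frequencies(records, g, "A*0101"))  # (0.9, 0.75)
```

The two numbers generally differ, which is why both directions are reported: the first measures how well the SNP haplotype captures the HLA allele, the second how reliably the haplotype predicts it.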
With these two-block haplotypes, the success of prediction improved from 66.2% to 82.6% of all HLA-A alleles present. For all four classical HLA loci studied, such multiblock SNP haplotypes can act as surrogate markers for HLA alleles. For example, the HLA-A*0101 allele occurs on the "G" SNP haplotype (comprising the haplotype alleles of two blocks) 92% of the time (Figure 4A), and the "G" SNP haplotype correlates with HLA-A*0101 95.6% of the time (Figure 4B). Cross-validation analysis was used to estimate the success rate of prediction. Even with the current coverage, HLA alleles can be accurately predicted by SNP haplotype 75%-84% of the time (HLA-A: 82.6%; HLA-B: 79.8%; HLA-C: 84.3%; and HLA-DRB1: 75.0%). When only haplotypes bearing common HLA alleles (allele frequency ≥5%) were considered, predictions were accurate at a higher rate (HLA-A: 96.2%; HLA-B: 98.8%; HLA-C: 96.0%; and HLA-DRB1: 82.2%), which suggests that most prediction failures reflect an inability to predict low-frequency variants. These data suggest that two elements are needed to improve predictive power: (1) a larger data set, which would increase the number of observations of rare HLA variants, and (2) increased marker density, which would provide additional SNP haplotype information, as evidenced by the case of HLA-A above.

Electronic-Database Information
URLs for data presented herein are as follows and are incorporated herein by reference:
Coriell Institute, http://locus.umdnj.edu/ccr/
dbSNP, http://www.ncbi.nlm.nih.gov/SNP/
IDRG, http://www.genome.wi.mit.edu/mpg/idrg/ (for supplementary "Materials and Methods" information and pairwise D' analysis for 201 reliable, polymorphic SNP assays in 18 multigenerational European CEPH families); http://www.broad.mit.edu/mpg/idrg/projects/hla.htm
Online Mendelian Inheritance in Man (OMIM), http://www.ncbi.nlm.nih.gov/Omim/ (for SLE, MS, IDDM, Hashimoto thyroiditis, Graves disease)
References
Alper CA, Awdeh Z, Yunis EJ (1992) Conserved, extended MHC haplotypes. Exp Clin Immunogenet 9:58-71.
Barcellos LF, Oksenberg JR, Green AJ, Bucher P, Rimmler JB, Schmidt S, Garcia ME, Lincoln RR, Pericak-Vance MA, Haines JL, Hauser SL, Multiple Sclerosis Genetic Group (2002) Genetic basis for clinical expression in multiple sclerosis. Brain 125:150-158.
Begovich AB, McClure GR, Suraj VC, Helmuth RC, Fildes N, Bugawan TL, Erlich HA, Klitz W (1992) Polymorphism, recombination, and linkage disequilibrium within the HLA class II region. J Immunol 148:249-258.
Bugawan TL, Klitz W, Blair A, Erlich HA (2000) High-resolution HLA class I typing in the CEPH families: analysis of linkage disequilibrium among HLA loci. Tissue Antigens 56:392-404.
Carrington M, Colonna M, Spies T, Stephens JC, Mann DL (1993) Haplotypic variation of the transporter associated with antigen processing (TAP) genes and their extension of HLA class II region haplotypes. Immunogenetics 37:266-273.
Carrington M, Nelson GW, Martin MP, Kissner T, Vlahov D, Goedert JJ, Kaslow R, Buchbinder S, Hoots K, O'Brien SJ (1999) HLA and HIV-1: heterozygote advantage and B*35-Cw*04 disadvantage. Science 283:1748-1752.
Carrington M, Stephens JC, Klitz W, Begovich AB, Erlich HA, Mann D (1994) Major histocompatibility complex class II haplotypes and linkage disequilibrium values observed in the CEPH families. Hum Immunol 41:234-240.
Chataway J, Feakes R, Coraddu F, Gray J, Deans J, Fraser M, Robertson N, Broadley S, Jones H, Clayton D, Goodfellow P, Sawcer S, Compston A (1998) The genetics of multiple sclerosis: principles, background and updated results of the United Kingdom systematic genome screen. Brain 121:1869-1887.
Cullen M, Malasky M, Harding A, Carrington M (2003) High-density map of short tandem repeats across the human major histocompatibility complex. Immunogenetics 54(12):900-910.
Cullen M, Perfetto SP, Klitz W, Nelson G, Carrington M (2002) High-resolution patterns of meiotic recombination across the human major histocompatibility complex. Am J Hum Genet 71:759-776.
Daly MJ, Rioux JD, Schaffner SF, Hudson TJ, Lander ES (2001) High-resolution haplotype structure in the human genome. Nat Genet 29:229-232.
Dawson E, Abecasis GR, Bumpstead S, Chen Y, Hunt S, Beare DM, Pabial J, et al (2002) A first-generation linkage disequilibrium map of human chromosome 22. Nature 418:544-548.
Drouet M, Delpuget-Bertin N, Vaillant L, Chauchaix S, Boulanger MD, Bonnetblanc JM, Bernard P (1998) HLA-DRB1 and HLA-DQB1 genes in susceptibility and resistance to cicatricial pemphigoid in French Caucasians. Eur J Dermatol 8:330-333.
Fearnhead P, Donnelly P (2001) Estimating recombination rates from population genetic data. Genetics 159:1299-1318.
Gabriel SB, Schaffner SF, Nguyen H, Moore JM, Roy J, Blumenstiel B, Higgins J, DeFelice M, Lochner A, Faggart M, Liu-Cordero SN, Rotimi C, Adeyemo A, Cooper R, Ward R, Lander ES, Daly MJ, Altshuler D (2002) The structure of haplotype blocks in the human genome. Science 296:2225-2229.
Haines JL, Terwedow HA, Burgess K, Pericak-Vance MA, Rimmler JB, Martin ER, Oksenberg JR, Lincoln R, Zhang DY, Banatao DR, Gatto N, Goodkin DE, Hauser SL (1998) Linkage of the MHC to familial multiple sclerosis suggests genetic heterogeneity. The Multiple Sclerosis Genetics Group. Hum Mol Genet 7:1229-1234.
Jeffreys AJ, Kauppi L, Neumann R (2001) Intensely punctate meiotic recombination in the class II region of the major histocompatibility complex. Nat Genet 29:217-222.
Klein J, Sato A (2000) The HLA system: second of two parts. N Engl J Med 343:782-786.
Kong A, Gudbjartsson DF, Sainz J, Jonsdottir GM, Gudjonsson SA, Richardsson B, Sigurdardottir S, Barnard J, Hallbeck B, Masson G, Shlien A, Palsson ST, Frigge ML, Thorgeirsson TE, Gulcher JR, Stefansson K (2002) A high-resolution recombination map of the human genome. Nat Genet 31:241-247.
Kruglyak L, Daly MJ, Reeve-Daly MP, Lander ES (1996) Parametric and nonparametric linkage analysis: a unified multipoint approach. Am J Hum Genet 58:1347-1363.
Lewontin RC (1964) The interaction of selection and linkage. I. General considerations; heterotic models. Genetics 49:49-67.
Martin MP, Harding A, Chadwick R, Kronick M, Cullen M, Lin L, Mignot E, Carrington M (1998) Characterization of 12 microsatellite loci of the human MHC in a panel of reference cell lines. Immunogenetics 47:131-138.
Moonsamy PV, Klitz W, Tilanus MG, Begovich AB (1997) Genetic variability and linkage disequilibrium within the DP region in the CEPH families. Hum Immunol 58:112-121.
Okazaki A, Miyagawa S, Yamashina Y, Kitamura W, Shirai T (2000) Polymorphisms of HLA-DR and -DQ genes in Japanese patients with bullous pemphigoid. J Dermatol 27:149-156.
Phillips MS, Lawrence R, Sachidanandam R, Morris AP, Balding DJ, Donaldson MA, Studebaker JF, et al (2003) Chromosome-wide distribution of haplotype blocks and the role of recombination hot spots. Nat Genet 33:382-387.
Price P, Witt C, Allcock R, Sayer D, Garlepp M, Kok CC, French M, Mallal S, Christiansen F (1999) The genetic basis for the association of the 8.1 ancestral haplotype (A1, B8, DR3) with multiple immunopathological diseases. Immunol Rev 167:257-274.
Sabeti PC, Reich DE, Higgins JM, Levine HZ, Richter DJ, Schaffner SF, Gabriel
SB, Platko JV, Patterson NJ, McDonald GJ, Ackerman HC, Campbell SJ, Altshuler D, Cooper R, Kwiatkowski D,Ward R, Lander ES (2002) Detecting recent positive selection in the human genome from haplotype structure. Nature 419:832-837. Stephens M, Smith NJ, Donnelly P (2001) A new statistical method for haplotype reconstruction from population data. Am J Hum Genet 68:978-989. Thorsby E (1997) Invited anniversary review: HLA associated diseases. Hum Immunol 53:1-11. All references cited herein are incorporated by reference in their entirety.
While this invention has been particularly shown and described with references to preferred embodiments thereof, it will be understood by those skilled in the art that various changes in form and details may be made therein without departing from the scope of the invention encompassed by the appended claims.

Claims

We claim: 1. An SNP-haplotype map of a 4-Mb MHC region, said map comprising evenly spaced SNPs, genotyped HLA genes, TAP genes, and microsatellites.
2. The SNP-haplotype map according to claim 1, wherein the SNPs are spaced at approximately every 20 kb of said MHC region.
3. A method of determining the identity of an HLA allele, said method comprising a) determining the nucleotide present at one or more extra-genic SNP sites corresponding to the HLA allele to be assessed; and b) identifying said HLA allele based on the nucleotide identity determined in a).
4. The method according to claim 3, wherein the HLA allele is an HLA-A allele.
5. The method according to claim 4, wherein said one or more SNP sites are selected from the group consisting of: rs2517862, rs1655930, rs1616549, rs376253, rs1961135, rs2517706, rs2517701, rs2517699, rs435766, rs410909, rs2394255, rs1264807, rs2530388, rs356963, rs2286405, rs2240619, rs3129012, and rs259938.
6. The method according to claim 4 comprising determining the nucleotide present at more than one SNP site selected from the group consisting of: rs2517862, rs1655930, rs1616549, rs376253, rs1961135, rs2517706, rs2517701, rs2517699, rs435766, rs410909, rs2394255, rs1264807, rs2530388, rs356963, rs2286405, rs2240619, rs3129012, and rs259938.
7. The method according to claim 3, wherein the HLA allele is an HLA-DRB1 allele.
8. The method according to claim 7, wherein said one or more SNP sites are selected from the group consisting of: rs742697, rs523627, rs3129960, rs2395163, rs2395165, rs983561, rs2239804, rs2213584, rs2395182, and rs2858860.
9. The method according to claim 8 comprising additionally determining the nucleotide present at one or more SNP sites selected from the group consisting of rs3129907, rs1059544, and rs1987529.
10. The method according to claim 7 comprising determining the nucleotide present at more than one SNP site selected from the group consisting of: rs742697, rs523627, rs3129960, rs2395163, rs2395165, rs983561, rs2239804, rs2213584, rs2395182, rs2858860, rs3129907, rs1059544, and rs1987529.
11. A method of predicting the likelihood of development of an MHC-linked disease or an autoimmune disease in a human, comprising determining the identity of an HLA allele in the human by determining the nucleotide present at one or more extra-genic SNP sites corresponding to the HLA allele to be assessed, wherein if the HLA allele in the human is associated with an MHC-linked disease or an autoimmune disease, the human has a greater likelihood of development of said disease.
12. A method of predicting the likelihood of development of a host-graft response in a human host, comprising determining the identity of an HLA allele of the graft by determining the nucleotide present at one or more extra-genic SNP sites corresponding to the HLA allele to be assessed in the graft, wherein if the HLA allele in the human host is identical to the corresponding HLA allele in the graft, there is a low likelihood of development of a host-graft response in the human host.
13. The method according to claim 12, optionally comprising additionally determining the identity of the corresponding HLA allele of the human host by determining the nucleotide present at one or more extra-genic SNP sites corresponding to the HLA allele to be assessed in the human host.
14. The method according to claim 12, comprising determining the HLA-A allele in the graft by determining the nucleotide present at one or more SNP sites selected from the group consisting of: rs2517862, rs1655930, rs1616549, rs376253, rs1961135, rs2517706, rs2517701, rs2517699, rs435766, rs410909, rs2394255, rs1264807, rs2530388, rs356963, rs2286405, rs2240619, rs3129012, and rs259938.
15. The method according to claim 12, comprising determining the HLA-DRB1 allele in the graft by determining the nucleotide present at one or more SNP sites selected from the group consisting of: rs742697, rs523627, rs3129960, rs2395163, rs2395165, rs983561, rs2239804, rs2213584, rs2395182, rs2858860, rs3129907, rs1059544, and rs1987529.
16. A method of predicting the likelihood of development of a host-graft response in a human host, comprising determining the identity of an HLA allele of the graft by determining the nucleotide present at one or more extra-genic SNP sites corresponding to the HLA allele to be assessed in the graft, wherein if the HLA allele in the human host is different from the corresponding HLA allele in the graft, there is a high likelihood of developing a host-graft response in the human host.
17. The method according to claim 16, optionally comprising additionally determining the identity of the corresponding HLA allele of the human host by determining the nucleotide present at one or more extra-genic SNP sites corresponding to the HLA allele to be assessed in the human host.
18. The method according to claim 16, comprising determining the HLA-A allele in the graft by determining the nucleotide present at one or more SNP sites selected from the group consisting of: rs2517862, rs1655930, rs1616549, rs376253, rs1961135, rs2517706, rs2517701, rs2517699, rs435766, rs410909, rs2394255, rs1264807, rs2530388, rs356963, rs2286405, rs2240619, rs3129012, and rs259938.
19. The method according to claim 16, comprising determining the HLA-DRB1 allele in the graft by determining the nucleotide present at one or more SNP sites selected from the group consisting of: rs742697, rs523627, rs3129960, rs2395163, rs2395165, rs983561, rs2239804, rs2213584, rs2395182, rs2858860, rs3129907, rs1059544, and rs1987529.
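The allele-typing and donor-matching steps recited in claims 3 to 19 can be illustrated with a minimal Python sketch. It is not the reference implementation of the claimed method and rests on stated assumptions: the input is a phased haplotype (one chromosome) rather than unphased genotypes, an allele is called only when exactly one tag pattern matches, and host and graft are treated as matched at a locus when they carry the same pair of alleles. The tag table, SNP names (SNP_1 to SNP_3), allele labels (ALLELE_X, ALLELE_Y, ALLELE_Z), and helper names (infer_allele, graft_match) are hypothetical placeholders; they do not reproduce the SNP-allele associations of this application.

```python
from typing import Dict, Mapping, Optional, Sequence

# HYPOTHETICAL tag table (placeholder data, not real associations):
# each allele label maps to the nucleotide expected at each tag SNP
# on the same phased haplotype.
TagTable = Dict[str, Dict[str, str]]

HYPOTHETICAL_TAGS: TagTable = {
    "ALLELE_X": {"SNP_1": "A", "SNP_2": "G", "SNP_3": "T"},
    "ALLELE_Y": {"SNP_1": "A", "SNP_2": "C", "SNP_3": "T"},
    "ALLELE_Z": {"SNP_1": "G", "SNP_2": "C", "SNP_3": "C"},
}


def infer_allele(haplotype: Mapping[str, str], tags: TagTable) -> Optional[str]:
    """Return the one allele whose tag nucleotides all match the haplotype.

    Returns None when no allele matches (e.g. a missing SNP call) or when
    several alleles match, so an ambiguous call is never guessed.
    """
    matches = [
        allele
        for allele, expected in tags.items()
        if all(haplotype.get(snp) == base for snp, base in expected.items())
    ]
    return matches[0] if len(matches) == 1 else None


def graft_match(host: Sequence[Optional[str]],
                graft: Sequence[Optional[str]]) -> Optional[bool]:
    """Compare host and graft alleles at one locus, ignoring order.

    True (identical alleles) corresponds to the lower-likelihood case of
    claim 12, False (different alleles) to the higher-likelihood case of
    claim 16, and None means at least one allele could not be typed.
    """
    if any(a is None for a in (*host, *graft)):
        return None
    return sorted(host) == sorted(graft)


if __name__ == "__main__":
    host_haplotypes = [
        {"SNP_1": "A", "SNP_2": "G", "SNP_3": "T"},
        {"SNP_1": "G", "SNP_2": "C", "SNP_3": "C"},
    ]
    graft_haplotypes = [
        {"SNP_1": "G", "SNP_2": "C", "SNP_3": "C"},
        {"SNP_1": "A", "SNP_2": "C", "SNP_3": "T"},
    ]
    host = [infer_allele(h, HYPOTHETICAL_TAGS) for h in host_haplotypes]
    graft = [infer_allele(h, HYPOTHETICAL_TAGS) for h in graft_haplotypes]
    # prints: ['ALLELE_X', 'ALLELE_Z'] ['ALLELE_Z', 'ALLELE_Y'] False
    print(host, graft, graft_match(host, graft))
```

Returning None for incomplete or ambiguous haplotypes, and propagating it through graft_match, keeps the sketch from reporting a match it cannot support; a practical assay would additionally need a phasing step and a validated tag table.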

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US10/850,359 US20050266410A1 (en) 2004-05-19 2004-05-19 Methods of Human Leukocyte Antigen typing by neighboring single nucleotide polymorphism haplotypes
US10/850,359 2004-05-19

Publications (2)

Publication Number Publication Date
WO2005123951A2 (en) 2005-12-29
WO2005123951A3 (en) 2006-10-19

Family

ID=35425773

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2005/017958 WO2005123951A2 (en) 2004-05-19 2005-05-19 Methods of human leukocyte antigen typing by neighboring single nucleotide polymorphism haplotypes

Country Status (2)

Country Link
US (1) US20050266410A1 (en)
WO (1) WO2005123951A2 (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2008110206A1 (en) * 2007-03-13 2008-09-18 Genome Diagnostics B.V. Method for determining a hla-dq haplotype in a subject
WO2010075249A2 (en) 2008-12-22 2010-07-01 Genentech, Inc. A method for treating rheumatoid arthritis with b-cell antagonists
CN101250587B (en) * 2008-03-26 2012-01-04 上海市血液中心 Method for identifying TAP allelomorph by SNPs combination
WO2013128199A1 (en) * 2012-02-28 2013-09-06 Nhs Blood & Transplant A method of molecular typing

Families Citing this family (6)

Publication number Priority date Publication date Assignee Title
FR2890859B1 (en) 2005-09-21 2012-12-21 Oreal DOUBLE-STRANDED RNA OLIGONUCLEOTIDE INHIBITING TYROSINASE EXPRESSION
US20090221620A1 (en) * 2008-02-20 2009-09-03 Celera Corporation Gentic polymorphisms associated with stroke, methods of detection and uses thereof
WO2013055683A1 (en) 2011-10-10 2013-04-18 Teva Pharmaceutical Industries Ltd. Single nucleotide polymorphisms useful to predict clinical response for glatiramer acetate
TWI518538B (en) * 2013-04-17 2016-01-21 中央研究院 Predicting hla genotypes using unphased and flanking single-nucleotide polymorphisms in han chinese population
WO2016022641A1 (en) * 2014-08-05 2016-02-11 The Johns Hopkins University Platform independent haplotype identification and use in ultrasensitive dna detection
JP2022534071A (en) * 2019-05-22 2022-07-27 ソウル ナショナル ユニバーシティ アールアンドディービー ファウンデーション Method and apparatus for genotype prediction using NGS data

Patent Citations (1)

Publication number Priority date Publication date Assignee Title
WO2005082110A2 (en) * 2004-02-26 2005-09-09 Illumina Inc. Haplotype markers for diagnosing susceptibility to immunological conditions

Non-Patent Citations (7)

Title
AHMAD TARIQ ET AL: "Haplotype-specific linkage disequilibrium patterns define the genetic topography of the human MHC." HUMAN MOLECULAR GENETICS. 15 MAR 2003, vol. 12, no. 6, 15 March 2003 (2003-03-15), pages 647-656, XP002391029 ISSN: 0964-6906 *
HORTON ROGER ET AL: "Gene map of the extended human MHC." NATURE REVIEWS. GENETICS. DEC 2004, vol. 5, no. 12, December 2004 (2004-12), pages 889-899, XP002391032 ISSN: 1471-0056 *
KERMAN R H: "Relevance of histocompatibility testing in clinical transplantation." THE SURGICAL CLINICS OF NORTH AMERICA. OCT 1994, vol. 74, no. 5, October 1994 (1994-10), pages 1015-1028, XP009069748 ISSN: 0039-6109 *
STENZEL ANNETTE ET AL: "Patterns of linkage disequilibrium in the MHC region on human chromosome 6p." HUMAN GENETICS. MAR 2004, vol. 114, no. 4, 22 January 2004 (2004-01-22), pages 377-385, XP002391031 ISSN: 0340-6717 *
STEWART C ANDREW ET AL: "Complete MHC haplotype sequencing for common disease gene mapping." GENOME RESEARCH. JUN 2004, vol. 14, no. 6, 12 May 2004 (2004-05-12), pages 1176-1187, XP002391030 ISSN: 1088-9051 *
TIERCY JEAN-MARIE ET AL: "Selection of unrelated bone marrow donors by serology, molecular typing and cellular assays." TRANSPLANT IMMUNOLOGY. AUG 2002, vol. 10, no. 2-3, August 2002 (2002-08), pages 215-221, XP002391033 ISSN: 0966-3274 *
WALSH EMILY C ET AL: "An integrated haplotype map of the human major histocompatibility complex." AMERICAN JOURNAL OF HUMAN GENETICS. SEP 2003, vol. 73, no. 3, September 2003 (2003-09), pages 580-590, XP002391028 ISSN: 0002-9297 *

Also Published As

Publication number Publication date
US20050266410A1 (en) 2005-12-01
WO2005123951A3 (en) 2006-10-19

Similar Documents

Publication Publication Date Title
TWI236502B (en) Prediction of inflammatory disease
WO2005123951A2 (en) Methods of human leukocyte antigen typing by neighboring single nucleotide polymorphism haplotypes
Miller et al. High-density single-nucleotide polymorphism maps of the human genome
CA2409774A1 (en) Methods for genetic analysis of dna to detect sequence variances
US8206911B2 (en) Identification of the gene and mutation responsible for progressive rod-cone degeneration in dog and a method for testing same
US9637788B2 (en) Discrimination of blood type variants
EP1417338A1 (en) Genes and snps associated with eating disorders
KR101761801B1 (en) Composition for determining nose phenotype
EP0672179B1 (en) Nucleic acid analysis
AU2005259787B2 (en) Method of detecting mutations in the gene encoding cytochrome P450-2D6
US20160053333A1 (en) Novel Haplotype Tagging Single Nucleotide Polymorphisms and Use of Same to Predict Childhood Lymphoblastic Leukemia
Xu et al. Screening candidate genes for mutations in patients with hypogonadotropic hypogonadism using custom genome resequencing microarrays
Moonsamy et al. Genetic variability and linkage disequilibrium within the DP region in the CEPH families
EP1618210B1 (en) Associations of polymorphisms in the frzb gene with osteoporosis
JP6245796B2 (en) Markers, probes, primers and kits for predicting the risk of developing primary biliary cirrhosis and methods for predicting the risk of developing primary biliary cirrhosis
Pai et al. Flow chart HLA-DQA1 genotyping and its application to a forensic case
US20080026367A9 (en) Methods for genomic analysis
EP0887423A1 (en) A method for determining the Histocompatibility locus antigen class II
Liu et al. Genetic mapping of the chicken stem cell antigen 2 (SCA2) gene to chromosome 2 via PCR primer mutagenesis 1.
JP2004528847A (en) Diagnosis of single nucleotide polymorphism in schizophrenia
Magnani et al. Informativity of intragenic microsatellites for carrier detection and prenatal diagnosis of cystic fibrosis in the Italian population
WO2017222247A2 (en) Marker for predicting efficacy of therapeutic agent for hemophilia and use thereof
KR101168737B1 (en) Polynucleotides derived from FANCA gene comprising single nucleotide polymorphism, microarrays and diagnostic kits comprising the same, and analytic methods using the same
KR101167942B1 (en) Polynucleotides derived from ALG12 gene comprising single nucleotide polymorphisms, microarrays and diagnostic kits comprising the same, and analytic methods for autism spectrum disorders using the same
KR20230063010A (en) HLA-B genotype analysis method using Korean-specific SNPs and optimized pipeline

Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A2

Designated state(s): AE AG AL AM AT AU AZ BA BB BG BR BW BY BZ CA CH CN CO CR CU CZ DE DK DM DZ EC EE EG ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KM KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX MZ NA NG NI NO NZ OM PG PH PL PT RO RU SC SD SE SG SK SL SM SY TJ TM TN TR TT TZ UA UG US UZ VC VN YU ZA ZM ZW

AL Designated countries for regional patents

Kind code of ref document: A2

Designated state(s): GM KE LS MW MZ NA SD SL SZ TZ UG ZM ZW AM AZ BY KG KZ MD RU TJ TM AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HU IE IS IT LT LU MC NL PL PT RO SE SI SK TR BF BJ CF CG CI CM GA GN GQ GW ML MR NE SN TD TG

121 EP: the EPO has been informed by WIPO that EP was designated in this application
NENP Non-entry into the national phase

Ref country code: DE

WWW WIPO information: withdrawn in national office

Country of ref document: DE

122 EP: PCT application non-entry in European phase