WO2006089238A2 - Multiplex assays for inferring ancestry - Google Patents
- Publication number
- WO2006089238A2 (PCT/US2006/005863)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- panel
- seq
- nos
- primers
- primer pairs
Classifications
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12Q—MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
- C12Q1/00—Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
- C12Q1/68—Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
- C12Q1/6876—Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes
- C12Q1/6888—Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes for detection or identification of organisms
- C12Q2600/00—Oligonucleotides characterized by their use
- C12Q2600/106—Pharmacogenomics, i.e. genetic variability in individual responses to drugs and drug metabolism
- C12Q2600/156—Polymorphic or mutational markers
- C12Q2600/16—Primer sets for multiplex assays
- C12Q2600/172—Haplotypes
Definitions
- The invention relates generally to the identification of genetic markers predictive of an individual's biogeographical ancestry, and more specifically to combinations of single nucleotide polymorphisms useful as ancestry informative markers (AIMs), which allow an inference as to a trait of an individual; to algorithms for identifying such AIMs; and to methods of using such AIMs to infer a trait of an individual, including the individual's ancestry, responsiveness to a drug, and predisposition to a disease.
- AIMs ancestry informative markers
- BGA BioGeographical Ancestry
- BGA is relevant to almost any type of genetic or epidemiological study design. For example, BGA is an important component in the variability of drug response (Burroughs et al., J. Natl. Med. Assoc.
- Consistency is a significant problem with the self-reporting of race on questionnaires, and one that the Food and Drug Administration is attempting to address during the clinical trial design process.
- Consistency can be a difficult end to achieve.
- STRs short tandem repeats
- The present invention provides methods and compositions for measuring, with a desired predetermined level of confidence, within-individual population structure, which, as disclosed herein, allows inferences to be drawn, for example, as to the ancestry, pigmentation traits, drug responsiveness, and disease susceptibility of the individual.
- The present methods and compositions were used in a forensic capacity, in which DNA samples obtained at the crime scenes of a serial murderer/rapist in Louisiana were examined. Based on psychological profiling, police believed that the serial killer was a Caucasian male, and had tested the DNA of over 1,000 Caucasian men without finding a match.
- The police then turned to the inventors, who, using the compositions and methods of the invention, determined that the individual committing the crimes was African American and, more specifically, had a proportional, confidence-qualified ancestry of 85% sub-Saharan African and 15% Native American. Based on this result and additional results disclosed herein, the police were further advised that the average African American has 20% IndoEuropean ancestry, that greater levels of IndoEuropean ancestry correlate with lighter skin tone, and, therefore, that the person committing the crimes was likely an African American with average to darker than average skin tone. Within two months of refocusing their efforts based on this information, the police arrested an African American man of average skin tone (for African Americans); DNA testing determined that he was the person whose DNA was found at the crime scenes.
- The present invention relates to a method of inferring, with a predetermined level of confidence, a trait of an individual.
- Such a method can be performed, for example, by contacting a sample that includes nucleic acid molecules of a test individual with hybridizing oligonucleotides, wherein the hybridizing oligonucleotides can detect nucleotide occurrences of single nucleotide polymorphisms (SNPs) of a panel of at least about ten ancestry informative markers (AIMs) indicative of a population structure correlated with the trait, and wherein the contacting is performed under conditions suitable for detecting the nucleotide occurrences of the AIMs of the individual by the hybridizing oligonucleotides; and identifying, with a predetermined level of confidence, a population structure that correlates with the nucleotide occurrences of the AIMs in the individual, wherein the population structure correlates with the trait.
- SNPs single nucleotide polymorphisms
- A panel of at least about ten AIMs (e.g., 8, 9, 10, 11, 12, 13, 14, 15, 20, 25, 30, or more) is examined in practicing a method of the invention.
- The greater the number of AIMs examined, the greater the confidence level of an inference made using the method.
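Why more AIMs give more confidence can be illustrated with a log-likelihood ratio that adds one term per AIM. This is a minimal sketch, not the patented algorithm. It assumes Hardy-Weinberg equilibrium, independent biallelic AIMs, genotypes coded as the count (0, 1, or 2) of a reference allele, and hypothetical reference-allele frequencies strictly between 0 and 1. The function names are illustrative.

```python
import math
from typing import Sequence


def genotype_log_likelihood(genotypes: Sequence[int], freqs: Sequence[float]) -> float:
    """Log-likelihood of one individual's genotypes under a single population.

    genotypes: count (0, 1 or 2) of the reference allele at each AIM.
    freqs: reference-allele frequency of each AIM in the population (0 < p < 1).
    Assumes Hardy-Weinberg equilibrium and independence between AIMs.
    """
    if len(genotypes) != len(freqs):
        raise ValueError("genotypes and freqs must have the same length")
    total = 0.0
    for g, p in zip(genotypes, freqs):
        q = 1.0 - p
        if g == 2:
            prob = p * p
        elif g == 1:
            prob = 2.0 * p * q
        elif g == 0:
            prob = q * q
        else:
            raise ValueError(f"genotype must be 0, 1 or 2, got {g}")
        total += math.log(prob)
    return total


def log_likelihood_ratio(genotypes, freqs_a, freqs_b) -> float:
    """Positive values favour population A over population B.

    Each AIM contributes its own term, so informative AIMs move the
    ratio further from zero as the panel grows.
    """
    return genotype_log_likelihood(genotypes, freqs_a) - genotype_log_likelihood(genotypes, freqs_b)


# Hypothetical panels: every AIM has frequency 0.8 in A and 0.2 in B,
# and the individual is homozygous for the reference allele at each one.
for n_aims in (10, 20):
    llr = log_likelihood_ratio([2] * n_aims, [0.8] * n_aims, [0.2] * n_aims)
    print(n_aims, round(llr, 2))
```

In this toy case each AIM contributes ln(0.64/0.04) = ln 16, so doubling the panel doubles the evidence for population A. Real AIMs differ in informativeness, and linked AIMs would violate the independence assumption.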
- A trait for which an inference is made according to a method of the invention can be any trait, including a trait for which an ethnic predisposition is known or suspected to occur, a trait for which it is known that no ethnic predisposition occurs, and a trait for which it is unknown or unclear whether there is an ethnic predisposition. In one embodiment, the trait is biogeographical ancestry (BGA).
- BGA biogeographical ancestry
- The panel of AIMs used to examine BGA includes AIMs as set forth in SEQ ID NOS:1 to 71.
- The panel includes AIMs as set forth in SEQ ID NOS:7, 21, 23, 27, 45, 54, 59, 63, and 72 to 152; in SEQ ID NOS:3, 8, 9, 11, 12, 33, 40, 59, 63, and 153 to 239; or in SEQ ID NOS:1, 8, 11, 21, 24, 40, 172, and 240 to 331; a panel can also contain combinations of AIMs as set forth in SEQ ID NOS:1 to 331.
- AIMs useful in practicing a method of the invention can, but need not, be linked to a gene linked to the trait (i.e., a gene known to be involved in the trait phenotype) and generally are not in linkage disequilibrium with the gene (or locus).
- An AIM useful for inferring drug responsiveness of an individual according to a method of the invention need not be linked to a gene involved in responsiveness to the drug (e.g., a drug metabolism gene or a drug transport gene such as a cytochrome P450 gene or P-glycoprotein gene).
- An AIM useful for inferring a pigmentation trait of an individual according to a method of the invention need not be linked to a gene involved in pigmentation (e.g., a tyrosinase gene or a melanocortin-1 receptor gene).
- At least one (e.g., 1, 2, 3, 4, or 5) AIM of a panel is not linked to a gene involved in the trait for which an inference is being made.
- An individual being examined can have an ancestry that includes any one or a combination of ancestral groups, including, for example, a proportion of sub-Saharan African ancestry, Native American ancestry, IndoEuropean ancestry, East Asian ancestry, Middle Eastern ancestry, Pacific Islander ancestry, or a combination including one or more of these ancestries.
- The proportional ancestry of an individual can comprise one ancestry (e.g., 100% IndoEuropean ancestry), or any proportion of two, three, four, or more ancestral groups.
- A test individual can have, for example, a proportion of at least three ancestral groups, which can include proportions of sub-Saharan African ancestry and two other ancestries; of sub-Saharan African and IndoEuropean ancestral groups and a third ancestry; of Native American and IndoEuropean ancestral groups and a third ancestry; of East Asian and Native American ancestral groups and a third ancestry; or of IndoEuropean and East Asian ancestral groups and a third ancestry; or can include proportions of Native American, East Asian, and IndoEuropean ancestral groups, or of sub-Saharan African, Native American, and IndoEuropean ancestral groups, and the like.
- A trait of a test individual for which an inference is being made is responsiveness of the individual to a drug, particularly a therapeutic drug.
- A method of the invention provides a tool for realizing personalized medicine.
- A drug for which an inference can be made as to whether a test individual will respond in either a positive or negative manner can be, for example, a cancer chemotherapeutic agent such as paclitaxel, or a drug such as a statin, which can be useful for maintaining or lowering cholesterol levels.
- The AIMs of the panel used to practice the method include AIMs of genes other than genes known to be involved in melanin synthesis or metabolism.
- A trait of a test individual for which an inference is being made is a susceptibility or predisposition of the individual to a disease.
- Various traits are associated with population structure at a continental level, whereas other traits are associated with population structure at finer levels.
- A method of the invention can provide a means for making an inference with respect to a trait such as susceptibility to diseases such as diabetes, hypertension, and cancers that are known to have an ethnic predisposition (i.e., known to occur with higher frequencies in individuals of certain ethnic/ancestral groups), as well as to diseases such as alcoholism, schizophrenia, Parkinson's disease, and other neurological disorders, which do not have (or at least are not known to have) an ethnic predisposition.
- A trait of a test individual for which an inference is being made is a pigmentation trait.
- The pigmentation trait can be any such trait, including, for example, eye color or shade, skin color, hair color, or a combination thereof.
- The AIMs of the panel used to practice the method include AIMs of genes other than genes known to be involved in melanin synthesis or metabolism, or in other aspects of pigmentation.
- A method of inferring a trait of a test individual by determining a population structure that correlates with nucleotide occurrences of AIMs in the individual can further include identifying, with a predetermined level of confidence, a sub-population structure of the population structure, wherein the sub-population structure correlates with a trait.
- A population structure of an individual can correlate to an intercontinental group with which, by inference, the individual shares ancestry, for example, IndoEuropean.
- A sub-population structure can further correlate with an intracontinental group with which the individual shares IndoEuropean ancestry, for example, Mediterranean ethnicity.
- The hybridizing oligonucleotides useful in the methods of the invention can be oligonucleotide probes or oligonucleotide primers. Oligonucleotide probes useful in the present methods can hybridize to a nucleotide sequence that includes the SNP position for an AIM, wherein the nucleotide of the hybridizing oligonucleotide at the position corresponding to the SNP either matches or does not match the nucleotide occurrence at the SNP position.
- Additional oligonucleotide probes useful in the methods of the invention include oligonucleotide probes that hybridize to a polynucleotide sequence adjacent to and upstream and/or adjacent to and downstream of the SNP position, and that can, but need not, include a nucleotide corresponding to the nucleotide position of the SNP, wherein such a corresponding nucleotide, when present in the probe, can, but need not, match the nucleotide occurrence at the SNP.
- Oligonucleotide primers useful in the methods of the invention include oligonucleotide primers useful for a primer extension reaction, as well as oligonucleotide primers that, in combination, allow for amplification of template polynucleotide comprising the AIM.
- Such amplification primer pairs generally include a forward primer and a reverse primer useful for amplification of a template polynucleotide comprising an AIM of interest.
- Two, three, four, or more different forward primers can be used with a common reverse primer for amplification of different template polynucleotides comprising the AIM (e.g., in a multiplex reaction) and a common gene sequence (e.g., AIMs of a family of related gene sequences), or for generating amplification products of different sizes from a single template.
- A common forward primer can be used with one or a plurality of different reverse primers.
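How several forward primers that share one reverse primer yield products of different sizes from a single template can be sketched with coordinate arithmetic. This is an illustration under stated assumptions: the coordinates are hypothetical, all primers are taken to bind one linear template, and `min_gap` is a hypothetical stand-in for the size resolution of whatever detection platform is used. Neither function comes from the source.

```python
from typing import Dict


def amplicon_sizes(forward_starts: Dict[str, int], reverse_end: int) -> Dict[str, int]:
    """Product length for each forward primer paired with one common reverse primer.

    forward_starts: template coordinate (0-based) of the 5' base of each forward primer.
    reverse_end: template coordinate of the base that pairs with the 5' base of the
        common reverse primer, i.e. the last template base of every product.
    """
    sizes = {}
    for name, start in forward_starts.items():
        if start >= reverse_end:
            raise ValueError(f"forward primer {name} must lie upstream of the reverse primer")
        sizes[name] = reverse_end - start + 1
    return sizes


def sizes_resolvable(sizes: Dict[str, int], min_gap: int) -> bool:
    """True if every pair of products differs in length by at least min_gap bases."""
    ordered = sorted(sizes.values())
    return all(b - a >= min_gap for a, b in zip(ordered, ordered[1:]))


# Hypothetical coordinates for three forward primers and one shared reverse primer.
sizes = amplicon_sizes({"F1": 100, "F2": 150, "F3": 180}, reverse_end=299)
print(sizes)                        # {'F1': 200, 'F2': 150, 'F3': 120}
print(sizes_resolvable(sizes, 10))  # True
```

The same arithmetic applies in mirror image when one forward primer is shared by several reverse primers.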
- A method of the invention can be performed using oligonucleotide primers. In one aspect, the method includes contacting the sample with the oligonucleotide primers and with a polymerase, under conditions suitable for generation of a primer extension product. In such a method, the nucleotide occurrence of a SNP can be determined by detecting the presence of the primer extension product, or by sequencing the primer extension product (or a product thereof) and identifying the nucleotide at the position corresponding to the position of the SNP.
- The method can include contacting the sample with oligonucleotide primers that comprise amplification primer pairs and with a polymerase, under conditions suitable for generation of an amplification product.
- The nucleotide occurrence of a SNP can be determined by detecting the presence of the amplification product, or by sequencing the amplification product (or a product thereof) and identifying the nucleotide at the position corresponding to the position of the SNP.
- The methods of the invention are particularly adaptable to a high-throughput format, including a multiplex format, thus allowing a large number of AIMs and/or a large number of samples from test individuals, as well as controls, to be examined in parallel.
- The methods can be performed using a format in which the samples being examined are arranged in an array, particularly an addressable array, e.g., in wells of a tray or on a glass slide or silicon chip, and can be partly or fully automated using robotics.
- The AIMs examined need not be those having the greatest delta values for the particular trait; they also can be selected to balance the delta value against the compatibility of primers in a multiplex set, for example, to select AIMs such that hybridizing oligonucleotides (e.g., amplification primer pairs) can be designed that can be used in a single reaction for examining a panel of AIMs but that do not substantially cross-hybridize with AIMs other than the target AIM for which the hybridizing oligonucleotides are designed.
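A minimal sketch of trading delta value against multiplex compatibility follows. It assumes the delta value of an AIM is the absolute allele-frequency difference between two populations, and it represents primer compatibility as a hypothetical set of AIM pairs whose primers should not share a reaction. A real design tool would derive those conflicts from primer sequences. The greedy rule is an illustrative choice, not the selection procedure of the source.

```python
from typing import Dict, FrozenSet, List, Set, Tuple


def delta_value(freq_a: float, freq_b: float) -> float:
    """Absolute allele-frequency difference of one AIM between two populations."""
    return abs(freq_a - freq_b)


def select_multiplex_panel(
    freqs: Dict[str, Tuple[float, float]],
    incompatible: Set[FrozenSet[str]],
    panel_size: int,
) -> List[str]:
    """Greedy pick: take AIMs in order of decreasing delta value, skipping any AIM
    whose primers are listed as incompatible with an AIM already chosen."""
    ranked = sorted(freqs, key=lambda aim: delta_value(*freqs[aim]), reverse=True)
    panel: List[str] = []
    for aim in ranked:
        if len(panel) == panel_size:
            break
        if any(frozenset((aim, chosen)) in incompatible for chosen in panel):
            continue
        panel.append(aim)
    return panel


# Hypothetical frequencies (population A, population B) and one primer conflict.
freqs = {"aim1": (0.9, 0.1), "aim2": (0.85, 0.2), "aim3": (0.6, 0.4), "aim4": (0.7, 0.2)}
conflicts = {frozenset(("aim1", "aim2"))}
print(select_multiplex_panel(freqs, conflicts, panel_size=2))  # ['aim1', 'aim4']
```

Here aim2 has the second-largest delta value but is skipped because its primers conflict with those of aim1, so the lower-delta aim4 takes its place in the single-reaction panel.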
- The present invention also relates to a method of estimating, with a predetermined level of confidence, the proportional ancestry of at least two ancestral groups of a test individual.
- Such a method can be performed, for example, by contacting a sample that includes nucleic acid molecules of the test individual with hybridizing oligonucleotides that can detect nucleotide occurrences of SNPs of a panel of at least about ten AIMs that are indicative of BGA for each ancestral group examined, wherein the contacting is under conditions suitable for detecting the nucleotide occurrences of the AIMs of the test individual by the hybridizing oligonucleotides; and identifying, with a predetermined level of confidence, a population structure that correlates with, or is most likely given, the nucleotide occurrences of the AIMs of each of the ancestral groups examined, wherein the population structure is indicative of proportional ancestry.
- The proportional ancestry estimated according to a method of the invention can be a proportion of any ancestral group, including, for example, a proportion of sub-Saharan African, Native American, IndoEuropean, East Asian, Middle Eastern, or Pacific Islander ancestral group, and generally is a combination of two or more of such ancestral groups.
- The proportional ancestry of a test individual can include proportions of sub-Saharan African and IndoEuropean ancestral groups (e.g., 80% sub-Saharan African and 20% IndoEuropean; or 60% sub-Saharan African, 20% IndoEuropean, and 20% of a third ancestral group); or can include proportions of Native American and IndoEuropean ancestral groups; East Asian and Native American ancestral groups; IndoEuropean and East Asian ancestral groups; and the like.
- The proportional ancestry can include proportions of Native American, East Asian, and IndoEuropean ancestral groups; sub-Saharan African, Native American, and IndoEuropean ancestral groups; sub-Saharan African, Native American, and East Asian ancestral groups; and the like.
- A panel of AIMs useful for estimating the proportional ancestry of an individual can include AIMs as set forth in SEQ ID NOS:1 to 331, for example, AIMs as set forth in SEQ ID NOS:1 to 71, which can be useful for determining proportional ancestries including IndoEuropean, sub-Saharan African, East Asian, and Native American; AIMs as set forth in SEQ ID NOS:7, 21, 23, 27, 45, 54, 59, 63, and 72 to 152, which can be useful for determining proportional ancestry of East Asians and sub-Saharan Africans; AIMs as set forth in SEQ ID NOS:3, 8, 9, 11, 12, 33, 40, 59, 63, and 153 to 239, which can be useful for determining proportional ancestry of East Asians and IndoEuropeans; or AIMs as set forth in SEQ ID NOS:1, 8, 11, 21, 24, 40, 172, and 240 to 331, which can be useful for determining proportional ancestry of IndoEuropeans and sub-Saharan Africans.
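The panels above are written as SEQ ID lists. A small parser (a hypothetical helper, not part of the source) makes their sizes and overlaps explicit. The panel labels `panel_1` to `panel_4` are illustrative names for the four lists in the order they appear.

```python
import re
from typing import Set


def parse_seq_id_spec(spec: str) -> Set[int]:
    """Expand a list such as "7, 21, and 72 to 152" into a set of SEQ ID numbers."""
    ids: Set[int] = set()
    for part in re.split(r",\s*(?:and\s+)?|\s+and\s+", spec.strip()):
        part = part.strip()
        if not part:
            continue
        match = re.fullmatch(r"(\d+)\s+to\s+(\d+)", part)
        if match:
            start, end = int(match.group(1)), int(match.group(2))
            ids.update(range(start, end + 1))  # ranges are inclusive
        else:
            ids.add(int(part))
    return ids


panels = {
    "panel_1": parse_seq_id_spec("1 to 71"),
    "panel_2": parse_seq_id_spec("7, 21, 23, 27, 45, 54, 59, 63, and 72 to 152"),
    "panel_3": parse_seq_id_spec("3, 8, 9, 11, 12, 33, 40, 59, 63, and 153 to 239"),
    "panel_4": parse_seq_id_spec("1, 8, 11, 21, 24, 40, 172, and 240 to 331"),
}
for name, ids in panels.items():
    print(name, len(ids))  # sizes: 71, 89, 96, 99
print(sorted(panels["panel_2"] & panels["panel_3"]))  # [59, 63]
```

Read this way, SEQ ID NOS:59 and 63 are shared by the second and third panels, and SEQ ID NO:172 falls inside the range of the third panel while also being listed explicitly in the fourth.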
- An estimate can be made wherein the proportional ancestry includes proportions of three ancestral groups.
- Identifying a population structure that correlates with, or is most likely given, the nucleotide occurrences of the AIMs of the test individual can be practiced by performing a likelihood determination for affiliation with each of a sub-Saharan African ancestral group, a Native American ancestral group, an IndoEuropean ancestral group, and an East Asian ancestral group; thereafter selecting the three ancestral groups having the greatest likelihood values; determining the likelihood of all possible proportional affiliations among those three ancestral groups, whereby a population structure or proportional affiliation that correlates with the nucleotide occurrences of the AIMs of the test individual is identified; and identifying a single proportional combination of maximum likelihood.
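The four steps above (single-group likelihoods, keep the top three, score every proportional affiliation, take the maximum) can be sketched as a grid search. This is a minimal illustration, not the source's implementation. It assumes the common admixture model in which the individual's allele frequency at each AIM is the proportion-weighted mean of the groups' frequencies, Hardy-Weinberg genotype probabilities, independent AIMs, and group frequencies strictly between 0 and 1. The grid `step` is a hypothetical resolution parameter.

```python
import itertools
import math
from typing import Dict, Iterator, List, Sequence, Tuple


def admixed_log_likelihood(
    genotypes: Sequence[int],
    group_freqs: Sequence[Sequence[float]],
    proportions: Sequence[float],
) -> float:
    """Log-likelihood of genotypes (reference-allele counts 0/1/2) for an individual
    whose allele frequency at each AIM is the proportion-weighted mean of the
    ancestral groups' frequencies (assumed admixture model)."""
    total = 0.0
    for i, g in enumerate(genotypes):
        p = sum(m * freqs[i] for m, freqs in zip(proportions, group_freqs))
        q = 1.0 - p
        total += math.log({2: p * p, 1: 2.0 * p * q, 0: q * q}[g])
    return total


def simplex_grid(k: int, step: float) -> Iterator[Tuple[float, ...]]:
    """Every k-part proportion vector whose entries are multiples of step."""
    n = round(1 / step)
    for head in itertools.product(range(n + 1), repeat=k - 1):
        if sum(head) <= n:
            yield tuple(c / n for c in head) + ((n - sum(head)) / n,)


def estimate_three_way(
    genotypes: Sequence[int],
    freqs_by_group: Dict[str, List[float]],
    step: float = 0.05,
) -> Dict[str, float]:
    # 1. Likelihood of affiliation with each group on its own.
    single = {
        name: admixed_log_likelihood(genotypes, [freqs], [1.0])
        for name, freqs in freqs_by_group.items()
    }
    # 2. Keep the three groups with the greatest likelihood.
    top3 = sorted(single, key=single.get, reverse=True)[:3]
    group_freqs = [freqs_by_group[name] for name in top3]
    # 3-4. Score every proportional affiliation on the grid; keep the maximum.
    best = max(
        simplex_grid(3, step),
        key=lambda m: admixed_log_likelihood(genotypes, group_freqs, m),
    )
    return dict(zip(top3, best))
```

The grid step trades resolution for run time: with `step=0.05`, 231 proportional combinations are scored for three groups. A numerical optimizer could replace the grid, but the exhaustive grid mirrors the "all possible proportional affiliations" wording.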
- Alternatively, identifying a population structure that correlates with, or is most likely given, the nucleotide occurrences of the AIMs can be practiced by performing six two-way comparisons, comprising likelihood determinations for affiliation between each group and each other group; thereafter selecting the three ancestral groups having the greatest likelihood values; determining the likelihood of all possible proportional affiliations among those three ancestral groups, whereby a population structure or proportional affiliation that correlates with, or is most likely given, the nucleotide occurrences of the AIMs of the test individual is identified; and identifying a single proportional combination of maximum likelihood.
- The method also can be practiced by performing three three-way comparisons among the groups; determining the likelihood of all possible proportional affiliations among the three ancestral groups having the greatest likelihood values, whereby a population structure or proportional affiliation that correlates with, or is most likely given, the nucleotide occurrences of the AIMs of the test individual is identified; and identifying a single proportional combination of maximum likelihood.
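With four groups, the two-way comparisons are the C(4, 2) = 6 pairs. The sketch below computes a log-likelihood ratio for each pair under Hardy-Weinberg assumptions with hypothetical frequencies. The source does not say how the six comparisons are combined to pick three groups, so ranking groups by the number of pairwise comparisons they win is a hypothetical rule chosen for illustration.

```python
import itertools
import math
from typing import Dict, List, Sequence, Tuple


def log_likelihood(genotypes: Sequence[int], freqs: Sequence[float]) -> float:
    """Hardy-Weinberg log-likelihood of reference-allele counts (0/1/2)."""
    return sum(
        math.log({2: p * p, 1: 2.0 * p * (1.0 - p), 0: (1.0 - p) ** 2}[g])
        for g, p in zip(genotypes, freqs)
    )


def pairwise_comparisons(
    genotypes: Sequence[int], freqs_by_group: Dict[str, List[float]]
) -> Dict[Tuple[str, str], float]:
    """Log-likelihood ratio for every pair of groups; four groups give six pairs."""
    ll = {name: log_likelihood(genotypes, f) for name, f in freqs_by_group.items()}
    return {(a, b): ll[a] - ll[b] for a, b in itertools.combinations(freqs_by_group, 2)}


def top_three_by_wins(comparisons: Dict[Tuple[str, str], float]) -> List[str]:
    """Rank groups by pairwise wins (hypothetical rule; a zero ratio counts for the second group)."""
    wins: Dict[str, int] = {}
    for (a, b), llr in comparisons.items():
        wins.setdefault(a, 0)
        wins.setdefault(b, 0)
        wins[a if llr > 0 else b] += 1
    return sorted(wins, key=wins.get, reverse=True)[:3]
```

The three groups returned would then go on to the proportional-affiliation scoring described for the three-group case.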
- The method can further include generating a graphical representation of the comparison of the three ancestral groups, wherein the graphical representation comprises a triangle with each ancestral group independently represented by a vertex of the triangle, and wherein the maximum likelihood value of proportional affiliation for an individual comprises a point within the triangle.
- The graphical representation can further include a confidence contour that indicates a level of confidence associated with estimating the proportional ancestry.
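Placing a three-way estimate inside the triangle is a barycentric-to-Cartesian conversion: each proportion weights one vertex. The sketch below also collects the grid points whose log-likelihood lies within a chosen drop of the maximum, whose outline could serve as a confidence contour. The unit equilateral triangle and the `drop` threshold are assumptions. For example, a drop of ln(10) would keep proportions whose likelihood is within tenfold of the maximum. The source does not specify a threshold.

```python
import math
from typing import Dict, List, Sequence, Tuple

# Equilateral triangle with unit sides; vertex i represents ancestral group i.
TRIANGLE = ((0.0, 0.0), (1.0, 0.0), (0.5, math.sqrt(3.0) / 2.0))


def to_triangle_xy(proportions: Sequence[float]) -> Tuple[float, float]:
    """Map three proportions (summing to 1) to a point inside the triangle."""
    if len(proportions) != 3 or abs(sum(proportions) - 1.0) > 1e-9:
        raise ValueError("expected three proportions summing to 1")
    x = sum(m * vx for m, (vx, _) in zip(proportions, TRIANGLE))
    y = sum(m * vy for m, (_, vy) in zip(proportions, TRIANGLE))
    return x, y


def confidence_region(
    grid_log_likelihoods: Dict[Tuple[float, float, float], float], drop: float
) -> List[Tuple[float, float, float]]:
    """Grid points whose log-likelihood lies within `drop` of the maximum.

    Mapping these points with to_triangle_xy and tracing their outline gives a
    confidence contour around the maximum-likelihood point.
    """
    best = max(grid_log_likelihoods.values())
    return [m for m, ll in grid_log_likelihoods.items() if best - ll <= drop]
```

An individual with equal thirds of each ancestry lands at the centroid of the triangle, and a pure single-group individual lands on that group's vertex.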
- An estimate can be made wherein the proportional ancestry includes proportions of four ancestral groups.
- Identifying a population structure that correlates with, or is most likely given, the nucleotide occurrences of the AIMs of the test individual can be practiced by performing six two-way comparisons, three three-way comparisons, or one four-way comparison among the groups; determining the likelihood of all possible proportional affiliations among the four ancestral groups having the greatest likelihood value, whereby a population structure or proportional affiliation that correlates with, or is most likely given, the nucleotide occurrences of the AIMs of the test individual is identified; and identifying a single proportional combination of maximum likelihood.
- The method can further include generating a graphical representation of the comparison of the four ancestral groups, wherein the graphical representation comprises a pyramid with each ancestral group independently represented by a vertex of the pyramid, and wherein the maximum likelihood value of proportional affiliation for an individual comprises a point within the pyramid.
- The graphical representation can further include a confidence contour comprising a sphere around the point, wherein the sphere indicates a level of confidence associated with estimating the proportional ancestry.
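The four-group case extends the same idea to three dimensions: four proportions weight the four vertices of a pyramid (here assumed to be a regular tetrahedron with unit edges), and a sphere of some radius around the resulting point marks the confidence region. The vertex coordinates and the radius are illustrative assumptions. `math.dist` requires Python 3.8 or later.

```python
import math
from typing import Sequence, Tuple

# Regular tetrahedron with unit edges; vertex i represents ancestral group i.
PYRAMID = (
    (0.0, 0.0, 0.0),
    (1.0, 0.0, 0.0),
    (0.5, math.sqrt(3.0) / 2.0, 0.0),
    (0.5, math.sqrt(3.0) / 6.0, math.sqrt(2.0 / 3.0)),
)


def to_pyramid_xyz(proportions: Sequence[float]) -> Tuple[float, ...]:
    """Map four proportions (summing to 1) to a point inside the tetrahedron."""
    if len(proportions) != 4 or abs(sum(proportions) - 1.0) > 1e-9:
        raise ValueError("expected four proportions summing to 1")
    return tuple(
        sum(m * vertex[axis] for m, vertex in zip(proportions, PYRAMID))
        for axis in range(3)
    )


def inside_sphere(point: Sequence[float], center: Sequence[float], radius: float) -> bool:
    """True if point lies within the confidence sphere drawn around center."""
    return math.dist(point, center) <= radius


estimate = to_pyramid_xyz((0.25, 0.25, 0.25, 0.25))  # equal quarters: the centroid
print(inside_sphere(to_pyramid_xyz((0.3, 0.2, 0.25, 0.25)), estimate, radius=0.1))
```

Because every edge has unit length, distances inside the pyramid are directly comparable across the four groups.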
- The method of estimating, with a predetermined level of confidence, the proportional ancestry of at least two ancestral groups of a test individual by identifying a population structure indicative of the proportional ancestry can further include identifying a sub-population structure indicative of ethnicity associated with one of the ancestral groups for which the test individual has a proportional ancestry.
- A sub-population structure of the population structure that correlates with the nucleotide occurrences of the AIMs in the test individual is identified, wherein the sub-population structure correlates with ethnicity of the test individual.
- Such a method of identifying a sub-population structure can be performed, for example, by identifying those chromosomes of the test individual that contain the AIMs indicative of affiliation with a BioGeographical ancestral group (where the individual is proportionally affiliated with more than one BioGeographical Ancestry group), contacting a sample including nucleic acid molecules of the test individual with second hybridizing oligonucleotides that can detect nucleotide occurrences of SNPs of a second panel of AIMs, wherein the AIMs of the second panel are informative for ethnicity within one of these groups and are present on the same chromosomes of the test individual that contain the AIMs indicative of the larger (intercontinental) ancestral group within which the ethnicity occurs; and identifying a sub-population structure that correlates with the nucleotide occurrences of the AIMs of the second panel, wherein the sub-population is indicative of ethnicity of the ancestral group of the test individual.
- For example, a test individual can be determined to be 60% IndoEuropean (IE) and 40% East Asian.
- IE IndoEuropean
- Only a fraction of the total possible AIMs that can be indicative of the IE ancestral group will have been positive (if all were positive, the individual would have been 100% IE) and, therefore, only some of the individual's chromosomes or chromosomal regions will be of IndoEuropean origin.
- The chromosomes of the individual containing the positive AIMs for IE are then identified, and second hybridizing oligonucleotides specific for a second panel of AIMs are selected (e.g., from a group of 1000 or so AIMs that cover all 23 pairs of human chromosomes), wherein the AIMs of the second panel are limited to those that are highly variable in allele frequencies between IE ethnic groups and, therefore, indicative of IE ethnicity, and that also are present on the chromosomes for which the first-panel AIMs were IE positive.
- A sub-population structure that correlates with the nucleotide occurrences of the AIMs of the second panel is then identified, thus indicating an ethnicity with respect to the IE ancestral group of the test individual, for example, that the IE ancestral group derives from a Northern European, a Mediterranean, a Middle Eastern, or a South Asian Indian ethnicity.
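The chromosome-restricted choice of a second panel can be sketched as two filters: collect the chromosomes that carry IE-positive first-panel AIMs, then keep only ethnicity-informative candidates on those chromosomes whose delta value clears a threshold. The `Aim` record, its identifiers, and `min_delta` are hypothetical. The sketch works at whole-chromosome resolution; filtering by chromosomal region would also need marker positions.

```python
from dataclasses import dataclass
from typing import Iterable, List, Set, Tuple


@dataclass(frozen=True)
class Aim:
    aim_id: str      # hypothetical identifier
    chromosome: str  # e.g. "1" to "22", "X"
    delta: float     # allele-frequency differential between the groups this AIM separates


def ie_positive_chromosomes(first_panel_calls: Iterable[Tuple[Aim, bool]]) -> Set[str]:
    """Chromosomes carrying at least one first-panel AIM called IE positive."""
    return {aim.chromosome for aim, positive in first_panel_calls if positive}


def select_second_panel(candidates: Iterable[Aim], chromosomes: Set[str], min_delta: float) -> List[Aim]:
    """Keep ethnicity-informative candidate AIMs that lie on the IE-positive chromosomes."""
    return [a for a in candidates if a.chromosome in chromosomes and a.delta >= min_delta]


# Hypothetical first-panel calls and second-panel candidates.
calls = [(Aim("r1", "2", 0.6), True), (Aim("r2", "5", 0.5), False), (Aim("r3", "11", 0.7), True)]
candidates = [Aim("e1", "2", 0.4), Aim("e2", "5", 0.5), Aim("e3", "11", 0.1), Aim("e4", "11", 0.35)]
chosen = select_second_panel(candidates, ie_positive_chromosomes(calls), min_delta=0.3)
print([a.aim_id for a in chosen])  # ['e1', 'e4']
```

Candidate e2 is informative but sits on a chromosome with no IE-positive call, and e3 is on the right chromosome but below the delta threshold, so both are left out.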
- The method provides a means to identify the ethnic origin of particular chromosomes (e.g., a Mediterranean origin of chromosomes previously determined to be of IndoEuropean origin) that contain AIMs that correlate with a population structure indicative of IndoEuropean BioGeographical Ancestry, and that further contain AIMs that correlate more specifically with a sub-population structure indicative of Mediterranean ethnicity.
- The method of estimating the proportional ancestry of a test individual can include generating an ancestral map of the world, wherein locations of populations having a proportional ancestry corresponding to the proportional ancestry of the test individual are indicated on the ancestral map.
- The method can supplement genealogical information.
- The method can further include overlaying the ancestral map with a genealogical map, wherein the genealogical map indicates locations of populations having geopolitical relevance with respect to the test individual, and statistically combining the information of the ancestral map and the genealogical map to obtain a most likely estimate of the family history of the test individual.
- Identifying a population structure that correlates with, or is most likely given, the nucleotide occurrences of the AIMs can be performed by comparing the nucleotide occurrences of the AIMs of the test individual with known proportional ancestries corresponding to nucleotide occurrences of AIMs indicative of BGA.
- The known proportional ancestries corresponding to nucleotide occurrences of AIMs indicative of BGA can be contained in a table or other list, in which case the nucleotide occurrences of the test individual can be compared to the table or list visually, or can be contained in a database, in which case the comparison can be made electronically, for example, using a computer.
- Each of the known proportional ancestries corresponding to nucleotide occurrences of AIMs indicative of BGA can be associated with a photograph of the person from whom the known proportional ancestry was determined, thus providing a means to further infer physical characteristics of a test individual.
- The photograph can be a digital photograph, which comprises digital information that can be contained in a database that can further contain the digital information of a plurality of digital photographs, each of which is associated with a known proportional ancestry corresponding to nucleotide occurrences of AIMs indicative of BGA of the person in the photograph.
- A method of the invention can further include identifying a photograph of a person having a proportional ancestry corresponding to the proportional ancestry of the test individual. Such identifying can be done by manually looking through one or more files of photographs, wherein the photographs are organized, for example, according to the nucleotide occurrences of AIMs of the person in the photograph.
- Identifying the photograph also can be performed by scanning a database comprising a plurality of files, each file containing digital information corresponding to a digital photograph of a person having a known proportional ancestry, and identifying at least one photograph of a person having nucleotide occurrences of AIMs indicative of BGA that correspond to the nucleotide occurrences of AIMs indicative of BGA of the test individual.
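Scanning a database for the closest reference person can be sketched as a nearest-neighbour search over proportional-ancestry vectors. This is an illustration under assumptions: the record layout, photo paths, and group codes (SSA, NA, and IE for sub-Saharan African, Native American, and IndoEuropean) are hypothetical, and Euclidean distance between proportion vectors is one plausible similarity measure, not one the source specifies.

```python
import math
from typing import Dict, List, NamedTuple


class ReferenceRecord(NamedTuple):
    person_id: str
    photo_file: str                # hypothetical path to a digital photograph
    proportions: Dict[str, float]  # known proportional ancestry, e.g. {"IE": 0.8, "NA": 0.2}


def ancestry_distance(a: Dict[str, float], b: Dict[str, float]) -> float:
    """Euclidean distance between two proportion vectors; a missing group counts as 0."""
    groups = set(a) | set(b)
    return math.sqrt(sum((a.get(g, 0.0) - b.get(g, 0.0)) ** 2 for g in groups))


def closest_references(
    test: Dict[str, float], records: List[ReferenceRecord], k: int = 1
) -> List[ReferenceRecord]:
    """The k reference records whose known ancestry is nearest the test estimate."""
    return sorted(records, key=lambda r: ancestry_distance(test, r.proportions))[:k]


records = [
    ReferenceRecord("r1", "photos/r1.jpg", {"SSA": 0.85, "NA": 0.15}),
    ReferenceRecord("r2", "photos/r2.jpg", {"SSA": 0.5, "IE": 0.5}),
    ReferenceRecord("r3", "photos/r3.jpg", {"IE": 1.0}),
]
best = closest_references({"SSA": 0.8, "NA": 0.2}, records)
print(best[0].person_id, best[0].photo_file)  # r1 photos/r1.jpg
```

Returning the k nearest records rather than a single match lets the photographs of several similar reference persons be reviewed together.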
- the present invention also relates to an article of manufacture, which is at least one photograph of a person having a known proportional ancestry corresponding to a population structure comprising nucleotide occurrences of AIMs indicative of BGA, as well as to a plurality of such articles, each article of the plurality comprising one (or more) photograph(s) of a person having a known proportional ancestry corresponding to a population structure comprising nucleotide occurrences of AIMs indicative of BGA.
- the article can be contained in a file, or a plurality of the articles can be contained in a file, for example, a file containing a plurality of photographs of different persons, wherein some or all of the persons have the same or different known proportional ancestries that correspond to a population structure comprising nucleotide occurrences of AIMs indicative of BGA.
- a plurality of such articles is provided, as is a plurality of files, each file of which can contain one or more articles, i.e., photographs, which can be of one or more persons having the same or different known proportional ancestries that correspond to a population structure comprising nucleotide occurrences of AIMs indicative of BGA.
- different files of the plurality each can contain one (or more) photograph(s) of one person having a known proportional ancestry corresponding to a population structure comprising nucleotide occurrences of AIMs indicative of BGA.
- Different files of the plurality also can contain photographs of two or more different persons, each of whom has the same or substantially the same proportional ancestry corresponding to a population structure comprising nucleotide occurrences of AIMs indicative of BGA.
- a plurality of files can contain files, each of which contains one or more photographs of one or more persons, and when containing one or more photographs of two or more different persons, the different persons can have the same or different known proportional ancestries.
- the article of manufacture i.e., the photograph of a person having a known proportional ancestry corresponding to a population structure comprising nucleotide occurrences of AIMs indicative of BGA
- the digital information of the digital photograph, or of a plurality of digital photograph articles of manufacture of the invention can be contained in a database.
- the present invention further provides a plurality of the articles of manufacture, including at least two digital photographs, each of which comprises digital information.
- the digital information for one or a plurality of the articles is contained in a database, which can be contained in any medium suitable for containing such a database, including, for example, computer hardware or software, a magnetic tape, or a computer disc such as floppy disc, CD, or DVD.
- the database can be accessed through a computer, which can contain the database therein, can accept a medium containing the database, or can access the database through a wired or wireless network, e.g., an intranet or internet.
- the present invention also relates to a kit, which contains a plurality of hybridizing oligonucleotides, each hybridizing oligonucleotide including at least fifteen contiguous nucleotides of a polynucleotide as set forth in SEQ ID NOS:1 to 331, or a polynucleotide complementary thereto, the plurality including at least five such oligonucleotides, each based on a different polynucleotide as set forth in SEQ ID NOS:1 to 331.
- the kit can contain hybridizing oligonucleotides that include at least fifteen contiguous nucleotides of at least five polynucleotides as set forth in SEQ ID NOS:1 to 71, or of polynucleotides complementary to any of SEQ ID NOS:1 to 71.
- the hybridizing polynucleotides of a kit of the invention can include probes, which are useful for detecting a particular AIM, including a particular nucleotide occurrence at the SNP position or DIP (deletion/insertion polymorphism) position of the AIM; can include primers, including primers useful for a primer extension reaction and primer pairs useful for a nucleic acid amplification reaction; or can include combinations of such probes and primers.
- a hybridizing oligonucleotide of the plurality includes a nucleotide corresponding to the SNP position of the AIM (e.g., nucleotide 50 of any of SEQ ID NOS:1 to 34 and most others, nucleotide 56 of SEQ ID NO:35, nucleotide 44 of SEQ ID NO:50, or nucleotide 26 of SEQ ID NO:56), or to a nucleotide sequence complementary thereto, such a hybridizing oligonucleotide being useful as a probe to identify the presence or absence of a particular nucleotide occurrence at the SNP position of the AIM.
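A probe of this kind can be sketched in Python as a window of at least fifteen nucleotides that carries a chosen allele at the SNP position. This is a minimal sketch under stated assumptions: the probe is written in the same sense as the given sequence and roughly centred on the SNP, positions are 1-based as in the SEQ ID NO examples above, and the function names and the toy 100-nt sequence are hypothetical rather than taken from any SEQ ID NO. The check for "N" reflects the masking of repetitive sequence described elsewhere in this document.

```python
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def allele_probe(sequence: str, snp_pos: int, allele: str, length: int = 15) -> str:
    """Build a probe of `length` nt roughly centred on the 1-based SNP position,
    carrying `allele` at that position."""
    if length < 15:
        raise ValueError("probe must contain at least fifteen contiguous nucleotides")
    i = snp_pos - 1                      # convert to a 0-based index
    start = max(0, i - length // 2)
    end = start + length
    if end > len(sequence):
        raise ValueError("sequence too short for a probe of this length")
    probe = sequence[start:i] + allele + sequence[i + 1:end]
    if "N" in probe.upper():
        raise ValueError("probe overlaps masked (N) sequence")
    return probe


target = "A" * 49 + "G" + "C" * 50      # toy 100-nt sequence, SNP at nucleotide 50
probe = allele_probe(target, 50, "T")
print(probe)                            # AAAAAAATCCCCCCC
print(reverse_complement(probe))        # GGGGGGGATTTTTTT
```

Either the probe or its reverse complement can serve, matching the "or a nucleotide sequence complementary thereto" wording.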
- the kit contains at least one pair of hybridizing oligonucleotides useful for detecting the nucleotide occurrence(s) at the SNP (or DIP) position of an AIM.
- a pair of hybridizing oligonucleotides includes one oligonucleotide that hybridizes upstream of and adjacent to the SNP (or DIP) position of an AIM and a second oligonucleotide that hybridizes downstream of and adjacent to the SNP (or DIP) position of the AIM, wherein one or the other of the pair further contains a nucleotide complementary to a nucleotide occurrence suspected of being at the SNP (or DIP) position of the AIM (i.e., one of the polymorphic nucleotides), such a pair of hybridizing oligonucleotides being useful in an oligonucleotide ligation assay.
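The ligation-assay pair can be sketched in Python as follows. This is a simplified model under stated assumptions: both oligonucleotides are written in the same sense as the given sequence (so they anneal to its complementary strand), the allele-specific base is placed at the 3' end of the upstream oligonucleotide, which is one common layout rather than the only one, and ligation is modeled as a simple base match. The function names, arm length, and toy sequence are hypothetical.

```python
def ola_pair(sequence: str, snp_pos: int, allele: str, arm: int = 20) -> tuple:
    """Return (allele-specific upstream oligo, common downstream oligo) written in the
    same sense as `sequence`. The upstream oligo ends in `allele` at the 1-based SNP
    position; the downstream oligo begins immediately 3' of that position."""
    i = snp_pos - 1
    if i - arm < 0 or i + 1 + arm > len(sequence):
        raise ValueError("not enough flanking sequence for the requested arm length")
    upstream = sequence[i - arm:i] + allele
    downstream = sequence[i + 1:i + 1 + arm]
    return upstream, downstream


def ligates(upstream: str, sample_allele: str) -> bool:
    """Toy ligation rule: the oligos join only when the 3'-terminal base of the
    upstream oligo matches the sample's base at the SNP position."""
    return upstream[-1] == sample_allele


target = "A" * 49 + "G" + "C" * 50     # toy 100-nt sequence, SNP at nucleotide 50
up, down = ola_pair(target, 50, "G")
print(ligates(up, "G"), ligates(up, "T"))   # True False
```

A second upstream oligonucleotide carrying the other allele, differentially labeled, would let both alleles be tested against the same common downstream oligonucleotide.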
- a pair of hybridizing oligonucleotides includes an amplification primer pair, including a forward primer and a reverse primer, such a pair of hybridizing oligonucleotides being useful for amplifying a portion of polynucleotide that includes the SNP (or DIP) position of the AIM.
- a kit of the invention can further contain additional reagents useful for practicing a method of the invention.
- the kit can contain one or more polynucleotides comprising an AIM, including, for example, a polynucleotide containing an AIM that a hybridizing oligonucleotide or pair of hybridizing oligonucleotides of the kit is designed to detect, such polynucleotide(s) being useful as controls.
- hybridizing oligonucleotides of the kit can be detectably labeled, or the kit can contain reagents useful for detectably labeling one or more of the hybridizing oligonucleotides of the kit, including different detectable labels that can be used to differentially label the hybridizing oligonucleotides; such a kit can further include reagents for linking the label to hybridizing oligonucleotides, or for detecting the labeled oligonucleotide, or the like.
- a kit of the invention also can contain, for example, a polymerase, particularly where hybridizing oligonucleotides of the kit include primers or amplification primer pairs; or a ligase, where the kit contains hybridizing oligonucleotides useful for an oligonucleotide ligation assay.
- the kit can contain appropriate buffers, deoxyribonucleotide triphosphates, etc., depending, for example, on the particular hybridizing oligonucleotides contained in the kit and the purpose for which the kit is being provided.
- the present invention relates to methods for performing multiplex reactions to analyze the nucleotide occurrences of single nucleotide polymorphisms (SNPs) as exemplified by ancestry informative markers (AIMs). Accordingly, the invention also relates to kits useful for performing such multiplex reactions.
- the compositions and methods of the invention are exemplified by twelve panels of polymerase chain reaction (PCR) primer pairs and a corresponding twelve panels of single base extension (SBE) primers useful for determining the nucleotide occurrences of 174 AIMs in twelve reactions.
- SNPs useful as AIMs including the SNPs set forth in SEQ ID NOS:371 to 398, 400 to 408, 410 to 413, 415, 418, 420, 422, 423, 425, 431 to 433, 438 to 441, 443, 450 to 452, 455, 456, 461 to 463, 467 to 475, 477 to 485, 487, 495 to 498, 502 to 504, 506, 508 to 512, 514, 516, 519 to 521, 526, 529, and 533 to 537, and to a method of inferring, with a predetermined level of confidence, a trait of an individual by detecting a nucleotide occurrence of a single nucleotide polymorphism as set forth in any of SEQ ID NOS:371 to 398, 400 to 408, 410 to 413, 415, 418, 420, 422, 423, 425, 431 to 433, 438 to 441, 443, 450 to 45
- a panel of 174 AIMs useful for estimating proportional ancestry of an individual is set forth as SEQ ID NOS:364 to 537 (Appendix 3), and primers that selectively hybridize immediately 5' to the SNP position of SEQ ID NOS:364 to 537, or a complement thereof, are exemplified, respectively, in Appendix 1.
- the exemplified primers are useful, for example, as substrates in SBE reactions that determine the nucleotide occurrence at the SNP position (see Example 1).
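The primer design and the single-base readout can be sketched in Python. This is a minimal model under stated assumptions: the primer is identical to the sense-strand nucleotides immediately 5' of the SNP, so it anneals to the antisense strand. The dideoxy terminator that the polymerase adds is complementary to the antisense base, which makes it equal to the sense-strand base at the SNP. String search stands in for hybridization. The function names and the toy sequence are hypothetical.

```python
def sbe_primer(sequence: str, snp_pos: int, length: int = 20) -> str:
    """Primer identical to the `length` nt immediately 5' of the 1-based SNP position."""
    i = snp_pos - 1
    if i < length:
        raise ValueError("not enough 5' flanking sequence")
    primer = sequence[i - length:i]
    if "N" in primer.upper():
        raise ValueError("primer overlaps masked (N) sequence")
    return primer


def single_base_extension(primer: str, sample_sense: str):
    """Toy SBE: find where the primer matches the sample's sense strand and return the
    next base, i.e. the single base a dideoxy terminator would add. None if no site."""
    j = sample_sense.find(primer)
    if j < 0 or j + len(primer) >= len(sample_sense):
        return None
    return sample_sense[j + len(primer)]


flank = "GATTACACCGTAGCTTAGCA"              # toy 20-nt 5' flank
reference = flank + "G" + "TTCAGGCATCCA"    # SNP at nucleotide 21
primer = sbe_primer(reference, 21)
print(single_base_extension(primer, flank + "T" + "TTCAGGCATCCA"))  # T
```

Because the terminator stops the reaction after one base, each SBE product is exactly one nucleotide longer than its primer, and the identity of that one base reports the allele.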
- the primers are organized into Panels (e.g., Panel 3, Panel 41, etc.) that can be used in a multiplex reaction to sample pluralities of AIMs in a single reaction (e.g., Panel 3 allows the sampling of 7 AIMs; Panel 41 allows the sampling of 10 AIMs, etc.).
- Primer pairs useful for amplifying gene sequences containing the 174 AIMs (SEQ ID NOS:364 to 537) are exemplified in Appendix 2 and, in correspondence with Appendix 1, are organized into Panels that can be used in multiplex reactions (see Example 1; see, also, Appendix 4, showing results of multiplex reactions).
- Corresponding primers can be identified, for example, by the "SNP name", which is shown in the second column in each of Appendices 1 and 2, and just above the SEQ ID NO: for each AIM in Appendix 3.
- additional primer pairs useful, for example, for amplifying a gene sequence including a SNP position of an AIM as set forth in SEQ ID NOS:364 to 537 can be designed based on the exemplified sequences (Appendix 3), or on further gene sequences upstream or downstream of those shown in Appendix 3, which can be identified from databases using the disclosed sequences to perform a search.
- an "N" or "n" is shown in some of the AIMs of SEQ ID NOS:364 to 537 to indicate repetitive sequences that are not particularly useful for performing such searches or as targets for primers or primer pairs.
- the present invention relates to a kit that contains one or more pluralities of primers useful for performing a multiplex reaction to analyze (e.g., by SBE) the nucleotide occurrences of SNPs in a sample.
- the pluralities of primers are provided in the kit as a mixture.
- a kit of the invention contains one or more of a plurality of 7 primers as set forth in Panel 3 of Appendix 1; a plurality of 10 primers as set forth in Panel 41 of Appendix 1; a plurality of 7 primers as set forth in Panel 42 of Appendix 1; a plurality of 9 primers as set forth in Panel 43 of Appendix 1; a plurality of 24 primers as set forth in Panel 4452 of Appendix 1; a plurality of 21 primers as set forth in Panel 4553 of Appendix 1; a plurality of 20 primers as set forth in Panel 4654 of Appendix 1; a plurality of 22 primers as set forth in Panel 4755 of Appendix 1; a plurality of 12 primers as set forth in Panel 48 of Appendix 1; a plurality of 18 primers as set forth in Panel 4957 of Appendix 1; a plurality of 15 primers as set forth in Panel 5051 of Appendix 1; or a plurality of 9 primers as set forth
- the kit, which contains at least a first plurality of primers, can include one plurality of primers, two separate (discrete) pluralities of primers, three separate pluralities of primers, and so on (i.e., 4, 5, 6, 7, 8, 9, 10, 11) up to all twelve pluralities of primers set forth as Panels 3, 41, 42, 43, 4452, 4553, 4654, 4755, 48, 4957, 5051, and 56A in Appendix 1.
- a kit of the invention can further contain one or more reagents for performing a primer extension reaction using a primer of the plurality as a substrate (e.g., an SBE reaction).
- the reagent(s) can include, for example, a polymerase (e.g., a DNA-dependent DNA polymerase); one or more deoxyribonucleotide triphosphate(s) and/or deoxyribonucleotide triphosphate analog(s), which can be incorporated into a primer by a polymerase; and/or one or more polynucleotide(s) containing an AIM that a primer of the plurality is designed to detect, such polynucleotide(s) being useful as controls.
- the kit can contain reagents useful for detectably labeling one or more of the primers of the plurality, including different detectable labels that can be used to differentially label the primers.
- the kit contains one or more dideoxyribonucleotide triphosphate(s), including dideoxyadenosine, dideoxycytidine, dideoxyguanosine, and/or dideoxythymidine.
- the kit contains all four dideoxyribonucleotide triphosphates, wherein each of the dideoxyribonucleotide triphosphates is detectably labeled with a different label, or wherein the kit contains each of the detectable labels, which can be bound to the individual dideoxyribonucleotide triphosphates.
- a kit of the invention contains at least a first plurality of primer pairs, which can be used, for example, to generate amplification products that include SNPs of interest.
- the kit can further contain one or more of a plurality of 7 primer pairs (14 primers) as set forth in Panel 3 of Appendix 2; a plurality of 10 primer pairs (20 primers) as set forth in Panel 41 of Appendix 2; a plurality of 7 primer pairs (14 primers) as set forth in Panel 42 of Appendix 2; a plurality of 9 primer pairs (18 primers) as set forth in Panel 43 of Appendix 2; a plurality of 24 primer pairs (48 primers) as set forth in Panel 4452 of Appendix 2; a plurality of 21 primer pairs (42 primers) as set forth in Panel 4553 of Appendix 2; a plurality of 20 primer pairs (40 primers) as set forth in Panel 4654 of Appendix 2; a plurality of 22 primer pairs (44 primers) as set forth in Panel 4755 of Appendix 2; a plurality of 12 primer pairs (24 primers) as set forth in Panel 48 of Appendix 2; a plurality of 18 primer pairs (36 primers) as
- the kit contains the first plurality of primer pairs; the first and a second plurality of primer pairs; the first, a second, and a third plurality of primer pairs; and so on up to all twelve pluralities of primer pairs as set forth in Appendix 2.
- where a kit of the invention contains one or more pluralities of primer pairs, it can further contain reagents useful, for example, for generating amplification products using the primer pairs, particularly in a multiplex reaction (e.g., a temperature-stable polymerase and/or one or more deoxyribonucleotide triphosphate(s) or analogs thereof, which can, but need not, be labeled).
- a kit of the invention can contain at least one plurality of primers as set forth in Appendix 1 and at least one plurality of primer pairs as set forth in Appendix 2.
- the kit contains one or more pluralities of primer pairs that correspond to one or more pluralities of primers contained in the kit (e.g., the plurality of primers in Panel 3 of Appendix 1 and the plurality of primer pairs in Panel 3 of Appendix 2, such primer and primer pairs being useful to sample the AIMs set forth in SEQ ID NOS:364 to 370 of Appendix 3).
- Such a kit can further contain one or more reagents useful for performing a primer extension reaction (e.g., an SBE reaction) and/or an amplification reaction using primers of the plurality or primer pairs of the plurality, respectively, as a substrate.
- the present invention also relates to a method of inferring, with a predetermined level of confidence, a trait of an individual.
- a method can be performed, for example, by contacting at least a first sample (i.e., 1, 2, 3, 4, or more sample(s)) that contain test nucleic acid molecules of the individual with at least a first plurality of primers as set forth in the panels of Appendix 1, under conditions suitable for SBE of the primers; and detecting at least one SBE product of a primer of the first (or other) plurality.
- such SBE products are informative of the nucleotide occurrences of SNPs of AIMs that are indicative of a population structure correlated with a trait and, therefore, allow an inference, with a predetermined level of confidence, of a trait of the individual.
- the method comprises examining 2 or more samples, which are different, using the same plurality of primers or different pluralities of primers.
- the method comprises examining 2 or more same samples (e.g., aliquots of a single sample) using the same plurality of primers, thus providing a means to obtain statistically improved results.
- the method comprises examining 2 or more same samples (e.g., aliquots of a single sample) using different pluralities of primers.
- the method comprises examining 12 same samples (i.e., of a single individual), wherein each of the 12 pluralities of primers as set forth in Appendix 1 is included in a different one of the samples, thus allowing a determination to be made of the 174 AIMs set forth as SEQ ID NOs:364-537 in Appendix 3.
- a method of the invention can be performed by generating SBE products that contain a detectable label, particularly a detectable label that is indicative of the single base (nucleotide) that has been extended onto (ligated to) a primer of a plurality.
- Detection of the SBE product(s) can be performed using any method that can distinguish the SBE products of a multiplex reaction, including, for example, an electrophoresis, mass spectrometry, or gel chromatography method. In one aspect, an electrophoresis method is utilized. In another aspect, capillary gel electrophoresis is utilized to detect SBE product(s) generated according to a method of the invention.
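How a multiplex electrophoresis readout might be decoded can be sketched in Python: each SBE product is identified by its size (primer length plus one) and its color (the label on the incorporated terminator). This is a simplified sketch under stated assumptions: every primer in a panel has a distinct length, dye-dependent mobility shifts are ignored, and the dye names, dye-to-base assignment, tolerance, and SNP names are hypothetical rather than those of any particular kit.

```python
from collections import defaultdict

# Hypothetical dye-to-terminator assignment; an actual kit defines its own.
DYE_TO_BASE = {"green": "A", "black": "C", "blue": "G", "red": "T"}


def call_genotypes(peaks, primer_lengths, tolerance=0.5):
    """Assign each (apparent size in nt, dye) peak to the SNP whose extended primer
    (primer length + 1) is within `tolerance` nt, and record the base for its dye."""
    bases = defaultdict(set)
    for size, dye in peaks:
        for snp, length in primer_lengths.items():
            if abs(size - (length + 1)) <= tolerance:
                bases[snp].add(DYE_TO_BASE[dye])
                break                    # a peak is assigned to at most one SNP
    return {snp: "".join(sorted(b)) for snp, b in bases.items()}


primer_lengths = {"snp_A": 20, "snp_B": 30}    # hypothetical two-primer multiplex
peaks = [(21.1, "blue"), (20.9, "red"), (31.0, "green")]
print(call_genotypes(peaks, primer_lengths))    # {'snp_A': 'GT', 'snp_B': 'A'}
```

A heterozygote shows two colors at one product size (snp_A above), while a homozygote shows a single color (snp_B).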
- a method of the invention can utilize a genomic DNA sample of an individual, or can use a product generated from genomic DNA.
- the test nucleic acid sample is a PCR product.
- multiplex PCR is performed using at least a first plurality of primer pairs as set forth in a Panel of Appendix 2, wherein the plurality of primer pairs is from a Panel corresponding to a Panel containing a plurality of primers that will be used for the SBE multiplex reaction.
- a trait for which an inference can be drawn according to the present methods can be any trait for which the exemplified AIMs (SEQ ID NOS:364 to 537) or other AIMs and/or SNPs that are examined are predictive.
- the trait can be biogeographical ancestry (BGA), including, for example, proportional ancestry such as a proportion of a sub-Saharan African, Native American, IndoEuropean, or East Asian ancestral group, or a combination thereof, of which the individual providing the test nucleic acid sample is a member.
- the trait for which an inference can be drawn also can be, for example, responsiveness of the individual to a drug such as a cancer chemotherapeutic agent or a statin; susceptibility to a disease, for example, a disease having an ethnic predisposition; or a pigmentation trait (e.g., eye color or natural hair color).
- the method allows an inference to be drawn as to the proportional ancestry of at least two ancestral groups of the individual.
- the proportional ancestry estimated according to a method of the invention can be a proportion of any ancestral group, including, for example, a proportion of sub-Saharan African, Native American, IndoEuropean, East Asian, Middle Eastern, or Pacific Islander ancestral group, and generally is a combination of two or more of such ancestral groups.
- the proportional ancestry of an individual being examined can include proportions of sub-Saharan African and IndoEuropean ancestral groups (e.g., 80% sub-Saharan African and 20% IndoEuropean; or 60% sub-Saharan African, 20% IndoEuropean, and 20% of a third ancestral group); or can include proportions of Native American and IndoEuropean ancestral groups; East Asian and Native American ancestral groups; IndoEuropean and East Asian ancestral groups; and the like.
- the proportional ancestry can include proportions of Native American, East Asian, and IndoEuropean ancestral groups; sub-Saharan African, Native American, and IndoEuropean ancestral groups; sub-Saharan African, Native American, and East Asian ancestral groups; and the like.
- the method can further include generating a graphical representation of the comparison of the ancestral groups. For example, where three ancestral groups are determined, the graphical representation comprises a triangle with each ancestral group independently represented by a vertex of the triangle, and wherein the maximum likelihood value of proportional affiliation for an individual comprises a point within the triangle.
- the graphical representation can further include a confidence contour indicating a level of confidence associated with estimating the proportional ancestry.
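Placing an estimate inside the triangle amounts to a barycentric mapping: the plotted point is the proportion-weighted average of the three vertex coordinates. The Python sketch below assumes an equilateral triangle with a hypothetical vertex placement (AFR and EUR along the base, NAM at the apex) and uses the 15% Native American, 60% IndoEuropean, 25% African example discussed for Figure 2B. Reading a proportion back off the plot is then the distance from the opposite leg divided by the triangle height.

```python
import math

HEIGHT = math.sqrt(3) / 2
# Hypothetical vertex placement for a three-group triangle plot.
VERTICES = {"AFR": (0.0, 0.0), "EUR": (1.0, 0.0), "NAM": (0.5, HEIGHT)}


def to_triangle(proportions: dict) -> tuple:
    """Map ancestry proportions (summing to 1) to an (x, y) point inside the triangle
    as the proportion-weighted average of the vertex coordinates."""
    if not math.isclose(sum(proportions.values()), 1.0, abs_tol=1e-9):
        raise ValueError("proportions must sum to 1")
    x = sum(proportions.get(g, 0.0) * vx for g, (vx, _) in VERTICES.items())
    y = sum(proportions.get(g, 0.0) * vy for g, (_, vy) in VERTICES.items())
    return x, y


def nam_fraction(point: tuple) -> float:
    """Distance from the AFR-EUR leg (0% NAM) divided by the triangle height."""
    return point[1] / HEIGHT


point = to_triangle({"NAM": 0.15, "EUR": 0.60, "AFR": 0.25})
print(round(nam_fraction(point), 2))   # 0.15
```

A pure-ancestry individual maps exactly onto a vertex. A point on a leg has zero ancestry from the group at the opposite vertex.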
- a method of inferring, with a predetermined level of confidence, a trait of an individual is performed by contacting each of 12 aliquots of a nucleic acid sample of a test individual with a different one of the 12 pluralities of primer pairs as set forth in the Panels of Appendix 2, under conditions suitable for PCR of target nucleotide sequences in the nucleic acid sample, thereby generating 12 PCR amplification samples; depleting each of the 12 PCR amplification samples of single-stranded primers of the plurality of primer pairs and deoxyribonucleotide triphosphates; thereafter contacting each of the 12 PCR amplification samples with the corresponding plurality of primers as set forth in the Panels of Appendix 1, and with dideoxyadenosine, dideoxycytidine, dideoxyguanosine, and dideoxythymidine, each of which comprises a different detectable label, under conditions suitable for SBE of the primers, thereby generating 12 SBE
- Such a method can further include identifying a population structure that correlates with, or is most likely given, the nucleotide occurrences of the AIMs.
- Such a method can be performed by comparing the nucleotide occurrences of the AIMs of the test individual with known proportional ancestries corresponding to nucleotide occurrences of AIMs indicative of BGA.
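One way to find the population structure that is "most likely given" a set of AIM genotypes is a maximum likelihood search over ancestry proportions. The Python sketch below uses a common binomial admixture model, offered here as an assumption: the individual's allele frequency at each AIM is the proportion-weighted mix of the parental-group frequencies, and genotypes follow Hardy-Weinberg proportions. It is not necessarily the algorithm of Example 6, which is not reproduced in this section. The marker data, group order, and grid resolution are hypothetical.

```python
import math


def log_likelihood(q, markers):
    """Log-likelihood of ancestry proportions q given (genotype, parental_freqs) markers.
    genotype = copies (0, 1, 2) of the counted allele; parental_freqs = that allele's
    frequency in each parental group, in the same order as q."""
    ll = 0.0
    for g, freqs in markers:
        f = sum(qk * pk for qk, pk in zip(q, freqs))   # admixed allele frequency
        f = min(max(f, 1e-6), 1 - 1e-6)                 # avoid log(0)
        ll += math.log(math.comb(2, g)) + g * math.log(f) + (2 - g) * math.log(1 - f)
    return ll


def mle_grid(markers, steps=100):
    """Grid search over three-group proportions (q1, q2, q3) in increments of 1/steps."""
    best_q, best_ll = None, -math.inf
    for i in range(steps + 1):
        for j in range(steps + 1 - i):
            q = (i / steps, j / steps, (steps - i - j) / steps)
            ll = log_likelihood(q, markers)
            if ll > best_ll:
                best_q, best_ll = q, ll
    return best_q, best_ll


# Two toy AIMs, each fixed in one parental group; the individual is heterozygous at both.
markers = [(1, (1.0, 0.0, 0.0)), (1, (0.0, 1.0, 0.0))]
print(mle_grid(markers)[0])   # (0.5, 0.5, 0.0)
```

The log-likelihood values computed along the way are also what a confidence contour around the MLE would be drawn from.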
- the known proportional ancestries corresponding to nucleotide occurrences of AIMs indicative of BGA can be contained in a table or other list, against which the nucleotide occurrences of the test individual can be compared visually, or can be contained in a database, in which case the comparison can be made electronically, for example, using a computer.
- each of the known proportional ancestries corresponding to nucleotide occurrences of AIMs indicative of BGA can be associated with a photograph of a person from whom the known proportional ancestry was determined, thus providing a means to further infer physical characteristics of a test individual.
- the photograph is a digital photograph, which comprises digital information that can be contained in a database that can further contain a plurality of such digital information of digital photographs, each of which is associated with a known proportional ancestry corresponding to nucleotide occurrences of AIMs indicative of BGA of the person in the photograph.
- a method of the invention can further include identifying a photograph of a person having a proportional ancestry corresponding to the proportional ancestry of the test individual. Such identifying can be done by manually looking through one or more files of photographs, wherein the photographs are organized, for example, according to the nucleotide occurrences of AIMs of the person in the photograph.
- Identifying the photograph also can be performed by scanning a database comprising a plurality of files, each file containing digital information corresponding to a digital photograph of a person having a known proportional ancestry, and identifying at least one photograph of a person having nucleotide occurrences of AIMs indicative of BGA that correspond to the nucleotide occurrences of AIMs indicative of BGA of the test individual.
- a database can be contained in any medium suitable for containing a database of digital information, including, for example, computer hardware or software, a magnetic tape, or a computer disc such as floppy disc, CD, or DVD.
- the database can be accessed through a computer, which can contain the database therein, can accept a medium containing the database, or can access the database through a wired or wireless network, e.g., an intranet or internet.
- the present invention also relates to a method of inferring, with a predetermined level of confidence, a trait of an individual.
- a method can be performed, for example, by detecting the nucleotide occurrence of at least one SNP as set forth in any of SEQ ID NOS: 371 to 398, 400 to 408, 410 to 413, 415, 418, 420, 422, 423, 425, 431 to 433, 438 to 441, 443, 450 to 452, 455, 456, 461 to 463, 467 to 475, 477 to 485, 487, 495 to 498, 502 to 504, 506, 508 to 512, 514, 516, 519 to 521, 526, 529, and 533 to 537.
- the invention also provides primers for detecting an AIM and pluralities of primers.
- a plurality of primer pairs for detecting an AIM is provided that contains at least 2 different primer pairs, each including a first primer having 15-50 contiguous nucleotides of a sequence set forth in SEQ ID NOs:364-537 and a second primer having 15-50 contiguous nucleotides of the complement of a sequence set forth in SEQ ID NOs:364-537.
- These primer pairs can be used to amplify a polynucleotide comprising a nucleotide occurrence of a single nucleotide polymorphism as set forth in any of SEQ ID NOs: 364-537.
- the plurality of primer pairs can be sequences from a single panel such as panel 3: SEQ ID NOs:712-725; panel 41: SEQ ID NOs:726-745; panel 42: SEQ ID NOs:746-759; panel 43: SEQ ID NOs:760-777; panel 4452: SEQ ID NOs:778-825; panel 4553: SEQ ID NOs:826-867; panel 4654: SEQ ID NOs:868-907; panel 4755: SEQ ID NOs:908-951; panel 48: SEQ ID NOs:952-975; panel 4957: SEQ ID NOs:976-1011; panel 5051: SEQ ID NOs:1012-1041; and panel 56A: SEQ ID NOs:1042-1059.
- the plurality of primer pairs contains all the primers of a panel, such as 7 primer pairs of panel 3; 10 primer pairs of panel 41; 7 primer pairs of panel 42; 9 primer pairs of panel 43; 24 primer pairs of panel 4452; 21 primer pairs of panel 4553; 20 primer pairs of panel 4654; 22 primer pairs of panel 4755; 12 primer pairs of panel 48; 18 primer pairs of panel 4957; 15 primer pairs of panel 5051; or 9 primer pairs of panel 56A.
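The panel sizes and SEQ ID NO ranges can be cross-checked with a short Python sketch, assuming the primer sequences are numbered consecutively, two per primer pair, in the panel order listed above. That numbering layout is an assumption for illustration. Under it, the twelve panels occupy SEQ ID NOs:712-1059 without gaps, with panel 43 ending at 777 and panel 4452 starting at 778, and the pair counts total 174, one per AIM of SEQ ID NOs:364-537.

```python
# Panel -> number of primer pairs, as listed for the twelve panels.
PANEL_PAIRS = {
    "3": 7, "41": 10, "42": 7, "43": 9, "4452": 24, "4553": 21,
    "4654": 20, "4755": 22, "48": 12, "4957": 18, "5051": 15, "56A": 9,
}


def seq_id_ranges(panel_pairs: dict, first_id: int = 712) -> dict:
    """Assign consecutive SEQ ID NOs, two per primer pair, panel by panel (assumed layout)."""
    ranges, next_id = {}, first_id
    for panel, pairs in panel_pairs.items():
        n = 2 * pairs
        ranges[panel] = (next_id, next_id + n - 1)
        next_id += n
    return ranges


ranges = seq_id_ranges(PANEL_PAIRS)
print(sum(PANEL_PAIRS.values()))   # 174
print(ranges["4452"])              # (778, 825)
print(ranges["56A"])               # (1042, 1059)
```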
- the plurality of primer pairs contains sequences from at least 2 panels, and may contain all the primers of the panels disclosed herein.
- the invention also provides pluralities containing at least 2 different primers for detecting an AIM, where each primer contains 15-100 contiguous nucleotides of a sequence set forth in SEQ ID NOs:364-537, or 15-100 contiguous nucleotides of the complement of a sequence set forth in SEQ ID NOs:364-537, and the primers can be used to detect a nucleotide occurrence of a single nucleotide polymorphism as set forth in any of SEQ ID NOs:364-537.
- the plurality of primers can be from a single panel or can be all the primers of a single panel. In other embodiments of the invention, the plurality of primers can be from different panels and can be all the primers of all the panels disclosed herein.
- kits containing the plurality of primers and/or primer pairs of the invention can contain additional reagents for performing DNA amplification or primer extension reactions, such as a polymerase, deoxyribonucleotide triphosphates, deoxyribonucleotide triphosphate analogs, dideoxyadenosine, dideoxycytidine, dideoxyguanosine, and dideoxythymidine.
- the deoxyribonucleotide triphosphates or deoxyribonucleotide triphosphate analogs contain a detectable label.
- Figure 1 provides a diagram indicating the fashion in which chromosomal segments are shuffled by recombination over time in an admixed population.
- the parental populations have chromosomal segments that are continuous with respect to AIMs along the segment.
- in the first filial (F1) generation, all persons have one complete chromosomal segment from each parental population. In the F2 generation, many more combinations are possible.
- the relative likelihood of the non-recombinant vs. the recombinant genotypes shown in F2 is dependent on the size of the chromosomal segment.
- F3 shows an example of a likely genotype for a person with two parents from the F2 generation.
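The shuffling shown in Figure 1 can be simulated with a small Python sketch. It is a simplified model under stated assumptions: a chromosome is a list of per-marker ancestry labels, a gamete switches homologs between adjacent markers with a fixed probability (a stand-in for segment length, since longer segments are more likely to carry a crossover), and crossover interference is ignored. The function name and parameters are hypothetical.

```python
import random


def make_gamete(chrom_a: list, chrom_b: list, crossover_prob: float, rng: random.Random) -> list:
    """Walk along two homologous chromosomes (lists of per-marker ancestry labels),
    switching to the other homolog between adjacent markers with probability
    `crossover_prob`, and return the resulting recombinant chromosome."""
    current = rng.randrange(2)           # which homolog the gamete starts on
    homologs = (chrom_a, chrom_b)
    gamete = []
    for i in range(len(chrom_a)):
        if i > 0 and rng.random() < crossover_prob:
            current = 1 - current
        gamete.append(homologs[current][i])
    return gamete


rng = random.Random(1)
n_markers = 10
f1 = (["P1"] * n_markers, ["P2"] * n_markers)   # one intact segment from each parental population
f2 = (make_gamete(*f1, 0.1, rng), make_gamete(*f1, 0.1, rng))  # F2 child of two F1 parents
print(f2)
```

With `crossover_prob` at 0, every gamete is one intact parental segment, as in the non-recombinant F2 genotypes. Raising it breaks segments into smaller blocks, as in the later generations.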
- Figures 2A and 2B show triangle graphs generated using the algorithm described in Example 6 (see, also, Table 12); NAM, Native American; AFR, sub-Saharan African; EUR, IndoEuropean.
- Figure 2A illustrates extension of a line from the NAM vertex to the opposite leg of the triangle, wherein the opposite leg represents 0% Native American ancestry.
- a circle is shown at the position of the estimated proportional ancestry (see Figure 2B), with the hatch mark on the line indicating the percent of Native American ancestry (approximately 15%).
- Figure 2B shows additional lines drawn from the AFR and EUR vertices.
- the position on each line corresponding to the position of the circle represents the proportion of each respective ancestry; i.e., 15% Native American, 60% IndoEuropean, and 25% African.
- Figure 3 shows a triangle plot depicting one approach to illustrate the value and precision of individual ancestry estimates. Typical distributions of three populations are shown (European Americans: filled squares; African-Americans: open triangles; and an African/Native American population: open circles). Also shown is a single individual with likelihood intervals represented as concentric rings surrounding the point estimate (filled circle). Like a topographic map, each concentric ring represents a decrease in the likelihood by 1 log unit (10 times less likely). In this example, the individual has a likelihood interval space that is symmetrical and circular. Interval spaces will take many shapes depending on the admixture proportions of the subject in question and the allele frequencies of the markers that have been typed.
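The contour rings can be expressed numerically. For any point in the triangle, the ratio of the likelihood at the maximum likelihood estimate to the likelihood at that point says how many times less likely the point is. With rings spaced at 1 log10 unit, the number of rings crossed is the floor of that ratio's base-10 logarithm. The Python sketch below assumes natural-log likelihood values as input; the function names are hypothetical. The same ratio applies to contours drawn at other factors, such as 2-fold or 5-fold.

```python
import math


def times_less_likely(ll_mle: float, ll_point: float) -> float:
    """Likelihood ratio L(MLE) / L(point), from natural-log likelihoods."""
    return math.exp(ll_mle - ll_point)


def ring_index(ll_mle: float, ll_point: float) -> int:
    """Number of 1-log10-unit contours crossed between the MLE and the point
    (0 means the point lies inside the innermost ring)."""
    return math.floor((ll_mle - ll_point) / math.log(10))


print(ring_index(0.0, -2.5 * math.log(10)))   # 2
```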
- Figure 4 provides a triangular plot showing average admixture estimates for three African-American samples (filled circles: WASH, Washington DC; AFCAR, Afro-Caribbeans; and BOG, Bogalusa), a European-American sample (open circle: SCO, State College), and a Spanish-American sample (open diamond: SLV, San Luis Valley CO). Shown in parentheses is the average African (AFR), IndoEuropean (EUR), and Native American (NAM) genetic contribution to each sample.
- Figures 5A and 5B show the genetic structure in US resident populations.
- Figure 5A shows the percentage of unlinked AIMs showing significant association. Expected values are based on a 5% significance level. Values for the Washington DC sample are based on 33 AIMs, for San Luis Valley CO on 19 AIMs, and for State College PA on 34 AIMs.
- Figure 5B shows the correlation between individual ancestry estimates based on independent subsets of informative markers. Average correlation is based on 100 replicates. The total number of markers is the same as for Figure 5A. The corresponding p values are indicated at the bottom of the graph.
- Figures 6A and 6B show the triangle plots for a father (Figure 6A) and mother (Figure 6B).
- Figures 7A to 7C show the triangle plots for each of three children of the father and mother represented in Figure 6.
- Figure 8 shows the distribution of AIMs in the genome (chrom. number, chromosome number).
- Figures 9A and 9B demonstrate the robustness of BGA admixture proportion analysis using AIMs (see Example 2).
- In these plots, the confidence (contour lines) of the maximum likelihood estimate (MLE; point) is predictably affected by the elimination of AIMs informative for a particular pairwise comparison.
- The first contour line extending from the MLE bounds the region of the triangle plot within which the likelihood is no more than 2 times lower than that of the MLE, and the second contour line bounds the region within which the likelihood is no more than 5 times lower than that of the MLE.
- Figure 9A shows the MLE and confidence contours obtained using 71 AIMs; actual percentages are indicated.
- Figure 9B shows the results obtained after removing from the analysis those AIMs, among the AIMs used for Figure 9A, that are informative for the East Asian/Native American distinction.
- In Figure 9B, the MLE is relatively unaffected, and the confidence contours along the East Asian/IndoEuropean (European) and Native American/European axes remain undistorted, but the confidence contours are distorted along the East Asian/Native American axis.
- Figure 10 shows the BGA admixture proportions determined for each of eight individuals of a family pedigree. Circles represent females and squares represent males; the BGA affiliation for each individual is shown as a fraction in which the numerator represents IndoEuropean BGA and the denominator represents Native American BGA. None of the individuals harbored sub-Saharan African or East Asian BGA, except the individual marked with an asterisk (*), who was determined to be of 4% East Asian BGA.
- Figure 11 shows a family tree demonstrating how a Chinese great grandparent in an otherwise IndoEuropean family tree can produce a grandchild with IndoEuropean/East Asian ancestry.
- In this pedigree, the individuals who are 100% East Asian (Chinese) are shown with shading; the admixture results for the male (square) at the bottom of the pedigree (short arrow) are of interest.
- The grandparent indicated by the long arrow is about a 50%/50% East Asian/IndoEuropean mix, and her daughter, the subject's mother, is expected to be a 25%/75% East Asian/IndoEuropean mix (see Example 3).
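- The 25%/75% expectation follows from each parent transmitting, on average, half of its genome, so a child's expected BGA proportions are the mean of its parents' proportions. The sketch below assumes that the daughter's other parent is 100% IndoEuropean (consistent with an otherwise IndoEuropean family tree); the function name and population labels are illustrative only, and an individual's actual proportions can deviate from the expectation because of random segregation and recombination.

```python
from typing import Dict


def expected_child_ancestry(parent_a: Dict[str, float],
                            parent_b: Dict[str, float]) -> Dict[str, float]:
    """Expected BGA proportions of a child: the average of the two parents.

    A group missing from one parent is treated as 0% in that parent.
    """
    groups = sorted(set(parent_a) | set(parent_b))
    return {g: (parent_a.get(g, 0.0) + parent_b.get(g, 0.0)) / 2 for g in groups}


# Hypothetical values mirroring the Figure 11 description.
grandparent = {"EastAsian": 0.5, "IndoEuropean": 0.5}
spouse = {"IndoEuropean": 1.0}  # assumed 100% IndoEuropean
mother = expected_child_ancestry(grandparent, spouse)
print(mother)  # {'EastAsian': 0.25, 'IndoEuropean': 0.75}
```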
- Figure 12 shows the distribution of all SNPs available for genotyping by chromosomal arm for a group of patients treated for elevated cholesterol levels.
- SNPs with significant delta values (> 0.20) among the various trait classes were selected. For example, Lipitor™ causes a decrease in LDL in about 70% of patients.
- The delta value (δ) is the difference in minor allele frequency between individuals for whom LDL decreased by at least 20% and those for whom LDL did not change.
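- A minimal sketch of the delta calculation and the > 0.20 selection step, assuming diploid genotypes coded as counts (0, 1, or 2) of the same allele in both trait classes and taking the absolute difference. The SNP names, genotypes, and function names are hypothetical.

```python
from typing import Dict, Sequence, Tuple


def allele_frequency(genotypes: Sequence[int]) -> float:
    """Frequency of the counted allele in a group of diploid individuals."""
    return sum(genotypes) / (2 * len(genotypes))


def delta(group_a: Sequence[int], group_b: Sequence[int]) -> float:
    """Absolute allele-frequency difference between two trait classes."""
    return abs(allele_frequency(group_a) - allele_frequency(group_b))


def select_snps(data: Dict[str, Tuple[Sequence[int], Sequence[int]]],
                threshold: float = 0.20) -> Dict[str, float]:
    """Keep SNPs whose delta value exceeds the threshold."""
    selected = {}
    for snp, (group_a, group_b) in data.items():
        d = delta(group_a, group_b)
        if d > threshold:
            selected[snp] = d
    return selected


# Hypothetical genotypes: (LDL decreased >= 20%, LDL unchanged).
data = {
    "snp_1": ([2, 1, 1, 0, 2], [0, 1, 0, 0, 1]),
    "snp_2": ([1, 1, 0, 1, 1], [1, 0, 1, 1, 1]),
}
print({snp: round(d, 2) for snp, d in select_snps(data).items()})  # {'snp_1': 0.4}
```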
- Figure 15 shows the distribution among chromosomes of SNPs (Q > 0.11) for 1,000 individuals of known eye color.
- Figure 16 shows the distribution among chromosomes of SNPs (Q > 0.11) for 1,000 individuals of known hair color.
- Figure 17 is a diagram illustrating a protocol for AIM detection using a kit.
- Figure 18 shows instruments that can be used to carry out methods of the invention.
- Figure 19A shows the output for analysis of Panel 3.
- Figure 19B shows the output for analysis of Panel 41.
- Figure 19C shows the output for analysis of Panel 42.
- Figure 19D shows the output for analysis of Panel 43.
- Figure 19E shows the output for analysis of Panel 4452.
- Figure 19F shows the output for analysis of Panel 4453.
- Figure 19G shows the output for analysis of Panel 4454.
- Figure 19H shows the output for analysis of Panel 48.
- Figure 19I shows the output for analysis of Panel 4957.
- Figure 19J shows the output for analysis of Panel 5051.
- Figure 19K shows the output for analysis of Panel 56a.
- The present invention is based on the identification of ancestry informative markers (AIMs) useful for inferring a level of population structure of an individual, which, in turn, allows an inference as to various traits of the individual. Further, the AIMs of the present invention are demonstrated to correlate with a trait regardless of whether the marker is in linkage disequilibrium with a gene or locus known to be involved in the trait.
- The AIMs of the present invention are distinguishable from previously described markers, which were considered useful only if they were linked with a trait, i.e., if the marker was physically close to a gene known to be involved in the trait, as characterized, for example, by a low crossover percentage with respect to a gene (or locus) known to be involved in (or associated with) the trait.
- It is not required that markers (AIMs) useful in the present methods be in linkage disequilibrium with a gene or trait; in fact, AIMs disclosed herein as correlating with a trait can be located on different chromosomes from each other and from a gene or locus known to be associated with the trait.
- AIMs are genetic loci that show alleles with high frequency differences between populations.
- AIMs are exemplified herein generally by single nucleotide polymorphisms (SNPs; see, e.g., SEQ ID NO:1), as well as by deletion/insertion polymorphisms (DIPs; see, e.g., SEQ ID NO:363).
- AIMs can be used to estimate BioGeographical Ancestry (BGA) of an individual or collection of individuals at the population level (in terms of races), at the sub-population level (in terms of ethnicities), and at the micro-group level (in terms of familial lines within ethnic groups), as well as at a practical, phenotypically qualified level (e.g., cases and controls).
- Such ancestry estimates at the subgroup and individual level can be directly instructive regarding the genetics of phenotypes that are different qualitatively or in frequency between populations, including, for example, the likelihood that an individual will respond to a particular medication or the propensity of an individual to develop a disease.
- Ancestry estimates also can provide a compelling foundation for the use of Admixture Mapping (AM) methods to identify the genes underlying these traits.
- A panel of 71 AIMs (SEQ ID NOS:1 to 71) was identified from an examination of over 800 candidate AIMs (see, also, SEQ ID NOS:72 to 331), and methods were developed to examine these AIMs as a means to obtain accurate estimates of proportional ancestry.
- The methods and markers of the invention have been validated in studies using skin pigmentation as a model phenotype (see, also, Intl. Publ. No. WO 02/097047 (PCT/US02/16789), which is incorporated herein by reference).
- Initial markers were genotyped in two population samples with primarily African ancestry, African Americans from Washington D. C.
- The methods and genetic markers disclosed herein provide tools for several distinct purposes, including, for example: 1) estimating ancestry proportions in individuals from their DNA; 2) estimating genetic structure to control the study designs commonly used in genetic research; 3) constructing physical profiles by inferring characteristics related to ancestry, which may have implications in forensic investigations; 4) identifying disease predisposition, referred to as "Mapping by Ancestry Linkage Disequilibrium" (MALD); and 5) predicting a significant portion of an individual patient's response to prescription and over-the-counter medications.
- The present invention provides, for example: 1) statistical methods for determining ancestral proportions from genetic sequences within individuals, and examples of their use; 2) several hundred AIMs culled from the publicly available single nucleotide polymorphism (SNP) database and identified, using statistical methods, as useful for determining ancestral proportions within individuals or study groups; 3) several hundred AIMs demonstrated as useful for determining ancestral proportions within individuals or study groups; and 4) software programs that can be used to determine ancestral proportions within individuals or study groups.
- The distribution of trait values among the various branches of the human family tree is such that accurate classification can be obtained only through an appreciation of that structure, rather than through a full understanding of the biological mechanism of the trait. As a result, markers that were considered false positives with respect to identifying phenotypically active loci can, in fact, enable accurate classification analysis; i.e., they are true positives, provided the structure from which they were derived reflects human demography rather than sampling effects.
- The present methods are based on the correlation between markers and BGA, where BGA is itself, at some level of complexity, correlated with a trait value; they are not based on linkage or linkage disequilibrium.
- The present invention provides a method of inferring, with a predetermined level of confidence, a trait of an individual.
- A method of the invention is performed by contacting a nucleic acid sample of a test individual with hybridizing oligonucleotides that can detect the nucleotide occurrences of single nucleotide polymorphisms (SNPs) of a panel of at least about ten AIMs, and identifying, with a predetermined level of confidence, a population structure that correlates with, or is most likely given, the nucleotide occurrences of the AIMs in the individual, wherein the population structure correlates with a trait.
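- One way to identify the population structure that is most likely given an individual's AIM genotypes is a maximum likelihood search over admixture proportions. The sketch below is a common textbook formulation, not necessarily the exact model used in this work: it assumes unlinked AIMs, Hardy-Weinberg proportions, known parental allele frequencies, and an individual allele frequency equal to the ancestry-weighted mean of those frequencies, and it searches a coarse grid on the simplex. All function names, allele frequencies, and genotypes are hypothetical.

```python
import itertools
import math
from typing import List, Sequence, Tuple


def log_likelihood(q: Sequence[float],
                   genotypes: Sequence[int],
                   freqs: Sequence[Sequence[float]]) -> float:
    """Natural-log likelihood of admixture proportions q for one individual.

    genotypes[m]: copies (0, 1 or 2) of the counted allele at AIM m.
    freqs[m][k]:  frequency of that allele at AIM m in parental population k.
    """
    total = 0.0
    for g, pop_freqs in zip(genotypes, freqs):
        # Expected allele frequency in the individual, weighted by ancestry.
        f = sum(qk * pk for qk, pk in zip(q, pop_freqs))
        f = min(max(f, 1e-9), 1 - 1e-9)  # avoid log(0)
        # Binomial genotype probability under Hardy-Weinberg proportions.
        total += math.log(math.comb(2, g)) + g * math.log(f) + (2 - g) * math.log(1 - f)
    return total


def simplex_grid(n_pops: int, steps: int) -> List[Tuple[float, ...]]:
    """All proportion vectors in increments of 1/steps that sum to 1."""
    grid = []
    for head in itertools.product(range(steps + 1), repeat=n_pops - 1):
        if sum(head) <= steps:
            counts = head + (steps - sum(head),)
            grid.append(tuple(c / steps for c in counts))
    return grid


def mle_admixture(genotypes: Sequence[int],
                  freqs: Sequence[Sequence[float]],
                  steps: int = 20) -> Tuple[float, ...]:
    """Grid point with the highest likelihood."""
    grid = simplex_grid(len(freqs[0]), steps)
    return max(grid, key=lambda q: log_likelihood(q, genotypes, freqs))


# Hypothetical AIMs; columns are parental populations (AFR, EUR, NAM).
freqs = [[0.9, 0.1, 0.2], [0.1, 0.8, 0.3], [0.2, 0.3, 0.9], [0.85, 0.2, 0.1]]
genotypes = [2, 1, 0, 1]
print(mle_admixture(genotypes, freqs))
```

A finer grid or a numerical optimizer would give more precise estimates; the grid is used here only because it keeps the search transparent and also yields the likelihood surface needed for confidence contours.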
- The AIMs of the panel are selected on the basis of their delta value (see below) and, where relevant, the particular platform used to perform the method, and are indicative of a population structure correlated with the trait.
- AIMs are exemplified herein by the polynucleotides set forth as SEQ ID NOS:1 to 331, in which the SNP generally is at nucleotide position 50 (but see, e.g., SEQ ID NO:35, position 56; SEQ ID NO:51, position 48; and SEQ ID NO:56, position 26).
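- Because the SNP position is stated per SEQ ID NO, a lookup with a default and per-sequence exceptions is a convenient way to extract the SNP base from a listed sequence. This is a hypothetical sketch: the exception table holds only the three examples cited above (the text says "e.g.," so others may exist), positions are treated as 1-based, and the example sequence is made up.

```python
from typing import Dict

DEFAULT_SNP_POSITION = 50  # 1-based nucleotide position
# Only the exceptions cited in the text; not necessarily exhaustive.
SNP_POSITION_EXCEPTIONS: Dict[int, int] = {35: 56, 51: 48, 56: 26}


def snp_position(seq_id: int) -> int:
    """1-based SNP position for a given SEQ ID NO."""
    return SNP_POSITION_EXCEPTIONS.get(seq_id, DEFAULT_SNP_POSITION)


def snp_base(seq_id: int, sequence: str) -> str:
    """Nucleotide at the SNP position of the given sequence."""
    pos = snp_position(seq_id)
    if pos > len(sequence):
        raise ValueError(f"sequence shorter than SNP position {pos}")
    return sequence[pos - 1]  # convert the 1-based position to a 0-based index


example = "A" * 49 + "G" + "T" * 50  # hypothetical 100-nt sequence
print(snp_base(1, example))  # G
```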
- A test individual for whom a trait is to be inferred can be any individual for whom it is desired to infer a trait, and generally is a human.
- The methods of the invention also can be used for inferring traits of other mammals, including, for example, domestic animals such as cats, dogs, or horses; farm animals such as cattle, sheep, pigs, or goats; or other animals.
- The trait to be examined can be any trait of interest, including, as exemplified herein, proportional ancestry (BGA); hair, skin, or iris pigmentation; or drug responsiveness.
- The methods of the invention are particularly useful because they allow an inference of a desired trait to be made with a predetermined level of confidence.
- The phrase "a predetermined level of confidence" means that an inference or estimate of the invention is made using statistical methods that allow a confidence interval to be determined about a mean or a maximum likelihood value.
- In addition to the maximum likelihood value, other similarly likely values can be determined, and these can be combined to define x-fold likelihood confidence intervals, where x is any number such as 2, 5, or 10. For example, all of the structure results whose likelihood is within 10-fold of the maximum likelihood value can be plotted or listed to define the 10-fold likelihood confidence interval.
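- A minimal sketch of the x-fold likelihood interval, assuming the candidate structures (for example, a grid of admixture proportions) have already been scored with natural-log likelihoods. A candidate belongs to the x-fold interval when its likelihood is at least L_max / x, i.e., ln L >= ln L_max - ln x; with x = 10 this corresponds to the 1 log unit (base 10) ring of the triangle plots. The candidate tuples, scores, and function name are hypothetical.

```python
import math
from typing import Dict, Hashable, List


def likelihood_interval(loglik: Dict[Hashable, float], fold: float) -> List[Hashable]:
    """Candidates whose likelihood is within `fold` times the maximum.

    loglik maps each candidate structure to its natural-log likelihood.
    """
    if fold < 1:
        raise ValueError("fold must be >= 1")
    best = max(loglik.values())
    cutoff = best - math.log(fold)  # ln(L_max / fold)
    return [c for c, ll in loglik.items() if ll >= cutoff]


# Hypothetical (AFR, EUR, NAM) proportions with their log-likelihoods.
scores = {
    (0.8, 0.2, 0.0): -12.0,  # maximum likelihood estimate
    (0.7, 0.3, 0.0): -12.5,
    (0.8, 0.1, 0.1): -13.0,
    (0.5, 0.5, 0.0): -15.0,
}
print(likelihood_interval(scores, 2))   # the first two candidates
print(likelihood_interval(scores, 10))  # the first three candidates
```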
- An assay of the invention is designed such that performance of the test yields a value having a desired confidence level.
- A method of the invention can be performed such that the result has a predetermined level of confidence by varying the number of AIMs examined with respect to a trait. For example, use of a certain panel of ten AIMs will allow an inference to be made as to whether an individual has a particular trait, e.g., responsiveness to Lipitor™, with a certain level of confidence, whereas use of a panel of twenty AIMs, which can, but need not, partially overlap the panel of ten AIMs, will allow the same inference to be made with a higher level of confidence.
- A sample useful for practicing a method of the invention can be any biological sample of a test individual that contains nucleic acid molecules, including portions of the gene sequences containing the AIMs to be examined or, where the polymorphism of an AIM results in an amino acid change in an encoded polypeptide, any biological sample that contains the encoded polypeptides.
- The sample can be a cell, tissue, or organ sample, or can be a sample of a biological fluid such as semen, saliva, blood, cerebrospinal fluid, and the like.
- The nucleic acid sample useful for practicing a method of the invention will depend, in part, on whether the SNPs to be identified are in coding regions or in non-coding regions. Where one or more SNPs are present in a non-coding region of a gene, the nucleic acid sample generally is a deoxyribonucleic acid (DNA) sample, particularly genomic DNA or an amplification product thereof.
- Where the SNPs are present in coding regions, an RNA sample can be used and examined directly, or a cDNA or amplification product thereof can be examined according to the present methods.
- Thus, the nucleic acid sample can be DNA or RNA, or products derived therefrom, for example, amplification products.
- Although the methods of the invention are exemplified with respect to a nucleic acid sample, it will be recognized that particular SNPs, when present in coding regions of a gene, can result in polypeptides containing different amino acids at the positions corresponding to the SNPs due to non-degenerate codon changes. As such, in one aspect, the methods of the invention are practiced using a sample containing polypeptides of the subject.
- A method of the invention is performed by contacting the sample and hybridizing oligonucleotides under conditions suitable for detection of the nucleotide occurrences of the AIMs of the individual by the hybridizing oligonucleotides. Further, in aspects of the methods of the invention, the sample can be contacted with second hybridizing oligonucleotides, for example, to determine a sub-population structure.
- The term "second," when used in reference to hybridizing oligonucleotides (or to a panel of AIMs), is used for convenience of discussion, to allow a clear distinction, e.g., between steps of a method. In this respect, it should be further recognized that one or more hybridizing oligonucleotides used, e.g., to determine a population structure, also can be included among the second hybridizing oligonucleotides.
- Conditions suitable for detecting the nucleotide occurrences of AIMs will vary depending on the sequences of the hybridizing oligonucleotides, including their length and complementarity, as well as on the particular assay being used and, for example, whether the assay is being performed as a multiplex assay.
- The hybridizing oligonucleotides, which are at least 15 nucleotides in length, can contain deoxyribonucleotides or ribonucleotides linked together by phosphodiester bonds, and can be single stranded or double stranded, though they generally are used in single-stranded form.
- Such hybridizing oligonucleotides can be prepared using methods of chemical synthesis or by enzymatic methods such as by the polymerase chain reaction (PCR).
- Hybridizing oligonucleotides, or other polynucleotides useful in a method or contained in a kit of the invention, also can contain nucleoside or nucleotide analogs, and can have a backbone bond other than a phosphodiester bond; such oligonucleotides can provide certain advantages, such as increased stability or more desirable hybridization properties.
- Nucleotide analogs are well known in the art and commercially available, as are polynucleotides containing such nucleotide analogs (Lin et al., Nucl. Acids Res.
- The covalent bond also can be any of numerous other bonds, including a thiodiester bond, a phosphorothioate bond, a peptide-like bond, or any other bond known in the art as useful for linking nucleotides to produce synthetic oligonucleotides (see, for example, Tam et al., Nucl. Acids Res.
- Oligonucleotides containing such analogs or bonds can be useful, for example, in a sample having nucleolytic activity, including a tissue culture medium or a sample comprising a cell extract, because the modified oligonucleotides can be less susceptible to degradation.
- The hybridizing oligonucleotides useful for purposes of the present invention are at least about 15 bases in length, which is sufficient to permit the oligonucleotide to selectively hybridize to a target polynucleotide comprising the AIM, and can be at least about 18, 21, or 25 nucleotides or more in length.
- The oligonucleotides of the invention will be 15-100 nucleotides in length.
- A numerical range such as 15-100 includes the beginning number, the ending number, and any number in between. Thus, 15-100 can be 15, 16, 17, 18...97, 98, 99, or 100.
- The term "selective hybridization" refers to hybridization under moderately stringent or highly stringent physiological conditions, which can distinguish related nucleotide sequences from unrelated nucleotide sequences.
- The conditions used to achieve a particular level of stringency are known to vary, depending on the nature of the nucleic acids being hybridized, including, for example, the length, degree of complementarity, nucleotide sequence composition (e.g., relative GC:AT content), and nucleic acid type, i.e., whether the oligonucleotide or the target nucleic acid sequence is DNA or RNA.
- Conditions also can vary depending on whether one of the nucleic acids is immobilized, for example, on a filter, bead, chip, or other solid matrix.
- Methods for selecting appropriate stringency conditions can be determined empirically or estimated using various formulas, and are well known in the art (see, for example, Sambrook et al., supra, 1989).
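- As an illustration of estimating hybridization behavior from sequence composition, the sketch below applies two widely used rule-of-thumb melting-temperature formulas: the Wallace rule, Tm = 2(A+T) + 4(G+C) degrees C, for oligonucleotides shorter than 14 nucleotides, and Tm = 64.9 + 41(G+C - 16.4)/N degrees C for longer ones. These formulas ignore salt concentration, probe concentration, and nearest-neighbor effects, so they are only starting points for empirical optimization; the function name is hypothetical.

```python
def basic_tm(seq: str) -> float:
    """Rough melting temperature (deg C) of a DNA oligonucleotide."""
    s = seq.upper()
    at = s.count("A") + s.count("T")
    gc = s.count("G") + s.count("C")
    n = len(s)
    if n == 0 or at + gc != n:
        raise ValueError("sequence must be non-empty and contain only A, C, G, T")
    if n < 14:
        return float(2 * at + 4 * gc)      # Wallace rule for short oligos
    return 64.9 + 41.0 * (gc - 16.4) / n   # GC-content formula for longer oligos


for probe in ["ACGTACGTAC", "ACGT" * 5]:
    print(probe, len(probe), round(basic_tm(probe), 1))
```

Higher GC content raises the estimate, which is why probes of equal length can require different stringency conditions.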
- An example of progressively higher stringency conditions is as follows: 2X SSC/0.1% SDS at about room temperature (hybridization conditions); 0.2X SSC/0.1% SDS at about room temperature (low stringency conditions); 0.2X SSC/0.1% SDS at about 42°C (moderate stringency conditions); and 0.1X SSC at about 68°C (high stringency conditions).
- Washing can be carried out using only one of these conditions, for example, high stringency conditions, or each of the conditions can be used, for example, for 10 to 15 minutes each, in the order listed above, repeating any or all of the steps listed. As such, final conditions will vary, depending on the particular hybridization reaction involved, and can be determined empirically. It should be recognized that a variety of conditions can be utilized to provide selective hybridization conditions.
- The conditions can be selected such that selective hybridization occurs for all of the hybridizing oligonucleotides in the reaction.
- Detectable labeling of a polynucleotide is well known in the art and includes, for example, the use of detectable labels such as chemiluminescent labels, radionuclides, enzymes, haptens such as digoxygenin and biotin, fluorophores, and unique oligonucleotide sequences.
- For example, PCR can be performed with one biotinylated primer and one primer containing digoxygenin.
- The amplification products can then be bound to a streptavidin plate, washed, reacted with an enzyme-conjugated antibody to digoxygenin, and developed with a chromogenic, fluorogenic, or chemiluminescent substrate for the enzyme.
- A radioactive method also can be used to detect the generated amplification products, for example, by including a radiolabeled deoxynucleoside triphosphate in the amplification reaction and then blotting the amplification products onto DEAE paper for detection.
- Where one primer is biotinylated, streptavidin-coated scintillation proximity assay plates can be used to measure the PCR products.
- Other useful labels include a chemiluminescent label, for example, a lanthanide chelate such as used in the DELFIA® assay (Pall Corp.), a fluorescent label, or an electrochemiluminescent label such as ruthenium tris-bipyridyl (ORI-GEN).
- Methods for detecting a nucleotide occurrence at a SNP or DIP position of an AIM can utilize one or more oligonucleotide probes or primers, including, for example, an amplification primer pair, that selectively hybridize to a target polynucleotide spanning the AIM.
- Oligonucleotide probes useful in practicing a method of the invention can include, for example, an oligonucleotide that is complementary to and spans a portion of the target polynucleotide, including the position of the SNP (or DIP), wherein the presence of a specific nucleotide at the position of the SNP is detected by the presence or absence of selective hybridization of the probe.
- Such a method can further include contacting the target polynucleotide and hybridized oligonucleotide with an endonuclease, and detecting the presence or absence of a cleavage product of the probe, depending on whether the nucleotide occurrence at the SNP site is complementary to the corresponding nucleotide of the probe.
- A pair of probes that specifically hybridize upstream and adjacent to, and downstream and adjacent to, the site of the SNP, wherein one of the probes includes a nucleotide complementary to a nucleotide occurrence of the SNP, also can be used in an oligonucleotide ligation assay, wherein the presence or absence of a ligation product is indicative of the nucleotide occurrence at the SNP site.
- An oligonucleotide also can be useful as a primer, for example, for a primer extension reaction, wherein the product (or absence of a product) of the extension reaction is indicative of the nucleotide occurrence.
- A primer pair useful for amplifying a portion of the target polynucleotide including the SNP or DIP site also can be used, wherein the amplification product is examined to determine the nucleotide occurrence at the SNP site or to determine whether there is an insertion or a deletion at the DIP site.
- Such methods can use oligonucleotide probes or primers, including, for example, an amplification primer pair, that selectively hybridize to a target polynucleotide containing one or more SNP positions.
- Hybridizing oligonucleotides useful in practicing a method of the invention can include, for example, an oligonucleotide that is complementary to and spans a portion of the target polynucleotide, including the position of the SNP or DIP (including whether the DIP has a deletion or insertion), wherein the presence of a specific nucleotide at the SNP site, or the presence of a deletion or insertion at the DIP site, is detected by the presence or absence of selective hybridization of the oligonucleotide probe.
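- The allele-specific hybridization principle can be illustrated with an idealized sketch in which a probe "hybridizes" only if its reverse complement occurs exactly in the target. Real selective hybridization depends on stringency and tolerates or rejects mismatches to varying degrees, so this is a simplification; the flanking sequences and function names are hypothetical.

```python
from typing import Dict, List

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(_COMPLEMENT)[::-1]


def hybridizes(probe: str, target: str) -> bool:
    """Idealized: the probe anneals only to a perfectly complementary stretch."""
    return reverse_complement(probe) in target


def call_alleles(target: str, probes: Dict[str, str]) -> List[str]:
    """Alleles whose allele-specific probe hybridizes to the target."""
    return [allele for allele, probe in probes.items() if hybridizes(probe, target)]


left, right = "ACGTTGCA", "TTGACCAG"  # hypothetical flanks around an A/G SNP
probes = {allele: reverse_complement(left + allele + right) for allele in "AG"}
target_a = "GG" + left + "A" + right + "CC"
print(call_alleles(target_a, probes))  # ['A']
```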
- An oligonucleotide ligation assay also can be used to identify a nucleotide occurrence at a SNP site, wherein a pair of probes selectively hybridizes upstream and adjacent to, and downstream and adjacent to, the site of the SNP, and wherein one of the probes includes a terminal nucleotide complementary to a nucleotide occurrence of the SNP.
- Where the terminal nucleotide of the probe is complementary to the nucleotide occurrence, selective hybridization includes the terminal nucleotide such that, in the presence of a ligase, the upstream and downstream oligonucleotides are ligated. As such, the presence or absence of a ligation product is indicative of the nucleotide occurrence at the SNP site.
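- The ligation logic can be modeled in the same idealized way: both probes must anneal with exact complementarity and abut, so a product forms only when the upstream probe's 3'-terminal nucleotide matches the SNP. In this hypothetical sketch, probes are written 5' to 3', the upstream probe carries the allele-specific 3'-terminal nucleotide, and annealing is treated as exact matching against the reverse complement of the target.

```python
from typing import Optional

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(_COMPLEMENT)[::-1]


def ligation_product(target: str, upstream: str, downstream: str) -> Optional[str]:
    """Return the ligated oligonucleotide, or None if no ligation occurs.

    The probes copy (5'->3') the reverse complement of the target. Ligation
    requires the upstream probe's 3' end to sit directly against the
    downstream probe's 5' end, with every base, including the terminal one, paired.
    """
    probe_strand = reverse_complement(target)
    joined = upstream + downstream
    return joined if joined in probe_strand else None


left, right = "ACGTTGCA", "TTGACCAG"           # hypothetical flanks of an A/G SNP
target_a = left + "A" + right
target_g = left + "G" + right
downstream = reverse_complement(left)          # common (non-allele-specific) probe
upstream_a = reverse_complement(right) + "T"   # 3'-terminal T pairs with A
print(ligation_product(target_a, upstream_a, downstream) is not None)  # True
print(ligation_product(target_g, upstream_a, downstream) is not None)  # False
```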
- A hybridizing oligonucleotide also can be useful as a primer, for example, for a primer extension reaction, wherein the product (or absence of a product) of the extension reaction is indicative of the nucleotide occurrence at a SNP site or an insertion or deletion at a DIP site.
- A primer pair useful for amplifying a portion of the target polynucleotide including the SNP or DIP site also can be used, wherein the amplification product is examined to determine the nucleotide occurrence at the SNP site or the presence of a deletion or an insertion at the DIP site.
- Particularly useful methods include those that are readily adaptable to a high throughput format, to a multiplex format, or to both.
- Conditions that allow generation of an amplification product in a sample in which an amplification reaction is being performed are such that the reaction contains the necessary components for the amplification reaction to occur.
- Such conditions include, for example, appropriate buffer capacity and pH, salt concentration, metal ion concentration if necessary for the particular polymerase, appropriate temperatures that allow for selective hybridization of the primer or primer pair to the template target polynucleotide, as well as appropriate cycling of temperatures that permit polymerase activity and melting of a primer or primer extension or amplification product from the template or, where relevant, from forming a secondary structure such as a stem-loop structure.
- A primer extension or amplification product can be detected directly or indirectly and/or can be sequenced using various methods known in the art.
- Amplification products that span a SNP site can be sequenced using traditional sequence methodologies, including, for example, the dideoxy-mediated chain termination method (Sanger et al., J. Molec. Biol. 94:441, 1975; Prober et al. Science 238:336-340, 1987) or the chemical degradation method (Maxam et al., Proc. Natl. Acad. Sci. USA 74:560, 1977) to determine the nucleotide occurrence at the SNP loci.
- The nucleotide occurrence at a SNP site also can be determined using a microsequencing method, wherein the identity of only a single nucleotide is determined at a predetermined site (U.S. Pat. No. 6,294,336).
- Microsequencing methods include the Genetic Bit Analysis method (WO 92/15712). Additional primer-guided nucleotide incorporation procedures for assaying polymorphic sites in DNA also have been described (Kornher et al., Nucl. Acids Res. 17:7779-7784, 1989; Sokolov, Nucl. Acids Res. 18:3671, 1990; Syvanen et al., Genomics 8:684-692, 1990; Prezant et al., Hum.
- Another method for determining the nucleotide occurrence at a SNP position is described by Macevicz (U.S. Pat. No. 5,002,867), wherein a nucleic acid sequence is determined via hybridization with multiple mixtures of oligonucleotide probes.
- In this method, the sequence of a target polynucleotide is determined by permitting the target to hybridize sequentially with sets of probes having an invariant nucleotide at one position and variant nucleotides at other positions.
- The nucleotide sequence is determined by hybridizing the target with a set of probes, then determining the number of sites at which at least one member of the set is capable of hybridizing to the target (i.e., the number of matches). This procedure is repeated until each member of a set of probes has been tested.
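- The match-counting step can be sketched as follows, under simplifying assumptions: each probe is written as the target-sense sequence it recognizes, "N" marks a variant (degenerate) position, and a site counts as a match when at least one probe in the set matches exactly there. The target, probe sets, and function names are hypothetical, and reconstructing the full sequence from these counts, as in the cited method, is not shown.

```python
from typing import Iterable


def matches_at(window: str, probe: str) -> bool:
    """True if the probe pattern matches the window; 'N' matches any base."""
    return len(window) == len(probe) and all(
        p == "N" or p == w for p, w in zip(probe, window)
    )


def count_matching_sites(target: str, probe_set: Iterable[str]) -> int:
    """Number of target positions at which at least one probe in the set matches."""
    probes = list(probe_set)
    k = len(probes[0])
    if any(len(p) != k for p in probes):
        raise ValueError("probes in a set must share one length")
    return sum(
        any(matches_at(target[i:i + k], p) for p in probes)
        for i in range(len(target) - k + 1)
    )


target = "ACGTACGA"
for probe_set in (["ANG"], ["CNT", "CNA"], ["ANA"]):
    print(probe_set, count_matching_sites(target, probe_set))
```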
- U.S. Pat. No. 6,294,336 provides a solid phase sequencing method for determining the sequence of nucleic acid molecules (either DNA or RNA) by utilizing a primer that selectively binds a polynucleotide target at a site wherein the SNP is the most 3' nucleotide selectively bound to the target.
- SNP-IT™ is a three-step primer extension reaction. In the first step, a target polynucleotide is isolated from a sample by hybridization to a capture primer, which provides a first level of specificity. In the second step, the capture primer is extended by a terminating nucleotide triphosphate at the target SNP site, which provides a second level of specificity.
- In the third step, the extended nucleotide triphosphate can be detected using a variety of known formats, including direct fluorescence, indirect fluorescence, an indirect colorimetric assay, mass spectrometry, fluorescence polarization, etc.
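- The single-base extension step can be sketched as follows. This idealized model assumes that the capture primer anneals with exact complementarity and that its 3' end lies immediately adjacent to the SNP, so the only terminating nucleotide that can be added is the complement of the template base at the SNP. The sequences and function names are hypothetical, and the detection step is not modeled.

```python
from typing import Iterable, Optional, Set

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(_COMPLEMENT)[::-1]


def extended_base(target: str, primer: str) -> Optional[str]:
    """Terminating nucleotide added to the primer's 3' end, or None.

    The primer (5'->3') matches part of the reverse complement of the target,
    so the base incorporated next is the one following the primer on that strand.
    """
    strand = reverse_complement(target)
    start = strand.find(primer)
    if start < 0:
        return None  # the primer does not capture this target
    nxt = start + len(primer)
    return strand[nxt] if nxt < len(strand) else None


def genotype(allele_targets: Iterable[str], primer: str) -> Set[str]:
    """Set of incorporated terminators across the target copies in a sample."""
    bases = set()
    for t in allele_targets:
        b = extended_base(t, primer)
        if b is not None:
            bases.add(b)
    return bases


left, right = "ACGTTGCA", "TTGACCAG"   # hypothetical flanks of an A/G SNP
primer = reverse_complement(right)     # 3' end abuts the SNP
homozygous_a = [left + "A" + right] * 2
heterozygous = [left + "A" + right, left + "G" + right]
print(sorted(genotype(homozygous_a, primer)))  # ['T']
print(sorted(genotype(heterozygous, primer)))  # ['C', 'T']
```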
- Reactions can be processed in an automated 384-well format using a SNPstream™ instrument (Orchid BioSciences, Inc., Princeton, NJ).
- Phase-known data can be generated by inputting phase-unknown raw data from the SNPstream™ instrument into Stephens and Donnelly's PHASE program.
- McSNP® analysis provides another method for detecting a nucleotide occurrence in an AIM (Akey et al., supra, 2001). McSNP® analysis provides the additional advantages that it does not require a gel electrophoresis step, thus minimizing the time and cost of detecting a SNP, and that it is readily adaptable to high-throughput formats, thus allowing one or more panels of AIMs and/or samples to be examined in parallel.
- Where the nucleotide occurrence of a SNP results in an amino acid change in an encoded polypeptide, the nucleotide occurrence can be identified indirectly by detecting the particular amino acid in the polypeptide.
- The method for determining the amino acid will depend, for example, on the structure of the polypeptide or on the position of the amino acid in the polypeptide.
- Where the polypeptide contains only a single occurrence of an amino acid encoded by the particular SNP, the polypeptide can be examined for the presence or absence of the amino acid. For example, where the amino acid is at or near the amino terminus or the carboxy terminus of the polypeptide, simple sequencing of the terminal amino acids can be performed.
- Alternatively, the polypeptide can be treated with one or more enzymes, and a peptide fragment containing the amino acid position of interest can be examined, for example, by sequencing the peptide or by detecting a particular migration of the peptide following electrophoresis.
- Where the particular amino acid comprises an epitope of the polypeptide, the specific binding, or absence thereof, of an antibody specific for the epitope can be detected.
- Other methods for detecting a particular amino acid in a polypeptide or peptide fragment thereof are well known and can be selected based, for example, on convenience or on the availability of equipment such as a mass spectrometer, capillary electrophoresis system, magnetic resonance imaging equipment, and the like.
- A method of the invention can utilize an antibody, or antigen binding fragment thereof, that specifically binds, for example, to a polypeptide comprising an amino acid encoded by a nucleotide sequence comprising one nucleotide occurrence of a SNP, but not substantially to a polypeptide comprising a different amino acid encoded by the codon comprising the SNP; or that specifically binds, for example, to a polypeptide comprising an amino acid sequence encoded by one form of a DIP (e.g., that having the insertion), but not substantially to that encoded by the alternative form (e.g., that having the deletion).
- As used herein, the term "specific interaction" or "specifically binds" means that two molecules form a complex that is relatively stable under physiologic conditions.
- The term is used herein to refer to various interactions, including, for example, the interaction of an antibody that binds a target polynucleotide including the SNP site only if the SNP has a specified, but not an alternative, nucleotide occurrence (e.g., an A, but not a T); or the interaction of an antibody that binds a polypeptide that includes one amino acid encoded by a codon that includes a SNP site, but not a polypeptide having an alternative amino acid encoded by the codon comprising the SNP.
- A specific interaction can be characterized by a dissociation constant (Kd) of about 1 × 10⁻⁶ M or lower, generally about 1 × 10⁻⁷ M or lower, usually about 1 × 10⁻⁸ M or lower, and particularly about 1 × 10⁻⁹ M or 1 × 10⁻¹⁰ M or lower; a lower dissociation constant indicates tighter binding.
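- For a simple one-to-one binding equilibrium with the binding partner in excess, the fraction of antibody bound is θ = [L] / ([L] + Kd), which shows why a lower dissociation constant corresponds to a more stable complex at a given concentration. The sketch below assumes that idealized model; the function name and the 10 nM concentration are arbitrary illustrative choices.

```python
def fraction_bound(ligand_molar: float, kd_molar: float) -> float:
    """Equilibrium fraction of binding sites occupied for 1:1 binding.

    Assumes the free ligand concentration is approximately the total
    (ligand in large excess over binding sites).
    """
    if ligand_molar < 0 or kd_molar <= 0:
        raise ValueError("concentration must be non-negative and Kd positive")
    return ligand_molar / (ligand_molar + kd_molar)


ligand = 1e-8  # 10 nM, illustrative
for kd in (1e-6, 1e-8, 1e-10):
    print(f"Kd = {kd:.0e} M -> fraction bound = {fraction_bound(ligand, kd):.3f}")
```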
- a specific interaction generally is stable under physiological conditions, including, for example, conditions that occur in a living individual such as a human or other vertebrate or invertebrate, as well as conditions that occur in a cell culture such as used for maintaining mammalian cells or cells from another vertebrate organism or an invertebrate organism.
- Methods for determining whether two molecules interact specifically are well known and include, for example, equilibrium dialysis, surface plasmon resonance, and the like.
- Antibodies useful in a method of the invention include antibodies that specifically bind polynucleotides that encompass an AIM, or that bind polypeptides that include an amino acid encoded by a codon that includes a SNP or that include amino acids due to an insertion at a DIP site. Such antibodies are selected such that they specifically bind a polypeptide that includes a first amino acid encoded by a codon that includes the SNP locus, but do not bind, or bind measurably more weakly, to a polypeptide that includes a second amino acid encoded by a codon that includes a different nucleotide occurrence at the SNP.
- antibody is used broadly herein to refer to immunoglobulin molecules and antigen binding portions of immunoglobulin molecules that specifically bind an antigen.
- antibodies useful in a method of the invention can be polyclonal, monoclonal, multispecific, human, humanized or chimeric antibodies, single chain antibodies, Fab fragments, F(ab') fragments, fragments produced by a Fab expression library, anti-idiotypic (anti-Id) antibodies, and the like, as well as antigen/epitope binding fragments of such antibodies.
- Antigen binding fragments of antibodies include, but are not limited to, Fab, Fab' and F(ab')2, Fd, single-chain Fv's (scFv), single-chain antibodies, disulfide-linked Fv fragments (sdFv) and fragments comprising either a VL or VH domain.
- antigen-binding antibody fragments, including single-chain antibodies, can comprise the variable region(s) alone or in combination with the entirety or a portion of the hinge region, CH1, CH2, and/or CH3 domains.
- the antibodies can be from any animal origin including birds and mammals, or can be expressed recombinantly, for example, in insect or mammalian host cells or in plants.
- compositions and methods are provided for inferring an individual's response to commonly used medications, which, remarkably, is a function of individual ancestry; the disclosed markers and methods are, to a differing extent for each drug, useful for the inference of such response.
- compositions and methods are provided for inferring individual and/or group ancestral proportions from knowledge of the individual's or group's DNA sequences.
- compositions and methods are provided for using knowledge of ancestry relevant DNA sequences to identify disease susceptibility and drug response genes through the MALD process.
- compositions and methods are provided for qualifying and normalizing study groups for more traditional methods of mapping disease genes. Each of these processes requires an accurate knowledge of ancestry, which can be determined using the methods and compositions disclosed herein.
- Admixture generates allelic associations between all marker loci where allele frequencies are different between the parental populations (Chakraborty and Weiss, Proc. Natl. Acad. Sci., USA 85:9119-9123, 1988). These associations decay with time in a way that is dependent on the genetic distance between them. Thus, disease (or trait) risk alleles whose frequencies differ between the parental populations can be mapped in admixed populations using special panels of genetic markers showing high frequency differences between the parental populations. These markers, termed AIMs, are characterized by having particular alleles that are more common in one group of populations than in other populations.
- allelic associations were generated recently and, therefore, are more easily detected for a given sample size because they extend over longer distances than in non-admixed populations (up to 10-20 centiMorgans (cM) or more).
- the statistical basis of this approach was first explored by Chakraborty and Weiss (supra, 1988) and subsequently by Stephens, Briscoe and O'Brien, who named the method "mapping by admixture linkage disequilibrium" (MALD; Stephens et al., Amer. J. Hum. Genet. 55:809-824, 1994; Briscoe et al., J. Hered. 85:59-63, 1994).
- ANCOVA: analysis of covariance
- the conditional probability of each allelic state given the ancestry of the allele (the ancestry-specific allele frequencies; e.g., West African or European ancestry) is required.
- 1) AIMs (SNPs or deletion/insertion polymorphisms) in the human genome that are of potential use for drug response, disease gene or forensics research were identified; 2) biochemical and genetic test results are provided that demonstrate these AIMs can be useful for disease gene and forensics research; 3) the usefulness of AIMs derived from systematic screens of the human genome in actual drug response, disease gene or forensics research is demonstrated; 4) the usefulness of AIMs derived from systematic screens of the human genome to make an inference as to whether an individual is susceptible to a disease, or is unlikely to respond to a drug, is demonstrated; and 5) the usefulness of AIMs derived from systematic screens of the human genome to make an inference as to whether a crime scene DNA specimen was derived from an individual of, for example, 80% European, 10% African and 10% Asian heritage, or some other ratio/mix, is demonstrated.
- compositions and methods of the invention provide a means to predict an individual's likelihood to respond to a particular drug.
- LDL: low-density lipoprotein
- some of the most powerful markers identified for LDL response to Lipitor™ were gene types that are not immediately recognized as relevant for drug response, including, for example, TYR, OCA2, TYRP, FDPS, and HMGCR (see, also, Intl. Publ. No. WO 03/002721 (PCT/US 02/20847), and Intl. Publ. No. WO 03/045227 (PCT/US02/38345), each of which is incorporated herein by reference).
- AIMs: ancestry informative markers (genes or loci that are informative as to ancestry)
- the ability to effectively use the AIMs for the development of patient-drug classification sets, admixture screening panels and forensics tools was accomplished using the disclosed method, including screening the SNP database (see, for example, world wide web ("www") at URL "ncbi.nlm.nih.gov") for AIMs; screening the AIMs against a multi-ancestral panel of DNA samples to verify those that, indeed, are good AIMs; using the disclosed statistical and software methods for using the AIM sequences to make biologically relevant inferences; and recognizing that an individual's likelihood to respond to a drug or develop a disease can be predicted through a knowledge of their ancestry, which, in turn, is indicated through the individual's AIM sequences.
- the present approach allows the same statement, and also provides the proportional ancestry of the individual with confidence intervals (CI); e.g., 25% (95% CI 15-35%) European ancestry; 75% (95% CI 60-80%) African ancestry; and 0% (95% CI 0-6%) Native American ancestry.
- the confidence intervals can be expressed in multidimensional space to provide a clearer representation of the ancestry measured for the person in question (see below; see, also, Figure 2). Though methods for constructing such a representation were known, the present disclosure is the first to provide for the representations to be presented with quantifiable confidence.
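For the simplest case of two ancestral groups, a point estimate with a likelihood-based interval of the kind quoted above can be sketched as follows. This is an illustrative approach under stated assumptions (unlinked biallelic AIMs, Hardy-Weinberg proportions, known parental allele frequencies, and a likelihood-ratio cutoff of 3.841, the 95% point of a chi-square distribution with 1 degree of freedom); it is not presented as the exact procedure behind Figure 2, and the function names and data are hypothetical.

```python
import math


def log_likelihood(m, genotypes, p1, p2):
    """Log-likelihood of m, the share of ancestry from group 1, given biallelic
    genotypes coded as 0/1/2 copies of a reference allele."""
    ll = 0.0
    for g, a, b in zip(genotypes, p1, p2):
        f = m * a + (1.0 - m) * b              # expected reference-allele frequency
        f = min(max(f, 1e-12), 1.0 - 1e-12)    # guard against log(0)
        ll += math.log(math.comb(2, g)) + g * math.log(f) + (2 - g) * math.log(1.0 - f)
    return ll


def admixture_mle_ci(genotypes, p1, p2, step=0.001, chi2_crit=3.841):
    """Grid-search MLE of m plus a likelihood-ratio interval: all grid values m
    with 2 * (max log-likelihood - log-likelihood(m)) <= chi2_crit."""
    n = int(round(1.0 / step))
    grid = [i / n for i in range(n + 1)]
    lls = [log_likelihood(m, genotypes, p1, p2) for m in grid]
    best = max(range(len(grid)), key=lambda i: lls[i])
    inside = [m for m, ll in zip(grid, lls) if 2.0 * (lls[best] - ll) <= chi2_crit]
    return grid[best], min(inside), max(inside)


# Hypothetical data: 20 AIMs with reference-allele frequency 0.9 in group 1
# and 0.1 in group 2; the individual is homozygous at every locus.
genotypes = [2] * 10 + [0] * 10
mle, lower, upper = admixture_mle_ci(genotypes, [0.9] * 20, [0.1] * 20)
print(f"group 1: {mle:.0%} (95% CI {lower:.0%} to {upper:.0%})")
```

Because this log-likelihood is concave in m, the retained grid values form a single interval. With three groups the same idea yields a region in the triangle, using a 2-degree-of-freedom cutoff (5.991 for 95%).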
- Phasing AIMs along the chromosome can be accomplished by several methods, including 1) estimation from the genotypes of the individual, 2) molecular haplotyping (e.g., allele
- the disclosed methods allow simultaneous consideration of the two sex chromosomes (X and Y) and the mtDNA for ancestral inferences. AIMs are found on each of these sources, and can be informative for many of the questions regarding the ancestral proportions of a person and the population(s) from which a particular person is derived. For example, Hispanic/Latino populations have very high (65-100%) frequencies of Native American mtDNA haplogroups, while showing only a minority contribution from Native American populations in autosomal markers.
- for example, a woman with reputed Native American ancestry on her father's side who carries a non-Native American mtDNA haplogroup is more likely to be non-Hispanic (and partially Native American, as she may suspect) than she would be were she to carry a Native American mtDNA haplogroup.
- Linkage disequilibrium is increasingly being used as a mapping tool for both fine-scale determination of gene position and for the initial localization of disease genes in special populations.
- Allelic associations are significantly non-random and correlated with physical distance within small (<60 kb) genomic regions (see Jorde, Amer. J. Hum. Genet. 66:979-988, 1995; Jorde, Genome Res. 10:1435-1444, 2000, for review), possibly reflecting an underlying "block structure" that characterizes many genomic regions (Reich et al., 2001; Daly et al., 2001).
- This approach has been important in the positional cloning of several simple Mendelian diseases, including the cystic fibrosis gene, the Huntington's disease gene, and the diastrophic dysplasia gene.
- LD can be used for initial disease gene mapping in homogeneous populations that have undergone recent increases in size or are genetically inbred.
- disease alleles were probably present in a small number of founders, and recombination has had limited opportunity to randomize associations between these alleles and alleles at linked marker loci.
- An analysis of allelic associations between affected and unaffected individuals from these populations can thus facilitate the localization of the disease locus.
- Mendelian diseases that have been mapped using this approach include several diseases in the Finnish population, Hirschsprung's disease in Mennonites, benign recurrent intrahepatic cholestasis in an isolated Dutch fishing community, familial persistent hyperinsulinemic hypoglycemia of infancy in a consanguineous group of Saudi Arabian families, and Bardet-Biedl syndrome in Bedouins.
- LD mapping of complex polygenic diseases
- the extent of LD is a complex function of a number of genetic and evolutionary factors such as mutation, recombination and gene conversion rates, demographic and selective events, and the age of the mutation itself. Some of these factors affect the whole genome while others only affect particular genome regions. Additionally, variation of mutation, recombination and gene conversion rates throughout the genome is expected to create LD differences between genome regions.
- admixed populations such as Hispanics and African Americans offer the advantage that linkage disequilibrium has been created recently due to the admixture process, and it can extend over large chromosomal regions, although it is extremely important to control for the genetic structure present in these populations in order to avoid false positives (Parra et al., supra, 1998; Lautenberger et al., supra, 2000; Pfaff et al., supra, 2001; Nordborg and Tavare, supra, 2002).
- For LD-based methods, many issues regarding LD in human populations remain largely unexplored.
- HMP: Haplotype Map Project
- the primary focus of the HMP is to understand the fine-scale structure of individual genomic regions throughout the genome, whereas the present methods allow an understanding of the LD that results specifically from admixture.
- the level of LD from admixture is on the order of millions of bases (Mb; megabases) to tens of Mb, while the HMP is focused on the level of tens to hundreds of kilobases (kb), and genomic and population features that affect the results from one project may not be noted in the other.
- admixture mapping requires accurate parental allele frequency estimates. As such, a large number of different African, Native American, European, and Asian populations have been typed (see Table 6, below), while the HMP will likely focus on one or two samples of the major population groups.
- having a sample of 10 for each of 4 ancestral groups is not adequate for the identification of sequences present preferentially in one or some of those groups; as disclosed herein, at least 50 individuals were tested for each of several tens of ancestral groups (not just four) in order to comprehensively identify these markers.
- the population-based association methods disclosed herein provide several advantages over traditional linkage studies. Localizing disease genes by traditional genetic linkage methods relies on the use of related persons, either extended multigenerational families or pairs of related individuals. These approaches are effective and very powerful when investigating diseases caused by single genes. However, polygenic and multifactorial diseases like Type 2 diabetes, hypertension, and prostate cancer result from the interaction of several genes and multiple environmental influences, and are more difficult to study using traditional methods. The identification of genes contributing to susceptibility to common disease is complicated by heterogeneity. The source of the genetic heterogeneity determines which mapping methods are most likely to work for gene identification.
- Two primary types of genetic heterogeneity are locus heterogeneity, wherein more than one locus affects a genetic trait, and allelic heterogeneity, wherein within a particular causative locus there are multiple alleles that are important in altering the phenotype.
- Traditional linkage analysis using extended families is generally insensitive to allelic heterogeneity, but can be adversely affected by locus heterogeneity.
- LD-based methods are generally adversely affected by allelic heterogeneity, but less affected by locus heterogeneity.
- association-based approaches like measured genotype and transmission disequilibrium test (TDT) may be more sensitive than family-based LOD score or sib-pair methods.
- Risch and Merikangas (supra, 1996) compared the number of individuals needed for sib-pair studies and TDT studies, and showed that the number of individuals needed to detect linkage is much smaller for TDT than for sib-pair studies. This is especially true when the disease locus has a small effect. For example, for a locus with risk ratio of 2.0 and a gene frequency of 50%, 2500 sib-pairs or 340 case/parents for TDT would be required.
- One difficulty with using linkage disequilibrium (LD) in genome-wide screening is that LD decays exponentially with the recombination fraction between the marker and the disease locus and with the age of the disease-causing mutations. For older mutations that predispose to diseases, LD becomes very weak even between the disease allele and alleles at relatively closely spaced marker loci.
- LD mapping has been useful in mapping of rare genetic diseases such as cystic fibrosis and diseases in special populations like the Finns and Bedouins, populations that have been subject to significant population bottlenecks, inbreeding, or founder effects.
- LD exists because the variant allele is relatively young, as in the case of cystic fibrosis, or the population has reduced genetic variability, which elevates the LD throughout the genome.
- a leading model for the genetics of common disease stipulates predisposing alleles at a number of loci which, when present in particular combinations, increase an individual's risk (Greenberg, Amer. J. Hum. Genet.
- the predisposing alleles also are expected to be at relatively high frequencies.
- the frequency of an allele in a population is on average related to the age of the allele such that more frequent alleles are older than rare alleles. This fact poses a problem for the application of LD-based methods to identify common disease genes in populations that are not isolated or inbred since, in homogeneous populations, the LD is inversely related to the age of the allele and risk alleles for common disease are expected on average to be relatively old.
- compositions and methods of the present invention for admixture mapping allow for more precise and reliable mapping of complex traits.
- Admixture mapping takes advantage of the LD created when previously isolated populations admix, and can circumvent these problems in mapping complex traits. It was first recognized that admixed populations could be useful in determining genetic linkage by Chakraborty and Weiss (supra, 1988). When genetically divergent populations hybridize, non-random allelic associations result among loci that have significant allele frequency differentials, even among unlinked loci. This LD quickly decays when the genetic loci in question are not located close together on the same chromosome.
- the linkage disequilibrium at unlinked loci is reduced to 0.1% of the initial level, while at loci 10 cM and 1 cM apart, the disequilibrium due to true linkage will still be 34.9% and 90.4%, respectively, of the initial level.
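The percentages above are consistent with the standard decay rule D_t = D_0 (1 - θ)^t after t = 10 generations (the time scale cited in the next paragraph), taking θ = 0.5 for unlinked loci and θ = cM/100 for the linked pairs. That reading is an assumption of this sketch rather than something the text states; with a map function such as Haldane's, 10 cM corresponds to θ ≈ 0.091 and the remaining fraction would be somewhat higher. The function name is hypothetical.

```python
def ld_fraction_remaining(theta, generations):
    """Fraction of initial admixture LD remaining after the given number of
    generations of random mating: D_t / D_0 = (1 - theta) ** t."""
    if not 0.0 <= theta <= 0.5:
        raise ValueError("recombination fraction must lie in [0, 0.5]")
    return (1.0 - theta) ** generations


# Assumes t = 10 generations and theta = cM / 100 for the linked pairs
for label, theta in (("unlinked", 0.5), ("10 cM", 0.10), ("1 cM", 0.01)):
    print(f"{label:>8}: {100 * ld_fraction_remaining(theta, 10):.1f}% of initial LD")
```

Run as written, this prints 0.1%, 34.9% and 90.4%, matching the figures quoted above.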
- the critical parameters identified for effective detection of linkage in an admixed population are the frequency differential (δ) between the parental populations and the number of generations since hybridization. Linkage by association analysis in admixed populations worked efficiently if δ was large (not less than 0.2) and the number of generations since admixture was small (on the order of 10 generations; Chakraborty and Weiss, supra, 1988).
- the ancestry-specific allele frequencies are required; i.e., the conditional probability of each allelic state given the ancestry of the allele (West African or European, in this example).
- the total population of alleles at any locus in the admixed population can be considered to be made up of two subpopulations - alleles of African ancestry and alleles of European ancestry.
- Bayes' theorem can be applied to invert these conditional probabilities and calculate the posterior distribution of ancestry at the locus (0, 1 or 2 alleles of African ancestry) for each individual under study. If the information conveyed by typing a single marker is not sufficient to assign the ancestry of each allele at the marker locus to one of the two founding populations, markers can be combined in a multipoint analysis to estimate ancestry at adjacent loci.
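A single-marker version of this Bayesian step can be sketched as below. The sketch assumes two parental populations, Hardy-Weinberg proportions within each ancestry state, and a prior in which each allele is independently of African ancestry with probability equal to the individual's admixture proportion; the function name and example frequencies are hypothetical. The multipoint analysis described above would replace this independent prior with one that accounts for linkage between adjacent markers (e.g., a hidden Markov model), which is not shown.

```python
def ancestry_posterior(genotype, p_afr, p_eur, m_afr):
    """Posterior probabilities of 0, 1 or 2 African-ancestry alleles at one locus.

    genotype: copies (0/1/2) of allele A; p_afr, p_eur: frequency of allele A
    in each parental population; m_afr: prior African admixture proportion."""
    # Prior: each allele is independently of African ancestry with prob. m_afr
    prior = [(1 - m_afr) ** 2, 2 * m_afr * (1 - m_afr), m_afr ** 2]

    def genotype_prob(f1, f2):
        # P(genotype) when one allele has A-frequency f1 and the other f2
        return {2: f1 * f2,
                1: f1 * (1 - f2) + (1 - f1) * f2,
                0: (1 - f1) * (1 - f2)}[genotype]

    likelihood = [genotype_prob(p_eur, p_eur),   # 0 African-ancestry alleles
                  genotype_prob(p_afr, p_eur),   # 1 African-ancestry allele
                  genotype_prob(p_afr, p_afr)]   # 2 African-ancestry alleles
    joint = [pr * lk for pr, lk in zip(prior, likelihood)]
    total = sum(joint)
    return [j / total for j in joint]


print(ancestry_posterior(2, p_afr=0.9, p_eur=0.1, m_afr=0.5))
```

For an AA homozygote at a marker with A-frequency 0.9 in Africans and 0.1 in Europeans, and a 50% prior, the posterior is approximately 0.01, 0.18 and 0.81 for 0, 1 and 2 African-ancestry alleles.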
- Table 1 lists an initially identified panel of 32 AIMs (SEQ ID NOS:332 to 363; see, also, Example 1). Using a cutoff of d > 0.3, only four of these markers are restricted in informativeness to one of the three comparisons (African/European; African/Native American; Native American/European); the rest are informative for two of the comparisons, and one marker is informative for all three comparisons. In a further study, a panel of 71 AIMs was identified (SEQ ID NOS:1 to 71; Table 6) that are informative as to IndoEuropean, sub-Saharan African, Native American, and East Asian ancestry (see Example 2).
- African continent contains a tremendous amount of genetic diversity.
- the majority of enslaved Africans came from West-Central Africa, approximately from Senegal in the North, to Angola in the South (Curtin, In The Atlantic Slave Trade; Madison, University of Wisconsin Press 1969); other areas of Africa were
- heterogeneity can affect an admixture mapping effort.
- heterogeneity can lead to erroneous estimates of the parental frequencies for the markers used in the map, thus biasing the estimate of admixture.
- because the goal of admixture mapping is to infer linkage conditional on parental admixture, it is important to avoid misspecifying the ancestry-specific allele frequencies, which could otherwise affect the final outcome of the analysis.
- heterogeneity can affect the number of loci for the phenotype being studied.
- heterogeneity due to the presence of multiple genes (locus heterogeneity) affecting a phenotype will reduce the power of admixture mapping to detect significant genotypic effects, as it does with any other mapping method.
- Heterogeneity also can be due to multiple functional alleles within a particular gene (allelic heterogeneity).
- MC1R: melanocortin 1 receptor
- these variants are on different haplotype backgrounds, thus decreasing the power to detect an effect of the MC1R gene in association studies relative to the case where a single mutation had occurred and risen to high frequencies.
- these variants will all be in allelic association with markers informative for ancestry (e.g., the MC1R marker, see Table 1) and, since they all have the effect of lightening the skin, their information will be compounded, making the identification of MC1R by admixture mapping no different with six functional variants than were there only one functional variant unique to Europeans. So long as the effects of functional variants within a particular parental population are in the same direction (for example, in lowering the risk of disease), allelic heterogeneity will not be a serious problem in admixture mapping.
- the frequency differential (δ) is equal to p_X - p_Y, which is equal to q_Y - q_X, where p_X and p_Y are the frequencies of one allele in populations X and Y, and q_X and q_Y are the frequencies of the other allele.
- Median δ levels among major ethnic groups range between 15% and 20%, and the vast majority (> 95%) of arbitrarily identified biallelic genetic markers have δ < 50% (Dean et al., supra, 1994, which is incorporated herein by reference).
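The frequency differential defined above, together with the "not less than 0.2" threshold attributed to Chakraborty and Weiss (1988), can be applied to candidate markers as in this sketch. The marker names and frequencies are hypothetical, and comparing the absolute value of δ (so that the direction of the difference does not matter) is an assumption of the sketch.

```python
def frequency_differential(p_x, p_y):
    """delta = p_x - p_y; for a biallelic marker this equals q_y - q_x."""
    return p_x - p_y


# Hypothetical frequencies of one allele in populations X and Y
markers = {
    "marker_1": (0.85, 0.10),
    "marker_2": (0.40, 0.30),
    "marker_3": (0.05, 0.60),
}
THRESHOLD = 0.2  # "not less than 0.2" (Chakraborty and Weiss, 1988)

# Keep markers whose absolute frequency differential meets the threshold
informative = {
    name: abs(frequency_differential(px, py))
    for name, (px, py) in markers.items()
    if abs(frequency_differential(px, py)) >= THRESHOLD
}
print(sorted(informative))
```

Here marker_2 (δ = 0.10) falls below the threshold, while marker_1 and marker_3 are retained.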
- Statistical estimates of power in an admixture mapping study based on using markers with an Fst > 0.4 were previously presented (McKeigue et al., supra, 2000).
- Controlling for genetic structure in admixed populations requires knowledge of the ancestral proportions and the genetic structure of these populations. Reliable estimates of admixture proportions can allow the informed identification of populations to consider. Since the admixture LD that is created during hybridization is dependent on the level of admixture, sampling should focus generally on those areas of the country where there has been more admixture.
- a homogeneous population is one in which there is no assortative mating, a panmictic population in which families are formed more or less by random combination and without regard to DNA genotypes. In most large cosmopolitan populations, homogeneity is expected and found. If, however, there exists stratification within the population such that individuals do not mate at random, the population will not be homogeneous. Admixture is one of the possible mechanisms introducing genetic structure in a population, and taking into account this genetic structure facilitates admixture mapping.
- the effect of genetic structure is considered at two levels.
- parental populations are evaluated to determine whether they show heterogeneity in the allele frequencies of the selected AIMs; heterogeneity can affect the estimate of admixture proportions, as discussed above.
- GC: genomic control
- SA: structured association
- the SA method (Pritchard et al., supra, 2000; Pritchard and Donnelly, supra, 2001) was used to test for genetic structure in the parental populations. This method uses the genotypic information provided by unlinked markers to infer population structure, and has been implemented in a software program available from Jonathan Pritchard. In addition to testing for the presence of structure, the program estimates individual ancestry proportions; for the present studies, this Bayesian method was used to complement the maximum likelihood estimation method. The two methods produce estimates of individual ancestry that are highly correlated.
- the second source of genetic structure in admixed populations is due to the admixture process itself, in which newly created linkage disequilibrium is introduced in the admixed population.
- AIMs, such as those exemplified herein, are particularly sensitive indicators of population structure that is related to ancestral proportions.
- samples are tested for the non-random association of alleles both within a locus (Hardy-Weinberg disequilibrium) and among loci (gametic disequilibrium), and the distribution of individual ancestry estimates also is examined (see, Pfaff et al., supra, 2001; Parra et al., supra, 2001).
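Testing for Hardy-Weinberg disequilibrium at a single biallelic locus is commonly done with a chi-square goodness-of-fit test, sketched below. The text does not state which test was used, so this is one conventional choice rather than the method of the studies described; an exact test is often preferred for small samples. The genotype counts are hypothetical and show a heterozygote deficit, the pattern expected when a sample pools subpopulations with different allele frequencies (the Wahlund effect).

```python
import math


def hwe_chi_square(n_aa, n_ab, n_bb):
    """Chi-square goodness-of-fit test of Hardy-Weinberg proportions at one
    biallelic locus (1 degree of freedom). Returns (statistic, p_value)."""
    n = n_aa + n_ab + n_bb
    if n == 0:
        raise ValueError("no genotypes supplied")
    p = (2 * n_aa + n_ab) / (2 * n)          # frequency of allele A
    q = 1.0 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Upper tail of a chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2))
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


stat, p_value = hwe_chi_square(50, 20, 30)   # hypothetical counts: AA, AB, BB
print(f"chi-square = {stat:.2f}, p = {p_value:.2g}")
```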
- Senegambia (Gambia and Senegal), Sierra Leone (Guinea and Sierra Leone), Windward Coast (Ivory Coast and Liberia), Gold Coast (Ghana), Bight of Benin (From the Volta river to the Benin river), Bight of Biafra (East of Benin river to Gabon), and Angola (Southwest Africa, including part of Gabon, Congo and Angola).
- Curtin (supra, 1969) has offered, based on data on the English trade of the 18th century (the peak of the Atlantic slave trade), estimates of the proportional contribution by areas, showing that Angola and Bight of Biafra were the regions contributing the highest numbers of slaves imported into the North American mainland (around 25% each). However, there were significant differences in ethnic origin depending on the port of entry in the United States, and the figures for the Colonies of Virginia and South Carolina differed considerably.
- the term "Hispanic" was coined mainly for governmental demographic purposes, and is generally employed to identify persons of Latin American origin or descent living in the United States. Although this definition lumps together people with very different historical, cultural and linguistic backgrounds, this classification has been widely used. Even though Central America, the Caribbean, and South America were for centuries under the domination of the Iberian imperial powers (Spain and Portugal), they have had quite different regional histories, both before and after the Colonial period. Populations from four continents (North America, South America, Europe, and Africa) have contributed to the formation of contemporary Hispanic populations.
- The anthropological background of the main three Hispanic groups currently living in the United States (Mexican Americans, Puerto Ricans and Cuban Americans, which together make up more than 80% of the total US Hispanic population) is considered here.
- African slaves were imported to work in the sugar plantations in large numbers, even outnumbering the population of European origin (Kanellos and Perez, In Chronology of Hispanic- American history: from pre-Columbian times to the present; New York, Gale Research 1995). Accordingly, the percentage of African genetic contribution in contemporary Cubans (20%) and Puerto Ricans (37%) is significantly higher than in other Hispanic populations (Hanis et al., supra, 1991).
- race is a complex concept and, in general usage, reflects both a cultural and biological feature of a person or group of people. Given the fact that physical differences between populations are often accompanied by cultural differences, it has been difficult to separate these two elements. There has been a movement in several fields of science to oversimplify the issue declaring that race is merely a social construct. While this often can be true, depending on what aspect of variation between people is being considered, it can be false for many particular instances of differences between the populations of the world.
- one example of a biological difference is skin color. Culture or environment has almost no effect on the level of pigmentation in a person's skin. Yet there are dramatic differences across populations. Pigmentation is, of course, only skin deep and is quite simple in light of the complex environments in which we live and how these affect individual and group quality of life.
- the present invention provides methods of estimating proportional ancestry of at least two ancestral groups of a test individual and, in particular, provides a confidence level with respect to the proportional ancestry.
- a method of the invention can be performed by contacting a sample, which includes nucleic acid molecules of the test individual, with hybridizing oligonucleotides that can detect nucleotide occurrences of SNPs of a panel of at least about ten AIMs that are indicative of BGA for each ancestral group examined, wherein the contacting is under conditions suitable for detecting the nucleotide occurrences of the AIMs of the test individual by the hybridizing oligonucleotides; and identifying, with a predetermined level of confidence, a population structure that correlates with the nucleotide occurrences of the AIMs of each of the ancestral groups examined, wherein the population structure is indicative of proportional ancestry.
- BGA: biogeographical ancestry
- BGA is a simple and objective description of the ancestral origins of a person, in terms of the major population groups (e.g., Native American, East Asian, Indo-European, and sub-Saharan African). BGA estimates can represent the mixed nature of many people and populations today. In many countries, including the United States, there has been extensive mixing among populations that initially had been separate. The term "admixture" is used herein to refer to such population mixing. In this respect, BGA estimates can be understood as individual admixture proportions, which take the form of a series of percentages that add to 100%. For example, a person can have 75% Indo-European, 15% African, and 10% Native American ancestry, or can have 100% Indo-European ancestry, or the like.
- the proportional ancestry estimated according to a method of the invention can be a proportion of any ancestral group, including, for example, a proportion of sub-Saharan African, Native American, IndoEuropean, East Asian, Middle Eastern, or Pacific Islander ancestral group, and generally is a combination of two or more of such ancestral groups.
- the proportional ancestry of a test individual can include proportional affiliation among the sub-Saharan African and IndoEuropean ancestral groups (e.g., 80% sub-Saharan African and 20% IndoEuropean; or 60% sub-Saharan African, 20% IndoEuropean, and 20% of a third ancestral group); or can include proportional affiliation among the Native American and IndoEuropean ancestral groups; East Asian and Native American ancestral groups; IndoEuropean and East Asian ancestral groups; and the like.
- a panel of AIMs useful for estimating proportional ancestry of an individual can include AIMs as set forth in SEQ ID NOS:1 to 331, for example, AIMs as set forth in SEQ ID NOS:1 to 71, which can be useful for determining proportional ancestries including IndoEuropean, sub-Saharan African, East Asian, and Native American; or AIMs as set forth in SEQ ID NOS:7, 21, 23, 27, 45, 54, 59, 63, and 72 to 152, which can be useful for determining proportional ancestry of East Asians and sub-Saharan Africans; or in SEQ ID NOS:3, 8, 9, 11, 12, 33, 40, 59, 63, and 153 to 239, which can be useful for determining proportional ancestry of East Asians and IndoEuropeans; or in SEQ ID NOS:1, 8, 11, 21, 24, 40, 172, and 240 to 331, which can be useful for determining proportional ancestry of
- An estimate can be made, for example, of an individual's proportional ancestry with respect to three ancestral groups.
- identifying a population structure within an individual that correlates with the nucleotide occurrences of the AIMs of the test individual can be practiced by performing a likelihood determination for affiliation with each of a sub-Saharan African ancestral group, a Native American ancestral group, an IndoEuropean ancestral group, and an East Asian ancestral group; thereafter selecting three ancestral groups having a greatest likelihood value for the individual; determining a likelihood of all possible proportional affiliations among the three ancestral groups having the greatest likelihood value, whereby a population structure or proportional affiliation that correlates with the nucleotide occurrences of the AIMs of the test individual is identified; and identifying a single proportional combination of maximum likelihood.
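The final steps above (scoring all proportional affiliations among the three selected groups and picking the single combination of maximum likelihood) can be sketched as a grid search over proportions that sum to 1. The sketch assumes unlinked biallelic AIMs in Hardy-Weinberg proportions, known ancestry-specific allele frequencies, and a 1% grid; the function name and example frequencies are hypothetical, and the preliminary step of choosing three of the four groups is not shown.

```python
import math


def three_way_mle(genotypes, freqs, step=0.01):
    """Grid-search the proportions (w1, w2, w3), summing to 1, that maximize the
    likelihood of biallelic genotypes (0/1/2 copies of a reference allele).

    freqs: one (p1, p2, p3) tuple per locus giving the reference-allele
    frequency in each of the three selected ancestral groups."""
    n = int(round(1.0 / step))
    best, best_ll = None, -math.inf
    for i in range(n + 1):
        for j in range(n + 1 - i):
            w = (i / n, j / n, (n - i - j) / n)
            ll = 0.0
            for g, p in zip(genotypes, freqs):
                f = sum(wk * pk for wk, pk in zip(w, p))   # expected allele frequency
                f = min(max(f, 1e-12), 1.0 - 1e-12)        # guard against log(0)
                # Binomial coefficient omitted: it does not depend on w
                ll += g * math.log(f) + (2 - g) * math.log(1.0 - f)
            if ll > best_ll:
                best, best_ll = w, ll
    return best, best_ll


# Hypothetical AIMs, each with a high reference-allele frequency in one group
freqs = [(0.9, 0.1, 0.1), (0.1, 0.9, 0.1), (0.1, 0.1, 0.9)]
best, ll = three_way_mle([2, 0, 0], freqs)
print(best, round(ll, 3))
```

A 1% grid over three groups has 5,151 points, so exhaustive scoring is cheap; the full grid of log-likelihoods is also what a confidence contour would be drawn from.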
- identifying a population structure that correlates with the nucleotide occurrences of the ATMs can be practiced by performing six two-way (binary) comparisons comprising likelihood determinations for affiliation of each group compared to each other group; thereafter selecting three ancestral groups having a greatest likelihood value across all comparisons; determining a likelihood of all possible proportional affiliations among the three ancestral groups having the greatest likelihood value, whereby a population structure or proportional affiliation that correlates with the nucleotide occurrences of the AfMs of the test individual is identified; and identifying a single proportional combination of maximum likelihood.
- Such a methodology works as well for individuals of three-way admixture as individuals that are 100% affiliated with a single group.
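The three-step procedure above (score each group, keep the top three, search all proportional combinations of those three) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the software of the invention: it assumes Hardy-Weinberg genotype probabilities, independent loci, an admixed allele frequency equal to the proportion-weighted average of the group frequencies, a 5% grid, and hypothetical allele frequencies in `FREQS`.

```python
import math

# Hypothetical frequencies of a counted allele at four AIMs in each ancestral
# group; illustrative values only, not data from this document.
FREQS = {
    "sub-Saharan African": [0.90, 0.10, 0.70, 0.20],
    "IndoEuropean": [0.20, 0.80, 0.30, 0.60],
    "East Asian": [0.10, 0.30, 0.90, 0.50],
    "Native American": [0.15, 0.60, 0.80, 0.90],
}


def log_likelihood(genotype, freqs):
    """Log-likelihood of per-locus allele counts (0, 1 or 2 copies), assuming
    Hardy-Weinberg proportions and independent loci."""
    total = 0.0
    for count, p in zip(genotype, freqs):
        p = min(max(p, 1e-6), 1 - 1e-6)  # keep log() finite at p = 0 or 1
        total += (math.log(math.comb(2, count))
                  + count * math.log(p) + (2 - count) * math.log(1 - p))
    return total


def best_three_way_mixture(genotype, freqs_by_group, steps=20):
    # 1. Likelihood of complete affiliation with each single group.
    single = {g: log_likelihood(genotype, f) for g, f in freqs_by_group.items()}
    top3 = sorted(single, key=single.get, reverse=True)[:3]
    # 2. Likelihood of every proportional combination of the top three groups
    #    on a grid with resolution 1/steps (5% steps when steps=20).
    best_ll, best_props = -math.inf, None
    for i in range(steps + 1):
        for j in range(steps + 1 - i):
            props = (i / steps, j / steps, (steps - i - j) / steps)
            # Assumed admixture model: proportion-weighted group frequencies.
            mixed = [sum(w * freqs_by_group[g][k] for w, g in zip(props, top3))
                     for k in range(len(genotype))]
            ll = log_likelihood(genotype, mixed)
            if ll > best_ll:
                best_ll, best_props = ll, dict(zip(top3, props))
    # 3. The single proportional combination of maximum likelihood.
    return best_props
```

For the hypothetical genotype `[2, 0, 2, 0]`, which fits the sub-Saharan African frequencies far better than the others, `best_three_way_mixture([2, 0, 2, 0], FREQS)` places 100% on that group. A larger `steps` gives finer proportions at the cost of more likelihood evaluations.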
- An estimate of an individual's proportional ancestry that includes proportions of three ancestral groups also can be made by performing three three-way comparisons among the groups; determining a likelihood of all possible proportional affiliations among the three ancestral groups having the greatest likelihood value, whereby a population structure or proportional affiliation that correlates with the nucleotide occurrences of the AIMs of the test individual is identified; and identifying a single proportional combination of maximum likelihood.
- An advantage of the present methods is that a graphical representation of the comparison of the three ancestral groups can be generated, wherein the graphical representation comprises a triangle with each ancestral group independently represented by a vertex of the triangle, and wherein the maximum likelihood value of proportional affiliation for an individual comprises a point within the triangle (see Figures 2 and 3). If desired, the graphical representation can further include a confidence contour that indicates a level of confidence associated with estimating the proportional ancestry.
- an estimate of an individual's proportional ancestry also can be made where the proportional ancestry includes proportions of four ancestral groups.
- identifying a population structure that correlates with the nucleotide occurrences of the AIMs of the test individual is practiced by performing six two-way comparisons, or three three-way comparisons, or one four-way comparison among the groups; determining a likelihood of all possible proportional affiliations among the four ancestral groups having the greatest likelihood value, whereby a population structure or proportional affiliation that correlates with the nucleotide occurrences of the AIMs of the test individual is identified; and identifying a single proportional combination of maximum likelihood.
- the method can further include generating a graphical representation of the comparison of the four ancestral groups, wherein the graphical representation comprises a pyramid with each ancestral group independently represented by a vertex of the pyramid, and wherein the maximum likelihood value of proportional affiliation for an individual comprises a point within the pyramid.
- the graphical representation can further include a confidence contour comprising a sphere around the point, wherein the sphere indicates a level of confidence associated with estimating the proportional ancestry.
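Both the triangle (three groups) and the pyramid (four groups) place a proportional affiliation at the proportion-weighted average of the vertex positions, so a 100% affiliation sits on its vertex and any mixture falls inside. A minimal sketch, assuming an equilateral triangle and a regular tetrahedron with hypothetical coordinates; the document does not specify the actual plotting layout.

```python
import math

# Hypothetical vertex layouts: an equilateral triangle (three groups) and a
# regular tetrahedron, i.e. the "pyramid" (four groups), both with edge length 1.
TRIANGLE = [(0.0, 0.0), (1.0, 0.0), (0.5, math.sqrt(3) / 2)]
PYRAMID = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.5, math.sqrt(3) / 2, 0.0),
           (0.5, math.sqrt(3) / 6, math.sqrt(2 / 3))]


def simplex_point(proportions, vertices):
    """Place a proportional ancestry inside the triangle or pyramid: the point
    is the proportion-weighted average of the vertex coordinates."""
    if len(proportions) != len(vertices):
        raise ValueError("need one proportion per vertex")
    if abs(sum(proportions) - 1.0) > 1e-9:
        raise ValueError("proportions must sum to 1")
    dim = len(vertices[0])
    return tuple(sum(w * v[k] for w, v in zip(proportions, vertices))
                 for k in range(dim))
```

For example, `simplex_point((0.0, 0.85, 0.15), TRIANGLE)` gives the plotting position of an individual estimated at 85% of the second group and 15% of the third; the same function handles four proportions with `PYRAMID`.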
- the present methods provide substantially more information for forensics because, using a DNA sample obtained at a crime scene, the methods can provide an investigator with prospective information as to the likelihood of an individual's ancestry, as well as hair, skin, and eye pigmentation.
- present DNA methods provide only retrospective information because they require that a DNA sample from a crime scene be compared with DNA samples contained in a database or taken from specific individuals.
- although the latter methods can provide confirmation that a suspect is likely the perpetrator of a crime, they provide no useful information until the suspect is apprehended, except in cases where the suspect's DNA sample already has been entered into a database.
- the methods of estimating proportional ancestry of a test individual as disclosed herein also provide a tool that can supplement genealogical information, which generally is based on relationships established using geopolitical information (see Example 3).
- the present methods provide information that can be used to generate an ancestral map of the world, wherein locations of populations having a proportional ancestry corresponding to the proportional ancestry of the test individual are indicated on the ancestral map.
- the method can further include overlaying the ancestral map with a genealogical map, wherein the genealogical map indicates locations of populations having geopolitical relevance with respect to the test individual, and statistically combining the information of the ancestral map and genealogical map to obtain a most likely estimate of family history of the test individual.
- Identifying a population structure that correlates with the nucleotide occurrences of the AIMs can be performed by comparing the nucleotide occurrences of the AIMs of the test individual with known proportional ancestries corresponding to nucleotide occurrences of AIMs indicative of BGA.
- the known proportional ancestries corresponding to nucleotide occurrences of AIMs indicative of BGA can be contained in a table or other list, and the nucleotide occurrences of the test individual can be compared to the table or list visually; or they can be contained in a database, and the comparison can be made electronically, for example, using a computer.
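An electronic version of this comparison might look like the sketch below. The matching rule (counting identical genotype calls), the table layout, and all names are hypothetical illustrations; the document does not specify how the comparison is scored.

```python
# Hypothetical reference table: genotypes at the same AIMs, in the same order
# and with alleles written in a consistent order, paired with a known
# proportional ancestry.
REFERENCE_TABLE = [
    (("AA", "CT", "GG", "TT"), {"sub-Saharan African": 0.8, "IndoEuropean": 0.2}),
    (("GG", "CC", "AG", "CT"), {"IndoEuropean": 0.9, "Native American": 0.1}),
    (("AG", "TT", "AA", "CC"), {"East Asian": 0.7, "Native American": 0.3}),
]


def closest_reference(test_genotypes, reference_table):
    """Return the known proportional ancestry whose genotypes agree with the
    test individual's at the largest number of AIMs (ties: first entry wins)."""
    def concordance(row):
        ref_genotypes, _ = row
        return sum(t == r for t, r in zip(test_genotypes, ref_genotypes))
    _, ancestry = max(reference_table, key=concordance)
    return ancestry
```

A plain concordance count ignores how informative each AIM is; a likelihood-based comparison would weight markers by their allele-frequency differences.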
- a particularly useful application of a method of the invention involves associating known proportional ancestries corresponding to nucleotide occurrences of AIMs indicative of BGA of individuals with a photograph of the person from whom the known proportional ancestry was determined, thus providing a means to further infer physical characteristics of a test individual.
- the photograph is a digital photograph, which comprises digital information that can be contained in a database that can further contain a plurality of such digital information of digital photographs, each of which is associated with a known proportional ancestry corresponding to nucleotide occurrences of AIMs indicative of BGA of the person in the photograph.
- a method of the invention can further include identifying a photograph of a person having a proportional ancestry corresponding to the proportional ancestry of the test individual. Such identifying can be done by manually looking through one or more files of photographs, wherein the photographs are organized, for example, according to the nucleotide occurrences of AIMs of the person in the photograph.
- Identifying the photograph also can be performed by scanning a database comprising a plurality of files, each file containing digital information corresponding to a digital photograph of a person having a known proportional ancestry, and identifying at least one photograph of a person having nucleotide occurrences of AIMs indicative of BGA that correspond to the nucleotide occurrences of AIMs indicative of BGA of the test individual.
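Scanning such a database for photographs of people whose known proportional ancestry corresponds to the test individual's could be sketched as below. The record layout, the "every group within a tolerance" rule, and the 10% default tolerance are hypothetical choices for illustration only.

```python
from dataclasses import dataclass, field


@dataclass
class PhotoRecord:
    file_name: str  # hypothetical file holding the digital photograph
    ancestry: dict = field(default_factory=dict)  # known proportional ancestry


def matching_photos(test_ancestry, records, tolerance=0.10):
    """Photographs whose known proportions differ from the test individual's by
    at most `tolerance` in every ancestral group, closest match first."""
    def max_difference(record):
        groups = set(test_ancestry) | set(record.ancestry)
        return max(abs(test_ancestry.get(g, 0.0) - record.ancestry.get(g, 0.0))
                   for g in groups)
    scored = [(max_difference(r), r) for r in records]
    return [r for d, r in sorted(scored, key=lambda pair: pair[0]) if d <= tolerance]
```

Groups missing from a record are treated as 0%, so a record with an extra ancestral group is penalized by that group's full proportion.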
- BGA can be determined using any of several variations of the disclosed BGA test, including three BGA tests referred to as the ANCESTRYbyDNA™ 1.0 test, the ANCESTRYbyDNA™ 2.0 test, and the ANCESTRYbyDNA™ 3.0 test (DNAPrint genomics, Inc.; Sarasota FL), which utilize selected panels of Ancestry Informative Markers (AIMs) that have been characterized in a large number of well-defined population samples.
- the AIMs are selected on the basis of substantial differences in frequency between population groups and, as such, provide information as to the origin of a particular person whose ancestry is otherwise unknown.
- the Duffy Null allele (FY*0) is very common (approaching fixation or an allele frequency of 100%) in all sub-Saharan African populations, but is not found outside of Africa. Thus, a person with this allele is very likely to have some level of African ancestry.
- a likelihood (or probability) can be determined that the person is derived from particular parental populations by calculating all of the possible mixes of parental populations. The population (or combination of populations) where the likelihood is the highest is taken as the best estimate of the ancestral proportions of the person; confidence intervals on these point estimates of ancestral proportions are also calculated.
- An objective assessment of the biological component of human ancestry provides important knowledge about the person whose DNA is examined. For example, an analysis of the biological component of ancestry can elucidate health disparities by identifying, for example, genetic contributions to the higher rates of hypertension and diabetes in African Americans, or the higher rates of dementia in European Americans. Estimates of BGA also can help connect individuals separated by adoption or some other event with their ancestral populations. And even if a person is not particularly motivated to reconnect with ancestors, he or she can uncover the past of their family, for example, to verify family legends or identify forgotten roots. Because the disclosed method is based on an analysis of DNA, it provides a personal demographics tool, which, unlike a census, can provide highly accurate demographics data.
- the BGA test of the invention utilizes sequences throughout a person's genome and, therefore, can provide information about a greater number of ancestors.
- the present invention provides a method of estimating, with a predetermined level of confidence, proportional ancestry of at least two ancestral groups of a test individual.
- a biogeographical ancestry test or "BGA test"
- the BGA test can be performed, for example, by contacting a sample, which includes nucleic acid molecules of the test individual, with hybridizing oligonucleotides that can detect nucleotide occurrences of SNPs of a panel of at least about ten AIMs that are indicative of BGA for each ancestral group examined, wherein the contacting is under conditions suitable for detecting the nucleotide occurrences of the AIMs of the test individual by the hybridizing oligonucleotides; and by identifying, with a predetermined level of confidence, a population structure that correlates with the nucleotide occurrences of the AIMs of each of the ancestral groups examined, wherein the population structure is indicative of proportional ancestry.
- proportional ancestry refers to the percent contribution of each (if more than one) ancestral group to which an individual belongs.
- the proportional ancestry estimated according to a method of the invention can be a proportion of any ancestral group, including, for example, a proportion of sub-Saharan African, Native American, IndoEuropean, East Asian, Middle Eastern, or Pacific Islander ancestral group, and generally is a combination of two or more of such ancestral groups.
- the proportional ancestry of a test individual can include proportions of sub-Saharan African and IndoEuropean ancestral groups (e.g., 80% sub-Saharan African and 20% IndoEuropean; or 60% sub-Saharan African, 20% IndoEuropean, and 20% of a third ancestral group); or can include proportions of Native American and IndoEuropean ancestral groups; East Asian and Native American ancestral groups; IndoEuropean and East Asian ancestral groups; and the like.
- the proportional ancestry can include proportions of Native American, East Asian, and IndoEuropean ancestral groups; sub-Saharan African, Native American, and IndoEuropean ancestral groups; sub-Saharan African, Native American, and East Asian ancestral groups; and the like.
- a panel of AIMs useful for estimating proportional ancestry of an individual can include AIMs as set forth in SEQ ID NOS:1 to 331, for example, AIMs as set forth in SEQ ID NOS:1 to 71, which can be useful for determining proportional ancestries including IndoEuropean, sub-Saharan African, East Asian, and Native American.
- the AIMs as set forth in SEQ ID NOS:7, 21, 23, 27, 45, 54, 59, 63, and 72 to 152 can be useful for determining proportional ancestry of East Asians and sub-Saharan Africans;
- the AIMs as set forth in SEQ ID NOS:3, 8, 9, 11, 12, 33, 40, 59, 63, and 153 to 239 can be useful for determining proportional ancestry of East Asians and IndoEuropeans;
- the AIMs as set forth in SEQ ID NOS:1, 8, 11, 21, 24, 40, 172, and 240 to 331 can be useful for determining proportional ancestry of IndoEuropeans and sub-Saharan Africans.
- the ANCESTRYbyDNA™ 1.0 test (DNAPrint genomics, Inc.) is a first version of the BGA test that was specifically designed to provide information on the proportions of ancestry at the continental level.
- the ANCESTRYbyDNA™ 1.0 test allowed information to be obtained as to levels of Native American, European, and African ancestry, as three component groups.
- the ANCESTRYbyDNA™ 2.0 test, in comparison, provides information on the proportions of ancestry at the continental level for most continents, including Native American, Indo-European (which includes European, Middle Eastern, and South Asian groups such as Indians), African, and East Asian (which includes Pacific Islanders), and can distinguish ancestries within Asia and the Pacific Rim.
- the ANCESTRYbyDNA™ 3.0 test can further define the levels of ancestry within continents, for example, by distinguishing Japanese from Chinese, or Northern European from Middle Eastern, thus providing greater insight into where within a particular continent a person's ancestors were derived.
- for the ANCESTRYbyDNA™ 2.0 test, a logical grouping into four BGA delineations was made, wherein South Asian, Middle Eastern, and European are grouped into a single group called IndoEuropean (see Example 2). This grouping was based on anthropological evidence and cultural connections among these groups (e.g., their languages are derived from a common base). The results disclosed herein demonstrate that these groups are far more similar to one another in genetic sequence content than to other groups.
- the ANCESTRYbyDNA™ 2.0 test also performs more accurately when Pacific Islanders are grouped with East Asians.
- the four groupings used in the ANCESTRYbyDNA™ 2.0 test include 1) Native American (i.e., those who migrated to inhabit South and North America); 2) IndoEuropean (Europeans, Middle Easterners, and South Asians such as Indians); 3) East Asians (Japanese, Chinese, Koreans, Pacific Islanders); and 4) Africans (sub-Saharan).
- the ANCESTRYbyDNA™ 3.0 test can further distinguish between South Asian and European, and between Pacific Islander and East Asian, thus providing six proportions (Native American, European, African, South Asian, East Asian, and Pacific Islander), although the confidence intervals are larger than those obtained with the ANCESTRYbyDNA™ 2.0 test. Further improvements to the tests reduce these confidence intervals: analyzing a complementary panel improves (narrows) the confidence intervals around a point estimate by about 50%, thus increasing the accuracy of the test.
- the algorithm used to determine the ancestral proportions was developed based on the idea that it is possible to use certain statistical methods to make an inference of the proportionality of ancestry in an individual sample based on their sequence (see Example 6; see, also, Table 12).
- the method of making this inference using the present algorithm is similar to those of others, wherein, if the frequency of an allele in a population is known, and this frequency is significantly different from population to population, a "Maximum Likelihood Estimation" (MLE) can be used to determine the probability that a person with the allele belongs to one of the groups. Expanded to include multiple alleles from multiple genetic loci and multiple populations, the process is the same.
- Bayes' theorem states that the probability of an event given a circumstance (called a posterior probability) is a function of the frequency of the circumstance given the event (a conditional probability) and the frequency of the event itself (the prior probability).
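Written as a formula, with the vector of ancestral proportions $\mathbf{q}$ as the event and the multilocus genotype $G$ as the circumstance (notation introduced here for illustration, with the sum running over a discrete set of candidate proportions), Bayes' theorem reads:

$$
P(\mathbf{q} \mid G) = \frac{P(G \mid \mathbf{q})\,P(\mathbf{q})}{\sum_{\mathbf{q}'} P(G \mid \mathbf{q}')\,P(\mathbf{q}')}
$$

With a uniform prior $P(\mathbf{q})$, the prior and the denominator are the same for every candidate, so the proportions that maximize the posterior are exactly those that maximize the conditional probability $P(G \mid \mathbf{q})$, i.e., the maximum likelihood estimate.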
- the event is a proportionality of ancestry, and the circumstance is the genotype of the individual. If the minor allele frequency for 10 SNPs in 2 populations of human beings is known, and the sequence of a person at each of the 10 SNPs is known, a simple binary classification into one of the two groups can be made by choosing the group for which the conditional probability is highest. This would offer little improvement over current methods for determining the BGA from a DNA sample. What the present invention provides is the ability to obtain the proportionality of ancestry for more complex and realistic scenarios of ancestry.
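A minimal sketch of that simple binary classification, assuming Hardy-Weinberg genotype probabilities, independent SNPs, and hypothetical allele frequencies for the 10 SNPs. Genotypes are encoded as the number of copies (0, 1 or 2) of the counted allele.

```python
import math

# Hypothetical frequencies of the counted allele at 10 SNPs in two populations.
FREQ_POP1 = [0.70, 0.65, 0.80, 0.75, 0.60, 0.85, 0.70, 0.90, 0.55, 0.75]
FREQ_POP2 = [0.10, 0.20, 0.15, 0.05, 0.25, 0.10, 0.30, 0.05, 0.20, 0.15]


def genotype_log_prob(count, p):
    """log P(genotype carrying `count` copies of the counted allele), assuming
    Hardy-Weinberg proportions."""
    return (math.log(math.comb(2, count))
            + count * math.log(p) + (2 - count) * math.log(1 - p))


def classify(genotype, freqs1, freqs2):
    """Binary assignment: pick the population under which the genotype has the
    higher conditional probability (loci treated as independent)."""
    ll1 = sum(genotype_log_prob(c, p) for c, p in zip(genotype, freqs1))
    ll2 = sum(genotype_log_prob(c, p) for c, p in zip(genotype, freqs2))
    return ("population 1" if ll1 >= ll2 else "population 2"), ll1 - ll2
```

The returned log-likelihood ratio shows how decisive the call is, but the label alone says nothing about proportions, which is exactly the limitation noted above.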
- the present algorithm addresses this limitation by plotting the MLE graphically, including plotting the confidence regions around the MLE such that a level of confidence can be ascertained (see Figures 2 and 3).
- the algorithm i.e., the software code
- the triangle plot provided by the algorithm is an original method to graphically represent the MLE calculations and their confidence intervals. To read a triangle plot, a perpendicular line is dropped from each vertex (triangle point) of the triangle to the opposite edge (base) of the triangle (see Figure 2A).
- the circle represents the MLE, and a line has been dropped from the Native American (NAM) vertex to the opposite edge; this line serves as a scale for the percentage of Native American ancestry, from 0% at the base to 100% at the vertex (or tip). Projecting the circle onto this line can be likened to holding a flashlight to the right of the triangle at the same level as the circle and observing the shadow the circle casts on the line. Where this shadow falls on the line indicates the percentage of Native American ancestry. In this example, the individual is about 15% Native American, as indicated by the hash mark on the line.
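Reading a proportion off the triangle amounts to measuring how far the MLE point sits along the perpendicular from a vertex: its distance to the opposite edge divided by the length of that perpendicular (the altitude). A sketch with a hypothetical equilateral layout and a point placed 15% of the way up, mirroring the example above:

```python
import math


def distance_to_line(point, a, b):
    """Perpendicular distance from `point` to the line through edge a-b."""
    (x0, y0), (x1, y1), (x2, y2) = point, a, b
    return (abs((x2 - x1) * (y1 - y0) - (x1 - x0) * (y2 - y1))
            / math.hypot(x2 - x1, y2 - y1))


def read_proportion(point, vertex, edge_a, edge_b):
    """Position of the point's 'shadow' on the perpendicular dropped from
    `vertex`: 0.0 at the opposite edge (base), 1.0 at the vertex."""
    return (distance_to_line(point, edge_a, edge_b)
            / distance_to_line(vertex, edge_a, edge_b))


# Hypothetical layout: NAM at the top vertex, the other two groups on the base.
NAM = (0.5, math.sqrt(3) / 2)
BASE_LEFT, BASE_RIGHT = (0.0, 0.0), (1.0, 0.0)
mle = (0.45, 0.15 * math.sqrt(3) / 2)  # a point 15% of the way up the altitude
print(round(read_proportion(mle, NAM, BASE_LEFT, BASE_RIGHT), 2))  # 0.15
```

Repeating the measurement from each of the three vertices recovers all three proportions, which sum to 1 for any point inside the triangle.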
- NAM Native American
- the results provided using the disclosed method provide a statistical estimate of BGA admixture for an individual (the Maximum Likelihood Estimate (MLE)), which is indicated as a point on a triangle plot to represent the proportions of the most relevant three groups for the individual. While the MLE is the most likely estimate, the true value for the individual can be a different set of proportions. A triangle plot with calculated and plotted estimates that are 2 times, 5 times and 10 times less likely than the MLE is exemplified.
- MLE Maximum Likelihood Estimate
- the first contour around the MLE delimits the space within which the estimates are up to 2 times less likely, with those positions near the line reflecting values close to 2 and those near the MLE closer to 1; the second contour around the MLE delimits the space within which the estimates are 5 times less likely in the same graded fashion proceeding from the first contour line to the second contour line.
- the third contour delimits the space within which the estimates are from 5 fold (near the second contour line) up to 10 times less likely (near the third contour line). The greater the number of DNA positions read, the closer these contour lines approach the MLE point.
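Each contour corresponds to a fixed likelihood ratio relative to the MLE (2, 5, and 10 times less likely). The hypothetical helper below assigns a candidate estimate to a band from its log-likelihood and that of the MLE; applied to every point of a likelihood grid, the band boundaries trace the three contours.

```python
import math


def contour_band(log_likelihood, max_log_likelihood):
    """Band of a candidate estimate: 1 if up to 2x less likely than the MLE,
    2 if up to 5x, 3 if up to 10x, None if beyond the outermost contour."""
    ratio = math.exp(max_log_likelihood - log_likelihood)  # times less likely
    for band, limit in enumerate((2, 5, 10), start=1):
        if ratio <= limit:
            return band
    return None
```

Working with log-likelihoods avoids underflow, since the raw likelihood of a multilocus genotype is a product of many small probabilities.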
- moving inward toward the MLE, the likelihood (probability) that the true value is represented by a given point increases until the MLE is reached, where the probability is maximum (i.e., the Maximum Likelihood Estimate; MLE).
- the test can be performed so that the contour lines are very close to the MLE by sequencing a very large collection of markers.
- the survey can be limited to a desired number of markers (e.g., 10, 15, 20, 25, 30, 40, 50, 60, 70, 80, 90, 100, or more) that is sufficient to determine the most likely proportions with good confidence.
- a variety of different panels of 100 SNP markers have been examined, a panel of 71 AIMs has been used in a number of studies, and a panel of 175 AIMs is being examined such that very high confidence is achieved.
- the BGA test of the invention has been validated by determining the frequency of DNA sequence variants in various human populations.
- the test has been evaluated using a large number of people from a wide range of ancestral groups, and the estimates have corresponded well to what is known from anthropological and historical data.
- Hispanics are known to have arisen as an ethnic group from the blending of colonial Europeans with Native Americans, and the hundreds of Hispanics examined using the BGA test aligned with these two groups almost exclusively.
- Nigerians plot as almost purely African in BGA, whereas African Americans plot more as a mixture between this group and Europeans, which is what would be expected from knowledge about the admixture between Africans and Europeans in the United States.
- the method also was validated through pedigree challenge (see Example 1); i.e., when the BGA is determined from a mother and father, that of their children should plot somewhere between the two. Numerous family pedigrees have been examined using the test, and the ancestral proportions of offspring have always plotted between those of the child's parents. When the MLE estimates are tested objectively (blindly), they prove to be excellent estimates of ancestral proportions. For example, the data for a European American man, whose mother is European mix and father is mostly Greek, showed the man to be of 85% European ancestry, but also of 15% Native American ancestry (Example 1).
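The pedigree challenge can be expressed as a simple check. In expectation, a child's autosomal ancestry proportions are the average of the parents' proportions, so each estimated proportion should fall between the parents' values. The function and the example values below are hypothetical; the optional `tolerance` allows for estimation error in real data.

```python
def plots_between_parents(child, mother, father, tolerance=0.0):
    """True if, for every ancestral group, the child's estimated proportion lies
    between the two parents' proportions (inclusive, with optional slack)."""
    groups = set(child) | set(mother) | set(father)
    for g in groups:
        c, m, f = (d.get(g, 0.0) for d in (child, mother, father))
        if not (min(m, f) - tolerance <= c <= max(m, f) + tolerance):
            return False
    return True
```

For instance, with made-up parents at 100% European and at 70% European / 30% Native American, a child estimated at 85% / 15% passes the check, while a child estimated at 50% European / 50% East Asian fails it.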
- the genotypes are quite accurate. Because the latest genetic reading equipment available is used, an accuracy greater than 99% is routinely achieved for each site. If an accurate value is not obtained for a particular site in a particular sample, an "FL" is indicated instead of the genotype letters for that site. Having a few FLs generally does not prevent a good ancestry estimate.
- a sample can produce an FL for a site because, for example, a small region of the chromosome around this site is missing or is of different sequence character than for most (this result is not uncommon given the highly variable nature of the chromosomal positions we measure); or because not enough DNA was obtained from the buccal swab used to collect a DNA sample.
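One simple way to handle FL calls, sketched here as an assumption rather than the documented procedure, is to leave those sites out of the likelihood calculation and track the fraction of sites that were called. Locus names are hypothetical.

```python
def usable_calls(genotypes, no_call="FL"):
    """Drop sites reported as FL; return the remaining calls and the call rate."""
    kept = {locus: g for locus, g in genotypes.items() if g != no_call}
    call_rate = len(kept) / len(genotypes) if genotypes else 0.0
    return kept, call_rate
```

Only the retained sites would then enter the likelihood, so a few FLs widen the confidence contours somewhat rather than invalidating the estimate.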
- the present invention also provides articles of manufacture, including one or a plurality of photographs, each photograph being of a person having a known proportional ancestry corresponding to a population structure comprising nucleotide occurrences of AIMs indicative of BGA, the known proportional ancestry being associated with the photograph in the article.
- An article of manufacture of the invention (i.e., a photograph and the proportional ancestry information) can be contained in one or more files (e.g., the photograph and information in one file, or the photograph in one file and the information in a second file, which is or can be linked to the photograph).
- more than one photograph of an individual having a known proportional ancestry can be contained in the same or a linked file, for example, photographs containing different profiles of the individual or photographs of the individual at various ages.
- a plurality of the articles can be contained in a file, for example, a file containing a plurality of photographs of different persons, wherein some or all of the persons have the same or different known proportional ancestries that correspond to a population structure comprising nucleotide occurrences of AIMs indicative of BGA.
- Such a plurality of articles also can be contained in different files, including, for example, a plurality of files, each containing one photograph and information regarding the known proportional ancestry of the individual in the photograph, or each containing two or more photographs of different individuals, each of which contains the same known proportional ancestry, or each containing two or more photographs of different individuals, some or all of which have a different proportional ancestry as compared to another individual whose photograph is contained in the file.
- a plurality of such articles is provided, as is a plurality of files, each file of which can contain one or more articles, i.e., photographs, which can be of one or more persons having the same or different known proportional ancestries that correspond to a population structure comprising nucleotide occurrences of AIMs indicative of BGA; and the plurality of files can contain files, each of which contains one or more photographs of one or more persons, and when containing one or more photographs of two or more different persons, the different persons can have the same or different known proportional ancestries.
- the article of manufacture i.e., the photograph of a person having a known proportional ancestry corresponding to a population structure comprising nucleotide occurrences of AIMs indicative of BGA can be a digital photograph, which comprises digital information, including for the photographic image and any other information that may be relevant or desired (e.g., the age, name, or contact information of the subject in the photograph, or the subject's answer on a questionnaire as to what the subject believes his or her ancestry to be).
- Such digital information of one or more digital photographs can be contained in a database, thus facilitating searching of the photographs and/or known proportional ancestry information using electronic means.
- the present invention further provides a plurality of the articles of manufacture, including at least two digital photographs, each of which comprises digital information.
- where the digital information for one or a plurality of the articles is contained in a database, the database can be contained in any suitable medium, including, for example, computer hardware or software, a magnetic tape, or a computer disc such as a floppy disc, CD, or DVD.
- the database can be accessed through a computer, which can contain the database therein, can accept a medium containing the database, or can access the database through a wired or wireless network, e.g., an intranet or internet.
- kits useful for practicing a method of the invention can contain, for example, a plurality of hybridizing oligonucleotides, each of which has a length of at least fifteen contiguous nucleotides of a polynucleotide as set forth in SEQ ID NOS:1 to 331 (or a polynucleotide complementary thereto), the plurality including at least five (e.g., 5, 6, 7, 8, 9, 10, 15, 20, 25, 30, etc.) of such oligonucleotides, each based on a different polynucleotide as set forth in SEQ ID NOS:1 to 331.
- the hybridizing oligonucleotides can include at least fifteen contiguous nucleotides of at least five polynucleotides as set forth in SEQ ID NOS:1 to 71, or of polynucleotides complementary to any of SEQ ID NOS:1 to 71.
- the hybridizing oligonucleotides are specific for at least ten AIMs as set forth in SEQ ID NOS:1 to 71.
- a kit of the invention also can contain at least two panels of such hybridizing oligonucleotides, including, for example, a panel of at least five (e.g., 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, etc.) hybridizing oligonucleotides specific for AIMs as set forth in SEQ ID NOS:7, 21, 23, 27, 45, 54, 59, 63, and 72 to 152; or a panel of at least five hybridizing oligonucleotides specific for AIMs as set forth in SEQ ID NOS:3, 8, 9, 11, 12, 33, 40, 59, 63, and 153 to 239; or a panel of at least five hybridizing oligonucleotides specific for AIMs as set forth in SEQ ID NOS:1, 8, 11, 21, 24, 40, 172, and 240 to 331; or two or more of such panels and/or a panel of at least five hybridizing oligonucleotides specific for AIMs as set forth in SEQ ID NOS:1 to
- the hybridizing oligonucleotides of a kit of the invention can include probes, which are useful for detecting a particular AIM, including a particular nucleotide occurrence at the SNP position of the AIM; can include primers, including primers useful for a primer extension reaction and primer pairs useful for a nucleic acid amplification reaction; or can include combinations of such probes and primers.
- a hybridizing oligonucleotide of the plurality can, but need not, include a nucleotide corresponding to the nucleotide position of the SNP or DIP of an AIM, e.g., nucleotide 50 of an AIM as set forth in any of SEQ ID NOS:1 to 55 and 57 to 331 or nucleotide 26 of SEQ ID NO:56, or to a nucleotide sequence complementary thereto, such a hybridizing oligonucleotide being useful as a probe to identify the presence or absence of a particular nucleotide occurrence at the SNP position of the AIM.
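The sketch below enumerates candidate probes that cover the polymorphic site, using the minimum length of fifteen contiguous nucleotides and a 1-based SNP position (50 for most of the listed sequences, 26 for SEQ ID NO:56). It considers sequence only; practical probe design would also weigh melting temperature, secondary structure, and specificity, which the sketch ignores. Any test sequence passed to it here is a made-up placeholder, not one of the SEQ ID NOS.

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]


def snp_spanning_probes(seq, snp_position, length=15):
    """All (top-strand, complementary-strand) pairs of oligos of `length`
    contiguous nucleotides that include the SNP at 1-based `snp_position`."""
    i = snp_position - 1  # 0-based index of the SNP
    if not 0 <= i < len(seq):
        raise ValueError("SNP position outside the sequence")
    probes = []
    # A window starting at `start` covers the SNP when start <= i <= start + length - 1
    # and must also fit entirely inside the sequence.
    for start in range(max(0, i - length + 1), min(i, len(seq) - length) + 1):
        window = seq[start:start + length]
        probes.append((window, reverse_complement(window)))
    return probes
```

Raising `length` yields longer candidates; each window places the SNP at a different offset, which matters for allele-specific hybridization, where a central mismatch is usually the most destabilizing.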
- a kit of the invention also can contain at least one pair of hybridizing oligonucleotides useful for detecting the nucleotide occurrence at the SNP position, or the presence or absence of a nucleotide sequence at the DIP position, of an AIM.
- a pair of hybridizing oligonucleotides can include one oligonucleotide that hybridizes upstream and adjacent to the SNP position of an AIM and a second oligonucleotide that hybridizes downstream of and adjacent to the SNP position of the AIM, wherein one or the other of the pair further contains a nucleotide complementary to a nucleotide occurrence suspected of being at the SNP position of the AIM (i.e., one of the polymorphic nucleotides), such a pair of hybridizing oligonucleotides being useful in an oligonucleotide ligation assay.
- a pair of hybridizing oligonucleotides can include an amplification primer pair, including a forward primer and a reverse primer, such a pair of hybridizing oligonucleotides being useful for amplifying a portion of a polynucleotide that includes the SNP or DIP position of the AIM.
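A sketch of the geometric constraint on an amplification primer pair: the forward primer copies the top strand, the reverse primer is the reverse complement of a downstream stretch, and the SNP must lie between the two binding sites so that it is inside the amplicon. Primer positions and length are hypothetical inputs; real design would also consider melting temperature and primer-dimer formation.

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]


def primer_pair(seq, snp_position, fwd_start, rev_end, length=18):
    """Forward primer = seq[fwd_start:fwd_start + length]; reverse primer =
    reverse complement of seq[rev_end - length:rev_end]. Returns both primers
    and the amplicon, checking that the SNP (1-based) falls between them."""
    i = snp_position - 1
    if fwd_start < 0 or rev_end > len(seq):
        raise ValueError("primer binding sites must lie within the sequence")
    if not (fwd_start + length <= i < rev_end - length):
        raise ValueError("SNP must lie between the primer binding sites")
    forward = seq[fwd_start:fwd_start + length]
    reverse = reverse_complement(seq[rev_end - length:rev_end])
    amplicon = seq[fwd_start:rev_end]
    return forward, reverse, amplicon
```

Keeping the SNP outside both primer footprints means the primers anneal equally well to either allele, so amplification does not bias the subsequent genotype call.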
- a kit of the invention can further contain additional reagents useful for practicing a method of the invention.
- the kit can contain one or more polynucleotides comprising an AIM, including, for example, a polynucleotide containing an AIM that a hybridizing oligonucleotide or pair of hybridizing oligonucleotides of the kit is designed to detect, such polynucleotide(s) being useful as controls.
- hybridizing oligonucleotides of the kit can be detectably labeled, or the kit can contain reagents useful for detectably labeling one or more of the hybridizing oligonucleotides of the kit, including different detectable labels that can be used to differentially label the hybridizing oligonucleotides; such a kit can further include reagents for linking the label to hybridizing oligonucleotides, or for detecting the labeled oligonucleotide, or the like.
- a kit of the invention also can contain, for example, a polymerase, particularly where hybridizing oligonucleotides of the kit include primers or amplification primer pairs; or a ligase, where the kit contains hybridizing oligonucleotides useful for an oligonucleotide ligation assay. In addition, the kit can contain appropriate buffers, deoxyribonucleotide triphosphates, etc., depending, for example, on the particular hybridizing oligonucleotides contained in the kit and the purpose for which the kit is being provided.
- the present invention is based on the development of reagents, including oligonucleotide primers and primer pairs, that allow multiplex reactions to be performed for determining the nucleotide occurrences of single nucleotide polymorphisms (SNPs) of ancestry informative markers (AIMs), which are useful for inferring a level of population structure of an individual and, in turn, for making an inference as to various traits of the individual.
- the AIMs used to exemplify the present methods are set forth as SEQ ID NOS: 364 to 537 in Appendix 3 (42 pages), which is incorporated herein by reference, and include the AIMs (SNPs) set forth in SEQ ID NOS: 371 to 398, 400 to 408, 410 to 413, 415, 418, 420, 422, 423, 425, 431 to 433, 438 to 441, 443, 450 to 452, 455, 456, 461 to 463, 467 to 475, 477 to 485, 487, 495 to 498, 502 to 504, 506, 508 to 512, 514, 516, 519 to 521, 526, 529, and 533 to 537, which, as disclosed herein, are useful for inferring a trait of an individual (e.g., biogeographical ancestry).
- single base extension (SBE) primers useful for multiplex analysis of the exemplified AIMs are shown in Appendix 1 (11 pages), which is incorporated herein by reference, and polymerase chain reaction (PCR) primer pairs useful for amplifying gene regions containing the exemplified AIMs are shown in Appendix 2 (15 pages), which is incorporated herein by reference.
- panels of SBE primers and PCR primer pairs, i.e., sets of primers and primer pairs that can be used together in a single multiplex reaction to sample a plurality of 7 to 24 different AIMs, are shown in Appendices 1 and 2. For example, Panel 3 in Appendices 1 and 2 contains 7 SBE primers and 14 PCR primer pairs, respectively, that can be used in a single multiplex reaction to sample (i.e., determine the nucleotide occurrences of) the AIMs set forth as SEQ ID NOS:1 to 7.
- the exemplified multiplex reactions and results obtained using the panels of PCR primer pairs (Appendix 2) and SBE primers (Appendix 1) are shown, which is incorporated herein by reference, and in Figures 17-19A-K.
- compositions exemplified herein provide a DNA test adapted for use with capillary electrophoresis equipment.
- the exemplified assay, which examines the biogeographical ancestry (BGA) proportions of individuals, analyzes 174 SNPs that are informative for continental ancestry (see International Publ. No. WO 04/016768).
- the exemplified compositions provide a test kit that allows multiplex analysis of the 174 SNPs in 12 reactions. SBE primers were designed such that the reaction products could be separated and easily read using standard capillary electrophoresis software and equipment (see Example 1; see, also, Appendix 1).
- the exemplified assay allows the BGA-informative SNPs to be examined on a platform that is commonly used in the field and reduces the cost per test, thus allowing tests to be performed economically by a large number of laboratories using readily available reagents.
- a multiplex assay is exemplified herein by performing multiplex PCR reactions, then examining the SNPs in the PCR products using multiplex SBE reactions. Following the PCR reaction, which amplified the region surrounding each SNP, the unreacted reagents were removed by treatment with shrimp alkaline phosphatase (see Example 1), and the extended SBE primers were detected using a capillary electrophoresis and detection system (Applied Biosystems, Inc. (ABI), Foster City CA).
- An ABI 3700 Genetic Analyzer instrument was used for the exemplified studies, although any similar platform can be used, including, for example, an ABI 310, 3100, 3130, 3700, or 3730 Genetic Analyzer instrument. Alternatively, the exemplified system can be modified as disclosed herein and the reaction products (SNPs) can be detected, for example, using a mass spectrometry system (e.g., Sequenom; San Diego CA), a BeadArray® system (Illumina; San Diego CA), a LightTyper™ system (Roche), a NanoChip® system (Nanogen; San Diego CA), a fluorescence polarization/template directed dye termination (FP-TDI) system (Perkin-Elmer; Boston MA), or any other system using, for example, slab gel electrophoresis, printed microarrays, colorimetric detection platforms (e.g., fluorimetry), or high performance liquid chromatography.
- alternative groups of the primers can be used such that the final SBE products can be resolved by the detecting system (e.g., capillary electrophoresis).
- the SBE primers used in alternative groupings further can be modified such that products of resolvably different sizes are generated. With such modifications, the number of reactions that need to be run for each sample (person) can be further reduced, for example, to 10, 8, 6 or fewer multiplex reactions, including as few as a single tube reaction. Such methods would allow examination of nucleic acid samples that are available only in very low quantity or are of low quality.
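The regrouping idea above can be illustrated with a small sketch. It assumes, purely for illustration, that each SBE product is one nucleotide longer than its primer and that the detection system resolves two products only when they differ by at least a fixed number of nucleotides (`min_gap`, a hypothetical value). It ignores dye color and electrophoretic mobility, which a real panel design would also need to consider. The function name and primer lengths are hypothetical.

```python
from typing import Dict, List


def group_sbe_primers(primer_lengths: Dict[str, int], min_gap: int = 4) -> List[List[str]]:
    """Greedily assign SBE primers to multiplex reactions so that, within each
    reaction, extension products differ in length by at least `min_gap` nt.

    `min_gap` is a hypothetical resolution threshold, not a value from the source.
    """
    reactions: List[List[str]] = []
    # Visit primers from shortest to longest (ties keep input order).
    for name in sorted(primer_lengths, key=lambda n: primer_lengths[n]):
        product = primer_lengths[name] + 1  # SBE adds exactly one ddNTP
        for rxn in reactions:
            # First fit: join the first reaction where no product is too close in size.
            if all(abs(product - (primer_lengths[o] + 1)) >= min_gap for o in rxn):
                rxn.append(name)
                break
        else:
            # No existing reaction can resolve this product, so open a new tube.
            reactions.append([name])
    return reactions


panel = {"a": 20, "b": 22, "c": 24, "d": 28, "e": 20}
print(group_sbe_primers(panel))  # [['a', 'c', 'd'], ['e'], ['b']]
```

Lengthening selected primers (so their products move to a free size window) would let more of them share a tube, which is the direction of reduction the passage describes.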
- the number of SNPs examined can be reduced from 174 (e.g., to 150, 100, 70, 50, 25, or fewer), for example, by performing fewer than all 12 of the exemplified multiplex reactions (e.g., 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, or 11 reactions), although the confidence level of the inference will be reduced accordingly.
- the methods used for removing unreacted products from the reaction(s) can be varied (e.g., using spin columns) or modified, preferably while minimizing sample manipulation.
- labels other than those used in the exemplified methods also can be used, for example, where capillary electrophoresis instruments of other vendors are used (e.g., Beckman-Coulter, or Hitachi), or where different detection systems are used (e.g., mass spectrometry). Additional modifications to the exemplified method can include, for example, performing the reaction in a single tube.
- the reaction can be performed using PCR primers that include a biotinylated PCR primer, wherein the PCR amplification product is bound to streptavidin, which, in turn, is bound to a solid substrate (e.g., a bead, or a surface of a tube or well in which the reaction is performed); following PCR and washing of the solid substrate, the SBE reaction can be performed in the same tube (or well).
- compositions and multiplex methods provide a tool for human forensics, as well as for pharmaceutical testing, demographic testing, ancestry testing, identity testing, and identification of human remains. Further, the methods and compositions can be applied to an array of species such as dogs, cats, horses, pigs, cattle, or any domesticated or wild species of mammalian origin.
- compositions and methods can be used with instruments other than capillary electrophoresis and, as mentioned above, can be performed by binding a component of the reaction (e.g., a PCR primer, an SBE primer, or a nucleic acid sample to be tested) to plastic, glass, metal, or any other suitable surface; reaction products comprising the nucleotide occurrences of the SNPs can then be measured using other laboratory instruments, for example, by detecting emitted UV or other radiation, a change in frequency, or another detectable signal.
- the exemplified methods are performed using liquid-liquid interactions. However, as discussed above, the assays can be modified to be performed using dry-liquid interactions or dry-dry interactions.
- one or more components of a multiplex reaction can be applied to a silicon wafer or other conducting material used in microelectronic applications, and reaction products can be measured, for example, by ion detection or electron capture.
- Such modifications to the exemplified system can allow the methods to be performed using microelectronic or nanotechnology methods. In one aspect, the modifications allow the construction of field service kits.
- the exemplified compositions can be used in other formats in multiplex reactions, including, for example, allele specific hybridization, allele specific oligonucleotide ligation, sequencing, single stranded conformational polymorphism, heteroduplex analysis, allele specific PCR, and allele specific PCR with locked nucleic acid molecules.
- AIMs are genetic loci whose alleles show large frequency differences between populations.
- AIMs are exemplified herein generally by SNPs (see, e.g., SEQ ID NO:1, Appendix 3, wherein the alternative alleles are indicated in brackets), although AIMs also occur as deletion/insertion polymorphisms (DIPs; see WO 2004/016768; e.g., SEQ ID NO:363).
- AIMs can be used to estimate BGA of an individual or collection of individuals at the population level (in terms of races), at the sub-population level (in terms of ethnicities), and at the micro-group level (in terms of familial lines within ethnic groups), as well as at a practical, phenotypically qualified level (e.g., cases and controls).
- Such ancestry estimates at the subgroup and individual level can be directly instructive regarding the genetics of phenotypes that are different qualitatively or in frequency between populations, including, for example, the likelihood that an individual will respond to a particular medication or the propensity of an individual to develop a disease.
- Ancestry estimates also can provide a compelling foundation for the use of Admixture Mapping (AM) methods to identify the genes underlying these traits.
- the multiplex methods disclosed herein provide a tool that conveniently allows, for example, 1) for the estimation of ancestry proportions in individuals from their DNA; 2) for the estimation of genetic structure for the control of study designs commonly used for genetic research; 3) for the construction of physical profiles through the inference of characteristics related to ancestry, which may have implications in forensic investigations; 4) for the identification of disease predisposition, referred to as "Mapping by Ancestry Linkage Disequilibrium" (MALD); and 5) for predicting a significant portion of an individual patient's response to prescription and over-the-counter medications, in only a few reactions that, along with the exemplified primers and primer pairs, use readily available reagents and equipment.
- a method of the invention is performed by contacting at least a first sample (i.e., 1, 2, 3, 4, or more sample(s)) that contain test nucleic acid molecules of the individual with at least a first plurality of primers, including at least one plurality of SBE primers as set forth in the panels of Appendix 1 (e.g., Panel 3, Panel 41, etc.), under conditions suitable for SBE of the primers; and detecting at least one SBE product of a primer of the first (or other) plurality.
- a test individual, i.e., an individual from whom test nucleic acid molecules are obtained and for whom a trait is to be inferred.
- a test nucleic acid sample can be nucleic acid molecules as obtained from an individual (e.g., nucleic acid molecules present in a blood, semen, oral swab, or tissue sample), or can be product produced from such nucleic acid molecules (e.g., PCR amplification products).
- an individual providing test nucleic acid molecules is a human, though the methods of the invention also can be used for inferring traits of other mammals, including, for example, domestic animals such as cats, dogs, or horses; farm animals such as cattle, sheep, pigs, or goats; or other animals.
- the trait to be examined can be any trait of interest, including, as exemplified herein, proportional ancestry (BGA); hair, skin or iris pigmentation; or drug responsiveness.
- the methods of the invention are particularly useful because they allow for an inference to be made of a desired trait with a predetermined level of confidence.
- a predetermined level of confidence means that an inference or estimate of the invention is made using statistical methods that provide a confidence interval to be determined about a mean or a maximum likelihood value.
- other similarly likely values can also be determined and these can be combined to define the x-fold likelihood confidence intervals, where x is any number such as 2, 5 or 10. For example, all of the structure results corresponding to a likelihood value 10 times lower than the Maximum Likelihood Value can be plotted or listed to define the 10-fold likelihood confidence interval.
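The x-fold likelihood interval described above amounts to keeping every candidate value whose likelihood is at least 1/x of the maximum, i.e., whose natural-log likelihood is within ln(x) of the best. A minimal sketch, assuming the log-likelihoods have already been computed on a grid of candidate values (the function name and the values below are invented for illustration):

```python
import math
from typing import Dict, List


def x_fold_interval(log_likelihoods: Dict[float, float], x: float = 10.0) -> List[float]:
    """Return the candidate values whose likelihood is no more than x-fold below
    the maximum likelihood value (natural-log likelihoods assumed)."""
    best = max(log_likelihoods.values())
    cutoff = best - math.log(x)  # L >= L_max / x  <=>  ln L >= ln L_max - ln x
    return sorted(v for v, ll in log_likelihoods.items() if ll >= cutoff)


# Hypothetical log-likelihoods for the proportion of one ancestral group.
grid = {0.6: -10.0, 0.7: -8.0, 0.8: -7.0, 0.9: -9.5, 1.0: -12.0}
print(x_fold_interval(grid, 10))   # [0.7, 0.8]
print(x_fold_interval(grid, 100))  # [0.6, 0.7, 0.8, 0.9]
```

Raising x widens the interval, which matches the trade-off between a 2-, 5- or 10-fold interval mentioned above.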
- an assay of the invention is designed such that performance of the test results in a value having a desired confidence level.
- a method of the invention can be performed such that the result has a predetermined level of confidence by varying the number of AIMs examined with respect to a trait.
- use of one exemplified panel to sample a plurality of AIMs can allow an inference to be made as to whether an individual has a particular trait, e.g., responsiveness to Lipitor™ medication, with a certain level of confidence, whereas the use of two or more of the exemplified panels can allow the same inference to be made, but with a higher level of confidence.
- use of one of the exemplified panels can allow an inference to be made that an individual has, for example, 80% IndoEuropean ancestry and 20% East Asian ancestry (with an error, e.g., of ± 10%), whereas the use of two or more panels can allow the same inference, but with an error, e.g., of ± 5%.
- a sample useful for practicing a method of the invention can be any biological sample of a test individual that contains nucleic acid molecules, including portions of the gene sequences containing the AIMs that are to be examined or, where the polymorphism of an AIM results in an amino acid change in an encoded polypeptide, any biological sample that contains the encoded polypeptides.
- the sample can be a cell, tissue or organ sample, or can be a sample of a biological fluid such as semen, saliva, blood, cerebrospinal fluid, and the like.
- a nucleic acid sample useful for practicing a method of the invention will depend, in part, on whether the SNPs to be identified are in coding regions or in non-coding regions. Where one or more SNPs is present in a non-coding region of a gene, the nucleic acid sample generally is a deoxyribonucleic acid (DNA) sample, particularly genomic DNA or an amplification product thereof.
- the primer pairs exemplified in Appendix 2 can be used to obtain such amplification products, particularly a plurality of amplification products comprising different AIMs, which can be prepared in a multiplex reaction using the exemplified panels of primer pairs (e.g., Panel 3, Panel 41, etc.).
- an RNA sample can be used and examined directly, or a cDNA or amplification product thereof can be examined according to the present methods.
- the nucleic acid sample can be DNA or RNA, or products derived therefrom, for example, amplification products.
- Conditions suitable for detecting the nucleotide occurrences of AIMs are exemplified herein (see Example 1), and can otherwise be varied depending, for example, on the sequences of the hybridizing oligonucleotides, including their length and complementarity, as well as on the particular reaction being performed (e.g., an amplification reaction, or a single base extension reaction).
- the hybridizing oligonucleotides exemplified herein are SBE primers and PCR primer pairs.
- the nucleotides of such oligonucleotides can be linked together by phosphodiester bonds; the oligonucleotides can be single stranded or double stranded, though they generally are used in single-stranded form, and can be prepared by chemical synthesis or by enzymatic methods.
- hybridizing oligonucleotides, or other polynucleotides useful in a method or contained in a kit of the invention, also can contain nucleoside or nucleotide analogs, and can have a backbone bond other than a phosphodiester bond, such oligonucleotides providing certain advantages such as increased stability or more desirable hybridization properties.
- Nucleotide analogs are well known in the art and commercially available, as are polynucleotides containing such nucleotide analogs (Lin et al., Nucl. Acids Res.
- the covalent bond also can be any of numerous other bonds, including a thiodiester bond, a phosphorothioate bond, a peptide-like bond or any other bond known to those in the art as useful for linking nucleotides to produce synthetic oligonucleotides (see, for example, Tam et al., Nucl. Acids Res.
- such modified oligonucleotides can be advantageous for use in a sample having nucleolytic activity, including, for example, a tissue culture medium or a sample comprising a cell extract, because the modified oligonucleotides can be less susceptible to degradation.
- the hybridizing oligonucleotides useful for purposes of the present invention are at least about 15 bases in length, which is sufficient to permit the oligonucleotide to selectively hybridize to a target polynucleotide comprising the AIM, and can be at least about 18 nucleotides or 21 nucleotides or 25 nucleotides or more in length.
- selective hybridization or “selectively hybridize” refers to hybridization under moderately stringent or highly stringent physiological conditions, which can distinguish related nucleotide sequences from unrelated nucleotide sequences.
- in nucleic acid hybridization reactions, the conditions used to achieve a particular level of stringency are known to vary, depending on the nature of the nucleic acids being hybridized, including, for example, the length, degree of complementarity, nucleotide sequence composition (e.g., relative GC:AT content), and nucleic acid type, i.e., whether the oligonucleotide or the target nucleic acid sequence is DNA or RNA.
- An additional consideration is whether one of the nucleic acids is immobilized, for example, on a filter, bead, chip, or other solid matrix. Examples of conditions that allow for selective hybridization of panels of primers and primer pairs useful in multiplex reactions performed in solution are provided in Example 1.
- stringency conditions can be determined empirically or estimated using various formulas, as is well known in the art (see, for example, Sambrook et al., Molecular Cloning: A Laboratory Manual, Cold Spring Harbor Laboratory Press, 1989).
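Two widely used rough estimates of oligonucleotide melting temperature (Tm) are sketched below as examples of such formulas: the Wallace rule (2 °C per A or T, 4 °C per G or C), usually applied to short oligonucleotides, and a basic GC-content formula for longer ones. Both ignore salt and oligonucleotide concentration, mismatches, and nearest-neighbor effects, so they are only starting points for empirical optimization; they are not the conditions of Example 1, and the primer sequence is hypothetical.

```python
def tm_wallace(seq: str) -> float:
    """Wallace rule: Tm ~ 2 degC per A/T + 4 degC per G/C (short oligos only)."""
    s = seq.upper()
    return 2.0 * (s.count("A") + s.count("T")) + 4.0 * (s.count("G") + s.count("C"))


def tm_basic(seq: str) -> float:
    """Basic GC-content formula: Tm = 64.9 + 41 * (nG + nC - 16.4) / N."""
    s = seq.upper()
    gc = s.count("G") + s.count("C")
    return 64.9 + 41.0 * (gc - 16.4) / len(s)


primer = "ACGTACGTACGTACGTACGT"  # hypothetical 20-mer, 50% GC
print(tm_wallace(primer), round(tm_basic(primer), 2))  # 60.0 51.78
```

The two estimates disagree by several degrees for the same 20-mer, which is one reason the passage recommends determining stringency empirically.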
- An example of progressively higher stringency conditions is as follows: 2X SSC/0.1% SDS at about room temperature (hybridization conditions); 0.2X SSC/0.1% SDS at about room temperature (low stringency conditions); 0.2X SSC/0.1% SDS at about 42°C (moderate stringency conditions); and 0.1X SSC at about 68°C (high stringency conditions).
- Washing can be carried out using only one of these conditions, for example, high stringency conditions, or each of the conditions can be used, for example, for 10 to 15 minutes each, in the order listed above, repeating any or all of the steps listed. As such, final conditions will vary, depending on the particular hybridization reaction involved, and can be determined empirically. It should be recognized that a variety of conditions can be utilized to provide selective hybridization conditions.
- the conditions can be selected such that selective hybridization occurs for all of the hybridizing oligonucleotides in the reaction.
- Detectable labels useful for the present methods include, for example, chemiluminescent labels, radionuclides, enzymes, haptens such as digoxigenin and biotin, fluorophores, and unique oligonucleotide sequences. In addition to the exemplified labeling by SBE, additional sequences can be labeled depending, for example, on the product to be detected.
- for example, PCR products can be labeled by using one biotinylated primer and one primer containing digoxigenin.
- the amplification products can then be bound to a streptavidin plate, washed, reacted with an enzyme-conjugated antibody to digoxigenin, and developed with a chromogenic, fluorogenic, or chemiluminescent substrate for the enzyme.
- a radioactive method also can be used to detect generated amplification or SBE products, for example, by including a radiolabeled deoxynucleoside triphosphate or dideoxynucleoside triphosphate, respectively, into the reaction.
- streptavidin-coated scintillation proximity assay plates can be used to measure the PCR products.
- Additional methods of detection can use a chemiluminescent label, for example, a lanthanide chelate such as used in the DELFIA® assay (Pall Corp.), a fluorescent label, or an electrochemiluminescent label such as ruthenium tris-bipyridyl (ORI-GEN).
- methods for detecting a nucleotide occurrence at a SNP position of an AIM can utilize one or more oligonucleotide probes or primers, including, for example, an amplification primer pair, that selectively hybridize to a target polynucleotide spanning the AIM.
- Oligonucleotide probes useful in practicing a method of the invention can include, for example, an oligonucleotide that is complementary to and spans a portion of the target polynucleotide, including the position of the SNP, wherein the presence of a specific nucleotide at the position of the SNP is detected by the presence or absence of selective hybridization of the probe.
- Such a method can further include contacting the target polynucleotide and hybridized oligonucleotide with an endonuclease, and detecting the presence or absence of a cleavage product of the probe, depending on whether the nucleotide occurrence at the SNP site is complementary to the corresponding nucleotide of the probe.
- a pair of probes that specifically hybridize upstream and adjacent and downstream and adjacent to the site of the SNP, wherein one of the probes includes a nucleotide complementary to a nucleotide occurrence of the SNP also can be used in an oligonucleotide ligation assay, wherein the presence or absence of a ligation product is indicative of the nucleotide occurrence at the SNP site.
- An oligonucleotide also can be useful as a primer, for example, for a primer extension reaction, wherein the product (or absence of a product) of the extension reaction is indicative of the nucleotide occurrence. In addition, a primer pair useful for amplifying a portion of the target polynucleotide including the SNP site can be useful, wherein the amplification product is examined to determine the nucleotide occurrence at the SNP site or to determine whether there is an insertion or a deletion at the DIP site.
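The logic of reading a SNP by single base extension can be sketched as follows, assuming a biallelic SNP reported on the top strand. A forward SBE primer copies the top strand up to the base just 5' of the SNP, anneals to the bottom strand, and is extended with the ddNTP that matches the top-strand allele; a reverse primer anneals to the top strand and is extended with the complement of that allele. The sequence, primer length, and function names are hypothetical and do not reproduce the primers of Appendix 1.

```python
from typing import Set

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq: str) -> str:
    return "".join(COMPLEMENT[b] for b in reversed(seq.upper()))


def sbe_primer(top: str, snp: int, length: int = 20, forward: bool = True) -> str:
    """Return an SBE primer whose 3' end sits immediately adjacent to position `snp`.

    forward=True: primer copies the top strand just 5' of the SNP.
    forward=False: primer is the reverse complement of the sequence just 3' of the SNP.
    """
    if forward:
        if snp < length:
            raise ValueError("not enough 5' flanking sequence")
        return top[snp - length:snp].upper()
    downstream = top[snp + 1:snp + 1 + length]
    if len(downstream) < length:
        raise ValueError("not enough 3' flanking sequence")
    return reverse_complement(downstream)


def genotype_from_sbe(ddntps: Set[str], forward: bool = True) -> str:
    """Translate the ddNTP(s) detected on an SBE primer into top-strand alleles."""
    alleles = sorted(b.upper() if forward else COMPLEMENT[b.upper()] for b in ddntps)
    if len(alleles) == 1:  # a single product means the individual is homozygous
        alleles *= 2
    return "/".join(alleles)


# Hypothetical amplicon with an [A/G] SNP at index 10 of the top strand.
top = "GATTACAGGC" + "A" + "CTGACTGACT"
print(sbe_primer(top, 10, length=5))                 # CAGGC
print(sbe_primer(top, 10, length=5, forward=False))  # GTCAG
print(genotype_from_sbe({"A", "G"}))                 # A/G
print(genotype_from_sbe({"T"}, forward=False))       # A/A
```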
- Conditions that allow generation of amplification products and SBE products in a sample using a multiplex method are exemplified in Example 1.
- conditions in which an amplification reaction and/or SBE reaction is being performed can be modified from those exemplified, or selected de novo, such that the reaction contains the necessary components for the amplification reaction and/or SBE reaction to occur.
- Such conditions include, for example, appropriate buffer capacity and pH, salt concentration, metal ion concentration if necessary for the particular polymerase, appropriate temperatures that allow for selective hybridization of the primers or primer pairs of a panel to the template target polynucleotides, as well as appropriate cycling of temperatures that permit polymerase activity and melting of a primer or primer extension or amplification product from the template or, where relevant, from forming a secondary structure such as a stem-loop structure.
- An SBE product can be detected as exemplified in Example 1. Further, a primer extension product (e.g., an SBE product) or amplification product can be detected directly or indirectly and/or can be sequenced using various methods known in the art. Amplification products that span a SNP site can be sequenced using traditional sequencing methods, including, for example, the dideoxy-mediated chain termination method (Sanger et al., J. Molec. Biol. 94:441, 1975; Prober et al., Science 238:336-340, 1987) or the chemical degradation method (Maxam et al., Proc. Natl. Acad. Sci. USA 74:560, 1977), to determine the nucleotide occurrence at the SNP loci.
- the nucleotide occurrence at a SNP site also can be determined using a microsequencing method, wherein the identity of only a single nucleotide is determined at a predetermined site (U.S. Pat. No. 6,294,336).
- Microsequencing methods include the Genetic Bit Analysis method (WO 92/15712). Additional primer-guided nucleotide incorporation procedures for assaying polymorphic sites in DNA have also been described (Kornher et al., Nucl. Acids Res. 17:7779-7784, 1989; Sokolov, Nucl. Acids Res.
- a method for determining the nucleotide occurrence at a SNP position is described by Macevicz (U.S. Pat. No. 5,002,867), wherein a nucleic acid sequence is determined via hybridization with multiple mixtures of oligonucleotide probes.
- in this method, the sequence of a target polynucleotide is determined by permitting the target to sequentially hybridize with sets of probes having an invariant nucleotide at one position and variant nucleotides at other positions.
- the nucleotide sequence is determined by hybridizing the target with a set of probes, then determining the number of sites at which at least one member of the set is capable of hybridizing to the target (i.e., the number of matches). This procedure is repeated until each set of probes has been tested.
- U.S. Pat. No. 6,294,336 provides a solid phase sequencing method for determining the sequence of nucleic acid molecules (either DNA or RNA) by utilizing a primer that selectively binds a polynucleotide target at a site wherein the SNP is the most 3' nucleotide selectively bound to the target.
- McSNP® analysis provides another method that can be adapted to a multiplex procedure and used for detecting a nucleotide occurrence in an AIM (Akey et al., supra, 2001). McSNP® analysis has the additional advantage that it does not require a gel electrophoresis step, thus minimizing the time and cost for detecting a SNP.
- compositions and methods are provided for inferring an individual's response to commonly used medications, which is a function of individual ancestry (see WO 2004/016768).
- the exemplified compositions and methods provide tools for inferring individual and/or group ancestral proportions from knowledge of the individual's or group's DNA sequences, as well as for qualifying and normalizing study groups for more traditional methods of mapping disease genes.
- Each of these processes requires an accurate knowledge of ancestry, which can be determined using the methods of the invention and exemplified compositions.
- proportional ancestry can be a proportion of any ancestral group, including, for example, a sub-Saharan African, Native American, IndoEuropean, East Asian, Middle Eastern, or Pacific Islander ancestral group, and generally is a combination of two or more such ancestral groups.
- the proportional ancestry of a test individual can include proportional affiliation among the sub-Saharan African and IndoEuropean ancestral groups (e.g., 80% sub-Saharan African and 20% IndoEuropean; or 60% sub-Saharan African, 20% IndoEuropean, and 20% of a third ancestral group), or can include proportional affiliation among the Native American and IndoEuropean ancestral groups, the East Asian and Native American ancestral groups, the IndoEuropean and East Asian ancestral groups, and the like. In this method, identifying a population structure within an individual that correlates with the nucleotide occurrences of the AIMs of the test individual can be practiced by performing a likelihood determination for affiliation with each of a sub-Saharan African ancestral group, a Native American ancestral group, an IndoEuropean ancestral group, and an East Asian ancestral group; thereafter selecting the three ancestral groups having the greatest likelihood values for the individual; determining a likelihood of all possible proportional affiliations among the three ancestral groups having the greatest likelihood value, whereby a population structure
- identifying a population structure that correlates with the nucleotide occurrences of the AIMs can be practiced by performing six two-way (binary) comparisons comprising likelihood determinations for affiliation of each group compared to each other group; thereafter selecting three ancestral groups having a greatest likelihood value across all comparisons; determining a likelihood of all possible proportional affiliations among the three ancestral groups having the greatest likelihood value, whereby a population structure or proportional affiliation that correlates with the nucleotide occurrences of the AIMs of the test individual is identified; and identifying a single proportional combination of maximum likelihood.
- Such a methodology works as well for individuals of three-way admixture as individuals that are 100% affiliated with a single group.
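The first procedure described above (a likelihood for 100% affiliation with each of four groups, selection of the three most likely groups, then a scan of all proportional affiliations among those three) can be sketched as a grid-search maximum likelihood estimate. The sketch makes assumptions that are not stated in the source: biallelic AIMs coded as 0, 1 or 2 copies of a reference allele, independence between AIMs, Hardy-Weinberg genotype probabilities at the individual's admixed allele frequency, and a 5% grid. The allele frequencies and function names are hypothetical. An individual with 100% affiliation to one group simply yields a vertex of the grid as the maximum likelihood combination, as in the usage example.

```python
import math
from typing import Dict, Iterator, List, Sequence, Tuple

EPS = 1e-6  # keeps log() finite when an admixed frequency reaches 0 or 1


def log_likelihood(genotypes: Sequence[int], freqs: Sequence[Sequence[float]],
                   props: Sequence[float]) -> float:
    """Log-likelihood of the genotypes given ancestry proportions.

    genotypes[i]: copies (0, 1 or 2) of the reference allele at AIM i.
    freqs[k][i]:  reference-allele frequency at AIM i in ancestral group k.
    props[k]:     proportion of ancestry from group k (sums to 1).
    """
    total = 0.0
    for i, g in enumerate(genotypes):
        p = sum(q * f[i] for q, f in zip(props, freqs))  # admixed allele frequency
        p = min(max(p, EPS), 1.0 - EPS)
        # Hardy-Weinberg genotype probabilities at that frequency.
        prob = {2: p * p, 1: 2.0 * p * (1.0 - p), 0: (1.0 - p) ** 2}[g]
        total += math.log(prob)
    return total


def simplex_grid(step: float) -> Iterator[Tuple[float, float, float]]:
    """Every (a, b, c) with a + b + c = 1 on a grid of spacing `step`."""
    n = round(1 / step)
    for a in range(n + 1):
        for b in range(n + 1 - a):
            yield (a / n, b / n, (n - a - b) / n)


def three_way_mle(genotypes: Sequence[int], groups: Dict[str, List[float]],
                  step: float = 0.05) -> Dict[str, float]:
    # 1) Likelihood of 100% affiliation with each group; keep the top three.
    single = {name: log_likelihood(genotypes, [f], [1.0]) for name, f in groups.items()}
    top3 = sorted(single, key=lambda name: single[name], reverse=True)[:3]
    freqs = [groups[name] for name in top3]
    # 2) Score every proportional combination of those three; keep the best.
    best = max(simplex_grid(step), key=lambda q: log_likelihood(genotypes, freqs, q))
    return dict(zip(top3, best))


# Hypothetical reference-allele frequencies at six AIMs.
groups = {
    "AFR": [0.9, 0.8, 0.95, 0.85, 0.9, 0.7],
    "EUR": [0.1, 0.2, 0.05, 0.3, 0.1, 0.2],
    "EAS": [0.5, 0.6, 0.4, 0.5, 0.55, 0.45],
    "NAM": [0.2, 0.3, 0.25, 0.1, 0.2, 0.35],
}
print(three_way_mle([2] * 6, groups))  # {'AFR': 1.0, 'EAS': 0.0, 'NAM': 0.0}
```

A finer `step` trades run time for resolution; the likelihoods computed on the grid are also what an x-fold confidence contour would be drawn from.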
- An estimate of an individual's proportional ancestry that includes proportions of three ancestral groups also can be made by performing three three-way comparisons among the groups; determining a likelihood of all possible proportional affiliations among the three ancestral groups having the greatest likelihood value, whereby a population structure or proportional affiliation that correlates with the nucleotide occurrences of the AIMs of the test individual is identified; and identifying a single proportional combination of maximum likelihood.
- An advantage of the present methods is that a graphical representation of the comparison of the three ancestral groups can be generated, wherein the graphical representation comprises a triangle with each ancestral group independently represented by a vertex of the triangle, and wherein the maximum likelihood value of proportional affiliation for an individual comprises a point within the triangle (see WO 2004/016768).
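Placing a three-way proportional ancestry inside the triangle is a barycentric-coordinate calculation. A minimal sketch, assuming an equilateral triangle with vertices at (0, 0), (1, 0) and (0.5, √3/2); the vertex placement and function name are illustrative choices, not taken from the source:

```python
import math
from typing import Tuple


def ternary_xy(a: float, b: float, c: float) -> Tuple[float, float]:
    """Map proportions (a, b, c) to a point in an equilateral triangle with
    vertices A = (0, 0), B = (1, 0) and C = (0.5, sqrt(3)/2)."""
    total = a + b + c
    if total <= 0:
        raise ValueError("proportions must sum to a positive value")
    a, b, c = a / total, b / total, c / total  # normalize so a + b + c = 1
    # Barycentric combination a*A + b*B + c*C; A sits at the origin, so a drops out.
    return (b + 0.5 * c, c * math.sqrt(3) / 2)


print(ternary_xy(0.8, 0.2, 0.0))  # (0.2, 0.0): a point on edge A-B
```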
- the graphical representation can further include a confidence contour that indicates a level of confidence associated with estimating the proportional ancestry.
- an estimate of an individual's proportional ancestry also can be made where the proportional ancestry includes proportions of four ancestral groups.
- identifying a population structure that correlates with the nucleotide occurrences of the AIMs of the test individual is practiced by performing six two-way comparisons, or by performing three three-way comparisons, or by performing one four-way comparison among the groups; determining a likelihood of all possible proportional affiliations among the four ancestral groups having the greatest likelihood value, whereby a population structure or proportional affiliation that correlates with the nucleotide occurrences of the AIMs of the test individual is identified; and identifying a single proportional combination of maximum likelihood.
- the method can further include generating a graphical representation of the comparison of the four ancestral groups, wherein the graphical representation comprises a pyramid with each ancestral group independently represented by a vertex of the pyramid, and wherein the maximum likelihood value of proportional affiliation for an individual comprises a point within the pyramid.
- the graphical representation can further include a confidence contour comprising a sphere around the point, wherein the sphere indicates a level of confidence associated with estimating the proportional ancestry.
- Such methods are useful, for example, as a forensic tool and, when performed as multiplex assays, provide a rapid and cost-effective way to screen samples.
- The methods can provide an investigator with prospective information as to the likelihood of an individual's ancestry, as well as hair, skin and eye pigmentation.
- The methods of estimating proportional ancestry of a test individual also provide a tool that can supplement genealogical information, which generally is based on relationships established using geopolitical information.
- The present methods provide information that can be used to generate an ancestral map of the world, wherein locations of populations having a proportional ancestry corresponding to the proportional ancestry of the test individual are indicated on the ancestral map (see WO 2004/016768).
- The method can further include overlaying the ancestral map with a genealogical map, wherein the genealogical map indicates locations of populations having geopolitical relevance with respect to the test individual, and statistically combining the information of the ancestral map and genealogical map to obtain a most likely estimate of the family history of the test individual.
- Identifying a population structure that correlates with the nucleotide occurrences of the AIMs can be performed by comparing the nucleotide occurrences of the AIMs of the test individual with known proportional ancestries corresponding to nucleotide occurrences of AIMs indicative of BGA.
- The known proportional ancestries corresponding to nucleotide occurrences of AIMs indicative of BGA can be contained in a table or other list, in which case the nucleotide occurrences of the test individual can be compared to the table or list visually, or they can be contained in a database, in which case the comparison can be made electronically, for example, using a computer.
- A particularly useful application of a method of the invention involves associating known proportional ancestries corresponding to nucleotide occurrences of AIMs indicative of BGA of individuals with a photograph of the person from whom the known proportional ancestry was determined, thus providing a means to further infer physical characteristics of a test individual.
- The photograph is a digital photograph, which comprises digital information.
- The information can be contained in a database, which can further contain the digital information of a plurality of such digital photographs, each of which is associated with a known proportional ancestry corresponding to nucleotide occurrences of AIMs indicative of BGA of the person in the photograph.
- A method of the invention can further include identifying a photograph of a person having a proportional ancestry corresponding to the proportional ancestry of the test individual. Such identifying can be done by manually looking through one or more files of photographs, wherein the photographs are organized, for example, according to the nucleotide occurrences of AIMs of the person in the photograph.
- Identifying the photograph also can be performed by scanning a database comprising a plurality of files, each file containing digital information corresponding to a digital photograph of a person having a known proportional ancestry, and identifying at least one photograph of a person having nucleotide occurrences of AIMs indicative of BGA that correspond to the nucleotide occurrences of AIMs indicative of BGA of the test individual.
- "Proportional ancestry" is used herein to refer to the percent contribution of each ancestral group (if more than one) to which an individual belongs.
- The proportional ancestry estimated according to a method of the invention can be a proportion of any ancestral group, including, for example, a sub-Saharan African, Native American, IndoEuropean, East Asian, Middle Eastern, or Pacific Islander ancestral group, and generally is a combination of two or more such ancestral groups.
- For example, the proportional ancestry of a test individual can include proportions of sub-Saharan African and IndoEuropean ancestral groups (e.g., 80% sub-Saharan African and 20% IndoEuropean; or 60% sub-Saharan African, 20% IndoEuropean, and 20% of a third ancestral group); or can include proportions of Native American and IndoEuropean ancestral groups; East Asian and Native American ancestral groups; IndoEuropean and East Asian ancestral groups; and the like.
- Similarly, the proportional ancestry can include proportions of Native American, East Asian, and IndoEuropean ancestral groups; sub-Saharan African, Native American, and IndoEuropean ancestral groups; sub-Saharan African, Native American, and East Asian ancestral groups; and the like.
- This Example demonstrates that a panel of 32 Ancestry Informative Markers (AIMs) allows an estimate of the genetic contribution from populations of African, European and Native American ancestry.
- The AIMs used in the exemplified study include single nucleotide polymorphisms (SNPs), deletion/insertion polymorphisms (DIPs) and Alu sequences (see Example 2 for identification of AIMs). Markers showing differences between the parental populations greater than 30% were selected (Table 1; see, also, SEQ ID NOS:332-363). Informative genetic markers were identified by testing each candidate marker in a panel of European (Spanish and German), African (from Nigeria, Sierra Leone, and the Central African Republic), and Native American populations (Mayan and Southwestern Native Americans) to confirm the usefulness of the marker for admixture estimation.
- The publicly available human genome sequence database and polymorphism database were screened in order to identify SNPs that met the criteria for being a good AIM. Allele frequencies are available in the public databases for many of the SNPs for three populations: Africans, Europeans and Asians. Because these frequencies are obtained from small samples, they are not always accurate. The main criterion for selection herein was the delta (δ) value derived from these frequencies, which is a statistical measure of the difference in minor allele frequency between populations. For example, a C or a G polymorphism at a particular place in the human genome, where the C is present mainly in individuals of European descent and the G mainly in individuals of Native American descent, would have a high delta value and, therefore, qualify as a good AIM.
- Similarly, an A or a C polymorphism at a particular place in the human genome, where the A is present mainly in individuals of African descent and the C mainly in individuals of Asian descent, would have a large frequency differential between these groups and, therefore, a high delta value, thus qualifying as a good AIM.
- A list of such "candidate AIMs" was compiled, ranked from largest to smallest delta value for each of the possible pair-wise population comparisons, and screened, one at a time, against a panel of "parental" samples.
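The δ-based ranking described above can be sketched in a few lines. This is a minimal illustration and assumes that δ is the absolute difference between two populations' frequencies of the same allele (for a biallelic SNP this equals the difference in minor allele frequency). The function names, the dictionary layout, and the optional `min_delta` cutoff are hypothetical. The cutoff could mirror, for instance, the 30% criterion of the exemplified study.

```python
from itertools import combinations

# Hypothetical layout: snp_id -> {population: frequency of one reference allele}
FreqTable = dict[str, dict[str, float]]


def delta(freq_a: float, freq_b: float) -> float:
    """Allele-frequency differential (delta) between two populations."""
    return abs(freq_a - freq_b)


def rank_candidates(freqs: FreqTable, min_delta: float = 0.0) -> dict[tuple[str, str], list[tuple[str, float]]]:
    """Rank SNPs from largest to smallest delta for every population pair.

    SNPs lacking a frequency for either population of a pair are skipped, and
    only SNPs whose delta exceeds min_delta are kept (so delta == 0 never passes).
    """
    populations = sorted({pop for table in freqs.values() for pop in table})
    ranked = {}
    for a, b in combinations(populations, 2):
        scores = [
            (snp, delta(table[a], table[b]))
            for snp, table in freqs.items()
            if a in table and b in table
        ]
        kept = [item for item in scores if item[1] > min_delta]
        ranked[(a, b)] = sorted(kept, key=lambda item: item[1], reverse=True)
    return ranked
```

Each ranked list is then a screening order: candidates at the top of a pair's list would be genotyped on parental samples first.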
- Parental samples are samples from regions of the world that are relatively homogeneous, for example, Niger or Congo for sub-Saharan Africans, Southern Mexico for Native Americans, China for East Asians, and Europe for Europeans.
- Although each candidate AIM was initially selected from the public database based on crude population structure differences (i.e., continental populations), many of them were found to carry information on finer levels of structure. This is because the separation of subgroups of humans from larger groups throughout human evolution has provided a fertile opportunity for genetic drift, founder effects, and natural selection to operate in either fixing or eliminating their sequences.
- The disclosed sequences provide information as to the target being examined (i.e., the polymorphism), as well as information for preparing primers, amplification primer pairs, and hybridization probes for sampling the SNP (i.e., determining the genotype of a sample). Further, the disclosed sequences can be used, if desired, to scan public databases to identify additional upstream and downstream nucleotide sequences.
- AIMs can be identified using the disclosed methods, and this Example provides a panel of 32 AIMs that can be applied toward the ultimate goal of compiling a panel of approximately 1,000 AIMs spanning the entire human genome.
- Candidate AIMs were obtained by screening SNP allele frequency data generated through The SNP Consortium (TSC).
- The present study focused on the accuracy of the SNP database and the number of candidate SNPs present therein. With respect to the accuracy of the database, each group involved in the SNP consortium has taken a different approach to generating data, so initial concerns regarding how the data could be combined were addressed. Because the genotyping approaches differed among groups, it was necessary to address ascertainment biases that might differentially affect the data of particular groups. For example, most of the groups produced their allele frequencies after sequencing a subset of the TSC diversity panels, then scoring these markers in the larger groups of 42 individuals from the 3 populations. The Washington University group took an approach whereby pooled sequencing throughout regions was performed, and the allele frequencies were calculated for variable positions discovered during this effort. The Orchid group did not use sequencing but, instead, started with loci from the TSC SNP database that are known to be polymorphic. Given such differences, a systematic characterization was made of the extent, if any, to which different biases may have affected the results.
- Asian-African comparison: 1,026 candidate AIMs / 25,578 total SNPs; average FST 0.0886.
- In African Americans, most individuals showed a predominantly African genetic contribution, but some persons showed a relatively high European contribution and, to a lesser extent, Native American ancestry. European Americans clustered more tightly near the pole corresponding to high European contribution, with few persons showing evidence of Native American and African ancestry. Spanish Americans showed the highest dispersion of individual ancestry, as expected given the high admixture level observed in this sample.
- Tests for differences in average pigment levels by genotype were performed for the AIMs typed in the African-American population sample discussed above.
- The panel of AIMs included three candidate gene markers: OCA2, TYR, and MC1R.
- The analysis was performed in three alternative ways: first, with no consideration of the individual ancestry estimates (ANOVA); second, after conditioning to control for the effect of individual ancestry, estimated with the locus under consideration left out (ANCOVA/IAE minus marker); and third, using the complete individual ancestry estimate for the conditioning (ANCOVA/IAE).
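The first and third of these analyses can be approximated as shown below. This is a minimal sketch, not the authors' code. It computes a one-way ANOVA F statistic of pigment level across genotype classes. It approximates the conditioned analysis by first regressing pigment on an individual ancestry estimate and then running the ANOVA on the residuals. That residualization is a simplification of a true ANCOVA, which fits the covariate and the genotype factor jointly. P-values are omitted because they would require an F-distribution routine. For the "IAE minus marker" variant, the caller would pass ancestry estimates recomputed without the locus under test. All names are hypothetical.

```python
from statistics import mean
from typing import Optional, Sequence


def anova_f(groups: Sequence[Sequence[float]]) -> float:
    """One-way ANOVA F statistic (between-group over within-group mean square)."""
    values = [v for group in groups for v in group]
    grand_mean = mean(values)
    k, n = len(groups), len(values)
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def residualize(y: Sequence[float], x: Sequence[float]) -> list[float]:
    """Residuals of y after an ordinary least-squares fit y ~ a + b*x."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    intercept = my - slope * mx
    return [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]


def pigment_f_by_genotype(pigment: Sequence[float], genotypes: Sequence[int],
                          ancestry: Optional[Sequence[float]] = None) -> float:
    """F statistic for pigment differences among genotype classes.

    With ancestry=None this is the plain ANOVA; otherwise pigment is first
    adjusted for the ancestry covariate (approximating the ANCOVA variants).
    """
    values = list(pigment) if ancestry is None else residualize(pigment, ancestry)
    by_genotype: dict[int, list[float]] = {}
    for value, genotype in zip(values, genotypes):
        by_genotype.setdefault(genotype, []).append(value)
    return anova_f(list(by_genotype.values()))
```

When genotype and pigment both track ancestry, the unadjusted F can be large even though the adjusted F is small. This is the confounding that conditioning on individual ancestry is meant to remove.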
- Eight of twenty-one markers (38%) showed significant differences (p < 0.05) among the three genotypes, including two of the four candidate gene markers (OCA2 and TYR).
- "Marker" indicates the Ancestry Informative Marker used in the test. Markers shown in bold and italics are candidate genes for pigmentation (viz., OCA2, MC1R, TYR).
- "Marker" indicates the Ancestry Informative Marker used in the test. Markers shown in bold and italics are in or near candidate genes for pigmentation (viz., TYR-192, and CYP19E2 near MYO5A).
- Table 4 shows the results of genotyping and statistical analysis that demonstrate several different but important points (the sequences for each of the AIMs in Table 4 can be found by reference to Table 6, using the marker number).
- Table 4: Pair-wise population comparisons of SNP distinction. [Column headers: pair-wise comparisons among seven population groups (AF, CT, EA, SA, ME, PI, AI), followed by the marker identifier.]
- The AIM unique identifier is shown in the last column.
- Cells with numbers in bold indicate good δ values; shaded cells with numbers in bold represent extremely high δ values.
- Table 4 was derived from screening several hundred candidate AIMs electronically selected from the public databases, thus demonstrating that only a minority of the candidate AIMs from the public databases are real AIMs.
- The public SNP databases were electronically screened to find good candidate AIMs (since frequency data is provided for three "racial" groups, though the level of admixture for these groups is not known).
- A sampling of the data for about 70 AIMs that passed the screening process, out of 175 candidate AIMs screened, is shown in Table 4.
- The delta (δ) value is a measure of how well the sequence at a polymorphism enables one to predict membership in one or the other group, i.e., how distinct the two populations are with regard to the sequence at this polymorphism. δ values are shown for 69 of the 175 candidate AIMs; the other 105 had δ values of 0 for each pair-wise population comparison and, therefore, were not true AIMs.
- AIM 1068 in Table 4 is representative of this type of failure (zeros across all pair-wise comparisons; some AIMs with zeros across the population pairs are present in Table 4 because they are informative for populations not shown in this particular table). This result confirms that most of the candidate AIMs culled from the public database are not true AIMs and highlights the value of the present invention.
- Whereas the frequency of true AIMs in a random collection of SNPs would be about 5%, the frequency of true AIMs in a culled set of candidate AIMs is about 50% and, after proceeding as disclosed herein, the frequency of true AIMs in a collection of SNPs is 100%.
- The panel of SNPs in Table 4 provides a well-balanced mix of AIMs with resolution power for each of the possible pair-wise comparisons among 7 population groups, and this panel would constitute a good test for ancestry proportions.
- Data for South Asians, Middle Easterners and Pacific Islanders do not exist in the public databases and, therefore, were generated for these studies. By comparison, someone developing a test for ancestral proportions in 7 dimensions by simply sequencing candidate AIMs haphazardly selected from the public SNP databases (i.e., without selection through data production) would need to compile a battery of thousands of SNPs to obtain a panel such as that in Table 4. This is because certain pairs of populations are difficult to resolve (e.g., South Asians and Europeans, which constitute a larger IndoEuropean group united by a common language base).
- The percentage of ancestry is given after each group: EUR (IndoEuropean), NAM (Native American), EAS (East Asian/Pacific Islander), AFR (African).
- The self-reported race is shown for each individual (SELF), their mother (M), father (F), maternal grandmother (MGM), maternal grandfather (MGF), paternal grandmother (PGM), and paternal grandfather (PGF), together with the individual's country of birth.
- The public databases employed a small number of samples for each of the three groups (African-Americans, Europeans and Asians). Thus, the actual allele frequencies for the claimed SNPs are uncertain from the public databases and were determined accurately only in the present work. Further, the use of African-Americans as a parental group is faulty since, as disclosed herein, they are an admixed population (between Africans and Europeans). The best way to find SNP markers that are useful for the present methods is to genotype a number of samples from the major BGA groups of the world for all SNPs with apparently different minor allele frequencies between at least two of the groups, calculate the δ values, and rank them.
- Although each child receives one chromosome of each pair from the mother and one from the father, the children differ from one another. The mother has chromosomes of mostly Native American descent, though some have European and African flavor as well, and the father has chromosomes of mostly European flavor, though some are of Native American flavor as well.
- Because of independent assortment, and because some of the mother's chromosome pairs have one member with European flavor, some of the children will receive the "European-flavored chromosome" and other children will not.
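The effect of independent assortment on siblings can be illustrated with a small simulation. This sketch simplifies under stated assumptions. Each parent is modeled as 22 autosome pairs, and each homolog is labeled with a single ancestral "flavor". Recombination and differences in chromosome length are ignored. The parental profiles are hypothetical values chosen to echo the example above (a mostly Native American mother and a mostly European father).

```python
import random

# Hypothetical parents: 22 autosome pairs, each homolog tagged with one ancestry label.
MOTHER = [("NAM", "NAM")] * 18 + [("NAM", "EUR")] * 3 + [("NAM", "AFR")]
FATHER = [("EUR", "EUR")] * 19 + [("EUR", "NAM")] * 3


def transmit(parent: list[tuple[str, str]], rng: random.Random) -> list[str]:
    """Independent assortment: one homolog chosen at random from each pair."""
    return [rng.choice(pair) for pair in parent]


def make_child(mother: list[tuple[str, str]], father: list[tuple[str, str]],
               rng: random.Random) -> list[tuple[str, str]]:
    """Child's chromosome pairs as (maternal homolog, paternal homolog)."""
    return list(zip(transmit(mother, rng), transmit(father, rng)))


def ancestry_fraction(child: list[tuple[str, str]], label: str) -> float:
    """Fraction of the child's chromosomes carrying the given label."""
    homologs = [h for pair in child for h in pair]
    return homologs.count(label) / len(homologs)


if __name__ == "__main__":
    rng = random.Random(7)
    for i in range(4):
        child = make_child(MOTHER, FATHER, rng)
        print(f"child {i}: EUR={ancestry_fraction(child, 'EUR'):.2f} "
              f"NAM={ancestry_fraction(child, 'NAM'):.2f}")
```

Running it several times shows siblings with the same parents receiving different numbers of European-flavored chromosomes.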
- BioGeographical Ancestry (BGA) is the heritable component of race. Socio-cultural and geo-political metrics for measuring human race are human, not natural, constructs. Their use in genetics research therefore makes it difficult to control for population genetic structure and may obscure important correlations between BGA and human biology. This Example provides methods and compositions to accurately measure genetic structure within individuals. The human genome was mined for candidate Ancestry Informative Markers (AIMs), which were validated on an ultra-high-throughput genotyping platform and used to establish parental population allele frequencies.
- The collection of 71 AIMs used in this Example was selected to maximize the cumulative δ value within each of the six possible pairs of the four-dimensional (sub-Saharan African, Native American, IndoEuropean and East Asian) problem, and to minimize differences in the cumulative δ value between those pairs.
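One way to pursue both goals stated above is a greedy heuristic. The goals are a high cumulative δ overall and similar cumulative δ across the six pairs. The heuristic repeatedly finds the population pair whose cumulative δ is currently lowest and adds the unused marker that contributes most to that pair. This is an illustrative sketch only. The source does not state which selection procedure was used, and the function names and data layout are hypothetical.

```python
# Hypothetical layout: marker_id -> {(population_1, population_2): delta}
Deltas = dict[str, dict[tuple[str, str], float]]


def select_balanced_panel(deltas: Deltas, size: int) -> list[str]:
    """Greedily build a panel, always strengthening the weakest population pair."""
    pairs = sorted({pair for table in deltas.values() for pair in table})
    totals = {pair: 0.0 for pair in pairs}
    chosen: list[str] = []
    remaining = set(deltas)
    while remaining and len(chosen) < size:
        weakest = min(pairs, key=lambda pair: totals[pair])
        # Among unused markers, take the one with the largest delta for the weakest pair.
        best = max(sorted(remaining), key=lambda m: deltas[m].get(weakest, 0.0))
        chosen.append(best)
        remaining.discard(best)
        for pair, d in deltas[best].items():
            totals[pair] += d
    return chosen


def cumulative_delta(deltas: Deltas, panel: list[str]) -> dict[tuple[str, str], float]:
    """Cumulative delta per population pair for a given panel."""
    totals: dict[tuple[str, str], float] = {}
    for marker in panel:
        for pair, d in deltas[marker].items():
            totals[pair] = totals.get(pair, 0.0) + d
    return totals
```

Picking markers purely by their summed δ can concentrate resolving power on pairs that are already easy to separate. The weakest-pair rule trades a little total δ for balance.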
- The algorithm inverts the population-specific allele frequencies to obtain a likelihood estimate of the proportional affiliation corresponding to a multilocus genotype, using three groups at a time (mainly for computational convenience and because 4-dimensional admixture is likely to be relatively rare).
- For example, the likelihood of 100% IndoEuropean, 0% Native American, 0% East Asian is calculated first, then the likelihood of 99% IndoEuropean, 1% Native American, 0% East Asian, and so on until all possible IndoEuropean, Native American and East Asian proportions have been considered. The process is then repeated for all possible IndoEuropean, Native American and African proportions, and for all possible Native American, African and East Asian proportions.
- The proportional combination with the maximum likelihood value is selected as the Maximum Likelihood Estimate (MLE).
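The grid search described above can be sketched as follows. This is a minimal illustration under assumptions the source does not state:

- Each AIM is biallelic, and a genotype is coded as the count (0, 1 or 2) of a reference allele.
- The admixed individual's allele frequency at a locus is the proportion-weighted average of the parental frequencies.
- Genotypes follow Hardy-Weinberg proportions, and loci are independent.
- A failed call (`None`) drops out of the likelihood.

Frequencies are clamped away from 0 and 1 to avoid taking log(0). By default every three-group combination is searched. Passing `triangles` restricts the search, for example to the three combinations named above. All names are hypothetical.

```python
import math
from itertools import combinations
from typing import Iterable, Optional, Sequence

# Hypothetical layout: population -> reference-allele frequency at each locus
Freqs = dict[str, Sequence[float]]


def log_likelihood(genotypes: Sequence[Optional[int]], freqs: Freqs,
                   proportions: dict[str, float], eps: float = 1e-6) -> float:
    """Log-likelihood of a multilocus genotype given admixture proportions."""
    total = 0.0
    for locus, g in enumerate(genotypes):
        if g is None:  # failed call: the locus contributes nothing
            continue
        p = sum(q * freqs[pop][locus] for pop, q in proportions.items())
        p = min(max(p, eps), 1 - eps)
        total += math.log(math.comb(2, g)) + g * math.log(p) + (2 - g) * math.log(1 - p)
    return total


def simplex_grid(steps: int = 100) -> Iterable[tuple[float, float, float]]:
    """All 3-way proportions in increments of 1/steps (1% when steps=100)."""
    for i in range(steps, -1, -1):
        for j in range(steps - i + 1):
            yield i / steps, j / steps, (steps - i - j) / steps


def maximum_likelihood_estimate(genotypes: Sequence[Optional[int]], freqs: Freqs,
                                triangles=None, steps: int = 100):
    """Return (log-likelihood, proportions) of the best 3-way combination."""
    if triangles is None:
        triangles = list(combinations(sorted(freqs), 3))
    best_ll, best_props = -math.inf, None
    for triangle in triangles:
        for point in simplex_grid(steps):
            props = dict(zip(triangle, point))
            ll = log_likelihood(genotypes, freqs, props)
            if ll > best_ll:
                best_ll, best_props = ll, props
    return best_ll, best_props
```

At 1% steps each triangle has 5,151 grid points, so an exhaustive search over a few triangles stays cheap even for dozens of markers.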
- AA: sub-Saharan Africans
- IE: IndoEuropeans
- EA: East Asians
- NA: Native Americans
- The 71 AIMs are shown as SEQ ID NOS:1 to 71. The top 100 candidate AIMs for the group pairs were as follows: EA by AA (SEQ ID NOS:7, 21, 23, 27, 45, 54, 59, 63, and 72 to 152); EA by IE (SEQ ID NOS:3, 8, 9, 11, 12, 33, 40, 59, 63, and 153 to 239); and IE by AA (SEQ ID NOS:1, 8, 11, 21, 24, 40, 172, and 240 to 331).
- AIMs identified by one pair-wise comparison also can be AIMs for a second pair-wise comparison (e.g., SEQ ID NO:59 was identified as an AIM for both the EA by AA and the EA by IE comparisons), although such AIMs are the exception.
- Many of the 71 AIMs are not in the list of the top 100 candidate AIMs shown for any of the pairs (but were in the top 200 candidate AIMs). Some top-ranked candidate AIMs were not used, for example, because they did not genotype well given the SNP type or the amplification parameters used for the exemplified platform, or for other reasons as disclosed herein.
- The 71 AIMs used in the exemplified panel were spread throughout 21 of the 23 chromosomes (Figure 8), with the average chromosome containing 3 AIMs (see Table 6). Each had alleles in Hardy-Weinberg equilibrium, both overall (with all four BGA groups considered together) and within each BGA group. None were found to be in linkage disequilibrium with one another.
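A per-marker Hardy-Weinberg check of the kind described above can be sketched with the classic one-degree-of-freedom chi-square goodness-of-fit test. The source does not state which HWE test was used, so this choice, the 0.05 threshold, and the function names are assumptions. In practice the check would be run on the pooled sample and again within each BGA group. An exact test is often preferred for small samples or rare alleles.

```python
def hwe_chi_square(n_aa: int, n_ab: int, n_bb: int) -> float:
    """Chi-square statistic comparing observed genotype counts with HWE expectations."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)


# Critical value of the chi-square distribution, 1 degree of freedom, alpha = 0.05.
CHI2_CRITICAL_1DF = 3.841


def consistent_with_hwe(n_aa: int, n_ab: int, n_bb: int) -> bool:
    """True if HWE is not rejected at the 5% level."""
    return hwe_chi_square(n_aa, n_ab, n_bb) < CHI2_CRITICAL_1DF
```

For example, counts of 25/50/25 match HWE exactly, while 50/0/50 (no heterozygotes) is strongly rejected.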
- The software program used individual genotypes for these AIMs with a maximum likelihood algorithm (see Example 6, Table 12; see, also, Example 1). The use of the 71 markers with this algorithm provides another example of the "BGA test".
- The BGA test was used to calculate BGA admixture proportions for the parental Native American, African and IndoEuropean samples used in the construction of the test. After the admixture proportions were calculated for each sample, they were plotted in a triangle plot, which represents the relative proportions of a 3-way mixture in two dimensions. These were the same samples that comprised the parental groups and from which the population allele frequencies were derived. They were therefore expected to exhibit relatively homogeneous BGA (i.e., low admixture). In fact, the sub-Saharan African, Native American and European parental samples all registered with relatively homogeneous BGA (i.e., they plotted towards the appropriate vertices of a BGA triangle).
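A triangle plot maps a 3-way mixture to a point in two dimensions. A minimal sketch of that mapping, using standard barycentric coordinates, is shown below. The vertex positions (0, 0), (1, 0) and (0.5, √3/2) are an arbitrary choice for illustration, and the function name is hypothetical.

```python
import math


def ternary_to_xy(a: float, b: float, c: float) -> tuple[float, float]:
    """Map 3-way proportions to a point in an equilateral triangle.

    Vertex A is at (0, 0), B at (1, 0) and C at (0.5, sqrt(3)/2). Inputs are
    normalized first, so percentages work as well as fractions.
    """
    total = a + b + c
    if total <= 0:
        raise ValueError("proportions must have a positive sum")
    b, c = b / total, c / total
    x = b + 0.5 * c            # weighted average of the vertices' x-coordinates (A contributes 0)
    y = c * math.sqrt(3) / 2   # only vertex C has a non-zero y-coordinate
    return x, y
```

A sample of pure ancestry lands on a vertex, and an even three-way mix lands at the centroid (0.5, √3/6).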
- The BGA test was next used to determine BGA proportions for 1,186 individuals of self-reported race (43 African Americans, 1,120 Caucasians, and
- BGA admixture proportions were calculated for 2,048 individuals of self-reported race. The majority BGA determined from the test was then compared blindly (in a computational sense) against each individual's self-reported majority race. A very strong concordance was observed between the major BGA group determined with the test and the self-reported majority race (Table 7). Using the test, 1252/1252 self-described European-Americans (U.S.-born Caucasians) registered with majority IndoEuropean BGA.
- The software program was designed to survey the probability space and to define the space within which the likelihood of proportional affiliation was 2-fold, 5-fold, and 10-fold less likely to be the correct answer than the MLE. These confidence contours are plotted on the triangle plot as rings around the MLE.
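A confidence contour of this kind can be read as the set of grid points whose likelihood is within a factor k of the maximum. Equivalently, their log-likelihood is at least the maximum minus ln k. The sketch below applies that rule to any mapping from candidate proportions to log-likelihoods, such as the values produced during a grid search. The function name and input layout are hypothetical.

```python
import math


def confidence_regions(log_likelihoods: dict[tuple[float, ...], float],
                       folds: tuple[int, ...] = (2, 5, 10)) -> dict[int, set[tuple[float, ...]]]:
    """For each k in folds, the points no more than k-fold less likely than the MLE."""
    best = max(log_likelihoods.values())
    return {
        k: {point for point, ll in log_likelihoods.items() if ll >= best - math.log(k)}
        for k in folds
    }
```

The regions are nested: the 10-fold region contains the 5-fold region, which contains the 2-fold region. Their outer boundaries are the rings drawn around the MLE.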
- One way to test the accuracy of the maximum likelihood algorithm for determining MLE and confidence contours was to observe whether and how these values change when certain of the AIM markers are eliminated from the analysis by replacing the genotypes for each with "failure" readings.
- The European/East Asian pair was selected for the first test (Ancestry 2.1EA/EU). The BGA group pair with the lowest number of AIMs and the lowest cumulative δ value is Native American/East Asian, so the δ values for this pair were altered in the second test, Ancestry 2.
- The average change observed in admixture proportions between ANCESTRYbyDNA™ 2.0 and Ancestry 2.1EA/EU was 1.4% (standard deviation 2.44%).
- The average change between ANCESTRYbyDNA™ 2.0 and Ancestry 2.1NA/EA was 1% (standard deviation 2.3%).
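The marker-elimination check can be sketched as follows: re-estimate ancestry after replacing selected genotypes with failure readings, then summarize how much the estimates move. To keep the example short, it simplifies to a two-way mixture with a 1%-step grid search. It assumes biallelic markers coded as reference-allele counts under Hardy-Weinberg proportions. It omits the genotype's binomial coefficient because that term does not depend on the admixture proportion. This is an illustration, not the ANCESTRYbyDNA™ implementation, and all names are hypothetical.

```python
import math
from statistics import mean, stdev
from typing import Optional, Sequence


def estimate_two_way(genotypes: Sequence[Optional[int]], freq_a: Sequence[float],
                     freq_b: Sequence[float], steps: int = 100) -> float:
    """Grid-search MLE of the proportion of population A in a two-way mixture."""
    def log_likelihood(q: float) -> float:
        total = 0.0
        for g, pa, pb in zip(genotypes, freq_a, freq_b):
            if g is None:  # failure reading: the locus is ignored
                continue
            p = min(max(q * pa + (1 - q) * pb, 1e-6), 1 - 1e-6)
            total += g * math.log(p) + (2 - g) * math.log(1 - p)
        return total
    return max((i / steps for i in range(steps + 1)), key=log_likelihood)


def drop_markers(genotypes: Sequence[Optional[int]], loci: set[int]) -> list[Optional[int]]:
    """Replace genotypes at the given loci with failure readings (None)."""
    return [None if i in loci else g for i, g in enumerate(genotypes)]


def dropout_change(samples, freq_a, freq_b, loci: set[int]) -> tuple[float, float]:
    """Mean and standard deviation (needs at least two samples) of |change| in estimate."""
    changes = [
        abs(estimate_two_way(s, freq_a, freq_b)
            - estimate_two_way(drop_markers(s, loci), freq_a, freq_b))
        for s in samples
    ]
    return mean(changes), stdev(changes)
```

Small average changes after dropping markers suggest the estimates do not hinge on any single AIM.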
- A repeatable, testable anthropological approach to defining BGA can provide a means to draw connections between BGA and heritable diseases. These connections can come through straight correlation and/or better study designs, or through more subtle means such as gene mapping methods that rely on the admixture process, e.g., MALD (mapping by admixture linkage disequilibrium).
- A 71-marker test allowed a determination of BGA proportions and their confidence intervals.
- The test enabled a determination of the relative proportionality of BGA within individuals, thus distinguishing the BGA test from other tests previously used for inferring ancestry from DNA.
- More than 2,200 tests were performed, and no result was obtained that was inconsistent with self-held notions of race.
- Previous tests of ancestry have been accurate only to the upper 90% range (Shriver et al., supra, 1997; see, also, Frudakis et al., supra, 2003, which is incorporated herein by reference).
- The enhanced performance observed with the BGA test may be because CODIS and other STRs commonly used for inferring ancestry from DNA were selected for their polymorphic complexity in the world population rather than for their δ values.
- Here, by contrast, the entire genome was systematically scanned and the best AIMs for this purpose were selected. In addition, most efforts to infer ancestry from DNA using STRs or Alu sequences have attempted to categorize, or bin, samples into single "racial" groups. For individuals of extensive admixture, such as a 50/50 mix, such a method would seem to produce a "wrong" answer as often as a "right" answer.
- Here, ancestry is determined in terms of proportional affiliation, thus ameliorating this problem.
- The BGA test is distinguishable from other tests in that it employs SNPs that cover most of the chromosomes. This pan-chromosomal coverage gives the BGA test a substantial advantage over tests using CODIS STRs, which cover only a fraction of the chromosomes. In addition, the BGA method appears to be the first that quantifies the confidence limits for its answers.
- The BGA test as exemplified is largely heuristic and divides the world into four main anthropological groups that fall largely along continental lines. Though the geographical divisions respect the anthropological history of human migrations, the use of four groups is indeed a simplification of a very complex situation and can be considered arbitrary.
- Proportional affiliation has been simplified by calculating the most likely 3-way (rather than 4-way) combination, because individuals of 4-dimensional BGA are thought to be rare, and because it is more convenient in a computational sense.
- More complex tests may be able to capture more of the extant detail of anthropological history. Even so, a crude 4-population test provides data of meaningful historical content, provided the results are interpreted strictly with respect to these divisions and to the parental samples used in the construction of the test.
- The East Asian affiliation for those individuals claiming affiliation with American Indians may be a by-product of the choice of Southern Mexico Nativos as the parental source and of the use of only a 4-group anthropological scheme. Nonetheless, the answer is not "wrong" in a scientific sense. Rather, it reports affiliations with respect to a coalescent time scale defined by the source of the parental samples and the anthropologically meaningful way in which the world was divided for this study.
- Aleuts may resemble East Asians as much as, or even more than, most Native Americans in terms of physical features. They are also indigenous to a geographical locale as close to East Asia as to temperate North America. Nevertheless, most people consider them North American Indians and, by extension, Native Americans, because their home lies east of the Bering Strait. Similar examples have been observed for certain other population groups, as discussed below. These examples illustrate the disconnect between the measurement of population affiliation using genetic markers and the geographical and social borders humans have devised to ascribe racial identity.
- African Americans were more admixed with IndoEuropean ancestry than were Caucasians with African ancestry, highlighting a difference in how Caucasians and Africans view their heritage, and invoking recollections of the "one drop rule".
- Hidden or cryptic BGA structure can arise from the process of admixture, a process that is not completely documented or quantifiable from the anthropological literature because it is based on human constructions. Such structure is of potential concern because it can create gross (or finer) structure differences between groups of study samples. Such a difference in structure would be expected to reduce the efficacy and power of large population-based study designs.
- MC1R: melanocortin-1 receptor
- The exemplified BGA test can be extended using AIMs in addition to those disclosed herein to quantify the elements of population structure relevant to this particular problem. Greater precision and objectivity than self-reporting of socio-cultural race provides will be needed to identify the elements of structure that can interfere with the design of genetics experiments.
- Genealogists collect data that largely is relevant in a geopolitical context (e.g., data relating to which countries a person's ancestors are from, what their religions were, and their last names) rather than in an anthropological context (e.g., what type of population admixture characterizes the person's family tree).
- Exogamous admixture results from an exogamous event in recent genealogical time (e.g., the last 250 years).
- For example, a Chinese great-grandparent in an otherwise homogeneous IndoEuropean family tree would produce a grandchild of IndoEuropean/East Asian admixture.
- The individuals who are 100% East Asian (Chinese) are shown with shading (Figure 11), and the admixture results for the male (square) at the bottom of the pedigree (short arrow) are of interest.
- A person with a single 100% East Asian great-grandparent and seven 100% IndoEuropean great-grandparents would be expected to have 12.5% (one-eighth) East Asian admixture.
- The expected level is actually a range around 12.5%, with values several percent above and below it possible.
- The grandparent indicated by the long arrow is about a 50%/50% East Asian/IndoEuropean mix, and her daughter, the subject's mother, is expected to be a 25%/75% East Asian/IndoEuropean mix (Figure 11).
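The expected admixture in this pedigree follows from the fact that each parent contributes, on average, half of a child's autosomal ancestry. The recursive sketch below computes that expectation. It deliberately ignores the random variation around the expectation noted above. The pedigree, the function name, and the identifiers are hypothetical, chosen to mirror the example of one 100% East Asian great-grandparent and seven 100% IndoEuropean great-grandparents.

```python
from typing import Mapping

Profile = dict[str, float]  # ancestral group -> proportion


def expected_admixture(person: str, parents: Mapping[str, tuple[str, str]],
                       founders: Mapping[str, Profile]) -> Profile:
    """Expected proportions: founders are given; everyone else averages the two parents."""
    if person in founders:
        return dict(founders[person])
    mother, father = parents[person]
    m = expected_admixture(mother, parents, founders)
    f = expected_admixture(father, parents, founders)
    return {g: (m.get(g, 0.0) + f.get(g, 0.0)) / 2 for g in sorted(set(m) | set(f))}


# Hypothetical pedigree matching the example: great-grandparents gg1..gg8.
FOUNDERS = {"gg1": {"EA": 1.0}, **{f"gg{i}": {"IE": 1.0} for i in range(2, 9)}}
PARENTS = {
    "grandmother": ("gg1", "gg2"), "gp2": ("gg3", "gg4"),
    "gp3": ("gg5", "gg6"), "gp4": ("gg7", "gg8"),
    "mother": ("grandmother", "gp2"), "father": ("gp3", "gp4"),
    "subject": ("mother", "father"),
}
```

Evaluating the subject gives the expected 12.5% East Asian. The mother comes out at 25% and the grandmother at 50%, as in the example.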
- A time scale was constructed showing when the most significant migrations occurred, and was correlated to a very large family tree.
- The tree is for a single individual, who resides at the bottom apex of a triangle graph. It is large because it goes back 60,000 years, a span over which this person has tens of thousands of ancestors.
- The time scale for the migrations applies to the large family tree as well.
- The tree was the same as that shown in the pedigree map (Figure 11), only much larger and without the lines connecting the ancestors (a spot represented each ancestor, but there were so many that it was not practical to show all of the lines connecting the spots).
- A pool of spots represented "Russian", which for purposes of this example was assumed to be an ethnicity that arose about 18,000 years ago. Additional spots represented East Asians, based on the assumption that the average Russian harbors 10% East Asian admixture. A third set of spots represented the precursors to Russians; these precursors are unknown but, for purposes of this example, were assumed to be Eastern Europeans.
- East Asians represented about 5% of the total number of ancestors for this person.
- The family trees of some people involve numerous groups that are characterized by small degrees of this type of admixture.
- Family trees like the one exemplified are polarized toward certain ethnicities. It is uncommon to see a tree with an equal distribution of each of the four BGA groups (sub-Saharan African, Native American, IndoEuropean and East Asian), because until recently, and even now to a certain extent, people have tended to have children with others like themselves.
- Most family trees are not a "mish-mash" of random affiliations, but are highly polarized as exemplified.
- Negative results carry a different meaning than positive ones for genealogists. For example, suppose circumstantial but low-quality data suggest a pure-blooded African great-grandfather, and the BGA test reveals 100% IndoEuropean. The rumor would then be discounted, taking into account the genetic law of independent assortment, which would make such a result possible, but unlikely, if the data were in fact correct. However, if a person's family is suspected to have had a Chinese great-grandfather, one cannot prove it from a 20% East Asian admixture result, since it is not possible to distinguish exogamous admixture from ancient admixture.
- The BGA test provides an ancillary tool that can help fashion a system tailored to the genealogy community. It does so by providing BGA admixture results in a manner that places equal emphasis on two sources. The first is the anthropological sources, which transmitted ancient or very old admixture to us in modern times ("old" being relative to a genealogical time frame, which encompasses the last 250-300 years). The second is exogamous admixture due to events in the family tree in the last 200 years.
- A database of several thousand BGA profiles is built from people in various locations throughout the world, such that one can query the database with a profile, plus or minus a pre-selected error range.
- A list of places in which this type of profile has commonly been found can be provided. Alternatively, for example, a color-coded map of the world can be provided, wherein the colors indicate the likely regional affiliations corresponding to the admixture profile/range.
- a map can be provided showing the places from which the person's recent ancestors could have been derived.
- a person with 10% East Asian and 90% Indo European would show high probability of ancestral derivation from China (exogamous admixture) or Russia (more ancient admixture coupled with ethnic homogeneity over the family tree).
- the genealogist can provide a map, similarly color-coded, that is derived from paper research, which is based on geopolitical rather than anthropological information.
- the two maps can then be overlaid, and Bayesian statistical calculations made that combine the information from the BGA test with that of the paper genealogy to provide a most likely estimate of recent family history.
- a person with 90% IndoEuropean and 10% East Asian BGA and a paper genealogy of Bulgarian/British/Spanish ancestors would first query the database with his or her BGA result. The person would then be provided a map showing the possible sources as East Asia (due to recent admixture), or Russia and Northern/Eastern Europe (both due to a large number of more distant ancestors from relatively isolated and admixed groups).
- the color-coding would give the probability of derivation from the regions based on the frequency with which the compatible BGA groups are found in each region, and it may be quite complex depending on the mixture type and the character of our database, which is a function of world-wide sampling.
- the person would provide (or be provided with) a separate map based on the probable Bulgarian/British/Spanish heritage documented from genealogical research, using map drawing tools we could provide. From this map, it would be apparent that the likelihood of recent, homogeneous East Asian ancestors is not high.
- a program would determine that the most likely origin of the 10% East Asian admixture is from the Bulgarian ancestors (not the British or Spanish, and not due to a Chinese grandparent, for example).
- This type of presentation allows a person to learn the most likely source of an unexpected admixture result, using prior knowledge obtained through other means such as genealogical research. This is valuable to a genealogist seeking to explain the derivation of a genetic constitution.
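The database query and map overlay described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the method of this application. It assumes the following:

- Each reference profile is a dictionary of BGA proportions tagged with a region.
- A reference profile counts as "compatible" when every group proportion lies within the pre-selected error range of the query.
- P(profile | region) is estimated as the fraction of that region's reference profiles that are compatible.
- That frequency is combined with a genealogy-derived prior over regions using Bayes' rule.

All function names, region labels and reference values below are hypothetical toy data.

```python
from collections import Counter

# The four BGA groups named in this example.
GROUPS = ("IndoEuropean", "EastAsian", "NativeAmerican", "SubSaharanAfrican")

# Hypothetical type alias: BGA group name -> admixture proportion (0..1).
Profile = dict[str, float]


def within_range(query: Profile, ref: Profile, tolerance: float) -> bool:
    """True if every BGA proportion in ref lies within +/- tolerance of the query."""
    return all(abs(query.get(g, 0.0) - ref.get(g, 0.0)) <= tolerance for g in GROUPS)


def region_likelihoods(query: Profile, database: list[tuple[str, Profile]],
                       tolerance: float) -> dict[str, float]:
    """Estimate P(profile | region) as the share of each region's reference
    profiles that fall within the error range of the query."""
    totals = Counter(region for region, _ in database)
    hits = Counter(region for region, ref in database
                   if within_range(query, ref, tolerance))
    return {region: hits[region] / totals[region] for region in totals}


def combine_with_genealogy(likelihoods: dict[str, float],
                           genealogy_prior: dict[str, float]) -> dict[str, float]:
    """Bayes' rule: posterior(region) is proportional to
    P(profile | region) * P(region | paper genealogy)."""
    unnormalized = {r: likelihoods.get(r, 0.0) * p for r, p in genealogy_prior.items()}
    total = sum(unnormalized.values())
    if total == 0:
        raise ValueError("no region is compatible with both the BGA result and the genealogy")
    return {r: v / total for r, v in unnormalized.items()}


# Hypothetical toy reference database (illustrative values, not real data).
database = [
    ("China", {"EastAsian": 1.0}),
    ("Russia", {"IndoEuropean": 0.90, "EastAsian": 0.10}),
    ("Russia", {"IndoEuropean": 0.88, "EastAsian": 0.12}),
    ("Bulgaria", {"IndoEuropean": 0.91, "EastAsian": 0.09}),
    ("Bulgaria", {"IndoEuropean": 1.0}),
    ("Britain", {"IndoEuropean": 1.0}),
    ("Spain", {"IndoEuropean": 0.97, "SubSaharanAfrican": 0.03}),
]

# The 90% IndoEuropean / 10% East Asian result from the example, +/- 5%.
query = {"IndoEuropean": 0.90, "EastAsian": 0.10}
likelihoods = region_likelihoods(query, database, tolerance=0.05)

# Paper genealogy: equal weight on the documented Bulgarian, British and Spanish lines.
genealogy_prior = {"Bulgaria": 1 / 3, "Britain": 1 / 3, "Spain": 1 / 3}
posterior = combine_with_genealogy(likelihoods, genealogy_prior)
print(posterior)  # {'Bulgaria': 1.0, 'Britain': 0.0, 'Spain': 0.0}
```

Russia matches the query best but drops out because it is absent from the paper genealogy. This mirrors the reasoning of the Bulgarian/British/Spanish example. In this sketch, regions with no compatible reference profiles receive zero probability. A fuller treatment would probably smooth these frequencies or weight matches by their distance from the query.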
- This Example demonstrates that cryptic population structure, as determined using AIMs, allows an inference as to a complex genetic trait such as iris color.
- Iris pigmentation is a complex genetic trait that has long interested geneticists, anthropologists, and the public at large, but it is not yet completely understood.
- Eumelanin (brown pigment) is a light-absorbing polymer synthesized in specialized melanocyte lysosomes called melanosomes.
- the tyrosinase (TYR) gene product catalyzes the rate-limiting hydroxylation of tyrosine to 3,4-dihydroxyphenylalanine, or DOPA, and the resulting product is oxidized to DOPA quinone to form the precursor for eumelanin synthesis.
- although TYR is centrally important for this process, pigmentation in animals is not simply a Mendelian function of TYR or any other single protein product or gene sequence.
- variable pigmentation is a function of multiple, heritable factors whose interactions appear to be quite complex (see, e.g., Akey et al., supra, 2002; Box et al., J. Invest. Dermatol. 116, 224-229, 2001).
- there appears to be only a minor dominance component for mammalian iris color determination (Brauer and Chopra, Anthropol Anz. 36:109-120, 1978), and there is minimal correlation between skin, hair and iris color within or between individuals of a given population.
- variable iris color in healthy humans is the result of the differential deposition of melanin pigment granules within a fixed number of stromal melanocytes in the iris.
- the density of granules appears to reach genetically determined levels by early childhood and usually remains constant throughout later life, though a small minority of individuals exhibit changes in color during later stages of life.
- Pedigree studies have suggested that iris color variation is a function of two loci: a single locus responsible for de-pigmentation of the iris, not affecting skin or hair, and another pleiotropic gene for reduction of pigment in all tissues (Brues, Amer. J. Phys. Anthropol. 43:387-91, 1975).
- TYR catalyzes the rate-limiting step of melanin biosynthesis, and the degree to which human hides are pigmented correlates well with the amplitude of TYR message levels. Nonetheless, the complexity of oculocutaneous albinism (OCA) phenotypes has illustrated that TYR is not the only gene involved in iris pigmentation. Most TYR-negative OCA patients are completely de-pigmented. However, dark-iris albino mice (C44H) and their human type IB OCA counterparts exhibit a lack of pigment in all tissues except for the iris (Schmidt and Beermann, Proc. Natl. Acad. Sci. USA 91:4756-4760, 1994).
- iris pigmentation defects have been ascribed to mutations in over 85 loci contributing to a variety of cellular processes in melanocytes (Ooi et al., supra, 1997). However, mouse studies have suggested that about 14 genes preferentially affect pigmentation in vertebrates (reviewed in Sturm et al., supra, 2001). They have also suggested that disparate regions of the TYR and other OCA genes are functionally distinct for determining the pigmentation in different tissues.
- Specimens for re-sequencing were obtained from the Coriell Institute in Camden, New Jersey. Specimens for genotyping came from donors of self-reported European descent, of different ages, sexes, and hair, iris and skin shades. They were collected using informed consent guidelines under IRB guidance. Donors checked a box for blue, green, hazel, brown, black or unknown/not clear iris color. Each donor also had the opportunity to indicate whether their iris color had changed over the course of their life, or whether the color of each iris was different. Individuals for whom iris color was ambiguous or had changed over the course of life were eliminated from the analysis.
- iris colors were also reported using a number from 1 to 11, where 1 is the darkest brown/black and 11 is the lightest blue, identified using a color placard.
- digital photographs of the right iris were obtained with subjects peering into one end of a box and the camera at the other end, to standardize lighting conditions and distance. From each photograph, a judge assigned the sample to a color group. Comparing the self-reported and photograph-based classifications, 86 matched. Of the 17 that did not, 6 were brown/hazel, 7 were green/hazel and 4 were blue/green discrepancies. None were gross discrepancies such as brown/green, brown/blue or hazel/blue. Such an error rate is tolerable for identifying sequences marginally associated with iris colors. Confidence in using the sequences described herein for iris color classification can be increased by obtaining digitally quantified iris colors.
- Candidate SNPs were obtained from the NCBI dbSNP database, which generally provided more candidate SNPs than it was possible to genotype.
- Human pigmentation and xenobiotic metabolism genes were examined; they were selected based on their gene identities, not their chromosomal positions. For some genes, the number of SNPs in the database was low and/or some of the SNPs were strongly associated with iris colors, warranting a deeper investigation.
- OCA2: oculocutaneous albinism II (mouse pink-eyed dilution)
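The comparison between questionnaire and photograph classifications reported above can be checked with a short tally. The sketch below assumes the 103 subjects implied by the text (86 matches plus 17 mismatches). It also assumes a hypothetical brown < hazel < green < blue ordering. That ordering was chosen only because it reproduces the text's split: brown/hazel, green/hazel and blue/green are minor discrepancies, while brown/green, brown/blue and hazel/blue are gross ones. The function names are illustrative.

```python
# Hypothetical darkest-to-lightest ordering of the four colors in the discrepancy
# analysis. Neighbouring colors are minor discrepancies; all other pairs are gross.
COLOR_ORDER = {"brown": 0, "hazel": 1, "green": 2, "blue": 3}


def is_gross(a: str, b: str) -> bool:
    """A discrepancy is gross if the two colors are not neighbours on the scale."""
    return abs(COLOR_ORDER[a] - COLOR_ORDER[b]) >= 2


def summarize(n_matched: int, discrepancies: dict[tuple[str, str], int]) -> dict[str, float]:
    """Agreement statistics for self-reported vs. photograph-judged iris color."""
    n_discordant = sum(discrepancies.values())
    n_total = n_matched + n_discordant
    n_gross = sum(n for (a, b), n in discrepancies.items() if is_gross(a, b))
    return {
        "total": n_total,
        "agreement": n_matched / n_total,
        "discordant": n_discordant,
        "gross": n_gross,
    }


# Counts reported in the text: 86 matches and 17 mismatches of three kinds.
result = summarize(86, {("brown", "hazel"): 6, ("green", "hazel"): 7, ("blue", "green"): 4})
print(f"{result['agreement']:.1%} agreement over {result['total']} subjects; "
      f"{result['gross']} gross discrepancies")
# 83.5% agreement over 103 subjects; 0 gross discrepancies
```

With real data, one would tally each (self-report, judge) pair directly from the per-subject records instead of the aggregated counts used here.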
Priority Applications (4)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CA002598455A CA2598455A1 (en) | 2005-02-18 | 2006-02-17 | Multiplex assays for inferring ancestry |
JP2007556374A JP2008532496A (en) | 2005-02-18 | 2006-02-17 | Multiplex assay for inferring ancestry |
AU2006214039A AU2006214039A1 (en) | 2005-02-18 | 2006-02-17 | Multiplex assays for inferring Ancestry |
EP06735495A EP1853730A2 (en) | 2005-02-18 | 2006-02-17 | Multiplex assays for inferring ancestry |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US65467205P | 2005-02-18 | 2005-02-18 | |
US60/654,672 | 2005-02-18 | | |
Publications (2)
Publication Number | Publication Date |
---|---|
WO2006089238A2 (en) | 2006-08-24 |
WO2006089238A3 (en) | 2007-04-26 |
Family ID: 36917132
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/US2006/005863 WO2006089238A2 (en) | 2005-02-18 | 2006-02-17 | Multiplex assays for inferring ancestry |
Country Status (5)
Country | Link |
---|---|
EP (1) | EP1853730A2 (en) |
JP (1) | JP2008532496A (en) |
AU (1) | AU2006214039A1 (en) |
CA (1) | CA2598455A1 (en) |
WO (1) | WO2006089238A2 (en) |
Cited By (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2008052344A1 (en) * | 2006-11-01 | 2008-05-08 | 0752004 B.C. Ltd. | Method and system for genetic research using genetic sampling via an interactive online network |
WO2012099890A1 (en) * | 2011-01-18 | 2012-07-26 | University Of Utah Research Foundation | Estimation of recent shared ancestry |
TWI460602B (en) * | 2008-05-16 | 2014-11-11 | Counsyl Inc | Device for universal preconception screening |
CN110211639A (en) * | 2018-02-13 | 2019-09-06 | 中国科学院北京基因组研究所 | Construction method of genetic marker reference system for population discrimination and identification and genetic marker reference system |
CN110273005A (en) * | 2019-05-25 | 2019-09-24 | 深圳市早知道科技有限公司 | Method for comparing similarity with ancient humans based on SNP typing |
US10854318B2 (en) | 2008-12-31 | 2020-12-01 | 23Andme, Inc. | Ancestry finder |
US11348692B1 (en) | 2007-03-16 | 2022-05-31 | 23Andme, Inc. | Computer implemented identification of modifiable attributes associated with phenotypic predispositions in a genetics platform |
US11514085B2 (en) | 2008-12-30 | 2022-11-29 | 23Andme, Inc. | Learning system for pangenetic-based recommendations |
US11521708B1 (en) | 2012-11-08 | 2022-12-06 | 23Andme, Inc. | Scalable pipeline for local ancestry inference |
US11531445B1 (en) | 2008-03-19 | 2022-12-20 | 23Andme, Inc. | Ancestry painting |
US11817176B2 (en) | 2020-08-13 | 2023-11-14 | 23Andme, Inc. | Ancestry composition determination |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2003048318A2 (en) * | 2001-12-03 | 2003-06-12 | Dna Print Genomics, Inc. | Methods for the identification of genetic features |
WO2004016768A2 (en) * | 2002-08-19 | 2004-02-26 | Dnaprint Genomics, Inc. | Compositions and methods for inferring ancestry |
2006
- 2006-02-17 EP EP06735495A (EP1853730A2): not active, withdrawn
- 2006-02-17 AU AU2006214039A (AU2006214039A1): not active, abandoned
- 2006-02-17 WO PCT/US2006/005863 (WO2006089238A2): active, application filing
- 2006-02-17 CA CA002598455A (CA2598455A1): not active, abandoned
- 2006-02-17 JP JP2007556374A (JP2008532496A): active, pending
Non-Patent Citations (4)
Title |
---|
BONNEN P.E. ET AL.: "Haplotypes at ATM identify coding-sequence variation and indicate a region of extensive linkage disequilibrium" AMERICAN JOURNAL OF HUMAN GENETICS, AMERICAN SOCIETY OF HUMAN GENETICS, CHICAGO, IL, US, vol. 67, no. 6, December 2000 (2000-12), pages 1437-1451, XP002210145 ISSN: 0002-9297 * |
FRUDAKIS T. ET AL.: "SEQUENCES ASSOCIATED WITH HUMAN IRIS PIGMENTATION" GENETICS, GENETICS SOCIETY OF AMERICA, AUSTIN, TX, US, vol. 165, December 2003 (2003-12), pages 2071-2083, XP008040049 ISSN: 0016-6731 * |
HOH J. ET AL.: "SELECTING SNPS IN TWO-STAGE ANALYSIS OF DISEASE ASSOCIATION DATA: A MODEL-FREE APPROACH" ANNALS OF HUMAN GENETICS, CAMBRIDGE UNIVERSITY PRESS, LONDON, GB, vol. 64, no. PT 5, 2000, pages 413-417, XP009069066 ISSN: 0003-4800 * |
SHRIVER M.D. ET AL.: "ETHNIC-AFFILIATION ESTIMATION BY USE OF POPULATION-SPECIFIC DNA MARKERS" AMERICAN JOURNAL OF HUMAN GENETICS, UNIVERSITY OF CHICAGO PRESS, CHICAGO, US, vol. 60, no. 4, April 1997 (1997-04), pages 957-964, XP001041784 ISSN: 0002-9297 cited in the application * |
Cited By (35)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2008052344A1 (en) * | 2006-11-01 | 2008-05-08 | 0752004 B.C. Ltd. | Method and system for genetic research using genetic sampling via an interactive online network |
US11621089B2 (en) | 2007-03-16 | 2023-04-04 | 23Andme, Inc. | Attribute combination discovery for predisposition determination of health conditions |
US11581096B2 (en) | 2007-03-16 | 2023-02-14 | 23Andme, Inc. | Attribute identification based on seeded learning |
US11545269B2 (en) | 2007-03-16 | 2023-01-03 | 23Andme, Inc. | Computer implemented identification of genetic similarity |
US11791054B2 (en) | 2007-03-16 | 2023-10-17 | 23Andme, Inc. | Comparison and identification of attribute similarity based on genetic markers |
US11515047B2 (en) | 2007-03-16 | 2022-11-29 | 23Andme, Inc. | Computer implemented identification of modifiable attributes associated with phenotypic predispositions in a genetics platform |
US11735323B2 (en) | 2007-03-16 | 2023-08-22 | 23Andme, Inc. | Computer implemented identification of genetic similarity |
US11515046B2 (en) | 2007-03-16 | 2022-11-29 | 23Andme, Inc. | Treatment determination and impact analysis |
US11600393B2 (en) | 2007-03-16 | 2023-03-07 | 23Andme, Inc. | Computer implemented modeling and prediction of phenotypes |
US11348692B1 (en) | 2007-03-16 | 2022-05-31 | 23Andme, Inc. | Computer implemented identification of modifiable attributes associated with phenotypic predispositions in a genetics platform |
US11348691B1 (en) | 2007-03-16 | 2022-05-31 | 23Andme, Inc. | Computer implemented predisposition prediction in a genetics platform |
US11581098B2 (en) | 2007-03-16 | 2023-02-14 | 23Andme, Inc. | Computer implemented predisposition prediction in a genetics platform |
US11482340B1 (en) | 2007-03-16 | 2022-10-25 | 23Andme, Inc. | Attribute combination discovery for predisposition determination of health conditions |
US11495360B2 (en) | 2007-03-16 | 2022-11-08 | 23Andme, Inc. | Computer implemented identification of treatments for predicted predispositions with clinician assistance |
US11531445B1 (en) | 2008-03-19 | 2022-12-20 | 23Andme, Inc. | Ancestry painting |
US11625139B2 (en) | 2008-03-19 | 2023-04-11 | 23Andme, Inc. | Ancestry painting |
US11803777B2 (en) | 2008-03-19 | 2023-10-31 | 23Andme, Inc. | Ancestry painting |
TWI460602B (en) * | 2008-05-16 | 2014-11-11 | Counsyl Inc | Device for universal preconception screening |
US11514085B2 (en) | 2008-12-30 | 2022-11-29 | 23Andme, Inc. | Learning system for pangenetic-based recommendations |
US11322227B2 (en) | 2008-12-31 | 2022-05-03 | 23Andme, Inc. | Finding relatives in a database |
US11776662B2 (en) | 2008-12-31 | 2023-10-03 | 23Andme, Inc. | Finding relatives in a database |
US11508461B2 (en) | 2008-12-31 | 2022-11-22 | 23Andme, Inc. | Finding relatives in a database |
US11468971B2 (en) | 2008-12-31 | 2022-10-11 | 23Andme, Inc. | Ancestry finder |
US11935628B2 (en) | 2008-12-31 | 2024-03-19 | 23Andme, Inc. | Finding relatives in a database |
US10854318B2 (en) | 2008-12-31 | 2020-12-01 | 23Andme, Inc. | Ancestry finder |
US11049589B2 (en) | 2008-12-31 | 2021-06-29 | 23Andme, Inc. | Finding relatives in a database |
US11657902B2 (en) | 2008-12-31 | 2023-05-23 | 23Andme, Inc. | Finding relatives in a database |
US11031101B2 (en) | 2008-12-31 | 2021-06-08 | 23Andme, Inc. | Finding relatives in a database |
WO2012099890A1 (en) * | 2011-01-18 | 2012-07-26 | University Of Utah Research Foundation | Estimation of recent shared ancestry |
US20230402132A1 (en) * | 2012-11-08 | 2023-12-14 | 23Andme, Inc. | Error Correction in Ancestry Classification |
US11521708B1 (en) | 2012-11-08 | 2022-12-06 | 23Andme, Inc. | Scalable pipeline for local ancestry inference |
CN110211639B (en) * | 2018-02-13 | 2023-07-04 | 中国科学院北京基因组研究所 | Construction method of genetic marker reference system for population discrimination and identification and genetic marker reference system |
CN110211639A (en) * | 2018-02-13 | 2019-09-06 | 中国科学院北京基因组研究所 | Construction method of genetic marker reference system for population discrimination and identification and genetic marker reference system |
CN110273005A (en) * | 2019-05-25 | 2019-09-24 | 深圳市早知道科技有限公司 | Method for comparing similarity with ancient humans based on SNP typing |
US11817176B2 (en) | 2020-08-13 | 2023-11-14 | 23Andme, Inc. | Ancestry composition determination |
Also Published As
Publication number | Publication date |
---|---|
AU2006214039A1 (en) | 2006-08-24 |
WO2006089238A3 (en) | 2007-04-26 |
CA2598455A1 (en) | 2006-08-24 |
JP2008532496A (en) | 2008-08-21 |
EP1853730A2 (en) | 2007-11-14 |
Similar Documents
Publication | Title | Publication Date |
---|---|---|
US20070037182A1 (en) | Multiplex assays for inferring ancestry | |
US20040229231A1 (en) | Compositions and methods for inferring ancestry | |
EP1853730A2 (en) | Multiplex assays for inferring ancestry | |
US20200208221A1 (en) | Methods for simultaneous amplification of target loci | |
US20200157629A1 (en) | Methods for simultaneous amplification of target loci | |
Lao et al. | Evaluating self‐declared ancestry of US Americans with autosomal, Y‐chromosomal and mitochondrial DNA | |
EP1873257A2 (en) | Compositions and methods for the inference of pigmentation traits | |
EP2175037B1 (en) | Method for analyzing D4Z4 tandem repeat arrays of nucleic acid and kit therefore | |
CN110628891B (en) | Method for screening embryo genetic abnormality | |
CN113614246A (en) | Methods and compositions for identifying tumor models | |
Craig et al. | Applications of whole-genome high-density SNP genotyping | |
CN102439167A (en) | Method for determining DNA copy number by competitive pcr | |
AU2009225275A1 (en) | Compositions and methods for inferring ancestry | |
WO2003045227A2 (en) | Single nucleotide polymorphisms and combinations thereof predictive for paclitaxel responsiveness | |
CN101671736B (en) | Gene detection kit used for detecting cell chimerism or individual recognition | |
CN115948532A (en) | SMA detection kit based on digital PCR technology | |
US20040023275A1 (en) | Methods for genomic analysis | |
Parfenchyk et al. | The Theoretical Framework for the Panels of DNA Markers Formation in the Forensic Determination of an Individual Ancestral Origin | |
Chaitanya et al. | Applications of Genomics | |
Nasedkina et al. | Determination of phenotypic characteristics of an individual on the basis of analysis of genetic markers using biological microchips | |
Hurles | Mutation and variability of the human Y chromosome | |
Wilson et al. | Human population structure and demographic history using genetic markers | |
CN117222751A (en) | Gene polymorphism marker for judging skin color and application thereof | |
CN115362268A (en) | Gene polymorphism marker for judging pigmentation skin type and application thereof | |
CN116218984A (en) | Nucleic acid combination, kit and detection method for detecting antidepressant drug genes based on time-of-flight nucleic acid mass spectrometry technology |
Legal Events
Code | Title | Description |
---|---|---|
ENP | Entry into the national phase in: | Ref document number: 2598455; Country of ref document: CA |
WWE | WIPO information: entry into national phase | Ref document number: 2007556374; Country of ref document: JP |
WWE | WIPO information: entry into national phase | Ref document number: 2006214039; Country of ref document: AU |
NENP | Non-entry into the national phase in: | Ref country code: DE |
ENP | Entry into the national phase in: | Ref document number: 2006214039; Country of ref document: AU; Date of ref document: 20060217; Kind code of ref document: A |
WWE | WIPO information: entry into national phase | Ref document number: 2006735495; Country of ref document: EP |
121 | EP: the EPO has been informed by WIPO that EP was designated in this application | |