JP2012050432A

JP2012050432A - Composition and method for inferring ancestry

Info

Publication number: JP2012050432A
Application number: JP2011178621A
Authority: JP
Inventors: Tony N. Frudakis; Mark D. Shriver
Original assignee: DNAPrint Genomics Inc
Current assignee: DNAPrint Genomics Inc
Priority date: 2002-08-19
Filing date: 2011-08-17
Publication date: 2012-03-15
Also published as: WO2004016768A2; WO2004016768A3; AU2003265572A1; JP2006514553A; EP1578944A4; AU2009225275A1; EP1578944A2; CA2496155A1

Abstract

PROBLEM TO BE SOLVED: To provide ancestry informative markers (AIMs) comprising single nucleotide polymorphisms, and a method of using a panel of AIMs to draw an inference as to a trait of an individual. SOLUTION: The method for inferring a trait of an individual includes the steps of: (a) contacting a nucleic acid-containing sample of a test individual with oligonucleotide probes that can detect the nucleotide occurrences of single nucleotide polymorphisms (SNPs) of a panel of at least about ten AIMs indicative of a population structure that correlates with the trait; and (b) identifying, with a predetermined level of confidence, a population structure that correlates with the nucleotide occurrences of the AIMs in the test individual, wherein the population structure correlates with the trait, thereby inferring, with a predetermined level of confidence, the trait of the individual. The trait can be biogeographical ancestry, a pigmentation trait, drug responsiveness, or disease susceptibility.

Description

Field of the Invention
The present invention relates generally to the identification of genetic markers that predict an individual's biogeographical ancestry, and more specifically to combinations of single nucleotide polymorphisms useful as ancestry informative markers (AIMs) that permit inferences to be drawn about a trait of an individual, to algorithms for identifying such AIMs, and to methods of using such AIMs to infer traits of an individual, including the individual's ancestry, the individual's responsiveness to a drug, and the individual's predisposition to a disease.

Background Information
The majority (80-90%) of genetic variation among human individuals is interindividual, and only a relatively small proportion (10-20%) is due to population differences (Nei, Molecular Population Genetics (Columbia University Press, New York) 1987; Cavalli-Sforza et al., The History and Geography of Human Genes (Princeton University Press, Princeton, NJ) 1994; Deka et al., Electrophoresis 16:1659-1664, 1995; Rosenberg et al., Science 298:2381-2385, 2002; Akey et al., BioTechniques 30:348-367, 2001; Akey et al., Hum. Genet. 108:516-520, 2002). Most populations share alleles, and the alleles that are most frequent in one population are also frequent in others. There are very few classical markers (e.g., blood group, serum protein and immunological markers) or DNA genetic markers that are either population-specific or have large frequency differences among geographically and ethnically defined populations (Roychoudhury and Nei, Human Polymorphic Genes: World Distribution (Oxford University Press, New York) 1988; Dean et al., Amer. J. Hum. Genet. 55:788-808, 1994; Cavalli-Sforza et al., supra, 1994; Akey et al., supra, 2001, 2002). Despite this apparent lack of unique genetic markers, there are marked physical and physiological differences among human populations, which presumably reflect genetic adaptation to unique ecological conditions, random genetic drift, and sexual selection. In contemporary populations, these differences are evident in morphological differences between ethnic groups, as well as in differences in drug responsiveness and in susceptibility and resistance to disease.

At a fundamental level, human population structure can be expressed in terms of BioGeographical Ancestry (BGA), which is the heritable component of "race" or heritage and is relevant at any scale of determination. For example, at a coarse level BGA can be determined for two groups (e.g., European versus other); at a finer level it can refer to "race" in terms of four groups such as Indo-European, East Asian, sub-Saharan African and Native American; at a still finer level it can refer to ethnicity within the European group (e.g., Mediterranean or Scandinavian); and at an even finer level it can refer to a group of families within an ethnic group, such as the group of O'Reilly descendants derived from a set of common ancestors within the Irish group. Measurement of BGA is relevant to almost any type of genetic or epidemiological study design. For example, BGA is an important component of variability in drug response (Burroughs et al., J. Natl. Med. Assoc. 94:1-26, 2002). The reason for this relationship is that genetic drift, geographic and/or reproductive isolation, and local selective pressures shaped our ancestors' allele frequencies for compatibility with the alkaloids, tannins (self-defense chemicals) and other xenobiotics found in native foods. Most drugs are derived from such chemicals, and it is therefore no coincidence that the families of enzymes that allow humans to detoxify drugs are found at different frequencies in different populations. This scenario is not unique to drug responsiveness; many other parts of the genome that are unrelated to drug responsiveness are subject to these same types of pressure.

Researchers have generally been concerned with identifying genetic variants that cause disease (so-called "phenotypically active" loci) rather than variants that are merely correlated with disease. As such, whatever trait is examined, and for most study designs involving unrelated individuals, it has been considered important to control for population structure in order to avoid identifying markers of structure that correlate with trait value in a given sample rather than markers that are in linkage disequilibrium (LD) with phenotypically active loci (Risch et al., Genome Biology 3:1-12, 2001; Wang et al., Amer. J. Hum. Genet. 71:1227-1234, 2002; Burroughs et al., supra, 2002; Rao and Chakraborty, Amer. J. Hum. Genet. 26:444-453, 1974). There are two sources of population structure in a sample collection: 1) sampling effects, which can produce structure even when sampling is from a homogeneous population, and 2) natural human demography. The first source of population structure is a nuisance for genetic studies, and associations found because of this type of structure are generally regarded as artifacts of the collection process rather than as reflections of human demography. Most geneticists generally regard the second type of structure as a nuisance as well. As such, associations identified as being due to population structure have been regarded as false discoveries or artifacts and have generally been discarded; only findings due to true linkage or LD have been reported, because such markers are considered to be linked to biologically relevant genes.

Much effort has been directed toward quantifying both types of population structure (above) in groups of individuals. Such methods essentially measure departures from the expected level of heterozygosity within a group of samples as an indicator of structure (although none of these methods can read within-individual structure). Many common diseases exhibit locus and/or allelic heterogeneity as a function of BGA, and many authors have suggested that inadequate attention to population structure during the study design stage has produced at least some of the so-called "false positive" results implicated in the rash of irreproducible common disease/common variant results obtained to date (Terwilliger et al., Curr. Opin. Genet. Devel. 12:726-734, 2002). Several tests are available to control for the effects of population structure (Cockerham, Evolution 23:72-83, 1969; Cockerham, Genetics 74:679-700, 1973; Weir and Cockerham, Evolution 38:1358-1370, 1984; Long, Genetics 112:629-647, 1986; Excoffier et al., Genetics 131:343-359, 1992). These methods fall into two main categories: genomic control methods (Devlin and Roeder, Biometrics 55:997-1004, 1999) and structured association (SA) methods (Pritchard and Donnelly, Theor. Popul. Biol. 60:227-237, 2001). Both require genotyping a panel of unlinked markers to estimate and correct for the effects of genetic structure, but they are usually applied to sample collections. If a pool of samples fails such a test, however, it is usually not clear which samples should be removed to correct the problem. An equally vexing problem is that these methods are often applied to study samples after expensive data have been generated, which creates a problem of circular logic in addition to an economic one: the methods are usually used to derive information about population structure from features of the very data in which associations are being sought.
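The idea of using a departure from expected heterozygosity as an indicator of structure can be illustrated with a minimal sketch. It is not one of the cited tests; it simply compares the observed heterozygote fraction at one biallelic marker with the Hardy-Weinberg expectation and reports F = 1 - Ho/He. The function name and the two sample collections are hypothetical.

```python
from collections import Counter

def heterozygosity_deficit(genotypes):
    """Return (observed_het, expected_het, F) for one marker.

    genotypes: list of 2-character strings such as "AA", "AG", "GG".
    F = 1 - Ho/He. F near 0 fits Hardy-Weinberg proportions; F > 0 means
    fewer heterozygotes than expected, as happens when distinct
    subpopulations are pooled into one sample collection.
    """
    n = len(genotypes)
    allele_counts = Counter(allele for g in genotypes for allele in g)
    total = 2 * n
    # Expected heterozygosity under Hardy-Weinberg: 1 - sum of squared frequencies.
    expected = 1.0 - sum((c / total) ** 2 for c in allele_counts.values())
    observed = sum(1 for g in genotypes if g[0] != g[1]) / n
    f = 1.0 - observed / expected if expected > 0 else 0.0
    return observed, expected, f

# Hypothetical collection: two subpopulations fixed for different alleles, pooled.
pooled = ["AA"] * 50 + ["GG"] * 50
# Hypothetical collection in Hardy-Weinberg proportions for allele frequency 0.5.
homogeneous = ["AA"] * 25 + ["AG"] * 50 + ["GG"] * 25

print(heterozygosity_deficit(pooled))
print(heterozygosity_deficit(homogeneous))
```

As the text notes, a statistic of this kind describes the collection as a whole: a large F flags structure but does not say which samples are responsible for it.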

Rather than using structure or admixture as statistical fuel, it is generally desirable to qualify samples on the basis of gross population stratification such as BGA, so that cases and controls can be matched and homogenized in composition to minimize the effects of population structure from the outset of an attempt to identify phenotypically active loci. For example, ensuring equal proportions or "racial homogeneity" within and among cases and controls is not uncommon in the practice of case-control methods. For most research purposes, however, the subjective methods used to measure population affiliation are unsatisfactory. Population affiliation is currently measured using biogeographical questionnaires, which provide little knowledge of population structure beyond the obvious, so that only the most basic relationships between population structure and drug response can be revealed and/or controlled. Consistency is an important issue for the self-reporting of race on questionnaires and in what the Food and Drug Administration plans to have submitted during the clinical trial design process. With such subjective and imprecise methods of data collection, however, consistency can be a difficult goal to achieve.

Consistency can be better addressed by replacing the subjective nature of the exercise with an objective, reproducible scientific method than by reformulating which questions are asked on a questionnaire. Standardization and objectivity are of utmost importance for the collection of race data, because its measurement can be as subjective as that of any other human attribute. Self-reporting of race is not as trivial an exercise as self-reporting of sex; many people do not know their race, or are of sufficient admixture that they struggle to classify themselves into a single group. Such scenarios are especially common in countries such as the United States, where many cultures have been brought together by immigration. For example, a woman of predominantly sub-Saharan African ancestry who grew up in Puerto Rico may describe herself as Hispanic. Although she identifies socioculturally as Hispanic, her xenobiotic metabolism and drug target polymorphisms are more likely to be related to those shared among other sub-Saharan Africans. By using non-anthropological designations that describe sociocultural constructs of society, current guidelines for taking race information into account during the study design process can lead to poor predictive power and false positive results. Where a person grew up and lives, and the cultural or sociological practices that person observes, may influence how the person responds to a drug or the person's tendency to develop a disease. Thus non-biological metrics are needed, but the evidence suggests that BGA is also influential and therefore needs to be measured in a scientifically accurate and reproducible manner.

Genetic markers present in human DNA offer the best opportunity to measure BGA reliably on a per-individual basis, and it has long been recognized that such an approach is possible. For example, Reed (Science 244:575-576, 1973) and Neel (Mutat. Res. 26:319-328, 1974) referred to such markers as "private" and used them to estimate mutation rates. Reed (supra, 1973) used the term "ideal" (with respect to the use of markers in estimating individual ancestry) to describe hypothetical genetic marker loci at which different alleles are fixed in different populations. Chakraborty et al. (Ethnic. Dis. 1:245-256, 1991) referred to variants found in only one population as "unique alleles" and showed how allele frequencies can be inverted to provide likelihood estimates of population or BGA affiliation. The "unique alleles" most useful for inferring BGA are those that also have large differences in allele frequency among populations (Reed, supra, 1973; Chakraborty et al., Genetics 130:231-243, 1992; Stephens et al., Amer. J. Hum. Genet. 55:809-824, 1994). Such markers were formerly referred to as "population-specific alleles" (PSAs; Shriver et al., Amer. J. Hum. Genet. 60:957-964, 1997; Parra et al., Amer. J. Hum. Genet. 63:1839-1851, 1998) and are now referred to as "ancestry informative markers" (AIMs; Shriver et al., Hum. Genet. 112:387-399, 2003; Frudakis et al., J. Forens. Sci. 48(4):771-782, 2003).
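The preference for markers with large allele frequency differences among populations can be sketched as a simple ranking by the absolute frequency difference, often written δ. This is an illustrative assumption about how candidates might be screened, not the selection algorithm of the disclosure; the marker names and frequencies below are hypothetical.

```python
def rank_by_delta(freqs):
    """Rank candidate markers by allele frequency difference.

    freqs: {marker_name: (freq_in_group_1, freq_in_group_2)} for one
    reference allele per marker. Returns a list of (marker_name, delta)
    sorted by decreasing delta = |p1 - p2|.
    """
    deltas = [(name, abs(p1 - p2)) for name, (p1, p2) in freqs.items()]
    return sorted(deltas, key=lambda item: item[1], reverse=True)

# Hypothetical reference-allele frequencies in two parental groups.
candidates = {
    "snp_a": (0.95, 0.05),  # nearly fixed for different alleles
    "snp_b": (0.60, 0.50),  # weakly informative
    "snp_c": (0.10, 0.70),
}

for name, delta in rank_by_delta(candidates):
    print(f"{name}: delta = {delta:.2f}")
```

A marker fixed for different alleles in the two groups (Reed's "ideal" case) would have δ = 1; markers near the bottom of the ranking contribute little to an ancestry estimate.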

Within the field of forensics, statistical methods that use simple tandem repeats (STRs) to infer the highest-level ancestry of a given individual (the majority BGA, in proportional ancestry notation) can be quite robust for estimating majority BGA affiliation. STR testing can effectively determine majority ancestral origin in most cases, but a number of samples (5-10%) do not meet the classification criteria and remain ambiguous. Aside from sampling error caused by rare alleles, and the fact that STRs were not selected from the genome for their ability to resolve population affiliation (i.e., STR allele frequency differentials are not necessarily, or optimally, informative for this purpose), the main reason for the high level of ambiguity is likely to be admixture, which is clearly a factor in the genetic variation of many human populations (Parra et al., supra, 1998; Cavalli-Sforza and Bodmer, The Genetics of Human Populations (Dover Publications, NY; see pages 387-507) 1999; Rosenberg et al., supra, 2002). For a given study design, whether self-reported information or DNA marker testing is used, and whether the problem is pharmacogenomic or forensic, binning patients into a single group sacrifices subtle but not insignificant information about population structure and substructure; for example, there is no allowance for assigning to a group a person of 50% African and 50% European affiliation. Unfortunately, markers and methods that allow accurate inference of BGA for more than just two groups at a time for an individual have not yet been described. Thus, a need exists for robust markers that are useful for inferring BGA, and for methods of identifying and using such markers. The present invention satisfies this need and provides additional advantages.

The present invention provides methods and compositions for measuring an individual's population structure in a manner that allows inferences to be drawn, with a desired predetermined level of confidence, about, for example, the individual's ancestry, pigmentation traits, drug responsiveness, and disease susceptibility, as disclosed herein. By way of example, the methods and compositions were used in a forensic setting to examine DNA samples obtained at the crime scenes of serial murders/rapes in Louisiana. Based on psychological profiling, the police believed that the serial killer was a Caucasian male and tested the DNA of more than 1,000 Caucasian men without finding a match. The police then turned to the inventors, who, using the compositions and methods of the invention, determined that the individual who committed the crimes was African American and, more specifically, had a proportional, confidence-qualified ancestry of 85% sub-Saharan African and 15% Native American. Based on this result and additional results disclosed herein, the police were further advised that the average African American has 20% Indo-European ancestry, that greater levels of Indo-European ancestry correlate with lighter skin tone, and that the person who committed the crimes was therefore likely to be an African American of average to darker-than-average skin tone. Within two months of redirecting their efforts on the basis of this information, the police arrested an African American man of average skin tone (for an African American); DNA testing determined that he was the person whose DNA had been found at the crime scenes.

Accordingly, the present invention relates to a method of inferring a trait of an individual with a predetermined level of confidence. Such a method can be performed, for example, by contacting a sample comprising nucleic acid molecules of a test individual with hybridizing oligonucleotides, wherein the hybridizing oligonucleotides can detect nucleotide occurrences of single nucleotide polymorphisms (SNPs) of a panel of at least about ten ancestry informative markers (AIMs) indicative of a population structure that correlates with the trait, and wherein the contacting is performed under conditions suitable for detecting the nucleotide occurrences of the AIMs of the individual by the hybridizing oligonucleotides; and identifying, with a predetermined level of confidence, a population structure that correlates with the nucleotide occurrences of the AIMs in the individual, wherein the population structure correlates with the trait. As disclosed herein, a panel of at least about ten AIMs (e.g., 8, 9, 10, 11, 12, 13, 14, 15, 20, 25, 30 or more) is examined in practicing the methods of the invention. In general, the greater the number of AIMs examined, the higher the level of confidence of the inference made using the method.
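Why confidence tends to rise with panel size can be illustrated with a small sketch that is not taken from the disclosure. It assumes two hypothetical parental groups, unlinked markers, Hardy-Weinberg genotype proportions, and allele frequencies strictly between 0 and 1, and it computes the posterior probability of group A by Bayes' rule for panels of increasing size. For simplicity every marker is given the same hypothetical frequencies (0.6 in group A, 0.4 in group B), and the test individual is assumed to carry two copies of the reference allele at each marker.

```python
import math

def genotype_likelihood(p, copies):
    """P(genotype | group) under Hardy-Weinberg proportions.

    p: group frequency of the reference allele (0 < p < 1).
    copies: number of reference alleles carried (0, 1 or 2).
    """
    q = 1.0 - p
    return (q * q, 2.0 * p * q, p * p)[copies]

def posterior_group_a(genotypes, freqs_a, freqs_b, prior_a=0.5):
    """Posterior probability of group A given genotypes at unlinked markers."""
    log_odds = math.log(prior_a / (1.0 - prior_a))
    for copies, pa, pb in zip(genotypes, freqs_a, freqs_b):
        # Each unlinked marker adds its own log-likelihood ratio.
        log_odds += math.log(genotype_likelihood(pa, copies))
        log_odds -= math.log(genotype_likelihood(pb, copies))
    return 1.0 / (1.0 + math.exp(-log_odds))

def posteriors_by_panel_size(sizes, freq_a=0.6, freq_b=0.4):
    """Posterior for group A when the individual is homozygous for the
    reference allele at every one of n identical hypothetical markers."""
    return {n: posterior_group_a([2] * n, [freq_a] * n, [freq_b] * n)
            for n in sizes}

result = posteriors_by_panel_size([1, 5, 10, 30])
for n, post in result.items():
    print(f"{n:2d} markers: posterior for group A = {post:.6f}")
```

In real data some genotypes favor one group and some the other, so the posterior does not climb monotonically with every added marker; the sketch only shows how consistent evidence from independent markers accumulates.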

A trait about which an inference is made by the methods of the invention can be any trait, including a trait for which an ethnic predisposition is known or believed to exist, and a trait for which an ethnic predisposition is known not to exist or for which it is unknown or unclear whether an ethnic predisposition exists. In one embodiment, the trait is biogeographical ancestry (BGA). In one aspect, the panel of AIMs used to examine BGA includes the AIMs set forth in SEQ ID NOS: 1 to 71. In another aspect, the panel includes the AIMs set forth in SEQ ID NOS: 7, 21, 23, 27, 45, 54, 59, 63 and 72 to 152; in SEQ ID NOS: 3, 8, 9, 11, 12, 33, 40, 59, 63 and 153 to 239; or in SEQ ID NOS: 1, 8, 11, 21, 24, 40, 172 and 240 to 331; in addition, the panel can include a combination of the AIMs set forth in SEQ ID NOS: 1 to 331. As disclosed herein, AIMs useful in practicing the methods of the invention can, but need not, be linked to a gene associated with the trait (i.e., a gene known to be involved in the trait phenotype), and generally are not in linkage disequilibrium with that gene (or locus). For example, an AIM useful for inferring an individual's drug responsiveness by a method of the invention need not be linked to a gene involved in responsiveness to the drug (e.g., a drug metabolism gene or drug transport gene such as a cytochrome P450 gene or a P-glycoprotein gene). Similarly, an AIM useful for inferring a pigmentation trait of an individual by a method of the invention need not be linked to a gene involved in pigmentation (e.g., a tyrosinase gene or a melanocortin-1 receptor gene). Thus, in one aspect, at least one (e.g., 1, 2, 3, 4 or 5) AIM of the panel is not linked to a gene involved in the trait about which the inference is to be made.

When BGA is the trait about which an inference is to be made, the individual to be examined can have an ancestry that includes any one or a combination of ancestral groups, including, for example, sub-Saharan African, Native American, Indo-European, East Asian, Middle Eastern or Pacific Islander ancestry, or a combination including one or more of these ancestries. As such, the proportional ancestry of an individual can comprise a single ancestry (e.g., 100% Indo-European ancestry) or any proportions of two, three, four or more ancestral groups. A test individual (or an individual of known proportional ancestry) can, for example, have proportions of sub-Saharan African ancestry and two other ancestral groups; of sub-Saharan African and Indo-European ancestral groups and a third ancestry; of Native American and Indo-European ancestral groups and a third ancestry; of East Asian and Native American ancestral groups and a third ancestry; or of Indo-European and East Asian ancestral groups and a third ancestry; or can have proportions of Native American, East Asian and Indo-European ancestral groups, or of sub-Saharan African, Native American and Indo-European ancestral groups, and so on.
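Proportional ancestry across three groups can be illustrated with a minimal maximum-likelihood sketch. It is an assumption-laden illustration, not the algorithm of the disclosure: the individual's expected allele frequency at each marker is modeled as the ancestry-weighted average of the parental group frequencies, genotypes follow Hardy-Weinberg proportions, markers are unlinked, and the proportions are found by an exhaustive search on a 5% grid. All function names, frequencies and genotypes are hypothetical.

```python
import math
from itertools import product

def log_likelihood(mix, genotypes, freqs):
    """Log-likelihood of ancestry proportions `mix` (summing to 1).

    genotypes: copies (0, 1 or 2) of the reference allele at each marker.
    freqs: per-marker tuple of reference-allele frequencies, one per group.
    """
    total = 0.0
    for copies, group_freqs in zip(genotypes, freqs):
        p = sum(m * f for m, f in zip(mix, group_freqs))
        q = 1.0 - p
        likelihood = (q * q, 2.0 * p * q, p * p)[copies]
        total += math.log(max(likelihood, 1e-300))  # guard against log(0)
    return total

def estimate_ancestry(genotypes, freqs, step=0.05):
    """Grid-search the ancestry proportions that maximize the likelihood."""
    n_groups = len(freqs[0])
    steps = round(1.0 / step)
    best_mix, best_ll = None, -math.inf
    for combo in product(range(steps + 1), repeat=n_groups - 1):
        if sum(combo) > steps:
            continue  # proportions may not exceed 100% in total
        counts = combo + (steps - sum(combo),)
        mix = tuple(c / steps for c in counts)
        ll = log_likelihood(mix, genotypes, freqs)
        if ll > best_ll:
            best_mix, best_ll = mix, ll
    return best_mix

# Hypothetical frequencies in three parental groups (group 1, group 2, group 3).
freqs = [(0.9, 0.1, 0.1), (0.1, 0.9, 0.1), (0.1, 0.1, 0.9)]
# Hypothetical individual: heterozygous at the first two markers, no copies at the third.
genotypes = [1, 1, 0]

print(estimate_ancestry(genotypes, freqs))
```

With only three markers the estimate is far too uncertain to be useful in practice; the point is only that the result is a vector of proportions, so a person of 50% and 50% affiliation in two groups does not have to be forced into a single bin.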

In another embodiment, the trait of the test individual about which an inference is to be made is the individual's responsiveness to a drug, particularly a therapeutic drug. As such, the methods of the invention provide a tool for realizing personalized medicine. The drug for which an inference can be made as to whether the test individual will respond positively or negatively can be, for example, a cancer chemotherapeutic agent such as paclitaxel, or a drug such as a statin, which can be useful for maintaining or lowering cholesterol levels. In one aspect of this embodiment, the AIMs of the panel used to practice the method include AIMs of genes other than those known to be involved in melanin synthesis or metabolism.

In still another embodiment, the trait of the test individual about which an inference is to be made is the individual's susceptibility or predisposition to a disease. As disclosed herein, various traits are associated with population structure at the continental level, whereas other traits are associated with population structure at a finer level. As such, the methods of the invention can provide a means for drawing inferences about traits such as disease susceptibility for diseases known to have an ethnic predisposition (i.e., known to occur at higher frequency in individuals of a particular ethnic/ancestral group), such as diabetes, hypertension and cancer, as well as for diseases that have no (or at least no known) ethnic predisposition, such as alcoholism, schizophrenia, Parkinson's disease and other neurological diseases.

In yet another embodiment, the trait of the test individual about which an inference is to be made is a pigmentation trait. The pigmentation trait can be any such trait, including, for example, eye color or eye shade, skin color, hair color, or a combination thereof. In one aspect of this embodiment, the AIMs of the panel used to practice the method include AIMs of genes other than those known to be involved in melanin synthesis or metabolism, or in other aspects of pigmentation.

A method of inferring a trait of a test individual by determining a population structure that correlates with the nucleotide occurrences of AIMs in the individual can further include identifying, with a predetermined level of confidence, a sub-population structure of the population structure, wherein the sub-population structure correlates with the trait. For example, the population structure of the individual can correlate with an intercontinental group with which, by inference, the individual shares ancestry, for example Indo-European, and the sub-population structure can further correlate with an intracontinental group with which the individual shares Indo-European ancestry, for example Mediterranean ethnicity.

Hybridizing oligonucleotides useful in the methods of the invention can be oligonucleotide probes or oligonucleotide primers. An oligonucleotide probe useful in the methods can hybridize to a nucleotide sequence that includes the SNP position of an AIM, wherein the nucleotide at the position of the hybridizing oligonucleotide corresponding to the SNP position of the AIM either matches or does not match the nucleotide occurrence at the SNP position. Additional oligonucleotide probes useful in the methods of the invention include probes that hybridize to a polynucleotide sequence adjacent to and upstream of, and/or adjacent to and downstream of, the SNP position. Such a probe can, but need not, include a nucleotide corresponding to the nucleotide position of the SNP, and that nucleotide, if present in the probe, can, but need not, match the nucleotide occurrence at the SNP.

Oligonucleotide primers useful in the methods of the invention include primers useful for a primer extension reaction, as well as primers that together allow amplification of a template polynucleotide containing an AIM. Such an amplification primer pair generally includes a forward primer and a reverse primer useful for amplifying a template polynucleotide containing the AIM of interest. It will be recognized, however, that two, three, four or more different forward primers can be used with a common reverse primer to amplify different template polynucleotides that contain AIMs (e.g., in a multiplex reaction) and a common gene sequence (e.g., AIMs of a family of related gene sequences), or to generate amplification products of different sizes from a single template. Similarly, one common forward primer can be used with one or more different reverse primers.

Accordingly, in one embodiment, the methods of the invention are performed using oligonucleotide primers. In one aspect of this embodiment, the method includes contacting the sample with an oligonucleotide primer and a polymerase under conditions suitable for generating a primer extension product. In such a method, the nucleotide occurrence of the SNP can be determined by detecting the presence of the primer extension product, or by sequencing the primer extension product (or a product thereof) and identifying the nucleotide at the position corresponding to the position of the SNP. In another aspect of this embodiment, the method includes contacting the sample with oligonucleotide primers comprising an amplification primer pair and with a polymerase under conditions suitable for generating an amplification product. In such a method, the nucleotide occurrence of the SNP can be determined by detecting the presence of the amplification product, or by sequencing the amplification product (or a product thereof) and identifying the nucleotide at the position corresponding to the position of the SNP.

The methods of the invention are particularly well suited to being performed in a high-throughput format, including a multiplex format, and therefore allow a large number of AIMs and/or samples from a large number of test individuals, as well as controls, to be examined in parallel. As such, the methods can be performed using a format in which the samples to be examined are arranged in an array, particularly an addressable array, for example in the wells of a tray or on a glass slide or silicon chip, and can be partially or fully automated using robotics. It will be recognized that, where a multiplex platform is used, the AIMs examined need not be those having the greatest delta values for a particular trait. Instead, the AIMs for a multiplex set can be selected by balancing delta value against primer compatibility, so that hybridizing oligonucleotides (e.g., amplification primer pairs) can be designed that can be used in a single reaction to examine the panel of AIMs, for example because they do not substantially cross-hybridize with AIMs other than the target AIM for which they were designed.

The invention also relates to a method of estimating, with a predetermined level of confidence, the proportional ancestry of a test individual with respect to at least two ancestral groups. Such a method can be performed, for example, by contacting a sample containing nucleic acid molecules of the test individual with hybridizing oligonucleotides that can detect nucleotide occurrences of SNPs of a panel of at least about ten AIMs indicative of BGA for each ancestral group examined, wherein the contacting is under conditions suitable for detecting the nucleotide occurrences of the AIMs of the test individual by the hybridizing oligonucleotides; and identifying, with a predetermined level of confidence, a population structure that correlates with, or is most likely to trend toward, the nucleotide occurrences of the AIMs of each ancestral group examined, wherein the population structure is indicative of proportional ancestry.

The proportional ancestry estimated by the methods of the invention can be proportions of any ancestral groups, including, for example, sub-Saharan African, Native American, Indo-European, East Asian, Middle Eastern or Pacific Islander ancestral groups, and is generally a combination of two or more such groups. Thus, the proportional ancestry of a test individual can include proportions of the sub-Saharan African and Indo-European ancestral groups (e.g., 80% sub-Saharan African and 20% Indo-European; or 60% sub-Saharan African, 20% Indo-European and 20% of a third ancestral group); of the Native American and Indo-European ancestral groups; of the East Asian and Native American ancestral groups; of the Indo-European and East Asian ancestral groups; and so on. Similarly, the proportional ancestry can include proportions of the Native American, East Asian and Indo-European ancestral groups; of the sub-Saharan African, Native American and Indo-European ancestral groups; of the sub-Saharan African, Native American and East Asian ancestral groups; and so on.

A panel of AIMs useful for estimating the proportional ancestry of an individual can include AIMs as set forth in SEQ ID NOs: 1-331, for example: the AIMs set forth in SEQ ID NOs: 1-71, which can be useful for determining proportional ancestry involving Indo-European, sub-Saharan African, East Asian and Native American groups; the AIMs set forth in SEQ ID NOs: 7, 21, 23, 27, 45, 54, 59, 63 and 72-152, which can be useful for determining East Asian and sub-Saharan African proportional ancestry; the AIMs set forth in SEQ ID NOs: 3, 8, 9, 11, 12, 33, 40, 59, 63 and 153-239, which can be useful for determining East Asian and Indo-European proportional ancestry; or the AIMs set forth in SEQ ID NOs: 1, 8, 11, 21, 24, 40, 172 and 240-331, which can be useful for determining Indo-European and sub-Saharan African proportional ancestry.

In one embodiment, an estimate is made in which the proportional ancestry comprises proportions of three ancestral groups. In one aspect of this embodiment, identifying the population structure that correlates with, or is most likely to trend toward, the nucleotide occurrences of the AIMs of the test individual is performed by determining a likelihood of affiliation for each of the sub-Saharan African, Native American, Indo-European and East Asian ancestral groups; thereafter selecting the three ancestral groups having the greatest likelihood values; determining the likelihood of all possible proportional affiliations among those three ancestral groups, thereby identifying the population structure or proportional affiliation that correlates with the nucleotide occurrences of the AIMs of the test individual; and identifying the single proportional combination of greatest likelihood.
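
The steps in this aspect can be illustrated with a minimal sketch. It rests on assumptions that the passage does not spell out: each AIM is treated as biallelic, genotypes are coded as 0, 1 or 2 copies of one allele, the expected allele frequency in an admixed individual is taken to be the proportion-weighted average of the group frequencies, and loci are treated as independent. The function names, the group labels supplied by the caller, and the grid resolution `steps` are illustrative and are not taken from the specification.

```python
import math

def log_likelihood(genotypes, freqs, mix):
    """Log-likelihood of the genotypes under mixing proportions `mix`.

    genotypes: list of 0/1/2 allele counts, one per AIM.
    freqs: list of dicts {group: allele frequency}, one per AIM.
    mix: dict {group: proportion}; proportions sum to 1.
    """
    total = 0.0
    for g, locus in zip(genotypes, freqs):
        # Assumed model: admixed allele frequency is a weighted average.
        q = sum(m * locus[grp] for grp, m in mix.items())
        q = min(max(q, 1e-6), 1.0 - 1e-6)  # keep log() finite
        total += (math.log(math.comb(2, g))
                  + g * math.log(q) + (2 - g) * math.log(1.0 - q))
    return total

def estimate_three_way(genotypes, freqs, groups, steps=20):
    # Step 1: likelihood of affiliation with each single group.
    single = {grp: log_likelihood(genotypes, freqs, {grp: 1.0})
              for grp in groups}
    # Step 2: keep the three groups with the greatest likelihood.
    top3 = sorted(groups, key=single.get, reverse=True)[:3]
    # Steps 3 and 4: score every proportional combination on a grid
    # and keep the single combination of greatest likelihood.
    best_ll, best_mix = -math.inf, None
    for i in range(steps + 1):
        for j in range(steps + 1 - i):
            k = steps - i - j
            mix = dict(zip(top3, (i / steps, j / steps, k / steps)))
            ll = log_likelihood(genotypes, freqs, mix)
            if ll > best_ll:
                best_ll, best_mix = ll, mix
    return best_mix, best_ll
```

A grid search is used here only because it mirrors the wording "all possible proportional affiliations"; a finer grid or a numerical optimizer could be substituted without changing the idea.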

In another aspect of this embodiment, identifying the population structure that correlates with, or is most likely to trend toward, the nucleotide occurrences of the AIMs is performed by carrying out six two-way comparisons, each comprising a likelihood determination of affiliation between one group and each of the other groups; thereafter selecting the three ancestral groups having the greatest likelihood values; determining the likelihood of all possible proportional affiliations among those three ancestral groups, thereby identifying the population structure or proportional affiliation that correlates with, or is most likely to trend toward, the nucleotide occurrences of the AIMs of the test individual; and identifying the single proportional combination of greatest likelihood.

In yet another aspect of the embodiment in which an estimate is made and the proportional ancestry comprises three ancestral groups, the method is performed by carrying out three three-way comparisons among the groups; determining the likelihood of all possible proportional affiliations among the three ancestral groups having the greatest likelihood values, thereby identifying the population structure or proportional affiliation that correlates with, or is most likely to trend toward, the nucleotide occurrences of the AIMs of the test individual; and identifying the single proportional combination of greatest likelihood. In another aspect of this embodiment, the method can further include generating a graphical representation of the comparison of the three ancestral groups, wherein the graphical representation comprises a triangle in which each ancestral group is independently represented by a vertex of the triangle, and wherein the maximum likelihood value of proportional affiliation for the individual comprises a point within the triangle. If desired, the graphical representation can further include confidence contours indicating the level of confidence associated with the estimate of proportional ancestry.
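
The triangle representation can be sketched as a barycentric-coordinate mapping, in which each proportion weights one vertex. The following is a minimal illustration, assuming an equilateral triangle with unit side length; the vertex coordinates and the placeholder group labels are assumptions made for the example, and the confidence contours are not drawn.

```python
import math

# One vertex per ancestral group; labels and coordinates are placeholders.
VERTICES = {
    "group_1": (0.0, 0.0),
    "group_2": (1.0, 0.0),
    "group_3": (0.5, math.sqrt(3) / 2),
}

def triangle_point(proportions):
    """Map three ancestry proportions to an (x, y) point in the triangle.

    The proportions are normalised by their sum, so inputs given as
    percentages (e.g. 60, 20, 20) work as well as fractions.
    """
    total = sum(proportions[g] for g in VERTICES)
    x = sum(proportions[g] / total * VERTICES[g][0] for g in VERTICES)
    y = sum(proportions[g] / total * VERTICES[g][1] for g in VERTICES)
    return x, y
```

An individual assigned entirely to one group lands on that group's vertex, and an individual with equal proportions of all three lands at the centroid.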

In another embodiment, an estimate is made in which the proportional ancestry comprises proportions of four ancestral groups. In various aspects of this embodiment, identifying the population structure that correlates with, or is most likely to trend toward, the nucleotide occurrences of the AIMs of the test individual is performed by carrying out six two-way comparisons, three three-way comparisons, or one four-way comparison among the groups; determining the likelihood of all possible proportional affiliations among the four ancestral groups having the greatest likelihood values, thereby identifying the population structure or proportional affiliation that correlates with, or is most likely to trend toward, the nucleotide occurrences of the AIMs of the test individual; and identifying the single proportional combination of greatest likelihood. In one aspect of this embodiment, the method can further include generating a graphical representation of the comparison of the three ancestral groups, wherein the graphical representation comprises a pyramid in which each ancestral group is independently represented by a vertex of the pyramid, and wherein the maximum likelihood value of proportional affiliation for the individual comprises a point within the pyramid. If desired, the graphical representation can further include a confidence contour comprising a sphere centered on that point, wherein the sphere indicates the level of confidence associated with the estimate of proportional ancestry.
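
To make "all possible proportional affiliations" concrete for four groups, the sketch below enumerates every combination of proportions on a discrete grid, together with the six pairwise groupings. Discretising the proportions in multiples of `1/steps` is an assumption made for the example, as are the function name and the group labels.

```python
from itertools import combinations

def simplex_grid(n_groups, steps):
    """Yield every tuple of n_groups proportions that are multiples of
    1/steps and sum to 1 (a "stars and bars" enumeration)."""
    slots = steps + n_groups - 1
    for bars in combinations(range(slots), n_groups - 1):
        edges = (-1,) + bars + (slots,)
        # The gap between consecutive dividers is one group's share.
        yield tuple((edges[i + 1] - edges[i] - 1) / steps
                    for i in range(n_groups))

groups = ["group_1", "group_2", "group_3", "group_4"]  # placeholder labels
pairs = list(combinations(groups, 2))      # the six two-way comparisons
candidates = list(simplex_grid(4, 10))     # 10% resolution, four groups
```

Each candidate tuple could then be scored with a likelihood function and the highest-scoring one retained; the resulting four proportions can be plotted as a point inside a tetrahedron in the same way three proportions are plotted inside a triangle.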

A method of estimating, with a predetermined level of confidence, the proportional ancestry of a test individual with respect to at least two ancestral groups by identifying a population structure indicative of proportional ancestry can further include identifying a sub-population structure indicative of an ethnicity associated with one of the ancestral groups in which the test individual has proportional ancestry. By this method, a sub-population structure of the population structure that correlates with the nucleotide occurrences of the AIMs in the test individual is identified, wherein the sub-population structure correlates with the ethnicity of the test individual. Such a method of identifying a sub-population structure can be performed, for example, by identifying those chromosomes of the test individual that contain AIMs indicative of affiliation with a biogeographical ancestry group (where the individual has proportional affiliation with more than one biogeographical ancestry group); contacting a sample containing nucleic acid molecules of the test individual with second hybridizing oligonucleotides that can detect nucleotide occurrences of SNPs of a second panel of AIMs, wherein the AIMs of the second panel are informative for ethnicity within one of these groups and are located on the same chromosomes of the test individual that contain the AIMs indicative of the larger (intercontinental) ancestral group from which the ethnicity arose; and identifying a sub-population structure that correlates with the nucleotide occurrences of the AIMs of the second panel, wherein the sub-population structure is indicative of the ethnicity of the ancestral group of the test individual.

Using such a method, a test individual can be determined, with hybridizing oligonucleotides specific for a first panel of AIMs (e.g., AIMs of the 71 exemplified AIMs; SEQ ID NOs: 1-71), to be 60% Indo-European (IE) and 40% East Asian. In that case, only a fraction of all the AIMs that can indicate IE ancestry would have been positive (had all been positive, the individual would have been 100% IE), and therefore only some of the individual's chromosomes or chromosomal regions are of Indo-European origin. The chromosomes of the individual that contain AIMs positive for IE are then identified, and second hybridizing oligonucleotides specific for a second panel of AIMs are selected (e.g., from a group of about 1000 AIMs covering all 23 pairs of human chromosomes), wherein the AIMs of the second panel vary widely in allele frequency among IE ethnic groups, and are therefore indicative of IE ethnicity, and are restricted to those located on the chromosomes on which the first-panel AIMs were IE positive. The sub-population structure that correlates with the nucleotide occurrences of the AIMs of the second panel is then identified, thereby indicating the ethnicity of the test individual with respect to the IE ancestral group, for example, that the IE ancestry derives from a Northern European, Mediterranean, Middle Eastern or South Asian Indian ethnicity. As such, the method provides a means of identifying the ethnic origin of a particular chromosome (e.g., the Mediterranean origin of a chromosome previously determined to be of Indo-European origin) that contains AIMs correlating with a population structure indicative of Indo-European biogeographical ancestry and further contains AIMs correlating more specifically with a sub-population structure indicative of Mediterranean ethnicity.

In another embodiment, the method of estimating the proportional ancestry of a test individual can include generating an ancestry map of the world, on which the locations of populations having a proportional ancestry corresponding to that of the test individual are indicated. As such, the method can supplement genealogical information. For example, the method can further include overlaying the ancestry map with a genealogy map, wherein the genealogy map indicates the locations of populations having geopolitical relevance to the test individual, and statistically combining the information of the ancestry map and the genealogy map to obtain the most likely estimate of the family history of the test individual.

According to the methods of the invention, identifying the population structure that correlates with, or is most likely to trend toward, the nucleotide occurrences of the AIMs can be performed by comparing the nucleotide occurrences of the AIMs of the test individual with known proportional ancestries corresponding to nucleotide occurrences of AIMs indicative of BGA. The known proportional ancestries can be contained in a table or other list, in which case the nucleotide occurrences of the test individual can be compared to the table or list visually, or can be contained in a database, in which case the comparison can be made electronically, for example using a computer. Further, each known proportional ancestry corresponding to nucleotide occurrences of AIMs indicative of BGA can be linked to a photograph of the person for whom that proportional ancestry was determined, thereby providing a means of further inferring physical characteristics of the test individual. In one aspect, the photograph is a digital photograph comprising digital information, which can be contained in a database that can further contain a plurality of such digital information of digital photographs, each linked to a known proportional ancestry corresponding to the nucleotide occurrences of AIMs indicative of the BGA of the person in the photograph.

In another aspect, the method of the invention can further include identifying a photograph of a person having a proportional ancestry corresponding to the proportional ancestry of the test individual. Such identification can be made by manually examining one or more files of photographs, wherein the photographs are organized, for example, according to the nucleotide occurrences of the AIMs of the persons in the photographs. Identifying a photograph can also be performed by scanning a database containing a plurality of files, each file containing digital information corresponding to a digital photograph of a person having a known proportional ancestry, and identifying at least one photograph of a person whose nucleotide occurrences of AIMs indicative of BGA match the nucleotide occurrences of AIMs indicative of the BGA of the test individual.
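
The database scan described here can be sketched as a simple filter over stored records. The record layout (a `photo` identifier plus an `ancestry` dictionary), the per-group tolerance, and the function name are all hypothetical; the passage does not define a database schema or a matching criterion.

```python
def matching_photos(records, query, tolerance=0.05):
    """Return identifiers of photographs whose recorded proportional
    ancestry is within `tolerance` of the query for every group."""
    hits = []
    for rec in records:
        recorded = rec["ancestry"]
        # Groups missing from either side are treated as a proportion of 0.
        groups = set(recorded) | set(query)
        if all(abs(recorded.get(g, 0.0) - query.get(g, 0.0)) <= tolerance
               for g in groups):
            hits.append(rec["photo"])
    return hits

# Hypothetical records; file names and values are made up for illustration.
records = [
    {"photo": "photo_001.jpg", "ancestry": {"IE": 0.62, "EAS": 0.38}},
    {"photo": "photo_002.jpg", "ancestry": {"IE": 1.0}},
    {"photo": "photo_003.jpg", "ancestry": {"IE": 0.6, "EAS": 0.3, "NAM": 0.1}},
]
result = matching_photos(records, {"IE": 0.6, "EAS": 0.4})
```

Matching on estimated proportions rather than on raw AIM genotypes is a simplification chosen for the example.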

Accordingly, the invention also relates to an article of manufacture that is at least one photograph of a person having a known proportional ancestry corresponding to a population structure comprising nucleotide occurrences of AIMs indicative of BGA, and to a plurality of such articles, wherein each article of the plurality comprises a photograph (or photographs) of a person having a known proportional ancestry corresponding to a population structure comprising nucleotide occurrences of AIMs indicative of BGA. An article can be contained in a file, or a plurality of articles can be contained in a file; for example, one file can contain a plurality of photographs of different persons, some or all of whom have the same or different known proportional ancestries corresponding to a population structure comprising nucleotide occurrences of AIMs indicative of BGA.

Accordingly, a plurality of such articles is provided, as is a plurality of files, wherein each file can contain one or more articles, i.e., photographs, which can be of one or more persons having the same or different known proportional ancestries corresponding to a population structure comprising nucleotide occurrences of AIMs indicative of BGA. For example, each of the plurality of different files can contain a photograph (or photographs) of one person having a known proportional ancestry corresponding to a population structure comprising nucleotide occurrences of AIMs indicative of BGA. Each of the plurality of different files can also contain photographs of two or more different persons having the same or substantially the same proportional ancestry corresponding to a population structure comprising nucleotide occurrences of AIMs indicative of BGA. As such, the plurality of files can include files that each contain one or more photographs of one or more persons, wherein, when a file contains one or more photographs of two or more different persons, the different persons can have the same or different known proportional ancestries.

In one embodiment, the article of manufacture, i.e., the photograph of a person having a known proportional ancestry corresponding to a population structure comprising nucleotide occurrences of AIMs indicative of BGA, is a digital photograph comprising digital information. As such, the digital information of a digital photograph, or of a plurality of digital photographs, of the invention can be contained in a database. As such, the invention further provides a plurality of articles of manufacture comprising at least two digital photographs, each comprising digital information. In one aspect of this embodiment, the digital information for one or more of the articles is contained in a database, and the database can be contained on any medium suitable for containing such a database, including, for example, computer hardware or software, magnetic tape, or a computer disk such as a floppy disk, CD or DVD. As such, the database can be accessed by a computer that can contain the database, can accept a medium containing the database, or can access the database through a wired or wireless network, for example an intranet or the Internet.

The invention also relates to a kit containing a plurality of hybridizing oligonucleotides, wherein each hybridizing oligonucleotide comprises at least 15 contiguous nucleotides of a polynucleotide as set forth in SEQ ID NOs: 1-331, and wherein the plurality includes at least five such oligonucleotides, each based on a different polynucleotide as set forth in SEQ ID NOs: 1-331. In one embodiment, the hybridizing oligonucleotides comprise at least 15 contiguous nucleotides of at least five of the polynucleotides set forth in SEQ ID NOs: 1-71, or of polynucleotides complementary to any of SEQ ID NOs: 1-71.

The hybridizing oligonucleotides of a kit of the invention can include probes useful for detecting a particular AIM, including a particular nucleotide occurrence at the SNP position or DIP (deletion/insertion polymorphism) position of the AIM; can include primers, including primers useful for a primer extension reaction and primer pairs useful for a nucleic acid amplification reaction; or can include a combination of such probes and primers. Where a hybridizing oligonucleotide of the plurality includes a nucleotide corresponding to the nucleotide position of the AIM (e.g., nucleotide position 50 of any of SEQ ID NOs: 1-34 and most of the others, nucleotide position 56 of SEQ ID NO: 35, nucleotide position 44 of SEQ ID NO: 50, or nucleotide position 26 of SEQ ID NO: 56), or to a nucleotide sequence complementary thereto, such a hybridizing oligonucleotide is useful as a probe for identifying the presence or absence of a particular nucleotide occurrence at the SNP position of the AIM.
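
As an illustration of how a probe of at least 15 contiguous nucleotides spanning the polymorphic position might be derived from a longer sequence, the sketch below cuts a window around a 1-based SNP position and optionally substitutes a chosen allele. The input sequence in the usage lines is synthetic and is not one of SEQ ID NOs: 1-331; the function names and the choice to center the window on the SNP are assumptions.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]

def probe_window(sequence, snp_pos, length=15, allele=None):
    """Return `length` contiguous nucleotides spanning the SNP.

    snp_pos is 1-based, matching the way positions are quoted in the
    text (e.g. "nucleotide position 50"). If `allele` is given, it
    replaces the base at the SNP position before the window is cut.
    """
    if length > len(sequence):
        raise ValueError("sequence is shorter than the requested probe")
    idx = snp_pos - 1
    if not 0 <= idx < len(sequence):
        raise ValueError("SNP position lies outside the sequence")
    if allele is not None:
        sequence = sequence[:idx] + allele + sequence[idx + 1:]
    # Centre the window on the SNP, shifting it if it would overrun an end.
    start = min(max(idx - length // 2, 0), len(sequence) - length)
    return sequence[start:start + length]

# Synthetic 100-nt sequence with the variable base at position 50.
synthetic = "A" * 49 + "G" + "T" * 50
probe_g = probe_window(synthetic, 50)              # carries the G allele
probe_c = probe_window(synthetic, 50, allele="C")  # carries the C allele
antisense = reverse_complement(probe_g)            # complementary-strand probe
```

Real probe design would also weigh melting temperature and cross-hybridization, which this sketch ignores.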

In another embodiment, the kit contains at least one pair of hybridizing oligonucleotides useful for detecting the nucleotide occurrence at the SNP (or DIP) position of an AIM. In one aspect of this embodiment, the pair of hybridizing oligonucleotides includes one oligonucleotide that hybridizes adjacent to and upstream of the SNP (or DIP) position of the AIM and a second oligonucleotide that hybridizes adjacent to and downstream of the SNP (or DIP) position of the AIM, wherein one or the other of the pair further includes a nucleotide complementary to a nucleotide occurrence suspected of being at the SNP (or DIP) position of the AIM (i.e., one of the polymorphic nucleotides); such a pair of hybridizing oligonucleotides is useful for an oligonucleotide ligation assay. In another aspect of this embodiment, the pair of hybridizing oligonucleotides comprises an amplification primer pair that includes a forward primer and a reverse primer; such a pair of hybridizing oligonucleotides is useful for amplifying a portion of a polynucleotide that includes the SNP (or DIP) position of the AIM.

A kit of the invention can further contain additional reagents useful for practicing a method of the invention. As such, the kit can contain one or more polynucleotides comprising an AIM, including, for example, a polynucleotide comprising the AIM that a hybridizing oligonucleotide or pair of hybridizing oligonucleotides of the kit is designed to detect; such polynucleotides are useful as controls. Further, the hybridizing oligonucleotides of the kit can be detectably labeled, or the kit can contain reagents useful for detectably labeling one or more of the hybridizing oligonucleotides of the kit, including different detectable labels that can be used to differentially label the hybridizing oligonucleotides; such a kit can further include reagents for linking a label to a hybridizing oligonucleotide, for detecting a labeled oligonucleotide, and the like. A kit of the invention also can contain, for example, a polymerase, particularly where the hybridizing oligonucleotides of the kit include primers or amplification primer pairs; or a ligase, where the kit contains hybridizing oligonucleotides useful for an oligonucleotide ligation assay. In addition, the kit can contain appropriate buffers, deoxyribonucleotide triphosphates, and the like, depending, for example, on the particular hybridizing oligonucleotides included in the kit and the purpose for which the kit is being provided.

A diagram is provided showing the manner in which chromosomal segments in an admixed population become mixed over time by recombination. Initially, the parental populations have chromosomal segments that are contiguous with respect to the AIMs along the segment. In the first hybrid generation (F1), every person has one complete chromosomal segment from each parental population. In the F2 generation, many more combinations are possible. The relative likelihood of the non-recombinant versus recombinant genotypes shown for F2 depends on the size of the chromosomal segment. A segment roughly the size of a human chromosome would be expected to undergo several recombination events on average in a single meiosis (a recombination is about as likely as not for every 50 cM of genetic distance). F3 shows an example of a likely genotype for a person with two parents from the F2 generation. F(N)xF1 illustrates the genotype of a person with one F(N) parent and one F1 parent, and F(N)xF2 illustrates the genotype of a person with one F(N) parent and one F2 parent.

A triangle graph generated using the algorithm described in Example 6 (see also Table 12) is shown. NAM, Native American; AFR, sub-Saharan African; EUR, Indo-European. FIG. 2A illustrates a line extended from the NAM vertex of the triangle to the opposite side, the opposite side representing 0% Native American ancestry. A circle is shown at the position of the estimated proportional ancestry (see FIG. 2B), and the hatch mark on the line indicates the percentage of Native American ancestry (about 15%). FIG. 2B shows the additional lines drawn from the AFR and EUR vertices. The position on each line corresponding to the position of the circle represents the proportion of each respective ancestry; i.e., 15% Native American, 60% Indo-European, and 25% African.

A triangle plot depicting one approach to illustrating the value and precision of an individual ancestry estimate is shown. Typical distributions of three populations are shown (European Americans: filled squares; African Americans: open triangles; and an African/Native American population: open circles). A single individual is also shown, with the confidence interval represented as concentric rings surrounding the point estimate (filled circle). Like a topographical map, each concentric ring represents a decrease in likelihood of one log unit (that is, 10-fold less likely). In this example, the individual has a likelihood interval space that is symmetrical and circular. The interval space can take many shapes depending on the admixture proportions of the subject in question and the allele frequencies of the markers typed.
A triangle plot is provided showing the mean admixture estimates for three African American samples (filled circles: WASH, Washington, D.C.; AFCAR, African Caribbean; and BOG, Bogalusa), a European American sample (open circle: SCO, State College), and a Hispanic American sample (open diamond: SLV, San Luis Valley, CO). The mean African (AFR), Indo-European (EUR), and Native American (NAM) genetic contributions to each sample are shown in parentheses.

Genetic structure in U.S. resident populations is shown. FIG. 5A shows the percentage of unlinked AIMs showing significant association. The expected value is based on a 5% significance level. The values for the Washington, D.C. sample are based on 33 AIMs, those for San Luis Valley, CO on 19 AIMs, and those for State College, PA on 34 AIMs. FIG. 5B shows the correlation between individual ancestry estimates based on independent subsets of informative markers. The mean correlations are based on 100 replicates. The total number of markers is the same as for FIG. 5A. The corresponding p-values are shown at the bottom of the graph.

Triangle plots for the father (FIG. 6A) and mother (FIG. 6B) are shown.

Triangle plots are shown for each of the three children of the father and mother represented in FIG. 6.

The distribution of AIMs in the genome is shown (Chrom. number, chromosome number).
The robustness of BGA admixture proportion analysis using AIMs is demonstrated (see Example 2). The confidence (contours) of the maximum likelihood estimate (MLE; point) is affected as expected by the removal of AIMs that are informative for a particular pairwise comparison. The first contour extending from the MLE defines the triangle plot space in which the likelihood is 2-fold lower than that of the MLE, and the second contour defines the space in which the likelihood is 5-fold lower than that of the MLE. FIG. 9A shows the MLE and confidence contours obtained using 71 AIMs; the actual percentages are indicated. FIG. 9B shows the results obtained after removing from the analysis those AIMs, used to obtain the results shown in FIG. 9A, that are informative for the East Asian versus Native American distinction. The MLE is relatively unaffected, and the confidence contours along the East Asian-Indo-European (European) axis and the Native American-European axis remain undistorted, whereas the confidence contours are distorted along the East Asian-Native American axis.

The BGA admixture proportions determined for eight individuals of a pedigree are shown. Circles represent females and squares represent males, and the BGA affiliation of each individual is shown as a fraction in which the numerator represents Indo-European BGA and the denominator represents Native American BGA. None of the individuals had any sub-Saharan African or East Asian BGA, except the one indicated by an asterisk (*), who was determined to have 4% East Asian BGA.

A pedigree is shown demonstrating how a Chinese great-grandfather in an otherwise entirely Indo-European family tree can give rise to a great-grandchild with Indo-European/East Asian ancestry. Individuals who are 100% East Asian (Chinese) are shaded; the admixture result for the male (square) at the bottom of the pedigree (short arrow) is of interest. The grandmother, indicated by the long arrow, is expected to be an approximately 50%/50% East Asian/Indo-European mix, and her daughter, the subject's mother, a 25%/75% East Asian/Indo-European mix (Example 3).

The distribution by chromosome arm of all SNPs validated for genotyping is shown for a group of patients treated for elevated cholesterol levels.

The distribution of SNPs is shown in Caucasian individuals (n=180) taking Lipitor™ for whom the response was known with respect to measurements of total cholesterol (lip TC), low density lipoprotein (lip LDL), and the liver transaminases AST-SGOT (lip SGOT) and ALT-GPT (lip GPT). SNPs with significant (>0.20) delta values among the various trait classes were selected. For example, in about 70% of the patients, Lipitor™ caused a decrease in LDL. For any given SNP, the delta value (δ) is the difference in minor allele frequency between individuals whose LDL decreased by at least 20% and individuals whose LDL did not change.

The same analysis as for FIG. 13 is shown, except that the response was measured after treatment with Zocor™ (n=150), and only total cholesterol (zoc TC) and LDL (zoc LDL) were examined.

The chromosomal distribution of SNPs (δ>0.11) is shown for 1,000 individuals of known eye color.

The chromosomal distribution of SNPs (δ>0.11) is shown for 1,000 individuals of known hair and eye color.
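
The triangle plots described above place an individual inside an equilateral triangle so that the distance from each side reflects the proportion of ancestry assigned to the opposite vertex. The following is a minimal sketch of that geometry, not the algorithm of Example 6: the vertex layout, the function names, and the unit side length are illustrative assumptions. It uses the 15% NAM, 60% EUR, 25% AFR example from the triangle graph description.

```python
import math

# Corners of an equilateral triangle with unit side length.
# Which group sits at which corner is an arbitrary, illustrative choice.
VERTICES = {
    "NAM": (0.5, math.sqrt(3) / 2),  # top
    "AFR": (0.0, 0.0),               # bottom left
    "EUR": (1.0, 0.0),               # bottom right
}


def ancestry_to_xy(proportions):
    """Map ancestry proportions (summing to 1) to a point in the triangle."""
    if abs(sum(proportions.values()) - 1.0) > 1e-9:
        raise ValueError("proportions must sum to 1")
    # The point is the proportion-weighted average of the three corners.
    x = sum(p * VERTICES[group][0] for group, p in proportions.items())
    y = sum(p * VERTICES[group][1] for group, p in proportions.items())
    return x, y


def xy_to_ancestry(x, y):
    """Invert ancestry_to_xy for the vertex layout defined above."""
    nam = y / (math.sqrt(3) / 2)  # height above the AFR-EUR side (0% NAM)
    eur = x - 0.5 * nam
    afr = 1.0 - nam - eur
    return {"NAM": nam, "AFR": afr, "EUR": eur}


# Example proportions taken from the triangle graph description.
point = ancestry_to_xy({"NAM": 0.15, "EUR": 0.60, "AFR": 0.25})
```

With this layout, a point on the AFR-EUR side has a height of zero and therefore 0% Native American ancestry, which matches the reading of the line drawn from the NAM vertex to the opposite side.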

Detailed Description of the Invention
The present invention is based on the identification of ancestry informative markers (AIMs) that are useful for inferring the level of population structure of an individual and that, in turn, allow inferences to be drawn about various traits of the individual. Moreover, the AIMs of the invention have been demonstrated to correlate with a trait regardless of whether the markers are in linkage disequilibrium with a gene or locus known to be involved in the trait. As such, the AIMs of the invention are distinguishable from previously described markers, which were considered useful only when linked to the trait, i.e., when the marker is physically close to a gene known to be involved in the trait such that it is characterized, for example, by a low percentage of crossovers with respect to the gene (or locus) known to be involved in (or associated with) the trait. In contrast, there is no requirement that the markers (AIMs) useful in the present methods be in linkage disequilibrium with the gene/trait; indeed, the AIMs disclosed herein as correlating with a trait can be located on different chromosomes from each other and from genes/loci known to be associated with the trait.

AIMs are genetic loci that exhibit alleles with large frequency differences between populations. AIMs are exemplified herein generally by single nucleotide polymorphisms (SNPs; see, e.g., SEQ ID NO: 1) and by deletion/insertion polymorphisms (DIPs; see, e.g., SEQ ID NO: 363). As disclosed herein, AIMs can be used to estimate the biogeographical ancestry (BGA) of an individual or of a collection of individuals at the population level (with respect to race), at the subpopulation level (with respect to ethnicity), and at the micro-group level (with respect to family lines within ethnic groups), as well as at practical, phenotypically defined levels (e.g., cases and controls). Such ancestry estimates at the subgroup and individual levels can be directly instructive as to the genetic nature of phenotypes that differ qualitatively or in frequency between populations, including, for example, the likelihood that an individual will respond to a particular drug or the propensity of an individual to develop a disease. Ancestry estimates also can provide an essential foundation for the use of admixture mapping (AM) methods to identify the genes underlying these traits.
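
Because an AIM is characterized by a large allele frequency difference between populations, a candidate marker can be scored by its delta value, the absolute difference in the frequency of a given allele between two groups. The sketch below illustrates that calculation only; the genotype data, marker names, function names, and the 0.20 cutoff are made-up assumptions for illustration and are not the panel or the selection procedure of the invention.

```python
def allele_frequency(genotypes, allele):
    """Frequency of `allele` among diploid genotypes written like "AG"."""
    alleles = "".join(genotypes)
    return alleles.count(allele) / len(alleles)


def delta(genotypes_pop1, genotypes_pop2, allele):
    """Absolute allele frequency difference between two groups."""
    return abs(
        allele_frequency(genotypes_pop1, allele)
        - allele_frequency(genotypes_pop2, allele)
    )


def select_informative(markers, threshold):
    """Return the names of markers whose delta is at least `threshold`."""
    return [
        name
        for name, (pop1, pop2, allele) in markers.items()
        if delta(pop1, pop2, allele) >= threshold
    ]


# Hypothetical genotypes for two groups at two hypothetical markers.
markers = {
    "marker_1": (["AA", "AG", "AA", "AG"], ["GG", "AG", "GG", "GG"], "A"),
    "marker_2": (["CT", "CC", "CT", "TT"], ["CT", "CT", "CC", "TT"], "C"),
}

d1 = delta(*markers["marker_1"])              # 6/8 versus 1/8
selected = select_informative(markers, 0.20)  # illustrative cutoff only
```

In this toy data the first marker differs strongly between the groups and the second does not differ at all, so only the first would be retained as a candidate.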

As exemplified herein, a panel of 71 AIMs (SEQ ID NOs: 1 to 71) was identified from a screen of more than 800 candidate AIMs (see also SEQ ID NOs: 72 to 331), and methods were developed to examine these AIMs as a means of obtaining accurate estimates of proportional ancestry. The methods and markers of the invention were validated in a study using skin pigmentation as a model phenotype (see also WO 02/097047 (PCT/US02/16789), which is incorporated herein by reference). The initial markers were genotyped in two population samples of primarily African ancestry, a sample of African Americans from Washington, D.C. and a sample of African Caribbeans from England, and in a sample of European Americans from Pennsylvania (see Example 1). In the two African-ancestry population samples, a very strong correlation was observed between the estimates of individual ancestry and skin pigmentation as measured by reflectometry (R²=0.21, p<0.0001 for the African American sample, and R²=0.16, p<0.0001 for the British African Caribbean sample). These correlations confirm the validity of the ancestry estimates and also indicate a high level of admixture-related population structure, which is detectable using other tests that characterize these populations and identify genetic structure. These results demonstrate that estimates of individual ancestry can be made on the basis of DNA analysis using a relatively small number of well-defined genetic markers (AIMs).

The methods and genetic markers disclosed herein provide tools for several distinct purposes, including, for example, 1) estimating ancestry proportions in an individual from his or her DNA; 2) estimating genetic structure as a control in study designs commonly used for genetic research; 3) constructing a physical profile by inference of ancestry-related characteristics, which can be of value in forensic investigations; 4) identifying disease predispositions by a method termed "mapping by admixture linkage disequilibrium" (MALD); and 5) predicting a significant portion of an individual patient's response to prescription and over-the-counter drugs. As such, the invention provides, for example, 1) statistical methods, and examples of their use, for determining proportions of ancestry from gene sequences within an individual; 2) hundreds of AIMs screened and identified from publicly available single nucleotide polymorphism (SNP) databases, using the statistical methods, as being useful for determining proportions of ancestry within an individual or a study group; 3) hundreds of AIMs demonstrated to be useful for determining proportions of ancestry within an individual or a study group; and 4) a software program that can be used to determine proportions of ancestry within an individual or a study group.

Previously, efforts were made to control for two sources of population structure, sampling effects and natural human demography, which were thought to confound attempts to identify markers of genes associated with a particular trait. As disclosed herein, however, population structure reflects human demography, and markers that correlate with a trait value are useful as reporters of structure that correlates with the trait value (rather than as markers in LD with a phenotypically active locus); they therefore provide a valuable tool that allows accurate classification in a cost-effective and practical manner. Alleles associated with a trait through population structure are not linked to phenotypically active loci; they correlate with the trait value merely because they are enriched in the branches of the human family tree in which the trait value is more common. As disclosed herein, the distribution of trait values among the various branches of the human family tree is such that accurate classification can be obtained simply through a correct appreciation of that structure, rather than through a complete understanding of the biological mechanism of the trait. As a result, markers that would be regarded as false positives when considered for use in identifying phenotypically active loci can, in fact, enable accurate classification analysis; that is, they are true positives, provided that the structure from which they derive is a reflection of human demography rather than of sampling effects. The present methods are based on the correlation between the markers and BGA, where BGA itself is correlated with the trait value at a considerable level of complexity, and not on linkage or linkage disequilibrium.

Accordingly, the present invention provides a method of inferring, with a predetermined level of confidence, a trait of an individual. In one embodiment, a method of the invention is performed by contacting a sample comprising nucleic acid molecules of a test individual with hybridizing oligonucleotides that can detect nucleotide occurrences of single nucleotide polymorphisms (SNPs) of a panel of at least about ten AIMs; and identifying, with a predetermined level of confidence, a population structure that correlates with, or most likely corresponds to, the nucleotide occurrences of the AIMs in the individual, wherein the population structure correlates with the trait. The AIMs of the panel are selected based on their delta values (see below) and, where relevant, on the particular platform used to perform the method, and are indicative of a population structure that correlates with the trait. AIMs are exemplified herein by the polynucleotides set forth as SEQ ID NOs: 1 to 331, in which the SNP position generally is at nucleotide position 50 (but see, e.g., SEQ ID NO: 35, nucleotide position 56; SEQ ID NO: 51, position 48; SEQ ID NO: 56, position 26).

The test individual for whom a trait is to be inferred can be any individual for whom it is desired to infer a trait, and generally is a human. However, the methods of the invention also can be used to infer traits of other mammals, including, for example, domesticated animals such as cats, dogs, or horses; farm animals such as cattle, sheep, pigs, or goats; or other animals. The trait to be examined can be any trait of interest, including, as exemplified herein, proportional ancestry (BGA); pigmentation of the hair, skin, or iris; or drug responsiveness.

The methods of the invention are particularly useful because they allow an inference to be drawn about a desired trait with a predetermined level of confidence. As used herein, reference to a "predetermined level of confidence" means that an inference or estimate of the invention is made using statistical methods that provide a confidence interval determined about a mean or maximum likelihood value. In addition to determining the maximum likelihood value within an individual or sample structure, other, similarly likely values also can be determined, and these can be combined to define an x-fold likelihood confidence interval (where x is any number, such as 2, 5, or 10). For example, all structure results corresponding to likelihood values up to 10-fold lower than the maximum likelihood value can be plotted or listed to define a 10-fold likelihood confidence interval. As with any statistical test, the assays of the invention are designed such that performing the test yields a value having a desired level of confidence. As disclosed herein, a method of the invention can be performed such that the result has a predetermined level of confidence by varying the number of AIMs examined for the trait. For example, use of a particular panel of 10 AIMs can allow an inference to be drawn, with a particular level of confidence, as to whether an individual has a particular trait, e.g., responsiveness to Lipitor™, whereas use of a panel of 20 AIMs, which can, but need not, partially overlap the panel of 10 AIMs, allows the same inference to be drawn with a higher level of confidence. Similarly, use of two panels of 10 AIMs each can allow an inference to be drawn that an individual has, for example, 80% Indo-European ancestry and 20% East Asian ancestry (with an error of, e.g., ±10%), whereas use of two panels of 20 AIMs each can allow the same inference to be drawn with an error of, e.g., ±5%.
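
The x-fold likelihood confidence interval described above can be illustrated with a deliberately simplified model. The sketch below assumes only two parental populations, unlinked markers, and a binomial genotype likelihood, and it searches a grid of admixture proportions; these modelling choices, the function names, and all allele frequencies and genotypes are hypothetical and are not the statistical method or software of the invention.

```python
import math


def log_likelihood(m, genotypes, freqs_pop1, freqs_pop2):
    """Log-likelihood of admixture proportion m (share from population 1).

    Each genotype is the count (0, 1, or 2) of a reference allele whose
    frequency is p1 in population 1 and p2 in population 2.
    """
    total = 0.0
    for g, p1, p2 in zip(genotypes, freqs_pop1, freqs_pop2):
        q = m * p1 + (1.0 - m) * p2           # expected allele frequency
        q = min(max(q, 1e-9), 1.0 - 1e-9)     # guard against log(0)
        total += (
            math.log(math.comb(2, g))
            + g * math.log(q)
            + (2 - g) * math.log(1.0 - q)
        )
    return total


def estimate_with_interval(genotypes, freqs_pop1, freqs_pop2,
                           fold=10.0, steps=100):
    """Return (MLE, lower, upper) for an x-fold likelihood interval."""
    grid = [i / steps for i in range(steps + 1)]
    scores = [log_likelihood(m, genotypes, freqs_pop1, freqs_pop2)
              for m in grid]
    best = max(scores)
    mle = grid[scores.index(best)]
    # Keep every grid value whose likelihood is within `fold` of the maximum.
    inside = [m for m, s in zip(grid, scores) if s >= best - math.log(fold)]
    return mle, min(inside), max(inside)


# Hypothetical 10-marker panel (made-up frequencies and genotypes).
freqs_pop1 = [0.90, 0.80, 0.85, 0.10, 0.20, 0.95, 0.15, 0.70, 0.90, 0.10]
freqs_pop2 = [0.10, 0.20, 0.10, 0.90, 0.70, 0.20, 0.80, 0.10, 0.20, 0.80]
genotypes = [2, 1, 2, 0, 1, 2, 0, 1, 2, 0]

mle10, lo10, hi10 = estimate_with_interval(genotypes, freqs_pop1, freqs_pop2)

# A stand-in for a 20-marker panel: the same markers repeated once.
mle20, lo20, hi20 = estimate_with_interval(
    genotypes * 2, freqs_pop1 * 2, freqs_pop2 * 2
)
```

Repeating the markers is only a contrivance to double the amount of data while keeping the example short. It shows the qualitative point made above: with more markers, the interval for the same fold value is no wider, and typically narrower, than with fewer markers.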

A sample useful for practicing a method of the invention can be any biological sample of a test individual that contains nucleic acid molecules, including a portion of a gene sequence comprising the AIM to be examined, or any biological sample that contains the encoded polypeptide where the polymorphism of the AIM results in an amino acid change in the encoded polypeptide. As such, the sample can be a cell, tissue, or organ sample, or can be a sample of a biological fluid such as semen, saliva, blood, cerebrospinal fluid, and the like.

The nucleic acid sample useful for practicing a method of the invention depends, in part, on whether the SNP to be identified is in a coding region or a non-coding region. Where one or more SNPs are present in a non-coding region of a gene, the nucleic acid sample generally is a deoxyribonucleic acid (DNA) sample, particularly genomic DNA or an amplification product thereof. However, where the AIM is contained within a transcribed sequence, for example, rDNA, microsatellite DNA, or heterogeneous nuclear ribonucleic acid (RNA), including unspliced mRNA precursor RNA molecules containing non-coding RNA sequences, an RNA sample can be used and examined directly, or a cDNA or an amplification product thereof can be examined by the method. Where one or more SNPs are present in a coding region of a gene, the nucleic acid sample can be DNA or RNA, or a product derived therefrom, for example, an amplification product. Furthermore, while the methods of the invention are exemplified with respect to nucleic acid samples, it will be recognized that a particular SNP, when present in a coding region of a gene, can result in a polypeptide containing a different amino acid at the position corresponding to the SNP due to a non-degenerate codon change. As such, in one aspect, the methods of the invention are practiced using a sample containing a polypeptide of the subject.

A method of the invention is performed by contacting the sample with the hybridizing oligonucleotides and allowing the oligonucleotides to hybridize under conditions suitable for detecting the nucleotide occurrences of the AIMs of the individual. Further, in an aspect of the methods of the invention, the sample can be contacted with second hybridizing oligonucleotides, for example, to determine substructure. It should be recognized that the term "second," when used in reference to hybridizing oligonucleotides (or to a panel of AIMs), is used for convenience of discussion so as to allow a clear distinction of, for example, the steps for performing the method. In this respect, it should further be recognized that one or more of the hybridizing oligonucleotides used, for example, to determine population structure also can be included among the second hybridizing oligonucleotides.

Conditions suitable for detecting the nucleotide occurrence of an AIM vary depending on the sequence of the hybridizing oligonucleotide, including its length and complementarity, as well as on the particular assay to be used and, for example, whether the assay is to be performed as a multiplex assay. Hybridizing oligonucleotides, which are at least 15 nucleotides in length, can contain deoxyribonucleotides or ribonucleotides linked together by phosphodiester bonds, and can be single-stranded or double-stranded, although they generally are used in single-stranded form. Such hybridizing oligonucleotides can be prepared using methods of chemical synthesis or by enzymatic methods such as the polymerase chain reaction (PCR).

A hybridizing oligonucleotide or other polynucleotide useful in a method, or contained in a kit, of the invention also can contain nucleoside or nucleotide analogs, or can have a backbone bond other than a phosphodiester bond; such oligonucleotides provide particular advantages, such as increased stability or more desirable hybridization properties. Nucleotide analogs are well known in the art and are commercially available, as are polynucleotides containing such nucleotide analogs (Lin et al., Nucl. Acids Res. 22:5220-5234, 1994; Jellinek et al., Biochemistry 34:11363-11372, 1995; Pagratis et al., Nature Biotechnol. 15:68-73, 1997, each of which is incorporated herein by reference). The covalent bond also can be any of numerous other bonds, including a thiodiester bond, a phosphorothioate bond, a peptide-like bond, or any other bond known to those in the art to be useful for linking nucleotides to produce synthetic oligonucleotides (see, e.g., Tam et al., Nucl. Acids Res. 22:977-986, 1994; Ecker and Crooke, BioTechnology 13:351-360, 1995, each of which is incorporated herein by reference). The incorporation of non-naturally occurring nucleotide analogs, or of bonds linking the nucleotides or analogs, can be particularly useful where the oligonucleotide is to be exposed to an environment that can contain nucleolytic activity, including, for example, a tissue culture medium or a sample containing a cell extract, because the modified oligonucleotide can be less susceptible to degradation.

In general, a hybridizing oligonucleotide useful for purposes of the invention is at least about 15 bases in length, which is sufficient to allow the oligonucleotide to hybridize selectively to a target polynucleotide containing an AIM, and can be at least about 18 nucleotides in length, or 21 nucleotides, 25 nucleotides, or more in length. The term “selective hybridization” or “selectively hybridize” refers to hybridization under moderately stringent or highly stringent physiological conditions, which can distinguish related nucleotide sequences from unrelated nucleotide sequences. It is known that the conditions used to achieve a particular level of stringency in a nucleic acid hybridization reaction vary depending on the nature of the nucleic acids being hybridized, including, for example, the length, the degree of complementarity, the nucleotide sequence composition (e.g., relative GC:AT content), and the nucleic acid type, i.e., whether the oligonucleotide or the target nucleic acid sequence is DNA or RNA. An additional consideration is whether one of the nucleic acids is immobilized, for example, on a filter, bead, chip, or other solid matrix.

Methods for selecting appropriate stringency conditions are well known in the art; such conditions can be determined empirically or estimated using various formulas (see, for example, Sambrook et al., supra, 1989). An example of progressively higher stringency conditions is as follows: 2X SSC/0.1% SDS at about room temperature (hybridization conditions); 0.1% SDS at about room temperature (low stringency conditions); 0.2X SSC/0.1% SDS at about 42°C (moderate stringency conditions); and 0.1X SSC at about 68°C (high stringency conditions). Washing can be carried out using only one of these conditions, e.g., high stringency conditions, or each of the conditions can be used, e.g., for 10 to 15 minutes each, in the order listed above, repeating any or all of the steps listed. As such, the final conditions will vary depending on the particular hybridization reaction involved, and can be determined empirically. It should be recognized that a variety of conditions can be used to provide selective hybridization conditions. For example, where a multiplex assay is to be performed using a plurality of different hybridizing oligonucleotides specific for different AIMs of a panel, the conditions (and the AIMs/hybridizing oligonucleotides) can be selected such that selective hybridization occurs for all of the hybridizing oligonucleotides in the reaction.
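As an illustration of what estimating stringency "using various formulas" can look like, the following is a minimal Python sketch of two widely quoted melting temperature approximations: the Wallace rule for short oligonucleotides and a salt-adjusted formula. Neither formula is taken from this specification; the function names, the 20-mer, and the conversion of an SSC dilution to a sodium concentration (1X SSC taken as roughly 0.195 M Na+) are assumptions made for illustration only, and real assay conditions would still need to be confirmed empirically.

```python
import math


def gc_fraction(seq: str) -> float:
    """Fraction of G and C bases in the oligonucleotide."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def tm_wallace(seq: str) -> float:
    """Wallace rule estimate in degrees C: 2*(A+T) + 4*(G+C)."""
    seq = seq.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2.0 * at + 4.0 * gc


def ssc_to_na(ssc_factor: float) -> float:
    """Approximate Na+ molarity of an SSC dilution (assumption: 1X ~ 0.195 M)."""
    return 0.195 * ssc_factor


def tm_salt_adjusted(seq: str, na_molar: float) -> float:
    """Salt-adjusted estimate: 81.5 + 16.6*log10[Na+] + 41*GC_fraction - 675/N."""
    n = len(seq)
    return 81.5 + 16.6 * math.log10(na_molar) + 41.0 * gc_fraction(seq) - 675.0 / n


if __name__ == "__main__":
    oligo = "ACGTACGTACGTACGTACGT"  # hypothetical 20-mer, not an AIM probe
    print("Wallace Tm:", tm_wallace(oligo))
    for factor in (2.0, 0.2, 0.1):
        tm = tm_salt_adjusted(oligo, ssc_to_na(factor))
        print(f"{factor}X SSC: estimated Tm {tm:.1f} C")
```

In this sketch, lowering the SSC concentration lowers the estimated melting temperature, which is consistent with the lower-salt, higher-temperature washes above being the more stringent ones.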

In various embodiments, it can be useful to detectably label a polynucleotide or hybridizing oligonucleotide. Detectable labeling of a polynucleotide is well known in the art and includes, for example, the use of detectable labels such as chemiluminescent labels, radionuclides, enzymes, haptens such as digoxigenin and biotin, fluorophores, and unique oligonucleotide sequences. For example, a PCR can be performed in which one primer is biotinylated and the other primer contains digoxigenin. The amplification product can then be bound to a streptavidin plate, washed, reacted with an enzyme-linked antibody to digoxigenin, and developed with a chromogenic, fluorogenic, or chemiluminescent substrate for the enzyme. Alternatively, a radioactive method can be used to detect the amplification products generated, for example, by including a radiolabeled deoxynucleoside triphosphate in the amplification reaction and then blotting the amplification products onto DEAE paper for detection. In addition, where one primer is biotinylated, a streptavidin-coated scintillation proximity assay plate can be used to measure the PCR products. Additional detection methods can use a chemiluminescent label, for example, a lanthanide chelate such as that used in DELFIA® (Pall Corp.), a fluorescent label, or an electrochemiluminescent label such as ruthenium trisbipyridyl (ORI-GEN).

A method for detecting the nucleotide occurrence at the SNP or DIP position of an AIM can utilize one or more oligonucleotide probes or primers, including, for example, an amplification primer pair, that selectively hybridize to a target polynucleotide spanning the AIM. Oligonucleotide probes useful in practicing a method of the invention can include, for example, an oligonucleotide that is complementary to and spans a portion of the target polynucleotide including the position of the SNP (or DIP), wherein the presence of a specific nucleotide at the position of the SNP is detected by the presence or absence of selective hybridization of the probe. Such a method can further include contacting the target polynucleotide and hybridized oligonucleotide with an endonuclease, and detecting the presence or absence of a cleavage product of the probe, depending on whether the nucleotide occurrence at the SNP site is complementary to the corresponding nucleotide of the probe. A pair of probes that specifically hybridize adjacent to and upstream of, and adjacent to and downstream of, the SNP site, wherein one of the probes includes a nucleotide complementary to a nucleotide occurrence of the SNP, can also be used in an oligonucleotide ligation assay, wherein the presence or absence of a ligation product indicates the nucleotide occurrence at the SNP site. An oligonucleotide can also be useful as a primer, for example, for a primer extension reaction, wherein the product (or absence of a product) of the extension reaction indicates the nucleotide occurrence. In addition, a primer pair useful for amplifying a portion of the target polynucleotide including the SNP or DIP site can be useful, wherein the amplification product is examined to determine the nucleotide occurrence at the SNP site, or to determine whether there is an insertion or a deletion at the DIP site.

Numerous methods are known for determining the nucleotide occurrence at a particular position in a polynucleotide (i.e., at a SNP or DIP). Such methods can utilize one or more oligonucleotide probes or primers, including, for example, an amplification primer pair, that selectively hybridize to a target polynucleotide containing one or more SNP positions. Hybridizing oligonucleotides useful in practicing a method of the invention can include, for example, an oligonucleotide that is complementary to and spans a portion of the target polynucleotide including the position of the SNP or DIP (including whether the DIP has the deletion or the insertion), wherein the presence of a specific nucleotide at the SNP site, or the presence of a deletion or insertion at the DIP site, is detected by the presence or absence of selective hybridization of the oligonucleotide probe. Such a method can further include contacting the target polynucleotide and hybridized oligonucleotide with an endonuclease, and detecting the presence or absence of a cleavage product of the probe, depending on whether the nucleotide occurrence at the SNP site is complementary to the corresponding nucleotide of the probe.

An oligonucleotide ligation assay can also be used to identify the nucleotide occurrence at a SNP site, wherein a pair of probes selectively hybridize adjacent to and upstream of, and adjacent to and downstream of, the SNP site, and wherein one of the probes includes a terminal nucleotide complementary to a nucleotide occurrence of the SNP. Where the terminal nucleotide of the probe is complementary to the nucleotide occurrence, selective hybridization includes the terminal nucleotide such that, in the presence of a ligase, the upstream and downstream oligonucleotides are ligated. As such, the presence or absence of a ligation product indicates the nucleotide occurrence at the SNP site.
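The logic of the ligation assay described above can be sketched as a toy string model in Python. This is an illustrative simplification under stated assumptions, not the assay chemistry: probes are written in the same orientation as the reference strand, a probe is treated as hybridized only on an exact match, and ligation is predicted only when the allele-specific terminal base of the upstream probe matches the sample at the SNP position. The function names, the 15-base arm length, and the sequences in the usage example are hypothetical.

```python
def design_ola_probes(reference: str, snp_pos: int, allele: str, arm: int = 15):
    """Return (upstream, downstream) probes flanking the SNP.

    The upstream probe ends at the SNP with the allele-specific base as its
    terminal nucleotide; the downstream probe starts immediately after the SNP.
    """
    if snp_pos - arm + 1 < 0 or snp_pos + 1 + arm > len(reference):
        raise ValueError("not enough flanking sequence for the requested arm length")
    upstream = reference[snp_pos - arm + 1:snp_pos] + allele
    downstream = reference[snp_pos + 1:snp_pos + 1 + arm]
    return upstream, downstream


def ligation_expected(sample: str, snp_pos: int, upstream: str, downstream: str) -> bool:
    """Toy rule: ligation is predicted only if both probes match the sample exactly,
    including the terminal base of the upstream probe at the SNP position."""
    start = snp_pos - len(upstream) + 1
    end = snp_pos + 1 + len(downstream)
    return sample[start:end] == upstream + downstream


if __name__ == "__main__":
    left = "ACGTTGCAACGTTGCAACGT"   # hypothetical 5' flank
    right = "TTGACCTTGACCTTGACCTT"  # hypothetical 3' flank
    snp_pos = len(left)
    sample = left + "A" + right     # sample chromosome carrying allele A
    for allele in "AG":
        up, down = design_ola_probes(sample, snp_pos, allele)
        print(allele, ligation_expected(sample, snp_pos, up, down))
```

Running one allele-specific probe pair per candidate nucleotide, as in the loop above, mirrors how the presence or absence of a ligation product reports which nucleotide occurs at the site.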

A hybridizing oligonucleotide can also be useful as a primer, for example, for a primer extension reaction, wherein the product (or absence of a product) of the extension reaction indicates the nucleotide occurrence at the SNP site or the insertion or deletion at the DIP site. In addition, a primer pair useful for amplifying a portion of the target polynucleotide including the SNP or DIP site can be useful, wherein the amplification product is examined to determine the nucleotide occurrence at the SNP site or the presence of a deletion or insertion at the DIP site. Particularly useful methods include those that are readily adaptable to a high throughput format, to a multiplex format, or to both.

Conditions that allow generation of an amplification product in a sample in which an amplification reaction is to be performed are such that the reaction contains the components necessary for amplification to occur. Such conditions include, for example, appropriate buffer capacity and pH, salt concentration, metal ion concentration where required by the particular polymerase, and appropriate temperatures that allow selective hybridization of the primer or primer pair to the template target polynucleotide, as well as appropriate temperature cycling that permits polymerase activity and melting of the primer, or of the primer extension or amplification product, from the template or, where relevant, from a secondary structure such as a stem-loop structure. Such conditions, and methods for selecting such conditions, are routine and well known in the art (see, for example, Innis et al., “PCR Strategies” (Academic Press 1995); Ausubel et al., “Short Protocols in Molecular Biology”, 4th edition (John Wiley and Sons, 1999), each of which is incorporated herein by reference).

A primer extension or amplification product can be detected directly or indirectly, and/or can be sequenced using various methods known in the art. An amplification product spanning a SNP site can be sequenced to determine the nucleotide occurrence at the SNP locus using traditional sequencing methodologies, including, for example, the dideoxy-mediated chain termination method (Sanger et al., J. Molec. Biol. 94:441, 1975; Prober et al., Science 238:336-340, 1987) or the chemical degradation method (Maxam et al., Proc. Natl. Acad. Sci. USA 74:560, 1977).

The nucleotide occurrence at a SNP site can also be determined using a microsequencing method, in which the identity of only a single nucleotide is determined at a predetermined site (U.S. Pat. No. 6,294,336). Microsequencing methods include the Genetic Bit™ Analysis method (International Publication No. WO 92/15712). Additional primer-guided nucleotide incorporation procedures for assaying polymorphic sites in DNA have also been described (Kornher et al., Nucl. Acids Res. 17:7779-7784, 1989; Sokolov, Nucl. Acids Res. 18:3671, 1990; Syvanen et al., Genomics 8:684-692, 1990; Prezant et al., Hum. Mutat. 1:159-164, 1992; Nyren et al., Anal. Biochem. 208:171-175, 1993). These methods differ from Genetic Bit™ Analysis in that they all rely on the incorporation of labeled deoxyribonucleotides to discriminate between bases at a polymorphic site. In such a format, the signal is proportional to the number of deoxyribonucleotides incorporated, so that a polymorphism occurring in a run of the same nucleotide produces a signal proportional to the length of the run (Syvanen et al., Amer. J. Hum. Genet. 52:46-59, 1993).
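The run-length effect noted above can be made concrete with a small Python sketch. It assumes a simplified reaction in which a single labeled deoxyribonucleotide is supplied and extension stops at the first position that calls for a different base; the function name and the example sequences are hypothetical and are not drawn from the cited methods.

```python
def labeled_incorporations(next_bases: str, labeled_dntp: str) -> int:
    """Count how many labeled nucleotides are added to the primer.

    next_bases lists, in order, the bases the polymerase would add after the
    3' end of the primer. Extension is assumed to halt at the first base that
    is not the supplied labeled nucleotide.
    """
    count = 0
    for base in next_bases.upper():
        if base != labeled_dntp.upper():
            break
        count += 1
    return count


if __name__ == "__main__":
    # A polymorphic site followed by a different base gives one unit of signal.
    print(labeled_incorporations("AC", "A"))
    # The same site inside a run of A's gives a signal that scales with the run.
    print(labeled_incorporations("AAAC", "A"))
    # The alternative allele incorporates no label at all.
    print(labeled_incorporations("GAAC", "A"))
```

Under these assumptions the signal for an allele inside a homopolymer run is a multiple of the single-site signal, which is why such sites need to be interpreted with the flanking sequence in mind.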

Another method for determining the nucleotide occurrence at a SNP position is described by Macevicz (U.S. Pat. No. 5,002,867), wherein a nucleic acid sequence is determined by hybridization with multiple mixtures of oligonucleotide probes. According to such a method, the sequence of a target polynucleotide is determined by permitting the target to hybridize sequentially with sets of probes having an invariant nucleotide at one position and variant nucleotides at the other positions. The nucleotide sequence is determined by hybridizing the target with a set of probes, and then determining the number of sites at which at least one member of the set is capable of hybridizing to the target (i.e., the number of matches). This procedure is repeated until each member of the sets of probes has been tested. U.S. Pat. No. 6,294,336 provides a solid phase sequencing method for determining the sequence of a nucleic acid molecule (either DNA or RNA) by utilizing a primer that selectively binds a polynucleotide target at a site wherein the SNP is the most 3' nucleotide selectively bound to the target.

The nucleotide occurrence of a SNP in a sample can also be determined using the SNP-IT™ method (Orchid BioSciences, Inc., Princeton, NJ). In general, SNP-IT™ is a three-step primer extension reaction. In the first step, a target polynucleotide is isolated from the sample by hybridization to a capture primer, which provides a first level of specificity. In the second step, the capture primer is extended with a terminating nucleotide triphosphate at the target SNP site, which provides a second level of specificity. In the third step, the extended nucleotide triphosphate can be detected using a variety of known formats, including direct fluorescence, indirect fluorescence, an indirect colorimetric assay, mass spectrometry, and fluorescence polarization. Reactions can be processed in an automated, 384-well format using a SNPstream™ instrument (Orchid BioSciences, Inc., Princeton, NJ). Phase-known data can be generated by inputting phase-unknown raw data from the SNPstream™ instrument into the PHASE program of Stephens and Donnelly.

Melting curve analysis of SNPs (McSNP® analysis) provides another method for detecting the nucleotide occurrence at an AIM (Akey et al., supra, 2001). McSNP® analysis does not require a gel electrophoresis step, thus minimizing the time and expense of detecting SNPs, and offers the additional advantage that it can be readily adapted to a high throughput format, thus allowing the parallel examination of one or more panels of AIMs and/or samples.

Where a particular nucleotide occurrence of a SNP is such that it results in an amino acid change in an encoded polypeptide, the nucleotide occurrence can be identified indirectly by detecting the particular amino acid in the polypeptide. The method for determining the amino acid will depend, for example, on the structure of the polypeptide or on the position of the amino acid in the polypeptide. Where the polypeptide contains only a single occurrence of the amino acid encoded by the particular SNP, the polypeptide can be examined for the presence or absence of that amino acid. For example, where the amino acid is at or near the amino terminus or the carboxy terminus of the polypeptide, simple sequencing of the terminal amino acids can be performed. Alternatively, the polypeptide can be treated with one or more enzymes, and a peptide fragment containing the amino acid position of interest can be examined, for example, by sequencing the peptide or by detecting a particular migration of the peptide following electrophoresis. Where the particular amino acid forms part of an epitope of the polypeptide, the specific binding, or absence thereof, of an antibody specific for the epitope can be detected. Other methods for detecting a particular amino acid in a polypeptide or peptide fragment thereof are well known and can be selected based, for example, on convenience or on the availability of equipment such as a mass spectrometer, a capillary electrophoresis system, a magnetic resonance imaging device, and the like.

In another embodiment, a method of the invention utilizes an antibody, or an antigen-binding fragment thereof, that, for example, specifically binds a polypeptide containing the amino acid encoded by a nucleotide sequence having one nucleotide occurrence of a SNP, but does not substantially bind a polypeptide containing a different amino acid encoded by a codon containing that SNP; or that, for example, specifically binds a polypeptide containing the amino acid sequence encoded by one form of a DIP (e.g., having the insertion), but does not substantially bind the polypeptide encoded by the alternative form (e.g., having the deletion). As used herein, the term “specific interaction” or “specifically binds” means that two molecules form a complex that is relatively stable under physiological conditions. The term is used herein in reference to various interactions, including, for example, the interaction of an antibody that binds a target polynucleotide containing a SNP site only when the SNP has a specified nucleotide occurrence and not the alternative (e.g., A, but not T); or the interaction of an antibody that binds a polypeptide containing one amino acid encoded by a codon that includes the SNP site, but does not bind a polypeptide having the alternative amino acid encoded by a codon that includes the SNP.

A specific interaction can be characterized by a dissociation constant of at least about 1 x 10^-6 M, generally at least about 1 x 10^-7 M, usually at least about 1 x 10^-8 M, and particularly at least about 1 x 10^-9 M or 1 x 10^-10 M or greater. A specific interaction generally is stable under physiological conditions, including, for example, conditions that occur in a living individual such as a human or other vertebrate or invertebrate, as well as conditions that occur in cell culture such as those used to maintain mammalian cells or cells from another vertebrate or invertebrate organism. Methods for determining whether two molecules interact specifically are well known and include, for example, equilibrium dialysis, surface plasmon resonance, and the like.
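To give a feel for what the dissociation constants listed above imply, the following Python sketch evaluates the standard single-site binding relation, fraction bound = [L] / (Kd + [L]), for the thresholds named in the text. The relation assumes simple 1:1 binding with the free ligand concentration approximately equal to the total; the function name and the 10 nM antibody concentration are illustrative assumptions, not values from the specification. Under the usual convention a smaller dissociation constant corresponds to tighter binding, so the listed values run from weaker to stronger affinity.

```python
def fraction_bound(ligand_molar: float, kd_molar: float) -> float:
    """Equilibrium fraction of target bound for simple 1:1 binding."""
    return ligand_molar / (kd_molar + ligand_molar)


if __name__ == "__main__":
    antibody_conc = 1e-8  # hypothetical 10 nM free antibody
    for kd in (1e-6, 1e-7, 1e-8, 1e-9, 1e-10):
        occupancy = fraction_bound(antibody_conc, kd)
        print(f"Kd = {kd:.0e} M -> fraction bound = {occupancy:.3f}")
```

When the ligand concentration equals the dissociation constant, half of the target is bound, which is a convenient reference point when comparing candidate antibodies.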

Antibodies useful in a method of the invention include antibodies that specifically bind a polynucleotide containing an AIM, or that bind a polypeptide containing an amino acid encoded by a codon that includes a SNP or containing amino acids resulting from an insertion at a DIP site. Such antibodies are selected so that they specifically bind a polypeptide containing a first amino acid encoded by a codon that includes the SNP locus, but do not bind, or bind measurably more weakly, a polypeptide containing a second amino acid encoded by a codon having a different nucleotide occurrence at the SNP.

The term “antibody” is used broadly herein to refer to immunoglobulin molecules and antigen-binding portions of immunoglobulin molecules that specifically bind an antigen. As such, antibodies useful in a method of the invention can be polyclonal, monoclonal, multispecific, human, humanized, or chimeric antibodies, single chain antibodies, Fab fragments, F(ab') fragments, fragments produced by a Fab expression library, anti-idiotypic (anti-Id) antibodies, and the like, as well as antigen/epitope-binding fragments of such antibodies. Antigen-binding fragments of an antibody include, but are not limited to, Fab, Fab' and F(ab')2, Fd, single chain Fv (scFv), single chain antibodies, disulfide-linked Fv fragments (sdFv), and fragments containing either a VL or a VH domain. Thus, antigen-binding antibody fragments, including single chain antibodies, can contain the variable region alone or in combination with all or part of the hinge region and the CH1, CH2, and/or CH3 domains. The antibodies can be from any animal origin, including birds and mammals, or can be expressed recombinantly, for example, in insect or mammalian host cells, or in plants.

There is much that can be learned today through the use of genetic markers in many scientific fields. Although the use of gene sequences has become routine for forensic science and disease research, most of the benefits of the recently completed Human Genome Project are still awaiting discovery. Within the genome are sequences, and patterns of sequences, that will prove useful for a variety of purposes, including increasing crop yields, extending the human lifespan, minimizing drug-induced suffering, and improving our quality of life through better, more effective, and more specific treatments. To date, biomedical research has been conducted in relatively simple terms. Nevertheless, more than one thousand simple Mendelian traits have been mapped by following the transmission of genetic markers in families.

Many statistical methods are available to study genetic traits, including traditional family-based linkage analysis, variance components methods, sib-pair linkage, measured genotype, transmission disequilibrium, genomic control, and structured analysis. Some of the genes underlying variation in susceptibility to common diseases (e.g., heart disease, obesity, type 2 diabetes, hypertension, and cancer) will eventually be identified using genetic approaches. However, genetic studies of common diseases involve numerous complexities because many of these conditions are multifactorial (i.e., have several sources of variability in risk) and polygenic (i.e., result from the action of, and interactions between, several genes). Further difficulties in the study of common diseases can stem from the late onset of symptoms and from heterogeneity in etiology. Thus, identifying the genes involved in complex diseases remains one of the greatest challenges in the field of human genetics.

Interest has grown in association analysis as a useful approach for mapping common disease and drug response genes (Risch and Merikangas, Science 273:1516-1517, 1996; Jorde, Genome Res. 10:1435-1444, 2000; Nordborg and Tavare, Trends Genet. 18:83-90, 2002). Until the present disclosure, however, the significance of ancestry for identifying these genes had not been fully appreciated. As such, the methods of the invention provide a previously undescribed platform for the identification of genes associated with disease susceptibility and drug responsiveness, as well as for the development of advanced forensic methods. Accordingly, compositions and methods are provided for inferring an individual's response to commonly used drugs, which is, to a significant extent, a function of the individual's ancestry; the disclosed markers and methods are useful for inferring such responses to a different extent for each drug. Further provided are compositions and methods for inferring the proportions of ancestry of an individual and/or group from knowledge of the DNA sequence of the individual or group. Still further, compositions and methods are provided for using knowledge of ancestry-associated DNA sequences to identify disease susceptibility and drug response genes by the MALD process. Also provided are compositions and methods for qualifying and standardizing study groups for more traditional methods of mapping disease genes. Each of these processes requires accurate knowledge of ancestry, which can be determined using the methods and compositions disclosed herein.

Which populations are best suited to linkage disequilibrium (LD) mapping has prompted much debate and discussion (Wright et al., Nat. Genet. 23:397-404, 1999; Eaves et al., Nat. Genet. 25:320-323, 2000; Nordborg and Tavare, supra, 2002; Kaessmann et al., Amer. J. Hum. Genet. 70:673-685, 2002). The extent of LD is a complex function of many genetic and evolutionary factors, such as the rates of mutation, recombination, and gene conversion; demographic and selective events; and the age of the mutation itself. Some of these factors affect the whole genome, whereas others affect only particular genomic regions. Moreover, variation in the rates of mutation, recombination, and gene conversion across the genome is expected to cause differences in LD among genomic regions (e.g., Taillon-Miller et al., Nat. Genet. 25:324-328, 2000).

It has been proposed that small, isolated, inbred populations would have advantages over other populations because of their lower heterogeneity and greater extent of linkage disequilibrium (Wright et al., supra, 1999; Nordborg and Tavare, supra, 2002; Kaessmann et al., supra, 2002). Other populations well suited to mapping are recently admixed populations (e.g., Hispanics and African Americans), which offer the advantage that LD has been generated recently by the admixture process. Because this LD is recent, it can extend over large chromosomal regions. However, to avoid false positives it is also critical to control for the genetic structure (inter-individual variation in admixture proportions) present in these populations (Parra et al., supra, 1998; Lautenberger et al., Amer. J. Hum. Genet. 66:969-978, 2000; Pfaff et al., Amer. J. Hum. Genet. 68:198-207, 2001; Nordborg and Tavare, supra, 2002, each of which is incorporated herein by reference). Interest in admixture mapping has increased in recent years (McKeigue et al., Ann. Hum. Genet. 64:171-186, 2000; Smith et al., J. Invest. Dermatol. 111:119-122, 2001; Collins-Schramm et al., Amer. J. Hum. Genet. 70:737-750, 2002, each of which is incorporated herein by reference). A general description of admixture mapping is provided below, together with some notes on the statistical approaches developed for admixture mapping and on its application to skin pigmentation as a model phenotype.

Admixture creates allelic associations between all marker loci whose allele frequencies differ between the parental populations (Chakraborty and Weiss, Proc. Natl. Acad. Sci., USA 85:9119-9123, 1988). These associations decay with time at a rate that depends on the genetic distance between the loci. Thus, disease (or trait) risk alleles that differ between parental populations can be mapped in an admixed population using a special panel of genetic markers that show high frequency differences between the parental populations. These markers, termed AIMs, are characterized by having particular alleles that are more common in one group of populations than in other populations. One measure of the informativeness of such a marker is the allele frequency difference, delta (δ), which is simply the absolute value of the difference in the frequency of a particular allele between populations (Chakraborty and Weiss, supra, 1988; Dean et al., supra, 1994).
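The delta (δ) measure described above can be illustrated with a minimal Python sketch. The function names, marker names, and allele frequencies below are hypothetical and serve only to show the arithmetic; they are not taken from the disclosure.

```python
def allele_frequency_delta(freq_pop_a: float, freq_pop_b: float) -> float:
    """Return delta: the absolute difference in the frequency of one
    allele between two populations."""
    for f in (freq_pop_a, freq_pop_b):
        if not 0.0 <= f <= 1.0:
            raise ValueError("allele frequencies must lie in [0, 1]")
    return abs(freq_pop_a - freq_pop_b)


def rank_markers_by_delta(markers):
    """Sort (name, freq_a, freq_b) tuples from most to least informative."""
    return sorted(
        markers,
        key=lambda m: allele_frequency_delta(m[1], m[2]),
        reverse=True,
    )


# Hypothetical markers and frequencies, for illustration only.
example_markers = [
    ("marker_1", 0.90, 0.10),
    ("marker_2", 0.55, 0.45),
    ("marker_3", 0.20, 0.75),
]
ranked = rank_markers_by_delta(example_markers)
print([name for name, _, _ in ranked])
```

A marker with δ near 1 is close to diagnostic of one parental population, whereas a marker with δ near 0 carries little ancestry information.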

In admixed populations, allelic associations have arisen recently and therefore extend over longer distances (up to 10 to 20 centimorgans (cM) or more) than in unadmixed populations, so they are more readily detected for a given sample size. The statistical basis of this approach was explored first by Chakraborty and Weiss (supra, 1988) and later by Stephens, Briscoe, and O'Brien, who named the method "mapping by admixture linkage disequilibrium" (MALD; Stephens et al., Amer. J. Hum. Genet. 55:809-824, 1994; Briscoe et al., J. Hered. 85:59-63, 1994). In addition, whether a genetic study uses the MALD approach or a more traditional LD approach, the analysis must control for individual ancestry, estimated from marker data, in order to exclude associations between alleles and traits at unlinked loci. The SNP sequences (markers; AIMs) and methods (BGA tests) disclosed herein are a particularly efficient means of accomplishing this task. An analysis of covariance (ANCOVA) test was used, with the individual admixture estimate serving as the conditioning variable to control for the effect of individual ancestry, in two ways: 1) excluding the locus under consideration from the estimate (ANCOVA/IAE minus marker); and 2) using the full individual ancestry estimate for conditioning (ANCOVA/IAE). This method is described in detail herein.
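The dependence of admixture-generated association on genetic distance and time can be sketched as follows. This is a textbook simplification, not the method of the disclosure. It assumes a single admixture event followed by random mating and no LD within either parental population, so that the initial disequilibrium between two loci is m(1 - m)δ_Aδ_B and decays by a factor (1 - θ) per generation. It further assumes that the Haldane map function converts centimorgans to the recombination fraction θ. All function and parameter names are hypothetical.

```python
import math


def haldane_recombination_fraction(distance_cm: float) -> float:
    """Convert a map distance in centimorgans to a recombination
    fraction using the Haldane map function (an assumption here)."""
    distance_morgans = distance_cm / 100.0
    return 0.5 * (1.0 - math.exp(-2.0 * distance_morgans))


def admixture_ld(m, delta_a, delta_b, distance_cm, generations):
    """Expected disequilibrium D between two loci in an admixed population.

    m           -- admixture proportion from parental population 1
    delta_a/b   -- signed allele frequency differences at loci A and B
    distance_cm -- map distance between the loci, in cM
    generations -- generations of random mating since the admixture event
    """
    d_initial = m * (1.0 - m) * delta_a * delta_b
    theta = haldane_recombination_fraction(distance_cm)
    return d_initial * (1.0 - theta) ** generations


# Hypothetical values: 20% admixture, two markers with delta = 0.8.
for cm in (1, 10, 20, 50):
    print(cm, round(admixture_ld(0.2, 0.8, 0.8, cm, generations=6), 4))
```

Under these assumptions, markers 10 to 20 cM apart retain a substantial fraction of their initial association after a few generations, whereas the association between loosely linked or unlinked markers decays quickly.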

An alternative approach to exploiting admixture has been developed that builds on the earlier work but has little in common with classical LD mapping and more closely resembles linkage analysis of an experimental cross (McKeigue, Amer. J. Hum. Genet. 63:241-251, 1998, which is incorporated herein by reference; McKeigue et al., supra, 2000). For this reason, the term "admixture mapping" has been proposed as more appropriate than "mapping by admixture linkage disequilibrium". Instead of testing for allelic association, this method models the underlying variation in ancestry along the chromosomes of admixed individuals in order to extract all of the information about linkage that is generated by admixture. The disclosed methods and markers are necessary and sufficient to accomplish this process. Although the underlying principle used to detect linkage is straightforward, advanced statistical methods are used to apply it in practice. For example, suppose that a locus accounts for some of the variation in pigmentation between West Africans and Europeans. If individuals of admixed descent are classified according to whether they carry 0, 1, or 2 alleles of African ancestry at this locus, then in a comparison of these three groups with other factors held constant, the mean pigmentation level is expected to vary with the proportion of alleles at the locus that are of African ancestry. Controlling the analysis for parental admixture removes associations of the trait with ancestry at unlinked loci and ensures that the comparison is made with other factors held constant.

To infer the ancestry of the alleles at a locus from the marker genotype, the conditional probability of each allelic state given the ancestry of the allele (e.g., West African or European) is required; these probabilities are the ancestry-specific allele frequencies. Evidence is accumulating that admixture mapping is an effective means of gene identification, and strong allelic associations have been reported in admixed populations between linked markers separated by substantial distances (Parra et al., supra, 1998; Parra et al., Amer. J. Phys. Anthropol. 114:18-29, 2001; McKeigue et al., supra, 2000; Lautenberger et al., supra, 2000; Smith et al., supra, 2001; Wilson and Goldstein, Amer. J. Hum. Genet. 67:926-935, 2000; Pfaff et al., supra, 2001). Given that very high levels of association have been observed over long genetic distances, phenotypes that differ between parental populations because of some genetic factors are also expected to show associations with linked AIMs. One phenotype well suited to the application of admixture mapping is skin pigmentation.
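The role of the ancestry-specific allele frequencies can be illustrated with a minimal application of Bayes' rule to a single observed allele. The sketch assumes that the prior probability of each ancestry equals the individual's admixture proportion; the function name, population labels, and numerical values are hypothetical and are not taken from the disclosure.

```python
def allele_ancestry_posterior(allele_freq_by_ancestry, prior_by_ancestry):
    """Posterior probability of each ancestry for one observed allele.

    allele_freq_by_ancestry -- P(observed allele | ancestry), i.e. the
                               ancestry-specific allele frequencies
    prior_by_ancestry       -- P(ancestry), e.g. admixture proportions
    """
    joint = {
        ancestry: allele_freq_by_ancestry[ancestry] * prior_by_ancestry[ancestry]
        for ancestry in allele_freq_by_ancestry
    }
    total = sum(joint.values())
    if total == 0.0:
        raise ValueError("observed allele has zero probability under every ancestry")
    return {ancestry: value / total for ancestry, value in joint.items()}


# Hypothetical numbers: the allele is common in West Africans (0.9) and
# rare in Europeans (0.1); the individual has 20% West African admixture.
posterior = allele_ancestry_posterior(
    {"west_african": 0.9, "european": 0.1},
    {"west_african": 0.2, "european": 0.8},
)
print(posterior)
```

Even with a prior of only 20% West African ancestry, an allele with a large frequency difference shifts the posterior strongly toward West African origin, which is why markers with high δ are preferred.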

Despite the power of AIMs for disease gene and forensic analyses, no studies had been conducted to elucidate this power. As disclosed herein: 1) SNPs or deletion/insertion polymorphisms in the human genome (collectively referred to as AIMs) that may be useful for drug response, disease gene, or forensic studies have been identified; 2) biochemical and genetic test results are provided demonstrating that these AIMs can be useful for disease gene and forensic studies; 3) the utility of AIMs derived from a systematic screen of the human genome is demonstrated in actual drug response, disease gene, or forensic studies; 4) the utility of such AIMs is demonstrated for drawing inferences as to whether an individual is likely to acquire a disease or is unlikely to respond to a drug; 5) the utility of such AIMs is demonstrated for drawing inferences as to whether a crime scene DNA specimen was derived from an individual of, for example, 80% European, 10% African, and 10% Asian heritage, or some other proportion or mixture; 6) the utility of such AIMs is demonstrated for inferring the ancestry proportions of an individual from the individual's DNA (e.g., whether the individual is of 80% European, 10% African, and 10% Asian heritage, or some other proportion or mixture); and 7) the utility of such AIMs is demonstrated for inferring the ancestry proportions of a group of individuals from their DNA (e.g., whether the group, which can be a population sample, a family, or a clinically defined group of persons, is of 80% European, 10% African, and 10% Asian heritage, or some other proportion or mixture).

The present results demonstrate that AIMs are useful for the applications described above, and that the sequences exemplified herein, together with additional AIMs identified using the methods disclosed herein, enable these applications. The AIMs and methods of the invention are useful for the study of human disease, drug response, and physical traits, and therefore offer outstanding commercial potential. For example, at this formative stage of personalized prescription and disease risk assessment, the markers and methods of the invention provide the tools this fledgling industry needs in order to advance. As exemplified herein, an individual's response to a particular drug depended on the extent to which the individual exhibited a particular population structure (i.e., was of a particular ancestral heritage), in addition to, but independently of, the person's genotype for the drug target or xenobiotic metabolism gene sequences. As such, the compositions and methods of the invention provide a means of predicting the likelihood that an individual will respond to a particular drug.

For example, in a screen for genetic markers associated with patient response to the cholesterol-lowering drug Lipitor™, in which the low-density lipoprotein (LDL) response served as the indicator of a favorable response, some of the most powerful markers identified for the LDL response to Lipitor™ were of gene types not immediately recognizable as relevant to drug response, including, for example, TYR, OCA2, TYRP, FDPS, and HMGCR (see also International Publication No. WO 03/002721 (PCT/US02/20847) and International Publication No. WO 03/045227 (PCT/US02/38345), each of which is incorporated herein by reference). When combined with markers from genes of biological relevance to the response, they increase the ability to draw accurate inferences of response from DNA. Each of these markers is an excellent AIM, indicating that the linkage of AIMs to drug response is likely a function of ancestral differences in response propensity (see Example 5). As such, ancestral heritage can predict a favorable response to Lipitor™. This association was observed for almost every type of response (n=54) to almost every type of drug examined (n=23), confirming that inference of drug response can be accomplished, at least in part, through inference of ancestry proportions. As such, the genes truly relevant to drug response are, at least in part, a function of an individual's ancestry, and gene sequences relevant to drug response are likely to be statistically linked to markers that are informative for ancestry (i.e., AIMs).

Screening the genome for the true identity of genes associated with a particular trait, such as drug responsiveness, is prohibitively expensive and time consuming. As such, the use of AIMs to draw inferences about an individual's propensity to respond to a drug provides a significant shortcut for the rapid development of tests that can be used to match patients with the drugs most appropriate for their genetic constitution. Thus, in addition to being useful for admixture mapping of disease genes, the disclosed methods and exemplified markers provide a tool that can guide clinicians' treatment protocols. The identification of AIMs from publicly available human genome data, and the ability to use AIMs effectively to develop patient-drug classification sets, admixture screening panels, and forensic tools, were accomplished using the disclosed methods, which include: screening SNP databases for AIMs (see, e.g., on the World Wide Web ("www"), at the URL "nih.ncbi.nlm.gov"); screening the AIMs against a multi-ancestral panel of DNA samples to verify which are truly good AIMs; using the disclosed statistical and software methods to draw biologically relevant inferences from AIM sequences; and recognizing that the likelihood that an individual will respond to a drug or develop a disease can be predicted through knowledge of the individual's ancestry, which in turn is revealed by the individual's AIM sequences.

Prior to the present disclosure, individual ancestry could be estimated using two independent methods: a maximum likelihood method (Hanis et al., Amer. J. Phys. Anthropol. 70(4):433-441, 1986, which is incorporated herein by reference), and a Bayesian method implemented in the STRUCTURE program (Pritchard et al., Genetics 155:945-959, 2000, which is incorporated herein by reference). The maximum likelihood and Bayesian methods provide point estimates of proportional ancestry, or admixture, but they have several shortcomings that are addressed by the disclosed methods. For example, using the disclosed algorithm (see Example 6; see also Table 12, which includes a flow chart illustrating the algorithm): 1) the most likely group from which an individual was derived was estimated simultaneously with the estimate of proportional ancestry; 2) multidimensional confidence intervals were computed and projected, thereby reducing the complexity of their presentation; 3) an approach was developed for estimating the number of ancestors at each past level (parental, grandparental, great-grandparental, and so on) and their admixture proportions; and 4) proportional BGA affiliations within an individual were derived for more than two BGA groups at once, thereby providing, for example, improved and more accurate forensic applications and, in addition, enabling the development of classifiers for quantitative or continuously distributed (i.e., not dichotomous) traits whose values are, at least in part, a function of BGA.
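For orientation, a maximum likelihood estimate of individual admixture can be sketched for the simplest case of two parental populations and unlinked biallelic markers. This is a generic textbook sketch, not the algorithm of Example 6 and not the specific procedure of Hanis et al. It assumes Hardy-Weinberg proportions at each marker, searches a grid of admixture values, and reports an approximate 95% interval as the range of values whose log-likelihood lies within 1.92 units of the maximum. All names and input values are hypothetical.

```python
import math


def log_likelihood(m, genotypes, freqs_a, freqs_b):
    """Log-likelihood of admixture proportion m (from population A).

    genotypes -- count (0, 1, or 2) of the reference allele at each marker
    freqs_a/b -- reference allele frequency in parental populations A and B
    """
    total = 0.0
    for g, p_a, p_b in zip(genotypes, freqs_a, freqs_b):
        p = m * p_a + (1.0 - m) * p_b          # expected allele frequency
        p = min(max(p, 1e-9), 1.0 - 1e-9)      # guard against log(0)
        total += (math.log(math.comb(2, g))
                  + g * math.log(p)
                  + (2 - g) * math.log(1.0 - p))
    return total


def estimate_admixture(genotypes, freqs_a, freqs_b, steps=100):
    """Grid-search maximum likelihood estimate with a support interval."""
    grid = [i / steps for i in range(steps + 1)]
    scores = [log_likelihood(m, genotypes, freqs_a, freqs_b) for m in grid]
    best = max(scores)
    m_hat = grid[scores.index(best)]
    # Values within 1.92 log-likelihood units approximate a 95% interval.
    supported = [m for m, s in zip(grid, scores) if best - s <= 1.92]
    return m_hat, (min(supported), max(supported))


# Hypothetical genotypes and parental frequencies at five markers.
m_hat, interval = estimate_admixture(
    genotypes=[2, 1, 2, 0, 1],
    freqs_a=[0.9, 0.8, 0.7, 0.2, 0.6],
    freqs_b=[0.1, 0.3, 0.2, 0.9, 0.1],
)
print(m_hat, interval)
```

With only a handful of markers the interval is wide, which illustrates why a panel of many high-δ markers is needed before proportional ancestry can be reported with useful confidence.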

Independent methods for classifying individuals into population groups have been developed (Shriver et al., supra, 1997; Frudakis et al., supra, 2002, each of which is incorporated herein by reference). The present method differs from previous classification methods in that it allows simultaneous estimation of the best group to which a particular individual belongs and of the individual's proportional assignment to multiple parental groups (Example 6; see also Table 12). Thus, where previous methods made it possible to state that a person is far more likely to be African American than European American, the present approach permits the same statement and also provides the individual's proportional ancestry with confidence intervals (CI); for example, 25% (95% CI 15-35%) European ancestry; 75% (95% CI 60-80%) African ancestry; and 0% (95% CI 0-6%) Native American ancestry. Moreover, the confidence intervals can be represented in multidimensional space to provide a clearer display of the ancestry determined for the person in question (see below; see also FIG. 2). Although methods for plotting such displays were known, the present disclosure is the first to provide a display presented with quantifiable confidence.

There are clear differences in the pattern of chromosomal segment ancestry (PCSA) among persons with different ancestral histories (see FIG. 1). A series of AIMs across a chromosome can facilitate estimation of the most likely parental combination leading to the profile of sequences observed in a given person. One example in which estimation of PCSA is important is in distinguishing a person of Hispanic ancestry from a person of primarily European ancestry who has some proportion of recent Native American ancestry. Indeed, this is an important determination, because the political and legal rights claimed by and granted to these two groups can depend on their ancestry. Hispanic populations such as Mexican Americans (MA) have approximately 30-40% Native American ancestry, with the balance being European with a minor fraction (about 5%) of African ancestry. A person who is one-quarter Native American has 25% Native American ancestry and would therefore overlap with many MA persons in estimated level of ancestry. The PCSA patterns are expected to differ significantly between these two cases and, in such cases, may provide some of the only genetic evidence that facilitates precise delineation of ancestry. As disclosed herein, PCSA can be used in ancestry studies.

An important step in these determinations is the phasing of AIMs along chromosomal segments (see Example 2, FIG. 8; see also Example 5, FIGS. 12-16). Phasing of AIMs along a chromosome can be accomplished by several methods, including 1) estimation from the individual's genotypes, 2) molecular haplotyping (e.g., allele-specific PCR combined with genotyping), and 3) single-sperm analysis (for a female subject, sperm from a male full sibling would be expected to provide the same profile). In addition, the disclosed methods allow simultaneous consideration of the two sex chromosomes (X and Y) and mtDNA for ancestry inference. AIMs are found in each of these sources and can be informative for many questions concerning a person's ancestry proportions and the population from which a particular person is derived. For example, Hispanic/Latino populations have very high frequencies (65-100%) of Native American mtDNA haplogroups but show only a minor contribution from Native American populations at autosomal markers. Thus, for example, a person with reputed Native American ancestry on her father's side who carries a non-Native American mtDNA haplogroup is more likely to be part Native American, as she supposes, and not Hispanic, than she would be if she carried a Native American mtDNA haplogroup.

Linkage disequilibrium (LD) is increasingly being used as a mapping tool, both for fine-scale determination of gene location and for initial localization of disease genes in particular populations. Allelic association is significantly nonrandom and correlated with physical distance within small (<60 kb) genomic regions (for reviews, see Jorde, Amer. J. Hum. Genet. 66:979-988, 1995; Jorde, Genome Res. 10:1435-1444, 2000), probably reflecting the underlying "block structure" that characterizes many genomic regions (Reich et al., 2001; Daly et al., 2001). Thus, when the disease alleles in a population share a recent common origin, the nearby genetic markers showing the strongest association are likely to be those closest to the disease-causing locus. This approach has been important in the positional cloning of several simple Mendelian diseases, including the cystic fibrosis gene, the Huntington disease gene, and the diastrophic dysplasia gene.

In addition to its application in fine mapping or positional cloning, LD can be used for initial disease gene mapping in homogeneous populations that have recently increased in size or that are genetically inbred. In such populations, disease alleles were probably present in a small number of founders, and recombination has had limited opportunity to randomize the associations between these alleles and linked marker loci. Analysis of allelic association between affected and unaffected individuals from these populations can thus facilitate localization of disease loci. A number of Mendelian diseases have been mapped using this approach: several diseases in the Finnish population, Hirschsprung's disease in Mennonites, benign recurrent intrahepatic cholestasis in an isolated Dutch fishing community, familial persistent hyperinsulinemic hypoglycemia of infancy in consanguineous Saudi Arabian tribal kindreds, and Bardet-Biedl syndrome in Bedouins.

There has been much debate as to which populations are best suited to LD mapping of complex polygenic diseases (see, e.g., Wright et al., supra, 1999; Eaves et al., supra, 2000; Nordborg and Tavare, supra, 2002; Kaessmann et al., supra, 2002). The extent of LD is a complex function of many genetic and evolutionary factors, such as the rates of mutation, recombination, and gene conversion; demographic and selective events; and the age of the mutation itself. Some of these factors affect the whole genome, whereas others affect only particular genomic regions. Moreover, variation in the rates of mutation, recombination, and gene conversion across the genome is expected to cause differences in LD among genomic regions.

With regard to populations for use in disease discovery efforts based on LD methods, it has been proposed that small, isolated, inbred populations would have advantages over other populations because of their lower heterogeneity and greater extent of linkage disequilibrium (see Wright et al., supra, 1999; Nordborg and Tavare, supra, 2002; Kaessmann et al., supra, 2002). On the other hand, admixed populations such as Hispanics and African Americans offer the advantage that linkage disequilibrium has been generated recently by the admixture process and can extend over large chromosomal regions, although it is critical to control for the genetic structure present in these populations in order to avoid false positives (Parra et al., supra, 1998; Lautenberger et al., supra, 2000; Pfaff et al., supra, 2001; Nordborg and Tavare, supra, 2002). Despite the increased research focus on LD-based methods, however, many issues concerning LD in human populations remain insufficiently explored. Currently, the NHGRI is planning a systematic project to help develop informational tools for gene identification studies by identifying common haplotypes in several populations. This "Haplotype Map Project" (HMP) is likely to be a large, multi-institutional effort focused on finding common haplotypes in general population samples. Because several populations will be examined for haplotype block structure, the HMP is likely to prove an important source of data for identifying AIMs as disclosed herein, providing additional candidate AIMs and a baseline of the detailed LD structure in some of the parental populations.

Admixture mapping as disclosed herein differs in nature from the HMP but is complementary to it. First, the primary focus of the HMP is to understand the detailed structure of individual genomic regions, whereas the present methods allow an understanding of LD arising specifically from admixture. LD from admixture extends over distances on the order of megabases (Mb) to tens of Mb, whereas the HMP focuses on the level of tens to hundreds of kilobases (kb), so genomic and population features that influence the results of one project may go unnoticed in the other. Second, admixture mapping requires accurate estimates of parental allele frequencies. Accordingly, a large number of different African, Native American, European and Asian populations were typed (see Table 6, below), whereas the HMP is likely to focus on one or two samples of each major population group.

Third, large samples of African Americans and Hispanics (n=500) were typed, providing sufficient statistical power to test the coverage of the admixture map and to compare analytical methods. In addition, because several representative populations from different regions of the country were typed, geographic variation in ancestral proportions and admixture dynamics can be examined. Although some admixed populations are likely to be included in the HMP, the number of individuals and the number of different population samples will be smaller than disclosed herein, so the same comparisons will not be possible. For example, having 10 samples for each of four ancestral groups is not sufficient to identify sequences preferentially present in one or a few of those groups; as disclosed herein, at least 50 individuals were examined for each of dozens of ancestral groups (not just four) in order to identify these markers comprehensively.

Fourth, the focus of recent population variation efforts (e.g., the SNP Consortium Allele Frequency Project), and probably of the HMP, has been on East Asian, African and European samples, with Native American populations excluded for a number of complex reasons. However, excluding these populations leaves a gap in the understanding of the genetics of the fastest growing group in the current U.S. population, namely Hispanics, who have significant levels of Native American ancestry (20% to 40%). With the markers and methods disclosed herein, the genetics of disease in Hispanic populations can be examined. Likewise, several different Native American populations may represent important parental populations for the many distinct groups that are often classified together as Hispanic.

The population-based association methods disclosed herein provide several advantages over traditional linkage studies. Locating disease genes by traditional genetic linkage methods relies on the use of related persons, either extended multigenerational families or pairs of related individuals. These approaches are effective and very powerful when investigating diseases caused by a single gene. However, polygenic and multifactorial diseases such as type 2 diabetes, hypertension and prostate cancer result from the interaction of several genes and complex environmental influences, and are more difficult to study using traditional methods. Identification of genes that contribute to susceptibility to common diseases is complicated by heterogeneity. The source of the genetic heterogeneity determines which mapping method is most likely to work for gene identification. The two basic types of genetic heterogeneity are locus heterogeneity, in which more than one locus affects the genetic trait, and allelic heterogeneity, in which multiple alleles within a particular causative locus are important in altering the phenotype. Traditional linkage analysis using extended families is generally insensitive to allelic heterogeneity but can be adversely affected by locus heterogeneity. Conversely, LD-based methods are generally adversely affected by allelic heterogeneity but are little affected by locus heterogeneity.

Provided there is little allelic heterogeneity, association-based approaches such as measured genotype and the transmission disequilibrium test (TDT) can be more sensitive than family-based LOD score or sib-pair methods. Risch and Merikangas (supra, 1996) compared the numbers of individuals required for sib-pair and TDT studies and showed that the number required to detect linkage is much smaller for the TDT than for sib-pair studies. This is particularly true when the effect of the disease locus is small. For example, for a locus with a risk ratio of 2.0 and a gene frequency of 50%, 2500 sib pairs are required, compared with 340 cases/parents for the TDT. There are several examples in which demonstration of association using Haplotype Relative Risk (HRR) or case/control designs, or of linkage using the TDT, has outperformed demonstration of linkage with sib pairs. A classic example is the association between the insulin gene and IDDM, which was demonstrated in cases and controls and subsequently confirmed using the TDT, but was often not observed in sib-pair linkage studies (Spielman et al., Amer. J. Hum. Genet. 28:317-331, 1993). Yaouanq et al. (Science, 1997) reported highly significant evidence (p<10^-9) for linkage between HLA and multiple sclerosis using the TDT in a series of 157 French families (99 simplex and 58 multiplex). When the 58 multiplex families were analyzed alone, p values of 0.0001 and 0.03 were obtained for the TDT and sib-pair methods, respectively.

Association analysis based on candidate genes has relatively higher power to detect disease genes than linkage analysis in families, but exhaustively testing every gene in a genome of more than 40,000 genes is not currently practical. The Haplotype Map Project may succeed in producing the resources necessary for gene identification based on linkage disequilibrium. However, even if the block structure model of the human genome could be described by four haplotypes at each gene, the minimum number of SNPs and DIPs would be 80,000, and the actual number is likely to be higher. Although genotyping technology is advancing rapidly, testing this number of markers in a large number of study subjects is not yet practical. Furthermore, there are several important assumptions inherent in plans to identify genes using LD in large populations. One important difficulty in using linkage disequilibrium in genome-wide screens is that LD decays exponentially with the recombination fraction between the marker and the disease locus and with the age of the disease-causing mutation. For older disease-predisposing mutations, LD becomes very weak even between the disease allele and alleles at relatively closely spaced marker loci.
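
The 80,000 figure is consistent with a simple count, sketched below under an assumption the text does not state explicitly: that two biallelic markers (2^2 = 4 combinations) are the fewest that can tell four haplotypes apart at each gene. The function name and its parameters are illustrative only.

```python
import math

def min_markers(n_genes: int, haplotypes_per_gene: int) -> int:
    """Lower bound on the biallelic markers needed to tag every gene."""
    # k biallelic markers can distinguish at most 2**k haplotypes,
    # so at least ceil(log2(h)) markers are needed per gene.
    per_gene = math.ceil(math.log2(haplotypes_per_gene))
    return n_genes * per_gene

print(min_markers(40_000, 4))
```

Under this reading the bound grows quickly if more haplotypes per gene must be resolved, which fits the remark that the actual number is likely to be higher.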

LD mapping has been useful in mapping rare genetic diseases such as cystic fibrosis, and diseases in particular populations such as Finns and Bedouins, populations that have been subject to significant bottlenecks, inbreeding or founder effects. In these situations LD is present either because the mutant allele is relatively young, as in cystic fibrosis, or because the history of the population has reduced genetic variability and increased LD across the genome. Leading models of the genetics of common diseases posit predisposing alleles at multiple loci that increase individual risk when present in particular combinations (Greenberg, Amer. J. Hum. Genet. 52:135-143, 1993; Lander and Schork, Science 265:2037-2048, 1994; Risch and Merikangas, supra, 1996). If the disease is common, then under this model the predisposing alleles are also expected to be at relatively high frequency. However, under a neutral model, the frequency of an allele in a population is related on average to the age of the allele, such that more frequent alleles are older than rare alleles. In a homogeneous population LD is inversely related to allele age, and risk alleles for common diseases are expected, on average, to be relatively old; this poses a problem for applying LD-based methods to identify common disease genes in populations that are not isolated or inbred.

Application of the compositions and methods of the invention to admixture mapping allows more accurate and reliable mapping of complex traits. Admixture mapping exploits the LD that is generated when previously isolated populations mix, and can circumvent these problems in mapping complex traits. That admixed populations can be useful in measuring genetic linkage was first recognized by Chakraborty and Weiss (supra, 1988). When genetically divergent populations hybridize, non-random allelic associations arise between loci with significant allele frequency differences, even between unlinked loci. This LD decays rapidly when the loci in question are not located close together on the same chromosome.
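
The way mixing creates LD between loci that differ in allele frequency can be illustrated with a minimal sketch. It assumes a single pulse of admixture and no LD between the two loci inside either parental population; the function name, parameter names and example frequencies are hypothetical and are not taken from the disclosure.

```python
def admixture_ld(m, pa1, pa2, pb1, pb2):
    """LD (D) between loci A and B just after populations 1 and 2 mix.

    m        : fraction of the admixed gene pool drawn from population 1
    pa1, pa2 : frequency of allele A in populations 1 and 2
    pb1, pb2 : frequency of allele B in populations 1 and 2
    Assumes no LD between A and B inside either parental population.
    """
    # Frequency of the A-B haplotype among the pooled gametes.
    p_ab = m * pa1 * pb1 + (1 - m) * pa2 * pb2
    # Marginal allele frequencies in the admixed population.
    p_a = m * pa1 + (1 - m) * pa2
    p_b = m * pb1 + (1 - m) * pb2
    return p_ab - p_a * p_b

# Large frequency differences at both loci give substantial LD ...
print(admixture_ld(0.5, 0.9, 0.1, 0.8, 0.2))
# ... while a locus with no frequency difference gives essentially none.
print(admixture_ld(0.5, 0.9, 0.1, 0.5, 0.5))
```

Algebraically the result reduces to m(1 - m) times the product of the two allele frequency differences, so the initial LD does not depend on whether the loci are linked; linkage only governs how quickly it decays afterwards.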

LD decays as a function of the recombination fraction between two markers (θ) and the number of generations since hybridization (n), and can be expressed as $D_n = (1-\theta)^n D_0$, where $D_n$ is the linkage disequilibrium n generations after hybridization and $D_0$ is the initial linkage disequilibrium (Chakraborty and Weiss, supra, 1988). Given this exponential relationship between the decline in LD and genetic distance, it is possible, in an admixed population in which the time since admixture is short, to distinguish LD that remains high because the markers are close together and genetically linked from background linkage disequilibrium between unlinked loci. For example, after 10 generations, linkage disequilibrium between unlinked loci has fallen to 0.1% of its initial level, whereas for loci 10 cM and 1 cM apart the disequilibrium due to true linkage is still 34.9% and 90.4% of the initial level, respectively. The critical parameters for effective detection of linkage in an admixed population are the frequency difference (δ) between the parental populations and the number of generations since hybridization. Detection of linkage by association analysis in admixed populations works efficiently when δ is large (0.2 or more) and the number of generations since admixture is small (on the order of 10 generations; Chakraborty and Weiss, supra, 1988).
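
The figures quoted for 10 generations can be checked directly from the decay relation. The sketch below assumes θ = 0.5 for unlinked loci and treats 10 cM and 1 cM as recombination fractions of 0.10 and 0.01, an approximation that is reasonable for short map distances; the function name is illustrative.

```python
def ld_remaining(theta: float, generations: int) -> float:
    """Fraction D_n / D_0 of the initial LD left after n generations."""
    return (1.0 - theta) ** generations

# Recombination fractions: 0.5 for unlinked loci; 0.10 and 0.01 are used
# here as approximations for loci 10 cM and 1 cM apart.
for label, theta in [("unlinked", 0.5), ("10 cM", 0.10), ("1 cM", 0.01)]:
    print(f"{label:>8}: {100 * ld_remaining(theta, 10):.1f}% of initial LD")
```

Run as written, this gives 0.1%, 34.9% and 90.4%, matching the values in the text.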

Stephens et al. (supra, 1994) and Briscoe et al. (supra, 1994) studied this approach (MALD) using computer simulations and detailed the practical considerations for study design. Using a simple model of the type of admixture that has occurred in America, they suggested that a sample of 200 to 300 patients typed for 200 to 300 evenly spaced markers, each with δ>0.3, would have a greater than 95% chance of locating a causative gene. A consistent result across the several models studied was the primary dependence of the power of MALD on the allele frequency differences of the markers used. When δ is small, the initial LD is small and difficult to distinguish from background noise.

Stephens et al. (supra, 1994) suggested using loci with δ>0.4 between the parental populations for effective admixture mapping. They also demonstrated that admixture mapping is most effective in populations that hybridized 4 to 20 generations ago, and that gradual admixture (slow gene flow from one population into another, also known as the continuous gene flow model) affects the power of admixture mapping, though not critically, provided that there has been no new gene flow from the parental populations during the past three generations. The disclosed admixture mapping techniques can identify the location of disease susceptibility genes by analysis of admixed populations formed from parental populations that differ greatly in the frequency of susceptibility genotypes. Accordingly, applications of admixture mapping include the study of type 2 diabetes susceptibility in Pacific Islander populations; hypertension, obesity and prostate cancer in African Americans; and type 2 diabetes, obesity and gallbladder disease in Hispanic populations.

McKeigue developed an approach for exploiting admixture to map genes that builds on the earlier work (McKeigue, supra, 1997; McKeigue, supra, 1998; McKeigue et al., supra, 2000). The approach is driven by the LD generated by admixture but is more similar to linkage analysis of an experimental cross. For this reason the term "admixture mapping" was proposed. Instead of testing for allelic association, the underlying variation in ancestry along the chromosomes of admixed individuals can be modeled in order to extract all of the information about linkage that is generated by admixture.

As discussed above, advanced statistical methods are required to apply this approach in practice. Conditioning on parental admixture eliminates association of the trait with ancestry at unlinked loci and ensures that comparisons are made with other factors held constant. In non-statistical terms, a comparison is made within each individual between the proportion of alleles at the marker locus that are of a particular ancestry and the proportion expected given the admixture of that individual's parents. One simple way to do this is to use an analysis of covariance (ANCOVA) test (see Tables 2 and 3), but this simpler approach does not necessarily use all of the available information. Accordingly, Bayesian methods were also used (see Tables 2 and 3).

To infer the ancestry of alleles at a locus from marker genotypes, ancestry-specific allele frequencies are required, i.e., the conditional probability of each allelic state given the ancestry of the allele (in this example, West African or European). The total set of alleles at any locus in an admixed population can be regarded as consisting of two subpopulations: alleles of African ancestry and alleles of European ancestry. As long as the ancestry-specific allele frequencies are correctly specified for the admixed population under study, Bayes' theorem can be applied to invert these conditional probabilities and calculate, for each individual under study, the posterior distribution of ancestry at the locus (0, 1 or 2 alleles of African ancestry). Where the information conveyed by typing a single marker is not sufficient to assign the ancestry of each allele at the marker locus to one of the two founding populations, markers can be combined in a multipoint analysis to estimate ancestry at adjacent loci.
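
The Bayes' theorem step for a single biallelic marker can be sketched as follows. This is a simplified single-locus illustration, not the multipoint method referred to above: it assumes the individual's two alleles are independent draws with the same prior ancestry probability, and the function name, arguments and example frequencies are all hypothetical.

```python
def posterior_african_alleles(genotype, p_afr, p_eur, m):
    """Posterior P(k African-ancestry alleles | genotype), k = 0, 1, 2.

    genotype : copies of the reference allele carried (0, 1 or 2)
    p_afr    : reference-allele frequency given African ancestry
    p_eur    : reference-allele frequency given European ancestry
    m        : prior probability that an allele is of African ancestry
               (the individual's admixture proportion; an assumed input)
    """
    def prob_afr(is_ref):
        # Bayes' theorem for one allele: P(African | allelic state).
        like_afr = p_afr if is_ref else 1 - p_afr
        like_eur = p_eur if is_ref else 1 - p_eur
        return m * like_afr / (m * like_afr + (1 - m) * like_eur)

    # The two alleles are treated as independent draws.
    states = [True] * genotype + [False] * (2 - genotype)
    a, b = (prob_afr(s) for s in states)
    return [(1 - a) * (1 - b), a * (1 - b) + (1 - a) * b, a * b]

print(posterior_african_alleles(genotype=2, p_afr=0.9, p_eur=0.1, m=0.8))
```

When the two ancestry-specific frequencies are equal the marker carries no information and the posterior collapses to the binomial prior, which is one way to see why markers with large frequency differences are preferred.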

Simulation studies have shown that, with sufficient markers, a high proportion of the information about ancestry at each locus can be extracted even if no single marker is fully informative for ancestry (McKeigue, supra, 1998). Based on these simulations, a panel of markers with Fst>0.4 at an average spacing of 2 to 3 cM, for a total of 1,000 AIMs, can be constructed as disclosed herein. It should be recognized that a panel of 1,000 AIMs for a particular population (e.g., African Americans, a group that is primarily West African and European) will often overlap with the panels for other groups. In other words, AIMs selected for one level of discrimination (e.g., African/European) are often also informative for other discriminations (e.g., Native American/European). Table 1 lists an initially identified panel comprising 32 AIMs (SEQ ID NOS: 332-363; see also Example 1). Using a cutoff of δ>0.3, only four of these markers are limited to being informative for one of the three comparisons (African/European; African/Native American; Native American/European); the remainder are informative for two of the comparisons, and one marker is informative for all three. In further studies, a panel of 71 AIMs was identified as informative with respect to Indo-Europeans, Sub-Saharan Africans, Native Americans and East Indians (SEQ ID NOS: 1-71; Table 6) (see Example 2).

Evidence is increasing that admixture mapping is an effective means of gene identification. At least three independent groups have reported strong admixture linkage disequilibrium (ALD) between linked markers spaced at substantial distances (see, e.g., Parra et al., supra, 1998 and 2001; Pfaff et al., supra, 2001; McKeigue et al., supra, 2000). Given the very high levels of association observed over long genetic distances, phenotypes that differ dramatically between parental populations because of some genetic difference are also expected to show association with linked AIMs. However, until the present disclosure, no systematic screen had been reported to identify the SNP-based version of the AIMs that are needed for the MALD approach to fulfill its promise. McKeigue and others have identified panels of STR AIMs for use in this approach, but the use of STRs for this purpose is problematic because of the allelic complexity of STRs and the large databases required to estimate allele frequencies accurately. Even small errors or incorrect assumptions about the frequencies of unobserved alleles can be amplified and can compromise the statistical power of a study.

Heterogeneity within the parental populations can have a confounding effect on admixture mapping studies. In the case of African American populations, the admixture process that took place in the New World involved a heterogeneous group of populations, primarily from central West Africa and Europe, together with some Native American populations. With respect to the European genetic contribution, the most important source populations came from Great Britain, Ireland, Germany and Italy. Despite the diverse geographic regions of origin of the parental European populations, it is important to note the relative homogeneity of European populations from a genetic point of view (see, e.g., Cavalli-Sforza et al., supra, 1994).

With respect to the African contribution, it is well known that the African continent harbors an enormous amount of genetic diversity. However, only a subset of African genetic diversity contributed to the formation of African American populations. The vast majority of enslaved Africans came from central West Africa, roughly from Senegal in the north to Angola in the south (Curtin, The Atlantic Slave Trade; Madison, University of Wisconsin Press 1969); other regions of Africa were not affected by the slave trade. Of the four major language families present in Africa, Niger-Congo-Kordofanian, Nilo-Saharan, Afro-Asiatic and Khoisan (Greenberg, supra, 1963), the vast majority of the enslaved Africans forcibly brought to the New World were members of the Niger-Congo family. This widespread family includes the West African languages (spoken by peoples from Senegal to Nigeria) and the Bantu languages (predominant in central and southern Africa). The Bantu languages were dispersed across Africa by a "recent" expansion that took place about 3,000 years ago and probably originated in West Africa (Nigeria and Cameroon; see Excoffier et al., Yearbook Phys. Anthropol. 30:151-194, 1987 and Cavalli-Sforza et al., supra, 1994). This recent origin is reflected in the linguistic and genetic homogeneity of the Bantu (Excoffier et al., supra, 1987; Weber et al., supra, 2000). Thus, the available historical, linguistic and genetic evidence indicates that only a subset of the diversity found in sub-Saharan Africa contributed to the African American gene pool, and that the potential problem of heterogeneity is much smaller than it would be if the diversity of the entire African continent were represented in modern African American populations. Unfortunately, the extent of the heterogeneity present in West and Central Africa remains largely unknown because of the lack of available information about the populations of this region.

Because the extent of heterogeneity within European populations and within West and Central Africa remains largely unknown, the potential effects of heterogeneity need attention, particularly when considering admixture mapping approaches. There are two levels at which heterogeneity can affect admixture mapping efforts. First, heterogeneity can lead to incorrect estimates of the parental frequencies for the markers used in the map and thereby bias the admixture estimates. Given that the goal of admixture mapping is to infer linkage conditioning on parental admixture, it is important to avoid misspecification of the ancestry-specific allele frequencies, because this can affect the final conclusions of the analysis. Second, heterogeneity can affect the number of loci underlying the phenotype to be studied.

The effect of heterogeneity in biasing estimates of the genetic contributions to an admixed population can be reduced by selecting markers that show homogeneity within the major parental populations (European and African). In this way, the problem of the contributions of different geographic regions to the parental populations is minimized and bias in the admixture estimates is reduced. This strategy has been implemented in previous admixture studies (Parra et al., supra, 1998, 2001; Pfaff et al., supra, 2001), in which potentially informative markers were systematically analyzed in different European and African populations. As an example, to test for heterogeneity within Africa, each potentially informative marker was genotyped in samples from five African populations (two from Nigeria, two from Sierra Leone and one from the Central African Republic), and markers showing significant heterogeneity were excluded from the analysis (see below). All of these samples came from regions affected by the slave trade. If desired, samples from Angola, a region that was the source of about 40% of enslaved Africans, can be incorporated, thereby providing another sample of the African parental population. In addition to this strategy, it is important to note that there are statistical methods to test for misspecification of parental frequencies (see, e.g., McKeigue et al., supra, 2000).

With respect to the potential problem of heterogeneity in the phenotype to be studied, heterogeneity due to the presence of multiple genes affecting the phenotype (locus heterogeneity) is expected to reduce the power of admixture mapping to detect significant genotype effects, as it would for any other mapping method. Heterogeneity can also be due to multiple functional alleles within a particular gene (allelic heterogeneity). One example is MC1R, in which approximately six relatively common variants lead to red hair, freckles and pale skin among native Europeans and their descendant populations. Within Europeans, these variants lie on different haplotype backgrounds and therefore reduce the power to detect the effect of the MC1R gene in association analyses, compared with a situation in which a single mutation had arisen and risen to high frequency. In an admixed population (e.g., European/African), however, these variants are all in allelic association with markers informative for ancestry (e.g., the MC1R marker; see Table 1), and because they all produce a skin-lightening effect, their information is pooled, so that identification of MC1R by admixture mapping is no different with six functional variants than it would be if there were only one functional variant specific to Europeans. As long as the effects of the functional variants within a particular parental population are in the same direction (e.g., in reducing disease risk), allelic heterogeneity is not expected to be a serious problem in admixture mapping.

Most of the genetic variation among humans (80-90%) is interindividual; only 10-20% of the variation is due to differences between populations (e.g., Nei, supra, 1987; Cavalli-Sforza et al., supra, 1994; Deka et al., supra, 1995). Most populations share alleles, and the alleles that are most frequent in one population are generally frequent in the others as well. There are very few classical (blood group, serum protein, and immunological) or DNA genetic markers that are either population-specific or have large frequency differences between geographically and ethnically defined populations (Roychodhury and Nei, supra, 1988; Cavalli-Sforza et al., supra, 1994). Despite this apparent lack of unique genetic markers, there are marked physical and physiological differences among human populations, presumably reflecting long-term adaptation to unique ecological conditions, random genetic drift, and sexual selection. In modern populations, these differences are evident both in morphological differences between ethnic groups and in differences in susceptibility and resistance to disease.

The unique alleles most useful for admixture and mapping studies are those that also have large differences in allele frequency between populations (Reed, supra, 1973; Chakraborty et al., supra, 1992; Stephens et al., supra, 1994). The fact that such alleles are entirely absent from all other populations can simplify some of the statistical calculations and facilitate more confident estimation of parental allele frequencies, but it is not the primary reason for their use. The term population-specific allele (PSA) was initially used to describe genetic markers with large allele frequency differences between populations (Shriver et al., supra, 1997; Parra et al., supra, 1998), but these markers are now called ancestry informative markers (AIMs), a more correct and descriptive term. For a biallelic marker, the frequency difference (δ) is equal to p_x - p_y (which equals q_y - q_x), where p_x and p_y are the frequencies of one allele in populations X and Y, and q_x and q_y are the frequencies of the other allele. Median δ levels between major ethnic groups range between 15% and 20%, and the great majority (>95%) of arbitrarily identified biallelic genetic markers have δ < 50% (Dean et al., supra, 1994, incorporated herein by reference). Statistical estimates of the power of admixture mapping studies based on the use of markers with Fst > 0.4 have been presented previously (McKeigue et al., supra, 2000). It was demonstrated that, using 1,000 such markers evenly spaced across the genome, it is possible to have 80% statistical power to identify a disease gene that accounts for a two-fold relative risk between the parental populations.
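The two marker statistics mentioned above can be illustrated with a short sketch. The function names are hypothetical. The Fst formula shown is the common heterozygosity-based definition for two equally weighted populations, which is an assumption because this passage does not say which Fst estimator was used.

```python
def delta(p_x, p_y):
    """Frequency difference for a biallelic marker: delta = p_x - p_y.

    p_x and p_y are the frequencies of the same allele in populations
    X and Y; the sign only records which population has the higher frequency.
    """
    return p_x - p_y


def fst_two_pop(p_x, p_y):
    """Fst for one biallelic marker in two equally weighted populations.

    Assumed definition (not taken from the source): Fst = (H_T - H_S) / H_T,
    where H_T is the expected heterozygosity at the pooled allele frequency
    and H_S is the mean expected heterozygosity within the two populations.
    """
    p_bar = (p_x + p_y) / 2.0
    h_t = 2.0 * p_bar * (1.0 - p_bar)
    if h_t == 0.0:
        # Allele fixed or absent in both populations: no differentiation.
        return 0.0
    h_s = (2.0 * p_x * (1.0 - p_x) + 2.0 * p_y * (1.0 - p_y)) / 2.0
    return (h_t - h_s) / h_t


if __name__ == "__main__":
    # Illustrative allele frequencies only, not data from the source.
    print(delta(0.9, 0.1), fst_two_pop(0.9, 0.1))
```

Under this definition, a marker with allele frequencies of 0.9 and 0.1 in the two populations has |δ| = 0.8 and Fst = 0.64, so it would pass an Fst > 0.4 filter of the kind described above.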

AIMs and their use according to the methods of the invention are demonstrated in Examples 1 to 6 (below). In addition, allele frequency data have been reported for markers informative for admixture mapping in African American and Hispanic populations (Smith et al., supra, 2001; Collins-Schramm et al., supra, 2002). To apply the methods of the invention to the analysis of disease predisposition or drug responsiveness, estimation of admixture proportions and admixture dynamics is important. Controlling for genetic structure in admixed populations requires knowledge of the ancestral proportions and genetic structure of these populations. Reliable estimates of admixture proportions can allow an informed choice of the populations to consider. Because the admixture linkage disequilibrium (LD) generated during hybridization depends on the level of admixture, sampling should generally be concentrated in those areas of the country where there has been more admixture.

In addition to knowing the ancestral proportions, it is important to understand the level of population structure present in the population under consideration. A homogeneous population is a randomly mating population, that is, one in which there is no assortative mating and families are formed more or less by random union, without regard to DNA genotype. In most large cosmopolitan populations, homogeneity is expected and found. However, if stratification exists within a population such that individuals do not mate randomly, the population is not homogeneous. Admixture is one possible mechanism that introduces genetic structure into a population, and taking this genetic structure into account facilitates admixture mapping.

The effect of genetic structure can be considered at two levels. First, the parental populations are evaluated to determine whether they show heterogeneity in the allele frequencies of the selected AIMs; as discussed above, such heterogeneity can affect the estimates of admixture proportions. Several methods can detect the presence of genetic structure. These methods fall into two main categories, termed genomic control (GC) methods (Devlin and Roeder, supra, 1999) and structured association (SA) methods (Pritchard and Donelly, supra, 2001). Both approaches require genotyping a panel of unlinked markers in order to estimate and correct for the effect of genetic structure, which, as discussed above, may be due to sampling effects or to real demographic stratification in the sampled population. The SA method (Pritchard et al., supra, 2000; Pritchard and Donelly, supra, 2001) was used to test for genetic structure in the parental populations. This method, which uses the genotype information provided by unlinked markers to infer population structure, was implemented in a software program available from Jonathan Pritchard. In addition to testing for the presence of structure, the program estimates individual ancestry proportions, and for the present study this Bayesian method was used to complement the maximum likelihood estimation method. The two methods yield highly correlated estimates of individual ancestry.

The second source of genetic structure in an admixed population is the admixture process itself, during which newly generated linkage disequilibrium is introduced into the admixed population. AIMs such as those exemplified herein are particularly sensitive indicators of population structure related to ancestral proportions. To assess the presence of population structure, samples are tested for non-random association of alleles both within loci (Hardy-Weinberg disequilibrium) and between loci (gametic disequilibrium), and the distribution of individual ancestry estimates is also examined (see Pfaff et al., supra, 2001; Parra et al., supra, 2001).
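As an illustration of the within-locus test mentioned above, the following is a minimal sketch of a one-degree-of-freedom chi-square test for departure from Hardy-Weinberg proportions at a single biallelic locus. The function name and the genotype counts are hypothetical. The cited studies may have used a different test (for example, an exact test), so this should be read as one standard textbook approach and not as the procedure actually applied.

```python
import math


def hwe_chi_square(n_aa, n_ab, n_bb):
    """Chi-square test of Hardy-Weinberg proportions at one biallelic locus.

    n_aa, n_ab, n_bb are observed counts of the three genotypes.
    Returns (chi_square_statistic, p_value) with 1 degree of freedom.
    """
    n = n_aa + n_ab + n_bb
    if n == 0:
        raise ValueError("no genotypes observed")
    p = (2 * n_aa + n_ab) / (2.0 * n)  # frequency of allele A
    q = 1.0 - p
    if p == 0.0 or q == 0.0:
        # Monomorphic locus: the test is uninformative.
        return 0.0, 1.0
    expected = (n * p * p, 2.0 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Survival function of the chi-square distribution with 1 df.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


if __name__ == "__main__":
    # Illustrative counts: a strong heterozygote deficit, as can arise
    # when two differentiated groups are pooled without intermarriage.
    print(hwe_chi_square(50, 0, 50))
```

A heterozygote deficit of this kind at many AIMs at once is one signal of the stratification discussed above. Between-locus (gametic) disequilibrium would need a separate test on pairs of loci, which is not sketched here.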

The history of African Americans can be traced back to 1619, the year the first Africans arrived in the English colonies (Jamestown), although the presence of African slaves was reported as early as 1526 in Spanish expeditions to what would become the United States (South Carolina, Georgia, Florida, and New Mexico). Organized slavery began shortly afterward, but it was not until the early 18th century that the importation of slaves reached increasing rates, in parallel with the demand for laborers to cultivate the tobacco, indigo, and rice plantations of the southern colonies; the peak occurred in the decade 1790-1800 and the first years of the 19th century. In 1808 the slave trade became illegal, although it continued at a low rate for several more decades. Different estimates have been presented of the total number of slaves brought to the United States, and numbers ranging between 380,000 and 570,000 are generally accepted.

Although it is difficult to determine precisely the ethnic origins of the African slaves, information from shipping lists has provided an approximate picture of their geographic provenance. The slave trade affected a very broad area of West and West Central Africa, primarily the coastline between present-day Senegal in the north and Angola in the south. The most important regions were Senegambia (Gambia and Senegal), Sierra Leone (Guinea and Sierra Leone), the Windward Coast (Ivory Coast and Liberia), the Gold Coast (Ghana), the Bight of Benin (from the Volta River to the Benin River), the Bight of Biafra (from east of the Benin River to Gabon), and Angola (southwest Africa, including parts of Gabon, Congo, and Angola). Curtin (supra, 1969) presented estimates of the proportional contribution by region, based on data from the British trade of the 18th century (the peak of the Atlantic slave trade), showing that Angola and the Bight of Biafra were the regions contributing the highest numbers of slaves imported to mainland North America (about 25% each). However, there were significant differences in ethnic origin depending on the port of entry in the United States, and the figures for the Virginia and South Carolina colonies differed considerably.

The history of African Americans has been characterized not only by forced migration from Africa but also by admixture with the other ethnic groups they encountered on arrival in North America, including Europeans and Native Americans. However, there are few historical records dealing with the issue of admixture. Moreover, in the period from the abolition of slavery to the present, there have been important factors that shaped the current African American population. Of special interest are the patterns of African American migration within the United States over the past 150 years. In this respect, the redistribution of African Americans in the southern states during the 19th century and the Great Migration from the rural South to the urban areas of the North that began after World War I are particularly relevant, and they had an enormous influence in defining the current distribution of the African American population in the United States (Johnson and Campbell, Black Migration in America: A Social Demographic History; Duke University Press, Durham, NC, 1981).

With regard to Hispanics, the term "Hispanic" was coined primarily for political and demographic purposes and is generally used to identify people of Latin American origin or descent living in the United States. Although this definition lumps together people of very different historical, cultural, and linguistic backgrounds, the classification has been widely used. Central America, the Caribbean, and South America were under the rule of the Iberian powers (Spain and Portugal) for centuries, but they had quite different regional histories both before and after the colonial period. Populations from four continents (North and South America, Europe, and Africa) contributed to the formation of contemporary Hispanic populations. The anthropological background of the three main Hispanic groups currently living in the United States (Mexican Americans, Puerto Ricans, and Cuban Americans, who together constitute more than 80% of the total U.S. Hispanic population) is considered here.

Mexican Americans show the highest American Indian contribution of the three aforementioned groups. Soon after the Spanish conquest of Mexico at the beginning of the 16th century, the admixture of Spanish men with American Indian women gave rise to an increasingly important admixed population (mestizos). This admixture continued throughout the three centuries of Spanish rule in "New Spain" (Nueva España) and shaped the Mexican population both biologically and culturally. The majority of estimates have indicated an American Indian component in Mexican Americans ranging between 30% and 40% (Hanis et al., supra, 1986; Long et al., 1991; Hanis et al., Diabetes Care 14:618-627, 1991; Merriwether et al., Amer. J. Phys. Anthrop. 102:153-159, 1997). It is also interesting to note that several studies have shown differences in the amount of American Indian ancestry depending on socioeconomic status (Chakraborty et al., Genet. Epidemiol. 3:435-454, 1986; Mitchell et al., Ethnicity and Disease 3:22-31, 1992). There was also a substantial African presence in the Mexican territories during Spanish rule. Curtin (supra, 1969) estimated the total number of Africans imported into Mexico during the entire period of the slave trade at about 200,000. However, their contribution to the Mexican gene pool was much lower than that of Europeans and American Indians and has been estimated to range from zero to 10% (see, e.g., Hanis et al., supra, 1991).

In the Caribbean colonies (Cuba and Puerto Rico), the situation was very different from that on the mainland. The Native American population was much smaller, and many died from hard labor and disease shortly after first contact with Europeans. Nevertheless, the rate of admixture during the early period of colonization was high enough to result in an appreciable genetic contribution (about 18%) from the Arawaks and Caribs, the indigenous peoples of the Hispanic Caribbean (Hanis et al., supra, 1991). Another distinguishing feature of this region is the significant African influence, which is also reflected in many aspects of present-day society in countries such as Cuba, Puerto Rico, and the Dominican Republic. African slaves were imported in large numbers to work on the sugar plantations and even came to outnumber the population of European origin (Kanellos and Perez, Chronology of Hispanic-American History: From Pre-Columbian Times to the Present; New York, Gale Research, 1995). Accordingly, the percentage of African genetic contribution in contemporary Cubans (20%) and Puerto Ricans (37%) is significantly higher than in other Hispanic populations (Hanis et al., supra, 1991).

Race is a complex concept that, in common usage, reflects both cultural and biological characteristics of a person or group of people. Given that physical differences between populations are often accompanied by cultural differences, it has been difficult to separate these two elements. In some fields of science there has been a movement to assert that race is merely a social construct, which oversimplifies the issue. The assertion can often be true, depending on which aspect of variation among people is under consideration, but it can be false for many specific cases of differences between the world's populations. One clear example of a biological difference is skin color. Culture and environment have little effect on the level of pigmentation in a person's skin, yet there are dramatic differences across populations. Pigmentation is, of course, only skin deep, and it is quite simple in comparison with the complex environments in which we live and the ways in which these affect the quality of life of individuals and groups.

The human species is relatively young, having most likely originated in East Africa 100,000 years ago and then diverged into groups that populated the Earth (Cavalli-Sforza and Cavalli-Sforza, The Great Human Diasporas: The History of Diversity and Evolution (Perseus Books, Cambridge, MA, 1995)). During these migrations and ever since, there has been some independent evolution of the populations that settled the various continents of the world. The simplest evidence of this evolution is seen in differences in allele frequencies at genetic markers. In general, alleles found in one population are also found in all populations, and the alleles that are most common in one population are also common in the others. These similarities between populations highlight the recent common origin of all populations. However, there are examples of genetic markers that differ between populations, and, as disclosed herein, these markers (AIMs) can be used to estimate the ancestral origins of a person or population.

The present invention provides a method of estimating the proportional ancestry of a test individual with respect to at least two ancestral groups and, in particular, provides a confidence level for the proportional ancestry. The method can be performed by contacting a sample comprising nucleic acid molecules of the test individual with hybridizing oligonucleotides that can detect the nucleotide occurrences of SNPs of a panel of at least about ten AIMs indicative of BGA for each ancestral group examined, under conditions suitable for detecting the nucleotide occurrences of the AIMs of the test individual by the hybridizing oligonucleotides; and identifying, with a predetermined level of confidence, a population structure that correlates with the nucleotide occurrences of the AIMs for each of the ancestral groups examined, wherein the population structure is indicative of proportional ancestry.

The term "biogeographical ancestry" or "BGA" is used herein to describe the biological or genetic component of race. BGA is a simple and objective description of the ancestral origins of a person in terms of the major population groups (e.g., Native American, East Asian, Indo-European, and sub-Saharan African). BGA estimates can represent the mixed nature of many people and populations today. In many countries, including the United States, there has been extensive mixing among populations that were initially separate. The term "admixture" is used herein to refer to such population mixing. In this respect, a BGA estimate can be understood as an individual's admixture proportions, taking the form of a series of percentages that sum to 100%. For example, a person can have 75% Indo-European, 15% African, and 10% Native American ancestry, or can have 100% Indo-European ancestry, and so on.

The proportional ancestry estimated by a method of the invention can be a proportion of any ancestral groups, including, for example, sub-Saharan African, Native American, Indo-European, East Asian, Middle Eastern, or Pacific Islander ancestral groups, and generally is a combination of two or more such groups. Thus, the proportional ancestry of a test individual can include proportional affiliation among sub-Saharan African and Indo-European ancestral groups (e.g., 80% sub-Saharan African and 20% Indo-European; or 60% sub-Saharan African, 20% Indo-European, and 20% of a third ancestral group); or proportional affiliation among Native American and Indo-European ancestral groups; East Asian and Native American ancestral groups; Indo-European and East Asian ancestral groups; and the like.

An estimate can be made, for example, of an individual's proportional ancestry with respect to three ancestral groups. In this method, identifying the population structure that correlates with the nucleotide occurrences of the AIMs in the test individual can be performed by making a likelihood determination of affiliation for each of the sub-Saharan African, Native American, Indo-European, and East Asian ancestral groups; thereafter selecting the three ancestral groups having the greatest likelihood values for the individual; determining the likelihood of all possible proportional affiliations among the three ancestral groups having the greatest likelihood values, thereby identifying the population structure or proportional affiliation that correlates with the nucleotide occurrences of the AIMs in the test individual; and identifying the single proportional combination of greatest likelihood. Alternatively, identifying the population structure that correlates with the nucleotide occurrences of the AIMs can be performed by making six two-way (binary) comparisons, each comprising a likelihood determination of affiliation with one group as compared with another; thereafter selecting the three ancestral groups having the greatest likelihood values across all comparisons; determining the likelihood of all possible proportional affiliations among the three ancestral groups having the greatest likelihood values, thereby identifying the population structure or proportional affiliation that correlates with the nucleotide occurrences of the AIMs in the test individual; and identifying the single proportional combination of greatest likelihood. Such a method works for individuals who are 100% affiliated with a single group as well as for individuals of three-way admixture.
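The first variant described above (score each of four groups, keep the three most likely, then search proportional affiliations among them) can be sketched as follows. This is a minimal illustration under stated assumptions and not the implementation used in the invention. The likelihood model (independent biallelic loci, genotype probabilities binomial in the admixture-weighted allele frequency), the grid search, and all names (`log_likelihood`, `estimate_three_way`, `group_freqs`, `steps`) are assumptions introduced here; the passage does not specify them.

```python
import math

EPS = 1e-6  # keeps log() finite when a frequency is exactly 0 or 1


def log_likelihood(genotypes, freq_lists, props):
    """Log-likelihood of multilocus genotypes given admixture proportions.

    genotypes:  per-locus count (0, 1, or 2) of the reference allele.
    freq_lists: one list of reference-allele frequencies per ancestral group.
    props:      admixture proportions, one per group, summing to 1.
    Assumes unlinked loci (an assumption of this sketch).
    """
    total = 0.0
    for i, g in enumerate(genotypes):
        p = sum(m * f[i] for m, f in zip(props, freq_lists))
        p = min(max(p, EPS), 1.0 - EPS)
        total += (math.log(math.comb(2, g))
                  + g * math.log(p) + (2 - g) * math.log(1.0 - p))
    return total


def estimate_three_way(genotypes, group_freqs, steps=20):
    """Return ({group: proportion}, log-likelihood) for the best 3-way mix."""
    # Step 1: likelihood of 100% affiliation with each candidate group.
    single = {name: log_likelihood(genotypes, [f], (1.0,))
              for name, f in group_freqs.items()}
    # Step 2: keep the three groups with the greatest likelihood values.
    top3 = sorted(single, key=single.get, reverse=True)[:3]
    freq_lists = [group_freqs[name] for name in top3]
    # Step 3: evaluate every proportional affiliation on a grid.
    best_ll, best_props = -math.inf, None
    for a in range(steps + 1):
        for b in range(steps + 1 - a):
            props = (a / steps, b / steps, (steps - a - b) / steps)
            ll = log_likelihood(genotypes, freq_lists, props)
            if ll > best_ll:
                best_ll, best_props = ll, props
    # Step 4: report the single combination of greatest likelihood.
    return dict(zip(top3, best_props)), best_ll
```

With `steps=20` the grid resolves proportions to 5%. Because the corners of the grid are included, an individual with 100% affiliation to a single group is handled by the same search, consistent with the remark above that the method works for unadmixed individuals as well.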

An estimate of an individual's proportional ancestry comprising proportions of three ancestral groups can also be made by performing three three-way comparisons among the groups; determining the likelihood of all possible proportional affiliations among the three ancestral groups having the greatest likelihood values, thereby identifying the population structure or proportional affiliation that correlates with the nucleotide occurrences of the AIMs in the test individual; and identifying the single proportional combination of greatest likelihood. An advantage of this method is that a graphical representation of the comparison of the three ancestral groups can be generated, the graphical representation comprising a triangle in which each ancestral group is independently represented by a vertex, and the maximum likelihood value of proportional affiliation for the individual comprises a point within the triangle (see FIGS. 2 and 3). If desired, the graphical representation can further comprise confidence contours indicating the level of confidence associated with the estimate of proportional ancestry.
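Placing an estimate inside such a triangle amounts to converting three proportions into barycentric coordinates. The sketch below shows one way to do this. The vertex layout (an equilateral triangle with unit sides) and the function name `ternary_point` are assumptions made for illustration and are not taken from the figures.

```python
import math


def ternary_point(m_a, m_b, m_c):
    """Map three ancestry proportions to (x, y) inside a triangle.

    Assumed vertex layout: group A at (0, 0), group B at (1, 0), and
    group C at (0.5, sqrt(3)/2). Inputs are normalized to sum to 1, so
    percentages (e.g. 75, 15, 10) are accepted as well as fractions.
    """
    if min(m_a, m_b, m_c) < 0:
        raise ValueError("proportions must be non-negative")
    total = m_a + m_b + m_c
    if total <= 0:
        raise ValueError("at least one proportion must be positive")
    b = m_b / total
    c = m_c / total
    # Weighted average of the three vertices; A contributes (0, 0).
    x = b * 1.0 + c * 0.5
    y = c * math.sqrt(3.0) / 2.0
    return (x, y)


if __name__ == "__main__":
    # 100% affiliation with one group lands on that group's vertex.
    print(ternary_point(1, 0, 0))
    # A 75/15/10 three-way mixture lands in the interior, nearest vertex A.
    print(ternary_point(75, 15, 10))
```

Confidence contours could then be drawn by mapping every grid point whose likelihood falls within a chosen threshold of the maximum through the same function. The choice of threshold is not specified in this passage.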

An estimate of an individual's proportional ancestry can also be made where the proportional ancestry comprises proportions of four ancestral groups. In various aspects of this method, identifying the population structure that correlates with the nucleotide occurrences of the AIMs in the test individual is performed by making six two-way comparisons, or three three-way comparisons, or one four-way comparison among the groups; determining the likelihood of all possible proportional affiliations among the four ancestral groups having the greatest likelihood values, thereby identifying the population structure or proportional affiliation that correlates with the nucleotide occurrences of the AIMs in the test individual; and identifying the single proportional combination of greatest likelihood. If desired, the method can further include generating a graphical representation of the comparison of the four ancestral groups, the graphical representation comprising a pyramid in which each ancestral group is independently represented by a vertex, and the maximum likelihood value of proportional affiliation for the individual comprises a point within the pyramid. If desired, the graphical representation can further comprise a confidence contour comprising a sphere centered on that point, the sphere indicating the level of confidence associated with the estimate of proportional ancestry.

As disclosed herein, such methods are useful, for example, as forensic tools. Using a DNA sample obtained at a crime scene, the method can provide investigators with predictive information regarding the likely ancestry of an individual, in addition to hair, skin, and eye pigmentation, and thus provides substantially more information for forensic purposes. By comparison, current DNA methods provide only retrospective information, because they require that a DNA sample from a crime scene be compared with DNA samples contained in a database or taken from a particular individual. Thus, the latter methods can provide confirmation that a suspect is likely the perpetrator of a crime, but they provide no useful information until a suspect is apprehended, unless the suspect's DNA sample has already been entered into a database.

As disclosed herein, a method of estimating the proportional ancestry of a test individual also provides a tool that can supplement genealogical information, which generally is based on relationships established using geopolitical information (see Example 3). For example, the present methods provide information that can be used to generate an ancestral map of the world, wherein the locations of populations having a proportional ancestry corresponding to that of the test individual are indicated on the ancestral map. As such, the method can further include overlaying the ancestral map with a genealogical map, wherein the genealogical map indicates the locations of populations having geopolitical relevance with respect to the test individual, and statistically combining the information of the ancestral map and the genealogical map to obtain a most likely estimate of the family history of the test individual.

According to the methods of the invention, identifying a population structure that correlates with the nucleotide occurrences of the AIMs can be performed by comparing the nucleotide occurrences of the AIMs of the test individual with known proportional ancestries corresponding to nucleotide occurrences of AIMs indicative of BGA. The known proportional ancestries can be contained in a table or other list, in which case the nucleotide occurrences of the test individual can be compared to the table or list visually; or they can be contained in a database, in which case the comparison can be made electronically, for example, using a computer. A particularly useful application of the methods is to associate a known proportional ancestry corresponding to nucleotide occurrences of AIMs indicative of BGA with a photograph of the person for whom that proportional ancestry was determined, thereby providing a means to further infer physical characteristics of the test individual. In one aspect, the photograph is a digital photograph comprising digital information, which can be contained in a database that can further contain a plurality of such digital information of digital photographs, each associated with a known proportional ancestry corresponding to nucleotide occurrences of AIMs indicative of the BGA of the person in the photograph.

The method of the invention can further include identifying a photograph of a person having a proportional ancestry corresponding to that of the test individual. Such identification can be made by manually examining one or more files of photographs, wherein the photographs are organized, for example, according to the nucleotide occurrences of the AIMs of the persons in the photographs. A photograph can also be identified by scanning a database comprising a plurality of files, each file containing digital information corresponding to a digital photograph of a person having a known proportional ancestry, and identifying at least one photograph of a person whose nucleotide occurrences of AIMs indicative of BGA match those of the test individual.
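
The database scan described above can be sketched as a simple tolerance search over stored records. This is a minimal illustration only: the record layout, the field names, the file names and the 0.05 tolerance are assumptions introduced here and do not appear in the disclosure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PhotoRecord:
    """One database entry: a photograph linked to a known proportional ancestry.

    Both fields are hypothetical; ancestry holds proportions in a fixed,
    agreed group order and is expected to sum to 1.
    """
    photo_file: str
    ancestry: tuple


def matching_photos(records, query, tolerance=0.05):
    """Return the records whose every proportion lies within `tolerance` of the query."""
    return [
        record
        for record in records
        if all(abs(a - q) <= tolerance for a, q in zip(record.ancestry, query))
    ]


# Illustrative records only; the proportions are invented for the example.
records = [
    PhotoRecord("p1.jpg", (0.82, 0.18, 0.0, 0.0)),
    PhotoRecord("p2.jpg", (0.50, 0.50, 0.0, 0.0)),
]
hits = matching_photos(records, query=(0.80, 0.20, 0.0, 0.0))
```

A per-group tolerance is only one possible matching rule; a distance over the whole proportion vector would serve equally well.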

According to the invention, BGA can be determined using any of several variations of the disclosed BGA test, including three BGA tests referred to as the ANCESTRYbyDNA™ 1.0 test, the ANCESTRYbyDNA™ 2.0 test, and the ANCESTRYbyDNA™ 3.0 test (DNAPrint genomics, Inc.; Sarasota, FL), which use selected panels of ancestry informative markers (AIMs) that have been characterized in a large number of well-defined population samples. AIMs are selected on the basis of showing substantial differences in frequency between population groups and, as such, provide information on the origins of a particular person whose ancestry is otherwise unknown. For example, the Duffy Null allele (FY*0) is very common in all sub-Saharan African populations (at or near fixation, i.e., an allele frequency of approximately 100%), but is not found outside Africa. Thus, a person having this allele is very likely to have some level of African ancestry. In an analysis of AIMs in a DNA sample from a person of unknown origin, the likelihood (or probability) that the person derives from a particular parental population can be determined by calculating it for all possible admixtures of the parental populations. The population (or combination of populations) with the highest likelihood is taken as the best estimate of the person's ancestral proportions; confidence intervals on these point estimates are also calculated.

An objective assessment of the biological component of human ancestry provides important knowledge about the person whose DNA is examined. For example, analysis of the biological component of ancestry can help elucidate health disparities by identifying genetic contributions to, for example, the higher rates of hypertension and diabetes in African Americans or the higher rates of dementia in European Americans. Estimates of BGA can also help individuals who have been separated from their ancestral population by adoption or some other event to reconnect with it. Even if a person is not particularly motivated to reconnect with ancestors, he or she can uncover the family's past and, for example, verify family lore or identify forgotten roots. Because the disclosed method is based on analysis of DNA, it provides a personal demographic tool that, unlike a census, can provide highly accurate demographic data.

Several commercially available tests analyze mitochondrial DNA (mtDNA) or Y chromosome markers and have been promoted as a means of learning the origins of one's ancestors. Although these tests can provide information on the origin of a portion of a person's ancestors, they are very limited. For example, one generation back a person has two ancestors, a mother and a father; five generations back a person has 32 ancestors; and ten generations back a person has 1024 ancestors. Ten generations is roughly 250 years, which is well within the time frame of genealogical interest, particularly when considering, for example, the settlement of North America. Because mtDNA and Y chromosome tests examine only a small portion of the genome (the maternal and paternal lineages, respectively), they can provide information on only a very small portion of a person's ancestry. The BGA test of the invention uses sequences from throughout the human genome and, therefore, can provide information about a larger number of ancestors.
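
The ancestor counts quoted above follow from doubling at each generation, which the following minimal sketch makes explicit. The function names are illustrative assumptions. The count is of genealogical ancestor positions, which need not all be distinct persons, and the second function assumes a male test individual for whom both an mtDNA and a Y chromosome test are available.

```python
def ancestors_at_generation(generations_back: int) -> int:
    """Number of ancestor positions a given number of generations back (2**n)."""
    if generations_back < 0:
        raise ValueError("generations_back must be non-negative")
    return 2 ** generations_back


def uniparental_fraction(generations_back: int) -> float:
    """Share of those positions covered by mtDNA plus Y chromosome testing.

    Only the strictly maternal line and the strictly paternal line are covered,
    i.e. two positions per generation (assumption: male test individual).
    """
    if generations_back < 1:
        raise ValueError("generations_back must be at least 1")
    return 2 / ancestors_at_generation(generations_back)


five_back = ancestors_at_generation(5)
ten_back = ancestors_at_generation(10)
covered_at_ten = uniparental_fraction(10)
```

Under these assumptions, ten generations back the two uniparental lines account for 2 of 1024 ancestor positions.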

Accordingly, the present invention provides a method of estimating, with a predetermined level of confidence, the proportional ancestry of a test individual with respect to at least two ancestral groups. Such a method, which is referred to as a "biogeographical ancestry test" or "BGA test", can be performed, for example, by contacting a sample comprising nucleic acid molecules of the test individual with hybridizing oligonucleotides that can detect nucleotide occurrences of SNPs of a panel of at least about ten AIMs indicative of BGA for each ancestral group examined, under conditions suitable for detecting the nucleotide occurrences of the AIMs of the test individual by the hybridizing oligonucleotides; and identifying, with a predetermined level of confidence, a population structure that correlates with the nucleotide occurrences of the AIMs of each ancestral group examined, wherein the population structure is indicative of proportional ancestry.

As used herein, the term "proportional ancestry" refers to the percent contribution of each ancestral group (where there is more than one) to which an individual belongs. The proportional ancestry estimated by the methods of the invention can be proportions of any ancestral groups, including, for example, proportions of sub-Saharan African, Native American, Indo-European, East Asian, Middle Eastern, or Pacific Islander ancestral groups, and generally is a combination of two or more such groups. Thus, the proportional ancestry of a test individual can comprise proportions of sub-Saharan African and Indo-European ancestral groups (e.g., 80% sub-Saharan African and 20% Indo-European; or 60% sub-Saharan African, 20% Indo-European, and 20% of a third ancestral group); of Native American and Indo-European ancestral groups; of East Asian and Native American ancestral groups; of Indo-European and East Asian ancestral groups; and so on. Similarly, proportional ancestry can comprise proportions of Native American, East Asian and Indo-European ancestral groups; of sub-Saharan African, Native American and Indo-European ancestral groups; of sub-Saharan African, Native American and East Asian ancestral groups; and so on.

A panel of AIMs useful for estimating the proportional ancestry of an individual can include AIMs as set forth in SEQ ID NOS: 1 to 331, for example, the AIMs set forth in SEQ ID NOS: 1 to 71, which can be useful for determining proportional ancestry comprising Indo-European, sub-Saharan African, East Asian and Native American groups. For example, the AIMs set forth in SEQ ID NOS: 7, 21, 23, 27, 45, 54, 59, 63 and 72 to 152 can be useful for determining East Asian and sub-Saharan African proportional ancestry; the AIMs set forth in SEQ ID NOS: 3, 8, 9, 11, 12, 33, 40, 59, 63 and 153 to 239 can be useful for determining East Asian and Indo-European proportional ancestry; and the AIMs set forth in SEQ ID NOS: 1, 8, 11, 21, 24, 40, 172 and 240 to 331 can be useful for determining Indo-European and sub-Saharan African proportional ancestry.

The ANCESTRYbyDNA™ 1.0 test (DNAPrint genomics, Inc.) is the first version of the BGA test and was designed specifically to provide information on ancestral proportions at the continental level. As such, the ANCESTRYbyDNA™ 1.0 test allows information to be obtained on the levels of Native American, European and African ancestry as three component groups. The ANCESTRYbyDNA™ 2.0 test, in comparison, provides information on ancestral proportions at the continental level for most continents, including Native American, Indo-European (including Europeans, Middle Easterners and South Asian groups such as Indians), African and East Asian (including Pacific Islanders), and can distinguish ancestry within Asia and the Pacific Rim. The ANCESTRYbyDNA™ 3.0 test can further resolve the level of ancestry within a continent, for example, by distinguishing Japanese from Chinese, or Northern Europeans from Middle Easterners, and thereby gives deeper insight into where within a particular continent a person's ancestors originated.

For the ANCESTRYbyDNA™ 2.0 test, a logical grouping into four BGA groups was made, in which South Asians, Middle Easterners and Europeans are placed in a single group referred to as Indo-European (see Example 2). This grouping is based on anthropological evidence and cultural ties among these groups (e.g., their languages derive from a common parent). The results disclosed herein demonstrate that these groups are far more similar to each other in gene sequence content than they are to the other groups. The ANCESTRYbyDNA™ 2.0 test also performs more accurately when Pacific Islanders are grouped with East Asians. As such, the four groups used in the ANCESTRYbyDNA™ 2.0 test are 1) Native American (i.e., peoples who migrated to and inhabited South and North America); 2) Indo-European (Europeans, Middle Easterners, and South Asians such as Indians); 3) East Asian (Japanese, Chinese, Koreans, Pacific Islanders); and 4) African (sub-Saharan). The ANCESTRYbyDNA™ 3.0 test can further distinguish South Asians from Europeans and Pacific Islanders from East Asians, and thereby provides a six-way breakdown (Native American, European, African, South Asian, East Asian and Pacific Islander), although the confidence intervals are larger than those obtained with the ANCESTRYbyDNA™ 2.0 test. As further improvements are made to the test, the confidence intervals decrease. By analyzing a supplemental panel, which improves the confidence intervals by about 50%, the confidence interval around the point estimate is reduced and the accuracy of the test can be increased accordingly.

The algorithm used to determine ancestral proportions was developed on the premise that particular statistical methods can be used to infer the proportions of ancestry in an individual's sample from the individual's sequences (see Example 6; see also Table 12). The way this inference is made with the present algorithm is similar to that of others: where the frequency of an allele in each population is known, and this frequency differs significantly from population to population, a "maximum likelihood estimate" (MLE) can be used to determine the probability that a person having that allele belongs to one of the groups. The process is the same when extended to include multiple alleles from multiple loci and multiple populations. In brief, Bayes' theorem states that the probability of an event given a circumstance (called the posterior probability) is a function of the frequency of the circumstance given the event (the conditional probability) and the frequency of the event itself (the prior probability). By determining the probability of the event given the circumstance over a wide range of possible events, the one with the highest probability can be selected, thereby obtaining the MLE.
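
The statement of Bayes' theorem above can be written compactly as follows. The symbols are assumptions introduced here for illustration: E_i denotes a candidate event (a set of ancestral proportions) and C denotes the circumstance (the observed genotype).

```latex
P(E_i \mid C) \;=\; \frac{P(C \mid E_i)\,P(E_i)}{\sum_{j} P(C \mid E_j)\,P(E_j)},
\qquad
\hat{E} \;=\; \arg\max_{i}\; P(E_i \mid C)
```

If every candidate event is given the same prior probability P(E_i), the event that maximizes the posterior probability is also the one that maximizes the conditional probability P(C | E_i), which is presumably why the selected value is referred to as a maximum likelihood estimate.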

In the present algorithm, the event is the proportion of ancestry and the circumstance is the genotype of the individual. Where the minor allele frequencies of ten SNPs are known in two human populations, and a person's sequence at each of the ten SNPs is known, a simple binary classification into one of the two groups can be made by selecting the group with the highest conditional probability. This offers little improvement over current methods for determining BGA from a DNA sample. What the present invention provides is the ability to obtain ancestral proportions for more complex and realistic ancestry scenarios. There are many possible combinations, such as 99% African, 1% European, 0% Native American, 0% East Asian; or 98% African, 1% European, 1% Native American, 0% East Asian; and so on. The posterior probabilities of these thousands of possibilities are not the same for any particular individual, given his or her multilocus genotype (i.e., the genotypes of many AIMs); in fact, for each genotype there is one combination with the highest posterior probability, or likelihood. It is this combination that the present algorithm selects (i.e., the MLE).
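
The selection of the most likely combination can be sketched as an exhaustive search over a grid of candidate proportions. This is a simplified illustration and not the algorithm of Example 6: it assumes that the individual's allele frequency at each locus is the proportion-weighted average of the group frequencies and that the two alleles at a locus are drawn independently (a binomial model). All function and parameter names are assumptions.

```python
import itertools
import math


def simplex_grid(n_groups, steps):
    """Yield every proportion vector whose entries are multiples of 1/steps and sum to 1."""
    for combo in itertools.product(range(steps + 1), repeat=n_groups - 1):
        if sum(combo) <= steps:
            yield tuple(c / steps for c in combo) + ((steps - sum(combo)) / steps,)


def log_likelihood(mix, genotypes, freqs, eps=1e-6):
    """Log-likelihood of a multilocus genotype given ancestry proportions `mix`.

    genotypes[l] is the count (0, 1 or 2) of a chosen allele at locus l;
    freqs[l][k] is the frequency of that allele in ancestral group k.
    """
    total = 0.0
    for g, locus_freqs in zip(genotypes, freqs):
        p = sum(m * f for m, f in zip(mix, locus_freqs))
        p = min(max(p, eps), 1 - eps)  # keep log() finite at fixed alleles
        total += math.log(math.comb(2, g)) + g * math.log(p) + (2 - g) * math.log(1 - p)
    return total


def mle_ancestry(genotypes, freqs, steps=20):
    """Return the grid point with the highest likelihood (the MLE on this grid)."""
    n_groups = len(freqs[0])
    return max(
        simplex_grid(n_groups, steps),
        key=lambda mix: log_likelihood(mix, genotypes, freqs),
    )


# Invented frequencies for two groups; an individual heterozygous at every locus.
two_group_freqs = [[0.9, 0.1]] * 3
estimate = mle_ancestry([1, 1, 1], two_group_freqs, steps=20)
```

With four groups and a 1% grid (steps=100) the search covers 176,851 candidate combinations, so a coarser grid or a refinement step around the best coarse point may be preferable in practice.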

Previous methods were limited in that the confidence of the estimate was not known. The present algorithm addresses this limitation by plotting the MLE graphically, including plotting a confidence region around the MLE so that the level of confidence can be ascertained (see Figures 2 and 3). In addition, the algorithm that performs the MLE calculation (i.e., the software code) operates in a remarkably efficient manner. The triangle plot provided by the algorithm is an original way of graphically representing MLE calculations and their confidence intervals. To read the triangle plot (see below), a perpendicular is dropped from each vertex (point) of the triangle to the opposite side (base) of the triangle (see Figure 2A). In this figure, the circle represents the MLE, and a line has been dropped from the Native American (NAM) vertex to the line below; this line serves as a scale for the percentage of Native American ancestry, from 0% at the base to 100% at the vertex (or tip). Projecting the circle onto this line can be likened to holding a flashlight to the right of the triangle at the same level as the circle and observing the shadow the circle casts on the line. The place where this shadow falls on the line indicates the percentage of Native American ancestry. In this example, the individual is about 15% Native American, as indicated by the hash mark on the line.
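
The reading rule described above corresponds to barycentric coordinates on an equilateral triangle: a group's share equals the point's perpendicular distance from the opposite side divided by the triangle's height. The following is a minimal sketch; the vertex placement and the ordering of the three groups (left, right, top) are assumptions, with the top vertex standing in for the NAM vertex of the figure.

```python
import math

HEIGHT = math.sqrt(3) / 2  # height of an equilateral triangle with unit sides
VERTICES = ((0.0, 0.0), (1.0, 0.0), (0.5, HEIGHT))  # left, right, top (assumed layout)


def to_xy(proportions):
    """Map three ancestry proportions to a point inside the triangle."""
    total = sum(proportions)
    weights = [p / total for p in proportions]  # tolerate inputs that do not sum to 1
    x = sum(w * vx for w, (vx, _) in zip(weights, VERTICES))
    y = sum(w * vy for w, (_, vy) in zip(weights, VERTICES))
    return x, y


def read_top_share(point):
    """Read the top group's share: distance above the base divided by the height."""
    return point[1] / HEIGHT


# Example proportions in the assumed order (left, right, top).
mle_point = to_xy((0.85, 0.0, 0.15))
top_share = read_top_share(mle_point)
```

The shares of the other two groups can be read the same way by measuring the distance from their opposite sides.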

The results provided using the disclosed methods give a statistical estimate of the BGA admixture of an individual (the maximum likelihood estimate, MLE), shown as a point on a triangle plot that represents the proportions of the three groups most relevant to the individual. Although the MLE is the most likely estimate, the true value for an individual can be a different set of proportions. Triangle plots are illustrated on which estimates that are 2-fold, 5-fold and 10-fold less likely than the MLE have been calculated and plotted. The first contour around the MLE delimits the space in which estimates are up to 2-fold less likely, with positions near the contour line reflecting values of almost 2 and positions near the MLE reflecting values of almost 1; the second contour around the MLE delimits the space in which estimates are up to 5-fold less likely, in the same graded manner proceeding from the first contour to the second. The third contour delimits the space in which estimates are from 5-fold (near the second contour) to 10-fold (near the third contour) less likely. The more DNA positions are read, the closer these contours come to the MLE point. On the triangle plot, the likelihood (probability) that the true value is represented by a given point increases as the point approaches the MLE, where the probability is greatest (i.e., the maximum likelihood estimate; MLE). The test could be performed such that the contours lie extremely close to the MLE by sequencing a very large set of markers. However, to keep the test affordable and efficient, the survey can be limited to a desired number of markers (e.g., 10, 15, 20, 25, 30, 40, 50, 60, 70, 80, 90, 100 or more) that is sufficient to determine the most likely proportions with good confidence. In this respect, various different panels of 100 SNP markers have been examined, a panel of 71 AIMs has been used in numerous studies, and a panel of 175 AIMs is to be examined so that true confidence can be achieved.
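
The 2-fold, 5-fold and 10-fold contours can be illustrated by assigning each candidate set of proportions to a band according to how far its log-likelihood falls below that of the MLE. This is a minimal sketch under the assumption that log-likelihoods for the candidates have already been computed; the function names and the candidate labels are hypothetical.

```python
import math

FOLDS = (2, 5, 10)  # contour levels described in the text


def contour_band(loglik, max_loglik):
    """Return the innermost band (2, 5 or 10) containing an estimate, or None if outside all."""
    drop = max_loglik - loglik  # equals log(L_max / L)
    for fold in FOLDS:
        if drop <= math.log(fold):
            return fold
    return None


def confidence_regions(logliks):
    """Map each candidate mixture to its band relative to the best candidate."""
    best = max(logliks.values())
    return {mix: contour_band(ll, best) for mix, ll in logliks.items()}


# Invented relative likelihoods (1.0 = the MLE itself).
example = {
    "mle": math.log(1.0),
    "near": math.log(0.6),
    "mid": math.log(0.3),
    "far": math.log(0.15),
    "outside": math.log(0.05),
}
bands = confidence_regions(example)
```

Plotting the boundary of each band on the triangle gives the nested contours; adding markers steepens the likelihood surface, so the same fold levels enclose a smaller area.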

The BGA test of the invention has been validated by measuring the frequencies of DNA sequence variants in various human populations. In addition, the test has been evaluated using a large number of people from a wide range of ancestral groups, and the estimates agreed well with what is known from anthropological and historical data. For example, Hispanics are known to have arisen as an ethnic group from the admixture of colonial Europeans with Native Americans, and hundreds of Hispanics examined using the BGA test plotted almost exclusively along these two groups. As another example, Nigerians plot as having almost pure African BGA, whereas African Americans plot more as an admixture between this group and Europeans, as would be expected from what is known of admixture between Africans and Europeans in the United States.

The method was also validated through a pedigree challenge (see Example 1); that is, when BGA is determined for a mother and a father, that of their children should plot somewhere between the two. A large number of family pedigrees were examined using the test, and the ancestral proportions of the children always plotted between those of their parents. When the MLE estimates are tested objectively (blind), they prove to be excellent estimates of ancestral proportions. For example, data for a European American man whose mother is of mixed European descent and whose father is mostly Greek indicated that the man had 85% European ancestry but also 15% Native American ancestry (Example 1). In fact, his paternal great-grandmother was a full-blooded Cherokee, which corroborated the result of the test (based on the laws of genetics, the man would be expected to have 12% Native American ancestry if his great-grandmother was 100% Native American and none of his other relatives had Native American ancestry). In addition, the man's wife was Mexican, and she was determined to be mostly Native American, but with some Native American and African heritage. This, too, was expected based on what is known of the anthropological origins of Hispanics, who derive from marriages between Spanish explorers and Native Americans in the colonial Caribbean and Latin America. Each of the couple's three children plotted, as expected, approximately at the midpoint between the parents. None of the children showed any Asian or Pacific Islander ancestry, which would have been impossible (assuming an accurate test) because neither parent showed any significant Asian or Pacific Islander heritage; and none of the children was found to have more African ancestry than their mother, which also would be impossible given that the father has virtually none. Thus, the children's results were consistent with those of the parents, and the MLE values were accurate estimates when tested against what was known from biographical data.
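
The pedigree expectations used in this validation can be expressed in a few lines: each generation halves the expected contribution of a single ancestor, and a child's expected proportions are the average of the parents' proportions. This is a minimal sketch; the function names and the example parents are illustrative assumptions, not data from Example 1.

```python
def expected_share(generations_back):
    """Expected genomic share from one ancestor n generations back: (1/2)**n."""
    return 0.5 ** generations_back


def expected_child(mother, father):
    """Expected ancestry proportions of a child: the midpoint of the parents' proportions."""
    return tuple((m + f) / 2 for m, f in zip(mother, father))


# A great-grandparent is three generations back.
great_grandparent_share = expected_share(3)

# Hypothetical parents, proportions given in the same group order.
child = expected_child(mother=(1.0, 0.0), father=(0.5, 0.5))
```

Under this simple halving rule a single great-grandparent contributes an expected 12.5%, which is in line with the approximately 12% figure given above; actual inherited shares vary around the expectation because of recombination.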

The genotypes (nucleotide letters) determined to date are quite accurate. Because the most modern gene-reading equipment available is used, an accuracy of greater than 99% is routinely achieved for each site. Where an accurate value could not be obtained for a particular site in a particular sample, "FL" is shown in place of the genotype letters for that site. Having a few FLs generally does not prevent a good estimate of ancestry. A sample can yield an FL for a site because, for example, a small region of the chromosome around the site is missing or has sequence characteristics that differ from those of most people, or because insufficient DNA was obtained from the buccal swab used to collect the DNA sample.
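
One simple way to accommodate failed sites is to drop them before any likelihood calculation and to track how many sites remain. This is a minimal sketch under that assumption; only the "FL" code comes from the text, while the function names and example genotype strings are hypothetical.

```python
FAILED = "FL"  # code shown in place of a genotype when a site could not be read


def usable_calls(calls):
    """Return (site_index, genotype) pairs for the sites that were read successfully."""
    return [(i, g) for i, g in enumerate(calls) if g != FAILED]


def call_rate(calls):
    """Fraction of sites with a successful genotype call."""
    return len(usable_calls(calls)) / len(calls)


# Invented genotype calls for four sites, one of which failed.
sample_calls = ["AG", "FL", "CC", "TT"]
kept = usable_calls(sample_calls)
rate = call_rate(sample_calls)
```

Keeping the original site index allows the remaining genotypes to be paired with the correct allele frequencies for each AIM.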

The genome was scanned for useful panels of BGA AIMs, and the best 71 AIMs identified using a maximum likelihood algorithm for determining BGA admixture proportions were selected (Table 6). Using these AIMs, majority BGA affiliation could be determined in a manner consistent with self-held notions of BGA, and BGA admixture proportions could be determined with significantly improved precision, accuracy and reliability compared with previously described methods for inferring race (see Example 2; see also Example 1, which uses a 32-marker test). This test can be used during study design to help reduce or eliminate the insidious effects imposed by cryptic or micro population structure. The test can also be useful to forensic scientists, who currently use imprecise and sometimes inaccurate means to infer race from crime scene DNA.

The present invention also provides an article of manufacture comprising one or more photographs, wherein each photograph is of a person having a known proportional ancestry corresponding to a population structure comprising nucleotide occurrences of AIMs indicative of BGA, and wherein the known proportional ancestry is associated with the photograph in the article. The article of the invention (i.e., the photograph and the proportional ancestry information) can be contained in one or more files (e.g., the photograph and the information in one file; or the photograph in one file and the information in a second file that is linked or can be linked to the photograph). If desired, more than one photograph of an individual having a known proportional ancestry can be contained in the same file or in linked files, for example, photographs showing different profiles of the individual or photographs of the individual at various ages.

Similarly, a plurality of articles (i.e., photographs and proportional ancestry information) can be contained in a single file, for example, a file containing photographs of different persons, some or all of whom have the same or different known proportional ancestries corresponding to a population structure that includes nucleotide occurrences of AIMs indicative of BGA. Such a plurality of articles can also be contained in different files, for example, a plurality of files each containing one photograph and information on the known proportional ancestry of the individual in the photograph; or each containing two or more photographs of different individuals who all have the same known proportional ancestry; or each containing two or more photographs of different individuals, some or all of whom have a proportional ancestry different from that of another individual whose photograph is in the file. Accordingly, a plurality of such articles is provided, as is a plurality of files, wherein each file can contain one or more articles (i.e., photographs) of one or more persons having the same or different known proportional ancestries corresponding to a population structure that includes nucleotide occurrences of AIMs indicative of BGA; where a file contains photographs of two or more different persons, those persons can have the same or different known proportional ancestries.

The article, i.e., a photograph of a person with a known proportional ancestry corresponding to a population structure that includes nucleotide occurrences of AIMs indicative of BGA, can be a digital photograph, which comprises digital information for the photographic image and any other information that may be relevant or desired (e.g., the subject's age, name, or contact information, or the subject's responses to a questionnaire about what the subject believes his or her ancestry to be). Such digital information for one or more digital photographs can be contained in a database, thereby facilitating retrieval of the photographs and/or the known proportional ancestry information by electronic means. As such, the invention further provides a plurality of articles comprising at least two digital photographs, each comprising digital information. Where digital information for one or more articles is contained in a database, the database can be held on any medium suitable for containing it, including, for example, computer hardware or software, magnetic tape, or a computer disk such as a floppy disk, CD, or DVD. The database can thus be accessed by a computer that contains the database, that can accept a medium containing the database, or that can reach the database over a wired or wireless network such as an intranet or the Internet.

The present invention also provides kits useful for practicing the methods of the invention. Such a kit can contain, for example, a plurality of hybridizing oligonucleotides, each at least 15 contiguous nucleotides in length of a polynucleotide set forth in SEQ ID NOs: 1 to 331 (or a polynucleotide complementary thereto), wherein the plurality includes at least five such oligonucleotides (e.g., 5, 6, 7, 8, 9, 10, 15, 20, 25, 30, etc.), each based on a different polynucleotide set forth in SEQ ID NOs: 1 to 331. In one embodiment, the hybridizing oligonucleotides comprise at least 15 contiguous nucleotides of at least five of the polynucleotides set forth in SEQ ID NOs: 1 to 71, or of polynucleotides complementary to any of SEQ ID NOs: 1 to 71. In another embodiment, the hybridizing oligonucleotides are specific for at least ten of the AIMs set forth in SEQ ID NOs: 1 to 71. A kit of the invention can also contain, for example, a panel of at least five (e.g., 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, etc.) hybridizing oligonucleotides specific for AIMs set forth in SEQ ID NOs: 7, 21, 23, 27, 45, 54, 59, 63, and 72 to 152; or a panel of at least five hybridizing oligonucleotides specific for AIMs set forth in SEQ ID NOs: 3, 8, 9, 11, 12, 33, 40, 59, 63, and 153 to 239; or a panel of at least five hybridizing oligonucleotides specific for AIMs set forth in SEQ ID NOs: 1, 8, 11, 21, 24, 40, 172, and 240 to 331; or at least two panels of such hybridizing oligonucleotides, including two or more of the foregoing panels and/or a panel of at least five hybridizing oligonucleotides specific for AIMs set forth in SEQ ID NOs: 1 to 71.

The hybridizing oligonucleotides of a kit of the invention can include probes useful for detecting a particular AIM having a particular nucleotide occurrence at its SNP position; primers, including primers useful for primer extension reactions and primer pairs useful for nucleic acid amplification reactions; or a combination of such probes and primers. The plurality of hybridizing oligonucleotides can, but need not, include a nucleotide corresponding to the nucleotide position of the SNP or DIP of the AIM (e.g., nucleotide position 50 of an AIM as set forth in any of SEQ ID NOs: 1 to 55 and 57 to 331, or nucleotide position 26 of SEQ ID NO: 56, or of a nucleotide sequence complementary thereto); such hybridizing oligonucleotides are useful as probes for identifying the presence or absence of a particular nucleotide occurrence at the SNP position of the AIM.

A kit of the invention can also contain at least one pair of hybridizing oligonucleotides useful for detecting the nucleotide occurrence at a SNP position, or the presence or absence of a nucleotide sequence at a DIP position, of an AIM. For example, a pair of hybridizing oligonucleotides can include one oligonucleotide that hybridizes adjacent to and upstream of the SNP position of the AIM and a second oligonucleotide that hybridizes adjacent to and downstream of the SNP position, wherein one or the other of the pair further includes a nucleotide complementary to a nucleotide occurrence suspected of being at the SNP position (i.e., one of the polymorphic nucleotides); such a pair is useful in an oligonucleotide ligation assay. In another example, a pair of hybridizing oligonucleotides can include an amplification primer pair comprising a forward primer and a reverse primer; such a pair is useful for amplifying a portion of a polynucleotide that includes the SNP or DIP position of the AIM.

The following examples are intended to illustrate, but not limit, the invention.

Example 1
Determining biogeographical ancestry using ancestry informative markers
This example demonstrates that 32 ancestry informative markers (AIMs) permit estimation of the genetic contributions from African, European, and Native American populations.

The AIMs used in the exemplified study include single nucleotide polymorphisms (SNPs), deletion/insertion polymorphisms (DIPs), and Alu sequences (see Example 2 for identification of AIMs). Markers showing a frequency difference greater than 30% between parental populations were selected (Table 1; see also SEQ ID NOs: 332 to 363). Informative genetic markers were identified by testing each candidate marker in a panel of European (Spanish and German), African (from Nigeria, Sierra Leone, and the Central African Republic), and Native American (Mayan and Southwestern Native American) populations, in order to confirm the usefulness of the markers for admixture estimation.

(Table 1) Ancestry informative marker panel

Shown are the marker name and chromosomal band, the approximate position of the marker on the chromosome in megabases (Mb), and the difference in frequency between African and European populations (AF/EU), African and Native American populations (AF/NA), and European and Native American populations (EU/NA). Differences greater than 30% are shown in bold (see also Shriver et al., supra, 2003, which is incorporated herein by reference).
* Numbers in parentheses are SEQ ID NOs for the AIMs; NS: sequence not shown.

Publicly available human genome sequence databases and polymorphism databases were screened to identify SNPs meeting the criteria for a good AIM. Allele frequencies for three populations (African, European, and Asian) are available in public databases for many of these SNPs. Because these frequencies were obtained from small numbers of samples, they are not necessarily accurate. The main selection criterion used herein was the delta value derived from these frequencies, a statistical measure of the difference in minor allele frequency between various human populations. For example, a C or G polymorphism at a particular location in the human genome, where C is present predominantly in individuals of European descent and G predominantly in individuals of Native American descent, has a high delta value and therefore qualifies as a good AIM. Similarly, an A or C polymorphism at a particular location, where A is present predominantly in individuals of African descent and C predominantly in individuals of Asian descent, has a large frequency difference between these groups, hence a high delta value, and therefore qualifies as a good AIM. A list of such "candidate" AIMs was compiled, ordered from the largest to the smallest delta value for each possible pairwise population comparison, and screened, one at a time, against a panel of "parental" samples. Parental samples are samples from regions of the world that are relatively homogeneous, for example, Niger or Congo for sub-Saharan Africans, southern Mexico for Native Americans, China for East Asians, and Europe for Europeans.
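The ranking step described above can be sketched as follows. This is a minimal illustration only: the marker names and allele frequencies are hypothetical, the population labels are placeholders, and the 0.30 cutoff is borrowed from the 30% marker selection criterion stated for Table 1 rather than from any stated rule for the database screen.

```python
from itertools import combinations

# Hypothetical frequencies of one allele in three parental samples.
# Marker names and values are illustrative only, not data from this study.
CANDIDATES = {
    "snpA": {"AFR": 0.90, "EUR": 0.10, "ASN": 0.15},
    "snpB": {"AFR": 0.50, "EUR": 0.45, "ASN": 0.55},
    "snpC": {"AFR": 0.20, "EUR": 0.25, "ASN": 0.85},
}


def delta(freqs, pop1, pop2):
    """Absolute difference in allele frequency between two populations."""
    return abs(freqs[pop1] - freqs[pop2])


def rank_candidates(candidates, threshold=0.30):
    """Map each population pair to its markers with delta > threshold,
    ordered from the largest delta to the smallest."""
    pops = sorted(next(iter(candidates.values())))
    ranked = {}
    for pop1, pop2 in combinations(pops, 2):
        scored = [(name, round(delta(f, pop1, pop2), 4))
                  for name, f in candidates.items()]
        kept = [item for item in scored if item[1] > threshold]
        ranked[(pop1, pop2)] = sorted(kept, key=lambda item: item[1], reverse=True)
    return ranked


RANKED = rank_candidates(CANDIDATES)
```

Each ranked list would then be worked through one marker at a time against the parental samples, since, as noted, database frequencies derived from few samples may not hold up.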

About half of the candidate AIMs proved not to be very useful, because their actual delta values were not as high as predicted from the public database allele frequencies (some were not even SNPs, or could not be assayed using the current platform). Sequences validated as true AIMs, such as those exemplified herein, were useful for admixture mapping, for inferring individual ancestry proportions, and for inferring population group admixture proportions, as well as for screening the genome to identify markers for alleles that correlate with particular human traits through their ancestry informativeness. Although each candidate AIM was initially selected from public databases based on gross population structure differences (i.e., continental populations), many were found to carry information about finer levels of structure, because the separation of human subgroups from larger groups throughout human evolution provided ample opportunity for genetic drift, founder effects, and natural selection to act to either fix or eliminate these sequences.

The sequences are shown in the Sequence Listing from 5' to 3' (left to right) and, for SEQ ID NOs: 1 to 331, generally but not necessarily contain the SNP at nucleotide position 50 from the 5' end (except SEQ ID NO: 56, where it is at position 26). Polymorphisms are indicated by IUB codes, e.g., S=C/G, Y=C/T, R=A/G, K=G/T, W=A/T. As such, the disclosed sequences (SEQ ID NOs: 1 to 331) provide information on the target (i.e., the polymorphism) to be interrogated, as well as information for preparing primers and amplification primer pairs for sampling the SNP (i.e., genotyping a sample) and hybridization probes. Further, the disclosed sequences can be used to scan public databases to identify additional upstream and downstream nucleotide sequences, if desired.
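To make the sequence convention concrete, the following sketch splits a listed sequence into its upstream flank, the two alleles encoded by the IUB code, and its downstream flank. The function names are hypothetical, and the test sequence is a made-up string that merely places a Y code at position 50; it is not one of SEQ ID NOs: 1 to 331. The M=A/C entry is the standard IUB code for the remaining two-base transversion and is included as an assumption about what "etc." would cover.

```python
# Two-base IUB ambiguity codes; S, Y, R, K, W are those named in the text.
IUB = {
    "S": ("C", "G"),
    "Y": ("C", "T"),
    "R": ("A", "G"),
    "K": ("G", "T"),
    "W": ("A", "T"),
    "M": ("A", "C"),
}


def describe_snp(seq, snp_pos=50):
    """Return (upstream flank, alleles, downstream flank).

    snp_pos is 1-based, counted from the 5' end as in the Sequence Listing.
    """
    code = seq[snp_pos - 1].upper()
    if code not in IUB:
        raise ValueError(f"no two-base IUB code at position {snp_pos}: {code!r}")
    return seq[:snp_pos - 1], IUB[code], seq[snp_pos:]


def allele_sequences(seq, snp_pos=50):
    """Expand the ambiguity code into one explicit sequence per allele."""
    upstream, alleles, downstream = describe_snp(seq, snp_pos)
    return [upstream + allele + downstream for allele in alleles]


# Made-up sequence: 49 bases, then the SNP code at position 50, then 7 bases.
TOY = "ACGT" * 12 + "A" + "Y" + "GATTACA"
UPSTREAM, ALLELES, DOWNSTREAM = describe_snp(TOY)
EXPANDED = allele_sequences(TOY)
```

For SEQ ID NO: 56, the same call would be made with snp_pos=26. The flanks returned here are the regions from which primers and probes would be chosen.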

This panel of markers is very powerful for estimating admixture proportions with precision in population samples (standard errors are typically between 1% and 5%). In addition, the AIMs provided affordable estimates of individual ancestry and suggested that equivalent precision could be obtained using more markers (confirmed in Example 2). Two independent methods of estimating individual ancestry were used: maximum likelihood estimation (MLE) (Chakraborty et al., supra, 1986) and a Bayesian method using the program STRUCTURE (Pritchard et al., supra, 2000). The values obtained by the two methods were highly correlated when the individual ancestry estimates were compared in terms of percent African genetic contribution in an African American sample from Washington, DC (R²=0.9836). These markers are excellent for determining whether there is population structure in a sample from an admixed population. As discussed below, this capability is important for admixture mapping applications, because the admixture process can create significant structure in a population and, as a result, a large number of false positive results (positive associations caused by genetic structure rather than by physical linkage of the marker to a disease-causing gene), significantly increasing the risk of misinterpreting mapping results.
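As a rough illustration of maximum likelihood estimation of individual ancestry, the sketch below searches a grid of three-way admixture proportions for the one that maximizes a binomial genotype likelihood. It is a generic textbook-style model written under stated assumptions (independent biallelic markers, Hardy-Weinberg proportions given the admixed allele frequency, a coarse grid), not the specific procedure of Chakraborty et al. or of the program STRUCTURE. All frequencies and genotypes are invented.

```python
import math


def log_likelihood(genotypes, parental_freqs, m):
    """Log-likelihood of genotypes given ancestry proportions m.

    genotypes: copies (0, 1, or 2) of the scored allele at each marker.
    parental_freqs: per-marker tuples of that allele's frequency in each
        parental population, in the same order as m.
    m: ancestry proportions, assumed to sum to 1.
    """
    total = 0.0
    for g, freqs in zip(genotypes, parental_freqs):
        p = sum(mk * pk for mk, pk in zip(m, freqs))
        p = min(max(p, 1e-6), 1.0 - 1e-6)  # keep the logarithms finite
        total += (math.log(math.comb(2, g))
                  + g * math.log(p) + (2 - g) * math.log(1.0 - p))
    return total


def mle_ancestry(genotypes, parental_freqs, steps=20):
    """Grid search over three-way proportions in increments of 1/steps."""
    best_m, best_ll = None, -math.inf
    for i in range(steps + 1):
        for j in range(steps + 1 - i):
            m = (i / steps, j / steps, (steps - i - j) / steps)
            ll = log_likelihood(genotypes, parental_freqs, m)
            if ll > best_ll:
                best_m, best_ll = m, ll
    return best_m, best_ll


# Invented panel: four markers informative for each of three populations.
FREQS = ([(0.95, 0.05, 0.05)] * 4
         + [(0.05, 0.95, 0.05)] * 4
         + [(0.05, 0.05, 0.95)] * 4)
UNADMIXED = [2] * 4 + [0] * 8           # genotypes typical of population 1
HALF_AND_HALF = [1] * 8 + [0] * 4       # heterozygous at pop 1 and pop 2 markers

M_UNADMIXED, _ = mle_ancestry(UNADMIXED, FREQS)
M_HALF, _ = mle_ancestry(HALF_AND_HALF, FREQS)
```

With only a dozen markers the likelihood surface is broad, which mirrors the point made in this example that a small panel yields usable but imprecise individual estimates.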

This study confirms that AIMs can be identified using the disclosed methods, and provides a panel of 32 AIMs that can be applied toward the ultimate goal of compiling a panel of approximately 1,000 AIMs spanning the entire human genome. Candidate AIMs were obtained by screening SNP allele frequency data generated by The SNP Consortium (TSC). As of 2003, six sites, including the Sanger Centre, Celera Genomics, Washington University, Orchid Biosciences, Motorola, and the Whitehead Institute, had generated allele frequencies for 60,000 SNPs located throughout the genome, using a central collection of 42 individuals from each of three populations (African American, European American, and Asian American). This database, which is freely available to researchers (see, e.g., the URL "snp.cshl.org", using the hypertext transfer protocol ("http")), was used to provide the present results, thereby demonstrating the utility of this resource for compiling a genome-wide panel of AIMs.

The present study focused on the accuracy of the SNP database and on the number of candidate SNPs it contains. With respect to accuracy, each group in the SNP Consortium took a different approach to generating data, so the first concern addressed was how the data could be combined. Because the genotyping approach differed for each group, it was necessary to address the problem of ascertainment bias, which could affect the data of particular groups differently. For example, most of the groups generated their allele frequencies by first sequencing a subset of the TSC diversity panel and then scoring the resulting markers in the larger group of 42 individuals from each of the three populations. The Washington University group took an approach in which pooled sequencing was performed across a region and allele frequencies were calculated for the variable positions discovered in the process. The Orchid group did not use sequencing, but instead started from loci in the TSC SNP database that were known to be polymorphic. Given such differences, a systematic characterization was made of the extent to which different biases, if any, may have influenced the results.

One approach to systematically characterizing such potential bias was to compare the allele frequencies for loci that had been genotyped by more than one group. Although, as expected, there was scatter around the 45° line, there was general agreement among the frequency data obtained by the different groups (R²=0.8762), indicating that the degree of allele frequency bias introduced by the different genotyping and ascertainment strategies was limited. The next step in testing the accuracy of these data is to sort the data by site and perform pairwise comparisons, which allows identification of any particular site whose allele frequency estimates deviate more than those of the other sites.
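The between-group comparison can be sketched as below: for loci typed by two sites, the squared Pearson correlation summarizes agreement, and the mean absolute distance from the 45° line gives a simple per-pair measure of deviation. The site data are hypothetical and the function names are placeholders; how the R² reported above was actually computed is not stated, so the Pearson form is an assumption.

```python
def r_squared(x, y):
    """Squared Pearson correlation between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return (sxy * sxy) / (sxx * syy)


def mean_abs_deviation(x, y):
    """Average vertical distance of the points from the 45-degree line y = x."""
    return sum(abs(a - b) for a, b in zip(x, y)) / len(x)


# Hypothetical allele frequencies for the same five loci from two sites.
SITE_1 = [0.10, 0.25, 0.40, 0.70, 0.90]
SITE_2 = [0.12, 0.20, 0.45, 0.66, 0.93]

AGREEMENT = r_squared(SITE_1, SITE_2)
DEVIATION = mean_abs_deviation(SITE_1, SITE_2)
```

Repeating the calculation for every pair of sites, as the text proposes, would flag a site whose deviation is consistently larger than that of the others.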

With respect to the number of candidate AIMs, it was also important to determine how many of the 60,000 SNPs characterized by TSC are useful for admixture mapping. Because it may ultimately be useful to compile a panel of approximately 1,000 markers that show large frequency differences between the relevant population groups (Fst>0.4), it was important to assess what percentage of the available markers have the desired characteristics. Candidate AIMs were defined based on the recommendations of McKeigue et al. (supra, 2000). For markers with information available for the African, Asian, and European populations, the cumulative proportion of markers in each Fst category (0 to 1 in intervals of 0.05) and the total number of candidate AIMs for each possible comparison were tabulated. The distribution of pairwise Fst from the TSC allele frequency project was as follows: Asian-European (556 candidate AIMs/25,110 total SNPs; mean Fst=0.0720); Asian-African (1026 candidate AIMs/25,578 total SNPs; mean Fst=0.0886); and European-African (1306 candidate AIMs/30,103 total SNPs; mean Fst=0.0861). As such, the screening indicated that approximately 2 to 5% of the markers can be useful for admixture mapping.
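The pairwise Fst screen and the resulting percentages can be illustrated as follows. The estimator shown is the simple heterozygosity-based form for one biallelic locus and two equally weighted populations, with no sample-size correction; the text does not say which estimator was used, so this choice is an assumption. The candidate and total SNP counts are those quoted above.

```python
def fst_two_populations(p1, p2):
    """Fst = (H_T - H_S) / H_T for one biallelic locus and two populations."""
    p_bar = (p1 + p2) / 2.0
    h_total = 2.0 * p_bar * (1.0 - p_bar)          # expected heterozygosity, pooled
    if h_total == 0.0:
        return 0.0                                  # locus fixed in both populations
    h_within = (2.0 * p1 * (1.0 - p1) + 2.0 * p2 * (1.0 - p2)) / 2.0
    return (h_total - h_within) / h_total


def is_candidate(p1, p2, threshold=0.4):
    """Apply the Fst > 0.4 cutoff mentioned in the text."""
    return fst_two_populations(p1, p2) > threshold


# Candidate AIM counts and total SNP counts quoted in the text.
TSC_COUNTS = {
    "Asian-European": (556, 25110),
    "Asian-African": (1026, 25578),
    "European-African": (1306, 30103),
}
FRACTIONS = {pair: hits / total for pair, (hits, total) in TSC_COUNTS.items()}
```

The three fractions come to roughly 2.2%, 4.0%, and 4.3%, which is where the "approximately 2 to 5%" figure comes from.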

The geographic pattern of admixture in admixed populations of the United States, particularly African Americans and Hispanics, was the subject of the initial investigation. The admixture proportions of more than 18 African American populations were characterized, and a map was prepared showing estimates of the European genetic contribution to African Americans from several different geographic regions of the United States. European admixture ranged from 3.5% in the Gullah of South Carolina to 22.5% in New Orleans (e.g., 18.8% in Chicago and 16.4% in Houston). Most of these estimates were obtained using an initial panel of 10 informative AIMs. The observed distribution was interpreted in terms of well-known historical and demographic events that played an important role in African American history (see Parra et al., supra, 1998; Parra et al., supra, 2001). These data enable the application of admixture mapping to identify genes involved in complex diseases. Admixture mapping is better suited to populations showing a high degree of admixture; populations such as the Gullah (3.5%) and Jamaicans (6.6%), in which the European genetic contribution has been very limited, are therefore expected to be potentially unsuitable for this type of analysis.

In a preliminary study using mitochondrial DNA (mtDNA), African Americans were observed to have a low but detectable Native American genetic contribution, consistent with the self-reported Native American ancestry often mentioned by African American individuals. After 30 AIMs informative for the African/Native American contrast and 19 AIMs informative for the European/Native American contrast had been identified (see Table 1; see also Shriver et al., supra, 2003), the presence of Native American admixture in three African American populations was examined using nuclear DNA markers. Consistent with the mtDNA estimates (which provide only "maternal" contribution information), evidence of a low Native American genetic contribution was detected in each of the African American samples (Washington, DC, 6%; African Caribbeans from London, 5%; and Bogalusa, Louisiana, 6%).

With respect to admixture in Hispanics, the relative European, Native American, and African contributions were estimated in a sample of Spanish Americans from the San Luis Valley, CO. European admixture of 59%, Native American admixture of 35%, and African admixture of 6% were observed in this sample, in good agreement with estimates previously reported for populations of Mexican ancestry (Chakraborty et al., supra, 1986; Hanis et al., supra, 1991; Tseng et al., Amer. J. Phys. Anthropol. 106:361-371, 1998; Collins-Schram et al., supra, 2002). As shown in Example 2, admixture was further characterized in an additional sample from Mexico and in two samples of Hispanics of Puerto Rican ancestry (New York and Puerto Rico).

A sample of individuals of European ancestry currently living in State College, PA (N=199) was also analyzed. The genetic contribution in this sample was mostly of European origin (91%), with evidence of some African (3%) and Native American (6%) influence. These results are summarized in Figure 4 using a triangle plot, which clearly shows the differences in average admixture levels among European Americans, Spanish Americans, and African Americans. The triangle plot in Figure 4 shows the average admixture estimates for particular samples; the underlying distribution of individual ancestry is complex, with different individuals showing widely dispersed values of African, European, and Native American ancestry (not shown). Among African Americans, most individuals showed a predominantly African genetic contribution, but some showed a relatively high European contribution and, to a lesser extent, Native American ancestry. European Americans clustered more tightly near the vertex corresponding to a high European contribution, with few individuals showing evidence of Native American or African ancestry. Spanish Americans showed the highest variation in individual ancestry, as expected given the high level of admixture observed in this sample.

In particular, individuals showed the full range of European and Native American ancestry (from 100% European to 100% Native American), and a relatively lower African genetic contribution was also evident in some individuals. Part of the variation observed in individual ancestry was likely due to stochastic error resulting from the limited number of markers used to infer ancestry. Thus, although the 20 to 32 markers used in the exemplified test detected individual ancestry, the standard error of the estimates was rather high; increasing the number of AIMs is expected to increase the precision of individual ancestry estimates (see Example 2). The other component of the variation in individual ancestry was due to true differences in ancestry between individuals. The striking correlation between the individual ancestry values obtained by two completely independent methods, ML and STRUCTURE (discussed above), indicates that this panel of markers can capture the underlying patterns of individual ancestry characteristic of these populations. As disclosed below, controlling for variation in individual ancestry makes it possible to avoid false positive results.

The effects of admixture dynamics on population structure and linkage disequilibrium (LD) were examined. The importance of the admixture model (hybrid isolation model versus continuous gene flow model) for the population structure and LD generated by the admixture process has been described previously (Pfaff et al., supra, 2001), and two methods for quantifying the level of population structure in admixed populations were presented. Population structure is a key aspect of admixture mapping and of any genetic association analysis in admixed populations. This issue was explored in African Americans, Spanish Americans, and European Americans using more informative markers than were previously available.

The presence of structure was evaluated in two different ways. First, the observed number of significant associations between unlinked markers was compared with the number expected at the 5% significance level. Second, the average correlation of individual ancestry was estimated using two subsets of the genetic markers. Consistent with previously reported data, the African American population from Washington, DC showed significant genetic structure, reflected in a much higher number of significant associations between unlinked markers than expected by chance (10.5% versus 5%, FIG. 5A). Very strong associations were observed between markers located as far as 24 Mb apart (AT3-F13B, G = 15.21, p < 0.0001), providing clear evidence that these significant associations were generated by the admixture process. The associated alleles were always the combinations of alleles that are at high frequency in African populations, and the greater the difference in frequency, the more often associations were observed between markers: FY, which shows the largest frequency difference between Africans and Europeans, was significantly associated with nine unlinked markers. Thus, the admixture process in this African American population, which shows 17% European ancestry, generated strong associations between markers that show large frequency differences between African and European populations. These associations were significant both when the markers were linked and when they were unlinked, but linked markers tended to show higher G values than unlinked markers, indicating that association due to true linkage can be distinguished from association due to genetic structure, as previously demonstrated (McKeigue et al., supra, 2000). Interestingly, significant associations were observed between markers showing large frequency differences between European and African populations, but not between markers showing large frequency differences between African and Native American populations or between European and Native American populations. This result was likely due to the low Native American ancestry observed in this sample (6%), which was insufficient to generate detectable associations through the admixture process in such a small sample.
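The G statistic quoted above can be related to its p-value with a short calculation. The sketch below assumes a 2×2 table of counts for a pair of markers and a chi-square reference distribution with one degree of freedom; neither assumption is stated in the text, and the counts and function names are hypothetical.

```python
import math

def g_statistic(table):
    """Likelihood-ratio (G) statistic for a table of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    g = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            if observed > 0:  # a cell with zero count contributes nothing
                g += 2.0 * observed * math.log(observed / expected)
    return g

def p_value_df1(g):
    """Upper-tail chi-square probability, assuming 1 degree of freedom."""
    return math.erfc(math.sqrt(g / 2.0))

# Hypothetical allele-pair counts for two markers (not data from the study).
counts = [[60, 40], [35, 65]]
print("G for hypothetical counts:", g_statistic(counts))

# Under the 1-df assumption, the reported G = 15.21 falls below p = 0.0001.
print("p for G = 15.21:", p_value_df1(15.21))
```

Under that one-degree-of-freedom assumption, the reported G = 15.21 is consistent with the stated p < 0.0001.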

Another line of evidence demonstrating the high level of genetic structure present in this African American sample was the significant correlation between independent estimates of individual ancestry obtained using different subsets of the genetic markers. The average correlation over 100 random selections of independent subsets of markers was r = 0.40, p < 0.0001 (FIG. 5B). The patterns of genetic structure and LD observed in the Washington, DC African American sample and in other African American samples analyzed with a more limited set of markers (Jackson MI and the Low Country region, South Carolina; Pfaff et al., supra, 2001) indicated that the model that best explains the admixture process in these populations is the continuous gene flow model. Additional support for this model came from the strong correlation between D₀ (the initial expected association between markers caused by the admixture process) and D_t (the current association between markers). As shown using computer simulations (Pfaff et al., supra, 2001), a positive correlation between D₀ and D_t is expected in populations that follow the continuous gene flow model, and this result was in fact observed in the African American populations. In populations that follow the hybrid isolation model, no significant correlation between D₀ and D_t is expected.
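The contrast between the two models can be illustrated with the standard two-locus result for admixture LD, which is not given in the text and is assumed here: after a single admixture event, D₀ = m(1 − m)δ₁δ₂, where m is the admixture proportion and δ₁, δ₂ are the parental allele frequency differences at the two markers, and the association then decays as D_t = D₀(1 − θ)^t for recombination fraction θ. All numeric values and function names below are hypothetical.

```python
def initial_ld(m, delta_1, delta_2):
    """Assumed initial admixture LD: D0 = m * (1 - m) * delta_1 * delta_2."""
    return m * (1.0 - m) * delta_1 * delta_2

def ld_after(d0, theta, generations):
    """Assumed decay with no further gene flow: Dt = D0 * (1 - theta)**t."""
    return d0 * (1.0 - theta) ** generations

# Hypothetical pair of markers with large parental frequency differences.
d0 = initial_ld(m=0.2, delta_1=0.8, delta_2=0.7)

# Unlinked markers recombine freely (theta = 0.5), so after a single
# admixture event the association halves every generation.
for t in (0, 1, 3, 7):
    print(t, ld_after(d0, theta=0.5, generations=t))
```

Under these assumptions, association between unlinked markers falls below 1% of D₀ within seven generations of a single admixture event, which is one way to see why no D₀ to D_t correlation is expected under hybrid isolation, whereas ongoing gene flow keeps regenerating it.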

The Spanish American sample from the San Luis Valley that was analyzed showed less genetic structure than any of the African American populations. The number of significant associations observed between unlinked markers was only slightly higher than expected at the 5% significance level (7.3% versus 5%, FIG. 5A). This result is interesting given that the Spanish American population is considerably more admixed than the African American populations and, under the same model of admixture dynamics, would be expected to show considerably more structure. The correlation of individual ancestry estimates based on independent markers was significant but much lower than the value observed in the African American populations (r = 0.11, p < 0.0001, FIG. 5B). In addition, in contrast to the results observed in African Americans, there was no correlation between D₀ and D_t in the San Luis Valley sample. These results demonstrate that the admixture dynamics (the way in which the populations formed and evolved) differ between the African American populations and the San Luis Valley population, with the former more closely resembling the continuous gene flow model than the latter. Of course, other Hispanic populations may show patterns of admixture dynamics different from those observed in the San Luis Valley.

As expected from the lower admixture levels observed in European Americans, there was no evidence of genetic structure due to admixture in this sample from State College, PA (FIGS. 5A and 5B). The number of significant associations between unlinked markers was similar to the value expected by chance, and there was no correlation between individual ancestry estimates from independent subsets of markers (p = 0.149, NS).

These results demonstrate that the use of selected genetic markers (AIMs) makes it possible to analyze the dynamics of the admixture process and the effect of that process on the pattern of LD in admixed populations. In admixed populations whose admixture process resembles the hybrid isolation model (an initial admixture event followed by independent evolution of the admixed population, with no further genetic contribution from the parental populations), few false positive results are expected (recall that false positives are a concern for "gene hunters," who search for trait-causing genes through LD or linkage, and not for those seeking to develop classification tools). In admixed populations that more closely resemble the continuous gene flow model (a continuous genetic contribution, generation after generation, from one of the parental populations to the admixed population), LD is expected to extend over much longer distances, and false positive results will be a problem. Fortunately for gene hunters, the information conveyed by AIMs can be used to control for genetic structure and minimize false positives. An example demonstrating how such control can be achieved using appropriate statistical methods, with skin pigmentation as a model phenotype, is provided below.

Skin pigmentation and individual ancestry were examined in the African American and Spanish American samples. As previously demonstrated, the genetic structure generated by admixture can be effectively controlled, and association due to linkage can be distinguished from spurious association due to genetic structure using appropriate statistical tests (McKeigue et al., supra, 2000). In the present study, the same approach was applied to the study of skin pigmentation in two admixed samples (African Americans from Washington, DC and Spanish Americans from the San Luis Valley). Information on skin pigmentation was collected for each individual in both studies, the subjects were genotyped for a panel of AIMs, and individual ancestry proportions were calculated using the maximum likelihood method (Chakraborty et al., supra, 1986). Individual ancestry (% African or % Native American) was plotted against the melanin index (African) or skin reflectance (Native American) for each individual. Some of the AIMs showing large frequency differences between the parental populations were also candidate genes for pigmentation.
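A maximum likelihood estimate of individual ancestry can be sketched for the simplest case of two parental populations. This is an illustrative simplification rather than the cited method: it assumes unlinked biallelic markers, Hardy-Weinberg proportions given the individual's admixture proportion, known parental allele frequencies, and a plain grid search. The function names, genotypes, and frequencies are hypothetical.

```python
import math

def log_likelihood(m, genotypes, freqs_a, freqs_b):
    """Log-likelihood that a fraction m of the genome comes from population A.

    genotypes: allele counts (0, 1 or 2) of the reference allele per marker.
    freqs_a, freqs_b: reference allele frequencies in parental populations.
    """
    total = 0.0
    for g, pa, pb in zip(genotypes, freqs_a, freqs_b):
        p = pb + m * (pa - pb)               # expected allele frequency given m
        p = min(max(p, 1e-9), 1.0 - 1e-9)    # keep the logarithms finite
        total += (math.log(math.comb(2, g))
                  + g * math.log(p) + (2 - g) * math.log(1.0 - p))
    return total

def ml_ancestry(genotypes, freqs_a, freqs_b, steps=100):
    """Grid-search the admixture proportion m in [0, 1] that maximizes the likelihood."""
    grid = [i / steps for i in range(steps + 1)]
    return max(grid, key=lambda m: log_likelihood(m, genotypes, freqs_a, freqs_b))

# Hypothetical five-marker panel and one individual's genotypes.
freqs_a = [0.90, 0.85, 0.10, 0.95, 0.20]
freqs_b = [0.10, 0.20, 0.80, 0.15, 0.90]
genotypes = [2, 1, 0, 2, 1]
print("estimated proportion from population A:", ml_ancestry(genotypes, freqs_a, freqs_b))
```

With so few markers the likelihood surface is flat, which matches the remark in the text that estimates based on a small panel carry a high standard error.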

In the African American sample, a strong and highly significant correlation (R² = 0.1879, p < 0.0001) was observed between individual ancestry and the melanin index, which measures the melanin content of the skin. Individuals with darker skin had, on average, higher levels of African ancestry. The individual ancestry estimates were based on 21 markers and are therefore subject to relatively high variance, which explains at least part of the scatter observed in the graph. An interesting feature of these results was the clear decrease in variance observed in moving from right (more African ancestry) to left (more European ancestry). This result is consistent with the higher level of variability in skin color found in African populations compared with Europeans. The high correlation observed between individual ancestry and skin pigmentation is due to the population structure typical of African American populations, and may be related to the limited number of genes used to measure the parental population differences encompassed by this relationship.

A similar plot was prepared for the San Luis Valley sample. Individual ancestry estimates obtained using 15 Native American/European AIMs were plotted against the pigmentation level, measured as the percentage of light reflected through the green filter of a PHOTOVOLT 670. Because skin pigmentation was measured in different ways in these two studies (absorbance versus reflectance), the trends observed in the graphs are reversed. In the Spanish American sample, the correlation between individual ancestry and skin color was also significant (R² = 0.0481, p < 0.001) but lower than in the African American sample, probably because of the reduced genetic structure present in this sample.

Tests were performed for differences in mean pigmentation level by genotype for the AIMs typed in the African American population sample discussed above. The panel of AIMs included three candidate gene markers: OCA2, TYR, and MC1R. The analysis was carried out in three alternative ways: a first method that does not take individual ancestry estimates into account (ANOVA); a second method that conditions on an individual ancestry estimate from which the locus under consideration is excluded, to control for the effect of individual ancestry (ANCOVA/IAE minus marker); and a third method that uses the complete individual ancestry estimate for conditioning (ANCOVA/IAE). As shown in Table 2, 8 of the 21 markers (38%) showed significant differences (p < 0.05) among the three genotypes, including two of the candidate gene markers (OCA2 and TYR). At an alpha level of 0.05, only 5% of the markers tested would be expected to give significant results. As such, the finding that 38% were significant indicates that population structure is associated with both ancestry and pigmentation (Pfaff et al., supra, 2001; Parra et al., supra, 2001).
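To gauge how far 8 significant markers out of 21 departs from the 5% expectation, the upper tail of a binomial distribution can be evaluated. The sketch below treats the 21 tests as independent, an assumption made here only for illustration (tests on markers in a structured population are not independent, which is the point of the passage); the function name is hypothetical.

```python
import math

def binomial_upper_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p**i * (1.0 - p) ** (n - i)
               for i in range(k, n + 1))

n_markers = 21
alpha = 0.05
observed_significant = 8

# Number of markers expected to be significant by chance alone.
print("expected by chance:", n_markers * alpha)

# Probability of seeing 8 or more significant markers if only chance were acting.
print("P(X >= 8):", binomial_upper_tail(observed_significant, n_markers, alpha))
```

Roughly one significant marker would be expected by chance, so an excess of this size points to a shared cause such as population structure rather than to eight independent pigmentation loci.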

One way to remove the effect of population structure is to test for differences conditional on the individual ancestry estimate (IAE). When the complete IAE was used for conditioning (ANCOVA/IAE), only one locus, OCA2, the human P gene, showed a significant mean difference among genotypes. When a less conservative conditioning approach was taken, in which the locus under consideration was excluded from the individual ancestry estimate (ANCOVA/IAE minus marker), there were four significant results: OCA2, TYR, FY, and SGC30055.

A full Bayesian probability model for admixture and marker genotypes was also fitted (McKeigue et al., supra, 2000). The score test for linkage was based on testing, one locus at a time, for an independent association between pigmentation and the number of alleles of European ancestry at each locus, in a regression model that included individual ancestry (estimated from the marker data). The one-sided probabilities for the score test are shown in Table 2; three loci showed evidence of linkage to skin pigmentation at an alpha level of 0.05: OCA2 (p = 0.005), AT3 (p = 0.027), and TYR (p = 0.033). To confirm these results, additional ancestry-informative markers in OCA2 are to be identified and analyzed by the score test method. The agreement between the ANOVA results and the Bayesian admixture mapping is encouraging, and both methods are expected to benefit from the addition of new unlinked AIMs, which will increase the precision of the individual ancestry estimates.

The Spanish American sample from the San Luis Valley, CO was also analyzed for linkage and association using the Bayesian and ANOVA methods (Table 3). This analysis included 442 individuals typed for 15 ancestry-informative marker loci (two SNPs in the DRD2 gene were treated as a single locus). The CYP19E2 marker, which is located near MYO5A, a pigmentation candidate gene, showed strong evidence of linkage with ethnic differences in skin pigmentation. However, this result should be interpreted with caution, because unless several closely linked ancestry-informative markers are used, the test for linkage is not robust to misspecification of the ancestry-specific allele frequencies. SNPs around MYO5A can be analyzed to confirm these preliminary results.

(Table 2) Tests for the effect of single-locus genotypes on pigmentation in the African American sample

¹ Marker indicates the ancestry informative marker used in the test. Markers shown in bold italics are candidate genes for pigmentation (i.e., OCA2, MC1R, TYR).
² "Delta" (δ) is the allele frequency difference between the African and European populations.
³ Analysis of variance significance level, where sex is the only covariate.
⁴ Significance level for one-way ANCOVA analysis using the individual ancestry estimate (M), with the locus being tested excluded, as a covariate.
⁵ Same as 3, except that M is based on all 21 markers.
⁶ Bayesian admixture mapping one-sided probability.

(Table 3) Tests for the effect of single-locus genotypes on pigmentation in the Spanish American sample

¹ Marker indicates the ancestry informative marker used in the test. Markers shown in bold italics are in or near candidate genes for pigmentation (i.e., TYR-192, and CYP19E2 near MYO5A).
² "Delta" (δ) is the allele frequency difference between the Native American and European parental populations.
³ Analysis of variance significance level, where sex is the only covariate.
⁴ Significance level for one-way ANCOVA analysis using the individual ancestry estimate (M), with the locus being tested excluded, as a covariate.
⁵ Same as 3, except that M is based on all 15 markers.
⁶ Bayesian admixture mapping one-sided probability.

Pairwise population comparison of SNP discrimination
Table 4 shows the genotyping results and a statistical analysis that demonstrate several distinct but important points (the sequence for each AIM in Table 4 can be found by reference to Table 6 using the marker number).

(Table 4) Pairwise population comparison of SNP discrimination

"Delta" (δ) values are shown for selected AIMs from the claimed list.
The unique AIM identifier is shown in the last column.
Cells with numbers in bold (unshaded) indicate good δ values; cells with numbers in bold and shading indicate very high δ values.
AF - African, CT - European, EA - East Asian, SA - South Asian, ME - Middle Eastern, PI - Pacific Islander, AI - Native American.

First, Table 4 was derived by screening several hundred candidate AIMs selected electronically from public databases, and thus demonstrates that only a small number of the candidate AIMs from public databases are true AIMs. As discussed above, public SNP databases were screened electronically to find good candidate AIMs (frequency data are provided for three "racial" groups, although the level of admixture in the three groups is not known). For this screening, 384 individuals of various continental and BGA origins were genotyped at each of these sites: 70 African samples collected from Nigeria and the Congo; 65 European samples collected from Northern Europe; 70 East Asian samples collected from recent immigrants to San Francisco, CA; 35 Middle Eastern samples collected from Turkey; 35 South Asian samples collected from India; and 25 Pacific Islander samples collected from the Philippines and US Samoa.

Table 4 shows a sampling of the data for about 70 AIMs that passed the screening process, out of 175 candidate AIMs screened. The delta (δ) value is a measure of how well the sequence at a polymorphism allows membership in one group or the other to be predicted; that is, of how different two populations are with respect to the sequence at that polymorphism. δ values are shown for 69 of the 175 AIMs; the remaining 105 had δ values of 0 for each of the pairwise population comparisons and were therefore not true AIMs. AIM1068 in Table 4 is representative of this type of failure (zero across all pairwise comparisons; a few AIMs in Table 4 have zeros across the population pairs shown, but this is because they are informative for populations not shown in this particular table). This result confirms that most of the candidate AIMs culled from public databases are not true AIMs, and underscores the value of the present invention.
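The δ value for a pair of populations can be sketched as the absolute difference in the frequency of one allele of the SNP, in line with the table footnotes. The frequencies, the three-population subset, and the 0.4 cut-off for a "good" δ below are hypothetical; the text does not state the thresholds used for the bold or shaded cells.

```python
from itertools import combinations

def pairwise_delta(allele_freqs):
    """Return {(pop_a, pop_b): |p_a - p_b|} for every pair of populations."""
    pops = sorted(allele_freqs)
    return {(a, b): abs(allele_freqs[a] - allele_freqs[b])
            for a, b in combinations(pops, 2)}

def informative_pairs(deltas, threshold=0.4):
    """Population pairs whose delta meets a (hypothetical) cut-off."""
    return [pair for pair, d in deltas.items() if d >= threshold]

# Hypothetical allele frequencies for two candidate markers.
candidates = {
    "candidate_1": {"AF": 0.85, "CT": 0.10, "EA": 0.15},
    "candidate_2": {"AF": 0.30, "CT": 0.30, "EA": 0.30},
}

for name, freqs in candidates.items():
    deltas = pairwise_delta(freqs)
    print(name, deltas, informative_pairs(deltas))
```

In this toy example the first candidate separates Africans from the other two groups but not Europeans from East Asians, while the second has δ = 0 for every pair and would be discarded, the failure type the text attributes to AIM1068.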

A relatively large investment in genotyping and analysis was required to identify which candidate AIMs were true AIMs. This step could be bypassed, for example, by simply genotyping samples at 100 candidate AIMs and extracting the data from those that prove to be true AIMs, but genotyping is an expensive procedure, and the waste incurred would make the test economically impractical. To develop an economical and practical test for ancestry proportions, the test must interrogate a large number of true AIMs. Many publicly available candidate AIMs, although they provide some information, are not true AIMs because the allele frequencies in public databases are far from reliable, for example because of small sample sizes. In a random collection of SNPs, the frequency of true (i.e., validated), correctly characterized (i.e., with population-specific frequencies known with certainty) AIMs is expected to be about 5%, whereas the frequency of true AIMs in a culled set of candidate AIMs is about 50%; after proceeding as disclosed herein, the frequency of true AIMs in the collection of SNPs is 100%.

Second, the results in Table 4 demonstrate that some of the AIMs are suited to resolving African versus European ancestry, others to resolving Native American versus African ancestry, and so on. Although they were selected on the basis of European/African/Asian allele frequency differences, some of the AIMs provide good discrimination of other groups, such as Pacific Islanders, South Asians, and Middle Easterners. This type of information can only be learned by genotyping larger samples, and a test for ancestry proportions must go through this step to be accurate (for example, if the test worked only in the three dimensions afforded by the publicly available data, namely European, African, and Asian, the results obtained for Hispanics would have been ambiguous). The panel of SNPs in Table 4 provides a well-balanced mix of AIMs with resolving power for each of the possible pairwise comparisons among the seven population groups, and this panel is expected to constitute a good test for ancestry proportions. Data for South Asians, Middle Easterners, and Pacific Islanders do not exist in the public databases and were therefore generated for these studies. By comparison, an attempt to develop a test for ancestry proportions in seven dimensions simply by sequencing at candidate AIMs selected at random from public SNP databases (i.e., without selection through data generation) would require compiling a collection of many thousands of SNPs to obtain a panel like the one in Table 4, because certain pairs of populations are difficult to resolve (for example, South Asians and Europeans, who together make up the larger Indo-European group united by a common parent language).

The results obtained using the panel of AIMs shown in Table 4 are presented in Table 5. The presently disclosed algorithm (see Example 6, Table 12) was used to calculate the proportions for a group of 96 individuals who reside in the southeastern United States and describe themselves as Caucasian.

(Table 5) Ancestry proportions for a number of self-reported Caucasians

Ancestry percentages are given for each group: EUR - Indo-European, NAM - Native American, EAS - East Asian/Pacific Islander, AFR - African. For each individual, the self-reported race of the individual (SELF), mother (M), father (F), maternal grandmother (MGM), maternal grandfather (MGF), paternal grandmother (PGM), and paternal grandfather (PGF), and the country of birth, are shown.

The results in Table 5 demonstrate that most people who describe themselves as Caucasian are indeed of predominantly Indo-European ancestry as measured using the BGA test, i.e., using the panel of markers in Table 4 and the algorithm (Example 6, Table 12). About 40% of these Caucasians were measured as being of 100% European ancestry, with no admixture, whereas 60% showed detectable admixture. As a critical evaluation, the proportions obtained can be compared with self-reported admixture. For the individuals in Table 5 who claim that all of their parents and grandparents were unadmixed Caucasians ("ca" across all columns), the rate at which 90% or greater European ancestry is found is higher for those who report no admixture in their lineage (55%) than for those who report admixture in their lineage (35%). The fact that half of the people reported no admixture in their lineage shows how little most people know about their BGA, at least in anthropological as opposed to geopolitical terms.

The public databases used small numbers of samples for each of three groups (African American, European, and Asian). Consequently, the actual allele frequencies of the claimed SNPs were uncertain from the public databases and were determined with accuracy only in the present study. Moreover, the use of African Americans as a parental group is prone to error because, as disclosed herein, they are an admixed population (between Africans and Europeans). The best way to find SNP markers useful for the present methods is to genotype a large number of samples from the major BGA groups of the world at all SNPs whose minor allele frequencies clearly differ between at least two of the groups, calculate the δ values, and rank them.

Family Tree
The BGA method of using SNPs (AIMs) to measure ancestry proportions was applied to a family pedigree consisting of a father, a mother, and their three children (see Figures 6 and 7). The man is mostly European and his wife is Mexican, so if the test is accurate their three children should plot somewhere between the man and his wife, and this is exactly what was observed. On the paternal side of the father's pedigree, three of his father's grandparents were relatively pure Greek/European and one was almost pure Cherokee. All four of the grandparents on his mother's side were of mixed European origin. Plots for the mother (Figure 6B) and the father (Figure 6A) are shown. By drawing, from each vertex, a line that bisects the opposite side of the triangle (the vertex representing 100% and the opposite side 0%), it can be seen that the father is about 85% European, 11% Native American, and 4% African (he had no detectable East Asian, South Asian, or Pacific Islander ancestry). The 11% Native American ancestry appears to have come from his father's grandparents. Given that 7/8 of his great-grandparents were of European mix and 1/8 was Native American, the expected percentage is approximately 12% (1/8, or 12.5%), which agrees well with the data produced by the test. The mother is from Mexico and is Hispanic. As discussed previously, Hispanics are an admixed population with European and Native American contributions. The mother can be seen here to have 11% European, 76% Native American, and 13% African ancestry (no detectable East Asian, South Asian, or Pacific Islander ancestry).

Because each of the three children received one chromosome of each pair from the mother and one from the father, each child should plot somewhere between the father and his wife. Using the method, the children plot as expected (Figure 7): it is clear from these results that the children's point estimates lie between those of the parents. Child 1 (Figure 7A; 80% European, 18% Native American, 2% African) has ancestry proportions more similar to the father's than to the mother's, whereas child 2 (Figure 7B; 61% European, 31% Native American, 8% African) and child 3 (Figure 7C; 54% European, 37% Native American, 9% African) have ancestry proportions roughly midway between those of the two parents.

Each child receives one chromosome of each pair from the mother and one from the father, yet the children differ. The mother, although her chromosomes are mostly of Native American ancestry, still carries some of European and African type, and the father, although his chromosomes are mostly of European type, still carries some of Native American type. When a child is conceived, he or she receives a chromosome from each parent, but which of the mother's two chromosomes the child receives is random (i.e., "independent assortment"). Because some of the mother's chromosome pairs include a member of European type, some children receive a "European-type chromosome" and others do not. From this study it is clear that child 1 (Figure 7A) received more of the mother's European-type chromosomes than did the other two children. Thus each child is a 50/50 mix of the mother's and father's chromosomes, but the children's ancestry proportions are a unique and random function of those of their parents.
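
The effect of independent assortment on sibling ancestry proportions can be illustrated with a small simulation. The following is a sketch only: it assumes that each chromosome carries a single ancestry label (recombination is ignored), it uses a hypothetical genome of 22 chromosome pairs, and the parental label assignments are invented for illustration and are not data from the family described above.

```python
import random

def conceive(mother, father, rng):
    """Child receives one randomly chosen chromosome of each pair from each parent."""
    return [(rng.choice(m_pair), rng.choice(f_pair))
            for m_pair, f_pair in zip(mother, father)]

def proportions(person):
    """Fraction of a person's chromosomes carrying each ancestry label."""
    labels = [label for pair in person for label in pair]
    return {label: labels.count(label) / len(labels) for label in sorted(set(labels))}

# Hypothetical parents: 22 chromosome pairs each, one ancestry label per chromosome.
mother = [("NA", "NA")] * 17 + [("NA", "EU")] * 5   # mostly Native American type
father = [("EU", "EU")] * 17 + [("EU", "NA")] * 5   # mostly European type

rng = random.Random(0)  # fixed seed so that the run is repeatable
children = [conceive(mother, father, rng) for _ in range(3)]
for number, child in enumerate(children, start=1):
    print(number, proportions(child))
```

Under these assumptions every child falls between the two parents, but siblings generally differ from one another because each draws a different random subset of the minority-type chromosomes.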

Example 2
Four-Way Admixture Estimation of Ancestry
This example demonstrates that a four-way admixture BGA test provides the same results as those obtained using a three-way BGA test.

As indicated above, biogeographical ancestry (BGA) is the heritable component of race. Because the sociocultural and geopolitical metrics used to measure human populations are human-made rather than natural constructs, their use in genetics research makes it difficult to control for the genetic structure of populations and can obscure important correlations between BGA and human biology. This example provides methods and compositions for accurately measuring genetic structure within an individual. The human genome was screened for candidate ancestry informative markers, which were validated on an ultra-high throughput genotyping platform and used to establish parental population allele frequencies. Using 71 of the most informative AIMs, which cover most of the chromosomes (Table 6), and coalescing the human population into four main continental population groups (sub-Saharan African, East Asian, Indo-European, and Native American), the MLE method was used to determine individual BGA admixture proportions and their associated confidence intervals. As disclosed herein, self-reported population affiliation correlated almost perfectly with the majority BGA population affiliation measured in a set of 2,024 international samples. BGA admixture was surprisingly frequent and, when observed, was generally consistent with anthropological and geopolitical history. The admixture proportions obtained tracked through family pedigrees in a manner consistent with the law of independent assortment, and simulations showed that the markers relevant to determining group affiliation functioned independently within the inventors' algorithm. Because a large number of high-δ markers were used, the test was surprisingly robust; reasonable levels of simulated allele frequency error, such as could be caused by biased parental sampling, had no significant effect on the measured BGA proportions. These results demonstrate that BGA admixture can be reliably measured from a DNA sample.

Sample collection:
Parental samples: To establish parental group allele frequencies, 100 relatively homogeneous descendants of four broadly defined human population groups were genotyped. These four groups correspond to a coalescence of a simplified human pedigree traced back in time to the point at which populations were relatively isolated in the continental regions of sub-Saharan Africa (sub-Saharan African), Europe and the Middle East (Indo-European), North/South America (Native American), and East Asia (East Asian). In essence, the extremes of extant populations were taken according to physical characteristics and known human migration patterns, and the human pedigree was simplified by assuming that, because the remainder of populations exhibit physical characteristics along a continuum defined by these extremes, all human societies arose from radiation within, and admixture between, these four main continental groups.

Collection efforts focused on individuals who resided in each region and who, by self-described "race," had relatively homogeneous affiliation with the descendants of each group; each subject exhibited the strong physical appearance associated with descendants of the group and reported homogeneous affiliation with that group. There is no de facto ultimate arbiter of BGA affiliation or race, but collecting without regard to ethnicity can introduce systematic error into the frequency estimates for a given group if systematic admixture is a function of ethnicity. Where possible, samples were collected from as wide a variety of ethnicities within each parental group as possible, in the expectation that admixture within each parental sample would balance out. Although collecting from individuals of known homogeneous affiliation would be better, if sampling is not biased with respect to admixture or other population structure, the admixture present in samples that are practical to collect tends to reduce the power of the test rather than introduce systematic bias. The presence of Hardy-Weinberg equilibrium for each marker within each BGA group was relied upon as an indication that a reasonably good sample had been obtained. Sub-Saharan African samples were collected in Nigeria and the Congo, Africa; European samples were collected from various locales in the United States; East Asian samples were collected from Japan and China; and Native American samples were collected from "Nativos" living in remote regions of southern Mexico. All samples were collected under IRB guidelines for the purpose of genetic studies of human population variation.
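
The Hardy-Weinberg check mentioned above can be illustrated as follows. The text does not state which statistic was used at this step, so the sketch substitutes the common textbook chi-square goodness-of-fit test for a biallelic marker; the function name and the genotype counts are hypothetical.

```python
def hwe_chi_square(n_aa, n_ab, n_bb):
    """Chi-square statistic (1 degree of freedom) comparing observed genotype
    counts with Hardy-Weinberg expectations for a biallelic marker."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)          # frequency of allele A
    q = 1.0 - p                              # frequency of allele B
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)

CRITICAL_5PCT_1DF = 3.841  # chi-square critical value for alpha = 0.05, 1 d.f.

# Hypothetical counts for one marker within one parental group.
statistic = hwe_chi_square(25, 50, 25)
print(statistic, statistic < CRITICAL_5PCT_1DF)
```

A statistic below the critical value gives no evidence against equilibrium for that marker in that group. For small samples or rare alleles an exact test would usually be preferred over this approximation.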

Experimental samples: After reading and signing an approved IRB consent form, subjects completed a biographical questionnaire and provided either a buccal swab or 4 ml of blood. On the questionnaire, subjects described themselves, their mother, their father, and their maternal and paternal grandparents as belonging to the group "African," "American Indian," "Asian," "Caucasian," "Hispanic," or "Other," with the option of reporting "do not know" for each family member. Digital photographs were taken of a subset of subjects; explicit permission was obtained from those subjects whose photographs are provided. DNA was extracted from circulating lymphocytes or buccal swabs using a commercial kit (Qiagen), and genotyping used a primer extension protocol on the SNPstream 25K ultra-high throughput (UHT) genotyping system (Orchid Biosciences).

Estimation of Biogeographical Ancestry (BGA)
A software program was written, based on the algorithm of Hanis et al. (supra, 1986), to determine the maximum likelihood estimate (MLE) of an individual's BGA admixture from multilocus AIM genotypes (see also Example 6; Table 12). The delta (δ) value expresses the ancestry informativeness of a marker (Dean et al., supra, 1994). For a biallelic marker, the frequency difference δ equals p_x - p_y (which equals q_y - q_x), where p_x and p_y are the frequencies of one allele in populations X and Y, and q_x and q_y are the frequencies of the other allele. The MLD exact test was used to test for departures from independence of allelic state within and between loci (Zaykin et al., Genetica 96:169-178, 1995).
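
The equality of the two expressions for δ follows from the fact that the two allele frequencies at a biallelic marker sum to one in each population (q = 1 - p). The numerical values in the comments are illustrative only and are not taken from Table 6.

```latex
\[
\delta \;=\; p_x - p_y \;=\; (1 - q_x) - (1 - q_y) \;=\; q_y - q_x
\]
% Illustrative values: p_x = 0.90, p_y = 0.20  =>  \delta = 0.70
% Equivalently:        q_x = 0.10, q_y = 0.80  =>  q_y - q_x = 0.70
```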

The collection of 71 AIMs used in this example was selected to maximize the cumulative δ value within each of the six possible pairs of the four-dimensional problem (sub-Saharan African, Native American, Indo-European, and East Asian) and to minimize the differences in cumulative δ value among the pairs. The algorithm inverts the population-specific allele frequencies to obtain likelihood estimates of proportional affiliation corresponding to a multilocus genotype, using three groups at a time (primarily for computational convenience, and because four-way admixture is likely to be relatively rare). For example, the likelihood of 100% Indo-European, 0% Native American, 0% East Asian is calculated, then that of 99% Indo-European, 1% Native American, 0% East Asian, and so on until all possible Indo-European, Native American, and East Asian proportions have been considered; the process is then repeated for all possible Indo-European, Native American, and African proportions, and for all possible Native American, African, and East Asian proportions. The proportions with the greatest likelihood are selected as the maximum likelihood estimate (MLE).
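
The exhaustive search described above can be sketched for a single three-group combination as follows. This is an illustration under stated assumptions, not the program of Example 6: it assumes the commonly used admixture likelihood in which the expected allele frequency at each locus is the admixture-weighted mean of the parental frequencies and genotypes follow Hardy-Weinberg proportions. Whether this matches the formulation of Hanis et al. in every detail is not established here. All function names and frequencies are hypothetical.

```python
import math

def log_likelihood(genotypes, freqs, m):
    """genotypes[l]: copies (0, 1 or 2) of the reference allele at locus l.
    freqs[l]: reference-allele frequencies in the three parental groups.
    m: admixture proportions for the same three groups (summing to 1)."""
    total = 0.0
    for g, f in zip(genotypes, freqs):
        p = sum(mk * fk for mk, fk in zip(m, f))   # admixture-weighted frequency
        p = min(max(p, 1e-6), 1.0 - 1e-6)          # guard against log(0)
        total += (math.log(math.comb(2, g))
                  + g * math.log(p) + (2 - g) * math.log(1.0 - p))
    return total

def grid_mle(genotypes, freqs, step=1):
    """Evaluate every (a, b, c) with a + b + c = 100 in `step` percent increments
    and return the proportions with the greatest log-likelihood."""
    best, best_ll = None, -math.inf
    for a in range(0, 101, step):
        for b in range(0, 101 - a, step):
            m = (a / 100, b / 100, (100 - a - b) / 100)
            ll = log_likelihood(genotypes, freqs, m)
            if ll > best_ll:
                best, best_ll = m, ll
    return best, best_ll

# Hypothetical data: four loci at which the first group differs from the other two.
freqs = [(0.95, 0.05, 0.05)] * 4
genotypes = [2, 2, 2, 2]
mle, mle_ll = grid_mle(genotypes, freqs)
print(mle)
```

In the scheme described in the text, a search of this kind would be run once for each three-group combination, and the overall maximum would be reported as the MLE.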

When a single MLE is plotted on a triangle plot, the space in which the likelihood is within 2-fold, 5-fold, and 10-fold of the MLE is delimited (these intervals are not plotted when multiple MLEs are shown on a single triangle plot). To calculate the MLE using all four BGA groups together, the procedure was carried out in exactly the same manner; instead of three possible three-way BGA combinations, only one four-way BGA combination is possible. All MLEs described in this example were calculated using the three-way calculation scheme. Alternative versions of this type of test are possible, for example using different AIMs and different parental groups corresponding to different coalescences of the human pedigree, and would be expected to provide results that are meaningful with respect to anthropological time scales different from those presented here.
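
Given log-likelihoods already evaluated on a grid of admixture proportions, the 2-fold, 5-fold, and 10-fold regions can be delimited as in the following sketch. The function name and the likelihood values are hypothetical. The only assumption is that "within k-fold of the MLE" means a likelihood no smaller than the maximum divided by k.

```python
import math

def within_fold(log_liks, fold):
    """log_liks: {proportions: log-likelihood}. Returns the grid points whose
    likelihood is no more than `fold` times smaller than the maximum."""
    cutoff = max(log_liks.values()) - math.log(fold)
    return {m for m, ll in log_liks.items() if ll >= cutoff}

# Hypothetical likelihoods relative to the maximum, along one edge of the triangle.
log_liks = {
    (1.0, 0.0, 0.0): math.log(1.00),
    (0.9, 0.1, 0.0): math.log(0.60),
    (0.8, 0.2, 0.0): math.log(0.30),
    (0.7, 0.3, 0.0): math.log(0.15),
    (0.6, 0.4, 0.0): math.log(0.05),
}
regions = {fold: within_fold(log_liks, fold) for fold in (2, 5, 10)}
for fold, points in regions.items():
    print(fold, sorted(points, reverse=True))
```

Each region contains the previous one, so plotting their boundaries yields nested contours around the MLE.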

At the time the database was screened, The SNP Consortium (TSC) had contributed data on about 27,000 SNPs for which frequencies were available in three populations (African American, European American, and East Asian). This database was screened for candidate AIMs, that is, SNPs with δ>0.40 between any two of the four continental population groups (see Example 1; see also Shriver et al., supra, 1997). Parental samples of sub-Saharan Africans (AA), Indo-Europeans (IE), East Asians (EA), and Native Americans (NA) were screened for each of the 200 candidate AIMs with the largest δ values, and of these, 71 were validated as true AIMs (i.e., true SNPs) having a minor allele frequency greater than 1% and δ>0.40 for at least one pair of groups. The 71 AIMs are shown as SEQ ID NOS: 1-71; the top 100 candidate AIMs for pairs of groups were as follows: EA vs. AA (SEQ ID NOS: 7, 21, 23, 27, 45, 54, 59, 63, and 72-152); EA vs. IE (SEQ ID NOS: 3, 8, 9, 11, 12, 33, 40, 59, 63, and 153-239); and IE vs. AA (SEQ ID NOS: 1, 8, 11, 21, 24, 40, 172, and 240-331). It should be noted that some AIMs identified by one pairwise comparison can also be AIMs for a second pairwise comparison (e.g., SEQ ID NO: 59 was identified in both the EA vs. AA and the EA vs. IE comparisons), although such AIMs are the exception. In addition, many of the 71 AIMs are not in the lists of top 100 candidate AIMs shown for any pair (but were among the top 200 candidate AIMs); candidate AIMs were not used, for example, because they did not genotype well owing to the type of SNP or the amplification parameters used for the exemplified platform, or for other reasons as disclosed herein.
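
The screening criteria stated above (δ>0.40 for at least one pair of groups and a minor allele frequency greater than 1%) can be sketched as a filter followed by a ranking. The SNP names and frequencies are hypothetical, and computing the minor allele frequency from the unweighted mean across groups is an assumption of this sketch, since the text does not say how the frequency was pooled.

```python
from itertools import combinations

DELTA_MIN = 0.40   # δ threshold stated in the text
MAF_MIN = 0.01     # minor allele frequency threshold stated in the text

def max_pairwise_delta(freqs):
    """Largest absolute allele frequency difference over all pairs of groups."""
    return max(abs(freqs[a] - freqs[b]) for a, b in combinations(freqs, 2))

def pooled_maf(freqs):
    """Minor allele frequency from the unweighted mean across groups (assumption)."""
    mean_p = sum(freqs.values()) / len(freqs)
    return min(mean_p, 1.0 - mean_p)

def screen(candidates):
    """Keep candidates passing both thresholds, ranked by decreasing δ."""
    kept = [(name, max_pairwise_delta(f)) for name, f in candidates.items()
            if max_pairwise_delta(f) > DELTA_MIN and pooled_maf(f) > MAF_MIN]
    return sorted(kept, key=lambda item: item[1], reverse=True)

# Hypothetical reference-allele frequencies in the four parental groups.
candidates = {
    "snp_a": {"AA": 0.90, "IE": 0.20, "EA": 0.25, "NA": 0.60},
    "snp_b": {"AA": 0.50, "IE": 0.45, "EA": 0.55, "NA": 0.40},
    "snp_c": {"AA": 0.10, "IE": 0.15, "EA": 0.70, "NA": 0.65},
}
ranked = screen(candidates)
print(ranked)
```

Here snp_b is dropped because no pair of groups differs by more than 0.40, and the remaining candidates are ordered by their largest pairwise δ.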

(Table 6) Pairwise δ values for the 71 AIMs used in the BGA test

AF, sub-Saharan African; CT, Indo-European; EA, East Asian; AI, Native American. The unique AIM identifier is shown together with the GenBank accession number assigned on the authors' submission of the SNP sequence to the NCBI dbSNP database (AIM). The number of AIMs with δ>0.40 for each pairwise comparison is shown at the bottom of the list.

The 71 AIMs used in the exemplified panel were spread across 21 of the 23 autosomal chromosomes (Figure 8), with the average chromosome containing 3 AIMs (see Table 6). Each had alleles in Hardy-Weinberg equilibrium, both overall with all four BGA groups considered together and within each BGA group, and none were found to be in linkage disequilibrium with one another. The software program used an individual's genotypes for these AIMs in the maximum likelihood algorithm (see Example 6, Table 12; see also Example 1). The use of the 71 markers with this algorithm provides another example of a "BGA test."

The BGA test was used to calculate BGA admixture proportions for the parental Native American, African, and Indo-European samples that had been used to construct the test. After the admixture proportions were calculated for each sample, they were plotted on a triangle plot, which allows the relative proportions of a three-way admixture to be represented in two dimensions. Because these were the same samples that constitute the parental groups and from which the population allele frequencies were derived, they were expected to show relatively homogeneous BGA (i.e., low admixture); indeed, the sub-Saharan African, Native American, and European parental samples all showed relatively homogeneous BGA (i.e., they plotted toward the appropriate vertices of the BGA triangle).

The BGA test was next used to measure BGA proportions for 1,186 individuals of self-reported race (43 African Americans, 1,120 Caucasians, and 23 Hispanics). Of these individuals, 306 (26%) showed homogeneous BGA (100% for any one group). Of the 1,186, 101 (8.5%) had >5% BGA affiliation with three groups, indicating that for these individuals a modification of the software to perform the four-group calculation may be more appropriate, and that the majority of samples were characterized by two-way admixture. Significantly greater European admixture was identified in African Americans than in Nigerians (visualized as a dispersion of points away from the sub-Saharan African vertex). In contrast, the Indo-European samples plotted as tightly at the Indo-European vertex as the parental samples had, although low levels of Native American or East Asian admixture were not uncommon; roughly 2/3 of subjects had detectable, though generally low-level, admixture of this kind. Hispanic subjects plotted with an even distribution along the Native American/Indo-European axis, consistent with the knowledge that Hispanics arose about 500 years ago from the admixture of colonial Europeans and resident Native Americans.

To determine whether, and to what extent, the majority proportions obtained with the BGA test corroborated self-reported race, BGA admixture proportions were calculated for 2,048 individuals of self-reported race, and the majority BGA measured by the test was compared, blindly (in the computational sense), with each individual's self-reported majority race. Very strong concordance was observed between the majority BGA group measured by the test and self-reported majority race (Table 7). Using the test, 1,252 of 1,252 self-described European Americans (US-born Caucasians) showed majority Indo-European BGA. Of 201 self-described African Americans, 191 showed majority sub-Saharan BGA, and the remaining 11 showed sub-Saharan BGA as the minority affiliation and Indo-European as the majority affiliation. Hispanics showed a roughly equal distribution of majority BGA between Indo-European and Native American, consistent with the results observed in the triangle plots and with the anthropological history of this group.

(Table 7) Comparison of majority biogeographical ancestry with self-reported race

* Minority proportion was Native American.
** Significant Native American as the second type.
*** Minority proportion was East Asian.

Even when unexpected results were obtained, such as the finding that one individual from southern Mexico was mostly European, they were more or less concordant in that the expected affiliation was present as the minority affiliation rather than absent altogether, and the majority affiliation (Indo-European) was plausible in light of the history of the region (colonized by the Spanish hundreds of years ago). In this particular case, the Native American proportion of ancestry was only slightly less than 50%. Not a single gross error was observed, such as a self-reported European American being classified as majority East Asian. The 11 self-described African Americans of majority Indo-European BGA might appear to be gross errors, but the results were likewise concordant: each of these African American samples showed roughly equal Indo-European/African proportions, suggesting admixture and consistent with the historical tapestry of the region from which the samples were taken rather than with test error.

To conduct a truly blind trial (as opposed to a test that is blind in the computational sense), the San Diego Police Department Crime Lab (SDPD) and the National Center for Forensic Science at the University of Central Florida (UCF) each submitted 10 buccal swabs of numerically coded identity for BGA testing. The BGA test was performed and the results were returned to SDPD and UCF, each of which independently evaluated the results and revealed the self-reported population affiliations of the samples. The majority percentages measured by the BGA admixture proportion test were consistent with the self-reported population affiliations (see Table 8). Several of the samples came from individuals affiliated with groups that would logically be expected to be admixed, for example Filipinos (SDPD2, SDPD3; Table 8), African Americans or Caribbeans (SDPD5, SDPD6, UCF7, UCF8; Table 8), Mexican Americans (Hispanic; SDPD8, SDPD10; Table 8), and Puerto Ricans (UCF6; Table 8); significant admixture was detected for each of these samples. Moreover, the type of admixture detected made sense with respect to the anthropological history of the population of affiliation. For example, people of sub-Saharan African, Native American, and Indo-European ancestry live in Puerto Rico, whereas East Asians are relatively rare, and the test results for the Puerto Rican individual tested showed Indo-European and Native American, but no East Asian, admixture.

In addition to determining the MLE of admixture, the software program was designed to survey the probability space and define the space within which the likelihood of proportional affiliation is 2-fold, 5-fold, and 10-fold less likely than the MLE to be the correct answer (confidence contours, plotted on the triangle plot as rings centered on the MLE). One way to test the accuracy of the maximum likelihood algorithm in determining the MLE and confidence contours was to observe whether, and by how much, these values changed when a number of AIM markers were eliminated from the analysis by replacing the genotype for each with a "failed" reading. For example, if, for a given sample genotype, all of the genotypes for markers of high δ value for African/East Asian discrimination were replaced with "failed" or "no data," an accurate test would be expected to show confidence contours that are distorted only in this dimension of the triangle plot. Accordingly, the BGA test was used to plot one sample of majority East Asian BGA with its associated confidence intervals (Figure 9A); all 24 of the markers in the test with informative δ for Native American versus East Asian BGA were then eliminated, and the MLE and confidence estimates were recalculated with the remaining AIMs (Figure 9B). On recalculation with the missing AIMs, the confidence rings were dramatically skewed from the East Asian toward the Native American BGA vertex (Figure 9B), indicating that, as expected, the loss of the AIMs with good δ values for East Asian/Native American discrimination produced a less confident estimate along the East Asian/Native American axis. The shift in the MLE itself was minimal, presumably because the sample was classified as majority East Asian and the AIMs for discriminating East Asian/Indo-European and East Asian/African affiliation were left undisturbed; most of the uncertainty along the East Asian/Native American axis was evident in the shift of the contours. Similar experiments with other samples and AIMs produced similar results.
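
The marker-masking experiment can be illustrated with a likelihood that simply skips failed loci. The sketch assumes the same admixture-weighted, Hardy-Weinberg likelihood as is commonly used for this purpose (the genotype-count constant is omitted because it does not depend on the proportions); the group order, frequencies, and genotypes are hypothetical and much simpler than the 71-marker panel.

```python
import math

def log_lik(genotypes, freqs, m):
    """Log-likelihood of proportions m; loci with genotype None ("failed") are skipped.
    The binomial constant is omitted because it is the same for every m."""
    total = 0.0
    for g, f in zip(genotypes, freqs):
        if g is None:
            continue
        p = sum(mk * fk for mk, fk in zip(m, f))
        total += g * math.log(p) + (2 - g) * math.log(1.0 - p)
    return total

# Hypothetical group order: (East Asian, Native American, Indo-European).
ea_na_loci = [(0.9, 0.1, 0.1)] * 3    # separate East Asian from Native American
other_loci = [(0.9, 0.9, 0.1)] * 3    # do not separate those two groups
freqs = ea_na_loci + other_loci

full = [2] * 6                        # East Asian-like genotype at every locus
masked = [None] * 3 + [2] * 3         # East Asian/Native American loci "failed"

east_asian, native_american = (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)
gap_full = log_lik(full, freqs, east_asian) - log_lik(full, freqs, native_american)
gap_masked = log_lik(masked, freqs, east_asian) - log_lik(masked, freqs, native_american)
print(gap_full, gap_masked)
```

With all loci present the East Asian vertex is strongly favored over the Native American vertex; once the discriminating loci are masked the two vertices become equally likely, which corresponds to contours stretching along that axis of the triangle.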

(Table 8) Blind challenge of the BGA test by the San Diego Police Department and the National Center for Forensic Science at the University of Central Florida

To measure the reproducibility and consistency of BGA admixture proportion determination, five samples were genotyped and analyzed on separate occasions. One sample was selected from each of the self-reported majority affiliations with the European American, African American, Hispanic, and Asian groups, and a fifth sample was selected from the parental Native American group. Except for failed loci, the genotype at each marker in each individual was 100% concordant between runs, indicating that the AIMs genotype reliably. A variation of 1-3% in BGA admixture proportions was observed between some runs; simulation studies showed that the variation was attributable to these genotyping failures. This result indicates that failed loci did not significantly impair the reproducibility of BGA admixture determination for an individual, with respect to either majority BGA or level of admixture. From these simulations it was determined that the BGA test tolerates about 10 locus failures when a sample is of minimal binary admixture, and that a larger number of failed loci can be tolerated for samples without admixture (i.e., the change in admixture percentages was less than 5%).

To determine whether the BGA admixture proportions measured by the test were sensible given the laws of familial inheritance, proportions were calculated for three generations from several pedigrees. A typical result is shown in FIG. 10, which depicts the proportions obtained for a family of confirmed paternity (using STR testing) with substantial European/Native American admixture. The first-generation individuals were self-reported European Americans who were determined by the BGA test to carry significant Native American admixture, which was passed to their sons and daughters in different proportions, consistent with the law of independent assortment. The son's spouse was a Mexican-born Hispanic determined to be of 26% European/74% Native American admixture. Each of their children carried a level of Native American and Indo-European admixture roughly intermediate between those of the parents, again consistent with the law of independent assortment. One of the sons was classified as having a small percentage of East Asian ancestry, but the level (4%) was close to the high-confidence limit (about 3%) established as discussed above. The other pedigrees tested (n=8) showed similarly concordant results, indicating that the BGA test results were sensible within the context of pedigrees with respect to both majority BGA and admixture level.
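The pedigree check can be illustrated with a minimal sketch, assuming that the expected admixture of a child is the average of the two parental proportions (individual children scatter around this value because of independent assortment). The function name `expected_child_admixture` and the father's proportions are hypothetical; only the mother's 26%/74% values come from the text above.

```python
def expected_child_admixture(parent_a, parent_b):
    """Return the midparent admixture proportion for each ancestry group."""
    groups = sorted(set(parent_a) | set(parent_b))
    return {
        group: (parent_a.get(group, 0.0) + parent_b.get(group, 0.0)) / 2.0
        for group in groups
    }


# Mother: proportions reported in the text for the son's spouse.
mother = {"EU": 0.26, "NA": 0.74}
# Father: hypothetical values chosen only for illustration.
father = {"EU": 0.90, "NA": 0.10}

expected = expected_child_admixture(mother, father)
for group, value in expected.items():
    print(f"{group}: {value:.2f}")
```

With these illustrative inputs the midparent values are 58% European and 42% Native American; a child whose measured proportions lay far outside the parental range would be inconsistent with the pedigree.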

Because the BGA test relies on parental population allele frequencies, the extent to which the admixture proportions produced by the test were affected by bias in parental allele sampling was examined. A pair of populations was selected, and for each AIM relevant to determining affiliation between these two populations (Table 6; those with the highest δ values for this pair of groups were chosen), the allele frequency in one of the groups was adjusted so that the δ value of the AIM (with respect to these two groups) was reduced by 20%. In short, the power of the test to determine affiliation between a particular pair of groups was deliberately reduced by 20%; this handicapped test is referred to as the Ancestry 2.1EA/EU BGA test, where EA/EU denotes the reduced discrimination between the East Asian (EA) and Indo-European (EU) groups. Thirty-one samples were selected at random, their genotypes were run through the Ancestry 2.1EA/EU BGA test in exactly the same manner as for the original test (ANCESTRYbyDNA™ 2.0), and the results were compared with those obtained with the ANCESTRYbyDNA™ 2.0 test.
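One way such an adjustment could be made is sketched below. The text does not state how the frequencies were changed, so this sketch assumes that δ is the absolute difference in allele frequency between the two groups and that the frequency in one group is moved linearly toward the other; the function name `shrink_delta` and the example frequencies are hypothetical.

```python
def shrink_delta(p_fixed, p_adjust, reduction=0.20):
    """Move p_adjust toward p_fixed so that |p_fixed - p_adjust| shrinks.

    reduction=0.20 leaves 80% of the original delta. Because the result is
    a weighted average of two frequencies, it always stays within [0, 1].
    """
    if not 0.0 <= reduction <= 1.0:
        raise ValueError("reduction must lie between 0 and 1")
    return p_fixed + (1.0 - reduction) * (p_adjust - p_fixed)


# Hypothetical allele frequencies for one AIM in two parental groups.
p_ea = 0.90  # East Asian group, left unchanged
p_eu = 0.20  # Indo-European group, to be adjusted

p_eu_adjusted = shrink_delta(p_ea, p_eu)
delta_before = abs(p_ea - p_eu)
delta_after = abs(p_ea - p_eu_adjusted)
print(f"adjusted frequency: {p_eu_adjusted:.2f}")
print(f"delta before: {delta_before:.2f}, after: {delta_after:.2f}")
```

Here δ falls from 0.70 to 0.56, a 20% reduction, while the unadjusted group keeps its original frequency.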

If the results of the ANCESTRYbyDNA™ 2.0 test were highly sensitive to errors in parental allele frequencies caused by sampling bias, which can be expected to be on the order of a few percent at most, the 20% change introduced for the Ancestry 2.1EA/EU AIMs should yield admixture proportions substantially different from those of ANCESTRYbyDNA™ 2.0. The European/East Asian pair was selected for the first test (Ancestry 2.1EA/EU) because the number of Indo-European/East Asian admixtures observed was significantly larger than that of other types of admixture, such as African/Asian or Native American/African. The BGA group pair with the lowest number of AIMs and the lowest cumulative δ value was Native American/East Asian, and the δ values for this pair were altered in the second test (Ancestry 2.1NA/EA). The mean change in admixture proportions observed between ANCESTRYbyDNA™ 2.0 and Ancestry 2.1EA/EU was 1.4% (standard deviation 2.44%). For the Native American/East Asian pair, the mean change between ANCESTRYbyDNA™ 2.0 and Ancestry 2.1NA/EA was 1% (standard deviation 2.3%).

Sociocultural or self-held notions of race are unlikely to be neatly tied to human biology in the way that BGA, the heritable component of race, is. Because self-identified race is subjective, imprecise, and sometimes inaccurate, its use for inferring BGA, as currently practiced, obscures how and why human biology relates to human anthropology. Furthermore, the rigid binning of patients into preconceived racial groups is a thoroughly unsatisfactory exercise, because many individuals can trace their origins to multiple populations through the process of admixture. A repeatable, testable anthropological approach to defining BGA can provide a means of drawing out relationships between BGA and heritable disease, whether through direct correlation and/or through better study design, or through more subtle means such as gene mapping methods that rely on the admixture process, for example MALD.

As disclosed herein, the 71-marker test allowed BGA proportions and their confidence intervals to be determined. The test allowed the relative proportions of BGA within an individual to be determined, which distinguishes the BGA test from other tests previously used to infer ancestry from DNA. With respect to majority BGA affiliation, more than 2,200 tests were performed, and no result was obtained that conflicted with the self-held notion of race. Earlier ancestry tests were accurate only in the upper 90% range (Shriver et al., supra, 1997; see also Frudakis et al., supra, 2003, which is incorporated herein by reference). The improved performance observed with the BGA test may arise because CODIS and other STRs commonly used to infer ancestry from DNA were selected not for their δ values but for their polymorphic complexity in world populations. For the BGA test disclosed herein, the entire genome was systematically scanned and the best AIMs for this purpose were selected. Moreover, most attempts to infer ancestry from DNA using STRs or Alu sequences have tried to classify or bin samples into a single “racial” group. For individuals of extensive admixture, such as a 50/50 mix, such methods would appear to give the “wrong” answer as often as the “right” one. In contrast, the BGA test determines ancestry in terms of proportional affiliation and thereby ameliorates this problem.

The BGA test can be distinguished from other tests in that it uses SNPs covering most of the chromosomes. The whole-chromosome coverage of the BGA test provides a substantial advantage over tests using CODIS STRs, which cover only a small fraction of the chromosomes. In addition, the BGA method appears to be the first to quantify confidence limits for its answers. The BGA test as exemplified is broadly heuristic, dividing the world into four main anthropological groups that fall broadly along continental lines. Although this geographic partitioning respects the anthropological history of human migration, the use of four groups is a simplification of a very complex situation and may be considered arbitrary. Furthermore, the problem of determining proportional affiliation was simplified by computing the most likely three-way (rather than four-way) combination, because individuals of four-dimensional BGA are believed to be rare and because this is more convenient computationally. A more complex test might capture more of the actual detail of anthropological history; nevertheless, even a crude four-population test provides meaningful data with historical content, provided that the results are interpreted strictly with respect to these partitions and to the parental samples used to construct the test.
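The three-way simplification can be illustrated with a small sketch. It is not the algorithm of the disclosed test: it assumes a standard admixture likelihood in which the allele frequency of an admixed individual is the proportion-weighted average of the parental frequencies and genotypes follow Hardy-Weinberg proportions, and it searches a coarse grid of proportions for every combination of three of the four groups. All names (`log_likelihood`, `best_three_way`) and all allele frequencies are hypothetical.

```python
import math
from itertools import combinations


def log_likelihood(genotypes, freqs, groups, props):
    """Log-likelihood of genotypes (copies of allele 1: 0, 1, 2, or None)."""
    total = 0.0
    for locus, copies in enumerate(genotypes):
        if copies is None:
            continue  # failed locus contributes nothing
        p = sum(m * freqs[g][locus] for g, m in zip(groups, props))
        p = min(max(p, 1e-6), 1.0 - 1e-6)  # guard against log(0)
        weight = (1, 2, 1)[copies]  # binomial coefficient C(2, copies)
        total += (math.log(weight) + copies * math.log(p)
                  + (2 - copies) * math.log(1.0 - p))
    return total


def best_three_way(genotypes, freqs, step=0.05):
    """Grid-search every 3-group combination; return (loglik, groups, props)."""
    n = round(1.0 / step)
    best = None
    for groups in combinations(sorted(freqs), 3):
        for i in range(n + 1):
            for j in range(n + 1 - i):
                props = (i / n, j / n, (n - i - j) / n)
                ll = log_likelihood(genotypes, freqs, groups, props)
                if best is None or ll > best[0]:
                    best = (ll, groups, props)
    return best


# Hypothetical allele-1 frequencies for eight AIMs in four parental groups.
freqs = {
    "AF": [0.9, 0.1, 0.1, 0.1, 0.9, 0.1, 0.1, 0.1],
    "EA": [0.1, 0.9, 0.1, 0.1, 0.1, 0.9, 0.1, 0.1],
    "EU": [0.1, 0.1, 0.9, 0.1, 0.1, 0.1, 0.9, 0.1],
    "NA": [0.1, 0.1, 0.1, 0.9, 0.1, 0.1, 0.1, 0.9],
}
sample = [0, 2, 0, 0, 0, 2, 0, 0]  # made-up genotype typical of the EA group

loglik, groups, props = best_three_way(sample, freqs)
estimate = dict(zip(groups, props))
print(groups, props)
```

Restricting the search to three groups at a time keeps the grid two-dimensional, which is one reading of why the three-way calculation is described as computationally convenient; a finer `step` gives finer proportions at greater cost.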

Selecting particular parental groups and a simple partitioning of the world into a fixed set of groups gives meaning to the coalescence time against which the inferences provided by the test are evaluated. Indeed, the difference between the answers given by a test based on four world population groups and a more complex test based on 25 would be one of anthropological time scale, not of “accuracy”. For example, most U.S.-born Hispanics tested, and most individuals tested who claimed American Indian heritage, showed minority Native American admixture on an Indo-European background. Unlike the Hispanics, however, some of the individuals claiming American Indian heritage were classified as having minority East Asian admixture instead of Native American admixture. Because the founders of the Native Americans migrated from East Asia, probably in different waves at different times in history, the genetic distance between Native Americans and East Asians is smaller than that between Native Americans and sub-Saharan Africans or Indo-Europeans (Cavalli-Sforza and Cavalli-Sforza, supra, 1995). Among pre-colonial North Americans, proportional affiliation with East Asians or Native Americans would be expected to differ between individuals whose ancestors were part of the first wave and individuals whose ancestors were part of later waves.

The parental sample for Native Americans used in this study was drawn from southern Mexico, and the AIM allele frequencies established for Native Americans from this sample can be expected to be more representative of ancestors from the earlier waves of migration across the Bering Strait. Native Americans from Latin and South America are likely to be more closely affiliated with ancestors from the early waves than are those from North America, such as the Aleut Indians (and others), who probably are more closely affiliated with ancestors from later waves across the Bering Strait. The East Asian affiliation of those individuals claiming American Indian affiliation may be a by-product of the choice of southern Mexican Nativos as the parental source and of the use of only a four-group anthropological scheme; nevertheless, the answer is not a “wrong” answer in the scientific sense. Rather, it reports affiliation with respect to the coalescence time scale defined by the source of the parental samples and the anthropologically meaningful way in which the world was partitioned for this study.

A different test, with more markers capable of resolving affiliation among North American groups and distinguishing them from East Asians, would operate on a different coalescence time scale and would likely classify these individuals as having minority “American Indian” or “North American Native American” admixture. Nevertheless, the fact that the exemplified BGA test shows a significant number of individuals of “Native American” ancestry to have greater affiliation with East Asians than with Native Americans, as defined by the metrics built into the test, may be yet another example of how social or human-historical notions of population affiliation are semantic and subjective, and not necessarily accurate in any biologically meaningful way. Even though Aleuts may appear to resemble East Asians as much as, or even more than, they resemble most Native Americans with respect to physical characteristics, and even though they are native to a geographic location as proximal to East Asia as to temperate North America, most people consider them North American Indians and, by extension, Native Americans, because their home lies east of the Bering Strait. Similar examples are observed for certain other population groups, as discussed below, and illustrate the disconnect between measures of population affiliation obtained with genetic markers and what people have devised to assign racial identity from geographic and social boundaries.

Given the validation of the BGA test, it is of interest to compile the results in order to extract meaningful anthropological and/or sociological knowledge. Using the BGA test, 11 of 201 African Americans tested showed majority Indo-European and minority African BGA, and Puerto Ricans of majority African ancestry almost always described themselves as Hispanic, again pointing to deficiencies in the current notion of race as a dichotomous entity based on human constructs. As observed in the present study, Risch et al. (supra, 2002) and Rosenberg et al. (supra, 2002) showed that majority population affiliation is reported quite accurately on questionnaires when tested against methods that report BGA from genomic markers. The present testing of more than 2,000 individuals indicates that majority BGA affiliation can be accurately predicted from self-reported race, that disagreement between the two is not a significant occurrence, and that determination of majority ancestral affiliation is not a major problem for current self-reporting methods. Perhaps the most surprising result, however, was the extent of admixture in each population tested. When an individual claimed admixture, it was almost always confirmed by the BGA test; each case of expected sub-Saharan African and East Asian admixture was confirmed by the BGA test, and every Hispanic of Mexican descent showed, as expected, either majority or minority Native American admixture.

Slightly more than two-thirds of all “Caucasians” tested showed minority East Asian or Native American admixture, and virtually none of these individuals reported any significant ancestral admixture on his or her questionnaire. Some of this admixture appears to be a function of ethnicity. Not only did individuals of relatively homogeneous self-reported northern and eastern European heritage more commonly show East Asian BGA, but Rosenberg et al. (supra, 2002) showed that there is significant structure within “European American” or “European” populations, which supports this observation; specifically, they showed that Russians commonly exhibit minority East Asian heritage. There are many points in history at which such East Asian/Indo-European admixture could have become established, including, for example, the Mongol invasions of Europe and the expansion of Caucasoids into the Scandinavian peninsula, whose Lapp inhabitants are of North Asian origin, exhibit Mongolian features, and share a common history and culture with East Asians (Cavalli-Sforza and Cavalli-Sforza, supra, 1995). Some of the Native American admixture observed is consistent with history; for example, that Filipinos show extensive Native American admixture is not very surprising given that the Spanish conquered much of Latin America and exported Native American slaves to these islands, which were until recently a Spanish possession. The extent of Native American admixture observed for a considerable number of Filipinos was quite high, perhaps reflecting that Native American admixture is relatively common in this region of the world and that the genealogies of many Filipinos are dominated by a large number of individuals of relatively low Native American admixture rather than by recent admixture between individuals of highly polarized BGA proportions. The Native American admixture commonly observed in “Caucasians” derives from the mixing of European and Native American peoples in North America.

In most cases of systematic admixture, such as between Scandinavian/Russian Indo-Europeans and East Asians, sub-Saharan Africans/Indo-Europeans in the United States, Native American/East Asian admixture in the Philippines, or Indo-Europeans/Native Americans in the United States, the geographic proximity and/or historical mixing of the groups involved is well established by history. For example, African/East Asian admixture was rarely observed in these studies, and history records few instances in which these two populations lived in close proximity or mixed with one another. The types of admixture observed were also interesting when compared with self-held notions of “race”. For example, African Americans were admixed with Indo-European ancestry more than Caucasians were admixed with African ancestry, highlighting a difference in how Caucasians and Africans view their heritage and recalling the “one drop rule”. Given the extent of admixture observed, there is potential concern that hidden or cryptic BGA structure arising from the admixture process (a process that cannot be fully documented or quantified from the anthropological literature, because that literature is based on human constructs) causes crude (or fine) structural differences between groups of study samples. Such differences in structure would be expected to reduce the efficacy and power of large population-based study designs.

Given that, for more than 2,200 blinded subjects of polarized (low admixture) self-reported race, majority ancestry was consistent with the self-held notion of affiliation, the question arises of how the accuracy of minority admixture proportions such as these can be assured, and how such accuracy can be measured when there is no ultimate arbiter of BGA (except, perhaps, genealogical information; see below). Several experiments addressing this question were performed and, taken together, they indicate that minority proportions are measured accurately. First, minority admixture percentages are transmitted along pedigrees in a manner consistent with the genetic law of independent assortment. By definition, a large unbiased error would cause minority proportions to be inconsistent within the context of a pedigree; that is, assuming the parental proportions are correct, the result could not have arisen under the law of independent assortment.

Second, minority admixture proportions are, on average, consistent with self-held notions of admixture. If there were a large, systematic, and unbiased error in the estimation of BGA affiliation, then, because most individuals have relatively polarized BGA affiliation, this error would affect the integrity of the minor proportion percentages more strongly than that of the major proportion percentages, and the correlation of minor admixture proportion/presence with self-held notions of admixture would probably be very weak, if it were noticeable at all. This was not what was observed. For example, had the error rate been on the order of 20%, as many individuals reporting minority Hispanic ancestry would have shown small amounts of sub-Saharan African and East Asian ancestry as showed Native American ancestry; the present results clearly demonstrate that this is not the case.

Third, of approximately 2,200 blind-tested samples from North America, no individual was observed with majority East Asian ancestry and substantial (>10%) sub-Saharan African admixture, or vice versa. This lack of observation is relevant because such individuals are extremely rare in North America, where the validation samples were drawn. Had there been a large, unbiased, and systematic error rate, East Asian/African admixture would have been observed as often as the frequently observed European/Asian admixture.

Fourth, a highly significant correlation was observed in African Americans between minority Indo-European admixture and skin melanin content (see Example 1). Had there been a large, unbiased error rate, such a correlation would be unlikely to have been observed.

Fifth, when the true allele frequencies of the AIMs relevant to determining affiliation between two groups were adjusted so that the parental δ value of each AIM relevant to a given pair of groups was reduced by 20%, the overall power of the BGA test to determine proportional affiliation between these two groups was reduced, yet essentially the same results were obtained with respect to both majority affiliation and, more importantly, minority admixture estimates. Parental sampling bias can thus cause inaccuracy in parental allele frequency estimates and δ values, but given that the parental samples for this study comprised about 100 individuals, such inaccuracy is certainly less than 20%. Moreover, for such errors to matter, the errors in allele frequency estimates would have to lie in the same direction for most of the AIMs relevant to determining affiliation between a pair of groups. Nonetheless, this result indicates that even if such errors existed, the performance of the BGA test would be relatively unaffected. In other words, this experiment demonstrated that the BGA test is relatively robust to parental sampling bias and to errors in allele frequency estimates.

Sixth, the distortion of the confidence contours only along the axis corresponding to the markers for which genotyping failed indicates a clean separation among the interlocking components of the test (the subsets of AIMs). In other words, if a sample is fit against several templates to measure proportional fit, the elements of those templates should be independent for the test to be valid, and this is the result that was obtained.

In light of the six observations above, it is difficult to imagine how a significant systematic and unbiased error could exist. No single observation settles the question of accuracy by itself, but taken together the results provide ample evidence that there is little or no systematic and unbiased error in the disclosed BGA test. It could nonetheless be argued that the error in the test is not random but biased in a linear fashion. The fifth observation (above), however, argues against this possibility. For example, the finding of minority East Asian admixture on an Indo-European background was more frequent than had been expected, and such a result could arise if there were the “right” amount of allele frequency estimation error in the “right” number of markers in the “right” direction (an unlikely, but not impossible, situation). However, because the AIMs used in these studies are not mutually exclusive to particular pairs of groups, such an error would manifest itself in the determination of affiliation for many pairs, not just one, and the first four observations (above) indicate that this is not the case. Such an error would also seem to require that the parental samples were drawn from highly inbred groups. Much care was taken to avoid such sampling, but because no test exists by which this can be done a priori, not every element of the populations could be controlled. Accordingly, a simulation was performed to estimate the contribution of linear error to the test results (fifth observation, above). The fact that the admixture results were relatively resistant to a substantial (20%) reduction in δ values demonstrates that the quantity and quality of the AIMs used in this study provided sufficient power that reasonable levels of sampling error, as could be expected, did not have a significant deleterious effect on the quality of the results. With respect to marker quality, this result seems reasonable given the AIM selection process, because some of the best markers in the genome were used to measure BGA affiliation. With respect to marker quantity, these results agree with the observation that results from the 71-marker test disclosed herein were very similar to those from an earlier 30-marker test (Shriver et al., supra, 2003). Thus, reducing the number of markers while retaining average marker quality did not impair the test. Furthermore, in another study, reducing marker quality rather than quantity gave the same result. Taken as a whole, these observations indicate that the BGA proportions produced by the test are accurate with respect to the stated confidence intervals and that the test performs in a robust manner.

The present results demonstrate that BGA admixture is more common than previously thought. If this is true, the question should be asked whether fine levels of BGA admixture are linked to human biology, for example drug responsiveness or disease predisposition. Unlike coarse population structure, such "cryptic" structure can be measured only with molecular tests, because questionnaires cannot capture it. Fine levels of population structure beyond the coarser continental level measured with questionnaires are recognized. For example, significant anecdotal evidence suggests that red-haired persons require 20% higher doses of many common anesthetics and show a tendency toward hypertension and bleeding while under anesthesia (Cohen, The Scientist 16:10, 2002). These complex physiological responses would seem difficult to explain on the basis of the melanocortin-1 receptor (MC1R) variants previously linked to some of the variation in red hair color (Robbins et al., Cell 72:827-834, 1993; Smith et al., J. Invest. Dermatol. 111:119-122, 1998; Flanagan et al., Hum. Molec. Genet. 9:2531-2537, 2000). If there are specific gene variants likely to be responsible for these clinical phenotypes, and these variants are correlated with elements of population structure or substructure, as they appear to be, then any study attempting to identify linkage or LD between markers and the relevant phenotypically active loci would seem to be handicapped from the outset, at the study design stage. Because identifying the elements of structure that can interfere with the design of genetics experiments would seem to require greater precision and objectivity than self-reporting of socio-cultural race provides, the exemplified BGA test can be extended, using AIMs in addition to those disclosed herein, to quantify the elements of population structure relevant to this particular problem.

Example 3
Application of BGA testing to genealogy
This example demonstrates that BGA admixture estimates can be integrated with genealogical information obtained using traditional genealogical research methods.

Genealogists primarily collect data that are relevant in a geopolitical context (e.g., which countries a person's ancestors came from, what their religion was, and data on their surnames) rather than in an anthropological context (e.g., which types of population admixture characterize the person's family tree). Fractional admixture in an individual's results can arise from two main sources: 1) a recent exogamous admixture event; and 2) ancient affiliation with an ethnic group characterized by systematic admixture.

The results of an exogamous event are measured in recent genealogical time (e.g., the last 250 years). For example, as shown in Figure 11, a Chinese great-grandfather in an otherwise homogeneous Indo-European family tree gives rise to a great-grandson of mixed Indo-European/East Asian ancestry. Individuals who are 100% East Asian (Chinese) are shown shaded (Figure 11), and the admixture result of interest is that of the male (square) at the bottom of the family tree (short arrow). A person with one 100% East Asian great-grandfather and seven 100% Indo-European great-grandparents is expected to have 12.5% East Asian admixture. Because of the law of independent assortment, the expected level is in practice a range around 12.5%, with values a few percent higher or lower being possible. The grandmother indicated by the long arrow is about 50%/50% East Asian/Indo-European, and her daughter, the subject's mother, is expected to be 25%/75% East Asian/Indo-European (Figure 11).
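The expected values in the Figure 11 pedigree can be reproduced with a short calculation. The following is a minimal sketch, not part of the disclosed test: it assumes that each child's expected ancestry is simply the mean of the two parents' proportions, and the function name `child_expectation` and the group labels are illustrative only. Actual results scatter around these expectations because of independent assortment.

```python
from fractions import Fraction


def child_expectation(mother, father):
    """Expected ancestry proportions of a child: the mean of both parents.

    Each parent is a dict mapping a group label to a Fraction. This ignores
    the scatter introduced by independent assortment.
    """
    groups = set(mother) | set(father)
    return {
        g: (mother.get(g, Fraction(0)) + father.get(g, Fraction(0))) / 2
        for g in groups
    }


# Hypothetical labels for the two groups in the Figure 11 example.
EAST_ASIAN = {"East Asian": Fraction(1)}
INDO_EUROPEAN = {"Indo-European": Fraction(1)}

# The great-grandfather is 100% East Asian; every other founder is
# 100% Indo-European.
grandmother = child_expectation(INDO_EUROPEAN, EAST_ASIAN)  # 1/2 East Asian
mother = child_expectation(grandmother, INDO_EUROPEAN)      # 1/4 East Asian
subject = child_expectation(mother, INDO_EUROPEAN)          # 1/8 East Asian

print(float(subject["East Asian"]))  # 0.125
```

Exact fractions are used so that the halving at each generation is not blurred by floating-point rounding.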

Ancient affiliations (i.e., those viewed on an anthropological time frame) have been preserved into modern times by endogamous, relatively geographically isolated, close-knit community structures (i.e., ethnicities). For example, today's demographics have been shaped not only by the migrations our ancestors made to found new populations, but also by admixture between these populations throughout the world. The map below shows these migrations, which took place over tens of thousands of years, as determined from Y chromosome sequences.

Admixture between groups, after each had developed as a distinct group, occurred many times throughout ancient and much more recent history, and is represented by the arrows on the map that depict migration patterns. For example, there is extensive East Asian admixture in Russians and Eastern Europeans (Rosenberg et al., supra, 2003), and the extent to which the Mongol and Hun invasions may have contributed to this admixture over time remains a mystery. Even more pronounced East Asian admixture was evident for Native Americans (Rosenberg et al., supra, 2003); arrows for this admixture were not included on the map because far too many would have been required and most of the admixture events are unknown. Nevertheless, a person with a substantial number of Native Americans or Russians in the family tree could very well show as much East Asian admixture as an individual with one 100% Chinese grandmother and three 100% Indo-European grandparents.

Although not shown, a time scale was constructed to indicate when the most significant migrations occurred, and it corresponded to a very large family tree. The family tree is for a single individual located at the bottom vertex of a triangular graph; it is large because, going back 60,000 years, this person has tens of thousands of ancestors. The time scale for the migrations likewise applies to the large family tree. The family tree was the same as that shown in the pedigree chart (Figure 11), except that it was much larger and had no lines connecting the ancestors (each spot represented an ancestor, but there were so many that showing all the lines connecting the spots was impractical). One pool of spots represented "Russians", assumed for the purposes of this example to be an ethnic group that arose about 18,000 years ago. Additional spots represented East Asians, based on the assumption that the average Russian carries 10% East Asian admixture. A third set of spots represented the precursors of the Russians; these precursors are unknown, but for the purposes of this example were assumed to be Eastern Europeans.

In this example, most of the person's Russian ethnicity came from the left side of the family tree, which can be assumed to be the part of the tree representing the paternal side of the subject's family. As this example showed, if the average Russian carries 10% East Asian admixture and half of the person's family tree is predominantly Russian, then the person is expected to carry 5% East Asian admixture. The East Asian admixture is significant for this person even if no grandmother, grandfather or other relative within the past 18,000 years was a homogeneous East Asian. One way to visualize this on the family tree is to count all of the "East Asian" spots and divide by the total number of spots in the tree, which comes to about 5%. Thus, relatively homogeneous East Asians represented about 5% of the total number of ancestors for this person. Of course, the family trees of some people include multiple groups characterized by a small degree of this type of admixture. Family trees such as the one exemplified are polarized toward particular ethnicities, and it is rare to see a family tree with an even distribution of each of the four BGA groups (sub-Saharan African, Native American, Indo-European and East Asian) because until recently, and to some extent even now, people have tended to have children with others similar to themselves. As such, most family trees are not a "hodgepodge" of random affiliations, but are highly polarized, as exemplified.
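The 5% figure follows from weighting each group's average admixture by the share of the family tree that belongs to that group. A minimal sketch of that arithmetic is given below; the function name `expected_admixture`, the group labels, and the assumption that the non-Russian half of the tree carries no East Asian admixture are illustrative choices, not values taken from the disclosure.

```python
def expected_admixture(tree_shares, group_admixture):
    """Weighted average of per-group admixture over the family tree.

    tree_shares:     group label -> fraction of the family tree (sums to 1)
    group_admixture: group label -> average admixture carried by that group
    """
    return sum(
        share * group_admixture.get(group, 0.0)
        for group, share in tree_shares.items()
    )


# Half of the tree is predominantly Russian; the other half is assumed,
# for illustration only, to carry no East Asian admixture.
tree_shares = {"Russian": 0.5, "other Indo-European": 0.5}
east_asian_admixture = {"Russian": 0.10, "other Indo-European": 0.0}

print(expected_admixture(tree_shares, east_asian_admixture))
```

Under these assumptions the result is 0.5 × 0.10 = 0.05, the same value obtained by counting the "East Asian" spots and dividing by the total number of spots.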

The Pennsylvania Dutch provide another example of admixture through ancient affiliation; in this community there appears to have been significant East Asian content in the German ancestors before 1700. These ancestors founded communities located in valleys in the Upper Rhine region and, later, much farther inland. Because these communities remained relatively isolated, the level of East Asian admixture remained at about 20%. Dilution of this level requires exogamous mixing with other Indo-European ethnic groups, such as the French or Sardinians, in whom East Asian admixture is not detectable. In this case, the German ancestors had substantial average East Asian admixture, although this is probably due to sampling from within a more heterogeneous German population.

Most genealogists are interested in the type of admixture that arises through exogamous admixture, because it gives information about the geopolitical affiliations of recent ancestors rather than anthropological information about distant ancestors. There are three reasons for this: little documentary data exists for distant ancestors compared with recent ones; the farther back in time one goes, the greater the number of ancestors, which makes research difficult if not impossible; and the contribution of a distant ancestor to a modern person's genetic constitution is, on average per relative, lower than that of a more recent ancestor. Genealogists therefore tend to seek the kind of information that recent admixture can produce. For example, if a person is trying to prove or disprove a rumor or legend of American Indian ancestry, a 10% Native American admixture result would be very useful if it could be assured that the mechanism for this admixture was a recent exogamous admixture event, and not ancient affiliation with an ethnic group characterized by systematic admixture. Although the BGA test does not distinguish between exogamous and ancient admixture, the results of the test provide an important piece of the genealogical puzzle.

For some genealogists who rely on family trees, the evidence may strongly suggest that the mechanism of admixture is a recent event. For a person whose family has documentary evidence of an American Indian great-grandfather, a 10% Native American admixture result from the BGA test can indicate that the admixture likely arose through a recent event. In comparison, for a person of confirmed, homogeneous European ancestry, 10% Native American admixture indicates that the admixture likely arose through ancient admixture.

Negative results carry a different meaning for genealogists than positive results. For example, if there are circumstantial but low-quality data suggesting a full-blooded African great-grandfather, and the BGA test reveals 100% Indo-European ancestry, then the value of the rumor is diminished (taking into account the genetic law of independent assortment, which would make such a result possible, but unlikely, if the data were in fact correct). However, if a person suspects that the family had a Chinese great-grandfather, a 20% East Asian admixture result cannot prove it, because exogamous admixture cannot be distinguished from ancient admixture.

It is important that genealogists draw on other knowledge to reconstruct the most likely source of an admixture result. Indeed, BGA admixture data serve as an independent clue for a person trying to reconstruct a family history, and when they are used together with genealogical knowledge, the two combine to form evidence that is stronger than either alone. As such, the BGA test provides an ancillary tool that can help build a system tailored to the genealogy community, by presenting BGA admixture results in a manner that gives equal weight to the anthropological sources that transmitted ancient or very old admixture (relative to the genealogical time frame, which covers the last 250 to 300 years) to the present day, and to exogamous admixture from events in the family tree within the last 200 years.

A database of several thousand BGA profiles has been constructed from people in various locations around the world, and the database can be queried with a profile, plus or minus a preselected error range. A list of the locations where this type of profile is commonly found can be provided or, for example, a color-coded map of the world can be provided in which the colors indicate the regional affiliations likely to correspond to the admixture profile/range. In other words, given a BGA admixture profile, the map can indicate where the person's recent ancestors are likely to have come from. A person with 10% East Asian and 90% Indo-European admixture shows a high probability of ancestral origin from China (exogamous admixture) or Russia (more ancient admixture combined with ethnic homogeneity across the family tree).
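A query of this kind can be sketched as follows. This is an illustration under stated assumptions rather than the inventors' database or software: the records, the region names, the 5-point tolerance, and the function names `matches` and `region_frequencies` are all hypothetical, and profiles are stored as integer percentages to keep the comparison exact.

```python
from collections import Counter


def matches(query, profile, tolerance):
    """True if every group's percentage is within +/- tolerance of the query."""
    groups = set(query) | set(profile)
    return all(
        abs(query.get(g, 0) - profile.get(g, 0)) <= tolerance for g in groups
    )


def region_frequencies(query, database, tolerance=5):
    """Fraction of each region's stored profiles that are compatible with the query."""
    totals, hits = Counter(), Counter()
    for region, profile in database:
        totals[region] += 1
        if matches(query, profile, tolerance):
            hits[region] += 1
    return {region: hits[region] / totals[region] for region in totals}


# Toy records (hypothetical): (region, BGA profile in percent).
database = [
    ("Russia", {"Indo-European": 88, "East Asian": 12}),
    ("Russia", {"Indo-European": 100}),
    ("China", {"East Asian": 100}),
    ("France", {"Indo-European": 100}),
]

query = {"Indo-European": 90, "East Asian": 10}
frequencies = region_frequencies(query, database)
print(frequencies)
```

The per-region frequencies are the quantities that a color-coded map would display. In this toy data set no purely Chinese record matches the query, which illustrates that a recent exogamous origin would have to be represented in the database by admixed descendants rather than by the source population itself.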

Similarly, a genealogist can provide a similarly color-coded map derived from documentary research, which is based on geopolitical rather than anthropological information. The two maps can then be overlaid, and a Bayesian statistical calculation can be made that combines the information from the BGA test with the information from the documentary genealogy to provide the most likely estimate of recent ancestry.

As an example, a person with a 90% Indo-European and 10% East Asian BGA result and a documented genealogy of Romanian/British/Spanish ancestry first queries the database with his or her BGA result, and is provided with a map showing that the source is probably East Asia (through recent admixture), or Russia and Northern/Eastern Europe (both through numerous more distant ancestors from relatively isolated, admixed groups). The color coding gives the probability of origin from a region based on the frequency with which compatible BGA profiles are found in that region; this can be quite complex, depending on the type of admixture and on the characteristics of the inventors' database, which are a function of sampling across the world. Second, using a mapping tool that the inventors can supply, the person provides (or is provided with) a separate map based on the plausible Romanian/British/Spanish heritage documented by genealogical research. From this map it is clear that the likelihood of recent homogeneous East Asian ancestry is not high. Third, a program determines that the most likely origin of the 10% East Asian admixture is the Romanian ancestry (not the British or Spanish ancestry and not, for example, a Chinese grandfather).
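The overlay of the two maps can be read as a Bayes update in which the documentary genealogy supplies a prior probability for each region and the BGA database supplies the likelihood of the observed profile given that region. The sketch below assumes this reading; the disclosure does not specify the calculation, and every number, region label and the function name `posterior` is invented for illustration.

```python
def posterior(prior, likelihood):
    """Combine a prior and a likelihood over regions and renormalize.

    prior:      region -> probability from documentary genealogy
    likelihood: region -> probability of the observed BGA profile
                if the recent ancestors came from that region
    """
    regions = set(prior) | set(likelihood)
    unnormalized = {
        r: prior.get(r, 0.0) * likelihood.get(r, 0.0) for r in regions
    }
    total = sum(unnormalized.values())
    if total == 0:
        raise ValueError("prior and likelihood share no region with support")
    return {r: value / total for r, value in unnormalized.items()}


# Hypothetical prior from the documented Romanian/British/Spanish genealogy.
prior = {"Romania": 0.34, "Britain": 0.33, "Spain": 0.32,
         "China": 0.005, "Russia": 0.005}

# Hypothetical likelihood of a 90% Indo-European / 10% East Asian profile.
likelihood = {"Romania": 0.30, "Britain": 0.01, "Spain": 0.01,
              "China": 0.30, "Russia": 0.40}

result = posterior(prior, likelihood)
most_likely = max(result, key=result.get)
print(most_likely)
```

With these made-up inputs the Romanian ancestry is the most probable origin of the East Asian fraction, mirroring the third step of the example: regions favored by the BGA map alone (China, Russia) are down-weighted because the documentary record gives them little prior support.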

This type of presentation allows a person to use prior knowledge obtained through other means, such as genealogical research, to learn the most likely source of an unexpected admixture result. This is valuable to genealogists attempting to elucidate the origin of a person's genetic constitution. Without this ancillary tool, a person with 90% Indo-European and 10% East Asian admixture would have no quick means of determining whether the test suggested a recent Chinese or Japanese grandfather/great-grandfather, or an affiliation with a particular ethnic group in which fractional East Asian admixture is commonly found.

Example 4
Association of iris pigmentation with biogeographical ancestry
This example demonstrates that cryptic population structure measured using AIMs allows inferences to be drawn about complex genetic traits such as iris color.

To determine whether and how common polymorphisms are associated with the natural distribution of iris colors, 851 individuals of predominantly European descent were surveyed at 335 SNP loci in 13 pigmentation genes, and at 419 other SNPs distributed throughout the genome and known or thought to be informative for certain elements of population structure. Numerous SNPs, haplotypes and diplotypes (diploid pairs of haplotypes) significantly associated with iris color were identified within the OCA2, MYO5A, TYRP1, AIM, DCT and TYR genes, and within the CYP1A2-15q22-ter, CYP1B1-2p21, CYP2C8-10q23, CYP2C9-10q24 and MAOA-Xp11.4 regions. Half of the associated SNPs were located on chromosome 15, consistent with results previously obtained by others from linkage analysis. Five additional genes (ASIP, MC1R, POMC and SILV) and one additional region (GSTT2-22q11.23) were identified in which haplotypes and/or diplotypes, but not individual SNP alleles, were associated with iris color (see also WO 02/097047). For most of the genes, multilocus gene-wise genotype sequences were more strongly associated with iris color than were haplotypes or SNP alleles. Diplotypes of these genes explain 15% of iris color variation. These results provide a comprehensive candidate gene study for variable iris pigmentation and constitute a classification model useful for inferring iris color from DNA. The results further demonstrate that cryptic population structure can serve as a leverage tool for complex trait gene mapping when genomes are screened with the appropriate AIMs.

Iris pigmentation is a complex genetic trait that has long interested geneticists, anthropologists and society at large, but it is not yet completely understood. Eumelanin (brown pigment) is a light-absorbing polymer synthesized in specialized melanocyte lysosomes called melanosomes. Within the melanosome, the tyrosinase (TYR) gene product catalyzes the rate-limiting hydroxylation of tyrosine to 3,4-dihydroxyphenylalanine (DOPA), and the resulting product is oxidized to DOPAquinone to form the precursor for eumelanin synthesis. Although TYR is centrally important for this process, pigmentation in animals is not simply a Mendelian function of TYR or of any other single protein product or gene sequence. In fact, studies of the transmission genetics of pigmentation traits in humans and in various model systems suggest that variable pigmentation is a function of multiple heritable factors whose interactions appear to be quite complex (see, e.g., Akey et al., supra, 2002; Box et al., J. Invest. Dermatol. 116:224-229, 2001). For example, unlike human hair color (Sturm et al., Gene 277:49-62, 2001), there appears to be only a minor dominance component in the determination of mammalian iris color (Brauer and Chopra, Anthropol. Anz. 36:109-120, 1978), and there is minimal correlation between skin, hair and iris color within or between individuals of a given population. In contrast, comparisons between populations show good concordance; populations with darker average iris color also tend to exhibit darker average skin tones and hair colors. These observations suggest that the genetic determinants of pigmentation in the various tissues are distinct, and that these determinants have been subject to a common set of systematic and evolutionary forces that have shaped their distribution in world populations.

At the cellular level, variable iris color in healthy humans is the result of the differential deposition of melanin pigment granules within a fixed number of stromal melanocytes in the iris. The density of granules appears to reach genetically determined levels by early childhood and usually remains constant throughout later life, although a small minority of individuals exhibit changes in color during the later stages of life. Pedigree studies have suggested that iris color variation is a function of two loci: one locus responsible for depigmentation that does not affect skin or hair, and another, pleiotropic gene for reduction of pigment in all tissues (Brues, Amer. J. Phys. Anthropol. 43:387-391, 1975).

Most of what is currently known about pigmentation has been derived from molecular genetic studies of rare pigmentation defects in humans and in model systems such as mouse and Drosophila. For example, dissection of the oculocutaneous albinism (OCA) trait in humans has shown that many pigmentation defects are due to lesions in the TYR gene, resulting in their designation as tyrosinase (TYR) negative OCAs (see, e.g., Oetting and King, Hum. Mutat. 13:99-115, 1999; see also the albinism database at the URL "cbc.umn.edu/tad/" on the World Wide Web ("www")). TYR catalyzes the rate-limiting step of melanin biosynthesis, and the degree to which human irises are pigmented correlates well with the amplitude of TYR message levels. Nevertheless, the complexity of OCA phenotypes has illustrated that TYR is not the only gene involved in iris pigmentation. Although most TYR-negative OCA patients are completely depigmented, dark-iris albino mice (C44H) and their human type IB oculocutaneous albinism counterparts exhibit a lack of pigment in all tissues except the iris (Schmidt and Beermann, Proc. Natl. Acad. Sci. USA 24;91:4756-4560, 1994). Studies of a number of other TYR-positive OCA phenotypes have shown that, in addition to TYR, the oculocutaneous albinism 2 (OCA2; Durham-Pierre et al., Nature Genet. 7:176-179, 1994; Durham-Pierre et al., Hum. Mutat. 7:370-373, 1996; Gardner et al., Hum. Mutat. 7:370-373, 1992), tyrosinase-like protein (TYRP1; Boissy et al., Amer. J. Hum. Genet. 58:1145-1156, 1996), melanocortin receptor (MC1R; Robbins et al., supra, 1993; Smith et al., supra, 1998; Flanagan et al., supra, 2000) and adaptin 3B (AP3B; Ooi et al., EMBO J. 16:4508-4518, 1997) loci, as well as other genes (reviewed in Sturm et al., supra, 2001), are necessary for normal human iris pigmentation. Each of these genes is part of the main (TYR) human pigmentation pathway.

In Drosophila, iris pigmentation defects have been ascribed to mutations at more than 85 loci contributing to a variety of cellular processes in melanocytes (Ooi et al., supra, 1997), but mouse studies have suggested that about 14 genes preferentially affect pigmentation in vertebrates (reviewed in Sturm et al., supra, 2001), and that disparate regions of the TYR and other OCA genes are functionally distinct in determining pigmentation in different tissues. Human pigmentation genes fall into several biochemical pathways, including those for tyrosinase enzyme complex formation on the inner surface of the melanosome, hormonal and environmental regulation, melanoblast migration and differentiation, intracellular routing of new proteins into the melanosome, and proper transport of melanosomes from the body of the cell into the dendritic arms toward the keratinocytes. Nevertheless, studies of human OCA mutants suggest that the number of phenotypically active pigmentation loci is small enough to be tractable for genetic analysis.

Although research on pigment mutants has made clear that a small subset of genes is largely responsible for catastrophic pigmentation defects in mice and humans, it remains unclear whether or how common SNPs in these genes contribute to (or are linked to) natural variation in human iris color. A brown iris locus was localized to an interval containing the OCA2 and MYO5A genes (Eiberg and Mohr, Eur. J. Hum. Genet. 4:237-241, 1996), and specific polymorphisms in the MC1R gene are associated with red hair and blue iris color in relatively isolated populations (see, e.g., Robbins et al., supra, 1993; Flanagan et al., supra, 2000; Valverde et al., Nature Genet. 11:328-330, 1995; Schioth et al., Biochem. Biophys. Res. Comm. 260:488-491, 1999). An ASIP polymorphism was reported to be associated with both brown iris and hair color (Kanetsky et al., Amer. J. Hum. Genet. 70:770-775, 2002). However, the penetrance of each of these alleles appears to be low and, in general, they appear to explain only a very small amount of the overall variation in iris color within the human population (Spritz et al., Nature Genet. 11:225-226, 1995). Single-gene studies thus have not provided a sound basis for understanding the complex genetics of human iris color.

Because most human traits have complex genetic origins, and the whole is often greater than the sum of its parts, innovative genomics-based study designs and analytical methods for screening genetic data in silico are needed that respect genetic complexity, for example, the multifactorial and/or phase-known components of dominance and epistatic genetic variance. The first step, however, is to define the complement of loci that explain variance in trait value at the sequence level; of these, the loci that do so in a marginal or penetrant sense are expected to be the easiest to find. It is toward this goal that the present study was performed.

A nonsystematic, hypothesis-driven genome screening approach was applied to identify various SNPs, haplotypes and diplotypes marginally (i.e., independently) associated with iris color variation. As disclosed in this Example, a surprisingly large number of polymorphisms in a large number of genes were associated with iris color, indicating that the genetics of iris color pigmentation are quite complex. The sequences identified provide a foundation for classifier models for inferring iris color from DNA, and the nature of some of these sequences as markers of biogeographical ancestry has implications for the design of other complex trait gene mapping studies.

Methods
Specimen Collection
Specimens for re-sequencing were obtained from the Coriell Institute in Camden, New Jersey. Specimens for genotyping were from individuals of self-reported European descent who differed in age, sex, and hair, iris and skin shade; they were collected using informed consent guidelines under IRB guidance. Donors checked a box for blue, green, hazel, brown, black, or unknown/not clear iris color, and each had the opportunity to indicate whether iris color had changed over the course of their lifetime or whether the two irises differed in color. Individuals whose iris color was ambiguous or had changed over the course of their lifetime were excluded from the analysis.

For 103 subjects, iris color was additionally reported using a number from 1 to 11, identified using a color placard, where 1 is the darkest brown/black and 11 is the lightest blue. For these subjects, digital photographs of the right iris were obtained; the subject peered into a box at one end, with the camera at the other, to standardize lighting conditions and distance, and a judge assigned each sample to a color group from the photograph. Comparing the two classifications, 86 matched. Of the 17 that did not, 6 were brown/hazel, 7 were green/hazel and 4 were blue/green discrepancies; there were no gross discrepancies such as brown/green, brown/blue or hazel/blue. Such errors are tolerable for identifying sequences marginally associated with iris color, but confidence in the use of the sequences described herein for iris color classification can be increased by obtaining digitally quantified iris colors.
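
The agreement between self-reported and photograph-based iris color described above can be checked with simple arithmetic. The following is only an illustrative sketch using the counts given in the text; the variable and function names are assumptions introduced here.

```python
# Counts reported in the text: 103 subjects, 86 matching classifications.
N_SUBJECTS = 103
N_MATCHED = 86

# Discrepancies by pair of neighboring color classes, as reported in the text.
DISCREPANCIES = {
    ("brown", "hazel"): 6,
    ("green", "hazel"): 7,
    ("blue", "green"): 4,
}


def percent_agreement(matched: int, total: int) -> float:
    """Fraction of subjects whose two classifications agree."""
    return matched / total


n_mismatched = sum(DISCREPANCIES.values())
agreement = percent_agreement(N_MATCHED, N_SUBJECTS)
print(f"mismatched: {n_mismatched}, agreement: {agreement:.1%}")
```

All reported discrepancies fall between neighboring color classes, which is why the text treats them as tolerable for a marginal association screen.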

SNP Discovery
Candidate SNPs were obtained from the NCBI dbSNP database, which generally provides more candidate SNPs than can feasibly be genotyped. Genes of human pigmentation and xenobiotic metabolism were examined; they were selected based on their gene identities rather than their chromosomal positions. For some genes, the number of SNPs in the database was low and/or some of the SNPs were strongly associated with iris color, warranting deeper investigation. Re-sequencing was performed for these genes; among the genes disclosed herein, 113 SNPs were discovered in CYP1A2 (7 gene regions, 5 amplicons, 10 SNPs found), CYP2C8 (9 gene regions, 8 amplicons, 15 SNPs found), CYP2C9 (9 gene regions, 8 amplicons, 24 SNPs found), OCA2 (16 gene regions, 15 amplicons, 40 SNPs found), TYR (5 gene regions, 5 amplicons, 10 SNPs found) and TYRP1 (7 gene regions, 6 amplicons, 14 SNPs found) (see Tables 9 and 10; see also International Publication No. WO 02/097047).

Table 9: Candidate genes examined for sequences associated with human iris pigmentation

Re-sequencing of these genes was performed by amplifying the proximal promoter (on average 700 bp upstream of the transcription start site), each exon (average 1400 bp), the 5' and 3' ends of each intron (including the intron-exon junctions; average size about 100 bp) and the 3' UTR sequence (average 700 bp) from a multi-ethnic panel of 672 individuals (450 individuals from the Coriell Institute's DNA Polymorphism Discovery Resource, 96 additional European Americans, 96 African Americans, 10 Pacific Islanders, 10 Japanese and 10 Chinese; these 672 represented a set of samples separate from that used for the association analysis described herein). PCR amplification was accomplished using pfu TURBO polymerase according to the manufacturer's guidelines (Stratagene). A program was used to design the re-sequencing primers in a manner that respects homologous sequences in the genome, to ensure that pseudogenes were not co-amplified and that sequences from within repeats were not amplified. BLAST searches confirmed the specificity of all primers used. Amplification products were subcloned into the pTOPO® sequencing vector (Invitrogen), and 96 insert-positive colonies were grown for plasmid DNA isolation (the use of 670 individuals for the amplification step reduced the likelihood of any individual contributing more than once to this selected subset of 96).

Sequencing was performed on an ABI 3700 sequencer using PE Applied Biosystems BDT chemistry; sequences were deposited into a commercial relational database system (iFINCH, Geospiza; Seattle, WA). PHRED-qualified sequences were imported into the CLUSTAL X alignment program, and its output was used with a second program to identify quality-validated discrepancies between sequences. Sequences for which at least two instances of a variant with a PHRED score of 24 or higher were identified were selected, and each of the SNPs discovered through re-sequencing in this way was used for genotyping.
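
The quality criterion above (at least two instances of a variant at a PHRED score of 24 or higher) can be sketched as follows. This is a minimal illustration, not the program actually used; the function names are assumptions, and the conversion relies on the standard PHRED definition Q = -10 log10(P_error).

```python
def phred_to_error_prob(q: float) -> float:
    """Standard PHRED definition: Q = -10 * log10(P_error)."""
    return 10 ** (-q / 10)


def passes_filter(variant_quals, min_q=24, min_instances=2):
    """True if at least `min_instances` reads show the variant at Q >= min_q.

    `variant_quals` holds the PHRED score of the variant base in each read
    that carries it (a hypothetical input format chosen for this sketch).
    """
    supporting = sum(1 for q in variant_quals if q >= min_q)
    return supporting >= min_instances


print(f"P(error) at Q24: {phred_to_error_prob(24):.4f}")
print(passes_filter([31, 24, 12]))  # two reads at or above the cutoff
print(passes_filter([40, 23]))      # only one read at or above the cutoff
```

Under this definition, a score of 24 corresponds to a base-call error probability of roughly 0.4%, so requiring two independent instances makes a sequencing artifact an unlikely explanation for a candidate SNP.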

Genotyping
For the majority of the SNPs, a first round of PCR was performed on the samples using the high-fidelity DNA polymerase pfu TURBO and the appropriate re-sequencing primers. Representative PCR products were checked on an agarose gel, and the first-round PCR product was diluted and then used as the template for a second round of PCR. Two rounds were necessary because many of the genes interrogated are members of gene families, the SNPs resided in regions of sequence homology, and the genotyping platform required short (about 100 bp) amplicons. For the remainder, only a single round of PCR was performed. Genotyping was performed on individual DNA specimens using a single-base primer extension protocol and a SNPstream™ 25K/Ultra High Throughput (UHT) instrument (Orchid Biosystems; Princeton, NJ). Genotypes were subjected to several quality controls: two scientists independently inspected the calls on a pass/fail basis, requiring an overall UHT signal intensity greater than 1,000 for more than 95% of the calls and clear signal differentials between the averages for each genotype class (i.e., clear genotype clustering in 2-D space using the UHT analysis software).

Statistical Methods
To test for departures from independence in allelic state within and between loci, the MLD exact test was used (Zaykin et al., supra, 1995). Haplotypes were inferred using a haplotype reconstruction method (Stephens et al., supra, 2001). To determine the extent to which extant iris color variation could be explained by various models, R² values were calculated for the SNP, haplotype and multilocus genotype data by first assigning phenotypic values of 1 for blue, 2 for green, 3 for hazel and 4 for brown eye color. BGA admixture proportions were determined as described in connection with the software program developed for this purpose. For the R² calculations, the following function was used: Adj-R² = 1 - [n/(n - p)](1 - R²), where n is the model degrees of freedom and n - p is the error degrees of freedom. To correct for multiple tests, an empirical Bayes adjustment method for multiple results was used (Steenland et al., Cancer Epidemiol. 9:895-903, 2000, incorporated herein by reference).
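
The adjustment formula given above can be sketched in a few lines. This minimal illustration follows the formula as written in the text; the inputs in the example call (R² = 0.2, n = 851, p = 61) are placeholders chosen for illustration and are not stated to be the values actually used. Note that the more common textbook form, 1 - (1 - R²)(N - 1)/(N - k - 1) for N observations and k predictors, differs slightly; the sketch does not attempt to reconcile the two.

```python
def adjusted_r2(r2: float, n: float, p: float) -> float:
    """Adj-R^2 = 1 - [n / (n - p)] * (1 - R^2), as written in the text.

    n     -- the quantity the text calls the model degrees of freedom
    n - p -- the quantity the text calls the error degrees of freedom
    """
    if n - p <= 0:
        raise ValueError("n - p must be positive")
    return 1 - (n / (n - p)) * (1 - r2)


# Placeholder inputs for illustration only.
print(round(adjusted_r2(0.2, 851, 61), 4))
```

The adjustment always lowers R² when p > 0, and the penalty grows with the number of variables, which is why models built on many diplotype classes have to be compared on the adjusted scale.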

Results
To identify SNP loci associated with variable human pigmentation, 754 SNPs were genotyped, comprising 335 SNPs within pigmentation genes (AP3B1, ASIP, DCT, MC1R, OCA2, SILV, TYR, TYRP1, MYO5A, POMC, AIM, AP3D1 and RAB; see Table 9) and 419 other SNPs distributed throughout the genome. Alleles of these latter SNPs were informative for particular elements of population structure; 71 were selected from a screen of the human genome based on their exceptionally high δ values for Indo-European, sub-Saharan African, Native American and East Asian BGA (i.e., exceptional AIMs) (see Example 2; SEQ ID NOS:1 to 71; see also Shriver et al., supra, 2003), and the remainder were found in or around xenobiotic metabolism genes, which tend to exhibit dramatic sequence variation as a function of BGA. Genotypes for these 754 candidate SNPs were scored for 851 individuals of European descent with self-reported iris color (292 blue, 100 green, 186 hazel and 273 brown).

Before these genotypes were screened for association with iris color, the 71 non-xenobiotic metabolism AIMs were used to determine the BGA admixture proportions for each sample, and the correlation between BGA admixture and iris color was examined. This analysis showed that each of the 851 Caucasian samples was of majority Indo-European BGA and that 58% of the samples had significant (>4%) non-Indo-European BGA admixture. There was no correlation between low levels (<33%) of East Asian, sub-Saharan African or Native American admixture and iris color, and no correlation between higher levels of Native American admixture and iris color; however, there was a weak association between higher levels (>33% but <50%) of East Asian and sub-Saharan African admixture and darker iris color.

It was unclear from the outset whether better success would be realized by considering iris color in terms of four colors (blue, green, hazel and brown) or in terms of groups of colors. One way to group the colors is light = blue + green and dark = hazel + brown; this grouping would seem to distinguish individuals more clearly with respect to detectable levels of eumelanin (brown pigment). Given that the iris color data were self-reported, partitioning the sample into brown and not brown, or blue and not blue, could provide greater power to detect significant associations, particularly for alleles associated with only one color. To take advantage of each of these four approaches, all were considered when screening SNPs for association; δ values and chi-square and exact test p values were calculated for the classifications (a) all four colors, (b) shade, using light (blue and green) versus dark (hazel and brown), (c) blue versus brown, and (d) brown versus not brown (blue, green and hazel). With the significance level fixed at 5%, alleles of 20 SNPs were associated with specific iris colors, 20 with iris color shade, 19 with the blue/brown comparison and 18 with the brown/not brown comparison. The overlap among these SNP sets was high but not complete; SNPs with a significant p value for association under at least one of the four criteria are referred to as "marginally" associated.
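
The four classification schemes and the chi-square screen described above can be sketched as follows. This is a minimal illustration, not the software actually used: the scheme names and the allele-count table are hypothetical, and only the per-color sample sizes (292, 100, 186, 273) come from the text.

```python
# Mapping from reported iris color to class label under each of the four schemes.
SCHEMES = {
    "four_color": {"blue": "blue", "green": "green",
                   "hazel": "hazel", "brown": "brown"},
    "shade": {"blue": "light", "green": "light",
              "hazel": "dark", "brown": "dark"},
    "blue_vs_brown": {"blue": "blue", "brown": "brown"},  # green, hazel excluded
    "brown_vs_not": {"blue": "not_brown", "green": "not_brown",
                     "hazel": "not_brown", "brown": "brown"},
}


def regroup(counts, scheme):
    """Collapse per-color counts into the classes of one scheme."""
    out = {}
    for color, n in counts.items():
        label = SCHEMES[scheme].get(color)
        if label is not None:
            out[label] = out.get(label, 0) + n
    return out


def chi_square(table):
    """Pearson chi-square statistic for a rows-by-columns table of counts."""
    row_tot = [sum(row) for row in table]
    col_tot = [sum(col) for col in zip(*table)]
    total = sum(row_tot)
    stat = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            expected = row_tot[i] * col_tot[j] / total
            stat += (obs - expected) ** 2 / expected
    return stat


# Sample sizes per color are from the text.
sample = {"blue": 292, "green": 100, "hazel": 186, "brown": 273}
print(regroup(sample, "shade"))

# Hypothetical allele counts: rows = light/dark, columns = allele 1/allele 2.
print(round(chi_square([[120, 80], [90, 150]]), 2))
```

Collapsing to two classes trades resolution for cell counts: a binary table has fewer degrees of freedom, which is one reason an allele tied to a single color may be easier to detect under scheme (d) than under scheme (a).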

When multiple simultaneous hypotheses are tested at a set p value, there is the possibility of an inflated type I error. A correction procedure was therefore used to compensate for this risk (Steenland et al., supra, 2002); the majority of the associations remained significant after this correction. Most of the marginally associated SNPs were within pigmentation genes: OCA2 (11 SNPs at the level of color), TYRP1 (3 SNPs at the level of color), MYO5A (2 SNPs at the level of color), AIM (3 SNPs at the level of color) and DCT (2 SNPs at the level of color). Some associations, however, were found within non-pigmentation genes such as CYP2C8 at 10q23, CYP2C9 at 10q24, CYP1B1 at 2p21 and MAOA at Xp11.3; these were also termed marginally associated SNPs. No significant SNP associations were found within the pigmentation genes SILV, MC1R, ASIP, POMC, RAB or TYR, although TYR had one SNP with p = 0.06. The most strongly associated of the marginally associated SNPs were, in order of strength of association, from the OCA2, TYRP1 and AIM genes.

Because most of the SNPs identified by this approach were located within discrete genes or chromosomal regions, all of the SNPs from each locus were grouped, and the inferred haplotypes were tested for association with iris color using contingency analysis. In this higher-order analysis, the grouped SNPs were tested for all of the genes, not only for those with marginal SNP associations. For each gene, haplotypes were inferred, and contingency analysis was used to determine which haplotypes were statistically associated with iris color. Based on the chi-square and adjusted residuals, 43 haplotypes at 16 different loci were either positively (agonist) or negatively (antagonist) associated with iris color (Table 10). The strongest associations were observed for genes with marginally associated SNPs; most of these genes had haplotypes and diplotypes (sometimes referred to as multilocus genotypes or diploid pairs of haplotypes) that were positively (agonist) or negatively (antagonist) associated with at least one iris color (Table 10). A few of the genes/regions that did not contain a marginally associated SNP had haplotypes and diplotypes that were positively and/or negatively associated with iris color (ASIP gene, 1 haplotype; MC1R, 2 haplotypes; Table 10). In other words, those SNPs were associated with iris color only within the context of gene haplotypes or diplotypes. For some, the association with iris color was found only within the context of the diplotype and not at the level of the SNP or haplotype (i.e., SILV and GSTT2-22q11.23).
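
The use of adjusted residuals to label a haplotype as agonist or antagonist for a color can be sketched as follows. This is a minimal illustration under stated assumptions: the standard adjusted (standardized) residual for a contingency table cell is used, the cutoff of 1.96 is an assumption of this sketch (the text does not state the threshold applied), and the example counts are invented.

```python
import math


def adjusted_residuals(table):
    """Adjusted residual for every cell of a contingency table of counts.

    r_ij = (O - E) / sqrt(E * (1 - row_i / N) * (1 - col_j / N))
    """
    row_tot = [sum(row) for row in table]
    col_tot = [sum(col) for col in zip(*table)]
    total = sum(row_tot)
    result = []
    for i, row in enumerate(table):
        out_row = []
        for j, obs in enumerate(row):
            expected = row_tot[i] * col_tot[j] / total
            variance = (expected
                        * (1 - row_tot[i] / total)
                        * (1 - col_tot[j] / total))
            out_row.append((obs - expected) / math.sqrt(variance))
        result.append(out_row)
    return result


def label(residual, cutoff=1.96):
    """Assumed labeling rule: large positive = agonist, large negative = antagonist."""
    if residual > cutoff:
        return "agonist"
    if residual < -cutoff:
        return "antagonist"
    return "none"


# Hypothetical counts: rows = haplotype present/absent, columns = blue/brown.
table = [[10, 20], [20, 10]]
for row in adjusted_residuals(table):
    print([f"{r:+.2f} ({label(r)})" for r in row])
```

In this toy table the haplotype is under-represented among blue irises and over-represented among brown irises, so the same haplotype would be reported as an antagonist for one color and an agonist for the other, which mirrors how paired agonist and antagonist colors are described for Table 10.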

Table 10: Common haplotypes and diplotypes for 16 iris color genes

¹ The sequence of the highest order of complexity within the locus that was found to be associated with iris color. All of the major sequences (count ≥ 13) are shown for each locus with at least one significantly associated sequence. Where neither the haplotypes nor the diplotypes for a locus were found to be associated, only the SNP alleles are shown. Where the haplotypes for a locus were not found to be associated but the diplotypes were, both the haplotypes and the diplotypes are shown.
² Agonist color refers to the color with which the sequence is positively associated; antagonist color refers to the color with which the sequence is negatively associated. The chi-square p value is shown.
³ The number of times the haplotype was observed in the inventors' sample of 851.

At the level of the haplotype, each gene or region had its own number and type of associations. For example, OCA2, AIM, DCT and TYRP1 contained haplotypes that were both positively associated with blue irises and negatively associated with brown irises (OCA2 haplotypes 1, 37, 38 and 42; AIM haplotype 1; DCT haplotype 2; and TYRP1 haplotype 1; Table 10). Genes such as AIM, OCA2 and TYRP1 also contained haplotypes positively associated with brown but negatively associated with blue (AIM haplotype 2; OCA2 haplotypes 2, 4, 45 and 47; TYRP1 haplotype 4; Table 10), while genes such as MYO5A, OCA2, TYRP1 and CYP2C8-10q23 contained haplotypes positively associated with one color but not negatively associated with any other color (MYO5A haplotypes 5 and 10; OCA2 haplotype 19; TYRP1 haplotype 3; and CYP2C8-10q23 haplotype 1; Table 10). The MC1R gene contained haplotypes associated only with green in the inventors' sample, and the POMC gene contained a single SNP with genotypes weakly associated with iris color (no significant haplotypes or diplotypes were found).

Overall, the diversity of haplotypes associated with brown irises was similar to that of haplotypes associated with blue irises. Most of the haplotypes were even more dramatically associated with iris color in a multi-racial sample, because many of the SNPs that make up the haplotypes are good AIMs, and variants associated with darker iris color were enriched in those ancestral groups of the world with darker average iris color. Most of the SNPs within a gene or region were in LD with others in that gene or region (D' < 0.1); only 32 SNP pairs, in the MC1R (1 pair), OCA2 (27 pairs), TYR (2 pairs) and TYRP1 (2 pairs) genes, were found to be in LD.
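
For readers unfamiliar with the D' statistic cited above, the following minimal sketch shows the conventional calculation from haplotype and allele frequencies. It is not the MLD program referred to in the statistical methods, and the example frequencies are invented.

```python
def d_prime(p_ab: float, p_a: float, p_b: float) -> float:
    """Normalized linkage disequilibrium |D'| for two biallelic loci.

    p_ab -- frequency of the haplotype carrying allele A and allele B
    p_a  -- frequency of allele A at the first locus
    p_b  -- frequency of allele B at the second locus
    """
    d = p_ab - p_a * p_b
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    return 0.0 if d_max == 0 else abs(d) / d_max


# Hypothetical frequencies for illustration.
print(round(d_prime(0.25, 0.6, 0.3), 3))  # partial disequilibrium
print(d_prime(0.25, 0.5, 0.5))            # alleles combine independently
```

|D'| ranges from 0 (alleles at the two loci combine independently) to 1 (at least one of the four possible haplotypes is absent), so values near 0 indicate little or no disequilibrium.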

These analyses resulted in the identification of 61 SNPs in 16 genes/chromosomal regions that were associated with iris color at one level or another, whether marginally or within the context of haplotypes and/or diplotypes. The minor allele frequencies of most of these SNPs were relatively high (average minor allele frequency = 0.22), and most of the SNPs were in Hardy-Weinberg equilibrium (those with HWE p > 0.05, 28/34, Table 10). Nine were not; of these, two had relatively low frequencies, and the evidence for disequilibrium was borderline (p value close to 0.05). A lack of HWE is usually a sign of a poorly designed genotyping assay, but none of the remaining seven SNPs exhibited the genotyping patterns that the inventors have previously associated with such problems (absence of one genotype class, or a predominance of heterozygotes). In fact, one of the SNPs with the strongest evidence for a lack of HWE was validated as a legitimate SNP through direct DNA sequencing. The chromosomal distribution of the SNPs that were significantly associated in the marginal sense was independent of the distribution of the SNPs actually surveyed, indicating that the associations were not merely a function of SNP sampling.
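
The Hardy-Weinberg check referred to above can be sketched with a one-degree-of-freedom chi-square test on genotype counts. This is a minimal illustration, not necessarily the test the inventors applied (an exact test may have been used); the genotype counts are invented.

```python
import math


def hwe_chi_square(n_aa: int, n_ab: int, n_bb: int):
    """Chi-square test of Hardy-Weinberg proportions for one biallelic SNP.

    Returns (chi2, p_value), with the p value taken from the chi-square
    distribution with 1 degree of freedom.
    """
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1 - p
    expected = (p * p * n, 2 * p * q * n, q * q * n)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Upper tail of chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical genotype counts.
print(hwe_chi_square(25, 50, 25))  # exactly at Hardy-Weinberg proportions
print(hwe_chi_square(40, 20, 40))  # strong deficit of heterozygotes
```

A heterozygote excess or a missing genotype class in the observed counts is the kind of pattern the text associates with assay failure, as opposed to a true departure from equilibrium.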

Chromosome 15q contained the majority (18/34) of the SNPs that were marginally associated with iris color, and 14 of these chromosome 15 SNPs were found in two different genes, OCA2 and MYO5A. Chromosome 5p had 3 marginally associated SNPs, all in the AIM gene, and chromosome 9p had 5 associated SNPs, all in the TYRP1 gene. Multiple SNPs were identified on chromosome 10q; the CYP2C8-10q23.33 region had 2 SNPs, and the adjacent CYP2C9-10q24 region had 1. All three markers were in tight LD with one another (p < 0.001 for each possible pair). Multiple SNPs were also identified on chromosome 2; the POMC SNP located at 2p23 was marginally associated, SNPs from the CYP1B1-2p21 region were associated within the context of a 2-SNP haplotype (Table 10), and these SNPs were also in LD (p < 0.01). Finally, in addition to the OCA2 (15q11.2-q12) and MYO5A (15q21) sequences, a single SNP (15q22-ter) was also found on chromosome 15q, but SNPs between each of these three loci were not in LD. SNPs in the MC1R (16q24), SILV (12q13), TYR (11q), MAOA-Xp11.4-11.3 and GSTT2-22q11.23 regions were also associated at the level of the haplotype, although these were the only regions of these chromosomes for which associations were found.

The p values obtained suggested that diplotypes explain more iris color variation than do haplotypes or individual SNPs. To test this, a corrected ANOVA analysis was performed on the data at each of these three levels. All 61 SNPs were considered, as were their haplotypes (Table 10) and diplotypes (not shown). After correcting for the number of variables, diplotypes explained 15% of the variation, whereas haplotypes explained 13% and SNPs explained 11% (Table 4). Of the 543 genotypes observed for the 16 genes/regions, the 68 most strongly associated genotypes, based on chi-square adjusted residuals, explained 13% of the variation (row 4, Table 11).
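
The proportion of iris color variation explained by a set of genotype classes can be sketched as a one-way ANOVA R² on the numerically coded phenotype (1 = blue, 2 = green, 3 = hazel, 4 = brown). This is a minimal illustration for a single locus with invented data and hypothetical diplotype labels; it does not reproduce the multi-locus corrected analysis reported in the text.

```python
# Phenotype coding stated in the statistical methods.
CODE = {"blue": 1, "green": 2, "hazel": 3, "brown": 4}


def anova_r2(groups):
    """R^2 = SS_between / SS_total for a one-way layout.

    `groups` maps a genotype label to the list of coded phenotypes
    observed for individuals carrying that genotype.
    """
    values = [v for vals in groups.values() for v in vals]
    grand_mean = sum(values) / len(values)
    ss_total = sum((v - grand_mean) ** 2 for v in values)
    ss_between = sum(
        len(vals) * (sum(vals) / len(vals) - grand_mean) ** 2
        for vals in groups.values()
    )
    return ss_between / ss_total


# Invented example: three diplotype classes at one hypothetical locus.
toy = {
    "D1/D1": ["blue", "blue", "green"],
    "D1/D2": ["green", "hazel", "brown"],
    "D2/D2": ["brown", "brown", "hazel"],
}
coded = {g: [CODE[c] for c in colors] for g, colors in toy.items()}
print(round(anova_r2(coded), 4))
```

Because a diplotype coding has more classes than a haplotype or SNP coding, its raw R² can only be as large or larger; the correction for the number of variables mentioned in the text is what makes the 15%, 13% and 11% figures comparable.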

Table 11: ANOVA of SNP and haplotype data

From a screen of 754 SNP loci, 61 were identified that were statistically associated with variable iris pigmentation at one level of intragenic complexity or another. The remaining SNPs had δ values and chi-square p values that were not significant at any level of intragenic complexity. Diplotypes for these 61 alleles explained the most iris color variance in the sample, and the least was explained at the level of the SNP, suggesting an element of intragenic complexity (i.e., dominance) in iris color determination.

Only about half of the 61 SNPs identified were associated with iris color independently; the rest were associated only in the context of haplotypes or diplotypes. Even at this level of complexity, the sequences from no single gene could be used to make reliable iris color inferences, suggesting, in addition, an element of intergenic complexity (i.e., epistasis) in iris color determination. Apart from the fact that many of the SNPs identified remained significant after the correction for multiple testing was imposed, five lines of evidence indicated that the SNPs identified were not spuriously associated. First, for every gene in which a marginally associated SNP was identified, multiple such SNPs were identified; that is, the distribution of SNPs among the various genes tested was not random. Second, some of the non-pigment gene SNPs are located near pigment genes; for example, CYP2C8 (10q24.1) and CYP2C9 (10q24) are located proximal to two pigment genes that were not directly tested, HPS1 (10q23.1-4) and HPS6 (10q24.34), and the chromosome 2p SNPs at the CYP1B1 locus (CYP1B1-2p21) are located proximal to POMC at 2p23 (and are in LD with the POMC SNP). Third, roughly equal numbers of pigmentation and non-pigmentation gene SNPs were tested, yet of the 34 marginally associated SNPs, 28 (82%) were in pigmentation genes. Thus, the distribution of SNPs among the various types of genes was also not random. Fourth, the associations were generally stronger for SNPs in the context of intragenic haplotypes, a result that would not necessarily be obtained for spuriously associated SNPs (i.e., the results suggest that the gene sequences themselves are associated, not merely a single polymorphism within each gene). Fifth, linear and nonlinear variables from these and other genes combined performed even better when applied to a sample that included individuals of multiple ancestries than when applied only to individuals of majority European ancestry. Because most individuals of non-European or minority European descent exhibit low variability in iris color (on average of a darker shade than that of individuals of European descent), this improvement may not seem surprising. However, the result would not necessarily have been obtained if the SNPs were not truly associated with iris color.

Correction for multiple testing left most of the SNP-level associations intact. A number of associations did not survive the correction but are presented here to avoid possible type II errors; these sequences may be weakly associated with iris color and may be relevant within a multigene model for classification (i.e., epistasis). For these, it seems more prudent to eliminate false positives downstream of SNP identification, for example through higher-order tests of association, through various other criteria such as those described above, or through the utility of the SNPs for the generalization of a complex classification model.

Mutations in pigmentation genes are the primary cause of oculocutaneous albinism, so it is natural to expect that common variants in their sequences explain part of the variance in natural iris color, and this result was indeed observed. However, many of the associations were for SNPs located in other types of genes (CYP2C8 at 10q23, CYP2C9 at 10q24, CYP1B1 at 2p21 and MAOA at Xp11.3). The inclusion of non-pigmentation genes in this study was intentional; the screen was not limited to pigmentation gene SNPs but included two types of AIMs: those selected from the genome based on δ values between sub-Saharan African, Indo-European, Native American and East Asian population allele frequencies, and those selected based on their location within xenobiotic metabolism genes. The latter were included based, in part, on previous evidence that xenobiotic metabolism genes contain an extraordinary concentration of AIMs and that some of these AIMs are relevant to measuring “cryptic” population structure, presumably because xenobiotic metabolism gene products are responsible for detoxifying plant alkaloids and tannins present in regionally specific diets, and because selection and genetic drift have shaped the geographical distribution of their sequences. Such cryptic structure can correlate with iris color to a degree that allows accurate classification, even if it does not help elucidate the biological mechanism.

It was hypothesized that a) some of these SNPs are indicators not only of gross or continental population structure but also of substructure and perhaps even fine structure, b) iris color correlates with these elements of structure within the Caucasian group, and c) these markers can serve as surrogates for phenotypically active loci for the purposes of classification or trait value inference. The commonly held notion is that genetic screening is conducted strictly toward identifying phenotypically active loci through linkage disequilibrium. However, when classification is the goal, population structure, rather than the identification of phenotypically active loci, can help, provided that trait values correlate with structure and that markers for the structure can be identified. For example, an iris color classification tool could be useful to forensic scientists for the objective, science-based construction of a partial physical profile from crime scene DNA. Currently, forensic investigators construct physical profiles using surprisingly unscientific means; eyewitness accounts are only rarely available, and human reports are often subjective and unreliable in certain situations. For forensic applications, investigators are less interested in the biological mechanism of a phenotype than in the ability to infer trait values accurately. Of course, identifying markers in LD with phenotypically active loci (or the phenotypically active loci themselves) would provide more accurate classification in addition to a better understanding of biological mechanism, but the search for these elusive loci in heterogeneous populations is impractical because LD extends over only a few kb and costly genome-wide scans are required.

That many of the SNPs identified herein as associated with iris color were located in xenobiotic metabolism genes suggests that the identified markers are associated with iris color through correlation with cryptic population structure. In other words, the non-pigmentation gene markers are probably correlated with, though not necessarily in LD with, the phenotypically active loci for iris color. Through such correlation, both the markers and the active loci are enriched in particular branches of the Indo-European lineage, even though they are not in LD with each other. Results based on such correlation are meaningful in a classification context only with respect to the sample used. For example, AIMs selected based on their intercontinental δ values were not associated with iris color in individuals of primarily European ancestry but were strongly associated with iris color in a more cosmopolitan sample, because the AIMs are specifically relevant to the elements of structure that correlated with iris color in that sample. These same AIMs were not associated with iris color within the sample of individuals of mostly European origin because there is little variation in gross structure within that sample. Instead, within any (mostly) Caucasian or European-American sample there appears to be substructure or fine structure (cryptic structure) arising from variation in ethnic or other subpopulation-level affiliation, and when that structure correlates with a trait, only those SNPs specifically relevant to measuring this cryptic structure would be needed. Systematic structure unrelated to the measured phenotype was not identified in the study sample of primarily European origin; this indicates that validated AIMs are indispensable for reproducing the present results for another trait.

Other interpretations of the present results are possible; for example, the associations may have been observed through LD with pigmentation genes that have not yet been defined. Indeed, CYP2C8 and CYP2C9 are located on chromosome 10 near the HPS1 and HPS2 pigmentation genes (which were not tested directly); CYP1A2 is located at 15q22-ter, on the same arm as OCA2 and MYO5A; CYP1B1 is located at 2p21, in the vicinity of the POMC gene at 2p23; and MAOA is located on the same arm of the X chromosome (Xp11.4-11.3) as the OA1 pigmentation gene (which was not tested directly). The distances between these iris color-associated loci and the “neighboring” pigmentation genes are far greater than the average extent of LD in the genome, so even if these associations arise through LD, population structure would again seem to have to be invoked as the explanation.

LD is known to extend over megabases in recently admixed populations, where as few as two thousand AIMs could be used to obtain whole-genome coverage, and it is quite interesting that two-thirds of the European-American samples used in this study had significant (4%) BGA admixture. European-Americans are not recognized as a traditionally defined admixed group (such as Hispanics or African-Americans), but the observed BGA admixture may be linked to a fine, cryptic level of population structure. The relevance of the present results to LD and/or population structure is not clear, but if the results are due to LD rather than correlation, they would suggest that, just as AIMs can be used to leverage population admixture for trait mapping in recently and extensively admixed populations, they can also be used to leverage cryptic population structure in a similar manner. Thus, whether the results are due to correlation or to LD, the many non-candidate gene associations identified indicate that measuring population structure has broader implications for the cost-effective development of pharmacogenomic and complex disease gene classifiers.

Linkage studies have implicated certain pigmentation genes as specifically relevant to pigmentation phenotypes, and most of the pigmentation gene SNPs identified herein clustered to certain genes such as OCA2, MYO5A, TYRP1 and AIM. Moreover, certain aspects of the present results support the previous literature. Most of the SNPs identified are on chromosome 15, which linkage analysis identified as the primary chromosome for the determination of “brownness” (Eiberg and Mohr, Eur. J. Hum. Genet. 4:237-241, 1996). The candidate gene within the interval containing this locus (BEY2) was suggested to be most likely the OCA2 gene, although the MYO5A gene is also present within this interval and, as disclosed herein, was associated with iris color. The OCA2 associations were by far the most significant of any gene or region tested, whereas the MYO5A SNPs were only weakly associated (although the haplotypes and diplotypes were more strongly associated). The MYO5A alleles were not in LD with those of OCA2, suggesting that these results were obtained independently and that the results of Eiberg and Mohr may have reflected the activity of two separate genes (Eiberg and Mohr, supra, 1996).

Two OCA2 coding changes have been reported to be associated with darker iris colors (Rebbeck et al., Cancer Epidemiol. Biomarkers Prev. 11(8):782-784, 2003). In addition, “red hair/blue iris” SNP alleles were identified previously (Valverde et al., supra, 1995; Koppula et al., supra, 1997), and the present study confirmed that these sequences are associated with iris pigmentation. However, the previously described associations were with blue irises and at the level of the SNP, whereas in the present study the associations were with green irises and were evident only at the level of the haplotype and diplotype. An association has also been identified in the ASIP gene (Kanetsky et al., supra, 2002), but in the present study this gene association was not at the level of the SNP. One of the ASIP SNPs identified herein (marker 861) is the 8818 G-A SNP transition that was described as associated with brown iris color (Kanetsky et al., supra, 2002), but in the present study the association was with hazel color at the level of the haplotype.

The association between TYR haplotypes and iris color was relatively weak, which is not inconsistent with results obtained by others in the oculocutaneous albinism field, who were unable to find strong associations in smaller samples. While the present results independently validate the findings for OCA2, ASIP and MC1R, they also show that several other pigmentation genes (TYRP1, AIM, MYO5A and DCT) contain alleles associated with the natural distribution of iris color. As such, the present results indicate that the previous studies associating alleles of pigmentation genes with iris color, taken individually, represent merely single strokes of a larger, more complex portrait.

Interestingly, most of the SNPs identified herein do not alter the translated region: they are either silent or located in the gene-proximal promoter, an intron or the 3′UTR. This result, which is not at all unusual, may indicate that the SNPs are in LD with other phenotypically active loci, or it may reflect that variability in message transcription and/or turnover explains part of the variability observed in human iris color. Although a large number of SNPs were screened, some of the genes contain a large number of candidate SNPs, and not all were tested. For example, OCA2 has about 200 known candidate SNPs in NCBI dbSNP. As such, the OCA2 gene may hold still more information about variable human iris pigmentation, and such information can be obtained using the methods disclosed herein.

Example 5
Use of AIMs to Predict Drug Responsiveness
This example demonstrates that, because many drug response traits, like many human genetic traits, correlate with elements of population structure, AIMs can be used to develop chemopredictive and diagnostic tests.

The distribution of SNPs validated for genotyping, by chromosome arm, is shown in FIG. 12. Caucasian individuals taking Lipitor™ (180 individuals), whose responses with respect to total cholesterol (TC), low density lipoprotein (LDL), and the liver transaminase AST/SGOT and ALT/GPT measurements were known, were genotyped at each of the approximately 400 SNPs examined. In addition, 150 Zocor™ patients of known response with respect to TC and LDL change, and 1,000 individuals of known hair and eye color, were genotyped. SNPs with significant delta values (δ > 0.20) among the various trait classes were selected. For example, Lipitor™ caused a reduction in LDL in about 70% of patients but had no effect in 30% of patients. For any given SNP, the delta value (δ) is the difference in minor allele frequency between individuals whose LDL decreased by at least 20% and individuals whose LDL did not decrease to that extent. The δ values were determined for each SNP of FIG. 12 for each test (Zocor™: LDL, TC, AST/SGOT, ALT/GPT response; Lipitor™: LDL, TC response). For eye color, δ values were determined for dark (hazel or brown) versus light (blue or green) eyes. For hair color, δ values were determined for black or brown versus blond.
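The δ calculation and the δ > 0.20 selection rule described above can be sketched as follows. This is a minimal illustration under stated assumptions: genotypes are coded as the number of copies (0, 1 or 2) of the allele that is minor in the pooled sample, δ is taken as an absolute difference, and the SNP names and genotype lists are invented toy data, not values from this study.

```python
from typing import Dict, List, Sequence

def allele_frequency(genotypes: Sequence[int]) -> float:
    """Frequency of the counted allele; each genotype is 0, 1 or 2 copies."""
    return sum(genotypes) / (2 * len(genotypes))

def delta_value(class_a: Sequence[int], class_b: Sequence[int]) -> float:
    """Absolute difference in allele frequency between two trait classes."""
    return abs(allele_frequency(class_a) - allele_frequency(class_b))

def select_snps(
    class_a: Dict[str, Sequence[int]],
    class_b: Dict[str, Sequence[int]],
    threshold: float = 0.20,
) -> List[str]:
    """Return the SNPs whose delta value exceeds the threshold."""
    return [
        snp for snp in class_a
        if delta_value(class_a[snp], class_b[snp]) > threshold
    ]

# Hypothetical toy data: minor-allele copies per individual.
# "responders" stands for individuals whose LDL fell by at least 20%.
responders = {
    "snp_1": [0, 0, 1, 0, 1, 0, 0, 0, 1, 0],
    "snp_2": [0, 1, 0, 1, 0, 0, 1, 0, 0, 1],
}
non_responders = {
    "snp_1": [1, 2, 1, 1, 0, 2, 1, 1, 2, 1],
    "snp_2": [1, 0, 0, 1, 0, 1, 0, 0, 1, 0],
}

selected = select_snps(responders, non_responders)
print(selected)
```

The same functions apply unchanged to the pigmentation comparisons if the two classes are, for example, dark-eyed versus light-eyed individuals.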

For the Lipitor™ response, the number of SNPs with significant (δ > 0.20) values for each of the four endpoint measurements is shown in FIG. 13. SNPs with significant delta values for the Zocor™ response, using the LDL and TC endpoint measurements, were then selected (FIG. 14). Next, SNPs with significant delta values for iris color were selected (FIG. 15), and likewise for hair color (FIG. 16). The distribution of SNPs with good δ values was similar for each trait in graphs A to E, apart from certain elements of specificity. The specificity can be appreciated by focusing on chromosome arm 6p, which harbored many important SNPs for the TC (total cholesterol) response to Lipitor™ (FIG. 13) but none for the Zocor™ response (FIG. 14). Chromosome 2 contained SNPs with good δ values for eye color (FIG. 16) but not for hair color (FIG. 15). Chromosome 15 contained many markers predictive of Lipitor™ responsiveness but none for the Zocor™ response. This specificity is likely a function of linkage disequilibrium with other loci that are deterministic for each of these traits but have no bearing on the expression of the remaining traits; this type of finding is the goal of traditional genetic mapping approaches. Alternatively, it may be due to correlation with specific elements of population structure.

The similarities are evident in that chromosomes 10 and 22, as well as chromosome 1, have a relatively high number of SNPs with good δ values for each of the four mechanistically unrelated traits. Overall, the distribution of SNPs important for one trait does not differ from that for another. It is this similarity among the profiles that is of interest here and that illustrates the value of the present method.

The element that the four graphs have in common correlates with the number of SNPs genotyped (FIG. 12; FIGS. 13-16 roughly resemble FIG. 12). At first glance, this result would seem to indicate that the “importance” or significance of these SNP alleles is spurious and merely a function of the number of SNPs genotyped for each chromosome arm (i.e., the more SNPs genotyped from a chromosome, the more SNPs with good delta values will be found on that chromosome). However, the SNPs in FIG. 12 are not just any type of SNP; most of the validated SNPs in FIG. 12 are xenobiotic metabolism and pigmentation gene SNPs, and almost all are good AIMs.
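The concern raised in this paragraph, that the count of good-δ SNPs per chromosome arm may simply track the number of SNPs genotyped on that arm, can be illustrated with a small sketch. The per-arm counts below are invented for illustration and are not read from FIGS. 12-16; the Pearson correlation and the per-arm hit rate are assumed here as convenient summaries, not as the analysis used in this document.

```python
from math import sqrt

# Hypothetical counts per chromosome arm:
# (SNPs genotyped, SNPs with delta > 0.20). Illustrative values only.
arm_counts = {
    "1p": (40, 9),
    "6p": (25, 4),
    "10q": (50, 11),
    "15q": (30, 6),
    "22q": (20, 3),
}

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

genotyped = [g for g, _ in arm_counts.values()]
significant = [s for _, s in arm_counts.values()]

# A value near 1 means raw counts mostly reflect how many SNPs were typed.
r = pearson(genotyped, significant)

# Dividing by the number genotyped removes that dependence arm by arm.
hit_rate = {arm: s / g for arm, (g, s) in arm_counts.items()}

print(f"correlation between genotyped and significant counts: {r:.3f}")
print(hit_rate)
```

A high correlation alone would not distinguish spurious hits from a panel that is uniformly rich in AIMs, which is the distinction the paragraph goes on to draw.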

For iris color, most of the SNP associations remained significant (chi-square p < 0.05) after correction for multiple testing, indicating that the SNP associations are not spurious. That the distributions of associated SNPs for the four traits largely resemble one another, and that this distribution resembles that of the validated SNPs, most of which are good AIMs, indicates that most of the SNPs measured in these experiments are reporters of population structure, and that similar elements of population structure correlate with the response to each of the two drugs (however measured) as well as with hair and iris pigmentation.
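A chi-square test followed by a multiple-testing correction can be sketched as below. The text does not say which correction protocol was used; Bonferroni is assumed here purely for illustration, and the allele counts and the number of tests are hypothetical. The test is the standard Pearson chi-square on a 2x2 table of allele counts (one degree of freedom, no continuity correction), and the sketch assumes every row and column total is nonzero.

```python
from math import erfc, sqrt

def chi_square_2x2(a: int, b: int, c: int, d: int) -> float:
    """Pearson chi-square statistic (1 d.f.) for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    row_col_product = (a + b) * (c + d) * (a + c) * (b + d)
    return n * (a * d - b * c) ** 2 / row_col_product

def p_value_1df(chi2: float) -> float:
    """Upper-tail probability of a chi-square statistic with 1 d.f."""
    return erfc(sqrt(chi2 / 2.0))

def bonferroni(p: float, n_tests: int) -> float:
    """Bonferroni-adjusted p-value, capped at 1."""
    return min(1.0, p * n_tests)

# Hypothetical allele counts (minor, major) in two iris color classes.
light_minor, light_major = 60, 140
dark_minor, dark_major = 20, 180

chi2 = chi_square_2x2(light_minor, light_major, dark_minor, dark_major)
p_raw = p_value_1df(chi2)

# Hypothetical number of SNPs tested in the same screen.
n_tests = 61
p_adjusted = bonferroni(p_raw, n_tests)

print(f"chi-square = {chi2:.2f}, raw p = {p_raw:.2e}, adjusted p = {p_adjusted:.2e}")
print("significant after correction:", p_adjusted < 0.05)
```

Bonferroni is conservative when the tested SNPs are correlated, so an association that survives it would also survive most less stringent corrections.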

It is noteworthy that these four traits appear to be mechanistically unrelated (at least according to current knowledge), and it is not intuitively obvious how hair or eye color could be related to the response to two randomly selected drugs. However, the similarity in the profiles of important SNPs for each of the traits suggests that each can be predicted, to a significant degree, from knowledge of the sequences at a common set of chromosomal markers. Because these markers are known to be excellent indicators of BGA, the results indicate that each of the four unrelated traits can be predicted to some extent by measuring BGA rather than by measuring the specific SNPs shown in FIGS. 12-16.

Simply measuring BGA as disclosed herein does not provide as much predictive power for each of the four traits as measuring the specific AIMs in the plots above. However, using the markers of FIG. 15 allows good classification accuracy for iris color. These results indicate that the different AIMs in the plots above provide information about different elements of population structure or subpopulations. Most of the SNPs of FIG. 12 are good indicators of gross population structure in terms of continental BGA, but the extent to which each is also informative for finer levels of structure, such as between Scandinavian and Mediterranean ancestry or even within ethnic groups, has not yet been determined. Such “cryptic” structure could not previously be defined in a reliable, validated manner because, before the present disclosure, it was not possible to define the subtle elements of structure found within large populations of individuals of common ethnicity in a biogeographically meaningful way. For example, red-haired individuals of mostly Indo-European ancestry are known to require 20% more anesthesia than other Indo-European (or other) patients (Cohen, supra, 2002). These red-haired individuals also show a tendency toward hypertension and bleeding under the influence of certain common anesthetics, and thus present a serious clinical problem of unknown etiology.

Red-haired individuals are common in the United Kingdom (Ireland and Britain), which also contains individuals who share more ancestry with one another than with individuals from other parts of Europe. As such, they can be regarded as a branch of indeterminate structure off the human family tree. Only one gene has been linked to red hair color (MC1R), and it is difficult to imagine that this gene has such pleiotropic effects that its sequence contributes to variability in physiological responses as diverse and complex as those to multiple anesthetics. Moreover, not every person with red hair carries a known red-hair MC1R variant, yet most still show the atypical anesthetic response. Rather, it is more likely that red hair color correlates with an element of population structure that also correlates with anesthetic response; that is, the genes for red hair color and for atypical anesthetic response are unique to, or enriched in, a particular branch of the human family tree in which these two traits are more common. Thus, the genes for red hair color and atypical anesthetic response are distributed as a function of population structure, and, as disclosed herein, so are those for many other traits.

The AIMs and methods disclosed herein are suitable for measuring various levels of structure, including gross continental structure as well as ethnicity-related structure and even cryptic structure (e.g., nearly 30 markers were identified that can resolve Pacific Islanders from other East Asians, a level of structure finer than continental structure). The informativeness of AIMs arose over the course of human evolution through founder effects, migration, bottlenecks, genetic drift and/or selection, but these forces need not have acted on hair color or anesthetic response for the resulting AIMs to correlate with these two types of traits. Just as there must be AIMs informative for disparate phenotypes among northwestern continental Europeans, there must be AIMs informative for disparate phenotypes among Indo-Europeans in general. The results demonstrating similarity among the AIMs with good delta values for Lipitor™ response, Zocor™ response, hair color and eye color (FIGS. 12-16) point to a level of population structure that is informative for each of these phenotypes.

It should be noted that the magnitudes of association are not proportional across traits. For example, the strongest AIMs for iris color are not the strongest AIMs for Lipitor™ response, and so on. Nor is the direction of association necessarily the same across traits; thus, an AIM that is agonistically (positively) associated with blue iris color may be either agonistically or antagonistically (negatively) associated with a particular Lipitor™ response outcome.

A randomly selected AIM that specifically discriminates between Africans and East Asians may, but need not, carry information about a particular set of traits, because it may, but need not, be a marker for the specific element of human population structure correlated with those traits (i.e., the branch of the human family tree in which the traits are more common). Similarly, AIMs that discriminate between Indo-Europeans and Africans do not necessarily carry the information needed to help predict anesthetic response or red hair. Only those AIMs with alleles that discriminate among individuals of the particular branch of humanity in which a trait value is concentrated or underrepresented carry the information needed to predict that trait value: in this example, anesthetic response and red hair or, as disclosed, response to Lipitor™ and Zocor™, hair color and eye color. A particular SNP may be a good AIM for a particular gross element of population structure (Europeans versus sub-Saharan Africans), for substructure (Northern European versus Mediterranean Indo-Europeans), or for fine structure (Scots versus Irish versus English; red-haired Northern Europeans versus Northern Europeans of other hair colors; or Northern Europeans who respond to a drug versus other Northern Europeans who do not). Some AIMs are informative for several levels of population structure and others are not, and most SNPs in the human genome carry no information about any level of population structure (i.e., they are not AIMs).

The first element of the disclosed methods is that most human traits can be predicted through detailed measurement of AIMs associated with population structure at its various levels, provided that the trait correlates with that element of structure. The second element is that classifiers, or collections of SNP markers, and methods for predicting trait values from DNA can be constructed for most human traits through such an appreciation of population structure. As disclosed herein, such applications can be achieved through correlation, and not only through the extensive LD found in particular admixed groups such as Hispanics or African Americans, for any sample of subjects regardless of race or ethnic background, provided that the AIMs used are appropriate for the element of population structure with which the trait correlates. These results can be achieved because the methods of the invention mine the genome for good AIMs, qualify their value as AIMs, and accurately measure population structure against the background of human phenotypes.

The trends observed in the studies represented by FIGS. 12 to 16 have been observed for many other traits, and the SNP associations are of such "penetrance" that they hold up well to correction for multiple testing (Steenland et al., supra, 2000). As such, by measuring AIMs across the human genome, one can learn about the elements of population structure, sub-structure or fine structure that are relevant to predicting or inferring the value of virtually any given human trait, even if the markers are not in linkage disequilibrium with the phenotypically active loci for the trait. This correlation applies to any human trait, simple or complex, whether of clinical, recreational, forensic or other value or of none, because every human trait correlates to a greater or lesser extent with particular elements of population structure.

It is not possible to know a priori whether a trait segregates with population structure, sub-structure, sub-sub-structure or fine structure without measuring AIMs across the genome in individuals and identifying which of them, if any, correlate with trait values. Virtually any trait correlates with at least one element of structure: some with crude structure (as in the case of human skin pigmentation), some with sub-structure (as in the case of iris, hair or skin pigmentation between South Asian Indo-Europeans (Indians) and Northern European Indo-Europeans (e.g., Irish)), and some with fine structure (as in the case of red hair or anesthesia response among continental Europeans). To measure and find AIMs for inferring a trait, it is not important to know which level of population structure the trait correlates with; it is only necessary to measure multiple general AIMs and test them for statistical association with the trait. As such, methods are provided for connecting human gene sequences with traits so that the traits can be predicted or inferred. Such methods are valuable, for example, in the clinical and forensic fields because, when predicting a general trait (response to a drug or a predisposition to develop a disease), it is important to infer that trait accurately, but it is not important that biologically or mechanistically relevant sequences are measured.

Example 6
Algorithm for Maximum Likelihood Estimates
A software program was written, based on the algorithm of Hanis et al. (supra, 1986), to determine the maximum likelihood estimate of an individual's BGA admixture using multilocus AIM genotypes. A flowchart illustrating an algorithm useful for determining proportional ancestry is provided in Table 12. An example of how the algorithm operates is shown in Table 13, and the results of an ancestry proportion calculation are shown in Table 14.

The δ value is an expression of the ancestry informativeness of a marker (Dean et al., 1994). For a biallelic marker, the frequency difference (δ) is equal to p_x - p_y, which is equal to q_y - q_x, where p_x and p_y are the frequencies of one allele in populations X and Y, and q_x and q_y are the frequencies of the other allele. To test for departures from independence in allelic state within and between loci, the inventors used the MLD exact test (Zaykin et al., supra, 1995). The collection of 71 AIMs used in Example 2 was selected to maximize the cumulative δ value within each of the six possible pairs of the four-dimensional problem (sub-Saharan African, Native American, Indo-European and East Asian) and to minimize the differences in cumulative δ value among the pairs.
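The δ bookkeeping described above can be illustrated with a minimal Python sketch. The function names, the population labels and the frequencies are hypothetical, and the sketch takes the absolute value of the frequency difference so that pairs can be summed regardless of which allele is counted; that is an assumption, since the text defines δ as the signed difference p_x - p_y.

```python
from itertools import combinations


def delta(p_x, p_y):
    """Absolute allele-frequency difference between populations X and Y."""
    return abs(p_x - p_y)


def cumulative_delta(freqs):
    """Sum delta over all markers for every pair of populations.

    freqs maps a population label to a list holding the frequency of one
    allele at each marker (same marker order for every population).
    """
    totals = {}
    for x, y in combinations(sorted(freqs), 2):
        totals[(x, y)] = sum(delta(a, b) for a, b in zip(freqs[x], freqs[y]))
    return totals


# Hypothetical frequencies of one allele at three markers (illustration only).
example = {
    "AFR": [0.8, 0.7, 0.9],
    "EUR": [0.9, 0.7, 0.8],
    "NAM": [0.6, 0.5, 0.7],
    "EAS": [0.7, 0.9, 0.9],
}
pair_totals = cumulative_delta(example)
```

With four populations the dictionary holds six pairwise totals, which is the quantity a marker panel would be tuned to keep both large and balanced across pairs.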

The algorithm inverts the population-specific allele frequencies to obtain likelihood estimates of proportional affiliation corresponding to a multilocus genotype, using three groups at a time; three groups were used mainly for computational convenience and also because four-dimensional admixture is likely to be relatively rare. For example, the likelihood of 100% Indo-European, 0% Native American, 0% East Asian is calculated, then that of 99% Indo-European, 1% Native American, 0% East Asian, and so on until all possible Indo-European, Native American and East Asian proportions have been considered. The process is then repeated for all possible Indo-European, Native American and African proportions, and for all possible Native American, African and East Asian proportions. The combination of proportions with the greatest likelihood is selected as the maximum likelihood estimate (MLE). When a single MLE is plotted on a triangle plot, the regions within which the likelihood is within 2-fold, 5-fold and 10-fold of the MLE are delimited (see Figure 3); when multiple MLEs are shown on a single triangle plot, these intervals are generally not plotted.
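The exhaustive search over proportions can be sketched in Python as follows. This is only an illustration under stated assumptions: the names proportion_grid and grid_mle are hypothetical, the likelihood function is supplied by the caller, and the 1% step size is taken from the "100%, 99%, ..." example in the text.

```python
def proportion_grid(steps=100):
    """Yield every (l1, l2, l3) with l1 + l2 + l3 = 1 in increments of 1/steps."""
    for i in range(steps + 1):
        for j in range(steps + 1 - i):
            k = steps - i - j
            yield (i / steps, j / steps, k / steps)


def grid_mle(log_likelihood, steps=100):
    """Return (best_proportions, best_log_likelihood) over the whole grid.

    log_likelihood is any callable that scores one (l1, l2, l3) tuple.
    """
    best = max(proportion_grid(steps), key=log_likelihood)
    return best, log_likelihood(best)


# Toy scoring function, peaked at (0.5, 0.3, 0.2), used only to exercise the search.
def toy_score(lams):
    return -((lams[0] - 0.5) ** 2 + (lams[1] - 0.3) ** 2 + (lams[2] - 0.2) ** 2)


best, score = grid_mle(toy_score, steps=10)
```

At 1% resolution the grid contains 5151 combinations for each set of three groups, so a plain exhaustive search is inexpensive.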

(Table 12) Algorithm for ancestry calculation: flowchart

(Table 13) Example of proportional ancestry determination using the algorithm

I. Pick the best three populations for the blind sample
II. Obtain the proportions with the maximum likelihood value
1. Pick the best three populations:
Algorithm:
For all populations
{
For all SNPs
{
population total <- population total + expected genotype frequency.
}
}
Pick the three populations with the greatest values.
Step 1:
For all populations
Step 2:
SNP1 has a heterozygous genotype; the alleles are G and T.
Expected genotype for SNP1 = log(2*P(G,1)*P(T,1));
SNP2 has a homozygous genotype
Expected genotype for SNP2 = log(P(T,1)*P(T,1));
Likelihood value for population 1 = expected GT for SNP1 + expected TT for SNP2
Repeat step 2 for all populations
Pick the best three likelihood values from those four population values.
Estimate the proportions for those three selected populations.
1. Start with
λ₁=0, λ₂=0 and λ₃=1 (0+0+1=1)
2. Calculate the likelihood value:
Estimate the expected genotype for SNP1
SNP1 has a heterozygous genotype; the alleles are G and T.
Estimated allele frequencies from the sample:
Allele 1 estimated frequency (A1EF) = λ₁.p(G,1) + λ₂.p(G,2) + λ₃.p(G,3)
p(G,1) - G allele frequency in population 1
p(G,2) - G allele frequency in population 2
p(G,3) - G allele frequency in population 3
The mixing proportions λ₁, λ₂ and λ₃ are treated as unknown parameters.
Allele 2 estimated frequency (A2EF) = λ₁.p(T,1) + λ₂.p(T,2) + λ₃.p(T,3)
p(T,1) - T allele frequency in population 1
p(T,2) - T allele frequency in population 2
p(T,3) - T allele frequency in population 3
The likelihood of the parameters is obtained by multiplying the probabilities of each observed genotype in the new observation, under the assumption of Hardy-Weinberg equilibrium.
Because SNP1 has a heterozygous genotype
Expected genotype for SNP1 = log(2*A1EF*A2EF);
Estimate the expected genotype for SNP2
Estimated allele frequencies from the sample:
Allele 1 estimated frequency (A1EF) = λ₁.p(T,1) + λ₂.p(T,2) + λ₃.p(T,3)
p(T,1) - T allele frequency in population 1
p(T,2) - T allele frequency in population 2
p(T,3) - T allele frequency in population 3
The mixing proportions λ₁, λ₂ and λ₃ are treated as unknown parameters.
Because SNP2 has a homozygous genotype
Expected genotype for SNP2 = log(A1EF*A1EF);
Likelihood value
Calculate the likelihood value by adding all expected genotypes of all SNPs
Likelihood = expected genotype for SNP1 + expected genotype for SNP2;
Calculate the likelihood value using different values of the unknown parameters. (Repeat step 2)
3. Obtain the maximum likelihood value and the corresponding unknown parameters.
Those unknown parameters are nothing other than the proportions.
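The per-SNP calculation in Table 13 can be written as a short Python sketch. The function names are hypothetical, and base-10 logarithms are assumed because the worked values in Table 14 (for example log(0.7*0.7) = -0.3098) match log10. The sketch does not guard against an allele frequency of zero, for which the logarithm is undefined.

```python
import math


def mixed_allele_freq(lambdas, pop_freqs):
    """Weighted allele frequency: sum over populations of lambda_i * p(allele, i)."""
    return sum(lam * p for lam, p in zip(lambdas, pop_freqs))


def genotype_log_prob(lambdas, freqs_a, freqs_b=None):
    """log10 of the Hardy-Weinberg genotype probability for one SNP.

    freqs_a: frequency of the first observed allele in each population.
    freqs_b: frequency of the second observed allele, or None for a homozygote.
    """
    a = mixed_allele_freq(lambdas, freqs_a)
    if freqs_b is None:
        return math.log10(a * a)      # homozygous: A1EF * A1EF
    b = mixed_allele_freq(lambdas, freqs_b)
    return math.log10(2 * a * b)      # heterozygous: 2 * A1EF * A2EF


def sample_log_likelihood(lambdas, genotypes):
    """Add the per-SNP values; genotypes is a list of (freqs_a, freqs_b) pairs."""
    return sum(genotype_log_prob(lambdas, fa, fb) for fa, fb in genotypes)


# Frequencies for populations 1 to 3, taken from the worked example in Table 14.
snp1 = ((0.9, 0.6, 0.7), (0.1, 0.4, 0.3))   # G/T heterozygote
snp2 = ((0.7, 0.5, 0.9), None)              # T/T homozygote
snp3 = ((0.8, 0.7, 0.9), (0.2, 0.3, 0.1))   # C/T heterozygote
value = sample_log_likelihood((0.0, 0.0, 1.0), [snp1, snp2, snp3])
```

Summing log values rather than multiplying probabilities keeps the calculation numerically stable when many AIMs are combined.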

(Table 14) Ancestry proportion calculation
Ancestry frequency table

Example

III. Pick the best three populations for the blind sample
IV. Obtain the proportions with the maximum likelihood value
a. Pick the three best populations
African (assuming the blind sample is 100% African):
SNP1 alleles: G, T
“G” allele frequency in Africans P (G) = 0.8
'T' allele frequency in Africans P (T) = 0.2
Expected genotype value for SNP1 = log (2 * P (G) * P (T))
= log (2 * 0.8 * 0.2)
= -0.4948
SNP2 alleles: T, T
'T' allele frequency in Africans P (T) = 0.7
Expected genotype value for SNP2 = log (P (T) * P (T))
= log (0.7 * 0.7)
= -0.3098
SNP3 alleles: C, T
'C' allele frequency in Africans P (C) = 0.9999
'T' allele frequency in Africans P (T) = 0.0001
Expected genotype value for SNP3 = log (2 * P (C) * P (T))
= log (2 * 0.9999 * 0.0001)
= -3.6990
Likelihood for Africans = -0.4948-0.3098-3.6990
= -4.5036
European (assuming the blind sample is 100% European):
SNP1 alleles: G, T
"G" allele frequency in Europeans P (G) = 0.9
“T” allele frequency in Europeans P (T) = 0.1
Expected genotype value for SNP1 = log (2 * P (G) * P (T))
= log (2 * 0.9 * 0.1)
= -0.7447
SNP2 alleles: T, T
"T" allele frequency in Europeans P (T) = 0.7
Expected genotype value for SNP2 = log (P (T) * P (T))
= log (0.7 * 0.7)
= -0.3098
SNP3 alleles: C, T
“C” allele frequency in Europeans P (C) = 0.8
"T" allele frequency in Europeans P (T) = 0.2
Expected genotype value for SNP3 = log (2 * P (C) * P (T))
= log (2 * 0.8 * 0.2)
= -0.4948
Likelihood for Europeans = -0.7447-0.3098-0.4948
= -1.5493
Native American (assuming the blind sample is 100% NA):
SNP1 alleles: G, T
“G” allele frequency in NA P (G) = 0.6
“T” allele frequency in NA P (T) = 0.4
Expected genotype value for SNP1 = log (2 * P (G) * P (T))
= log (2 * 0.6 * 0.4)
= -0.3187
SNP2 alleles: T, T
“T” allele frequency in NA P (T) = 0.5
Expected genotype value for SNP2 = log (P (T) * P (T))
= log (0.5 * 0.5)
= -0.6020
SNP3 alleles: C, T
“C” allele frequency in NA P (C) = 0.7
“T” allele frequency in NA P (T) = 0.3
Expected genotype value for SNP3 = log (2 * P (C) * P (T))
= log (2 * 0.7 * 0.3)
= -0.3767
Likelihood for Native Americans = -0.3187-0.6020-0.3767
= -1.2974
Middle East (assuming the blind sample is 100% ME):
SNP1 alleles: G, T
“G” allele frequency in ME P (G) = 0.7
“T” allele frequency in ME P (T) = 0.3
Expected genotype value for SNP1 = log (2 * P (G) * P (T))
= log (2 * 0.7 * 0.3)
= -0.3767
SNP2 alleles: T, T
“T” allele frequency in ME P (T) = 0.9
Expected genotype value for SNP2 = log (P (T) * P (T))
= log (0.9 * 0.9)
= -0.0915
SNP3 alleles: C, T
“C” allele frequency in ME P (C) = 0.9
“T” allele frequency in ME P (T) = 0.1
Expected genotype value for SNP3 = log (2 * P (C) * P (T))
= log (2 * 0.9 * 0.1)
= -0.7447
Likelihood for the Middle East = -0.3767-0.0915-0.7447
= -1.2129
Likelihood for Africans = -4.5036
Likelihood for Europeans = -1.5493
Likelihood for Native Americans = -1.2974
Likelihood for the Middle East = -1.2129
In this case, we drop Africans and consider the other three for the proportions.
Maximum likelihood value
So we begin by assigning some values to the unknown parameters
I = European, J = Native American, K = Middle East
Always, I + J + K = 1
I = 0; J = 0; K = 1
{
SNP1 alleles: G, T
"G" allele frequency in Europeans P (G, 1) = 0.9
“G” allele frequency in NA P (G, 2) = 0.6
“G” allele frequency in ME P (G, 3) = 0.7
Allele 1 estimated frequency (A1EF) = I * P (G, 1) + J * P (G, 2) + K * P (G, 3)
= 0 * 0.9 + 0 * 0.6 + 1 * 0.7
= 0.7
“T” allele frequency in Europeans P (T, 1) = 0.1
“T” allele frequency in NA P (T, 2) = 0.4
“T” allele frequency in ME P (T, 3) = 0.3
Allele 2 estimated frequency (A2EF) = I * P (T, 1) + J * P (T, 2) + K * P (T, 3)
= 0 * 0.1 + 0 * 0.4 + 1 * 0.3
= 0.3
Expected genotype value for SNP1 (EGV1) = log (2 * A1EF * A2EF)
= log (2 * 0.7 * 0.3)
= -0.3767
SNP2 alleles: T, T
"T" allele frequency in Europeans P (T, 1) = 0.7
“T” allele frequency in NA P (T, 2) = 0.5
“T” allele frequency in ME P (T, 3) = 0.9
Allele 1 estimated frequency (A1EF) = I * P (T, 1) + J * P (T, 2) + K * P (T, 3)
= 0 * 0.7 + 0 * 0.5 + 1 * 0.9
= 0.9
Expected genotype value for SNP2 (EGV2) = log (A1EF * A1EF)
= log (0.9 * 0.9)
= -0.0915
SNP3 alleles: C, T
“C” allele frequency in Europeans P (C, 1) = 0.8
“C” allele frequency in NA P (C, 2) = 0.7
“C” allele frequency in ME P (C, 3) = 0.9
Allele 1 estimated frequency (A1EF) = I * P (C, 1) + J * P (C, 2) + K * P (C, 3)
= 0 * 0.8 + 0 * 0.7 + 1 * 0.9
= 0.9
"T" allele frequency in Europeans P (T, 1) = 0.2
“T” allele frequency in NA P (T, 2) = 0.3
“T” allele frequency in ME P (T, 3) = 0.1
Allele 2 estimated frequency (A2EF) = I * P (T, 1) + J * P (T, 2) + K * P (T, 3)
= 0 * 0.2 + 0 * 0.3 + 1 * 0.1
= 0.1
Expected genotype value for SNP3 (EGV3) = log (2 * A1EF * A2EF)
= log (2 * 0.9 * 0.1)
= -0.7447
Likelihood value for these values of the unknown parameters
= EGV1 + EGV2 + EGV3
= -0.3767-0.0915-0.7447
= -1.2129
European = 0; NA = 0; Middle East = 1; likelihood value is -1.2129
}
Repeat the above loop for all possible combinations
0.0, 0.0, 1.0: -1.2129
1.0, 0.0, 0.0
0.0, 1.0, 0.0
0.1, 0.0, 0.9
0.1, 0.1, 0.8
0.1, 0.2, 0.7
0.1, 0.3, 0.6
0.1, 0.4, 0.5
etc.
Obtain the maximum likelihood value; the corresponding proportions are the ancestry proportions.
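The worked example in Table 14 can be reproduced with a short Python sketch. The allele frequencies are copied from the table; the variable names, the base-10 logarithm and the 0.1 step size (taken from the combinations listed above) are assumptions made for illustration, not the program described in this Example.

```python
import math

# Allele frequencies from Table 14 as (first allele, second allele) per SNP.
# Observed genotypes: SNP1 = G/T, SNP2 = T/T, SNP3 = C/T.
FREQS = {
    "African":         [(0.8, 0.2), (0.7, 0.7), (0.9999, 0.0001)],
    "European":        [(0.9, 0.1), (0.7, 0.7), (0.8, 0.2)],
    "Native American": [(0.6, 0.4), (0.5, 0.5), (0.7, 0.3)],
    "Middle Eastern":  [(0.7, 0.3), (0.9, 0.9), (0.9, 0.1)],
}
HETEROZYGOUS = [True, False, True]


def log_likelihood(weights):
    """weights maps a population name to its mixing proportion (sums to 1)."""
    total = 0.0
    for snp, het in enumerate(HETEROZYGOUS):
        a = sum(w * FREQS[pop][snp][0] for pop, w in weights.items())
        b = sum(w * FREQS[pop][snp][1] for pop, w in weights.items())
        total += math.log10(2 * a * b) if het else math.log10(a * a)
    return total


# Stage 1: score each population as if the sample were 100% that population,
# then keep the three with the greatest likelihood values.
single = {pop: log_likelihood({pop: 1.0}) for pop in FREQS}
best_three = sorted(single, key=single.get, reverse=True)[:3]

# Stage 2: search all proportion combinations for the three retained groups.
steps = 10
grid = [
    dict(zip(best_three, (i / steps, j / steps, (steps - i - j) / steps)))
    for i in range(steps + 1)
    for j in range(steps + 1 - i)
]
mle = max(grid, key=log_likelihood)
```

The first stage returns about -4.50 for the African group, which is why that group is dropped before the proportions are searched. The table truncates intermediate values to four decimals, so the sketch can differ from it in the last digit.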

Although the invention has been described with reference to the above examples, it will be understood that modifications and variations are encompassed within the spirit and scope of the invention. Accordingly, the invention is limited only by the claims.

Claims

A method for inferring a trait of an individual with a predetermined level of confidence, comprising:
a) contacting a sample comprising nucleic acid molecules of a test individual with hybridizing oligonucleotides, wherein the hybridizing oligonucleotides can detect nucleotide occurrences of single nucleotide polymorphisms (SNPs) of a panel of at least about ten ancestry informative markers (AIMs) indicative of a population structure correlated with the trait, and wherein the contacting is performed under conditions suitable for detecting the nucleotide occurrences of the AIMs of the test individual by the hybridizing oligonucleotides; and
b) identifying, with a predetermined level of confidence, a population structure that correlates with the nucleotide occurrences of the AIMs in the test individual, wherein the population structure correlates with the trait, thereby inferring the trait of the individual with a predetermined level of confidence.

The method of claim 1, wherein the panel comprises at least about 20 AIMs.

The method of claim 1, wherein the trait comprises biogeographical ancestry (BGA).

The method of claim 3, wherein the panel comprises the AIMs set forth as SEQ ID NOs: 1-331.

The method of claim 3, wherein the panel comprises the AIMs set forth as SEQ ID NOs: 1-71.

The method of claim 3, wherein the panel comprises the AIMs set forth as:
SEQ ID NOs: 7, 21, 23, 27, 45, 54, 59, 63 and 72-152;
SEQ ID NOs: 3, 8, 9, 11, 12, 33, 40, 59, 63 and 153-239;
SEQ ID NOs: 1, 8, 11, 21, 24, 40, 172 and 240-331; or combinations thereof.

The method of claim 1, wherein at least one AIM of the panel is not linked to a gene associated with the trait.

The method of claim 3, wherein the BGA comprises a proportion of a sub-Saharan African, Native American, Indo-European, or East Asian ancestral group, or a combination of said ancestral groups.

The method of claim 8, wherein the BGA comprises proportions of at least three ancestral groups.

The method of claim 3, wherein the BGA comprises proportions of at least sub-Saharan African and Indo-European ancestral groups; Native American and Indo-European ancestral groups; East Asian and Native American ancestral groups; or Indo-European and East Asian ancestral groups.

The method of claim 3, wherein the BGA comprises proportions of at least Native American, East Asian and Indo-European ancestral groups; or sub-Saharan African, Native American and Indo-European ancestral groups.

The method of claim 1, wherein the trait comprises responsiveness of the individual to a drug.

The method of claim 12, wherein the drug is a cancer chemotherapeutic agent.

The method of claim 12, wherein the drug is a statin.

The method of claim 1, wherein the trait comprises susceptibility to a disease.

The method of claim 15, wherein the disease has an ethnic predisposition.

The method of claim 16, wherein the disease is cancer, diabetes or hypertension.

The method of claim 17, wherein the cancer is prostate cancer.

The method of claim 15, wherein the disease is a neurological disorder.

The method of claim 19, wherein the neurological disorder is schizophrenia or Parkinson's disease.

The method of claim 15, wherein the disease is alcoholism.

The method of claim 1, wherein the trait comprises a pigmentation trait.

The method of claim 22, wherein the pigmentation trait comprises eye color, skin color, hair color, or a combination thereof.

The method of claim 1, further comprising identifying, with a predetermined level of confidence, a sub-population structure of the population structure that correlates with the nucleotide occurrences of the AIMs in the test individual, wherein the sub-population structure correlates with the trait.

The method of claim 1, wherein the hybridizing oligonucleotides comprise an oligonucleotide primer, the method further comprising contacting the sample with a polymerase under conditions suitable for generating a primer extension product, wherein determining the nucleotide occurrence of the SNP comprises detecting the presence of the primer extension product.

The method of claim 1, wherein the hybridizing oligonucleotides comprise an oligonucleotide primer, the method further comprising contacting the sample with a polymerase under conditions suitable for generating a primer extension product, wherein determining the nucleotide occurrence of the SNP comprises determining the nucleotide sequence of the primer extension product at a position corresponding to the position of the SNP.

The method of claim 1, wherein the hybridizing oligonucleotides comprise an amplification primer pair, the method further comprising contacting the sample with a polymerase under conditions suitable for generating an amplification product, wherein determining the nucleotide occurrence of the SNP comprises detecting the presence of the amplification product.

The method of claim 1, wherein the hybridizing oligonucleotides comprise an amplification primer pair, the method further comprising contacting the sample with a polymerase under conditions suitable for generating an amplification product, wherein determining the nucleotide occurrence of the SNP comprises determining the nucleotide sequence of the amplification product at a position corresponding to the position of the SNP.

The method of claim 1, wherein the method is performed in a high throughput format.

The method of claim 1, wherein the method is performed in a multiplexed format.

A method for estimating, with a predetermined level of confidence, the proportional ancestry of at least two ancestral groups of an individual, comprising:
a) contacting a sample comprising nucleic acid molecules of a test individual with hybridizing oligonucleotides, wherein the hybridizing oligonucleotides can detect nucleotide occurrences of single nucleotide polymorphisms (SNPs) of a panel of at least about ten ancestry informative markers (AIMs) indicative of biogeographical ancestry (BGA) for each ancestral group examined, and wherein the contacting is performed under conditions suitable for detecting the nucleotide occurrences of the AIMs of the test individual by the hybridizing oligonucleotides; and
b) identifying, with a predetermined level of confidence, a population structure that correlates with the nucleotide occurrences of the AIMs for each ancestral group examined, wherein the population structure is indicative of proportional ancestry, thereby estimating the proportional ancestry of the individual with a predetermined level of confidence.

The method of claim 31, wherein the proportional ancestry comprises a proportion of a sub-Saharan African ancestral group, a Native American ancestral group, an Indo-European ancestral group, an East Asian ancestral group, or a combination thereof.

The method of claim 31, wherein the proportional ancestry comprises proportions of sub-Saharan African and Indo-European ancestral groups; Native American and Indo-European ancestral groups; East Asian and Native American ancestral groups; or Indo-European and East Asian ancestral groups.

The method of claim 31, wherein the proportional ancestry comprises proportions of Native American, East Asian and Indo-European ancestral groups; sub-Saharan African, Native American and Indo-European ancestral groups; or sub-Saharan African, Native American and East Asian ancestral groups.

The method of claim 31, wherein the panel for at least one of the ancestral groups comprises the AIMs set forth as SEQ ID NOs: 1-331.

The method of claim 31, wherein the panel for at least one of the ancestral groups comprises the AIMs set forth as SEQ ID NOs: 1-71.

The method of claim 31, wherein the panel for at least one of the ancestral groups comprises the AIMs set forth as:
SEQ ID NOs: 7, 21, 23, 27, 45, 54, 59, 63 and 72-152;
SEQ ID NOs: 3, 8, 9, 11, 12, 33, 40, 59, 63 and 153-239;
SEQ ID NOs: 1, 8, 11, 21, 24, 40, 172 and 240-331; or combinations thereof.

The method of claim 31, wherein at least one AIM of the panel is not linked to a gene associated with the trait.

The method of claim 31, wherein the proportional ancestry comprises proportions of three ancestral groups, and wherein identifying a population structure that correlates with the nucleotide occurrences of the AIMs of the test individual comprises:
determining the likelihood of affiliation with each of the sub-Saharan African, Native American, Indo-European and East Asian ancestral groups;
then selecting the three ancestral groups having the greatest likelihood values;
determining the likelihood of all possible proportional affiliations among the three ancestral groups having the greatest likelihood values, thereby identifying a population structure or proportional affiliation that correlates with the nucleotide occurrences of the AIMs of the test individual; and
identifying a single proportional combination of maximum likelihood, thereby estimating the proportional ancestry of the individual.

The method of claim 31, wherein the proportional ancestry comprises proportions of three ancestral groups, and wherein identifying a population structure that correlates with the nucleotide occurrences of the AIMs comprises:
performing six binary comparisons, comprising determining the likelihood of affiliation between each pair of groups;
then selecting the three ancestral groups having the greatest likelihood values;
determining the likelihood of all possible proportional affiliations among the three ancestral groups having the greatest likelihood values, thereby identifying a population structure or proportional affiliation that correlates with the nucleotide occurrences of the AIMs of the test individual; and
identifying a single proportional combination of maximum likelihood, thereby estimating the proportional ancestry of the individual.

The method of claim 31, wherein the proportional ancestry comprises proportions of three ancestral groups, and wherein identifying a population structure that correlates with the nucleotide occurrences of the AIMs of the test individual comprises: performing three ternary comparisons among the groups; determining the likelihood of all possible proportional affiliations among the three ancestral groups having the greatest likelihood values, thereby identifying a population structure or proportional affiliation that correlates with the nucleotide occurrences of the AIMs of the test individual; and identifying a single proportional combination of maximum likelihood, thereby estimating the proportional ancestry of the individual.

The method of claim 31, wherein the proportional ancestry comprises proportions of four ancestral groups, and wherein identifying a population structure that correlates with the nucleotide occurrences of the AIMs of the test individual comprises: performing six binary comparisons, three ternary comparisons, or one quaternary comparison; determining the likelihood of all possible proportional affiliations among the four ancestral groups having the greatest likelihood values, thereby identifying a population structure or proportional affiliation that correlates with the nucleotide occurrences of the AIMs of the test individual; and identifying a single proportional combination of maximum likelihood, thereby estimating the proportional ancestry of the individual.

41. The method of claim 40, further comprising creating a graphical representation of the comparison of the three ancestral groups, wherein each graphical representation comprises a triangle, each ancestral group is independently represented by a vertex of the triangle, and the maximum likelihood value of proportional affiliation for the individual is represented by a point within the triangle.
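As an illustration of the triangle representation, the sketch below places a three-group proportional affiliation inside a triangle by weighting the vertex coordinates by the proportions (barycentric coordinates). The equilateral layout in `VERTICES` and the function name `triangle_point` are assumptions; the claim does not specify a coordinate system.

```python
import math

# ASSUMPTION: an equilateral triangle with one ancestral group per vertex.
VERTICES = ((0.0, 0.0), (1.0, 0.0), (0.5, math.sqrt(3) / 2))


def triangle_point(proportions, vertices=VERTICES):
    """Map proportions (m1, m2, m3) to an (x, y) point inside the triangle.

    Each vertex is weighted by its group's proportion, so 100% affiliation
    with one group lands exactly on that group's vertex.
    """
    total = sum(proportions)
    if total <= 0:
        raise ValueError("proportions must sum to a positive value")
    weights = [p / total for p in proportions]  # normalise defensively
    x = sum(w * vx for w, (vx, _) in zip(weights, vertices))
    y = sum(w * vy for w, (_, vy) in zip(weights, vertices))
    return (x, y)
```

The returned coordinates can be passed to any plotting library; an equal three-way mix falls at the centroid of the triangle.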

44. The method of claim 43, wherein the graphical representation further comprises a confidence contour indicating a confidence level associated with estimating proportional ancestry.
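One way such a confidence contour could be derived is sketched below: keep every candidate proportional affiliation whose likelihood is within a chosen fold of the maximum, and draw the boundary of that set on the triangle. The fold threshold and the function name `confidence_region` are illustrative assumptions, not values taken from the claims.

```python
import math


def confidence_region(loglik_by_point, fold=10.0):
    """Return the candidate points that are at most `fold` times less likely than the best.

    loglik_by_point: dict mapping a proportion tuple to its log-likelihood.
    ASSUMPTION: a likelihood-ratio cutoff stands in for the confidence level.
    """
    if fold < 1.0:
        raise ValueError("fold must be at least 1")
    best = max(loglik_by_point.values())
    max_drop = math.log(fold)  # allowed drop in log-likelihood
    return {pt for pt, ll in loglik_by_point.items() if best - ll <= max_drop}
```

Calling the function with several fold values (for example 2, 5, and 10) yields nested regions that can be drawn as successive contours around the maximum likelihood point.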

The method of claim 31, further comprising identifying, with a predetermined level of confidence, a subpopulation structure of the population structure that correlates with the AIM nucleotide occurrences of the test individual, wherein the subpopulation structure correlates with the ethnicity of the test individual.

46. The method of claim 45, wherein identifying the subpopulation structure comprises the following steps:
Identifying the chromosome of the test individual that contains the AIM representing the ancestral group of the test individual's proportional ancestry;
Contacting the sample containing nucleic acid molecules of the test individual with second hybridizing oligonucleotides that can detect the nucleotide occurrences of SNPs of a second panel of AIMs, wherein the second panel of AIMs is present on the chromosome of the test individual that contains the AIM representing the ancestral group of the test individual; and
Identifying a subpopulation structure that correlates with the nucleotide occurrences of the second panel of AIMs, wherein the subpopulation indicates the ethnicity within the ancestral group of the test individual.

48. The method of claim 45, wherein the ancestry group is Indo-European and the ethnicity includes Northern European or Mediterranean.

32. The method of claim 31, further comprising creating a global ancestry map, wherein the location of a population having a proportional ancestry corresponding to the proportional ancestry of the test individual is indicated on the ancestry map.

49. The method of claim 48, further comprising the following steps:
a) overlaying the ancestry map with a phylogenetic map, wherein the phylogenetic map indicates the locations of populations having geopolitical relevance with respect to the test individual; and
b) statistically combining the ancestral and phylogenetic information to obtain the most probable estimate of the test individual's pedigree.

The method of claim 31, wherein identifying a population structure that correlates with the AIM nucleotide occurrences comprises comparing the AIM nucleotide occurrences of the test individual with known proportional ancestries corresponding to AIM nucleotide occurrences indicative of BGA.

51. The method of claim 50, wherein the known proportional ancestries corresponding to AIM nucleotide occurrences indicative of BGA are contained in a database.

52. The method of claim 51, wherein the comparing is performed using a computer.

51. The method of claim 50, wherein each of the known proportional ancestries corresponding to AIM nucleotide occurrences indicative of BGA further comprises a photograph of the person for whom the known proportional ancestry was determined.

54. The method of claim 53, wherein the photograph comprises a digital photograph.

55. The method of claim 54, wherein digital information including digital photographs is included in the database.

56. The method of claim 55, wherein the digital information in the database is associated with the known proportional ancestry corresponding to the AIM nucleotide occurrences indicative of the BGA of the person in the photograph.

52. The method of claim 51, further comprising identifying a photograph of a person having a proportional ancestry that corresponds to the proportional ancestry of the examined individual.

58. The method of claim 57, wherein identifying the photograph comprises scanning a database containing a plurality of files, each file containing digital information corresponding to a digital photograph of a person having a known proportional ancestry, and identifying at least one photograph of a person having AIM nucleotide occurrences indicative of a BGA that corresponds to the AIM nucleotide occurrences indicative of the BGA of the test individual.
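The database scan recited above might look like the following sketch. It assumes each file is reduced to a record holding a photograph path and a tuple of known ancestry proportions, and that "corresponding" means every proportion agrees within a tolerance. The record type `PhotoRecord`, the function `find_matching_photos`, and the tolerance value are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PhotoRecord:
    """One database file: a digital photograph plus the person's known ancestry."""
    photo_path: str
    ancestry: tuple  # proportions per ancestral group, in a fixed group order


def find_matching_photos(records, query_ancestry, tolerance=0.05):
    """Return the records whose ancestry matches the query within `tolerance`.

    ASSUMPTION: two ancestries correspond when each group's proportion
    differs by no more than `tolerance`.
    """
    matches = []
    for record in records:
        if len(record.ancestry) != len(query_ancestry):
            continue  # skip records that use a different set of groups
        if all(abs(a - q) <= tolerance for a, q in zip(record.ancestry, query_ancestry)):
            matches.append(record)
    return matches
```

A linear scan is enough for a sketch; a real database would more likely index the proportions so that matching records can be retrieved without reading every file.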

A product comprising at least one photograph of a person with a known proportional ancestry corresponding to a population structure containing nucleotide occurrences of AIM representing biogeographic ancestry (BGA).

60. The product of claim 59, which is contained in a file.

60. A plurality of files comprising the product of claim 59, wherein the files of the plurality comprise at least one photograph of a person with a known proportional ancestry corresponding to a population structure that includes AIM nucleotide occurrences indicative of BGA.

61. The file of claim 60, which comprises a plurality of photographs, including photographs of persons with known proportional ancestry corresponding to a population structure that includes AIM nucleotide occurrences indicative of BGA.

64. The file of claim 62, wherein the photos of the plurality include photos of different people having the same known proportional ancestry.

64. The file of claim 62, wherein the photos of the plurality include photos of different people with different known proportional ancestry.

60. The product of claim 59, wherein the at least one photo comprises a digital photo.

68. The product of claim 65, wherein the digital photograph includes digital information.

68. The product of claim 66, wherein the digital information is included in the database.

68. The product of claim 65, comprising a plurality of digital photographs.

66. The plurality of products of claim 65, comprising at least two digital photographs.

70. The plurality of claim 69, wherein the digital photograph includes digital information.

72. The plurality of claim 70, wherein the digital information is included in the database.

A kit comprising a plurality of hybridizing oligonucleotides comprising at least 15 contiguous nucleotides of at least 5 polynucleotides shown in SEQ ID NOs: 1-331, or polynucleotides complementary thereto.

75. The kit of claim 72, wherein the hybridizing oligonucleotide comprises at least 15 contiguous nucleotides of at least 5 polynucleotides shown in SEQ ID NOs: 1-71, or polynucleotides complementary thereto.

75. The kit of claim 72, wherein the hybridizing oligonucleotide of the plurality comprises at least one nucleotide corresponding to a polymorphic position of the polynucleotide or a polynucleotide complementary thereto.

75. The kit of claim 74, wherein the hybridizing oligonucleotides of the plurality comprise nucleotide position 50 of the polynucleotide shown in any of SEQ ID NOs: 1-34, 36-49, 52-55, 57-98, 100-105, 107-162, or 164-331, or of a polynucleotide complementary thereto.
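To make the sequence limitation concrete, the sketch below extracts a window of at least 15 contiguous nucleotides that spans position 50 of a sequence, together with its complementary strand. It assumes position 50 is counted from 1 and that the sequence contains only A, C, G, and T. The function names `probe_window` and `reverse_complement` are hypothetical, and no actual SEQ ID NO sequence is reproduced.

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def probe_window(sequence, snp_position=50, length=15):
    """Return `length` contiguous nucleotides of `sequence` that include the SNP.

    snp_position is 1-based (ASSUMPTION); the window is centred on the SNP
    where possible and shifted inward near the ends of the sequence.
    """
    if length < 15:
        raise ValueError("the claims recite at least 15 contiguous nucleotides")
    if length > len(sequence):
        raise ValueError("window is longer than the sequence")
    index = snp_position - 1
    if not 0 <= index < len(sequence):
        raise ValueError("SNP position lies outside the sequence")
    start = index - length // 2
    start = max(0, min(start, len(sequence) - length))
    return sequence[start:start + length]


def reverse_complement(sequence):
    """Return the complementary strand, written 5' to 3'."""
    return sequence.translate(COMPLEMENT)[::-1]
```

Probe design in practice also weighs melting temperature and secondary structure, which this sketch ignores.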

73. The kit of claim 72, wherein the hybridizing oligonucleotide of the plurality comprises at least one probe, at least one primer, or a combination thereof.

77. The kit of claim 76, comprising at least one amplification primer.

77. The kit of claim 76, comprising at least one amplification primer pair comprising a forward primer and a reverse primer.

79. The kit of claim 78, further comprising a reagent for performing an amplification reaction using at least one amplification primer pair.

73. The kit of claim 72, further comprising at least one ancestry informative marker (AIM) corresponding to a hybridizing oligonucleotide of the plurality.

75. The kit of claim 72, further comprising a detectable label that can be bound to or incorporated into at least one hybridizing oligonucleotide of the plurality.

73. The kit of claim 72, wherein the plurality of hybridizing oligonucleotides are detectably labeled.