WO2011107973A2

WO2011107973A2 - Method for prediction of human iris color

Info

Publication number: WO2011107973A2
Application number: PCT/IB2011/050951
Authority: WO
Inventors: Manfred Heinz Kayser; Fan Liu; Albert Hofman; Andreas Gerardus Uitterlinden
Original assignee: Erasmus University Medical Center Rotterdam; Identitas, Inc.
Priority date: 2010-03-05
Filing date: 2011-03-07
Publication date: 2011-09-09
Also published as: WO2011107973A3

Abstract

A method for predicting the iris color of a human, the method comprising: (a) obtaining a sample of the nucleic acid of the human; (b) genotyping the nucleic acid for at least one of the following polymorphisms: (i) a polymorphism which is (a) in the region between basepairs 76891593 and 77498447 on chromosome 17 according to NCBI Build 36; between basepairs 37100732 and 37761703 on chromosome 21; between basepairs 233690968 and 234296843 on chromosome 1; or between basepairs 233848903 and 234546690 on chromosome 2; and (b) is associated with variation in iris color; or (ii) rs9894429, rs7277820, rs3768056, rs2070959 or a polymorphic site which is in linkage disequilibrium with rs9894429, rs7277820, rs3768056 or rs2070959 at an r² value of at least 0.5; and (c) predicting the iris color based on the results of step (b). Also provided are a method of genotyping said polymorphisms, and kits comprising, or a solid substrate having attached thereto, nucleic acid molecules suitable for performing the method.
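Part (i) of step (b) depends first on whether a polymorphism lies within one of four NCBI Build 36 intervals. As a minimal sketch, assuming chromosome names are given as strings and that the stated bounds are inclusive (the text says only "between"), such a positional check might look like this. Region membership is only the first condition; the polymorphism must also be associated with variation in iris color.

```python
# Candidate regions from step (b)(i), NCBI Build 36 coordinates in bp.
# Treating the bounds as inclusive is an assumption; the text says "between".
CANDIDATE_REGIONS = {
    "17": (76891593, 77498447),
    "21": (37100732, 37761703),
    "1": (233690968, 234296843),
    "2": (233848903, 234546690),
}


def in_candidate_region(chromosome: str, position: int) -> bool:
    """Return True if a Build 36 position falls inside one of the four regions."""
    bounds = CANDIDATE_REGIONS.get(chromosome)
    if bounds is None:
        return False
    start, end = bounds
    return start <= position <= end


if __name__ == "__main__":
    # rs17863787 (Table 1, UGT1A6 group; UGT1A6 lies on chromosome 2).
    print(in_candidate_region("2", 234275833))  # True
    # rs35407 (Table 1, SLC45A2 group; SLC45A2 lies on chromosome 5).
    print(in_candidate_region("5", 33982328))  # False
```

The check deliberately returns False for any chromosome outside the four listed, so polymorphisms elsewhere fall back on route (ii), i.e. the named SNPs and their LD proxies.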

Description

METHOD FOR PREDICTION OF HUMAN IRIS COLOR

The present invention relates to a method for prediction of the phenotype of a complex polygenic trait. In particular, it relates to a method for prediction of human iris color.

Predicting externally visible characteristics (EVCs) from informative molecular markers, such as DNA markers, has become a rapidly developing area of forensic genetics. Knowledge gleaned from this type of data could serve as a 'biological witness' tool in suitable forensic cases, leading to a new era of 'DNA intelligence' (sometimes referred to as Forensic DNA Phenotyping), in which the externally visible traits of an individual may be determined solely from a biological sample left at a crime scene or from a dismembered body part of a missing person. Human eye (iris) color is a highly polymorphic phenotype in people of European descent and, albeit less so, in those from surrounding regions such as the Middle East or Western Asia, and is under strong genetic control (R. A. Sturm, T. N. Frudakis, Trends Genet. 20 (2004) 327-332). Most human populations around the world show uniformly dark brown iris color, whereas blue, green, gray and light brown colors are additionally found in people of European descent and in people originating from Europe-neighbouring regions. Thus, DNA-based prediction of iris color may be useful in identifying persons of European or Europe-neighbouring descent, or persons residing in an area populated by persons of European descent.

Currently, human identification using nucleic acid markers is based entirely on comparing marker profiles (DNA fingerprints, DNA profiles) obtained from crime scene samples with those obtained from known suspects. If no suspect (or close relative thereof) is known to the police, no reference profile can be obtained for comparison with the one collected from the crime scene. Consequently, in such cases the person who left the sample at the crime scene, and who might have committed the crime, cannot be identified using genetic (DNA) evidence. Similarly, missing persons are currently identified by comparing a DNA profile obtained from their remains with that obtained from a known relative. If nothing is known about the missing person, no relatives can be identified for genetic testing and no DNA profile is available for comparison. Nucleic acid markers that reliably predict eye (iris) color would help to find unknown persons (suspects or missing persons) directly, without comparing DNA profiles. Recent years have seen intensive studies aimed at increasing the genetic understanding of human eye color, via genome-wide association and linkage analysis or candidate gene studies (Sulem et al, Nat. Genet. 39 (2007) 1443-1452; Eiberg et al, Hum. Genet. 123 (2008) 177-187; Kayser et al, Am. J. Hum. Genet. 82 (2008) 411-423; Sturm et al, Am. J. Hum. Genet. 82 (2008) 424-431; Han et al, PLoS Genet. 4 (2008) e1000074; Sulem et al, Nat. Genet. 40 (2008) 835-837; Kanetsky et al, Am. J. Hum. Genet. 70 (2002) 770-775; Duffy et al, Am. J. Hum. Genet. 80 (2007) 241-252; Zhu et al, Twin Res. 7 (2004) 197-210; Posthuma et al, Behav. Genet. 36 (2006) 12-17; Frudakis et al, Genetics 165 (2003) 2071-2083). The OCA2 gene on chromosome 15 was originally thought to be the most informative human eye color gene because it encodes the human P protein, which is required for the processing of melanosomal proteins, and mutations in this gene do result in pigmentation disorders.
However, recent studies have shown that genetic variants in the neighbouring HERC2 gene are more significantly associated with eye color variation than those in OCA2 (Sulem et al, 2007, supra; Eiberg et al, supra; Kayser et al, supra; Sturm et al, supra; Han et al, supra). Also, one of the most significant non-synonymous SNPs associated with eye color, rs1800407, located in exon 12 of the OCA2 gene, acts as a penetrance modifier of rs12913832 in HERC2 and is only to a lesser extent independently associated with eye color variation (Sturm et al, supra). While the HERC2/OCA2 region harbours most of the information on blue and brown eye color, other genes, such as SLC24A4, SLC45A2 (MATP), TYRP1, TYR, ASIP, IRF4, CYP1A2, CYP2C8 and CYP2C9, were also identified as contributing to eye color variation, although to a much lesser degree (Sulem et al 2007, supra; Han et al, supra; Sulem et al 2008, supra; Kanetsky et al, supra; Frudakis et al, supra; WO 2002/097047). Despite this abundance of information on the association of various polymorphisms with human iris color variation, there have been few attempts to predict the iris color of an individual from their genotype. Sulem et al, 2007, supra, attempted to predict iris color using polymorphisms within various genes and concluded that, in their study, prediction of blue versus brown iris color is dominated by variants in OCA2. However, in WO 2009/025544 (Kayser et al; Erasmus University Medical Center Rotterdam) and the corresponding publication Kayser et al. 2008, supra, various SNPs within the HERC2 gene were found to be more useful than variants within OCA2 for prediction of iris color. The inventors' earlier publication, Liu F, et al (2009) Curr Biol 19: R192-193, ranks currently known SNPs by their contribution to a model for prediction of iris color. All of these prior attempts to predict iris color rely on categorised prediction, i.e. prediction of blue versus non-blue, brown versus non-brown, or blue, brown and intermediate. This approach neglects the fact that iris color is a continuous trait ranging from the brightest blue to the darkest brown, with all in-between shades being possible. It would be advantageous to be able to predict the subtleties of iris color using a continuous, quantitative prediction model. Valenzuela et al (2010) J Forensic Sci 55: 315-322 assayed polymorphisms in 24 candidate genes and concluded that SNPs in HERC2, SLC45A2 and SLC24A5 explain 76% of the variation in eye color in their study population. Valenzuela used a categorical approach to eye color classification, with six color categories. The study population mirrored the ethnic composition of the student population at the University of Arizona.

The present inventors have observed that the A allele of the SNP rs1426654 in the SLC24A5 gene, used in Valenzuela's method, has a fixed frequency of 1.0 in a population of European descent (HapMap CEU samples). For a SNP to have genuine predictive power for eye color, its allele frequency must not be fixed in populations of European descent, because such populations exhibit eye color variation. Further, the A allele has a frequency close to 0 in Han Chinese in Beijing (HapMap-CHB), Japanese in Tokyo (HapMap-JPT) and Yoruba in Ibadan, Nigeria (HapMap-YRI), and a frequency of 0.33 in Maasai in Kinyawa, Kenya (HapMap-MKK) and 0.59 in Mexican ancestry in Los Angeles, California (HapMap-MEX). Based on general knowledge, it is believed that no categorical eye color variation occurs in people of these HapMap populations. It follows that the SLC24A5 SNP cannot be predictive of eye color variation in people of European descent and was identified in Valenzuela's study only as a marker associated with ethnicity. In addition, in Valenzuela et al, 2010, supra, the prediction accuracy (R²) was evaluated using the same individuals from whom the prediction model was derived, and there is no evidence that the model would be effective in predicting the eye color of individuals from independent data sources. Thus, the SLC24A5 SNP is not a genuine eye color marker. As described in Example 1, the present inventors included the SNP rs16891982 in another gene, SLC45A2 (also used in Valenzuela's method), in their analyses and found that it contributed only minimally to quantitative eye color prediction unless its interaction with a SNP in OCA2 was taken into account. Valenzuela did not consider SNP interactions. Of the three SNPs used in Valenzuela's method, only the HERC2 SNP would actually have been capable of contributing more than minimally to eye color prediction in that method, as applied to persons of European descent.
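The argument above rests on a simple check: a marker whose allele frequency is fixed (1.0 or 0.0) in a reference population cannot distinguish individuals within that population. A minimal Python sketch, assuming biallelic genotypes written as two-letter strings such as "AG" and a hypothetical minor-allele-frequency cut-off of 0.01 (neither convention comes from the text), is:

```python
from collections import Counter


def allele_frequency(genotypes: list[str], allele: str) -> float:
    """Frequency of `allele` among all alleles in two-letter genotypes like "AG"."""
    counts = Counter(base for genotype in genotypes for base in genotype)
    total = sum(counts.values())
    return counts[allele] / total if total else 0.0


def is_polymorphic(genotypes: list[str], allele: str, min_maf: float = 0.01) -> bool:
    """True if the minor allele frequency reaches `min_maf` (hypothetical cut-off).

    Assumes a biallelic site, so the other allele has frequency 1 - freq.
    """
    freq = allele_frequency(genotypes, allele)
    return min(freq, 1.0 - freq) >= min_maf


if __name__ == "__main__":
    fixed = ["AA"] * 10                    # allele A fixed at frequency 1.0
    variable = ["AA", "AG", "GG", "AG"]    # allele A at frequency 0.5
    print(allele_frequency(fixed, "A"), is_polymorphic(fixed, "A"))
    print(allele_frequency(variable, "A"), is_polymorphic(variable, "A"))
```

A sample set in which every individual carries "AA" mirrors the situation described for rs1426654 in the HapMap CEU samples: the function reports the site as non-polymorphic, so it carries no within-population information.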

Identifying the most useful polymorphisms for prediction is not simply a matter of using the polymorphisms which are most strongly associated with iris color variation, even if the study population consists only of people of European or Europe-neighbouring descent. The P-values derived from association testing do not provide sufficient information on the prediction accuracy of the SNPs involved. Further, previous genetic association analyses were mostly based on iteratively testing the association between a single SNP and eye color. This does not consider combinations of associated SNPs, which is important when SNPs are not independent of each other, e.g. because they are in linkage disequilibrium or interact genetically. Rather, identifying the most useful polymorphisms for prediction requires analysis of a combination of informative SNPs and application of a dedicated prediction methodology.
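To illustrate the difference between single-SNP association testing and a combined prediction model, the following Python sketch fits an additive model for two SNPs plus their product term (one simple way of encoding a SNP-SNP interaction) by ordinary least squares, and provides an R² function meant to be applied to individuals not used for fitting. The allele-count encoding, the interaction term and the pure-Python solver are illustrative assumptions, not the inventors' actual prediction methodology.

```python
def design_row(a: int, b: int) -> list[float]:
    """Intercept, allele counts of two SNPs (0-2) and their product (interaction)."""
    return [1.0, float(a), float(b), float(a * b)]


def solve(matrix: list[list[float]], rhs: list[float]) -> list[float]:
    """Solve a square linear system by Gauss-Jordan elimination with pivoting."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("design matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                for c in range(col, n + 1):
                    aug[r][c] -= factor * aug[col][c]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def fit_least_squares(rows: list[list[float]], y: list[float]) -> list[float]:
    """Ordinary least squares via the normal equations (X'X) beta = X'y."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * t for r, t in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)


def r_squared(y_true: list[float], y_pred: list[float]) -> float:
    """Coefficient of determination; intended for individuals NOT used in fitting."""
    mean = sum(y_true) / len(y_true)
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    return 1.0 - ss_res / ss_tot


if __name__ == "__main__":
    # Hypothetical training genotypes (allele counts) and phenotype values.
    train = [(a, b) for a in range(3) for b in range(3)]
    y_train = [0.2 + 0.3 * a - 0.1 * b + 0.05 * a * b for a, b in train]
    beta = fit_least_squares([design_row(a, b) for a, b in train], y_train)
    # Evaluate on separate, equally hypothetical individuals.
    test = [(0, 2), (2, 2), (1, 0)]
    y_test = [0.25, 0.9, 0.45]
    preds = [sum(w * x for w, x in zip(beta, design_row(a, b))) for a, b in test]
    print(round(r_squared(y_test, preds), 3))
```

Keeping the evaluation set separate from the fitting set is the point raised above about Valenzuela et al: R² computed on the training individuals tends to overstate how well a model generalises.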

Neither is it practical to generate a prediction model using all known polymorphisms, as this would require large numbers of polymorphisms to be genotyped every time the model is applied, which would be costly and laborious.

There is therefore a need for a more accurate and yet simple genetic test for prediction of iris color, and for the identification of further polymorphisms which are predictive of iris color. The listing or discussion of an apparently prior-published document in this specification should not necessarily be taken as an acknowledgement that the document is part of the state of the art or is common general knowledge.

FIGURE LEGENDS

Figure 1. The Hue-Saturation (H-S) eye color space in the Rotterdam Study.

(a-c) 3 color categories defined by an ophthalmologist during eye examination and highlighted in the H-S space; (d-h) 5 categories defined by two researchers from digital full-eye photographs used for digital quantitative extraction of eye colors; (i) 4 quartiles of the 1st principal component, CHS1; and (j) 4 quartiles of the 2nd principal component, CHS2; all depicted in the H-S color space. In (e), the quartiles are depicted from high H and low S to low H and high S as follows: CHS1 <25% as open diamonds; CHS1 25-50% as closed triangles; CHS1 50-75% as crosses; and CHS1 >75% as closed circles. In (f), the quartiles are depicted from low H and low S to high H and high S as follows: CHS2 >75% as closed circles; CHS2 50-75% as crosses; CHS2 25-50% as closed triangles; and CHS2 <25% as open diamonds.

Figure 2. Genotype quality control.

(a) Genotypes from 120 HapMap Phase 2 subjects were merged with the RS3 samples. Quality control of the RS1 and RS2 samples has been described in detail previously. The first 2 principal components derived from multidimensional decomposition analysis of the 1 - IBS (identity-by-state) matrix are depicted. Filled circles represent the HapMap European (CEU), East Asian (CHB+JPT) and West African (YRI) samples, and arrows indicate the positions of these clusters. RS3 samples mostly cluster around CEU; the open circles that do not cluster around CEU are also RS3 samples. In total, 112 RS3 samples lying outside 4 standard deviations of the principal component of the CEU samples were removed.
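The sample-exclusion step can be sketched as follows, assuming each sample is represented by its coordinates on the first principal components and that a sample is removed when it lies more than four standard deviations from the CEU mean on any component. The legend does not specify exactly how the threshold was applied, so this per-component rule, and the function name, are assumptions.

```python
import statistics


def outside_reference(samples: dict[str, tuple[float, ...]],
                      reference: list[tuple[float, ...]],
                      k: float = 4.0) -> list[str]:
    """IDs of samples more than k SDs from the reference mean on any component.

    `samples` maps a sample ID to its principal-component coordinates;
    `reference` holds the coordinates of the reference cluster (e.g. CEU).
    """
    n_pcs = len(reference[0])
    means = [statistics.mean([r[i] for r in reference]) for i in range(n_pcs)]
    sds = [statistics.stdev([r[i] for r in reference]) for i in range(n_pcs)]
    return [
        sample_id
        for sample_id, coords in samples.items()
        if any(abs(coords[i] - means[i]) > k * sds[i] for i in range(n_pcs))
    ]


if __name__ == "__main__":
    ceu = [(0.0, 0.0), (1.0, 1.0), (-1.0, -1.0), (0.5, -0.5), (-0.5, 0.5)]
    cohort = {"sample_1": (0.2, 0.1), "sample_2": (5.0, 0.0)}
    print(outside_reference(cohort, ceu))  # ['sample_2']
```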

Figure 3. Observed and expected P values for eye color in the Rotterdam Study.

Observed -log10 P values from a GWA analysis of CHS1 in the merged RS123 dataset are ranked on the y-axis and plotted against their expected distribution under the null hypothesis on the x-axis. All P values smaller than 10^-10 were truncated at 10 on the -log10 scale. Red dots are the P values excluding the effects of sex, age and population stratification. Blue dots are the P values excluding the effects of 7 genes previously known to be involved in eye color. Green dots are the P values additionally excluding the effects of the 3 newly identified loci; no further SNPs were significant at the genome-wide level.
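The QQ plot pairs ranked observed -log10 P values with their expectation under the null. A minimal Python sketch of that computation, including the truncation at 10 described above, is given below; it assumes the common convention of expected quantiles i/(n + 1), which the legend does not state.

```python
import math


def qq_points(p_values: list[float], cap: float = 10.0) -> list[tuple[float, float]]:
    """(expected, observed) -log10 P pairs for a QQ plot, largest first.

    Observed values are truncated at `cap` (10 corresponds to P < 1e-10).
    Expected values use the uniform null quantiles i / (n + 1), a common
    convention that is assumed here rather than stated in the legend.
    """
    n = len(p_values)
    observed = sorted(
        (min(-math.log10(p), cap) if p > 0 else cap for p in p_values),
        reverse=True,
    )
    expected = [-math.log10(i / (n + 1)) for i in range(1, n + 1)]
    return list(zip(expected, observed))


if __name__ == "__main__":
    for exp, obs in qq_points([1e-12, 0.5, 0.1]):
        print(f"{exp:.3f}\t{obs:.3f}")
```

Points lying on the diagonal indicate agreement with the null; the strongest association signals appear as the points furthest above it.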

Figure 4 and Figure 5. Three new regions associated with quantitative eye color.

Regional association plots for the 300 kb surrounding the four newly identified eye color loci are shown in Figure 4. The statistical significance of associated SNPs at each locus is shown on the -log(P) scale as a function of chromosomal position. P values were derived for 6 eye color traits (see figure key, in which diamonds represent 3-categorical color, squares represent CHS1, triangles represent CHS2, dark circles represent 5-categorical color, crosses represent hue and light circles represent saturation). Genes in each region and LD patterns according to HapMap version 21a CEU samples are aligned in Figure 5. The chromosome 1 233.85-234.25 Mb region includes the LYST gene, where SNPs rs3768056 and rs9782955 showed genome-wide significant association with saturation only (a). The chromosome 17 77.05-77.35 Mb region includes multiple small genes; SNPs rs7219915, rs9894429 and rs12452184 showed genome-wide significant association with multiple traits (b). The chromosome 21 37.30-37.65 Mb region includes the DSCR6, PIGP, TTC3, DSCR9 and DSCR3 genes; SNPs rs1003719, rs2252893, rs2835621, rs2835630 and rs7277820 showed genome-wide significant association with CHS1 (c). The chromosome 2 234.10-234.45 Mb region includes multiple genes; SNPs rs2070959 and rs1105879 showed association with CHS2.

Figure 6. Significant SNP interactions on eye color

SNPs having a significant interaction effect on eye color are depicted using box-and-whisker diagrams. Distributions of hue (H) and saturation (S) are grouped by the combined genotypes of 2 interacting SNPs. Distribution summaries include the min-max range (black dotted vertical line), the interquartile range (blue box) and the median (red line). Observations lying more than 1.5 times the interquartile range beyond the box are indicated by red plus signs.
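The outlier rule in this legend is the conventional box-plot criterion of 1.5 times the interquartile range. A short Python sketch follows; note that quartile definitions differ between software packages, so the `statistics.quantiles` default ("exclusive" method) used here may not match whatever plotting tool produced Figure 6.

```python
import statistics


def tukey_outliers(values: list[float], whisker: float = 1.5) -> list[float]:
    """Values lying more than `whisker` x IQR below Q1 or above Q3."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - whisker * iqr, q3 + whisker * iqr
    return [v for v in values if v < low or v > high]


if __name__ == "__main__":
    hue = [1, 2, 3, 4, 5, 6, 7, 8, 100]  # hypothetical values
    print(tukey_outliers(hue))  # [100]
```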

A first aspect of the invention provides a method for predicting the iris color of a human, the method comprising:

(a) obtaining a sample of the nucleic acid of the human;

(b) genotyping the nucleic acid for the single nucleotide polymorphism (SNP) rs12913832 or a polymorphic site which is in linkage disequilibrium with rs12913832 at an r² value of at least 0.9; and

genotyping the nucleic acid for at least one of the following polymorphisms:

(i) a SNP selected from the group consisting of rs1800407, rs12896399, rs12203592, rs1325127, rs1393350, rs728405, rs1129038, and a polymorphic site which is in linkage disequilibrium with one of said SNPs at an r² value of at least 0.5;

(ii) a polymorphism which is (a) in the region between basepairs 76891593 and 77498447 on chromosome 17 according to NCBI Build 36 and is associated with variation in iris color; and/or is (b) rs9894429 or a polymorphic site which is in linkage disequilibrium with rs9894429 at an r² value of at least 0.5;

(iii) a polymorphism which is (a) in the region between basepairs 37100732 and 37761703 on chromosome 21 according to NCBI Build 36 and is associated with variation in iris color; and/or is (b) rs7277820 or a polymorphic site which is in linkage disequilibrium with rs7277820 at an r² value of at least 0.5; and

(iv) a polymorphism which is (a) in the region between basepairs 233690968 and 234296843 on chromosome 1 according to NCBI Build 36 and is associated with variation in iris color; and/or is (b) rs3768056 or a polymorphic site which is in linkage disequilibrium with rs3768056 at an r² value of at least 0.5; and

(v) a polymorphism which is (a) in the region between basepairs 233848903 and 234546690 on chromosome 2 according to NCBI Build 36 and is associated with variation in iris color; and/or is (b) rs2070959 or a polymorphic site which is in linkage disequilibrium with rs2070959 at an r² value of at least 0.5; and

(c) predicting at least one quantitative color parameter of the iris as a numeric variable based on the results of step (b), and thereby predicting the iris color.
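Step (c) predicts a quantitative color parameter as a numeric variable. One common way to do this, assumed here purely for illustration, is an additive linear model in which each genotyped SNP contributes its effect-allele count multiplied by a coefficient. The intercept, the coefficients and the effect allele of the second SNP below are hypothetical placeholders, not values disclosed in this document; only the blue association of the rs12913832 G allele comes from the text.

```python
def allele_dosage(genotype: str, effect_allele: str) -> int:
    """Copies (0, 1 or 2) of the effect allele in a two-letter genotype like "AG"."""
    return genotype.count(effect_allele)


def predict_parameter(genotypes: dict[str, str], model: dict) -> float:
    """Additive linear prediction of one quantitative iris color parameter."""
    value = model["intercept"]
    for snp, (effect_allele, beta) in model["effects"].items():
        value += beta * allele_dosage(genotypes[snp], effect_allele)
    return value


# HYPOTHETICAL intercept, effect alleles and coefficients, for illustration only.
EXAMPLE_MODEL = {
    "intercept": 0.40,
    "effects": {
        "rs12913832": ("G", -0.15),  # G is the blue-associated allele per the text
        "rs1800407": ("A", 0.02),    # effect allele and size are placeholders
    },
}

if __name__ == "__main__":
    person = {"rs12913832": "GG", "rs1800407": "AG"}
    print(round(predict_parameter(person, EXAMPLE_MODEL), 3))  # 0.12
```

Because the output is a continuous number rather than a category, the same structure can be fitted separately for each quantitative parameter (for example hue, saturation or a derived component such as CHS1).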

The sample of nucleic acid from the human may be any suitable sample and includes genomic DNA, RNA and cDNA. Genomic DNA is preferred because most SNPs are in non-translated regions, but for the avoidance of doubt and where the context permits, the term "sample" also includes mRNA and cDNA derived from other nucleic acid in the sample. The nucleic acid may be isolated from any raw sample material, optionally reverse transcribed into cDNA, and directly cloned and/or sequenced. DNA and RNA isolation kits are commercially available from, for instance, QIAGEN GmbH, Hilden, Germany, or Roche Diagnostics, a division of F. Hoffmann-La Roche Ltd, Basel, Switzerland.

A sample useful for practicing a method of the invention can be any biological sample of a subject that contains nucleic acid molecules, including portions of the gene sequences to be examined. As such, the sample can be a cell, tissue or organ sample, or can be a sample of a biological fluid such as semen, saliva, blood, and the like.

In a forensic application of a method of the invention, the human nucleic acid sample can be obtained from a crime scene using well-established sampling methods. Thus, the sample can be a fluid sample or a swab sample, for example a blood stain, semen stain, hair follicle or other biological specimen taken from a crime scene; a soil sample suspected of containing biological material of a potential crime victim or perpetrator; material retrieved from under the fingernails of a putative crime victim; or the like. Another application of the invention is in identifying missing persons (such as deceased persons or parts thereof, but potentially also missing persons who are unable or unwilling for whatever reason to disclose their identity) by analysing the markers identified herein in nucleic acid from samples of the unknown person. A suitable sample may be obtained from a cell, tissue or organ sample, including bone material, or may be a biological fluid.

Another suitable application of the method is in preimplantation or prenatal diagnostics, in which case the sample would be extracted from cellular material of the embryo or fetus.

The human from whom the nucleic acid sample is obtained can be of any race. As such, the human can belong to any group of people classified together on the basis of common history, nationality or geographic distribution. For example, the subject can be of African, Asian (such as West Asian), Australasian, European, Middle Eastern, North American or South American descent. In certain embodiments the human is Asian, Hispanic, African or Caucasian. In one embodiment the human is Caucasian. In one embodiment the human is of European, West Asian or Middle Eastern descent, as iris color variation is generally confined to such persons. The race of the human subject may often not be known. The term "of European descent" means an individual who is a descendant of an individual who was born in a European country or territory in the 11th through 20th centuries, typically in the 15th through 18th centuries. Typically, at least 10%, at least 15%, at least 20%, at least 25%, 30%, 40%, 50%, 60%, 70%, 75%, 80%, 90% or 95%, and up to 100%, of the genetic material of a person of European descent is derived from ancestors who were born in one or more European countries or territories. The terms "of West Asian descent" and "of Middle Eastern descent" are to be understood accordingly.

European countries include the following: Albania, Andorra, Armenia, Austria, Azerbaijan, Belarus, Belgium, Bosnia and Herzegovina, Bulgaria, Croatia, Cyprus, Czech Republic, Denmark, Estonia, Finland, France, Georgia, Germany, Greece, Hungary, Iceland, Ireland, Italy, Kazakhstan, Latvia, Liechtenstein, Lithuania, Luxembourg, Macedonia, Malta, Moldova, Monaco, Montenegro, The Netherlands, Norway, Poland, Portugal, Romania, Russia, San Marino, Serbia, Slovakia, Slovenia, Spain, Sweden, Switzerland, Turkey, Ukraine, United Kingdom and Vatican City. European territories include the following: Aland, Akrotiri and Dhekelia, Faroe Islands, Gibraltar, Guernsey, Isle of Man, Jersey, Abkhazia, Kosovo, Northern Cyprus and South Ossetia. Middle Eastern countries include the following: Turkey, Bahrain, Kuwait, Oman, Qatar, Saudi Arabia, United Arab Emirates, Yemen, Gaza Strip, Iraq, Israel, Jordan, Lebanon, Syria, West Bank, Iran, Cyprus and Egypt. West Asian countries include the following: Armenia, Azerbaijan, Bahrain, Cyprus, Georgia, Iraq, Israel, Jordan, Kuwait, Lebanon, Oman, Pakistan, Palestine, Qatar, Saudi Arabia, Syria, Turkey, United Arab Emirates and Yemen.

The method comprises genotyping the nucleic acid for the single nucleotide polymorphism (SNP) rs12913832 or a polymorphic site which is in linkage disequilibrium with rs12913832 at an r² value of at least 0.9. The SNP rs12913832 is in the HERC2 gene on chromosome 15, and the allele may be either A with reference to the positive DNA strand (or, when considering the complementary DNA strand, T) or G (or, when considering the complementary DNA strand, C). The G allele has been associated with blue iris color (Eiberg et al 2008 Hum Genet 123: 177-187). The inventors have found that this marker is the most useful marker for prediction of quantitative color parameters of the human iris. Individuals may be either homozygous or heterozygous for a given allele of this SNP.
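Because the rs12913832 alleles may be reported on either DNA strand, a genotype call has to be put on a common strand before it is counted. A minimal Python sketch, assuming two-letter genotype calls and a strand flag of "+" or "-" (both conventions are assumptions, not part of the text), counts copies of the positive-strand G allele:

```python
# Complement table for converting minus-strand calls to the positive strand.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def g_allele_count(genotype: str, strand: str = "+") -> int:
    """Copies of the rs12913832 G allele (positive strand) in a genotype call.

    A call reported on the complementary strand ("-") is complemented first,
    so "CC" on the minus strand corresponds to "GG" on the positive strand.
    """
    if strand not in ("+", "-"):
        raise ValueError(f"strand must be '+' or '-', got {strand!r}")
    if strand == "-":
        genotype = genotype.translate(COMPLEMENT)
    if len(genotype) != 2 or any(base not in "AG" for base in genotype):
        raise ValueError(f"unexpected rs12913832 genotype: {genotype!r}")
    return genotype.count("G")


if __name__ == "__main__":
    print(g_allele_count("AG"))       # 1 (heterozygous)
    print(g_allele_count("CC", "-"))  # 2 (homozygous G on the positive strand)
```

Rejecting anything other than A or G after strand conversion catches calls that were made on the wrong strand or that belong to a different SNP.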

The method of the first aspect of the invention comprises genotyping the nucleic acid for at least one further polymorphism, suitably at least one of the following polymorphisms: rs12203592, rs1325127, rs1393350, rs12896399, rs728405, rs1800407, rs1129038, rs9894429, rs7277820, rs3768056 or rs2070959. Optionally, rs16891982 may also be genotyped. Details of these SNPs, including the identity of the minor allele, the chromosomal location and the gene in which they are present, are shown in Table 2 with reference to the positive DNA strand. Individuals may be either homozygous or heterozygous for a given allele of any of these SNPs. Instead of genotyping at least one SNP selected from the group consisting of rs16891982, rs12203592, rs1325127, rs1393350, rs12896399, rs728405, rs1800407, rs1129038, rs12913832, rs9894429, rs7277820, rs3768056 and rs2070959, the method may comprise genotyping a polymorphic site which is in linkage disequilibrium with one of said SNPs at an r² value of at least 0.5.

Typically, the polymorphic sites are SNPs; however, they may also be insertions, deletions, microsatellites or inversions, or a combination of these. The polymorphic sites disclosed herein may or may not be causative. Polymorphic sites which are in linkage disequilibrium with any of rs12913832, rs16891982, rs12203592, rs1325127, rs1393350, rs12896399, rs728405, rs1800407, rs1129038, rs9894429, rs7277820, rs3768056 and rs2070959 may be used as proxy markers. Two loci are in linkage disequilibrium (LD) when the degree of recombination between them within a population is low; in other words, particular alleles at the two loci tend to be inherited together. In that case, the presence of an allele at one locus may be predictive of the presence of a particular allele at the other locus, such that one can be used as a proxy for the other. The degree of LD between two markers is typically indicated by the parameter r², with an r² value of 1 indicating complete LD and an r² value of 0 indicating complete independence. The extent of LD between markers can vary between populations. As iris color variation is most prevalent among Europeans, a European population or a population of European descent is the most relevant population for determining LD in this case. Unless otherwise stated herein, r² values are given for European populations or populations of European descent.
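The r² parameter can be computed from haplotype and allele frequencies using the standard definition D = p_AB - p_A p_B and r² = D² / (p_A (1 - p_A) p_B (1 - p_B)). The Python sketch below assumes the haplotype frequency has already been estimated (for example from phased genotype data); how the values in this document were estimated is not stated here.

```python
def ld_r_squared(p_ab: float, p_a: float, p_b: float) -> float:
    """Linkage disequilibrium r^2 between two biallelic loci.

    p_ab is the frequency of the haplotype carrying allele A at the first locus
    and allele B at the second; p_a and p_b are the allele frequencies.
    D = p_ab - p_a * p_b and r^2 = D^2 / (p_a (1 - p_a) p_b (1 - p_b)).
    """
    denominator = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denominator == 0:
        raise ValueError("r^2 is undefined when either locus is monomorphic")
    d = p_ab - p_a * p_b
    return d * d / denominator


if __name__ == "__main__":
    print(round(ld_r_squared(0.3, 0.3, 0.3), 6))   # 1.0: alleles always co-inherited
    print(round(ld_r_squared(0.06, 0.2, 0.3), 6))  # 0.0: independent loci
```

The guard against a zero denominator reflects the same point made about monomorphic markers earlier: r² is not defined for a site with a fixed allele.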

If a polymorphic site which is in linkage disequilibrium with rs12913832 is to be used, it should be in high linkage disequilibrium, because rs12913832 contributes most substantially to the predictive accuracy of the method and polymorphisms in relatively low linkage disequilibrium with rs12913832 would reduce the predictive accuracy of the method. A suitable polymorphic site which may be used in place of rs12913832 is one which is in linkage disequilibrium with rs12913832 at an r² value of at least 0.9, preferably at least 0.95, more preferably at least 0.975, or at least 0.99. rs1129038 (26030454 bp on chromosome 15) is a known SNP which is in linkage disequilibrium with rs12913832 at an r² value of at least 0.9; the relevant r² value is 0.99.

rs16891982, rs12203592, rs1325127, rs1393350, rs12896399, rs728405, rs1800407, rs1129038, rs9894429, rs7277820, rs3768056 and rs2070959 contribute less to the predictive accuracy of the method than rs12913832. The method will still provide adequate predictive accuracy if a polymorphic site having a lower degree of linkage disequilibrium is used in place of one of these SNPs. A suitable polymorphic site which may be used in place of one of these SNPs is one which is in linkage disequilibrium with the SNP at an r² value of at least 0.5, suitably at least 0.6, at least 0.7, at least 0.8, at least 0.9 or at least 0.95. SNPs having the required linkage disequilibrium with each of these SNPs are listed in Table 1. SNP positions and chromosomal locations indicated throughout this document are according to NCBI Build 36.
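The two minimum r² values stated above (0.9 for rs12913832, 0.5 for the other SNPs) can be applied mechanically when choosing a proxy. A minimal Python sketch, assuming candidate proxies are supplied as (SNP, r²) pairs, for instance transcribed from Table 1, might look like this; the function and constant names are hypothetical.

```python
# Minimum r^2 for a proxy as stated in the text: at least 0.9 for rs12913832,
# at least 0.5 for the other SNPs (higher values are described as preferable).
MIN_R2 = {"rs12913832": 0.9}
DEFAULT_MIN_R2 = 0.5


def eligible_proxies(index_snp: str,
                     candidates: list[tuple[str, float]]) -> list[tuple[str, float]]:
    """Candidate proxies meeting the r^2 threshold for `index_snp`, best first."""
    threshold = MIN_R2.get(index_snp, DEFAULT_MIN_R2)
    passing = [(snp, r2) for snp, r2 in candidates if r2 >= threshold]
    return sorted(passing, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # A few rs2070959 proxies transcribed from Table 1 (HapMap CEU r^2 values).
    rs2070959_rows = [("rs2741012", 0.612), ("rs17863787", 1.0), ("rs6744284", 0.902)]
    print(eligible_proxies("rs2070959", rs2070959_rows))
    # rs1129038 has r^2 = 0.99 with rs12913832 according to the text.
    print(eligible_proxies("rs12913832", [("rs1129038", 0.99)]))
```

Sorting by r² puts the strongest proxy first, which matches the text's preference for higher LD where a choice exists.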

Table 1: SNPs having the required linkage disequilibrium with each of the selected SNPs

SNP         Position   LD (r²)  Data source
rs2208384   233956719  0.951    RS
rs2799442   233993760  0.999    RS
rs3754230   233893206  0.545    HapMapCEU
rs3768068   234060930  0.581    HapMapCEU
rs2104125   234066928  0.931    HapMapCEU
rs6668242   234068402  0.934    HapMapCEU
rs6658406   234096098  0.946    HapMapCEU
rs4659611   234099738  0.946    HapMapCEU
rs4659612   234099913  0.946    HapMapCEU
rs4659613   234099952  0.946    HapMapCEU
rs2145786   234101921  0.946    HapMapCEU
rs9782955   234106500  0.946    HapMapCEU
rs6673218   234107685  0.946    HapMapCEU
rs6673237   234107752  0.946    HapMapCEU

rs2070959 (UGT1A6)

SNP         Position   LD (r²)  Data source
rs2741012   234173702  0.612    HapMapCEU
rs2741013   234174650  0.628    HapMapCEU
rs274 019   234175427  0.628    HapMapCEU
rs2741021   234176622  0.628    HapMapCEU
rs2741022   234179012  0.628    HapMapCEU
rs2602362   234181287  0.651    HapMapCEU
rs2741027   234182750  0.628    HapMapCEU
rs2741028   234183653  0.628    HapMapCEU
rs2602363   234185264  0.628    HapMapCEU
rs2602364   234188968  0.585    HapMapCEU
rs2741029   234194899  0.628    HapMapCEU
rs2741030   234200891  0.628    HapMapCEU
rs2741034   234213553  0.628    HapMapCEU
rs2741036   234223541  0.628    HapMapCEU
rs2602373   234226692  0.628    HapMapCEU
rs2741038   234228737  0.628    HapMapCEU
rs2741042   234230656  0.628    HapMapCEU
rs2602374   234233703  0.715    HapMapCEU
rs2741044   234244107  0.85     HapMapCEU
rs2741045   234244879  0.855    HapMapCEU
rs2741046   234244988  0.855    HapMapCEU
rs2602376   234246790  0.855    HapMapCEU
rs10202865  234252446  0.95     HapMapCEU
rs10197460  234253929  0.689    HapMapCEU
rs10167119  234254051  0.689    HapMapCEU
rs4530361   234254780  0.704    HapMapCEU
rs7586110   234255266  0.689    HapMapCEU
rs7577677   234255355  0.689    HapMapCEU
rs6724485   234257555  0.582    HapMapCEU
rs4347832   234257780  0.582    HapMapCEU
rs4261716   234257856  0.582    HapMapCEU
rs13002774  234258445  0.582    HapMapCEU
rs4553819   234258822  0.582    HapMapCEU
rs11902131  234259008  0.582    HapMapCEU
rs6753320   234260354  0.582    HapMapCEU
rs6736508   234260486  0.582    HapMapCEU
rs6753569   234260556  0.582    HapMapCEU
rs6736743   234260689  0.582    HapMapCEU
rs10168155  234261575  0.582    HapMapCEU
rs10175809  234261604  0.582    HapMapCEU
rs10168333  234261727  0.582    HapMapCEU
rs10168416  234261826  0.95     HapMapCEU
rs10173355  234262060  0.95     HapMapCEU
rs11680450  234262222  0.582    HapMapCEU
rs10171367  234262406  0.582    HapMapCEU
rs10179094  234262564  0.689    HapMapCEU
rs7563561   234263730  0.582    HapMapCEU
rs7608175   234263828  0.582    HapMapCEU
rs12623271  234264680  0.582    HapMapCEU
rs10445704  234265013  0.582    HapMapCEU
rs13015720  234265738  0.582    HapMapCEU
rs6759892   234266408  0.582    HapMapCEU
rs1105880   234266704  0.825    HapMapCEU
rs1105879   234266941  0.825    HapMapCEU
rs28899170  234268969  0.95     HapMapCEU
rs6715829   234270901  0.731    HapMapCEU
rs17863787  234275833  1        HapMapCEU
rs6725478   234280139  0.72     HapMapCEU
rs7583278   234282146  0.718    HapMapCEU
rs6744284   234290036  0.902    HapMapCEU
rs1875263   234290361  0.623    HapMapCEU
rs869283    234291026  0.72     HapMapCEU
rs1983023   234301761  0.716    HapMapCEU
rs11891311  234304049  0.623    HapMapCEU
rs7564935   234309925  0.623    HapMapCEU
rs6722076   234312056  0.898    HapMapCEU
rs2018985   234313599  0.72     HapMapCEU
rs17862875  234314041  0.853    HapMapCEU
rs13009407  234317086  0.809    HapMapCEU
rs17864701  234317456  0.826    HapMapCEU
rs11695484  234319188  0.853    HapMapCEU
rs11888459  234321379  0.623    HapMapCEU
rs10178992  234322616  0.623    HapMapCEU
rs7604115   234322855  0.623    HapMapCEU
rs11673726  234328799  0.623    HapMapCEU
rs6714634   234329504  0.853    HapMapCEU
rs10929302  234330521  0.853    HapMapCEU
rs887829    234333309  0.951    HapMapCEU
rs6742078   234337378  0.951    HapMapCEU
rs4148324   234337461  0.951    HapMapCEU
rs4148325   234338048  0.951    HapMapCEU
rs929596    234339215  0.809    HapMapCEU

rs16891982 (SLC45A2)

SNP         Position   LD (r²)  Data source
rs35407     33982328   0.772    RS
rs35395     33984346   0.784    RS
rs35397     33986873   0.682    RS
rs2278007   33987308   0.889    RS
rs35389     33990637   0.896    RS
rs28777     33994716   0.896    RS
rs183671    33999967   0.896    RS
rs3797201   34003902   0.883    RS

rs1325127 (TYRP1)

rs13283649 12608337 0.699 RS rs1326798 12712227 0.592 RS rs7466934 12609840 0.701 RS rs12379260 12713112 0.592 RS rs7036899 12610266 0.702 RS rs13284453 12714280 0.538 RS rs10756386 12611004 0.702 RS rs13284898 12714560 0.591 RS rs10960723 12612878 0.653 RS rs6474717 12579068 0.529 HapMapCEU rs977888 12614357 0.702 RS rs13283146 12589561 0.57 HapMapCEU rs10809808 12614463 0.653 RS rs1408790 12592681 0.702 HapMapCEU rs10960730 12621099 0.655 RS rs10960716 12594407 0.702 HapMapCEU rs10809809 12621398 0.655 RS rs713596 12595687 0.744 HapMapCEU rs10960732 12623495 0.655 RS rs13288558 12602529 0.744 HapMapCEU rs7026116 12623981 0.655 RS rs1325117 12603472 0.671 HapMapCEU rs7047297 12628540 0.676 RS rs1325118 12609616 0.623 HapMapCEU rs10960735 12631821 0.73 RS rs1886586 12616009 0.744 HapMapCEU rs1325122 12632878 0.679 RS rs10960738 12638831 0.517 HapMapCEU rs10809811 12640996 0.73 RS rs13283345 12640198 0.545 HapMapCEU rs1408794 12641340 0.73 RS rs1408795 12641413 0.545 HapMapCEU rs995263 12644578 0.68 RS rs2733836 12682252 0.624 HapMapCEU rs1121541 12657049 0.732 RS rs2762458 12682280 0.646 HapMapCEU rs10809818 12658121 1 RS rs2762463 12691897 0.507 HapMapCEU rs10960748 12658805 0.786 RS rs2209277 12696236 0.507 HapMapCEU rs9298679 12659346 0.597 RS rs2733834 12698910 0.506 HapMapCEU SNP Position LD( DataSource SNP Position LD(r') DataSource rs10960749 12661566 0.787 RS rs1137134 12702157 0.758 HapMapCEU rs1408799 12662097 0.53 RS rs10960774 12729313 0.715 HapMapCEU rs1408800 12662275 0.53 RS rs10756405 12738234 0.715 HapMapCEU rs13294134 12663636 0.785 RS rs10756406 12738587 0.75 HapMapCEU rs10960751 12665264 0.732 RS rs927868 12738795 0.656 HapMapCEU rs10960752 12665284 0.732 RS rs927869 12738962 0.75 HapMapCEU rs10960753 12665522 0.731 RS rs4741245 12739300 0.75 HapMapCEU rs13296454 12667181 0.731 RS rs7023927 12739596 0.75 HapMapCEU rs13297008 12667471 0.731 RS rs7035500 12740095 0.745 HapMapCEU rs10809826 12672663 0.749 RS rs13302551 12740812 0.715 HapMapCEU 
rs2762460 12686478 0.643 RS rs1543587 12741741 0.75 HapMapCEU rs2762461 12686499 0.695 RS rs1074789 12742340 0.707 HapMapCEU rs2733831 12693484 0.642 RS rs1074790 12742371 0.513 HapMapCEU rs2733832 12694725 0.676 RS rs10960779 12748881 0.707 HapMapCEU rs10960758 12706315 0.609 RS rs1326789 12749838 0.634 HapMapCEU rs10960759 12706428 0.609 RS rs7025842 12750647 0.673 HapMapCEU rs12379024 12707405 0.609 RS rs7025953 12750718 0.673 HapMapCEU rs13295868 12707912 0.609 RS rs7025771 12750762 0.673 HapMapCEU rs7019226 12708370 0.592 RS rs7025914 12750884 0.673 HapMapCEU rs11789751 12709264 0.609 RS rs10491743 12750920 0.673 HapMapCEU rs10491744 12710106 0.609 RS rs1326790 12751168 0.673 HapMapCEU rs10960760 12710152 0.609 RS rs1326791 12751300 0.678 HapMapCEU rs2382361 12710786 0.609 RS rs1326792 12751360 0.673 HapMapCEU rs1409626 12710820 0.609 RS rs7030485 12751819 0.647 HapMapCEU rs1409630 12711251 0.592 RS rs10960781 12752374 0.613 HapMapCEU rs13288475 12711714 0.592 RS rs12115198 12753450 0.666 HapMapCEU rs13288636 12711806 0.592 RS rs10960783 12753809 0.583 HapMapCEU rs13288681 12711881 0.592 RS rs7855624 12763263 0.514 HapMapCEU rs10491742 12765488 0.592 HapMapCEU

rs1393350 TYR

rs10765198 88609422 0.862 RS rs1018464 88460762 0.52 HapMapCEU rs7358418 88609786 0.862 RS rs12363323 88495940 0.535 HapMapCEU rs10765200 88611332 0.862 RS rs1942486 88496430 0.52 HapMapCEU rs10765201 88611352 0.862 RS rs17792911 88502470 0.53 HapMapCEU rs4396293 88615761 0.522 RS rs10830219 88512157 0.535 HapMapCEU rs2186640 88615811 0.531 RS rs10830236 88540464 0.597 HapMapCEU rs10501698 88617012 0.797 RS rs12270717 88551838 0.872 HapMapCEU rs10830250 88617255 0.558 RS rs7129973 88555218 0.514 HapMapCEU rs7924589 88617956 0.697 RS rs11018525 88559553 0.514 HapMapCEU rs4121401 88619494 0.639 RS rs17793678 88561172 1 HapMapCEU rs1847134 88644901 0.791 RS rs10765196 88564890 1 HapMapCEU rs1827430 88658088 0.57 RS rs10765197 88564976 0.514 HapMapCEU rs3900053 88660713 0.758 RS rs7123654 88565603 0.512 HapMapCEU rs1847142 88661222 0.808 RS rs11018528 88570025 1 HapMapCEU rs4121403 88664103 0.694 RS rs12791412 88570229 0.936 HapMapCEU rs10830253 88667691 0.807 RS rs12789914 88570555 0.761 HapMapCEU rs7951935 88670047 0.619 RS rs7107143 88571135 0.827 HapMapCEU rs1847140 88676712 0.684 RS rs4512823 88606232 0.87 HapMapCEU rs1806319 88677584 0.634 RS rs4512825 88606499 0.777 HapMapCEU rs4106039 88680791 0.568 RS rs7101897 88647570 0.779 HapMapCEU rs4106040 88680802 0.608 RS rs1126809 88657609 0.827 HapMapCEU rs11018463 88459390 0.535 HapMapCEU

rs12896399 SLC24A4

rs8017054 91830169 0.651 RS rs1885194 91847215 0.992 RS rs4900109 91833144 0.985 RS rs17184180 91850140 0.992 RS rs4904866 91838256 1 RS rs4904868 91850754 0.661 RS rs746586 91845720 1 RS rs4904870 91856761 0.661 RS rs1075830 91845915 0.661 RS rs4900114 91865488 0.653 HapMapCEU rs941799 91846578 0.992 RS

rs728405 OCA2

SNP Position LD(r²) DataSource SNP Position LD(r²) DataSource
rs-1981433 37464093 0.843 RS rs2835579 37379835 0.883 HapMapCEU rs2835643 37465172 0.619 RS rs2835582 37380643 0.883 HapMapCEU rs2835644 37467480 0.843 RS rs2835584 37382208 0.883 HapMapCEU rs2835645 37468252 0.619 RS rs2835585 37382358 0.904 HapMapCEU rs2835646 37468444 0.843 RS rs2835586 37383196 0.883 HapMapCEU rs2835647 37468643 0.843 RS rs3787780 37384484 0.923 HapMapCEU rs8131944 37469085 0.843 RS rs3787782 37384779 0.883 HapMapCEU rs2835648 37469924 0.619 RS rs2835589 37385212 0.883 HapMapCEU rs2040125 37471106 0.759 RS rs2835590 37385276 0.883 HapMapCEU rs2051398 37472128 0.814 RS rs3787787 37385322 0.883 HapMapCEU rs13046303 37473846 0.759 RS rs2835594 37387934 0.883 HapMapCEU rs8129942 37473940 0.843 RS rs2835598 37388539 0.883 HapMapCEU rs2835649 37474100 0.843 RS rs2835599 37388807 0.923 HapMapCEU rs2835652 37479278 0.843 RS rs2006941 37389486 0.883 HapMapCEU rs2835653 37480506 0.843 RS rs2251161 37389727 0.883 HapMapCEU rs2251952 37481581 0.843 RS rs2835600 37390828 0.883 HapMapCEU rs6517405 37482204 0.843 RS rs2835601 37390887 0.883 HapMapCEU rs2835655 37485595 0.619 RS rs11088381 37400299 0.883 HapMapCEU rs2835657 37486934 0.876 RS rs8127148 37403162 0.764 HapMapCEU rs2835658 37487071 0.844 RS rs4816560 37409431 0.883 HapMapCEU rs2835659 37489772 0.844 RS rs2040341 37411623 0.883 HapMapCEU rs1053966 37489879 0.76 RS rs2835605 37412604 0.883 HapMapCEU rs6579 37490178 0.844 RS rs2835606 37412721 0.779 HapMapCEU rs762139 37490345 0.844 RS rs1003719 37412965 0.643 HapMapCEU rs2835660 37490752 0.844 RS rs2154536 37414017 0.643 HapMapCEU rs1034357 37493352 0.844 RS rs1003721 37414527 0.509 HapMapCEU rs2835664 37495299 0.844 RS rs6517404 37416788 0.751 HapMapCEU rs2032063 37498527 0.844 RS rs2835613 37418753 0.921 HapMapCEU rs2835667 37501784 0.844 RS rs2835615 37419173 0.921 HapMapCEU rs2835669 37502685 0.621 RS rs9981715 37421480 0.961 HapMapCEU rs2835671 37502995 0.844 RS rs3787788 37424674 0.961 HapMapCEU rs2835672 37503104 0.844 RS rs2000417 37424738 0.961 HapMapCEU rs9636914 37511602 0.76 RS rs2835618 37424996 0.819 HapMapCEU rs2835676 37513181 0.59 RS rs3992 37427839 0.883 HapMapCEU rs2073356 37522096 0.57 RS rs2252893 37429442 0.961 HapMapCEU rs2835678 37525351 0.704 RS rs2835621 37432486 0.961 HapMapCEU rs9305614 37531609 0.649 RS rs2154538 37434826 0.961 HapMapCEU rs2249947 37355389 0.755 HapMapCEU rs764175 37438928 0.961 HapMapCEU rs8127496 37364352 0.755 HapMapCEU rs2018521 37439134 0.819 HapMapCEU rs2298682 37366760 0.71 HapMapCEU rs13046844 37439473 0.956 HapMapCEU rs1543749 37368491 0.883 HapMapCEU rs2252402 37442590 0.779 HapMapCEU rs4817843 37368910 0.883 HapMapCEU rs2835629 37443530 0.959 HapMapCEU rs4817844 37368934 0.883 HapMapCEU rs2835630 37443712 1 HapMapCEU rs1155786 37369895 0.923 HapMapCEU rs3737540 37445136 0.961 HapMapCEU rs1015549 37370750 0.847 HapMapCEU rs2835633 37446889 0.819 HapMapCEU rs1015551 37371380 0.883 HapMapCEU rs1053808 37447226 0.819 HapMapCEU rs9975168 37372427 0.883 HapMapCEU rs767998 37447522 0.923 HapMapCEU rs7275582 37373000 0.845 HapMapCEU rs2835634 37450140 0.961 HapMapCEU rs7276917 37373204 0.883 HapMapCEU rs2835635 37450736 0.812 HapMapCEU rs2156076 37374504 0.883 HapMapCEU rs1053984 37495888 0.819 HapMapCEU rs4411798 37375053 0.921 HapMapCEU rs2096495 37543067 0.545 HapMapCEU

rs12203592 IRF4

None identified

RS means Rotterdam cohort (Hofman A et al (1991) Eur J Epidemiol 7: 403-422). HapMap CEU means Utah residents with Northern and Western European ancestry from the HapMap database (The International HapMap Project. Nature (2003) 426: 789-796; http://www.hapmap.org). HapMap CEU data are only included for SNPs that were not detected in the Rotterdam cohort.
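The proxy tables above are flattened runs of four fields per entry (SNP, position, LD(r²), data source). As a minimal sketch, assuming exactly that four-token layout, the entries can be parsed and filtered by an r² threshold. The helper names (`parse_flat_rows`, `select_proxies`) are hypothetical; the example data are the rs12896399 (SLC24A4) entries listed above.

```python
from typing import List, NamedTuple


class ProxyRow(NamedTuple):
    snp: str
    position: int  # NCBI Build 36 coordinate, as listed in the table
    r2: float      # LD (r²) with the index SNP
    source: str    # "RS" (Rotterdam cohort) or "HapMapCEU"


def parse_flat_rows(text: str) -> List[ProxyRow]:
    """Split a flattened 'SNP Position LD(r²) DataSource' run into rows."""
    tokens = text.split()
    if len(tokens) % 4 != 0:
        raise ValueError("expected a multiple of four tokens")
    rows = []
    for i in range(0, len(tokens), 4):
        snp, position, r2, source = tokens[i:i + 4]
        rows.append(ProxyRow(snp, int(position), float(r2), source))
    return rows


def select_proxies(rows: List[ProxyRow], min_r2: float = 0.5) -> List[ProxyRow]:
    """Keep rows with r² >= min_r2, strongest LD first, ties broken by position."""
    kept = [row for row in rows if row.r2 >= min_r2]
    return sorted(kept, key=lambda row: (-row.r2, row.position))


# Proxy entries for rs12896399 (SLC24A4), copied from the table above.
SLC24A4_PROXIES = (
    "rs8017054 91830169 0.651 RS rs1885194 91847215 0.992 RS "
    "rs4900109 91833144 0.985 RS rs17184180 91850140 0.992 RS "
    "rs4904866 91838256 1 RS rs4904868 91850754 0.661 RS "
    "rs746586 91845720 1 RS rs4904870 91856761 0.661 RS "
    "rs1075830 91845915 0.661 RS rs4900114 91865488 0.653 HapMapCEU "
    "rs941799 91846578 0.992 RS"
)

if __name__ == "__main__":
    for row in select_proxies(parse_flat_rows(SLC24A4_PROXIES), min_r2=0.9):
        print(row.snp, row.position, row.r2, row.source)
```

The default threshold of 0.5 mirrors the minimum r² used throughout the description; a stricter value such as 0.9 keeps only near-perfect proxies.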

The method may involve genotyping polymorphisms which are yet to be identified. If a new polymorphism, e.g. a SNP, is identified, its LD with a known SNP can be determined straightforwardly by genotyping both polymorphisms in at least about 100 unrelated individuals from a population and applying standard formulas. The r² value can be calculated using standard formulas once the haplotypes of the two SNPs are known, and haplotypes can be inferred from genotype data. For population data, programs based on the Expectation-Maximization (EM) algorithm, such as haplo.stats (software website: http://mayoresearch.mayo.edu/mayo/research/schaid_lab/software.cfm; algorithm reference: Schaid DJ, Rowland CM, Tines DE, Jacobson RM, Poland GA. (2002) Score tests for association between traits and haplotypes when linkage phase is ambiguous. Am J Hum Genet, 70: 425-434), can be used. For pedigree data, linkage-based programs such as Merlin (software website: http://www.sph.umich.edu/csg/abecasis/MERLIN; algorithm reference: Abecasis et al. (2001) Merlin - rapid analysis of dense genetic maps using sparse gene flow trees. Nat Genet, 30: 97-101) can be used. New polymorphisms in high LD with a known SNP, such as at an r² value of at least 0.5 or at least 0.9, may be found within 200 kb of the known SNP on the chromosome, such as within 100 kb or 50 kb, or within the same linkage block. The locations of the SNPs useful in the invention, their linkage blocks, and broader chromosomal regions encompassing 100 kb upstream and downstream of each SNP are shown in Table 2.

Table 2: Chromosomal regions which may encompass polymorphisms in LD with SNPs useful in the invention:
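As a minimal sketch of the standard formulas mentioned above (not the haplo.stats or Merlin implementation), the following estimates two-SNP haplotype frequencies from unphased genotypes of unrelated individuals with a basic EM algorithm and then computes r². It assumes two biallelic SNPs with alleles A/a and B/b, each genotype coded as the number of copies of A and of B; the function names are hypothetical.

```python
from typing import Dict, Iterable, Tuple

HAPLOTYPES = ("AB", "Ab", "aB", "ab")


def em_haplotype_frequencies(
    genotypes: Iterable[Tuple[int, int]], max_iter: int = 500, tol: float = 1e-10
) -> Dict[str, float]:
    """EM estimate of two-SNP haplotype frequencies from unphased genotypes.

    Each genotype is (copies of A at SNP 1, copies of B at SNP 2), each 0, 1 or 2.
    Only double heterozygotes (1, 1) have ambiguous phase.
    """
    known = dict.fromkeys(HAPLOTYPES, 0.0)
    n_individuals = 0
    n_double_het = 0
    for a, b in genotypes:
        n_individuals += 1
        if a == 1 and b == 1:
            n_double_het += 1
            continue
        # Homozygous at one or both loci: both haplotypes are fully determined.
        known[("A" if a >= 1 else "a") + ("B" if b >= 1 else "b")] += 1
        known[("A" if a == 2 else "a") + ("B" if b == 2 else "b")] += 1

    freqs = dict.fromkeys(HAPLOTYPES, 0.25)  # uniform starting point
    for _ in range(max_iter):
        # E-step: probability that a double heterozygote is AB/ab (cis) rather than Ab/aB.
        cis = freqs["AB"] * freqs["ab"]
        trans = freqs["Ab"] * freqs["aB"]
        p_cis = cis / (cis + trans) if cis + trans > 0 else 0.5
        counts = dict(known)
        counts["AB"] += n_double_het * p_cis
        counts["ab"] += n_double_het * p_cis
        counts["Ab"] += n_double_het * (1 - p_cis)
        counts["aB"] += n_double_het * (1 - p_cis)
        # M-step: new frequencies from expected haplotype counts (two per person).
        updated = {h: c / (2 * n_individuals) for h, c in counts.items()}
        delta = max(abs(updated[h] - freqs[h]) for h in HAPLOTYPES)
        freqs = updated
        if delta < tol:
            break
    return freqs


def r_squared(freqs: Dict[str, float]) -> float:
    """r² = D² / (pA(1 - pA) pB(1 - pB)), with D = pAB*pab - pAb*paB."""
    p_a = freqs["AB"] + freqs["Ab"]
    p_b = freqs["AB"] + freqs["aB"]
    d = freqs["AB"] * freqs["ab"] - freqs["Ab"] * freqs["aB"]
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    return d * d / denom if denom > 0 else 0.0
```

For example, a sample in which the allele counts at the two SNPs always match gives r² close to 1, so either SNP would be a near-perfect proxy for the other.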

SNP          Gene     Chr  Position   Linkage block          SNP location +/- 100 kb
rs12896399C  SLC24A4  14   91843416   91830169-91875964      91743416-91943416
rs728405C    OCA2     15   25873448   25874249-25908005      25773448-25973448
rs1800407G   OCA2     15   25903913   25874249-25908005      25803913-26003913
rs1129038C   HERC2    15   26030454   26002360-26051367      25930454-26130454
rs12913832A  HERC2    15   26039213   26032853-26051367      25939213-26139213
rs9894429A   NPLOC4   17   77207216   77157287-77227939      77107216-77307216
rs7277820G   DSCR9    21   37502179   37453741-37520446      37402179-37602179
rs2070959G   UGT1A    2    234266930  234233924-234328388    234166930-234366930
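The last column of Table 2 is simply each SNP position minus and plus 100,000 bp. The sketch below reproduces that arithmetic and checks whether a candidate site falls in a SNP's window or linkage block; it assumes Build 36 coordinates throughout, and the function names are hypothetical. Falling inside the window only makes a site worth testing; its LD with the SNP still has to be measured.

```python
FLANK_BP = 100_000  # 100 kb upstream and downstream, as in Table 2

# (chromosome, position, linkage block start, linkage block end), NCBI Build 36, from Table 2.
TABLE2 = {
    "rs12896399": ("14", 91843416, 91830169, 91875964),
    "rs728405": ("15", 25873448, 25874249, 25908005),
    "rs1800407": ("15", 25903913, 25874249, 25908005),
    "rs1129038": ("15", 26030454, 26002360, 26051367),
    "rs12913832": ("15", 26039213, 26032853, 26051367),
    "rs9894429": ("17", 77207216, 77157287, 77227939),
    "rs7277820": ("21", 37502179, 37453741, 37520446),
    "rs2070959": ("2", 234266930, 234233924, 234328388),
}


def flank_window(snp, flank=FLANK_BP):
    """Return (start, end) of the +/- flank window around a Table 2 SNP."""
    _, position, _, _ = TABLE2[snp]
    return position - flank, position + flank


def in_search_space(snp, chromosome, position, flank=FLANK_BP):
    """True if a candidate site lies in the SNP's flank window or linkage block."""
    chrom, _, block_start, block_end = TABLE2[snp]
    if chromosome != chrom:
        return False
    start, end = flank_window(snp, flank)
    return start <= position <= end or block_start <= position <= block_end
```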

For each of rs9894429, rs7277820, rs3768056 and rs2070959, proxy markers may exist which are not necessarily in LD with the SNP at an r² value of at least 0.5. Each of these SNPs is located in a chromosomal region which has not previously been associated with iris color variation in humans. There may be other polymorphisms in the genes in these regions which are not necessarily in LD with the given SNP, but which have the same or a similar effect on iris color by virtue of affecting the expression or function of the gene that is affected by the given SNP. Accordingly, instead of genotyping rs9894429 or a polymorphic site which is in linkage disequilibrium with rs9894429 at an r² value of at least 0.5, the method may comprise genotyping a polymorphism in the region between basepairs 76891593 and 77498447 on chromosome 17 (suitably between 77000000 and 77400000) which is associated with variation in iris color. Suitably, a polymorphic site which is in linkage disequilibrium with rs9894429 at an r² value of at least 0.5 is also associated with variation in iris color and/or is located in the region between basepairs 76891593 and 77498447 on chromosome 17. As such a polymorphic site is in strong LD with rs9894429, it can be inferred that it is associated with variation in iris color. Instead of genotyping rs7277820 or a polymorphic site which is in linkage disequilibrium with rs7277820 at an r² value of at least 0.5, the method may comprise genotyping a polymorphism in the region between basepairs 37100732 and 37761703 on chromosome 21 (suitably between 37200000 and 37650000) which is associated with variation in iris color. Suitably, a polymorphic site which is in linkage disequilibrium with rs7277820 at an r² value of at least 0.5 is also associated with variation in iris color and/or is located in the region between basepairs 37100732 and 37761703 on chromosome 21.
Instead of genotyping rs3768056 or a polymorphic site which is in linkage disequilibrium with rs3768056 at an r² value of at least 0.5, the method may comprise genotyping a polymorphism in the region between basepairs 233690968 and 234296843 on chromosome 1 (suitably between 233800000 and 234200000) which is associated with variation in iris color. Suitably, a polymorphic site which is in linkage disequilibrium with rs3768056 at an r² value of at least 0.5 is also associated with variation in iris color and/or is located in the region between basepairs 233690968 and 234296843 on chromosome 1. Instead of genotyping rs2070959 or a polymorphic site which is in linkage disequilibrium with rs2070959 at an r² value of at least 0.5, the method may comprise genotyping a polymorphism in the region between basepairs 233848903 and 234546690 on chromosome 2 (suitably between 234000000 and 234450000) which is associated with variation in iris color. Suitably, a polymorphic site which is in linkage disequilibrium with rs2070959 at an r² value of at least 0.5 is also associated with variation in iris color and/or is located in the region between basepairs 233848903 and 234546690 on chromosome 2. By "associated with variation in iris color" we mean that there is a significant statistical correlation between the presence of a particular allele of the polymorphism in individuals in a population and an aspect of iris color. Suitably, the correlation may be observed among healthy individuals within a population, particularly those who do not suffer from a disease which affects the structure or function of the eye. The effect may be on any quantitative color parameter, such as hue, saturation, chroma or colourfulness, or a principal component of iris color variation within a population; or it may be on the categorisation of iris color, for example as blue, brown or intermediate.
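The four substitution regions described above can be looked up mechanically. The sketch below, with a hypothetical function name and inclusive region bounds as an assumption, returns which of rs9894429, rs7277820, rs3768056 or rs2070959 a site could stand in for on the basis of location alone. A site found this way still has to be shown to be associated with variation in iris color before it qualifies.

```python
# Regions (chromosome, start bp, end bp; NCBI Build 36) within which an
# associated polymorphism may replace each SNP, as stated in the text above.
PROXY_REGIONS = {
    "rs9894429": ("17", 76891593, 77498447),
    "rs7277820": ("21", 37100732, 37761703),
    "rs3768056": ("1", 233690968, 234296843),
    "rs2070959": ("2", 233848903, 234546690),
}


def replaceable_snps(chromosome, position):
    """List the SNPs whose substitution region contains the site (bounds inclusive)."""
    return [
        snp
        for snp, (chrom, start, end) in PROXY_REGIONS.items()
        if chrom == chromosome and start <= position <= end
    ]
```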
The population is typically of European descent, but can also be of West Asian or Middle Eastern descent, where eye color variation also exists. The P value for association should be less than 10⁻⁶, typically less than 10⁻⁷, most typically less than 5×10⁻⁸, suitably less than 10⁻⁹, 10⁻¹⁰, 10⁻¹², 10⁻¹⁴, 10⁻¹⁶, 10⁻¹⁸, 10⁻²⁰, 10⁻²⁵ or 10⁻³⁰. For quantitative color parameters, P values can be derived using linear regression, where the quantitative color parameter is the dependent variable and the number of minor alleles of a given SNP is the independent variable. For categorical eye color, P values can be derived using binary or multinomial logistic models. Polymorphisms as defined above may be proxy markers for the indicated SNPs.
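A minimal sketch of the additive linear-regression test just described: the quantitative color parameter is regressed on the minor-allele count (0, 1 or 2) by ordinary least squares. This is not the analysis code of Example 1; the function name and the synthetic data are hypothetical, and the P value uses a normal approximation to the t distribution, which is only adequate for large samples such as cohort data. The logistic models for categorical eye color are not shown.

```python
import math


def additive_association(allele_counts, trait_values):
    """OLS of a color parameter on minor-allele count; returns (slope, t, two-sided P)."""
    x = [float(v) for v in allele_counts]
    y = [float(v) for v in trait_values]
    n = len(x)
    if n < 3:
        raise ValueError("need at least three observations")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    rss = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    se = math.sqrt(rss / (n - 2) / sxx)
    if se == 0.0:
        raise ValueError("residual variance is zero")
    t = slope / se
    p = math.erfc(abs(t) / math.sqrt(2))  # two-sided normal tail probability
    return slope, t, p


# Hypothetical data: 120 people, hue falling by 10 units per minor allele,
# with alternating +/-1 noise.
counts = [0, 0, 1, 1, 2, 2] * 20
hue = [30 - 10 * c + (1 if i % 2 == 0 else -1) for i, c in enumerate(counts)]
slope, t_stat, p_value = additive_association(counts, hue)
```

For smaller samples, the exact t distribution with n - 2 degrees of freedom should replace the normal tail.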

Suitably, according to the method of the first aspect, step (b) comprises genotyping the nucleic acid for:

(i) the SNP rs1800407 or a polymorphic site which is in linkage disequilibrium with rs1800407 at an r² value of at least 0.5; and/or

(ii) the SNP rs12896399 or a polymorphic site which is in linkage disequilibrium with rs12896399 at an r² value of at least 0.5. In other words, a suitable minimal combination of polymorphisms which is genotyped in the method may be: rs12913832 and rs1800407; rs12913832 and rs12896399; or rs12913832, rs12896399 and rs1800407; or combinations where a proxy marker is used in place of one or more of these SNPs, i.e. a corresponding polymorphic site which is in sufficiently strong LD as defined above.

If the method of iris colour prediction involves genotyping only the SNPs rs12913832, rs1800407 and/or rs12896399 (or polymorphic sites which are in linkage disequilibrium with one of those SNPs at the required r² value), it is preferable to identify the race of the human from whom the nucleic acid sample was obtained, because the prediction accuracy is better for persons of European descent, e.g. for Caucasians. The European descent of an unknown person can be determined using ancestry-sensitive DNA markers as described in Lao et al. AJHG 2006, Vol 78: 680-690, and Kersbergen et al. 2009 BMC Genetics 10:69. Ancestry can also be inferred from skull morphometry.

In the embodiment of the method in which each of rs12913832 in HERC2, rs1800407 in OCA2, rs12896399 in SLC24A4 and rs16891982 in SLC45A2, and optionally also rs1393350 in TYR and/or rs12203592 in IRF4, is used for prediction of iris color, the prediction is accurate irrespective of the ancestry of the human subject. Hence, additional testing to determine the race or bio-geographic ancestry of a person is not necessary for correct interpretation of the prediction results, which is a clear advantage in practical forensic applications. Further embodiments utilising other combinations of polymorphisms may also provide ancestry-independent prediction of iris color. If a combination of polymorphisms is identified which allows for ancestry-independent prediction of iris color, adding further polymorphisms will not prejudice the ancestry-independence of the method.

Suitably, step (b) of the method of the first aspect comprises genotyping the nucleic acid for: a polymorphism in the region between basepairs 76891593 and 77498447 on chromosome 17 which is associated with variation in iris color, preferably rs9894429 or a polymorphic site which is in linkage disequilibrium with rs9894429 at an r² value of at least 0.5. In other words, a suitable minimal combination of polymorphisms which is genotyped in the method may be: rs12913832 and rs9894429; rs12913832, rs1800407 and rs9894429; rs12913832, rs12896399 and rs9894429; rs12913832, rs12896399, rs1800407 and rs9894429; or a combination where a proxy marker is used in place of one or more of these SNPs, i.e. one or more of these SNPs is replaced by a corresponding polymorphic site which is in sufficiently strong LD as defined above and/or rs9894429 is replaced by a polymorphic site which is in the region between basepairs 76891593 and 77498447 on chromosome 17 which is associated with variation in iris color. The SNP rs9894429 contributes at least 0.5% of the summary variance of iris hue and saturation.

Suitably, step (b) comprises genotyping the nucleic acid for: a polymorphism in the region between basepairs 37100732 and 37761703 on chromosome 21 which is associated with variation in iris color, preferably rs7277820 or a polymorphic site which is in linkage disequilibrium with rs7277820 at an r² value of at least 0.5. In other words, a suitable minimal combination of polymorphisms which is genotyped in the method may be: rs12913832 and rs7277820; rs12913832, rs1800407 and rs7277820; rs12913832, rs12896399 and rs7277820; rs12913832, rs12896399, rs1800407 and rs7277820; rs12913832, rs9894429 and rs7277820; rs12913832, rs1800407, rs9894429 and rs7277820; rs12913832, rs12896399, rs9894429 and rs7277820; rs12913832, rs12896399, rs1800407, rs9894429 and rs7277820; or a combination where a proxy marker is used in place of one or more of these SNPs, i.e. one or more of these SNPs is replaced by a corresponding polymorphic site which is in sufficiently strong LD as defined above, and/or rs9894429 is replaced by a polymorphic site which is in the region between basepairs 76891593 and 77498447 on chromosome 17 which is associated with variation in iris color, and/or rs7277820 is replaced by a polymorphic site which is in the region between basepairs 37100732 and 37761703 on chromosome 21 which is associated with variation in iris color. The SNP rs7277820 contributes at least 0.5% of the summary variance of iris hue and saturation.

Suitably, step (b) comprises genotyping the nucleic acid for: a polymorphism in the region between basepairs 233690968 and 234296843 on chromosome 1 which is associated with variation in iris color, preferably rs3768056 or a polymorphic site which is in linkage disequilibrium with rs3768056 at an r² value of at least 0.5. In other words, a suitable minimal combination of polymorphisms which is genotyped in the method may be: rs12913832 and rs3768056; rs12913832, rs1800407 and rs3768056; rs12913832, rs12896399 and rs3768056; rs12913832, rs12896399, rs1800407 and rs3768056; rs12913832, rs9894429 and rs3768056; rs12913832, rs1800407, rs9894429 and rs3768056; rs12913832, rs12896399, rs9894429 and rs3768056; rs12913832, rs12896399, rs1800407, rs9894429 and rs3768056; rs12913832, rs7277820 and rs3768056; rs12913832, rs1800407, rs7277820 and rs3768056; rs12913832, rs12896399, rs7277820 and rs3768056; rs12913832, rs12896399, rs1800407, rs7277820 and rs3768056; rs12913832, rs9894429, rs7277820 and rs3768056; rs12913832, rs1800407, rs9894429, rs7277820 and rs3768056; rs12913832, rs12896399, rs9894429, rs7277820 and rs3768056; rs12913832, rs12896399, rs1800407, rs9894429, rs7277820 and rs3768056; or a combination where a proxy marker is used in place of one or more of these SNPs, i.e. one or more of these SNPs is replaced by a corresponding polymorphic site which is in sufficiently strong LD as defined above, and/or rs9894429 is replaced by a polymorphic site which is in the region between basepairs 76891593 and 77498447 on chromosome 17 which is associated with variation in iris color, and/or rs7277820 is replaced by a polymorphic site which is in the region between basepairs 37100732 and 37761703 on chromosome 21 which is associated with variation in iris color, and/or rs3768056 is replaced by a polymorphic site in the region between basepairs 233690968 and 234296843 on chromosome 1 which is associated with variation in iris color.
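The minimal combinations enumerated in the preceding paragraphs follow one pattern: rs12913832, any subset of the previously introduced SNPs, and the newly introduced SNP. The sketch below (hypothetical function name) generates them and reproduces the counts of 4, 8 and 16 combinations listed for rs9894429, rs7277820 and rs3768056. Proxy-marker substitutions are not modelled.

```python
from itertools import combinations


def minimal_panels(core, optional, added):
    """All panels: the core SNP, any subset of `optional`, then the added SNP."""
    panels = []
    for k in range(len(optional) + 1):
        for subset in combinations(optional, k):
            panels.append([core, *subset, added])
    return panels


panels = minimal_panels(
    "rs12913832",
    ["rs1800407", "rs12896399", "rs9894429", "rs7277820"],
    "rs3768056",
)
print(len(panels))  # 16
```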
Suitably, step (b) comprises genotyping the nucleic acid for: a polymorphism in the region between basepairs 233848903 and 234546690 on chromosome 2 which is associated with variation in iris color, preferably rs2070959 or a polymorphic site which is in linkage disequilibrium with rs2070959 at an r² value of at least 0.5. The SNP rs2070959 or a suitable proxy marker may be genotyped in addition to or instead of rs3768056 (or its proxy marker) in all combinations of polymorphisms mentioned above as comprising rs3768056 (or its proxy marker).

Suitably, at least 2, at least 3, at least 4, 5, 6, 7, 8, 9, 10, 11, 12 or 13 of the polymorphisms rs12913832, rs16891982, rs12203592, rs1325127, rs1393350, rs12896399, rs728405, rs1800407, rs1129038, rs9894429, rs7277820, rs3768056 and rs2070959, or appropriate proxy markers as defined above, are genotyped according to the method of the first aspect of the invention.

The inventors have found that the presence of an allele of a certain SNP may modify the effect of an allele of another SNP on a quantitative color parameter. In other words, there is a genetic interaction between the two SNPs. Details of the interactions identified are shown in Example 1. Interactions have been identified between the following pairs of alleles: rs1800407G and rs16891982C; rs12913832A and rs12203592A; rs12913832A and rs728405C; and rs12913832A and rs12896399C. The interaction between rs12913832A and rs12896399C has a particularly strong effect on a quantitative color parameter. Therefore, it is preferred to genotype rs12896399 in addition to rs12913832 in the method of the first aspect. Because of the interaction between rs1800407G and rs16891982C, it is preferred to genotype rs1800407 when the method also comprises genotyping rs16891982; similarly, it is preferred to genotype rs16891982 when the method also comprises genotyping rs1800407. The prediction of at least one quantitative color parameter of the iris involves analyzing the nucleotide occurrences of each of these SNPs (or of polymorphisms which act as proxy markers) in a nucleic acid sample of the subject, and comparing the combination of nucleotide occurrences of the SNPs (or genotypes of the proxy markers) to known relationships between genotype and the hue and saturation of the iris. Thus, the at least one quantitative color parameter of the iris may be inferred from the genotypes of the polymorphisms that have been analyzed.
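One common way to let one allele modify the effect of another in a linear model is to add the product of the two allele counts as an extra predictor. The sketch below assumes that parameterisation, which this passage does not confirm (the model actually used is described in Example 1). The function names and the coefficients are hypothetical and chosen only to show that the per-copy effect of rs12896399C then depends on the rs12913832A genotype.

```python
# Allele pairs reported above to interact.
INTERACTING_PAIRS = [
    ("rs1800407G", "rs16891982C"),
    ("rs12913832A", "rs12203592A"),
    ("rs12913832A", "rs728405C"),
    ("rs12913832A", "rs12896399C"),
]


def design_row(allele_counts):
    """Main-effect allele counts plus product terms for interacting pairs."""
    row = {name: float(count) for name, count in allele_counts.items()}
    for first, second in INTERACTING_PAIRS:
        if first in allele_counts and second in allele_counts:
            row[f"{first}x{second}"] = float(allele_counts[first] * allele_counts[second])
    return row


def linear_predictor(row, intercept, coefficients):
    """intercept + sum(beta * value); terms without a coefficient contribute nothing."""
    return intercept + sum(coefficients.get(name, 0.0) * value for name, value in row.items())


# Hypothetical coefficients, for illustration only.
EXAMPLE_COEFFICIENTS = {
    "rs12913832A": -10.0,
    "rs12896399C": -0.5,
    "rs12913832Axrs12896399C": 0.25,
}
```

With these placeholder values, each copy of rs12896399C lowers the prediction by 0.5 when rs12913832A is absent but has no net effect in rs12913832A homozygotes.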

Step (c) comprises predicting at least one quantitative color parameter of the iris as a numeric variable based on the results of step (b), thereby predicting the iris color.

A color model is an abstract mathematical model describing the way all human perceivable colors can be represented quantitatively, typically containing three or four color dimensions. A "quantitative color dimension", also referred to as a "quantitative color parameter", is an attribute of a visual sensation which is represented in a certain color model. For example, hue, brightness (value), lightness, colorfulness, chroma, saturation, and luminance are color dimensions. The Hue, Saturation, and Brightness (Value) dimensions compose the HSB (HSV) color model. Hue is the attribute of a visual sensation according to which an area appears to be similar to one of the perceived color types: red, yellow, green, and blue, or to a combination of two of them. Saturation is the attribute of a visual sensation referring to the perceived intensity of a specific Hue. "Brightness" or "Value" is the attribute of a visual sensation according to which an area appears to emit more or less light. Brightness depends on the lighting conditions, for example when a photograph is taken, and, thus, may not be under genetic control. Other colour dimensions such as chroma, purity, and intensity are based on a similar intuitive concept to saturation but depend greatly on the specific color model in use. According to one accepted meaning, "lightness" is the brightness relative to the brightness of a similarly illuminated white. "Colorfulness" is the attribute of a visual sensation according to which the perceived color of an area appears to be more or less chromatic. "Chroma" is the colorfulness relative to the brightness of a similarly illuminated white. "Saturation" is the colorfulness of a stimulus relative to its own brightness. 
Brightness and colorfulness are absolute measures, which usually describe the spectral distribution of light entering the eye, while lightness and chroma are measured relative to some white point and are thus often used for descriptions of surface colors, remaining roughly constant even as brightness and colorfulness change with different illumination. Saturation is colorfulness/brightness and is therefore also an absolute measure. Saturation is also chroma/lightness. The numeric values for a given color dimension may differ between models, such that the same color may be represented by different values in different models. Commonly used color models include, but are not restricted to, the HSL, L*a*b*, L*u*v*, XYZ, CMYK and RGB models. The advantage of the HSB (HSV) color model in genetic research is that the Hue and Saturation dimensions are invariant to the Brightness (Value) dimension. The HSL model is conceptually similar to the HSB (HSV) model: its Hue and Saturation dimensions are the same as in the HSB (HSV) model, and its Luminance dimension has a linear relationship with Brightness or Value. Suitable color models are described in the following references.

HSB, HSV, HSL:
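The invariance of Hue and Saturation to Brightness can be checked with Python's standard `colorsys` module. In this sketch the RGB value is illustrative, and dimming the light is modelled, as a simplifying assumption, by scaling all three channels by the same factor; only the V (brightness) component changes.

```python
import colorsys


def hue_saturation(r, g, b):
    """Return (hue in degrees, saturation in 0-1) for RGB channels in 0-1."""
    h, s, _v = colorsys.rgb_to_hsv(r, g, b)
    return h * 360.0, s


# Illustrative brownish colour, and the same colour under half the light
# (every channel scaled by 0.5).
bright = (0.6, 0.4, 0.2)
dim = tuple(0.5 * c for c in bright)
hue_bright, sat_bright = hue_saturation(*bright)
hue_dim, sat_dim = hue_saturation(*dim)
```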

Judd, Deane B. (January 1940). "Hue saturation and lightness of surface colors with chromatic illumination". JOSA 30 (1): 2-32;

L*a*b*:

Hunter, Richard Sewall (July 1948). "Photoelectric Color-Difference Meter". JOSA 38 (7): 661. http://www.opticsinfobase.org/abstract.cfm?id=76613.

L*u*v*:

George A. Agoston, Color Theory and Its Application in Art and Design. Springer-Verlag, 1987. p. 240; CIE 1960

Poynton, Charles (2003). Digital Video and HDTV. Morgan Kaufmann. pp. 226. ISBN 1-55860-792-7

XYZ:

CIE (1932). Commission Internationale de l'Eclairage proceedings, 1931. Cambridge University Press, Cambridge

Smith, Thomas; Guild, John (1931-32). "The C.I.E. colorimetric standards and their use". Transactions of the Optical Society 33 (3): 73-134. doi: 10.1088/1475-4878/33/3/301

CMYK:

Horvat, Les (2003). Digital Imaging: Essential Skills. Focal Press, pp. 74. ISBN 0240519132, 9780240519135

RGB:

Cowlishaw, M. F. (1985). Fundamental requirements for picture presentation. Proc. Society for Information Display 26 (2): 101-107. http://speleotrove.com/misc/cowlishaw1985-fundamental.pdf

In the HSB model, H stands for hue, S for saturation and B for brightness. The H and S values are invariant to brightness. Under a fixed brightness, HS can be viewed as a color pie in which H represents the variation of the color type, ranging from 0° to 360° for all human-detectable true colors, and the radius S represents the purity or intensity of the color, ranging from 0 to 1. The B dimension, ranging from 0 to 255, is suitably discarded in genetic association testing and, when predicting eye colors, arbitrarily fixed at 150 to represent average daylight conditions. According to the method of the first aspect of the invention, at least one quantitative color parameter of the iris is predicted as a numeric variable. Suitably, the at least one variable is provided with a confidence interval, such as a 95% confidence interval. If a variable is represented as X and its 95% confidence interval as ±Y, there is a probability of 95% that the actual value of the parameter lies within the range X ± Y. Suitable confidence intervals include 99%, 95%, 90%, 85%, 80%, 75% and 70%.
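A minimal sketch of attaching a symmetric confidence interval X ± Y to a predicted value, assuming approximately normal errors and a known standard error for the prediction (for example from the regression model). The function name and the example numbers are hypothetical. Because saturation is bounded to 0-1 and hue is circular, the resulting limits may need clipping or wrapping before use.

```python
from statistics import NormalDist


def prediction_interval(predicted, standard_error, level=0.95):
    """Symmetric interval predicted +/- z * SE, assuming normally distributed errors."""
    if not 0.0 < level < 1.0:
        raise ValueError("level must be between 0 and 1")
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    half_width = z * standard_error
    return predicted - half_width, predicted + half_width


# Hypothetical example: predicted hue of 20.0 degrees with a standard error of 2.5.
low, high = prediction_interval(20.0, 2.5)
```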

The predicted values for the at least one quantitative color parameter, with or without confidence intervals, may be mapped onto an appropriate color space. This may or may not be the color space underlying the model used in the color prediction; if a different model is used, it may be necessary to convert the predicted color to the color space of the new model. Brightness can be arbitrarily fixed at a certain value, suitably to reflect average daylight conditions. Suitably, the predicted color may be assigned to a standardized color chart, in which different colors are depicted as swatches and may also be assigned a number. Suitable color charts include Pantone (Pantone Inc, New Jersey, USA) and Natural Color System (Skandinaviska Färginstitutet AB, Sweden). Such detailed color information, predicted from a biological sample found at a crime scene or collected from the body or body parts of an unknown (missing) person, can then be used in forensic investigations. Suitably, the method comprises predicting at least one of hue, saturation, colourfulness and chroma as a numeric variable. Typically, at least two quantitative color parameters are predicted, suitably selected from hue, saturation, colourfulness and chroma. Suitably, hue and saturation are predicted. Colorfulness and chroma may also be predicted. Red, Green and Blue in the RGB color model may be predicted indirectly. An indirect prediction may be necessary because all three parameters are influenced by brightness. RGB can be predicted by first predicting Hue and Saturation values in the HSB model and then converting the Hue, Saturation and (arbitrarily chosen) Brightness values to RGB values. The lightness or brightness of the iris depends on the lighting conditions and so is not genetically determined. Therefore, numeric values of lightness or brightness can be set to reflect an appropriate lighting condition, such as daylight.
In the HSB model, an average brightness value of 150 is suitable in prediction, but this can be specified arbitrarily. Depending on the color model used, these variables may then be included in the prediction of other quantitative color parameters.
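The indirect RGB prediction described above can be sketched with Python's standard `colorsys.hsv_to_rgb`. The function name is hypothetical, and the 0-255 brightness scale and 8-bit RGB output are assumptions; the default brightness of 150 follows the value suggested in the text.

```python
import colorsys


def predicted_hsb_to_rgb(hue_deg, saturation, brightness=150, brightness_max=255):
    """Convert predicted hue (degrees) and saturation (0-1) to 8-bit RGB.

    Brightness is fixed (150 by default, as suggested for average daylight)
    on an assumed 0-255 scale.
    """
    h = (hue_deg % 360.0) / 360.0
    s = min(max(saturation, 0.0), 1.0)  # clip predictions outside 0-1
    v = min(max(brightness / brightness_max, 0.0), 1.0)
    r, g, b = colorsys.hsv_to_rgb(h, s, v)
    return tuple(round(channel * 255) for channel in (r, g, b))
```

A zero-saturation prediction maps to a neutral gray of the fixed brightness, e.g. (150, 150, 150) with the default setting.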

According to the method of the first aspect of the invention, the prediction of iris color involves genotyping appropriate polymorphisms as discussed above, and comparing the combination of the genotypes of the polymorphisms to known relationships of genotype and iris color. Thus, the iris color may be inferred in a quantitative way from the genotypes of the polymorphisms that have been analyzed.

Methods for performing such a comparison and reaching a conclusion based on that comparison are exemplified herein. The inference typically involves a model that uses the known relationships between particular alleles or nucleotide occurrences and iris color as classifiers. Such a model is a "prediction model". Various methods can be used to arrive at a prediction model. Commonly used methods include, but are not restricted to, linear (Tibshirani, Robert (1996). "Regression Shrinkage and Selection via the Lasso". Journal of the Royal Statistical Society. Series B (Methodological) 58 (1): 267-288. http://www.jstor.org/stable/2346178) and hierarchical (tree-based) regressions (Breiman, L., Friedman, J. H., Olshen, R. A., and Stone, C. J. 1984. Classification and Regression Trees. Wadsworth International, Belmont, CA), principal component analysis (Pearson, K. (1901). On Lines and Planes of Closest Fit to Systems of Points in Space. Philosophical Magazine 2 (6): 559-572. http://stat.smmu.edu.cn/history/pearson1901.pdf), support vector machines (David Meyer, Friedrich Leisch, and Kurt Hornik. The support vector machine under test. Neurocomputing 55(1-2): 169-186, 2003. http://dx.doi.org/10.1016/S0925-2312(03)00431-4), artificial neural networks (Roman M. Balabin, Ekaterina I. Lomakina (2009). Neural network approach to quantum-chemistry data: Accurate prediction of density functional theory energies. J. Chem. Phys. 131 (7): 074104. doi: 10.1063/1.3206326), fuzzy c-means clustering (J. C. Dunn (1973): A Fuzzy Relative of the ISODATA Process and Its Use in Detecting Compact Well-Separated Clusters, Journal of Cybernetics 3: 32-57), and genetic algorithms (Eiben, A. E. et al (1994). Genetic algorithms with multi-parent recombination. PPSN III: Proceedings of the International Conference on Evolutionary Computation. The Third Conference on Parallel Problem Solving from Nature: 78-87. ISBN 3-540-58484-6).
A preferred method, as illustrated in Example 1, involves step-wise linear regression and principal component analysis. A skilled person may develop alternative prediction models.

One way of implementing the method is therefore to genotype the necessary polymorphisms and apply the model described in Example 1 to make the prediction.

According to one embodiment, a proxy marker may be genotyped in place of the corresponding SNP mentioned in Example 1. To use the information from such a polymorphism in the prediction method, it may be necessary to build a modified prediction model based on genotype and phenotype data (either Rotterdam cohort data as described in Example 1 or other available data or new data). The modified prediction model can be developed using the statistical techniques described in Example 1.

Model parameters for use in the regression model described in Example 1 for combinations of SNPs and optionally other predictors (i.e. gender and age) according to the first aspect of the invention are provided in Table 3 and Table 4.
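The regression models in Tables 3 and 4 are linear: a predicted H or S value is the constant plus the sum of each beta multiplied by the corresponding predictor value. The Python sketch below illustrates that arithmetic under stated assumptions: SNP predictors are coded as the number of copies (0, 1 or 2) of the allele named in the row label (for example rs12913832A), Female is coded 1 for female and 0 for male, and age is in years. The coefficient set is a hypothetical reduced model that borrows values from the H1 column of Table 3; because the flattened table does not show which SNP rows belong to which column, it should not be read as one of the models H1 to H23.

```python
from typing import Mapping


def predict_linear(constant: float, betas: Mapping[str, float],
                   predictors: Mapping[str, float]) -> float:
    """Return constant + sum(beta * value) over the predictors named in `betas`."""
    missing = set(betas) - set(predictors)
    if missing:
        raise KeyError(f"missing predictor values: {sorted(missing)}")
    return constant + sum(beta * predictors[name] for name, beta in betas.items())


# Hypothetical reduced hue model; values borrowed from the H1 column of Table 3.
HUE_CONSTANT = 31.58
HUE_BETAS = {
    "female": 0.31,         # 1 = female, 0 = male (assumed coding)
    "age_years": 0.07,      # per year of age
    "rs12913832A": -10.66,  # per copy of the A allele (assumed additive coding)
}

hue = predict_linear(HUE_CONSTANT, HUE_BETAS,
                     {"female": 1, "age_years": 60, "rs12913832A": 2})
print(round(hue, 2))  # 14.77
```

Keeping the coefficients in a dictionary keyed by predictor name makes it straightforward to swap in any of the hue or saturation parameter sets, or a refitted set that uses a proxy marker.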

Table 3: Beta variables and constants for prediction of hue using 23 different combinations of predictors

H1 H2 H3 H4 H5 H6

constant 31.58 31.81 32.00 31.54 32.45 32.00

Female 0.31 0.32 0.32 0.33 0.32 0.33

Age per year 0.07 0.08 0.07 0.07 0.07 0.07

rs12896399C -0.50 -0.50 -0.51 -0.49 -0.50

rs1800407T 1.33 1.34 1.33 1.31 1.30

rs12913832A -10.66 -10.56 -10.66 -10.65 -10.66 -10.65

rs9894429A 0.56 0.55

rs7277820G -0.50 -0.49

H7 H8 H9 H10 H11 H12

constant 36.67 37.17 37.14 36.63 37.60 37.09

rs12896399C -0.47 -0.48 -0.49 -0.47 -0.48

rs1800407T 1.81 1.83 1.81 1.80 1.78

rs12913832A -10.71 -10.58 -10.71 -10.70 -10.71 -10.70

rs9894429A 0.58 0.57

rs7277820G -0.50 -0.49

H13 H14 H15 H16 H17 H18

Constant 31.40 31.38 31.61 31.52 31.39 31.85

Female 0.32 0.30 0.31 0.30 0.29 0.29

Age per year 0.08 0.08 0.07 0.08 0.08 0.08

rs16891982C -2.52 -2.52 -2.50

rs12203592A 0.71 0.71

rs1800407G 1.68 2.03 2.02 1.98

rs1129038C -4.31 -4.43 -4.39 -4.41 -4.38

rs12913832A -10.60 -5.94 -5.93 -5.95 -5.94 -5.96

rs7277820G -0.51

H19 H20 H21 H22 H23

Constant 31.36 31.71 31.49 31.89 31.79

Female 0.29 0.30 0.29 0.30 0.30

Age per year 0.07 0.07 0.07 0.07 0.07

rs16891982C -2.50 -2.49 -2.46 -2.43 -2.37

rs12203592A 0.71 0.70 0.69 0.69 0.68

rs1325127G -0.53 -0.52 -0.53 -0.53

rs1806319G 0.41 0.40 0.40

rs12896399C -0.45 -0.45

rs728405C 0.27

rs1800407G 1.97 2.00 2.00 2.00 1.84

rs1129038C -4.43 -4.43 -4.41 -4.35 -4.40

rs12913832A -5.90 -5.90 -5.92 -5.99 -6.00

rs9894429A 0.57 0.57 0.56 0.56 0.56

rs7277820G -0.50 -0.48 -0.48 -0.48 -0.48

Table 4: Beta variables and constants for prediction of saturation using 24 different combinations of predictors

S1 S2 S3 S4 S5 S6

constant 0.57 0.56 0.55 0.56 0.54 0.55

Female 0.01 0.01 0.01 0.01 0.01 0.01

Age per year 0.00 0.00 0.00 0.00 0.00 0.00

rs12896399C 0.02 0.02 0.02 0.02 0.02

rs1800407T -0.03 -0.03 -0.03 -0.03 -0.03

rs12913832A 0.22 0.21 0.22 0.22 0.22 0.22

rs9894429A -0.01 -0.01

rs7277820G 0.01 0.01

S7 S8 S9 S10 S11 S12

constant 0.37 0.35 0.35 0.36 0.34 0.35

rs12896399C 0.02 0.02 0.02 0.02 0.02

rs1800407T -0.05 -0.05 -0.05 -0.05 -0.05

rs12913832A 0.22 0.21 0.22 0.22 0.22 0.22

rs9894429A -0.01 -0.01

rs7277820G 0.01 0.01

S13 S14 S15 S16 S17 S18

constant 0.574 0.574 0.570 0.571 0.553 0.556

Female 0.009 0.009 0.009 0.009 0.008 0.008

Age per year -0.003 -0.003 -0.003 -0.003 -0.003 -0.003

rs3768056

rs16891982C 0.033 0.032 0.032

rs12203592A -0.014

rs12896399C 0.020 0.020

rs1800407G -0.029 -0.034 -0.034 -0.034

rs1129038C 0.026 0.028 0.027 0.025 0.025

rs12913832A 0.214 0.186 0.186 0.186 0.189 0.189

S19 S20 S21 S22 S23 S24 constant 0.550 0.541 0.532 0.540 0.546 0.546

Female 0.009 0.008 0.008 0.008 0.009 0.009

Age per year -0.003 -0.003 -0.003 -0.003 -0.003 -0.003

rs3768056G 0.013 0.013 0.013 0.013 0.013 0.013

rs16891982C 0.031 0.031 0.031 0.031 0.030 0.030

rs12203592A -0.014 -0.014 -0.014 -0.014 -0.014 -0.014

rs1325127G 0.012 0.012 0.012 0.012 0.012

rs1806319G -0.011 -0.011

rs12896399C 0.020 0.020 0.020 0.020 0.020 0.020

rs728405C 0.001

rs1800407G -0.034 -0.034 -0.033 -0.033 -0.033 -0.034

rs1129038C 0.024 0.024 0.024 0.025 0.024 0.024

rs12913832A 0.190 0.189 0.190 0.189 0.190 0.190

rs9894429A -0.009 -0.009 -0.009

rs7277820G 0.010 0.010 0.010 0.010

Suitably, step (b) of the method of the first aspect of the invention further comprises evaluating the age of the human. The age of the human is then used in the prediction of iris color. Surprisingly, age was identified as a strong predictor of quantitative eye color: increased age was associated with increased H and decreased S, as described in Example 1. The age of unidentified corpses and skeletons, and also of living persons, can be evaluated using methods known in the art, as described in Schmeling et al, 2007, Forensic Sci Int. 165:178-81. Methods of evaluating age based on skeletal and/or dental indicators are described in Martrille et al, 2007, J Forensic Sci. 52: 302-7. Ritz-Timme et al, 2000, Int J Legal Med. 113:129-36 recommends, for childhood and adolescence, morphological methods based on the radiological examination of dental and skeletal development. In adulthood, a biochemical method based on aspartic acid racemization in dentine provides the most accurate estimates of age, followed by special morphological dental and skeletal methods. Alkass et al, 2009, Mol Cell Proteomics. Dec 4. [Epub ahead of print] describes a method of radiocarbon analysis of tooth enamel which provides an estimated year of birth, with an absolute error of 0.6±0.4 years.
Aspartic acid racemization analysis indicates the chronological age of the individual at the time of death, with an absolute error of 5.4±4.2 years. Either method, or a combination of the two, may be used to evaluate the age of unidentified corpses and skeletons. Age may also be inferred from biological markers such as gene expression markers, as described in Lu T et al (2004) Nature 429 (6994): 883-91, or from DNA methylation markers.
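To give a sense of scale, and assuming all other predictors are held fixed, the per-year age coefficients reported in most of the models in Tables 3 and 4 (about 0.07 for H and -0.003 for S) correspond to the following shifts over ten years of age:

$$\Delta H \approx 0.07 \times 10 = 0.7, \qquad \Delta S \approx -0.003 \times 10 = -0.03$$

The same arithmetic applies to any other age difference, since age enters these models through a single per-year coefficient.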

Suitably, step (b) of the method of the first aspect of the invention further comprises evaluating the gender of the human. The gender of the human is then used in the prediction of iris color. As shown in Example 1, gender showed a small effect on both H and S. Gender can be determined morphologically. Genetic tests based on the presence or absence of markers indicative of the Y chromosome are also available (Esteve Codina A et al (2009) Int J Legal Med 123: 459-464).

A second aspect of the invention provides a method for predicting the iris color of a human, the method comprising:

(a) obtaining a sample of the nucleic acid of the human;

(b) genotyping the nucleic acid for a polymorphism which is (i) in the region between basepairs 76891593 and 77498447 on chromosome 17 according to NCBI Build 36 and is associated with variation in iris color; and/or is (ii) rs9894429 or a polymorphic site which is in linkage disequilibrium with rs9894429 at an r² value of at least 0.5; and

(c) predicting the iris color based on the results of step (b).

Step (a) may be performed as described above in relation to the first aspect of the invention.

The inventors are the first to identify a SNP which is associated with variation in iris color in the region between basepairs 76891593 and 77498447 on chromosome 17. SNPs in this region which are associated with iris color variation are rs9894429, rs7219915 and rs12452184. Different alleles of a polymorphism typically affect a phenotype by modifying a gene or the expression of a gene. They may affect the coding portion of a gene such that the protein encoded by the gene has a different amino acid sequence depending on which allele is present. Alternatively, they may affect the expression of a gene. It is thought that the SNPs rs9894429, rs7219915 and rs12452184 affect a gene in the region between basepairs 76891593 and 77498447 on chromosome 17. The gene may be ACTG1 (basepair position 77091593 to 77094422); C17orf70 (basepair position 77117386 to 77129868); FSCN2 (basepair position 77110011 to 77114632); NPLOC4 (basepair position 77134356 to 77214543); TSPAN10 (basepair position 77219753 to 77226184); PDE6G (basepair position 77227655 to 77233971); ARL16 (basepair position 77258628 to 77261359); HGS (basepair position 77261424 to 77279552); MRPL12 (basepair position 77280811 to 77284953); SLC25A10 (basepair position 77289775 to 77298447). As regulatory regions may be located upstream or downstream of a gene, and SNPs which are located outside of the regulatory regions of a gene may be in strong LD with a SNP which is located within the regulatory region, suitable SNPs may be located in any of the genes mentioned above, or up to 200 kb, typically up to 100 kb, more typically up to 50 kb upstream of the start of the gene, or up to 200 kb, typically up to 100 kb, more typically up to 50 kb downstream of the end of the gene.
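The windowing rule in the preceding paragraph can be checked against the coordinates listed for the chromosome 17 genes. In the Python sketch below (helper names are illustrative), each gene span is padded by 200 kb on both sides, which covers upstream and downstream flanks irrespective of the strand the gene lies on. Padding the outermost genes in this way reproduces the claimed region bounds exactly (76891593 and 77498447); the same arithmetic applied to the genes listed for chromosomes 21, 1 and 2 also reproduces those regions. The position used for rs9894429 is the one given in Table 5.

```python
from typing import List, NamedTuple, Sequence, Tuple


class Gene(NamedTuple):
    name: str
    start: int  # basepair coordinates as listed in the text
    end: int


CHR17_GENES = [
    Gene("ACTG1", 77091593, 77094422),
    Gene("FSCN2", 77110011, 77114632),
    Gene("C17orf70", 77117386, 77129868),
    Gene("NPLOC4", 77134356, 77214543),
    Gene("TSPAN10", 77219753, 77226184),
    Gene("PDE6G", 77227655, 77233971),
    Gene("ARL16", 77258628, 77261359),
    Gene("HGS", 77261424, 77279552),
    Gene("MRPL12", 77280811, 77284953),
    Gene("SLC25A10", 77289775, 77298447),
]


def genes_near(position: int, genes: Sequence[Gene], flank: int = 200_000) -> List[str]:
    """Names of genes whose span, padded by `flank` bp on each side, contains `position`.

    Padding both sides covers upstream and downstream flanks whatever the gene's strand.
    """
    return [g.name for g in genes if g.start - flank <= position <= g.end + flank]


def padded_region(genes: Sequence[Gene], flank: int = 200_000) -> Tuple[int, int]:
    """Smallest interval covering every gene padded by `flank` bp on each side."""
    return min(g.start for g in genes) - flank, max(g.end for g in genes) + flank


print(padded_region(CHR17_GENES))                  # (76891593, 77498447)
print(genes_near(77207216, CHR17_GENES, flank=0))  # ['NPLOC4']  (rs9894429, Table 5)
```

Passing a smaller `flank` (100 kb or 50 kb) gives the narrower "typically" and "more typically" windows described above.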

Further details of the genes are given below:

ACTG1

Symbol ACTG1 and Name: actin, gamma 1 [Homo sapiens]

Other Aliases: ACT, ACTG, DFNA20, DFNA26

Other Designations: actin, cytoplasmic 2; actin, gamma 1 propeptide; cytoskeletal gamma-actin

Chromosome: 17; Location: 17q25

Annotation: Chromosome 17, NC_000017.10 (77091593..77094422, complement)

MIM: 102560

GeneID: 71

FSCN2

Official Symbol FSCN2 and Name: fascin homolog 2, actin-bundling protein, retinal (Strongylocentrotus purpuratus) [Homo sapiens]

Other Aliases: RFSN, RP30

Other Designations: fascin 2

Chromosome: 17; Location: 17q25

Annotation: Chromosome 17, NC_000017.10 (77110011..77114632)

MIM: 607643

GeneID: 25794

C17orf70

Official Symbol C17orf70 and Name: chromosome 17 open reading frame 70 [Homo sapiens]

Other Aliases: FAAP100, FLJ22175, FLJ30151

Other Designations: Fanconi anemia associated protein 100 kDa subunit; Fanconi anemia core complex 100 kDa subunit

Chromosome: 17; Location: 17q25.3

Annotation: Chromosome 17, NC_000017.10 (77117386..77129868, complement)

MIM: 611301

GeneID: 80233

NPLOC4

Official Symbol NPLOC4 and Name: nuclear protein localization 4 homolog (S. cerevisiae) [Homo sapiens]

Other Aliases: FLJ20657, FLJ23742, KIAA1499, NPL4

Other Designations: nuclear protein localization 4

Chromosome: 17; Location: 17qter

Annotation: Chromosome 17, NC_000017.10 (77134356..77214543, complement)

MIM: 606590

GeneID: 55666

TSPAN10

Official Symbol TSPAN10 and Name: tetraspanin 10 [Homo sapiens]

Other Aliases: FLJ39607, OCSP

Other Designations: oculospanin

Chromosome: 17; Location: 17q25.3

Annotation: Chromosome 17, NC_000017.10 (77219753..77226184)

GeneID: 83882

PDE6G

Official Symbol PDE6G and Name: phosphodiesterase 6G, cGMP-specific, rod, gamma [Homo sapiens]

Other Aliases: DKFZp686C0587, MGC125749, PDEG

Other Designations: phosphodiesterase 6G; retinal rod rhodopsin-sensitive cGMP 3',5'-cyclic phosphodiesterase subunit gamma; rod cG-PDE G

Chromosome: 17; Location: 17q25

Annotation: Chromosome 17, NC_000017.10 (77227655..77233971, complement)

MIM: 180073

GeneID: 5148

ARL16

Official Symbol ARL16 and Name: ADP-ribosylation factor-like 16 [Homo sapiens]

Chromosome: 17; Location: 17q25.3

Annotation: Chromosome 17, NC_000017.10 (77258628..77261359, complement)

GeneID: 339231

HGS

Official Symbol HGS and Name: hepatocyte growth factor-regulated tyrosine kinase substrate [Homo sapiens]

Other Aliases: HRS, Vps27, ZFYVE8

Other Designations: human growth factor-regulated tyrosine kinase substrate

Chromosome: 17; Location: 17q25

Annotation: Chromosome 17, NC_000017.10 (77261424..77279552)

MIM: 604375

GeneID: 9146

MRPL12

Official Symbol MRPL12 and Name: mitochondrial ribosomal protein L12 [Homo sapiens]

Other Aliases: 5c5-2, FLJ60124, L12mt, MGC8610, MRP-L31/34, MRPL7, MRPL7/L12, RPML12

Chromosome: 17; Location: 17q25

Annotation: Chromosome 17, NC_000017.10 (77280811..77284953)

MIM: 602375

GeneID: 6182

SLC25A10

Official Symbol SLC25A10 and Name: solute carrier family 25 (mitochondrial carrier, dicarboxylate transporter), member 10 [Homo sapiens]

Other Aliases: DIC

Other Designations: dicarboxylate ion carrier

Chromosome: 17; Location: 17q25.3

Annotation: Chromosome 17, NC_000017.10 (77289775..77298447)

MIM: 606794

GeneID: 1468

Step (c) of the method of the second aspect of the invention comprises predicting the iris color based on the results of step (b).

The method may involve prediction of at least one quantitative color parameter as a numeric variable. Suitable models may be constructed as described in relation to the first aspect of the invention, and in Example 1.

Alternatively, the method may comprise a categorical prediction of the iris color. Various methods can be used to arrive at a categorical prediction model, as described in Liu F et al (2009) Curr Biol 19: R192-193, including ordinal regression, multinomial logistic regression, fuzzy c-means clustering, neural networks or classification trees. Multinomial logistic regression is preferred, as described in Liu F et al, supra. The probabilities of each individual being brown (π₁), blue (π₂) and otherwise (π₃) are calculated from the sample genotypes as

$$\pi_1 = \frac{\exp\left(\alpha_1 + \sum_k \beta(\pi_1)_k x_k\right)}{1 + \exp\left(\alpha_1 + \sum_k \beta(\pi_1)_k x_k\right) + \exp\left(\alpha_2 + \sum_k \beta(\pi_2)_k x_k\right)}$$

$$\pi_2 = \frac{\exp\left(\alpha_2 + \sum_k \beta(\pi_2)_k x_k\right)}{1 + \exp\left(\alpha_1 + \sum_k \beta(\pi_1)_k x_k\right) + \exp\left(\alpha_2 + \sum_k \beta(\pi_2)_k x_k\right)}$$

$$\pi_3 = 1 - \pi_1 - \pi_2$$

where $x_k$ is the number of minor (effect) alleles carried at the $k$th SNP. The skilled person may develop alternative prediction models. Typically, the categories which may be predicted are brown, blue and intermediate. Another possible categorisation could be between blue and non-blue or between brown and non-brown. "Brown" includes all hues and all shades or tints of brown. "Blue" includes all hues and all shades or tints of gray or blue. "Intermediate" includes hazel or green iris color. When developing a model, an eye color category can be assigned to each individual in the model-building data set by inspecting eye photographs. Good-quality photographic images, several images per eye and categorisation by a single grader are preferred.
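The multinomial logistic equations above can be evaluated directly. The Python sketch below uses the first model of Table 5 (rs12913832 and rs9894429) and assumes that x_k counts copies of the effect allele listed in the table. It returns the probabilities by index (π₁, π₂, π₃) and leaves the colour labels to the caller: with the Table 5 signs, additional copies of the rs12913832 A allele lower π₁ and raise π₃, so the index-to-colour mapping is worth checking against Liu F et al, supra, before use.

```python
import math
from typing import Mapping, Tuple


def multinomial_probs(
    alpha1: float,
    alpha2: float,
    betas: Mapping[str, Tuple[float, float]],
    dosages: Mapping[str, int],
) -> Tuple[float, float, float]:
    """Return (pi1, pi2, pi3) for one individual.

    betas maps each SNP to (beta(pi1)_k, beta(pi2)_k); dosages maps each SNP to x_k,
    the number of copies (0, 1 or 2) of that SNP's effect allele.
    """
    eta1 = alpha1 + sum(b1 * dosages[snp] for snp, (b1, _) in betas.items())
    eta2 = alpha2 + sum(b2 * dosages[snp] for snp, (_, b2) in betas.items())
    denom = 1.0 + math.exp(eta1) + math.exp(eta2)
    pi1 = math.exp(eta1) / denom
    pi2 = math.exp(eta2) / denom
    return pi1, pi2, 1.0 - pi1 - pi2  # pi3 is the reference category


# First model of Table 5: effect allele A for both SNPs.
ALPHA1, ALPHA2 = 3.2487, 0.3312
BETAS = {"rs12913832": (-4.5388, -1.7889), "rs9894429": (0.2411, 0.2179)}

for copies in (0, 1, 2):
    probs = multinomial_probs(ALPHA1, ALPHA2, BETAS,
                              {"rs12913832": copies, "rs9894429": 0})
    print(copies, [round(p, 3) for p in probs])
```

Any model in Tables 5 to 7 can be used in the same way by replacing the two alphas and the per-SNP beta pairs.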

Typically, a categorical prediction may return a probability of a true positive for each of the categories, the probabilities adding up to 1. Suitably, the category which has the highest probability of a true positive would be the category in which the iris color is predicted. For example, the probability may be 0.90 for blue, 0.06 for intermediate and 0.04 for brown. In that case, the prediction would be that the iris color is blue. If the probability of blue was, say, only 0.70, the degree of confidence that the prediction is correct would be lower. In particular, there would be a greater probability of a false positive, i.e. blue is predicted but the color is actually not blue. One can set a minimum probability below which the prediction is unclassified. For example, if one set a minimum probability of 0.80, in the case in which blue is predicted at 0.90, the prediction would remain blue. In the second case, where the probability of blue was only 0.70, the prediction would be unclassified. Different degrees of sensitivity and specificity would be associated with each probability (accuracy) level. "Sensitivity" is the correct call rate and equals 100% minus the percentage of false negatives. "Specificity" equals 100% minus the percentage of false positives. Historical data may be used to establish the sensitivity and specificity of the prediction at a given probability level. Altering the probability level can achieve higher specificity levels although this will affect the overall sensitivity of the model. Thus, as well as returning the category, whether it be blue, intermediate, brown or unclassified, the method can also involve recording the probability of a true positive in that category, and/or the probability level used as the cut-off, and/or the specificity and/or sensitivity of the model for the given probability level.
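The cut-off rule and the sensitivity/specificity bookkeeping described above can be sketched in Python as follows. The helper names are hypothetical; sensitivity and specificity are computed one category at a time (that category versus the rest) as TP/(TP+FN) and TN/(TN+FP), and an unclassified call is assumed to count as a negative call for every category, which is why raising the cut-off can lower sensitivity. The 0.90/0.06/0.04 example is taken from the text; the non-blue probabilities in the 0.70 example are made up for illustration.

```python
from typing import Mapping, Sequence, Tuple


def call_category(probs: Mapping[str, float], cutoff: float) -> Tuple[str, float]:
    """Return the most probable category and its probability, or 'unclassified'
    if that probability is below the cut-off."""
    best = max(probs, key=lambda c: probs[c])
    p = probs[best]
    return (best if p >= cutoff else "unclassified"), p


def sensitivity_specificity(calls: Sequence[str], truths: Sequence[str],
                            category: str) -> Tuple[float, float]:
    """One-vs-rest sensitivity TP/(TP+FN) and specificity TN/(TN+FP) for `category`.

    An 'unclassified' call counts as a negative call for every category.
    """
    tp = fn = tn = fp = 0
    for called, observed in zip(calls, truths):
        called_pos = called == category
        if observed == category:
            if called_pos:
                tp += 1
            else:
                fn += 1
        elif called_pos:
            fp += 1
        else:
            tn += 1
    sens = tp / (tp + fn) if tp + fn else float("nan")
    spec = tn / (tn + fp) if tn + fp else float("nan")
    return sens, spec


print(call_category({"blue": 0.90, "intermediate": 0.06, "brown": 0.04}, 0.80))  # ('blue', 0.9)
print(call_category({"blue": 0.70, "intermediate": 0.20, "brown": 0.10}, 0.80))  # ('unclassified', 0.7)
```

Running `sensitivity_specificity` over historical calls made at several cut-offs gives the sensitivity and specificity to record alongside each prediction.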

It will be appreciated that further polymorphisms may be genotyped in the method of the second aspect of the invention in order to arrive at a prediction of iris color. Suitable combinations of polymorphisms which include a polymorphism which is (i) in the region between basepairs 76891593 and 77498447 on chromosome 17 according to NCBI Build 36 and is associated with variation in iris color; and/or is (ii) rs9894429 or a polymorphic site which is in linkage disequilibrium with rs9894429 at an r² value of at least 0.5 are as described in relation to the first aspect of the invention. Suitable proxy markers within the region between basepairs 76891593 and 77498447 on chromosome 17 and/or in LD with rs9894429 are also described in relation to the first aspect of the invention.
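Whether a candidate proxy reaches the r² ≥ 0.5 criterion can be estimated from genotype data. The Python sketch below computes r² as the squared Pearson correlation between allele dosages (0, 1 or 2) at two SNPs across the same individuals. This genotype-based estimate is an assumption made for illustration: it approximates, but is not identical to, the haplotype-based r² obtained from phased data or reference panels. The dosage vectors in the usage line are hypothetical.

```python
from typing import Sequence


def genotype_r2(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Pearson correlation of allele dosages (0/1/2) at two SNPs.

    Both sequences must list the same individuals in the same order.
    """
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two equal-length dosage vectors with at least 2 samples")
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    s_ab = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    s_aa = sum((x - mean_a) * (x - mean_a) for x in a)
    s_bb = sum((y - mean_b) * (y - mean_b) for y in b)
    if s_aa == 0 or s_bb == 0:
        raise ValueError("a monomorphic SNP has undefined r^2")
    return (s_ab * s_ab) / (s_aa * s_bb)


def is_proxy(a: Sequence[float], b: Sequence[float], threshold: float = 0.5) -> bool:
    """True if the genotype-based r^2 between the two SNPs is at least `threshold`."""
    return genotype_r2(a, b) >= threshold


print(is_proxy([0, 1, 2, 1, 0, 2], [0, 1, 2, 1, 0, 1]))  # True
```

Because r² is squared, a candidate whose dosages run opposite to those of the index SNP (strong negative correlation) also qualifies as a proxy.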

Alpha and beta variables for use in the multinomial logistic regression as described above and in Liu F et al, supra for combinations of SNPs according to the second aspect of the invention are provided in Table 5. Also shown is the expected AUC, an indication of prediction accuracy. The effect allele for each SNP is the minor allele as shown in the table.
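The expected AUC columns summarise how well the predicted probability for a colour separates individuals with that colour from everyone else (0.5 is chance, 1.0 is perfect separation). How the expected values in the tables were derived is not stated here; the Python sketch below only shows one standard way of computing a one-versus-rest AUC from predicted probabilities, namely the rank (Mann-Whitney) formulation, in which ties count as one half. The scores in the usage line are hypothetical.

```python
from typing import Sequence


def auc_one_vs_rest(scores: Sequence[float], is_positive: Sequence[bool]) -> float:
    """Probability that a randomly chosen positive scores above a randomly chosen
    negative; ties count one half (Mann-Whitney formulation of the ROC AUC)."""
    pos = [s for s, y in zip(scores, is_positive) if y]
    neg = [s for s, y in zip(scores, is_positive) if not y]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative individual")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical predicted blue probabilities and observed blue status for four people.
print(auc_one_vs_rest([0.9, 0.8, 0.3, 0.2], [True, False, True, False]))  # 0.75
```

The pairwise loop is quadratic in the number of individuals, which is adequate for a sketch; a rank-based computation scales better for large cohorts.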

Table 5: Model parameters for categorical iris color prediction.

Columns: SNP, chromosome (chr), position (pos), effect allele, beta1, beta2. The alpha row of each model gives alpha1 and alpha2; expected AUC values are given for blue and intermediate.

Model 1 (expected AUC: blue 0.8826, intermediate 0.6584)
alpha 3.2487 0.3312
rs12913832 15 26039213 A -4.5388 -1.7889
rs9894429 17 77207216 A 0.2411 0.2179

Model 2 (expected AUC: blue 0.8845, intermediate 0.6628)
alpha 3.2432 0.3289
rs12913832 15 26039213 A -4.6439 -1.8943
rs1800407 15 25903913 T 0.9389 0.9052
rs9894429 17 77207216 A 0.2369 0.2137

Model 3 (expected AUC: blue 0.9038, intermediate 0.6993)
alpha 3.813 0.3336
rs12913832 15 26039213 A -4.6371 -1.7886
rs12896399 14 91843416 C -0.5166 -0.003
rs9894429 17 77207216 A 0.2472 0.2187

Model 4 (expected AUC: blue 0.9055, intermediate 0.7042)
alpha 3.8153 0.3382
rs12913832 15 26039213 A -4.7461 -1.8959
rs12896399 14 91843416 C -0.523 -0.0087
rs1800407 15 25903913 T 0.9573 0.9098
rs9894429 17 77207216 A 0.2429 0.2147

Model 5 (expected AUC: blue 0.907, intermediate 0.7092)
alpha 3.9647 0.3456
rs12913832 15 26039213 A -4.7577 -1.896
rs12896399 14 91843416 C -0.5262 -0.0097
rs1800407 15 25903913 T 0.9627 0.9105
rs9894429 17 77207216 A 0.238 0.2132
rs7277820 21 37502179 G -0.1482 -0.0052

A third aspect of the invention provides a method for predicting the iris color of a human, the method comprising:

(a) obtaining a sample of the nucleic acid of the human;

(b) genotyping the nucleic acid for a polymorphism which is (i) in the region between basepairs 37100732 and 37761703 on chromosome 21 according to NCBI Build 36 and is associated with variation in iris color; and/or is (ii) rs7277820 or a polymorphic site which is in linkage disequilibrium with rs7277820 at an r² value of at least 0.5; and

(c) predicting the iris color based on the results of step (b).

The inventors are the first to identify a SNP which is associated with variation in iris color in the region between basepairs 37100732 and 37761703 on chromosome 21. SNPs in this region which are associated with iris color variation are rs1003719, rs2252893, rs2835621, rs2835630 and rs7277820. It is thought that these SNPs affect a gene in the region between basepairs 37100732 and 37761703 on chromosome 21. The gene may be DSCR3 (basepair position 37517595 to 37561703); DSCR6 (basepair position 37300732 to 37313828); DSCR9 (basepair position 37502674 to 37515906); TTC3 (basepair position 37367440 to 37497278); PIGP (basepair position 37359533 to 37367328). As regulatory regions may be located upstream or downstream of a gene, and SNPs which are located outside of the regulatory regions of a gene may be in strong LD with a SNP which is located within the regulatory region, suitable SNPs may be located in any of the genes mentioned above, or up to 200 kb, typically up to 100 kb, more typically up to 50 kb upstream of the start of the gene, or up to 200 kb, typically up to 100 kb, more typically up to 50 kb downstream of the end of the gene.

Further details of the genes are given below.

DSCR3

Official Symbol DSCR3 and Name: Down syndrome critical region gene 3 [Homo sapiens]

Other Aliases: DCRA, DSCRA, MGC117385

Other Designations: Down syndrome critical region protein 3; Down syndrome critical region protein A

Chromosome: 21; Location: 21q22.2

Annotation: Chromosome 21, NC_000021.8 (37517595..37561703, complement)

MIM: 605298

GeneID: 10311

DSCR6

Official Symbol DSCR6 and Name: Down syndrome critical region gene 6 [Homo sapiens]

Other Aliases: RIPPLY3

Other Designations: Down syndrome critical region protein 6

Chromosome: 21; Location: 21q22.2

Annotation: Chromosome 21, NC_000021.8 (37300732..37313828)

MIM: 609892

GeneID: 53820

DSCR9

Down syndrome critical region gene 9 (non-protein coding) [ Homo sapiens ]

Official Full Name Down syndrome critical region gene 9 (non-protein coding) provided by HGNC

Also known as NCRNA00038

GeneID: 257203 (updated 27-Jan-2010)

TTC3

Official Symbol TTC3 and Name: tetratricopeptide repeat domain 3 [Homo sapiens]

Other Aliases: DCRR1, DKFZp686M0150, RNF105, TPRDIII

Other Designations: OTTHUMP37367440..37497278; TPR repeat protein D

Chromosome: 21; Location: 21q22.2

Annotation: Chromosome 21, NC_000021.8 (37367440..37497278)

MIM: 602259

GeneID: 7267

PIGP

Official Symbol PIGP and Name: phosphatidylinositol glycan anchor biosynthesis, class P [Homo sapiens]

Other Aliases: DCRC, DCRC-S, DSCR5, DSRC

Other Designations: Down syndrome critical region gene 5; Down syndrome critical region protein 5; Down syndrome critical region protein C; OTTHUMP37359533..37367328; OTTHUMP00000109079; phosphatidylinositol N-acetylglucosaminyltransferase subunit P; phosphatidylinositol glycan, class phosphatidylinositol-glycan biosynthesis class P; phosphatidylinositol acetylglucosaminyltransferase subunit

Chromosome: 21; Location: 21q22.2

Annotation: Chromosome 21, NC_000021.8 (37359533..37367328, complement)

MIM: 605938

GeneID: 51227

Step (c) of the method of the third aspect of the invention comprises predicting the iris color based on the results of step (b). The method may involve prediction of at least one quantitative color parameter as a numeric variable, or may be a categorical prediction. Suitable methods are as described above in relation to the first and second aspects of the invention. It will be appreciated that further polymorphisms may be genotyped in the method of the third aspect of the invention in order to arrive at a prediction of iris color. Suitable combinations of polymorphisms which include a polymorphism which is (i) in the region between basepairs 37100732 and 37761703 on chromosome 21 according to NCBI Build 36 and is associated with variation in iris color; and/or is (ii) rs7277820 or a polymorphic site which is in linkage disequilibrium with rs7277820 at an r² value of at least 0.5 are as described in relation to the first aspect of the invention. Suitable proxy markers within the region between basepairs 37100732 and 37761703 on chromosome 21 and/or in LD with rs7277820 are also described in relation to the first aspect of the invention.

Alpha and beta variables for use in the multinomial logistic regression as described above and in Liu F et al, supra for combinations of SNPs according to the third aspect of the invention are provided in Table 6. Also shown is the expected AUC, an indication of prediction accuracy. The effect allele for each SNP is the minor allele as shown in the table.

Table 6: Model parameters for categorical iris color prediction

Columns: SNP, chromosome (chr), position (pos), effect allele, beta1, beta2. The alpha row of each model gives alpha1 and alpha2; expected AUC values are given for blue, intermediate and brown.

Model 1 (expected AUC: blue 0.8839, intermediate 0.6615, brown 0.9004)
alpha 3.5962 0.5228
rs12913832 15 26039213 A -4.5334 -1.7739
rs7277820 21 37502179 G -0.1521 -0.01

Model 2 (expected AUC: blue 0.8863, intermediate 0.6693, brown 0.9058)
alpha 3.5896 0.5196
rs12913832 15 26039213 A -4.6408 -1.8809
rs1800407 15 25903913 T 0.95 0.9098
rs7277820 21 37502179 G -0.1543 -0.0122

Model 3 (expected AUC: blue 0.9038, intermediate 0.6968, brown 0.9103)
alpha 4.164 0.5226
rs12913832 15 26039213 A -4.632 -1.7729
rs12896399 14 91843416 C -0.5153 0.0008
rs7277820 21 37502179 G -0.1516 -0.0116

Model 4 (expected AUC: blue 0.9059, intermediate 0.7048, brown 0.9152)
alpha 4.1662 0.5264
rs12913832 15 26039213 A -4.7438 -1.8817
rs12896399 14 91843416 C -0.5222 -0.0051
rs1800407 15 25903913 T 0.9702 0.9142
rs7277820 21 37502179 G -0.1544 -0.0133

Model 5 (expected AUC: blue 0.887, intermediate 0.6688, brown 0.9067)
alpha 3.3921 0.3343
rs12913832 15 26039213 A -4.5482 -1.7886
rs9894429 17 77207216 A 0.2372 0.2169
rs7277820 21 37502179 G -0.1469 -0.0025

Model 6 (expected AUC: blue 0.8889, intermediate 0.6734, brown 0.9105)
alpha 3.3889 0.3341
rs12913832 15 26039213 A -4.6541 -1.8942
rs1800407 15 25903913 T 0.9429 0.9057
rs9894429 17 77207216 A 0.2327 0.2124
rs7277820 21 37502179 G -0.1491 -0.0044

Model 7 (expected AUC: blue 0.9172, intermediate 0.7336, brown 0.93)
alpha 4.2043 0.2884
rs3768056 1 2.34E+08 G -0.1675 0.0853
rs16891982 5 33987450 C -1.4741 -0.6475
rs12203592 6 341321 A 0.6064 0.6971
rs1325127 9 12658328 G -0.433 -0.2485
rs1393350 11 88650694 A 0.4249 0.2395
rs12896399 14 91843416 C -0.5467 -0.0171
rs728405 15 25873448 C 0.2481 0.3101
rs1800407 15 25903913 T 1.0283 0.8895
rs1129038 15 26030454 C -0.9377 -0.6178
rs12913832 15 26039213 A -4.0914 -1.4644
rs9894429 17 77207216 A 0.2295 0.2323
rs7277820 21 37502179 G -0.167 -0.0199
rs2070959 2 2.34E+08 G -0.0542 -0.0568

Model 8 (expected AUC: blue 0.9053, intermediate 0.7038, brown 0.9136)
alpha 3.9594 0.3394
rs12913832 15 26039213 A -4.6476 -1.7885
rs12896399 14 91843416 C -0.5195 -0.0039
rs9894429 17 77207216 A 0.2426 0.2174
rs7277820 21 37502179 G -0.1457 -0.004

A fourth aspect of the invention provides a method for predicting the iris color of a human, the method comprising:

(a) obtaining a sample of the nucleic acid of the human;

(b) genotyping the nucleic acid for: a polymorphism which is (i) in the region between basepairs 233690968 and 234296843 on chromosome 1 according to NCBI Build 36 and is associated with variation in iris color; and/or is (ii) rs3768056 or a polymorphic site which is in linkage disequilibrium with rs3768056 at an r² value of at least 0.5; and

(c) predicting the iris color based on the results of step (b).

Step (a) may be performed as described above in relation to the first aspect of the invention. The inventors are the first to identify a SNP which is associated with variation in iris color in the region between basepairs 233690968 and 234296843 on chromosome 1. SNPs in this region which are associated with iris color variation are rs3768056 and rs9782955. It is thought that these SNPs affect a gene in the region between basepairs 233690968 and 234296843 on chromosome 1. The gene may be LYST (basepair position 233890968 to 234096843). As regulatory regions may be located upstream or downstream of a gene, and SNPs which are located outside of the regulatory regions of a gene may be in strong LD with a SNP which is located within the regulatory region, suitable SNPs may be located in LYST, or up to 200 kb, typically up to 100 kb, more typically up to 50 kb upstream of the start of LYST, or up to 200 kb, typically up to 100 kb, more typically up to 50 kb downstream of the end of LYST.

Further details of the gene are given below.

LYST

Official Symbol LYST and Name: lysosomal trafficking regulator [Homo sapiens]

Other Aliases: CHS, CHS1

Other Designations: Chediak-Higashi syndrome 1; beige protein

Chromosome: 1; Location: 1q42.1-q42.2

Annotation: Chromosome 1, NC_000001.10 (233890968..234096843, complement)

MIM: 606897

GeneID: 1130

Step (c) of the method of the fourth aspect of the invention comprises predicting the iris color based on the results of step (b). The method may involve prediction of at least one quantitative color parameter as a numeric variable, or may be a categorical prediction. Suitable methods are as described above in relation to the first and second aspects of the invention. Where at least one quantitative color parameter is predicted, it is suitably saturation. It will be appreciated that further polymorphisms may be genotyped in the method of the fourth aspect of the invention in order to arrive at a prediction of iris color. Suitable combinations of polymorphisms which include a polymorphism which is (i) in the region between basepairs 233690968 and 234296843 on chromosome 1 according to NCBI Build 36 and is associated with variation in iris color; and/or is (ii) rs3768056 or a polymorphic site which is in linkage disequilibrium with rs3768056 at an r² value of at least 0.5 are as described in relation to the first aspect of the invention. Suitable proxy markers within the region between basepairs 233690968 and 234296843 on chromosome 1 and/or in LD with rs3768056 are also described in relation to the first aspect of the invention.

Alpha and beta variables for use in the multinomial logistic regression as described above and in Liu F et al, supra for combinations of SNPs according to the fourth aspect of the invention are provided in Table 7. Also shown is the expected AUC, an indication of prediction accuracy. The effect allele for each SNP is the minor allele as shown in the table.

Table 7: Model parameters for categorical iris color prediction

Columns: SNP, chromosome (chr), position (pos), effect allele, beta1, beta2. The alpha row of each model gives alpha1 and alpha2; expected AUC values are given for blue, intermediate and brown.

Model 1 (expected AUC: blue 0.8862, intermediate 0.6732, brown 0.8992)
alpha 3.5422 0.4463
rs12913832 15 26039213 A -4.5448 -1.7483
rs3768056 1 2.34E+08 G -0.1622 0.0927

Model 2 (expected AUC: blue 0.889, intermediate 0.6783, brown 0.9049)
alpha 3.5333 0.441
rs12913832 15 26039213 A -4.6513 -1.8561
rs1800407 15 25903913 T 0.943 0.9123
rs3768056 1 2.34E+08 G -0.1616 0.0931

Model 3 (expected AUC: blue 0.9058, intermediate 0.7032, brown 0.9108)
alpha 4.1227 0.4505
rs12913832 15 26039213 A -4.6468 -1.7476
rs12896399 14 91843416 C -0.5255 -0.0035
rs3768056 1 2.34E+08 G -0.1604 0.0908

Model 4 (expected AUC: blue 0.9075, intermediate 0.7103, brown 0.9156)
alpha 4.1213 0.4526
rs12913832 15 26039213 A -4.7573 -1.8576
rs12896399 14 91843416 C -0.5316 -0.0093
rs1800407 15 25903913 T 0.9606 0.9172
rs3768056 1 2.34E+08 G -0.1595 0.0912

Model 5 (expected AUC: blue 0.9094, intermediate 0.7164, brown 0.9164)
alpha 4.2853 0.5301
rs12913832 15 26039213 A -4.7634 -1.8107
rs12896399 14 91843416 C -0.5359 -0.0114
rs1800407 15 25903913 T 0.9571 0.8693
rs9894429 17 77207216 A -0.0218 -0.171
rs7277820 21 37502179 G -0.1559 -0.0192
rs3768056 1 2.34E+08 G -0.164 0.0891

A fifth aspect of the invention provides a method for predicting the iris color of a human, the method comprising:

(a) obtaining a sample of the nucleic acid of the human;

(b) genotyping the nucleic acid for a polymorphism which is (i) in the region between basepairs 233848903 and 234546690 on chromosome 2 according to NCBI Build 36 and is associated with variation in iris color; and/or is (ii) rs2070959 or a polymorphic site which is in linkage disequilibrium with rs2070959 at an r² value of at least 0.5; and

(c) predicting the iris color based on the results of step (b).

Step (a) may be performed as described above in relation to the first aspect of the invention. The inventors are the first to identify a SNP which is associated with variation in iris color in the region between basepairs 233848903 and 234546690 on chromosome 2. SNPs in this region which are associated with iris color variation are rs2070959, rs1105879, rs892839 and rs10209564. It is thought that these SNPs affect a gene in the region between basepairs 233848903 and 234546690 on chromosome 2. The gene may be USP40 (basepair position 234048903 to 234134606); UGT1A1 (basepair position 234333657 to 234346684); UGT1A3 (basepair position 234302511 to 234346684); UGT1A4 (basepair position 234292176 to 234346684); UGT1A5 (basepair position 234286376 to 234346684); UGT1A6 (basepair position 234265059 to 234346690); UGT1A7 (basepair position 234255322 to 234346684); UGT1A8 (basepair position 234191029 to 234346684); UGT1A9 (basepair position 234245282 to 234346690); UGT1A10 (basepair position 234209861 to 234346690); DNAJB3 (basepair position 234316134 to 234317400). As regulatory regions may be located upstream or downstream of a gene, and a polymorphism which is located outside of the regulatory regions of a gene may be in strong LD with a polymorphism which is located within the regulatory region, suitable polymorphisms may be located in any of the genes mentioned above, or up to 200 kb, typically up to 100 kb, more typically up to 50 kb upstream of the start of the gene, or up to 200 kb, typically up to 100 kb, more typically up to 50 kb downstream of the end of the gene.

Further details of the genes are given below.

USP40

Official Symbol USP40 and Name: ubiquitin specific peptidase 40 [Homo sapiens]

Other Aliases: FLJ10785, FLJ42100

Other Designations: deubiquitinating enzyme 40; ubiquitin carboxyl-terminal hydrolase 40; ubiquitin specific protease 40; ubiquitin thioesterase 40; ubiquitin-specific-processing protease 40

Chromosome: 2; Location: 2q37.1

Annotation: Chromosome 2, NC_000002.11 (234048903..234134606, complement)

MIM: 610570

GeneID: 55230

UGT1A1

Official Symbol UGT1A1 and Name: UDP glucuronosyltransferase 1 family, polypeptide A1 [Homo sapiens]

Other Aliases: GNT1, HUG-BR1, UDPGT, UGT1, UGT1A

Other Designations: UDP glucuronosyltransferase 1A1; UDP glycosyltransferase 1 family, polypeptide A1; bilirubin UDP-glucuronosyltransferase 1-1; bilirubin UDP-glucuronosyltransferase isozyme 1

Chromosome: 2; Location: 2q37

Annotation: Chromosome 2, NC_000002.11 (234333657..234346684)

MIM: 191740

GeneID: 54658

UGT1A3

Official Symbol UGT1A3 and Name: UDP glucuronosyltransferase 1 family, polypeptide A3 [Homo sapiens]

Other Aliases: UGT1C

Other Designations: UDP glucuronosyltransferase 1A3; UDP glycosyltransferase 1 family, polypeptide A3; UDP-glucuronosyltransferase

Chromosome: 2; Location: 2q37

Annotation: Chromosome 2, NC_000002.11 (234302511..234346684)

MIM: 606428

GeneID: 54659

UGT1A4

Official Symbol UGT1A4 and Name: UDP glucuronosyltransferase 1 family, polypeptide A4 [Homo sapiens]

Other Aliases: HUG-BR2, UDPGT, UGT1D

Other Designations: UDP glycosyltransferase 1 family, polypeptide A4; UDP-glucuronosyltransferase; bilirubin UDP-glucuronosyltransferase isozyme 2

Chromosome: 2; Location: 2q37

Annotation: Chromosome 2, NC_000002.11 (234292176..234346684)

MIM: 606429

GeneID: 54657

UGT1A5

Official Symbol UGT1A5 and Name: UDP glucuronosyltransferase 1 family, polypeptide A5 [Homo sapiens]

Other Aliases: UDPGT, UGT1E

Other Designations: UDP glycosyltransferase 1 family, polypeptide A5; UDP-glucuronosyltransferase 1 family polypeptide A5s; UDP-glucuronosyltransferase 1A5

Chromosome: 2; Location: 2q37

Annotation: Chromosome 2, NC_000002.11 (234286376..234346684)

MIM: 606430

GeneID: 54579

UGT1A6

Official Symbol UGT1A6 and Name: UDP glucuronosyltransferase 1 family, polypeptide A6 [Homo sapiens]

Other Aliases: GNT1, HLUGP, HLUGP1, MGC29860, UDPGT, UGT1, UGT1A6S, UGT1F

Other Designations: OTTHUMP234265059..234346690; UDP glycosyltransferase 1 family, polypeptide A6; UDP-glucuronosyltransferase 1 family polypeptide A6s; UDP-glucuronosyltransferase 1A6; phenol-metabolizing UDP-glucuronosyltransferase

Chromosome: 2; Location: 2q37

Annotation: Chromosome 2, NC_000002.11 (234265059..234346690)

MIM: 606431

GeneID: 54578

UGT1A7

Official Symbol UGT1A7 and Name: UDP glucuronosyltransferase 1 family, polypeptide A7 [Homo sapiens]

Other Aliases: UDPGT, UGT1G

Other Designations: UDP glycosyltransferase 1 family, polypeptide A7; UDP-glucuronosyltransferase 1 family polypeptide A7s; UDP-glucuronosyltransferase 1A7

Chromosome: 2; Location: 2q37

Annotation: Chromosome 2, NC_000002.11 (234255322..234346684)

MIM: 606432

GeneID: 54577

UGT1A8

Official Symbol UGT1A8 and Name: UDP glucuronosyltransferase 1 family, polypeptide A8 [Homo sapiens]

Other Aliases: UDPGT, UGT1A8S, UGT1H

Other Designations: UDP glycosyltransferase 1 family, polypeptide A8; UDP-glucuronosyltransferase 1 family polypeptide A8s; UDP-glucuronosyltransferase 1A8

Chromosome: 2; Location: 2q37

Annotation: Chromosome 2, NC_000002.11 (234191029..234346684)

MIM: 606433

GeneID: 54576

UGT1A9

Official Symbol UGT1A9 and Name: UDP glucuronosyltransferase 1 family, polypeptide A9 [Homo sapiens]

Other Aliases: HLUGP4, LUGP4, UDPGT, UGT1AI

Other Designations: UDP glycosyltransferase 1 family, polypeptide A9; UDP-glucuronosyltransferase 1A9

Chromosome: 2; Location: 2q37

Annotation: Chromosome 2, NC_000002.11 (234245282..234346690)

MIM: 606434

GeneID: 54600

UGT1A10

Official Symbol UGT1A10 and Name: UDP glucuronosyltransferase 1 family, polypeptide A10 [Homo sapiens]

Other Aliases: UDPGT, UGT1J

Other Designations: UDP glycosyltransferase 1 family, polypeptide A10; UDP-glucuronosyltransferase 1A10

Chromosome: 2; Location: 2q37

Annotation: Chromosome 2, NC_000002.11 (234209861..234346690)

MIM: 606435

GeneID: 54575

DNAJB3

Official Symbol DNAJB3 and Name: DnaJ (Hsp40) homolog, subfamily B, member 3 [Homo sapiens]

Other Aliases: HCG3, MGC26879

Other Designations: HCG3

Chromosome: 2; Location: 2q37

Annotation: Chromosome 2, NC_000002.11 (234316134..234317400, complement)

GeneID: 414061

Step (c) of the method of the fifth aspect of the invention comprises predicting the iris color based on the results of step (b). The method may involve prediction of at least one quantitative color parameter as a numeric variable, or may be a categorical prediction. Suitable methods are as described above in relation to the first and second aspects of the invention.

It will be appreciated that further polymorphisms may be genotyped in the method of the fifth aspect of the invention in order to arrive at a prediction of iris color. Suitable combinations of polymorphisms which include a polymorphism which is (i) in the region between basepairs 233848903 and 234546690 on chromosome 2 according to NCBI Build 36 and is associated with variation in iris color; and/or is (ii) rs2070959 or a polymorphic site which is in linkage disequilibrium with rs2070959 at an r² value of at least 0.5 are as described in relation to the first aspect of the invention. Suitable proxy markers within the region between basepairs 233848903 and 234546690 on chromosome 2 and/or in LD with rs2070959 are also described in relation to the first aspect of the invention.

The alpha and beta parameters of the multinomial logistic regression described above and in Liu F et al., supra, for combinations of SNPs according to the fifth aspect of the invention are provided in Table 8, together with the expected AUC (area under the receiver operating characteristic curve) for each iris color category as an indication of prediction accuracy. For each SNP, the effect allele is the minor allele shown in the table.

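Table 8: Model parameters of categorical iris color prediction

In each model, the alpha row gives the intercepts (alpha1 in the beta1 column, alpha2 in the beta2 column) together with the expected AUC for the blue, intermediate and brown categories. Positions are according to NCBI Build 36.

Model | SNP | Chr | Position | Allele | beta1 | beta2 | AUC blue | AUC intermediate | AUC brown
1 | alpha | | | | 3.4727 | 0.5318 | 0.8779 | 0.6474 | 0.8984
1 | rs12913832 | 15 | 26039213 | | -4.5225 | -1.771 | | |
1 | rs2070959 | 2 | 2.34E+08 | | -0.037 | -0.035 | | |
2 | alpha | | | | 3.4636 | 0.5254 | 0.8806 | 0.6579 | 0.9039
2 | rs12913832 | 15 | 26039213 | | -4.6298 | -1.8783 | | |
2 | rs1800407 | 15 | 25903913 | | 0.9556 | 0.9197 | | |
2 | rs2070959 | 2 | 2.34E+08 | | -0.0369 | -0.0339 | | |
3 | alpha | | | | 4.0418 | 0.5334 | 0.9018 | 0.6935 | 0.9101
3 | rs12913832 | 15 | 26039213 | | -4.6207 | -1.7702 | | |
3 | rs12896399 | 14 | 91843416 | | -0.5143 | 0.0004 | | |
3 | rs2070959 | 2 | 2.34E+08 | | -0.0405 | -0.039 | | |
4 | alpha | | | | 4.0401 | 0.5337 | 0.9039 | 0.7 | 0.9148
4 | rs12913832 | 15 | 26039213 | A | -4.7318 | -1.8793 | | |
4 | rs12896399 | 14 | 91843416 | C | -0.5204 | -0.005 | | |
4 | rs1800407 | 15 | 25903913 | T | 0.9734 | 0.9238 | | |
4 | rs2070959 | 2 | 2.34E+08 | G | -0.0405 | -0.0379 | | |
5 | alpha | | | | 4.1305 | 0.3236 | 0.9097 | 0.7171 | 0.918
5 | rs12913832 | 15 | 26039213 | A | -4.7808 | -1.8692 | | |
5 | rs12896399 | 14 | 91843416 | C | -0.5416 | -0.0166 | | |
5 | rs1800407 | 15 | 25903913 | T | 0.9644 | 0.921 | | |
5 | rs9894429 | 17 | 77207216 | A | 0.2212 | 0.2164 | | |
5 | rs7277820 | 21 | 37502179 | G | -0.1558 | -0.0151 | | |
5 | rs3768056 | 1 | 2.34E+08 | G | -0.169 | 0.0887 | | |
5 | rs2070959 | 2 | 2.34E+08 | G | -0.0498 | -0.0493 | | |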
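As a rough illustration of how the Table 8 parameters might be applied, the sketch below computes category probabilities for model 4. It assumes, in line with the multinomial logistic regression of Liu F et al., that brown is the reference category, that alpha1 and beta1 belong to the blue-versus-brown equation and alpha2 and beta2 to the intermediate-versus-brown equation, and that each SNP enters as the number (0, 1 or 2) of copies of its effect allele. These assumptions should be checked against the model description given earlier before the sketch is relied on.

```python
import math

# Model 4 of Table 8. Each entry is (coefficient in the blue equation,
# coefficient in the intermediate equation); "alpha" holds the intercepts.
MODEL_4 = {
    "alpha": (4.0401, 0.5337),
    "rs12913832": (-4.7318, -1.8793),  # effect allele A
    "rs12896399": (-0.5204, -0.005),   # effect allele C
    "rs1800407": (0.9734, 0.9238),     # effect allele T
    "rs2070959": (-0.0405, -0.0379),   # effect allele G
}


def predict_iris_color(effect_allele_counts, model=MODEL_4):
    """Category probabilities from a multinomial logistic model.

    Assumptions: brown is the reference category, and each SNP is coded
    as the number (0, 1 or 2) of copies of its effect allele.
    """
    eta_blue, eta_inter = model["alpha"]
    for snp, (beta_blue, beta_inter) in model.items():
        if snp == "alpha":
            continue
        x = effect_allele_counts[snp]
        eta_blue += beta_blue * x
        eta_inter += beta_inter * x
    denom = 1.0 + math.exp(eta_blue) + math.exp(eta_inter)
    return {
        "blue": math.exp(eta_blue) / denom,
        "intermediate": math.exp(eta_inter) / denom,
        "brown": 1.0 / denom,
    }
```

Under these assumptions, a genotype with no copies of any effect allele comes out as most probably blue, while two copies of the rs12913832 A allele (other SNPs at zero) make brown the most probable category.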

By 'genotyping', we include determining the genotype of at least one of the SNPs described herein. In this way, the particular base or allele of a polymorphic site (e.g. SNP) becomes known. It is appreciated that by 'genotyping' we include the direct determination of a particular base or allele of a polymorphic site, as well as the determination of an indirect indicator of that base or allele.

It will be appreciated that genotyping a polymorphic site (e.g. SNP) as described above conveniently comprises contacting a sample of nucleic acid from the human with one or more nucleic acid molecules that hybridize selectively to a genomic region encompassing the polymorphism (e.g. SNP).

By "selective hybridization" or "selectively hybridize" we include the meaning that the nucleic acid molecule has sufficient nucleotide sequence similarity with the said genomic DNA, cDNA or mRNA that it can hybridise under highly stringent conditions. As is well known in the art, the stringency of nucleic acid hybridisation depends on factors such as the length of nucleic acid over which hybridisation occurs, the degree of identity of the hybridising sequences, and factors such as temperature, ionic strength and the CG or AT content of the sequence. Conditions that allow for selective hybridization can be determined empirically, or can be estimated based, for example, on the above parameters (see, for example, Sambrook et al., "Molecular Cloning: A Laboratory Manual", Cold Spring Harbor Laboratory Press, 1989). Thus, any nucleic acid which is capable of selectively hybridising as described is useful in the practice of the invention. An example of a typical hybridization solution, when a nucleic acid is immobilised on a nylon membrane and the probe is an oligonucleotide of between 15 and 50 bases, is:

3.0 M tetramethylammonium chloride (TMACl)

0.01 M sodium phosphate (pH 6.8)

1 mM EDTA (pH 7.6)

0.5% SDS

100 µg/ml denatured, fragmented salmon sperm DNA

0.1% nonfat dried milk

The optimal temperature for hybridisation is usually chosen to be 5°C below the Ti for the given chain length. Ti is the irreversible melting temperature of the hybrid formed between the probe and its target sequence. Jacobs et al (1988) Nucl. Acids Res. 16, 4637 discusses the determination of Ti values. The recommended hybridization temperature in 3 M TMACl is 48-50°C for 17-mers, 55-57°C for 19-mers and 58-66°C for 20-mers.
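Because base composition affects stringency under ordinary salt conditions, a quick melting-temperature estimate is often made with the Wallace rule (2°C per A or T, 4°C per G or C), a rough guide for short oligonucleotides in high-salt (roughly 1 M Na+) buffers. This is a swapped-in rule of thumb, not the Ti determination of Jacobs et al.: in 3 M TMACl the melting temperature depends mainly on chain length rather than base composition, which is why the temperatures above are given per probe length. The sketch applies the 5°C offset from the text to that estimate; the example probe is hypothetical.

```python
def gc_fraction(seq):
    """Fraction of G and C bases in a DNA sequence."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def wallace_tm(seq):
    """Rough Tm (°C) by the Wallace rule: 2 per A/T plus 4 per G/C.

    Only a rule of thumb for short oligos in high-salt buffers; it does not
    model 3 M TMACl, where Tm depends mainly on length.
    """
    seq = seq.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2.0 * at + 4.0 * gc


def hybridization_temp(seq, offset=5.0):
    """Choose a hybridization temperature about `offset` °C below the estimated Tm."""
    return wallace_tm(seq) - offset
```

For the hypothetical 10-mer AAAAAGGGGG this gives a GC fraction of 0.5, an estimated Tm of 30°C and a hybridization temperature of 25°C.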

Nucleic acids which can selectively hybridise to the said DNA (such as human DNA) include nucleic acids which have >95% sequence identity, preferably those with >98%, more preferably those with >99% sequence identity, for example 100% sequence identity, over at least a portion of the nucleic acid with the said DNA or cDNA. As is well known, human genes usually contain introns such that, for example, a mRNA or cDNA derived from a gene within the said human DNA would not match perfectly along its entire length with the said human DNA but would nevertheless be a nucleic acid capable of selectively hybridizing to the said human DNA. Thus, the invention specifically includes nucleic acids which selectively hybridize to a cDNA but may not hybridise to the corresponding gene, or vice versa. For example, nucleic acids which span the intron-exon boundaries of a given gene may not be able to selectively hybridize to the cDNA of the gene. The nucleic acid may selectively hybridise to the said DNA over substantially the entire length of the nucleic acid, or only a portion of it may selectively hybridise, i.e. the hybridizing portion.
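The identity thresholds above amount to a simple calculation over an aligned portion. The sketch below assumes the two sequences have already been aligned to equal length without gaps (real comparisons would normally use an alignment tool) and treats the thresholds as strict inequalities, following the ">95%" wording. The sequences are hypothetical.

```python
def percent_identity(a, b):
    """Percent identity of two gap-free sequences aligned to equal length."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(x == y for x, y in zip(a.upper(), b.upper()))
    return 100.0 * matches / len(a)


def exceeds_identity(a, b, threshold=95.0):
    """Strict comparison, matching the '>95%' style thresholds in the text."""
    return percent_identity(a, b) > threshold
```

A single mismatch in a 20-base portion gives exactly 95% identity, which does not exceed the >95% threshold; the same portion with no mismatches gives 100%.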

Typically, the one or more nucleic acid molecules that hybridize selectively to a genomic region encompassing the polymorphism are less than 100 bases in length, such as less than 90, 80, 70, 60, 50, 40 or 30 bases. Typically, the hybridising portion is less than 100 bases in length, such as less than 90, 80, 70, 60, 50, 40 or 30 bases. Typically, the hybridising portion may be between 10 and 30 bases in length, such as 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28 or 29 bases in length. The nucleic acid molecule may comprise one or more regions which do not hybridize selectively to said genomic region. Such regions may be useful for distinguishing between different nucleic acid molecules in a population of nucleic acid molecules. For example, the nucleic acid molecules used in a multiplex single base extension reaction to genotype SNPs in Example 2 comprise 5' non-hybridising portions of different numbers of T residues. The nucleic acid molecules are distinguished by virtue of their differing molecular weights, which in turn depend on the number of T residues.
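How a 5' poly-T tail shifts molecular weight can be illustrated with a commonly used approximation for the anhydrous molecular weight of an unmodified DNA oligonucleotide without a 5' phosphate: MW = 313.21 x nA + 304.2 x nT + 289.18 x nC + 329.21 x nG - 61.96 Da. The formula is an outside approximation and the primer sequence is hypothetical; neither is taken from Example 2.

```python
# Residue masses (Da) from a commonly used approximation for unmodified DNA oligos.
RESIDUE_MASS = {"A": 313.21, "C": 289.18, "G": 329.21, "T": 304.2}


def oligo_mw(seq):
    """Approximate anhydrous MW (Da) of a 5'-OH DNA oligo: residue sum - 61.96."""
    return sum(RESIDUE_MASS[base] for base in seq.upper()) - 61.96


def tailed_primer(hybridizing, n_t):
    """Add a 5' tail of n_t T residues to the hybridizing portion."""
    return "T" * n_t + hybridizing
```

Each extra T residue adds about 304.2 Da, so primers sharing a hybridizing portion but carrying different tail lengths differ in weight by multiples of that amount.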

"Nucleic acid that hybridizes selectively" is typically nucleic acid which will amplify DNA from the said region of DNA by any of the well-known amplification systems such as those described in more detail below, in particular the polymerase chain reaction (PCR). Suitable conditions for PCR amplification include amplification in a suitable 1x amplification buffer:

The 10x amplification buffer is 500 mM KCl; 100 mM Tris-Cl (pH 8.3 at room temperature); 15 mM MgCl₂; 0.1% gelatin. A suitable denaturing agent or procedure (such as heating to 95°C) is used in order to separate the strands of double-stranded DNA.

Suitably, the annealing part of the amplification is between 37°C and 60°C, preferably 50°C.
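Preparing the 1x reaction from the 10x stock is a simple C1V1 = C2V2 dilution. The sketch below uses the stock composition given above; the 25 µl reaction volume in the usage note is only an example.

```python
# 10x amplification buffer as given in the text.
TEN_X_BUFFER = {
    "KCl_mM": 500.0,
    "Tris-Cl_mM": 100.0,  # pH 8.3 at room temperature
    "MgCl2_mM": 15.0,
    "gelatin_pct": 0.1,
}


def stock_volume(reaction_ul, stock_x=10.0, final_x=1.0):
    """Volume of stock (µl) giving final_x strength in reaction_ul (C1*V1 = C2*V2)."""
    return reaction_ul * final_x / stock_x


def final_concentrations(stock=TEN_X_BUFFER, stock_x=10.0, final_x=1.0):
    """Concentration of each component once the stock is diluted to final_x."""
    factor = final_x / stock_x
    return {name: value * factor for name, value in stock.items()}
```

For a 25 µl reaction, 2.5 µl of the 10x buffer gives 1x strength, that is 50 mM KCl, 10 mM Tris-Cl, 1.5 mM MgCl₂ and 0.01% gelatin.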

By 'hybridizing selectively to a genomic region encompassing the polymorphism' we include hybridizing at or near the polymorphism. The nucleic acid molecule may hybridize equally to the genomic region irrespective of the identity of the allele, or it may hybridize differentially to a genomic region encompassing one allele of a polymorphic site (e.g. SNP) versus another allele of that polymorphic site (e.g. SNP).

The "genomic region encompassing a polymorphism" can be considered as the polymorphism itself and its upstream and/or downstream flanking nucleotide sequences. The latter can serve to aid in the identification of the precise location of the SNP in the human genome, and serve as target gene segments useful for performing methods of the invention. Primers and probes that selectively hybridize to either or both flanking nucleotide sequences and optionally also the polymorphism, can be designed based on the disclosed gene sequences and information provided herein.
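A minimal sketch of how allele-specific probe sequences might be derived from a SNP and its flanking sequence: it takes a sequence string, a 0-based SNP index and a flank length, and substitutes each allele at the SNP position. The toy sequence is hypothetical, not a real genomic region, and the probes are written on the same strand as the input; a probe meant to hybridize to that strand would be its reverse complement.

```python
def flanking_window(seq, snp_index, flank):
    """Split seq into (left flank, SNP base, right flank) around a 0-based index."""
    if not 0 <= snp_index < len(seq):
        raise IndexError("SNP index outside sequence")
    left = seq[max(0, snp_index - flank):snp_index]
    right = seq[snp_index + 1:snp_index + 1 + flank]
    return left, seq[snp_index], right


def allele_probes(seq, snp_index, alleles, flank=10):
    """One same-strand probe per allele, with that allele at the SNP position."""
    left, _, right = flanking_window(seq, snp_index, flank)
    return {allele: left + allele + right for allele in alleles}
```

With the toy sequence AAAACCCCGTTTTGGGG, a G/A SNP at index 8 and 3 bases of flank on each side, this yields CCCGTTT for the G allele and CCCATTT for the A allele.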

Typically, the sample of nucleic acid which is analysed is one which has been amplified from the immediate sample obtained from the human. Any of the nucleic acid amplification protocols can be used, including the polymerase chain reaction, Qβ replicase and ligase chain reaction. Also, NASBA (nucleic acid sequence based amplification), also called 3SR, can be used as described in Compton (1991) Nature 350, 91-92 and AIDS (1993), Vol 7 (Suppl 2), S108, or SDA (strand displacement amplification) can be used as described in Walker et al (1992) Nucl. Acids Res. 20, 1691-1696. The polymerase chain reaction is particularly preferred because of its simplicity. Thus it will be appreciated that the sample of the nucleic acid of the human may be subjected to a nucleic acid amplification before genotyping or as part of the genotyping method. Typically, the amplification will be directed to the polymorphisms of interest using appropriate primer pairs.

Numerous methods are known in the art for genotyping a polymorphism, and particularly for determining the nucleotide occurrence for a particular SNP in a sample. Such methods can utilize one or more oligonucleotide probes or primers, including, for example, an amplification primer pair that selectively hybridizes to a genomic region encompassing a polymorphism (e.g. SNP). Oligonucleotide probes useful in practicing a method of the invention can include, for example, an oligonucleotide that is complementary to and spans a portion of the genomic region encompassing the SNP, including the position of the SNP, wherein the presence of a specific nucleotide at the position (i.e., the SNP) is detected by differential hybridization of the probe, such as by the presence or absence of selective hybridization of the probe. Such a method can further include contacting the genomic region encompassing the polymorphism and hybridized oligonucleotide with an endonuclease, and detecting the presence or absence of a cleavage product of the probe, depending on whether the nucleotide occurrence at the SNP site is complementary to the corresponding nucleotide of the probe. Ye et al 2002 J Forensic Sci 47:592-600 describe how differential hybridization of a probe depending on the allele of a polymorphism can be determined by melting curve analysis.
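The logic of detecting an allele by the presence or absence of selective hybridization can be sketched with a deliberately simplified model in which a probe counts as hybridizing only if it is a perfect complement of its target site. Real discrimination depends on duplex stability and reaction conditions (for example on melting-curve behaviour, as in Ye et al.), so this illustrates the decision logic rather than the chemistry. The sequences are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def mismatches(probe, target_site):
    """Mismatches between a probe and the perfect complement of its target site."""
    expected = reverse_complement(target_site)
    if len(probe) != len(expected):
        raise ValueError("probe and target site must have the same length")
    return sum(p != e for p, e in zip(probe.upper(), expected))


def alleles_detected(target_site, probes):
    """Alleles whose probe matches perfectly (simplified 'selective hybridization')."""
    return sorted(a for a, p in probes.items() if mismatches(p, target_site) == 0)
```

With probes AAACGGG (G allele) and AAATGGG (A allele), the target site CCCGTTT is called as G, because the A-allele probe carries one mismatch at the SNP position.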

An oligonucleotide ligation assay can also be used to identify a nucleotide occurrence at a polymorphic position. In this assay, a pair of probes selectively hybridize immediately upstream and immediately downstream of the SNP site, and one of the probes has a terminal nucleotide complementary to one nucleotide occurrence of the SNP. Where the terminal nucleotide of the probe is complementary to the nucleotide occurrence, selective hybridization includes the terminal nucleotide such that, in the presence of a ligase, the upstream and downstream oligonucleotides are ligated. The presence or absence of a ligation product is therefore indicative of the nucleotide occurrence at the SNP site.
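The decision logic of the oligonucleotide ligation assay can be written out as below, assuming that Watson-Crick pairing of the probe's terminal nucleotide with the SNP base is the only determinant of ligation; in practice ligase fidelity and reaction conditions also matter. Allele labels and terminal bases are hypothetical.

```python
# Watson-Crick partner of each base.
PAIR = {"A": "T", "T": "A", "C": "G", "G": "C"}


def ligation_occurs(snp_base, probe_terminal_base):
    """Simplified rule: ligation only if the probe's terminal base pairs with the SNP base."""
    return PAIR[snp_base.upper()] == probe_terminal_base.upper()


def ola_alleles(sample_snp_bases, probe_terminals):
    """Allele labels whose discriminating probe yields a ligation product.

    sample_snp_bases: SNP base(s) present in the sample (one for a homozygote,
    two for a heterozygote). probe_terminals: allele label -> terminal base of
    the probe designed for that allele.
    """
    return sorted(
        label for label, terminal in probe_terminals.items()
        if any(ligation_occurs(base, terminal) for base in sample_snp_bases)
    )
```

With terminal bases C (for a G allele) and T (for an A allele), a G homozygote gives a ligation product only for the G probe, whereas a G/A heterozygote gives products for both.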

An oligonucleotide can be useful as a primer, for example, for a primer extension reaction, wherein the product (or absence of a product) of the extension reaction is indicative of the nucleotide occurrence. In addition, a primer pair useful for amplifying a portion of the target polynucleotide including the SNP site can be useful, wherein the amplification product is examined to determine the nucleotide occurrence at the SNP site. Particularly useful methods include those that are readily adaptable to a high-throughput format, to a multiplex format, or to both. The primer extension or amplification product can be detected directly or indirectly and/or can be sequenced using various methods known in the art. Amplification products which span a SNP locus can be sequenced using traditional sequence methodologies (e.g., the "dideoxy-mediated chain termination method," also known as the "Sanger method" (Sanger, F., et al., J. Molec. Biol. 94:441 (1975); Prober et al., Science 238:336-340 (1987)) and the "chemical degradation method," also known as the "Maxam-Gilbert method" (Maxam, A. M., et al., Proc. Natl. Acad. Sci. (U.S.A.) 74:560 (1977)), both references herein incorporated by reference) to determine the nucleotide occurrence at the SNP loci.

Methods of the invention can identify nucleotide occurrences at SNPs using a "microsequencing" method. Microsequencing methods determine the identity of only a single nucleotide at a "predetermined" site. Such methods have particular utility in determining the presence and identity of polymorphisms in a target polynucleotide. Such microsequencing methods, as well as other methods for determining the nucleotide occurrence at a SNP locus are discussed in Boyce-Jacino et al., U.S. Pat. No. 6,294,336, incorporated herein by reference, and summarized herein.

Microsequencing methods include the Genetic Bit Analysis method disclosed by Goelet, P. et al. (WO 92/15712, herein incorporated by reference). Additional primer-guided nucleotide incorporation procedures for assaying polymorphic sites in DNA have also been described (Kornher et al., Nucl. Acids Res. 17:7779-7784 (1989); Sokolov, Nucl. Acids Res. 18:3671 (1990); Syvanen et al., Genomics 8:684-692 (1990); Kuppuswamy et al., Proc. Natl. Acad. Sci. (U.S.A.) 88:1143-1147 (1991); Prezant et al., Hum. Mutat. 1:159-164 (1992); Ugozzoli et al., GATA 9:107-112 (1992); Nyren et al., Anal. Biochem. 208:171-175 (1993); and Wallace, WO 89/10414). These methods differ from the Genetic Bit™ method of analysis in that they all rely on the incorporation of labeled deoxynucleotides to discriminate between bases at a polymorphic site. In such a format, since the signal is proportional to the number of deoxynucleotides incorporated, polymorphisms that occur in runs of the same nucleotide can result in signals that are proportional to the length of the run (Syvanen et al., Amer. J. Hum. Genet. 52:46-59 (1993)). Alternative microsequencing methods have been provided by Mundy (U.S. Pat. No. 4,656,127) and by Cohen, D. et al. (French Patent 2,650,840; PCT Appl. No. WO91/02087), who describe a solution-based method for determining the identity of the nucleotide of a polymorphic site. As in the Mundy method of U.S. Pat. No. 4,656,127, a primer is employed that is complementary to allelic sequences immediately 3' to a polymorphic site.

Boyce-Jacino et al., U.S. Pat. No. 6,294,336 provides a solid phase sequencing method for determining the sequence of nucleic acid molecules (either DNA or RNA) by utilizing a primer that selectively binds a polynucleotide target at a site wherein the SNP is the most 3' nucleotide selectively bound to the target.
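The primer placement shared by these microsequencing formats, in which the primer anneals immediately 3' of the polymorphic site so that the first nucleotide it incorporates lies opposite the SNP, can be sketched as follows. The template is a hypothetical sequence written 5' to 3', and the code models base pairing only, not labelling or detection.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def sbe_primer(template, snp_index, length):
    """Primer annealing to the template immediately 3' of the SNP (template 5'->3')."""
    site = template[snp_index + 1:snp_index + 1 + length]
    if len(site) < length:
        raise ValueError("not enough template sequence 3' of the SNP")
    return reverse_complement(site)


def single_base_extension(template, snp_index, length):
    """Return (primer, added base): the added base is complementary to the SNP base."""
    primer = sbe_primer(template, snp_index, length)
    added = template[snp_index].upper().translate(COMPLEMENT)
    return primer, added
```

For the template AAAACCCCGTTTTGGGG with the SNP at index 8 (G) and a 5-base primer, the primer is CAAAA and the extension adds C; the extended product equals the reverse complement of the template from the SNP onwards.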

In one particular commercial example of a method that can be used to identify a nucleotide occurrence of one or more SNPs, the nucleotide occurrences of SNPs in a sample can be determined using the SNP-IT™ method (Orchid Biosciences, Inc., Princeton, NJ). In general, SNP-IT™ is a 3-step primer extension reaction. In the first step, a target polynucleotide is isolated from a sample by hybridization to a capture primer, which provides a first level of specificity. In the second step, the capture primer is extended by a terminating nucleotide triphosphate at the target SNP site, which provides a second level of specificity. In the third step, the extended nucleotide triphosphate can be detected using a variety of known formats, including direct fluorescence, indirect fluorescence, an indirect colorimetric assay, mass spectrometry, fluorescence polarization, etc. Reactions can be processed in 384-well format in an automated format using a SNPstream™ instrument (Orchid Biosciences, Inc., Princeton, NJ).

It will be appreciated that the methods of the invention may also be carried out on "DNA chips". Such "chips" are described in US 5,445,934 (Affymetrix; probe arrays), WO 96/31622 (Oxford Gene Technology; probe array plus ligase or polymerase extension), and WO 95/22058 (Affymax; fluorescently marked targets bind to oligomer substrate, and location in array detected); all of these are incorporated herein by reference. PCR amplification of small regions (for example up to 300 bp) can be used to detect insertions or deletions of 3-4 bp or more. The amplified sequence may be analysed on a sequencing gel, on which small changes (minimum size 3-4 bp) can be visualised. Suitable primers are designed as herein described.

In one embodiment, the method of genotyping a polymorphism comprises performing a primer extension reaction and detecting the primer extension reaction product. Suitably, the primer extension reaction is a multiplex primer extension reaction. In such a reaction, the primers themselves or the extension products of the different primers are distinguishable from each other. For example, they may be distinguishable by virtue of molecular size (for example as in the ABI Prism® SNaPshot™ Multiplex assay as described below), by the presence of a unique tag in each primer which allows binding to appropriately located complementary nucleic acid molecules on a solid substrate (see Hirschhorn et al 2000 Proc Natl Acad Sci USA 97: 12164-12169), or by virtue of their individualised location on a solid substrate (see Krjutskov et al 2008 Nucleic Acids Res 36: e75).

A suitable method is the ABI Prism® SNaPshot™ Multiplex assay (Applied Biosystems, CA, USA) as used in the Examples. Multiplex PCR is used to amplify the genomic regions encompassing several SNPs in a single PCR. For each PCR product, a primer which hybridises selectively to the PCR product is used in a single base extension (SBE) reaction. Each primer has a 5' non-hybridizing region containing an appropriate number of T residues, such that each SBE reaction product has a different molecular size, allowing unequivocal detection when several SNPs are included in a single (multiplex) SBE reaction. The SBE reaction introduces a dye-labelled ddNTP complementary to the allele of each target SNP; the products are then separated by electrophoresis and the dye is detected using appropriate sensors. Alternative 5' non-hybridizing regions may comprise A residues. Other suitable methods involving a primer extension are as discussed above.
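One way to choose 5' poly-T tail lengths so that every SBE product in a multiplex has a distinct size is sketched below. The minimum spacing of 4 nucleotides and the greedy ordering by hybridizing length are assumptions chosen for illustration; they are not the design rules of the SNaPshot kit or of the Examples.

```python
def assign_tails(hyb_lengths, spacing=4, min_total=None):
    """Choose 5' poly-T tail lengths so total primer lengths differ by >= spacing.

    Primers are placed in order of increasing hybridizing length. The shortest
    total length is min_total (default: the longest hybridizing portion), so no
    tail is negative. Returns tail lengths in the input order.
    """
    if min_total is None:
        min_total = max(hyb_lengths)
    order = sorted(range(len(hyb_lengths)), key=lambda i: hyb_lengths[i])
    tails = [0] * len(hyb_lengths)
    target = min_total
    for i in order:
        target = max(target, hyb_lengths[i])  # never ask for a negative tail
        tails[i] = target - hyb_lengths[i]
        target += spacing  # next primer must be at least `spacing` longer
    return tails
```

For hypothetical hybridizing portions of 20, 22 and 18 bases, this assigns tails of 6, 8 and 4 T residues, giving total primer lengths of 26, 30 and 22 nucleotides.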

A sixth aspect of the invention provides a method of preparing a data carrier containing data on the predicted iris color of a human, the method comprising recording the results of a method carried out according to any of the first to fifth aspects of the invention on a data carrier. The data produced from carrying out the methods of the invention may conveniently be recorded on a data carrier. Thus, the invention includes a method of recording data on the predicted iris color of a human using any of the methods of the invention and recording the results on a data carrier. Typically, the data are recorded in an electronic form and the data carrier may be a computer, a disk drive, a memory stick, a CD or DVD or floppy disk or the like.

Information recorded on the data carrier may include the genotype information obtained using the methods of the invention and/or the prediction of iris color. If a prediction of at least one quantitative color parameter is made, this may include the identity of the quantitative color parameter or parameters, the numerical value for each quantitative color parameter and, optionally, a confidence interval. It may include a visual representation of the color and/or a code associated with the color in a color scheme such as Pantone™. If a categorical prediction is given, the information may include the category of iris color (blue, intermediate, brown or unclassified), the probability of a true positive in that category, the probability level used as the cut-off, and/or the specificity and/or sensitivity of the model for the given probability level. Other identifying information may also be included, such as the date on which and the location from which the nucleic acid sample was obtained.
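The kinds of information listed above might be held in a simple record and serialized to JSON for storage on a data carrier. The field names are hypothetical, and the rule that the most probable category is reported only when its probability reaches the cut-off (otherwise "unclassified") is an assumed illustration of how a cut-off could be applied, not a rule stated in this document.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class IrisColorRecord:
    genotypes: dict        # SNP id -> genotype, e.g. {"rs12913832": "GG"}
    probabilities: dict    # category -> predicted probability
    cutoff: float          # probability level used as the cut-off
    sample_date: str = ""      # date the nucleic acid sample was obtained
    sample_location: str = ""  # location the sample was obtained from
    category: str = field(default="", init=False)

    def __post_init__(self):
        # Assumed rule: report the most probable category only if its
        # probability reaches the cut-off; otherwise report "unclassified".
        best = max(self.probabilities, key=self.probabilities.get)
        self.category = best if self.probabilities[best] >= self.cutoff else "unclassified"

    def to_json(self):
        """Serialize the record, e.g. for writing to a file on a data carrier."""
        return json.dumps(asdict(self), sort_keys=True)
```

With made-up probabilities of 0.95 blue, 0.03 intermediate and 0.02 brown and a cut-off of 0.7, the record reports "blue"; if no category reached 0.7 it would report "unclassified".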

A seventh aspect of the invention provides a method for predicting the iris color of a human based on the allele occurrences in a sample of their DNA of at least the following polymorphisms:

(a) the single nucleotide polymorphism (SNP) rs12913832 or a polymorphic site which is in linkage disequilibrium with rs12913832 at an r² value of at least 0.9; and at least one of the following polymorphisms:

(i) a SNP selected from the group consisting of rs1800407, rs12896399, rs12203592, rs1325127, rs1393350, rs728405, rs1129038, and a polymorphic site which is in linkage disequilibrium with one of said SNPs at an r² value of at least 0.5;

(ii) a polymorphism which is (a) in the region between basepairs 76891593 and 77498447 on chromosome 17 according to NCBI Build 36 and is associated with variation in iris color; and/or is (b) rs9894429 or a polymorphic site which is in linkage disequilibrium with rs9894429 at an r² value of at least 0.5;

(iii) a polymorphism which is (a) in the region between basepairs 37100732 and 37761703 on chromosome 21 according to NCBI Build 36 and is associated with variation in iris color; and/or is (b) rs7277820 or a polymorphic site which is in linkage disequilibrium with rs7277820 at an r² value of at least 0.5; and

(iv) a polymorphism which is (a) in the region between basepairs 233690968 and 234296843 on chromosome 1 according to NCBI Build 36 and is associated with variation in iris color and/or is (b) rs3768056 or a polymorphic site which is in linkage disequilibrium with rs3768056 at an r² value of at least 0.5; and

(v) a polymorphism which is (a) in the region between basepairs 233848903 and 234546690 on chromosome 2 according to NCBI Build 36 and is associated with variation in iris color; and/or is (b) rs2070959 or a polymorphic site which is in linkage disequilibrium with rs2070959 at an r² value of at least 0.5.

The allele occurrences may typically be determined or have been determined by performing steps (a) and (b) of the method of the first aspect of the invention. The prediction of the iris color may then be made using step (c) of the first aspect of the invention. Further polymorphisms may be genotyped and/or age and/or gender may be evaluated to arrive at the prediction.

An eighth aspect of the invention provides a method for predicting the iris color of a human based on the allele occurrences in a sample of their DNA of at least one of the following polymorphisms:

(a) a polymorphism which is (i) in the region between basepairs 76891593 and 77498447 on chromosome 17 according to NCBI Build 36 and is associated with variation in iris color; and/or is (ii) rs9894429 or a polymorphic site which is in linkage disequilibrium with rs9894429 at an r² value of at least 0.5; and

(b) a polymorphism which is (i) in the region between basepairs 37100732 and 37761703 on chromosome 21 according to NCBI Build 36 and is associated with variation in iris color; and/or is (ii) rs7277820 or a polymorphic site which is in linkage disequilibrium with rs7277820 at an r² value of at least 0.5; and

(c) a polymorphism which is (i) in the region between basepairs 233690968 and 234296843 on chromosome 1 according to NCBI Build 36 and is associated with variation in iris color; and/or is (ii) rs3768056 or a polymorphic site which is in linkage disequilibrium with rs3768056 at an r² value of at least 0.5; and

(d) a polymorphism which is (i) in the region between basepairs 233848903 and 234546690 on chromosome 2 according to NCBI Build 36 and is associated with variation in iris color; and/or is (ii) rs2070959 or a polymorphic site which is in linkage disequilibrium with rs2070959 at an r² value of at least 0.5.

The allele occurrences may typically be determined or have been determined by performing steps (a) and (b) of the method of any of the second to fifth aspects of the invention. The prediction of the iris color may then be made using step (c) of the second to fifth aspects of the invention. Further polymorphisms may be genotyped and/or age and/or gender may be evaluated to arrive at the prediction.

A ninth aspect of the invention provides a method for creating a description of a human based on forensic testing, wherein the description includes a prediction of the iris color of the human based on the allele occurrences in a sample of their DNA of at least the following polymorphisms:

(i) a SNP selected from the group consisting of rs1800407, rs12896399, rs12203592, rs1325127, rs1393350, rs728405, rs1129038, and a polymorphic site which is in linkage disequilibrium with one of said SNPs at an r² value of at least 0.5;

(iii) a polymorphism which is (a) in the region between basepairs 37100732 and 37761703 on chromosome 21 according to NCBI Build 36 and is associated with variation in iris color; and/or (b) rs7277820 or a polymorphic site which is in linkage disequilibrium with rs7277820 at an r² value of at least 0.5;

(v) a polymorphism which is (a) in the region between basepairs 233848903 and 234546690 on chromosome 2 according to NCBI Build 36 and is associated with variation in iris color; and/or is (b) rs2070959 or a polymorphic site which is in linkage disequilibrium with rs2070959 at an r² value of at least 0.5.

A tenth aspect of the invention provides a method for creating a description of a human based on forensic testing, wherein the description includes a prediction of the iris color of the human based on the allele occurrences in a sample of their DNA of at least the following polymorphisms:

According to the ninth and tenth aspects, the determination of the allele occurrences and the prediction of iris color may be made as described in relation to the seventh and eighth aspects of the invention respectively. The description may include features in addition to the predicted iris color, such as the age or gender of the human, including features determined using further forensic tests. The age of unidentified corpses and skeletons, and also of living persons, can be evaluated using methods known in the art, as described in relation to the first aspect of the invention. Gender can be determined using genetic tests based on the presence or absence of markers indicative of the Y chromosome, as described in relation to the first aspect of the invention. Such a description of a human, particularly of a wanted person, may be useful in tracing the wanted person. A description of a person to be identified from their remains may be useful in identifying a potential relative of the person. Once a potential relative is identified, the genetic profile of the potential relative and the person's remains can be compared, to determine whether the two are in fact related.

An eleventh aspect of the invention provides a method for genotyping polymorphisms indicative of human iris color comprising:

(a) obtaining a sample of the nucleic acid of a human; and

(b) genotyping the nucleic acid for at least one of the following polymorphisms:

(i) a polymorphism which is (a) in the region between basepairs 76891593 and 77498447 on chromosome 17 according to NCBI Build 36 and is associated with variation in iris color; and/or is (b) rs9894429 or a polymorphic site which is in linkage disequilibrium with rs9894429 at an r² value of at least 0.5; and

(ii) a polymorphism which is (a) in the region between basepairs 37100732 and 37761703 on chromosome 21 according to NCBI Build 36 and is associated with variation in iris color; and/or is (b) rs7277820 or a polymorphic site which is in linkage disequilibrium with rs7277820 at an r² value of at least 0.5; and

(iii) a polymorphism which is (a) in the region between basepairs 233690968 and 234296843 on chromosome 1 according to NCBI Build 36 and is associated with variation in iris color; and/or is (b) rs3768056 or a polymorphic site which is in linkage disequilibrium with rs3768056 at an r² value of at least 0.5; and

(iv) a polymorphism which is (a) in the region between basepairs 233848903 and 234546690 on chromosome 2 according to NCBI Build 36 and is associated with variation in iris color; and/or is (b) rs2070959 or a polymorphic site which is in linkage disequilibrium with rs2070959 at an r² value of at least 0.5.
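Each item above can be satisfied either by position within the recited NCBI Build 36 interval (together with an association with iris color) or by linkage disequilibrium with the named index SNP at r² ≥ 0.5. The following Python sketch illustrates those two checks; it is an illustration only, not part of the claimed method. The names `CLAIMED_REGIONS`, `in_claimed_region`, `ld_r2`, and `qualifies` are hypothetical, the LD measure is the standard r² = D² / (pA(1 − pA) pB(1 − pB)) computed from the haplotype frequency pAB and the allele frequencies pA and pB, and the separate requirement of an association with iris color is not checked.

```python
# Hypothetical helpers illustrating the region and LD criteria in items (i)-(iv).
# Coordinates are the NCBI Build 36 intervals recited above (inclusive bounds).
CLAIMED_REGIONS = {
    "rs9894429": ("17", 76_891_593, 77_498_447),
    "rs7277820": ("21", 37_100_732, 37_761_703),
    "rs3768056": ("1", 233_690_968, 234_296_843),
    "rs2070959": ("2", 233_848_903, 234_546_690),
}
R2_THRESHOLD = 0.5  # minimum LD r^2 with the index SNP


def in_claimed_region(index_snp: str, chrom: str, pos_build36: int) -> bool:
    """True if a site lies in the interval linked to `index_snp`.

    Checks position only; association with iris color must be shown separately.
    """
    region_chrom, start, end = CLAIMED_REGIONS[index_snp]
    return chrom == region_chrom and start <= pos_build36 <= end


def ld_r2(p_ab: float, p_a: float, p_b: float) -> float:
    """r^2 between two biallelic sites.

    p_ab: frequency of the haplotype carrying allele A and allele B.
    p_a, p_b: frequency of allele A at site 1 and allele B at site 2.
    """
    d = p_ab - p_a * p_b  # disequilibrium coefficient D
    return d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))


def qualifies(index_snp, chrom, pos_build36, p_ab, p_a, p_b) -> bool:
    """Either criterion suffices, mirroring the "and/or" wording."""
    return (in_claimed_region(index_snp, chrom, pos_build36)
            or ld_r2(p_ab, p_a, p_b) >= R2_THRESHOLD)


if __name__ == "__main__":
    print(in_claimed_region("rs9894429", "17", 77_000_000))  # True
    print(ld_r2(0.5, 0.5, 0.5))  # 1.0 (complete LD)
```

In practice, r² for candidate sites would typically be taken from a reference panel rather than computed by hand; the function is shown only to make the threshold concrete.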

The genotyping methods are as discussed in relation to the first to fifth aspects of the invention. Additional polymorphisms to those listed above, including some or all of those discussed in relation to the first aspect of the invention, may also be genotyped according to this aspect of the invention.

A twelfth aspect of the invention provides a kit of parts for use in predicting the iris color of a human comprising:

(i) a primer pair suitable for amplifying the genomic region encompassing a polymorphism which is (a) in the region between basepairs 76891593 and 77498447 on chromosome 17 according to NCBI Build 36 and is associated with variation in iris color; and/or is (b) rs9894429 or a polymorphic site which is in linkage disequilibrium with rs9894429 at an r² value of at least 0.5;

a primer pair suitable for amplifying the genomic region encompassing a polymorphism which is (a) in the region between basepairs 37100732 and 37761703 on chromosome 21 according to NCBI Build 36 and is associated with variation in iris color; and/or is (b) rs7277820 or a polymorphic site which is in linkage disequilibrium with rs7277820 at an r² value of at least 0.5;

a primer pair suitable for amplifying the genomic region encompassing a polymorphism which is (a) in the region between basepairs 233690968 and 234296843 on chromosome 1 according to NCBI Build 36 and is associated with variation in iris color; and/or is (b) rs3768056 or a polymorphic site which is in linkage disequilibrium with rs3768056 at an r² value of at least 0.5; or

a primer pair suitable for amplifying the genomic region encompassing a polymorphism which is (a) in the region between basepairs 233848903 and 234546690 on chromosome 2 according to NCBI Build 36 and is associated with variation in iris color; and/or is (b) rs2070959 or a polymorphic site which is in linkage disequilibrium with rs2070959 at an r² value of at least 0.5.

Suitable primer pairs and amplification methods are as discussed in relation to the first aspect of the invention. Suitably, each of the primer pairs is suitable for use together in a multiplex polymerase chain reaction. The kit may be used in conjunction with the genotyping methods discussed in relation to the first aspect of the invention. Suitable primer pairs for amplifying genomic regions encompassing additional polymorphisms to those listed above, including some or all of those discussed in relation to the first to fifth aspects of the invention, may also be included in the kit. The amplified regions may then be genotyped according to the first aspect of the invention.

A thirteenth aspect of the invention provides a kit of parts for use in predicting the iris color of a human comprising:

(i) a nucleic acid molecule that hybridizes selectively to a genomic region encompassing a polymorphism which is (a) in the region between basepairs 76891593 and 77498447 on chromosome 17 according to NCBI Build 36 and is associated with variation in iris color; and/or is (b) rs9894429 or a polymorphic site which is in linkage disequilibrium with rs9894429 at an r² value of at least 0.5;

a nucleic acid molecule that hybridizes selectively to a genomic region encompassing a polymorphism which is (a) in the region between basepairs 37100732 and 37761703 on chromosome 21 according to NCBI Build 36 and is associated with variation in iris color; and/or is (b) rs7277820 or a polymorphic site which is in linkage disequilibrium with rs7277820 at an r² value of at least 0.5;

a nucleic acid molecule that hybridizes selectively to a genomic region encompassing a polymorphism which is (a) in the region between basepairs 233690968 and 234296843 on chromosome 1 according to NCBI Build 36 and is associated with variation in iris color; and/or is (b) rs3768056 or a polymorphic site which is in linkage disequilibrium with rs3768056 at an r² value of at least 0.5;

a nucleic acid molecule that hybridizes selectively to a genomic region encompassing a polymorphism which is (a) in the region between basepairs 233848903 and 234546690 on chromosome 2 according to NCBI Build 36 and is associated with variation in iris color; and/or is (b) rs2070959 or a polymorphic site which is in linkage disequilibrium with rs2070959 at an r² value of at least 0.5.

Suitable nucleic acid molecules and methods of using them to genotype polymorphisms are as discussed in relation to the first aspect of the invention. Suitably, each of the nucleic acid molecules is a primer suitable for performing a primer extension reaction, suitably in a multiplex reaction. The kit may be provided or used in conjunction with the kit of the twelfth aspect of the invention. Suitable nucleic acid molecules that hybridize selectively to additional genomic regions encompassing a polymorphism, including some or all of those discussed in relation to the first to fifth aspects of the invention, may also be included in the kit.

A fourteenth aspect of the invention provides a solid substrate for use in predicting the iris color of a human, the solid substrate having attached thereto:

a nucleic acid molecule that hybridizes selectively to a genomic region encompassing a polymorphism which is (a) in the region between basepairs 233848903 and 234546690 on chromosome 2 according to NCBI Build 36 and is associated with variation in iris color; and/or is (b) rs2070959 or a polymorphic site which is in linkage disequilibrium with rs2070959 at an r² value of at least 0.5.

The solid substrate with the nucleic acids attached thereto may be a DNA chip or a microarray. Typically, each array position on the DNA chip or microarray is attached to a nucleic acid molecule having a different sequence. Suitable chips and microarrays are as described above in relation to the first aspect of the invention. Suitably, each of the nucleic acid molecules is a primer suitable for performing a primer extension reaction.

In one embodiment, the solid substrate has attached thereto only the nucleic acid molecules that hybridise as described above.

The solid substrate may be used in conjunction with the kit of the twelfth aspect of the invention. Suitable nucleic acid molecules that hybridize selectively to additional genomic regions encompassing a polymorphism, including some or all of those discussed in relation to the first aspect of the invention, may also be attached to the solid substrate.

The present invention will be further illustrated in the following examples, without any limitation thereto.

Example 1: Digital quantification of human eye color highlights genetic association of three new loci

Summary

Previous studies have successfully identified genetic variants in several genes associated with human iris (eye) color; however, they all used simplified categorical trait information. Here, we quantified continuous eye color variation into hue and saturation values using high-resolution digital full-eye photographs and conducted a genome-wide association study on 5,951 Dutch Europeans from the Rotterdam Study. Three new regions, 1q42.3, 17q25.3, and 21q22.13, met the criterion for genome-wide significant association. The latter two loci were replicated in 2,261 individuals from the UK and in 1,282 from Australia. The LYST gene at 1q42.3 and the DSCR9 gene at 21q22.13 serve as suitable embodiments. A model for predicting quantitative eye color explained over 50% of trait variance in the Rotterdam Study. Overall, our data exemplify that fine phenotyping is a useful strategy for finding genes involved in human complex traits.

We quantified human eye color as hue and saturation values from high-resolution digital full-eye photographs of several thousand Dutch Europeans. This quantitative approach, which is extremely cost-effective, portable, and time-efficient, revealed that human eye color varies along more dimensions than the one represented by the blue-green-brown categories studied previously. Our work represents the first genome-wide study of quantitative human eye color. We clearly identified three new loci contributing to natural and subtle eye color variation along multiple dimensions: LYST, 17q25.3, and TTC3/DSCR9. The latter two loci were replicated in two independent cohorts. Our quantitative prediction model explained over 50% of the trait variance, representing the highest accuracy achieved so far in genomic prediction of human complex and quantitative traits, with relevance for future forensic applications.

INTRODUCTION

The iris functions as the diaphragm of the eye, controlling the amount of light reaching the retina. The type and amount of pigments in the iris determine eye color [1,2]. Eye color shows a high degree of variation in people of European ancestry and correlates with latitude within the European continent, which may be explained by a combination of natural and sexual selection [3,4]. The inheritance of eye color is not strictly Mendelian, although blue iris color largely follows a recessive pattern [1]. Genome-wide association studies in people of European descent [5,6,7,8] have confirmed eye color as a polygenic trait, with the HERC2/OCA2 genes explaining most of the blue and brown eye color inheritance, whereas other genes such as SLC24A4, TYR, TYRP1, SLC45A2, and IRF4 contribute additionally to eye color variation, albeit with minor effects [9]. These findings increased our understanding of the genetic basis of human pigmentation and drew attention to their potential applications, such as in forensic sciences [10].

However, all previous genetic association studies on human eye color were based on categorical trait information, most often a three-point scale of blue, green-hazel or intermediate, and brown eye color [5,6,7,11,12], whereas in reality iris colour is known to exist on a continuum from the lightest shades of blue to the darkest of brown or black [13]. Categorizing a continuous trait is expected to oversimplify its quantitative nature. Therefore, additional genes contributing to human iris coloration may be identifiable if the full quantitative spectrum of eye coloration could be exploited. To this end, we digitally quantified continuous eye colors into hue and saturation values from high-resolution, full-eye photographs and conducted a genome-wide association study in 5,951 Dutch Europeans from the Rotterdam Study genotyped for 550,000 to 610,000 single nucleotide polymorphisms (SNPs). Genetic variants with genome-wide significant eye color association were further tested in replication samples of 2,261 participants of the TwinsUK Study and 1,282 participants of the Brisbane Twin Nevus Study (BTNS), Australia. Finally, we evaluated the predictive value of an updated list of informative SNPs, including interacting ones, for quantitative eye color, which is of relevance in forensic applications.
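A minimal sketch of turning iris pixels into hue and saturation values, using Python's standard `colorsys` module. The function name `iris_hue_saturation`, the assumption that the iris has already been segmented, the choice to average RGB before conversion, and the scaling of hue to degrees are all assumptions for illustration; the extraction procedure used in the study is not described in this section.

```python
import colorsys
from statistics import fmean


def iris_hue_saturation(pixels):
    """Hue (degrees, 0-360) and saturation (0-1) of the mean iris color.

    `pixels` is a hypothetical list of 8-bit (R, G, B) tuples already
    restricted to iris tissue (pupil, eyelids, and reflections removed).
    RGB is averaged before conversion, an assumption of this sketch that
    avoids averaging hue directly (hue is a circular quantity).
    """
    pixels = list(pixels)
    if not pixels:
        raise ValueError("no iris pixels supplied")
    # Mean of each channel, rescaled to the 0-1 range expected by colorsys.
    mean_rgb = [fmean(p[channel] for p in pixels) / 255.0 for channel in range(3)]
    h, s, _v = colorsys.rgb_to_hsv(*mean_rgb)
    return h * 360.0, s


if __name__ == "__main__":
    blue_iris = [(70, 110, 160), (60, 100, 150), (80, 120, 170)]
    brown_iris = [(110, 70, 40), (100, 60, 30), (120, 80, 50)]
    print(iris_hue_saturation(blue_iris))   # hue in the blue range
    print(iris_hue_saturation(brown_iris))  # hue in the orange-brown range
```

Under these assumptions, brown irises map to low hue angles and blue irises to hue angles around 200 to 240 degrees, which illustrates how a single hue axis can separate the two ends of the color spectrum.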

RESULTS

Quantitative eye color phenotyping

The discovery sample set included participants of three Rotterdam Study (RS) cohorts (RS1, RS2, and RS3), with a total of 5,951 Dutch European individuals after quality control of genetic and phenotypic data (Table 9). Digitally extracted iris (eye) color was quantified in two interval dimensions, hue (H) and saturation (S). H measures variation in the color spectrum, whereas S measures variation in color purity or intensity. Thus, H and S may serve as representations of the type and the amount of iris pigments. We noticed a high correlation between H and S (r = -0.77), which may have a biological explanation. Eyes classified into three color categories ("blue", "brown", and "intermediate") by an ophthalmologist during eye examination largely clustered in distinct areas of the HS color space, but with considerable overlap (Figure 1a-c). The same is true for the five color categories graded by reviewing the digital photographs used for eye color quantification in this report (Figure 1d-h). Overlap between clusters is expected given the quantitative nature of iris coloration and variation in color perception. Principal component analysis on z-transformed H and S values revealed two components, CHS1 and CHS2, that accounted for 88.75% and 11.25% of the total quantitative eye color variance (Figure 1i-j). Among the four quantitative measurements (H, S, CHS1, and CHS2), CHS1 showed the highest correlation with the three-level ordinal variable blue-intermediate-brown.
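For two z-transformed variables, the principal component analysis is fully determined by their correlation r: the correlation matrix [[1, r], [r, 1]] has eigenvalues 1 + |r| and 1 − |r|, so the two components explain (1 + |r|)/2 and (1 − |r|)/2 of the variance. The reported 88.75% / 11.25% split therefore corresponds to |r| ≈ 0.775, close to the reported r = -0.77. The NumPy sketch below (function name hypothetical, data simulated) shows the computation. Eigenvector signs are arbitrary, so the orientation of CHS1 and CHS2 would have to be fixed by a convention that this section does not state.

```python
import numpy as np


def pca_z_hs(hue, saturation):
    """PCA of z-transformed hue and saturation.

    Returns (proportion of variance per component, loadings, component scores),
    with components ordered from largest to smallest variance.
    """
    x = np.column_stack([hue, saturation]).astype(float)
    z = (x - x.mean(axis=0)) / x.std(axis=0, ddof=1)  # z-transform each trait
    corr = np.cov(z, rowvar=False)  # covariance of z-scores = correlation matrix
    eigvals, eigvecs = np.linalg.eigh(corr)  # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1]  # reorder to largest first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    return eigvals / eigvals.sum(), eigvecs, z @ eigvecs


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    h = rng.normal(size=1000)
    s = -0.8 * h + 0.6 * rng.normal(size=1000)  # simulated negative H-S correlation
    props, loadings, scores = pca_z_hs(h, s)
    r = np.corrcoef(h, s)[0, 1]
    print(props, (1 + abs(r)) / 2)  # first proportion equals (1 + |r|) / 2
```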

Genome-wide Association Studies (GWAS)

GWAS in the three independent RS cohorts, as well as in the merged dataset (RS123), were carried out for six eye color traits: i) H, ii) S, iii) CHS1, iv) CHS2, v) 3-category color classification ("blue", "brown", and "intermediate"), and vi) 5-category color classification ("pure blue", "light blue/grey", "green/mixed with brown spots", "light brown", and "dark brown"). Genetic outliers of non-European ancestry were excluded (Figure 2). No institutional heterogeneity between the three cohorts and no residual population sub-stratification was noticed after merging the genotype data. Inflation factors for all color traits ranged from 1.02 to 1.03 after adjusting for population sub-stratification. The initial scan of the merged RS123 samples for all color traits revealed a sharp deviation of the observed P values from those expected under the null hypothesis (Figure 3), mainly due to a very strong effect of the HERC2 and OCA2 genes on chromosome 15q13.1 (Table 13). SNPs in HERC2 showed the most significant effect on all color traits (rs12913832, P < 10^-300) except CHS2 (P = 0.60), confirming previous findings on categorical eye color information [6,7,11,14]. In the subsequent scan adjusted for the effect of HERC2 rs12913832, five other genes known to be involved in eye color (OCA2, SLC24A4, TYR, TYRP1, and SLC45A2) [5,8] showed genome-wide significant eye color association (P < 5×10^-8), and the effect of IRF4 [8] was confirmed at a somewhat lower significance level (P = 1.4×10^-6). We did not observe a significant effect of ASIP on eye color, which agrees with our earlier study on categorized eye color [9] and is in line with previous findings suggesting that ASIP may be more involved in skin pigmentation [5,15]. Notably, SNPs in the previously known eye color genes TYRP1, TYR, and SLC24A4 showed more significant association with quantitative than with categorical eye color.
In the subsequent GWAS adjusted for the effects of all seven known genes, the P values derived for CHS1, H, and S still deviated significantly from the expected ones (Figure 3). The tail of the deviation was mainly explained by 10 SNPs at three new loci, 1q42.3, 17q25.3, and 21q22.13 (Table 10). The association of the three new loci met the genome-wide significance criterion of P < 5×10^-8. The allelic effects of the 10 SNPs were consistent across the three independent RS cohorts and were nominally significant (Table 10). No further SNPs were clearly associated with any eye color trait at the genome-wide significance level in an additional scan adjusted for all previously known genes as well as the three new loci.
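The inflation factors of 1.02 to 1.03 quoted above follow the usual genomic-control definition: the median observed 1-df chi-square statistic divided by its expected median under the null (about 0.455). A minimal standard-library sketch, assuming two-sided P values from 1-df association tests; the function names are illustrative, and the code does not reproduce the study's own adjustment for population sub-stratification, which this section does not describe.

```python
from statistics import NormalDist, median

_STD_NORMAL = NormalDist()  # standard normal, mean 0 and sd 1

# Median of the chi-square distribution with 1 degree of freedom (about 0.4549):
# if Z ~ N(0, 1), then median(Z^2) = (Phi^-1(0.75))^2.
EXPECTED_MEDIAN_CHI2_1DF = _STD_NORMAL.inv_cdf(0.75) ** 2


def p_to_chi2_1df(p: float) -> float:
    """1-df chi-square statistic for a two-sided P value in (0, 1]."""
    # Using the lower tail, inv_cdf(p / 2), stays accurate for tiny P values
    # such as 1e-300, where 1 - p / 2 would round to exactly 1.0.
    z = _STD_NORMAL.inv_cdf(p / 2.0)
    return z * z


def genomic_inflation_factor(p_values) -> float:
    """lambda_GC = median observed chi-square / expected null median."""
    return median(p_to_chi2_1df(p) for p in p_values) / EXPECTED_MEDIAN_CHI2_1DF


if __name__ == "__main__":
    n = 999
    null_p = [(i + 0.5) / n for i in range(n)]  # evenly spread null P values
    print(genomic_inflation_factor(null_p))                   # about 1.0
    print(genomic_inflation_factor([p / 2 for p in null_p]))  # inflated, > 1
```

A value close to 1, as reported here, indicates that test statistics are not broadly inflated by residual stratification, so the strong deviations in the tail of Figure 3 are attributable to a limited number of associated loci.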

At the 1q42.3 locus, two SNPs, rs3768056 and rs9782955, were associated with S at the genome-wide significance level (5.5×10^-9 < P < 7.8×10^-9) (Table 10, Figure 4a and Figure 5a). Both SNPs are located in introns of the lysosomal trafficking regulator (LYST) gene. Note that SNPs at this locus were associated with S but not with H or categorical colors, which distinguishes this locus from the other two new loci. Three SNPs at 17q25.3 were associated with multiple color traits at the genome-wide significance level, and the association with CHS1 was the most significant (5.9×10^-11 < P < 7.2×10^-9) (Table 10, Figure 4b and Figure 5b). The SNP rs7219915 is intronic and rs9894429 exonic in the nuclear protein localization 4 homolog (NPLOC4) gene, and rs12452184 is intronic in the hepatocyte growth factor-regulated tyrosine kinase substrate (HGS) gene. There are multiple small genes in the 17q25.3 region (Figure 4b and Figure 5b). Five SNPs at 21q22.13 were significantly associated with CHS1 (5.0×10^-9 < P < 3.1×10^-8) (Table 10, Figure 4c and Figure 5c). Four SNPs, rs1003719, rs2252893, rs2835621, and rs2835630, are intronic in the tetratricopeptide repeat domain 3 (TTC3) gene, and one, rs7277820, is in the flanking 5' UTR region of the Down Syndrome Critical Region 9 (DSCR9) gene. The TTC3 and DSCR9 genes are in the same LD block (Figure 4c and Figure 5c).

On chromosome 2q37, SNPs rs2070959, rs1105879, rs892839, and rs10209564 were associated with CHS2 at borderline genome-wide significance (10^-7 < P < 10^-6) (Table 10, Figure 4d and Figure 5d). The first two SNPs are in the coding region of the UDP glycosyltransferase 1 family (UGT1A) gene.

Replication analyses in TwinsUK and BTNS

Eye color data from the TwinsUK cohort were extracted from digital portrait photographs with limited iris resolution. Because these photographs were taken under some variation in daylight and exposure conditions, the trait variance was larger than in RS (H = 19.22 ± 18.44; S = 0.47 ± 0.19; Table 9). This, in combination with the smaller sample size, resulted in less significant eye color association for the previously known eye color SNPs, such as HERC2 rs12913832 (RS123: P < 1×10^-300; TwinsUK: P = 1.4×10^-88), SLC24A4 rs12896399 (RS123: P = 2.0×10^-23; TwinsUK: P = 2.1×10^-3), TYR rs1393350 (RS123: P = 1.0×10^-9; TwinsUK: P = 3.9×10^-2), and TYRP1 rs1325127 (RS123: P = 4.0×10^-11; TwinsUK: P > 0.05). Despite the considerable loss of statistical power, two of the three newly identified regions were replicated with significant eye color association in the TwinsUK data. The SNPs at the chromosome 21q22.13 locus were replicated with consistent allelic effects (P for CHS1 and H < 0.01; Table 10). The SNPs at 17q25.3 were associated with S and CHS2 (P < 0.02) but not significantly with CHS1 (Table 10), which had shown the most significant association in the RS cohort (P = 5.9×10^-11). The chromosome 1q42.3 region was not significantly associated with any eye color trait in the TwinsUK data.

Participants of the BTNS cohort were on average much younger (17.19 ± 4.56 years) than those of the other two cohorts (over 50 years) and had more intermediate-colored eyes than RS participants (Table 9). The eye photographs from BTNS had similar resolution and size to those from RS; however, unlike those from RS and like those from TwinsUK, they were taken under some variation in daylight and exposure conditions, and the effective sample size was the smallest among the three studies. P values derived from BTNS for the association between eye color and previously known eye color SNPs were intermediate between those derived from RS and TwinsUK (e.g., P for rs12913832 = 1.26×10^-200). The newly identified SNPs at 17q25.3 (P for CHS1 < 0.05) and 21q22.13 (P for CHS1 and S < 0.05) showed significant association with eye color, and the betas were consistent with those derived from RS (Table 10). The chromosome 1q42.3 region was not significantly associated with any eye color trait in the BTNS data.

Eye color prediction

We identified 17 predictors that significantly explained the trait variance: age and sex, 11 SNPs from nine genes, and four SNP pairs with significant interaction effects. For details of the interaction analysis, see below. The 17 predictors together explained 48.87% of the H variance and 56.30% of the S variance in the Rotterdam Study (Table 11). Adding a further SNP in the UGT1A gene (18 predictors) increased the percentage of variance explained further. Most predictors had significant effects on both H and S. The exceptions were rs3768056 in LYST and the interaction between HERC2 rs12913832 and SLC24A4 rs12896399, which were significant only for S, as well as IRF4 rs12203592, OCA2 rs728405, and the interaction between HERC2 rs12913832 and OCA2 rs728405, which were significant only for H. The main effect of SLC45A2 rs16891982 was no longer significant once its interaction with rs1800407 was taken into account. As expected, HERC2 rs12913832 showed the strongest predictive power, explaining 44.50% of the H and 48.31% of the S variance on its own. Surprisingly, age was the second strongest predictor of quantitative eye color: increasing age was associated with increased H (ΔR² = 1.17%, P = 8.2×10^-29) and decreased S (ΔR² = 5.03%, P = 1.4×10^-131). The three newly identified loci together explained 0.53% and 0.73%, and the identified SNP-SNP interactions 0.75% and 0.72%, of the H and S variance, respectively. Gender showed a small but statistically significant (P < 0.04) effect on H (ΔR² = 0.04%) and S (ΔR² = 0.09%). After adjusting for the effects of the 17 predictors, the summary variance explained by the remaining SNPs was negligible (ΔR² < 0.01%). These 17 predictors explained 56.2% of S and 11.1% of H variance in BTNS, and 28.5% of S and 4.1% of H variance in TwinsUK.
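The ΔR² values above are the increase in explained variance when a predictor is added to a model that already contains the others. A minimal NumPy sketch, assuming ordinary least-squares regression on additively coded genotypes (0, 1, or 2 copies of an allele) with SNP-SNP interactions entered as products of allele counts; the coding, the function names, and the simulated data are assumptions, since this section does not specify the model form.

```python
import numpy as np


def r_squared(y, X):
    """In-sample R^2 of an ordinary least-squares fit of y on X plus an intercept."""
    y = np.asarray(y, dtype=float)
    X1 = np.column_stack([np.ones(len(y)), X])  # prepend intercept column
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    resid = y - X1 @ beta
    return 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)


def delta_r_squared(y, X_base, X_added):
    """Gain in R^2 from adding X_added to a model that already contains X_base."""
    return r_squared(y, np.column_stack([X_base, X_added])) - r_squared(y, X_base)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n = 2000
    snp_a = rng.integers(0, 3, n)  # simulated allele counts 0/1/2
    snp_b = rng.integers(0, 3, n)
    age = rng.normal(60, 10, n)
    y = 0.8 * snp_a + 0.2 * snp_a * snp_b - 0.01 * age + rng.normal(0, 1, n)
    base = np.column_stack([snp_a, snp_b, age])
    print(delta_r_squared(y, base, snp_a * snp_b))  # gain from the interaction term
```

Because in-sample R² can only rise when a column is added, a ΔR² on its own does not establish significance; the P values reported alongside each ΔR² serve that purpose.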

We also used the 17 predictors (i.e., excluding UGT1A) for 3- or 5-category eye color prediction based on a multinomial logistic regression model. Prediction accuracy was measured by the area under the receiver operating characteristic curve (AUC). The accuracy in predicting 3-category eye color was 0.92 for blue, 0.74 for intermediate, and 0.93 for brown, a slight but statistically significant (P = 2.7×10^-4) improvement over our previous attempt using 15 SNPs from 8 genes (AUC 0.91 for blue, 0.73 for intermediate, and 0.93 for brown) [9]. Excluding age and gender from the prediction model had no major impact on the accuracy of categorical prediction (ΔAUC < 0.01 for any color category). Predicting 5-category eye color was generally less accurate per category than 3-category prediction (AUC 0.72 for pure blue, 0.82 for light blue/grey, 0.66 for green/mixed, 0.93 for light brown, and 0.89 for dark brown).
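The per-category AUC values above can be computed one category at a time: take each person's predicted probability for that category from the multinomial model, and compare people who have the category with those who do not. This sketch uses the Mann-Whitney formulation of the AUC in plain Python; the function name and example probabilities are hypothetical, and this section does not state which software the authors used. The pairwise loop is O(n·m) and is written for clarity; a rank-based version scales better to cohort-sized data.

```python
def auc_one_vs_rest(scores, labels, positive):
    """AUC for one color category (one-vs-rest).

    scores: predicted probability of `positive` for each person.
    labels: observed color category for each person.
    Returns the probability that a randomly chosen person with the category
    scores higher than a randomly chosen person without it (ties count 1/2),
    i.e. the Mann-Whitney formulation of the area under the ROC curve.
    """
    pos = [s for s, y in zip(scores, labels) if y == positive]
    neg = [s for s, y in zip(scores, labels) if y != positive]
    if not pos or not neg:
        raise ValueError("need at least one case and one non-case")
    wins = 0.0
    for sp in pos:
        for sn in neg:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Hypothetical predicted P(blue) from a multinomial model for six people.
    p_blue = [0.91, 0.75, 0.40, 0.55, 0.10, 0.05]
    observed = ["blue", "blue", "blue", "intermediate", "brown", "brown"]
    print(auc_one_vs_rest(p_blue, observed, "blue"))  # 8 of 9 pairs ranked correctly
```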

DISCUSSION

Using digitally quantified continuous eye color information extracted from high-resolution full-eye pictures, we were able to increase the power to detect genetic associations, as evidenced by SNPs in some known eye color genes showing more significant association with quantitative than with categorical eye color. The gain in power allowed us to identify three new loci, which add substantially to the previously available list of seven genes and provide additional insights into the genetic origins of human pigmentation. Fine-resolution phenotyping may therefore serve as an important alternative to simply increasing sample size, which is the main trend of current GWA studies in humans, for finding genes involved in complex traits.

All SNPs found so far to be associated with eye color at 1q42.3 are located in the LYST gene. Mutations in the LYST gene are involved in Chediak-Higashi and exfoliation syndromes, which are characterized by iris pigmentation dispersion, transillumination, and other defects [16,17]. Studies in mice showed that LYST mutations reproduced the iris defects of human exfoliation syndrome [18]. Furthermore, a study of coat colour in cattle showed that LYST may influence the intensity of pigment within coat colour categories (e.g., dark grey to light grey) but does not change the color type (e.g., grey to red or black) [19]. These authors suggested that allelic variation in this gene, possibly not associated with illness, could underlie the different shades of colour observed in the partially diluted colour. Our results in the Rotterdam Study are in full agreement with their conclusion. Also, the LYST gene was identified in two studies that found evidence of positive selection when comparing continental populations that strongly differ in pigmentation phenotypes [20,21]. This provides additional evidence that the gene is involved in human pigmentation traits [2]. Notably, the SNPs in the LYST gene were associated at genome-wide significance with saturation only and were not even nominally significant for hue. This finding underlines the relevance of our approach of analyzing the H and S dimensions separately, as they are likely to have independent biological bases. The failure to replicate the 1q42.3/LYST findings in the TwinsUK and BTNS studies may be explained by a combination of factors: the smaller sample sizes, the smallest genetic effect size among the three new loci, and, at least for TwinsUK, limitations of the photographs.
However, given the abundant evidence from animal studies and human evolutionary studies, together with our findings from the Rotterdam Study, we suggest that LYST is likely to influence subtle variation in the amount of pigmentation that requires high-precision measurements to be detected.

The replicated significant association at the chromosome 17q25.3 locus was detected for SNPs located in the NPLOC4 and HGS genes. There are, however, multiple small genes in this region, including ACTG1, FSCN2, C17orf70, NPLOC4, TSPAN10, PDE6G, LOC339229, ARL16, HGS, MRPL12, and SLC25A10. At present it is difficult to assign the association signal clearly to a single functional unit. Based on current knowledge, PDE6G may be the best candidate gene for the observed association signal. Mutations in PDE6G cause autosomal recessive retinitis pigmentosa [22], which typically involves dysfunction of the retinal pigment epithelium.

The chromosome 21q22.13 locus, which we identified with replicated significant eye color association, contains several genes, including Down Syndrome Critical Region 3 (DSCR3), 6 (DSCR6), and 9 (DSCR9), tetratricopeptide repeat domain 3 (TTC3), and phosphatidylinositol glycan anchor biosynthesis (PIGP). The SNPs significantly associated with eye color were in the TTC3 and DSCR9 genes, which lie in the same region of high linkage disequilibrium. Trisomy of the 21q22 chromosomal region is known to lead to Down syndrome, in which so-called Brushfield spots are often observed [23]. Brushfield spots are small white or grayish/brown spots on the periphery of the human iris caused by aggregation of connective tissue, a normal iris element. These spots occur in healthy children but are observed much more frequently (up to 78%) in newborns with Down syndrome [24]. They are also much more likely to occur in patients of European origin, among whom eye color varies, than in patients of Asian ancestry with homogeneous brown eyes [25]. Further, the DSCR9 gene, which encodes proteins of unknown function, was found to be a new gene that arose in the primate lineage during evolution and is exclusive to primate genomes [26]. We therefore hypothesize that genetic variants in DSCR9 or nearby genes may influence the aggregation of connective tissue in the normal iris, resulting in different iris color appearance, and that extreme forms of variation, e.g., via trisomy, lead to Down syndrome. It has been suggested that the development of the iris and brain are linked, speculatively via genetic pathways that may also involve pigment production [27]. Our findings support this idea but suggest that the link at 21q22.13 might be more related to tissue development than to pigmentation.

There remained several residual signals across the genome at borderline genome-wide significance for association with eye color.
Such signals may represent false positive results or genes with true but small effects that require larger samples to detect unambiguous associations. Most notable is the association identified at 2q37; this region includes the UGT1A gene encoding a UDP-glucuronosyltransferase, an enzyme of the glucuronidation pathway that transforms bilirubin into water-soluble metabolites. Variants in this gene influence bilirubin plasma levels [28] and cause Gilbert's syndrome [29,30,31], which is the most common syndrome in humans characterized by mild, harmless jaundice, i.e., a yellowish discoloration of the skin. Interestingly, SNPs in the UGT1A gene were most significantly associated with CHS2, a dimension that is uncorrelated with the blue-brown variation represented by CHS1, suggesting that CHS2 may represent variation in yellowish pigments.

The HERC2/OCA2 genes showed some "masking" effects over the SLC24A4, SLC45A2, and IRF4 genes, and including these interactions significantly improved the prediction accuracy. However, it remains uncertain whether these interactions are truly genetic or confounded by other factors. For example, a high melanin concentration in the frontal iris epithelia may prevent color variation in the inner layers from being measured, which may lead to statistically significant interactions. Still, not all genes showed interaction with HERC2/OCA2, and some of the interactions were specific to the H or S dimension. These findings are of interest for further functional studies.

Our prediction model explained 49-56% of the trait variance in the Rotterdam Study. To our knowledge, these values represent the highest accuracy achieved so far in genomic prediction of human complex and quantitative traits [32]. We used non-overlapping samples for building and evaluating the prediction model, which may lead to slightly conservative R² estimates compared with methods based on cross-validation. Also note that these R² estimates are not equivalent to those from linkage-based studies or logistic models. The fact that the 17 identified predictors explain less trait variance in TwinsUK may be explained by quality limitations of the available photographs. In both TwinsUK and BTNS, the variance explained for H was much lower than that for S, most likely because light conditions were not standardized when the photographs were taken in these two replication cohorts. Given that the newly identified genetic variants together explained less than 2% of the summary trait variance, we do not expect additional, as yet unknown genetic variants to account for a substantial portion of the unexplained variance. The color of the eye as perceived from the outside was the main outcome of this study, whereas pigmentation genes by definition have a more direct effect on melanin content.
However, it is so far unclear whether probing deeper into endophenotypes, e.g., directly measuring melanin content using biochemical methods, would reduce the unexplained variance, as we have also shown that there are regions putatively associated with eye colour but not clearly involved in the melanin pathways. Using the 17 predictors for 3-category color prediction slightly improved the accuracy compared with our previous attempt using 15 SNPs from 8 genes. The 5-category model had little power to differentiate "pure blue" from "light blue/grey" and "dark brown" from "light brown", distinctions that are more likely to be consequences of differences in tissue structure than in chemical composition [1]. The proposed quantitative prediction model may be helpful as an investigative tool in forensic applications, i.e., to better trace unknown suspects in cases where conventional DNA profiles from crime scene samples do not match those of known suspects, including those already in criminal DNA databases [10]. A verbal statement on categorical eye color is prone to subjective interpretation and is expected to be pictured differently by different people; our quantitative prediction approach instead yields a more precisely defined eye color outcome that could be used in forensic practice via standardized color charts or computer-based color prints. Therefore, quantitative eye color prediction is expected to enhance the success rate of tracing unknown individuals by eye color in forensic applications compared with simple categorical prediction. Our data also demonstrate that eye color saturation declines substantially in elderly people, further emphasizing the gain in power from using a quantitative approach. Age was significant in each of the three RS cohorts as well as in the UK and Australian replication cohorts.
Thus, its effect on eye color is unlikely to reflect sample composition, and we speculate that it may share some biological pathways with the graying of hair. Ongoing studies aiming to identify biomarkers for age prediction may further improve eye color prediction accuracy.

Using the example of eye color we have demonstrated that employing quantitative phenotype information about a complex trait in GWA analysis allows detection of new genetic variants. The three new regions and genetic interactions identified here as being involved in human quantitative eye color variation may serve as guides for future studies exploring the functional basis of human pigmentation. Finally, our findings are relevant for predicting eye color in applied areas of science such as in forensics.

METHODS

Rotterdam Study

The Rotterdam Study (RS) is a population-based prospective study comprising a main cohort and 2 extensions. RS1 [33] has been ongoing since 1990 and included 7,983 participants living in Rotterdam, The Netherlands. RS2 [34] is an extension of the cohort that started in 1999 and included 3,011 participants. RS3 [35] is a further extension of the cohort that started in 2006 and included 3,932 participants. All participants were examined in detail at baseline. Collection and purification of DNA have been described in detail previously [7]. Each eye was examined with a slit lamp by an ophthalmological medical researcher, and iris color was graded against standard images showing various degrees of iris pigmentation. Three categories of iris color (blue, intermediate, and brown) were distinguished based on the predominant color and the amount of yellow or brown pigment present in the iris. Additionally, digital full-eye photographs of the anterior segment were obtained with a Sony HAD 3CCD color video camera with a resolution of 800×600 pixels for each of the three color channels (Sony Electronics Inc., New York, NY) mounted on a Topcon TRC-50EX fundus camera (Topcon Corporation, Tokyo, Japan) after pharmacologic mydriasis (tropicamide 0.5% and phenylephrine 5%). Pharmacologic mydriasis (dilation of the pupil) was employed because the initial target of these photographs was the retina. Dilation reduces the area of visible iris tissue, so these images were not originally optimized for iris color examination. However, this procedure had little influence on the precision of the color measurements, given the large number of pixels in the iris region. Two independent researchers additionally reviewed these images on a monitor with standard settings and graded eye color into five categories: "pure blue", "light blue/grey", "green/mixed with brown spots", "light brown", and "dark brown".
The Medical Ethics Committee of the Erasmus Medical Center approved the study protocol, and all participants provided written informed consent. The current study included a total of 5951 RS participants who had both genotype information and eye photographs.

TwinsUK

The TwinsUK cohort is a volunteer cohort of 10,000 same-sex monozygotic and dizygotic twins recruited from the general population (http://www.twinsUK.ac.uk). They have been extensively phenotyped, and gradeable portrait images (digitized from Polaroid photographs and digital photographs), with GWAS information, were available for 2,261 subjects. The study was reviewed by the St Thomas' Hospital Local Research Ethics Committee, and subjects were included after fully informed consent.

BTNS Australia

Adolescent twins, their siblings and parents have been recruited over sixteen years into an ongoing study of genetic and environmental factors contributing to the development of pigmented nevi and other risk factors for skin cancer, as described in detail elsewhere [36,37]. The proband twins were recruited at age twelve via schools around Brisbane, Australia, and followed up at age fourteen. Iris colour was scored by a trained nurse. Iris photographs were taken of all twins using a 13.6 megapixel digital camera (Sony Cybershot W300) with a flash. The camera was placed 5-7 cm in front of the eye being photographed. Images were cropped in-camera to show only the iris, and the cropped 5 megapixel image was stored for later processing. BTNS photos were similar to those from RS in terms of size and resolution. The pupils were not dilated, so a larger iris area was available for scoring. However, these photos were taken under somewhat variable daylight conditions and exposure levels. Principal components analysis of Illumina 610k GWAS data for all participants allowed identification of ancestry outliers, and these were removed before further analysis so that the sample here is of exclusively northern European origin. All participants gave informed consent to participation in this study, and the study protocol was approved by the appropriate institutional review boards. The current study includes 1,282 participants with eye photographs and GWAS information.

Eye color quantification

To measure colors quantitatively, we first compared several models for representing iris color, including the RGB, CIE Lab, CIE XYZ and HSB/HSV models. We chose the HSB model, where H stands for hue, S for saturation, and B for brightness. Under a fixed B, HS can be viewed as a color pie in which the angle H represents the color type, ranging from 0° to 360° for all human-detectable true colors, and the radius S represents the purity or intensity of the color, ranging from 0 to 1. Brightness or luminance is measured by B, a separate dimension that was removed from the genetic analysis because it is sensitive to the lighting conditions under which a photo is taken. The HSB color model is well suited to the current application because (1) perceptual differences in it are uniform, (2) H and S values are invariant to brightness, (3) H and S may represent the type and the amount of iris pigments, and (4) H and S values can be directly translated into true colors.
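The brightness invariance of H and S can be illustrated with Python's standard `colorsys` module, whose HSV model is the same as HSB (V plays the role of B). This is a minimal sketch, not the authors' Matlab code; it assumes 8-bit RGB input and simply discards brightness, so two pixels of the same chromaticity but different brightness map to the same H and S.

```python
import colorsys


def rgb_to_hs(r, g, b):
    """Convert 8-bit RGB values to (hue in degrees, saturation in [0, 1]).

    colorsys also computes brightness (V, i.e. B in HSB), but it is
    discarded here, mirroring its exclusion from the genetic analysis.
    """
    h, s, _brightness = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
    return h * 360.0, s


if __name__ == "__main__":
    # A brownish pixel and the same colour at twice the brightness:
    # both give (up to rounding) H = 20 degrees and S = 0.75.
    for rgb in [(100, 50, 25), (200, 100, 50)]:
        hue, sat = rgb_to_hs(*rgb)
        print(rgb, round(hue, 1), round(sat, 2))
```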

We developed a simple algorithm to automatically retrieve iris colors from the RS eye photos. Starting at the center of an image, where the pupil is located, the algorithm samples pixels along multiple radii that cross the pupil, the iris, and the white of the eye, in that sequence. The color intensity distribution of the sampled pixels follows a characteristic shape, from which the algorithm determines the starting and ending points of the iris by means of edge detection. It then connects all detected edge points by fitting an inner and an outer ellipse. The region between the inner and outer ellipses is considered the iris region. Median RGB values of the pixels in the iris region were retrieved from each image and transformed to HS values according to standard formulas. The image processing procedures were programmed in Matlab 7.6.0 (The MathWorks, Inc., Natick, MA).
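The last two steps (masking the ring between the fitted ellipses and taking per-channel medians) can be sketched as follows. This assumes the inner (pupil) and outer (limbus) ellipses have already been found by edge detection and are axis-aligned around the pupil centre; edge detection and ellipse fitting are omitted, and `iris_hs` and `synthetic_eye` are hypothetical helpers rather than the authors' implementation.

```python
import colorsys
from statistics import median


def inside_ellipse(x, y, cx, cy, ax, ay):
    """True if pixel (x, y) lies inside an axis-aligned ellipse centred at (cx, cy)."""
    return ((x - cx) / ax) ** 2 + ((y - cy) / ay) ** 2 <= 1.0


def iris_hs(image, centre, inner_axes, outer_axes):
    """Hue (degrees) and saturation of the median RGB in the ring between two ellipses.

    image: list of rows, each a list of (r, g, b) tuples with 0-255 values.
    centre: (cx, cy) pupil centre; inner_axes / outer_axes: (ax, ay) semi-axes.
    """
    cx, cy = centre
    reds, greens, blues = [], [], []
    for y, row in enumerate(image):
        for x, (r, g, b) in enumerate(row):
            in_outer = inside_ellipse(x, y, cx, cy, *outer_axes)
            in_inner = inside_ellipse(x, y, cx, cy, *inner_axes)
            if in_outer and not in_inner:  # keep only the iris ring
                reds.append(r)
                greens.append(g)
                blues.append(b)
    if not reds:
        raise ValueError("no pixels fall between the two ellipses")
    # Per-channel medians are less sensitive to outlying pixels than means.
    r_med, g_med, b_med = (median(c) / 255.0 for c in (reds, greens, blues))
    h, s, _ = colorsys.rgb_to_hsv(r_med, g_med, b_med)
    return h * 360.0, s


def synthetic_eye(size=21, pupil=3, iris=8, iris_rgb=(120, 80, 40)):
    """Toy image: black pupil, uniformly coloured iris, white sclera."""
    c = size // 2
    image = []
    for y in range(size):
        row = []
        for x in range(size):
            d2 = (x - c) ** 2 + (y - c) ** 2
            if d2 <= pupil ** 2:
                row.append((0, 0, 0))
            elif d2 <= iris ** 2:
                row.append(iris_rgb)
            else:
                row.append((255, 255, 255))
        image.append(row)
    return image


if __name__ == "__main__":
    # Expect roughly H = 30 degrees and S = 0.67 for the toy iris colour.
    print(iris_hs(synthetic_eye(), (10, 10), (3, 3), (8, 8)))
```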

We noticed minor discordances between digital quantification and expert classification: 0.97% (58) "brown" eyes appeared in the blue area of the HS space (H > 35 and S < 0.45), whereas 1.65% (98) "blue" eyes were in the brown area (H < 30 and S > 0.55) (Figure 1). Most of these are due to expert misclassification. We kept the color categories of these individuals in the prediction analysis for a fair comparison with our previous prediction results, which also allowed a certain degree of sampling uncertainty.
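The discordance check amounts to comparing each expert label with the HS area its measured color falls in. A minimal sketch using the thresholds quoted above; the function names and the label "other" for points outside both areas are assumptions for illustration.

```python
def hs_region(hue, sat):
    """Assign a point in HS space to the areas defined in the text.

    Only the 'blue' (H > 35, S < 0.45) and 'brown' (H < 30, S > 0.55) areas
    are defined there; everything else is labelled 'other' in this sketch.
    """
    if hue > 35 and sat < 0.45:
        return "blue"
    if hue < 30 and sat > 0.55:
        return "brown"
    return "other"


def discordant(records):
    """Yield (id, expert_label, hs_area) where an expert 'blue'/'brown' call
    falls in the opposite HS area. records: iterable of (id, label, hue, sat)."""
    opposite = {"blue": "brown", "brown": "blue"}
    for rec_id, label, hue, sat in records:
        area = hs_region(hue, sat)
        if opposite.get(label) == area:
            yield rec_id, label, area


if __name__ == "__main__":
    sample = [("p1", "brown", 40.0, 0.30),  # brown call in the blue area
              ("p2", "blue", 22.0, 0.70),   # blue call in the brown area
              ("p3", "blue", 41.0, 0.25)]   # concordant
    for row in discordant(sample):
        print(row)
```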

Because of significant differences between the RS eye photographs and the TwinsUK portrait photographs, we preprocessed the TwinsUK photographs by correcting the R, G, and B channels of each photo as $Y = \frac{c}{x} X$, where c is the channel mean of all photos, x is the channel mean of each photo, and X is the matrix of raw channel values of all pixels in that photo. We then applied the iris color retrieval algorithm to the TwinsUK photographs, with the pupil centred manually. We also applied the iris color retrieval algorithm to the BTNS full-size eye photographs. BTNS photos were similar to those from RS in terms of size and resolution but were taken under varying daylight conditions. The resulting distribution on the Hue dimension was not normal, with a cluster of samples having low values. The mean correction technique used for the TwinsUK data could not be applied because the iris made up a substantial portion of each image. We therefore excluded 66 samples with H < 20 from the BTNS data.
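The per-channel mean correction (each channel multiplied by c/x) and the BTNS hue filter can be sketched as below. The function names are hypothetical; c is assumed to be the average of the per-photo channel means (equivalent to the pooled pixel mean when all photos have the same size), and corrected values are not clipped to the 0-255 range.

```python
from statistics import mean


def channel_means(pixels):
    """Mean R, G and B over all pixels of one photo (pixels: list of (r, g, b))."""
    return tuple(mean(px[k] for px in pixels) for k in range(3))


def mean_correct(photos):
    """Apply Y = (c / x) * X per channel to every photo.

    c: cohort channel mean (here the mean of per-photo means, an assumption),
    x: the photo's own channel mean, X: its raw pixel values.
    Results are left unclipped; an all-zero channel raises ZeroDivisionError.
    """
    per_photo = [channel_means(p) for p in photos]
    cohort = tuple(mean(m[k] for m in per_photo) for k in range(3))
    corrected = []
    for pixels, x in zip(photos, per_photo):
        scale = [cohort[k] / x[k] for k in range(3)]
        corrected.append([tuple(px[k] * scale[k] for k in range(3)) for px in pixels])
    return corrected


def drop_low_hue(samples, threshold=20.0):
    """Keep (sample_id, hue) pairs with H >= threshold (BTNS: H < 20 removed)."""
    return [s for s in samples if s[1] >= threshold]


if __name__ == "__main__":
    photos = [[(100, 100, 100)] * 4, [(200, 50, 100)] * 4]
    for photo in mean_correct(photos):
        print(channel_means(photo))  # both photos now share the cohort means
    print(drop_low_hue([("a", 15.2), ("b", 33.0)]))
```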

Genotyping and quality control

In RS1 and RS2, genotyping was carried out using the Infinium II HumanHap550K Genotyping BeadChip version 3. Complete information on genotyping protocols and quality control measures for RS1 and RS2 has been described previously [38,39]. In RS3, the genotyping method closely followed that of RS1 and RS2 but used a denser array, the Illumina Human 610 Quad Array. We excluded individuals with a call rate < 97.5%, a gender mismatch with typed X-linked markers, excess autosomal heterozygosity > 0.33, duplicates or first-degree relatives identified using IBS probabilities, and outliers identified by multi-dimensional scaling analysis with reference to the 210 HapMap samples (Figure 2). Further excluding individuals without eye photos from all cohorts left 2429 individuals in RS1, 1535 in RS2, and 1987 in RS3 (Table 9). Genome-wide imputation in RS3 also closely followed the methods used in RS1 and RS2, as described in detail previously [39]. Genotypes were imputed using MACH [40] based upon phased autosomal chromosomes of the HapMap CEU Phase II panel (release 22, build 36), orientated on the positive strand. The scripts developed for this project are freely available online. In total, 2,543,887 SNPs passed quality control. DNA samples from the TwinsUK registry were genotyped using the Hap317K chip (Illumina, San Diego, California, USA). Quality control at the individual and SNP levels has been described in detail previously [41]. DNA samples from the BTNS were genotyped by the Scientific Services Division at deCODE Genetics, Iceland (http://www.decode.com/genotyping/) using the Illumina 610-Quad BeadChip. Additional genotyping for SNPs within known pigmentation genes was conducted using Sequenom as described in detail previously [42].

GWA analysis

GWA analysis was conducted in RS1, RS2, and RS3 separately as well as in the merged data set RS123. The genotypes were merged according to the annotation files provided by Illumina on the positive strand. A pair-wise identity-by-state (IBS) matrix between individuals in RS123 was recalculated using a subset of pruned markers (50,000 SNPs) in approximate linkage equilibrium. Principal components were re-derived using multidimensional scaling analysis of the 1-IBS matrix. Potential institutional heterogeneity between the three RS data sets and residual population stratification were checked by plotting the first 2 principal components. The effects of sex, age, and the 4 main principal components on the eye color traits were regressed out prior to GWA analysis. Association was based on a score test of the additive effect of the minor allele, from which a χ² value with 1 df was derived. Inflation factors were derived for each trait and used to adjust the χ² values. The distribution of observed P values was inspected using Q-Q plots against the P values from the null χ² distribution with 1 df. P values smaller than 5×10^-8 were considered genome-wide significant. A subsequent scan was performed on the residuals after excluding the effects of the significant SNPs from the previous scan, and this was repeated until no further significant SNP was identified. All significant SNPs were further examined using linear regression for quantitative traits and multinomial logistic regression for categorical traits, with sex, age, and the 4 principal components adjusted as covariates. GWA analyses were conducted using the R library GenABEL v1.4-3 [43] for genotyped SNPs and PLINK v1.07 [44] for imputed data. Haplotype and LD analyses were conducted for the regions of interest using Haploview v4.1 [45]. Replication analyses in TwinsUK and BTNS were conducted using the score test implemented in Merlin [45], which takes account of relatedness.
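For a quantitative trait whose covariate effects have already been regressed out, the 1-df score test of an additive effect reduces to χ² = n·r², where r is the correlation between minor-allele counts and residuals. The sketch below shows that statistic, its P value, and a genomic-control style adjustment that divides each χ² by λ = median(χ²)/0.4549 (0.4549 being the median of χ² with 1 df). It is an illustration under these assumptions, not GenABEL's implementation, which may differ in details such as whether λ is floored at 1.

```python
import math
from statistics import median

# Median of a chi-square distribution with 1 degree of freedom.
CHI2_1DF_MEDIAN = 0.4549364231195724
GENOME_WIDE_ALPHA = 5e-8


def score_chi2(genotypes, residuals):
    """1-df score statistic for an additive effect: chi2 = n * r^2.

    genotypes: minor-allele counts (0, 1, 2); residuals: trait values after
    regressing out sex, age and principal components.
    """
    n = len(genotypes)
    gm = sum(genotypes) / n
    ym = sum(residuals) / n
    sxy = sum((g - gm) * (y - ym) for g, y in zip(genotypes, residuals))
    sxx = sum((g - gm) ** 2 for g in genotypes)
    syy = sum((y - ym) ** 2 for y in residuals)
    if sxx == 0 or syy == 0:
        return 0.0  # monomorphic SNP or constant trait: no information
    return n * sxy * sxy / (sxx * syy)


def chi2_1df_pvalue(stat):
    """Upper-tail P value of a 1-df chi-square: P(|Z| > sqrt(stat))."""
    return math.erfc(math.sqrt(max(stat, 0.0) / 2.0))


def genomic_control(stats):
    """Return (lambda, adjusted stats) with lambda = median(stats) / CHI2_1DF_MEDIAN."""
    lam = median(stats) / CHI2_1DF_MEDIAN
    return lam, [s / lam for s in stats]


if __name__ == "__main__":
    g = [0, 1, 2, 0, 1, 2, 1, 0]
    y = [-0.5, 0.1, 0.9, -0.2, 0.0, 1.1, 0.3, -0.4]
    stat = score_chi2(g, y)
    p = chi2_1df_pvalue(stat)
    print(stat, p, p < GENOME_WIDE_ALPHA)
```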

Prediction analysis

We performed a multivariate analysis and present a linear model for predicting quantitative human eye color. A total of 70 predictors were analyzed, including the 64 SNPs (Table 13), the 4 SNP-SNP interaction terms identified in the interaction analysis, age, and sex. The predictors included in the final model were selected by iteratively adding the next ranked predictor that reduced the Akaike information criterion (AIC) [46] value of the model. The predictors and model parameters were derived in the RS1 and RS2 cohorts and subsequently used to predict eye color H and S in the RS3 cohort. Prediction accuracy was evaluated using R², the proportion of the variance of H and S in RS3 explained by the predictors. The rs12913832 genotype was coded as binary, with 0 representing the GG genotype and 1 representing the GA or AA genotypes, whereas the genotypes of the other SNPs were coded as 0, 1, or 2 according to the number of minor alleles.
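A minimal sketch of the model-building loop, assuming ordinary least squares for H or S, the Gaussian AIC n·ln(RSS/n) + 2k (constants dropped), and greedy forward selection in which each step adds the candidate that lowers AIC most. The text's "next ranked predictor" wording could also mean walking a fixed ranking, so treat the loop as one plausible reading. Validation R² is computed as 1 - SS_res/SS_tot on a held-out cohort, one common definition; all function names are hypothetical.

```python
import math


def dominant(minor_allele_count):
    """rs12913832 coding from the text: 0 for GG, 1 for GA or AA."""
    return 1 if minor_allele_count > 0 else 0


def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[piv][col]) < 1e-12:
            raise ValueError("singular design matrix")
        m[col], m[piv] = m[piv], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                for k in range(col, n + 1):
                    m[r][k] -= f * m[col][k]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit(columns, y):
    """OLS with an intercept. Returns (coefficients, residual sum of squares)."""
    rows = [[1.0] + [col[i] for col in columns] for i in range(len(y))]
    p = len(rows[0])
    xtx = [[sum(r[j] * r[k] for r in rows) for k in range(p)] for j in range(p)]
    xty = [sum(r[j] * yi for r, yi in zip(rows, y)) for j in range(p)]
    beta = solve(xtx, xty)
    rss = sum((yi - sum(b * v for b, v in zip(beta, r))) ** 2 for r, yi in zip(rows, y))
    return beta, rss


def aic(rss, n, n_params):
    """Gaussian AIC up to an additive constant."""
    return n * math.log(rss / n) + 2 * n_params


def forward_select(candidates, y):
    """Greedy forward selection by AIC over a dict name -> list of values."""
    n = len(y)
    chosen = []
    best = aic(fit([], y)[1], n, 1)
    while True:
        trials = []
        for name in candidates:
            if name not in chosen:
                _, rss = fit([candidates[c] for c in chosen + [name]], y)
                trials.append((aic(rss, n, len(chosen) + 2), name))
        if not trials or min(trials)[0] >= best:
            return chosen
        best, name = min(trials)
        chosen.append(name)


def predict(beta, columns, n):
    """Apply coefficients (intercept first) to n new individuals."""
    return [beta[0] + sum(b * col[i] for b, col in zip(beta[1:], columns)) for i in range(n)]


def r_squared(observed, predicted):
    """Validation R^2 = 1 - SS_res / SS_tot."""
    mu = sum(observed) / len(observed)
    ss_tot = sum((o - mu) ** 2 for o in observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    return 1.0 - ss_res / ss_tot
```

In the study's setting, `candidates` would hold the 70 predictors measured in the training cohorts, the selected columns would be refitted with `fit`, and `predict` plus `r_squared` would be applied to the validation cohort.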

Multinomial logistic regression was used for categorical prediction as described previously [9]. Categorical prediction was evaluated using the AUC. Interaction analysis, prediction modeling and evaluation procedures were scripted in Matlab v7.6.0 (The MathWorks, Inc., Natick, MA).

Interaction analysis

We tested pair-wise interactions for H and S between 64 SNPs from 7 previously known genes (HERC2, OCA2, SLC24A4, TYR, TYRP1, SLC45A2, IRF4) and the 3 newly identified loci (1q42.3, 17q25.3, 21q22.13) (Table 13). Interactions were tested at the multiplicative scale between each pair of SNPs by comparing two models, with and without the interaction term, using an F-test. Let the column vector y denote the color trait residuals after regressing out the effects of known factors. Let the n-by-3 matrix X0 denote individual genotypes, where the 1st column is a constant column of ones and the 2nd and 3rd columns are the numbers of minor alleles minus 1 (-1, 0, 1) of the 2 SNPs under test. Let X1 contain an additional column holding the interaction term at the multiplicative scale.

The total sum of squares is

$$SST = y^{T}y - n\bar{y}^{2}.$$

The regression (explained) sum of squares in the model without the interaction term is

$$SSR_0 = \left((X_0^{T}X_0)^{-1}X_0^{T}y\right)^{T}X_0^{T}y - n\bar{y}^{2},$$

and the regression sum of squares in the model with the interaction term is

$$SSR_1 = \left((X_1^{T}X_1)^{-1}X_1^{T}y\right)^{T}X_1^{T}y - n\bar{y}^{2}.$$

The F value can be derived from these sums of squares,

$$F = \frac{SSR_1 - SSR_0}{(SST - SSR_1)/(n - 1)},$$

which follows the F distribution with 1 and n-1 degrees of freedom under the null hypothesis of no interaction. A further round of analysis was performed, adjusting for the effects of the significant interactions found in the previous round, until no more significant interactions were detected. The significance threshold was set at P = 10^-5.
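The interaction test can be sketched in Python from the same sums of squares: SST, a regression sum of squares with X0 = [1, g1, g2], and one with X1 = [1, g1, g2, g1·g2], after recoding minor-allele counts to -1/0/1. The denominator follows the n - 1 degrees of freedom stated in the text; a textbook OLS F-test would use n - 4 (four columns in X1), which makes little practical difference for cohorts of several thousand individuals. The P value relies on the large-sample fact that F(1, d) approaches χ² with 1 df. `interaction_f` and `approx_p` are hypothetical names, and this is not the authors' Matlab script.

```python
import math


def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[piv][col]) < 1e-12:
            raise ValueError("singular design matrix")
        m[col], m[piv] = m[piv], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                for k in range(col, n + 1):
                    m[r][k] -= f * m[col][k]
    return [m[i][n] / m[i][i] for i in range(n)]


def regression_ss(columns, y):
    """SSR = beta_hat' X'y - n * ybar^2 for X = [1, columns...]."""
    n = len(y)
    rows = [[1.0] + [c[i] for c in columns] for i in range(n)]
    p = len(rows[0])
    xtx = [[sum(r[j] * r[k] for r in rows) for k in range(p)] for j in range(p)]
    xty = [sum(r[j] * yi for r, yi in zip(rows, y)) for j in range(p)]
    beta = solve(xtx, xty)
    ybar = sum(y) / n
    return sum(b * v for b, v in zip(beta, xty)) - n * ybar * ybar


def interaction_f(g1_minor, g2_minor, y):
    """F statistic for a multiplicative SNP x SNP interaction.

    g1_minor, g2_minor: minor-allele counts (0/1/2), recoded to -1/0/1.
    The denominator uses n - 1 degrees of freedom, as stated in the text.
    """
    n = len(y)
    a = [g - 1 for g in g1_minor]
    b = [g - 1 for g in g2_minor]
    ab = [u * v for u, v in zip(a, b)]
    ybar = sum(y) / n
    sst = sum(v * v for v in y) - n * ybar * ybar
    ssr0 = regression_ss([a, b], y)
    ssr1 = regression_ss([a, b, ab], y)
    return (ssr1 - ssr0) / ((sst - ssr1) / (n - 1))


def approx_p(f_stat):
    """Large-n approximation: F(1, d) tends to chi2(1), so P ~ erfc(sqrt(F / 2))."""
    return math.erfc(math.sqrt(max(f_stat, 0.0) / 2.0))
```

Under the threshold quoted above, a pair would be flagged when `approx_p(interaction_f(...))` falls below 1e-5.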

Because some of the SNPs tested are in high LD, we investigated the effect of LD on the significant findings. We randomly selected, across the autosomes, 10,000 pairs of SNPs in LD (r² > 0.5) and 10,000 pairs of SNPs not in LD (r² < 0.01) and tested them for interaction with permuted color traits using the F-test specified above. The observed test statistics, regardless of the presence of LD, did not deviate from those expected under the null distribution of no interaction. Thus, it is unlikely that the significant interactions identified in the current study are spurious results of LD between the SNPs.

We detected significant pair-wise interactions between SNPs in the HERC2, OCA2, SLC24A4, SLC45A2, and IRF4 genes. The most significant interactions were between OCA2 rs1800407 and SLC45A2 rs16891982 (P = 2.7×10^-6 for H, P = 7.3×10^-5 for S), HERC2 rs12913832 and IRF4 rs12203592 (P = 6.1×10^-9 for H, P = 1.4×10^-6 for S), HERC2 rs12913832 and OCA2 rs728405 (P = 2.1×10^-6 for H only), and HERC2 rs12913832 and SLC24A4 rs12896399 (P = 1.9×10^-14 for S only). After adjusting for these effects, no additional SNP interactions were significant. Furthermore, we examined the distributions of H and S stratified by the genotypes of the interacting SNPs. The effect of rs16891982 on H was seen only in rs1800407 CT carriers (P = 8.9×10^-21) and not in CC carriers (P = .69) (Figure 6). The effects of rs12203592 and rs728405 on H were seen in rs12913832 GA/AA carriers (P < 5.7×10^-6) but not in GG carriers (P > .26) (Figure 6). SLC24A4 rs12896399 showed a highly significant effect on S in HERC2 rs12913832 GG carriers (P = 2.1×10^-42) but not in GA/AA carriers (P = .66) (Figure 6).
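The stratified comparison can be sketched by grouping individuals on the genotype of one SNP (e.g. 0 = rs12913832 GG, 1 = GA/AA) and estimating the additive effect of the other SNP within each group. This is an assumption-laden illustration: the text does not state which test was used within strata, so the sketch uses a per-allele least-squares slope with a 1-df score test (χ² = n·r²) and no covariate adjustment; the function names are hypothetical.

```python
import math


def slope_and_p(genotypes, trait):
    """Per-allele least-squares slope and 1-df score-test P value (chi2 = n * r^2)."""
    n = len(trait)
    gm = sum(genotypes) / n
    tm = sum(trait) / n
    sxy = sum((g - gm) * (t - tm) for g, t in zip(genotypes, trait))
    sxx = sum((g - gm) ** 2 for g in genotypes)
    syy = sum((t - tm) ** 2 for t in trait)
    if sxx == 0 or syy == 0:
        return 0.0, 1.0  # no variation: no evidence of an effect
    chi2 = n * sxy * sxy / (sxx * syy)
    return sxy / sxx, math.erfc(math.sqrt(chi2 / 2.0))


def stratified_effects(strata, genotypes, trait):
    """Effect of one SNP on a trait within each genotype group of another SNP.

    strata: group label per individual, e.g. 0 = rs12913832 GG, 1 = GA/AA.
    Returns {group: (slope, p_value)}.
    """
    groups = {}
    for s, g, t in zip(strata, genotypes, trait):
        gs, ts = groups.setdefault(s, ([], []))
        gs.append(g)
        ts.append(t)
    return {s: slope_and_p(gs, ts) for s, (gs, ts) in sorted(groups.items())}


if __name__ == "__main__":
    strata = [0] * 6 + [1] * 6
    snp = [0, 1, 2, 0, 1, 2] * 2
    hue = [30.0, 31.0, 32.5, 30.2, 31.1, 32.4] + [35.0, 35.1, 34.9, 35.0, 35.2, 34.8]
    print(stratified_effects(strata, snp, hue))
```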

REFERENCES

1. Sturm RA, Larsson M (2009) Genetics of human iris colour and patterns. Pigment Cell Melanoma Res 22: 544-562.

2. Parra EJ (2007) Human pigmentation variation: evolution, genetic basis, and implications for public health. Am J Phys Anthropol Suppl 45: 85-105.

3. Frost P (2007) Human skin-color sexual dimorphism: a test of the sexual selection hypothesis. Am J Phys Anthropol 133: 779-780; author reply 780-781.

4. Chen TC, Chimeh F, Lu Z, Mathieu J, Person KS, et al. (2007) Factors that influence the cutaneous synthesis and dietary sources of vitamin D. Arch Biochem Biophys 460: 213-217.

5. Sulem P, Gudbjartsson DF, Stacey SN, Helgason A, Rafnar T, et al. (2008) Two newly identified genetic determinants of pigmentation in Europeans. Nat Genet 40: 835-837.

6. Sulem P, Gudbjartsson DF, Stacey SN, Helgason A, Rafnar T, et al. (2007) Genetic determinants of hair, eye and skin pigmentation in Europeans. Nat Genet 39: 1443-1452.

7. Kayser M, Liu F, Janssens AC, Rivadeneira F, Lao O, et al. (2008) Three genome-wide association studies and a linkage analysis identify HERC2 as a human iris color gene. Am J Hum Genet 82: 411-423.

8. Han J, Kraft P, Nan H, Guo Q, Chen C, et al. (2008) A genome-wide association study identifies novel alleles associated with hair color and skin pigmentation. PLoS Genet 4: e1000074.

9. Liu F, van Duijn K, Vingerling JR, Hofman A, Uitterlinden AG, et al. (2009) Eye color and the prediction of complex phenotypes from genotypes. Curr Biol 19: R192-193.

10. Kayser M, Schneider PM (2009) DNA-based prediction of human externally visible characteristics in forensics: motivations, scientific challenges, and ethical considerations. Forensic Sci Int Genet 3: 154-161.

11. Sturm RA, Duffy DL, Zhao ZZ, Leite FP, Stark MS, et al. (2008) A single SNP in an evolutionary conserved region within intron 86 of the HERC2 gene determines human blue-brown eye color. Am J Hum Genet 82: 424-431.

12. Duffy DL, Montgomery GW, Chen W, Zhao ZZ, Le L, et al. (2007) A three-single-nucleotide polymorphism haplotype in intron 1 of OCA2 explains most human eye-color variation. Am J Hum Genet 80: 241-252.

13. Brues AM (1975) Rethinking human pigmentation. Am J Phys Anthropol 43: 387-391.

14. Eiberg H, Troelsen J, Nielsen M, Mikkelsen A, Mengel-From J, et al. (2008) Blue eye color in humans may be caused by a perfectly associated founder mutation in a regulatory element located within the HERC2 gene inhibiting OCA2 expression. Hum Genet 123: 177-187.

15. Bonilla C, Boxill LA, Donald SA, Williams T, Sylvester N, et al. (2005) The 8818G allele of the agouti signaling protein (ASIP) gene is ancestral and is associated with darker skin color in African Americans. Hum Genet 116: 402-406.

16. Kaplan J, De Domenico I, Ward DM (2008) Chediak-Higashi syndrome. Curr Opin Hematol 15: 22-29.

17. Challa P (2009) Genetics of pseudoexfoliation syndrome. Curr Opin Ophthalmol 20: 88-91.

18. Trantow CM, Mao M, Petersen GE, Alward EM, Alward WL, et al. (2009) Lyst mutation in mice recapitulates iris defects of human exfoliation syndrome. Invest Ophthalmol Vis Sci 50: 1205-1214.

19. Gutierrez-Gil B, Wiener P, Williams JL (2007) Genetic effects on coat colour in cattle: dilution of eumelanin and phaeomelanin pigments in an F2-Backcross Charolais x Holstein population. BMC Genet 8: 56.

20. Izagirre N, Garcia I, Junquera C, de la Rua C, Alonso S (2006) A scan for signatures of positive selection in candidate loci for skin pigmentation in humans. Mol Biol Evol 23: 1697-1706.

21. McEvoy B, Beleza S, Shriver MD (2006) The genetic architecture of normal variation in human pigmentation: an evolutionary perspective and model. Hum Mol Genet 15 Spec No 2: R176-181.

22. Tuntivanich N, Pittler SJ, Fischer AJ, Omar G, Kiupel M, et al. (2009) Characterization of a canine model of autosomal recessive retinitis pigmentosa due to a PDE6A mutation. Invest Ophthalmol Vis Sci 50: 801-813.

23. Patterson D (2009) Molecular genetic analysis of Down syndrome. Hum Genet 126: 195-214.

24. Saenz RB (1999) Primary care of infants and young children with Down syndrome. Am Fam Physician 59: 381-390, 392, 395-396.

25. Kim JH, Hwang JM, Kim HJ, Yu YS (2002) Characteristic ocular findings in Asian children with Down syndrome. Eye 6: 710-714.

26. Takamatsu K, Maekawa K, Togashi T, Choi DK, Suzuki Y, et al. (2002) Identification of two novel primate-specific genes in DSCR. DNA Res 9: 89-97.

27. Larsson M, Pedersen NL, Stattin H (2003) Importance of genetic effects for characteristics of the human iris. Twin Res 6: 192-200.

28. Mercke Odeberg J, Andrade J, Holmberg K, Hoglund P, Malmqvist U, et al. (2006) UGT1A polymorphisms in a Swedish cohort and a human diversity panel, and the relation to bilirubin plasma levels in males and females. Eur J Clin Pharmacol 62: 829-837.

29. Strassburg CP (2008) Pharmacogenetics of Gilbert's syndrome. Pharmacogenomics 9: 703-715.

30. Burchell B, Hume R (1999) Molecular genetic basis of Gilbert's syndrome. J Gastroenterol Hepatol 14: 960-966.

31. Watson KJ, Gollan JL (1989) Gilbert's syndrome. Baillieres Clin Gastroenterol 3: 337-355.

32. Maher B (2008) Personal genomes: The case of the missing heritability. Nature 456: 18-21.

33. Hofman A, Grobbee DE, de Jong PT, van den Ouweland FA (1991) Determinants of disease and disability in the elderly: the Rotterdam Elderly Study. Eur J Epidemiol 7: 403-422.

34. Hofman A, Breteler MM, van Duijn CM, Krestin GP, Pols HA, et al. (2007) The Rotterdam Study: objectives and design update. Eur J Epidemiol 22: 819-829.

35. Hofman A, Breteler MM, van Duijn CM, Janssen HL, Krestin GP, et al. (2009) The Rotterdam Study: 2010 objectives and design update. Eur J Epidemiol 24: 553-572.

36. Zhu G, Montgomery GW, James MR, Trent JM, Hayward NK, et al. (2007) A genome-wide scan for naevus count: linkage to CDKN2A and to other chromosome regions. Eur J Hum Genet 15: 94-102.

37. Falchi M, Bataille V, Hayward NK, Duffy DL, Bishop JA, et al. (2009) Genome-wide association study identifies variants at 9p21 and 22q13 associated with development of cutaneous nevi. Nat Genet 41: 915-919.

38. Newton-Cheh C, Eijgelsheim M, Rice KM, de Bakker PI, Yin X, et al. (2009) Common variants at ten loci influence QT interval duration in the QTGEN Study. Nat Genet 41: 399-406.

39. Estrada K, Krawczak M, Schreiber S, van Duijn K, Stolk L, et al. (2009) A genome-wide association study of northwestern Europeans involves the C-type natriuretic peptide signaling pathway in the etiology of human height variation. Hum Mol Genet 18: 3516-3524.

40. Li Y, Willer C, Sanna S, Abecasis G (2009) Genotype imputation. Annu Rev Genomics Hum Genet 10: 387-406.

41. Zhai G, van Meurs JB, Livshits G, Meulenbelt I, Valdes AM, et al. (2009) A genome-wide association study suggests that a locus within the ataxin 2 binding protein 1 gene is associated with hand osteoarthritis: the Treat-OA consortium. J Med Genet 46: 614-616.

42. Zhao ZZ, Nyholt DR, Le L, Martin NG, James MR, et al. (2006) KRAS variation and risk of endometriosis. Mol Hum Reprod 12: 671-676.

43. Aulchenko YS, Ripke S, Isaacs A, van Duijn CM (2007) GenABEL: an R library for genome-wide association analysis. Bioinformatics 23: 1294-1296.

44. Purcell S, Neale B, Todd-Brown K, Thomas L, Ferreira MA, et al. (2007) PLINK: a tool set for whole-genome association and population-based linkage analyses. Am J Hum Genet 81: 559-575.

45. Barrett JC, Fry B, Maller J, Daly MJ (2005) Haploview: analysis and visualization of LD and haplotype maps. Bioinformatics 21: 263-265.

46. Akaike H (1974) A new look at the statistical model identification. IEEE Trans Automat Contr 19: 716-723.

Example 2: Method and nucleic acid molecules useful for genotyping SNPs

We describe a multiplex method for genotyping twelve SNPs. Twelve PCR primer pairs were designed using the free web-based design software Primer3Plus (A. Untergasser et al (2007) Nucleic Acids Res. 35: W71-74) with the program's default parameters. Each PCR fragment was limited to less than 150 bp to cater for degraded DNA samples, which is vital for future application to forensic samples. The sequences surrounding each relevant SNP were searched with BLAST (S. F. Altschul et al (1997) Nucleic Acids Res. 25: 3389-3402) against dbSNP (S. T. Sherry et al (2001) Nucleic Acids Res. 29: 308-311) for other SNP sites that might interfere with primer binding, and these sites were avoided. Also, to ensure there would be little interaction among the forward and reverse primers, the software program AutoDimer (P. M. Vallone, J. M. Butler (2004) Biotechniques 37: 226-231) was used throughout the design. The PCR primer sequences can be found in Table 12. PCR primer pairs 6a or 6b can be used in the alternative. For the single multiplex PCR, a total of 1 μl (0.5-2 ng) of genomic DNA extract from each individual will be amplified in a 12 μl PCR reaction with 1x PCR buffer, 2.7 mM MgCl2, 200 μM of each dNTP, primer concentrations of 0.416 μM each and 0.5 U AmpliTaq Gold DNA polymerase (Applied Biosystems Inc., Foster City, CA). Thermal cycling for PCR will be performed on the gold-plated 96-well GeneAmp® PCR system 9700 (Applied Biosystems). The conditions for multiplex PCR will be as follows: (1) 95 °C for 10 min, (2) 33 cycles of 95 °C for 30 s and 60 °C for 30 s, (3) 5 min at 60 °C. Both forward and reverse SBE primers were designed for each SNP, and the final primers were chosen based on their suitability for the multiplex and on the genotype of the resultant product, to allow complete multiplexing. The primer sequences and specifications can be found in Table 12. SBE primer 6a or 6b can be used in the alternative.
The design followed a protocol similar to the PCR primer design, ensuring primer melting temperatures of approximately 55 °C for the SBE reaction, and all possible primer interactions were screened. To ensure complete capillary separation of the products, poly-T tails of varying sizes were added to the 5' ends of the SBE primers. Following PCR product purification to remove unincorporated primers and dNTPs, the multiplex SBE assay will be performed using 1 μl of product with 1 μl SNaPshot reaction mix in a total reaction volume of 5 μl. Thermal cycling for SBE will be performed on the gold-plated 96-well GeneAmp® PCR system 9700 (Applied Biosystems). The following thermocycling programme will be used: 96 °C for 2 min, then 25 cycles of 96 °C for 10 s, 50 °C for 5 s and 60 °C for 30 s. Excess fluorescently labelled ddNTPs will be inactivated, and 1 μl of the cleaned multiplex extension products will be run on an ABI 3130xl Genetic Analyser (Applied Biosystems) following the ABI Prism® SNaPshot kit standard protocol (Applied Biosystems). Allele calling will be performed using GeneMapper v. 3.7 software (Applied Biosystems). A custom-designed bin set will be implemented to allow automation of genotyping. For sensitivity testing, a threshold of 50 rfu for peak intensities will be adopted to ensure accuracy of genotyping. Template concentrations from 0.5 ng/μl to 0.015 ng/μl will be used to test the overall sensitivity of the multiplex. The assay is expected to work optimally with 0.25-0.5 ng of template DNA but may also yield complete SNP profiles down to 31 pg, representing approximately 6 human diploid cells.

Table 9. Eye color details and demographics of the subjects

RS1 RS2 RS3 TwinsUK BTNS

N individuals 2429 1535 1987 2261 1282

N SNPs (K) 550 550 610

Age 74.02 (8.23) 67.65 (7.37) 56.24 (5.80) 52.22 (12.52) 17.19 (4.56)

Female 59.40 54.40 56.10 89.31 51.48

Hue 34.86 (7.62) 32.19 (7.25) 32.27 (6.93) 19.22 (18.44) 27.99 (6.38)

Saturation 0.40 (0.13) 0.44 (0.14) 0.48 (0.15) 0.47 (0.19) 0.54 (0.24)

Blue 68.01 69.51 71.26 53.51

Intermediate 9.76 4.82 7.50 28.39

Brown 22.23 25.67 21.24 18.10

Dark blue 4.12 3.52 2.62

Grey/light blue 57.93 52.83 41.62

Green/brown spots 18.98 18.89 31.71

Light brown 18.03 23.13 22.35

Dark brown 0.95 1.63 1.71

Raw values, percentages, or means and standard deviations

TwinsUK: normal portrait photos were available with low iris resolution

Table 10. New SNPs associated with eye color in Rotterdam Studies, and replications in TwinsUK and Brisbane Twin Nevus Study

RS1 (n=2429) RS2 (n=1535) RS3 (n=1987) RS123 (n=5951) TwinsUK (n=2261) BTNS (n=12

SNP EA Trait beta P beta P beta P beta P beta P beta

1q42.3 LYST

rs3768056 G S 0.01 5.0E-05 0.01 3.7E-03 0.01 1.3E-03 0.01 7.8E-09 NS

rs9782955 T S 0.01 3.3E-05 0.01 3.6E-03 0.01 1.2E-03 0.01 5.5E-09 NS

2q37 UGT1A

rs2070959 G C 0.05 4.8E-07

rs1105879 C C 0.04 7.8E-07

17q25.3 NPLOC4-HGS

rs7219915 T C 0.11 4.1E-05 0.10 1.8E-03 0.10 2.8E-04 0.11 5.9E-11 NS 0.10 4.3E

rs9894429 T C 0.11 2.8E-05 0.07 2.0E-02 0.11 2.8E-05 0.10 2.0E-10 NS 0.12 7.0E

rs12452184 T C 0.10 1.5E-04 0.08 1.2E-02 0.10 2.1E-04 0.10 7.2E-09 NS 0.07 6.1E

21q22.13 TTC3-DSCR9

rs1003719 A C -0.10 7.2E-05 -0.11 5.6E-04 -0.06 2.8E-02 -0.09 1.9E-08 -0.12 9.1E-04 -0.08 4.1E

rs2252893 C C -0.12 2.4E-06 -0.08 1.0E-02 -0.06 3.4E-02 -0.10 6.0E-09 -0.11 2.3E-03 -0.08 4.1E

rs2835621 A C -0.12 3.2E-06 -0.09 8.0E-03 -0.06 3.1E-02 -0.10 5.0E-09 -0.11 2.3E-03 -0.08 4.1E

rs2835630 G C -0.13 1.2E-06 -0.09 9.2E-03 -0.04 1.2E-01 -0.09 3.1E-08 -0.10 4.8E-03 -0.09 1.8E

rs7277820 G C -0.12 1.5E-06 -0.09 4.9E-03 -0.04 1.0E-01 -0.09 1.5E-08 -0.10 6.9E-03 -0.09 1.5E

EA, the effect allele based on which beta was derived

RS123, merged data of Rotterdam Study 1 , 2, and 3

NS, not significant

Table 11. Predicting quantitative eye color in Rotterdam Studies

Hue Saturation

ΔR²

Predictors Gene Beta se P value % rank Beta se AF P value % ra

Constant 34.31 1.32 0.550 0.011

Female 0.29 0.14 4.2 4.2E-02 0.04 17† 0.009 0.003 11.8 6.0E-04 0.09 1

Age per 10 yr 0.71 0.07 125.4 8.2E-29 1.17 2 -0.030 0.001 627.5 1.4E-131 5.03

rs3768056G LYST NS 0.013 0.002 37.9 8.0E-10 0.29

rs2070959G UGT1A 0.27 0.12 6.8 0.011 0.06 16 0.004 0.001 4.6 0.031 0.04 1

rs16891982C SLC45A2 — — — NS — — — — — NS —

rs12203592A IRF4 -2.10 0.51 17.9 2.4E-05 0.16 10 — — — NS —

rs1325127G TYRP1 -0.54 0.11 23.5 1.3E-06 0.21 7 0.012 0.002 40.9 1.7E-10 0.32

rs1393350A TYR 0.38 0.11 13.9 1.9E-04 0.12 14 -0.011 0.002 34.0 5.9E-09 0.26

rs12896399C SLC24A4 -0.46 0.10 21.9 3.0E-06 0.20 9 0.059 0.005 125.0 9.9E-29 0.98

rs728405C OCA2 -1.38 0.41 11.8 6.0E-04 0.10 15 — — — NS —

rs1800407G OCA2 7.69 1.30 20.7 5.6E-06 0.18 12 -0.062 0.010 19.6 9.8E-06 0.15 1

rs1129038C HERC2 -4.55 0.43 96.2 1.6E-22 0.88 3 0.027 0.008 12.3 4.5E-04 0.09 1

rs12913832A* HERC2 -9.91 0.74 4665.6 <1E-300 44.50 1 0.258 0.012 5438.3 <1E-300 48.31

rs9894429A NPLOC4 0.55 0.10 33.1 9.2E-09 0.30 5 -0.009 0.002 25.0 5.9E-07 0.19

rs7277820G DSCR9 -0.45 0.10 25.7 4.2E-07 0.23 6 0.010 0.002 33.6 7.1E-09 0.25

rs1800407G × rs16891982C -5.25 1.09 23.7 1.1E-06 0.21 13 0.025 0.006 15.8 7.3E-05 0.12 1

rs12913832A × rs12203592A 2.11 0.36 33.9 6.1E-09 0.31 4 -0.011 0.002 23.3 1.4E-06 0.18 1

rs12913832A × rs728405C 1.13 0.27 22.5 2.1E-06 0.20 8 — — — NS —

rs12913832A × rs12896399C — — — NS — — -0.030 0.004 58.9 1.9E-14 0.46

Total (excl UGT1) 48.79 56.72

Total (incl UGT1) 48.85 56.76

NS: not significant. Beta, se, and P values were derived in the RS1 and RS2 cohorts; R² changes were estimated in the RS3 cohort. The interaction terms are defined at the multiplicative scale. * The rs12913832 A allele is modeled to have a dominant effect; allelic effects of the other SNPs are modeled additively. The main effect of rs16891982C is not significant when the interaction term is included. † Rank is 16 if the UGT1 SNP is not included.

Table 12: Primers for multiplex genotyping of 12 SNPs

PCR primers

rs1800407_F: TGAAAGGCTGCCTCTGTTCT rs1800407_R: CGATGAGACAGAGCATGATGA

rs2070959_F: ATTTGGGCCTACCATCTGTG rs2070959_R: TTGTGTAGCACCTGGGAATG

rs9894429_F: TGTTGCTGTGATCCGCTTC rs9894429_R: AGGACCTCACTAGGCTGTGC

rs1129038_F: TCCTTTGCTTCGGACTCTACA rs1129038_R: ACACCAGGCAGCCTACAGTC

rs12203592_F: ACAGGGCAGCTGATCTCTTC rs12203592_R: GCTAAACCTGGCACCAAAAG

rs1393350_F: TTTCTTTATCCCCCTGATGC rs1393350_R: GGGAAGGTGAATGATAACACG

rs1393350_F: GCGTGCATATCCACCAACT rs1393350_R: TGTTTGTATCTGGGAAGGTGAA

rs12913832_F: GAATTTGTTCTTCATGGCTCTCT rs12913832_R: GGCCCCTGATGATGATAGC

rs12896399_F: CTGGCGATCCAATTCTTTGT rs12896399_R: CTTAGCCCTGGGTCTTGATG

rs3768056_F: GGATCTACAGAGCTGTTTCTCTGC rs3768056_R: TGTGCAACAGACTCCCAGAC

rs2835630_F: CCCTCTTTTAGTGGTCTTAACTATTCC rs2835630_R: TGCCACAAGATATTTGGGTTG

rs16891982_F: TCCAAGTTGTGCTAGACCAGA rs16891982_R: CGAAAGAGGAGTCGAGGTTG

rs1325127_F: TCTGTTGTTAGCCTACCTAGATGTTT rs1325127_R: AAACATAAAAACATGATGGAACACA
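As a quick check on a primer table such as this one, one might tabulate each primer's length and GC content before combining primers in a multiplex. The Python sketch below does this for a few of the PCR primers copied from the table; the helper names are hypothetical, and this document implies no particular acceptance thresholds.

```python
# A few PCR primers copied from Table 12.
PCR_PRIMERS = {
    "rs1800407_F": "TGAAAGGCTGCCTCTGTTCT",
    "rs1800407_R": "CGATGAGACAGAGCATGATGA",
    "rs12913832_F": "GAATTTGTTCTTCATGGCTCTCT",
    "rs12913832_R": "GGCCCCTGATGATGATAGC",
}


def gc_content(seq: str) -> float:
    """Fraction of bases in seq that are G or C (case-insensitive)."""
    seq = seq.upper()
    if not seq or set(seq) - set("ACGT"):
        raise ValueError(f"expected a non-empty A/C/G/T sequence, got {seq!r}")
    return sum(base in "GC" for base in seq) / len(seq)


def summarize(primers: dict) -> list:
    """Return (name, length in nt, GC fraction rounded to 3 places) per primer."""
    return [(name, len(seq), round(gc_content(seq), 3)) for name, seq in primers.items()]


if __name__ == "__main__":
    for name, length, gc in summarize(PCR_PRIMERS):
        print(f"{name}\t{length} nt\tGC {gc:.1%}")
```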

SBE primer

rs1800407_snF: TTTTGCATACCGGCTCTCCC

rs2070959_snF: TTCGTGTTCCCTGGAGCAT

rs9894429_snF: TTTTTGATCCGCTTCACTCCATC

rs1129038_snR: TTTTTCAGTCTACACAGCAGCGAG

rs12203592_snF: TTTTTATTTGGTGGGTAAAAGAAGG

rs1393350_snR: TTTATTTGTAAAAGACCACACAGATTT

rs1393350_snF: TTTTTCAGTCCCTTCTCTGCAAC

rs12913832_snR: TTTTTTTGCGTGCAGAACTTGACA

rs12896399_snF: TTTTTTTTTTATCTTTAGGTCAGTATATTTTGGG

rs3768056_snR: TTTTTTTTTTTTTTTTTACCATGATTATCACATATACAGCA

rs2835630_snR: TTTTTTTTTTTTTTTTTTTTTTTGGAAAGCTGACTAATTTACAGAG

rs16891982_snR: TTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTGGATGTTGGGGCTT

rs1325127_snF: TTTTTTTTTTTTTTTTTTTTTTTTTTTTTTATGTTGTTAGCCTACCTAGATGTTTA
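The SBE primers above carry 5' poly-T tails of different lengths, and some are designed on the forward strand (_snF) while others are designed on the reverse strand (_snR). In a single-base extension reaction the primer anneals immediately next to the SNP and the polymerase adds one dideoxynucleotide complementary to the template base at the SNP, so a forward primer reports the forward-strand allele and a reverse primer reports its complement. The Python sketch below illustrates that logic on a made-up template (not any locus in Table 12). It assumes the caller passes only the template-binding 3' part of the primer, without the poly-T tail; the function names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an A/C/G/T sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def extended_base(top: str, snp_index: int, primer: str, orientation: str) -> str:
    """Base a single-base-extension primer would add opposite the SNP.

    top: hypothetical forward-strand sequence with the allele at snp_index.
    primer: template-binding 3' part of the primer only (no 5' poly-T tail).
    orientation: "F" (matches the forward strand, ends just before the SNP)
                 or "R" (matches the reverse strand, ends just after it).
    """
    top, primer = top.upper(), primer.upper()
    if orientation == "F":
        start = snp_index - len(primer)
        if start < 0 or top[start:snp_index] != primer:
            raise ValueError("forward primer does not abut the SNP")
        return top[snp_index]  # forward primer copies the forward-strand allele
    if orientation == "R":
        flank = top[snp_index + 1 : snp_index + 1 + len(primer)]
        if reverse_complement(flank) != primer:
            raise ValueError("reverse primer does not abut the SNP")
        return top[snp_index].translate(COMPLEMENT)  # complement of the allele
    raise ValueError("orientation must be 'F' or 'R'")


if __name__ == "__main__":
    left, right = "CATGCCTAGGA", "CTTGACCATG"  # made-up flanks, not a real locus
    for allele in "GA":
        template = left + allele + right
        print(allele,
              extended_base(template, len(left), left[-8:], "F"),
              extended_base(template, len(left), reverse_complement(right[:8]), "R"))
```

Reading the allele from either strand gives the same genotype once reverse-strand calls are complemented back.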

Table 13. SNPs ascertained for pair-wise interaction analysis and P values from single SNP analysis

P values

Index SNP Chr Gene 3color 5color H S CHS1 CHS2

1 rs3768056 1q42.3 LYST 0.073 0.015 0.368 8.3E-09 2.2E-04 3.4E-07

2 rs9782955 1q42.3 LYST 0.168 0.027 0.374 6.0E-09 2.0E-04 2.3E-07

65 rs2070959 2q37 UGT1A 0.981 0.441 0.010 0.031 0.817 4.8E-07

66 rs1105879 2q37 UGT1A 0.756 0.440 0.002 0.104 0.441 7.8E-07

3 rs16891982 5p15.33 SLC45A2 2.9E-08 5.8E-07 9.3E-06 0.008 6.1E-05 0.035

4 rs26722 5p15.33 SLC45A2 2.9E-05 2.0E-06 4.4E-06 0.002 1.2E-05 0.080

5 rs12203592 6p25.3 IRF4 7.9E-05 8.7E-06 2.2E-05 1.6E-05 1.4E-06 0.896

6 rs1540771 6p25.3 IRF4 0.056 0.002 0.147 0.002 0.012 0.105

7 rs791691 9p23 TYRP1 1.7E-04 1.8E-06 4.0E-04 2.1E-08 2.9E-07 0.045

8 rs702133 9p23 TYRP1 4.7E-04 3.7E-06 0.001 4.6E-08 9.8E-07 0.030

9 rs1408809 9p23 TYRP1 5.4E-05 2.6E-07 4.2E-04 1.1E-08 2.2E-07 0.032

10 rs10429629 9p23 TYRP1 2.5E-07 5.2E-11 6.7E-07 1.2E-10 1.4E-10 0.187

11 rs10809808 9p23 TYRP1 1.3E-07 1.5E-10 3.3E-08 1.9E-11 5.8E-12 0.320

12 rs1325127 9p23 TYRP1 2.2E-09 1.1E-09 1.0E-07 1.3E-10 4.0E-11 0.363

13 rs1408799 9p23 TYRP1 9.6E-05 1.9E-06 4.9E-06 8.3E-08 2.3E-08 0.539

14 rs10960751 9p23 TYRP1 5.4E-09 1.3E-10 8.5E-09 1.2E-11 1.7E-12 0.425

15 rs683 9p23 TYRP1 1.2E-04 7.2E-06 1.4E-04 2.3E-06 1.6E-06 0.436

16 rs1137134 9p23 TYRP1 2.0E-07 2.8E-09 1.0E-06 4.2E-10 3.8E-10 0.231

17 rs1042602 11q14.3 TYR 0.097 0.623 0.200 0.590 0.301 0.396

18 rs11018528 11q14.3 TYR 1.7E-06 3.7E-10 1.2E-05 2.3E-10 1.8E-09 0.062

19 rs10765 98 11q14.3 TYR 2.0E-07 1.1E-11 1.0E-06 2.1E-13 6.8E-12 0.018

20 rs1847134 11q14.3 TYR 2.2E-08 3.5E-13 1.1E-06 1.7E-12 2.1E-11 0.039

21 rs1393350 11q14.3 TYR 4.3E-10 3.8E-14 3.1E-09 7.4E-18 3.3E-16 0.011

22 rs1806319 11q14.3 TYR 1.5E-08 1.0E-09 4.5E-05 2.4E-08 5.8E-08 0.169

23 rs1875565 11q14.3 TYR 0.005 5.8E-07 6.9E-04 6.2E-08 7.9E-07 0.049

24 rs4904864 14q32.12 SLC24A4 0.002 1.0E-09 0.028 2.9E-13 1.2E-07 1.3E-07

25 rs4904866 14q32.12 SLC24A4 2.3E-07 6.1E-20 5.9E-05 4.3E-23 8.2E-15 1.9E-09

26 rs12896399 14q32.12 SLC24A4 1.7E-07 4.8E-20 4.0E-05 2.0E-23 3.8E-15 2.1E-09

27 rs4904868 14q32.12 SLC24A4 7.6E-06 7.7E-16 1.4E-04 2.1E-18 2.2E-12 5.0E-07

28 rs2594935 15q13.1 OCA2 2.0E-27 1.4E-25 4.0E-21 1.9E-28 1.3E-27 0.040

29 rs728405 15q13.1 OCA2 1.1E-29 1.0E-26 5.2E-24 1.3E-35 4.1E-33 0.002

30 rs1800407 15q13.1 OCA2 5.6E-12 3.4E-11 3.1E-09 3.6E-10 9.0E-11 0.749

31 rs3794604 15q13.1 OCA2 3.0E-64 1.3E-46 3.8E-53 1.6E-42 7.7E-54 0.004

32 rs4778232 15q13.1 OCA2 1.3E-46 9.2E-36 3.7E-41 4.2E-37 4.4E-44 0.155

33 rs1448485 15q13.1 OCA2 8.7E-60 3.6E-42 3.4E-48 1.4E-38 8.3E-49 0.007

34 rs8024968 15q13.1 OCA2 1.2E-73 1.1E-53 1.4E-61 6.5E-50 7.7E-63 0.004

35 rs1597196 15q13.1 OCA2 1.5E-50 5.9E-37 4.2E-45 9.8E-41 2.2E-48 0.142

36 rs7179994 15q13.1 OCA2 8.9E-23 8.1E-21 2.4E-24 6.3E-26 3.3E-28 0.848

37 rs4778138 15q13.1 OCA2 4.6E-227 1.1E-199 1.1E-182 9.9E-164 3.0E-197 0.004

38 rs4778241 15q13.1 OCA2 4.4E-282 1.2E-257 1.9E-224 9.5E-227 8.1E-258 0.463

39 rs7495174 15q13.1 OCA2 1.1E-273 1.2E-231 1.0E-220 3.2E-202 2.0E-241 0.007

40 rs1129038 15q13.1 HERC2 <1.0E-300 <1.0E-300 <1.0E-300 <1.0E-300 <1.0E-300 0.606

41 rs12593929 15q13.1 HERC2 6.6E-225 1.9E- 92 1.3E-180 9.8E-173 2.5E-201 0.090

42 rs12913832 15q13.1 HERC2 <1.0E-300 <1.0E-300 <1.0E-300 <1.0E-300 <1.0E-300 0.602

43 rs7183877 15q13.1 HERC2 2.0E-142 8.3E-134 2.6E-103 1.1E-128 4.9E-131 0.005

44 rs11635884 15q13.1 HERC2 9.3E-30 1.9E-31 3.8E-36 1.3E-17 2.6E-29 7.7E-10

45 rs3935591 15q13.1 HERC2 <1.0E-300 <1.0E-300 <1.0E-300 <1.0E-300 <1.0E-300 0.893

46 rs7170852 15q13.1 HERC2 <1.0E-300 <1.0E-300 <1.0E-300 <1.0E-300 <1.0E-300 0.778

47 rs8041209 15q13.1 HERC2 2.6E-211 1.2E-178 4.6E-171 9.6E-167 2.3E-192 0.192

48 rs8028689 15q13.1 HERC2 2.2E-210 1.3E-178 1.1E-172 2.7E-168 3.3E-194 0.188

49 rs2240203 15q13.1 HERC2 1.1E-209 1.0E-176 1.4E-171 2.6E-167 5.4E-193 0.194

50 rs2240202 15q13.1 HERC2 6.5E-208 2.1E-177 9.8E-171 4.0E-167 2.1E-192 0.220

51 rs916977 15q13.1 HERC2 <1.0E-300 <1.0E-300 <1.0E-300 <1.0E-300 <1.0E-300 0.927

52 rs16950979 15q13.1 HERC2 9.6E-209 5.2E-178 2.1E-171 5.0E-168 2.6E-193 0.229

53 rs2346050 15q13.1 HERC2 4.6E-212 6.6E-180 4.3E-173 4.8E-169 7.2E-195 0.201

54 rs16950987 15q13.1 HERC2 7.5E-213 1.4E-180 4.4E-174 4.0E-170 4.7E-196 0.205

55 rs1667394 15q13.1 HERC2 <1.0E-300 <1.0E-300 <1.0E-300 <1.0E-300 <1.0E-300 0.932

56 rs12592730 15q13.1 HERC2 6.5E-208 2.1E-177 9.8E-171 4.0E-167 2.1E-192 0.220

57 rs1635168 15q13.1 HERC2 7.5E-219 2.6E-187 1.3E-181 3.8E-172 1.5E-201 0.062

58 rs7219915 17q25.3 NPLOC4 6.9E-04 2.2E-09 4.6E-09 6.4E-09 4.8E-11 0.730

59 rs9894429 17q25.3 NPLOC4 0.002 3.4E-07 6.3E-09 6.1E-08 2.4E-10 0.489

60 rs6058017 21q22.13 TTC3 0.314 0.890 0.061 0.353 0.109 0.221

61 rs2252893 21q22.13 TTC3 0.003 1.6E-04 3.6E-07 6.9E-09 9.1E-10 0.618

62 rs2835621 21q22.13 TTC3 0.003 1.7E-04 2.2E-07 8.7E-09 7.2E-10 0.723

63 rs2835630 21q22.13 TTC3 0.003 8.1E-05 1.4E-06 1.8E-08 4.0E-09 0.548

64 rs7277820 21q22.13 DSCR9 0.002 6.6E-05 5.9E-07 1.6E-08 2.1E-09 0.654

P values are adjusted for the effect of HERC2 rs12913832, except for SNPs in the chromosome 15q13.1 region.
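Table 13 mixes ordinary P values with censored entries such as <1.0E-300, which are only upper bounds. The Python sketch below, using hypothetical helper names and a few rows copied from the table, parses rows while keeping track of censored values and, for a chosen column, reports the SNP with the smallest P value per gene, breaking ties by row order.

```python
from typing import Dict, List, Tuple

COLUMNS = ["3color", "5color", "H", "S", "CHS1", "CHS2"]

# A few rows copied from Table 13.
ROWS = """\
17 rs1042602 11q14.3 TYR 0.097 0.623 0.200 0.590 0.301 0.396
21 rs1393350 11q14.3 TYR 4.3E-10 3.8E-14 3.1E-09 7.4E-18 3.3E-16 0.011
40 rs1129038 15q13.1 HERC2 <1.0E-300 <1.0E-300 <1.0E-300 <1.0E-300 <1.0E-300 0.606
58 rs7219915 17q25.3 NPLOC4 6.9E-04 2.2E-09 4.6E-09 6.4E-09 4.8E-11 0.730
59 rs9894429 17q25.3 NPLOC4 0.002 3.4E-07 6.3E-09 6.1E-08 2.4E-10 0.489
"""


def parse_p(token: str) -> Tuple[float, bool]:
    """Return (value, is_upper_bound); '<1.0E-300' becomes (1e-300, True)."""
    if token.startswith("<"):
        return float(token[1:]), True
    return float(token), False


def parse_rows(text: str) -> List[dict]:
    """Split whitespace-separated rows into records with parsed P values."""
    records = []
    for line in text.strip().splitlines():
        index, snp, chrom, gene, *pvalues = line.split()
        if len(pvalues) != len(COLUMNS):
            raise ValueError(f"unexpected column count in: {line!r}")
        record = {"index": int(index), "snp": snp, "chr": chrom, "gene": gene}
        record.update({col: parse_p(p) for col, p in zip(COLUMNS, pvalues)})
        records.append(record)
    return records


def smallest_p_per_gene(records: List[dict], column: str) -> Dict[str, str]:
    """SNP with the smallest P value per gene in `column` (first wins on ties)."""
    best: Dict[str, Tuple[str, float]] = {}
    for rec in records:
        p, _ = rec[column]
        if rec["gene"] not in best or p < best[rec["gene"]][1]:
            best[rec["gene"]] = (rec["snp"], p)
    return {gene: snp for gene, (snp, _) in best.items()}
```

The smallest-P rule is only an illustration and should not be read as the procedure used to choose the SNPs in Table 11: for NPLOC4 it picks rs7219915 in the 3color column, whereas Table 11 carries rs9894429 forward.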

Claims

1. A method for predicting the iris color of a human, the method comprising:

(a) obtaining a sample of the nucleic acid of the human;

(b) genotyping the nucleic acid for at least one of the following polymorphisms:

(v) a polymorphism which is (a) in the region between basepairs 233848903 and 234546690 on chromosome 2 according to NCBI Build 36 and is associated with variation in iris color; and/or is (b) rs2070959 or a polymorphic site which is in linkage disequilibrium with rs2070959 at an r² value of at least 0.5; and

2. The method of Claim 1 wherein step (b) comprises genotyping the nucleic acid for: (i) the SNP rs1800407 or a polymorphic site which is in linkage disequilibrium with rs1800407 at an r² value of at least 0.5; and/or

(ii) the SNP rs12896399 or a polymorphic site which is in linkage disequilibrium with rs12896399 at an r² value of at least 0.5.
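The claims admit surrogate markers that are in linkage disequilibrium with a named SNP at r² of at least 0.5. For two biallelic sites with allele frequencies p_A and p_B and two-locus haplotype frequency p_AB, the standard definitions are D = p_AB - p_A p_B and r² = D² / (p_A(1 - p_A) p_B(1 - p_B)). The Python sketch below computes this from phased haplotypes; it assumes phased data are available (for example from a reference panel), and because r² differs between populations, the result depends on the sample used. Function names are hypothetical.

```python
from typing import Sequence, Tuple


def r_squared(haplotypes: Sequence[Tuple[int, int]]) -> float:
    """Pairwise LD r^2 from phased two-locus haplotypes coded 0/1 per locus."""
    n = len(haplotypes)
    if n == 0:
        raise ValueError("no haplotypes")
    p_a = sum(a for a, _ in haplotypes) / n  # frequency of allele 1 at locus A
    p_b = sum(b for _, b in haplotypes) / n  # frequency of allele 1 at locus B
    p_ab = sum(1 for a, b in haplotypes if a == 1 and b == 1) / n
    d = p_ab - p_a * p_b  # disequilibrium coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("a locus is monomorphic; r^2 is undefined")
    return d * d / denom


def meets_claim_threshold(haplotypes: Sequence[Tuple[int, int]], threshold: float = 0.5) -> bool:
    """True if r^2 >= threshold (0.5 is the value recited in the claims)."""
    return r_squared(haplotypes) >= threshold


if __name__ == "__main__":
    # Made-up haplotypes: 3 x (1,1), 3 x (0,0), one (1,0), one (0,1).
    haps = [(1, 1)] * 3 + [(0, 0)] * 3 + [(1, 0), (0, 1)]
    print(r_squared(haps), meets_claim_threshold(haps))
```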

3. The method of Claim 1 or 2 wherein step (b) comprises genotyping the nucleic acid for: a polymorphism in the region between basepairs 76891593 and 77498447 on chromosome 17 which is associated with variation in iris color, preferably rs9894429 or a polymorphic site which is in linkage disequilibrium with rs9894429 at an r² value of at least 0.5.

4. The method of any of Claims 1 to 3 wherein step (b) comprises genotyping the nucleic acid for: a polymorphism in the region between basepairs 37100732 and 37761703 on chromosome 21 which is associated with variation in iris color, preferably rs7277820 or a polymorphic site which is in linkage disequilibrium with rs7277820 at an r² value of at least 0.5.

5. The method of any of Claims 1 to 4 wherein step (b) comprises genotyping the nucleic acid for: a polymorphism in the region between basepairs 233690968 and 234296843 on chromosome 1 which is associated with variation in iris color, preferably rs3768056 or a polymorphic site which is in linkage disequilibrium with rs3768056 at an r² value of at least 0.5.

6. The method of any of Claims 1 to 5 wherein step (b) comprises genotyping the nucleic acid for: a polymorphism in the region between basepairs 233848903 and 234546690 on chromosome 2 which is associated with variation in iris color, preferably rs2070959 or a polymorphic site which is in linkage disequilibrium with rs2070959 at an r² value of at least 0.5.
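Claims 3 to 6 each recite an NCBI Build 36 coordinate interval. The Python sketch below checks whether a position falls inside one of those intervals, treating both endpoints as inclusive (an assumption). It covers only the positional part of the claim language: whether a variant is also "associated with variation in iris color" cannot be decided from coordinates, and positions reported on a newer assembly would first need to be converted to Build 36. The labels and function name are hypothetical.

```python
from typing import Dict, Optional, Tuple

# Intervals recited in claims 3-6 (NCBI Build 36); endpoints assumed inclusive.
CLAIMED_REGIONS: Dict[str, Tuple[str, int, int]] = {
    "chr17 (rs9894429)": ("17", 76891593, 77498447),
    "chr21 (rs7277820)": ("21", 37100732, 37761703),
    "chr1 (rs3768056)": ("1", 233690968, 234296843),
    "chr2 (rs2070959)": ("2", 233848903, 234546690),
}


def claimed_region(chrom: str, position: int) -> Optional[str]:
    """Label of the claimed interval containing a Build 36 position, else None."""
    chrom = chrom[3:] if chrom.lower().startswith("chr") else chrom
    for label, (region_chrom, start, end) in CLAIMED_REGIONS.items():
        if chrom == region_chrom and start <= position <= end:
            return label
    return None


if __name__ == "__main__":
    print(claimed_region("chr17", 77000000))
    print(claimed_region("1", 234546690))
```

The chromosome is compared before the coordinates because the chr1 and chr2 intervals overlap numerically.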

7. The method of any of Claims 1 to 6 wherein step (b) further comprises evaluating the age of the human.

8. The method of any one of Claims 1 to 7 wherein step (b) further comprises evaluating the gender of the human.

9. The method of any one of Claims 1 to 8 wherein step (c) comprises predicting at least one of hue, saturation, colourfulness and chroma as a numeric variable based on the results of step (b).

10. The method of Claim 9 wherein step (c) comprises predicting hue and saturation as numeric variables based on the results of step (b).

11. A method for predicting the iris color of a human, the method comprising:

(a) obtaining a sample of the nucleic acid of the human;

(b) genotyping the nucleic acid for a polymorphism which is (i) in the region between basepairs 76891593 and 77498447 on chromosome 17 according to NCBI Build 36 and is associated with variation in iris color; and/or is (ii) rs9894429 or a polymorphic site which is in linkage disequilibrium with rs9894429 at an r² value of at least 0.5; and

(c) predicting the iris color based on the results of step (b).

12. A method for predicting the iris color of a human, the method comprising:

(a) obtaining a sample of the nucleic acid of the human;

(c) predicting the iris color based on the results of step (b).

13. A method for predicting the iris color of a human, the method comprising:

(a) obtaining a sample of the nucleic acid of the human;

(b) genotyping the nucleic acid for: a polymorphism which is (i) in the region between basepairs 233690968 and 234296843 on chromosome 1 according to NCBI Build 36 and is associated with variation in iris color; and/or is (ii) rs3768056 or a polymorphic site which is in linkage disequilibrium with rs3768056 at an r² value of at least 0.5; and

(c) predicting the iris color based on the results of step (b).

14. A method for predicting the iris color of a human, the method comprising:

(a) obtaining a sample of the nucleic acid of the human;

(b) genotyping the nucleic acid for a polymorphism which is (i) in the region between basepairs 233848903 and 234546690 on chromosome 2 according to NCBI Build 36 and is associated with variation in iris color; and/or is (ii) rs2070959 or a polymorphic site which is in linkage disequilibrium with rs2070959 at an r² value of at least 0.5; and

(c) predicting the iris color based on the results of step (b).

15. The method of any of Claims 11 to 14 wherein step (c) comprises predicting at least one quantitative color parameter of the iris as a numeric variable.

16. The method of any of Claims 11 to 14 wherein step (c) comprises a categorical prediction of the iris color, such as a categorical prediction of brown, blue or intermediate.

17. The method of any preceding claim wherein for each polymorphism to be genotyped in step (b), the method comprises contacting the sample of the nucleic acid of the human with a nucleic acid molecule that hybridises selectively to a genomic region encompassing the polymorphism.

18. The method of Claim 17 wherein the sample of the nucleic acid of the human is subjected to a nucleic acid amplification before being contacted with the nucleic acid molecule.
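Claim 16 recites a categorical prediction such as brown, blue or intermediate, but this section does not say which model produces it. One common choice for a three-way outcome is multinomial logistic regression, sketched below in Python with brown as the baseline category. This technique is an assumption here, and the coefficients are invented for illustration rather than taken from this document; the predictors could be, for instance, allele counts coded as in Table 11.

```python
import math
from typing import Dict, Sequence

# Hypothetical coefficients for illustration only; NOT from this document.
INTERCEPTS = {"blue": 1.0, "intermediate": 0.2}
BETAS = {"blue": [-2.0, 0.5], "intermediate": [-1.0, 0.3]}


def category_probabilities(
    x: Sequence[float],
    intercepts: Dict[str, float],
    betas: Dict[str, Sequence[float]],
    reference: str = "brown",
) -> Dict[str, float]:
    """Multinomial logistic model with `reference` as the baseline category."""
    scores = {reference: 0.0}  # baseline linear predictor fixed at 0
    for category, alpha in intercepts.items():
        coefs = betas[category]
        if len(coefs) != len(x):
            raise ValueError("coefficient and predictor lengths differ")
        scores[category] = alpha + sum(b * xi for b, xi in zip(coefs, x))
    top = max(scores.values())  # subtract the max for numerical stability
    exps = {c: math.exp(s - top) for c, s in scores.items()}
    total = sum(exps.values())
    return {c: e / total for c, e in exps.items()}


def predict_category(x: Sequence[float], intercepts=INTERCEPTS, betas=BETAS) -> str:
    """Category with the highest modelled probability."""
    probs = category_probabilities(x, intercepts, betas)
    return max(probs, key=probs.get)


if __name__ == "__main__":
    print(category_probabilities([1, 0], INTERCEPTS, BETAS))
    print(predict_category([1, 0]))
```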

19. The method of Claim 17 or 18 wherein the nucleic acid molecule is a primer and the method comprises performing a primer extension reaction and detecting the primer extension reaction product.

20. The method of Claim 19 wherein the primer extension reaction is a multiplex primer extension reaction.

21. A method of preparing a data carrier containing data on the predicted iris color of a human, the method comprising carrying out the method of any one of Claims 1 to 20 and recording the results on a data carrier.

22. A method of preparing a data carrier containing data on the predicted iris color of a human, the method comprising recording the results of a method carried out according to any one of Claims 1 to 20 on a data carrier.

23. The method of Claim 21 or 22 wherein the data is recorded in electronic form.

24. A method for predicting the iris color of a human based on the allele occurrences in a sample of their DNA of at least the following polymorphisms:

25. A method for predicting the iris color of a human based on the allele occurrences in a sample of their DNA of at least one of the following polymorphisms:

26. A method for creating a description of a human based on forensic testing, wherein the description includes a prediction of the iris color of the human based on the allele occurrences in a sample of their DNA of at least the following polymorphisms:

(iii) a polymorphism which is (a) in the region between basepairs 37100732 and 37761703 on chromosome 21 according to NCBI Build 36 and is associated with variation in iris color; and/or (b) rs7277820 or a polymorphic site which is in linkage disequilibrium with rs7277820 at an r² value of at least 0.5;

(iv) a polymorphism which is (a) in the region between basepairs 233690968 and 234296843 on chromosome 1 according to NCBI Build 36 and is associated with variation in iris color; and/or is (b) rs3768056 or a polymorphic site which is in linkage disequilibrium with rs3768056 at an r² value of at least 0.5; and

(v) a polymorphism which is (a) in the region between basepairs 233848903 and 234546690 on chromosome 2 according to NCBI Build 36 and is associated with variation in iris color; and/or is (b) rs2070959 or a polymorphic site which is in linkage disequilibrium with rs2070959 at an r² value of at least 0.5.

27. A method for creating a description of a human based on forensic testing, wherein the description includes a prediction of the iris color of the human based on the allele occurrences in a sample of their DNA of at least one of the following polymorphisms:

(b) a polymorphism which is (i) in the region between basepairs 37100732 and 37761703 on chromosome 21 according to NCBI Build 36 and is associated with variation in iris color; and/or is (ii) rs7277820 or a polymorphic site which is in linkage disequilibrium with rs7277820 at an r² value of at least 0.5; and

28. A method for genotyping polymorphisms indicative of human iris color comprising:

(a) obtaining a sample of the nucleic acid of a human; and

(b) genotyping the nucleic acid for at least one of the following polymorphisms: (i) a polymorphism which is (a) in the region between basepairs 76891593 and 77498447 on chromosome 17 according to NCBI Build 36 and is associated with variation in iris color; and/or is (b) rs9894429 or a polymorphic site which is in linkage disequilibrium with rs9894429 at an r² value of at least 0.5; and

29. A kit of parts for use in predicting the iris color of a human comprising:

(i) a primer pair suitable for amplifying the genomic region encompassing a polymorphism which is (a) in the region between basepairs 76891593 and 77498447 on chromosome 17 according to NCBI Build 36 and is associated with variation in iris color; and/or (b) rs9894429 or a polymorphic site which is in linkage disequilibrium with rs9894429 at an r² value of at least 0.5;

(ii) a primer pair suitable for amplifying the genomic region encompassing a polymorphism which is (a) in the region between basepairs 37100732 and 37761703 on chromosome 21 according to NCBI Build 36 and is associated with variation in iris color; and/or is (b) rs7277820 or a polymorphic site which is in linkage disequilibrium with rs7277820 at an r² value of at least 0.5;

(iii) a primer pair suitable for amplifying the genomic region encompassing a polymorphism which is (a) in the region between basepairs 233690968 and 234296843 on chromosome 1 according to NCBI Build 36 and is associated with variation in iris color; and/or is (b) rs3768056 or a polymorphic site which is in linkage disequilibrium with rs3768056 at an r² value of at least 0.5; or

(iv) a primer pair suitable for amplifying the genomic region encompassing a polymorphism which is (a) in the region between basepairs 233848903 and 234546690 on chromosome 2 according to NCBI Build 36 and is associated with variation in iris color; and/or is (b) rs2070959 or a polymorphic site which is in linkage disequilibrium with rs2070959 at an r² value of at least 0.5.

30. The kit of Claim 29 wherein each of the primer pairs is suitable for use together in a multiplex polymerase chain reaction.
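Claim 30 requires the primer pairs to be usable together in one multiplex PCR. One crude screen sometimes applied before wet-lab testing is to flag primers whose 3' ends are complementary to another primer in the pool, since such pairs may form primer dimers. The Python sketch below implements that string-matching screen with a hypothetical 4-base window; it is not a thermodynamic analysis and says nothing about how the Table 12 primers were actually designed. All names are hypothetical.

```python
from itertools import combinations
from typing import Dict, List, Tuple

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an A/C/G/T sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def three_prime_pairs(p1: str, p2: str, k: int = 4) -> bool:
    """True if the last k bases of p1 are complementary to a stretch of p2.

    Antiparallel stretches pair when one is the reverse complement of the
    other, so the test is whether reverse_complement(p1[-k:]) occurs in p2.
    """
    return len(p1) >= k and reverse_complement(p1[-k:]) in p2.upper()


def dimer_risk(p1: str, p2: str, k: int = 4) -> bool:
    """Flag the pair if either primer's 3' end can pair with the other."""
    return three_prime_pairs(p1, p2, k) or three_prime_pairs(p2, p1, k)


def flag_pairs(primers: Dict[str, str], k: int = 4) -> List[Tuple[str, str]]:
    """All primer pairs, including self-pairs, flagged by the crude 3' screen."""
    names = list(primers)
    pairs = list(combinations(names, 2)) + [(n, n) for n in names]
    return [(a, b) for a, b in pairs if dimer_risk(primers[a], primers[b], k)]


if __name__ == "__main__":
    pool = {"a": "ACGTACGTGAAT", "b": "CCCCATTCCC", "c": "GGGGGGGG"}
    print(flag_pairs(pool))
```

A 4-base window flags many real primer pairs; a longer window, or a dedicated primer-design tool, gives a more discriminating screen.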

31. A kit of parts for use in predicting the iris color of a human comprising:

(i) a nucleic acid molecule that hybridizes selectively to a genomic region encompassing a polymorphism which is (a) in the region between basepairs 76891593 and 77498447 on chromosome 17 according to NCBI Build 36 and is associated with variation in iris color; and/or is (b) rs9894429 or a polymorphic site which is in linkage disequilibrium with rs9894429 at an r² value of at least 0.5;

(ii) a nucleic acid molecule that hybridizes selectively to a genomic region encompassing a polymorphism which is (a) in the region between basepairs 37100732 and 37761703 on chromosome 21 according to NCBI Build 36 and is associated with variation in iris color; and/or is (b) rs7277820 or a polymorphic site which is in linkage disequilibrium with rs7277820 at an r² value of at least 0.5;

(iii) a nucleic acid molecule that hybridizes selectively to a genomic region encompassing a polymorphism which is (a) in the region between basepairs 233690968 and 234296843 on chromosome 1 according to NCBI Build 36 and is associated with variation in iris color; and/or is (b) rs3768056 or a polymorphic site which is in linkage disequilibrium with rs3768056 at an r² value of at least 0.5;

(iv) a nucleic acid molecule that hybridizes selectively to a genomic region encompassing a polymorphism which is (a) in the region between basepairs 233848903 and 234546690 on chromosome 2 according to NCBI Build 36 and is associated with variation in iris color; and/or is (b) rs2070959 or a polymorphic site which is in linkage disequilibrium with rs2070959 at an r² value of at least 0.5.

32. The kit of Claim 31 wherein each of the nucleic acid molecules is a primer suitable for performing a primer extension reaction.

33. A solid substrate for use in predicting the iris color of a human, the solid substrate having attached thereto:

34. The solid substrate of Claim 33 wherein each of the nucleic acid molecules is a primer suitable for performing a primer extension reaction.