WO2008059165A2 - Methods and tools for determining the origin of an individual

Info

Publication number: WO2008059165A2
Application number: PCT/FR2007/052330
Authority: WO
Inventors: Jean-Paul Moisan; Chrystelle Richard
Original assignee: Institut Genetique Nantes Atlantique (Igna)
Priority date: 2006-11-15
Filing date: 2007-11-13
Publication date: 2008-05-22
Also published as: WO2008059165A3; FR2914316A1

Abstract

The present application relates to methods for determining the ethnogeographic origin of an individual based on a nucleic acid sample. It also relates to genetic markers and to tools (primers, probes, chips, etc.) that can be used for determining this origin, and also to analytical kits and tools. The invention can be used in various contexts, such as in the case of legal expertise.

Description

Methods and tools to determine the origin of a subject

Introduction

The present application relates to methods of analyzing nucleic acid samples for determining the ethnogeographic origin of an individual. It also concerns a precise collection of genetic markers and tools (primers, probes, chips, etc.) that can be used to determine this origin, as well as kits and analysis tools. The invention can be used in a variety of contexts, such as forensic expertise and research studies (population migration and/or dispersal).

The discovery by Jeffreys et al. ¹,² in the 1980s that minisatellite sequences (or "variable number of tandem repeats", VNTRs) could provide a genetic fingerprint unique to each individual (identical twins excepted) placed DNA at the heart of forensic expertise ³. This interest has continued to grow, and the analysis of STRs ("short tandem repeats") now gives the authorities the ability to identify suspects by linking their genetic fingerprint to traces found at the scene of a crime ⁴, and also to identify the bodies of missing persons of known identity. This technology has already made it possible to solve many cases and to bring certain miscarriages of justice to light. In recent years, many governments have set up national DNA fingerprint databases (such as the FNAEG in France), which group together the STR profiles of individuals who have committed criminal acts. This makes it possible to compare the DNA profile found at a crime scene with all the profiles stored in the database and thus to show the involvement of an individual. However, this method provides no clue as to whether an individual belongs to a specific human group, which is crucial information when searching for suspects or when identifying human remains that cannot be matched to any person of known identity.

A genetic test for determining the traits of an individual has been proposed in WO2004/016768. However, this test requires the analysis of a very large number of genetic markers, namely 176 AIMs (Ancestry Informative Markers). Moreover, these markers are located indiscriminately inside and outside the coding regions of genes. The analysis of coding sequences poses regulatory problems in a number of countries, such as France.

Thus, there is currently no simple genetic method to obtain information on the ethnogeographic origin of an individual in the absence of any eyewitness.

Summary of the Invention

The present application proposes methods and tools for determining the ethnogeographic origin of an individual. More particularly, the present invention results from the identification of precise sets of genetic markers that make it possible to predict the membership of an individual in a particular ethnogeographic group, in the absence of a witness or of evidence in the investigation.

A first subject of the invention thus resides in a method for analyzing a sample comprising nucleic acids, making it possible to determine the ethnogeographic origin of the individual from whom the sample originates, the method comprising determining (in vitro or ex vivo), in the nucleic acid (e.g., DNA) sample, the alleles of a set of single nucleotide polymorphisms (SNPs) located in non-coding regions of the genome, to obtain a set of alleles, this set of alleles being an indication of the ethnogeographic group.

A particular object of the invention is a method for determining the ethnogeographic origin of an individual, comprising determining (in vitro or ex vivo), in a sample of nucleic acid (e.g., DNA) from the individual, the alleles of a set of single nucleotide polymorphisms (SNPs) located in non-coding regions of the genome, to obtain a set of alleles, this set of alleles being an indication of the ethnogeographic group to which the individual belongs. Advantageously, the set of SNPs comprises at least 5 SNPs chosen from the SNPs described in Table 1, preferably at least 6, 7, 8, 9, 10, 15, 20, 25 or 30 SNPs described in Table 1. In a particular embodiment, the set of SNPs comprises at least 9 SNPs chosen from the SNPs described in Table 1. Examples of sets of SNPs that can be used in the present invention are described later in the text and in the experimental section.

As will be described in detail in the rest of the text, the set of alleles determined from the DNA sample of the individual is typically compared to one or more sets of reference alleles characteristic of ethnogeographic groups, thus making it possible to calculate the probability of belonging to one of these groups, for example by a Bayesian method. Advantageously, the sets of reference alleles are sets characteristic of the European, sub-Saharan African, Asian, North African and/or Indian populations, this list not being exhaustive.
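The comparison against reference allele sets can be illustrated with a minimal sketch. It is not the software used in the study: it assumes Hardy-Weinberg proportions within each group, independence between SNPs and a uniform prior, and the group names, SNP names and allele frequencies below are purely hypothetical.

```python
from math import comb


def genotype_likelihood(n_allele1, p, eps=1e-6):
    """P(genotype | group) for one biallelic SNP, assuming Hardy-Weinberg.

    n_allele1 is the number of copies of allele 1 (0, 1 or 2) and p is the
    reference frequency of allele 1 in the group. Frequencies are clipped
    away from 0 and 1 so a single unexpected allele cannot zero everything.
    """
    p = min(max(p, eps), 1 - eps)
    return comb(2, n_allele1) * p ** n_allele1 * (1 - p) ** (2 - n_allele1)


def posterior_membership(genotype, ref_freqs, priors=None):
    """Posterior probability of each reference group given a multi-SNP genotype."""
    groups = list(ref_freqs)
    if priors is None:
        # Assumption: uniform prior over the reference groups.
        priors = {g: 1 / len(groups) for g in groups}
    joint = {}
    for g in groups:
        value = priors[g]
        for snp, n_allele1 in genotype.items():
            # Assumption: SNPs are independent, so likelihoods multiply.
            value *= genotype_likelihood(n_allele1, ref_freqs[g][snp])
        joint[g] = value
    total = sum(joint.values())
    return {g: joint[g] / total for g in groups}


# Hypothetical reference frequencies of allele 1 (not taken from the tables).
ref_freqs = {
    "group_A": {"snp_1": 0.9, "snp_2": 0.1},
    "group_B": {"snp_1": 0.1, "snp_2": 0.9},
}
posterior = posterior_membership({"snp_1": 2, "snp_2": 0}, ref_freqs)
```

With real data, the reference frequencies would come from genotyped reference panels, and the number of SNPs would be large enough for the posterior to be informative.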

In preferred embodiments, the alleles are determined by sequencing, selective hybridization and/or selective amplification; and/or the nucleic acid sample is derived from a biological fluid or tissue of the individual; and/or the nucleic acid sample is from a forensic sample; and/or the nucleic acid from the individual is amplified beforehand.

Another object of the present application lies in a kit usable for implementing a method as defined above, comprising a set of nucleotide probes specific for at least one allele, preferably each allele, of the SNPs of the set of SNPs and/or a set of nucleotide primers allowing specific amplification of at least one allele, preferably each allele, of the SNPs of the set of SNPs. The nucleotide probes are advantageously immobilized on a support. Another subject of the present application resides in a product (or a device) comprising a support on which nucleotide probes are immobilized, said probes being specific for at least one allele, preferably each allele, of the SNPs of the set of SNPs defined above.

The invention also relates to the use of a set of nucleotide probes specific for at least one allele, preferably each allele, of the SNPs of the set of SNPs defined above and/or a set of nucleotide primers allowing specific amplification of at least one allele, preferably each allele, of the SNPs of the set of SNPs defined above, for determining in vitro the ethnogeographic origin of an individual or a sample.

The invention can be used on samples from any individual, of any age or origin, and can be implemented for example in the context of forensic expertise, or for studies of population migration and/or dispersal.

Legend of Figures

Figure 1. Fst between populations and overall for each marker.

Figure 2. Overall Fst for each marker in ascending order

Figure 3. Assignment of samples from sub-Saharan Africa, East Asia and Europe to genetic groups, derived by Bayesian calculation from the analysis of the 32 SNPs in the study (Groups I to V). Each individual is represented by a vertical line partitioned into X segments, each representing the estimated fraction of membership in one of the X genetic groups. The origin of the individuals is as follows: 1 to 115 (sub-Saharan Africa); 116 to 231 (East Asia) and 232 to 348 (Europe).

Figure 4. Ethnogeographic inference on the genotypes of the 9 group I SNPs (Fst > 0.8) of individuals from sub-Saharan Africa (1 to 115), East Asia (116 to 231) and Europe (232 to 348).

Figure 5. Ethnogeographic inference on the genotypes of the 32 SNPs of the study (Groups I to V) of individuals from sub-Saharan Africa (1 to 115), East Asia (116 to 231), Europe (232 to 348) and North Africa (349 to 463) with 3 genetic clusters (X = 3).

Figure 6. Ethnogeographic inference on the genotypes of the 32 SNPs of the study (Groups I to V) of individuals from sub-Saharan Africa (1 to 115), East Asia (116 to 231), Europe (232 to 348) and North Africa (349 to 463) with 4 genetic clusters (X = 4).

Figure 7. Ethnogeographic inference on the genotypes of the 32 SNPs of the study (Groups I to V) of individuals from sub-Saharan Africa (1 to 115), East Asia (116 to 231), Europe (232 to 348), North Africa (349 to 463) and India (464 to 519) with 5 genetic clusters (X = 5).

Detailed Description of the Invention

The study of human diversity and population structure has been the subject of much research in recent years. Lewontin ⁵, one of the pioneers of this research, found by studying a large number of biochemical markers of blood groups in different populations that 85.5% of the variation was observed within populations, while only 6.3% of the differences were observed between populations. The conclusion was therefore that, since the majority of differences reside within populations, the very notion of "race" had no genetic reality. Further diversity studies were then conducted by analyzing DNA markers of the RFLP, STR and SNP type on autosomes (as distinct from mitochondrial DNA and the Y chromosome) ⁶. The conclusion remains the same as Lewontin's: approximately 83 to 88% of autosomal variation is found within populations, while 9 to 13% is found between continental groups. This extreme similarity between human populations is largely explained by the recent origin of all populations from a common ancestor in Africa ⁷,⁸.

After the sequencing of the human genome and despite all the advances made in the biomedical field, the genetic factors causing common diseases remain largely unknown.

The international HapMap project ⁹,¹⁰, launched in 2002, aims to map all the common variations among individuals across the 3 billion nucleotide pairs that make up the human genome, and to define the structure of linkage blocks. This study was conducted on 270 DNAs from 3 different continental populations (Africa, Asia and Europe), which makes it possible to avoid cataloguing rare polymorphisms. A project similar to HapMap, carried out by the company Perlegen, mapped 1,586,383 SNPs distributed uniformly along the genome in 71 individuals from 3 populations (Africa, Asia and Europe) ¹¹.

Although it is known that variants that are physically close are often linked together, the linkage disequilibrium phenomenon remains complex and varies from one region of the genome to another and also between populations ¹² .

Association studies succeed by estimating the difference in allele frequency of certain markers between a cohort of patients with a disease (the cases) and healthy people (the controls). The search for weak genetic effects can thus be biased if there is stratification between the two populations compared ¹³,¹⁴. Studying markers that show population specificity can therefore make it possible to detect possible stratification between the two cohorts ¹⁵.

A new methodology for mapping complex diseases is mapping by admixture linkage disequilibrium (MALD) ¹⁶,¹⁷. MALD is based on the analysis of patients of mixed origin, such as African Americans or Hispanic Americans. MALD should be effective for diseases that are more frequent in certain populations ¹⁸ (for example multiple sclerosis, for which the genetic risk is higher in Europeans, or prostate cancer, which is more common among Africans). With this in mind, many teams are working to build a dense map of markers distributed throughout the genome and showing large frequency differences between the parental populations ¹⁹.

A simple measure of population differentiation is the Fst statistic (Wright's fixation index), which measures the fraction of total genetic variation due to differences between populations. Fst is quantified from the variation in allele frequency at the loci tested across a set of populations. Its value lies between 0 (no genetic difference) and 1 (fixed difference between populations) ²⁰. Calculating it is a way of detecting possible signatures of natural selection, which produce a systematic deviation of Fst for the gene under selection and for the surrounding genetic markers ²¹,²²,²³.

The most obvious example of natural selection in humans is the Duffy blood group locus, which has 3 alleles, FY*B, FY*A and FY*O ²⁴. The 3 alleles show very strong differentiation according to the geographic region considered. Thus, the distribution of the silent allele (which corresponds to an absence of the Fy antigen on red blood cells) coincides with the areas where Plasmodium vivax malaria is endemic, and this allele is fixed in sub-Saharan populations ²³,²⁵,²⁶. This silent phenotype is due to a transition in the promoter of the FY*B gene (Duffy blood group system [MIM 110700]). Another example of selection is the capacity to digest milk in adulthood (lactase persistence [MIM 223100]), a trait that varies between populations and is associated in Europeans with two polymorphisms located far upstream of the lactase gene. The allele frequency of these SNPs varies greatly between populations of European origin and those of African or Asian origin ²⁷,²⁸,²⁹.

The study of Fst for more than 20,000 SNPs spread across the genome suggests that at least 174 genes are subject to natural selection ³⁰. Recently, Hinds et al. have questioned this systematic genome-wide approach by emphasizing that high Fst values are found in both genic and non-genic regions and affect non-synonymous and synonymous coding SNPs alike ¹¹. The ideal genetic markers for distinguishing two populations are therefore those for which an allele is fixed in one population and absent from the other. In reality, such loci, called private, are rare in the genome ¹⁰,¹¹. Since they are distributed as widely in genic regions as in non-genic ones, and in coding regions as in non-coding ones, they do not appear to be products of natural selection.
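To make the Fst scale concrete, the following minimal sketch computes Wright's Fst for one biallelic locus as (HT - HS) / HT, where HT is the expected heterozygosity of the pooled populations and HS the mean expected heterozygosity within populations. It assumes equally weighted populations and ignores sampling corrections, so it is a simplification and not the Weir and Cockerham estimator used in the experimental part. The function name and the example frequencies are hypothetical.

```python
def wright_fst(freqs):
    """Wright's Fst for one biallelic locus.

    freqs holds the frequency of allele 1 in each population.
    Assumption: populations are equally weighted, no sample-size correction.
    """
    k = len(freqs)
    p_bar = sum(freqs) / k
    h_total = 2 * p_bar * (1 - p_bar)           # pooled expected heterozygosity
    if h_total == 0:
        return 0.0                              # locus is monomorphic everywhere
    h_within = sum(2 * p * (1 - p) for p in freqs) / k
    return (h_total - h_within) / h_total


identical = wright_fst([0.5, 0.5])   # same frequency in both populations
private = wright_fst([1.0, 0.0])     # allele fixed in one, absent in the other
partial = wright_fst([0.9, 0.1])     # strong but incomplete differentiation
```

The two extremes match the bounds given in the text: identical frequencies give 0, and an allele fixed in one population and absent from the other (a private locus) gives 1.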

The present application results from the identification of precise genetic markers selected to allow the characterization of the ethnogeographic origin of individuals (or of samples from individuals). More specifically, the invention shows that the ethnogeographic origin of individuals can be determined on the basis of a limited number of genetic markers located in non-coding regions of the genome. Thus, 32 SNPs were selected and tested on a panel of individuals from 5 populations (Europe, Asia, Africa, North Africa and India). The SNPs were genotyped by the allelic discrimination technique, which combines amplification of the DNA portion carrying the variant with detection of this variant using two probes, each specific for one of the alleles. For each marker, a panel of about 100 DNAs per population was tested, while the Indian population was characterized on 56 DNAs. Genotyping each of these SNPs allowed us to calculate the allele frequency of each SNP in each of the groups. The frequencies of these SNPs differ greatly between the DNAs from sub-Saharan Africa, Asia, Europe and North Africa.

The genetic markers identified are not directly correlated with a distinctive phenotypic trait that differs between populations, such as skin color. Indeed, that trait is subject to extremely strong selective forces, which implies that the same skin color can reflect a common adaptation just as well as a common genetic origin. Thus, even though sub-Saharan Africans, the tribal peoples of South India and the Aborigines have similar skin pigmentation, they are no more similar to one another than to other populations ³¹. In addition, the identified genetic markers are located in non-coding regions of the genome, which ensures compliance with French legislation (Article 706-54 of the Code of Criminal Procedure).

Moreover, the markers used are preferably located on as many different chromosomes as possible and, where they are located on the same chromosome, as far apart as possible.

Finally, the markers of the invention are of the SNP type, and are mainly transitions or transversions, which facilitates their analysis. Single nucleotide polymorphisms (SNPs) are the most abundant form of variation in the human genome and it is estimated that the global population shares about 10 million sites (or 1 variant every 300 bases) ⁹ . They correspond to the change of a single nucleotide, by transition, transversion, insertion or deletion, on a DNA sequence.

The list of identified SNPs is described in Table 1 with, for each SNP, the two alleles encountered. This list makes it possible to define (sub) sets of SNPs characteristic of the ethnogeographic origin of human individuals, comprising at least one, preferably at least 5 SNPs represented in Table 1. It is understood that these sets of SNPs can also include additional SNPs.

A first object of the invention thus lies in a method for determining the ethnogeographic origin of an individual, comprising determining, in a sample of DNA from the individual, the alleles of a precise set of SNPs located in non-coding regions of the genome, said set comprising at least 5 SNPs selected from the SNPs described in Table 1, preferably at least 6, 7, 8, 9, 10, 15, 20, 25 or 30 SNPs described in Table 1, to obtain a set of alleles, this set of alleles being an indication of the ethnogeographic group to which the individual belongs. Another object of the invention resides in a method for analyzing a sample containing nucleic acids, the method comprising determining, in said sample, the alleles of a precise set of SNPs located in non-coding regions of the genome, said set comprising at least 5 SNPs selected from the SNPs described in Table 1, preferably at least 6, 7, 8, 9, 10, 15, 20, 25 or 30 SNPs described in Table 1, to obtain a set of alleles, this set of alleles being an indication of the ethnogeographic group from which the sample comes.

In a first particular embodiment, the set of SNPs comprises all 32 SNPs mentioned in Table 1, and the method comprises a determination of each allele of said SNPs.

In another embodiment, the set of SNPs comprises at least the 9 SNPs of group I as defined in Table 4, namely SNPs M1, M2, M5, M6, M7, M9, M15, M24 and M30. Such a set of SNPs makes it possible to determine the membership of an individual in the African, Asian or European group.

In another embodiment, the set of SNPs comprises at least 5 SNPs among the SNPs M1, M2, M3, M4, M5, M6, M7, M8 and M9. Such a set of SNPs makes it possible to determine the membership of an individual in the African group (see Table 3 for specific alleles).

In another embodiment, the set of SNPs comprises at least the SNPs M20, M21, M22, M23, M24, M25, M26, M27, M28, M29, M30, M31 and M32. Such a set of SNPs makes it possible to determine the membership of an individual in the European group (see Table 3 for specific alleles).

In another embodiment, the set of SNPs comprises at least 5 SNPs among the SNPs M10, M11, M12, M13, M14, M15, M16, M17, M18 and M19. Such a set of SNPs makes it possible to determine the membership of an individual in the Asian group (see Table 3 for specific alleles). Examples of population-specific reference profiles are provided in Table 5.
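The marker subsets named in the embodiments above can be written down as sets, which gives a quick way to check what a candidate panel would cover. This is only an illustrative sketch: the marker names M1 to M32 are those of Table 1, while the function name and the returned keys are hypothetical.

```python
# Marker subsets as listed in the embodiments above.
GROUP_I = {"M1", "M2", "M5", "M6", "M7", "M9", "M15", "M24", "M30"}
AFRICAN = {f"M{i}" for i in range(1, 10)}      # M1 to M9
ASIAN = {f"M{i}" for i in range(10, 20)}       # M10 to M19
EUROPEAN = {f"M{i}" for i in range(20, 33)}    # M20 to M32


def panel_capabilities(panel):
    """Report which of the described embodiments a candidate SNP panel satisfies."""
    panel = set(panel)
    return {
        # All 9 group I SNPs: African / Asian / European assignment.
        "three_way_group_I": GROUP_I <= panel,
        # At least 5 SNPs among M1 to M9: membership in the African group.
        "african": len(panel & AFRICAN) >= 5,
        # At least 5 SNPs among M10 to M19: membership in the Asian group.
        "asian": len(panel & ASIAN) >= 5,
        # At least all of M20 to M32: membership in the European group.
        "european": EUROPEAN <= panel,
    }


caps = panel_capabilities(GROUP_I)
```

For example, the group I panel alone satisfies the three-way embodiment and, because six of its markers fall within M1 to M9, the African-group embodiment as well.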

It is understood that the set of SNPs can be supplemented by other markers or SNPs, in order to further refine the method, if it is useful. Nevertheless, the sets of markers of the invention can make it possible to determine the ethnogeographic origin of the individuals with total reliability.

The determination of the set of alleles can be performed simultaneously, in parallel or sequentially.

Different techniques can be used to determine the alleles of an SNP in a sample, such as, for example, allele-specific hybridization (5' nuclease assay, LightCycler, chip hybridization, etc.), primer extension (minisequencing, SNaPshot, pyrosequencing, allele-specific extension, mass spectrometry), allele-specific oligonucleotide ligation, invasive cleavage, sequencing, selective hybridization, use of oligonucleotide-coated probes, nucleic acid amplification, ligation-PCR or any molecular biology technique useful for genotyping. These methods may include the use of a nucleic probe (e.g. an oligonucleotide) capable of selectively or specifically detecting an SNP allele in the sample. The amplification can be carried out according to various methods known per se to those skilled in the art, such as PCR, LCR (ligase chain reaction), strand displacement amplification (SDA), the use of allele-specific oligonucleotides (ASO), allele-specific amplification, Southern blotting, single-strand conformation analysis (SSCA), electrophoresis, etc.

Several detection methods can be used to analyze the products of each type of reaction: fluorescence, luminescence, size measurement, mass measurement, etc. Moreover, the reaction can be carried out in solution or on a solid support. According to a preferred embodiment, the method comprises the detection of the presence or absence of an allele by selective hybridization and/or by selective amplification.

Selective hybridization is typically performed using nucleic probes, preferably immobilized on a support, such as a solid or semi-solid support having at least one surface, flat or not, allowing the immobilization of nucleic probes. Such supports are for example a slide, bead, membrane, filter, column, plate, etc. They can be made of any compatible material, such as glass, silica, plastic, fiber, metal, polymer, etc. Nucleic probes can be any nucleic acid (DNA, RNA, PNA, etc.), preferably single-stranded, comprising a sequence specific for an allele of an SNP. The probes typically comprise from 5 to 300 bases, preferably from 8 to 150, more preferably fewer than 100, and even more preferably fewer than 60, 50, 40 or 30 bases. The probes may be synthetic oligonucleotides, produced on the basis of the sequences of the alleles to be detected, according to conventional synthesis techniques. Such oligonucleotides typically have from 10 to 50 bases, preferably from 20 to 40, for example about 25 bases.

In a particular mode of implementation, several different oligonucleotides (or probes) are used to detect an allele of interest. This may include specific oligonucleotides centered differently on the SNP to be analyzed.

In another embodiment, a pair of probes can be used to analyze each biallelic SNP, one member of which is perfectly matched to one of the alleles and whose other member is perfectly matched to the other allele. Of course, it is possible to combine these two embodiments.

Specific examples of probes usable for the implementation of the invention are described in Table 2. The probes may be synthesized beforehand and then deposited on the support, or synthesized directly in situ, on the support, according to methods known per se to those skilled in the art. The probes can also be manufactured by genetic techniques, for example by amplification, recombination, ligation, etc.

The probes thus defined constitute another object of the present application, as well as their uses (essentially in vitro) for determining the ethnogeographic origin of an individual.

The hybridization can be carried out under standard conditions known to those skilled in the art and adjustable by them (see for example Sambrook et al., (1989) Molecular Cloning, Cold Spring Harbor Laboratory Press). In particular, the hybridization can be carried out under conditions of high, medium or low stringency, depending on the desired level of sensitivity, the quantity of available material, etc. For example, suitable hybridization conditions include a temperature of 55 to 65 °C for 2 to 18 hours. After hybridization, different washes can be performed to remove unhybridized molecules, typically in SSC buffers comprising SDS, such as a buffer comprising 0.1 to 10X SSC and 0.01 to 0.5% SDS.

The selective amplification is preferably carried out using a primer or a pair of primers for amplifying a region of the nucleic acid carrying the SNP to be analyzed. The primer may be specific for a sequence of the SNP or for a region flanking the sequence of the SNP in a nucleic acid of the sample. The primer typically comprises a single-stranded nucleic acid, preferably between 5 and 50 bases in length, more preferably between 5 and 30.

Specific examples of primers usable for the implementation of the invention are described in Table 2.

Such primers are another object of the present application, as well as their use (essentially in vitro) to determine the ethnogeographic origin of an individual. In this regard, another object of the invention lies in the use (in vitro) of a nucleotide primer allowing the amplification of an SNP as defined above to determine the ethnogeographic origin of an individual. Another particular object of the invention resides in the use (in vitro) of a set of nucleotide primers allowing the amplification of a set of SNPs as defined above, in order to determine the ethnogeographic origin of an individual.

The method of the invention can be performed on any sample comprising nucleic acids. Mention may advantageously be made of a sample of tissue (bone, muscle) or of a biological fluid comprising nucleic acids, typically a sample of blood, sperm, saliva, urine, stool, hair, skin, etc. The method may also be practiced on nucleic acids that are partially damaged, degraded and/or present in very small amounts.

The sample can be obtained by any technique known per se, for example by direct sampling, by non-invasive techniques, from sample collections or banks, from sealed evidence containing samples taken at the scene of a crime or other offense, etc. The sample may also be pre-treated to make the nucleic acids more accessible, for example by lysis (mechanical, chemical, enzymatic, etc.), purification, centrifugation, separation, dilution, etc. The sample can also be labeled, to facilitate the determination of the presence of nucleic acids (fluorescent, radioactive, luminescent, chemical, enzymatic labeling, etc.). Moreover, the nucleic acids of the sample can be amplified prior to the SNP analysis step.

According to a particular example of implementation of the invention, a sample is taken from the individual. The sample is optionally processed to make the nucleic acids more accessible and/or to amplify the nucleic acids (or a fraction thereof). The nucleic acids are then brought into contact with nucleic probes as defined above (optionally immobilized on a support) and the hybridization profile obtained is determined, making it possible to determine or predict the ethnogeographic group to which the individual belongs. Alternatively, the nucleic acids are brought into contact with nucleic primers as defined above and the amplification product is analyzed, making it possible to determine or predict the ethnogeographic group to which the individual belongs.

Another object of the present application lies in a kit usable for implementing a method as defined above, comprising a set of nucleotide probes specific for at least one allele, preferably each allele, of the SNPs of a set of SNPs as defined above and/or a set of nucleotide primers allowing specific amplification of each SNP of a set of SNPs as defined above. The nucleotide probes are advantageously immobilized on a support.

The term "specific", when referring to hybridization or amplification, refers to the fact that hybridization or amplification makes it possible to discriminate, according to the conditions used, between two alleles of an SNP. Thus, a specific probe of an allele hybridizes, under appropriate conditions, only to this allele. Similarly, a specific primer of an allele makes it possible, under appropriate conditions, to amplify only this allele.

Another subject of the present application resides in a product (or a device) comprising a support on which nucleotide probes are immobilized, said probes being specific for at least one allele, preferably each allele, of the SNPs of a set of SNPs defined above. The support can be any solid or semi-solid support having at least one surface, flat or not, allowing the immobilization of nucleic acids or polypeptides. Such supports are for example a slide, bead, membrane, filter, column, plate, etc. They can be made of any compatible material, such as glass, silica, plastic, fiber, metal, polymer, polystyrene, teflon, etc. The reagents can be immobilized on the surface of the support by known techniques or, in the case of nucleic acids, synthesized directly in situ on the support. Immobilization techniques include passive adsorption ³² and covalent bonding. Techniques are described for example in WO90/03382 and WO99/46403. The probes immobilized on the support can be arranged according to a pre-established layout, to facilitate the detection and identification of the complexes formed, and at a variable and adaptable density.

The invention can be used to determine the probability of membership of any individual in any ethnogeographic group, such as for example the European, sub-Saharan African, Asian, North African and/or Indian populations, this list not being exhaustive.

Other aspects and advantages of the present invention will appear on reading the examples which follow, which should be considered as illustrative and not limiting.

EXAMPLES

1. Materials and Methods

1.1. Selection of markers

A comprehensive study of each chromosome was conducted on the SNP database (dbSNP) of the National Center for Biotechnology Information (NCBI), searching among the SNPs genotyped in the 3 populations of East Asia, Europe and Africa for those with the largest differences in allele frequency. This resulted in the selection of about 100 SNPs distributed across the genome. Among these SNPs, the 32 with the highest discriminating power were retained.

The complete sequence of the 32 SNPs is presented in Table 1.

The selected genetic markers are all SNPs and are transitions or transversions. To comply with French law (Article 706-54 of the Code of Criminal Procedure), all markers were chosen from non-coding regions of the genome. In addition, for a given marker group, efforts were made to select markers on as many chromosomes as possible and, where they are located on the same chromosome, as far apart as possible. The SNPs of our study were selected on the basis of their discriminating allele frequencies between the 3 continental populations of Africa, Asia and Europe.

Of the 32 SNPs selected, only the SNPs associated with the GUCY2D ¹⁹ , FY ³³ and LAC ²⁸ genes (respectively M31, M9, M32) have been described in the literature as having possible allelic discrimination between populations. However, no set of SNPs according to the invention has been described previously, making it possible to provide reliable information on the ethnogeographic membership of individuals.

Table 3 shows the distribution of the alleles of each marker among the different populations. Three groups can be distinguished:

SNPs to distinguish Africans from Europeans and Asians: group M1 to M9

SNPs to distinguish Asians from Europeans and Africans: group M10 to M19

SNPs to distinguish Europeans from Africans and Asians: group M20 to M32

We favored markers with maximal allele frequency differences between populations since, for biallelic markers such as SNPs, informativeness can be considered maximal when one of the alleles is confined to a single population.

1.2. DNA samples

The validation of the markers was carried out in two stages. First, a preliminary screening of 24 DNAs from 3 different continental origins (sub-Saharan Africa, East Asia and Europe: the prescreening plate) allowed the selection of the markers with the most discriminating allele frequencies between these three groups. The genetic markers selected for the study were then tested on a panel of 92 individuals per population.

In addition to validating our markers on the 3 populations mentioned above, we tested the DNA of 112 North African and 56 Indian individuals. The grandparents of all these individuals originate from the geographical area of interest. The DNAs were extracted from donor blood using the Nucleon Bac2 kit (Amersham) and quantified with the Quantifiler kit (Applied Biosystems).

1.3. Validation of markers by Taqman

TaqMan primers and probes for all markers were custom synthesized (Custom TaqMan® SNP Genotyping Assays, Applied Biosystems, Table 2), with the exception of the null GUCY2D and FY markers, for which commercially available TaqMan® SNP Genotyping Assays were used (respectively C_11951988_20, C1576961410, C321130820, Applied Biosystems).

For each of the tests, an allelic discrimination PCR was carried out in a reaction volume of 11 μl in the presence of 5 ng of DNA, 5.5 μl of TaqMan Master Mix (Applied Biosystems) and 0.275 μl of 40X Custom TaqMan® SNP Genotyping Assays, i.e. 0.55 μl of 20X TaqMan® SNP Genotyping Assays. After amplification on a 7500 thermal cycler (15 min at 95 °C followed by 40 cycles of 15 sec at 92 °C and 1 min at 60 °C), the plate is read at end point and all the data are interpreted using the Sequence Detection System 1.2 software (SDS 1.2, Applied Biosystems). Genotyping data for all SNPs tested are expressed as 11 (homozygous for allele 1), 12 (heterozygous) and 22 (homozygous for allele 2).
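By way of illustration only, the 11 / 12 / 22 genotype coding described above can be converted into allele frequencies for one marker in one population as follows. This is a minimal sketch: the function name, the treatment of any other code as a failed call, and the example calls are assumptions and do not come from the SDS software.

```python
from collections import Counter


def allele_frequencies(genotypes):
    """Return (frequency of allele 1, frequency of allele 2) for one marker.

    genotypes: iterable of codes "11", "12" or "22"; any other value is
    treated as a failed call and ignored (an assumption of this sketch).
    """
    counts = Counter()
    for code in genotypes:
        if code not in ("11", "12", "22"):
            continue  # skip undetermined calls
        for allele in code:  # each genotype contributes two alleles
            counts[allele] += 1
    total = counts["1"] + counts["2"]
    if total == 0:
        raise ValueError("no valid genotype calls")
    return counts["1"] / total, counts["2"] / total


# Hypothetical calls for one marker in one population
calls = ["11", "11", "12", "22", "NA"]
freq_1, freq_2 = allele_frequencies(calls)
print(freq_1, freq_2)
```

Per-population frequencies obtained in this way are the kind of input that a differentiation index or an assignment calculation requires.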

1.4. Estimation of Fst and analysis of ethnogeographic inference

Population differentiation for each locus was estimated using Wright's fixation index Fst, calculated according to the formulas of Weir and Cockerham ³⁴ ' ³⁵ . Software based on Bayesian calculations infers the membership of individuals in an ethnogeographic group. This program uses genotype data from several independent loci to search for population structure. For each individual, the software estimates the proportion of membership in each population included in the calculation.
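To make the idea of a Bayesian membership calculation concrete, the following sketch computes, for one genotype profile, the posterior probability of each reference population. It is a deliberately simplified stand-in and not the software used in the study: it assumes known reference allele frequencies, a uniform prior, Hardy-Weinberg proportions and independent loci, whereas the program described above searches for structure without population labels. All names, the frequency floor and the example frequencies are hypothetical.

```python
import math


def genotype_likelihood(code, p1):
    """P(genotype | frequency p1 of allele 1) under Hardy-Weinberg proportions.

    Raises KeyError for codes other than "11", "12" or "22".
    """
    p2 = 1.0 - p1
    return {"11": p1 * p1, "12": 2.0 * p1 * p2, "22": p2 * p2}[code]


def membership_posterior(profile, ref_freqs, floor=0.01):
    """Posterior probability of each reference population (uniform prior).

    profile:   {marker: genotype code}
    ref_freqs: {population: {marker: frequency of allele 1}}
    floor:     keeps frequencies away from 0 and 1 so that a single
               unexpected allele does not give a zero likelihood
    """
    log_like = {}
    for pop, freqs in ref_freqs.items():
        total = 0.0
        for marker, code in profile.items():
            p1 = min(max(freqs[marker], floor), 1.0 - floor)
            total += math.log(genotype_likelihood(code, p1))
        log_like[pop] = total
    top = max(log_like.values())  # subtract the maximum for numerical stability
    weights = {pop: math.exp(v - top) for pop, v in log_like.items()}
    norm = sum(weights.values())
    return {pop: w / norm for pop, w in weights.items()}


# Hypothetical reference frequencies for two hypothetical markers
reference = {
    "pop_A": {"snp_a": 0.95, "snp_b": 0.05},
    "pop_B": {"snp_a": 0.05, "snp_b": 0.05},
    "pop_C": {"snp_a": 0.05, "snp_b": 0.95},
}
posterior = membership_posterior({"snp_a": "11", "snp_b": "22"}, reference)
print(max(posterior, key=posterior.get), posterior)
```

Working with log-likelihoods keeps the product over many loci from underflowing, which matters once several dozen markers are combined.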

2. Results

2.1. Distribution of Fst

After compilation of all the TaqMan genotypes, the proportion of the variation attributable to differences between populations (Fst) was calculated for each pair of populations and for each locus (Figure 1). A high Fst value reflects the fact that a population shows a large difference in allelic frequency with respect to the other population (pairwise Fst) or to the other two (global Fst).
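As an illustration only, the sketch below computes Wright's Fst for a single biallelic locus from per-population allele frequencies, using the classical heterozygosity form Fst = (Ht - Hs) / Ht with equal population weights. This is a simplification and not the Weir and Cockerham estimator, which also corrects for sample sizes; the function names and the example frequencies are hypothetical.

```python
from itertools import combinations


def wright_fst(freqs):
    """Fst for one biallelic locus from per-population frequencies of one allele."""
    if len(freqs) < 2:
        raise ValueError("at least two populations are required")
    p_bar = sum(freqs) / len(freqs)
    h_total = 2.0 * p_bar * (1.0 - p_bar)  # expected heterozygosity, pooled
    if h_total == 0.0:
        return 0.0  # locus monomorphic in all populations considered
    h_within = sum(2.0 * p * (1.0 - p) for p in freqs) / len(freqs)
    return (h_total - h_within) / h_total


def pairwise_fst(freq_by_pop):
    """Fst for every pair of populations, keyed by the (sorted) pair of names."""
    return {
        (a, b): wright_fst([freq_by_pop[a], freq_by_pop[b]])
        for a, b in combinations(sorted(freq_by_pop), 2)
    }


# Hypothetical locus whose allele 1 is fixed in one population only
locus = {"Africa": 1.0, "Asia": 0.0, "Europe": 0.0}
print(pairwise_fst(locus))
print(wright_fst(list(locus.values())))  # global value over the 3 populations
```

With these hypothetical frequencies the Asia-Europe pair is monomorphic and returns 0, while the two pairs involving Africa return 1, which mirrors the pattern expected for a marker that discriminates one continent from the other two.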

The differentiation between the populations of Europe and East Asia, as reflected by the Europe-Asia Fst, is obtained with the markers M10 to M32, while the loci M1 to M9 remain monomorphic in these two populations (Fst close or equal to 0). Discriminating loci have an interpopulation Fst greater than 0.4, apart from the M17 and M22 markers.

Europe differs very significantly from Africa at the markers M1 to M9 and M20 to M32, with Fst greater than 0.5 (except for the M20 to M23 loci), and this time the markers M10 to M19 show very weak differentiation.

Markers M1 to M19 distinguish Africa from Asia, while values close to zero are observed for M20 to M32.

With regard to the global Fst, it is the markers M1 to M9 that have the highest values, greater than 0.70, which shows that for these markers only a small part of the observed variance (less than 30%) is due to differences within populations. The markers M10 to M32 generally have an Fst of between 0.41 and 0.92. On the other hand, some Fst values indicate weak differentiation for the markers M17, M21 and M22 (Fst of 0.23 to 0.37).

A hierarchy of the informativeness of the markers can be established according to the Fst values (Figure 2). Of the five groups of markers (Table 4), group I contains the most informative SNPs, whose Fst values are greater than 0.80. These are mainly markers discriminating Africa from Europe and Asia. Groups II to IV include markers whose Fst values are between 0.40 and 0.80. The markers M17, M21 and M22 of group V are the least discriminating, with Fst values below 0.4.
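The ranking described above can be sketched as a simple thresholding of global Fst values. Only the two cut-offs stated in the text (0.80 and 0.40) are used; since the boundaries separating groups II, III and IV are not given here, the sketch lumps them into a single tier. The function name, the marker names and the Fst values in the example are hypothetical.

```python
def informativeness_tier(fst_value):
    """Map a global Fst value to a coarse informativeness tier."""
    if not 0.0 <= fst_value <= 1.0:
        raise ValueError("Fst is expected to lie between 0 and 1")
    if fst_value > 0.80:
        return "I"       # most informative markers
    if fst_value >= 0.40:
        return "II-IV"   # intermediate groups, internal boundaries not modelled
    return "V"           # least discriminating markers


# Hypothetical values, ranked from most to least informative
hypothetical_fst = {"snp_a": 0.91, "snp_b": 0.55, "snp_c": 0.31}
ranked = sorted(hypothetical_fst.items(), key=lambda item: item[1], reverse=True)
for name, value in ranked:
    print(name, value, informativeness_tier(value))
```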

2.2. Inference of ethnogeographic origin

2.2.1. Inference with the 32 markers of groups I to V

All genotypes of the 32 markers for DNA from Africa (n = 115, DNA 1 to 115), Asia (n = 116, DNA 116 to 231) and Europe (n = 117, DNA 232 to 348) were submitted to the Bayesian calculation software without the population label. The assignment of each DNA of different origin to genetic groups is illustrated in Figure 3. The 3 genetic clusters defined by the algorithm separate the 3 populations of sub-Saharan Africa, East Asia and Europe without ambiguity. It thus appears that the assignment and clustering based on the analysis of the 32 selected SNPs are in perfect agreement with the ethnogeographic origin of the individuals.

The reliability of the test for samples from sub-Saharan Africa, East Asia and Europe is absolute since all submitted samples are correctly assigned.

2.2.2. Inference with the 9 markers of group I

We then tested whether the group I markers, which have the highest informativeness, were sufficient to determine ethnogeographic origin by Bayesian computation (Figure 4). Submitting only the genotypes of the M1, M2, M5, M6, M7, M9, M15, M24 and M30 markers is sufficient to distinguish 3 genetic clusters within the 3 populations of different origin. The samples are all correctly assigned, even though the calculation with only the 9 SNPs of group I shows a slight proportion of admixture, especially among Asians, that is not observed with the 32 markers.

The 9 markers of group I are therefore sufficient to distinguish the 3 genetic clusters present in the populations of Africa, Asia and Europe. The use of the 32 markers M1 to M32 makes it possible to refine the proportions of membership of each individual. This is because, among the 9 markers, only one discriminates Asia from Africa and Europe (M15) and only two differentiate Europe from Asia and Africa (M24 and M30).

The determination of the ethnogeographic origin of individuals with only 9 markers is therefore perfectly reliable, but at the individual level the proportion of membership in the group of origin is higher when all the markers are used.

2.3. Inference of the ethnogeographic origin of intermediate populations

After having validated our test on the 3 continental populations of Africa, Asia and Europe, we sought to determine the ethnogeographic origins of two "intermediate" populations by selecting individuals from North Africa (n = 15) and India (n = 56).

The genotypes obtained for the North African population were combined with those of the 348 DNAs from Africa, Asia and Europe and submitted to the Bayesian calculations (Figures 5 and 6).

If we consider 3 genetic groups (Figure 5), the Bayesian algorithm separates the populations of Africa, Asia and Europe, while the North African population appears as a mixture of the 3 preceding groups with a very strong predominance of the Europe group. Some individuals, however, show a strong affiliation with the African genetic cluster (No. 367-389-423-448-454). If we now consider 4 genetic groups (Figure 6), the North African population appears as a genetic group of its own, separate from the other 3. On average, 81.2% of the gene pool of the North African population belongs to this fourth cluster, which underlines the distinctness of this group. It is interesting to note that this fourth group results from a division of the Europe group, and that the individuals strongly admixed with Africa generally retain the same admixture proportions (No. 367-389-423-454). Despite this fourth cluster, some individuals still show a stronger membership in the Europe group (No. 401-408-424-446). Similarly, a certain proportion of admixture with the fourth cluster, inherent in the strong similarity between certain North African and European genotypes, appears in certain European individuals (No. 401-408-424-446).

The inclusion of the Indian samples with the previous 4 populations leads to a simulation of 519 samples (Figure 7) and reveals a new genetic cluster that defines the Indian population. As for North Africa, the European component is very large for some individuals (No. 466-475). It is very clear in Figure 7 that, among the individuals from North Africa and India, which remain characterized by a genetic cluster of their own, many have a very high percentage of membership in the other three groups. This contrasts very strongly with the sub-Saharan, Asian and European individuals.
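Figures such as the average share of a population's gene pool that falls in a given cluster, or the list of individuals with a strong affiliation to another cluster, can be derived from the per-individual membership proportions returned by the clustering. The sketch below is illustrative only: the data layout (one list of proportions per individual), the 0.5 threshold, the sample identifiers and the values are all hypothetical.

```python
def mean_membership(q_rows):
    """Average the membership proportions cluster by cluster.

    q_rows: list of per-individual lists, one proportion per cluster.
    """
    if not q_rows:
        raise ValueError("no individuals supplied")
    n_clusters = len(q_rows[0])
    return [sum(row[j] for row in q_rows) / len(q_rows) for j in range(n_clusters)]


def strongly_admixed(q_by_id, own_cluster, threshold=0.5):
    """Identifiers whose total membership outside own_cluster exceeds threshold."""
    return [
        sample_id
        for sample_id, row in q_by_id.items()
        if 1.0 - row[own_cluster] > threshold
    ]


# Hypothetical proportions for three individuals over two clusters
q_matrix = {"ind_1": [0.75, 0.25], "ind_2": [0.50, 0.50], "ind_3": [0.25, 0.75]}
print(mean_membership(list(q_matrix.values())))
print(strongly_admixed(q_matrix, own_cluster=0))
```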

3. Discussion

We have shown, by the analysis of 32 SNPs on a collection of DNA from sub-Saharan Africa, East Asia and Europe, that the genetic data make it possible to assign the samples with total reliability to groups corresponding to the major continental areas. Previous studies had shown that a minimum of 60 Alu or microsatellite sequences ³⁶ or, very recently, 10 SNPs ³⁷ were necessary for proper sample assignment. In our case, the use of the 9 markers with the highest Fst values is sufficient to determine the ethnogeographic origin of the individuals. In addition, the total number of markers remains very modest and easy to implement from a technical point of view. Moreover, it appears that even though the markers of the invention were selected for their discriminating allelic frequencies between the populations of Africa, Asia and Europe, they also make it possible to distinguish the populations of North Africa and India. These last two populations appear as intermediate with respect to the Asian and European populations, because of the markers selected and their respective settlement histories. In the case of an analysis comprising 3 genetic groups, North Africa is assigned mainly to Europe while India is included in the Asia cluster.

Table 1. List of SNPs

Ref SNP ID: rs2814778; Organism: human (Homo sapiens); Alleles: A / G; Molecule type: Genomic
GGCTGTCAGCGCCTGTGCTTCCAAG [A / G] TAAGAGCCAAGGACTAATGAGGGCC

Ref SNP ID: rs2816; Organism: human (Homo sapiens); Alleles: C / T; Molecule type: Genomic
ACACTGCATTGCTGGGCTGTGTTCC [C / T] CGGGCTCTTCTGGACCTTGCACCGT

Ref SNP ID: rs182549; Organism: human (Homo sapiens); Alleles: C / T; Molecule type: Genomic
actgggacaaaggtgtgagccaccg [C / T] gcccagctGAGAATGCTGTTTTTAA

Ref SNP ID: rs2335853; Organism: human (Homo sapiens); Alleles: A / G; Molecule type: Genomic
CTCGTTAATGGGTACTCAGTGAATC [A / G] CAAACTCTTAAGGATAGAAAGGGGT

Ref SNP ID: rs2495813; Organism: human (Homo sapiens); Alleles: C / T; Molecule type: Genomic
ACATTTCTTGTACTCAGGGCTGGTG [C / T] TATGGGAGAGCTGGAGGTTGCTGTC

Ref SNP ID: rs857455; Organism: human (Homo sapiens); Alleles: A / G; Molecule type: Genomic
aaaatctttgtgaaagtttctctgt [A / G] tggagataaaaagatggtacccgtg

Ref SNP ID: rs7326934; Organism: human (Homo sapiens); Alleles: C / G; Molecule type: Genomic
GTGATTTCAAGCATCCTGATTTACA [C / G] TTGCTCACTCAGCCACTCAGAGATG

Ref SNP ID: rs17031237; Organism: human (Homo sapiens); Alleles: C / G; Molecule type: Genomic
GCGAGCACCAGAAATGACAGGCTCA [C / G] TGGGGACACGGCAGATAGGTCCCCG

Ref SNP ID: rs8079412; Organism: human (Homo sapiens); Alleles: A / T; Molecule type: Genomic
TGATTATTTCCATTTCACTGATGAG [A / T] TATACAGTCCCAGGAAGGGCAGGTG

Ref SNP ID: rs17092950; Organism: human (Homo sapiens); Alleles: C / G; Molecule type: Genomic
AGCTCACTAGACTACAGGTAAGGAG [C / G] AGACAGACAGTAAACAAATCATGGA

Ref SNP ID: rs12261591; Organism: human (Homo sapiens); Alleles: C / T; Molecule type: Genomic
CTAGGGCAAATGAAAGAGGGAAACA [C / T] GGATGGCATGGATGCTTTCAGAAGA

Ref SNP ID: rs522153; Organism: human (Homo sapiens); Alleles: A / G; Molecule type: Genomic
AGTGACAGATAAAGTGAAGGGCAAT [A / G] ATTTCTGACATTTGCTGCCAGGATC

Ref SNP ID: rs4427950; Organism: human (Homo sapiens); Alleles: C / T; Molecule type: Genomic
TGCTGTGTGTTACAATAGCCCTACA [C / T] AGGCTTTGGAAACAATAACACAACC

Ref SNP ID: rs2007542; Organism: human (Homo sapiens); Alleles: C / T; Molecule type: Genomic
CTCAACAGGGGTTCTGATGATTTGC [C / T] ATctgcagttatctggagacttgag

Ref SNP ID: rs1389600; Organism: human (Homo sapiens); Alleles: G / T; Molecule type: Genomic
TTAGTAAGGTGGAAGAAGACCCTAT [G / T] CAATGGGTGGCACTATCTCCATATT

Ref SNP ID: rs12594144; Organism: human (Homo sapiens); Alleles: A / C; Molecule type: Genomic
CAGTCTGGGTCCTAATTGTTTGTGA [A / C] TCTTTTTCAGGGTGGGAGCAGGGTG

Ref SNP ID: rs4830702; Organism: human (Homo sapiens); Alleles: A / G; Molecule type: Genomic
GTGAGGGGAGAGCTGCTTCAGACGA [A / G] GGTGAGGAGTGACATGGACAGTGTG

Ref SNP ID: rs10918999; Organism: human (Homo sapiens); Alleles: C / T; Molecule type: Genomic
TTTGATTGGATTTCCATTTTCAGGG [C / T] ATAATCCATTTTCAAGATGTATCAA

Ref SNP ID: rs16867765; Organism: human (Homo sapiens); Alleles: A / T; Molecule type: Genomic
TCAAGATCTGTCACGGGAAGAATTT [A / T] AAAAAACTGGCGGCTAAGCAGAATG

Ref SNP ID: rs16938528; Organism: human (Homo sapiens); Alleles: A / G; Molecule type: Genomic
GTTTCACATTAGCGATAACGAGAGA [A / G] CTGGTGAGATCTTCTTCCCAGAATG

Ref SNP ID: rs10842028; Organism: human (Homo sapiens); Alleles: C / T; Molecule type: Genomic
CACTCTCACCTTGGTTAGGCCTGTG [C / T] GTCTCTCATAGATCCTTGTTACAGC

Ref SNP ID: rs4414866; Organism: human (Homo sapiens); Alleles: C / T; Molecule type: Genomic
ATTTCTTTGTATTGTTTTCTCCCAG [C / T] GATGCAAATTATATTAAATATAATA

Ref SNP ID: rs1297321; Organism: human (Homo sapiens); Alleles: A / G; Molecule type: Genomic
CTGAAACCATCAGATAACACAAATC [A / G] TGATGGCTAAAATACATTGTTGAAC

Ref SNP ID: rs7161203; Organism: human (Homo sapiens); Alleles: A / C; Molecule type: Genomic
TATCTAGCTTAGAACATCCCTAAGA [A / C] GTCAGTTGTTCATATTTTGACAGCA

Ref SNP ID: rs1441098; Organism: human (Homo sapiens); Alleles: A / T; Molecule type: Genomic
CGCTTGCCAAGAGTGTGGAATCTCA [A / T] TTCTTCCCACCTTCCTACCATCTTT

Ref SNP ID: rs35397; Organism: human (Homo sapiens); Alleles: G / T; Molecule type: Genomic
TCAGTGTCTTCACAGCTGCAACTTA [G / T] GTAAGTGGAGGTTAAGAGGCTCAGA

Ref SNP ID: rs10189663; Organism: human (Homo sapiens); Alleles: A / T; Molecule type: Genomic
TTCTTCTTCCATAAAATGCACCACC [A / T] TGGACAGTCAAAAGAAGTAATTTAA

Ref SNP ID: rs260692; Organism: human (Homo sapiens); Alleles: C / T; Molecule type: Genomic
AATGTTTGGAAATAATTCCACAAAC [C / T] GTGTAGCATGACAAAAACATACTTA

Ref SNP ID: rs7866023; Organism: human (Homo sapiens); Alleles: C / T; Molecule type: Genomic
GGAGTAGAAACTACTCTCTGCAGCA [C / T] GTACTTTCATTTTATACCCTACCAG

Ref SNP ID: rs5981317; Organism: human (Homo sapiens); Alleles: C / T; Molecule type: Genomic
TTACGCACTGCCTAGAGTACAGCTA [C / T] GAAGACAATTTTCTAATTCACAGAA

Ref SNP ID: rs973649; Organism: human (Homo sapiens); Alleles: C / T; Molecule type: Genomic
tatagcacacagcgctcaaaagata [C / T] ctgtGagccaggtgtgctggccctg

Ref SNP ID: rs2413887; Organism: human (Homo sapiens); Alleles: C / T; Molecule type: Genomic
GAATCAGTTTTATAACTGGGGACTT [C / T] TGTTTTTAATAATATTTTGTTATTA

Table 2. List of primers and probes used in custom synthesized TaqMan assays.

Table 3. List of SNPs in the study and segregation of alleles by population

Table 4. Classification of the informativeness of the markers according to their Fst values.

Table 5. Examples of reference profiles characteristic of populations.

Bibliographical references

1 Jeffreys AJ, Wilson V, Thein SL: Individual-specific 'fingerprints' of human DNA. Nature 1985; 316: 76-79.

2 Jeffreys AJ, Wilson V, Thein SL: Hypervariable 'minisatellite' regions in human DNA. Nature 1985; 314: 67-73.

3 Jeffreys AJ: Genetic fingerprinting. Nat Med 2005; 11: 1035-1039.

4 Jobling MA, Gill P: Encoded evidence: DNA in forensic analysis. Nat Rev Genet 2004; 5: 739-751.

5 Lewontin RC: The apportionment of human diversity. Evol Biol 1972; 6: 381-398.

6 Barbujani G, Magagni A, Minch E, Cavalli-Sforza LL: An apportionment of human DNA diversity. Proc Natl Acad Sci USA 1997; 94: 4516-4519.

7 Tishkoff SA, Verrelli BC: Patterns of human genetic diversity: implications for human evolutionary history and disease. Annu Rev Genomics Hum Genet 2003; 4: 293-340.

8 Pakendorf B, Stoneking M: Mitochondrial DNA and human evolution. Annu Rev Genomics Hum Genet 2005; 6: 165-183.

9 The International HapMap Project. Nature 2003; 426: 789-796.

10 Altshuler D, Brooks LD, Chakravarti A, Collins FS, Daly MJ, Donnelly P: A haplotype map of the human genome. Nature 2005; 437: 1299-1320.

11 Hinds DA, Stuve LL, Nilsen GB et al: Whole-genome patterns of common DNA variation in three human populations. Science 2005; 307: 1072-1079.

12 Wall JD, Pritchard JK: Haplotype blocks and linkage disequilibrium in the human genome. Nat Rev Genet 2003; 4: 587-597.

13 Hinds DA, Stokowski RP, Patil N et al: Matching strategies for genetic association studies in structured populations. Am J Hum Genet 2004; 74: 317-325.

14 Campbell CD, Ogburn EL, Lunetta KL et al: Demonstrating stratification in a European American population. Nat Genet 2005; 37: 868-872.

15 Freedman ML, Reich D, Penney KL et al: Assessing the impact of population stratification on genetic association studies. Nat Genet 2004; 36: 388-393.

16 Chakraborty R, Weiss KM: Admixture as a tool for finding linked genes and detecting that difference from allelic association between loci. Proc Natl Acad Sci USA 1988; 85: 9119-9123.

17 Smith MW, O'Brien SJ: Mapping by admixture linkage disequilibrium: advances, limitations and guidelines. Nat Rev Genet 2005; 6: 623-632.

18 Chakraborty R, Weiss KM: Frequencies of complex diseases in hybrid populations. Am J Phys Anthropol 1986; 70: 489-503.

19 Shriver MD, Parra EJ, Dios S et al: Skin pigmentation, biogeographical ancestry and admixture mapping. Hum Genet 2003; 112: 387-399.

20 Kittles RA, Weiss KM: Race, ancestry, and genes: implications for defining disease risk. Annu Rev Genomics Hum Genet 2003; 4: 33-67.

21 Rana BK, Hewett-Emmett D, Jin L et al: High polymorphism at the human melanocortin 1 receptor locus. Genetics 1999; 151: 1547-1557.

22 Hollox EJ, Poulter M, Zvarik M et al: Lactase haplotype diversity in the Old World. Am J Hum Genet 2001; 68: 160-172.

23 Hamblin MT, Thompson EE, Di Rienzo A: Complex signatures of natural selection at the Duffy blood group locus. Am J Hum Genet 2002; 70: 369-383.

24 Jobling MA, Hurles ME, Tyler-Smith C: Human evolutionary genetics. Origins, peoples and disease. Garland Science, 2004.

25 Wellems TE, Fairhurst RM: Malaria-protective traits at odds in Africa? Nat Genet 2005; 37: 1160-1162.

26 Kwiatkowski DP: How malaria has affected the human genome and what human genetics can teach us about malaria. Am J Hum Genet 2005; 77: 171-192.

27 Bersaglieri T, Sabeti PC, Patterson N et al: Genetic signatures of strong recent positive selection at the lactase gene. Am J Hum Genet 2004; 74: 1111-1120.

28 Mulcare CA, Weale ME, Jones AL et al: The T allele of a single-nucleotide polymorphism 13.9 kb upstream of the lactase gene (LCT) (C-13.9kbT) does not predict or cause the lactase-persistence phenotype in Africans. Am J Hum Genet 2004; 74: 1102-1110.

29 Hollox E: Evolutionary genetics: genetics of lactase persistence, fresh lessons in the history of milk drinking. Eur J Hum Genet 2005; 13: 267-269.

30 Akey JM, Zhang G, Zhang K, Jin L, Shriver MD: Interrogating a high-density SNP map for signatures of natural selection. Genome Res 2002; 12: 1805-1814.

31 The use of racial, ethnic, and ancestral categories in human genetics research. Am J Hum Genet 2005; 77: 519-532.

32 Inouye S, Hondo R: Microplate hybridization of amplified viral DNA segment. J Clin Microbiol 1990; 28: 1469-1472.

33 Parra EJ, Marcini A, Akey J et al: Estimating African American admixture proportions by use of population-specific alleles. Am J Hum Genet 1998; 63: 1839-1851.

34 Weir BS, Cockerham CC: Estimating F-statistics for the analysis of population structure. Evolution 1984; 38: 1358-1370.

35 Weir BS, Hill WG: Estimating F-statistics. Annu Rev Genet 2002; 36: 721-750.

36 Bamshad MJ, Wooding S, Watkins WS, Ostler CT, Batzer MA, Jorde LB: Human population genetic structure and inference of group membership. Am J Hum Genet 2003; 72: 578-589.

37 Lao O, van Duijn K, Kersbergen P, de Knijff P, Kayser M: Proportioning whole-genome single-nucleotide-polymorphism diversity for the identification of geographic population structure and genetic ancestry. Am J Hum Genet 2006; 78: 680-690.

Claims

1. A method for determining the ethnogeographic origin of an individual, comprising determining, in a nucleic acid sample, preferably DNA, from the individual, the alleles of a set of single nucleotide polymorphisms (SNPs) located in non-coding regions of the genome, to obtain a set of alleles, this set of alleles being an indication of the ethnogeographic group to which the individual belongs, the set of SNPs comprising at least 5 SNPs selected from the SNPs identified in Table 1.

2. Method according to claim 1, characterized in that the set of SNPs comprises at least 6, 7, 8, 9, 10, 15, 20, 25 or 30 SNPs identified in Table 1.

3. Method according to claim 1 or 2, characterized in that the set of SNPs comprises all 32 SNPs mentioned in Table 1.

4. Method according to claim 1 or 2, characterized in that the set of SNPs comprises at least the 9 SNPs M1, M2, M5, M6, M7, M9, M15, M24 and M30 as defined in Table 3.

5. Method according to claim 1 or 2, characterized in that the set of SNPs comprises

- at least 5 SNPs among the SNPs M1, M2, M3, M4, M5, M6, M7, M8 and M9 shown in Table 3; or

- at least 5 SNPs among the SNPs M20, M21, M22, M23, M24, M25, M26, M27, M28, M29, M30, M31 and M32 shown in Table 3; or

- at least 5 SNPs among the SNPs M10, M11, M12, M13, M14, M15, M16, M17, M18 and M19 shown in Table 3.

6. Method according to one of claims 1 to 5, characterized in that the set of alleles determined from the nucleic acid sample of the individual is compared to one or more sets of reference alleles characteristic of ethnogeographic groups, and in that the probability of belonging to one of these groups is calculated.

7. Method according to claim 6, characterized in that the sets of reference alleles are sets characteristic of the European, sub-Saharan, Asian, North African and / or Indian populations.

8. Method according to claim 6 or 7, characterized in that the probability of membership of the individual to one of the reference groups is calculated by a Bayesian method.

9. Method according to any one of the preceding claims, characterized in that the alleles are determined by sequencing, selective hybridization and / or selective amplification.

10. Method according to any one of the preceding claims, characterized in that the nucleic acid sample comes from a fluid or biological tissue of the individual.

11. Method according to any one of the preceding claims, characterized in that the nucleic acid sample comes from a medico-legal sample.

12. Method according to any one of the preceding claims, characterized in that the nucleic acid from the individual is amplified beforehand.

13. Kit for the implementation of a method according to any one of claims 1 to 12, comprising a set of nucleotide probes specific for at least one allele, preferably each allele, of the SNPs of the set of SNPs and / or a set of nucleotide primers for the specific amplification of at least one allele, preferably each allele, of the SNPs of the set of SNPs.

14. Kit according to claim 13, characterized in that the nucleotide probes are immobilized on a support.

15. A product comprising a support on which nucleotide probes are immobilized, said probes being specific for each allele of the SNPs of a set of SNPs defined in any one of claims 1 to 5.

16. Use of a set of nucleotide probes specific for at least one allele of the SNPs of a set of SNPs defined in any one of claims 1 to 5 and / or of a set of nucleotide primers for the specific amplification of each SNP of a set of SNPs defined in any one of claims 1 to 5, for determining in vitro the ethnogeographic origin of an individual.

17. Nucleic acid comprising a sequence chosen from the sequences given in Table 2.