WO2002020835A2 - Genetic study - Google Patents
- Publication number
- WO2002020835A2 (PCT/GB2001/003970)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- haplotype
- haplotypes
- individuals
- individual
- frequencies
- Prior art date
Classifications
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B20/00—ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B20/00—ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations
- G16B20/20—Allele or variant detection, e.g. single nucleotide polymorphism [SNP] detection
Definitions
- the invention relates to a method of performing a genetic study.
- Association studies are performed to determine whether a particular region of the genome contributes to a phenotype. These studies are based on the detection of a correlation (association) between the presence of a polymorphism in the genomic region and a change in the phenotype. The phenotype may be predisposition or susceptibility to a particular disease or response to medication. Association studies can therefore be used to determine whether a particular gene is relevant in a disease or whether a particular polymorphism causes or contributes to the disease.
- the invention provides a method of performing an association study comprising:
- Figure 1 shows the SNPs of the NAT2 gene.
- Figure 2 shows the SNPs spanning 140 b on chromosome X.
- the invention provides a method for performing an association study in which a correlation between a computed haplotype and a phenotype is analysed. If such a correlation is detected then this indicates that the region in which the haplotype occurs is able to affect (cause or contribute to) the phenotype.
- the haplotype is defined by particular nucleotides at particular positions of the chromosome, and comprises at least 2 polymorphic regions (e.g. single nucleotide polymorphisms (SNPs) or microsatellites), typically in linkage disequilibrium with each other. Typically the haplotype comprises at least 2, 3, 5, 10 or more SNPs.
- SNPs: single nucleotide polymorphisms
- the alleles of polymorphisms which are in linkage disequilibrium with each other in a population tend to be found together more often than expected on the same chromosome.
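"Found together more often than expected" can be quantified with the standard pairwise disequilibrium measures. The following is a minimal sketch, assuming two bi-allelic loci; the function name `ld_measures` and its arguments are illustrative and do not come from the patent text.

```python
def ld_measures(p_ab, p_a, p_b):
    """Return (D, D_prime, r_squared) for two bi-allelic loci.

    p_ab: frequency of chromosomes carrying allele A and allele B together
    p_a, p_b: frequencies of allele A and allele B (illustrative names)
    """
    # D is the excess of the observed AB haplotype frequency over the
    # frequency expected if the two alleles were independent.
    d = p_ab - p_a * p_b
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = d / d_max if d_max > 0 else 0.0
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    r_squared = d * d / denom if denom > 0 else 0.0
    return d, d_prime, r_squared


d, d_prime, r2 = ld_measures(p_ab=0.3, p_a=0.4, p_b=0.5)
print(round(d, 3), round(d_prime, 3), round(r2, 3))
```

A value of D equal to zero corresponds to no association between the alleles; larger values of D' or r² indicate stronger disequilibrium.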
- all of the remaining polymorphisms of the haplotype will be present on the chromosome at least 30% of the time, for example at least 40%, 50%, 70% or 90% of the time, that any of the polymorphisms of the haplotype is present on the chromosome.
- the allele frequency of each of the polymorphisms in the haplotype generally varies from 1% to 50%.
- the frequency of the haplotypes defined by the polymorphisms will generally be 1% to 99%.
- any of the polymorphisms in the haplotype may also be present on chromosomes in the absence of the remaining polymorphisms of the haplotype, or in the form of a different haplotype.
- at least 2, for example at least 5, 10, 100, 1000, 10^4, 10^5, 10^6, 10^8 or more polymorphisms are analysed in the study.
- Polymorphisms which are in linkage disequilibrium are typically within 500 kb, preferably within 400 kb, 200 kb, 100 kb, 50 kb, 10 kb, 5 kb or 1 kb of each other, and thus typically the two polymorphisms in the haplotype which are most distant from each other will be within any of these distances from each other.
- Each of the polymorphisms is an insertion, deletion or substitution of a nucleotide.
- the polymorphism may be an A, T, C or G.
- the haplotype is typically in (or at least in the vicinity of) a gene which expresses a product.
- the haplotype may or may not cause a different RNA (e.g. mRNA) or protein product to be expressed from the gene.
- the haplotype may be 5' to the coding region (e.g. within the promoter), in the coding region, in an intron or 3' to the coding region.
- the haplotype may stretch across more than one of these regions of the gene, or may stretch across a region which contains more than one gene.
- the polymorphism information which is used to deduce the haplotypes present in the population generally specifies the nucleotide (including a deletion and/or insertion) and its position on the chromosome.
- the polymorphism information typically details the polymorphisms present in at least 1 kb, 10 kb, 50 kb, 100 kb, 1 Mb, 10 Mb, 100 Mb or more of polynucleotide. These specified numbers of bases may or may not be fully contiguous, i.e. there may be sequence within the regions for which no polymorphism information is available.
- the polymorphism information is from more than one chromosome (i.e. from more than one pair of homologous chromosomes), and in a preferred embodiment the association study is a genome wide study, so that the polymorphism information is from all the chromosomes. Thus the study does not need to be performed on a predetermined locus.
- the genetic information contains incomplete or no phase information for the polymorphisms.
- the position of the polymorphisms is known but it is not known on which chromosome (of the two homologous chromosomes) the polymorphism is present. Therefore it is not possible to determine which alleles of the polymorphisms are present on the same chromosome.
- the haplotype analysis algorithms mentioned below may be used to deduce the phase information and thus determine the haplotypes present in the population.
- the polymorphism information may be from known or unknown genes.
- the polymorphisms which are analysed may be in genes which have not been fully defined (in terms of sequence or function).
- the polymorphism information may be obtained from a database which contains the results of the genetic typing of the individuals being analysed in the study.
- the genetic typing comprises detecting the presence of a polymorphism in a region of the genome.
- the presence of the polymorphism is determined in a method that comprises contacting a polynucleotide of the individual with a specific binding agent for the polymorphism and determining whether the agent binds to the region of the polynucleotide which may contain the polymorphism, the binding of the agent to the polymorphism indicating that the individual carries the polymorphism.
- the agent will also bind to flanking nucleotides on one or both sides of the polymorphism, for example at least 2, 5, 10, 15 or more flanking nucleotides in total or on each side.
- the agent may be a polynucleotide (single or double stranded) typically with a length of at least 10 nucleotides, for example at least 15, 20, 30 or more nucleotides.
- the agent may be a molecule which is structurally related to polynucleotides and comprises units (such as purines or pyrimidines) able to participate in Watson-Crick base pairing.
- a polynucleotide agent which is used in the method will generally bind to the polymorphism, and flanking sequence, of the polynucleotide of the individual in a sequence specific manner (e.g. hybridise in accordance with Watson-Crick base pairing) and thus typically has a sequence which is fully or partially complementary to the sequence of the polymorphism and flanking region.
- the agent is a probe.
- This may be labelled or may be capable of being labelled indirectly.
- the detection of the label may be used to detect the presence of the probe on (and hence bound to) the polynucleotide of the individual.
- the binding of the probe to the polynucleotide may be used to immobilise either the probe or the polynucleotide (and thus to separate it from one composition or solution).
- the polynucleotide of the individual is immobilised on a solid support and then contacted with the probe.
- the presence of the probe immobilised to the solid support (via its binding to the polymorphism) is then detected, either directly by detecting a label on the probe or indirectly by contacting the probe with a moiety that binds the probe.
- the solid support is generally made of nitrocellulose or nylon.
- the method may be based on an oligonucleotide ligation assay in which two oligonucleotide probes are used. These probes bind to adjacent areas on the polynucleotide which contains the polymorphism, allowing (after binding) the two probes to be ligated together by an appropriate ligase enzyme. However the two probes will only bind (in a manner which allows ligation) to a polynucleotide that contains the polymorphism, and therefore the detection of the ligated product may be used to determine the presence of the polymorphism.
- the probe is used in a heteroduplex analysis based system to detect polynucleotide polymorphisms.
- a heteroduplex structure can be detected by the use of an enzyme which is single or double strand specific.
- the probe is an RNA probe and the enzyme used is RNAse H which cleaves the heteroduplex region, thus allowing the polymorphism to be detected by means of the detection of the cleavage products.
- the method may be based on fluorescent chemical cleavage mismatch analysis which is described for example in PCR Methods and Applications 3, 268-71 (1994) and Proc. Natl. Acad. Sci. 85, 4397-4401 (1988).
- the polynucleotide agent is able to act as a primer for a PCR reaction only if it binds a polynucleotide containing the polymorphism (i.e. a sequence- or allele-specific PCR system).
- a PCR product will only be produced if the polymorphism is present in the polynucleotide of the individual.
- the presence of the polymorphism may be determined by the detection of the PCR product.
- the region of the primer which is complementary to the polymorphism is at or near the 3' end of the primer.
- the polynucleotide agent will bind to the wild-type sequence (in which the polymorphism is not present) but will not act as a primer for a PCR reaction.
- the method may be an RFLP based system. This can be used if the presence of the polymorphism in the polynucleotide creates or destroys a restriction site which is recognised by a restriction enzyme. Thus treatment of a polynucleotide with such a polymorphism will lead to different products being produced compared to the corresponding wild-type sequence. Thus the detection of the presence of particular restriction digest products can be used to determine the presence of the polymorphism.
- the presence of the polymorphism may be determined based on the change which the presence of the polymorphism makes to the mobility of the polynucleotide during gel electrophoresis, e.g. single-stranded conformation polymorphism (SSCP) analysis may be used.
- SSCP: single-stranded conformation polymorphism
- Denaturing gradient gel electrophoresis is a similar system where the polynucleotide is electrophoresed through a gel with a denaturing gradient, a difference in mobility compared to the corresponding wild-type polynucleotide indicating the presence of the polymorphism.
- the presence of the polymorphism may be determined using a fluorescent dye and quenching agent-based PCR assay such as the Taqman PCR detection system.
- This assay uses an allele specific primer comprising the sequence around, and including, the polymorphism.
- the specific primer is labelled with a fluorescent dye at its 5' end, a quenching agent at its 3' end and a 3' phosphate group preventing the addition of nucleotides to it. Normally the fluorescence of the dye is quenched by the quenching agent present in the same primer.
- the allele specific primer is used in conjunction with a second primer capable of hybridising to either allele 5' of the polymorphism.
- Taq DNA polymerase adds nucleotides to the non-specific primer until it reaches the specific primer. It then releases polynucleotides, the fluorescent dye and quenching agent from the specific primer through its endonuclease activity. The fluorescent dye is therefore no longer in proximity to the quenching agent and fluoresces.
- the mismatch between the specific primer and template inhibits the endonuclease activity of Taq and the fluorescent dye is not released from the quenching agent. Therefore by measuring the fluorescence emitted the presence or absence of the polymorphism can be determined.
- the phenotype information may be obtained from a database that contains the results of measurement of phenotypes in the individuals in the study.
- the phenotypes may be discrete phenotypes (such as the presence or absence of a trait) or a continuous phenotype (represented by the magnitude of a trait).
- the phenotype is related to a disease, such as the presence or absence of the disease.
- the phenotype may be presence, or magnitude of, a symptom.
- the phenotype may be susceptibility (predisposition) to the disease.
- the phenotype may be the response to medical or pharmaceutical treatment.
- the disease may be one which substantially only has a genetic component, or one that also has an environmental component. Typically in the disease a particular gene product (such as a protein) is either not expressed, expressed at a reduced or elevated level, or expressed in a form which is functionally deficient.
- the disease may be one which is caused by a pathogen, such as a virus or bacterium.
- the disease may be one which is caused by an immune response, such as an auto-immune disease.
- the disease may be a cancer.
- the disease may be caused by an abnormality in or damage to a particular organ, body system, tissue or cell type.
- the disease is typically caused by old age, stress, or a diet with high levels of lipid or carbohydrate.
- the disease may be a neurodegenerative, neurological, cardiovascular, inflammatory, psychiatric, respiratory or metabolic disease.
- the phenotype may be a particular response to a therapeutic agent, such as a therapeutic effect or a deleterious side effect.
- the individuals in the association study have different phenotypes, i.e. have differences in the trait which is being studied.
- the study will comprise at least 50, 100, 500, 1000, 2000, 5000 or more individuals.
- the study contains less than 50%, such as less than 20% or 10% of first degree relatives.
- the study is performed in the form of a case-control study.
- This study method generally comprises individuals who have a particular phenotype (cases) and individuals who do not have the phenotype (controls).
- Other study formats include triads (parent and child), sib pairs, other small nuclear families (such as parent with two or more children) or families with extended relatives.
- the cases may be defined as possessing the trait to the level of a threshold magnitude, i.e. more than or less than a particular level. Controls would then be defined as not possessing the trait to the extent of the threshold value. In the case of more than one case group more than one threshold value would be used.
- the haplotypes may be computed by any suitable algorithm.
- the algorithm is generally able to deduce whether any of the types of haplotypes mentioned above are present.
- the algorithm is typically one which is able to predict the haplotype of unrelated individuals (and generally also the haplotype frequency) for which the polymorphism phase information is not available or is incomplete.
- Such an algorithm typically performs the following steps: (i) assigning initial haplotype frequencies by sampling them at random from an appropriate distribution; (ii) resolving unambiguous haplotypes by identifying individuals who are either homozygotes or single-site heterozygotes across a defined region of genome (the "scan window"), and calculating their contribution to the corresponding haplotype classes; (iii) for each multiple heterozygous (ambiguous) individual, calculating expected contributions to all haplotype types compatible with its genotype; (iv) updating haplotype frequencies by counting each gamete type across individuals and dividing counts by twice the sample size; and (v) iterating steps (ii) through (iv) until frequencies stabilise.
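Steps (i) to (v) can be sketched as a small expectation-maximisation loop. This is an illustrative sketch under stated assumptions, not the patent's implementation: markers are assumed bi-allelic, each unphased genotype is coded as a tuple of 0/1/2 copies of one allele per site, and the names `compatible_pairs` and `em_haplotype_frequencies` are hypothetical.

```python
import random
from itertools import product


def compatible_pairs(genotype):
    """List the unordered haplotype pairs consistent with an unphased genotype."""
    het = [i for i, g in enumerate(genotype) if g == 1]
    pairs = set()
    for bits in product((0, 1), repeat=len(het)):
        h1 = [g // 2 for g in genotype]  # homozygous sites: 0 -> 0, 2 -> 1
        h2 = list(h1)
        for site, b in zip(het, bits):   # heterozygous sites: one allele each
            h1[site], h2[site] = b, 1 - b
        pairs.add(tuple(sorted((tuple(h1), tuple(h2)))))
    return sorted(pairs)


def em_haplotype_frequencies(genotypes, n_iter=200, tol=1e-10, seed=0):
    rng = random.Random(seed)
    pair_lists = [compatible_pairs(g) for g in genotypes]
    haps = sorted({h for pairs in pair_lists for pair in pairs for h in pair})
    # Step (i): random starting frequencies, normalised to sum to one.
    raw = {h: rng.random() + 1e-3 for h in haps}
    total = sum(raw.values())
    freq = {h: v / total for h, v in raw.items()}
    for _ in range(n_iter):
        counts = dict.fromkeys(haps, 0.0)
        for pairs in pair_lists:
            # Steps (ii)-(iii): homozygotes and single-site heterozygotes have
            # one compatible pair (weight 1); ambiguous individuals spread
            # their two gametes over all compatible pairs.
            weights = [freq[a] * freq[b] for a, b in pairs]
            s = sum(weights)
            for (a, b), w in zip(pairs, weights):
                counts[a] += w / s
                counts[b] += w / s
        # Step (iv): gamete counts divided by twice the sample size.
        new = {h: c / (2 * len(genotypes)) for h, c in counts.items()}
        delta = max(abs(new[h] - freq[h]) for h in haps)
        freq = new
        if delta < tol:  # step (v): stop once frequencies stabilise
            break
    return freq


sample = [(2, 2), (2, 2), (0, 0), (0, 0), (1, 1)]
print(em_haplotype_frequencies(sample))
```

In this toy sample the double heterozygote is the only ambiguous individual, and the estimate assigns it almost entirely to the two haplotypes already observed in the homozygotes.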
- Suitable algorithms which infer haplotype frequency when only single-locus genotypes are scored are known in the art. In these situations, individuals that are heterozygous for more than one locus convey ambiguous information about the gametic phase and missing data techniques, such as the E-M algorithm, formalized by Dempster et al. (1977) Journal of the Royal Statistical Society B39, 1-38 are appropriate. Hill (1974) Heredity 33, 229-39 gave a cubic equation for the maximum likelihood estimate of a gametic frequency for the case of two loci and two alleles and proposed an iterative E-M solution.
- the Bayesian statistical method described in Stephens et al (2001) Am. J. Hum. Gen, 68, 978-89 is used for haplotype reconstruction and haplotype frequency determination.
- the haplotype analysis for a continuous trait is generally performed by: selecting a subset of markers from the set of markers that may correlate with the continuous trait; for each individual, obtaining a value of the continuous trait and a pair of alleles for each of the markers in the subset of markers; for each individual, determining probabilities of haplotypes that are compatible with the alleles in the subset of markers; and performing a regression on the probabilities of haplotypes that are compatible with the alleles in the subset of markers, for all the individuals, to determine correlations between the continuous trait and the haplotypes.
- the method of performing regression may comprise for each individual, sampling a first haplotype from the haplotypes that are compatible with the individual's set of alleles, to thereby define a second haplotype which is determined by the sampling of the first haplotype; assigning the value of the continuous trait for the individual to both the first haplotype and the second haplotype, to thereby define a doubled sample size; performing an analysis of variance by comparing average values of the trait among the sampled first and second haplotypes for all the individuals; repeating the steps of sampling, assigning and performing to obtain a distribution of correlations of the continuous trait and the haplotypes; and determining a value from the distribution that identifies a significance of the correlation.
- the step of performing an analysis of variance may comprise: defining a design matrix of first and second indicator values having two rows for each individual, where the second indicator value is associated with the first and second haplotypes and remaining positions in the design matrix are set to the first indicator value in the two rows; and performing a regression on the design matrix, to thereby identify a correlation value between the value of the continuous trait and the first and second haplotypes.
- the step of determining a value may comprise: determining a median from the distribution that identifies a significance of the correlation.
- the step of performing regression may comprise: for each haplotype in the set, assigning a rank of significance; for each individual, sampling a first haplotype from the haplotypes that are compatible with the individual's set of alleles, to thereby define a second haplotype which is determined by the sampling of the first haplotype; assigning the value of the continuous trait for the individual to both the first haplotype and the second haplotype, to thereby define a doubled sample size; performing a one degree of freedom regression on the ranks for the sampled first and second haplotypes for all the individuals; repeating the steps of sampling, assigning the value of the continuous trait and performing a one degree of freedom regression to obtain a distribution of the correlation of the continuous trait and the haplotypes; and determining a value from the distribution that identifies a significance of the correlation.
- the step of performing a one degree of freedom regression may comprise: defining a design matrix having two columns of the ranks of the first and second haplotypes and having two rows for each individual; and performing a regression on the design matrix, to thereby identify a correlation value between the value of the continuous trait and the haplotypes.
- the step of determining a value may again comprise: determining a median from the distribution that identifies a significance of the correlation.
- the step of performing regression may comprise: relating the value of the continuous trait for each individual to a vector of estimated frequencies of all haplotypes; and performing a multiple regression of the trait values on the vectors of estimated frequencies.
- the step of determining may comprise performing an expectation-maximization.
- the analysis comprises: for each individual, determining probabilities of haplotypes that are compatible with the alleles in the subset of markers; and performing a regression on the probabilities of haplotypes that are compatible with the alleles in the subset of markers, for all the individuals, to determine correlations between the continuous trait and the haplotypes.
- the step of performing regression may comprise: for each individual, sampling a first haplotype from the haplotypes that are compatible with the individual's set of alleles, to thereby define a second haplotype which is determined by the sampling of the first haplotype; assigning the value of the continuous trait for the individual to both the first haplotype and the second haplotype, to thereby define a doubled sample size; performing an analysis of variance by comparing average values of the trait among the sampled first and second haplotypes for all the individuals; repeating the steps of sampling, assigning and performing to obtain a distribution of correlations of the continuous trait and the haplotypes; and determining a value from the distribution that identifies a significance of the correlation.
- the step of performing analysis of variance may comprise: defining a design matrix of first and second indicator values having two rows for each individual, where the second indicator value is associated with the first and second haplotypes and remaining positions in the design matrix are set to the first indicator value in the two rows; and performing a regression on the design matrix, to thereby identify a correlation value between the value of the continuous trait and the first and second haplotypes.
- the step of determining a value may comprise: determining a median from the distribution that identifies a significance of the correlation.
- the step of performing regression may comprise: for each haplotype in the set, assigning a rank of significance; for each individual, sampling a first haplotype from the haplotypes that are compatible with the individual's set of alleles, to thereby define a second haplotype which is determined by the sampling of the first haplotype; assigning the value of the continuous trait for the individual to both the first haplotype and the second haplotype, to thereby define a doubled sample size; performing a one degree of freedom regression on the ranks for the sampled first and second haplotypes for all the individuals; repeating the steps of sampling, assigning the value of the continuous trait and performing a one degree of freedom regression to obtain a distribution of the correlation of the continuous trait and the haplotypes; and determining a value from the distribution that identifies a significance of the correlation.
- the step of performing a one degree of freedom regression may comprise: defining a design matrix having two columns of the ranks of the first and second haplotypes and having two rows for each individual; and performing a regression on the design matrix, to thereby identify a correlation value between the value of the continuous trait and the haplotypes.
- the step of determining a value may comprise: determining a median from the distribution that identifies a significance of the correlation.
- the step of performing regression may comprise: relating the value of the continuous trait for each individual to a vector of estimated frequencies of all haplotypes; and performing a multiple regression of the trait values on the vectors of estimated frequencies.
- the step of determining may comprise performing an expectation-maximization.
- haplotype frequencies can be estimated through expectation-maximization (E-M), and each individual in a sample is expanded into all possible haplotype configurations with corresponding probabilities.
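Expanding one individual into haplotype configurations with probabilities can be illustrated as follows. This is a sketch under assumptions: genotypes are coded as 0/1/2 copies of one allele per bi-allelic marker, haplotype frequencies are supplied as a dictionary (for example from an E-M run), and random pairing of haplotypes is assumed. The names `expand_genotype` and `expected_dosage` are hypothetical.

```python
from itertools import product


def expand_genotype(genotype, freq):
    """Return {(hap1, hap2): probability} for one unphased genotype."""
    het = [i for i, g in enumerate(genotype) if g == 1]
    weights = {}
    for bits in product((0, 1), repeat=len(het)):
        h1 = [g // 2 for g in genotype]
        h2 = list(h1)
        for site, b in zip(het, bits):
            h1[site], h2[site] = b, 1 - b
        pair = tuple(sorted((tuple(h1), tuple(h2))))
        # Each unordered pair is weighted by the product of its haplotype
        # frequencies; revisiting the mirrored pair overwrites the same value.
        weights[pair] = freq.get(pair[0], 0.0) * freq.get(pair[1], 0.0)
    total = sum(weights.values())
    if total == 0:
        raise ValueError("genotype is incompatible with the supplied frequencies")
    return {pair: w / total for pair, w in weights.items()}


def expected_dosage(genotype, freq):
    """Expected number of copies (0 to 2) of each haplotype for one individual."""
    dosage = {}
    for (a, b), p in expand_genotype(genotype, freq).items():
        dosage[a] = dosage.get(a, 0.0) + p
        dosage[b] = dosage.get(b, 0.0) + p
    return dosage


freq = {(0, 0): 0.4, (1, 1): 0.4, (0, 1): 0.1, (1, 0): 0.1}
print(expand_genotype((1, 1), freq))
print(expected_dosage((1, 1), freq))
```

The per-individual expected dosages always sum to two, one for each gamete, and can serve as real-valued predictors in a regression on the trait.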
- E-M: expectation-maximization
- a subset of markers may be selected from the set of markers that may correlate with the continuous trait.
- the selection of a subset of markers may be determined empirically and/or theoretically based on available literature, studies and/or other techniques. The selection of a subset of markers that may correlate with the continuous trait is well known to those having skill in the art and need not be described further herein.
- a value of the continuous trait and the pair of alleles for each of the markers in the subset of markers is obtained.
- the value of the continuous trait and the pair of alleles for each of the markers may be obtained through clinical trials or other studies that may involve a control group and a sample group.
- the obtaining a value of a continuous trait and a pair of alleles for each of the markers in the subset of markers is well known to those having skill in the art and need not be described further herein.
- the table below illustrates an example of data that may be thus obtained.
- the probabilities of haplotypes that are compatible with the alleles in the subset of markers are determined. Then, a regression is performed on the probabilities of haplotypes that are compatible with the alleles in the subset of markers, for all of the individuals, to determine correlations between the continuous trait and the haplotypes.
- a first haplotype from the haplotypes that are compatible with the individual set of alleles is sampled from the probability distribution determined, to thereby define a second haplotype which is determined by the sampling of the first haplotype.
- the value of the continuous trait for the individual is assigned to both the first haplotype and the second haplotype, to thereby define a doubled sample size.
- An analysis of variance may be performed by comparing average values of the trait among the sampled first and second haplotypes for all the individuals. These operations are repeated a sufficient number of times, to obtain a distribution of correlations of the continuous trait and the haplotypes. When all the haplotypes have been processed, a value is determined from the distribution that identifies a significance of the correlation.
- An analysis of variance may be performed by defining a design matrix of first and second indicator values (such as 0 and 1) having two rows for each individual, where the second indicator value is associated with the first and second haplotypes and remaining positions in the design matrix are set to the first indicator value in the two rows.
- a regression is then performed on the design matrix, to thereby identify a correlation value between the value of the continuous trait and the first and second haplotypes.
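The sampling, doubling and analysis-of-variance steps described above can be sketched as below. Several simplifications are assumptions of this sketch rather than statements of the patent: a haplotype pair is drawn in one step (equivalent to drawing the first haplotype and deriving the second), the one-way ANOVA F statistic is computed from group means instead of an explicit 0/1 design matrix, and the median F statistic is summarised in place of a p-value, because converting F to a p-value requires an F-distribution routine that the Python standard library does not provide. The names `one_way_f` and `sampled_anova` are hypothetical.

```python
import random
from statistics import median


def one_way_f(groups):
    """One-way ANOVA F statistic for {haplotype: [trait values]}, or None."""
    values = [v for vals in groups.values() for v in vals]
    n, k = len(values), len(groups)
    if k < 2 or n <= k:
        return None
    grand = sum(values) / n
    means = {h: sum(v) / len(v) for h, v in groups.items()}
    ss_between = sum(len(v) * (means[h] - grand) ** 2 for h, v in groups.items())
    ss_within = sum((x - means[h]) ** 2 for h, v in groups.items() for x in v)
    if ss_within == 0:
        return None
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def sampled_anova(traits, pair_probs, n_rep=200, seed=0):
    """Median F over repeated haplotype sampling with a doubled sample."""
    rng = random.Random(seed)
    stats = []
    for _ in range(n_rep):
        groups = {}
        for y, probs in zip(traits, pair_probs):
            pairs = list(probs)
            h1, h2 = rng.choices(pairs, weights=[probs[p] for p in pairs])[0]
            # The trait value is assigned to both sampled haplotypes, which
            # doubles the number of observations.
            groups.setdefault(h1, []).append(y)
            groups.setdefault(h2, []).append(y)
        f = one_way_f(groups)
        if f is not None:
            stats.append(f)
    return median(stats) if stats else None


traits = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
pair_probs = [{("A", "A"): 1.0}] * 3 + [{("B", "B"): 1.0}] * 3
print(sampled_anova(traits, pair_probs))
```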
- allelic versus genotypic tests for the case-control design and bi-allelic markers were studied.
- a genotypic test for association can operate on a 2 x 3 contingency table of individuals, classified according to their genotypes and the affection status. The total count of such a table is n.
- An allelic test would operate on a 2 x 2 table of allele counts versus affection status. Thus, each individual would contribute two alleles to the table, and the total count becomes 2n.
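The two table layouts can be illustrated with a short sketch. The coding is an assumption made for the example: each genotype is the number of copies (0, 1 or 2) of the tested allele at a bi-allelic marker, and affection status is 0 for controls and 1 for cases. The function names are hypothetical.

```python
def genotypic_table(genotypes, status):
    """2 x 3 table of individuals: rows = control/case, columns = 0/1/2 copies."""
    table = [[0, 0, 0], [0, 0, 0]]
    for g, s in zip(genotypes, status):
        table[s][g] += 1
    return table


def allelic_table(genotypes, status):
    """2 x 2 table of alleles: rows = control/case, columns = other/tested allele."""
    table = [[0, 0], [0, 0]]
    for g, s in zip(genotypes, status):
        table[s][1] += g        # copies of the tested allele
        table[s][0] += 2 - g    # copies of the other allele
    return table


def pearson_chi2(table):
    """Pearson chi-square statistic for a contingency table."""
    total = sum(map(sum, table))
    rows = [sum(r) for r in table]
    cols = [sum(r[j] for r in table) for j in range(len(table[0]))]
    chi2 = 0.0
    for i, r in enumerate(table):
        for j, obs in enumerate(r):
            expected = rows[i] * cols[j] / total
            if expected > 0:
                chi2 += (obs - expected) ** 2 / expected
    return chi2


genotypes = [0, 0, 1, 1, 2, 2, 2, 1]
status = [0, 0, 0, 0, 1, 1, 1, 1]
print(genotypic_table(genotypes, status))  # total count n = 8
print(allelic_table(genotypes, status))    # total count 2n = 16
```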
- the gametes may be single locus, in which case the values of X are called alleles.
- When X is multi-locus they are called haplotypes. Individual i has two gametes, $X_{i1}$ and $X_{i2}$, and the genotype of individual i is denoted $(X_{i1}, X_{i2})$. Individual i also has an associated phenotype, $Y_i$.
- Equation (1) is an Analysis of Variance (ANOVA) model relating response to allele class.
- ANOVA: Analysis of Variance
- the F test may be suspect because the response variable has been doubled. In other words, it may appear like "cheating" to artificially double the sample size.
- the F statistic is equivalent to that of the following n-dimensional regression model, and the data doubling is therefore valid.
- An alternative model to Equation (1) with similar asymptotic properties and well-known finite-sample properties will now be described.
- the model is an n-dimensional regression model
- Equation (2) may have the usual validity (or lack thereof, in cases of lack of fit) of standard regression models, whereas Equation (1) may seem unrealistic since the observations are simply doubled. Nevertheless, it will be shown that these models can produce equivalent F statistics when HWE holds. To understand potential lack of fit, note that Equation (2) may correspond to that of Weir et al. (1977) Two-locus theory in Quantitative Genetics, Proceedings of the International Conference on Quantitative Genetics pp247-69 and Nielsen et al.
- Equation (2) is exactly Equation (3) with $d_{jk} = 0$.
- Equation (3) may lack sensitivity in cases of dominance effects ($d_{jk} \neq 0$).
- the test for $\{H_0: \alpha_j = 0 \text{ and } d_{jk} = 0\}$ may lose power because of the large numerator degrees of freedom ($L(L+1)/2 - 1$).
- the additive Equation (2) may be preferable despite possible lack of fit.
- the F test uses:
- $SSA_1 = Y'\left(D(D'D)^{-1}D' - J_{n \times n}/n\right)Y$ and
- the "alleles" can denote multi-locus haplotypes rather than single-locus alleles.
- the parameter $\alpha_j$ refers to the main effect of haplotype j.
- the haplotypes are generally unobservable, and therefore missing data methods may be used for their estimation.
- the first basic type is like the ANOVA model Equation (1), but the $A_{ij}$ are generated at random from a distribution inferred from the observed single-locus genotypes, and results are then averaged over random haplotype generations.
- the second basic type is like the regression model Equation (2), where instead of using actual haplotype frequencies (0, 1, 2) for person i, the expected haplotype frequencies (given the observed single-locus genotypes) are used.
- haplotype frequencies: real values
- vectors of possible haplotypes: vectors of integers
- a model specified by (1) is formed, and a test statistic (F) for the importance of including the genotype is calculated;
- the final p-value is given by the median of the distribution of p-values. It is possible to greatly increase power of the test if some of the vector haplotype indicators $A_{i1}$, $A_{i2}$ can be replaced by corresponding scalar rank scores $R_{i1}$, $R_{i2}$ (in order of "importance"), based on prior tests or biological knowledge. In that case the model is formed as $Y_i = \mu + R_{ij}\beta + \epsilon_{ij}$ for $j = 1, 2$. This can concentrate the effect into a test with a single degree of freedom, and can have much greater power when the rank scores are chosen well.
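A single replicate of the rank-score variant can be sketched as a simple linear regression on the doubled sample. The sketch assumes that a pair of rank scores has already been sampled for each individual; in the procedure described in the text this step would be repeated over many haplotype samplings and the resulting distribution summarised. The name `rank_regression` is hypothetical.

```python
def rank_regression(traits, rank_pairs):
    """Slope and one-degree-of-freedom F statistic on the doubled sample.

    traits: one trait value per individual
    rank_pairs: one (rank of first haplotype, rank of second haplotype) per individual
    """
    xs, ys = [], []
    for y, (r1, r2) in zip(traits, rank_pairs):
        xs += [r1, r2]   # two rows per individual, one per haplotype
        ys += [y, y]     # the trait value is repeated on both rows
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or n <= 2:
        raise ValueError("ranks must vary and at least two individuals are needed")
    slope = sxy / sxx
    ss_reg = slope * sxy
    ss_res = syy - ss_reg
    f_stat = ss_reg / (ss_res / (n - 2)) if ss_res > 0 else float("inf")
    return slope, f_stat


slope, f_stat = rank_regression([1.0, 2.0, 4.0], [(1, 1), (2, 2), (3, 3)])
print(slope, f_stat)
```

Because only one slope is estimated, the numerator has a single degree of freedom regardless of how many distinct haplotypes are present.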
- An alternative is to perform a multiple regression, based on n observations instead of 2n, directly on the set of per-person expected haplotype frequencies. This is motivated by Equation (2), where the traits are regressed on the observed frequencies. If all elements in the matrix D in Equation (2) are divided by two, then they can be considered as probabilities for the individuals to have a particular allele. In the single-locus case, the identification of alleles may be certain, and so 0, 0.5, and 1 generally are the only values possible. In the case of E-M inferred haplotypes the corresponding model is:
- Equations (6)-(9) concern the behavior of the n, k and the p jk . Now consider the Yi under embodiment 3. Assume that are independent with
- Equation (13) MSE, (MSE, +o p (l)) by Equation (13), and the result will be proven.
- Equations (14), (15) and (16) need to be demonstrated.
- MSE-MSEi o p (l) + Y A (D'DrD A D A Y ⁇ n 2n
- Equation (15) is proven.
- Equation (16) uses Equation (18).
- Equation (16) follows by noting that n Y A converges in distribution and that the elements of B n converge in probability, and Equation (4) is finally proven.
- the haplotype analysis is generally performed over a sliding scan window, such as for polymorphisms from a whole genome scan, chromosome scan or a chromosomal region scan.
- the haplotype scan window is generally at least 1 kb or 10 kb, and is preferably 10 kb to 100 kb, 10 kb to 500 kb, 10 kb to 1000 kb, 10 kb to 10,000 kb, 500 kb to 10,000 kb or larger.
- the scan window comprises at least 2, 3, 4, 5, 10, 20 or more polymorphic regions (e.g. SNPs). Generally at least 2,
- the length of the scan window might be analysis-specific, and analysis with different window lengths can be used to fully explore the data.
- markers in high pair-wise linkage disequilibrium would provide similar information, and one of such markers might be excluded from consideration.
- the pair-wise disequilibrium might also provide some insight on the optimal length of the window, as there might be no reason to extend it over distances where linkage disequilibrium substantially drops. Windows containing larger numbers of markers might provide more precise information on the association with the response, but the estimation precision and high statistical power will require larger sample sizes.
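The sliding scan window can be sketched as follows, assuming markers are given as sorted base-pair coordinates. The function name `sliding_windows` and its parameters are hypothetical; the window length and minimum marker count would be chosen per analysis as discussed above.

```python
def sliding_windows(positions, window_bp, min_markers=2):
    """Return (first_index, last_index) marker windows, indices inclusive.

    positions   : sorted marker coordinates in base pairs
    window_bp   : maximum span of one window in base pairs
    min_markers : windows with fewer markers are skipped
    """
    windows = []
    end = 0
    for start in range(len(positions)):
        end = max(end, start)
        # Extend the window while the next marker still lies within window_bp.
        while (end + 1 < len(positions)
               and positions[end + 1] - positions[start] <= window_bp):
            end += 1
        if end - start + 1 >= min_markers:
            windows.append((start, end))
    return windows


# Five markers scanned with a 10 kb window.
windows = sliding_windows([0, 5000, 12000, 40000, 45000], window_bp=10000)
```

Running the same scan with several values of `window_bp` is one way to explore the data with different window lengths.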
- the statistical analysis which detects the correlation between the phenotype and deduced haplotypes determines whether an individual who has the haplotype is more likely to have the relevant phenotype. Generally this is done by determining whether there is a difference in the phenotype amongst people with and without the haplotype. In a case-control study the frequency of the haplotype in the cases is compared to the frequency of the haplotype in the controls. Typically a likelihood ratio test is used and the p-value for the association is obtained from the chi-square distribution.
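For the case-control comparison, a minimal sketch of a likelihood ratio test on a 2x2 table of haplotype counts is given below. It assumes a single haplotype tested against all others (one degree of freedom); the function name `haplotype_lrt` and the example counts are illustrative only.

```python
import math


def haplotype_lrt(case_with, case_without, control_with, control_without):
    """Likelihood ratio (G) statistic and chi-square p-value, 1 degree of freedom.

    Arguments are chromosome counts carrying / not carrying the haplotype.
    """
    observed = [[case_with, case_without], [control_with, control_without]]
    total = sum(sum(row) for row in observed)
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    g = 0.0
    for r in range(2):
        for c in range(2):
            o = observed[r][c]
            if o > 0:  # empty cells contribute nothing to the statistic
                e = row_totals[r] * col_totals[c] / total
                g += 2.0 * o * math.log(o / e)
    # For 1 degree of freedom, P(chi-square > g) = erfc(sqrt(g / 2)).
    p_value = math.erfc(math.sqrt(max(g, 0.0) / 2.0))
    return g, p_value


g_stat, p = haplotype_lrt(40, 60, 20, 80)
```

Testing several haplotypes jointly would need more degrees of freedom and a general chi-square distribution function, which this sketch does not cover.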
- NAT2 N-Acetyltransferase 2
- Five SNPs from the N-Acetyltransferase 2 (NAT2) gene on chromosome 8 (table 1) were genotyped for each of the 81 GlaxoSmithKline employees using PCR and direct sequencing. An 850-bp fragment of the NAT2 gene was amplified using primers F1 and R1 and subsequently sequenced using the initial PCR primers and two additional nested primers (F2 and R2) on an ABI 377 Sequencer (PE Applied
- F2 (nested forward primer): 5'-CACCTTCTCCTGCAGGTGACCA-3'
- R2 (nested reverse primer): 5'-TGTCAAGCAGAAAATGCAAGGC-3'
- Sequencher (Genecodes Corporation, Ann Arbor, USA) was used to analyse the sequences in order to generate genotype results for each of the 5 polymorphic sites.
- OLA-PCR was performed at 94°C for 2 minutes, then at 94°C for 30 seconds, 50°C for 30 seconds, 72°C for 1 minute for 40 cycles, and finally at 72°C for 5 minutes using an MJ thermal cycler (MJ Research INC, Watertown, Massachusetts, USA).
- the ligation reaction was run at 94°C for 20 seconds and 50°C for 1 minute for 30 cycles on an MJ thermal cycler.
- Ten µl reaction mix contained 3 µl of the lyophilised PCR products, 10 units of Taq DNA ligase (thermo-stable), 45 nM of each of the three probes, and 1X ligase buffer (20 mM Tris-HCl, 25 mM potassium acetate, 10 mM magnesium acetate, 10 mM DTT, 1 mM NAD, and 0.1% Triton X-100).
- the 850-bp PCR fragment of the NAT2 gene was cloned into a TA cloning vector (Invitrogen, Groningen, the Netherlands). Between six and twelve subclones from each of the 81 individuals were sequenced. These sequence data were analysed using Lasergene (DNASTAR Inc., Madison, USA) to resolve the haplotypes for both chromosomes of each individual. The haplotypes from the 5 SNPs on chromosome X were assigned directly according to the genotype data, as each individual male has only one X chromosome.
- haplotypes for each diploid for both genetic regions were assigned using the subtraction algorithm (Clark (1990) Mol. Biol. Evol. 7, 111-22). Briefly, haplotypes for individuals who were either complete homozygotes or single-site heterozygotes were assigned initially and a preliminary list of haplotypes present in the samples was recorded. Then other individuals who carried a copy of the previously recognised haplotypes were identified. Each time a resolved haplotype was identified as one of the possible alleles in an ambiguous individual, the homologue allele was considered to be a recognisable haplotype and added to the haplotype list.
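The subtraction procedure can be sketched as below. This is a simplified illustration, not the published implementation: genotypes are assumed to be coded per site as the number of copies (0, 1 or 2) of one allele, haplotypes as tuples of 0/1, and an ambiguous individual is resolved with the first recognised haplotype that fits. The name `clark_subtraction` is hypothetical.

```python
def clark_subtraction(genotypes):
    """Resolve haplotype phases by repeated subtraction of known haplotypes.

    Returns (resolved, known, unresolved): a dict of index -> haplotype pair,
    the list of recognised haplotypes, and the indices left ambiguous.
    """
    known = []       # recognised haplotypes, in order of discovery
    resolved = {}    # individual index -> (haplotype, haplotype)

    def add(hap):
        if hap not in known:
            known.append(hap)

    # Step 1: complete homozygotes and single-site heterozygotes are unambiguous.
    for idx, g in enumerate(genotypes):
        if sum(1 for x in g if x == 1) <= 1:
            h1 = tuple(x // 2 for x in g)
            h2 = tuple(x - a for x, a in zip(g, h1))
            resolved[idx] = (h1, h2)
            add(h1)
            add(h2)

    # Step 2: subtract a recognised haplotype from each ambiguous genotype;
    # a valid remainder is the homologue and joins the haplotype list.
    progress = True
    while progress:
        progress = False
        for idx, g in enumerate(genotypes):
            if idx in resolved:
                continue
            for hap in known:
                remainder = tuple(x - a for x, a in zip(g, hap))
                if all(r in (0, 1) for r in remainder):
                    resolved[idx] = (hap, remainder)
                    add(remainder)
                    progress = True
                    break

    unresolved = [i for i in range(len(genotypes)) if i not in resolved]
    return resolved, known, unresolved
```

Because the first fitting haplotype is taken, the result can depend on the order of individuals; individuals that match no recognised haplotype remain unresolved.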
- haplotype frequencies were calculated by gene counting using individuals with resolved haplotype phases.
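Gene counting over phase-resolved individuals amounts to tallying the two haplotypes of each individual and dividing by the number of chromosomes. A minimal sketch, with the hypothetical name `gene_counting`:

```python
from collections import Counter


def gene_counting(resolved_pairs):
    """Haplotype frequencies from a list of phase-resolved haplotype pairs."""
    counts = Counter()
    for h1, h2 in resolved_pairs:
        counts[h1] += 1
        counts[h2] += 1
    total = sum(counts.values())   # two chromosomes per individual
    return {hap: n / total for hap, n in counts.items()}


freqs = gene_counting([("H1", "H1"), ("H1", "H2")])
```

Individuals whose phase is unresolved contribute nothing, so the estimate rests on a reduced sample.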
- the sample haplotype frequencies and individual conditional haplotype probabilities for both genomic regions were also estimated using the EM algorithm with multiple restarts. All haplotype pairs that can yield an unphased genotype pattern were enumerated. The probability for each of the haplotype configurations was calculated using the estimated population haplotype frequencies. The haplotype phase was considered to be resolved if the probability of a haplotype pair was greater than 99%. For example, suppose one haplotype pair that generates the unphased pattern is i/j, where i and j represent two of the haplotypes with frequencies p(i) and p(j) as estimated by the EM algorithm. By the Bayes rule, the conditional probability that the unphased genotype G_ij has the haplotype pair i/j is $P(i/j \mid G_{ij}) = p(i)\,p(j) / \sum_{x/y} p(x)\,p(y)$
- where x and y indicate a haplotype pair that can yield the same unphased genotype, and the sum is taken over all such pairs, including i and j. If the conditional probability is less than 99%, the phase of that genotype pattern is considered unresolved.
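The enumeration and the Bayes-rule step can be sketched as follows, taking the probability of a pair x/y to be proportional to p(x)p(y) and normalising over all pairs compatible with the genotype. The genotype coding (copies 0, 1 or 2 of one allele per site) and the function names are assumptions made for illustration.

```python
from itertools import product


def compatible_pairs(genotype):
    """Unordered haplotype pairs that can yield the unphased genotype."""
    het_sites = [s for s, x in enumerate(genotype) if x == 1]
    base = [x // 2 for x in genotype]
    pairs = set()
    for alleles in product((0, 1), repeat=len(het_sites)):
        h1, h2 = list(base), list(base)
        for site, allele in zip(het_sites, alleles):
            h1[site], h2[site] = allele, 1 - allele
        pairs.add(tuple(sorted((tuple(h1), tuple(h2)))))
    return sorted(pairs)


def pair_posteriors(genotype, freqs):
    """Conditional probability of each compatible pair given the genotype."""
    weights = {(a, b): freqs.get(a, 0.0) * freqs.get(b, 0.0)
               for a, b in compatible_pairs(genotype)}
    total = sum(weights.values())
    if total == 0:
        return {}
    return {pair: w / total for pair, w in weights.items()}


def resolved_phase(genotype, freqs, threshold=0.99):
    """Return the most probable pair if it exceeds the threshold, else None."""
    posteriors = pair_posteriors(genotype, freqs)
    if not posteriors:
        return None
    best = max(posteriors, key=posteriors.get)
    return best if posteriors[best] > threshold else None
```

A genotype heterozygous at k sites has 2^(k-1) compatible pairs, so the enumeration stays small for the five-SNP regions considered here.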
- I_F the similarity index
- I_F varies between 0 and 1 (a value of 1 is achieved when the actual and estimated frequencies are identical).
- I_H compares the number of different haplotypes seen experimentally with the number of different haplotypes identified by the computer programs.
- a haplotype is defined as being detected if it has an estimated frequency of at least 1/(2n) in a population of n individuals (Excoffier and Slatkin, supra).
- m_true is the number of haplotypes determined experimentally
- m_est is the number of estimated haplotypes with frequency above the threshold
- m_missed is the number of haplotypes identified experimentally but not computationally.
- the value of I_H can vary between one (when the computationally identified haplotypes are exactly the same as those determined experimentally) and zero (when none of the true haplotypes are identified computationally).
- MSE mean squared error
- $\mathrm{MSE} = \sum_k (p_{ek} - p_{tk})^2 / h$
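The three accuracy measures can be sketched together. The formulas used for the two similarity indices, 1 minus half the sum of absolute frequency differences and 2(m_true - m_missed)/(m_true + m_est), follow the usual definitions attributed to Excoffier and Slatkin and are an assumption here, since the text above gives only their properties. Taking h as the number of haplotypes in the union of the true and estimated sets is likewise an assumption.

```python
def similarity_index_f(true_freqs, est_freqs):
    """Frequency similarity: 1 when true and estimated frequencies coincide."""
    haps = set(true_freqs) | set(est_freqs)
    diff = sum(abs(est_freqs.get(h, 0.0) - true_freqs.get(h, 0.0)) for h in haps)
    return 1.0 - 0.5 * diff


def similarity_index_h(true_freqs, est_freqs, n_individuals):
    """Haplotype identification similarity, with detection threshold 1/(2n)."""
    threshold = 1.0 / (2 * n_individuals)
    true_haps = {h for h, f in true_freqs.items() if f > 0}
    est_haps = {h for h, f in est_freqs.items() if f >= threshold}
    m_true, m_est = len(true_haps), len(est_haps)
    m_missed = len(true_haps - est_haps)
    return 2.0 * (m_true - m_missed) / (m_true + m_est)


def mean_squared_error(true_freqs, est_freqs):
    """Mean squared difference between estimated and true frequencies."""
    haps = set(true_freqs) | set(est_freqs)
    return sum((est_freqs.get(h, 0.0) - true_freqs.get(h, 0.0)) ** 2
               for h in haps) / len(haps)
```

With identical true and estimated frequencies both indices equal 1 and the mean squared error is 0, matching the limiting cases described above.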
- LD was measured using the standardised D' first proposed by Lewontin (Lewontin (1964) Genetics 49, 49-67).
- D' is the LD relative to its maximum value for a given set of allelic frequencies for the pair of sites. It is calculated by dividing the raw D value by the absolute maximal value possible. In that sense, D' is a normalised value of LD.
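A minimal sketch of the D' calculation for a pair of biallelic sites is given below, using the standard bounds on D for given allele frequencies. The function name `d_prime` and its argument names are illustrative; the sign of D is kept, so the absolute value gives the strength of LD.

```python
def d_prime(p_ab, p_a, p_b):
    """Standardised linkage disequilibrium D' between two biallelic sites.

    p_ab : frequency of the haplotype carrying allele A and allele B
    p_a  : frequency of allele A at the first site
    p_b  : frequency of allele B at the second site
    """
    d = p_ab - p_a * p_b                      # raw disequilibrium D
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    if d_max == 0:                            # a monomorphic site carries no LD
        return 0.0
    return d / d_max
```

For example, with both allele frequencies at 0.5, a haplotype frequency of 0.5 gives D' = 1 and a haplotype frequency of 0.25 gives D' = 0.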
- Figure 1 shows the distribution of the 5 SNPs utilised in this study over the NAT2 locus.
- the 850-bp fragment of the NAT2 gene was amplified using primers F1 and R1 and the genotypes of each individual for the 5 SNPs were determined by direct sequencing of the PCR products in both directions.
- the minor allele frequencies of the 5 polymorphisms determined in the 81 individuals ranged from 0.25 to 0.49 (table 1), which were similar to those reported for Caucasians (Agundez et al (1996) Pharmacogenetics 6, 423-8).
- the genotype distribution for each SNP did not deviate significantly from Hardy- Weinberg equilibrium.
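The text does not state which test of Hardy-Weinberg equilibrium was used. One common choice, shown here as an assumption, is a chi-square goodness-of-fit test with one degree of freedom on the three genotype counts of a biallelic SNP. The name `hwe_chi_square` is hypothetical.

```python
import math


def hwe_chi_square(n_aa, n_ab, n_bb):
    """Chi-square statistic and p-value (1 df) for Hardy-Weinberg proportions."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)           # frequency of allele A
    q = 1.0 - p
    expected = [n * p * p, 2 * n * p * q, n * q * q]
    observed = [n_aa, n_ab, n_bb]
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # For 1 degree of freedom, P(chi-square > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


chi2, p_value = hwe_chi_square(25, 50, 25)
```

With small samples or rare alleles an exact test is usually preferred over the chi-square approximation.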
- the 850-bp PCR fragment was cloned into a TA cloning vector and between six and twelve subclones from each individual were analysed by PCR and sequencing.
- in the absence of recombination and of repeated or back mutations, the maximum number of haplotypes for a locus with 5 biallelic variable sites is 6 (i.e. n+1), with n being the number of SNP sites.
- the maximum number of potential haplotypes for a locus with 5 SNPs is 32 (2^5).
- SNP pairs (table 4). Each of the eight SNP pairs created only three haplotypes. The remaining two SNP pairs created all four possible haplotypes, with three haplotypes accounting for 98-99% of the alleles.
- haplotypes from the genotyping results for the 81 individuals using the subtraction algorithm (Clark (1990), supra). Thirty-one individuals were either homozygous for all SNP sites or heterozygous at only one SNP site; thus their haplotypes could be assigned directly. Six haplotypes were identified in these 31 individuals (table 3, H1-H6). Eight, eighteen, three and twenty-one individuals were heterozygous at two, three, four and all five SNP sites, respectively. Using the subtraction method, we resolved the haplotype phases for 64 individuals (79%). There was 100% concordance between experimentally determined haplotype phases and those predicted computationally for the individuals.
- the remaining 17 individuals were heterozygous at the same three SNP sites and each had two possible haplotype configurations.
- the haplotype frequencies were calculated from the 64 phase-resolved individuals (table 3).
- the similarity index (I_F) value was 0.91, which was close to its maximal value, suggesting that the subtraction method was effective in estimating haplotype frequencies for this region.
- the overall error (MSE value) between estimated and true sample frequencies was 1-2 orders of magnitude greater than that reported by Fallin and Schork using the EM algorithm (Fallin and Schork (2000), supra), probably because the subtraction method used a reduced number of individuals in haplotype frequency estimation (table 3).
- haplotype frequencies using the EM algorithm with 100 restarts to minimize chances of local convergence.
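The EM estimation with random restarts can be sketched as follows. This is a generic illustration assuming Hardy-Weinberg proportions and genotypes coded per site as copies (0, 1 or 2) of one allele; it is not the program used in the study, and the names and default parameters are hypothetical.

```python
import math
import random
from itertools import product


def compatible_pairs(genotype):
    """Unordered haplotype pairs that can yield the unphased genotype."""
    het_sites = [s for s, x in enumerate(genotype) if x == 1]
    base = [x // 2 for x in genotype]
    pairs = set()
    for alleles in product((0, 1), repeat=len(het_sites)):
        h1, h2 = list(base), list(base)
        for site, allele in zip(het_sites, alleles):
            h1[site], h2[site] = allele, 1 - allele
        pairs.add(tuple(sorted((tuple(h1), tuple(h2)))))
    return sorted(pairs)


def em_haplotype_frequencies(genotypes, n_restarts=100, n_iter=200, seed=0):
    """Haplotype frequencies maximising the likelihood over random restarts."""
    rng = random.Random(seed)
    pair_lists = [compatible_pairs(g) for g in genotypes]
    haps = sorted({h for pairs in pair_lists for pair in pairs for h in pair})
    best_loglik, best_freqs = -math.inf, None
    for _ in range(n_restarts):
        raw = [rng.random() + 1e-6 for _ in haps]   # random starting point
        norm = sum(raw)
        freqs = {h: r / norm for h, r in zip(haps, raw)}
        loglik = -math.inf
        for _ in range(n_iter):
            counts = dict.fromkeys(haps, 0.0)
            loglik = 0.0
            for pairs in pair_lists:
                # E-step: weight each compatible pair by its HWE probability.
                weights = [(1 if a == b else 2) * freqs[a] * freqs[b]
                           for a, b in pairs]
                total = sum(weights)
                loglik += math.log(total)
                for (a, b), w in zip(pairs, weights):
                    counts[a] += w / total
                    counts[b] += w / total
            # M-step: expected haplotype counts become the new frequencies.
            freqs = {h: c / (2 * len(genotypes)) for h, c in counts.items()}
        # loglik is evaluated at the frequencies entering the last iteration.
        if loglik > best_loglik:
            best_loglik, best_freqs = loglik, freqs
    return best_freqs
```

Keeping the restart with the highest log-likelihood is what guards against a run that stalls at a local maximum.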
- the algorithm predicted a total of 7 haplotypes with 3 main haplotypes (H1-H3) representing 93% of all alleles (table 3).
- Comparison of the haplotype frequencies determined molecularly and those estimated using the EM algorithm showed very high concordance (table 3).
- the I_F value was 0.999 and the MSE value was 4 orders of magnitude smaller than that obtained using the subtraction method.
- the EM algorithm predicted haplotype phases for all of the 81 individuals.
- Figure 2 shows the distribution of the 5 SNPs over a region of 140-kb on chromosome X.
- the 5 SNPs for the 154 males were genotyped using OLA.
- the minor allele frequencies of the 5 SNPs ranged from 0.06 to 0.40 (figure 2). Pair-wise linkage disequilibrium was measured using D' (table 5).
- haplotypes of the 5 markers for the 154 males were assigned directly according to the genotype data, as each individual male has only one X chromosome.
- the five polymorphisms established 21 out of the 32 (2^5) potential haplotypes (table 6).
- Six of the haplotypes (h16-h21) were observed only once and four haplotypes (h12-h15) were seen only twice.
- These ten rare haplotypes (h12-h21) represented 9% of all the 154 alleles.
- Six haplotypes (h1-h6) had allele frequencies above 5%, representing 75% of the 154 alleles. Of these six haplotypes, four had allele frequencies above 10% (h1-h4), accounting for 57% of all alleles.
- the haplotype frequencies were estimated using the 43 phase-resolved diploids by the subtraction method (table 6).
- the combined haplotype frequency for the six haplotypes (h1-h6) that had true allele frequencies greater than 5% was 0.81, which was higher than that determined molecularly (0.75).
- the I_F value of the subtraction method was lower for this region than that for the NAT2 region (tables 3 and 6).
- haplotype frequencies for the 77 artificially generated diploids using the EM algorithm with 100 restarts.
- a total of 26 haplotypes were predicted, including the 21 molecularly determined haplotypes and 5 additional rare haplotypes which accounted for 0.1% of all the alleles (table 6, data not shown).
- Sixteen haplotypes predicted by the EM algorithm had allele frequencies greater than 0.5%.
- haplotypes predicted computationally represented 76% of the 154 alleles, which is similar to that determined molecularly (75%).
- the EM algorithm performed marginally better than the subtraction method in estimating haplotype frequencies for this locus (table 6).
- the reduced I_F value and increased MSE value for this locus, in comparison with those observed for the NAT2 locus, suggested that the estimation error for overall haplotype frequencies using the EM algorithm increased with decreasing LD between SNP sites (table 3, table 6).
- haplotype phases for 49 out of the 77 diploids were considered to be resolved. More than one possible haplotype configuration was present for the remaining 28 diploids (36%), and the haplotype phases for these diploids remained unresolved (table 7).
- the EM algorithm predicted haplotypes were in agreement with those experimentally determined for all of the 38 diploids, which were either complete homozygotes or single-site heterozygotes. For the 11 diploids that were multiple-site heterozygotes, the haplotype phases assigned by the EM algorithm were in agreement with those experimentally determined for only 5 diploids. The overall accuracy for predicting haplotype phases using the EM algorithm was 88% for all of the diploids in this region. Thus, the EM algorithm also performed poorly in predicting individual haplotype phases for genetic regions with low LD.
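The reported accuracy can be checked from the counts given above: 38 unambiguous diploids plus 11 multiple-site heterozygotes make up the 49 phase-resolved diploids, of which 38 + 5 were assigned correctly. The variable names below are illustrative.

```python
unambiguous_correct = 38    # complete homozygotes and single-site heterozygotes
multi_het_total = 11        # phase-resolved multiple-site heterozygotes
multi_het_correct = 5       # of these, assigned in agreement with experiment

resolved_total = unambiguous_correct + multi_het_total            # 49 diploids
overall_accuracy = (unambiguous_correct + multi_het_correct) / resolved_total
multi_het_accuracy = multi_het_correct / multi_het_total

print(f"overall: {overall_accuracy:.1%}, multi-site heterozygotes: {multi_het_accuracy:.1%}")
```

The overall figure rounds to the 88% quoted above, while fewer than half of the multiple-site heterozygotes were phased correctly, which is where the low LD takes its toll.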
- To make a more comprehensive comparison between the haplotype subtraction method and the EM method in haplotype frequency estimation, we performed a simple computer simulation to assess the accuracy of both algorithms when haplotype frequencies are uniform. The amount of heterozygosity, and therefore the number of ambiguous haplotypes, was increased by equalising haplotype frequencies, thereby presenting more of a challenge to the computational algorithms.
- haplotype frequency estimation Fallin and Schork 2000, supra.
- Fallin and Schork demonstrated that the EM algorithm performed very well under a wide range of population and data set scenarios.
- haplotype frequencies can be estimated from genotype data computationally without additional laboratory cost, and the estimation error was increased with decreased LD.
- Sequential haplotype scanning may be able to provide a richly detailed view of specific genomic fragments and reveal the inter-relationships between SNPs surrounding the regions, thus offering an additional method for identifying genomic fragments that harbour the variants causing the phenotype. If the haplotype information is derived from genotype data using computational methods, it needs to be noted that the accuracy of such haplotype information is decreased with decreased LD and increased ambiguity between markers.
- For a locus with n biallelic variable sites, the maximum number of haplotypes is n+1 in the absence of recombination, repeated or back mutations; whereas the potential number of haplotypes could reach 2^n if there is linkage equilibrium between polymorphic sites.
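As a small worked example of these two bounds, the hypothetical helper below returns both values for a given number of biallelic sites.

```python
def haplotype_bounds(n_sites):
    """Return (n + 1, 2 ** n) for a locus with n biallelic variable sites.

    n + 1  : maximum without recombination, repeated or back mutation
    2 ** n : number of potential haplotypes under linkage equilibrium
    """
    return n_sites + 1, 2 ** n_sites


no_recombination_max, potential_max = haplotype_bounds(5)
```

For the five-SNP loci studied here this gives 6 and 32; the 21 haplotypes observed in the chromosome X region exceed the first bound, consistent with the lower LD reported for that region.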
- Haplotype | Haplotype (a) | Experimentally determined frequency | Estimated frequency (EM algorithm) | Estimated frequency (subtraction algorithm)
- the haplotype phases were resolved for 64/81 individuals according to the subtraction method (Clark 1990). The haplotype frequencies were calculated from the
- haplotype composition 11211/12221 Four individuals had haplotype composition 11211/12221, and two individuals had haplotype composition 12211/11221. b One individual had haplotype composition 12111/22112, and one individual had haplotype composition 22111/12112.
- Phase-resolved Accuracy Phase-resolved Accuracy individuals (%) diploids (%)
Landscapes
- Bioinformatics & Cheminformatics (AREA)
- Health & Medical Sciences (AREA)
- Life Sciences & Earth Sciences (AREA)
- Physics & Mathematics (AREA)
- Engineering & Computer Science (AREA)
- Genetics & Genomics (AREA)
- Biotechnology (AREA)
- Biophysics (AREA)
- Chemical & Material Sciences (AREA)
- Molecular Biology (AREA)
- Proteomics, Peptides & Aminoacids (AREA)
- Bioinformatics & Computational Biology (AREA)
- Analytical Chemistry (AREA)
- Evolutionary Biology (AREA)
- General Health & Medical Sciences (AREA)
- Medical Informatics (AREA)
- Spectroscopy & Molecular Physics (AREA)
- Theoretical Computer Science (AREA)
- Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
AU2001284265A AU2001284265A1 (en) | 2000-09-04 | 2001-09-04 | Genetic study |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
GB0021667.1 | 2000-09-04 | ||
GBGB0021667.1A GB0021667D0 (en) | 2000-09-04 | 2000-09-04 | Genetic study |
Publications (2)
Publication Number | Publication Date |
---|---|
WO2002020835A2 true WO2002020835A2 (fr) | 2002-03-14 |
WO2002020835A3 WO2002020835A3 (fr) | 2003-10-09 |
Family
ID=9898796
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/GB2001/003970 WO2002020835A2 (fr) | 2000-09-04 | 2001-09-04 | Genetic study |
Country Status (3)
Country | Link |
---|---|
AU (1) | AU2001284265A1 (fr) |
GB (1) | GB0021667D0 (fr) |
WO (1) | WO2002020835A2 (fr) |
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2002031188A2 (fr) * | 2000-10-11 | 2002-04-18 | Genprofile Ag | Method and device for predicting haplotypes |
WO2002035442A2 (fr) * | 2000-10-23 | 2002-05-02 | Glaxo Group Limited | Composite haplotype counts for multiple loci and alleles and association tests with continuous or discrete phenotypes |
EP1508623A1 (fr) * | 2002-01-15 | 2005-02-23 | Genesys Technologies, Inc. | Method for specifying a single nucleotide polymorphism (SNP) |
US7107155B2 (en) | 2001-12-03 | 2006-09-12 | Dnaprint Genomics, Inc. | Methods for the identification of genetic features for complex genetics classifiers |
EP1864235A2 (fr) * | 2005-03-31 | 2007-12-12 | Mizhuo Information & Research Institute, Inc. | System, method and program for statistical genetic analysis |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO1997040462A2 (fr) * | 1996-04-19 | 1997-10-30 | Spectra Biomedical, Inc. | Polymorphic forms correlated with multiple phenotypes |
WO1999054500A2 (fr) * | 1998-04-21 | 1999-10-28 | Genset | Biallelic markers suitable for constructing a high-density disequilibrium map of the human genome |
EP0955382A2 (fr) * | 1998-05-07 | 1999-11-10 | Affymetrix, Inc. (a California Corporation) | Polymorphisms associated with hypertension |
WO2000051053A1 (fr) * | 1999-02-26 | 2000-08-31 | Gemini Genomics (Uk) Limited | Clinical and diagnostic database |
WO2001091026A2 (fr) * | 2000-05-25 | 2001-11-29 | Genset S.A. | Methods of genetic analysis by means of DNA markers that use estimated haplotype frequencies, and uses of these methods |
-
2000
- 2000-09-04 GB GBGB0021667.1A patent/GB0021667D0/en not_active Ceased
-
2001
- 2001-09-04 WO PCT/GB2001/003970 patent/WO2002020835A2/fr active Application Filing
- 2001-09-04 AU AU2001284265A patent/AU2001284265A1/en not_active Abandoned
Non-Patent Citations (5)
Title |
---|
CLARK A G ET AL: "HAPLOTYPE STRUCTURE AND POPULATION GENETIC INFERENCES FROM NUCLEOTIDE-SEQUENCE VARIATION IN HUMAN LIPOPROTEIN LIPASE" AMERICAN JOURNAL OF HUMAN GENETICS, UNIVERSITY OF CHICAGO PRESS, CHICAGO,, US, vol. 63, 1998, pages 595-612, XP002944466 ISSN: 0002-9297 * |
EXCOFFIER L SLATKIN M: "Maximum-likelihood estimation of molecular haplotype frequencies in a diploid population" MOLECULAR BIOLOGY AND EVOLUTION, THE UNIVERSITY OF CHICAGO PRESS, US, vol. 12, 1995, pages 921-927, XP002953528 ISSN: 0737-4038 cited in the application * |
FALLIN D AND SCHORK N J: "Accuracy of Haplotype Frequency Estimation for Biallelic Loci, via the Expectation-Maximization Algorithm for Unphased Diploid Genotype Data" AMERICAN JOURNAL OF HUMAN GENETICS, AMERICAN SOCIETY OF HUMAN GENETICS, vol. 67, 22 August 2000 (2000-08-22), pages 947-959, XP002953525 ISSN: 0002-9297 cited in the application * |
HAWLEY M E ET AL: "HAPLO: A PROGRAM USING THE EM ALGORITHM TO ESTIMATE THE FREQUENCIESOF MULTI-SITE HAPLOTYPES" JOURNAL OF HEREDITY, OXFORD UNIVERSITY PRESS, CARY, GB, vol. 86, no. 5, 1995, pages 409-411, XP000891489 ISSN: 0022-1503 * |
LONG A D ET AL: "The Power of Association Studies to Detect the Contribution of Candidate Genetic Loci to Variation in Complex Traits" GENOME RESEARCH, COLD SPRING HARBOR LABORATORY PRESS, US, vol. 9, August 1999 (1999-08), pages 720-731, XP002222375 ISSN: 1088-9051 * |
Cited By (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2002031188A2 (fr) * | 2000-10-11 | 2002-04-18 | Genprofile Ag | Method and device for predicting haplotypes |
WO2002031188A3 (fr) * | 2000-10-11 | 2003-10-09 | Genprofile Ag | Method and device for predicting haplotypes |
WO2002035442A2 (fr) * | 2000-10-23 | 2002-05-02 | Glaxo Group Limited | Composite haplotype counts for multiple loci and alleles and association tests with continuous or discrete phenotypes |
WO2002035442A3 (fr) * | 2000-10-23 | 2003-07-31 | Glaxo Group Ltd | Composite haplotype counts for multiple loci and alleles and association tests with continuous or discrete phenotypes |
US7107155B2 (en) | 2001-12-03 | 2006-09-12 | Dnaprint Genomics, Inc. | Methods for the identification of genetic features for complex genetics classifiers |
EP1508623A1 (fr) * | 2002-01-15 | 2005-02-23 | Genesys Technologies, Inc. | Method for specifying a single nucleotide polymorphism (SNP) |
EP1508623A4 (fr) * | 2002-01-15 | 2006-10-04 | Genesys Technologies Inc | Method for specifying a single nucleotide polymorphism (SNP) |
EP1864235A2 (fr) * | 2005-03-31 | 2007-12-12 | Mizhuo Information & Research Institute, Inc. | System, method and program for statistical genetic analysis |
Also Published As
Publication number | Publication date |
---|---|
GB0021667D0 (en) | 2000-10-18 |
AU2001284265A1 (en) | 2002-03-22 |
WO2002020835A3 (fr) | 2003-10-09 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Xu et al. | Effectiveness of computational methods in haplotype prediction | |
Glusman et al. | Whole-genome haplotyping approaches and genomic medicine | |
Leal | Detection of genotyping errors and pseudo‐SNPs via deviations from Hardy‐Weinberg equilibrium | |
Gertz et al. | Analysis of DNA methylation in a three-generation family reveals widespread genetic influence on epigenetic regulation | |
Mueller | Linkage disequilibrium for different scales and applications | |
Rannala | Finding genes influencing susceptibility to complex diseases in the post-genome era | |
EP1129216B1 (fr) | Methods, software and apparatus for identifying genomic regions harbouring a gene associated with a detectable trait | |
US8140270B2 (en) | Methods and systems for medical sequencing analysis | |
Long et al. | Low base-substitution mutation rate in the germline genome of the ciliate Tetrahymena thermophila | |
Mohlke et al. | Linkage disequilibrium between microsatellite markers extends beyond 1 cM on chromosome 20 in Finns | |
Calaway et al. | Genetic architecture of skewed X inactivation in the laboratory mouse | |
Hinrichs et al. | Population stratification and patterns of linkage disequilibrium | |
Carlson et al. | MIPSTR: a method for multiplex genotyping of germline and somatic STR variation across many individuals | |
Aguado et al. | Validation and genotyping of multiple human polymorphic inversions mediated by inverted repeats reveals a high degree of recurrence | |
Lynn et al. | Patterns of meiotic recombination on the long arm of human chromosome 21 | |
Sabbagh et al. | Inferring haplotypes at the NAT2 locus: the computational approach | |
Elahi et al. | Global genetic analysis | |
Zapata et al. | Spectrum of nonrandom associations between microsatellite loci on human chromosome 11p15 | |
WO2002020835A2 (fr) | Genetic study | |
McRae et al. | Power and SNP tagging in whole mitochondrial genome association studies | |
Langefeld et al. | Association methods in human genetics | |
Song et al. | Resolving the recombination pattern of 38 X-STRs from Chinese Han three-generation pedigrees | |
Fukuta et al. | A simple method for calculating the likelihood ratio in a kinship test using X-chromosomal markers incorporating linkage, linkage disequilibrium, and mutation | |
Collins | Linkage disequilibrium and association mapping: an introduction | |
Forabosco et al. | Statistical tools for linkage analysis and genetic association studies |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
AK | Designated states |
Kind code of ref document: A2 Designated state(s): AE AG AL AM AT AU AZ BA BB BG BR BY BZ CA CH CN CO CR CU CZ DE DK DM DZ EC EE ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX MZ NO NZ PH PL PT RO RU SD SE SG SI SK SL TJ TM TR TT TZ UA UG US UZ VN YU ZA ZW |
|
AL | Designated countries for regional patents |
Kind code of ref document: A2 Designated state(s): GH GM KE LS MW MZ SD SL SZ TZ UG ZW AM AZ BY KG KZ MD RU TJ TM AT BE CH CY DE DK ES FI FR GB GR IE IT LU MC NL PT SE TR BF BJ CF CG CI CM GA GN GQ GW ML MR NE SN TD TG |
|
121 | Ep: the epo has been informed by wipo that ep was designated in this application | ||
DFPE | Request for preliminary examination filed prior to expiration of 19th month from priority date (pct application filed before 20040101) | ||
REG | Reference to national code |
Ref country code: DE Ref legal event code: 8642 |
|
122 | Ep: pct application non-entry in european phase | ||
NENP | Non-entry into the national phase |
Ref country code: JP |