WO2008007424A1 - Genome analysis system, genome analysis method and program - Google Patents

Genome analysis system, genome analysis method and program

Info

Publication number
WO2008007424A1
Authority
WO
WIPO (PCT)
Prior art keywords
state variable
population
equation
update
genome analysis
Application number
PCT/JP2006/313757
Other languages
English (en)
Japanese (ja)
Inventor
Junji Tanaka
Masato Inoue
Original Assignee
Digital Information Technologies Corporation
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2006-07-11
Filing date
2006-07-11
Publication date
2008-01-17
Application filed by Digital Information Technologies Corporation
Priority to PCT/JP2006/313757
Publication of WO2008007424A1

Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B20/00 ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations
    • G16B20/20 Allele or variant detection, e.g. single nucleotide polymorphism [SNP] detection

Definitions

  • Genome analysis system, genome analysis method, and program
  • The present invention relates to a genome analysis system, an analysis method, and a program for estimating, from sample data, the characteristics of a population and/or the position of each specimen in the population.
  • The genome is the set of chromosomes indispensable for carrying out life activities.
  • The word "genome" is a compound of "gene" and "chromosome".
  • The basic unit of life is the cell. Each cell is enclosed by a cell membrane, and its nucleus by a nuclear membrane, which maintains the independence of each unit.
  • Human cells comprise specialized, differentiated cell groups with distinct functions and forms, such as nerve cells, muscle cells, blood cells, immune-system cells, epithelial cells (the cells on the surface of skin and tissues), and sensory cells, together with undifferentiated cells called stem cells. An important time-varying aspect of cells is division, by which new cells are made. Cell division is an important mechanism that enables the transmission and expression of genetic information.
  • The nucleus contains chromosomes. Chromosomes carry genetic information, and the genes are contained in them. Genes mainly define how proteins are made.
  • The basic substance of a chromosome is DNA (deoxyribonucleic acid), and genetic information is stored in the sequence of its four bases: A, T, G, and C.
  • Haploid organisms such as bacteria and viruses have a single genome.
  • Diploid organisms have two sets of genomes carrying overlapping genetic information. For example, human germ cells such as eggs and sperm have one genome set of 23 chromosomes, while somatic cells have two sets (46 chromosomes). The human genome consists of about 3 billion DNA base pairs (3,000 megabase pairs, where 1 megabase is 1 million base pairs); stretched out as a single strand, it is about 1 meter long.
  • The genome is the totality of genetic information in a cell, including the information that controls genes and their expression.
  • If proteins are the products and genes the blueprints, the genome also contains, in addition to the blueprints, regions that regulate the production of the products.
  • There are also regions whose significance is unknown but which appear to influence the maintenance of biological functions. Clarifying these regions is expected to enable a more accurate understanding of life phenomena.
  • Genome analysis is a comprehensive analysis of the genetic information in an organism's genome. It begins with determining the base sequences (the order of G, A, T, and C) of the DNA molecules that make up the genome.
  • The human genome is the nucleotide sequence of about 3 billion DNA base pairs contained in 24 distinct chromosomes (that is, DNA molecules): 22 autosomes, the X chromosome, and the Y chromosome.
  • The genome information we carry is inherited from our parents, whose genome information was in turn inherited from the previous generation. By tracing the origin of genetic information back one generation at a time, one can reach the genome of the first organism, about 3.8 billion years ago.
  • As one genome analysis method, genome sequence information is input, and wherever the input sequence contains a run of a given number (for example, 10) or more identical consecutive bases, the base sequence consisting of that run together with a predetermined number of bases before and after it is extracted, and the extracted base sequence information is output.
  • With this method, polymorphic markers for identifying disease-related candidate genes can be found quickly and efficiently, with accuracy close to that of SNPs (single nucleotide polymorphisms), without using SNPs.
  • Patent Document 1 is a genome analysis method that attempts to find polymorphic markers for identifying disease-related candidate genes. However, the DNA base sequence must be analyzed from many different viewpoints and much of it remains unelucidated, so a variety of genome analysis methods are expected to emerge and contribute to its elucidation.
  • Patent Document 1: Japanese Patent Laid-Open No. 2003-288346
  • The present invention has been made in view of this situation to solve the above problems. Its object is to provide a genome analysis system and analysis method that can estimate, from sample data, the characteristics of a population and/or the position of each sample in that population.
  • To solve the above problems, the present invention has the following configurations.
  • The gist of the invention of claim 1 is a genome analysis system in which sample data are taken in;
  • two state variables, a first state variable and a second state variable, are selected, each being either a state variable characterizing the population to which the sample data belong or a state variable representing the position of each sample of the sample data in that population; and the system comprises convergence means for converging the first and second state variables to desired values, and characteristic estimation means for estimating the characteristics of the population and/or the position of each sample in the population.
  • The gist of the invention of claim 2 is a genome analysis system comprising capture means for taking in sample data and calculation means,
  • wherein the calculation means selects a first state variable and a second state variable, each being either a state variable characterizing the population to which the sample data captured by the capture means belong or a state variable representing the position of each sample of the sample data in that population,
  • and the system estimates the characteristics of the population and/or the position of each specimen in the population.
  • The gist of the invention of claim 3 is the genome analysis system of claim 1 or 2, further comprising conversion means for mutually converting the first and second state variables using update expressions in which each variable is expressed by the other, with genetic (statistical) knowledge embedded as an operator, and estimation means for estimating the first and second state variables via a third state variable embedded in the update expressions adapted to each of them.
  • The gist of the invention of claim 4 is that the first state variable is the origin-population membership degree of each sample of the sample data, and the second state variable is the origin-population haplotype frequency of the sample data.
  • The gist of the invention of claim 5 is the genome analysis system according to any one of claims 1 to 4, in which the third state variable is the diplotype of each sample of the sample data and its frequency.
  • The gist of the invention of claim 6 is a genome analysis system in which the first-state-variable update expression, i.e., the update expression adapted to the first state variable, is given by expression (1).
  • The gist of the invention of claim 7 is a genome analysis system in which the second-state-variable update expression, i.e., the update expression adapted to the second state variable, is given by expression (2).
  • The gist of the invention of claim 8 is a genome analysis system in which the second-state-variable update expression, i.e., the update expression adapted to the second state variable, is given by expression (3).
  • The gist of the invention of claim 9 is the genome analysis system according to any one of claims 1 to 8, further comprising K-optimum-solution deriving means for obtaining the optimum number K of origin populations by expression (4).
  • The gist of the invention of claim 10 is that the system further comprises K-optimum-solution deriving means for obtaining the optimum number K of origin populations by expression (5).
  • The gist of the invention of claim 11 is a genome analysis system in which the update equation for updating the first and second state variables is given by equation (6).
  • The gist of the invention of claim 12 is a genome analysis system comprising determining means for determining the genetic polymorphism to be investigated;
  • wet-process means for determining or estimating each individual's haplotype from allele information determined by a wet process for that genetic polymorphism in the population to be investigated;
  • feature-parameter determining means for determining two feature parameters, namely a feature parameter characterizing the group and/or a feature parameter indicating positioning within the population;
  • update-formula construction means for constructing, from genetic information, an update formula between the two feature parameters;
  • feature-parameter deriving means for obtaining the two feature parameters in turn by the update formula; and
  • conversion-convergence means for repeating the conversion until the two feature parameters converge, so that, by obtaining the two feature parameters, the characteristics of the population and/or the position of each individual in it are estimated.
  • The gist of the invention of claim 13 is a genome analysis method comprising an acquisition step of acquiring sample data, together with steps involving a state variable characterizing the population to which the sample data belong and/or a state variable indicating the position of each specimen in the population.
  • The gist of the invention of claim 14 is the genome analysis method of claim 13, further comprising a conversion step of mutually converting the first and second state variables using update expressions in which each variable is expressed by the other, with genetic (statistical) knowledge embedded as an operator, and an estimation step of estimating the first and second state variables using the third state variable embedded in the update equations.
  • The gist of the invention of claim 15 is that the first state variable is the origin-population membership degree of each sample of the sample data, and the second state variable is the origin-population haplotype frequency of the sample data.
  • The gist of the invention of claim 16 is that the third state variable is the diplotype of each sample of the sample data and its frequency.
  • The gist of the invention of claim 17 is a genome analysis method in which the first-state-variable update expression, i.e., the update expression adapted to the first state variable, is given by expression (1).
  • The gist of the invention of claim 18 is a genome analysis method in which the second-state-variable update expression, i.e., the update expression adapted to the second state variable, is given by expression (2).
  • The gist of the invention of claim 19 is a genome analysis method in which the second-state-variable update expression, i.e., the update expression adapted to the second state variable, is given by expression (3).
  • The gist of the invention of claim 20 is a genome analysis method further including a K-optimum-solution derivation step for obtaining the optimum number K of origin populations by equation (4).
  • The gist of the invention of claim 21 is a K-optimum-solution derivation step for obtaining the optimum number K of origin populations by equation (5).
  • The gist of the invention of claim 22 is a genome analysis method in which the update equation for updating the first and second state variables is given by equation (6).
  • The gist of the invention of claim 23 is a genome analysis method including a determination step of determining the genetic polymorphism to be investigated and a wet-process step of determining allele information by a wet process for that genetic polymorphism in the population to be investigated.
  • The gist of the invention of claim 24 is a program capable of executing the genome analysis method according to any one of claims 13 to 23.
  • The genome analysis system of the present invention uses multilocus genotype data and haplotype data to determine state variables representing the characteristics of the population and the position of each sample in it, for example the origin-population membership of each sample and the haplotype frequencies of each origin population, much faster than conventional methods.
  • FIG. 1 is a diagram explaining the outline of the genome analysis system using the genome analysis method of the present invention.
  • FIG. 2 is a block diagram of the genome analysis system of the present invention.
  • FIG. 3 is a diagram explaining the outline of analysis by the genome analysis system of FIG. 1.
  • FIG. 4 is a flowchart showing the genome analysis method of the present invention.
  • The genome analysis system 1 uses the sample data to estimate the characteristics of each population and the position of each sample in the population, and it outputs the analysis result.
  • The sample data are sampled from a population and consist of genomic information, broadly represented by genetic polymorphisms.
  • A notebook computer, desktop computer, or similar machine on which an analysis program for the genome analysis calculations described later is installed can serve as the genome analysis system 1.
  • As shown in FIG. 2, the genome analysis system of the present invention comprises determination means, wet-process means, capture means, calculation means, selection means, feature-parameter determination means, convergence means, conversion means, update-formula construction means, feature-parameter deriving means, conversion-convergence means, feature estimation means, and estimation means.
  • As shown in FIG. 3, the analysis by the genome analysis system 1 models, in outline, an entity that can be characterized by two state variables characterizing a group.
  • A state variable characterizing a population is a statistic derived from the population or from each sample. Examples include the origin-population membership of each sample, the origin-population haplotype frequencies, and the individual diplotype frequencies.
  • The state variables are state A (the first state) and state B (the second state).
  • State A is the origin-population membership of each sample.
  • State B is the origin-population haplotype frequency. States A and B are converted into each other using update expressions in which the other state serves as the operator. These update expressions are described in detail later.
  • The genome analysis system 1 estimates three variables representing the characteristics of the population to which the sample data belong or the position of each specimen in the population. The first and second variables are loosely related through the third variable, and the system estimates these three variables from an observable fourth variable. For example, as shown in FIG. 3, states A and B can be regarded as two aspects of a single group. The characteristic parameters are precisely these three variables.
  • The first, second, third, and fourth variables are defined by expression (7).
  • The samples are indexed i = 1, 2, ..., I, the origin populations k = 1, 2, ..., K, and the haplotypes h = 1, 2, ..., H.
  • The haplotype frequencies of origin population k may be written as the vector $b_k = (b_{k1}, \ldots, b_{kH})$.
  • A diplotype is the pair of maternally and paternally derived haplotypes, $d = (h_1, h_2)$.
  • The first and second variables can be thought of as two states that characterize the system of interest; they are not completely independent but are loosely related via the third variable. Accordingly, the first and second variables in expression (7) can be regarded as update operators that update each other.
  • Update operators adapted to the first and second variables can be derived, with genetic (statistical) knowledge, i.e., genetic information, embedded in them. If the first and second variables are only weakly related and appropriate initial values are given, repeated updating converges to the characteristics of the population and the positioning of each sample within it.
  • The genetic knowledge is expressed by expression (8). It rests on the assumption that, if it is known which origin population a sample comes from and the haplotype frequencies of that origin population are also known, the probability that the sample has a particular diplotype is simply that of drawing a haplotype twice from that origin population.
  • Using expression (9), the overall probability model is expressed as expression (10).
  • Here D is a function that takes the value 1 if the expression attached at its lower right holds, and 0 otherwise.
  • Because expression (10) contains random variables that cannot be observed, the optimal parameters are sought within the framework of the EM algorithm. Specifically, the parameters that best characterize the observed data are estimated according to expression (11); that is, the optimal number of origin populations, the haplotype frequencies of each population, the probability of each diplotype for each sample, and the probability of each origin population for each sample are obtained.
  • The haplotype frequencies of each origin population can be obtained with the two sequential update equations (12).
  • The estimation algorithm is as follows. First, the value of a is found; it can be obtained by assuming that there is only one origin population. Specifically, the sequential update equation (16) is used.
  • Expression (14) then becomes unnecessary, because the estimation has already been completed by expressions (16) and (17).
  • The variable update equations can be expressed as equation (19).
  • The optimum K can be expressed as equation (20).
  • Each haplotype is given an ID number h, as in equation (21); there are H types of haplotypes.
  • An ID number is assigned to each observed genotype (e.g., observation data for multiple SNPs), which is denoted x_i as in equation (25). There are I individuals in total.
  • The set of possible diplotypes is denoted D(x), as in equation (26).
  • D(x_i) is the set of diplotypes compatible with the genotype of the i-th person.
  • An indicator is introduced as in equation (27); it takes the value 1 if d_i is contained in D(x_i), and 0 otherwise.
  • Given a diplotype, the person's genotype is uniquely determined, as in equation (29).
  • The genotype probability of the i-th person can be expressed as equation (30).
  • Equation (31): $\hat{y} = \arg\max_{y} \ln P(X \mid y) = \arg\max_{y} \sum_{i=1}^{I} \ln P(x_i \mid y)$, maximized in practice through the EM auxiliary function $Q$.
  • Equation (31) is solved by iterative substitution, converging to the true value; this is expressed mathematically as equation (32).
  • In equation (32), the iteration starts from an initial value $y^{(0)}$, y is computed repeatedly, and y converges.
  • Estimation from a single population is itself a conventional technique; the present invention applies it to provide a diplotype determination method for samples drawn from multiple populations.
  • This diplotype determination method is explained below in more detail and in simpler terms.
  • First, each parameter and variable is set.
  • K denotes the number of populations.
  • The matrix element b_kh is the frequency of the haplotype with ID h in the k-th population. Summed over all haplotypes h, b_kh therefore equals 1 for each population k.
  • k_i is the ID number of the origin population to which the i-th person belongs. The equations for these parameters and variables are given in equation (35).
  • The diplotype d_i of the i-th person can be expressed in terms of k_i and the haplotype frequencies of population k_i.
  • Equation (37) expresses this d_i (compare the single-population estimation model (28)).
  • Expanding expression (30) of the single-population estimation model, the probability of the i-th person's genotype can be expressed as expression (38).
  • The required memory is of order O(KH).
  • First, the genetic polymorphism to be investigated is determined (step S1, determination step).
  • Next, allele information for that genetic polymorphism in the population to be investigated is determined by a wet process, i.e., a process that determines genomic information such as the genetic polymorphisms of a sample using a DNA sequencer or similar instrument.
  • Individual haplotypes are then determined or estimated from the allele information (step S3, haplotype estimation step).
  • In step S4, two loosely related feature parameters representing the group are determined.
  • The origin-population membership of each sample and the haplotype frequencies of each origin population are used as the two feature parameters.
  • An update operator between the two feature parameters is constructed from the genetic information and a third parameter (step S5, update-formula construction step).
  • The third parameter here is the individual diplotype and its frequency.
  • Embedding genetic (statistical) knowledge, that is, genetic information, in the update operator means nothing other than adopting the individual's diplotype and its frequency, which constitute such knowledge, as the third parameter.
  • In step S6, the two feature parameters are obtained in turn by the update operator.
  • In step S7 (conversion-convergence step), the conversion is repeated until the two feature parameters converge.
  • The two feature parameters are then obtained (step S8).
  • Updating the feature parameters with the update formula means obtaining the two feature parameters in turn with this update operator, deriving each alternately from the other.
  • Converging the parameters by this updating means converging the state variables to their original values, that is, approximating the true values.
  • FIGS. 5 to 9 show an example of the results of genome analysis using an update operator that uses multilocus genotype data and haplotype data to infer the origin populations and assign each sample to an origin population.
  • The "fast grouping algorithm" refers to the analysis method of the present invention.
  • Because haplotypes are considered more informative genetic information than alleles, haplotypes rather than alleles are used as the genetic information in the analysis.
  • The haplotype frequency b_kh of each origin population and the membership degree c_ik of each sample in each origin population are adopted as the two state variables characterizing the population.
  • This allows the characteristics of the population to which the sampled individuals belong to be estimated.
  • The individual diplotype and its frequency, d_i and a_i, were adopted as the third variable linking the two state variables, and the observed data, i.e., the genotype information x_i, as the fourth variable.
  • a_i and d_i are obtained from the observation data X. Specifically, an appropriate initial value is assigned to y, and expressions (45) and (46) are computed in turn, repeating about 100 times until the values converge.
  • Equation (54) can be used instead of equation (53) (see equation (49)).
  • FIG. 5 shows the difference in execution time between this embodiment of the structure analysis program and the MCMC method. As FIG. 5 shows, the method of the present invention outputs its results much faster than the conventional method.
  • FIG. 6 shows the haplotype frequencies of the two origin populations estimated in this example.
  • FIG. 7 shows the membership degrees c_ik of the samples in the origin populations estimated in this example.
  • FIG. 8 compares the estimation accuracy of this example, the MCMC method, and the cluster method on various data. The method of the present invention estimates with higher accuracy than the conventional methods.
  • FIG. 9 shows an example of the estimated number of origin populations in this example.
  • As a result, analysis for estimating population characteristics from sample data can be performed faster and on more samples.
  • FIG. 1 is an explanatory diagram outlining the genome analysis system used in the genome analysis method of the present invention.
  • FIG. 2 is a block diagram of the genome analysis system of the present invention.
  • FIG. 3 is a diagram for explaining the outline of analysis by the genome analysis system of FIG. 1.
  • FIG. 4 is a flowchart showing the genome analysis method of the present invention.
  • FIG. 5 compares the execution time of the genome analysis method of the present invention with that of the MCMC method.
  • FIG. 8 compares the origin-population estimation results of the present invention, the MCMC method, and the cluster method.
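The prior-art method of Patent Document 1 described above extracts every run of a given number (for example, 10) or more identical consecutive bases, together with a predetermined number of flanking bases on each side. The Python sketch below is a minimal illustration, not the cited patent's implementation. The function name `extract_homopolymer_windows`, the default flank length of 20 bases, and the clipping of windows at the sequence ends are all assumptions, because the text only speaks of "a predetermined number" of flanking bases.

```python
import re
from typing import List, Tuple


def extract_homopolymer_windows(seq: str, min_run: int = 10,
                                flank: int = 20) -> List[Tuple[int, str]]:
    """Find runs of >= min_run identical bases and return (start, window) pairs.

    Each window is the run plus up to `flank` bases on either side, clipped at
    the ends of the sequence (hypothetical choice). Characters other than
    A/C/G/T (e.g. N) never form a run.
    """
    if min_run < 1:
        raise ValueError("min_run must be at least 1")
    seq = seq.upper()
    # ([ACGT]) captures one base; \1{n,} requires n or more further copies of it.
    run = re.compile(r"([ACGT])\1{%d,}" % (min_run - 1))
    windows = []
    for m in run.finditer(seq):
        start = max(0, m.start() - flank)
        end = min(len(seq), m.end() + flank)
        windows.append((start, seq[start:end]))
    return windows
```

For example, `extract_homopolymer_windows("ACGT" + "A" * 10 + "CGTC", flank=2)` returns `[(2, "GTAAAAAAAAAACG")]`, which is the ten-base A run plus two bases on each side. A run of only nine identical bases is not reported.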
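The single-population estimation described above assigns an ID h to each haplotype, builds the set D(x_i) of diplotypes compatible with each observed genotype x_i, and substitutes iteratively until the haplotype frequencies y converge. This corresponds to the standard EM estimator of haplotype frequencies. The Python sketch below illustrates that standard technique under stated assumptions. It does not reproduce the patent's equations (25) to (32), which are not given in this text. The assumptions are as follows:

- Loci are biallelic SNPs, and each genotype is coded 0/1/2 as the number of copies of allele 1.
- Hardy-Weinberg proportions hold within the population.
- Diplotypes are stored as unordered pairs, so a heterozygous pair (h, h') carries weight 2·y_h·y_h', which is equivalent to summing the two ordered maternal/paternal pairs.
- The function names are hypothetical.
- The default of 100 iterations follows the "about 100 times" mentioned in the text.

```python
import itertools
from collections import defaultdict


def compatible_diplotypes(genotype):
    """Return D(x): all unordered haplotype pairs consistent with one genotype.

    Each locus is coded 0, 1 or 2 (copies of allele 1); haplotypes are 0/1 tuples.
    """
    het = [j for j, g in enumerate(genotype) if g == 1]
    pairs = set()
    for phase in itertools.product((0, 1), repeat=len(het)):
        h1 = [g // 2 for g in genotype]  # homozygous loci: 0 -> 0, 2 -> 1
        h2 = list(h1)
        for j, p in zip(het, phase):  # split each heterozygous locus
            h1[j], h2[j] = p, 1 - p
        pairs.add(tuple(sorted((tuple(h1), tuple(h2)))))
    return sorted(pairs)


def em_haplotype_frequencies(genotypes, n_iter=100, tol=1e-8):
    """Estimate haplotype frequencies y_h from unphased genotypes by EM."""
    cands = [compatible_diplotypes(g) for g in genotypes]
    haps = sorted({h for c in cands for pair in c for h in pair})
    y = {h: 1.0 / len(haps) for h in haps}  # uniform initial value
    for _ in range(n_iter):
        counts = defaultdict(float)
        for c in cands:
            # E-step: posterior weight of each compatible diplotype.
            w = [y[a] * y[b] * (1 if a == b else 2) for a, b in c]
            total = sum(w)
            for (a, b), wi in zip(c, w):
                counts[a] += wi / total
                counts[b] += wi / total
        # M-step: expected haplotype count divided by the 2I haplotypes observed.
        new_y = {h: counts[h] / (2 * len(genotypes)) for h in haps}
        delta = max(abs(new_y[h] - y[h]) for h in haps)
        y = new_y
        if delta < tol:  # stop once the frequencies have converged
            break
    return y
```

A double heterozygote such as `[1, 1]` has two possible phasings, (0,0)/(1,1) and (0,1)/(1,0). When the sample also contains many (0,0) and (1,1) homozygotes, the iteration shifts almost all weight to the first phasing.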

Abstract

The present invention relates to an analysis system and method for performing analysis to estimate the characteristics of a population from sampled data. Sample data are captured. Genetic (statistical) knowledge is embedded in update expressions for a first and a second state variable that characterize a group. Using the update expressions, updating is repeated so that the first and second state variables converge to appropriate values. The characteristic parameter of the population to which the sample data belong and/or the characteristic parameter representing the position of each sample in the population is thereby estimated, and the estimated characteristics of the population and/or the position of each sample in the population can be output.
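The genetic knowledge embedded in the update expressions is that a sample from origin population k carries a diplotype obtained by drawing a haplotype twice from that population's frequencies b_k. It can be written as worked equations. The block below is a reconstruction under assumptions and does not reproduce the patent's own expressions (8) to (10), which are not given in this text. The assumptions are as follows:

- Diplotypes d = (h_1, h_2) are ordered maternal/paternal pairs.
- D(x_i) is the set of ordered pairs compatible with genotype x_i.
- π_k is an assumed prior weight of population k, used to normalize the membership c_ik.
- n_h(d) ∈ {0, 1, 2} counts the copies of haplotype h in diplotype d.
- The last line has the standard EM re-estimation form.

```latex
P(d_i = (h_1, h_2) \mid k_i = k, b_k) = b_{k h_1}\, b_{k h_2},
\qquad \sum_{h=1}^{H} b_{kh} = 1 \quad (k = 1, \ldots, K)

P(x_i \mid k_i = k, b_k) = \sum_{(h_1, h_2) \in D(x_i)} b_{k h_1}\, b_{k h_2}

c_{ik} = P(k_i = k \mid x_i)
       = \frac{\pi_k \, P(x_i \mid k_i = k, b_k)}
              {\sum_{k'=1}^{K} \pi_{k'} \, P(x_i \mid k_i = k', b_{k'})}

b_{kh} \leftarrow
  \frac{\sum_{i=1}^{I} \sum_{d \in D(x_i)} P(k_i = k, d_i = d \mid x_i)\, n_h(d)}
       {2 \sum_{i=1}^{I} c_{ik}}
```

In this form, the membership c_ik (state A) is computed from the frequencies b_k (state B), and b_k is then recomputed from the posteriors that define c_ik. The two state variables therefore update each other alternately.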
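The alternating update summarized in the abstract recomputes the membership degrees c_ik (state A) and the origin-population haplotype frequencies b_kh (state B) from each other until they converge. It can be sketched as an EM algorithm for a mixture of K populations. This Python sketch is an illustration under assumptions, not the patent's update expressions. The assumptions are as follows:

- Each sample comes from exactly one origin population.
- The population weights `pi`, the seeded random initialization, the tolerance, and the function name `fit_origin_populations` are hypothetical.
- Each sample's compatible diplotypes D(x_i) are supplied as unordered haplotype-ID pairs, and heterozygous pairs get weight 2.
- K is fixed rather than chosen by the patent's criterion for the optimum K.

```python
import random
from collections import defaultdict


def fit_origin_populations(candidates, K, n_iter=500, tol=1e-9, seed=0):
    """Alternately update memberships c[i][k] and frequencies b[k][h].

    candidates[i] lists the unordered diplotypes (h1, h2) compatible with
    sample i's genotype, i.e. D(x_i); haplotype IDs must be sortable.
    """
    rng = random.Random(seed)
    haps = sorted({h for cand in candidates for d in cand for h in d})
    # Random positive initial frequencies break the symmetry between populations.
    b = []
    for _ in range(K):
        raw = [rng.random() + 0.1 for _ in haps]
        s = sum(raw)
        b.append({h: r / s for h, r in zip(haps, raw)})
    pi = [1.0 / K] * K  # assumed population weights, re-estimated each pass
    c = []
    for _ in range(n_iter):
        # State A from state B: posterior over (population k, diplotype d).
        c = []
        counts = [defaultdict(float) for _ in range(K)]
        for cand in candidates:
            w = [[pi[k] * b[k][h1] * b[k][h2] * (1 if h1 == h2 else 2)
                  for h1, h2 in cand] for k in range(K)]
            total = sum(sum(wk) for wk in w)
            c.append([sum(wk) / total for wk in w])
            for k in range(K):
                for (h1, h2), wkd in zip(cand, w[k]):
                    counts[k][h1] += wkd / total
                    counts[k][h2] += wkd / total
        # State B from state A: expected haplotype counts / (2 * expected members).
        new_b = []
        for k in range(K):
            n_k = max(2 * sum(row[k] for row in c), 1e-300)  # guard empty population
            new_b.append({h: counts[k][h] / n_k for h in haps})
        pi = [sum(row[k] for row in c) / len(candidates) for k in range(K)]
        delta = max(abs(new_b[k][h] - b[k][h]) for k in range(K) for h in haps)
        b = new_b
        if delta < tol:  # both state variables have stopped changing
            break
    return c, b
```

In a toy input of five samples homozygous for haplotype "A" and five homozygous for "B", the two groups end up in different populations, with membership close to 1. For real genotype data, `candidates` would have to be built by enumerating D(x_i) from each genotype. The frequency table `b` holds K × H entries.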
PCT/JP2006/313757 2006-07-11 2006-07-11 Genome analysis system, genome analysis method and program WO2008007424A1 (fr)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/JP2006/313757 WO2008007424A1 (fr) 2006-07-11 2006-07-11 Genome analysis system, genome analysis method and program

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/JP2006/313757 WO2008007424A1 (fr) 2006-07-11 2006-07-11 Genome analysis system, genome analysis method and program

Publications (1)

Publication Number Publication Date
WO2008007424A1 (fr) 2008-01-17

Family

ID=38922995

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2006/313757 WO2008007424A1 (fr) 2006-07-11 2006-07-11 Genome analysis system, genome analysis method and program

Country Status (1)

Country Link
WO (1) WO2008007424A1 (fr)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2005276022A (ja) * 2004-03-26 2005-10-06 Hitachi Ltd Diagnosis support system and diagnosis support method
WO2006027835A2 (fr) * 2004-09-08 2006-03-16 Genesys Technologies Inc Genome analysis method

Non-Patent Citations (3)

Title
ITO T.: "Association test algorithm between a qualitative phenotype and a haplotype or haplotype set using simultaneous estimation of haplotype frequencies, diplotype configurations and diplotype-based penetrances", GENETICS, vol. 168, no. 4, 2004, pages 2339 - 2348, XP002990146 *
SHIMOSATO J., KOMAI M., KATTO J.: "A Proposal for Haplotype Estimation of Many SNPs Inputs Using Block Division", THE INSTITUTE OF ELECTRONICS, INFORMATION AND COMMUNICATION ENGINEERS, vol. 103, no. 150, 2003, pages 17 - 22, XP003002936 *
TANAKA J. ET AL.: "An unsupervised diplotype clustering method to improve race-based medicine", 2005, XP003002935, Retrieved from the Internet <URL:http://www.jsbi.org/journal/GIW05/GIW05P101.pdf> *

Similar Documents

Publication Publication Date Title
Nater et al. Resolving evolutionary relationships in closely related species with whole-genome sequencing data
Richardson et al. Statistical methods in integrative genomics
Willems et al. Population-scale sequencing data enable precise estimates of Y-STR mutation rates
Li et al. Single nucleotide mapping of trait space reveals Pareto fronts that constrain adaptation
Rosenberg et al. Genealogical trees, coalescent theory and the analysis of genetic polymorphisms
De Iorio et al. Importance sampling on coalescent histories. I
Faure et al. DiMSum: an error model and pipeline for analyzing deep mutational scanning data and diagnosing common experimental pathologies
WO2020133588A1 (fr) Fast and stable method for estimating individual genomic breeding values of animals
KR102487135B1 (ko) Method and system for deconvolving and quantifying DNA mixtures from multiple contributors of known or unknown genotype
Wang et al. CNVeM: copy number variation detection using uncertainty of read mapping
Illingworth et al. Inferring genome-wide recombination landscapes from advanced intercross lines: application to yeast crosses
Sun et al. Recursive test of Hardy-Weinberg equilibrium in tetraploids
Böndel et al. The distribution of fitness effects of spontaneous mutations in Chlamydomonas reinhardtii inferred using frequency changes under experimental evolution
Li et al. Fit-Seq2. 0: an improved software for high-throughput fitness measurements using pooled competition assays
Nouhaud et al. Rapid and predictable genome evolution across three hybrid ant populations
CN117457065A (zh) Method and system for identifying phenotype-related cell types based on single-cell multi-omics data
WO2008007424A1 (fr) Genome analysis system, genome analysis method and program
KR20200135221A (ko) Method and apparatus for predicting genotypes using NGS data
Shpak et al. Variance in estimated pairwise genetic distance under high versus low coverage sequencing: the contribution of linkage disequilibrium
Araki et al. An estimation method for a cellular-state-specific gene regulatory network along tree-structured gene expression profiles
Zhang et al. Transfer learning across cancers on DNA copy number variation analysis
Dao et al. Variance estimation and confidence intervals from high-dimensional genome-wide association studies through misspecified mixed model analysis
WO2006027835A2 (fr) Genome analysis method
Mackintosh et al. Do chromosome rearrangements fix by genetic drift or natural selection? A test in Brenthis butterflies
Schiavinato et al. JLOH: Inferring loss of heterozygosity blocks from sequencing data

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 06768070

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

NENP Non-entry into the national phase

Ref country code: RU

122 Ep: pct application non-entry in european phase

Ref document number: 06768070

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: JP