WO2006027835A2 - Genome Analysis Method (ゲノム解析方法) - Google Patents
- Publication number
- WO2006027835A2 (PCT/JP2004/013075)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- population
- analysis method
- state
- genome
- sample
Classifications
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B20/00—ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations
- G16B20/20—Allele or variant detection, e.g. single nucleotide polymorphism [SNP] detection
- G16B20/40—Population genetics; Linkage disequilibrium
Definitions
- The present invention relates to a genome analysis method for estimating the characteristics of a population from sample data.
- The genome is the set of chromosomes indispensable for carrying out the activities of life.
- The word "genome" is a compound of "gene" and "chromosome".
- The basic unit of life is the cell. The cell is enclosed by a cell membrane and its nucleus by a nuclear membrane, which maintains the independence of each unit.
- The human body is made up of specialized cell groups with differentiated functions and forms, such as nerve cells, muscle cells, blood cells, immune system cells, epithelial cells (the cells on the surface of the skin and tissues), and sensory cells, as well as undifferentiated cells called stem cells. Cells also have an important time-dependent aspect: they divide to make new cells. Cell division is an important mechanism that enables the transmission and expression of genetic information.
- The nucleus contains chromosomes. These chromosomes carry genetic information, and genes are arranged along them. A gene defines how to make a protein, and the basic substance that makes up the genome and the chromosomes is DNA (deoxyribonucleic acid). Genetic information is conserved in the sequence of its four bases, A, T, G, and C. Haploid organisms and viruses have a single genome.
- Humans are diploid organisms. Germ cells such as eggs and sperm have one genome set consisting of 23 chromosomes, whereas somatic cells have two sets (46 chromosomes). The human genome is made up of about 3 billion DNA base pairs (3,000 megabase pairs; one megabase is 1 million base pairs), and stretched out as a single strand it is about 1 meter long.
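The figure of about 1 meter can be checked with a quick worked estimate. This estimate assumes the standard rise of about 0.34 nm per base pair for B-form DNA, a value that the source does not give:

```latex
3\times10^{9}\ \text{bp} \times 0.34\ \tfrac{\text{nm}}{\text{bp}} = 1.02\times10^{9}\ \text{nm} \approx 1.02\ \text{m}
```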
- A genome is the totality of the genetic information in a cell, including the information that controls genes and gene expression.
- If proteins are the products and genes are the blueprints, then the genome contains, in addition to the blueprints, regions that control the manufacture of the products.
- There are also regions whose significance is unknown but which appear to have some influence on maintaining biological functions. Clarifying these regions is expected to enable a more accurate understanding of life phenomena.
- Genome analysis is the comprehensive analysis of the genetic information in an organism's genome. It begins with determining the base sequences (the order of G, A, T, and C) of the DNA molecules that make up the genome.
- The human genome is the nucleotide sequence of the roughly 3 billion DNA base pairs contained in 46 chromosomes (that is, DNA molecules) in total: 44 autosomes plus the X and Y chromosomes.
- The genomic information an individual possesses is inherited from its parents' genomic information, which was in turn inherited from the previous generation. By tracing genetic information back one generation at a time in this way, one can ultimately reach the genome of the first organism, some 3.8 billion years ago.
- In a known genome analysis method (Patent Document 1), genome sequence information is input. If the input contains a portion in which a plurality (for example, 10 or more) of identical bases are arranged consecutively, the method extracts base sequence information consisting of that run together with a predetermined number of bases immediately before and after it, and outputs the extracted base sequence information.
- With this approach, polymorphic markers for identifying disease-related candidate genes can be found quickly and efficiently, with accuracy close to that of SNPs (single nucleotide polymorphisms), without using SNPs.
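The run-extraction step described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions and is not the procedure of Patent Document 1 itself. The function name `extract_run_flanks` and the parameters `min_run` and `flank` are hypothetical. The default flank of 20 bases is an assumed stand-in for the unspecified "predetermined number" of bases.

```python
import re
from typing import List, Tuple


def extract_run_flanks(seq: str, min_run: int = 10, flank: int = 20) -> List[Tuple[int, int, str]]:
    """Find runs of at least `min_run` identical bases and return each run
    together with up to `flank` bases on either side.

    Returns (run_start, run_end, fragment) tuples; run_end is exclusive.
    `flank` is a hypothetical stand-in for the "predetermined number" of bases.
    """
    if min_run < 1:
        raise ValueError("min_run must be at least 1")
    seq = seq.upper()
    # One base followed by (min_run - 1) or more copies of the same base.
    pattern = re.compile(r"([ACGT])\1{%d,}" % (min_run - 1))
    results = []
    for match in pattern.finditer(seq):
        start = max(0, match.start() - flank)
        end = min(len(seq), match.end() + flank)
        results.append((match.start(), match.end(), seq[start:end]))
    return results


example = "ACGT" + "A" * 12 + "TTGC"
runs = extract_run_flanks(example, flank=3)
```

For this example, the function reports a single 12-base poly-A run spanning positions 4 to 16, with three flanking bases kept on each side. A regular-expression backreference keeps the run detection to one line. A production tool would more likely scan the sequence once and handle ambiguous bases such as N.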
- The method of Patent Document 1 is a genome analysis method that attempts to find polymorphic markers for identifying disease-related candidate genes. However, DNA base sequences need to be analyzed from many different viewpoints. Much remains unelucidated, and a variety of genome analysis methods are expected to contribute to its elucidation.
- The present invention has been made in view of this situation and provides a genome analysis method capable of estimating the characteristics of a population from sample data.
- Patent Document 1: Japanese Patent Laid-Open No. 2003-288346
- The genome analysis method of the present invention performs analysis for estimating the characteristics of a population from sample data. It comprises three steps. The first step acquires the sample data. The second step estimates the characteristics of the population to which the sample data belong: on the basis of genetic (statistical) knowledge, it selects first and second state variables that have duality and converges them to their true values. The third step outputs the result of estimating the characteristics of the population.
- The method may include a process in which the first and second state variables are converted into each other using transformation formulas that embed, as operators, the genetic (statistical) knowledge relating them. The two variables are then estimated by means of a third state variable embedded in those operators.
- The first state variable may be the degree of membership of each sample in the origin populations.
- The second state variable may be the haplotype frequencies of each origin population.
- The third state variable may be the diplotype of each sample and its frequency.
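One way to hold the three state variables in code is sketched below. The source does not prescribe a data layout, so the container shapes here are assumptions made for illustration:

- q is an N × K list of memberships.
- p is one haplotype-to-frequency mapping per origin population.
- d is a list of candidate diplotypes (h1, h2, frequency) per sample.

The helper `check_state` is hypothetical. It only verifies that the shapes agree and that every distribution sums to 1.

```python
from typing import Dict, List, Tuple

# Hypothetical container types for the three state variables (names are illustrative).
Membership = List[List[float]]                   # q[i][k]: membership of sample i in origin population k
HaplotypeFreqs = List[Dict[str, float]]          # p[k][h]: frequency of haplotype h in origin population k
Diplotypes = List[List[Tuple[str, str, float]]]  # d[i]: candidate diplotypes (h1, h2, frequency) of sample i


def check_state(q: Membership, p: HaplotypeFreqs, d: Diplotypes, tol: float = 1e-9) -> None:
    """Raise ValueError unless shapes agree and every distribution sums to 1."""
    if len(q) != len(d):
        raise ValueError("q and d must describe the same samples")
    for i, row in enumerate(q):
        if len(row) != len(p):
            raise ValueError(f"sample {i}: q has {len(row)} populations, p has {len(p)}")
        if abs(sum(row) - 1.0) > tol:
            raise ValueError(f"sample {i}: memberships do not sum to 1")
    for k, freqs in enumerate(p):
        if abs(sum(freqs.values()) - 1.0) > tol:
            raise ValueError(f"population {k}: haplotype frequencies do not sum to 1")
    for i, dips in enumerate(d):
        if abs(sum(f for _, _, f in dips) - 1.0) > tol:
            raise ValueError(f"sample {i}: diplotype frequencies do not sum to 1")


# Hypothetical two-sample, two-population state; check_state passes silently.
q = [[0.7, 0.3], [0.2, 0.8]]
p = [{"11": 0.6, "12": 0.4}, {"11": 0.2, "22": 0.8}]
d = [[("11", "12", 1.0)], [("11", "22", 0.5), ("12", "11", 0.5)]]
check_state(q, p, d)
```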
- The method may also comprise the following steps:
  - determining the genetic polymorphisms to be investigated;
  - determining allele information for those genetic polymorphisms in the population by a wet (laboratory) process;
  - determining or estimating each individual's haplotypes from the allele information;
  - determining two feature parameters in a dual state of the population;
  - constructing transformation operators between the two feature parameters from genetic information;
  - starting from predetermined initial values, obtaining the two feature parameters in turn with the transformation operators, and repeating the transformation until the feature parameters converge.
- Obtaining the two feature parameters in this way makes it possible to estimate the characteristics of the population from the sample data.
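The iterate-until-convergence step above can be expressed generically. The loop starts from an initial value, applies the two transformation operators in turn, and stops when the estimate no longer changes. The sketch below is a minimal formulation under that assumption. `to_p` and `to_q` are hypothetical placeholders for the genetics-based operators. The toy scalar operators in the usage lines exist only to show convergence.

```python
from typing import Callable, Tuple, TypeVar

Q = TypeVar("Q")
P = TypeVar("P")


def alternate_until_converged(
    q0: Q,
    to_p: Callable[[Q], P],
    to_q: Callable[[P], Q],
    distance: Callable[[Q, Q], float],
    tol: float = 1e-10,
    max_iter: int = 1000,
) -> Tuple[Q, P, int]:
    """Apply q -> p -> q repeatedly until successive q values differ by less than tol."""
    q = q0
    for iteration in range(1, max_iter + 1):
        p = to_p(q)       # first operator: state A -> state B
        q_next = to_q(p)  # second operator: state B -> state A
        if distance(q, q_next) < tol:
            return q_next, to_p(q_next), iteration
        q = q_next
    raise RuntimeError(f"no convergence within {max_iter} iterations")


# Toy scalar operators whose fixed point is q = 2/3, p = 4/3.
q_hat, p_hat, n_iter = alternate_until_converged(
    0.0, lambda q: 0.5 * q + 1.0, lambda p: 0.5 * p, lambda a, b: abs(a - b)
)
```

Keeping the operators as parameters separates the convergence logic from the genetic model. This loosely mirrors the structure of the method, in which the knowledge lives in the operators.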
- Fig. 1 is a diagram outlining the genome analysis apparatus used in the genome analysis method of the present invention.
- Fig. 2 is a diagram outlining the analysis performed by the genome analysis apparatus of Fig. 1.
- Fig. 3 is a flowchart showing the genome analysis method of the present invention.
- The genome analysis apparatus 1 estimates the characteristics of a population from sample data and outputs the analysis result.
- The genome analysis apparatus 1 can be a notebook computer, a desktop computer, or a similar machine on which an analysis program for the genome analysis calculations described later is installed.
- As shown in Fig. 2, the analysis by the genome analysis apparatus 1 models the reality as something that can be characterized by a pair of dual states: state A, the first state, and state B, the second state.
- State A and state B are linked by transformation operator ⁇ and transformation operator ⁇. Genetic (statistical) knowledge is embedded in these two operators, and the duality transformation between state A and state B is carried out. The features of the population are estimated by converging on the values (states) of the real population.
- State A is the degree of membership of each sample in the origin populations.
- State B is the haplotype frequencies of the origin populations.
- Consider the first and second variables that represent the characteristics of the population to which the sample data belong. When these two variables are neither completely independent nor completely dependent, the genome analysis apparatus 1 can also estimate both of them from an observable third variable (incomplete data). This is possible because, as shown in Fig. 2, state A and state B can be regarded as forming a kind of duality.
- The population to which the sample data belong is regarded as a system that can be expressed in a Hilbert space.
- The first and second variables are written as q and p, where i is the sample number and k indexes the origin populations. They can be regarded as quantities that transform into each other through transformation operators. This is analogous to the way the particle side and the wave side of a photon are related by the Fourier transform (and the inverse Fourier transform).
- The degree of membership of sample i in the origin population is ⁇.
- Assuming that p and q are converted into each other by a projection operator, they can be expressed as follows.
- The resulting ratio is equivalent to summing over every k and normalizing.
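Read literally, "summing over every k and normalizing" means that each sample's memberships are rescaled to sum to one over the K origin populations. The formula below writes out that reading. Here w_{ik} is a hypothetical unnormalized score for sample i and origin population k:

```latex
q_{ik} = \frac{w_{ik}}{\sum_{k'=1}^{K} w_{ik'}}, \qquad \sum_{k=1}^{K} q_{ik} = 1 \quad \text{for every sample } i
```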
- First, the genetic polymorphisms to be investigated are determined (step S1).
- Next, allele information for the genetic polymorphisms of the population to be investigated is determined by a wet (laboratory) process (step S2).
- Each individual's haplotypes are then determined or estimated from the allele information (step S3).
- Two feature parameters in the dual state of the population are determined (step S4).
- Here, the membership of each sample in the origin populations and the haplotype frequencies of each origin population are used as the two feature parameters.
- Transformation operators between the two feature parameters are constructed from genetic information (step S5).
- The genetic information used here is each individual's diplotype and its frequency.
- The two feature parameters are obtained in turn by the transformation operators (step S6).
- The transformation is repeated until the parameters converge (step S7).
- The two feature parameters are thereby obtained (step S8).
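Step S3 and the diplotype data of step S5 can be illustrated with a small enumeration. For an unphased multilocus SNP genotype, the code lists every pair of haplotypes consistent with it. This sketch rests on two assumptions. First, loci are biallelic and coded as in FIG. 4 ("1" major, "2" minor). Second, genotypes are given as per-locus counts of allele "2". The function names are hypothetical. The equal weights assigned by `uniform_diplotypes` are only a placeholder, not the diplotype frequency estimate used in the source.

```python
from itertools import product
from typing import List, Sequence, Tuple


def candidate_diplotypes(genotype: Sequence[int]) -> List[Tuple[str, ...]]:
    """Return all unordered haplotype pairs (h1 <= h2) compatible with an unphased genotype.

    genotype[j] is the number of copies of allele "2" at locus j (0, 1 or 2).
    """
    het_loci = [j for j, g in enumerate(genotype) if g == 1]
    pairs = set()
    # Each heterozygous locus can be phased two ways; homozygous loci are fixed.
    for phase in product("12", repeat=len(het_loci)):
        phase_at = dict(zip(het_loci, phase))
        h1, h2 = [], []
        for j, g in enumerate(genotype):
            if g == 0:
                a, b = "1", "1"
            elif g == 2:
                a, b = "2", "2"
            elif g == 1:
                a = phase_at[j]
                b = "2" if a == "1" else "1"
            else:
                raise ValueError(f"invalid genotype code {g!r} at locus {j}")
            h1.append(a)
            h2.append(b)
        # Sorting makes (h1, h2) and (h2, h1) the same diplotype.
        pairs.add(tuple(sorted(("".join(h1), "".join(h2)))))
    return sorted(pairs)


def uniform_diplotypes(genotype: Sequence[int]) -> List[Tuple[str, str, float]]:
    """Attach equal (placeholder) frequencies to every compatible diplotype."""
    pairs = candidate_diplotypes(genotype)
    return [(h1, h2, 1.0 / len(pairs)) for h1, h2 in pairs]
```

For example, `candidate_diplotypes([0, 1, 1])` returns the two phasings `("111", "122")` and `("112", "121")`. With m heterozygous loci there are 2^(m-1) candidates, which is why a real analysis must estimate, rather than enumerate and weight equally, the diplotype frequencies.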
- Figs. 4 to 15, shown below, present examples of analysis results obtained by the genome analysis method with duality transformation operators. This method uses multilocus genotype data and haplotype data to infer the origin populations and to assign each sample to an origin population.
- Case-control association analysis is a powerful way to find genes by association mapping between phenotype data (e.g., disease status) and genotype data.
- However, genotype data from structured populations can cause errors in association mapping and lead to false-positive results.
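A small allelic association test shows how population structure produces false positives. The counts below are hypothetical. Two subpopulations differ in both minor-allele frequency (0.6 versus 0.2) and case/control ratio, but within each subpopulation cases and controls have identical allele frequencies. Each subpopulation analyzed on its own therefore gives a chi-square of zero. Pooling them still produces a statistic of about 33, far above the 3.84 critical value for 1 degree of freedom at the 5% level. The function name `allelic_chi_square` is illustrative.

```python
from typing import Sequence


def allelic_chi_square(case_counts: Sequence[int], control_counts: Sequence[int]) -> float:
    """Pearson chi-square for a 2 x 2 table of (minor, major) allele counts."""
    table = [list(case_counts), list(control_counts)]
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    return chi2


# Hypothetical allele counts as (minor, major); two alleles per person.
pop_a = {"cases": (180, 120), "controls": (60, 40)}  # 150 cases, 50 controls, minor-allele freq 0.6
pop_b = {"cases": (20, 80), "controls": (60, 240)}   # 50 cases, 150 controls, minor-allele freq 0.2

pooled_cases = tuple(a + b for a, b in zip(pop_a["cases"], pop_b["cases"]))
pooled_controls = tuple(a + b for a, b in zip(pop_a["controls"], pop_b["controls"]))

within_a = allelic_chi_square(pop_a["cases"], pop_a["controls"])
within_b = allelic_chi_square(pop_b["cases"], pop_b["controls"])
pooled = allelic_chi_square(pooled_cases, pooled_controls)
```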
- Haplotypes were considered to carry more powerful genetic information than alleles, so haplotypes were used instead of alleles.
- A vector in the Hilbert space represents a genetic state.
- An operator can transform one vector representation into another vector representation.
- The membership q in the origin populations and the haplotype frequencies p were adopted as the two characterizing operators of the dual state, so that the hidden reality to which the sampled individuals belong can be estimated. In this example, as described above, each individual's diplotype and its frequency d are adopted as the observed data.
- As described above, q and p are related in the same way as the particle side and the wave side of a photon under the Fourier transform.
- ⁇ k ⁇ 'a m ' * b kl * b kl , ... (6)
- ⁇ K ⁇ i c ik ⁇ ir a iir (7)
- Step 1) Set an appropriate initial value for q. The initial value must be other than 1/K, where K is the number of origin populations.
- Step 2) Obtain p from equation (7).
- Step 3) Obtain q from equation (6).
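Equations (6) and (7) are garbled in the extracted text. The sketch below therefore substitutes a standard mixture-model style pair of updates that follows the same order as steps 1 to 3. It is an assumption, not a reconstruction of the source's formulas:

- p for each origin population k is taken as the q-weighted count of haplotypes in the sample diplotypes.
- q for each sample is taken as the diplotype-frequency-weighted product of the two haplotype frequencies, normalized over k.

The small pseudocount `eps` is also an addition; it avoids division by zero. The usage lines show why a uniform initial q = 1/K must be avoided: it makes every origin population identical, so the iteration never leaves that point.

```python
from typing import Dict, List, Sequence, Tuple

Diplotype = Tuple[str, str, float]  # (haplotype 1, haplotype 2, frequency)


def update_p(q: Sequence[Sequence[float]], d: Sequence[Sequence[Diplotype]],
             haplotypes: Sequence[str], eps: float = 1e-6) -> List[Dict[str, float]]:
    """Assumed analogue of equation (7): origin-population haplotype frequencies from q and d."""
    p = []
    for k in range(len(q[0])):
        counts = {h: eps for h in haplotypes}  # pseudocount keeps every frequency positive
        for q_i, dips in zip(q, d):
            for h1, h2, freq in dips:
                counts[h1] += q_i[k] * freq
                counts[h2] += q_i[k] * freq
        total = sum(counts.values())
        p.append({h: c / total for h, c in counts.items()})
    return p


def update_q(p: Sequence[Dict[str, float]], d: Sequence[Sequence[Diplotype]]) -> List[List[float]]:
    """Assumed analogue of equation (6): memberships from p and d, normalized over k."""
    q = []
    for dips in d:
        weights = [sum(freq * p_k[h1] * p_k[h2] for h1, h2, freq in dips) for p_k in p]
        total = sum(weights)
        q.append([w / total for w in weights])
    return q


def estimate(d: Sequence[Sequence[Diplotype]], q0: Sequence[Sequence[float]],
             tol: float = 1e-9, max_iter: int = 500):
    """Step 1: start from q0; step 2: p from q; step 3: q from p; repeat until q converges."""
    if max_iter < 1:
        raise ValueError("max_iter must be positive")
    haplotypes = sorted({h for dips in d for h1, h2, _ in dips for h in (h1, h2)})
    q = [list(row) for row in q0]
    for _ in range(max_iter):
        p = update_p(q, d, haplotypes)
        q_next = update_q(p, d)
        change = max(abs(a - b) for row, row_next in zip(q, q_next) for a, b in zip(row, row_next))
        q = q_next
        if change < tol:
            break
    return q, p


# Hypothetical data: three "11/11" samples and three "22/22" samples, K = 2.
d = [[("11", "11", 1.0)]] * 3 + [[("22", "22", 1.0)]] * 3
q, p = estimate(d, [[0.6, 0.4]] * 3 + [[0.4, 0.6]] * 3)  # asymmetric start separates the groups
q_flat, _ = estimate(d, [[0.5, 0.5]] * 6)                # 1/K start stays at 0.5 / 0.5
```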
- FIG. 4 shows an example of the haplotype frequencies of two groups (origin populations).
- Each haplotype consists of six loci, and each locus has two alleles (SNPs).
- "1" denotes the major allele.
- "2" denotes the minor allele.
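With the FIG. 4 coding, per-locus allele frequencies follow directly from a haplotype frequency table. The frequency of the minor allele "2" at a locus is the total frequency of the haplotypes that carry "2" there. The function name is hypothetical. The haplotype table in the usage lines is invented for illustration and is not the data of FIG. 4.

```python
from typing import Dict, List


def minor_allele_frequencies(hap_freqs: Dict[str, float]) -> List[float]:
    """Per-locus frequency of allele "2" from a {haplotype: frequency} table."""
    if not hap_freqs:
        raise ValueError("empty haplotype table")
    n_loci = len(next(iter(hap_freqs)))
    total = sum(hap_freqs.values())
    freqs = [0.0] * n_loci
    for hap, f in hap_freqs.items():
        if len(hap) != n_loci or set(hap) - {"1", "2"}:
            raise ValueError(f"haplotype {hap!r} is not a {n_loci}-locus string of '1'/'2'")
        for j, allele in enumerate(hap):
            if allele == "2":
                freqs[j] += f
    # Dividing by the total tolerates tables that are not exactly normalized.
    return [x / total for x in freqs]


# Hypothetical six-locus table for one origin population.
example = {"111111": 0.5, "211111": 0.25, "221111": 0.25}
maf = minor_allele_frequencies(example)
```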
- Details of the groups (origin populations) evaluated here and their haplotype frequencies can be found in the comprehensive data in FIG. 10.
- FIG. 5 shows the evaluation of q; the details can be found in the comprehensive data in FIG.
- The method of the present invention was compared with other methods in evaluating which of several origin populations the sampled population is composed of.
- The more similar the haplotype frequencies of the origin populations are, the more difficult it is to distinguish between them.
- I123 is the combined data of the three haplotype blocks I1, I2, and I3.
- I123456 is the combined data of I1, I2, I3, I4, I5, and I6. The results for these multiple haplotype blocks show a much better match than those for any single block alone.
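A short calculation suggests one reason why combining haplotype blocks sharpens the assignment. Assume, for illustration only, that blocks are independent given the origin population. A sample's per-block likelihoods then multiply, so small per-block differences accumulate. The function `combined_membership` and the likelihood values are hypothetical. The sums are taken in log space to avoid underflow when many blocks are combined.

```python
import math
from typing import List, Sequence


def combined_membership(block_likelihoods: Sequence[Sequence[float]]) -> List[float]:
    """block_likelihoods[b][k]: likelihood of one sample's block-b data under origin population k.

    Assumes blocks are independent given the population and uses equal prior weights.
    """
    n_pop = len(block_likelihoods[0])
    log_w = [sum(math.log(block[k]) for block in block_likelihoods) for k in range(n_pop)]
    top = max(log_w)  # subtract the maximum before exponentiating for numerical stability
    w = [math.exp(x - top) for x in log_w]
    total = sum(w)
    return [x / total for x in w]


one_block = combined_membership([[0.6, 0.4]])         # weak evidence from a single block
three_blocks = combined_membership([[0.6, 0.4]] * 3)  # same weak evidence in three blocks
```

With three identical blocks, the membership in the first population rises from 0.6 to 0.216 / 0.28, or about 0.77. This illustrates, under the stated independence assumption, why I123 and I123456 can separate populations better than a single block.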
- FIG. 12 is a diagram showing comprehensive data as details of 1 and 3
- Thus, sample data are acquired, and genetic (statistical) knowledge is embedded in the first and second state variables that form a duality. These variables are converged so that the characteristics of the population of the sample data are estimated, and the result of this estimation is output.
- The characteristics of the population can therefore be analyzed from the sample data.
- FIG. 1 is a diagram for explaining an outline of a genome analysis apparatus used in the genome analysis method of the present invention.
- FIG. 2 is a diagram for explaining the outline of analysis by the genome analysis apparatus of FIG. 1.
- FIG. 3 is a flowchart showing the genome analysis method of the present invention.
- FIG. 4 is a diagram showing an example of haplotype frequencies of two origin populations.
- FIG. 5 is a diagram showing q evaluation.
Landscapes
- Bioinformatics & Cheminformatics (AREA)
- Health & Medical Sciences (AREA)
- Life Sciences & Earth Sciences (AREA)
- Physics & Mathematics (AREA)
- Engineering & Computer Science (AREA)
- Proteomics, Peptides & Aminoacids (AREA)
- Biotechnology (AREA)
- Biophysics (AREA)
- Genetics & Genomics (AREA)
- Molecular Biology (AREA)
- Chemical & Material Sciences (AREA)
- Bioinformatics & Computational Biology (AREA)
- Analytical Chemistry (AREA)
- Evolutionary Biology (AREA)
- General Health & Medical Sciences (AREA)
- Medical Informatics (AREA)
- Spectroscopy & Molecular Physics (AREA)
- Theoretical Computer Science (AREA)
- Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Priority Applications (4)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US11/574,948 US20080318214A1 (en) | 2004-09-08 | 2004-09-08 | Genome Analysis Method |
EP04787758A EP1832992A4 (en) | 2004-09-08 | 2004-09-08 | GENOME ANALYSIS METHOD |
PCT/JP2004/013075 WO2006027835A2 (ja) | 2004-09-08 | 2004-09-08 | Genome analysis method (ゲノム解析方法) |
JP2006534946A JPWO2006027835A1 (ja) | 2004-09-08 | 2004-09-08 | Genome analysis method (ゲノム解析方法) |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/JP2004/013075 WO2006027835A2 (ja) | 2004-09-08 | 2004-09-08 | Genome analysis method (ゲノム解析方法) |
Publications (3)
Publication Number | Publication Date |
---|---|
WO2006027835A1 WO2006027835A1 (ja) | 2006-03-16 |
WO2006027835A2 WO2006027835A2 (ja) | 2006-03-16 |
WO2006027835A8 WO2006027835A8 (ja) | 2009-08-20 |
Family
ID=36036742
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/JP2004/013075 WO2006027835A2 (ja) | Genome analysis method (ゲノム解析方法) | 2004-09-08 | 2004-09-08 |
Country Status (4)
Country | Link |
---|---|
US (1) | US20080318214A1 (ja) |
EP (1) | EP1832992A4 (ja) |
JP (1) | JPWO2006027835A1 (ja) |
WO (1) | WO2006027835A2 (ja) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2008007424A1 (fr) * | 2006-07-11 | 2008-01-17 | Digital Information Technologies Corporation | Genome analysis system, genome analysis method, and program (Système d'analyse du génome, procédé d'analyse du génome et programme) |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
AU785425B2 (en) * | 2001-03-30 | 2007-05-17 | Genetic Technologies Limited | Methods of genomic analysis |
2004
- 2004-09-08 JP JP2006534946A patent/JPWO2006027835A1/ja active Pending
- 2004-09-08 EP EP04787758A patent/EP1832992A4/en not_active Withdrawn
- 2004-09-08 WO PCT/JP2004/013075 patent/WO2006027835A2/ja active Application Filing
- 2004-09-08 US US11/574,948 patent/US20080318214A1/en not_active Abandoned
Non-Patent Citations (1)
Title |
---|
See references of EP1832992A1 * |
Also Published As
Publication number | Publication date |
---|---|
US20080318214A1 (en) | 2008-12-25 |
EP1832992A1 (en) | 2007-09-12 |
EP1832992A4 (en) | 2008-02-13 |
WO2006027835A8 (ja) | 2009-08-20 |
JPWO2006027835A1 (ja) | 2008-07-31 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Hohenlohe et al. | Population genomic analysis of model and nonmodel organisms using sequenced RAD tags | |
Zhang et al. | Learning gene networks under SNP perturbations using eQTL datasets | |
Akers et al. | Gene regulatory network inference in single-cell biology | |
Nielsen et al. | Likelihood analysis of ongoing gene flow and historical association | |
Dinh et al. | Statistical inference for the evolutionary history of cancer genomes | |
Illingworth et al. | Inferring genome-wide recombination landscapes from advanced intercross lines: application to yeast crosses | |
Balaban et al. | Phylogenetic double placement of mixed samples | |
WO2006027835A2 (ja) | Genome analysis method (ゲノム解析方法) | |
Hibbins et al. | Population genetic tests for the direction and relative timing of introgression | |
Ortega-Del Vecchyo et al. | Haplotype-based inference of the distribution of fitness effects | |
Barroso et al. | Inference of recombination maps from a single pair of genomes and its application to archaic samples | |
Meyer et al. | Modeling methylation patterns with long read sequencing data | |
Polushina et al. | Change-point detection in binary Markov DNA sequences by the Cross-Entropy method | |
CN116959561B (zh) | Gene interaction prediction method and device based on a neural network model (一种基于神经网络模型的基因相互作用预测方法和装置) | |
Araki et al. | An estimation method for a cellular-state-specific gene regulatory network along tree-structured gene expression profiles | |
Jhwueng | An improved tree-based statistical method for genome-wide association study | |
Hintze et al. | Testing the efficiency of a genome-wide association study on a computational evolutionary model | |
Cao et al. | De novo reconstruction of microbial haplotypes by integrating statistical and physical linkage | |
WO2008007424A1 (fr) | Genome analysis system, genome analysis method, and program (Système d'analyse du génome, procédé d'analyse du génome et programme) | |
Casale | Multivariate linear mixed models for statistical genetics | |
Magori-Cohen et al. | Mutation parameters from DNA sequence data using graph theoretic measures on lineage trees | |
WO2006120752A1 (ja) | Genome analysis system II (ゲノム解析システムii) | |
Lukaszewicz et al. | Approximate Bayesian computational methods to estimate the strength of divergent selection in population genomics models | |
Zhou et al. | Pairclonetree: Reconstruction of tumor subclone phylogeny based on mutation pairs using next generation sequencing data | |
Pearson | Ancestral Paths: Redefining local genetic ancestry and its inference with application to Europeans |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
AK | Designated states |
Kind code of ref document: A2 Designated state(s): AE AG AL AM AT AU AZ BA BB BG BR BW BY BZ CA CH CN CO CR CU CZ DE DK DM DZ EC EE EG ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX MZ NA NI NO NZ OM PG PH PL PT RO RU SC SD SE SG SK SL SY TJ TM TN TR TT TZ UA UG US UZ VC VN YU ZA ZM ZW |
|
AL | Designated countries for regional patents |
Kind code of ref document: A2 Designated state(s): GM KE LS MW MZ NA SD SL SZ TZ UG ZM ZW AM AZ BY KG KZ MD RU TJ TM AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HU IE IT LU MC NL PL PT RO SE SI SK TR BF BJ CF CG CI CM GA GN GQ GW ML MR NE SN TD TG |
|
DPEN | Request for preliminary examination filed prior to expiration of 19th month from priority date (pct application filed from 20040101) | ||
WWE | Wipo information: entry into national phase |
Ref document number: 2006534946 Country of ref document: JP |
|
WWE | Wipo information: entry into national phase |
Ref document number: 11574948 Country of ref document: US |
|
NENP | Non-entry into the national phase |
Ref country code: DE |
|
WWE | Wipo information: entry into national phase |
Ref document number: 2004787758 Country of ref document: EP |
|
121 | Ep: the epo has been informed by wipo that ep was designated in this application | ||
WWP | Wipo information: published in national office |
Ref document number: 2004787758 Country of ref document: EP |