CN110400603A - IBD matrix computational approach based on pattern weighting - Google Patents
IBD matrix computational approach based on pattern weighting
- Publication number
- CN110400603A
- Authority
- CN
- China
- Prior art keywords
- ibd
- pattern
- haplotype
- probability
- allele
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B20/00—ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations
- G16B20/40—Population genetics; Linkage disequilibrium
Landscapes
- Life Sciences & Earth Sciences (AREA)
- Health & Medical Sciences (AREA)
- Engineering & Computer Science (AREA)
- Genetics & Genomics (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Physics & Mathematics (AREA)
- Biophysics (AREA)
- Bioinformatics & Computational Biology (AREA)
- Analytical Chemistry (AREA)
- Physiology (AREA)
- Ecology (AREA)
- Molecular Biology (AREA)
- Proteomics, Peptides & Aminoacids (AREA)
- Chemical & Material Sciences (AREA)
- Biotechnology (AREA)
- Evolutionary Biology (AREA)
- General Health & Medical Sciences (AREA)
- Medical Informatics (AREA)
- Spectroscopy & Molecular Physics (AREA)
- Theoretical Computer Science (AREA)
- Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
Abstract
The invention discloses an IBD matrix calculation method based on pattern weighting. Based on the characteristics of genotype data from large families with many loci and a low missing rate, the invention proposes using the joint conditional probability to measure how likely a haplotype pattern is to occur, introduces horizontal and vertical control parameters, adopts a search strategy that combines bounded depth-first search with branch-and-bound, and establishes a weighted IBD calculation method, based on multiple possible haplotype patterns, between family individuals at a specified chromosomal site.
Description
Technical field
The invention belongs to the technical field of biological genetics and relates to an IBD matrix calculation method based on pattern weighting.
Background art
The rapid development of next-generation gene sequencing technology allowed the Human Genome Project to be completed ahead of schedule, and nucleic acid databases and the genetic, physical and transcriptional expression maps of genes have become complete. This provides geneticists and researchers in related fields with massive, high-density genetic polymorphism marker information in chromosomal candidate regions. This raises the question of how to make full use of the important genetic information carried by these microsatellite polymorphisms or single-nucleotide polymorphisms (SNPs).
The present invention targets large-family data with missing values for dense SNPs. It analyzes the linkage relationships among the multiple marker loci in the SNP data and, using the haplotype patterns that are most likely to occur together with their joint conditional probabilities, proposes a weighted estimation method for the IBD (identity-by-descent) matrix of family members at any chromosomal position, providing an important basis for research on the inheritance modes of complex diseases. During haplotype pattern generation, even though we use several strategies to reduce the number of candidate haplotype patterns, a large number of haplotype patterns still remain at the end. Traditional methods, however, calculate the IBD of any two individuals in a family at a given site with respect to one determined haplotype pattern. We therefore propose using the joint conditional probability of each haplotype pattern as its weight, which finally yields a pattern-weighted IBD matrix. The weights of the haplotype patterns receive special treatment, because the joint conditional probability of a haplotype pattern is normally a very small decimal, and using it directly as a weight may lead to abnormal calculation results.
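To illustrate why a raw joint conditional probability is awkward to use directly as a weight, the following minimal sketch multiplies many per-locus factors. It assumes, purely for illustration, that the joint probability factorizes into per-locus terms; the value 0.01 and the count of 200 loci are made-up numbers, not data from the invention. The point is only that such a product can fall below the range of double-precision floats, while its logarithm stays finite.

```python
import math

def joint_probability(per_locus_probs):
    """Naive product of per-locus probabilities (may underflow to 0.0)."""
    prob = 1.0
    for p in per_locus_probs:
        prob *= p
    return prob

def joint_log_probability(per_locus_probs):
    """The same quantity in log space; stays finite for long products."""
    return sum(math.log(p) for p in per_locus_probs)

# Hypothetical example: 200 marker loci, each contributing a factor of 0.01.
probs = [0.01] * 200
naive = joint_probability(probs)       # 1e-400 is below the float range
log_p = joint_log_probability(probs)   # roughly -921
print(naive, log_p)
```

Working with log probabilities, and rescaling them before exponentiating, is one common way to keep such weights usable; the source does not state which treatment it applies.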
Summary of the invention
The purpose of the present invention is to provide an IBD matrix calculation method based on pattern weighting. The invention targets large-family data with missing values for dense SNPs, analyzes the linkage relationships among the multiple marker loci in the SNP data and, using the haplotype patterns that are most likely to occur together with their joint conditional probabilities, proposes a weighted estimation method for the IBD (identity-by-descent) matrix of family members at any chromosomal position, providing an important basis for research on the inheritance modes of complex diseases.
The technical solution adopted by the invention is an IBD matrix calculation method based on pattern weighting.
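For a single haplotype pattern, a recursive algorithm is used to calculate the IBD matrix between family members at a specified site. Based on the observed data $D$, the IBD probability at the QTL between the allele $A_i^x$ that individual $i$ inherits from parent $x$ and the allele $A_j^y$ that ancestor $j$ inherits from parent $y$ ($i > j$) is:

$$\mathrm{pr}(A_i^x \equiv A_j^y \mid D) = \mathrm{pr}(A_i^x \Leftarrow A_x^f \mid D)\,\mathrm{pr}(A_x^f \equiv A_j^y \mid D) + \mathrm{pr}(A_i^x \Leftarrow A_x^m \mid D)\,\mathrm{pr}(A_x^m \equiv A_j^y \mid D)$$

where $\mathrm{pr}(A_x^f \equiv A_j^y \mid D)$ and $\mathrm{pr}(A_x^m \equiv A_j^y \mid D)$ are the IBD probabilities of allele $A_j^y$ with the paternal allele $A_x^f$ and the maternal allele $A_x^m$ of $x$, respectively, and $\mathrm{pr}(A_i^x \Leftarrow A_x^f \mid D)$ and $\mathrm{pr}(A_i^x \Leftarrow A_x^m \mid D)$ are the probabilities that allele $A_i^x$ of individual $i$ is inherited from allele $A_x^f$ and from allele $A_x^m$ of parent $x$, respectively. In this way, the IBD probability of any two individuals $i$ and $j$ ($i > j$) in the family at a hypothetical QTL $d$ is obtained from these allele-level probabilities.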
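The recursion described above can be sketched in a few lines. This is an illustration under stated assumptions, not the invention's implementation: individuals are integers numbered so that parents precede offspring, founder alleles are assumed mutually unrelated, and the dictionary `transmit` (a hypothetical input) holds, for each inherited allele, the probability that it is a copy of the transmitting parent's paternal allele. The names `make_allele_ibd`, `parents` and `transmit` are invented for this example.

```python
from functools import lru_cache

def make_allele_ibd(parents, transmit):
    """Return a function giving the IBD probability of two alleles.

    parents:  dict individual -> (father, mother); (None, None) for founders.
    transmit: dict (individual, slot) -> probability that the allele in that
              slot copies the transmitting parent's paternal allele; slot 'f'
              is the allele received from the father, 'm' from the mother.
    """
    @lru_cache(maxsize=None)
    def ibd(a, b):
        if a == b:
            return 1.0
        # Always expand the allele of the individual with the larger index,
        # so the recursion walks back toward the founders.
        if a[0] < b[0]:
            a, b = b, a
        i, slot = a
        father, mother = parents[i]
        x = father if slot == 'f' else mother
        if x is None:
            return 0.0  # assumption: distinct founder alleles are not IBD
        t = transmit[a]
        return t * ibd((x, 'f'), b) + (1.0 - t) * ibd((x, 'm'), b)
    return ibd

# Two founders (1, 2) and two full sibs (3, 4).
parents = {1: (None, None), 2: (None, None), 3: (1, 2), 4: (1, 2)}
# Fully informative pattern: both sibs carry the father's paternal allele,
# but they received different alleles from the mother.
transmit = {(3, 'f'): 1.0, (3, 'm'): 1.0, (4, 'f'): 1.0, (4, 'm'): 0.0}
ibd = make_allele_ibd(parents, transmit)
print(ibd((4, 'f'), (3, 'f')), ibd((4, 'm'), (3, 'm')))
```

With uninformative transmission probabilities of 0.5 everywhere, the same function returns 0.5 for each pair of sib alleles inherited from the same parent, the usual expectation for full sibs.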
Based on the original large-family data with missing values, a set of haplotype patterns matching the original data can be obtained. An IBD matrix that is based on any one of these patterns while ignoring the others is inaccurate. The final IBD matrix is therefore calculated by weighting the haplotype patterns, i.e.:

$$G = \sum_i \mathrm{pr}(hc_i \mid D)\, G_{hc_i}$$

where $hc_i$ is a haplotype pattern in the haplotype pattern set, $G_{hc_i}$ is the IBD matrix based on the family data $D$ (haplotype pattern $hc_i$), and $\mathrm{pr}(hc_i \mid D)$ is the probability that haplotype pattern $hc_i$ occurs given the observed family data $D$. Only the $n_s$ haplotype patterns with the largest joint conditional probabilities are used in the calculation ($n_s$ is a control parameter given in advance), and the weight of each haplotype pattern used in the calculation is processed accordingly.
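A minimal sketch of the weighting step follows. The source says only that the weights are "processed accordingly"; the treatment shown here (working with log probabilities, subtracting the largest one, and renormalizing the kept weights to sum to 1) is an assumption chosen because it avoids numerical underflow, not a statement of the invention's actual procedure. The function name and input layout are likewise hypothetical.

```python
import math

def weighted_ibd_matrix(log_probs, ibd_matrices, n_s):
    """Combine per-pattern IBD matrices using the n_s most probable patterns.

    log_probs:    log pr(hc_i | D) for each haplotype pattern hc_i.
    ibd_matrices: one square IBD matrix (list of lists) per pattern.
    n_s:          number of top patterns to keep (control parameter).
    """
    # Indices of the n_s patterns with the largest probability.
    order = sorted(range(len(log_probs)), key=lambda k: log_probs[k], reverse=True)
    kept = order[:n_s]
    # Assumed weight treatment: shift by the largest log probability,
    # exponentiate, then renormalize so the kept weights sum to 1.
    top = log_probs[kept[0]]
    raw = [math.exp(log_probs[k] - top) for k in kept]
    total = sum(raw)
    weights = [r / total for r in raw]
    size = len(ibd_matrices[kept[0]])
    result = [[0.0] * size for _ in range(size)]
    for w, k in zip(weights, kept):
        for r in range(size):
            for c in range(size):
                result[r][c] += w * ibd_matrices[k][r][c]
    return result, weights

# Three hypothetical patterns whose raw probabilities would underflow.
log_probs = [-900.0, math.log(3.0) - 900.0, -950.0]
matrices = [
    [[1.0, 0.5], [0.5, 1.0]],
    [[1.0, 0.0], [0.0, 1.0]],
    [[1.0, 1.0], [1.0, 1.0]],
]
G, w = weighted_ibd_matrix(log_probs, matrices, n_s=2)
print(w, G)
```

In this example the two kept patterns have a probability ratio of 3 to 1, so the renormalized weights are 0.75 and 0.25 even though neither raw probability is representable as a nonzero double.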
Specific embodiment
The present invention is described in detail below with reference to a specific embodiment.
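The pattern-weighted IBD matrix calculation method of the invention is as follows. For a single haplotype pattern, a recursive algorithm is used to calculate the IBD matrix between family members at a specified site. Based on the observed data $D$, the IBD probability at the QTL between the allele $A_i^x$ that individual $i$ inherits from parent $x$ and the allele $A_j^y$ that ancestor $j$ inherits from parent $y$ ($i > j$) is:

$$\mathrm{pr}(A_i^x \equiv A_j^y \mid D) = \mathrm{pr}(A_i^x \Leftarrow A_x^f \mid D)\,\mathrm{pr}(A_x^f \equiv A_j^y \mid D) + \mathrm{pr}(A_i^x \Leftarrow A_x^m \mid D)\,\mathrm{pr}(A_x^m \equiv A_j^y \mid D)$$

where $\mathrm{pr}(A_x^f \equiv A_j^y \mid D)$ and $\mathrm{pr}(A_x^m \equiv A_j^y \mid D)$ are the IBD probabilities of allele $A_j^y$ with the paternal allele $A_x^f$ and the maternal allele $A_x^m$ of $x$, respectively, and $\mathrm{pr}(A_i^x \Leftarrow A_x^f \mid D)$ and $\mathrm{pr}(A_i^x \Leftarrow A_x^m \mid D)$ are the probabilities that allele $A_i^x$ of individual $i$ is inherited from allele $A_x^f$ and from allele $A_x^m$ of parent $x$, respectively. In this way, the IBD probability of any two individuals $i$ and $j$ ($i > j$) in the family at a hypothetical QTL $d$ is obtained from these allele-level probabilities.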
Based on the original large-family data with missing values, a set of haplotype patterns matching the original data can be obtained. An IBD matrix that is based on any one of these patterns while ignoring the others is inaccurate. The final IBD matrix is therefore calculated by weighting the haplotype patterns, i.e.:

$$G = \sum_i \mathrm{pr}(hc_i \mid D)\, G_{hc_i}$$

where $hc_i$ is a haplotype pattern in the haplotype pattern set, $G_{hc_i}$ is the IBD matrix based on the family data $D$ (haplotype pattern $hc_i$), and $\mathrm{pr}(hc_i \mid D)$ is the probability that haplotype pattern $hc_i$ occurs given the observed family data $D$. Only the $n_s$ haplotype patterns with the largest joint conditional probabilities are used in the calculation ($n_s$ is a control parameter given in advance), and the weight of each haplotype pattern used in the calculation is processed accordingly.
Based on the characteristics of genotype data from large families with many loci and a low missing rate, the invention proposes using the joint conditional probability to measure how likely a haplotype pattern is to occur, introduces horizontal and vertical control parameters, adopts a search strategy that combines bounded depth-first search with branch-and-bound, and establishes a weighted IBD calculation method, based on multiple possible haplotype patterns, between family individuals at a specified chromosomal site.
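The search strategy named above can be illustrated with a generic depth-first branch-and-bound that returns the n_s most probable combinations. This is a sketch under strong assumptions: it treats a pattern as one choice per "unit" whose log probabilities simply add, and the input format, the function name `top_patterns` and the cap `max_branch` are all hypothetical. The source does not define its horizontal and vertical control parameters, so no claim is made that `max_branch` or `n_s` here correspond to them exactly.

```python
import heapq
import math

def top_patterns(options, n_s, max_branch=None):
    """Depth-first branch-and-bound search for the n_s best combinations.

    options:    list over units; each entry is a list of (label, log_prob)
                candidates for that unit (hypothetical input format).
    n_s:        number of best patterns to return.
    max_branch: optional cap on the candidates explored per unit.
    """
    # Try the most probable candidate of each unit first.
    ranked = [sorted(opts, key=lambda o: o[1], reverse=True)[:max_branch]
              for opts in options]
    # best_rest[d] = best achievable log probability from unit d onward.
    best_rest = [0.0] * (len(ranked) + 1)
    for d in range(len(ranked) - 1, -1, -1):
        best_rest[d] = best_rest[d + 1] + ranked[d][0][1]

    heap = []  # min-heap of (log_prob, pattern); root is the worst kept one

    def dfs(depth, log_prob, chosen):
        # Bound: prune if even the best completion cannot beat the worst kept.
        if len(heap) == n_s and log_prob + best_rest[depth] <= heap[0][0]:
            return
        if depth == len(ranked):
            item = (log_prob, tuple(chosen))
            if len(heap) < n_s:
                heapq.heappush(heap, item)
            else:
                heapq.heapreplace(heap, item)
            return
        for label, lp in ranked[depth]:
            chosen.append(label)
            dfs(depth + 1, log_prob + lp, chosen)
            chosen.pop()

    dfs(0, 0.0, [])
    return sorted(heap, reverse=True)

# Two units with two candidates each; keep the two most probable patterns.
options = [
    [("a1", math.log(0.7)), ("a2", math.log(0.3))],
    [("b1", math.log(0.6)), ("b2", math.log(0.4))],
]
for log_prob, pattern in top_patterns(options, n_s=2):
    print(pattern, math.exp(log_prob))
```

In this small example the branch starting with `a2` is pruned without being expanded, because its best possible completion (0.3 × 0.6 = 0.18) cannot beat the worst pattern already kept (0.28).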
The above is only a preferred embodiment of the invention and does not limit the invention in any form. Any simple modification, equivalent change or adaptation made to the above embodiment according to the technical essence of the invention falls within the scope of the technical solution of the invention.
Claims (1)
1. An IBD matrix calculation method based on pattern weighting, characterized in that: for a single haplotype pattern, a recursive algorithm is used to calculate the IBD matrix between family members at a specified site; based on the observed data $D$, the IBD probability at the QTL between the allele $A_i^x$ that individual $i$ inherits from parent $x$ and the allele $A_j^y$ that ancestor $j$ inherits from parent $y$ is:

$$\mathrm{pr}(A_i^x \equiv A_j^y \mid D) = \mathrm{pr}(A_i^x \Leftarrow A_x^f \mid D)\,\mathrm{pr}(A_x^f \equiv A_j^y \mid D) + \mathrm{pr}(A_i^x \Leftarrow A_x^m \mid D)\,\mathrm{pr}(A_x^m \equiv A_j^y \mid D)$$

where $\mathrm{pr}(A_x^f \equiv A_j^y \mid D)$ and $\mathrm{pr}(A_x^m \equiv A_j^y \mid D)$ are the IBD probabilities of allele $A_j^y$ with the paternal allele $A_x^f$ and the maternal allele $A_x^m$ of $x$, respectively, and $\mathrm{pr}(A_i^x \Leftarrow A_x^f \mid D)$ and $\mathrm{pr}(A_i^x \Leftarrow A_x^m \mid D)$ are the probabilities that allele $A_i^x$ of individual $i$ is inherited from allele $A_x^f$ and from allele $A_x^m$ of parent $x$, respectively; in this way, the IBD probability of any two individuals $i$ and $j$ ($i > j$) in the family at a hypothetical QTL $d$ is obtained from these allele-level probabilities;

based on the original large-family data with missing values, a set of haplotype patterns matching the original data can be obtained; an IBD matrix that is based on any one of these patterns while ignoring the others is inaccurate, so the final IBD matrix is calculated by weighting the haplotype patterns:

$$G = \sum_i \mathrm{pr}(hc_i \mid D)\, G_{hc_i}$$

where $hc_i$ is a haplotype pattern in the haplotype pattern set, $G_{hc_i}$ is the IBD matrix based on the family data $D$ and haplotype pattern $hc_i$, and $\mathrm{pr}(hc_i \mid D)$ is the probability that haplotype pattern $hc_i$ occurs given the observed family data $D$; only the $n_s$ haplotype patterns with the largest joint conditional probabilities are used in the calculation, and the weight of each haplotype pattern used in the calculation is processed accordingly.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910666056.3A CN110400603A (en) | 2019-07-23 | 2019-07-23 | IBD matrix computational approach based on pattern weighting |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910666056.3A CN110400603A (en) | 2019-07-23 | 2019-07-23 | IBD matrix computational approach based on pattern weighting |
Publications (1)
Publication Number | Publication Date |
---|---|
CN110400603A (en) | 2019-11-01 |
Family
ID=68325754
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910666056.3A Pending CN110400603A (en) | 2019-07-23 | 2019-07-23 | IBD matrix computational approach based on pattern weighting |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110400603A (en) |
Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20020077775A1 (en) * | 2000-05-25 | 2002-06-20 | Schork Nicholas J. | Methods of DNA marker-based genetic analysis using estimated haplotype frequencies and uses thereof |
US20050089906A1 (en) * | 2003-09-19 | 2005-04-28 | Nec Corporation Et Al. | Haplotype estimation method |
US20110117552A1 (en) * | 2002-10-18 | 2011-05-19 | Cedars-Sinai Medical Center | Methods of using a nod2/card15 haplotype to diagnose crohn's disease |
CN107977550A (en) * | 2017-12-29 | 2018-05-01 | 天津科技大学 | A kind of quick analysis Disease-causing gene algorithm based on compression |
CN109072299A (en) * | 2016-05-12 | 2018-12-21 | 先锋国际良种公司 | Merge the method for Genotyping simultaneously |
CN109477145A (en) * | 2016-07-05 | 2019-03-15 | 剑桥企业有限公司 | The biomarker of inflammatory bowel disease |
CN109493919A (en) * | 2018-10-31 | 2019-03-19 | 中国石油大学(华东) | Genotype assigning method based on conditional probability |
Non-Patent Citations (4)
Title |
---|
BIN YAN ET AL.: "An efficient weighted tag SNP-set analytical method in genome-wide association studies", pages 1 - 8 * |
GUIMIN GAO ET AL.: "Approximating Identity-by-Descent Matrices Using Multiple Haplotype Configurations on Pedigrees", 《GENETICS》, vol. 171, no. 1, pages 365 - 376 * |
LIDE HAN ET AL.: "Using identity by descent estimation with dense genotype data to detect positive selection", pages 205 - 211 * |
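Cai Zhenyuan et al.: "Genetic diversity of the Gansu zokor in eastern Qinghai analyzed from sequence variation in the mitochondrial control region", vol. 50, no. 3, pages 337 - 351 * |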
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||