CN110400603A - IBD matrix computational approach based on pattern weighting

IBD matrix computational approach based on pattern weighting

Info

Publication number
CN110400603A
Authority
CN
China
Prior art keywords
ibd
pattern
haplotype
probability
allele
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910666056.3A
Other languages
Chinese (zh)
Inventor
王淑栋
李华昱
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China University of Petroleum East China
Original Assignee
China University of Petroleum East China
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2019-07-23
Filing date
2019-07-23
Publication date
2019-11-01
Application filed by China University of Petroleum East China
Priority to CN201910666056.3A
Publication of CN110400603A
Legal status: Pending

Classifications

    • G - PHYSICS
    • G16 - INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B - BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B20/00 - ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations
    • G16B20/40 - Population genetics; Linkage disequilibrium

Abstract

The invention discloses a pattern-weighted method for computing IBD matrices. Targeting genotype data from large pedigrees with many loci and a low missing rate, the invention uses the joint conditional probability to measure how likely each haplotype pattern is, introduces horizontal and vertical control parameters, and adopts a bounded depth-first, branch-and-bound search strategy. On this basis it establishes a weighted method, based on multiple plausible haplotype patterns, for computing the IBD between pedigree members at a specified chromosomal site.

Description

IBD matrix computational approach based on pattern weighting
Technical field
The invention belongs to the technical field of biological genetics and relates to a pattern-weighted IBD matrix calculation method.
Background art
The rapid development of next-generation gene sequencing technology allowed the Human Genome Project to be completed ahead of schedule. Nucleic acid databases and the genetic, physical, and transcriptional expression maps of genes have become increasingly complete, giving geneticists and researchers in related fields massive amounts of high-density genetic polymorphism marker information within candidate chromosomal regions. How to make full use of the important genetic information carried by these microsatellite polymorphisms or single-nucleotide polymorphisms (SNPs) is an important question.
The invention targets large-pedigree data with missing values for dense SNPs. It analyzes the linkage relationships among multiple SNP marker loci and, using the more probable haplotype patterns together with their joint conditional probabilities, proposes a weighted method for estimating the IBD (identity-by-descent) matrix of pedigree members at any chromosomal position, which provides an important basis for studying the inheritance patterns of complex diseases. Even when several strategies are used during haplotype pattern generation to reduce the number of candidate patterns, a large number of haplotype patterns still remains. Traditional methods, by contrast, compute the IBD of any two individuals in a pedigree at a given site for a single, fully determined haplotype pattern. We therefore propose using the joint conditional probability of each haplotype pattern as its weight, which yields a pattern-weighted IBD matrix. The weights need special treatment: the joint conditional probability of a haplotype pattern is usually a very small decimal, and using it directly as a weight may produce abnormal results.
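The text does not say how the weights are treated. One common way to handle very small probabilities, shown here only as an assumption and not as the treatment used by the invention, is to keep them as logarithms and normalize them so that they sum to 1. The function name `normalize_log_weights` and the example values are hypothetical.

```python
import math

def normalize_log_weights(log_probs):
    """Turn log joint conditional probabilities into weights that sum to 1.

    Assumption: each haplotype pattern's probability is stored as a natural
    logarithm, because the raw value may be too small for a float.
    """
    if not log_probs:
        raise ValueError("need at least one haplotype pattern")
    largest = max(log_probs)
    # Subtracting the largest log-probability before exponentiating keeps
    # the biggest term at exp(0) = 1, so nothing underflows to zero.
    scaled = [math.exp(lp - largest) for lp in log_probs]
    total = sum(scaled)
    return [s / total for s in scaled]

# Two hypothetical patterns; the second is three times as likely as the first.
# Both raw probabilities (about e^-800) would underflow to 0.0 as floats.
weights = normalize_log_weights([-800.0, -800.0 + math.log(3.0)])
```

Only the ratios between pattern probabilities matter after normalization, which is why the shift by the largest log-probability does not change the resulting weights.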
Summary of the invention
The purpose of the invention is to provide a pattern-weighted IBD matrix calculation method. For large-pedigree data with missing values for dense SNPs, the invention analyzes the linkage relationships among multiple SNP marker loci and, using the more probable haplotype patterns together with their joint conditional probabilities, proposes a weighted method for estimating the IBD (identity-by-descent) matrix of pedigree members at any chromosomal position, which provides an important basis for studying the inheritance patterns of complex diseases.
The technical solution adopted by the invention is the pattern-weighted IBD matrix calculation method described below.
For a single haplotype pattern, a recursive algorithm is used to compute the IBD matrix between pedigree members at a specified site. Given the observed data D, let $A_i^x$ denote the allele that individual i inherited from its parent x, and $A_j^y$ the allele that ancestor j inherited from its parent y (i > j). The probability that these two alleles are IBD at the QTL is:
$$\Pr(A_i^x \equiv A_j^y \mid D) = \Pr(A_i^x \leftarrow A_x^1 \mid D)\,\Pr(A_x^1 \equiv A_j^y \mid D) + \Pr(A_i^x \leftarrow A_x^2 \mid D)\,\Pr(A_x^2 \equiv A_j^y \mid D)$$
where $\Pr(A_x^1 \equiv A_j^y \mid D)$ and $\Pr(A_x^2 \equiv A_j^y \mid D)$ are the IBD probabilities of allele $A_j^y$ with the paternal allele $A_x^1$ and the maternal allele $A_x^2$ of x, respectively, and $\Pr(A_i^x \leftarrow A_x^1 \mid D)$ and $\Pr(A_i^x \leftarrow A_x^2 \mid D)$ are the probabilities that individual i inherited allele $A_i^x$ from alleles $A_x^1$ and $A_x^2$ of parent x, respectively. In this way the IBD probability of any two individuals i and j (i > j) of the pedigree at a putative QTL position d is obtained.
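The allele-level recursion described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the invention's implementation: the toy pedigree `PARENTS`, the transmission table `TRANSMIT`, and the function `allele_ibd` are all hypothetical, individuals are assumed to be numbered so that parents come before their children, and founders are assumed to carry distinct, unrelated alleles. The step from allele-level to individual-level IBD is not shown because the source does not give that formula.

```python
from functools import lru_cache

# Hypothetical toy pedigree: 0 and 1 are founders, 2 and 3 are their children.
# PARENTS[i] = (father, mother), or None for a founder.
PARENTS = {0: None, 1: None, 2: (0, 1), 3: (0, 1)}

# An allele is (individual, k): k = 0 is the allele received from the father,
# k = 1 the allele received from the mother.
# TRANSMIT[(i, k)] = (probability that this allele is a copy of the
# transmitting parent's paternal allele, probability that it is a copy of
# that parent's maternal allele) at the position of interest.
TRANSMIT = {
    (2, 0): (1.0, 0.0), (2, 1): (0.5, 0.5),
    (3, 0): (1.0, 0.0), (3, 1): (0.5, 0.5),
}

@lru_cache(maxsize=None)
def allele_ibd(a, b):
    """IBD probability of alleles a = (i, k) and b = (j, l)."""
    if a == b:
        return 1.0
    # Always expand the allele of the later-numbered individual, so the
    # recursion walks up the pedigree towards the founders.
    if a[0] < b[0]:
        a, b = b, a
    i, k = a
    if PARENTS[i] is None:
        # Distinct founder alleles are assumed not to be IBD.
        return 0.0
    x = PARENTS[i][k]  # the parent that transmitted allele a
    from_paternal, from_maternal = TRANSMIT[a]
    return (from_paternal * allele_ibd((x, 0), b)
            + from_maternal * allele_ibd((x, 1), b))
```

In this toy pedigree both children certainly received the father's paternal allele, so `allele_ibd((2, 0), (3, 0))` is 1, while their maternal alleles each come from either of the mother's alleles with probability 0.5, so `allele_ibd((2, 1), (3, 1))` is 0.5.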
Starting from the original large-pedigree data with missing values, a set of haplotype patterns consistent with the original data can be obtained. An IBD matrix computed from any single pattern while ignoring the others is inaccurate. The final IBD matrix is therefore computed by weighting over the haplotype patterns, that is:
$$\mathrm{IBD}(D) = \sum_i pr(hc_i \mid D)\,\mathrm{IBD}(hc_i)$$
where $hc_i$ is a haplotype pattern in the pattern set, $\mathrm{IBD}(hc_i)$ is the IBD matrix based on pedigree data D under haplotype pattern $hc_i$, and $pr(hc_i \mid D)$ is the probability that haplotype pattern $hc_i$ occurs given the observed pedigree data D. Only the $n_s$ haplotype patterns with the largest joint conditional probabilities are used in the calculation ($n_s$ is a control parameter given in advance), and the weight of each haplotype pattern is processed accordingly before it is used.
Specific embodiment
The invention is described in detail below with reference to specific embodiments.
The pattern-weighted IBD matrix calculation method of the invention is as follows. For a single haplotype pattern, a recursive algorithm is used to compute the IBD matrix between pedigree members at a specified site. Given the observed data D, let $A_i^x$ denote the allele that individual i inherited from its parent x, and $A_j^y$ the allele that ancestor j inherited from its parent y (i > j). The probability that these two alleles are IBD at the QTL is:
$$\Pr(A_i^x \equiv A_j^y \mid D) = \Pr(A_i^x \leftarrow A_x^1 \mid D)\,\Pr(A_x^1 \equiv A_j^y \mid D) + \Pr(A_i^x \leftarrow A_x^2 \mid D)\,\Pr(A_x^2 \equiv A_j^y \mid D)$$
where $\Pr(A_x^1 \equiv A_j^y \mid D)$ and $\Pr(A_x^2 \equiv A_j^y \mid D)$ are the IBD probabilities of allele $A_j^y$ with the paternal allele $A_x^1$ and the maternal allele $A_x^2$ of x, respectively, and $\Pr(A_i^x \leftarrow A_x^1 \mid D)$ and $\Pr(A_i^x \leftarrow A_x^2 \mid D)$ are the probabilities that individual i inherited allele $A_i^x$ from alleles $A_x^1$ and $A_x^2$ of parent x, respectively. In this way the IBD probability of any two individuals i and j (i > j) of the pedigree at a putative QTL position d is obtained.
Starting from the original large-pedigree data with missing values, a set of haplotype patterns consistent with the original data can be obtained. An IBD matrix computed from any single pattern while ignoring the others is inaccurate. The final IBD matrix is therefore computed by weighting over the haplotype patterns, that is:
$$\mathrm{IBD}(D) = \sum_i pr(hc_i \mid D)\,\mathrm{IBD}(hc_i)$$
where $hc_i$ is a haplotype pattern in the pattern set, $\mathrm{IBD}(hc_i)$ is the IBD matrix based on pedigree data D under haplotype pattern $hc_i$, and $pr(hc_i \mid D)$ is the probability that haplotype pattern $hc_i$ occurs given the observed pedigree data D. Only the $n_s$ haplotype patterns with the largest joint conditional probabilities are used in the calculation ($n_s$ is a control parameter given in advance), and the weight of each haplotype pattern is processed accordingly before it is used.
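The weighting over the retained patterns can be sketched as follows. The function name `weighted_ibd`, the input layout, and the example numbers are hypothetical. Dividing each weight by the total probability of the retained patterns is an assumption about what the weight processing might look like; the text does not specify it.

```python
def weighted_ibd(patterns, n_s):
    """Weighted IBD matrix over the n_s most probable haplotype patterns.

    patterns: list of (probability, ibd_matrix) pairs, one per haplotype
    pattern, where ibd_matrix is a square list of lists.
    """
    if not patterns or n_s < 1:
        raise ValueError("need at least one pattern and n_s >= 1")
    # Keep only the n_s patterns with the largest probability.
    top = sorted(patterns, key=lambda p: p[0], reverse=True)[:n_s]
    total = sum(prob for prob, _ in top)
    size = len(top[0][1])
    result = [[0.0] * size for _ in range(size)]
    for prob, matrix in top:
        weight = prob / total  # assumption: renormalize over retained patterns
        for r in range(size):
            for c in range(size):
                result[r][c] += weight * matrix[r][c]
    return result

# Three hypothetical patterns for a pedigree of two individuals.
example = [
    (0.6, [[1.0, 0.5], [0.5, 1.0]]),
    (0.3, [[1.0, 0.25], [0.25, 1.0]]),
    (0.1, [[1.0, 0.0], [0.0, 1.0]]),
]
ibd = weighted_ibd(example, n_s=2)
```

With `n_s=2` the third pattern is dropped and the first two receive weights 2/3 and 1/3, so the off-diagonal entry becomes 0.5 × 2/3 + 0.25 × 1/3 = 5/12.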
Based on the characteristics of genotype data from large pedigrees with many loci and a low missing rate, the invention proposes using the joint conditional probability to measure how likely each haplotype pattern is, introduces horizontal and vertical control parameters, adopts a bounded depth-first, branch-and-bound search strategy, and establishes a weighted method, based on multiple plausible haplotype patterns, for computing the IBD between pedigree members at a specified chromosomal site.
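The text names the search strategy but does not define the control parameters or the bounding rule. The sketch below is therefore only a generic illustration of a bounded depth-first search with branch-and-bound that keeps the most probable complete patterns. The parameters `max_branches` (a cap on the options tried at each step) and `n_s` (the number of patterns kept), the input layout `choices`, and the function `top_patterns` are assumptions and are not taken from the invention.

```python
import heapq

def top_patterns(choices, n_s, max_branches):
    """Return up to n_s complete patterns with the largest probability.

    choices[t] lists (label, conditional probability) options at step t.
    A pattern picks one option per step; its probability is the product.
    """
    best = []  # min-heap of (probability, pattern), at most n_s entries

    def dfs(step, prob, partial):
        # Bound: every further factor is at most 1, so prob is an upper
        # bound on any completion. Prune if it cannot beat the worst kept.
        if len(best) == n_s and prob <= best[0][0]:
            return
        if step == len(choices):
            heapq.heappush(best, (prob, tuple(partial)))
            if len(best) > n_s:
                heapq.heappop(best)
            return
        # Depth-first: try the most probable options first, and only the
        # first max_branches of them.
        ranked = sorted(choices[step], key=lambda c: c[1], reverse=True)
        for label, p in ranked[:max_branches]:
            partial.append(label)
            dfs(step + 1, prob * p, partial)
            partial.pop()

    dfs(0, 1.0, [])
    return sorted(best, reverse=True)

# Two hypothetical steps with two options each.
found = top_patterns(
    [[("a", 0.7), ("b", 0.3)], [("c", 0.6), ("d", 0.4)]],
    n_s=2,
    max_branches=2,
)
```

Trying the most probable options first fills the heap with strong candidates early, which makes the bound tighter and lets more of the remaining branches be pruned.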
The above is only a preferred embodiment of the invention and does not limit the invention in any form. Any simple modification, equivalent change, or adaptation made to the above embodiment according to the technical essence of the invention falls within the scope of the technical solution of the invention.

Claims (1)

1. An IBD matrix calculation method based on pattern weighting, characterized in that: for a single haplotype pattern, a recursive algorithm is used to compute the IBD matrix between pedigree members at a specified site; based on the observed data D, the probability that the allele $A_i^x$ inherited by individual i from its parent x and the allele $A_j^y$ inherited by ancestor j from its parent y are IBD at the QTL is:
$$\Pr(A_i^x \equiv A_j^y \mid D) = \Pr(A_i^x \leftarrow A_x^1 \mid D)\,\Pr(A_x^1 \equiv A_j^y \mid D) + \Pr(A_i^x \leftarrow A_x^2 \mid D)\,\Pr(A_x^2 \equiv A_j^y \mid D)$$
where $\Pr(A_x^1 \equiv A_j^y \mid D)$ and $\Pr(A_x^2 \equiv A_j^y \mid D)$ are the IBD probabilities of allele $A_j^y$ with the paternal allele $A_x^1$ and the maternal allele $A_x^2$ of x, respectively, and $\Pr(A_i^x \leftarrow A_x^1 \mid D)$ and $\Pr(A_i^x \leftarrow A_x^2 \mid D)$ are the probabilities that individual i inherited allele $A_i^x$ from alleles $A_x^1$ and $A_x^2$ of parent x, respectively, so that the IBD probability of any two individuals i and j (i > j) of the pedigree at a putative QTL position d is obtained;
based on the original large-pedigree data with missing values, a set of haplotype patterns consistent with the original data can be obtained; an IBD matrix computed from any single pattern while ignoring the others is inaccurate, so the final IBD matrix is computed by weighting over the haplotype patterns:
$$\mathrm{IBD}(D) = \sum_i pr(hc_i \mid D)\,\mathrm{IBD}(hc_i)$$
where $hc_i$ is a haplotype pattern in the pattern set, $\mathrm{IBD}(hc_i)$ is the IBD matrix based on pedigree data D under haplotype pattern $hc_i$, and $pr(hc_i \mid D)$ is the probability that haplotype pattern $hc_i$ occurs given the observed pedigree data D; only the $n_s$ haplotype patterns with the largest joint conditional probabilities are used in the calculation, and the weight of each haplotype pattern is processed accordingly before it is used.
Application number: CN201910666056.3A; priority date: 2019-07-23; filing date: 2019-07-23; title: IBD matrix computational approach based on pattern weighting; status: Pending; publication: CN110400603A (en)

Publications (1)

Publication Number Publication Date
CN110400603A (en) 2019-11-01

Family

ID=68325754

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination