CN110400603A - Pattern-weighted IBD matrix calculation method

Info

Publication number: CN110400603A
Application number: CN201910666056.3A
Authority: CN
Inventors: 王淑栋; 李华昱
Original assignee: China University of Petroleum East China
Current assignee: China University of Petroleum East China
Priority date: 2019-07-23
Filing date: 2019-07-23
Publication date: 2019-11-01

Abstract

The invention discloses a pattern-weighted IBD matrix calculation method. Based on the characteristics of genotype data from large pedigrees with many loci and a low missing rate, the invention proposes using the joint conditional probability to measure how likely a haplotype pattern is to occur, introduces horizontal and vertical control parameters, adopts a bounded depth-first, branch-and-bound search strategy, and establishes a weighted method, based on multiple plausible haplotype patterns, for calculating the IBD between family members at a specified chromosomal locus.

Description

Pattern-weighted IBD matrix calculation method

Technical field

The invention belongs to the technical field of biological genetics and relates to a pattern-weighted IBD matrix calculation method.

Background art

The rapid development of next-generation gene sequencing technology allowed the Human Genome Project to be completed ahead of schedule. Nucleic acid databases and the genetic, physical, and transcript maps of genes have become complete, giving geneticists and researchers in related fields a massive amount of high-density genetic polymorphism marker information in candidate chromosomal regions. How to make full use of the important genetic information carried by these microsatellite polymorphisms or single-nucleotide polymorphisms (SNPs) remains an open problem.

The present invention addresses dense SNP data from large pedigrees with missing genotypes. It analyzes the linkage relationships among multiple marker loci in the SNP data and, using the most probable haplotype patterns and their joint conditional probabilities, proposes a weighted method for estimating the IBD (identity by descent) matrix of family members at any chromosomal position, providing an important basis for research on the inheritance modes of complex diseases. Although several strategies are used during haplotype pattern generation to reduce the number of candidate patterns, a large number of haplotype patterns still remain. Traditional methods, by contrast, compute the IBD between any two individuals of a family at a given locus for one fixed haplotype pattern. We therefore propose using the joint conditional probability of each haplotype pattern as its weight, which yields the pattern-weighted IBD matrix. The weights require special handling: the joint conditional probability of a haplotype pattern is normally a very small decimal, and using it directly as a weight may produce abnormal results.
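The numerical concern raised above can be illustrated with a minimal sketch. It assumes, purely for illustration, that the joint conditional probability of a pattern is a product of many per-locus factors; the values in `per_locus` are hypothetical and do not come from the patent.

```python
import math

# Hypothetical per-locus factors for one haplotype pattern (illustrative only).
per_locus = [0.4] * 1000

# Multiplying the factors directly underflows double precision and gives 0.0.
direct = math.prod(per_locus)

# Summing logarithms keeps the same information as a finite number.
log_prob = sum(math.log(p) for p in per_locus)

print(direct)    # 0.0
print(log_prob)  # a finite negative log-probability
```

A weight that has underflowed to zero contributes nothing to a weighted sum, which is one way the calculation could go wrong if such probabilities were used without further treatment.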

Summary of the invention

The purpose of the present invention is to provide a pattern-weighted IBD matrix calculation method. The invention addresses dense SNP data from large pedigrees with missing genotypes, analyzes the linkage relationships among multiple marker loci in the SNP data and, using the most probable haplotype patterns and their joint conditional probabilities, proposes a weighted method for estimating the IBD (identity by descent) matrix of family members at any chromosomal position, providing an important basis for research on the inheritance modes of complex diseases.

The technical solution adopted by the invention is a pattern-weighted IBD matrix calculation method, as follows.

For a single haplotype pattern, a recursive algorithm is used to compute the IBD matrix between family members at a specified locus. Given the observed data D, the IBD probability at the QTL between the allele that individual i inherits from parent x and the allele that ancestor j inherits from parent y (i > j) is computed recursively. It is the sum of two products: each product pairs the IBD probability between j's allele and one of the two alleles of x (paternal or maternal) with the probability that i inherited its allele from that allele of x.

From these allele-level probabilities, the IBD probability between any two individuals i and j (i > j) in the family at a putative QTL d is then obtained.
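The formula images are not preserved in this text, so the recursion can only be sketched from the definitions given in words. The notation below is assumed, not taken from the patent: A_i^x denotes the allele that i inherits from parent x, A_x^f and A_x^m the paternal and maternal alleles of x, the symbol ≡ means "is IBD with", and ← means "is a copy of".

```latex
\[
P\!\left(A_i^{x} \equiv A_j^{y} \mid D\right)
  = P\!\left(A_x^{f} \equiv A_j^{y} \mid D\right) P\!\left(A_i^{x} \leftarrow A_x^{f} \mid D\right)
  + P\!\left(A_x^{m} \equiv A_j^{y} \mid D\right) P\!\left(A_i^{x} \leftarrow A_x^{m} \mid D\right)
\]
```

This is the law of total probability applied to the two possible parental origins of A_i^x. The formula that combines allele-level probabilities into the probability for a pair of individuals cannot be recovered from the text and is not reconstructed here.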

Starting from the original large-pedigree data with missing genotypes, a set of haplotype patterns consistent with the original data can be obtained. An IBD matrix computed from any single one of these patterns, ignoring the others, is inaccurate. The final IBD matrix is therefore computed by weighting over haplotype patterns: it is the weighted sum of the per-pattern IBD matrices, where hc_i is a haplotype pattern in the set, each per-pattern IBD matrix is computed from the family data D under hc_i, and the weight pr(hc_i | D) is the probability of haplotype pattern hc_i given the observed family data D. Only the n_s haplotype patterns with the largest joint conditional probability are used (n_s is a control parameter fixed in advance), and the weight of each haplotype pattern is processed accordingly before it is used in the calculation.
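The weighting step can be sketched as follows. The text does not say how the weights are processed, so this sketch makes a loudly labeled assumption: weights are supplied as log-probabilities, shifted by their maximum, and renormalized to sum to one over the retained n_s patterns. The function name `weighted_ibd` and all example values are hypothetical.

```python
import math

def weighted_ibd(log_probs, ibd_matrices, n_s):
    """Weighted sum of per-pattern IBD matrices over the n_s most probable patterns."""
    if n_s < 1 or not log_probs:
        raise ValueError("need at least one haplotype pattern")
    # Keep the n_s patterns with the largest joint conditional probability.
    order = sorted(range(len(log_probs)), key=lambda k: log_probs[k], reverse=True)[:n_s]
    # ASSUMED weight treatment (not specified in the source): subtract the
    # largest log-probability before exponentiating, then normalise to sum 1.
    top = log_probs[order[0]]
    raw = [math.exp(log_probs[k] - top) for k in order]
    total = sum(raw)
    weights = [r / total for r in raw]
    size = len(ibd_matrices[0])
    result = [[0.0] * size for _ in range(size)]
    for w, k in zip(weights, order):
        for a in range(size):
            for b in range(size):
                result[a][b] += w * ibd_matrices[k][a][b]
    return result

# Hypothetical example: three patterns for two individuals, keep the best two.
log_probs = [math.log(3) - 900, -900.0, -950.0]
matrices = [
    [[1.0, 0.5], [0.5, 1.0]],
    [[1.0, 0.0], [0.0, 1.0]],
    [[1.0, 1.0], [1.0, 1.0]],
]
combined = weighted_ibd(log_probs, matrices, n_s=2)
print(combined)
```

In this example the two retained patterns have relative weights 3 : 1, so the off-diagonal entry is 0.75 × 0.5 + 0.25 × 0 = 0.375. Working with differences of log-probabilities avoids ever forming the very small raw probabilities.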

Detailed description

The present invention is described in detail below with reference to a specific embodiment.

In the pattern-weighted IBD matrix calculation method of the present invention, a recursive algorithm is used, for a single haplotype pattern, to compute the IBD matrix between family members at a specified locus. Given the observed data D, the IBD probability at the QTL between the allele that individual i inherits from parent x and the allele that ancestor j inherits from parent y (i > j) is computed recursively. It is the sum of two products: each product pairs the IBD probability between j's allele and one of the two alleles of x (paternal or maternal) with the probability that i inherited its allele from that allele of x.

From these allele-level probabilities, the IBD probability between any two individuals i and j (i > j) in the family at a putative QTL d is then obtained.
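The allele-level recursion for one haplotype pattern can be sketched as below. Several points are assumptions made for illustration and are not stated in the patent: individuals are numbered so that parents precede offspring, founder alleles are treated as non-IBD with every other allele, and the inheritance probabilities are supplied ready-made in a hypothetical `trans` table rather than derived from marker data.

```python
from functools import lru_cache

def make_allele_ibd(parents, trans):
    """Build a recursive allele-level IBD function for one haplotype pattern.

    parents[i] is (father, mother), or None for a founder.
    trans[(i, x)] is the probability that allele x of individual i
    (0 = from the father, 1 = from the mother) is a copy of the PATERNAL
    allele of that parent; 1 - trans[(i, x)] is the maternal-copy probability.
    """
    @lru_cache(maxsize=None)
    def ibd(i, x, j, y):
        if (i, x) == (j, y):
            return 1.0                      # an allele is IBD with itself
        if (i, x) < (j, y):                 # always expand the later individual
            i, x, j, y = j, y, i, x
        if parents[i] is None:              # ASSUMPTION: founder alleles are unrelated
            return 0.0
        p = parents[i][x]                   # the parent that transmitted allele x
        t = trans[(i, x)]
        # Total probability over the two possible parental origins.
        return t * ibd(p, 0, j, y) + (1.0 - t) * ibd(p, 1, j, y)
    return ibd

# Hypothetical pedigree: founders 0 (father) and 1 (mother), full sibs 2 and 3.
parents = {0: None, 1: None, 2: (0, 1), 3: (0, 1)}

# No marker information: every transmission is a 50/50 choice.
uninformed = make_allele_ibd(parents, {(i, x): 0.5 for i in (2, 3) for x in (0, 1)})
sib_paternal = uninformed(3, 0, 2, 0)       # paternal alleles of the two sibs
sib_cross = uninformed(3, 0, 2, 1)          # paternal allele vs. maternal allele

# A fully resolved pattern in which both sibs copy the father's paternal allele.
resolved = make_allele_ibd(parents, {(2, 0): 1.0, (2, 1): 0.5, (3, 0): 1.0, (3, 1): 0.5})
sib_resolved = resolved(3, 0, 2, 0)
print(sib_paternal, sib_cross, sib_resolved)
```

Memoizing with `lru_cache` keeps the recursion from recomputing the same allele pair, which matters in a large pedigree where many individuals share ancestors. How the allele-level values are combined into one entry of the IBD matrix for a pair of individuals is left out, because that formula is not recoverable from the text.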

Based on the characteristics of genotype data from large pedigrees with many loci and a low missing rate, the present invention proposes using the joint conditional probability to measure how likely a haplotype pattern is to occur, introduces horizontal and vertical control parameters, adopts a bounded depth-first, branch-and-bound search strategy, and establishes a weighted method, based on multiple plausible haplotype patterns, for calculating the IBD between family members at a specified chromosomal locus.

The above is only a preferred embodiment of the present invention and does not limit the invention in any form. Any simple modification, equivalent change, or adaptation made to the above embodiment in accordance with the technical essence of the invention falls within the scope of the technical solution of the invention.

Claims

1. A pattern-weighted IBD matrix calculation method, characterized in that: for a single haplotype pattern, a recursive algorithm is used to compute the IBD matrix between family members at a specified locus; given the observed data D, the IBD probability at the QTL between the allele that individual i inherits from parent x and the allele that ancestor j inherits from parent y is computed recursively as the sum of two products, each product pairing the IBD probability between j's allele and one of the two alleles of x (paternal or maternal) with the probability that i inherited its allele from that allele of x;

from these allele-level probabilities, the IBD probability between any two individuals i and j (i > j) in the family at a putative QTL d is obtained;

starting from the original large-pedigree data with missing genotypes, a set of haplotype patterns consistent with the original data is obtained; an IBD matrix computed from any single one of these patterns, ignoring the others, is inaccurate, so the final IBD matrix is computed by weighting over haplotype patterns, as the weighted sum of the per-pattern IBD matrices, where hc_i is a haplotype pattern in the set, each per-pattern IBD matrix is computed from the family data D under hc_i, and the weight pr(hc_i | D) is the probability of haplotype pattern hc_i given the observed family data D; only the n_s haplotype patterns with the largest joint conditional probability are used, and the weight of each haplotype pattern is processed accordingly before it is used in the calculation.