CN109273046B - Biological full-sibling identification method based on a probabilistic statistical model - Google Patents


Info

Publication number
CN109273046B
Authority
CN
China
Prior art keywords
ibs
probability
str
identified
unrelated
Legal status
Active
Application number
CN201811223988.2A
Other languages
Chinese (zh)
Other versions
CN109273046A (en)
Inventor
赵书民
靳超
赵峰
赵琪
Current Assignee
Jiangsu Dongnan Evidence Science Research Institute Co., Ltd.
Original Assignee
江苏东南证据科学研究院有限公司
Application filed by 江苏东南证据科学研究院有限公司
Priority to CN201811223988.2A
Publication of CN109273046A
Application granted
Publication of CN109273046B
Legal status: Active (current)


Abstract

The invention relates to the technical field of biological genetics, in particular to a biological full-sibling identification method based on a probabilistic statistical model. In the method, the two persons being identified are typed at n mutually independent STR loci. Because IBS follows the binomial distribution $\mathrm{IBS} \sim B(2n, \pi_0)$ in the population of unrelated individual pairs and $\mathrm{IBS} \sim B(2n, \pi_1)$ in the population of full-sibling pairs, the probability $p_{H_0}$ that the two persons are unrelated individuals and the probability $p_{H_1}$ that they are full siblings can be calculated. The method is applicable to any number of STR loci and is efficient to compute, so the technical solution provided by the invention has broad application prospects and good market potential.

Description

Biological full-sibling identification method based on a probabilistic statistical model
Technical Field
The invention relates to the technical field of biological genetics, in particular to a biological full-sibling identification method based on a probabilistic statistical model.
Background
Biological full siblings are individuals born to the same father and mother. Among genetic identification cases, biological full-sibling identification (hereinafter, full-sibling identification) is one of the most common apart from paternity testing. When the two individuals are of the same sex, sex-chromosome genetic markers (Y-chromosome STRs for males, X-chromosome STRs for females) and the maternal inheritance of mitochondrial DNA can provide indirect supporting evidence; when they are of different sexes, only mitochondrial DNA polymorphism can. Because mitochondrial DNA polymorphism is limited, it is accepted in the field that highly polymorphic autosomal STR markers should play the leading role in full-sibling identification.
In the prior art, there are generally two approaches to evaluating the genetic evidence when autosomal STRs are used for full-sibling identification.
The first approach uses the ITO algorithm to calculate the Full Sibling Index (FSI) between the two persons being identified. FSI, the ratio of the probability that the two persons are full siblings to the probability that they are unrelated, serves as a measure of the value of the genetic evidence obtained.
The second approach uses identity by state (IBS), a full-sibling evaluation index that has been studied extensively in recent years. The IBS value is the number of identical alleles the two persons carry at the same STR locus. At a given locus, if the genotypes are identical, i.e. both alleles match, the score is 2; if exactly 1 allele matches, the score is 1; if no allele matches, the score is 0. For a multiplex STR typing system, the sum of the per-locus scores is the IBS score of the two persons for that system. The rationale for using IBS in full-sibling identification is that allele sharing between unrelated individuals is a random-match event, whereas allele sharing between full siblings is governed by the Mendelian inheritance of autosomal STRs: two full siblings share both alleles identical by descent with probability 25% (score 2), share exactly 1 allele with probability 50% (score 1), and share none with probability only 25%. Because STR markers are highly polymorphic, the probability that full siblings share alleles is much higher than the random-match probability between unrelated individuals. Compared with FSI, IBS has the clear advantage of requiring no complex computation; for this reason the IBS score was also adopted as an evaluation index in the Implementation Specification for Biological Full-Sibling Relationship Identification (SF/Z JD0105002-2014), issued by the Ministry of Justice of China in 2014.
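The per-locus scoring rule and its summation over a multiplex can be sketched in Python. This is a minimal illustration, not software described in the patent: it assumes each genotype is a 2-tuple of allele designations and each profile is a dict keyed by locus name, and the genotypes in the usage lines are hypothetical.

```python
from collections import Counter

def ibs_locus(g1, g2):
    """Number of alleles shared by state at one STR locus: 0, 1 or 2.

    Genotypes are 2-tuples of allele designations, e.g. (12, 13).
    Shared alleles are counted as a multiset intersection, so a homozygote
    (12, 12) and a heterozygote (12, 13) share exactly one allele.
    """
    return sum((Counter(g1) & Counter(g2)).values())

def ibs_total(profile1, profile2):
    """IBS score of a pair: the sum of per-locus scores.

    Profiles are dicts mapping locus name -> genotype tuple; only loci
    typed in both profiles are compared.
    """
    common = profile1.keys() & profile2.keys()
    return sum(ibs_locus(profile1[loc], profile2[loc]) for loc in common)

# Hypothetical genotypes at three loci
a = {"vWA": (16, 17), "TH01": (9, 9), "FGA": (22, 24)}
b = {"vWA": (16, 17), "TH01": (6, 9), "FGA": (20, 21)}
print(ibs_total(a, b))  # 2 + 1 + 0 = 3
```

Counting via multiset intersection keeps homozygotes from being double-counted: (9, 9) against (6, 9) scores 1, not 2.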
However, both evaluation schemes still have problems in practical full-sibling casework.
One problem is how to evaluate system efficacy. In any kinship test using autosomal STRs, the efficacy of the multiplex STR typing system used needs to be defined; in a duo paternity test, for example, system efficacy is the probability with which the multiplex excludes a non-father. In full-sibling testing, however, no corresponding method for assessing system efficacy exists, whether FSI or IBS is used.
Another problem is how to set the threshold of the evaluation index. When FSI is used to evaluate the genetic evidence in full-sibling identification, by definition FSI > 1 favours the hypothesis that the two persons are biological full siblings. Because the system efficacy cannot be evaluated, making a full-sibling judgment on this FSI threshold carries a great risk. In practice, therefore, when full-sibling testing is performed with STR typing systems such as Sinofiler or Identifiler, some scholars have empirically required FSI to exceed 20. If more STR loci are added, whether this threshold is still appropriate, and the accuracy of identifications made with it, need to be re-examined. The same threshold problem arises with IBS. The Implementation Specification for Biological Full-Sibling Relationship Identification (SF/Z JD0105002-2014) clearly reflects this dilemma: to set IBS thresholds, it artificially fixes the number of STR loci detected at three grades, 19, 29 and 39, in which the 19 commonly used autosomal STR loci (vWA, D21S11, D18S51, D5S818, D7S820, D13S317, D16S539, FGA, D8S1179, D3S1358, CSF1PO, TH01, TPOX, Penta E, Penta D, D2S1338, D19S433, D12S391, D6S1043) are mandatory. These three grades were defined because the prior art could not study and evaluate the probability distribution of IBS for an arbitrary number of STR loci.
Therefore, establishing a statistical model suited to biological full-sibling identification, one that determines the probability distribution of the full-sibling evaluation index in the population of unrelated individual pairs and in the population of full-sibling pairs, is a key focus and difficulty for researchers in the field.
Disclosure of Invention
Because the IBS index of the Implementation Specification for Biological Full-Sibling Relationship Identification (SF/Z JD0105002-2014) is now in general use, the invention establishes a suitable statistical model that gives the probability distribution of the IBS score in the population of unrelated individual pairs and in the population of full-sibling pairs.
Consider two persons being identified. Comparing their typing results at a given STR locus gives three mutually exclusive outcomes: 2 identical alleles, exactly 1 identical allele, and no identical allele, written $a_2 = 1$, $a_1 = 1$ and $a_0 = 1$ respectively, where 1 means the state holds and 0 means it does not. The number of identical alleles the two persons share at the STR locus is their identity-by-state score at that locus, recorded as ibs. For any particular pair, $a_2 + a_1 + a_0 = 1$ and $\mathrm{ibs} = 2a_2 + a_1$.
If the two persons are typed at n mutually independent autosomal STR loci, denote by $A_2$ the number of loci at which their genotypes are identical, by $A_1$ the number of loci at which exactly 1 allele is identical, and by $A_0$ the number of loci with no identical allele. $A_2$, $A_1$ and $A_0$ are the sums of $a_2$, $a_1$ and $a_0$ over the loci, and:
$$A_2 + A_1 + A_0 = n \qquad \text{(Formula 1)}$$
From $A_2$, $A_1$ and $A_0$, the total number of identical alleles the pair shares over the n autosomal STR loci, i.e. the IBS value, is:
$$\mathrm{IBS} = 2 \times A_2 + 1 \times A_1 + 0 \times A_0 \qquad \text{(Formula 2)}$$
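Formulas 1 and 2 amount to a tally of per-locus scores. A small sketch, using a hypothetical helper name `tally` and made-up scores, assuming the scores arrive as a Python list:

```python
def tally(scores):
    """Return (A2, A1, A0, IBS) from a list of per-locus ibs scores (0, 1 or 2)."""
    a2 = scores.count(2)
    a1 = scores.count(1)
    a0 = scores.count(0)
    # Formula 1: every locus falls into exactly one of the three states
    if a2 + a1 + a0 != len(scores):
        raise ValueError("per-locus scores must be 0, 1 or 2")
    ibs = 2 * a2 + 1 * a1 + 0 * a0  # Formula 2
    return a2, a1, a0, ibs

print(tally([2, 1, 0, 1, 2]))  # (2, 2, 1, 6)
```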
By these definitions, $a_2$, $a_1$ and $a_0$ are binomially distributed variables that can only take the values 0 and 1. By the additivity of the binomial distribution, their sums $A_2$, $A_1$ and $A_0$ are also binomially distributed, and IBS is likewise treated as binomially distributed. Therefore only the key parameter of the binomial distribution of IBS, the overall rate π, is needed to describe the probability distribution of IBS.
Specifically, the invention provides a biological full-sibling identification method based on a probabilistic statistical model, comprising the following steps:
S1: extract genomic DNA from the biological samples of each of the two persons being identified, using a DNA extraction kit;
S2: perform PCR amplification and STR typing to obtain typing results at n mutually independent STR loci;
S3: establish the test hypotheses:
null hypothesis $H_0$: the two persons are unrelated individuals;
alternative hypothesis $H_1$: the two persons are biological full siblings;
S4: from the genotypes of the two persons at each STR locus, count the identical alleles at each locus and sum over all loci to obtain the total number of identical alleles the two persons share, i.e. the IBS value;
S5: from the population allele frequencies at each STR locus, obtain the binomial distribution $\mathrm{IBS} \sim B(2n, \pi_0)$ in the population of unrelated pairs and $\mathrm{IBS} \sim B(2n, \pi_1)$ in the population of full-sibling pairs;
where $a_2 = 1$, $a_1 = 1$ and $a_0 = 1$ respectively indicate that 2 identical alleles, exactly 1 identical allele, or no identical allele are present at a given STR locus, and a value of 0 indicates that the corresponding state is absent;
suppose a given STR locus has m alleles and let $f_i$ denote the frequency of the i-th allele at that locus, i = 1, 2, 3, ..., m; then:
$$\sum_{i=1}^{m} f_i = 1$$
Let $p_2$, $p_1$ and $p_0$ denote the probabilities that $a_2 = 1$, $a_1 = 1$ and $a_0 = 1$, respectively, between an unrelated pair A and B. At a given STR locus exactly one of $a_2$, $a_1$, $a_0$ equals 1, so:
$$p_2 + p_1 + p_0 = 1 \qquad \text{(Formula 3)}$$
where
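$$p_2 = 2\Big(\sum_{i=1}^{m} f_i^2\Big)^2 - \sum_{i=1}^{m} f_i^4$$
$$p_1 = 4\Big[\sum_{i=1}^{m} f_i^2 - \sum_{i=1}^{m} f_i^3 + \sum_{i=1}^{m} f_i^4 - \Big(\sum_{i=1}^{m} f_i^2\Big)^2\Big]$$
$$p_0 = 1 - 4\sum_{i=1}^{m} f_i^2 + 4\sum_{i=1}^{m} f_i^3 - 3\sum_{i=1}^{m} f_i^4 + 2\Big(\sum_{i=1}^{m} f_i^2\Big)^2$$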
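The closed forms for $p_2$, $p_1$ and $p_0$ can be sketched as a function of one locus's allele frequencies. This assumes Hardy-Weinberg equilibrium and writes everything through the power sums $\sum f_i^2$, $\sum f_i^3$, $\sum f_i^4$; the function name and the two-allele check locus are illustrative only.

```python
def unrelated_probs(freqs):
    """Per-locus (p2, p1, p0) for two unrelated individuals.

    freqs: allele frequencies at one STR locus, summing to 1.
    Assumes Hardy-Weinberg equilibrium.
    """
    s2 = sum(f ** 2 for f in freqs)
    s3 = sum(f ** 3 for f in freqs)
    s4 = sum(f ** 4 for f in freqs)
    p2 = 2 * s2 ** 2 - s4                             # identical genotypes
    p1 = 4 * (s2 - s3 + s4 - s2 ** 2)                 # exactly one shared allele
    p0 = 1 - 4 * s2 + 4 * s3 - 3 * s4 + 2 * s2 ** 2   # no shared allele
    return p2, p1, p0

print(unrelated_probs([0.5, 0.5]))  # (0.375, 0.5, 0.125)
```

With two equally frequent alleles, the genotypes are AA, AB, BB with probabilities 1/4, 1/2, 1/4, and direct counting gives the same three numbers.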
Thus, the expectation of IBS in the population of unrelated pairs is:
$$E(\mathrm{IBS}) = \sum_{k=1}^{n}\big(2p_2^{(k)} + p_1^{(k)}\big)$$
where the superscript (k) denotes the value at the k-th STR locus, and the overall rate of IBS in the population of unrelated pairs, $\pi_0$, is:
$$\pi_0 = \frac{E(\mathrm{IBS})}{2n}$$
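Let $p_{2\mathrm{FS}}$ and $p_{1\mathrm{FS}}$ denote the probabilities that $a_2 = 1$ and $a_1 = 1$, respectively, between a full-sibling pair C and D; then:
$$p_{2\mathrm{FS}} = \frac{1}{4}\Big[1 + 2\sum_{i=1}^{m} f_i^2 - \sum_{i=1}^{m} f_i^4 + 2\Big(\sum_{i=1}^{m} f_i^2\Big)^2\Big]$$
$$p_{1\mathrm{FS}} = \frac{1}{2}\Big[1 + \sum_{i=1}^{m} f_i^2 - 2\sum_{i=1}^{m} f_i^3 + 2\sum_{i=1}^{m} f_i^4 - 2\Big(\sum_{i=1}^{m} f_i^2\Big)^2\Big]$$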
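As a cross-check, $p_{2\mathrm{FS}}$ and $p_{1\mathrm{FS}}$ can also be obtained by conditioning on how many alleles the siblings share identical by descent (IBD), using the standard full-sibling probabilities 1/4 (two alleles), 1/2 (one) and 1/4 (none). This IBD route is swapped in here as a compact alternative to the Table 1 enumeration used in the derivation; it assumes Hardy-Weinberg equilibrium, unrelated parents and no mutation, and the function name is hypothetical.

```python
def full_sib_probs(freqs):
    """Per-locus (p2FS, p1FS, p0FS) for a full-sibling pair, via IBD states."""
    s2 = sum(f ** 2 for f in freqs)
    s3 = sum(f ** 3 for f in freqs)
    s4 = sum(f ** 4 for f in freqs)
    # IBD = 0: the siblings behave like an unrelated pair
    u2 = 2 * s2 ** 2 - s4
    u1 = 4 * (s2 - s3 + s4 - s2 ** 2)
    u0 = 1 - u2 - u1
    # IBD = 1: genotypes {a, x} and {a, y}; identical iff x == y
    # (probability s2), otherwise exactly one allele is shared.
    # IBD = 2: genotypes are always identical.
    p2fs = 0.25 * 1 + 0.5 * s2 + 0.25 * u2
    p1fs = 0.5 * (1 - s2) + 0.25 * u1
    p0fs = 0.25 * u0
    return p2fs, p1fs, p0fs

print(full_sib_probs([0.5, 0.5]))  # (0.59375, 0.375, 0.03125)
```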
Thus, the expectation of IBS in the population of full-sibling pairs is:
$$E(\mathrm{IBS}) = \sum_{k=1}^{n}\big(2p_{2\mathrm{FS}}^{(k)} + p_{1\mathrm{FS}}^{(k)}\big)$$
and the overall rate of IBS in the population of full-sibling pairs, $\pi_1$, is:
$$\pi_1 = \frac{E(\mathrm{IBS})}{2n}$$
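Combining the per-locus expectations gives $\pi_0$ and $\pi_1$ for a whole multiplex. A sketch, assuming independent loci in Hardy-Weinberg equilibrium; the per-locus expectations $2p_2 + p_1$ and $2p_{2\mathrm{FS}} + p_{1\mathrm{FS}}$ are written in power-sum form, and the frequencies in the usage line are toy values rather than real STR data:

```python
def overall_rates(panel):
    """Return (pi0, pi1) for a multiplex of independent STR loci.

    panel: list of allele-frequency lists, one list per locus.
    Per-locus expected IBS, in power-sum form:
      unrelated pair:    E0 = 2*p2 + p1     = 4*s2 - 4*s3 + 2*s4
      full-sibling pair: E1 = 2*p2FS + p1FS = 1 + s2/2 + E0/4
    and pi = (sum of per-locus expectations) / (2n).
    """
    n = len(panel)
    e0_sum = e1_sum = 0.0
    for freqs in panel:
        s2 = sum(f ** 2 for f in freqs)
        s3 = sum(f ** 3 for f in freqs)
        s4 = sum(f ** 4 for f in freqs)
        e0 = 4 * s2 - 4 * s3 + 2 * s4
        e0_sum += e0
        e1_sum += 1 + s2 / 2 + e0 / 4
    return e0_sum / (2 * n), e1_sum / (2 * n)

print(overall_rates([[0.5, 0.5]]))  # (0.625, 0.78125)
```

In real use each inner list would hold published population allele frequencies for one locus, such as those cited in Example 1.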
S6: calculate with the BINOMDIST function in Excel:
enter BINOMDIST(IBS value, 2n, $\pi_0$, FALSE) to obtain the probability $p_{H_0}$ that the two persons are unrelated individuals;
enter BINOMDIST(IBS value, 2n, $\pi_1$, FALSE) to obtain the probability $p_{H_1}$ that the two persons are full siblings;
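Outside Excel, the same probabilities can be computed with Python's standard library. A minimal sketch that mirrors the argument order of BINOMDIST; the function name `binomdist` is chosen for this illustration, and the inputs of Example 1 (IBS = 10, 2n = 38) are reused as a check.

```python
from math import comb

def binomdist(x, n, p, cumulative):
    """Mirror of Excel's BINOMDIST(number_s, trials, probability_s, cumulative).

    cumulative=False -> P(X = x); cumulative=True -> P(X <= x), X ~ B(n, p).
    x is truncated to an integer, as Excel does for number_s.
    """
    x = int(x)
    if not 0 <= x <= n:
        raise ValueError("x must lie between 0 and n")

    def pmf(k):
        return comb(n, k) * p ** k * (1 - p) ** (n - k)

    if cumulative:
        return sum(pmf(k) for k in range(x + 1))
    return pmf(x)

p_h0 = binomdist(10, 38, 0.3110, False)  # probability under H0 (unrelated)
p_h1 = binomdist(10, 38, 0.6280, False)  # probability under H1 (full siblings)
```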
S7: evaluate the value of the genetic evidence:
when $p_{H_1} > p_{H_0}$, calculate the ratio $R_1 = p_{H_1}/p_{H_0}$, meaning that the probability that the two persons are full siblings is $R_1$ times the probability that they are unrelated individuals; when $p_{H_1} < p_{H_0}$, calculate the ratio $R_0 = p_{H_0}/p_{H_1}$, meaning that the probability that the two persons are unrelated individuals is $R_0$ times the probability that they are full siblings;
when the conclusion leans towards the two persons being unrelated, the probability that this conclusion is wrong is $p_{H_1}$ and its accuracy is $1 - p_{H_1}$; when the conclusion leans towards the two persons being full siblings, the probability that this conclusion is wrong is $p_{H_0}$ and its accuracy is $1 - p_{H_0}$.
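Steps S6 and S7 together can be sketched as one function. The returned dictionary keys are hypothetical names chosen for this illustration, and a tie ($p_{H_1} = p_{H_0}$), which the text does not address, is reported as inconclusive:

```python
from math import comb

def binom_pmf(k, n, p):
    """P(X = k) for X ~ B(n, p); same value as BINOMDIST(k, n, p, FALSE)."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

def evaluate(ibs, n_loci, pi0, pi1):
    """Steps S6-S7 for one observed IBS score on an n-locus multiplex."""
    trials = 2 * n_loci
    p_h0 = binom_pmf(ibs, trials, pi0)  # probability under H0 (unrelated)
    p_h1 = binom_pmf(ibs, trials, pi1)  # probability under H1 (full siblings)
    if p_h1 > p_h0:
        # Leaning "full siblings": ratio R1, error probability p_H0
        return {"leaning": "full siblings", "ratio": p_h1 / p_h0,
                "error": p_h0, "accuracy": 1 - p_h0}
    if p_h0 > p_h1:
        # Leaning "unrelated": ratio R0, error probability p_H1
        return {"leaning": "unrelated", "ratio": p_h0 / p_h1,
                "error": p_h1, "accuracy": 1 - p_h1}
    return {"leaning": "inconclusive", "ratio": 1.0,
            "error": p_h0, "accuracy": 1 - p_h0}

result = evaluate(10, 19, 0.3110, 0.6280)  # inputs of Example 1
print(result["leaning"], round(result["ratio"]))
```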
Preferably, in the above biological full-sibling identification method based on a probabilistic statistical model, the STR loci in step S2 include at least the following 19 mandatory loci: vWA, D21S11, D18S51, D5S818, D7S820, D13S317, D16S539, FGA, D8S1179, D3S1358, CSF1PO, TH01, TPOX, Penta E, Penta D, D2S1338, D19S433, D12S391, D6S1043.
More preferably, in the above method, the STR loci in step S2 consist of the 19 mandatory loci and several different supplementary loci, the supplementary loci being selected from: D1S1656, D2S441, D3S1744, D3S3045, D4S2366, D5S2500, D6S477, D7S1517, D7S3048, D8S1132, D10S1248, D10S1435, D10S2325, D11S2368, D13S325, D14S608, D15S659, D17S1290, D18S535, D19S253, D21S2055, D22-GATA198B05.
Still more preferably, in the above method, the STR loci in step S2 consist of the 19 mandatory loci and 10 different supplementary loci.
Still more preferably, in the above method, the STR loci in step S2 consist of the 19 mandatory loci and 20 different supplementary loci.
In summary, the biological full-sibling identification method based on a probabilistic statistical model provided by the invention obtains, from the population allele frequencies at n mutually independent STR loci, the probability distributions of IBS in the two study populations: $\mathrm{IBS} \sim B(2n, \pi_0)$ among unrelated pairs and $\mathrm{IBS} \sim B(2n, \pi_1)$ among full-sibling pairs. On this basis, only a binomial probability calculator, such as the BINOMDIST function in Excel, is needed to compute, for any IBS score, the probability $p_{H_0}$ that the two persons are unrelated individuals and the probability $p_{H_1}$ that they are full siblings. The ratio of $p_{H_1}$ to $p_{H_0}$ expresses how many times more probable it is that the two persons are biological full siblings than that they are unrelated, which completes the evaluation of the genetic evidence in full-sibling identification.
In addition, the system efficacy of any multiplex STR typing system in biological full-sibling identification can be evaluated from the binomial distributions of IBS. This efficacy depends on the IBS threshold chosen. As shown in FIG. 1, when IBS ≤ X is used as the criterion for judging a pair unrelated, a binomial calculator such as BINOMDIST in Excel can be used. Entering BINOMDIST(X, 2n, $\pi_0$, TRUE) gives the cumulative probability of IBS from 0 to X among unrelated pairs, i.e. the proportion of unrelated pairs that the multiplex identifies as unrelated at this threshold, which is the detection rate for unrelated pairs at that threshold. Entering BINOMDIST(X, 2n, $\pi_1$, TRUE) gives the cumulative probability of IBS from 0 to X among biological full-sibling pairs, i.e. the proportion of full-sibling pairs that the multiplex wrongly judges unrelated at this threshold; 1 minus this proportion approximates the accuracy of the "unrelated" judgment at that threshold. The detection rate and accuracy for the threshold used to judge biological full siblings are obtained in the same way. The invention thus also provides a method for evaluating the efficacy of an STR typing system in full-sibling identification.
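The efficacy calculation can be sketched for a pair of thresholds, one for calling a pair unrelated and one for calling it full siblings. Function and key names are hypothetical; the usage line plugs in the 19-locus rates and the IBS ≤ 13 and IBS ≥ 22 thresholds that Example 2 evaluates.

```python
from math import comb

def binom_cdf(x, n, p):
    """P(X <= x) for X ~ B(n, p); same value as BINOMDIST(x, n, p, TRUE)."""
    return sum(comb(n, k) * p ** k * (1 - p) ** (n - k) for k in range(x + 1))

def system_efficacy(n_loci, pi0, pi1, unrelated_max, sibling_min):
    """Detection rates and accuracies for a pair of IBS thresholds.

    A pair is called unrelated when IBS <= unrelated_max and full siblings
    when IBS >= sibling_min; scores in between give no leaning opinion.
    """
    m = 2 * n_loci
    sib_called_unrelated = binom_cdf(unrelated_max, m, pi1)
    unrelated_called_sib = 1 - binom_cdf(sibling_min - 1, m, pi0)
    return {
        "unrelated_detection": binom_cdf(unrelated_max, m, pi0),
        "unrelated_accuracy": 1 - sib_called_unrelated,
        "sibling_detection": 1 - binom_cdf(sibling_min - 1, m, pi1),
        "sibling_accuracy": 1 - unrelated_called_sib,
    }

eff = system_efficacy(19, 0.3110, 0.6280, 13, 22)
```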
In short, by establishing a statistical model, the invention determines the binomial distributions of the full-sibling evaluation index IBS in the population of unrelated pairs and in the population of full-sibling pairs, and applies them successfully to biological full-sibling identification. The method is applicable to any number of STR loci and is efficient to compute, so the technical solution provided by the invention has broad application prospects and good market potential.
Drawings
FIG. 1 shows the probability distributions of IBS values for the 19-mandatory-locus typing system of Example 1. The curves were fitted using the normal approximation to the binomial distribution, with nonlinear fitting in the GraphPad Prism 5.0 software package. The left curve is the IBS distribution in the population of unrelated pairs and the right curve is the IBS distribution in the population of full-sibling pairs; the height of the vertical dotted line is the probability that the tested pair is unrelated when IBS = 10.
Detailed Description
The invention is described in detail below with reference to specific embodiments to aid understanding; the following detailed description does not limit the scope of the invention.
Derivation of the binomial distribution of IBS in the population of unrelated pairs
$p_2$, $p_1$ and $p_0$ can be decomposed according to whether the unrelated pair A and B are homozygous or heterozygous at the STR locus:
$$p_2 = p_{2,\mathrm{HoHo}} + p_{2,\mathrm{HeHe}} \qquad \text{(Formula 4)}$$
$$p_1 = p_{1,\mathrm{HoHe}} + p_{1,\mathrm{HeHe}} \qquad \text{(Formula 5)}$$
$$p_0 = p_{0,\mathrm{HoHo}} + p_{0,\mathrm{HoHe}} + p_{0,\mathrm{HeHe}} \qquad \text{(Formula 6)}$$
where HoHo means A and B are both homozygous, HeHe means both are heterozygous, and HoHe means one is homozygous and the other heterozygous.
On this basis, $p_2$, $p_1$ and $p_0$ can all be expressed in terms of the STR allele frequencies $f_i$. In other words, once the allele frequency distribution of the STR loci in the population is known, the binomial distributions of IBS in the two study populations (unrelated pairs and biological full-sibling pairs) can be obtained.
Derivation of the formula for $p_2$
Here $p_{2,\mathrm{HoHo}}$ is the probability that A and B have identical genotypes and are both homozygous; by this definition:
$$p_{2,\mathrm{HoHo}} = \sum_{i=1}^{m} f_i^2 \cdot f_i^2 = \sum_{i=1}^{m} f_i^4$$
$p_{2,\mathrm{HeHe}}$ is the probability that A and B have identical genotypes and are both heterozygous; by this definition:
$$p_{2,\mathrm{HeHe}} = \sum_{i<j} (2f_if_j)^2 = 4\sum_{i<j} f_i^2 f_j^2$$
Thus, by Formula (4):
$$p_2 = \sum_{i=1}^{m} f_i^4 + 4\sum_{i<j} f_i^2 f_j^2 = 2\Big(\sum_{i=1}^{m} f_i^2\Big)^2 - \sum_{i=1}^{m} f_i^4$$
Derivation of the formula for $p_1$
Here $p_{1,\mathrm{HoHe}}$ is the probability that A and B share exactly 1 allele at the locus, one being homozygous and the other heterozygous. By this definition, with the factor 2 accounting for either A or B being the homozygote:
$$p_{1,\mathrm{HoHe}} = 2\sum_{i=1}^{m} f_i^2 \sum_{j \neq i} 2f_if_j = 4\Big(\sum_{i=1}^{m} f_i^3 - \sum_{i=1}^{m} f_i^4\Big)$$
$p_{1,\mathrm{HeHe}}$ is the probability that A and B share exactly 1 allele at the locus and are both heterozygous. By this definition:
$$p_{1,\mathrm{HeHe}} = \sum_{i<j} 2f_if_j \cdot 2(f_i+f_j)(1-f_i-f_j) = 4\Big[\sum_{i=1}^{m} f_i^2 - 2\sum_{i=1}^{m} f_i^3 + 2\sum_{i=1}^{m} f_i^4 - \Big(\sum_{i=1}^{m} f_i^2\Big)^2\Big]$$
Adding $p_{1,\mathrm{HoHe}}$ and $p_{1,\mathrm{HeHe}}$ according to Formula (5) gives:
$$p_1 = 4\Big[\sum_{i=1}^{m} f_i^2 - \sum_{i=1}^{m} f_i^3 + \sum_{i=1}^{m} f_i^4 - \Big(\sum_{i=1}^{m} f_i^2\Big)^2\Big]$$
Derivation of the formula for $p_0$
From Formula (3) and the expressions for $p_2$ and $p_1$:
$$p_0 = 1 - p_2 - p_1 = 1 - 4\sum_{i=1}^{m} f_i^2 + 4\sum_{i=1}^{m} f_i^3 - 3\sum_{i=1}^{m} f_i^4 + 2\Big(\sum_{i=1}^{m} f_i^2\Big)^2$$
Here $p_{0,\mathrm{HoHo}}$ is the probability that A and B share no allele and are both homozygous. By this definition:
$$p_{0,\mathrm{HoHo}} = \sum_{i \neq j} f_i^2 f_j^2 = \Big(\sum_{i=1}^{m} f_i^2\Big)^2 - \sum_{i=1}^{m} f_i^4$$
Likewise, $p_{0,\mathrm{HoHe}}$ is the probability that A and B share no allele, one being homozygous and the other heterozygous. By this definition:
$$p_{0,\mathrm{HoHe}} = 2\sum_{i=1}^{m} f_i^2 \sum_{\substack{j<k \\ j,k \neq i}} 2f_jf_k = 2\Big[\sum_{i=1}^{m} f_i^2 - 2\sum_{i=1}^{m} f_i^3 + 2\sum_{i=1}^{m} f_i^4 - \Big(\sum_{i=1}^{m} f_i^2\Big)^2\Big]$$
Furthermore, $p_{0,\mathrm{HeHe}}$ is the probability that A and B are both heterozygous and share no allele. By this definition:
$$p_{0,\mathrm{HeHe}} = \sum_{i<j} 2f_if_j \sum_{\substack{k<l \\ k,l \notin \{i,j\}}} 2f_kf_l$$
From Formula (6) and the expressions for $p_0$, $p_{0,\mathrm{HoHo}}$ and $p_{0,\mathrm{HoHe}}$, $p_{0,\mathrm{HeHe}}$ simplifies to:
$$p_{0,\mathrm{HeHe}} = 1 - 6\sum_{i=1}^{m} f_i^2 + 8\sum_{i=1}^{m} f_i^3 - 6\sum_{i=1}^{m} f_i^4 + 3\Big(\sum_{i=1}^{m} f_i^2\Big)^2$$
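The decomposition can be checked numerically by enumerating every ordered pair of Hardy-Weinberg genotypes and bucketing the pair probability by IBS score and zygosity. The sketch below compares that brute-force sum with power-sum closed forms of the seven components; the helper names and the four-allele locus are illustrative assumptions.

```python
from collections import Counter
from itertools import combinations_with_replacement

ZYGOSITY = {2: "HoHo", 1: "HoHe", 0: "HeHe"}  # keyed by number of homozygotes

def hw_genotypes(freqs):
    """Hardy-Weinberg genotype frequencies: f_i^2 for ii, 2*f_i*f_j for ij."""
    return {
        (i, j): freqs[i] ** 2 if i == j else 2 * freqs[i] * freqs[j]
        for i, j in combinations_with_replacement(range(len(freqs)), 2)
    }

def brute_force(freqs):
    """P(ibs, zygosity) for an unrelated pair, by enumerating genotype pairs."""
    geno = hw_genotypes(freqs)
    out = Counter()
    for ga, pa in geno.items():
        for gb, pb in geno.items():
            ibs = sum((Counter(ga) & Counter(gb)).values())
            n_homo = (ga[0] == ga[1]) + (gb[0] == gb[1])
            out[(ibs, ZYGOSITY[n_homo])] += pa * pb
    return out

def closed_form(freqs):
    """The same seven components from allele-frequency power sums."""
    s2, s3, s4 = (sum(f ** k for f in freqs) for k in (2, 3, 4))
    return {
        (2, "HoHo"): s4,
        (2, "HeHe"): 2 * (s2 ** 2 - s4),
        (1, "HoHe"): 4 * (s3 - s4),
        (1, "HeHe"): 4 * (s2 - 2 * s3 + 2 * s4 - s2 ** 2),
        (0, "HoHo"): s2 ** 2 - s4,
        (0, "HoHe"): 2 * (s2 - 2 * s3 + 2 * s4 - s2 ** 2),
        (0, "HeHe"): 1 - 6 * s2 + 8 * s3 - 6 * s4 + 3 * s2 ** 2,
    }

freqs = [0.1, 0.2, 0.3, 0.4]  # toy 4-allele locus
bf, cf = brute_force(freqs), closed_form(freqs)
print(all(abs(bf[k] - cf[k]) < 1e-12 for k in cf))  # True
```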
Expectation, overall rate and variance of IBS in the population of unrelated pairs
By the definitions of $a_2$, $a_1$ and $a_0$, when an unrelated pair A and B is typed at n mutually independent STR loci, the number of loci with $a_2 = 1$, i.e. $A_2$, follows a binomial distribution with overall rate $P_2$, written $A_2 \sim B(n, P_2)$; likewise, $A_1$ follows a binomial distribution with overall rate $P_1$, $A_1 \sim B(n, P_1)$. Here $P_2$ is the arithmetic mean of $p_2$ over the loci and $P_1$ the arithmetic mean of $p_1$, the superscript (k) denoting the value at the k-th locus:
$$P_2 = \frac{1}{n}\sum_{k=1}^{n} p_2^{(k)}$$
$$P_1 = \frac{1}{n}\sum_{k=1}^{n} p_1^{(k)}$$
For a typing system of n mutually independent STR loci, the maximum IBS is 2n, and IBS in the population of unrelated pairs follows a binomial distribution with overall rate $\pi_0$: $\mathrm{IBS} \sim B(2n, \pi_0)$.
By Formula (2), the expectation of IBS is:
$$E(\mathrm{IBS}) = 2E(A_2) + E(A_1) = n(2P_2 + P_1)$$
so the overall rate of IBS in the population of unrelated pairs, $\pi_0$, is:
$$\pi_0 = \frac{E(\mathrm{IBS})}{2n} = P_2 + \frac{P_1}{2}$$
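Under the binomial model $\mathrm{IBS} \sim B(2n, \pi_0)$, the variance follows from the standard binomial moments. For reference, with the Example 1 values 2n = 38 and $\pi_0 = 0.3110$ plugged in:

$$E(\mathrm{IBS}) = 2n\pi_0, \qquad \operatorname{Var}(\mathrm{IBS}) = 2n\pi_0(1-\pi_0)$$
$$E(\mathrm{IBS}) = 38 \times 0.3110 = 11.818, \qquad \operatorname{Var}(\mathrm{IBS}) = 38 \times 0.3110 \times 0.6890 \approx 8.143$$

These are the moments of the binomial approximation; the exact variance of a sum of independent per-locus scores, each taking 0, 1 or 2 with probabilities $p_0$, $p_1$, $p_2$, can differ from it.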
Derivation of the binomial distribution of IBS in the population of full-sibling pairs
Under modern marriage customs in China, the biological parents A and B of a full-sibling pair C and D can be regarded as unrelated individuals. At a given autosomal STR locus, the two parents carry at least 1 distinct allele (denoted P) and at most 4 distinct alleles (denoted P, Q, R, S). A and B are independent, so, ignoring spontaneous mutation at the STR locus, Mendel's laws give the probabilities that $a_2 = 1$, $a_1 = 1$ and $a_0 = 1$ between C and D for each genotype combination of A and B, as listed in Table 1:
Table 1. Probabilities that $a_2$, $a_1$ or $a_0$ equals 1 for a full-sibling pair, by parental genotype combination
[Table 1 image not reproduced]
Note: the possible offspring genotypes are given in brackets after the equals sign in the first column; the probabilities of the genotype combinations of parents A and B in the first column are calculated as in Formulas (4) to (6). Ho denotes homozygote and He heterozygote; HoHo means both are homozygous; HoHe means one is homozygous and the other heterozygous; HeHe means both are heterozygous.
If $p_{2\mathrm{FS}}$ and $p_{1\mathrm{FS}}$ denote the probabilities that $a_2 = 1$ and $a_1 = 1$ between the full-sibling pair C and D, then from Table 1:
$$p_{2\mathrm{FS}} = p_{2\mathrm{FS},\mathrm{HoHo}} + p_{2\mathrm{FS},\mathrm{HeHe}} \qquad \text{(Formula 7)}$$
$$p_{1\mathrm{FS}} = p_{1\mathrm{FS},\mathrm{HoHe}} + p_{1\mathrm{FS},\mathrm{HeHe}} \qquad \text{(Formula 8)}$$
Derivation of the formula for $p_{2\mathrm{FS}}$
Here $p_{2\mathrm{FS},\mathrm{HoHo}}$ is the probability that C and D have identical genotypes and are both homozygous. It is the sum, over the rows of Table 1, of the probability in the second column multiplied by the probability of the corresponding genotype combination of A and B. Simplifying this sum gives:
$$p_{2\mathrm{FS},\mathrm{HoHo}} = \frac{1}{4}\Big(\sum_{i=1}^{m} f_i^2 + 2\sum_{i=1}^{m} f_i^3 + \sum_{i=1}^{m} f_i^4\Big)$$
Likewise, $p_{2\mathrm{FS},\mathrm{HeHe}}$ is the probability that C and D have identical genotypes and are both heterozygous. It is the sum of the probabilities in the third column of Table 1 multiplied by the probabilities of the corresponding genotype combinations of A and B, which simplifies to:
$$p_{2\mathrm{FS},\mathrm{HeHe}} = \frac{1}{4}\Big[1 + \sum_{i=1}^{m} f_i^2 - 2\sum_{i=1}^{m} f_i^3 - 2\sum_{i=1}^{m} f_i^4 + 2\Big(\sum_{i=1}^{m} f_i^2\Big)^2\Big]$$
Thus, by Formula (7):
$$p_{2\mathrm{FS}} = \frac{1}{4}\Big[1 + 2\sum_{i=1}^{m} f_i^2 - \sum_{i=1}^{m} f_i^4 + 2\Big(\sum_{i=1}^{m} f_i^2\Big)^2\Big]$$
Derivation of the formula for $p_{1\mathrm{FS}}$
Here $p_{1\mathrm{FS},\mathrm{HoHe}}$ is the probability that C and D share exactly 1 allele at the locus, one being homozygous and the other heterozygous. It is the sum of the probabilities in the fourth column of Table 1 multiplied by the probabilities of the corresponding genotype combinations of A and B, which simplifies to:
$$p_{1\mathrm{FS},\mathrm{HoHe}} = \sum_{i=1}^{m} f_i^2 - \sum_{i=1}^{m} f_i^4$$
$p_{1\mathrm{FS},\mathrm{HeHe}}$ is the probability that C and D share exactly 1 allele at the locus and are both heterozygous. It is the sum of the probabilities in the fifth column of Table 1 multiplied by the probabilities of the corresponding genotype combinations of A and B, which simplifies to:
$$p_{1\mathrm{FS},\mathrm{HeHe}} = \frac{1}{2}\Big[1 - \sum_{i=1}^{m} f_i^2 - 2\sum_{i=1}^{m} f_i^3 + 4\sum_{i=1}^{m} f_i^4 - 2\Big(\sum_{i=1}^{m} f_i^2\Big)^2\Big]$$
Thus, by Formula (8):
$$p_{1\mathrm{FS}} = \frac{1}{2}\Big[1 + \sum_{i=1}^{m} f_i^2 - 2\sum_{i=1}^{m} f_i^3 + 2\sum_{i=1}^{m} f_i^4 - 2\Big(\sum_{i=1}^{m} f_i^2\Big)^2\Big]$$
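The Table 1 approach can be sketched as an exact enumeration: loop over parental genotype pairs, give each child one allele from each parent, and bucket the sibling pairs by IBS score and zygosity. Assumptions are as in the text (unrelated parents in Hardy-Weinberg equilibrium, no mutation); the function names and the four-allele toy locus are illustrative, and the closed forms compared against are the simplified expressions for the four components.

```python
from collections import Counter
from itertools import combinations_with_replacement, product

ZYGOSITY = {2: "HoHo", 1: "HoHe", 0: "HeHe"}  # keyed by number of homozygotes

def hw_genotypes(freqs):
    """Hardy-Weinberg genotype frequencies keyed by (i, j) with i <= j."""
    return {
        (i, j): freqs[i] ** 2 if i == j else 2 * freqs[i] * freqs[j]
        for i, j in combinations_with_replacement(range(len(freqs)), 2)
    }

def full_sib_enumeration(freqs):
    """Exact P(ibs, zygosity) for full siblings C, D of unrelated parents A, B."""
    geno = hw_genotypes(freqs)
    out = Counter()
    for (ga, pa), (gb, pb) in product(geno.items(), repeat=2):
        # Four equally likely children: one allele from A, one from B
        children = [tuple(sorted((x, y))) for x in ga for y in gb]
        for c, d in product(children, repeat=2):  # C and D inherit independently
            ibs = sum((Counter(c) & Counter(d)).values())
            n_homo = (c[0] == c[1]) + (d[0] == d[1])
            out[(ibs, ZYGOSITY[n_homo])] += pa * pb / 16
    return out

def full_sib_closed_form(freqs):
    """Simplified power-sum expressions for the four components."""
    s2, s3, s4 = (sum(f ** k for f in freqs) for k in (2, 3, 4))
    return {
        (2, "HoHo"): (s2 + 2 * s3 + s4) / 4,
        (2, "HeHe"): (1 + s2 - 2 * s3 - 2 * s4 + 2 * s2 ** 2) / 4,
        (1, "HoHe"): s2 - s4,
        (1, "HeHe"): (1 - s2 - 2 * s3 + 4 * s4 - 2 * s2 ** 2) / 2,
    }

freqs = [0.1, 0.2, 0.3, 0.4]  # toy 4-allele locus
enum, closed = full_sib_enumeration(freqs), full_sib_closed_form(freqs)
print(all(abs(enum[k] - closed[k]) < 1e-12 for k in closed))  # True
```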
Expectation, overall rate and variance of IBS in the population of full-sibling pairs
When biological full-sibling pairs C and D are typed at n mutually independent STR loci, the number of loci with $a_2 = 1$, i.e. $A_{2\mathrm{FS}}$, follows a binomial distribution with overall rate $P_{2\mathrm{FS}}$, written $A_{2\mathrm{FS}} \sim B(n, P_{2\mathrm{FS}})$; likewise, $A_{1\mathrm{FS}} \sim B(n, P_{1\mathrm{FS}})$. Here $P_{2\mathrm{FS}}$ and $P_{1\mathrm{FS}}$ are the arithmetic means of $p_{2\mathrm{FS}}$ and $p_{1\mathrm{FS}}$ over the loci, the superscript (k) denoting the value at the k-th locus:
$$P_{2\mathrm{FS}} = \frac{1}{n}\sum_{k=1}^{n} p_{2\mathrm{FS}}^{(k)}$$
$$P_{1\mathrm{FS}} = \frac{1}{n}\sum_{k=1}^{n} p_{1\mathrm{FS}}^{(k)}$$
By Formula (2) and the additivity of the binomial distribution, for a typing system of n mutually independent STR loci the maximum IBS is 2n, and IBS in the population of full-sibling pairs follows a binomial distribution with overall rate $\pi_1$: $\mathrm{IBS} \sim B(2n, \pi_1)$.
The expectation of IBS in the population of full-sibling pairs is:
$$E(\mathrm{IBS}) = 2E(A_{2\mathrm{FS}}) + E(A_{1\mathrm{FS}}) = n(2P_{2\mathrm{FS}} + P_{1\mathrm{FS}})$$
and the overall rate of IBS in the population of full-sibling pairs, $\pi_1$, is:
$$\pi_1 = \frac{E(\mathrm{IBS})}{2n} = P_{2\mathrm{FS}} + \frac{P_{1\mathrm{FS}}}{2}$$
Example 1
Following the Implementation Specification for Biological Full-Sibling Relationship Identification (SF/Z JD0105002-2014), two persons were tested and identified with the biological full-sibling identification method based on the probabilistic statistical model:
Step 1, DNA extraction:
genomic DNA was extracted from the fingertip blood of each of the two persons with a commercially available DNA extraction kit;
Step 2, STR typing:
PCR amplification was performed with the Goldeneye 20A kit, and both persons were typed at the 19 mandatory loci, giving typing results at 19 mutually independent STR loci;
Step 3, establishing the test hypotheses:
null hypothesis $H_0$: the two persons are unrelated individuals;
alternative hypothesis $H_1$: the two persons are biological full siblings;
Step 4, calculating the statistic:
the number of identical alleles at each of the 19 mandatory loci was counted from the two persons' genotypes and summed over all loci, giving the total number of identical alleles, i.e. an IBS value of 10;
Step 5, obtaining the two binomial distributions:
using the allele frequencies of the 19 mandatory loci in the Han population of East China reported in Forensic Sci Int Genet, 2009, 3(4): e117-e118 and Journal of Forensic Science, 2007, 23(5): 345-346, IBS for this typing system follows $B(38, 0.3110)$ in the population of unrelated pairs and $B(38, 0.6280)$ in the population of full-sibling pairs; the corresponding probability distributions are shown in FIG. 1;
Step 6, calculating the probability $p_{H_0}$ that the two persons are unrelated individuals and the probability $p_{H_1}$ that they are full siblings:
using the binomial distribution function in Excel, with IBS = 10, 2n = 38, $\pi_0 = 0.3110$ and $\pi_1 = 0.6280$:
$p_{H_0}$ = BINOMDIST(10, 38, 0.3110, FALSE) = 0.118129596;
$p_{H_1}$ = BINOMDIST(10, 38, 0.6280, FALSE) = 0.000004260.
Step 7, evaluating the value of the genetic evidence:
$R_0 = p_{H_0}/p_{H_1} = 0.118129596/0.000004260 \approx 27726$
This indicates that the probability that the two persons are unrelated individuals is about 27726 times the probability that they are full siblings. At IBS = 10, the probability of wrongly judging full siblings to be unrelated individuals is $p_{H_1} = 0.000004260$; that is, the accuracy of the conclusion "the two persons tend to be unrelated individuals" at IBS = 10 is $1 - p_{H_1} = 0.999995740$.
Example 2
The full sibling identification method based on the probabilistic statistical model is used to evaluate the efficiency of the thresholds given in the technical specification for biological full sibling testing, SF/Z JD0105002-2014:
as shown in Example 1, when the two persons are STR-typed at the 19 mandatory loci of SF/Z JD0105002-2014, IBS follows B(38, 0.3110) in the population of unrelated pairs and B(38, 0.6280) in the population of full-sibling pairs; therefore:
the proportion of truly unrelated pairs with IBS ≤ 13 is: BINOMDIST(13,38,0.3110,TRUE) = 0.7268;
the proportion of true full siblings with IBS ≥ 22 is: 1-BINOMDIST(21,38,0.6280,TRUE) = 0.7876;
the proportion of full siblings with IBS ≤ 13 (misclassified as unrelated individuals) is:
BINOMDIST(13,38,0.6280,TRUE) = 0.0003;
the proportion of unrelated individuals with IBS ≥ 22 (misclassified as full siblings) is:
1-BINOMDIST(21,38,0.3110,TRUE) = 0.0006.
Therefore, when the 19 mandatory STR loci are tested with the thresholds given in the specification, the proportion of cases in which a tendency opinion can be given (i.e., the sensitivity) is (0.7268 + 0.7876)/2 = 0.7572;
and the confidence of the resulting tendency opinion (i.e., the specificity) is 1 - (0.0003 + 0.0006)/2 ≈ 0.9996.
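The threshold evaluation of Example 2 can be reproduced with a short script. The cut-offs (IBS ≤ 13 for "unrelated", IBS ≥ 22 for "full siblings") and the π values are those quoted above; computing the specificity as one minus the mean of the two misclassification rates is an assumption that reproduces the reported 0.9996, and `binom_cdf` is a hypothetical helper.

```python
from math import comb


def binom_cdf(k, trials, p):
    """P(X <= k) for X ~ B(trials, p), like Excel BINOMDIST(k, trials, p, TRUE)."""
    return sum(comb(trials, i) * p**i * (1 - p) ** (trials - i) for i in range(k + 1))


TWO_N, PI0, PI1 = 38, 0.3110, 0.6280
LOW, HIGH = 13, 22  # IBS <= LOW: tend to unrelated; IBS >= HIGH: tend to full siblings

unrel_as_unrel = binom_cdf(LOW, TWO_N, PI0)          # unrelated pairs called unrelated
sibs_as_sibs = 1 - binom_cdf(HIGH - 1, TWO_N, PI1)   # full siblings called full siblings
sibs_as_unrel = binom_cdf(LOW, TWO_N, PI1)           # full siblings misclassified
unrel_as_sibs = 1 - binom_cdf(HIGH - 1, TWO_N, PI0)  # unrelated pairs misclassified

sensitivity = (unrel_as_unrel + sibs_as_sibs) / 2
# Assumption: specificity = 1 - mean misclassification rate.
specificity = 1 - (sibs_as_unrel + unrel_as_sibs) / 2
print(f"sensitivity = {sensitivity:.4f}, specificity = {specificity:.4f}")
```

Changing LOW and HIGH shows the trade-off: a wider inconclusive band between the two cut-offs lowers the sensitivity but raises the specificity.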
The embodiments of the present invention have been described in detail above, but they are merely examples, and the present invention is not limited to them. Equivalent modifications and substitutions apparent to those skilled in the art also fall within the scope of the present invention. Accordingly, equivalent changes and modifications made without departing from the spirit and scope of the present invention are covered by the present invention.

Claims (3)

1. A biological full sibling identification method based on a probabilistic statistical model, characterized by comprising the following steps:
S1: extracting, with a DNA extraction kit, the genomic DNA from the biological samples of each of the two persons being identified;
S2: performing PCR amplification and STR typing to obtain the genotypes of the two persons at n mutually independent STR loci;
S3: establishing the test hypotheses:
null hypothesis H0: the two persons being identified are unrelated individuals;
alternative hypothesis H1: the two persons being identified are biological full siblings;
S4: comparing the genotypes of the two persons at each STR locus, counting the number of identical alleles at each locus, and summing these counts over all STR loci to obtain the total number of shared alleles, i.e., the IBS value;
S5: deriving, from the population allele frequencies of each STR locus, that IBS follows the binomial distribution IBS ~ B(2n, π0) in the population of unrelated pairs and IBS ~ B(2n, π1) in the population of full-sibling pairs;
where a2 = 1, a1 = 1 and a0 = 1 denote, respectively, that the two persons share 2 identical alleles, exactly 1 identical allele, or no identical allele at a given STR locus, and a value of 0 denotes that the corresponding state does not hold;
let a given STR locus have m alleles, and let f_i denote the frequency of the i-th allele at that locus, where i = 1, 2, 3, ..., m; then:
Figure FDA0001835479530000011
let p2, p1 and p0 denote the probabilities that a2, a1 and a0, respectively, take the value 1 for an unrelated pair A and B; then:
p2+p1+p0=1
where:
Figure FDA0001835479530000012
Figure FDA0001835479530000013
Figure FDA0001835479530000021
thus, the expectation E(IBS) of IBS in the population of unrelated pairs is:
Figure FDA0001835479530000022
and the population parameter π0 of IBS in the population of unrelated pairs is:
Figure FDA0001835479530000023
let p2FS and p1FS denote the probabilities that a2 and a1, respectively, take the value 1 for a full-sibling pair C and D; then:
Figure FDA0001835479530000024
Figure FDA0001835479530000025
thus, the expectation E(IBS) of IBS in the population of full-sibling pairs is:
Figure FDA0001835479530000026
and the population parameter π1 of IBS in the population of full-sibling pairs is:
Figure FDA0001835479530000027
S6: performing the calculation with the BINOMDIST function in Excel:
entering BINOMDIST(IBS value, 2n, π0, FALSE) to calculate the probability p_H0 that the two persons are unrelated individuals;
entering BINOMDIST(IBS value, 2n, π1, FALSE) to calculate the probability p_H1 that the two persons are full siblings;
S7: evaluating the value of the genetic evidence:
when p_H1 > p_H0, calculating the ratio R1 = p_H1/p_H0, meaning that the probability that the two persons are full siblings is R1 times the probability that they are unrelated individuals; when p_H1 < p_H0, calculating the ratio R0 = p_H0/p_H1, meaning that the probability that the two persons are unrelated individuals is R0 times the probability that they are full siblings;
when the two persons are judged as tending to be unrelated, the probability that this conclusion is wrong is p_H1 and its accuracy is 1 - p_H1; when they are judged as tending to be full siblings, the probability that this conclusion is wrong is p_H0 and its accuracy is 1 - p_H0.
2. The method of claim 1, wherein the STR loci of step S2 comprise at least the following 19 mandatory loci: vWA, D21S11, D18S51, D5S818, D7S820, D13S317, D16S539, FGA, D8S1179, D3S1358, CSF1PO, TH01, TPOX, Penta E, Penta D, D2S1338, D19S433, D12S391, D6S1043.
3. The probabilistic-model-based biological full sibling identification method of claim 2, wherein the STR loci of step S2 consist of the 19 mandatory loci and several different supplementary loci, the supplementary loci being selected from the group consisting of: D1S1656, D2S441, D3S1744, D3S3045, D4S2366, D5S2500, D6S477, D7S1517, D7S3048, D8S1132, D10S1248, D10S1435, D10S2325, D11S2368, D13S325, D14S608, D15S659, D17S1290, D18S535, D19S253, D21S2055, D22-GATA198B05.
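Claim 1 gives the equations for p2, p1, p0, p2FS, p1FS, E(IBS), π0 and π1 only as figure images. As a hedged sketch, assuming Hardy-Weinberg equilibrium and no mutation, standard closed forms can be written with the power sums S_k = Σ f_i^k of one locus. For an unrelated pair, p2 = 2·S2² - S4 and p0 = 1 - 4·S2 + 2·S2² + 4·S3 - 3·S4. For full siblings, mixing over 2, 1 or 0 alleles shared identical by descent (probabilities 1/4, 1/2, 1/4), p2FS = 1/4 + S2/2 + p2/4 and p1FS = (1 - S2)/2 + p1/4. These may be arranged differently from the patent's own figures. The code also assumes E(IBS) = Σ(2·p2 + p1) over the n loci and π = E(IBS)/(2n), i.e. matching the binomial mean 2nπ. The function names and the two loci's allele frequencies are hypothetical placeholders, not population data.

```python
def unrelated_ibs_probs(freqs):
    """(p2, p1, p0) for an unrelated pair at one locus, assuming HWE.

    freqs: allele frequencies f_1..f_m of the locus (summing to 1).
    """
    s2 = sum(f**2 for f in freqs)
    s3 = sum(f**3 for f in freqs)
    s4 = sum(f**4 for f in freqs)
    p2 = 2 * s2**2 - s4                            # identical genotypes
    p0 = 1 - 4 * s2 + 2 * s2**2 + 4 * s3 - 3 * s4  # no allele in common
    return p2, 1 - p2 - p0, p0


def full_sib_ibs_probs(freqs):
    """(p2FS, p1FS, p0FS) for a full-sibling pair at one locus.

    Mixes over IBD = 2, 1, 0 with weights 1/4, 1/2, 1/4 (no mutation).
    """
    p2, p1, p0 = unrelated_ibs_probs(freqs)  # IBD = 0 behaves like an unrelated pair
    s2 = sum(f**2 for f in freqs)            # P(the two non-IBD alleles match | IBD = 1)
    p2fs = 0.25 * 1.0 + 0.5 * s2 + 0.25 * p2
    p1fs = 0.5 * (1 - s2) + 0.25 * p1
    p0fs = 0.25 * p0                         # IBD >= 1 always shares an allele
    return p2fs, p1fs, p0fs


def binomial_pi(loci, locus_probs):
    """pi = E(IBS) / 2n, where E(IBS) sums 2*p2 + p1 over the n loci."""
    expected_ibs = 0.0
    for freqs in loci:
        p2, p1, _ = locus_probs(freqs)
        expected_ibs += 2 * p2 + p1
    return expected_ibs / (2 * len(loci))


# Placeholder frequencies for two loci (illustration only).
loci = [[0.5, 0.5], [0.2, 0.3, 0.5]]
pi0 = binomial_pi(loci, unrelated_ibs_probs)
pi1 = binomial_pi(loci, full_sib_ibs_probs)
print(f"pi0 = {pi0:.4f}, pi1 = {pi1:.4f}")
```

With real population frequencies for the 19 mandatory loci, the same functions would yield the π0 and π1 that parameterize B(2n, π0) and B(2n, π1) in S5.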
CN201811223988.2A 2018-10-19 2018-10-19 Biological whole sibling identification method based on probability statistical model Active CN109273046B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811223988.2A CN109273046B (en) 2018-10-19 2018-10-19 Biological whole sibling identification method based on probability statistical model

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811223988.2A CN109273046B (en) 2018-10-19 2018-10-19 Biological whole sibling identification method based on probability statistical model

Publications (2)

Publication Number Publication Date
CN109273046A (en) 2019-01-25
CN109273046B (en) 2022-04-22

Family

ID=65193026

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811223988.2A Active CN109273046B (en) 2018-10-19 2018-10-19 Biological whole sibling identification method based on probability statistical model

Country Status (1)

Country Link
CN (1) CN109273046B (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111091869A (en) * 2020-01-13 2020-05-01 北京奇云诺德信息科技有限公司 Genetic relationship identification method using SNP as genetic marker
CN115206424B (en) * 2022-08-24 2023-04-07 温州医科大学 Method, system, equipment and storage medium for identifying full sibling relationship
CN115273976B (en) * 2022-08-24 2023-05-05 温州医科大学 Method, system, equipment and storage medium for identifying semi-sibling relation
CN115206425B (en) * 2022-08-24 2023-03-21 温州医科大学 Triplet paternity testing method, system, equipment and storage medium
CN117219162A (en) * 2023-09-12 2023-12-12 四川大学 Evidence intensity assessment method for body source identification aiming at tumor tissue STR (short tandem repeat) map

Citations (11)

Publication number Priority date Publication date Assignee Title
CN101144774A (en) * 2007-08-24 2008-03-19 张兹钧 Human STRtyper PCR amplification fluorescence detection reagent kit
CN101838686A (en) * 2009-12-16 2010-09-22 公安部物证鉴定中心 Genetic relationship identification device and method
CN101948919A (en) * 2010-09-09 2011-01-19 成都大熊猫繁育研究基地 Kit used for paternity test of giant pandas
CN102858998A (en) * 2009-11-25 2013-01-02 雀巢产品技术援助有限公司 Novel genomic biomarkers for irritable bowel syndrome diagnosis
CN104024438A (en) * 2012-09-28 2014-09-03 未名兴旺系统作物设计前沿实验室(北京)有限公司 Snp loci set and usage method and application thereof
CN104134016A (en) * 2014-07-30 2014-11-05 北京诺禾致源生物信息科技有限公司 Device and method for genealogy reestablishing on molecular level
CN105861668A (en) * 2016-04-21 2016-08-17 昆明医科大学 Forensic physical evidence paternity testing and individual identification parameter calculating method
CN107541554A (en) * 2017-09-14 2018-01-05 中山大学 Genetic marker and its detection method and kit for human body individual identification and/or paternity identification
CN107977550A (en) * 2017-12-29 2018-05-01 天津科技大学 A kind of quick analysis Disease-causing gene algorithm based on compression
CN108103184A (en) * 2018-02-23 2018-06-01 古洁若 A kind of kit for being used to detect ankylosing spondylitis susceptible risk site
CN108135944A (en) * 2014-11-25 2018-06-08 伊夫罗生物科学公司 Probiotics and prebiotic compositions and its method and purposes for adjusting microorganism group

Family Cites Families (10)

Publication number Priority date Publication date Assignee Title
US6777196B2 (en) * 1997-02-18 2004-08-17 Genentech, Inc. Neurturin receptor
US20040157220A1 (en) * 2003-02-10 2004-08-12 Purnima Kurnool Methods and apparatus for sample tracking
ZA200903761B (en) * 2006-11-30 2010-08-25 Navigenics Inc Genetic analysis systems and methods
CN101921851A (en) * 2010-08-13 2010-12-22 司法部司法鉴定科学技术研究所 Method for identifying source of tumor tissue based on Identifiler system
CN102982222B (en) * 2011-09-02 2016-03-02 司法部司法鉴定科学技术研究所 Obtain the short-cut method without relationship index under sudden change situation
CN105407728A (en) * 2013-07-21 2016-03-16 霍勒拜欧姆公司 Methods and systems for microbiome characterization, monitoring and treatment
CN103559427B (en) * 2013-11-12 2017-10-31 高扬 A kind of use Digital ID biological sequence and the method for inferring species affiliation
CN104480205B (en) * 2014-12-10 2017-01-18 西安交通大学 Method of establishing animal paternity identification system on basis of whole genome STR
CN104630383B (en) * 2015-03-17 2017-03-01 内蒙古农业大学 Two-humped camel polymorphism primer and its method for screening technique and identification paternity
CN107609343B (en) * 2017-08-14 2019-11-08 广州金域司法鉴定技术有限公司 Relationship identification method, system, computer equipment and readable storage medium storing program for executing

Non-Patent Citations (2)

Title
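Systematic detection of putative tumor suppressor genes through the combined use of exome and transcriptome sequencing; Zhao Q et al.; Genome Biology; 2010-12-31; Vol. 11, No. 11; pp. 1-14 *
Derivation of formulas for the probability distribution of IBS scores in the population of unrelated individual pairs; Zhao Huandong et al.; Journal of Forensic Medicine (法医学杂志); 2018-10-08; Vol. 34, No. 4; pp. 370-374 *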

Also Published As

Publication number Publication date
CN109273046A (en) 2019-01-25

Similar Documents

Publication Publication Date Title
CN109273046B (en) Biological whole sibling identification method based on probability statistical model
Zaidi et al. Demographic history mediates the effect of stratification on polygenic scores
Wang Sibship reconstruction from genetic data with typing errors
EP2321642B1 (en) Methods for allele calling and ploidy calling
Adie et al. Speeding disease gene discovery by sequence based candidate prioritization
CN110176273B (en) Method and process for non-invasive assessment of genetic variation
Thomson et al. Validation of short tandem repeat analysis for the investigation of cases of disputed paternity
Tian et al. Estimating the genome-wide mutation rate with three-way identity by descent
US20140032128A1 (en) System and method for cleaning noisy genetic data and determining chromosome copy number
Duntsch et al. Polygenic basis for adaptive morphological variation in a threatened Aotearoa| New Zealand bird, the hihi (Notiomystis cincta)
Osada Genetic diversity in humans and non-human primates and its evolutionary consequences
Crow et al. Mutation in human populations
KR102405245B1 (en) Method for Detecting Chromosomal Abnormalities Based on Whole Genome Sequencing and Uses thereof
Fuller et al. Extensive recombination suppression and epistatic selection causes chromosome-wide differentiation of a selfish sex chromosome in Drosophila pseudoobscura
Inbar et al. Comparative study of population genomic approaches for mapping colony-level traits
Yin et al. Overt and concealed genetic loads revealed by QTL mapping of genotype-dependent viability in the Pacific oyster Crassostrea gigas
Mackintosh et al. Do chromosome rearrangements fix by genetic drift or natural selection? Insights from Brenthis butterflies
KR20220062265A (en) SYSTEM AND METHOD FOR DETERMINING GENETIC RELATIONSHIPS BETWEEN A SPERM PROVIDER, OOCYTE PROVIDER, AND THE RESPECTIVE CONCEPTUS
Hatch et al. Phylogenetic relationships among the baleen whales based on maternally and paternally inherited characters
CN112639129A (en) Method and apparatus for determining the genetic status of a new mutation in an embryo
JP6564053B2 (en) A method for determining whether cells or cell groups are the same person, whether they are others, whether they are parents and children, or whether they are related
JP7446343B2 (en) Systems, computer programs and methods for determining genome ploidy
Williams Neuroscience meets quantitative genetics: using morphometric data to map genes that modulate CNS architecture
Mackintosh et al. Do chromosome rearrangements fix by genetic drift or natural selection? A test in Brenthis butterflies
Langlois A review on the methods of parentage and inbreeding analysis with molecular markers

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20211123

Address after: Room 201-1, No. 88, dongfangcheng, Xuanwu District, Nanjing, Jiangsu 210018

Applicant after: JIANGSU DONGNAN EVIDENCE SCIENCE RESEARCH INSTITUTE Co.,Ltd.

Address before: 201100 room 402, building 2, No. 138, Xinjun Ring Road, Minhang District, Shanghai

Applicant before: SHANGHAI JINGZHUN BIOMEDICINE Co.,Ltd.

Applicant before: Jiangsu southeast Evidence Science Research Institute Co., Ltd

TA01 Transfer of patent application right
GR01 Patent grant